Is there a tradeoff between it being implemented as part of the optimizer vs as a loss? Could I see your paper as describing a family of optimizers?
Is there a tradeoff between it being implemented as part of the optimizer vs as a loss? Could I see your paper as describing a family of optimizers?
It’s not any more clever to say “I knew it would work, I believed in it all along” but at least you’re being nice.
It’s not any more clever to say “I knew it would work, I believed in it all along” but at least you’re being nice.