🏫: UC San Diego, IIT Guwahati
🏢: Signify Research, Wadhwani AI, Publicis Sapient
parthatom.github.io
Feed engineers - please come up with feeds that remove posts based on these repetitiveness criteria. This will make bsky at least 2x more useful
Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢
🧵⬇️
(low hanging fruit im sorry)
Any intuition/math/papers you can share to understand this? It's not v clear to me.
How much distribution change can AdaptiveDecoder handle?
Nitpick: Rewrite first tweet for emphasis on impact
- Learn to predict temperature to auto-adjust for creativity vs factuality
- Against a fixed temperature, our predicted temperature wins 10% more across the GSM8K and UltraFeedback datasets.
- New layer and model to learn any hyperparameter
The relation above, that B follows from A, looks well defined to me.
What do you see as missing?
That is not true of any probability distributions.
Do you have a website for the course I can follow?
More similar categorizations: "dumb" questions, obvious questions that everybody thinks they know the answer to, and between-the-lines questions.