Parth Shah
@parthshaha.bsky.social
Machine Learning Engineer @ Zscaler

🏫: UC San Diego, IIT Guwahati
🏢: Signify Research, Wadhwani AI, Publicis Sapient


parthatom.github.io
All the classic feeds (Popular with friends, Quiet posters, Following) are endlessly repetitive.

Feed engineers - please come up with feeds that remove posts based on these repetitiveness criteria. This will make bsky at least 2x more useful
December 10, 2024 at 11:04 PM
2. Usually I open Bluesky for a couple of minutes, browse, and log off. If I come back in 30-60 minutes, a majority of the top 10 posts are the same ones I've already seen, which turns me off.
December 10, 2024 at 11:02 PM
Linking the thread on the paper from the lead author, for anyone interested: bsky.app/profile/laur...
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢

🧵⬇️
December 3, 2024 at 12:37 AM
I disagree

(low-hanging fruit, I'm sorry)
December 2, 2024 at 5:57 PM
> The decoder often effectively is a conditional GAN

Any intuition/math/papers you can share to understand this? It's not very clear to me.
November 28, 2024 at 8:53 PM
After training on these datasets, did you test on any other datasets not from the same distribution (e.g., a coding test set instead of the GSM8K test set)?

How much distribution change can AdaptiveDecoder handle?
November 22, 2024 at 10:08 PM
Great work!
Nitpick: rewrite the first tweet to emphasize impact:
- Learn to predict temperature to auto-adjust for creativity vs. factuality (rough sketch below)
- Against a fixed temperature, our predicted temperature wins 10% more across the GSM8K and UltraFeedback datasets.
- A new layer and model to learn any hyperparameter
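Here's roughly how I picture the mechanism (my own hypothetical names and shapes, not the paper's actual AdaptiveDecoder code): a small head reads the decoder's hidden state, predicts a temperature, and sampling divides the logits by it.

```python
import torch
import torch.nn as nn

class TemperatureHead(nn.Module):
    """Hypothetical sketch: predict a per-token sampling temperature
    from the decoder's last hidden state."""

    def __init__(self, hidden_size: int, t_min: float = 0.1, t_max: float = 1.5):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)
        self.t_min, self.t_max = t_min, t_max

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Squash to (0, 1), then rescale into a usable temperature range.
        t = torch.sigmoid(self.proj(hidden))
        return self.t_min + (self.t_max - self.t_min) * t

def sample_next_token(logits: torch.Tensor, temperature: torch.Tensor) -> torch.Tensor:
    # Low temperature -> near-greedy (factual); high -> more diverse (creative).
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```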
November 22, 2024 at 10:00 PM
3/6 is not bad either way
November 22, 2024 at 2:16 AM
Right, so if you phrase distribution change as learning, then when the context changes the distribution, wouldn't you call that "in-context learning"?

I see the above relation, that B follows from A, as well defined.

What do you think is missing?
November 21, 2024 at 5:45 PM
Curious to understand why you think the probability of y can change when x and D are fixed.

That is not true of any probability distribution.
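Concretely, the way I'm reading it (my notation, assuming x is the query and D the context):

```latex
% For a fixed model distribution p, the conditional p(y | x, D) is a
% function of the pair (x, D): fixing both fixes its value.
\[
  p(y \mid x, D) \;\ne\; p(y \mid x, D') \quad \text{for } D \ne D'
\]
% Only a change of context D -> D' changes the conditional; that
% change-under-conditioning is what "in-context learning" names.
```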
November 21, 2024 at 5:05 PM
Yeah, I phrased it as "we should promote them," when I meant the current framework should promote them regardless.
November 21, 2024 at 4:05 PM
Removing chaos by yourself is low visibility indeed. However, wouldn't you agree that someone who can influence enough people to remove said chaos should be promoted for their influence and culture-shaping capabilities?
November 21, 2024 at 6:27 AM
Awesome slides. Thank you :)
November 20, 2024 at 4:57 PM
This is an excellent list. I would probably add @colah.bsky.social's Transformer Circuits Initial Thoughts YouTube playlist, along with the corresponding paper.

Do you have a website for the course I can follow?
November 20, 2024 at 8:23 AM
Apple needs to add this to the Vision Pro at the minimum
November 20, 2024 at 8:11 AM
Appreciate this.

Some similar categorizations: "dumb" questions, obvious questions that everybody thinks they know the answer to, and between-the-lines questions.
November 18, 2024 at 8:29 AM