Lucas Beyer (bl16)
@giffmana.ai
Researcher (OpenAI. Ex: DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian.
Anon feedback: https://admonymous.co/giffmana
📍 Zürich, Switzerland 🔗 http://lucasb.eyer.be
index the codebase first. Then ask o1 in "codebase chat" the same questions you would like to ask the author/owner of the codebase.
Mostly useful when digging into unknown/new codebases and trying to understand them. Or asking about possible bugs :)
January 26, 2025 at 9:25 PM
yes, I noticed that over Christmas break and ever since, it's just... a lot more boring here.
My first reason to open social media is to be entertained. Second, to entertain. Maybe a distant third, to learn something new.
Not much of any of these here. I know I know, be the change and all that.
January 26, 2025 at 9:23 PM
It reeeeally depends on what loss1 and loss2 are, both regarding what’s standard and what’s wasted.
I honestly think you are confused: the three code snippets in your three different posts mean three different things. I don’t mean it in a negative way, but clearing it up would take more time than I want :/
December 19, 2024 at 8:16 PM
Yeah, it seems either he made a mistake in the OP, or the subject of the discussion has drifted :)
December 19, 2024 at 8:11 PM
Until we get good enough AI-supported search, no, you can’t realistically expect them to find anything and everything from the past 30 years when the vocabulary and everything changes.
December 19, 2024 at 7:12 PM
Well, no, two things:
1. In the OP indeed *both* formulations waste compute, so yeah :)
2. In your 2nd post, you are not doing the same thing as in your OP! There, you are doing good old micro-batching, for which the second way is indeed the standard way.
So what you say keeps changing O.o
December 19, 2024 at 7:10 PM
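For readers following along, a minimal PyTorch sketch of the "good old micro-batching" mentioned above; the model, data, and micro-batch count are placeholder assumptions, not taken from the thread:

```python
# Micro-batching / gradient accumulation: split one logical batch into
# smaller chunks, accumulate gradients, and take a single optimizer step.
import torch

model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(32, 16)               # one logical batch (placeholder data)
y = torch.randn(32, 1)
num_micro = 4                         # split into 4 micro-batches of 8

opt.zero_grad()
for xb, yb in zip(x.chunk(num_micro), y.chunk(num_micro)):
    loss = torch.nn.functional.mse_loss(model(xb), yb)
    (loss / num_micro).backward()     # grads accumulate in param.grad
opt.step()                            # one step for the whole logical batch
```

Each forward/backward here runs on fresh data, so no compute is repeated; that is what makes it the standard formulation.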
I would enjoy that meeting ;)
December 19, 2024 at 4:19 PM
Yeah, it’s silly to expect the new generation to know everything the old generation did; doing so shows a complete lack of empathy.
December 19, 2024 at 4:19 PM
If the two graphs are completely disjoint, then there is no point in this. If they have some commonality (like the model), then this does the common part twice.
December 19, 2024 at 4:17 PM
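A minimal sketch of that point, with placeholder shapes: when loss1 and loss2 share a subgraph (the model's forward pass), two separate backward() calls propagate through the shared part twice, while backpropagating the sum does it once.

```python
import torch

model = torch.nn.Linear(16, 8)        # the common part
x = torch.randn(4, 16)

# Wasteful: two backward passes, each traversing `model` again.
h = model(x)                          # shared activations
loss1 = h.pow(2).mean()               # two heads on the same subgraph
loss2 = h.abs().mean()
loss1.backward(retain_graph=True)
loss2.backward()

# Standard: a single backward pass through the shared part.
model.zero_grad()
h = model(x)
(h.pow(2).mean() + h.abs().mean()).backward()
```

Both variants end up with the same gradients; the second just avoids walking the common subgraph twice.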
I’m somewhat confident both of these are sins lol. The second one wastes a ton of compute!
December 19, 2024 at 8:29 AM
Not quite, because this one is not stacked, so I give it a better chance to scale:
December 19, 2024 at 8:24 AM
That’s what the globe was for!
December 14, 2024 at 4:35 PM
One of the physics-of-LLMs papers studied that and found you need a certain amount of repetitions of a factoid before it’s memorized. Repetition can be either multiple epochs or just the same fact in another document. The number of needed repeats is also related to model size.
December 13, 2024 at 4:27 PM
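A toy illustration of that claim, not code from the paper; the threshold formula and all numbers below are made-up assumptions, just to show the shape of the relationship:

```python
# Made-up model of the finding: a factoid is memorized once its total
# exposure count crosses a threshold that shrinks as the model grows.

def exposures(occurrences_per_epoch: int, num_epochs: int) -> int:
    # Repetition counts the same whether it comes from re-reading the
    # corpus (more epochs) or from the fact appearing in more documents.
    return occurrences_per_epoch * num_epochs

def is_memorized(n_exposures: int, n_params: float, c: float = 1e10) -> bool:
    # Hypothetical threshold: bigger models need fewer exposures.
    return n_exposures >= c / n_params

print(is_memorized(exposures(10, 4), n_params=1e9))  # True: 40 >= 10
print(is_memorized(exposures(10, 4), n_params=1e8))  # False: 40 < 100
```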
OK OK I’ll admit it, I’m feeding off your fomo! There can’t be enough fomo! Mmmm fomo!
December 12, 2024 at 4:45 PM
Yeah, they compress videos to shit here, and are considering making good-quality videos a paid feature.
December 12, 2024 at 4:43 PM
No talk, but two posters (LocCa and NoFilter); I’m just a middle author but will try to be there. That being said, my main occupation here will be meeting many of my new colleagues.
December 10, 2024 at 6:15 PM
lol exactly. I said CDG whenever possible.
That being said, our flight is operated by Air Canada, including a layover in Canada, which is a lot worse.
December 9, 2024 at 10:07 AM
This afternoon’s flight? I’m taking that too
December 9, 2024 at 8:35 AM