https://scholar.google.com/citations?user=I80vy5cAAAAJ
Since the dawn of time, people have been messing with (or dropping entirely) these pesky time-dependent loss scaling terms, mostly because the models train better without them.
The manuscript should be up by tomorrow and I'll drop a link.
Then, given this Z, Xt evolves over the trees, sampling when (but not which) branching and deletion events occur, all constructed to terminate at X1.
github.com/MurrellGroup...
If you have a GPU, it is pretty fast.
This was a team effort from a few people in my lab, including @antonoresten.bsky.social and others (not sure who is on this app).