And some of the bigger picture thoughts that might not be evident from individual publications here: bsky.app/profile/najo...
For context, the intended audience is Lang Dev researchers who are attending a session on "what can LLMs tell us about human language".
If you have any thoughts I'd love to hear them!
@profsophie.bsky.social! Also, Boston is quite nice :)
Find more info here: najoung.kim/students/
Unfortunately only open to US citizens/permanent residents/independent work auth holders.
There will be some freedom of scope for the second part of the project within the theme of research agent evaluation - the RA will contribute to scoping the project along with the team as well.
It complements recent evals (e.g., PaperBench from OpenAI) on replication! See 👇 for details
We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code.
Finding: Most agents we tested had a low success rate, but there is promise!