From some light reversing, everything is a Google dork.
Out of curiosity, how much of the scenario and progression is based on some template, and how much is made up by GPT?
I just played the single player; I guess the dynamics are different in multiplayer.
www.newsguardrealitycheck.com/p/a-well-fun...
(Via the risky.biz newsletter)
My takeaway was "this is the stuff we caught" (let's not talk about what we didn't stop or the actual signals we use) and "hi APTs, we see you, please stop or next we will publish your Google search history".
Being transparent about these things is hard, though.
> $10 per question, increased to $30 if 2 of 3 PhD students from different fields fail to answer
> When answering questions they get $10 for trying and $30 if they get it right
Not sure if that leads to good questions or just tricky ones.
arxiv.org/abs/2311.12022
That dataset was generated by having PhDs on Upwork create and try to answer questions. The paper talks about trying to balance the incentives.
@marcushellberg.dev you are more into this subject than me, what are your thoughts?
But my issue with these two is that I don't think either is a good measure of a (pure) LLM.
It seems the goal here is just to make models score low.