It’s the biggest hurdle I see for teams trying to build GenAI features.
We need tools to lower the barrier to entry: LLM judges, existing benchmarks, manual annotation for eval collection, synthetic data… anything else?
Link for the curious:
github.com/wandb/weave/...
- you need to be clear about which files to include and which are optional, because context blows up quickly
- automating the creation of your docs' llms.txt is pretty easy
We're building LLM / human “scorers” in @weightsbiases.bsky.social so that both share the same data model, for exactly this reason
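To show what a shared data model buys you, here is a hypothetical sketch: LLM and human feedback both land in one record shape, so downstream eval code never cares who produced a score. The `Score` class and its field names are illustrative assumptions, not Weave's actual API.

```python
from dataclasses import dataclass, asdict

@dataclass
class Score:
    """One feedback record, identical shape for LLM and human graders (assumed schema)."""
    example_id: str
    metric: str      # e.g. "correctness"
    value: float     # normalized 0..1
    rationale: str
    source: str      # "llm" or "human"

def llm_score(example_id: str, metric: str, value: float, rationale: str) -> Score:
    # An LLM judge emits the same record as a human annotator, just tagged "llm".
    return Score(example_id, metric, value, rationale, source="llm")

def human_score(example_id: str, metric: str, value: float, rationale: str) -> Score:
    return Score(example_id, metric, value, rationale, source="human")
```

With one schema, you can start with LLM judges, spot-check with human labels, and aggregate both in the same analysis code.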