We also analyzed explanations from 7 major LLMs and toxicity classifiers. The gaps are stark.
A quick thread on what we found: