osai-index.eu/the-index?ty...
More on aclanthology.org/2025.starsem...
Taking this evidence together, we highlight the important implications of these results for data processing in the development of fairer LLMs.
- pre-training corpora contain about 4 times more politically engaged content than post-training data.
We analyze the political content of the training data from OLMO2, the largest fully open-source model.
🕵️♀️ We run an analysis on all the datasets (2 pre-training and 2 post-training) used to train the models. Here are our findings: