„Name a bias – we have it!“
#AITrainingData #Commons #OpenAccess #PublicDomain
@stabiberlin.bsky.social @europeana.bsky.social @bldigischol.bsky.social @nfitzger.glammr.us.ap.brid.gy @miaout.bsky.social @amsichani.bsky.social
doi.org/10.48550/arX...
Stefan Baack et al.: Towards Best Practices for Open Datasets for LLM Training, Jan 2025
doi.org/10.48550/arX...
doi.org/10.48550/arX...
Pierre-Carl Langlais et al.: Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training, June 2025
doi.org/10.48550/arX...
www.authorsalliance.org/2025/12/03/r...
Paul Keller & Europeana Foundation: Outline for a European Books Data Commons, Nov 2025
openfuture.eu/publication/...
Openness & its shades (of grey)
mmk.sbb.berlin/2024/06/21/o...
Openness & closed systems
mmk.sbb.berlin/2024/06/25/o...
Thus forming a trio of reflections on redefining openness in the 21st century
www.infodocket.com/2024/02/07/r...
Mass scraping of bibliographic metadata from WorldCat ...
... obviously, we (again) have to become clearer about what "open", "public domain", CC0 etc. actually mean.