Peter Bull
@peter.drivendata.org
Co-founder DrivenData. Celebrating a decade of data for good.

ML challenges | https://www.drivendata.org/
Data projects | https://drivendata.co/
Open source | https://github.com/pjbull
Great set of events for #SeattleAIWeek this week! Definitely join some if you are in town, and let me know if you want to catch up: luma.com/Seattle-AI-W...
#SeattleAIWeek 2025 · Events Calendar
View and subscribe to events from #SeattleAIWeek 2025 on Luma. Showcasing the PNW as the best place to be in AI. Community-driven. Future-focused. Submit your event now using the + button.
luma.com
October 27, 2025 at 9:51 PM
🚀 New release: cloudpathlib v0.23.0

🥧 Now with Python 3.14 (π) support!
📁 New copy & move methods mean you can reduce usage of shutil 🎉

Check out the full release and docs here:
👉 cloudpathlib.drivendata.org/stable/
October 13, 2025 at 6:36 PM
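A minimal sketch of the kind of shutil call that path-level copy and move methods replace. It uses only the standard library: Python 3.14 added `Path.copy` and `Path.move` to `pathlib`, and the helpers fall back to `shutil` on older versions. The helper names `copy_file` and `move_file` are hypothetical, and the assumption that cloudpathlib's new methods behave like the pathlib ones is not confirmed by the post; check the cloudpathlib docs for the actual signatures.

```python
import shutil
from pathlib import Path


def copy_file(src: Path, dst: Path) -> Path:
    """Copy the file at src to dst and return the destination path."""
    if hasattr(src, "copy"):
        # pathlib.Path.copy exists on Python 3.14+
        return src.copy(dst)
    # Older Pythons: shutil does the work (contents only, no metadata)
    shutil.copyfile(src, dst)
    return Path(dst)


def move_file(src: Path, dst: Path) -> Path:
    """Move the file at src to dst and return the destination path."""
    if hasattr(src, "move"):
        # pathlib.Path.move exists on Python 3.14+
        return src.move(dst)
    # Older Pythons: fall back to shutil.move
    shutil.move(src, dst)
    return Path(dst)
```

With path-level methods, calling code stays in terms of path objects rather than mixing in a separate module for file operations.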
Super interesting work on a newly proposed columnar data file format called F3, with an embedded wasm binary to decode the data 🤯 (which obviates the need for 3rd party library support). Favorable comparisons on compression, throughput and random reads to existing formats.

db.cs.cmu.edu/papers/2025/...
October 10, 2025 at 6:36 PM
Very cool to see Wikimedia embracing LLM tools and launching a hybrid similarity search API and open source embeddings for Wikipedia! Also supports Q&A style queries.
www.wikidata.org/wiki/Wikidat...
October 8, 2025 at 10:27 PM
Interesting to see empirical research coming out for LLMs as education aids. In this study, active use of LLMs helped CS students debug compiler errors. Removing LLM access demonstrated no lasting learning benefit from having had access to it...

learninganalytics.upenn.edu/ryanbaker/IC...
October 6, 2025 at 6:36 PM
Reposted by Peter Bull
Are you interested in #AIforConservation #AIforBiodiversity #AIforWildlife or #AIforNature?? Are you located in the Boston Area?

If so, come join us!! The AI for Conservation Slack community is doing our first local-area Boston meetup, partnering with iNaturalist and TEDx Boston!
September 10, 2025 at 11:41 PM
We just shipped two major features for cloudpathlib ✨📦 ✨ ! First, HTTP support: treat a URL like any other path (open, read_text, join). Second, compatibility with the open and os Python built-ins for seamless transition of legacy code and third-party library support.

cloudpathlib.drivendata.org
September 22, 2025 at 6:36 PM
Great opportunity to work on AI in conservation and biodiversity with Roland Kays! In-person in NC, check it out now since it is only open for a week:
www.governmentjobs.com/careers/%7B0...
Job Bulletin
State of North Carolina
www.governmentjobs.com
September 19, 2025 at 6:36 PM
Exemplary FAQ for "Your Brain on ChatGPT: Accumulation of Cognitive Debt" www.brainonllm.com/faq

I'd love to see more authors who are explicit about what NOT to claim based on a study, including wording for lay audiences that is not appropriate.
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
www.brainonllm.com
August 20, 2025 at 10:27 PM
Thought I would spot check an application someone was posting about as 100% vibecoded. Can you spot the issue?

Kudos to the LLM, this is verbatim from the fastapi docs. Sometimes verbatim from the docs is not what you want for your application though....
August 13, 2025 at 10:27 PM
Interesting announcement on a product from Astral! Similar model to one of the core @anacondainc.bsky.social lines of business.
Today, we're announcing our first hosted infrastructure product: pyx, a Python-native package registry.

We think of pyx as an optimized backend for uv: it’s a package registry, but it also solves problems that go beyond the scope of a traditional "package registry".
August 13, 2025 at 6:58 PM
Enthusiastic to build on this generation of earth observation foundation embeddings like DeepMind's AlphaEarth (and more)! We already see some promising crop type (cereals vs. orchards) results and are exploring other use cases in climate resilience. deepmind.google/discover/blo...
August 8, 2025 at 6:36 PM
Very cool to see that marimo supports our cloudpathlib library for their file browser UI! Browse your S3, GCS, Azure buckets from your notebooks! docs.marimo.io/api/inputs/f...
File Browser - marimo
The next generation of Python notebooks
docs.marimo.io
August 1, 2025 at 6:36 PM
✨ 📦 ✨ Just released new Cookiecutter Data Science version with support for pixi and poetry as environment managers! Some of our top requested features ever. Upgrade and check it out now.

cookiecutter-data-science.drivendata.org
July 25, 2025 at 6:36 PM
Now getting organic inbound for www.zambacloud.com, our wildlife imagery processing platform, from ChatGPT! 😲
July 18, 2025 at 6:36 PM
Just in case you thought speech-to-text worked for children, the third column is what Whisper does. Somehow in the third example it accesses my inner monologue... I guess that's why we're excited about our upcoming challenge! kidsasr.drivendata.org
July 16, 2025 at 10:27 PM
How are people managing code review for their AI coding agents? I do a first glance and it is obviously bad (e.g., didn't refactor repeated code), and now I've got half a dozen AI diffs for things that aren't good enough cluttering up my todo list with things to respond to....
July 14, 2025 at 6:36 PM
New research based on the CANDOR corpus shows that people enjoy conversations where they alternate longer turns better than short turns or one person dominating. Cool!

arxiv.org/html/2506.20...
Time is On My Side: Dynamics of Talk-Time Sharing in Video-chat Conversations
An intrinsic aspect of every conversation is the way talk-time is shared between multiple speakers. Conversations can be balanced, with each speaker claiming a similar amount of talk-time, or…
arxiv.org
July 11, 2025 at 6:36 PM
The best shortcut to understanding how many experienced software engineers feel about AI is listening to ThePrimeagen's takes. Balanced perspectives on what's actually new, determinism, security, system complexity, what's promising, and what's not: www.youtube.com/watch?v=vDWa...
July 9, 2025 at 10:27 PM
"Damn ChatGPT": your new summer jam about using ChatGPT as a therapist. open.spotify.com/track/4umq06...
Maldito ChatGPT
Camilo · Maldito ChatGPT · Song · 2025
open.spotify.com
July 7, 2025 at 6:36 PM
Great article on the challenges of only surfacing the right info to LLMs and editing down what is not needed. If you've used a coding copilot or agent, you've seen this first hand many times. Output iterations are often polluted with code that came before.

www.dbreunig.com/2025/06/22/h...
How Long Contexts Fail
Taking care of your context is the key to building successful agents. Just because there’s a 1 million token context window doesn’t mean you should fill it.
www.dbreunig.com
July 4, 2025 at 6:36 PM
BioCLIP2 looks like a stellar improvement! I'm excited to think about integrating it into Zamba for open-ended classification tasks run at scale on camera trap imagery. Definitely has the potential to dramatically improve CT image utility. imageomics.github.io/bioclip-2/
June 30, 2025 at 6:36 PM
We've built so many low-fidelity prototypes in our HCD work. IMO vibecoding changes the feel of those prototypes, but doesn't change the process. Ask any designer—they'll tell you high-fidelity first iterations are often more distracting to clients than helpful.

www.semafor.com/article/06/0...
June 25, 2025 at 10:27 PM
Check out this LLM circuit trace for the text: "The statement 'this statement is false' is". It goes through a logical contradictions node, but still outputs either "true" or "false" with the highest probabilities... www.anthropic.com/research/ope...
June 23, 2025 at 6:36 PM