Lightnews — Scholar-powered news

Peter Bull

@peter.drivendata.org

Great set of events for #SeattleAIWeek this week! Definitely join some if you are in town, and let me know if you want to catch up. luma.com/Seattle-AI-W...

#SeattleAIWeek 2025 · Events Calendar

View and subscribe to events from #SeattleAIWeek 2025 on Luma. Showcasing the PNW as the best place to be in AI. Community-driven. Future-focused. Submit your event now using the + button.

luma.com

October 27, 2025 at 9:51 PM

Peter Bull

@peter.drivendata.org

🚀 New release: cloudpathlib v0.23.0

🥧 Now with Python 3.14 (π) support!
📁 New copy & move methods mean you can reduce usage of shutil 🎉

Check out the full release and docs here:
👉 cloudpathlib.drivendata.org/stable/

October 13, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

Super interesting work on a newly proposed columnar data file format called F3, with an embedded Wasm binary to decode the data 🤯 (which obviates the need for 3rd party library support). Favorable comparisons to existing formats on compression, throughput, and random reads.

db.cs.cmu.edu/papers/2025/...

October 10, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

Very cool to see Wikimedia embracing LLM tools and launching a hybrid similarity search API and open source embeddings for Wikipedia! Also supports Q&A style queries.
www.wikidata.org/wiki/Wikidat...

October 8, 2025 at 10:27 PM

Peter Bull

@peter.drivendata.org

Interesting to see empirical research coming out for LLMs as education aids. In this study, active use of LLMs helped CS students debug compiler errors. Removing LLM access demonstrated no lasting learning benefit from having had access to it...

learninganalytics.upenn.edu/ryanbaker/IC...

October 6, 2025 at 6:36 PM

Reposted by Peter Bull

Sara Beery

@sarameghanbeery.bsky.social

Are you interested in #AIforConservation #AIforBiodiversity #AIforWildlife or #AIforNature?? Are you located in the Boston Area?

If so, come join us!! The AI for Conservation Slack community is doing our first local-area Boston meetup, partnering with iNaturalist and TEDx Boston!

AI for Conservation Boston Meetup
Join us for an iNat bioblitz!!!
September 27th from 9am-12pm
Meet at UMass Boston Quad at 9
Register here https://www.eventbrite.com/e/umass-boston-bioblitz-tickets-1626791971579?aff=oddtdtcreator

September 10, 2025 at 11:41 PM

Peter Bull

@peter.drivendata.org

We just shipped two major features for cloudpathlib ✨📦 ✨ ! First, HTTP support: treat a URL like any other path (open, read_text, join). Second, compatibility with the open and os Python built-ins, for a seamless transition of legacy code and third-party library support.

cloudpathlib.drivendata.org

September 22, 2025 at 6:36 PM
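
To make the "treat a URL like any other path" idea from the post above concrete, here is a minimal stdlib-only sketch. It is not cloudpathlib's implementation, and it does not use cloudpathlib's API: the class name `UrlPath`, its methods, and the example.com URL are all hypothetical, chosen only to illustrate path-style joining and reading.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


class UrlPath:
    """Hypothetical, stdlib-only illustration; NOT cloudpathlib's class or API."""

    def __init__(self, url: str) -> None:
        self.url = url

    def __truediv__(self, segment: str) -> "UrlPath":
        # Join a segment onto the URL's path component, like pathlib's `/`.
        parts = urlsplit(self.url)
        joined = posixpath.join(parts.path or "/", segment)
        return UrlPath(urlunsplit(parts._replace(path=joined)))

    @property
    def name(self) -> str:
        # Final path component, mirroring pathlib's `.name`.
        return posixpath.basename(urlsplit(self.url).path)

    def read_text(self, encoding: str = "utf-8") -> str:
        # Fetch the resource and decode it (this performs a real request).
        with urlopen(self.url) as response:
            return response.read().decode(encoding)

    def __str__(self) -> str:
        return self.url


# Assumed example URL; nothing is fetched until read_text() is called.
base = UrlPath("https://example.com/data")
report = base / "2025" / "report.csv"
print(report)       # https://example.com/data/2025/report.csv
print(report.name)  # report.csv
```

The point of the design is that joining and naming are pure string operations on the URL's path component, so only `read_text` touches the network. For the real behavior, see the cloudpathlib docs linked in the post.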

Peter Bull

@peter.drivendata.org

Great opportunity to work on AI in conservation and biodiversity with Roland Kays! In-person in NC, check it out now since it is only open for a week:
www.governmentjobs.com/careers/%7B0...

Job Bulletin

State of North Carolina

www.governmentjobs.com

September 19, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

Exemplary FAQ for "Your Brain on ChatGPT: Accumulation of Cognitive Debt" www.brainonllm.com/faq

I'd love to see more authors be explicit about what NOT to claim based on a study, including wording that is not appropriate for lay audiences.

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

www.brainonllm.com

August 20, 2025 at 10:27 PM

Peter Bull

@peter.drivendata.org

Thought I would spot check an application someone was posting about 100% vibecoding. Can you spot the issue?

Kudos to the LLM, this is verbatim from the FastAPI docs. Sometimes verbatim from the docs is not what you want for your application though...

August 13, 2025 at 10:27 PM

Peter Bull

@peter.drivendata.org

Interesting announcement on a product from Astral! Similar model to one of the core @anacondainc.bsky.social lines of business.

Charlie Marsh @crmarsh.com · Aug 13

Today, we're announcing our first hosted infrastructure product: pyx, a Python-native package registry.

We think of pyx as an optimized backend for uv: it’s a package registry, but it also solves problems that go beyond the scope of a traditional "package registry".

August 13, 2025 at 6:58 PM

Peter Bull

@peter.drivendata.org

Enthusiastic to build on this generation of earth observation foundation embeddings like DeepMind's AlphaEarth (and more)! We already see some promising crop type (cereals vs. orchards) results and are exploring other use cases in climate resilience. deepmind.google/discover/blo...

August 8, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

Very cool to see that marimo supports our cloudpathlib library for their file browser UI! Browse your S3, GCS, Azure buckets from your notebooks! docs.marimo.io/api/inputs/f...

File Browser - marimo

The next generation of Python notebooks

docs.marimo.io

August 1, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

✨ 📦 ✨ Just released new Cookiecutter Data Science version with support for pixi and poetry as environment managers! Some of our top requested features ever. Upgrade and check it out now.

cookiecutter-data-science.drivendata.org

July 25, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

Now getting organic inbound for www.zambacloud.com, our wildlife imagery processing platform, from ChatGPT! 😲

July 18, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

Just in case you thought speech-to-text worked for children, the third column is what Whisper does. Somehow in the third example it accesses my inner monologue... I guess that's why we're excited about our upcoming challenge! kidsasr.drivendata.org

July 16, 2025 at 10:27 PM

Peter Bull

@peter.drivendata.org

How are people managing code review for their AI coding agents? I do a first glance and it is obviously bad (e.g., didn't refactor repeated code), and now I've got half a dozen AI diffs for things that aren't good enough cluttering up my todo list with things to respond to....

July 14, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

New research based on the CANDOR corpus shows that people enjoy conversations with alternating longer turns more than conversations with short turns or with one person dominating. Cool!

arxiv.org/html/2506.20...

Time is On My Side: Dynamics of Talk-Time Sharing in Video-chat Conversations

An intrinsic aspect of every conversation is the way talk-time is shared between multiple speakers. Conversations can be balanced, with each speaker claiming a similar amount of talk-time, or…

arxiv.org

July 11, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

The best shortcut to how many experienced software engineers feel about AI is listening to ThePrimeagen's takes. Balanced perspectives on what's actually new, determinism, security, system complexity, what's promising, and what's not. www.youtube.com/watch?v=vDWa...

July 9, 2025 at 10:27 PM

Peter Bull

@peter.drivendata.org

"Damn ChatGPT" your new summer jam about using ChatGPT as a therapist open.spotify.com/track/4umq06... (edited)

Maldito ChatGPT

Camilo · Maldito ChatGPT · Song · 2025

open.spotify.com

July 7, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

Great article on the challenges of only surfacing the right info to LLMs and editing down what is not needed. If you've used a coding copilot or agent, you've seen this first hand many times. Output iterations are often polluted with code that came before.

www.dbreunig.com/2025/06/22/h...

How Long Contexts Fail

Taking care of your context is the key to building successful agents. Just because there’s a 1 million token context window doesn’t mean you should fill it.

www.dbreunig.com

July 4, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

BioCLIP2 looks like a stellar improvement! I'm excited to think about integrating it into Zamba for open-ended classification tasks run at scale on camera trap imagery. Definitely the potential to dramatically improve CT image utility. imageomics.github.io/bioclip-2/

June 30, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

"Munchable" is GenZ cringe. www.propublica.org/article/insi...

Inside the AI Prompts DOGE Used to “Munch” Contracts Related to Veterans’ Health

Experts who reviewed the code for ProPublica found numerous and troubling flaws in the system, providing a disturbing glimpse into how the Trump administration is allowing artificial intelligence to…

www.propublica.org

June 27, 2025 at 6:36 PM

Peter Bull

@peter.drivendata.org

We've built so many low-fidelity prototypes in our HCD work. IMO vibecoding changes the feel of those prototypes, but doesn't change the process. Ask any designer—they'll tell you high-fidelity first iterations are often more distracting to clients than helpful.

www.semafor.com/article/06/0...

June 25, 2025 at 10:27 PM

Peter Bull

@peter.drivendata.org

Check out this LLM circuit trace for the text: "The statement 'this statement is false' is". It goes through a logical contradictions node, but still outputs either "true" or "false" with the highest probabilities... www.anthropic.com/research/ope...

June 23, 2025 at 6:36 PM
