Lightnews — Scholar-powered news

Clem Delangue 🤗

@clem.hf.co

Happy holidays everyone! What would you want to see more from Hugging Face in 2025?

December 24, 2024 at 12:24 PM

Reposted by Clem Delangue 🤗

Ben Burtenshaw

@benburtenshaw.bsky.social

People are flexing their end of year stats, so I made this app to show @hf.co hub stats in a tidy design!

Thanks @jfcalvo.hf.co and @ameeelie.bsky.social for the feature!

December 19, 2024 at 1:28 PM

Clem Delangue 🤗

@clem.hf.co

Just 10 days after o1's public debut, we’re thrilled to unveil the open-source version of the technique behind its success: scaling test-time compute

By giving models more "time to think," Llama 1B outperforms Llama 8B in math—beating a model 8x its size. The full recipe is open-source!

December 16, 2024 at 9:42 PM

Clem Delangue 🤗

@clem.hf.co

Who’s at #neurips2024 and wants to meet HF team members?

December 13, 2024 at 11:02 PM

Reposted by Clem Delangue 🤗

Guilherme Penedo

@guilherme.hf.co

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.

December 8, 2024 at 9:19 AM

Reposted by Clem Delangue 🤗

Thomas Wolf

@thomwolf.bsky.social

The FineWeb team is happy to finally release "FineWeb2" 🥂🥳

FineWeb 2 extends the data-driven approach to pre-training dataset design that was introduced in FineWeb 1 to now cover 1893 languages/scripts

Details: huggingface.co/datasets/Hug...

A detailed open-science tech report is coming soon

December 8, 2024 at 9:08 AM

Reposted by Clem Delangue 🤗

Christopher Akiki

@cakiki.bsky.social

The folks at Foursquare released a @hf.co dataset of 104.5 million places of interest and here's all of them plotted using datashader

December 8, 2024 at 1:34 PM

Clem Delangue 🤗

@clem.hf.co

I weirdly love this! #5 trending on HF right now

December 6, 2024 at 3:19 PM

Clem Delangue 🤗

@clem.hf.co

Excited to see more biology open-source models for real positive use-cases of AI!

Chai does structure predictions at AlphaFold3 levels of accuracy and is able to handle multi-peptide or peptide-ligand complexes rather than just single chains.

Apache 2.0 on HF huggingface.co/chaidiscover...

December 5, 2024 at 2:39 PM

Reposted by Clem Delangue 🤗

Cosmico

@cosmico.org

🤖 6 AI Predictions for 2025 by Hugging Face CEO

#HuggingFace
#ClementDelangue
#ArtificialIntelligence

www.cosmico.org/6-ai-predict...

6 AI Predictions for 2025 by Hugging Face CEO | Cosmico

Hugging Face CEO predicts AI protests, market disruptions, personal robots, and breakthroughs in science, as China leads the AI race in 2025.

www.cosmico.org

December 4, 2024 at 12:48 AM

Reposted by Clem Delangue 🤗

Daniel Vila

@dvilasuero.hf.co

Let's make AI more inclusive.

At @huggingface.bsky.social we'll launch a huge community sprint soon to build high-quality training datasets for many languages.

We're looking for Language Leads to help with outreach.

Find your language and nominate yourself:
forms.gle/iAJVauUQ3FN8...

November 26, 2024 at 6:29 AM

Clem Delangue 🤗

@clem.hf.co

Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):

December 2, 2024 at 2:08 PM

Reposted by Clem Delangue 🤗

Ted Underwood

@tedunderwood.com

I would put this even more strongly: open source AI is probably our only realistic chance to avoid a terrifying increase in concentration of power. I do not want to live in a world where the people with all the money also have all the intellectual power.

Nathan Lambert @natolambert.bsky.social · Nov 29

The most realistic reason to be pro open source AI is to reduce concentration of power.

Alondra Nelson @alondra.bsky.social · Nov 29

"money has flowed to tech giants and others in their orbit... [and] raises an uncomfortable prospect: that this supposedly revolutionary technology might never deliver on its promise of broad economic transformation, but instead just concentrate more wealth" www.bloomberg.com/opinion/arti...

November 29, 2024 at 9:35 PM

Clem Delangue 🤗

@clem.hf.co

Good list if you want to understand AI! go.bsky.app/Nik64nt

December 1, 2024 at 1:55 PM

Clem Delangue 🤗

@clem.hf.co

QwQ is #1 trending on @hf.co!

November 29, 2024 at 10:04 PM

Reposted by Clem Delangue 🤗

Simon Willison

@simonwillison.net

This demo of structured data extraction running on an LLM that executes entirely in the browser (Chrome only for the moment since it uses WebGPU) is amazing

My notes here: simonwillison.net/2024/Nov/29/...

November 29, 2024 at 9:10 PM

Reposted by Clem Delangue 🤗

Ishan Khatri

@ishan.khatri.io

Wasn't going to wade into this but... as long as the internet exists in an open form, scraping will exist, and that's kind of the whole point... APIs exist because interacting with data is valuable and scraping (the old fashioned way) is annoying.

November 29, 2024 at 1:04 AM

Clem Delangue 🤗

@clem.hf.co

For me, the biggest risk in AI is centralization of power and benefits in the hands of a few. This is why at @hf.co, we've been investing in building online courses for as many people as possible to understand and learn to build AI themselves. You can find some of them here: huggingface.co/learn

November 29, 2024 at 8:32 PM

Reposted by Clem Delangue 🤗

Nathan Lambert

@natolambert.bsky.social

The most realistic reason to be pro open source AI is to reduce concentration of power.

Alondra Nelson @alondra.bsky.social · Nov 29

"money has flowed to tech giants and others in their orbit... [and] raises an uncomfortable prospect: that this supposedly revolutionary technology might never deliver on its promise of broad economic transformation, but instead just concentrate more wealth" www.bloomberg.com/opinion/arti...

ChatGPT’s $8 Trillion Birthday Gift to Big Tech

Two years in, generative AI’s value to the world is still unclear. But these charts show that it’s been a bonanza for the largest tech firms.

www.bloomberg.com

November 29, 2024 at 6:55 PM

Reposted by Clem Delangue 🤗

Jay 🦋

@jay.bsky.team

I’m thankful for everyone using Bluesky, everyone building on atproto, everyone listening to our message and sharing our dream of a better social media ecosystem that puts people first. We’re going to do this together — thanks for joining us on this journey!

November 28, 2024 at 10:32 PM

Reposted by Clem Delangue 🤗

Stella Biderman

@stellaathena.bsky.social

A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models.

The primary usecase for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.

Jeremy Howard @howard.fm · Nov 28

Did you know that 99% of email today is spam? Your inbox isn’t 99% spam because AI is used to filter it.

The same 99% will happen here too, but if AI researchers continue to get perma-banned for making available the datasets needed to filter it, it’s going to make this platform unusable.

November 28, 2024 at 6:57 PM

Reposted by Clem Delangue 🤗

Jeremy Howard

@howard.fm

Did you know that 99% of email today is spam? Your inbox isn’t 99% spam because AI is used to filter it.

The same 99% will happen here too, but if AI researchers continue to get perma-banned for making available the datasets needed to filter it, it’s going to make this platform unusable.

November 28, 2024 at 6:12 PM

Reposted by Clem Delangue 🤗

Hey it's Jess

@jessthebp.bsky.social

new post: bluesky vs. huggingface for normal people

jessbpeck.com/posts/bluesk...

anytime i spend less than 5 months on a thing you know i have opinions.

anyway, as always, feel free to argue with me/complain at me/ point out errors.

BlueSky vs. HuggingFace An Explainer for Normies

This is Jess B Peck's personal website. SEO, Analytics, big data, small data, and the web.

jessbpeck.com

November 27, 2024 at 6:37 PM

Reposted by Clem Delangue 🤗

Dr. Casey Fiesler

@cfiesler.bsky.social

Hi, so I've spent the past almost-decade studying research uses of public social media data, like e.g. ML researchers using content from Twitter, Reddit, and Mastodon.

Anyway, buckle up this is about to be a VERY long thread with lots of thoughts and links to papers. 🧵

Daniel van Strien @danielvanstrien.bsky.social · Nov 26

First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts 🦋

📊 1M public posts from Bluesky's firehose API
🔍 Includes text, metadata, and language predictions
🔬 Perfect to experiment with using ML for Bluesky 🤗

huggingface.co/datasets/blu...

bluesky-community/one-million-bluesky-posts · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

November 27, 2024 at 3:31 PM

Reposted by Clem Delangue 🤗

Michael Nuñez

@michaelnunez.bsky.social

Hugging Face just released SmolVLM, a powerful new AI model that could seriously cut costs and boost efficiency for businesses! 🤯

🔗 Read more: venturebeat.com/ai/hugging-f...

Hugging Face’s SmolVLM could cut AI costs for businesses by a huge margin

Hugging Face launches SmolVLM, a compact and efficient vision-language AI model, offering businesses a cost-effective solution for advanced AI implementation without sacrificing performance.

venturebeat.com

November 27, 2024 at 6:19 PM
