Philipp Schmid
@philschmid.bsky.social
Tech Lead and LLMs at @huggingface 👨🏻‍💻 🤗 AWS ML Hero 🦸🏻 | Cloud & ML enthusiast | 📍Nuremberg | 🇩🇪 https://philschmid.de
Code and methods are open-sourced in a new library, “learn and search”
Blog: huggingface.co/spaces/Huggi...
Learn and Search Repo: github.com/huggingface/...
Scaling test-time compute - a Hugging Face Space by HuggingFaceH4
December 17, 2024 at 7:30 AM
- Introduces DVTS, a new method that improves performance at larger compute budgets by maintaining solution diversity (simplified sketch below)
- Using compute-optimal scaling, a Llama 3 3B outperforms 70B (22x larger) on mathematical reasoning tasks
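A highly simplified sketch of the DVTS idea: split the search budget into independent subtrees and expand each one greedily with a process reward model, so the finished solutions stay diverse. `sample_next_steps`, `prm_score`, and `is_complete` are hypothetical stand-ins, not the library's actual API.

```python
# Simplified sketch of Diverse Verifier Tree Search (DVTS), not the exact
# implementation from the library. The callables are hypothetical stand-ins
# for an LLM step sampler and a Process Reward Model (PRM).
from typing import Callable

def dvts(
    problem: str,
    n_budget: int,                 # total candidates sampled per depth across all subtrees
    beam_width: int,               # candidates sampled per expansion inside one subtree
    max_depth: int,
    sample_next_steps: Callable[[str, list[str], int], list[str]],
    prm_score: Callable[[str, list[str]], float],
    is_complete: Callable[[list[str]], bool],
) -> list[str]:
    n_subtrees = n_budget // beam_width
    finished: list[list[str]] = []
    # Each subtree is expanded greedily and independently, which preserves
    # diversity instead of collapsing the whole budget onto one beam.
    for _ in range(n_subtrees):
        partial: list[str] = []
        for _ in range(max_depth):
            candidates = sample_next_steps(problem, partial, beam_width)
            best = max(candidates, key=lambda step: prm_score(problem, partial + [step]))
            partial = partial + [best]
            if is_complete(partial):
                break
        finished.append(partial)
    # Return the overall best solution according to the verifier.
    return max(finished, key=lambda sol: prm_score(problem, sol))
```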
December 17, 2024 at 7:30 AM
- Process Reward Models (PRMs) played a crucial role in the search process by evaluating intermediate solution steps
- Different search strategies work better for different problem difficulties: beam search for harder problems, Best-of-N for simpler ones (PRM-scored Best-of-N sketch below)
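To show how a PRM plugs into Best-of-N, here is a rough sketch (not the exact implementation; `generate_solution` and `prm_step_scores` are hypothetical stand-ins):

```python
# Sketch of PRM-scored Best-of-N: sample N full solutions, score each one with
# a Process Reward Model, and keep the best. Helper functions are hypothetical.
from typing import Callable

def best_of_n(
    problem: str,
    n: int,
    generate_solution: Callable[[str], list[str]],             # one solution = a list of steps
    prm_step_scores: Callable[[str, list[str]], list[float]],  # PRM score per intermediate step
) -> list[str]:
    candidates = [generate_solution(problem) for _ in range(n)]

    def solution_score(steps: list[str]) -> float:
        # One common aggregation: use the PRM score of the final step.
        return prm_step_scores(problem, steps)[-1]

    return max(candidates, key=solution_score)
```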
December 17, 2024 at 7:30 AM
- Test-time compute scaling offers an alternative to training larger models by allowing smaller models to "think longer"
- Explored Best-of-N sampling, beam search, and Diverse Verifier Tree Search (DVTS)
- Llama 3 1B achieved 55% accuracy on the MATH benchmark using optimal search strategies
December 17, 2024 at 7:30 AM
By scaling test-time compute, smaller models can match or even surpass the performance of larger models. Llama 3.2 3B can outperform Llama 3.1 70B on MATH-500!🤯
December 17, 2024 at 7:30 AM
- 🛠️ Cuts down costs to ~2.29% and time to ~2.36% of human evaluation
- 💰 Costs $30 vs $1,297 for human evaluation
- ⚡ Reduced time to 118.43 minutes vs 86.5 hours
- 🧑‍⚖️ LLM achieved a 60-70% alignment rate to humans
- 🥇 Agent achieved a 90% alignment rate to humans
huggingface.co/datasets/DEV...
DEVAI-benchmark/DEVAI · Datasets at Hugging Face
December 10, 2024 at 9:53 AM
Agent-as-a-Judge is a graph-based agent with tools to locate, read, retrieve, and evaluate files and information in a code project. It judges the results produced by other agents, and its judgments are compared to human evaluations (alignment rate, judge shift). A minimal sketch of the idea follows below.
Github: github.com/metauto-ai/a...
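A toy, hypothetical sketch of the loop (stand-in function names, not the metauto-ai API): the judge gathers evidence from the project workspace with simple tools, gives a verdict per requirement, and those verdicts are compared against human labels.

```python
# Toy sketch of Agent-as-a-Judge: locate/read tools gather evidence from a code
# project, a (stubbed) LLM call judges each requirement, and verdicts are compared
# to human labels. All names are hypothetical, not the metauto-ai API.
from pathlib import Path

def locate(workspace: Path, query: str) -> list[Path]:
    """Toy 'locate' tool: files whose name contains the first keyword of the query."""
    keyword = query.split()[0].lower()
    return [p for p in workspace.rglob("*") if p.is_file() and keyword in p.name.lower()]

def read_file(path: Path, max_chars: int = 4000) -> str:
    return path.read_text(errors="ignore")[:max_chars]

def llm_judge(requirement: str, evidence: str) -> bool:
    """Placeholder verdict: treat non-empty evidence as 'satisfied'.
    In the real system this would be an LLM call over the gathered evidence."""
    return bool(evidence.strip())

def judge_project(workspace: Path, requirements: list[str]) -> list[bool]:
    verdicts = []
    for req in requirements:
        evidence = "\n".join(read_file(p) for p in locate(workspace, req)[:3])
        verdicts.append(llm_judge(req, evidence))
    return verdicts

def alignment_rate(agent_verdicts: list[bool], human_verdicts: list[bool]) -> float:
    """Fraction of requirements where the agent judge agrees with human evaluators."""
    return sum(a == h for a, h in zip(agent_verdicts, human_verdicts)) / len(human_verdicts)
```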
December 10, 2024 at 9:53 AM
OpenAI trained a new Turbo model to make it easier and faster to use. With "storyboards," users get a CapCut/TikTok/Reels-like text-to-video editor that can be used to edit and create new short-form content! Social media will be flooded. 🌊
December 9, 2024 at 6:41 PM
Blog: qwenlm.github.io/blog/qwq-32b...
Model: huggingface.co/Qwen/QwQ-32B...
Demo: huggingface.co/spaces/Qwen/...
QwQ: Reflect Deeply on the Boundaries of the Unknown
November 28, 2024 at 8:01 AM
- ⚠️ Notable limitations, including language mixing, recursive reasoning loops, and safety considerations
- 😍 Released under Apache 2.0 on Hugging Face
- 👀 Full “reasoning” (CoT) available in the demo
November 28, 2024 at 8:01 AM
- 👨‍🔬 QwQ-32B-Preview is an experimental research model from the Qwen team (loading sketch below)
- 🔧 32.5B parameters and 32,768 context length
- 📊 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, and 50.0% on LiveCodeBench
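A quick loading sketch with transformers, assuming the checkpoint id "Qwen/QwQ-32B-Preview" on the Hub (at ~32.5B parameters, bf16 weights need roughly 65 GB of GPU memory):

```python
# Sketch: load and prompt the released checkpoint with transformers.
# Assumes the "Qwen/QwQ-32B-Preview" model id from the post.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in the word 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow plenty of new tokens.
output = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```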
November 28, 2024 at 8:01 AM
🎥 Surprising video capabilities with 27.14% on CinePile
🔓 Released under Apache 2.0 on @huggingface.bsky.social
📱 Can run efficiently on laptops and edge devices
November 26, 2024 at 4:31 PM
🚀 Smallest SOTA vision language model at only 2B parameters (inference sketch below)
🛠️ Released 3 variants with Base, Synthetic, and Instruct
💾 Requires only 5GB GPU RAM and achieves 38.8% on MMMU, 81.6% on DocVQA
⚡ 3.3-4.5x faster prefill and 7.5-16x faster generation vs Qwen2-VL
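The post doesn't name the model, but assuming it refers to the SmolVLM release, a minimal inference sketch could look like this (model id and image URL are assumptions):

```python
# Hedged sketch: assumes the model is SmolVLM, the "HuggingFaceTB/SmolVLM-Instruct"
# checkpoint id, and a placeholder image URL.
import requests
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)  # placeholder URL
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```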
November 26, 2024 at 4:31 PM
Blog: neuralmagic.com/blog/24-spar...
Pruning is not a new technique, but compared to quantization it has been much harder to achieve good results and maintain performance across tasks. Let's see if Neural Magic can change that.
2:4 Sparse Llama: Smaller Models for Efficient GPU Inference
November 26, 2024 at 8:24 AM
- 📈 Full recovery on fine-tuning tasks (GSM8K, Evol-CodeAlpaca, Ultrachat-200K)
- ⚡ 1.4-2.1x better multi-query throughput
- 🌱 Pruned using 13B tokens training, 26 hours on 32 H100s
- 🔧 Optimized for NVIDIA Ampere GPUs and newer
November 26, 2024 at 8:24 AM
- 🔄 98.4% of original accuracy on the Open LLM Leaderboard v1 with 50% fewer parameters using the 2:4 sparsity pattern (pattern sketch below)
- 🚀 30% higher throughput and 1.8x lower latency, with up to a 5.0x speedup when combined with quantization
- 💻 Works with 4-bit quantization (GPTQ) and Sparse-Marlin kernels
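To make the "2:4 sparsity pattern" concrete: in every contiguous group of four weights, two are zeroed out, and Ampere-and-newer sparse tensor cores can skip them in hardware. A toy magnitude-based illustration of the pattern only (the actual Sparse Llama recipe uses calibration plus retraining, not this simple mask):

```python
# Toy illustration of the 2:4 sparsity pattern: keep the 2 largest-magnitude weights
# in every group of 4 along the last dimension and zero the rest. This shows only the
# pattern; Neural Magic's actual pruning recipe is more involved.
import torch

def apply_2_of_4_mask(weight: torch.Tensor) -> torch.Tensor:
    rows, cols = weight.shape
    assert cols % 4 == 0, "columns must be divisible by 4 for a 2:4 pattern"
    groups = weight.reshape(rows, cols // 4, 4)
    keep = groups.abs().topk(2, dim=-1).indices           # 2 largest |w| per group of 4
    mask = torch.zeros_like(groups).scatter_(-1, keep, 1.0)
    return (groups * mask).reshape(rows, cols)

w = torch.randn(4, 8)
w_sparse = apply_2_of_4_mask(w)
print((w_sparse == 0).float().mean().item())  # -> 0.5, i.e. 50% of the weights are zero
```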
November 26, 2024 at 8:24 AM