Folks like Karpathy have suggested that hallucination is an LLM's greatest feature.
Is there any evidence for that claim?
2025 is a huge year for AI Agents.
Here's what's included:
- Introduction to AI Agents
- The role of tools in Agents (see the sketch after this list)
- Enhancing model performance
- Quick start to Agents with LangChain
- Production applications with Vertex AI Agents
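To make the "role of tools" item concrete, here is a minimal, library-agnostic tool-calling loop. Everything in it (`call_llm`, the `search_web` tool, the message format) is a hypothetical placeholder for illustration, not code from the whitepaper or from LangChain.

```python
# Minimal tool-calling agent loop (illustrative sketch, not from the whitepaper).
# `call_llm` and `search_web` are hypothetical placeholders.
import json

def search_web(query: str) -> str:
    # Hypothetical tool; in practice this would hit a real search API.
    return f"Top results for: {query}"

TOOLS = {"search_web": search_web}

def call_llm(messages: list[dict]) -> dict:
    # Placeholder: should return {"tool": name, "args": {...}} or {"answer": text}.
    raise NotImplementedError

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_llm(messages)
        if "answer" in decision:                 # the model decided it is done
            return decision["answer"]
        tool = TOOLS[decision["tool"]]           # the model asked to use a tool
        result = tool(**decision["args"])
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    return "Stopped after max_steps without a final answer."
```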
Provides a survey of time-series anomaly detection solutions.
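For context on what such solutions are benchmarked against, here is a simple rolling z-score baseline. This is a generic illustration of the task, not a method from the survey.

```python
# Rolling z-score anomaly detector: a common baseline for time-series anomaly detection.
# Generic illustration, not a method proposed in the survey.
import numpy as np

def rolling_zscore_anomalies(x: np.ndarray, window: int = 50, threshold: float = 4.0) -> np.ndarray:
    flags = np.zeros(len(x), dtype=bool)
    for t in range(window, len(x)):
        hist = x[t - window:t]
        mu, sigma = hist.mean(), hist.std() + 1e-8
        flags[t] = abs(x[t] - mu) / sigma > threshold
    return flags

# Example: a sine wave with an injected spike at index 300.
signal = np.sin(np.linspace(0, 20, 500))
signal[300] += 5.0
print(np.where(rolling_zscore_anomalies(signal))[0])  # the injected spike at index 300 is flagged
```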
This approach can reduce token output by 48.6% while maintaining accuracy on the widely used MATH500 test set when applied to QwQ-32B-Preview.
There are three parts to this study:
- analysis of the overthinking issue
RWKV combines the best of RNN and transformers.
There is code for training your own model, fine-tuning, GUI, API, fast WebGPU inference, and more.
Cool project!
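For intuition on the "best of RNN and transformers" claim: linear-attention-style models like RWKV can be computed in parallel over the whole sequence (transformer-style training) or step by step with constant-size state (RNN-style inference). The toy mixer below shows that equivalence; it is a simplified sketch, not the actual RWKV time-mixing equations.

```python
# Toy illustration of a linear-attention-style mixer computed two ways:
# in parallel (transformer-style) and recurrently with O(1) state (RNN-style).
# Simplified sketch; NOT the actual RWKV formulation.
import numpy as np

def parallel_mix(k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # y_t = sum_{i<=t} exp(k_i) * v_i / sum_{i<=t} exp(k_i)
    w = np.exp(k)                        # (T,)
    num = np.cumsum(w[:, None] * v, 0)   # (T, D)
    den = np.cumsum(w, 0)[:, None]       # (T, 1)
    return num / den

def recurrent_mix(k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Same computation, but carrying only a constant-size state per step.
    num_state = np.zeros(v.shape[1])
    den_state = 0.0
    out = []
    for k_t, v_t in zip(k, v):
        num_state = num_state + np.exp(k_t) * v_t
        den_state = den_state + np.exp(k_t)
        out.append(num_state / den_state)
    return np.stack(out)

T, D = 8, 4
rng = np.random.default_rng(0)
k, v = rng.normal(size=T), rng.normal(size=(T, D))
assert np.allclose(parallel_mix(k, v), recurrent_mix(k, v))
```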
Features include (from the repo):
• 🤖 Advanced Agent Management: Create and orchestrate multiple AI agents with different roles and capabilities
A few chapters are already available, such as structured output and evals. More to come soon!
Presents a comprehensive survey of the LLMs-as-judges paradigm from five key perspectives: Functionality, Methodology, Applications, Meta-evaluation, and Limitations.
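To anchor the functionality angle, here is a bare-bones pairwise LLM-as-a-judge sketch. The `judge_llm` call and the prompt are placeholders I'm assuming for illustration, not an API or template from the survey.

```python
# Minimal pairwise LLM-as-a-judge sketch (illustrative; not from the survey).
# `judge_llm` is a hypothetical stand-in for any chat-completion call.
JUDGE_PROMPT = """You are an impartial judge. Given a question and two answers,
reply with exactly "A", "B", or "tie".

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Verdict:"""

def judge_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in a real model here

def pairwise_judge(question: str, answer_a: str, answer_b: str) -> str:
    # Judge both orderings to reduce position bias, a standard trick in this literature.
    v1 = judge_llm(JUDGE_PROMPT.format(question=question, answer_a=answer_a, answer_b=answer_b)).strip()
    v2 = judge_llm(JUDGE_PROMPT.format(question=question, answer_a=answer_b, answer_b=answer_a)).strip()
    swapped = {"A": "B", "B": "A", "tie": "tie"}.get(v2, "tie")
    return v1 if v1 == swapped else "tie"
```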
Presents MAG-V, a multi-agent framework that first generates a dataset of questions that mimic customer queries. It then reverse-engineers alternate questions from the responses to verify agent trajectories.
arxiv.org/abs/2412.04494
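A rough sketch of that verification loop as I read it: generate a synthetic question, run the agent, reverse-engineer a question from the agent's response, and check that it still matches the original intent. Every function name below is a hypothetical placeholder, not code from the paper.

```python
# Rough sketch of a MAG-V-style trajectory verification loop, as described above.
# All functions are hypothetical placeholder stubs to be filled in.
def generate_synthetic_question() -> str: ...
def run_agent(question: str) -> dict: ...            # returns {"trajectory": ..., "response": ...}
def reverse_engineer_question(response: str) -> str: ...
def same_intent(q1: str, q2: str) -> bool: ...        # e.g., an LLM or embedding-similarity check

def verify_trajectory() -> bool:
    question = generate_synthetic_question()
    result = run_agent(question)
    alternate = reverse_engineer_question(result["response"])
    # If the reverse-engineered question no longer matches the original intent,
    # the trajectory is treated as suspect.
    return same_intent(question, alternate)
```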
This gem just dropped on arXiv.
An up-to-date overview of reinforcement learning and sequential decision-making.
arxiv.org/abs/2412.05265
Extends the rStar reasoning framework to enhance reasoning accuracy and factual reliability of LLMs.
Introduces DataLab, a unified business intelligence (BI) platform that integrates an LLM-based agent framework with an augmented computational notebook interface.
Great survey on LLM-brained GUI agents, including techniques and applications.
https://arxiv.org/abs/2411.18279
It achieves state-of-the-art accuracy (79.6%) on the MATH benchmark with Qwen2.5-7B-Instruct, surpassing GPT-4o (76.6%) and Claude 3.5 (71.1%).
Scientific discovery is the next big goal for AI. We are seeing a huge number of research studies tackling AI-powered scientific discovery from different angles and for different problems.
Shows that combining simple distillation from O1's API with supervised fine-tuning significantly boosts performance on complex math reasoning tasks.
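The recipe as described is simple: sample reasoning traces from the stronger model, then fine-tune the smaller model on them. Here is a minimal sketch of building such a distillation SFT dataset; the `teacher_generate` call and the file layout are assumptions for illustration, not O1's API or the paper's code.

```python
# Minimal sketch of distillation-style SFT data construction: query a stronger
# "teacher" model for reasoning traces, then save (prompt, response) pairs for
# supervised fine-tuning. `teacher_generate` is a placeholder, not a real API.
import json

def teacher_generate(problem: str) -> str:
    raise NotImplementedError  # call the stronger model here

def build_sft_dataset(problems: list[str], out_path: str = "distill_sft.jsonl") -> None:
    with open(out_path, "w") as f:
        for problem in problems:
            trace = teacher_generate(problem)  # long reasoning-style answer
            f.write(json.dumps({"prompt": problem, "response": trace}) + "\n")
    # The resulting JSONL is then used as ordinary supervised fine-tuning data.
```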
This is probably one of the most important open-source efforts in post-training of LLMs.
Proposes that LLM-based chatbots play the ‘language game of bullshit’.
Voyage AI releases voyage-3 and voyage-3-lite embedding models.
Proposes StructGPT to improve the zero-shot reasoning ability of LLMs over structured data. It is shown to be effective for question-answering tasks over structured data.
paper: arxiv.org/abs/2305.09645
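The general pattern here is "read then reason": a specialized interface collects the relevant slice of the structured data, and the LLM reasons only over that slice. Below is a schematic sketch of that pattern; it is not StructGPT's actual implementation, and `call_llm` is a placeholder.

```python
# Schematic "read then reason" pattern for QA over structured data
# (illustrative sketch of the general idea, not StructGPT's implementation).
def read_interface(table: list[dict], question: str) -> list[dict]:
    # Crude "reading" step: keep rows whose cells mention a question keyword.
    keywords = [w.lower() for w in question.split() if len(w) > 3]
    return [row for row in table
            if any(kw in str(v).lower() for v in row.values() for kw in keywords)]

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # any chat-completion API

def answer(table: list[dict], question: str) -> str:
    evidence = read_interface(table, question)   # read: collect relevant evidence
    prompt = f"Evidence rows: {evidence}\nQuestion: {question}\nAnswer:"
    return call_llm(prompt)                       # reason: the LLM answers from the evidence only
```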