Lightnews — Scholar-powered news

lvrgd.bsky.social

@lvrgd.bsky.social

2025 is dubbed the Year of Evaluation as agentic systems grow, forcing the C‑suite to treat evaluation as a core KPI. Enterprises are now budgeting for whole‑system monitoring, not just single models. Watch the talk: https://youtu.be/CQGuvf6gSrM #AIevaluation #AgenticSystems #EnterpriseAI

Thumbnail for YouTube video: 2025 is the Year of Evals! Just like 2024, and 2023, and … — John Dickerson, CEO Mozilla AI

August 25, 2025 at 7:18 AM

lvrgd.bsky.social

@lvrgd.bsky.social

Vercel v0’s demo shows real‑user data is key to catching hallucinations: build deterministic evals, visualize failures like a basketball court, and plug them into CI to pre‑empt regressions. https://youtu.be/L8OoYeDI_ls #LLMEvals #AIops

Thumbnail for YouTube video: Evals Are Not Unit Tests — Ido Pesok, Vercel v0

August 25, 2025 at 7:17 AM

lvrgd.bsky.social

@lvrgd.bsky.social

The speaker argues the 'Bitter Lesson' shows data‑driven scaling beats domain expertise; he proposes DSPy signatures and evals to decouple system design from models, reducing technical debt. https://youtu.be/qdmxApz3EJI #AI #ML #LLM

Thumbnail for YouTube video: On Engineering AI Systems that Endure The Bitter Lesson - Omar Khattab, DSPy & Databricks

August 25, 2025 at 7:16 AM

lvrgd.bsky.social

@lvrgd.bsky.social

Jeff Huber shows retrieval is the real bottleneck in retrieval‑augmented LLMs. Fast evals on synthetic queries beat public benchmarks, and clustering conversation metadata turns usage patterns into product decisions. https://youtu.be/jryZvCuA0Uc #AIProduct #DataDriven

Thumbnail for YouTube video: How to look at your data — Jeff Huber (Choma) + Jason Liu (567)

August 25, 2025 at 7:15 AM

lvrgd.bsky.social

@lvrgd.bsky.social

The conference opening stresses AI engineering’s maturity and urges the community to co‑define standard models such as LLM OS, LLM SDLC, and SPADE, prioritizing the human input/output ratio over terminology. https://youtu.be/IHkyFhU6JEY #AIEngineering #StandardModels #SPADE

Thumbnail for YouTube video: Designing AI-Intensive Applications - swyx

August 25, 2025 at 7:13 AM

lvrgd.bsky.social

@lvrgd.bsky.social

Braintrust’s new Loop agent turns the manual eval loop into an automated, data‑driven optimization cycle using frontier LLMs like Claude 4. It integrates side‑by‑side previews and an auto‑apply toggle, speeding up AI product iteration. https://youtu.be/MC55hdWLq4o #AIEngineering #EvalAutomation

Thumbnail for YouTube video: The Future of Evals - Ankur Goyal, Braintrust

August 25, 2025 at 7:12 AM

lvrgd.bsky.social

@lvrgd.bsky.social

OpenAI execs highlight that future AGI relies on a mix of compute‑optimized and latency‑optimized GPUs, with RLHF driving reliability. The shift to domain‑specific agents means engineers will focus on model orchestration, not just code. https://youtu.be/avWhreBUYF0 #AI #AGI

Thumbnail for YouTube video: #define AI Engineer - Greg Brockman, OpenAI (ft. Jensen Huang)

August 25, 2025 at 7:11 AM

lvrgd.bsky.social

@lvrgd.bsky.social

The simplest way to keep an icon visible while truncating text is a single TextView with a compound drawable. No extra layout needed; ellipsis appears before the icon automatically. Works on all API levels. https://youtu.be/L8-5ezsoI5A #AndroidDev #TextView #UI

Thumbnail for YouTube video: The Next Unicorns: 7 Top AI startups from the HF0 Residency

August 25, 2025 at 7:10 AM

lvrgd.bsky.social

@lvrgd.bsky.social

Patho.ai’s KAG approach shows how embedding a structured knowledge graph into an LLM enables multi‑step reasoning and numeric inference beyond simple retrieval. The system’s multi‑agent orchestration turns raw data into actionable business insights. https://youtu.be/9AQOvT8LnMI #KAG #KnowledgeGraphs

Thumbnail for YouTube video: Wisdom-Driven Knowledge Augmented Generation at Scale - Chin Keong Lam, Patho AI

August 25, 2025 at 7:09 AM

lvrgd.bsky.social

@lvrgd.bsky.social

Cisco’s Outshift demo shows how a multi‑agent AI system, built on an OpenConfig knowledge graph, turns a ServiceNow ticket into automated testing and reporting, cutting change‑failure rates. https://youtu.be/m0dxZ-NDKHo #NetworkAutomation #OpenConfig

Thumbnail for YouTube video: Multi Agent AI and Network Knowledge Graphs for Change — Ola Mabadeje, Cisco

August 25, 2025 at 7:09 AM

lvrgd.bsky.social

@lvrgd.bsky.social

Haizing treats prompt injection as an optimization problem, using gradient‑guided token edits and agent‑based judges to uncover jailbreaks in minutes. This turns months of manual testing into a rapid, automated pipeline. https://youtu.be/OMGPvW8TBHc #LLMsecurity #AIhazing

Thumbnail for YouTube video: Fuzzing in the GenAI Era — Leonard Tang, Haize Labs

August 25, 2025 at 7:08 AM

lvrgd.bsky.social

@lvrgd.bsky.social

Generative UI and LLMs dissolve design‑engineering silos, treating AI as a co‑worker and stressing a material‑first approach. Prototypes built with v0 reveal emergent features. https://youtu.be/CiMVKnX-CNI #AIUX #LLMDesign #GenerativeUI

Thumbnail for YouTube video: Form factors for your new AI coworkers — Craig Wattrus, Flatfile

August 25, 2025 at 7:08 AM

lvrgd.bsky.social

@lvrgd.bsky.social

BlackRock’s AI‑app framework demonstrates that domain‑specific prompt engineering and a sandbox/factory architecture can accelerate investment‑operations workflows while embedding mandatory compliance checkpoints. https://youtu.be/08mH36_NVos #AIinFinance #PromptEngineering

Thumbnail for YouTube video: How BlackRock Builds Custom Knowledge Apps at Scale — Vaibhav Page & Infant Vasanth, BlackRock

August 25, 2025 at 7:08 AM

lvrgd.bsky.social

@lvrgd.bsky.social

Current AI metrics ignore human perception. Rodriguez shows how JPEG’s perceptual tricks inspire better evaluation, urging metrics that learn from human aesthetic judgment. Watch the full talk: https://youtu.be/h5ItAJuB3Fc #AI #HumanPerception #Evaluation

Thumbnail for YouTube video: Perceptual Evaluations: Evals for Aesthetics — Diego Rodriguez, Krea.ai

August 25, 2025 at 7:07 AM

lvrgd.bsky.social

@lvrgd.bsky.social

The talk shows that an eval system is an engineered, automated artifact; when it demonstrates clear business value—like a 1‑day model rollout—it speaks for itself. https://youtu.be/a4BV0gGmXgA #LLMEvaluation #ProductEngineering

Thumbnail for YouTube video: Five hard earned lessons about Evals — Ankur Goyal, Braintrust

August 25, 2025 at 7:07 AM
