lvrgd.bsky.social
@lvrgd.bsky.social
2025 is dubbed the Year of Evaluation as agentic systems grow, forcing the C‑suite to treat evaluation like a core KPI. Enterprises are now budgeting for whole‑system monitoring, not just single models. Watch the talk: https://youtu.be/CQGuvf6gSrM #AIevaluation #AgenticSystems #EnterpriseAI
August 25, 2025 at 7:18 AM
V0’s demo shows real‑user data is key to catching hallucinations: build deterministic evals, visualize failures like missed shots on a basketball court, and plug them into CI to pre‑empt regressions. https://youtu.be/L8OoYeDI_ls #LLMEvals #AIops
August 25, 2025 at 7:17 AM
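A minimal sketch of what a deterministic eval wired into CI could look like. It is not taken from the talk: the sample cases, the regex rules, and the `run_evals`, `pass_rate`, and `ci_exit_code` helpers are all hypothetical, and real cases would be sampled from actual user traffic.

```python
import re

# Hypothetical eval cases: (user query, model output, regex the output must satisfy).
# These are made-up examples standing in for logged real-user data.
CASES = [
    ("What is 2 + 2?", "2 + 2 = 4", r"\b4\b"),
    ("Capital of France?", "The capital is Paris.", r"\bParis\b"),
    ("Who wrote Hamlet?", "Hamlet was written by Marlowe.", r"\bShakespeare\b"),
]


def run_evals(cases):
    """Check each output against its rule. No LLM judge is involved,
    so the same inputs always give the same verdicts."""
    return [(query, bool(re.search(pattern, output)))
            for query, output, pattern in cases]


def pass_rate(results):
    """Fraction of cases whose output satisfied its rule."""
    return sum(ok for _, ok in results) / len(results)


def ci_exit_code(results, threshold=0.9):
    """Return 0 when the pass rate meets the threshold, else 1.
    A CI job would pass this value to sys.exit so a regression fails the build."""
    return 0 if pass_rate(results) >= threshold else 1


results = run_evals(CASES)
for query, ok in results:
    print("PASS" if ok else "FAIL", query)
```

In a pipeline, the last step would be something like `sys.exit(ci_exit_code(results))`, with the threshold chosen by the team rather than the 0.9 assumed here.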
The speaker argues the 'Bitter Lesson' shows data‑driven scaling beats domain expertise; he proposes DSPy signatures and evals to decouple system design from models, reducing technical debt. https://youtu.be/qdmxApz3EJI #AI #ML #LLM
August 25, 2025 at 7:16 AM
Jeff R. shows retrieval is the real bottleneck in retrieval‑augmented LLMs. Fast evals on synthetic queries beat public benchmarks, and clustering conversation metadata turns usage patterns into product decisions. https://youtu.be/jryZvCuA0Uc #AIProduct #DataDriven
August 25, 2025 at 7:15 AM
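A rough sketch of a fast retrieval eval on synthetic queries, under stated assumptions: the documents, the queries, and the keyword-overlap `retrieve` function are hypothetical stand-ins, not the speaker's setup. The idea shown is that each synthetic query is generated from a known document, so recall@k can be computed without manual labeling.

```python
# Hypothetical mini corpus; a real system would use its own document store.
DOCS = {
    "d1": "reset your password from the account settings page",
    "d2": "export invoices as csv from the billing dashboard",
    "d3": "invite teammates by email from the workspace menu",
}

# Hypothetical synthetic queries, each paired with the document it was written from.
SYNTHETIC = [
    ("how do I reset my password", "d1"),
    ("download invoices as csv", "d2"),
    ("bring a colleague on board", "d3"),
]


def retrieve(query, docs, k):
    """Toy retriever: rank documents by shared lowercase words, ties broken by id.
    This stands in for whatever search or embedding index is under test."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: (-len(q_words & set(docs[d].split())), d))
    return ranked[:k]


def recall_at_k(queries, docs, k):
    """Share of queries whose source document appears in the top k results."""
    hits = sum(expected in retrieve(q, docs, k) for q, expected in queries)
    return hits / len(queries)


for k in (1, 3):
    print(f"recall@{k} = {recall_at_k(SYNTHETIC, DOCS, k):.2f}")
```

The third query shares no words with its source document, so the toy retriever misses it at k=1. That is the kind of failure a quick synthetic eval is meant to surface before reaching for a public benchmark.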
Conference opening stresses AI engineering’s maturity and urges the community to co‑define standard models such as LLMOS, LLM‑SDLC, and SPADE, prioritizing human input/output ratio over terminology. https://youtu.be/IHkyFhU6JEY #AIEngineering #StandardModels #SPADE
August 25, 2025 at 7:13 AM
Braintrust’s new Loop agent turns the manual eval loop into an automated, data‑driven optimization cycle using frontier LLMs like Claude 4. It integrates side‑by‑side previews and an auto‑apply toggle, speeding up AI product iteration. https://youtu.be/MC55hdWLq4o #AIEngineering #EvalAutomation
August 25, 2025 at 7:12 AM
OpenAI execs highlight that future AGI relies on a mix of compute‑optimized and latency‑optimized GPUs, with RLHF driving reliability. The shift to domain‑specific agents means engineers will focus on model orchestration, not just code. https://youtu.be/avWhreBUYF0 #AI #AGI
August 25, 2025 at 7:11 AM
The simplest way to keep an icon visible while truncating text is a single TextView with a compound drawable. No extra layout needed; ellipsis appears before the icon automatically. Works on all API levels. https://youtu.be/L8-5ezsoI5A #AndroidDev #TextView #UI
August 25, 2025 at 7:10 AM
Patho.ai’s KAG approach shows how embedding a structured knowledge graph into an LLM enables multi‑step reasoning and numeric inference beyond simple retrieval. The system’s multi‑agent orchestration turns raw data into actionable business insights. https://youtu.be/9AQOvT8LnMI #KAG #KnowledgeGraphs
August 25, 2025 at 7:09 AM
Cisco’s Outshift demo shows how a multi‑agent AI system, built on an OpenConfig knowledge graph, turns a ServiceNow ticket into automated testing and reporting, cutting change‑failure rates. https://youtu.be/m0dxZ-NDKHo #NetworkAutomation #OpenConfig
August 25, 2025 at 7:09 AM
Haizing treats prompt‑injection as an optimization problem, using gradient‑guided token edits and agent‑based judges to uncover jailbreaks in minutes. This turns months of manual testing into a rapid, automated pipeline. https://youtu.be/OMGPvW8TBHc #LLMsecurity #AIhazing
August 25, 2025 at 7:08 AM
Generative UI and LLMs dissolve design‑engineering silos, treating AI as a co‑worker and stressing a material‑first approach. V0 prototypes reveal emergent features. https://youtu.be/CiMVKnX-CNI #AIUX #LLMDesign #GenerativeUI
August 25, 2025 at 7:08 AM
BlackRock’s AI‑app framework demonstrates that domain‑specific prompt engineering and a sandbox/factory architecture can accelerate investment‑operations workflows while embedding mandatory compliance checkpoints. https://youtu.be/08mH36_NVos #AIinFinance #PromptEngineering
August 25, 2025 at 7:08 AM
Current AI metrics ignore human perception. Rodriguez shows how JPEG’s perceptual tricks inspire better evaluation, urging metrics that learn from human aesthetic judgment. Watch the full talk: https://youtu.be/h5ItAJuB3Fc #AI #HumanPerception #Evaluation
August 25, 2025 at 7:07 AM
The talk shows that an eval system is an engineered, automated artifact; when it demonstrates clear business value, such as a 1‑day model rollout, it speaks for itself. https://youtu.be/a4BV0gGmXgA #LLMEvaluation #ProductEngineering
August 25, 2025 at 7:07 AM