Galileo.ai
@rungalileo.bsky.social
34 followers 18 following 150 posts
The fastest way to ship reliable AI apps - Evaluation, Experimentation, and Observability Platform
Reposted by Galileo.ai
✨Here's why your AI is lying!✨

The other week at @devrelcon.bsky.social I sat down to chat with Joseph Petty from @appsmith.bsky.social about AI, why you need evaluations, and how @rungalileo.bsky.social can help you.

Oh, and 🌶️ Jim's spicy take on AI 👀

youtu.be/I2vRx5Ieak8?...
Why Every AI Company Needs an AI to Test Their AI
YouTube video by Appsmith
Success for AI agents varies greatly by domain and requires nuanced, domain-specific metrics.

@erinmikail.bsky.social's new tutorial shows how to build and track tailored custom metrics using Galileo for reliable AI evaluation.

Read Erin's blog here: galileo.ai/blog/silly-s...
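For a feel of what a tailored metric looks like in practice, here's a minimal sketch of a domain-specific scorer. This is illustrative only: the function shape and the commented-out registration call are assumptions, not Galileo's actual SDK surface — Erin's tutorial covers the real API.

```python
# Hypothetical sketch of a domain-specific custom metric.
# The scorer shape and the registration call are assumptions,
# not Galileo's actual SDK -- see the tutorial for the real API.

def support_resolution_score(input_text: str, output_text: str) -> float:
    """Score how much of the user's ticket vocabulary the response
    covers; a stand-in for a real domain-specific metric."""
    keywords = {w.lower() for w in input_text.split() if len(w) > 4}
    if not keywords:
        return 1.0
    hits = sum(1 for w in keywords if w in output_text.lower())
    return hits / len(keywords)

print(support_resolution_score(
    "Login page crashes on mobile Safari",
    "We fixed the crash on mobile Safari login."))

# Illustrative registration step -- the actual Galileo call differs:
# galileo.register_metric(name="support_resolution", fn=support_resolution_score)
```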
Greg Statton of Cohesity joins Conor Bronsdon on Chain of Thought and draws a sharp analogy between AI implementation and D&D:

💬 “AI is marketed as this magic bullet… but anyone who's played D&D knows—if you're a wizard trying to harness the power of the universe, you've got a lot of studying to do. Same with AI—if you throw an LLM at it and hope it'll figure itself out, you won't get the accuracy you want.”

Deploying an LLM without the right infrastructure in place is like casting spells without a spellbook. To get the accuracy you’re looking for, you need to:
– Understand your data pipelines
– Test and evaluate continuously
– Treat infrastructure like your spellbook—essential for reliability

#AI #LLM #AIEvaluation #MLOps #DataQuality #Cohesity #GalileoAI #ChainOfThought #Podcast
We’re excited to release two new AI agent interfaces that make agent observability & evaluations even more effective.

- Timeline View: No more guessing where your agent gets stuck; see execution flow & bottlenecks quickly.

- Conversation View: Debug from the user's perspective, not just the system's.

Including Graph View, you now have three complementary ways to debug your agents:

→ Graph: Visualize decision paths and tool usage
→ Timeline: Spot performance bottlenecks instantly
→ Conversation: See the user experience end-to-end
→ Try these new views for yourself: app.galileo.ai/sign-up
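All three views are driven by the spans your agent emits. As a rough sketch of the kind of instrumentation involved — the `log_span` helper below is hypothetical, not Galileo's actual API; the real SDK handles this for you:

```python
import time
from contextlib import contextmanager

# Hypothetical span logger, for illustration only. Galileo's SDK ships
# its own instrumentation; this just shows the shape of what the views
# consume: named, typed, timed, nested spans.
spans, stack = [], []

@contextmanager
def log_span(name, span_type):
    start = time.time()
    parent = stack[-1] if stack else None
    stack.append(name)
    try:
        yield
    finally:
        stack.pop()
        spans.append({"name": name, "type": span_type, "parent": parent,
                      "start": start, "end": time.time()})

with log_span("plan_trip", "agent"):          # root node in the Graph view
    with log_span("search_flights", "tool"):  # tool-usage edge
        time.sleep(0.1)                       # stand-in for a tool call
    with log_span("summarize", "llm"):        # gaps here show up in Timeline
        time.sleep(0.05)                      # stand-in for an LLM call

print(spans)  # parent/child -> Graph; start/end -> Timeline
```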
What do LLM evals and comedy have in common? Timing.

Join @erinmikail.bsky.social at the #databricks #DataAISummit as she breaks down what it really takes to test LLMs in unexpected domains—like generating humor.

Come for the eval benchmarks. Stay for the chaos.

You’ll hear what goes wrong (a lot), what we’re still learning about task-specific evaluation, & why evaluating funny is one of the hardest prompts in the game.

🌀 Chaos-tested LLM evaluation frameworks: Why standard metrics break down & what to use instead when the output is "lol," not "true/false."

📊 Multi-tiered feedback loops in the wild: Learn how real-world reactions, iterative testing, and context-sensitive scoring reshape evaluation.

🎤 Comedy as a proving ground: See why humor is a great stress test for LLMs, and what it teaches us about creativity in AI.

#GenAI #LLMevals #AIUX #LLMops
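For a flavor of why "is it funny?" resists true/false metrics, here's a toy rubric-based judge in the spirit of the talk. The rubric and `judge_humor` function are invented for illustration; they are not Erin's actual evaluation setup, and a real version would ask an LLM judge the rubric questions instead of using heuristics.

```python
# Toy rubric-based scorer for a subjective output -- illustrative only,
# not the evaluation framework from the talk.

RUBRIC = {
    "setup_payoff": "Does the joke establish an expectation and subvert it?",
    "brevity": "Is it tight, with no wasted words?",
    "topicality": "Does it actually address the requested topic?",
}

def judge_humor(joke: str, topic: str) -> dict:
    """Score each rubric axis 0/1 with cheap stand-in heuristics; a real
    setup would pose the rubric questions to an LLM judge and aggregate."""
    scores = {
        "setup_payoff": 1 if any(c in joke for c in "?:!") else 0,
        "brevity": 1 if len(joke.split()) <= 30 else 0,
        "topicality": 1 if topic.lower() in joke.lower() else 0,
    }
    scores["overall"] = sum(scores.values()) / len(RUBRIC)
    return scores

print(judge_humor("Why did the eval fail? It couldn't take a joke.", "eval"))
```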
Make sure to stop by booth #120 to say hi to @erinmikail.bsky.social and the Galileo team during the #DataAISummit this week!
Next week, Galileo is headed to San Francisco for the Databricks Data + AI Summit!

If you’re building with LLMs, testing agents, or just trying to trust what your models are doing in production, come find us at Booth #120
Reposted by Galileo.ai
On my way to SF. If you’re attending the AI Engineer World’s Fair and want to learn why your AI needs reliability and evaluations, come say hi at the @rungalileo.bsky.social booth.
Siva Surendira, CEO of Lyzr, perfectly captures why enterprises need robust AI evaluation:

"I recommend Galileo as the antivirus equivalent for your AI system - you need these checks & balances. A MacBook is secure by nature; having that additional layer catches things the core system might miss."

Enterprise AI isn't just about building responsibly - it's about proving it works safely at scale. When something goes wrong, you need to be able to explain why and how to fix it.

Ready to add that extra layer of AI evaluation to your enterprise systems? 🛡️
🚨 Heading to the AI Engineer World’s Fair in SF next week?

I’ll (@JimBobBennett) be there with the Galileo crew—booth, talks, party, and all. I’m giving a talk on “Taming Your AI Agents with Evaluations”, aka how to stop your AI from making up entire book reports (Chicago Sun-Times, we see you 👀).
We just dropped a new walkthrough showing you how to build powerful agents by combining MongoDB Atlas with Galileo.

➡️ Learn how to set up your MongoDB Atlas account and configure it with LangChain. Then we'll guide you through ingesting your data and using the console to understand agent behavior and retriever tool performance.

📖 Read more: v2docs.galileo.ai/cookbooks/us...
MongoDB Atlas Integration for Retrieval-Augmented Generation (RAG) - Galileo
Guide to using MongoDB Atlas Vector Search with LangGraph agents logging to Galileo.
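If you want the gist before the full cookbook: the core of the setup is wiring LangChain's MongoDB Atlas vector store into a retriever. A minimal sketch, assuming you already have an Atlas cluster with a vector search index plus the `langchain-mongodb` and `langchain-openai` packages; the connection string, database, collection, and index names below are placeholders, and the cookbook covers the Galileo logging side:

```python
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

# Placeholder -- point this at your own Atlas cluster.
ATLAS_URI = "mongodb+srv://<user>:<pass>@<cluster>.mongodb.net"

store = MongoDBAtlasVectorSearch.from_connection_string(
    ATLAS_URI,
    namespace="ragdb.docs",         # "<database>.<collection>" placeholder
    embedding=OpenAIEmbeddings(),   # needs OPENAI_API_KEY in the env
    index_name="vector_index",      # your Atlas Vector Search index name
)

# Ingest a few documents, then expose the store as a retriever tool.
store.add_texts(["Galileo logs agent traces.",
                 "Atlas Vector Search powers retrieval."])
retriever = store.as_retriever(search_kwargs={"k": 2})
print(retriever.invoke("How is retrieval powered?"))
```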
The AI Agent Evaluation Blueprint: Part 1
In less than three years, your new coworker might not be human. 🤖

@poolsideai co-founders @JasoncWarner and @EisoKant believe AI will soon collaborate with teams building software for high-consequence environments such as banking, energy, and healthcare.
Agentic AI isn't just reactive; it's a proactive partner.

On the Chain of Thought podcast with @ConorBronsdon, @Amplitude_HQ's Chief Engineering Officer, Wade Chambers, explains how systems like Ask Amplitude transform AI from a tool into a team of PhDs embedded in your product.