Daniël Spee is a software engineer at Yuma with a passion for search, AI, and data-centric systems. He believes that great search goes beyond technology and requires a deep understanding of user intent, data semantics, and the business context that drives decision-making.
By combining AI techniques with classical search approaches, Daniël builds smarter, context-aware systems that bridge the gap between information and insight. His current focus is on leveraging AI agents and retrieval pipelines to automate and enhance real-world workflows, turning data into action and intelligence.
Abstract
Would you let a stranger handle your customer data?
Would you let a new hire talk to a client on their first day?
Would you put your kid in a self-driving car and just say, "Have fun at school"?
Then why do we trust our shiny new AI Agents to behave correctly in production without testing them?
In this talk, we share our journey of exploring how to evaluate Agentic Systems before and after deployment. We’ll walk through how to move from “it works in the demo” to trustworthy and observable systems that you can confidently run in production.
We’ll show practical examples of building evaluation pipelines and of experimenting with simple, measurable ways to understand an agent’s behavior over time. We’ll share what we’ve learned so far: where things go wrong, what helps, and what’s still an open challenge as we build toward more mature evaluation practices.
Expect real experiences, not just theory. Expect live examples and ideas you can take home to build trust into your own agents.
Key Takeaways
- Why testing AI Agents is different from traditional software testing
- How to design evaluation frameworks that fit your use case
- How to combine offline testing with live production observation (see the sketch below)
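
To give a flavor of the kind of offline evaluation pipeline the talk discusses, here is a minimal sketch in Python: a handful of test cases with simple pass/fail checks are run against an agent, producing a pass rate you can track across agent versions. The run_agent function is a hypothetical stub standing in for a real agent invocation, not part of any actual framework; the cases and checks are illustrative assumptions only.

import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # pass/fail predicate over the agent's answer
    label: str

def run_agent(prompt: str) -> str:
    # Hypothetical stub for illustration: replace with your own agent call
    # (e.g. an LLM-plus-tools pipeline behind an API).
    return "The refund was issued; ticket #4521 is now closed."

CASES = [
    EvalCase("Close ticket 4521 and confirm the refund.",
             check=lambda out: re.search(r"refund", out, re.I) is not None,
             label="mentions the refund"),
    EvalCase("Close ticket 4521 and confirm the refund.",
             check=lambda out: "4521" in out,
             label="references the correct ticket"),
]

def evaluate(cases: list[EvalCase]) -> float:
    # Run every case, print a per-check verdict, and return the pass rate.
    passed = 0
    for case in cases:
        answer = run_agent(case.prompt)
        ok = case.check(answer)
        passed += ok
        print(f"[{'PASS' if ok else 'FAIL'}] {case.label}")
    return passed / len(cases)

if __name__ == "__main__":
    score = evaluate(CASES)
    print(f"pass rate: {score:.0%}")  # track this number over time

Even a simple harness like this turns "it works in the demo" into a number you can watch: the same checks can run before deployment and, paired with logging of live traffic, after it.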
Target Audience
Developers, architects, and AI practitioners who are experimenting with or building agent-based systems and want to learn how to evaluate and test them effectively.
