Tag
1 article tagged with #evals.
Why exact-equality assertions don't work for LLM outputs, how LLM-as-judge fills the gap, and where evals plug into CI/CD as the quality gate for AI features.