Evaluations

Run structured evaluations on your voice agent

Combine scenarios, profiles, and metrics into a single evaluation. Know exactly where your agent passes, fails, and why.

Example evaluation (status: Completed)

Support agent v2.3 — regression test

Full evaluation across all support scenarios

Scenarios: 12

Profiles: 5

Total runs: 180

Overall SSR: 78%

What is a voice agent evaluation?

An evaluation is a structured test run. You select which scenarios to test, which caller profiles to use, and which metrics to measure. Evalgent runs every combination, scores each call, and gives you a clear pass/fail verdict per test.

Evaluations automate what manual QA cannot scale — running hundreds of test conversations in parallel and surfacing failures before they reach production.

How do you set up a voice agent evaluation campaign?

Define your test matrix

Pick the scenarios, profiles, and metrics to include in your evaluation campaign. Each scenario × profile combination becomes a test.

Test matrix

Scenarios

Refund request handling

Appointment booking

Account cancellation

+ 9 more selected

Profiles

Impatient caller

Elderly speaker

Non-native English

Metrics

Tone consistency

Response latency

Knowledge accuracy

Total combinations: 12 × 5 = 60 tests
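The arithmetic behind the test matrix can be sketched in a few lines of Python. This is an illustration only, not Evalgent's API: the function name build_test_matrix and the dictionary layout are assumptions. It also assumes, as the 12 × 5 = 60 count suggests, that metrics are scored on each test rather than multiplying the number of tests.

```python
from itertools import product

# Example names taken from the test matrix above; a real campaign would list all 12 scenarios.
scenarios = ["Refund request handling", "Appointment booking", "Account cancellation"]
profiles = ["Impatient caller", "Elderly speaker", "Non-native English"]


def build_test_matrix(scenarios, profiles):
    """Return one test per scenario x profile pair (hypothetical helper)."""
    return [{"scenario": s, "profile": p} for s, p in product(scenarios, profiles)]


tests = build_test_matrix(scenarios, profiles)
print(len(tests))  # 3 scenarios x 3 profiles = 9 tests
```

With 12 scenarios and 5 profiles, the same function would return 60 tests.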

Set success criteria

Configure run count and pass thresholds. Define how many runs per test and what SSR score counts as a pass.

Success criteria

Runs per test

1·5·10

SSR pass threshold: ≥ 70%

Verdict logic

A test passes if at least 70% of its runs succeed on all success conditions.
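The verdict logic can be sketched as follows. This is a minimal illustration under stated assumptions, not Evalgent's implementation: it assumes a run counts as a success only when every one of its success conditions is met, and the names run_succeeded and verdict are hypothetical.

```python
def run_succeeded(condition_results):
    """Assumption: a run succeeds only if all of its success conditions were met."""
    return all(condition_results)


def verdict(runs, threshold_pct=70):
    """Return "pass" if at least threshold_pct percent of the runs succeeded."""
    if not runs:
        raise ValueError("a test needs at least one run")
    successes = sum(1 for conditions in runs if run_succeeded(conditions))
    # Integer comparison avoids floating-point rounding at the threshold boundary.
    return "pass" if successes * 100 >= threshold_pct * len(runs) else "fail"


# Three runs, each scored on two success conditions; the second run misses one.
three_runs = [[True, True], [True, False], [True, True]]
print(verdict(three_runs))  # 2 of 3 runs succeed (about 67%), below 70%: fail
```

Under these assumptions, a test with 3 runs passes a 70% threshold only when all three runs succeed, while a test with 10 runs tolerates up to three failed runs.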

Review & launch

Confirm your configuration and launch the evaluation. Evalgent handles the rest — running every test and collecting results.

Support agent v2.3 — regression test

Full evaluation across all support scenarios

Scenarios

Profiles

Metrics

Runs per test: 3

Pass threshold: ≥ 70% SSR

Total runs: 180

Run evaluation →
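The total run count in the review summary follows from the configuration: tests multiplied by runs per test. A small sketch of that arithmetic, with a hypothetical function name:

```python
def total_runs(num_scenarios, num_profiles, runs_per_test):
    """Each scenario x profile test is executed runs_per_test times."""
    return num_scenarios * num_profiles * runs_per_test


print(total_runs(12, 5, 3))  # 12 x 5 = 60 tests, 3 runs each = 180 runs
```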

See exactly where your agent stands

Results matrix

See pass/fail rates across every scenario × profile combination

Evidence

Turn-level proof for every success condition — from transcripts, recordings, and scored outcomes

Recommendations (Beta)

Receive targeted suggestions to improve agent performance based on evaluation results

The difference structured evaluations make

Without structured evaluations

Manual testing today

Manual QA on a handful of calls
No consistency across test conditions
No way to compare versions objectively
Results live in spreadsheets or Slack threads

With Evalgent evaluations

Structured & automated

Every scenario × profile combination tested automatically
Consistent caller simulation with realistic conditions
Version-over-version comparison with the same test matrix
Results in one place — verdicts, scores, transcripts, audio

Know if your voice agent is ready for production

Functional

Behavioral

Limit