Evalgent
Evaluations

Run structured evaluations on your voice agent

Combine scenarios, profiles, and metrics into a single evaluation. Know exactly where your agent passes, where it fails, and why.

Evaluation: Completed

Support agent v2.3 — regression test

Full evaluation across all support scenarios

Scenarios: 12
Profiles: 5
Total runs: 180
Overall SSR: 78%

What is a voice agent evaluation?

An evaluation is a structured test run. You select which scenarios to test, which caller profiles to use, and which metrics to measure. Evalgent runs every combination, scores each call, and gives you a clear pass/fail verdict per test.

Evaluations automate what manual QA cannot scale — running hundreds of test conversations in parallel and surfacing failures before they reach production.

How do you set up a campaign for voice agent evaluation?

Define your test matrix

Pick the scenarios, profiles, and metrics to include in your evaluation campaign. Each scenario × profile combination becomes a test.

Test matrix

Scenarios

Refund request handling
Appointment booking
Account cancellation

+ 9 more selected

Profiles

Impatient caller
Elderly speaker
Non-native English

Metrics

Tone consistency
Response latency
Knowledge accuracy
Total combinations: 12 × 5 = 60 tests
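
As a rough illustration of how the matrix expands, the sketch below enumerates scenario × profile pairs in plain Python. It is not Evalgent's API: the function name is hypothetical, and only the three scenarios and three profiles named above are listed, so the subset yields 9 tests rather than the full 60.

```python
from itertools import product

# Only the scenarios and profiles named on this page.
# The full campaign has 12 scenarios and 5 profiles.
scenarios = ["Refund request handling", "Appointment booking", "Account cancellation"]
profiles = ["Impatient caller", "Elderly speaker", "Non-native English"]


def build_test_matrix(scenarios, profiles):
    """Return one (scenario, profile) pair per test (hypothetical helper)."""
    return list(product(scenarios, profiles))


tests = build_test_matrix(scenarios, profiles)
print(len(tests))  # 3 x 3 = 9 for this subset; 12 x 5 = 60 for the full campaign
```

Metrics do not multiply the test count in this sketch; the assumption is that every selected metric is measured on each test.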

Set success criteria

Configure the run count and pass threshold: choose how many times each test runs and what SSR score counts as a pass.

Success criteria

Runs per test: 3 (range 1 to 10)

SSR pass threshold: ≥ 70%

Verdict logic

A test passes if ≥ 70% of its runs satisfy all success conditions
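
The verdict rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each run is reduced to a single succeeded/failed outcome; how Evalgent actually scores a run is not specified here, and the function names are hypothetical.

```python
def success_rate(run_outcomes):
    """Fraction of runs that succeeded; run_outcomes is a list of booleans."""
    if not run_outcomes:
        raise ValueError("a test needs at least one run")
    return sum(run_outcomes) / len(run_outcomes)


def verdict(run_outcomes, pass_threshold=0.70):
    """Return 'pass' when the success rate meets the threshold, else 'fail'."""
    return "pass" if success_rate(run_outcomes) >= pass_threshold else "fail"


print(verdict([True, True, True]))   # pass: 3/3 = 100%
print(verdict([True, True, False]))  # fail: 2/3 is about 67%, below 70%
```

Under this reading, 3 runs per test with a 70% threshold means a single failed run fails the test; a higher run count leaves more room for an occasional failure.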

Review & launch

Confirm your configuration and launch the evaluation. Evalgent handles the rest — running every test and collecting results.

Review & launch

Support agent v2.3 — regression test

Full evaluation across all support scenarios

Scenarios: 12
Profiles: 5
Metrics: 3

Runs per test: 3
Pass threshold: ≥ 70% SSR
Total runs: 180 (60 tests × 3 runs per test)
Run evaluation →

See exactly where your agent stands

Results matrix

See pass/fail rates across every scenario × profile combination
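
One way to picture the results matrix is as a success rate per scenario × profile cell. The sketch below aggregates run records into such cells; the record shape, the function name, and the sample outcomes are assumptions made up for illustration, not Evalgent's data format or real results.

```python
from collections import defaultdict


def results_matrix(runs):
    """Map (scenario, profile) -> fraction of runs that succeeded."""
    counts = defaultdict(lambda: [0, 0])  # [successes, total runs]
    for scenario, profile, succeeded in runs:
        cell = counts[(scenario, profile)]
        cell[0] += int(succeeded)
        cell[1] += 1
    return {key: ok / total for key, (ok, total) in counts.items()}


# Made-up run records, for illustration only.
runs = [
    ("Appointment booking", "Impatient caller", True),
    ("Appointment booking", "Impatient caller", True),
    ("Appointment booking", "Impatient caller", False),
    ("Appointment booking", "Elderly speaker", True),
]

matrix = results_matrix(runs)
for (scenario, profile), rate in matrix.items():
    print(f"{scenario} x {profile}: {rate:.0%}")
```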

Evidence

Turn-level proof for every success condition — from transcripts, recordings, and scored outcomes

Recommendations (Beta)

Receive targeted suggestions to improve agent performance based on evaluation results

The difference structured evaluations make

Without structured evaluations

Manual testing today

  • Manual QA on a handful of calls
  • No consistency across test conditions
  • No way to compare versions objectively
  • Results live in spreadsheets or Slack threads

With Evalgent evaluations

Structured & automated

  • Every scenario × profile combination tested automatically
  • Consistent caller simulation with realistic conditions
  • Version-over-version comparison with the same test matrix
  • Results in one place — verdicts, scores, transcripts, audio
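
Because both versions run against the same test matrix, comparing them reduces to a lookup by test. The sketch below lists tests that passed for a baseline version but fail for a candidate; the dictionary shape, the function name, and the verdicts are hypothetical and only illustrate the idea.

```python
def find_regressions(baseline, candidate):
    """Return tests that passed in the baseline but fail in the candidate."""
    return sorted(
        key
        for key, result in baseline.items()
        if result == "pass" and candidate.get(key) == "fail"
    )


# Made-up verdicts keyed by (scenario, profile), for illustration only.
baseline = {
    ("Refund request handling", "Impatient caller"): "pass",
    ("Account cancellation", "Elderly speaker"): "pass",
    ("Appointment booking", "Non-native English"): "fail",
}
candidate = {
    ("Refund request handling", "Impatient caller"): "pass",
    ("Account cancellation", "Elderly speaker"): "fail",
    ("Appointment booking", "Non-native English"): "pass",
}

regressions = find_regressions(baseline, candidate)
print(regressions)
```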

Know if your voice agent is ready for production