This feature is currently in an Early Access phase.
If you're interested in learning more, contact Gladly Support.

When using the Simulator, the quality of your success criteria directly determines the quality of your test results. Well-written criteria produce reliable, consistent verdicts. Poorly written criteria lead to confusing results that are difficult to act on.

Perspectives in a scenario

Each scenario is written from three distinct viewpoints. Mixing them up is the most common source of confusing test results.

Customer perspective

Used for: Customer goal, Initial message, Additional details

Write these fields as the Customer. Use first person and only include information a real Customer would know.

Example: "I want to return a sweater I bought last week"

Judge perspective

Used for: Success criteria

Write success criteria as an impartial observer watching the Conversation. Use third person and focus on what's observable in the transcript.

Example: "The agent confirmed the return was initiated"

System perspective

Used for: Customer data available to Gladly

Write this field as a developer defining the state of external systems. Be specific and precise.

Example: "Order #55891 exists with status 'delivered'. Item is a sweater, price $89.99, purchased 3 days ago. Return window is 30 days."

Field	Perspective	Voice	Example
Customer goal	Customer	First person	"I want to check on my order"
Initial message	Customer	First person	"Where's my order?"
Additional details	Customer	First person	"My order number is #12345"
Success criteria	Judge	Third person	"The agent provided a shipping status"
Customer data available to Gladly	System	Technical	"Order #12345 exists, status: shipped"
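
One quick way to catch a mixed-up perspective is to look for first-person pronouns: Customer fields should contain them, and success criteria should not. The Python sketch below illustrates that idea only. The `FIRST_PERSON` word list and the `uses_first_person` helper are hypothetical names made up for this example and are not part of the Simulator.

```python
import re

# Hypothetical word list for spotting first-person voice; extend as needed.
FIRST_PERSON = {"i", "i'd", "i'm", "i've", "i'll", "me", "my", "mine", "we", "our"}


def uses_first_person(text: str) -> bool:
    """Return True if the text contains a first-person pronoun."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in FIRST_PERSON for token in tokens)


# Customer-perspective fields are expected to use first person.
print(uses_first_person("I want to check on my order"))           # True
# Success criteria come from the judge, so first person is a warning sign.
print(uses_first_person("The agent provided a shipping status"))  # False
```

A check like this is only a heuristic. It flags wording worth a second look and does not replace reading each field against the table above.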

Principles for good criteria

Keep criteria focused

Each criterion should check a single, specific behavior. If you combine multiple checks into one criterion, it's harder to tell what went wrong when a test doesn't pass.

Avoid	Preferred
"The agent asked for the order number and then looked it up"	"The agent asked for the order number" / "The agent provided information about the order"

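If you draft criteria in bulk, a rough text check can flag candidates that bundle more than one behavior. The Python sketch below assumes that joining words such as "and" or "then" are a reasonable signal. The `JOINERS` pattern and the `looks_compound` function are hypothetical helpers for illustration, not a Simulator feature.

```python
import re

# Hypothetical heuristic: joining words that often signal two checks in one criterion.
JOINERS = re.compile(r"\b(and then|and|then|as well as)\b", re.IGNORECASE)


def looks_compound(criterion: str) -> bool:
    """Return True if the criterion likely bundles more than one behavior."""
    return JOINERS.search(criterion) is not None


for text in [
    "The agent asked for the order number and then looked it up",
    "The agent asked for the order number",
    "The agent provided information about the order",
]:
    label = "consider splitting" if looks_compound(text) else "ok"
    print(f"{label}: {text}")
```

The check will produce false positives (for example, "terms and conditions"), so treat a flag as a prompt to review the criterion and not as a verdict.
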
Base criteria on what's visible in the Conversation

Criteria should only check things that are visible in the Conversation transcript. The evaluator can only assess what was actually said; it can't verify anything happening behind the scenes, such as system states or backend data.

Avoid	Preferred
"The agent felt empathetic"	"The agent acknowledged the customer's frustration"
"The agent correctly processed the return"	"The agent confirmed the return was initiated" / "The agent provided a return number"

Be specific

Write criteria in concrete, literal terms. The evaluator follows them exactly as written, so vague or open-ended language will lead to inconsistent results.

Avoid	Preferred
"The agent handled the situation well"	"The agent provided a tracking number"
"The conversation went smoothly"	"The agent offered to refund the customer"

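Vague wording is often easy to spot mechanically. As a minimal sketch, the Python snippet below scans a criterion for subjective words like those in the "Avoid" column. The `VAGUE_TERMS` list and the `vague_terms_in` function are assumptions made for this example. They are not an official list, and the Simulator's evaluator does not necessarily work this way.

```python
# Hypothetical list of subjective words; adjust it to your own criteria.
VAGUE_TERMS = ("well", "smoothly", "properly", "appropriately", "correctly")


def vague_terms_in(criterion: str) -> list[str]:
    """Return the vague terms found in a criterion, in VAGUE_TERMS order."""
    words = criterion.lower().replace(",", " ").replace(".", " ").split()
    return [term for term in VAGUE_TERMS if term in words]


for text in [
    "The agent handled the situation well",
    "The conversation went smoothly",
    "The agent provided a tracking number",
]:
    found = vague_terms_in(text)
    print(f"{text!r}: {found if found else 'no vague terms found'}")
```

A criterion that passes this check can still be vague, so the word list is a starting point and not a guarantee.
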
Don't use success criteria to test for a handoff
Select the Expect handoff checkbox to test explicitly for a successful handoff to a human Agent.

A simple test for good criteria

Ask yourself: Could two people independently read the conversation transcript and agree on whether this criterion passed? If yes, it's a good criterion. If reasonable people might disagree, the criterion is too vague or too subjective.

Example: complete scenario with good criteria

Title: Return request — damaged item

Customer goal: "I want to return a sweater that arrived damaged"

Initial message: "Hi, I received a damaged item and I'd like to return it"

Additional details: "My order number is #55891. The sweater has a large stain on the front. I'd like a full refund."

Customer data available to Gladly: "Order #55891 exists with status 'delivered'. Item is a wool sweater, price $89.99, delivered 3 days ago. Return window is 30 days. Item is not final sale."

Success criteria:

"The agent asked the customer for their order number"
"The agent acknowledged the damage issue"
"The agent confirmed a return was initiated or provided return instructions"
"The agent mentioned the refund amount or refund process"

Each criterion is atomic (tests one thing), observable (verifiable from the transcript), written from the judge's perspective (third person), and specific (no vague language).
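
If you draft scenarios outside the Simulator before entering them, it can help to keep each one as structured data so that every field stays in its own perspective. The Python sketch below restates the example scenario that way. The `Scenario` class and its field names are hypothetical and chosen for this illustration. They do not correspond to a Gladly import format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical container for the scenario fields described in this article."""
    title: str
    customer_goal: str          # Customer perspective, first person
    initial_message: str        # Customer perspective, first person
    additional_details: str     # Customer perspective, first person
    customer_data: str          # System perspective, technical
    success_criteria: list[str] = field(default_factory=list)  # Judge perspective


scenario = Scenario(
    title="Return request - damaged item",
    customer_goal="I want to return a sweater that arrived damaged",
    initial_message="Hi, I received a damaged item and I'd like to return it",
    additional_details=(
        "My order number is #55891. The sweater has a large stain on the front. "
        "I'd like a full refund."
    ),
    customer_data=(
        "Order #55891 exists with status 'delivered'. Item is a wool sweater, "
        "price $89.99, delivered 3 days ago. Return window is 30 days. "
        "Item is not final sale."
    ),
    success_criteria=[
        "The agent asked the customer for their order number",
        "The agent acknowledged the damage issue",
        "The agent confirmed a return was initiated or provided return instructions",
        "The agent mentioned the refund amount or refund process",
    ],
)

# List the criteria one per line, as they would be entered individually.
for number, criterion in enumerate(scenario.success_criteria, start=1):
    print(f"{number}. {criterion}")
```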
