Why This Changes Everything
Traditional hotel quality assurance works like this: a manager randomly selects 10-20 conversations per week, reads through them, and tries to spot problems. At best, you review 2-3% of all conversations. The rest? You hope they went well. With RecepAI’s Chat Quality Criteria, your receptionist evaluates 100% of conversations — automatically, after every single chat. No sampling, no guessing, no conversations falling through the cracks.

The real power isn’t just grading conversations. It’s the “Needs Training” insight — when your receptionist couldn’t answer something because the information wasn’t in your Training Materials, the system tells you exactly what topic was missing. One document upload can fix dozens of future conversations at once.
How It Works
You define success criteria
On your Conversation Agent page, you create criteria that describe what a successful conversation looks like for your hotel.
Every conversation gets evaluated
After each chat ends, your receptionist’s evaluation engine analyzes the full conversation against every criterion you’ve defined. This happens automatically — no action needed from you.
Results appear in History
Open the History page, Chat tab. Each conversation now shows an evaluation status — you can immediately see what went well and what needs attention.
How fast does this happen? Evaluations run shortly after each conversation ends. By the time you check the History page, results are typically already there.
Setting Up Chat Quality Criteria
Go to Conversation Agent and scroll to the “Chat Quality Criteria” section.

Step 1: Click “Add Criteria”
Click the “Add Criteria” button to open the criteria editor.

Step 2: Write a Clear Name
The name should immediately tell you what this criterion measures. Keep it specific and action-oriented.

Step 3: Write the Success Description
This is the most important part. The success description tells the evaluation engine exactly what to look for in the conversation. The more specific you are, the more accurate the evaluation.

Example Criteria (5 Tested Templates)
These are field-tested criteria that work well for most hotels. Use them as starting points and customize the descriptions to match your hotel’s specific policies.

1. Guest Question Answered
Name: Guest Question Answered

Success Description: The guest’s primary question received a specific, accurate answer based on the hotel’s Training Materials. The receptionist did not say “I don’t know” without offering an alternative, and did not redirect the guest to “call reception” without providing a specific phone number or email.

Why this works: This is your baseline criterion — it catches the most common failure: the guest asking something and not getting a useful answer. The description is specific about what “answered” means (not just acknowledging the question, but providing actual information).
2. Contact Information Provided
Name: Contact Information Provided

Success Description: When the conversation involved a request the receptionist couldn’t fulfill directly (booking, special request, complaint), the receptionist provided a specific contact method — a phone number, email address, or booking link. Generic phrases like “contact the hotel” or “reach out to us” without specific details count as failure.

Why this works: One of the most frustrating guest experiences is being told to “contact reception” without a number. This criterion ensures every dead-end has a clear next step.
3. Reservation Inquiry Handled
Name: Reservation Inquiry Handled

Success Description: If the guest asked about making, modifying, or canceling a reservation, the receptionist provided relevant information (availability guidance, rate context, or booking instructions) and offered a direct way to proceed — such as a reservation email, phone number, or booking link.

Why this works: Reservation inquiries are high-value conversations. This criterion ensures your receptionist converts these opportunities instead of losing them.
4. Appropriate Language Used
Name: Appropriate Language Used

Success Description: The receptionist responded in the same language the guest used, or in a language the guest explicitly requested. Responses were professional, warm, and free of overly technical jargon. The tone matched a professional hotel receptionist — not overly casual, not stiff.

Why this works: Multi-language hotels need to verify their receptionist is responding in the correct language. This also catches tone issues that might not be obvious from a quick glance.
5. Follow-Up Offered
Name: Follow-Up Offered

Success Description: At the end of the conversation, the receptionist asked if there was anything else the guest needed help with, or proactively suggested a related service or piece of information that might be useful based on the context of the conversation.

Why this works: A great receptionist doesn’t just answer the question — they anticipate what the guest might need next. This criterion measures that proactive hospitality touch.
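RecepAI has no public API for criteria (you create them in the Conversation Agent UI), but it can help to see each criterion as plain data: a name plus a success description, capped at 2,000 characters. A minimal Python sketch with hypothetical field names, bundling abbreviated versions of the five templates above:

```python
from dataclasses import dataclass

MAX_DESCRIPTION_LENGTH = 2000  # character limit stated in the criteria editor

@dataclass
class ChatQualityCriterion:
    """One criterion: a short name plus a success description.

    Field names here are hypothetical; criteria are created in the
    Conversation Agent UI, not through code.
    """
    name: str
    success_description: str

    def __post_init__(self):
        if len(self.success_description) > MAX_DESCRIPTION_LENGTH:
            raise ValueError(
                f"Success description exceeds {MAX_DESCRIPTION_LENGTH} characters"
            )

# The five field-tested templates, abbreviated:
TEMPLATES = [
    ChatQualityCriterion(
        "Guest Question Answered",
        "The guest's primary question received a specific, accurate answer "
        "based on the hotel's Training Materials.",
    ),
    ChatQualityCriterion(
        "Contact Information Provided",
        "When the receptionist couldn't fulfill a request directly, a "
        "specific contact method (phone, email, or booking link) was given.",
    ),
    ChatQualityCriterion(
        "Reservation Inquiry Handled",
        "Reservation questions received relevant information plus a direct "
        "way to proceed (email, phone number, or booking link).",
    ),
    ChatQualityCriterion(
        "Appropriate Language Used",
        "The receptionist responded in the guest's language with a "
        "professional, warm tone.",
    ),
    ChatQualityCriterion(
        "Follow-Up Offered",
        "The conversation ended with an offer of further help or a "
        "proactive, context-relevant suggestion.",
    ),
]
```

Thinking in this shape makes the two levers obvious: the name is for you, the description is for the evaluation engine.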
Understanding Evaluation Results
After your receptionist evaluates a conversation, it assigns one of these statuses. You can see them on the History page under the Chat tab.

Status Overview
| Status | What It Means | What to Do |
|---|---|---|
| Successful | All applicable criteria were met | Nothing — your receptionist handled this well |
| Needs Training | Failed because specific knowledge was missing | Add the missing topics to your Training Materials |
| Failed | Criteria not met for other reasons | Review the conversation — might need a prompt adjustment |
| Partial | Some criteria passed, some didn’t | Check which ones failed and address those specifically |
| Pending | Not yet evaluated | Results will appear shortly |
| Skipped | Conversation was too short to evaluate meaningfully | Normal for very brief exchanges (single question) |
| Trained | You reviewed the failure and added the missing knowledge | No further action — this conversation has been addressed |
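One way to reason about these statuses is as a roll-up of per-criterion results. The sketch below is a simplified model, not RecepAI's actual logic (the enum names and rules are assumptions for illustration), but it shows the key property: a "Not Applicable" criterion result is excluded entirely, so it never drags a conversation toward "Failed".

```python
from enum import Enum

class CriterionResult(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_APPLICABLE = "not_applicable"  # criterion didn't apply to this chat

class ConversationStatus(Enum):
    SUCCESSFUL = "Successful"
    PARTIAL = "Partial"
    FAILED = "Failed"
    SKIPPED = "Skipped"

def aggregate(results: list[CriterionResult]) -> ConversationStatus:
    """Roll per-criterion results up into one conversation status.

    Simplified rules for illustration: drop "Not Applicable" results,
    then check whether everything remaining passed, everything failed,
    or the outcome was mixed.
    """
    applicable = [r for r in results if r is not CriterionResult.NOT_APPLICABLE]
    if not applicable:
        return ConversationStatus.SKIPPED  # nothing meaningful to evaluate
    if all(r is CriterionResult.PASSED for r in applicable):
        return ConversationStatus.SUCCESSFUL
    if all(r is CriterionResult.FAILED for r in applicable):
        return ConversationStatus.FAILED
    return ConversationStatus.PARTIAL
```

("Needs Training", "Pending", and "Trained" sit outside this roll-up: they depend on missing-topic detection, timing, and your own follow-up rather than on pass/fail counts.)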
The Key Distinction: “Needs Training” vs “Failed”
This is the most important thing to understand about the evaluation system:

Needs Training
Your receptionist tried to answer but didn’t have the information. The system identified specific missing topics.

Action: Go to Training Materials and upload documents covering the missing topics. This is the highest-impact improvement you can make.

Example: Guest asked about spa prices → Receptionist couldn’t answer → Missing topic: “spa treatment prices” → Upload your spa menu.
Failed (Other Issues)
The criteria weren’t met, but not because of missing knowledge. It could be a tone issue, a missing follow-up, or the receptionist not following your prompt instructions.

Action: Review the conversation transcript and consider adjusting your chat prompt.

Example: Guest asked about check-in time → Receptionist answered correctly but didn’t offer early check-in option → Prompt adjustment needed.
Why this distinction matters from an AI perspective: Most AI evaluation systems just give you a pass/fail. RecepAI goes further by analyzing why something failed. When the failure is due to missing information (a “knowledge gap”), the system identifies the exact topics — turning evaluation into a specific, actionable improvement plan. This is the difference between “your receptionist failed 15 conversations this week” and “your receptionist needs spa pricing, pool schedule, and airport transfer information.”
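The triage rule above boils down to one question: did the evaluation identify missing topics? A hypothetical sketch of that decision:

```python
def triage_failure(missing_topics: list[str]) -> tuple[str, str]:
    """Distinguish a knowledge gap from other failures.

    Sketch of the distinction described above: if the evaluation
    engine identified specific missing topics, the fix is a Training
    Materials upload; otherwise, review the transcript and the prompt.
    """
    if missing_topics:
        return (
            "Needs Training",
            "Upload documents covering: " + ", ".join(missing_topics),
        )
    return ("Failed", "Review the transcript and consider a prompt adjustment")
```

For example, `triage_failure(["spa treatment prices"])` points you straight at a spa-menu upload, while `triage_failure([])` sends you back to the transcript.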
The Training Workflow
When you see “Needs Training” conversations, here’s the complete workflow:

Open History and filter to “Needs Training”
Go to History, Chat tab, and use the quality filter dropdown to show only “Needs Training” conversations. These are your priority — each one represents a specific knowledge gap.
Click a conversation to see details
The detail panel shows the full conversation, evaluation results for each criterion, and — most importantly — the missing knowledge topics section. This tells you exactly what information was missing.
Add the missing information
Go to Training Materials and upload a document (or update an existing one) that covers the missing topics. For example, if “spa treatment prices” was missing, upload your spa menu.
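If you tally these conversations yourself, the "one document fixes many conversations" idea is easy to operationalize: count how often each missing topic recurs and fix the most common gaps first. A sketch assuming a hypothetical export shape (RecepAI doesn't document one):

```python
from collections import Counter

def top_knowledge_gaps(conversations: list[dict], limit: int = 3) -> list[tuple[str, int]]:
    """Rank missing topics across "Needs Training" conversations.

    `conversations` uses a hypothetical shape: each dict carries a
    "status" and a list of "missing_topics". A topic appearing in many
    conversations is still one fix: a single document upload.
    """
    gaps = Counter()
    for convo in conversations:
        if convo.get("status") == "Needs Training":
            gaps.update(convo.get("missing_topics", []))
    return gaps.most_common(limit)
```

The output reads like a to-do list: the topic at the top is the upload that resolves the most future conversations.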
Writing Better Criteria: The AI Perspective
Your evaluation is only as good as your criteria descriptions. Here’s what actually matters when the evaluation engine reads your criteria:

Be Specific About What “Success” Looks Like
The evaluation engine reads the entire conversation transcript and checks it against your description. The more specific your description, the more consistent and accurate the evaluation.

Why specificity matters so much: Under the hood, evaluation is a classification task — the engine must decide “did this conversation meet this criterion or not?” When the criterion is vague (“be helpful”), the classification boundary is fuzzy and the engine might classify the same conversation differently each time. When the criterion is specific (“the guest received a factual answer and a follow-up was offered”), the boundary is sharp and evaluations become consistent and reliable. Think of it like giving a hotel inspector a checklist vs. telling them “check if things are good.”
| Approach | Description | Accuracy |
|---|---|---|
| Vague | “The guest was helped” | Low — almost anything counts as “help” |
| Better | “The guest’s question was answered” | Medium — but what counts as “answered”? |
| Specific | “The guest received a specific, factual answer from Training Materials. If the receptionist couldn’t answer, they provided a phone number or email for follow-up.” | High — clear criteria for both success and acceptable failure |
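Evaluation engines of this kind are typically LLM classifiers, and the criterion description usually lands more or less verbatim in the evaluator's prompt. The sketch below is a generic illustration of that pattern, not RecepAI's internal prompt, and it shows why the description text is effectively the entire decision boundary:

```python
def build_evaluation_prompt(criterion_name: str, success_description: str,
                            transcript: str) -> str:
    """Compose a classification prompt for an LLM-based evaluator.

    Generic sketch of how such engines typically frame the task (the
    real internal prompt is not public). Everything the classifier can
    use to draw the pass/fail line comes from `success_description`,
    which is why a specific description yields consistent verdicts
    and a vague one does not.
    """
    return (
        f"Criterion: {criterion_name}\n"
        f"A conversation PASSES this criterion when: {success_description}\n"
        "If the criterion does not apply to this conversation, answer "
        "NOT_APPLICABLE.\n\n"
        f"Conversation transcript:\n{transcript}\n\n"
        "Answer with exactly one of: PASS, FAIL, NOT_APPLICABLE."
    )
```

Swap in the "Vague" description from the table and the classifier has almost nothing to anchor on; swap in the "Specific" one and the checklist writes itself.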
One Criterion, One Thing
Each criterion should measure exactly one aspect of conversation quality. If you combine multiple conditions, a conversation that excels at one but fails at another gets a confusing result.

- Too broad: “Guest was greeted properly and their question was answered and follow-up was offered”
- Focused: Three separate criteria — Greeting, Question Answered, Follow-Up Offered
“Not Applicable” Is Smart, Not Lazy
Some criteria simply don’t apply to every conversation. If a guest only asks “What time is checkout?” — the “Reservation Inquiry Handled” criterion isn’t relevant. The evaluation engine is smart enough to recognize this and marks it as “Not Applicable” rather than forcing a pass or fail judgment. This prevents false failures and keeps your statistics meaningful. Without this option, a hotel with 5 criteria would see artificially inflated failure rates — conversations that were perfectly fine would be marked as “Failed” simply because a criterion wasn’t relevant. The “Not Applicable” result was specifically designed to solve this problem.

Dashboard Integration
Your Dashboard shows a training opportunities count — the number of conversations that need your attention this month. This count only includes “Needs Training” conversations (failures with identified knowledge gaps), not all failures. Click this number to jump directly to the filtered History view showing only actionable items.

Best Practices
Start with 3-4 criteria, not 10
Begin with the most important criteria for your hotel. You can always add more later. Too many criteria at the start makes it harder to focus on what matters most.
Review results weekly, not daily
Set aside 15-30 minutes once a week to review evaluation results. Daily checking leads to reactive fixes instead of pattern-based improvements. See our Improving Responses guide for a complete weekly routine.
Fix patterns, not individual conversations
If you see the same missing topic appearing across multiple conversations, that’s one fix — not multiple. Upload one comprehensive document and it resolves all future occurrences.
Keep criteria descriptions focused
You have up to 2,000 characters per description, but longer doesn’t mean better. The evaluation engine works best with clear, concise instructions — just like your receptionist’s prompt. A well-written 200-character description often outperforms a rambling 1,500-character one.
Revise criteria that always pass or always fail
A criterion that passes 100% of the time isn’t measuring anything useful. A criterion that fails 100% might be poorly written or measuring something your receptionist can’t control. Both deserve a rewrite.
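This last heuristic is easy to automate if you tally pass rates yourself. A sketch assuming a hypothetical name-to-pass-rate mapping (RecepAI shows results per conversation, so you would compute these rates from History):

```python
def criteria_needing_rewrite(pass_rates: dict[str, float],
                             low: float = 0.02, high: float = 0.98) -> list[str]:
    """Flag criteria whose pass rate is effectively 100% or 0%.

    `pass_rates` maps criterion name to the fraction of applicable
    conversations that passed (a hypothetical shape). A criterion that
    always passes measures nothing useful; one that always fails is
    likely mis-written or outside the receptionist's control. The
    thresholds are arbitrary defaults; tune them to your volume.
    """
    return [name for name, rate in pass_rates.items()
            if rate >= high or rate <= low]
```

Run it on a month of results and the list it returns is your rewrite queue.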