Why This Changes Everything
Traditional hotel quality assurance works like this: a manager randomly selects 10-20 conversations per week, reads through them, and tries to spot problems. At best, you review 2-3% of all conversations. The rest? You hope they went well. With RecepAI’s Chat Quality Criteria, your receptionist evaluates 100% of conversations — automatically, after every single chat. No sampling, no guessing, no conversations falling through the cracks.

The real power isn’t just grading conversations. It’s the “Needs Training” insight — when your receptionist couldn’t answer something because the information wasn’t in your Training Materials, the system tells you exactly what topic was missing. One document upload can fix dozens of future conversations at once.
How It Works
You define success criteria
On your Conversation Agent page, you create criteria that describe what a successful conversation looks like for your hotel.
Every conversation gets evaluated
After each chat ends, your receptionist’s evaluation engine analyzes the full conversation against every criterion you’ve defined. This happens automatically — no action needed from you.
Results appear in History
Open the History page, Chat tab. Each conversation now shows an evaluation status — you can immediately see what went well and what needs attention.
How fast does this happen? Evaluations run shortly after each conversation ends. By the time you check the History page, results are typically already there.
Setting Up Chat Quality Criteria
Go to Conversation Agent and scroll to the “Chat Quality Criteria” section.

Step 1: Click “Add Criteria”
Click the “Add Criteria” button to open the criteria editor.

Step 2: Write a Clear Name
The name should immediately tell you what this criterion measures. Keep it specific and action-oriented.

Step 3: Write the Success Description
This is the most important part. The success description tells the evaluation engine exactly what to look for in the conversation. The more specific you are, the more accurate the evaluation.

Example Criteria (5 Tested Templates)
These are field-tested criteria that work well for most hotels. Use them as starting points and customize the descriptions to match your hotel’s specific policies.

1. Guest Question Answered
Name: Guest Question Answered

Success Description: The guest’s primary question received a specific, accurate answer based on the hotel’s Training Materials. The receptionist did not say “I don’t know” without offering an alternative, and did not redirect the guest to “call reception” without providing a specific phone number or email.

Why this works: This is your baseline criterion — it catches the most common failure: the guest asking something and not getting a useful answer. The description is specific about what “answered” means (not just acknowledging the question, but providing actual information).
2. Contact Information Provided
Name: Contact Information Provided

Success Description: When the conversation involved a request the receptionist couldn’t fulfill directly (booking, special request, complaint), the receptionist provided a specific contact method — a phone number, email address, or booking link. Generic phrases like “contact the hotel” or “reach out to us” without specific details count as failure.

Why this works: One of the most frustrating guest experiences is being told to “contact reception” without a number. This criterion ensures every dead-end has a clear next step.
3. Reservation Inquiry Handled
Name: Reservation Inquiry Handled

Success Description: If the guest asked about making, modifying, or canceling a reservation, the receptionist provided relevant information (availability guidance, rate context, or booking instructions) and offered a direct way to proceed — such as a reservation email, phone number, or booking link.

Why this works: Reservation inquiries are high-value conversations. This criterion ensures your receptionist converts these opportunities instead of losing them.
4. Appropriate Language Used
Name: Appropriate Language Used

Success Description: The receptionist responded in the same language the guest used, or in a language the guest explicitly requested. Responses were professional, warm, and free of overly technical jargon. The tone matched a professional hotel receptionist — not overly casual, not stiff.

Why this works: Multi-language hotels need to verify their receptionist is responding in the correct language. This also catches tone issues that might not be obvious from a quick glance.
5. Follow-Up Offered
Name: Follow-Up Offered

Success Description: At the end of the conversation, the receptionist asked if there was anything else the guest needed help with, or proactively suggested a related service or piece of information that might be useful based on the context of the conversation.

Why this works: A great receptionist doesn’t just answer the question — they anticipate what the guest might need next. This criterion measures that proactive hospitality touch.
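RecepAI has no public API for criteria (you create them in the Conversation Agent UI), but it can help to see each criterion as plain data: a name plus a success description, capped at 2,000 characters. A minimal Python sketch with hypothetical field names, bundling abbreviated versions of the five templates above:

```python
from dataclasses import dataclass

MAX_DESCRIPTION_LENGTH = 2000  # character limit stated in the criteria editor

@dataclass
class ChatQualityCriterion:
    """One criterion: a short name plus a success description.

    Field names here are hypothetical; criteria are created in the
    Conversation Agent UI, not through code.
    """
    name: str
    success_description: str

    def __post_init__(self):
        if len(self.success_description) > MAX_DESCRIPTION_LENGTH:
            raise ValueError(
                f"Success description exceeds {MAX_DESCRIPTION_LENGTH} characters"
            )

# The five field-tested templates, abbreviated:
TEMPLATES = [
    ChatQualityCriterion(
        "Guest Question Answered",
        "The guest's primary question received a specific, accurate answer "
        "based on the hotel's Training Materials.",
    ),
    ChatQualityCriterion(
        "Contact Information Provided",
        "When the receptionist couldn't fulfill a request directly, a "
        "specific contact method (phone, email, or booking link) was given.",
    ),
    ChatQualityCriterion(
        "Reservation Inquiry Handled",
        "Reservation questions received relevant information plus a direct "
        "way to proceed (email, phone number, or booking link).",
    ),
    ChatQualityCriterion(
        "Appropriate Language Used",
        "The receptionist responded in the guest's language with a "
        "professional, warm tone.",
    ),
    ChatQualityCriterion(
        "Follow-Up Offered",
        "The conversation ended with an offer of further help or a "
        "proactive, context-relevant suggestion.",
    ),
]
```

Thinking in this shape makes the two levers obvious: the name is for you, the description is for the evaluation engine.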
Understanding Evaluation Results
After your receptionist evaluates a conversation, it assigns one of these statuses. You can see them on the History page under the Chat tab.

Status Overview
| Status | What It Means | What to Do |
|---|---|---|
| Successful | All applicable criteria were met | Nothing — your receptionist handled this well |
| Needs Training | Failed because specific knowledge was missing | Add the missing topics to your Training Materials |
| Failed | Criteria not met for other reasons | Review the conversation — might need a prompt adjustment |
| Partial | Some criteria passed, some didn’t | Check which ones failed and address those specifically |
| Pending | Not yet evaluated | Results will appear shortly |
| Skipped | Conversation was too short to evaluate meaningfully | Normal for very brief exchanges (single question) |
| Trained | You reviewed the failure and added the missing knowledge | No further action — this conversation has been addressed |
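One way to reason about these statuses is as a roll-up of per-criterion results. The sketch below is a simplified model, not RecepAI's actual logic (the enum names and rules are assumptions for illustration), but it shows the key property: a "Not Applicable" criterion result is excluded entirely, so it never drags a conversation toward "Failed".

```python
from enum import Enum

class CriterionResult(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_APPLICABLE = "not_applicable"  # criterion didn't apply to this chat

class ConversationStatus(Enum):
    SUCCESSFUL = "Successful"
    PARTIAL = "Partial"
    FAILED = "Failed"
    SKIPPED = "Skipped"

def aggregate(results: list[CriterionResult]) -> ConversationStatus:
    """Roll per-criterion results up into one conversation status.

    Simplified rules for illustration: drop "Not Applicable" results,
    then check whether everything remaining passed, everything failed,
    or the outcome was mixed.
    """
    applicable = [r for r in results if r is not CriterionResult.NOT_APPLICABLE]
    if not applicable:
        return ConversationStatus.SKIPPED  # nothing meaningful to evaluate
    if all(r is CriterionResult.PASSED for r in applicable):
        return ConversationStatus.SUCCESSFUL
    if all(r is CriterionResult.FAILED for r in applicable):
        return ConversationStatus.FAILED
    return ConversationStatus.PARTIAL
```

("Needs Training", "Pending", and "Trained" sit outside this roll-up: they depend on missing-topic detection, timing, and your own follow-up rather than on pass/fail counts.)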
The Key Distinction: “Needs Training” vs “Failed”
This is the most important thing to understand about the evaluation system:

Needs Training
Your receptionist tried to answer but didn’t have the information. The system identified specific missing topics.

Action: Go to Training Materials and upload documents covering the missing topics. This is the highest-impact improvement you can make.

Example: Guest asked about spa prices → Receptionist couldn’t answer → Missing topic: “spa treatment prices” → Upload your spa menu.
Failed (Other Issues)
The criteria weren’t met, but not because of missing knowledge. It could be a tone issue, a missing follow-up, or the receptionist not following your prompt instructions.

Action: Review the conversation transcript and consider adjusting your chat prompt.

Example: Guest asked about check-in time → Receptionist answered correctly but didn’t offer early check-in option → Prompt adjustment needed.
Why this distinction matters from an AI perspective: Most AI evaluation systems just give you a pass/fail. RecepAI goes further by analyzing why something failed. When the failure is due to missing information (a “knowledge gap”), the system identifies the exact topics — turning evaluation into a specific, actionable improvement plan. This is the difference between “your receptionist failed 15 conversations this week” and “your receptionist needs spa pricing, pool schedule, and airport transfer information.”
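The triage rule above boils down to one question: did the evaluation identify missing topics? A hypothetical sketch of that decision:

```python
def triage_failure(missing_topics: list[str]) -> tuple[str, str]:
    """Distinguish a knowledge gap from other failures.

    Sketch of the distinction described above: if the evaluation
    engine identified specific missing topics, the fix is a Training
    Materials upload; otherwise, review the transcript and the prompt.
    """
    if missing_topics:
        return (
            "Needs Training",
            "Upload documents covering: " + ", ".join(missing_topics),
        )
    return ("Failed", "Review the transcript and consider a prompt adjustment")
```

For example, `triage_failure(["spa treatment prices"])` points you straight at a spa-menu upload, while `triage_failure([])` sends you back to the transcript.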
The Training Workflow
When you see “Needs Training” conversations, here’s the complete workflow:

Open History and filter to “Needs Training”
Go to History, Chat tab, and use the quality filter dropdown to show only “Needs Training” conversations. These are your priority — each one represents a specific knowledge gap.
Click a conversation to see details
The detail panel shows the full conversation, evaluation results for each criterion, and — most importantly — the missing knowledge topics section. This tells you exactly what information was missing.
Add the missing information
Go to Training Materials and upload a document (or update an existing one) that covers the missing topics. For example, if “spa treatment prices” was missing, upload your spa menu.
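If you tally these conversations yourself, the "one document fixes many conversations" idea is easy to operationalize: count how often each missing topic recurs and fix the most common gaps first. A sketch assuming a hypothetical export shape (RecepAI doesn't document one):

```python
from collections import Counter

def top_knowledge_gaps(conversations: list[dict], limit: int = 3) -> list[tuple[str, int]]:
    """Rank missing topics across "Needs Training" conversations.

    `conversations` uses a hypothetical shape: each dict carries a
    "status" and a list of "missing_topics". A topic appearing in many
    conversations is still one fix: a single document upload.
    """
    gaps = Counter()
    for convo in conversations:
        if convo.get("status") == "Needs Training":
            gaps.update(convo.get("missing_topics", []))
    return gaps.most_common(limit)
```

The output reads like a to-do list: the topic at the top is the upload that resolves the most future conversations.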
Writing Better Criteria: The AI Perspective
Your evaluation is only as good as your criteria descriptions. Here’s what actually matters when the evaluation engine reads your criteria:

Be Specific About What “Success” Looks Like
The evaluation engine reads the entire conversation transcript and checks it against your description. The more specific your description, the more consistent and accurate the evaluation.

Why specificity matters so much: Under the hood, evaluation is a classification task — the engine must decide “did this conversation meet this criterion or not?” When the criterion is vague (“be helpful”), the classification boundary is fuzzy and the engine might classify the same conversation differently each time. When the criterion is specific (“the guest received a factual answer and a follow-up was offered”), the boundary is sharp and evaluations become consistent and reliable. Think of it like giving a hotel inspector a checklist vs. telling them “check if things are good.”
| Approach | Description | Accuracy |
|---|---|---|
| Vague | “The guest was helped” | Low — almost anything counts as “help” |
| Better | “The guest’s question was answered” | Medium — but what counts as “answered”? |
| Specific | “The guest received a specific, factual answer from Training Materials. If the receptionist couldn’t answer, they provided a phone number or email for follow-up.” | High — clear criteria for both success and acceptable failure |
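Evaluation engines of this kind are typically LLM classifiers, and the criterion description usually lands more or less verbatim in the evaluator's prompt. The sketch below is a generic illustration of that pattern, not RecepAI's internal prompt, and it shows why the description text is effectively the entire decision boundary:

```python
def build_evaluation_prompt(criterion_name: str, success_description: str,
                            transcript: str) -> str:
    """Compose a classification prompt for an LLM-based evaluator.

    Generic sketch of how such engines typically frame the task (the
    real internal prompt is not public). Everything the classifier can
    use to draw the pass/fail line comes from `success_description`,
    which is why a specific description yields consistent verdicts
    and a vague one does not.
    """
    return (
        f"Criterion: {criterion_name}\n"
        f"A conversation PASSES this criterion when: {success_description}\n"
        "If the criterion does not apply to this conversation, answer "
        "NOT_APPLICABLE.\n\n"
        f"Conversation transcript:\n{transcript}\n\n"
        "Answer with exactly one of: PASS, FAIL, NOT_APPLICABLE."
    )
```

Swap in the "Vague" description from the table and the classifier has almost nothing to anchor on; swap in the "Specific" one and the checklist writes itself.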
One Criterion, One Thing
Each criterion should measure exactly one aspect of conversation quality. If you combine multiple conditions, a conversation that excels at one but fails at another gets a confusing result.

- Too broad: “Guest was greeted properly and their question was answered and follow-up was offered”
- Focused: Three separate criteria — Greeting, Question Answered, Follow-Up Offered
“Not Applicable” Is Smart, Not Lazy
Some criteria simply don’t apply to every conversation. If a guest only asks “What time is checkout?” — the “Reservation Inquiry Handled” criterion isn’t relevant. The evaluation engine is smart enough to recognize this and marks it as “Not Applicable” rather than forcing a pass or fail judgment. This prevents false failures and keeps your statistics meaningful. Without this option, a hotel with 5 criteria would see artificially inflated failure rates — conversations that were perfectly fine would be marked as “Failed” simply because a criterion wasn’t relevant. The “Not Applicable” result was specifically designed to solve this problem.

Dashboard Integration
Your Dashboard shows a training opportunities count — the number of conversations that need your attention this month. This count only includes “Needs Training” conversations (failures with identified knowledge gaps), not all failures. Click this number to jump directly to the filtered History view showing only actionable items.

Best Practices
Start with 3-4 criteria, not 10
Begin with the most important criteria for your hotel. You can always add more later. Too many criteria at the start makes it harder to focus on what matters most.
Review results weekly, not daily
Set aside 15-30 minutes once a week to review evaluation results. Daily checking leads to reactive fixes instead of pattern-based improvements. See our Improving Responses guide for a complete weekly routine.
Fix patterns, not individual conversations
If you see the same missing topic appearing across multiple conversations, that’s one fix — not multiple. Upload one comprehensive document and it resolves all future occurrences.
Keep criteria descriptions focused
You have up to 2,000 characters per description, but longer doesn’t mean better. The evaluation engine works best with clear, concise instructions — just like your receptionist’s prompt. A well-written 200-character description often outperforms a rambling 1,500-character one.
Revise criteria that always pass or always fail
A criterion that passes 100% of the time isn’t measuring anything useful. A criterion that fails 100% might be poorly written or measuring something your receptionist can’t control. Both deserve a rewrite.
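This last heuristic is easy to automate if you tally pass rates yourself. A sketch assuming a hypothetical name-to-pass-rate mapping (RecepAI shows results per conversation, so you would compute these rates from History):

```python
def criteria_needing_rewrite(pass_rates: dict[str, float],
                             low: float = 0.02, high: float = 0.98) -> list[str]:
    """Flag criteria whose pass rate is effectively 100% or 0%.

    `pass_rates` maps criterion name to the fraction of applicable
    conversations that passed (a hypothetical shape). A criterion that
    always passes measures nothing useful; one that always fails is
    likely mis-written or outside the receptionist's control. The
    thresholds are arbitrary defaults; tune them to your volume.
    """
    return [name for name, rate in pass_rates.items()
            if rate >= high or rate <= low]
```

Run it on a month of results and the list it returns is your rewrite queue.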