We're updating help docs to reflect our new product naming. Gladly Sidekick (AI) is now called just Gladly, and Gladly Hero (the Platform) is now Gladly Team. Some articles may display outdated names while we update everything. Thank you for your patience!

Gladly Voice AI General FAQs


This page contains FAQs regarding the Voice AI feature.


FAQ

Why should I use Voice AI?

Voice AI brings the same intelligence you’ve experienced in Chat and SMS — now to your phone calls. With Voice AI, conversations feel more natural, personalized, and helpful from start to finish.

What is Voice AI?

A Customer who calls is immediately connected to Voice AI through your Gladly IVR. This powerful AI assistant understands what Customers say in real time and responds just like a human would. It listens to their words, figures out what they need, and either helps directly or hands off to a human Agent when required.

How It Works

  • Speech-to-Text (STT) converts what’s said into text so Gladly can understand and take action.

  • Text-to-Speech (TTS) turns Gladly’s response back into natural-sounding speech that Customers hear on the call.

This allows for real-time, two-way conversations that feel smooth and human.
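For readers who want a concrete picture, the STT and TTS steps above can be sketched in Python as a single caller turn. This is only an illustration under stated assumptions, not Gladly's implementation: `handle_turn` and the three stand-in functions are hypothetical names, and the stand-ins simply pass text through so the example runs without a speech engine.

```python
from typing import Callable


def handle_turn(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    decide_response: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Run one caller turn: audio in, audio out."""
    transcript = speech_to_text(audio)        # STT: speech -> text
    reply_text = decide_response(transcript)  # work out what to say
    return text_to_speech(reply_text)         # TTS: text -> speech


# Hypothetical stand-ins so the sketch runs without real speech models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_responder(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = handle_turn(b"where is my order", fake_stt, fake_responder, fake_tts)
print(reply_audio.decode("utf-8"))
```

In a real call, each stand-in would be replaced by an actual speech or reasoning component; the order of the three steps is the part this sketch is meant to show.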

Why It Matters

Customers’ experience on a phone call should be effortless, even when things get a little noisy or complicated. Voice AI is designed to handle the real-world messiness of voice conversations by:

  • Telling the difference between voice and background noise

  • Capturing what’s said from start to finish

  • Recognizing pauses and knowing when a Customer is done talking

  • Hearing key info like order numbers or email addresses

  • Understanding a Customer even if they speak casually, change their mind mid-sentence, or use a lot of filler words like “um” and “ah”

What Makes Voice AI Different?

Always Listening

Gladly is tuned in from start to finish—not just at certain “listening” moments like traditional phone menus. That means if something unexpected is said, Gladly won’t get stuck. It adjusts, understands, and keeps the conversation going.

Ready for Anything

Customers are not limited to a set list of words. Whether they say “yes,” “sure,” or “actually, can we change that?” — Gladly gets it. Customers don’t have to speak like a robot to be understood.

Built for the Future

A modular system powers Voice AI. It's flexible, scalable, and constantly improving to meet your needs.

The Same Brain, Now With a Voice

Gladly could already “type” messages to you in Chat and SMS; now it can “talk” on the phone. It’s the same brain behind the scenes, using the same tools, Guides, and logic, but now speaking through the Gladly Voice Channel.

Once Gladly knows what to say, it uses best-in-class text-to-speech (TTS) technology to deliver that response clearly and naturally over the phone.

Contact Gladly Support or your CSM to get started.

Can I change the voice used by Voice AI?

“Jessica” is currently the only supported voice. It has been tested for pronunciation, understanding, and error rates, and Gladly has vetted its performance and quality across various question-answering and action-taking scenarios.

[Attachment: Jessica - Gladly AI on Voice]

Does Voice AI support languages other than English?

Not yet, but we plan to introduce a multilingual Gladly soon that speaks many languages and can follow a Customer who switches languages mid-conversation.

Can Voice AI handle multi-turn conversations?

Yes. In a multi-turn conversation, Gladly listens until the Customer’s speech is complete, processes that new utterance with the context of the earlier conversation, and then gives a new response.

In simpler terms, it's a back-and-forth dialogue in which the system remembers what was said before, uses that context to understand the current statement, and responds appropriately.
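As a rough illustration of "remembering what was said before," the sketch below keeps every turn in a list and hands the full history to whatever produces the next reply. The `Conversation` class and its method names are hypothetical and chosen for this example; they do not describe Gladly's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps every turn so later utterances can be read in context."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def context_for_next_reply(self) -> str:
        # The responder sees the whole history, not just the last utterance.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


convo = Conversation()
convo.add("Customer", "I want to return my shoes.")
convo.add("Gladly", "Sure. Which order are they from?")
convo.add("Customer", "Actually, make that an exchange.")
print(convo.context_for_next_reply())
```

Without the earlier turns, "make that an exchange" would be ambiguous; with them, it clearly refers to the shoes.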

How does Voice AI handle background noise?

Voice AI uses the best-in-class Voice Activity Detection (VAD) models available. These models are purpose-built to tell sound sources apart: human speech, non-speech sounds (coughing, breathing), and background noise (footsteps, driving sounds, wind, static). They filter out everything that is not speech so that the speech itself is detected and transcribed as accurately as possible.
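To give a feel for what voice activity detection means, here is a deliberately simple Python sketch that flags audio frames as speech when their energy passes a threshold. This is a classic textbook baseline and an assumption made for illustration; it is not the model Voice AI uses, and unlike trained VAD models it cannot tell speech from a loud cough. The function names, sample values, and threshold are all made up.

```python
def frame_energy(frame: list[float]) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(sample * sample for sample in frame) / len(frame)


def detect_speech(frames: list[list[float]], threshold: float = 0.01) -> list[bool]:
    """Mark a frame True when its energy exceeds the threshold."""
    return [frame_energy(frame) > threshold for frame in frames]


# Illustrative sample values, not real audio.
quiet = [0.01, -0.02, 0.01, -0.01]  # low-level background hiss
loud = [0.4, -0.5, 0.45, -0.3]      # someone speaking

flags = detect_speech([quiet, loud, loud, quiet])
print(flags)  # [False, True, True, False]
```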

How does Voice AI handle interruptions if the Customer speaks at the same time as Gladly?

If new speech is detected, Gladly stops speaking. It listens until the Customer’s speech is complete, then processes that new utterance with the context of the earlier conversation, and gives a new response.

In the Conversation Timeline, the transcription will show that the Gladly message was interrupted. The call recording is also available if recording is activated.
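The interruption behavior described above can be pictured with a small simulation. This Python sketch is an assumption-laden illustration, not Gladly's implementation: `speak_with_barge_in` is a hypothetical function, and the set of word positions stands in for a real speech detector reporting that the Customer has started talking.

```python
def speak_with_barge_in(
    reply_words: list[str], customer_speaking_at: set[int]
) -> tuple[str, bool]:
    """Play a reply word by word, stopping as soon as the Customer talks.

    customer_speaking_at holds the word positions at which a (hypothetical)
    speech detector reports that the Customer has started speaking.
    """
    spoken: list[str] = []
    for position, word in enumerate(reply_words):
        if position in customer_speaking_at:
            return " ".join(spoken), True  # interrupted: stop speaking
        spoken.append(word)
    return " ".join(spoken), False         # reply finished normally


reply = "Your order shipped on Monday and should arrive Friday".split()
heard, interrupted = speak_with_barge_in(reply, customer_speaking_at={4})
print(heard, interrupted)
```

The returned flag plays the role of the "interrupted" marker on the transcript, and the partial text is what the Customer actually heard before speaking.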

Can Voice AI understand different accents and tonal nuances?

Voice AI uses the best-in-class transcription models available. These models are trained on a diverse range of accents with techniques like:

  • Acoustic modeling – Teaching the model what an audio signal, phoneme (a distinct sound in a language), word, phrase, etc. sounds like across accents.

  • Language modeling – Understanding the language, so that the transcription is corrected even if the speech is off. For example, the model understands that after “Welcome”, “dear guest” is more likely to follow even if the speaker’s audio signal sounds like “deer guess.”
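The interplay of the two techniques above can be sketched with a toy calculation. In this Python example, every probability is invented for illustration and the scoring function is a common log-linear combination, assumed here rather than taken from the models Voice AI uses. The point is only that a strong language-model preference can outweigh a slightly better acoustic match.

```python
import math

# Hypothetical scores, invented for illustration only.
acoustic_prob = {"deer guess": 0.6, "dear guest": 0.4}    # how well the audio matches
language_prob = {"deer guess": 0.001, "dear guest": 0.2}  # how likely after "Welcome"


def combined_score(candidate: str, lm_weight: float = 1.0) -> float:
    """Log-linear combination of acoustic and language-model scores."""
    return math.log(acoustic_prob[candidate]) + lm_weight * math.log(
        language_prob[candidate]
    )


best = max(acoustic_prob, key=combined_score)
print(best)  # dear guest
```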