Sidekick Voice General FAQs


This page contains FAQs regarding the Sidekick Voice feature.


FAQ

Why should I use Sidekick Voice?

Sidekick Voice brings the same intelligence you’ve experienced in Chat and SMS — now to your phone calls. With Sidekick Voice, conversations feel more natural, personalized, and helpful from start to finish.

What is Sidekick Voice?

A Customer who calls is immediately connected to Sidekick Voice through your Gladly IVR. This powerful AI assistant understands what Customers say in real time and responds just like a human would. It listens to their words, figures out what they need, and either helps directly or hands off to a human Agent when required.

How It Works

  • Speech-to-Text (STT) converts what’s said into text so Sidekick can understand and take action.

  • Text-to-Speech (TTS) turns Sidekick’s response back into natural-sounding speech that Customers hear on the call.

This allows for real-time, two-way conversations that feel smooth and human.
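The flow above can be pictured as a single conversational turn. The sketch below is illustrative only: `speech_to_text`, `decide_reply`, `text_to_speech`, and `handle_turn` are hypothetical stand-ins rather than Gladly APIs, and the "audio" is plain UTF-8 text so the example can run anywhere.

```python
# Hypothetical stand-ins; none of these names are Gladly APIs.

def speech_to_text(audio: bytes) -> str:
    """Pretend STT: here the 'audio' is simply UTF-8 encoded text."""
    return audio.decode("utf-8")


def decide_reply(text: str) -> str:
    """Pretend assistant logic; a real system would consult Guides and tools."""
    if "order" in text.lower():
        return "Sure, what is your order number?"
    return "How can I help you today?"


def text_to_speech(text: str) -> bytes:
    """Pretend TTS: a real system would return synthesized audio."""
    return text.encode("utf-8")


def handle_turn(caller_audio: bytes) -> bytes:
    """One turn: transcribe the caller, decide on a reply, speak it back."""
    heard = speech_to_text(caller_audio)
    reply = decide_reply(heard)
    return text_to_speech(reply)


reply_audio = handle_turn(b"Where is my order?")
```

In this toy version, `reply_audio` holds the encoded reply text; in a real call it would be audio played back to the Customer.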

Why It Matters

Customers’ experience on a phone call should be effortless, even when things get a little noisy or complicated. Sidekick Voice is designed to handle the real-world messiness of voice conversations by:

  • Telling the difference between voice and background noise

  • Capturing what’s said from start to finish

  • Recognizing pauses and knowing when a Customer is done talking

  • Hearing key info like order numbers or email addresses

  • Understanding a Customer even if they speak casually, change their mind mid-sentence, or use a lot of filler words like “um” and “ah”

What Makes Sidekick Voice Different?

Always Listening

Sidekick is tuned in from start to finish—not just at certain “listening” moments like traditional phone menus. That means if something unexpected is said, Sidekick won’t get stuck. It adjusts, understands, and keeps the conversation going.

Ready for Anything

Customers are not limited to a set list of words. Whether they say “yes,” “sure,” or “actually, can we change that?” — Sidekick gets it. Customers don’t have to speak like a robot to be understood.

Built for the Future

A modular system powers Sidekick Voice. It's flexible, scalable, and constantly improving to meet your needs.

The Same Brain, Now With a Voice

If Sidekick could already “type” messages to you in Chat and SMS, now it can “talk” on the phone. It’s the same brain behind the scenes — using the same tools, Guides, and logic, but now speaking through the Gladly Voice Channel.

Once Sidekick knows what to say, Gladly uses best-in-class text-to-speech (TTS) technology to deliver that response clearly and naturally over the phone.

Contact Gladly Support or your CSM to get started.


Can I change the voice used by Sidekick Voice?

“Jessica” is currently the only supported voice. It has been tested for pronunciation, understanding, and error rates, and Gladly has vetted its performance and quality across various question-answering and action-taking scenarios.


Does Sidekick Voice support multiple languages aside from English?

Not yet. We plan to introduce a multilingual Sidekick soon that speaks many languages and can follow a Customer who switches languages mid-conversation.

Can Sidekick Voice handle multi-turn conversations?

Yes. In a multi-turn conversation, Sidekick listens until the Customer’s speech is complete, processes that new utterance with the context of the earlier conversation, and then gives a new response.

In simpler terms, it's a back-and-forth dialogue in which the system remembers what was said before, uses that context to understand the current statement, and responds appropriately.
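As a rough illustration of "remembering what was said before", the sketch below keeps a running history of turns and hands the whole history to the next reasoning step. The `Conversation` class and its methods are hypothetical names chosen for this example, not part of Gladly.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps every earlier turn so later utterances can be read in context."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def context_for_next_reply(self) -> str:
        # A real assistant would pass this history to its reasoning step.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


convo = Conversation()
convo.add("Customer", "I want to return my shoes.")
convo.add("Sidekick", "Sure, what is the order number?")
convo.add("Customer", "Actually, can I exchange them instead?")
context = convo.context_for_next_reply()
```

The last utterance ("exchange them") only makes sense because the earlier turn about shoes is still part of the context.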

How does Sidekick Voice handle background noise?

Sidekick Voice uses best-in-class Voice Activity Detection (VAD) models. These models are purpose-built to tell sound sources apart: human speech, non-speech sounds (coughing, breathing), and background noise (footsteps, driving sounds, wind, static). Non-speech audio is filtered out so that the speech itself is detected and transcribed as accurately as possible.
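To give a feel for what voice activity detection does, here is a deliberately simple, energy-based toy. It is not the approach Sidekick Voice uses: trained VAD models can separate speech from loud non-speech sounds, which a plain loudness threshold cannot. The function names, frame size, and threshold are all assumptions made for illustration.

```python
import math


def frame_energy(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples: list[float], frame_size: int = 4,
                  threshold: float = 0.1) -> list[bool]:
    """Return one flag per full frame: True if the frame is loud enough."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_energy(frame) >= threshold)
    return flags


quiet = [0.01, -0.02, 0.01, 0.0]   # stands in for silence or faint noise
loud = [0.5, -0.4, 0.6, -0.5]      # stands in for someone talking
flags = detect_speech(quiet + loud + quiet)  # [False, True, False]
```

Only the frames flagged `True` would be passed on for transcription in this toy setup.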

How does Sidekick Voice handle interruptions if the Customer speaks at the same time as Sidekick?

If new speech is detected, Sidekick stops speaking. It listens until the Customer’s speech is complete, then processes that new utterance with the context of the earlier conversation, and gives a new response.

In the Conversation Timeline, the transcription shows that the Sidekick message was interrupted. If call recording is activated, the recording is also available.
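The interruption behavior described above can be sketched as a small state machine. Everything here is hypothetical: `CallSession`, its methods, and the `[interrupted]` marker are illustrative choices, not Gladly's data model or the actual Conversation Timeline format.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    """Hypothetical call state; not a Gladly data structure."""
    speaking: bool = False
    timeline: list[str] = field(default_factory=list)

    def start_reply(self, text: str) -> None:
        self.speaking = True
        self.timeline.append(f"Sidekick: {text}")

    def finish_reply(self) -> None:
        self.speaking = False

    def on_customer_speech(self, text: str) -> None:
        # If the assistant is mid-reply, stop talking and mark the message.
        if self.speaking:
            self.speaking = False
            self.timeline[-1] += " [interrupted]"
        self.timeline.append(f"Customer: {text}")


call = CallSession()
call.start_reply("Your order shipped on Monday and should arrive")
call.on_customer_speech("Wait, which order?")
```

After these calls, the assistant is no longer speaking, its unfinished message is marked as interrupted, and the Customer's utterance follows it in the timeline.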

Can Sidekick Voice understand different accents and tonal nuances?

Sidekick Voice uses the best-in-class transcription models available. These models are trained on a diverse range of accents with techniques like:

  • Acoustic modeling – Teaching the model what audio signals, phonemes (the distinct sounds of a language), words, and phrases sound like across accents.

  • Language modeling – Understanding the language, so that the transcription comes out right even when the audio is ambiguous. For example, the model understands that “dear guest” is more likely to follow “Welcome,” even if the speaker’s audio signal sounds like “deer guess.”
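The “dear guest” example can be made concrete with a toy rescoring step that combines an acoustic score with a language score. All numbers below are invented for illustration, and `best_transcript` is a hypothetical helper; real transcription models work on far richer representations.

```python
# Toy scores; every number here is made up for illustration.
# How well each phrase matches the audio on its own:
ACOUSTIC_SCORE = {"deer guess": 0.6, "dear guest": 0.4}
# How likely each phrase is to follow the previous word:
LANGUAGE_SCORE = {
    ("welcome", "deer guess"): 0.01,
    ("welcome", "dear guest"): 0.30,
}


def best_transcript(previous_word: str, candidates: list[str]) -> str:
    """Pick the candidate with the highest acoustic-times-language score."""
    def combined(candidate: str) -> float:
        lm = LANGUAGE_SCORE.get((previous_word.lower(), candidate), 1e-6)
        return ACOUSTIC_SCORE[candidate] * lm
    return max(candidates, key=combined)


choice = best_transcript("Welcome", ["deer guess", "dear guest"])
```

Even though “deer guess” matches the audio slightly better in this toy data, `choice` ends up as “dear guest” because the language score strongly favors it after “Welcome”.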