Speech Play and Collect

This block is only available if IVR is enabled.

This block plays audio or Text-to-Speech (TTS) prompts and collects the caller's speech or touch-tone (DTMF) digits.

The block also enables caller barge-in, which occurs when a caller speaks during playback. See the Stop Prompt On property below.

Note

The speech recognition timers depend on each other and are used as follows:

  • Max Time = total time required for the speech recognition process to complete
  • Max Silence = maximum time to wait for the caller, both to start speaking and to determine that the caller has finished speaking

For speech recognition to complete successfully, Max Time must exceed the caller's maximum expected speech input time plus two times the Max Silence time.

For example, a scenario with a maximum expected speech input time of 5 seconds to input a single number or name, and a maximum silence time of 2 seconds to wait for start and completion of input, requires the following settings, in seconds:

  • Max Time = 10, which exceeds the minimum of 9, calculated as the 5-second input time plus 2 x the Max Silence time of 2 seconds
  • Max Silence = 2
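The timing rule above can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic in this note; the function names are hypothetical and are not part of the product.

```python
def min_max_time(expected_speech_seconds, max_silence_seconds):
    """Lower bound that Max Time must exceed:
    expected speech input time plus two Max Silence windows."""
    return expected_speech_seconds + 2 * max_silence_seconds


def is_valid_max_time(max_time, expected_speech_seconds, max_silence_seconds):
    """True only when Max Time strictly exceeds the lower bound."""
    return max_time > min_max_time(expected_speech_seconds, max_silence_seconds)


print(min_max_time(5, 2))            # 9
print(is_valid_max_time(10, 5, 2))   # True
print(is_valid_max_time(9, 5, 2))    # False: Max Time must exceed 9, not equal it
```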

Properties

Values

Description

Text or Audio File

String

Enter the location of a .wav file, plain text, or a Speech Synthesis Markup Language (SSML) string.

For a .wav file, do one of the following:

  • Enter the complete URL of the file you want to play.
  • If your system administrator created a shortcut to the .wav files, enter /[file_name.wav].
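For the SSML option, it can help to confirm that the string is well-formed XML before pasting it into this property. The Python sketch below does that with the standard library. The speak and break elements come from the W3C SSML specification; which SSML elements the TTS engine behind this block supports is not stated here, so treat the sample markup as an assumption. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample SSML prompt (assumed markup; adjust to what your TTS engine supports).
ssml_prompt = (
    '<speak version="1.0" xml:lang="en-US">'
    'Please say or enter your account number.'
    '<break time="500ms"/>'
    'Press pound when you are finished.'
    '</speak>'
)


def is_well_formed(ssml):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(ssml_prompt))        # True
print(is_well_formed("<speak>unclosed"))  # False
```

This check only catches malformed XML. It does not verify that the engine accepts a given element or attribute.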

Stop Prompt On

Start of Input,
Recognition Complete, or
Don't Stop

This property determines if caller barge-in is enabled or disabled:

  • Start of Input — the prompt is stopped as soon as the user starts to speak or presses a key on the phone keypad.
  • Recognition Complete — the prompt is stopped only after recognition is achieved according to the configured grammar. This is also known as Hot Word mode.
  • Don't Stop — no barge-in during the prompt.

Grammar

String

The grammar determines what is recognized in the caller's speech when barge-in is enabled and a barge-in occurs.

Specify your grammar(s) in one of the following ways:

  • Use a built-in grammar, if one is installed in your system. The syntax is built-in:grammar/[your_grammar]. For example, if you have the built-in grammar boolean, enter built-in:grammar/boolean.
  • Enter complete URL(s) to your grammar(s).
  • Type a list of comma-delimited words instead of a grammar.

To use more than one grammar, separate the grammars with a semicolon (;). You can list up to ten grammars, or list the Content-ID.

The speech is compared to the grammar. If recognition is successful, playback is stopped, the result variable is assigned, and the next block is executed.
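As an illustration of the syntax described for this property, the following Python sketch assembles a Grammar value from several entries. The helper names and the example URL are hypothetical. Only the built-in:grammar/ prefix, the semicolon separator, and the ten-grammar limit come from the description above.

```python
MAX_GRAMMARS = 10  # limit stated for the Grammar property


def builtin_grammar(name):
    """Format a built-in grammar reference, e.g. built-in:grammar/boolean."""
    return f"built-in:grammar/{name}"


def build_grammar_value(grammars):
    """Join grammar entries with semicolons, enforcing the ten-grammar limit."""
    if not grammars:
        raise ValueError("at least one grammar is required")
    if len(grammars) > MAX_GRAMMARS:
        raise ValueError(f"at most {MAX_GRAMMARS} grammars can be listed")
    return ";".join(g.strip() for g in grammars)


value = build_grammar_value([
    builtin_grammar("boolean"),
    "https://example.com/grammars/city.grxml",  # hypothetical grammar URL
])
print(value)
# built-in:grammar/boolean;https://example.com/grammars/city.grxml
```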

Confidence Level (optional)

Percent

This is a confidence threshold parameter. The speech recognizer computes a confidence level. If the level is below the threshold you set, the recognizer returns no-match as the recognition result. If the level is at or above the threshold, then recognition is successful. The minimum is 50 and the maximum is 100. The default is 50.
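The threshold rule can be sketched in Python as follows. This is a simplified model rather than the recognizer's actual implementation: it assumes the recognizer reports confidence on the same 0 to 100 scale as the threshold, and the function name is hypothetical.

```python
MIN_THRESHOLD = 50      # minimum allowed setting
MAX_THRESHOLD = 100     # maximum allowed setting
DEFAULT_THRESHOLD = 50  # default setting


def recognition_result(confidence, threshold=DEFAULT_THRESHOLD):
    """Return 'match' at or above the threshold, otherwise 'no-match'."""
    if not MIN_THRESHOLD <= threshold <= MAX_THRESHOLD:
        raise ValueError("threshold must be between 50 and 100")
    return "match" if confidence >= threshold else "no-match"


print(recognition_result(72))      # match
print(recognition_result(72, 80))  # no-match
print(recognition_result(80, 80))  # match: equal to the threshold succeeds
```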

Speech Rate

Slow, Medium, or Fast

The rate at which a TTS message is read.

Volume

Soft, Medium, or Loud

The volume level for reading a TTS message.

Termination Digits

0-9 # and *

Enter one or more digits, #, or *. The caller can press any of these keys to indicate the end of their response.

When a termination digit is pressed, playback stops, and the script executes the next block.

Max Time

Seconds

Maximum permitted duration of the caller's spoken response, starting when the caller first begins speaking. Generally, there is no reason to change this from the default value. The maximum permitted setting is 30 seconds. The default is 20.

This property applies only to Automatic Speech Recognition (ASR).

Max Digits

Number

The maximum number of DTMF digits in the caller's response. When the limit is reached, the script proceeds to the next block.

Note

A value of 3 is typically too low to work reliably. This value is usually set to 10, or to 3 x the number of digits to be entered.

Max Silence

Seconds

The maximum length of time to wait for the caller's next response. The default is 5.

Clear Digits

Yes or No

Clears from the digit buffer any digits that were entered before the audio file is played. The default is No.

Result Variable

String

Name of the variable that receives the digit string. The name must start with an alphabetic character and must not exceed 255 characters.

This property has no default value, so the block does not function properly if the field is left blank.

See How to use variables.
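The naming rules for the result variable can be checked before a script is saved. The Python sketch below covers only the rules stated above (not blank, alphabetic first character, at most 255 characters). Which characters are allowed after the first one is not stated, so that is not checked. The function name is hypothetical.

```python
MAX_NAME_LENGTH = 255  # limit stated for the Result Variable property


def is_valid_result_variable(name):
    """Check the documented rules for a result variable name."""
    if not name:
        return False  # the field must not be left blank
    # Note: str.isalpha() also accepts non-ASCII letters; whether the
    # platform accepts them is not stated, so this is an assumption.
    return name[0].isalpha() and len(name) <= MAX_NAME_LENGTH


print(is_valid_result_variable("accountNumber"))  # True
print(is_valid_result_variable("1stDigits"))      # False
print(is_valid_result_variable(""))               # False
```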

Line

Inbound

The next block in the script is applied to the original (incoming) call.

Outbound

The next block in the script is applied to the outbound call (to where the call is transferred).

Input Mode

Speech and DTMF,
Speech only, or
DTMF only

If you select Speech and DTMF and the caller presses a key (DTMF) first, the system ignores anything the caller says after the key press.

If you select Speech and DTMF and the caller speaks first, the system ignores any key presses that follow. The default is Speech and DTMF.
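One way to picture the Speech and DTMF behavior is that the first kind of input the caller uses wins, and later input of the other kind is ignored. The Python sketch below models that reading. It is a simplification, not the platform's implementation. The event format and function name are hypothetical, and the assumption that Speech only ignores key presses and DTMF only ignores speech is inferred from the option names.

```python
# Kinds of input each mode accepts (inferred from the option names).
ALLOWED_INPUT = {
    "Speech and DTMF": {"speech", "dtmf"},
    "Speech only": {"speech"},
    "DTMF only": {"dtmf"},
}


def accepted_events(events, mode="Speech and DTMF"):
    """Keep only events of the first accepted kind; ignore the rest."""
    allowed = ALLOWED_INPUT[mode]
    first_kind = None
    kept = []
    for kind, value in events:
        if kind not in allowed:
            continue  # this mode does not accept this kind of input
        if first_kind is None:
            first_kind = kind  # the first accepted input decides the kind
        if kind == first_kind:
            kept.append((kind, value))
    return kept


events = [("dtmf", "1"), ("speech", "yes"), ("dtmf", "2")]
print(accepted_events(events))                 # [('dtmf', '1'), ('dtmf', '2')]
print(accepted_events(events, "Speech only"))  # [('speech', 'yes')]
```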

Comment

Text

Optional description of this block in your script.

Configure the Speech Play and Collect block

Caution

Do not rename the Success branch. If you do, CCaaS routes the script to the Else branch when the external speech server returns success.

  1. Right-click the block and click Add case. The Speech Play/Collect dialog appears.
  2. From the SpeechPlayCollect list, select one of the following:
       • NoMatch — recognition did not succeed or caller pressed invalid DTMF.
       • NoInput — the caller did not respond.
       • Error — technical issues prevented speech recognition.
  3. Click OK. The Case branch appears.
  4. Connect the Case branch to the appropriate block in the script.
  5. Repeat steps 1-4 to add the branches you need.
  6. Connect the Success branch to the appropriate block in the script.
  7. Connect the Else branch to the appropriate block in the script.
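The branch routing described in this procedure, including the reason for the caution about renaming Success, can be modeled in a few lines of Python. This is a simplified sketch under the assumption that a result is routed to the branch with the same name and falls through to Else when no such branch exists. The function name is hypothetical.

```python
def next_branch(result, branch_names):
    """Route to the branch named after the result, or to Else if none exists."""
    return result if result in branch_names else "Else"


configured = {"Success", "NoMatch", "NoInput", "Error"}
print(next_branch("Success", configured))  # Success
print(next_branch("NoInput", configured))  # NoInput

# If the Success branch is renamed, a success result no longer finds it.
renamed = {"Done", "NoMatch", "NoInput", "Error"}
print(next_branch("Success", renamed))     # Else
```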