To estimate the application load on the system, you need to know the following:
Your WebSphere Voice Server system should be able to handle the maximum demand for speech resources, that is, the resources needed during the peak calling hour rather than for a day's average number of hourly calls. The primary speech resource is the ASR or TTS engine. The demand for engines is influenced by both the frequency of calls and how they are distributed over time. If all the incoming calls use the same application and start at the same time, each call will need an engine at the same time, so the demand will be high. If, on the other hand, calls are distributed normally, the number of engines needed simultaneously can be considerably smaller.
For your applications, you must determine the acceptable level of performance, that is, how important it is that an engine is available for a call without a significant delay. Delays can cause performance degradation, such as speech input not being recognized or output stuttering. If some degradation of performance is acceptable during peak utilization, fewer engines will be required.
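The trade-off between peak demand and acceptable degradation can be sketched with the Erlang B formula, a standard telephony sizing model. This is an illustration only, not a method taken from the WebSphere Voice Server documentation. It assumes that calls arrive at random during the peak hour and that a call finding no free engine counts as degraded. The function names and parameters (`calls_per_peak_hour`, `engine_seconds_per_call`, `max_blocking`) are hypothetical.

```python
def erlang_b(offered_load: float, engines: int) -> float:
    """Probability that a new call finds every engine busy (Erlang B).

    offered_load is in erlangs: arrival rate multiplied by mean engine hold time.
    Uses the standard numerically stable recurrence, starting from zero engines.
    """
    blocking = 1.0
    for n in range(1, engines + 1):
        blocking = (offered_load * blocking) / (n + offered_load * blocking)
    return blocking


def engines_needed(calls_per_peak_hour: float,
                   engine_seconds_per_call: float,
                   max_blocking: float) -> int:
    """Smallest engine count whose blocking probability is within max_blocking.

    All three inputs are assumptions you must supply from your own traffic data.
    """
    if not 0.0 < max_blocking < 1.0:
        raise ValueError("max_blocking must be between 0 and 1 (exclusive)")
    offered_load = calls_per_peak_hour * engine_seconds_per_call / 3600.0
    if offered_load <= 0.0:
        return 0
    engines = 0
    while erlang_b(offered_load, engines) > max_blocking:
        engines += 1
    return engines


if __name__ == "__main__":
    # Hypothetical figures: 1200 calls in the peak hour, 45 engine-seconds each.
    for target in (0.05, 0.01, 0.001):
        print(target, engines_needed(1200, 45, target))
```

Tightening the blocking target raises the engine count, which mirrors the point above: accepting some degradation at peak utilization reduces the number of engines required. Treat the result as a starting estimate to confirm by testing.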
The number of concurrent ASR and TTS sessions, in turn, determines the number of processors required and how powerful they must be. Similarly, these two variables – number and speed of the processors – dictate the number and size of the machines needed for your WebSphere Voice Server installation.
For example, a non-barge-in application using long prompts of synthesized text together with a simple grammar is likely to be actively engaged in recognition for only a short proportion of the length of a call. It will have a short active duty cycle for ASR but a long active duty cycle for speech synthesis.
A barge-in application using shorter prompts of synthesized text together with a complex grammar is likely to spend more time actively engaged in recognition. In this case, the active duty cycle for ASR will be longer and the active duty cycle for speech synthesis will be shorter. If a system is underspecified, an engine might not be available at the start of a call.
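As a rough illustration of how duty cycles translate into engine demand, the sketch below multiplies the number of calls in progress at peak by the fraction of each call spent in recognition or synthesis. It assumes that engine use is spread evenly across calls, so it gives an average rather than a worst case. The function name and all the figures are hypothetical.

```python
def concurrent_sessions(peak_concurrent_calls: int,
                        active_seconds_per_call: int,
                        call_length_seconds: int) -> int:
    """Average number of engine sessions in use at peak, rounded up.

    The duty cycle is active_seconds_per_call / call_length_seconds.
    Integer arithmetic is used so that the rounding up is exact.
    """
    if call_length_seconds <= 0:
        raise ValueError("call_length_seconds must be positive")
    if not 0 <= active_seconds_per_call <= call_length_seconds:
        raise ValueError("active time must lie between 0 and the call length")
    busy_seconds = peak_concurrent_calls * active_seconds_per_call
    return -(-busy_seconds // call_length_seconds)  # ceiling division


if __name__ == "__main__":
    # Hypothetical non-barge-in application: long TTS prompts, short recognition.
    print("ASR:", concurrent_sessions(100, 30, 180))
    print("TTS:", concurrent_sessions(100, 90, 180))
```

Because calls do not use engines in a perfectly even pattern, an estimate like this is better treated as a lower bound, with extra capacity added for bursts.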
Some applications can be designed so that most callers are likely to complete their business and hang up quickly, thus freeing resources for another call. For more information about application design, refer to the WebSphere Voice Toolkit online help.
Once you know the maximum number of concurrent sessions required for ASR and TTS, the recognition languages and voices to be used, and the complexity of your voice recognition applications, you can determine how many WebSphere Voice Server machines are necessary. You can also determine the minimum specifications for the machines, which also depend on the operating system you select.
A minimum base memory of 3 GB is required on each WebSphere Voice Server recognition server machine. You can minimize the number of machines required by installing multiple high-speed processors and additional memory in each machine.
The actual number of WebSphere Voice Server engines that will run on each machine is solution-dependent. A solution must be tested to verify that the system can handle a condition where all of the WebSphere Voice Server engines are fully utilized. It is also important to ensure that the qualified compatible gateway system can support all of the attached server machines and engines.
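Once testing has shown how many engines one machine can sustain under full utilization, the machine count follows from simple arithmetic. The sketch below assumes that you supply that tested per-machine figure yourself. The function name, the `headroom_percent` safety margin, and the example numbers are hypothetical and are not values from the product documentation.

```python
def machines_needed(required_engines: int,
                    engines_per_machine: int,
                    headroom_percent: int = 0) -> int:
    """Number of machines needed to host required_engines, rounded up.

    engines_per_machine must come from testing your own solution.
    headroom_percent adds optional spare capacity, for example 25 for 25%.
    """
    if engines_per_machine <= 0:
        raise ValueError("engines_per_machine must be positive")
    if required_engines < 0 or headroom_percent < 0:
        raise ValueError("inputs must not be negative")
    numerator = required_engines * (100 + headroom_percent)
    denominator = 100 * engines_per_machine
    return -(-numerator // denominator)  # ceiling division


if __name__ == "__main__":
    # Hypothetical: 50 engines required, 12 engines per machine verified in testing.
    print(machines_needed(50, 12))
    print(machines_needed(50, 12, headroom_percent=25))
```

The result covers only the speech server machines. Whether the gateway can support that many machines and engines has to be checked separately.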
The minimum configuration detailed in Identifying hardware and software requirements can support a single ASR and TTS engine in a development environment. For a production environment, make sure you analyze your hardware capabilities and requirements and make adjustments as necessary.