Date: 2024-12-02

The first electric speech synthesizers appeared around the 1930s [1]. These early machines were limited in capability and complicated to operate.

As computers came along, programmers starting in the late 1950s worked on algorithms that drew on a large database of audio files for their source sounds. These algorithms would find matching sounds for units of text and piece the speech elements together. Early on, the generated voice sounded robotic. As modeling work characterized language better, the algorithms for turning text to speech improved.
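
The matching-and-piecing step described above can be illustrated with a toy sketch. It is only an illustration under stated assumptions, not a reconstruction of any historical system: the `UNIT_DB` table, its contents, and the `synthesize` function are hypothetical, short lists of numbers stand in for recorded audio clips, and the greedy longest-match rule is one simple way to choose units.

```python
from typing import Dict, List

# Hypothetical unit database: each text unit maps to a short list of samples
# standing in for a recorded audio clip. Real systems store actual recordings.
UNIT_DB: Dict[str, List[float]] = {
    "hel": [0.1, 0.2],
    "lo": [0.3],
    "he": [0.4],
    "l": [0.5],
    "o": [0.6],
}


def synthesize(text: str, db: Dict[str, List[float]]) -> List[float]:
    """Pick the longest matching unit at each position and join the clips."""
    max_len = max(len(unit) for unit in db)
    samples: List[float] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            unit = text[i:i + size]
            if unit in db:
                samples.extend(db[unit])
                i += size
                break
        else:
            # No unit of any length starts here, so the text cannot be voiced.
            raise KeyError(f"no recorded unit covers {text[i]!r} at position {i}")
    return samples
```

Calling `synthesize("hello", UNIT_DB)` joins the clips for "hel" and "lo". Simply butting clips together like this leaves audible seams at the joins, which is one reason early generated voices sounded robotic.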

When deep learning techniques and neural networks gained traction in the 2000s, programmers started modeling waveforms directly from recordings of speech, which led to high-quality voices that sounded more realistic. In parallel, computer scientists were refining speech recognition software and natural language processing. The development of conversational AI hinged on combining speech-to-text with text-to-speech technology.

Although AI and machine learning made it easier to generate natural-sounding speech, they also opened new areas of controversy, such as deepfakes. Technology companies are developing real-time voice analysis systems to detect audio deepfakes.