Move toward open source standards in speech processing

Convert flat lexicon files to XML with Python

Colin Beckingham

Date archived: December 6, 2016 | First published: August 07, 2012

Many open source projects began before the advent of free and open source software (FOSS) standards, so their configuration and resource files are simple flat text files. Converting these files to the relevant open standard can improve cross-project compatibility, flexibility, and reliability. The lexicon used in voice recognition work is a good example. In this article, learn how to use Python to convert existing flat lexicons to the XML format defined in the Pronunciation Lexicon Specification (PLS), and how to convert the resulting PLS file back to a flat file. Explore how the XML format lets you add extra information and rigor to lexicon maintenance. The article also addresses issues such as Unicode handling and merging the new lexicon with other XML files while still using its data in audio model generation.
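
The round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code. It assumes a flat lexicon with one entry per line in the form `WORD phone phone ...`, which may differ from the format your toolkit uses. The function names `flat_to_pls` and `pls_to_flat` are hypothetical, and `x-arpabet` is only a placeholder value for the `alphabet` attribute. The element names and the namespace follow the W3C PLS 1.0 recommendation.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize PLS elements in the default namespace (no prefix).
ET.register_namespace("", PLS_NS)


def flat_to_pls(lines, lang="en-US", alphabet="x-arpabet"):
    """Build a PLS document from flat 'WORD phone phone ...' lines.

    The flat format and the alphabet name are assumptions for this sketch.
    """
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank lines and entries with no pronunciation
        word, phones = parts
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = word
        # Normalize runs of whitespace between phones to single spaces.
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = " ".join(phones.split())
    # encoding="unicode" returns a str, so non-ASCII words survive unchanged.
    return ET.tostring(root, encoding="unicode")


def pls_to_flat(xml_text):
    """Return flat lines, one per phoneme element, from a PLS document."""
    ns = {"pls": PLS_NS}
    root = ET.fromstring(xml_text)
    flat = []
    for lexeme in root.findall("pls:lexeme", ns):
        word = lexeme.findtext("pls:grapheme", default="", namespaces=ns)
        for phoneme in lexeme.findall("pls:phoneme", ns):
            flat.append(f"{word} {phoneme.text or ''}".rstrip())
    return flat


if __name__ == "__main__":
    sample = ["HELLO hh ax l ow", "WORLD w er l d"]
    pls = flat_to_pls(sample)
    print(pls)
    print(pls_to_flat(pls))
```

Emitting one flat line per `phoneme` element means a lexeme with several pronunciations expands back into several flat entries. Any extra information held only in the XML, such as additional attributes or elements, is lost on the way back to the flat form.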

This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some steps and illustrations may have changed.

