Getting started with torchtext and PyText

This release of WML CE includes Technology Previews of torchtext and PyText.

Getting started with torchtext

Torchtext is a companion package to PyTorch consisting of data processing utilities and popular datasets for natural language.

WML CE support for torchtext is included as a separate package.

Note: PyTorch is installed as a requisite to torchtext.

Install torchtext

Follow these steps to install torchtext.

  1. Create a virtual conda environment with python=3.6
    conda create -y -n my-py3-env python=3.6
    ...
    
  2. Activate the environment
    source activate my-py3-env
    
    (my-py3-env)$
    ...
    
  3. Install torchtext into the virtual environment
    (my-py3-env)$ conda install torchtext
    ...
    

Validate the torchtext installation

To verify the installation, run the quick test suite with the following command.

(my-py3-env) $ torchtext-test

If you prefer, you can run a more extensive test suite by adding --runslow to the torchtext-test command. The extended tests require approximately 5 GB of free disk space and the installation of additional support packages.

  1. Install the following optional token parsing packages:
    nltk
    (my-py3-env) $ conda install nltk
    
    revtok
    (my-py3-env) $ pip install revtok
    
    sacremoses
    (my-py3-env) $ pip install sacremoses
    
    spacy
    (my-py3-env) $ conda install spacy
    
  2. Install nltk and spacy English language support
    (my-py3-env)$ python -m spacy download en
    (my-py3-env)$ python -m nltk.downloader perluniprops nonbreaking_prefixes
    
  3. Execute the extended tests
    (my-py3-env)$ torchtext-test --runslow
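
Before starting the extended tests, it can help to confirm that the optional tokenizer packages are importable in the active environment. The following is a minimal sketch, not part of WML CE or torchtext: the helper name find_missing is hypothetical, and the check uses only the Python standard library. It assumes each package is imported under the same name used to install it.

```python
import importlib.util

# Import names of the optional tokenizer packages listed in the steps above.
OPTIONAL_PACKAGES = ["nltk", "revtok", "sacremoses", "spacy"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(OPTIONAL_PACKAGES)
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All optional tokenizer packages are importable.")
```

This check only confirms that the packages can be found. It does not verify that the spaCy or nltk language data has been downloaded.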
    

Torchtext examples

Example usage patterns can be found in the torchtext documentation:

https://torchtext.readthedocs.io/en/latest/examples.html

In addition to these code samples, the PyTorch team has provided the PyTorch/torchtext SNLI example to help describe how to use the torchtext package. The example code illustrates how to download the SNLI data set and preprocess the data before feeding it to a model. The example is included in the PyTorch package.

To view an online version of the source code for this example see:

https://github.com/pytorch/examples/tree/master/snli

Running the PyTorch/torchtext SNLI example:

Running the example code requires the installation of the PyTorch samples and examples as well as the spaCy package. For more information, see https://spacy.io/.

  1. Install the example code using the pytorch-install-samples tool (note that the tool name starts with pytorch, not torchtext):
    (my-py3-env) $ pytorch-install-samples ~/pytorch-samples
    
  2. Install spaCy into the virtual environment
    (my-py3-env)$ conda install spacy
    
  3. Install the spaCy English language model
    (my-py3-env)$ python -m spacy download en
    
  4. Run the example

    For a simple execution:

    (my-py3-env) $ cd ~/pytorch-samples
    (my-py3-env) $ python examples/snli/train.py --epochs 1

    To see all available options for the example:

    (my-py3-env) $ python examples/snli/train.py --help

More information about torchtext

Project documentation for torchtext: https://torchtext.readthedocs.io/en/latest/index.html

Source code for the torchtext project: https://github.com/pytorch/text

Community resources

The PyTorch Sentiment Analysis GitHub repository contains several tutorials designed to illustrate how to:

  • Create train/test and validation splits
  • Build a vocabulary
  • Create data iterators
  • Define a model and implement the train/evaluate/test loop
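
To make the first three items concrete, here is a library-free sketch of the underlying ideas. It does not use the torchtext API, and it is not taken from the tutorials. The function names (split_dataset, build_vocab, batches) and the tiny dataset are hypothetical and exist only for illustration.

```python
import random
from collections import Counter


def split_dataset(examples, train_frac, valid_frac, seed=0):
    """Shuffle a copy of `examples` and cut it into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def build_vocab(examples, min_freq=1, specials=("<unk>", "<pad>")):
    """Map each token seen at least `min_freq` times to an integer index."""
    counts = Counter(tok for text, _label in examples for tok in text.split())
    itos = list(specials) + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def batches(examples, vocab, batch_size):
    """Yield (padded token-id lists, labels) for consecutive slices of `examples`."""
    unk, pad = vocab["<unk>"], vocab["<pad>"]
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        ids = [[vocab.get(tok, unk) for tok in text.split()] for text, _ in chunk]
        width = max(len(seq) for seq in ids)
        padded = [seq + [pad] * (width - len(seq)) for seq in ids]
        yield padded, [label for _, label in chunk]


# Tiny made-up dataset of (text, label) pairs, for illustration only.
data = [
    ("a great film", "pos"),
    ("a dull film", "neg"),
    ("great acting throughout", "pos"),
    ("dull and slow", "neg"),
    ("simply great", "pos"),
    ("slow and dull plot", "neg"),
    ("a great cast", "pos"),
    ("slow start", "neg"),
]

train, valid, test = split_dataset(data, train_frac=0.5, valid_frac=0.25)
vocab = build_vocab(train)  # the vocabulary is built from the training split only
for padded, labels in batches(valid, vocab, batch_size=2):
    print(padded, labels)
```

Building the vocabulary from the training split alone means that unseen tokens in the validation and test splits map to the `<unk>` index, which mirrors how a trained model encounters new text.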

Getting started with PyText

PyText is a deep-learning based NLP modeling framework built on PyTorch and torchtext.

WML CE support for PyText is included as a separate package and can be installed and set up as shown below.

Note:
  • PyTorch and torchtext are installed as requisites to PyText.
  • PyText supports Python v3.6 only.

Install PyText

Follow these steps to install PyText.

  1. Create a virtual conda environment with python=3.6
    conda create -y -n my-py3-env python=3.6
    ...
    
  2. Activate the environment
    source activate my-py3-env
    
    (my-py3-env)$
    ...
    
  3. Install PyText into the virtual environment
    (my-py3-env)$ conda install pytext-nlp
    ...
    

Validate the PyText installation

To validate the installation, run the PyText self tests.

(my-py3-env) $ pytext-test

PyText examples

To use the examples provided, follow these steps:

  1. Install the examples code using the pytext-install-samples tool:
    (my-py3-env) $ pytext-install-samples ~/pytext-samples
  2. Run the example:
    • Train your first model:
      (my-py3-env) $ cd ~/pytext-samples
      (my-py3-env) $ pytext train < demo/configs/docnn.json
    • Evaluate the model:
      (my-py3-env) $ pytext test < demo/configs/docnn.json
    • Export the model:
      (my-py3-env) $ pytext export --output-path exported_model.c2 < demo/configs/docnn.json
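
Each of the commands above reads a JSON config file. If you edit a config, a quick check that it still parses can save a failed run. The following is a minimal sketch that uses only the Python standard library. The helper name top_level_keys is hypothetical, and the sketch assumes only that the config is a JSON object at the top level. It makes no assumption about which keys PyText expects.

```python
import json


def top_level_keys(config_path):
    """Parse a JSON config file and return its top-level keys, sorted."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        # Assumption of this sketch: the config is a JSON object.
        raise ValueError("expected a JSON object at the top level")
    return sorted(config)


# Example call, using the config path from the steps above:
#   print(top_level_keys("demo/configs/docnn.json"))
```

A malformed file raises json.JSONDecodeError, whose message includes the line and column of the first syntax problem.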
      

Details on executing advanced models with PyText are available in the PyText documentation: https://pytext.readthedocs.io/en/master/atis_tutorial.html

More information about PyText

Project documentation for PyText: https://pytext.readthedocs.io/en/master/

Source code for the PyText project: https://github.com/facebookresearch/pytext

Note about locales

Some of the examples and features of torchtext and PyText may require that the locale be set appropriately. If the locale is unset, you might see various errors, such as:

RuntimeError: Click will abort further execution because Python 3 was
configured to use ASCII as encoding for the environment.

or

UnicodeEncodeError: 'ascii' codec can't encode character '\x..' in
position .....: ordinal not in range(128)

You can set the locale to an appropriate value using the LANG environment variable. The value that you choose must be supported by the OS and may depend on the language and encoding of the text being processed.

You can see the installed locales using locale -a or (on RHEL) localectl list-locales. If the locale that you want is not listed by those commands, you may need to install it. On Ubuntu, you can install the locales-all package. On RHEL, you may need to reinstall glibc-common (after ensuring override_install_langs is not set in /etc/yum.conf).

Set the locale by exporting the LANG environment variable. For example, to set US English with UTF-8 encoding:

export LANG=en_US.utf8
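
To see which encoding Python picks up from the environment, you can run a short diagnostic inside the activated environment. This is a minimal sketch that uses only the standard library. The helper name is_utf8 is hypothetical, and the script only reports values. It does not change the locale.

```python
import codecs
import locale
import os
import sys


def is_utf8(encoding_name):
    """Return True if `encoding_name` is an alias of UTF-8 (e.g. 'utf8', 'UTF-8')."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except (LookupError, TypeError):
        return False


# Report what Python sees; none of these calls change the process locale.
preferred = locale.getpreferredencoding(False)
print("LANG               :", os.environ.get("LANG", "<unset>"))
print("preferred encoding :", preferred)
print("stdout encoding    :", sys.stdout.encoding)

if not is_utf8(preferred):
    print("Preferred encoding is not UTF-8; consider exporting LANG as described above.")
```

Run the script before and after exporting LANG to confirm that the change took effect in the current shell.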