Getting started with torchtext and PyText
This release of WML CE includes Technology Previews of torchtext and PyText.
Getting started with torchtext
Torchtext is a companion package to PyTorch consisting of data processing utilities and popular datasets for natural language.
WML CE support for torchtext is included as a separate package.
Install torchtext
Follow these steps to install torchtext.
- Create a virtual conda environment with python=3.6
conda create -y -n my-py3-env python=3.6 ...
- Activate the environment
source activate my-py3-env
(my-py3-env)$ ...
- Install torchtext into the virtual environment
(my-py3-env)$ conda install torchtext ...
Validate the torchtext installation
A quick set of tests to verify the installation can be executed using the command below.
(my-py3-env) $ torchtext-test
If you prefer, you can run a more extensive test suite by adding --runslow to the torchtext-test command. Executing the extended tests will require approximately 5GB of free disk space and the installation of additional support packages.
- Install the following optional token parsing packages:
  - nltk
  (my-py3-env) $ conda install nltk
  - revtok
  (my-py3-env) $ pip install revtok
  - sacremoses
  (my-py3-env) $ pip install sacremoses
  - spacy
  (my-py3-env) $ conda install spacy
- Install nltk and spacy English language support
(my-py3-env)$ python -m spacy download en
(my-py3-env)$ python -m nltk.downloader perluniprops nonbreaking_prefixes
- Execute the extended tests
(my-py3-env)$ torchtext-test --runslow
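Before starting the extended tests, it can help to confirm that the optional packages are importable and that enough disk space is free. The following is a minimal sketch that uses only the Python standard library. The helper names missing_packages and has_free_space are hypothetical, and the 5 GB threshold is simply the approximate figure quoted above.

```python
import importlib.util
import shutil

# Optional tokenizer packages listed in the steps above.
OPTIONAL_PACKAGES = ["nltk", "revtok", "sacremoses", "spacy"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    missing = missing_packages(OPTIONAL_PACKAGES)
    enough_space = has_free_space(".", 5)  # assumed threshold: roughly 5 GB
    if missing:
        print("Missing optional packages:", ", ".join(missing))
    if not enough_space:
        print("Less than roughly 5 GB free in the current directory.")
    if not missing and enough_space:
        print("Ready to run: torchtext-test --runslow")
```

The check only confirms that the packages can be located; it does not verify that the spacy and nltk language data have been downloaded.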
Torchtext examples
Example usage patterns can be found in the torchtext documentation:
https://torchtext.readthedocs.io/en/latest/examples.html
In addition to these code samples, the PyTorch team has provided the PyTorch/torchtext SNLI example to help describe how to use the torchtext package. The example code illustrates how to download the SNLI data set and preprocess the data before feeding it to a model. The example is included in the PyTorch package.
To view an online version of the source code for this example see:
https://github.com/pytorch/examples/tree/master/snli
Running the PyTorch/torchtext SNLI example:
Running the example code requires the installation of the PyTorch samples and examples as well as the SpaCy package. For more information, see https://spacy.io/.
- Install the example code using the pytorch-install-samples tool (note pytorch, rather than torchtext):
(my-py3-env) $ pytorch-install-samples ~/pytorch-samples
- Install SpaCy into the virtual environment
(my-py3-env)$ conda install spacy
- Install the SpaCy English language model
(my-py3-env)$ python -m spacy download en
- Run the example
For a simple execution:
(my-py3-env) $ cd ~/pytorch-samples
(my-py3-env) $ python examples/snli/train.py --epochs 1
To see all available options for the example:
(my-py3-env) $ python examples/snli/train.py --help
More information about torchtext
Project documentation for torchtext: https://torchtext.readthedocs.io/en/latest/index.html
Source code for the torchtext project: https://github.com/pytorch/text
Community resources
The PyTorch Sentiment Analysis GitHub repository contains several tutorials designed to illustrate how to:
- Create train/test and validation splits
- Build a vocabulary
- Create data iterators
- Define a model and implement the train/evaluate/test loop
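To make the first three steps concrete, the following is a minimal sketch in plain Python of splitting a data set, building a vocabulary, and iterating over batches. It deliberately does not use the torchtext API: the function names, the tiny in-line data set, and the <unk> token are all assumptions chosen for illustration, and the tutorials remain the reference for how these steps are done in practice.

```python
import random
from collections import Counter


def split_examples(examples, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle a copy of the examples and split it into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def build_vocab(texts, min_freq=1, unk_token="<unk>"):
    """Map each sufficiently frequent whitespace token to an integer index."""
    counts = Counter(token for text in texts for token in text.lower().split())
    vocab = {unk_token: 0}
    for token, freq in sorted(counts.items()):  # alphabetical order keeps indices stable
        if freq >= min_freq:
            vocab[token] = len(vocab)
    return vocab


def numericalize(text, vocab, unk_token="<unk>"):
    """Convert a text into a list of vocabulary indices; unknown tokens map to unk."""
    return [vocab.get(token, vocab[unk_token]) for token in text.lower().split()]


def batches(examples, vocab, batch_size):
    """Yield (index sequences, labels) pairs, batch_size examples at a time."""
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        yield [numericalize(text, vocab) for text, _ in chunk], [label for _, label in chunk]


# Hypothetical sentiment data: (text, label) pairs.
data = [("a great film", 1), ("a dull film", 0), ("great acting", 1),
        ("dull plot", 0), ("great plot", 1)]
train, valid, test = split_examples(data)
vocab = build_vocab(text for text, _ in train)  # vocabulary from training data only
for token_ids, labels in batches(train, vocab, batch_size=2):
    print(token_ids, labels)
```

Building the vocabulary from the training split alone means that words seen only in the validation or test data map to the unknown token, which mirrors what a model would face on unseen text.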
Getting started with PyText
PyText is a deep-learning based NLP modeling framework built on PyTorch and torchtext.
WML CE support for PyText is included as a separate package and can be installed and set up as shown below.
- PyTorch and torchtext are installed as requisites to PyText.
- PyText supports Python v3.6 only.
Install PyText
Follow these steps to install PyText.
- Create a virtual conda environment with python=3.6
conda create -y -n my-py3-env python=3.6 ...
- Activate the environment
source activate my-py3-env
(my-py3-env)$ ...
- Install PyText into the virtual environment
(my-py3-env)$ conda install pytext-nlp ...
Validate the PyText installation
(my-py3-env) $ pytext-test
PyText examples
To use the examples provided, follow these steps:
- Install the examples code using the pytext-install-samples tool:
(my-py3-env) $ pytext-install-samples ~/pytext-samples
- Run the example:
  - Train your first model:
  (my-py3-env) $ cd ~/pytext-samples
  (my-py3-env) $ pytext train < demo/configs/docnn.json
  - Evaluate the model:
  (my-py3-env) $ pytext test < demo/configs/docnn.json
  - Export the model:
  (my-py3-env) $ pytext export --output-path exported_model.c2 < demo/configs/docnn.json
Details on executing advanced models with PyText are available in the PyText documentation: https://pytext.readthedocs.io/en/master/atis_tutorial.html
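The train, test, and export commands in this section read the JSON configuration from standard input. When these steps are driven from a script, the same redirection can be reproduced with the standard subprocess module. This is a minimal sketch: the helper name run_with_stdin_file is hypothetical, and the pytext invocations in the comments simply mirror the commands shown in this section.

```python
import subprocess


def run_with_stdin_file(command, stdin_path):
    """Run `command` (a list of arguments) with the file at `stdin_path` as standard input.

    Equivalent to the shell form:  command < stdin_path
    Returns the completed process; raises CalledProcessError on a non-zero exit code.
    """
    with open(stdin_path, "rb") as stdin_file:
        return subprocess.run(command, stdin=stdin_file, check=True)


# Usage, assuming the samples are installed and the current directory is ~/pytext-samples:
#   run_with_stdin_file(["pytext", "train"], "demo/configs/docnn.json")
#   run_with_stdin_file(["pytext", "test"], "demo/configs/docnn.json")
```

Passing check=True makes a failed training or evaluation step stop the script immediately instead of continuing to the next step.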
More information about PyText
Project documentation for PyText: https://pytext.readthedocs.io/en/master/
Source code for the PyText project: https://github.com/facebookresearch/pytext
Note about locales
Some of the examples and features of torchtext and PyText may require that the locale be set appropriately. If the locale is unset, you might see various errors, such as:
RuntimeError: Click will abort further execution because Python 3 was
configured to use ASCII as encoding for the environment.
or
UnicodeEncodeError: 'ascii' codec can't encode character '\x..' in
position .....: ordinal not in range(128)
You can set the locale to an appropriate value using the LANG environment variable. The value that you choose must be supported by the OS and may depend on the language and encoding of the text being processed.
You can see the installed locales using locale -a or (on RHEL) localectl list-locales. If the locale that you want is not listed by those commands, you may need to install it. On Ubuntu, you can install the locales-all package. On RHEL, you may need to reinstall glibc-common (after ensuring override_install_langs is not set in /etc/yum.conf).
Set the locale by exporting the LANG environment variable. For example, to set US English with UTF-8 encoding:
export LANG=en_US.utf8
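To check from inside Python which encoding the current locale selects, a short script along the following lines may help. It is a sketch rather than a definitive diagnostic: the helper names is_utf8 and can_encode are hypothetical, and it relies only on the standard locale and codecs modules.

```python
import codecs
import locale


def is_utf8(encoding_name):
    """Return True if the given encoding name is an alias of UTF-8."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False  # unknown codec name


def can_encode(text, encoding_name):
    """Return True if `text` can be encoded with the given codec."""
    try:
        text.encode(encoding_name)
        return True
    except UnicodeEncodeError:
        return False


preferred = locale.getpreferredencoding()
print("Preferred encoding:", preferred)
if not is_utf8(preferred):
    print("Consider setting LANG to a UTF-8 locale, for example: export LANG=en_US.utf8")
```

The can_encode helper illustrates the second error shown in this section: text containing a non-ASCII character such as "é" cannot be encoded with the ascii codec, but encodes without error as UTF-8.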