Installing Python

You must install Python 3.5.2 on the BigData master node. You must also install the required Python modules with pip (the Python package installer) and install the pgmpy Bayesian network library.

Procedure

  1. Download Python from https://www.python.org/downloads/release/python-352/.
  2. Extract Python-3.5.2.tar.xz.

    For example,

    tar -xJf Python-3.5.2.tar.xz
  3. Go to the extracted folder, and run the following commands:
    ./configure
    make
    make altinstall

    Do not run make install, which makes Python 3.5.2 the default Python. Because Spark has some dependencies on Python 2.7.2, use make altinstall to install Python 3.5.2 alongside the existing version.

  4. Verify the Python version:
    /usr/local/bin/python3.5 -V

    The result should display Python 3.5.2.

  5. Use the following commands to install the Python modules:
    /usr/local/bin/pip3.5 install numpy  
    /usr/local/bin/pip3.5 install pandas  
    /usr/local/bin/pip3.5 install scipy  
    /usr/local/bin/pip3.5 install pyparsing  
    /usr/local/bin/pip3.5 install flask  
    /usr/local/bin/pip3.5 install wrapt  
    /usr/local/bin/pip3.5 install flask_cors
    /usr/local/bin/pip3.5 install keras
    /usr/local/bin/pip3.5 install scikit-learn
    /usr/local/bin/pip3.5 install flask_restful
    /usr/local/bin/pip3.5 install spacy
    /usr/local/bin/pip3.5 install configparser
    /usr/local/bin/pip3.5 install email_reply_parser
    /usr/local/bin/pip3.5 install textacy
    /usr/local/bin/pip3.5 install tensorflow

    The pickle and ssl modules are part of the Python standard library and do not need to be installed with pip. The sklearn module is provided by the scikit-learn package.
    
  6. Install the pgmpy Bayesian network library:
    Note: Do not use pip to install pgmpy.
    1. Download the source (zip file) from https://github.com/pgmpy/pgmpy, or clone the pgmpy repository and check out the dev branch:
      git clone https://github.com/pgmpy/pgmpy
      cd pgmpy
      git checkout dev
    2. If you downloaded the source, decompress the file.
    3. From the pgmpy source directory, run the setup.py install command.

      For example,

      /usr/local/bin/python3.5 setup.py install
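
After the procedure is complete, you might want to confirm that the new interpreter can locate every module. The following script is a minimal sketch of such a check and is not part of the product. The file name check_modules.py, the missing_modules function, and the REQUIRED_MODULES list are assumptions made for illustration. The list uses import names, which can differ from the names that are passed to pip. The script avoids f-strings so that it runs on Python 3.5.

```python
import importlib.util
import sys

# Import names for the modules from steps 5 and 6 (illustrative list).
# pickle and ssl ship with Python itself; sklearn is the import name
# of the scikit-learn package.
REQUIRED_MODULES = [
    "numpy", "pandas", "scipy", "pyparsing", "flask", "wrapt",
    "flask_cors", "keras", "sklearn", "pickle", "flask_restful",
    "ssl", "spacy", "configparser", "email_reply_parser",
    "textacy", "tensorflow", "pgmpy",
]


def missing_modules(names):
    """Return the names that the running interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            # find_spec locates a module without importing it.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


def main():
    # Report which interpreter is running, then list any gaps.
    print("Interpreter: {} ({})".format(sys.version.split()[0], sys.executable))
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")


if __name__ == "__main__":
    main()
```

To check the interpreter that was installed in step 3, save the script under the assumed name and run /usr/local/bin/python3.5 check_modules.py. The check only confirms that each module can be located. It does not import the modules or verify their versions.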