## APM-PM-LIB 

## Change log

- `9.0`
  -  CP4D 4.8 with Python 3.10 is the base platform
  -  DQLearn: Some notebooks deprecated and datasets removed. Python packages have been updated to be compliant with CP4D 4.8
  -  SROM: Enhancements, bug fixes, and version updates. Python packages have been updated to be compliant with CP4D 4.8
  -  Explainability: Python packages have been updated to be compliant with CP4D 4.8
  -  Watson Machine Learning: Driver v4.8 is supported

- `8.9.0`
  -  DQLearn: Some notebooks deprecated. AutoImputation notebooks combined into one
  -  SROM: Enhancements, bug fixes, and version updates
  -  Notebooks: Simplified notebook structure, better exception handling, and more configuration options. Config Driven Anomaly Detection is a new notebook that provides an easy black-box approach to anomaly detection. The Anomaly Detection notebook also combines the WS and PMI notebooks into one, providing a continuous way to train, evaluate, and deploy models in WML as well as Monitor runtimes from within the same notebook.
  -  mat-service: Updated packages and bug fixes.
  -  UI: Anomaly detection supports high-frequency scoring at minute and hour intervals.

- `8.8.0`
  -  SROM: Version updates, bug fixes, better metric conversions for base estimators
  -  Explainability: Additional explainers to support counterfactuals and partial dependence plots (PDP)
  -  Notebooks: Support for onboarding SPSS streams
  -  UI: Feature importance for Anomaly detection

- `8.7.0`
  -  DQLearn
  -  SROM
  -  Explainability
  -  mat-service

- `8.6.0`
  -  DQLearn
  -  SROM
  -  Explainability
  -  mat-service

- `8.5.0`
  -  DQLearn
  -  SROM

- `8.4.0`

- `8.3.0`



## Project Structure

* `as-container`: Docker build files for building images from the AS base image.
* `docs`: generated pydoc, also published [here](https://pages.github.ibm.com/asset-performance/APM-PM-LIB/pmlib/).
* `pdoc_template`: template files for generating pydoc (we use [pdoc3](https://pdoc3.github.io/pdoc/)).
* `pmlib`: the Python source code, including tests.
* `srom`: SROM is bundled with the library as a module, so there is no need to install SROM separately. This ensures we always have the "right" version of SROM, and it also lets us tweak SROM ourselves when necessary.
* `vendor`: third-party Python modules placed here are available (and override the installed versions) when running in Docker.

Some notes on `vendor`: a few other modules are currently checked into git under `vendor` but are not packaged with `pmlib`, because they are only used to run `pytest` easily in the Docker environment. Please DO NOT check new modules into git under `vendor` unless there is no other way. We should rely on the standard Python way, that is, `setup.py`, to manage our dependencies.

Also, for local development, you can temporarily put any library or module under `vendor` for Docker containers. All the shell scripts for running `pmlib` in Docker containers are set up so that modules under `vendor` override system-level locations. For example, to test a small change to `iotfunctions`, put a local copy of it under `vendor`, make the change there, and run with the provided shell scripts; that copy overrides whatever version is pre-installed in the Docker images.
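The override works because Python searches `sys.path` in order and imports the first match. The sketch below assumes the provided shell scripts put `vendor` ahead of the installed packages on the module search path (for example via `PYTHONPATH`); the module name `fakelib`, the helper `import_preferring`, and the temporary directories are hypothetical stand-ins, not part of `pmlib`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_preferring(vendor_dir, module_name):
    """Import module_name, preferring the copy found under vendor_dir."""
    vendor_dir = str(vendor_dir)
    # Earlier sys.path entries win, which is what putting vendor/ first achieves.
    if vendor_dir in sys.path:
        sys.path.remove(vendor_dir)
    sys.path.insert(0, vendor_dir)
    # Forget any previously imported copy so the vendored one is loaded instead.
    sys.modules.pop(module_name, None)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as site, tempfile.TemporaryDirectory() as vendor:
    # Stand-in for the version pre-installed in the Docker image.
    Path(site, "fakelib.py").write_text('SOURCE = "installed"\n')
    # Stand-in for a locally patched copy placed under vendor/.
    Path(vendor, "fakelib.py").write_text('SOURCE = "vendor"\n')

    sys.path.append(site)
    before = importlib.import_module("fakelib").SOURCE
    after = import_preferring(vendor, "fakelib").SOURCE

print(before, after)  # installed vendor
```

In the Docker setup the reordering happens once, before anything is imported, so the `sys.modules` reset in the sketch is only needed because the demo imports the "installed" copy first.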


## Development Practice and Conventions

`pmlib` follows the [numpy docstring guide](https://numpydoc.readthedocs.io/en/latest/format.html).
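For reference, a function documented in that style looks roughly like this. The `scale` helper is hypothetical and only illustrates the section layout (`Parameters`, `Returns`, `Raises`, `Examples`); it is not part of `pmlib`.

```python
def scale(values, factor=1.0):
    """Multiply every element of a sequence by a constant factor.

    Parameters
    ----------
    values : list of float
        Input values to scale.
    factor : float, optional
        Multiplier applied to each value. Default is 1.0.

    Returns
    -------
    list of float
        A new list with each input value multiplied by `factor`.

    Raises
    ------
    TypeError
        If `factor` is not a number.

    Examples
    --------
    >>> scale([1.0, 2.0], factor=3.0)
    [3.0, 6.0]
    """
    if not isinstance(factor, (int, float)):
        raise TypeError("factor must be a number")
    return [value * factor for value in values]
```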


## Development and Testing

You need a Python 3 environment. We recommend `pyenv` and `virtualenv`, which are probably the easiest way to manage Python runtimes. The example below uses a Mac (Windows differs here and there; you are welcome to add Windows instructions here):

```
# install homebrew
> /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# install pyenv for managing python runtimes
> brew install pyenv

# use pyenv to install Python 3 (3.7.4 is just an example; pick the version matching the target platform, e.g. 3.10 for CP4D 4.8)
> pyenv install 3.7.4
> pyenv global 3.7.4
> echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bash_profile

# install virtualenv
> sudo -H pip install virtualenv
> sudo -H pip install virtualenvwrapper

# configure virtualenv
> mkdir ~/.virtualenvs

# add the following lines to ~/.bashrc OR ~/.bash_profile
# if virtualenvwrapper.sh is not under /usr/local/bin, find it under ~/.pyenv/versions/3.7.4/bin/ and copy it there
export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh

# use virtualenv to create a new env for pmlib
# providing a project path linked to the env makes activating the env automatically change to that directory
# Example: mkvirtualenv -a /Users/<your_user>/git/APM-PM-LIB pmlib2
# Syntax: mkvirtualenv [-a project_path] [-i package] [-r requirements_file] [virtualenv options] ENVNAME
> mkvirtualenv -a <absolute path to your project root> pmlib

# install project dependencies, including test (quoted so shells such as zsh don't expand the brackets)
> pip install -e ".[test]"
```

The last command installs `pmlib` in editable mode, together with all its dependencies, into the virtualenv we just created. This way, modifications in the project directory take effect in the virtualenv automatically. The trailing `[test]` also installs the test dependencies in addition to the base ones, which mainly adds `pytest` and related add-ons.

With a Python 3 environment ready, you can start development. There are two ways to do `pmlib` development: unit testing in local mode, and remote integration with a specific PMI environment.

Note: on Mac, importing the installed module `ibm_db` may fail with an error like the following:

```
import ibm_db
ImportError: dlopen(/Users/bryan/.virtualenvs/pmlib/lib/python3.7/site-packages/ibm_db.cpython-37m-darwin.so, 2): Library not loaded: libdb2.dylib
  Referenced from: /Users/bryan/.virtualenvs/pmlib/lib/python3.7/site-packages/ibm_db.cpython-37m-darwin.so
  Reason: image not found
```

I'm using `ibm-db` version 3.0.1 on Mac. There is an easy fix for this (replace both absolute paths with the actual ones on your Mac):

```
> install_name_tool -change libdb2.dylib /Users/bryan/.virtualenvs/pmlib/lib/python3.7/site-packages/clidriver/lib/libdb2.dylib /Users/bryan/.virtualenvs/pmlib/lib/python3.7/site-packages/ibm_db.cpython-37m-darwin.so
```

Essentially, the reference from `ibm_db.cpython-37m-darwin.so` to `libdb2.dylib` is incorrect, and the fix above simply replaces it with the correct absolute path.
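Instead of typing both absolute paths by hand, you can derive them from the active environment. This is a sketch that assumes the layout shown in the error above (the compiled `ibm_db*.so` extension and the bundled `clidriver/` directory sit directly in `site-packages`); `ibm_db_fix_command` is a hypothetical helper that only builds the command and does not run it.

```python
import glob
import os


def ibm_db_fix_command(site_packages):
    """Build the install_name_tool call that repoints ibm_db at libdb2.dylib.

    Assumes the compiled ibm_db extension and the bundled clidriver/
    directory sit directly under site_packages.
    """
    extensions = sorted(glob.glob(os.path.join(site_packages, "ibm_db*.so")))
    if not extensions:
        raise FileNotFoundError(f"no ibm_db*.so found under {site_packages}")
    dylib = os.path.join(site_packages, "clidriver", "lib", "libdb2.dylib")
    # Replace the bare "libdb2.dylib" reference with the absolute path.
    return ["install_name_tool", "-change", "libdb2.dylib", dylib, extensions[0]]
```

For example, inside the activated virtualenv, `print(shlex.join(ibm_db_fix_command(sysconfig.get_paths()["purelib"])))` prints the command for that environment, which you can review before running it.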


### Unit Testing in Local Mode

For small-scope changes, this is the preferred way. Keep in mind that all the unit tests also run in our CI flow.

The simplest way to run all the unit tests in local mode is:

```
LOCAL_MODE=True pytest <path_to_local_project>/pmlib
```

Local mode has one constraint: no network connections may be made, which means no API calls can be made. However, local mode does provide a local SQLite database capable of handling all data lake operations. In local mode you can therefore still use this database as the data source for asset and IoT data, which lets you build a model pipeline that reads data from it and saves generated data to it. The local SQLite database is transparent to your code.

To use local mode, set the environment variable `LOCAL_MODE` to the string `True` (or programmatically call the function `api.set_local_mode()` in your code). `pmlib` automatically switches to local mode based on the environment variable's value.
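The idea behind local mode can be sketched with the standard library alone: an environment check decides whether to use an in-process SQLite database instead of the remote data lake. This is an illustration under assumptions, not `pmlib`'s implementation; `local_mode_enabled` and `open_data_store` are hypothetical helpers, `pmlib`'s own parsing of `LOCAL_MODE` may differ, and in `pmlib` the store is transparent to your code rather than opened explicitly.

```python
import os
import sqlite3


def local_mode_enabled(environ=None):
    """Return True when LOCAL_MODE holds the string "True" (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return environ.get("LOCAL_MODE") == "True"


def open_data_store(environ=None):
    """Return a SQLite connection standing in for the data lake in local mode."""
    if not local_mode_enabled(environ):
        # Remote mode would talk to a PMI tenant; out of scope for this sketch.
        raise RuntimeError("set LOCAL_MODE=True to use the local SQLite store")
    # In-memory database: no network access is needed.
    return sqlite3.connect(":memory:")


# A pipeline step reading IoT data from, and writing results to, the local store.
conn = open_data_store({"LOCAL_MODE": "True"})
conn.execute("CREATE TABLE iot (asset_id TEXT, temperature REAL)")
conn.executemany("INSERT INTO iot VALUES (?, ?)", [("pump-1", 70.0), ("pump-1", 74.0)])
conn.execute(
    "CREATE TABLE results AS "
    "SELECT asset_id, AVG(temperature) AS avg_temp FROM iot GROUP BY asset_id"
)
print(conn.execute("SELECT * FROM results").fetchall())  # [('pump-1', 72.0)]
```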

See `pmlib/tests/test_data_preparation.py` and `pmlib/tests/test_model_training.py` for examples of how to write unit tests.

### Environments

For larger-scope testing, you need a remote PMI environment. The minimum information needed to point to a specific PMI tenant is:

1. APM_ID
2. APM_API_BASEURL
3. APM_API_KEY

By setting these three environment variables properly, you can connect to and use any PMI tenant for development.
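A missing variable usually surfaces later as a confusing connection error, so a script can check all three up front. This is a minimal fail-fast sketch; `read_pmi_settings` is a hypothetical helper, not a `pmlib` function, and the values in the usage line are placeholders.

```python
import os

REQUIRED_VARS = ("APM_ID", "APM_API_BASEURL", "APM_API_KEY")


def read_pmi_settings(environ=None):
    """Collect the three PMI tenant settings, failing fast if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing PMI environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Placeholder values; in a process launched by a pmi-*.sh script, call read_pmi_settings().
settings = read_pmi_settings(
    {"APM_ID": "my-tenant", "APM_API_BASEURL": "https://example.invalid", "APM_API_KEY": "secret"}
)
```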

Each environment requires its own test shell script to inject the proper environment variables. Several of the most frequently used test environments already have corresponding shell scripts (`pmi-*.sh`) in this project, and you can of course create your own.

### Preparing Docker Image for Development

When using a remote PMI environment for development, a local Python 3 environment works perfectly well. Sometimes, however, you may need a Docker image, for example to make sure the code really works when deployed to the AS runtime (which uses Docker containers). Another case is when required dependencies are hard to install (on Windows, for example).

Using Docker images for development is not difficult at all. See the README in the `as-container` directory for preparing the Docker image locally. There are also many shell scripts, `pmi-*.sh` and `test-*.sh`, that make it easy to launch Docker containers.

### Tests

To make things easier, tests are run using Docker images.

All tests are under `pmlib/tests`. Note that there are currently two sets of tests:

1. Unit tests: test files named like `test_*.py` or `*_test.py` (note the underscore).
2. Integration tests: test files named like `test-*.py` (note the dash). These are legacy tests that will soon be migrated and removed. Please do not create new tests of this kind.
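The two naming patterns can be told apart mechanically. This sketch (the helper `classify_test_file` is hypothetical) uses the same `test_*.py` / `*_test.py` patterns as pytest's default discovery, so anything classified as `"unit"` is what `pytest` would collect; case-sensitive matching is used so the result does not depend on the operating system.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath


def classify_test_file(path):
    """Return "unit", "integration", or None based on the file-naming convention."""
    name = PurePath(path).name
    # pytest's default python_files patterns: collected as unit tests.
    if fnmatchcase(name, "test_*.py") or fnmatchcase(name, "*_test.py"):
        return "unit"
    # Legacy scripts run manually through test-*.sh.
    if fnmatchcase(name, "test-*.py"):
        return "integration"
    return None


print(classify_test_file("pmlib/tests/test_data_preparation.py"))  # unit
print(classify_test_file("pmlib/tests/test-stg-lob-failure-probability.py"))  # integration
```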

For unit tests, we use [pytest](https://docs.pytest.org/en/latest/contents.html#) to run the whole unit test suite. There is a shell script to make it even easier:

```
./pytest.sh pmlib
```

This command runs all the unit tests found under the `pmlib` directory following the [Conventions for Python test discovery](https://docs.pytest.org/en/latest/goodpractices.html#test-discovery), that is, files named `test_*.py` or `*_test.py`. Optionally, pass specific test files as arguments to this command to run only those tests.

Unit tests are expected to run locally, independent of other systems, which makes running them as convenient and simple as possible. All unit tests should enable "local mode", which can be done by calling the function `api.set_local_mode()` at the beginning of your tests. In local mode, a local SQLite database handles all data lake operations, so no connection to a remote data lake is required. Database operations are transparent, so you don't need to worry about them. Other network connections, such as calls to Maximo or PMI APIs, are not possible, so you need to use the asset cache facility to load asset data. See the existing model training unit tests as examples.

In short, we expect unit tests to be run in an isolated environment, one without access to a PMI environment.

For the second set, integration tests, use the shell scripts `test-*.sh` with a list of test files as arguments. These tests run as normal Python scripts, from start to end, without any special conventions such as test function names. They exist for historical reasons and are kept only for manual verification. There is no automated run of these integration tests.

### Using the pull request method for delivering changes

We use pull requests to deliver changes. Pull requests provide a review workflow: each developer works in a new "feature" branch and delivers changes to that branch instead of directly to the `master` branch. The developer then creates a pull request from that branch to `master`. When a pull request is created, the CI server (Travis in this case) builds and tests the code against the `master` branch without merging it. If failures occur, the developer is notified that the request is not valid and must make adjustments until all tests pass.

During this process, the project maintainer can do a line-by-line code review of the request using GitHub's online tools. The reviewer can annotate lines of code with comments and hold the pull request until changes are made. Once everyone, including the CI server, is satisfied with the request, the project maintainer can merge the pull request into the `master` branch. At that point, the branch should be closed. The developer must then pull all `master` changes into their dev environment and create a new branch for their next unit of work. Pull requests help ensure code quality because every change is reviewed and no one delivers changes directly to `master`.

We have a `master` branch that holds the current stable code. Developers never check code into the `master` branch directly. As a quick start, the development process is as follows:

1. Pull latest `master`: `git checkout master && git pull`
2. Create a new branch: `git checkout -b "feature_branch_name"`
3. Do your changes
4. Run and pass all unit tests: `./pytest.sh`
5. Commit your code: `git commit -m "<commit message>"`
6. Push your changes: `git push --set-upstream origin "feature_branch_name"`
7. Create a Pull Request on GitHub, from "feature_branch_name" to `master`
8. Fix issues (if any) that the reviewer identifies
9. The reviewer approves the change and merges it into `master`
10. Cleanup the feature branch: `git branch -d "feature_branch_name" && git push --delete origin "feature_branch_name"`

> Check out [creating pull requests](https://help.github.com/en/articles/creating-a-pull-request) for more general information on the pull request methodology.


### Building Distributables

To generate a pip-installable zip file for distribution (see also the `deploy_cp4d.sh` script):

```
python setup.py sdist --formats=zip
```

The other artifact that needs to be "built" is the pydoc. To generate it, first install [pdoc3](https://pdoc3.github.io/pdoc/) locally:

```
pip install -U pdoc3
```

Then run the build script:

```
./build_doc.sh
```

The generated pydoc HTML files are put under `docs` and they should be checked into git for source control.



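### How to test one model using staging data with your local pmlib code
```
# activate the pmlib virtualenv
workon pmlib

# run one integration test against staging data, capturing the output in test.log
./test-stg.sh pmlib/tests/test-stg-lob-failure-probability.py > test.log
```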

## Branches for different releases
The `master` branch is for the MAS release.

The `master4saas` branch is for the SaaS release.

## Cython Testing
To create the wheel files for srom, follow the instructions in the README of our fork of the srom repo: https://github.ibm.com/asset-performance/srom/tree/release_1.2.x

Delete the previous versions of the wheel files in the `dslib/` folder and add the new wheel files in their place.

Update the following files to reflect the new wheel file name:

- `setup.py`
- `setup.py.cp4d`
- `build_doc.sh`

(For example, change `pip3 install dslib/srom-1.2.10.1.4-cp37-cp37m-linux_x86_64.whl` to the correct version of srom.)

The current convention is `<srom_version>.1.x` for pure `.so` wheel files and `<srom_version>.2.x` for the debug version of the wheel file (which contains both Python and `.so` files).
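To double-check which flavor a wheel file is before updating those files, its name can be split according to that convention. This sketch assumes the standard wheel name layout `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl` (ignoring the optional build tag) and assumes the trailing `x` is a build number; `parse_srom_wheel` and `SromWheel` are hypothetical names.

```python
import re
from typing import NamedTuple

# Wheel names: {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
# (this sketch ignores the optional build tag).
WHEEL_NAME = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-(?P<python>[^-]+)"
    r"-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)

# Digit after the srom version, per the <srom_version>.1.x / .2.x convention.
FLAVORS = {"1": "pure-so", "2": "debug"}


class SromWheel(NamedTuple):
    srom_version: str
    flavor: str
    build: str
    python_tag: str


def parse_srom_wheel(filename):
    """Split a srom wheel file name into srom version, flavor, and build number."""
    match = WHEEL_NAME.match(filename)
    if not match or match["name"] != "srom":
        raise ValueError(f"not a srom wheel file name: {filename}")
    *base, flavor_digit, build = match["version"].split(".")
    if not base or flavor_digit not in FLAVORS:
        raise ValueError(f"version does not follow <srom_version>.1.x / .2.x: {filename}")
    return SromWheel(".".join(base), FLAVORS[flavor_digit], build, match["python"])


print(parse_srom_wheel("srom-1.2.10.1.4-cp37-cp37m-linux_x86_64.whl"))
# SromWheel(srom_version='1.2.10', flavor='pure-so', build='4', python_tag='cp37')
```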



## Important

All changes in the `dev` branch need to be merged into the `dev82` branch.


## How to use the pmlib zip file

In a notebook, run the following cells (the `!` prefix runs a shell command from the notebook):

```
!rm -rf /project_data/data_asset/APM-PM-LIB-oct_2021/
!unzip /project_data/data_asset/APM-PM-LIB-oct_2021.zip -d /project_data/data_asset/

!pip install /project_data/data_asset/APM-PM-LIB-may_2023
```
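If you prefer to stay in Python rather than use shell escapes, the same three steps (remove the old copy, unzip, pip install) can be sketched with the standard library. `install_pmlib_zip` is a hypothetical helper; it assumes the archive contains a single top-level directory named `package_dir_name`, as the paths above imply.

```python
import shutil
import subprocess
import sys
import zipfile
from pathlib import Path


def install_pmlib_zip(zip_path, extract_root, package_dir_name, run_pip=True):
    """Unzip a pmlib source archive and pip-install it into the current kernel."""
    target = Path(extract_root) / package_dir_name
    # Equivalent of `rm -rf <target>`: start from a clean copy.
    shutil.rmtree(target, ignore_errors=True)
    # Equivalent of `unzip <zip_path> -d <extract_root>`.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_root)
    if run_pip:
        # Equivalent of `pip install <target>`, using the kernel's own interpreter.
        subprocess.check_call([sys.executable, "-m", "pip", "install", str(target)])
    return target
```

For example, `install_pmlib_zip("/project_data/data_asset/APM-PM-LIB-oct_2021.zip", "/project_data/data_asset", "APM-PM-LIB-oct_2021")`. Note that, unlike `unzip`, `zipfile` does not restore Unix file permissions, so executable bits on scripts inside the archive are lost.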

