Set up Data Science Experience Local
After you install DSX Local, you can configure it in the following ways.
- Set up an SSL connection for the web client
- Configure DSX Local settings and users
- Configure DSX Local to work with the HDP or CDH cluster
- Configure DSX Local to work with Microsoft Azure VMs
- Optional configuration settings
Set up an SSL connection for the web client
If you need to enable an HTTPS connection to the DSX Local web client with your own SSL certificate and private key (both in PEM format) rather than the default, complete the following steps:
- Ensure that the SSL certificate and private key are in the same directory, and note the absolute path for later. The SSL certificate can be a bundle that contains your server, intermediate, and root certificates concatenated (in proper order) into one file. The necessary certificates should be installed as trusted certificates on the clients that connect to your DSX Local instance.
- Name the SSL certificate cert.crt, and name the private key cert.key. You can verify the information supplied in the certificate signing request by using the following command:
openssl x509 -noout -text -in ./cert.crt
- Verify that the private key and the certificate's public key match by entering the following commands to retrieve the MD5 hash of the modulus of each:
openssl x509 -noout -modulus -in ./cert.crt | openssl md5
openssl rsa -noout -modulus -in ./cert.key | openssl md5
If the two hashes match, then the private key and certificate match, and this certificate will be accepted by nginx.
- Replace the nginx default SSL certificate with your self-signed certificate by entering the following commands:
cd /wdp/utils
./change_nginx_cert.sh replace <your_chosen_directory_absolute_path>
Alternatively, in the Admin Console, click the menu icon and click Scripts. In the Script pull-down menu, select Replace an existing cert.key and cert.crt certificates (change_nginx_cert.sh) to replace the SSL certificate.
To verify that the certificate has been replaced, sign in to the DSX Local client by using the domain name that matches the certificate.
To roll back the certificate change, enter the following commands:
cd /wdp/utils
./change_nginx_cert.sh rollback
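The verification step above can be wrapped in a small helper so that the replacement only runs when the certificate and key actually match. The following is a minimal sketch, not part of DSX Local: the function names and the cert_dir variable are hypothetical, and it assumes an unencrypted RSA private key and an openssl binary on the PATH. Each modulus is read separately before hashing, so that a missing or unreadable file is reported as a failure instead of producing two identical hashes of empty input.

```shell
#!/bin/sh
# Sketch: confirm that cert.crt and cert.key belong together before
# running change_nginx_cert.sh. Function names are illustrative only.

# Print the MD5 hash of the modulus of the certificate's public key.
cert_modulus_md5() {
    modulus=$(openssl x509 -noout -modulus -in "$1" 2>/dev/null) || return 1
    printf '%s\n' "$modulus" | openssl md5
}

# Print the MD5 hash of the modulus of the RSA private key.
key_modulus_md5() {
    modulus=$(openssl rsa -noout -modulus -in "$1" 2>/dev/null) || return 1
    printf '%s\n' "$modulus" | openssl md5
}

# Succeed only when both files are readable and the two hashes are identical.
cert_matches_key() {
    cert_hash=$(cert_modulus_md5 "$1") || return 1
    key_hash=$(key_modulus_md5 "$2") || return 1
    [ -n "$cert_hash" ] && [ "$cert_hash" = "$key_hash" ]
}

# Example use (cert_dir is a placeholder for your chosen directory):
# cert_dir=<your_chosen_directory_absolute_path>
# if cert_matches_key "$cert_dir/cert.crt" "$cert_dir/cert.key"; then
#     cd /wdp/utils && ./change_nginx_cert.sh replace "$cert_dir"
# else
#     echo "Certificate and private key do not match" >&2
# fi
```

The example use is left commented out so that sourcing the file has no side effects on the cluster.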
Configure DSX Local settings and users
To configure DSX Local, complete the following steps:
- By default, you can sign in to the DSX Local client by using your high availability (HA) proxy IP address, for example, https://123.45.67.89.
Optional: In the local DNS server, you can add an entry that resolves to the HA proxy IP address. For example:
123.45.67.89 ibm-nginx-svc
As a result, you are able to sign in to the DSX Local client through the https://ibm-nginx-svc web address.
- Open the admin console by signing in to the DSX Local client from a web browser and switching to Admin Console.
Alternatively, you can sign in to the admin console directly if you append /dsx-admin to your DSX Local client URL:
https://ibm-nginx-svc/dsx-admin
- In the Admin Console, click the Menu icon and click User Management. Edit the admin user to set an email address and change the password for the primary administrator.
- You can configure a connection to your SMTP server so that DSX Local can send email to users and
admins. DSX Local sends emails to users when they are given access to DSX Local and to
administrators when a new user signs up for DSX Local, an alert is triggered, or an application
setting, such as the alert threshold, is changed.
To enable DSX Local to send email:
- From your username, select Settings.
- In the SMTP settings section, specify the following information:
- The SMTP mail server address.
- The port number of your SMTP server.
Important: If you specify a secure port, you must select Use SSL encryption. If you specify a secure port but do not select this option, DSX Local cannot communicate with your SMTP server.
- Depending on your SMTP server, you might need to specify your SMTP credentials:
- If your SMTP server doesn't have a mailer daemon, you must specify an SMTP username and password.
- If your SMTP server does have a mailer daemon, communications from DSX Local are associated with the mailer daemon account automatically. To associate communications with a specific account instead, provide the credentials for that account.
- Click Save. If your SMTP configuration is successful, you receive a confirmation email.
- Add users or set up an LDAP server. See Manage users for details.
- Switch to the IBM Data Science Experience Local client.
- Verify that the sample notebooks display successfully. Create a test project.
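The admin console address described in the steps above is the client URL with /dsx-admin appended. The following sketch shows one way to derive it in a script. The function name admin_console_url is hypothetical, and the commented curl call is only a suggested reachability check; it uses -k because the default certificate might not be trusted by the machine that runs the check.

```shell
#!/bin/sh
# Sketch: derive the direct admin console URL from the DSX Local client URL.
# The function name is illustrative only.
admin_console_url() {
    # Strip one trailing slash so the result never contains "//dsx-admin".
    printf '%s/dsx-admin\n' "${1%/}"
}

# Example use: print the HTTP status code returned by the admin console.
# curl -k -s -o /dev/null -w '%{http_code}\n' "$(admin_console_url https://ibm-nginx-svc)"
```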
Configure DSX Local to work with the HDP or CDH cluster
If your HDP or CDH cluster does not use security, then just ensure DSX Local can access it. No additional configuration is needed.
To configure DSX Local to work with a secure HDP or CDH cluster, complete the following steps:
- In the DSX Local master node, run the /wdp/utils/add_endpoint.sh script to add the certificate to securely connect to the HDP or CDH cluster. Additionally, you can run the script to set up the default Livy endpoint for the DSX Local cluster. Example:
./add_endpoint.sh --knox-url=https://9.87.654.323:8443 --addcert
./add_endpoint.sh --knox-url=https://9.87.654.323:8443 --livy-url=https://9.87.654.323:8443/gateway/dsx/livy/v1 --addcert
where the value of --livy-url represents the secure Livy endpoint that is defined in dsx.xml. As a result, the script automatically creates a default_endpoints.conf file.
Alternatively, in the Admin Console, click the menu icon and click Scripts. In the Script pull-down menu, select Set the default Livy endpoint for DSX Local (add_endpoint.sh) to perform the same tasks.
- Restart your Jupyter kernel and Zeppelin interpreter to pick up the new certificates.
- To ensure the same usernames exist in both DSX Local and HDP or CDH, set up the HDP or CDH LDAP server in DSX Local. See Manage users for details.
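When the same endpoint has to be added on several clusters, it can help to assemble the add_endpoint.sh command line in one place and review it before running it. The following is a minimal sketch: the function name add_endpoint_cmd is hypothetical, and it uses only the --knox-url, --livy-url, and --addcert options that appear in the example above. It prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print the add_endpoint.sh command for a Knox URL and an optional
# Livy URL. The function name is illustrative only.
add_endpoint_cmd() {
    knox_url=$1
    livy_url=${2:-}
    if [ -n "$livy_url" ]; then
        # Add the certificate and set the default Livy endpoint.
        printf './add_endpoint.sh --knox-url=%s --livy-url=%s --addcert\n' "$knox_url" "$livy_url"
    else
        # Add the certificate only.
        printf './add_endpoint.sh --knox-url=%s --addcert\n' "$knox_url"
    fi
}

# Example use: review the printed command, then run it from /wdp/utils.
# add_endpoint_cmd https://9.87.654.323:8443
```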
Configure DSX Local to work with Microsoft Azure VMs
For users to access the DSX Local client, you must make the private IP addresses of all three master nodes (in either the three-node or the nine-node configuration) accessible. Complete the following steps on each master node:
- In the /wdp/k8s/dsx-local-proxy/k8s/ directory, back up nginx-service.yaml to nginx-service.yaml.orig.
- Edit nginx-service.yaml and change the IP addresses to the three private IP addresses of the three master nodes (follow the same format as in the file, and ensure each IP is on a separate line). Example:
externalIPs:
10.0.0.100
10.0.0.7
10.0.0.8
10.0.0.9
- Run the command:
kubectl delete -f nginx-service.yaml.orig --namespace=ibm-private-cloud
- Run the command:
kubectl create -f nginx-service.yaml --namespace=ibm-private-cloud
- Test for an HTML response by running the command:
curl -k https://
- Order a Load Balancer within Azure, and set up the Load Balancer for HTTPS (port 443) to point to the three private IP interfaces of the three master nodes.
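Because the kubectl delete step uses the backup file and the kubectl create step uses the edited file, it is easy to re-create the service without having made the edit. The following sketch adds a guard for that case. The function name ready_to_recreate is hypothetical; it only checks that the backup exists and that the edited file differs from it, and it does not validate the contents of the file.

```shell
#!/bin/sh
# Sketch: guard the delete/create steps on one master node.
# The function name is illustrative only.

# Succeed only if both files exist and nginx-service.yaml differs from its
# backup, which suggests that the IP addresses have been edited.
ready_to_recreate() {
    dir=$1
    [ -f "$dir/nginx-service.yaml.orig" ] || return 1
    [ -f "$dir/nginx-service.yaml" ] || return 1
    ! cmp -s "$dir/nginx-service.yaml" "$dir/nginx-service.yaml.orig"
}

# Example use:
# dir=/wdp/k8s/dsx-local-proxy/k8s
# if ready_to_recreate "$dir"; then
#     kubectl delete -f "$dir/nginx-service.yaml.orig" --namespace=ibm-private-cloud
#     kubectl create -f "$dir/nginx-service.yaml" --namespace=ibm-private-cloud
# else
#     echo "Back up and edit nginx-service.yaml first" >&2
# fi
```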
Optional configuration settings
You can optionally adjust when alerts are generated, how long log files and metrics are stored, and how frequently the metrics on the dashboard are refreshed.
To configure refresh and retention settings:
- From your username, select Settings.
- In the Refresh and alert settings, adjust the appropriate settings:
- Log retention (days)
The number of days to keep logs before they are automatically deleted. The default is 10 days.
- Metrics retention (days)
The number of days to keep metrics history (such as the CPU and memory usage shown in the dashboard) before it is automatically deleted. The default is 1 day.
Remember: If you increase the retention period and increase the frequency with which the dashboard metrics are refreshed, you use more storage in the Mongo database where metrics are stored.
- Dashboard refresh (seconds)
The frequency with which the data in the admin dashboard is refreshed. The default is 10 seconds.
- Alert threshold (%)
The usage threshold at which an alert is triggered. When the usage reaches this threshold, the node color immediately changes to red. The alert is generated if the usage stays above the threshold longer than the time that is specified for the Alert time threshold setting. The default is 90%.
- Alert warning threshold (%)
The usage threshold at which a warning is triggered and the node color changes to yellow. The default is 70%.
- Alert time threshold (minutes)
The length of time that usage must stay above the alert threshold before an alert is generated. For example, if CPU usage goes above 90% for 30 seconds during a complex computation, you probably don't need to be alerted. But if CPU usage stays above 90% for 5 minutes, it might be cause for concern.
- Click Save.
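The way the two usage thresholds and the time threshold interact can be illustrated with a short model. This is a sketch of the behavior described above, not the DSX Local implementation: the function names and the "normal" label are hypothetical, usage values are whole-number percentages, and the 2-minute time threshold in the example is an arbitrary value chosen for illustration.

```shell
#!/bin/sh
# Sketch: model of the alert settings. Names are illustrative only.

# Print the node color for a usage percentage. The warning and alert
# thresholds default to the documented defaults of 70 and 90.
node_color() {
    usage=$1
    warn=${2:-70}
    alert=${3:-90}
    if [ "$usage" -ge "$alert" ]; then
        echo red
    elif [ "$usage" -ge "$warn" ]; then
        echo yellow
    else
        echo normal
    fi
}

# Succeed when usage is at or above the alert threshold and has stayed there
# for longer than the time threshold, which is given in minutes.
alert_due() {
    usage=$1
    seconds_above=$2
    alert=$3
    minutes=$4
    [ "$usage" -ge "$alert" ] && [ "$seconds_above" -gt $((minutes * 60)) ]
}

# Example use with a hypothetical 2-minute time threshold:
# node_color 95               # node turns red immediately
# alert_due 95 30 90 2        # fails: only 30 seconds above the threshold
# alert_due 95 300 90 2       # succeeds: 5 minutes above the threshold
```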