IBM Support

Troubleshooting Docker network issues: "HNS failed with error", or PA Workspace not accessible from outside the NAT

Troubleshooting


Problem

Error when trying to start PA Workspace:
Cannot start service pa-gateway: failed to create endpoint pa-gateway on network nat: HNS failed with error : Unspecified error
Execution failed with exit code 1
Or
The PA Workspace containers start correctly, but PA Workspace cannot be reached from a web browser.

Resolving The Problem

If pa-gateway fails to start, run resmon.exe, select "Listening ports", sort by port number, and verify that the PA Workspace ports (80 and 443 by default, if nothing else is specified in /config/paw.ps1) are not used by another application. If a firewall is in place, also verify that these ports are not blocked.
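The same port check can also be done from PowerShell instead of resmon.exe. The following is a minimal sketch, not an IBM-provided script: the function name Get-PawPortOwner is hypothetical, and the port list assumes the defaults of 80 and 443 mentioned above.

```powershell
# Hypothetical helper (not part of PA Workspace): lists the processes that are
# listening on the given TCP ports. Uses the standard NetTCPIP cmdlets on Windows.
function Get-PawPortOwner {
    param(
        # Assumption: default PA Workspace ports; adjust if /config/paw.ps1 overrides them.
        [int[]]$Ports = @(80, 443)
    )
    Get-NetTCPConnection -State Listen -LocalPort $Ports -ErrorAction SilentlyContinue |
        Select-Object LocalAddress, LocalPort, OwningProcess,
            @{ Name = 'ProcessName'
               Expression = { (Get-Process -Id $_.OwningProcess -ErrorAction SilentlyContinue).ProcessName } }
}
```

Running Get-PawPortOwner with PA Workspace stopped and getting no output suggests that nothing else is listening on those ports. Any row returned names the process that should be stopped or moved to another port.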
If antivirus software is installed, exclude these folders (including subfolders) and applications from scanning:
<Your_PA_Workspace_install_path>
C:\ProgramData\Microsoft\Windows\HNS
C:\ProgramData\Docker
C:\Program Files\Docker
C:\Program Files\Docker\docker.exe
C:\Program Files\Docker\dockerd.exe
C:\Program Files\Docker\docker-compose.exe
 
Open PowerShell in elevated mode (right-click, then "Run as administrator") and run these commands to delete all containers (but not the images, nor the volumes/databases that contain books and users):
cd <Your_PA_Workspace_install_path>
./scripts/paw.ps1 stop
docker stop admintool
docker rm admintool
docker rm $(docker ps -a -q)
stop-service docker
Now remove the Docker NAT network information (some backup commands are included, so that the previous configuration can be restored if necessary):
Get-NetIPInterface > NetIPInterface_backup.txt
Get-ContainerNetwork > ContainerNetwork_backup.txt
Get-NetNat > NetNat_backup.txt
Get-VMSwitch > VMSwitch_backup.txt
Get-ContainerNetwork | Remove-ContainerNetwork  -force
stop-service hns
copy 'C:\ProgramData\Microsoft\Windows\hns\hns.data' 'C:\ProgramData\Microsoft\Windows\hns\hns.data.backup'
Get-NetNat | Remove-NetNat -Confirm:$false 
Get-VMSwitch | Remove-VMSwitch  -force
If the previous commands succeeded, continue with these steps:
- From the Windows Run menu, run compmgmt.msc
- Select Device Manager
- Delete all Hyper-V Virtual Ethernet adapters and switches (delete the Hyper-V "vEthernet" adapters, but do not delete the "Ethernet" adapters)
Return to PowerShell and run this command to finish deleting the host network configuration:
del 'C:\ProgramData\Microsoft\Windows\hns\hns.data'
 
If any of these commands fails, there is a Microsoft/Docker issue that IBM cannot resolve. In that case, refer to this document:
"How to get help with your Windows container issues"
https://success.docker.com/article/where-to-get-help-with-windows
You can also apply the recommendations and commands from this link:
https://www.ibm.com/support/knowledgecenter/SSD29G_2.0.0/com.ibm.swg.ba.cognos.tm1_inst.2.0.0.doc/c_paw_trbl_hns_errors.html

Now restart the Host Network Service (HNS) and the Docker service:
start-service hns
start-service docker
A second restart of Docker is necessary to properly regenerate the Docker NAT gateway and subnet (otherwise they are set to 0.0.0.0, for an unknown reason):
stop-service docker
start-service docker
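After the second restart, it can be useful to confirm that the NAT gateway and subnet were regenerated. The following is a minimal sketch under stated assumptions: the helper name Test-NatConfigured is hypothetical, and it assumes that the 0.0.0.0 values mentioned above are visible through "docker network inspect" on the network named nat.

```powershell
# Hypothetical helper: returns $true only when both values look regenerated,
# that is, neither is empty nor starts with 0.0.0.0.
function Test-NatConfigured {
    param(
        [string]$Subnet,
        [string]$Gateway
    )
    foreach ($value in $Subnet, $Gateway) {
        if ([string]::IsNullOrWhiteSpace($value) -or $value.StartsWith('0.0.0.0')) {
            return $false
        }
    }
    return $true
}
```

A possible way to use it with the values reported by Docker:
$subnet = docker network inspect nat --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
$gateway = docker network inspect nat --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
Test-NatConfigured -Subnet $subnet -Gateway $gateway
If the result is False, restarting the Docker service once more is the step suggested above.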

Because the Docker network was deleted, check that Get-NetIPInterface still shows correct interface metric values for each IPv4 network adapter. Run this in PowerShell:
Get-NetIPInterface
This shows the interface metric of each network adapter, for example:

ifIndex InterfaceAlias                  AddressFamily NlMtu(Bytes) InterfaceMetric Dhcp     ConnectionState PolicyStore
------- --------------                  ------------- ------------ --------------- ----     --------------- -----------
6       Ethernet0                       IPv4                  1500              15 Disabled Connected       ActiveStore
8       vEthernet (HNS Internal NIC)    IPv4                  1500              15 Enabled  Connected       ActiveStore
...

Now modify the interface metric of the IPv4 "vEthernet (HNS Internal NIC)" adapter so that it is strictly higher than the interface metric of the IPv4 Ethernet adapter. In this example, the following command could be used:
Set-NetIPInterface -InterfaceIndex 8 -InterfaceMetric 20
The result is this:

ifIndex InterfaceAlias                  AddressFamily NlMtu(Bytes) InterfaceMetric Dhcp     ConnectionState PolicyStore
------- --------------                  ------------- ------------ --------------- ----     --------------- -----------
6       Ethernet0                       IPv4                  1500              15 Disabled Connected       ActiveStore
8       vEthernet (HNS Internal NIC)    IPv4                  1500              20 Enabled  Connected       ActiveStore
...
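The metric adjustment shown above can be scripted so that the interface index does not have to be read manually. This is a minimal sketch, not an IBM-provided script: the function names Get-RequiredMetric and Set-HnsInterfaceMetric are hypothetical, the adapter aliases are taken from the sample output and will differ between machines, and the margin of 5 is an arbitrary choice that reproduces the 15 to 20 change in the example.

```powershell
# Hypothetical helper: returns a metric for the virtual adapter that is strictly
# higher than the physical adapter's metric. An already-higher metric is kept.
function Get-RequiredMetric {
    param(
        [int]$PhysicalMetric,
        [int]$VirtualMetric,
        [int]$Margin = 5   # assumption: any positive margin satisfies "strictly higher"
    )
    if ($VirtualMetric -gt $PhysicalMetric) {
        return $VirtualMetric
    }
    return ($PhysicalMetric + $Margin)
}

# Hypothetical wrapper: reads both IPv4 interfaces and applies the new metric if needed.
# The default aliases are assumptions taken from the sample output above.
function Set-HnsInterfaceMetric {
    param(
        [string]$PhysicalAlias = 'Ethernet0',
        [string]$VirtualAlias  = 'vEthernet (HNS Internal NIC)'
    )
    $physical = Get-NetIPInterface -InterfaceAlias $PhysicalAlias -AddressFamily IPv4
    $virtual  = Get-NetIPInterface -InterfaceAlias $VirtualAlias -AddressFamily IPv4
    $target   = Get-RequiredMetric -PhysicalMetric $physical.InterfaceMetric -VirtualMetric $virtual.InterfaceMetric
    if ($target -ne $virtual.InterfaceMetric) {
        Set-NetIPInterface -InterfaceIndex $virtual.ifIndex -InterfaceMetric $target
    }
}
```

Calling Set-HnsInterfaceMetric from an elevated PowerShell session performs the change. Running Get-NetIPInterface afterwards confirms the new value.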

Now reboot the machine.
After the reboot, open PowerShell again with elevated administrator rights.

Now stop PA Workspace again, because Docker has automatically restarted some containers that may not have been deleted yet:
cd <Your_PA_Workspace_install_path>
./scripts/paw.ps1 stop
Verify that everything is stopped and that no temporary container remains (with a name looking like "2a45b323_bss", for example):
./scripts/paw.ps1 ps
If the "./scripts/paw.ps1 ps" command still shows some temporary containers, use "docker rm <container_name>" to delete them. If they cannot be deleted, reboot the server machine again and retry.
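When several temporary containers remain, removing them can be scripted. The following is a minimal sketch under a stated assumption: temporary container names are taken to consist of a hexadecimal prefix, an underscore, and a service name, as in the "2a45b323_bss" example. The function name Select-TemporaryContainerName and the pattern are hypothetical, so review the selected names before deleting anything.

```powershell
# Hypothetical helper: keeps only the names that match the assumed
# "<hex prefix>_<service>" pattern of temporary containers.
function Select-TemporaryContainerName {
    param(
        [string[]]$Names
    )
    $Names | Where-Object { $_ -match '^[0-9a-f]{6,}_.+' }
}
```

A possible way to use it with the names reported by Docker:
$names = docker ps -a --format '{{.Names}}'
Select-TemporaryContainerName -Names $names | ForEach-Object { docker rm $_ }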

Now try to run the admin tool again:
./scripts/admintool.ps1
If the admin tool starts correctly and its Validate button shows OK everywhere, run the following command with no 'start' option, so that the config file is read again and the containers are properly recreated:
./scripts/paw.ps1
Finally, run "./scripts/paw.ps1 ps" again to verify that everything is started and that bss-init is stopped (meaning it has finished its initialization task).
As long as bss-init is still running, the "Planning Analytics Workspace is unavailable" message is displayed when trying to connect to PA Workspace, so it may be necessary to wait a few minutes before PA Workspace is fully operational.
When the server machine is busy, there can be timing issues when starting all containers at the same time. If "./scripts/paw.ps1 ps" still shows containers that are not started, run this command to finish starting the remaining containers:
./scripts/paw.ps1 start

Document Location

Worldwide

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSCTEW","label":"IBM Planning Analytics Local"},"Component":"Planning Analytics Workspace;PAW","Platform":[{"code":"PF033","label":"Windows"}],"Version":"Windows Server 2016","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
04 November 2019

UID

ibm11101933