Troubleshooting
Problem
This can occur because the Python requests library does not support wildcards in the no_proxy environment variable.
StreamSets Platform copies the http.nonProxyHosts over to the no_proxy variable automatically when the deployment starts.
However, the Python requests library does not respect the same syntax, and wildcard characters such as * are not interpreted correctly.
This is why specifying nonProxyHosts like *.host.com or 10.0.0.* can be problematic.
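The mismatch can be illustrated with a small, simplified model of suffix and CIDR matching. This is only a sketch under stated assumptions: the function name `bypasses_proxy` is hypothetical, and the logic approximates the behavior described above rather than reproducing the requests source code.

```python
import ipaddress


def _parse(factory, text):
    """Return factory(text), or None when text is not a valid IP value."""
    try:
        return factory(text)
    except ValueError:
        return None


def bypasses_proxy(host, no_proxy):
    """Simplified model of no_proxy matching; NOT the requests source code."""
    address = _parse(ipaddress.ip_address, host)
    for entry in no_proxy.split(","):
        entry = entry.strip()
        if not entry:
            continue
        network = _parse(lambda e: ipaddress.ip_network(e, strict=False), entry)
        if network is not None:
            # IP or CIDR entry: match by network membership.
            if address is not None and address in network:
                return True
        elif host.endswith(entry):
            # Hostname entry: plain suffix comparison, so "*" is a literal character.
            return True
    return False


print(bypasses_proxy("api.host.com", "*.host.com"))  # wildcard entry
print(bypasses_proxy("api.host.com", ".host.com"))   # leading-dot entry
```

In this model a wildcard entry never matches, because `*` is compared as a literal character, while the leading-dot and CIDR forms match as expected.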
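You can disable the use of proxies in your Jython script by creating a session and setting session.trust_env to False. For example:

    import requests

    url = "http://test.com"

    # Ignore proxy settings taken from the environment (including no_proxy)
    session = requests.Session()
    session.trust_env = False

    # Create the record first so the error handler always has one to report
    record = sdc.createRecord('test')
    try:
        response = session.get(url)
        record.value = response.text
        cur_batch.add(record)
    except Exception as e:
        cur_batch.addError(record, str(e))

    # cur_batch, entityName, and offset are assumed to be defined earlier in the script
    cur_batch.process(entityName, str(offset))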
The requests library has been shown to support CIDR ranges and subdomains in the no_proxy variable, in the following format:
.host.com
10.0.0.0/8
For example: http.nonProxyHosts=.host.com, 10.0.0.0/8
Account for the different types of syntax and include them in your deployment’s http.nonProxyHosts list.
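As a rough sketch of how both syntaxes could be combined, the following helper keeps each original entry and appends a requests-style equivalent where one can be derived. The function names `to_requests_entry` and `expand_non_proxy_hosts` are hypothetical, the list is assumed to be comma-separated as in the example above, and only leading `*.` hostnames and trailing `*` octets are translated. Octet values are not validated.

```python
def to_requests_entry(pattern):
    """Translate one wildcard pattern; return it unchanged if unrecognized."""
    pattern = pattern.strip()
    if pattern.startswith("*.") and "*" not in pattern[1:]:
        return pattern[1:]  # "*.host.com" -> ".host.com"
    parts = pattern.split(".")
    fixed = [p for p in parts if p != "*"]
    wildcards = len(parts) - len(fixed)
    trailing_only = parts[:len(fixed)] == fixed
    if (wildcards and fixed and len(parts) <= 4 and trailing_only
            and all(p.isdigit() for p in fixed)):
        # Each fixed leading octet contributes 8 bits to the prefix length.
        octets = fixed + ["0"] * (4 - len(fixed))
        return "{}/{}".format(".".join(octets), 8 * len(fixed))
    return pattern


def expand_non_proxy_hosts(value):
    """Return the original entries plus requests-style equivalents."""
    result = []
    for pattern in value.split(","):
        pattern = pattern.strip()
        for entry in (pattern, to_requests_entry(pattern)):
            if entry and entry not in result:
                result.append(entry)
    return ",".join(result)


print(expand_non_proxy_hosts("*.host.com, 10.0.0.*"))
```

Note that under this mapping 10.0.0.* corresponds to 10.0.0.0/24, which is narrower than the 10.0.0.0/8 range shown in the example above. Choose the range that matches the hosts you intend to exclude.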
Symptom
With StreamSets Platform, you can configure your Data Collector deployments to use a proxy server. See https://docs.streamsets.com/portal/platform-datacollector/latest/datacollector/UserGuide/Configuration/ProxyServer.html?hl=proxy
You are also able to define http.nonProxyHosts, which lists hosts that Data Collector can connect to without going through the proxy.
However, when you use the Jython stage to make requests (using the requests library) to these hosts, the stage may still try to connect through the proxy. Example script:
If that nonProxyHost does not allow connections from the proxy, you can receive a timeout error.
Document Location
Worldwide
Document Information
Modified date:
15 March 2025
UID
ibm17186240
