10 replies Latest Post - ‏2013-02-28T02:04:12Z by Saruton
Saruton

PE is unhealthy, got NameService::not_found error

2013-02-21T04:55:05Z
Hi ALL,

I'm having a problem: a PE is unhealthy even though it is in the RUNNING state.
If I cancel this job, the instance goes back to a healthy state. I checked the trace and found the following:

<pec49.out>

21 Feb 2013 13:37:55.313 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:00.698 [51627] ERROR :::Core.Transport.Tcp M[TCPConnection.cpp:connectToServerUnlocked:429] - Connection attempt failed for '50.95' retrying (1)
21 Feb 2013 13:38:00.701 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:05.997 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:11.391 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:16.882 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:22.314 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:27.862 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:33.147 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:38.431 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:43.957 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:49.417 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:38:54.741 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:00.023 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:05.285 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:10.903 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:16.244 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:21.878 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:27.214 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:32.805 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:38.144 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:43.511 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:48.839 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:54.508 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found
21 Feb 2013 13:39:59.780 [51627] ERROR :::NAM.LookupEntry M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found


The other PEs, which are running in a healthy state, don't have traces like the ones above, so I guess this is the cause of the unhealthy state.
What should I check as the next step?
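
One possible first step is to summarize the trace so it is clear which errors repeat and how often. The following is only a minimal Python sketch: it assumes the trace line layout shown in the excerpt above, and the regular expression and the function name `summarize_trace` are illustrative choices, not part of InfoSphere Streams.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# <day> <Mon> <year> <time> [<pid>] <LEVEL> :::<aspect> M[<file:function:line>] - <message>
TRACE_LINE = re.compile(
    r"^(?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\] (?P<level>\w+) :::(?P<aspect>\S+) "
    r"M\[(?P<origin>[^\]]+)\] - (?P<message>.*)$"
)


def summarize_trace(lines):
    """Count trace entries per (level, aspect, message); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match:
            counts[(match["level"], match["aspect"], match["message"])] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the trace above.
    sample = [
        "21 Feb 2013 13:37:55.313 [51627] ERROR :::NAM.LookupEntry "
        "M[DN_NameService.cpp:lookupObject:682] - got NameService::not_found",
        "21 Feb 2013 13:38:00.698 [51627] ERROR :::Core.Transport.Tcp "
        "M[TCPConnection.cpp:connectToServerUnlocked:429] - "
        "Connection attempt failed for '50.95' retrying (1)",
    ]
    for (level, aspect, message), count in summarize_trace(sample).most_common():
        print(count, level, aspect, message)
```

Running the same summary over the trace of a healthy PE and comparing the two results would show whether the `NameService::not_found` entries really are unique to the unhealthy PE.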

I appreciate your suggestions in advance.

Thanks,
  • wbratton

    Re: PE is unhealthy, got NameService::not_found error

    ‏2013-02-21T12:51:26Z  in response to Saruton
    What Streams version are you running?
    • Saruton

      Re: PE is unhealthy, got NameService::not_found error

      ‏2013-02-22T00:36:23Z  in response to wbratton
      Thanks for your help.
      I'm using version 3.0.
      • BruceGlassford

        Re: PE is unhealthy, got NameService::not_found error

        ‏2013-02-22T02:08:13Z  in response to Saruton
        I've seen this when there's an Import where there's no matching Export running. Could this be the case for you?
        • Saruton

          Re: PE is unhealthy, got NameService::not_found error

          ‏2013-02-25T01:25:01Z  in response to BruceGlassford
          Thanks for the reply.
          Yes, the application uses Export/Import operators, but I have confirmed in the 'Application Streams' menu of the InfoSphere Streams Console window that those operators are connected successfully.
  • Kevin_Foster

    Re: PE is unhealthy, got NameService::not_found error

    ‏2013-02-22T04:51:38Z  in response to Saruton
    This isn't specific to this problem, but when I can't see any obvious cause of a problem then I'll typically shut down all instances, reboot my Linux machine, and then go through a test sequence like the one below. For example, more than once I've found a dependency checker warning that was missed during installation by myself or someone else.

    -Kevin

    
    InfoSphere Streams environment variables have been set.
    bash-3.2$ whoami
    streamsadmin
    bash-3.2$ hostname
    streamstrial
    bash-3.2$ ssh localhost
    Last login: Thu Feb 21 22:06:48 2013 from localhost.localdomain
    InfoSphere Streams environment variables have been set.
    -bash-3.2$ exit
    logout
    Connection to localhost closed.
    bash-3.2$ which streamtool
    /opt/ibm/InfoSphereStreams/bin/streamtool
    bash-3.2$ which java
    /opt/ibm/java-x86_64-60/bin/java
    bash-3.2$ java -version
    java version "1.6.0"
    Java(TM) SE Runtime Environment (build pxa6460sr11-20120806_01(SR11))
    IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr11-20120801_118201 (JIT enabled, AOT enabled)
    J9VM - 20120801_118201
    JIT  - r9_20120608_24176ifx1
    GC   - 20120516_AA)
    JCL  - 20120713_01
    bash-3.2$ /opt/ibm/InfoSphereStreams/bin/dependency_checker.sh

    IBM InfoSphere Streams 3.0.0.0 Trial Dependency Checker
    Date:  Thu Feb 21 22:24:23 EST 2013

    === System Information ===
    * Hostname:  streamstrial.vm.ibm.com
    * IP address:  192.168.84.130
    * Operating system:  Red Hat Enterprise Linux Server release 5.5 (Tikanga)
    * System architecture:  x86_64
    * Security-Enhanced Linux setting:  Disabled
    * Java vendor:  IBM Corporation
    * Java version:  1.6.0
    * Java VM version:  2.4
    * Java runtime version:  pxa6460sr11-20120806_01 (SR11)
    * Java full version:  JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr11-20120801_118201 (JIT enabled, AOT enabled)
    J9VM - 20120801_118201
    JIT  - r9_20120608_24176ifx1
    GC   - 20120516_AA
    * Java IBM system encoding:  UTF-8
    * Encoding:  UTF-8

    === System Configuration Check ===
    * Status:  PASS - Check:  Hostname and IP address check
    * Status:  PASS - Check:  Operating system version check
    * Status:  PASS - Check:  Architecture check
    * Status:  PASS - Check:  Java check
    * Status:  PASS - Check:  Encoding check

    === Software Dependency Package Check ===
    * Status:  CORRECT VERSION - Package:  perl-XML-Simple, System Version:  2.14-4.fc6
    * Status:  CORRECT VERSION - Package:  gcc-c++, System Version:  4.1.2-52.el5_8.1
    * Status:  CORRECT VERSION - Package:  curl-devel, System Version:  7.15.5-15.el5
    * Status:  CORRECT VERSION - Package:  ibm-java-x86_64-sdk, System Version:  6.0-11.0

    === Summary of Errors and Warnings ===

    CDISI0003I The dependency checker evaluated the system and did not find errors or warnings.

    bash-3.2$ cd ~
    bash-3.2$ mkdir splsamples
    mkdir: cannot create directory `splsamples': File exists
    bash-3.2$ cp -R $STREAMS_INSTALL/samples/spl/feature/RegularExpression splsamples
    bash-3.2$ cd splsamples/RegularExpression
    bash-3.2$ make
    /opt/ibm/InfoSphereStreams/bin/sc -a  -T -M sample::DateTimeFormatter
    Creating types...
    Creating functions...
    Creating operators...
    Creating PEs...
    Creating standalone app...
    Creating application model...
    Building binaries...
    make[1]: Nothing to be done for `all'.
    bash-3.2$ streamtool mkinstance -i sample --numhosts 1 --template developer
    CDISC0040I The system is creating the sample@streamsadmin instance.
    CDISC0001I The sample@streamsadmin instance was created.
    bash-3.2$ streamtool startinstance -i sample
    CDISC0059I The system is starting the sample@streamsadmin instance.
    *********************  A T T E N T I O N  ********************
    You are currently using a temporary, time-limited license. To continue with uninterrupted operation, please obtain a non-trial version of InfoSphere Streams within 20 days.
    Thanks for using InfoSphere Streams!
    **************************************************************
    CDISC0078I The system is starting the runtime services on 1 hosts.
    CDISC0056I The system is starting the distributed name service on the streamstrial host. The distributed name service has 1 partitions and 1 replications.
    CDISC0057I The system is setting the NameServiceUrl property of the instance to DN:streamstrial.vm.ibm.com:39568, which is the URL of the distributed name service that is running.
    CDISC0061I The system is starting in parallel the runtime services of 1 management hosts.
    CDISC0003I The sample@streamsadmin instance was started.
    bash-3.2$ streamtool submitjob -i sample output/sample.DateTimeFormatter.adl
    *********************  A T T E N T I O N  ********************
    You are currently using a temporary, time-limited license. To continue with uninterrupted operation, please obtain a non-trial version of InfoSphere Streams within 20 days.
    Thanks for using InfoSphere Streams!
    **************************************************************
    CDISC0079I The system is submitting 1 applications to the sample@streamsadmin instance.
    CDISC0080I Job ID 0 was submitted for the application that is stored at the following path: output/sample.DateTimeFormatter.adl.
    CDISC0020I Submitted job IDs: 0
    bash-3.2$ streamtool lspes -i sample
    Instance: sample@streamsadmin
    Id State      RC Healthy Host           PID JobId JobName                    Operators
    0 Starting    - no      streamstrial     0     0 sample::DateTimeFormatter  RawDataR,...
    bash-3.2$ streamtool canceljob -i sample 0
    CDISC0021I Job ID 0 of the sample@streamsadmin instance was cancelled.
    bash-3.2$ streamtool stopinstance -i sample
    CDISC0063I The system is stopping the runtime services of the sample@streamsadmin instance.
    CDISC0050I The system is stopping the hc service on the streamstrial host.
    CDISC0050I The system is stopping the sws service on the streamstrial host.
    CDISC0050I The system is stopping the sam service on the streamstrial host.
    CDISC0050I The system is stopping the sch service on the streamstrial host.
    CDISC0050I The system is stopping the srm service on the streamstrial host.
    CDISC0050I The system is stopping the aas service on the streamstrial host.
    CDISC0068I The system is stopping in parallel the runtime services of 1 hosts.
    CDISC0054I The system is stopping in parallel the distributed name services of the following 1 hosts: streamstrial
    CDISC0055I The system is resetting the NameServiceUrl property of the instance because the distributed name service is not running.
    CDISC0004I The sample@streamsadmin instance was stopped.
    bash-3.2$ streamtool rminstance -i sample
    CDISC1008W Do you want to remove the sample@streamsadmin instance? Enter "y" to continue or "n" to cancel: y
    CDISC0018I The cleanup of the log files of the sample@streamsadmin instance was successful.
    CDISC0005I The sample@streamsadmin instance was removed.
    bash-3.2$
    
    • Saruton

      Re: PE is unhealthy, got NameService::not_found error

      ‏2013-02-25T02:14:57Z  in response to Kevin_Foster
      Thanks for your suggestion.
      I tried the steps you described above, and everything worked normally.


      Last login: Fri Feb 8 12:56:14 2013 from localhost
      InfoSphere Streams environment variables have been set.
      [streamsadmin@myhost ~]$ whoami
      streamsadmin
      [streamsadmin@myhost ~]$ hostname
      myhost.localdomain
      [streamsadmin@myhost ~]$ ssh localhost
      Last login: Mon Feb 25 10:53:13 2013 from aa5007930.dhcp.toyosu.japan.ibm.com
      InfoSphere Streams environment variables have been set.
      [streamsadmin@myhost ~]$ exit
      logout
      Connection to localhost closed.
      [streamsadmin@myhost ~]$ which streamtool
      /opt/ibm/InfoSphereStreams/bin/streamtool
      [streamsadmin@myhost ~]$ which java
      /usr/bin/java
      [streamsadmin@myhost ~]$ java -version
      java version "1.6.0_24"
      OpenJDK Runtime Environment (IcedTea6 1.11.6) (rhel-1.54.1.11.6.el6_3-x86_64)
      OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
      [streamsadmin@myhost ~]$ /opt/ibm/InfoSphereStreams/bin/dependency_checker.sh

      IBM InfoSphere Streams Developer Edition 3.0.0.0 Dependency Checker
      Date: Mon Feb 25 10:54:46 JST 2013

      === System Information ===
      • Hostname: myhost.localdomain
      • IP address: 9.188.207.183
      • Operating system: Red Hat Enterprise Linux Server release 6.3 (Santiago)
      • System architecture: x86_64
      • Security-Enhanced Linux setting: Disabled
      • Java vendor: IBM Corporation
      • Java version: 1.6.0
      • Java VM version: 2.4
      • Java runtime version: pxa6460sr11-20120806_01 (SR11)
      • Java full version: JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr11-20120801_118201 (JIT enabled, AOT enabled)
      J9VM - 20120801_118201
      JIT - r9_20120608_24176ifx1
      GC - 20120516_AA
      • Java IBM system encoding: UTF-8
      • Encoding: UTF-8

      === System Configuration Check ===
      • Status: PASS - Check: Hostname and IP address check
      • Status: PASS - Check: Operating system version check
      • Status: PASS - Check: Architecture check
      • Status: PASS - Check: Java check
      • Status: PASS - Check: Encoding check

      === Software Dependency Package Check ===
      • Status: CORRECT VERSION - Package: gcc-c++, System Version: 4.4.6-4.el6
      • Status: CORRECT VERSION - Package: libcurl-devel, System Version: 7.19.7-26.el6_2.4
      • Status: CORRECT VERSION - Package: perl-XML-Simple, System Version: 2.18-6.el6
      • Status: CORRECT VERSION - Package: ibm-java-x86_64-sdk, System Version: 6.0-11.0

      === Summary of Errors and Warnings ===

      CDISI0003I The dependency checker evaluated the system and did not find errors or warnings.

      [streamsadmin@myhost ~]$ mkdir splsamples
      [streamsadmin@myhost ~]$ cp -R /opt/ibm/InfoSphereStreams/samples/spl/feature/RegularExpression/ splsamples
      [streamsadmin@myhost ~]$ cd splsample/RegularExpression/
      [streamsadmin@myhost RegularExpression]$ make
      /opt/ibm/InfoSphereStreams/bin/sc -a -T -M sample::DateTimeFormatter
      Creating types...
      Creating functions...
      Creating operators...
      Creating PEs...
      Creating standalone app...
      Creating application model...
      Building binaries...
      make[1]: Nothing to be done for `all'.
      [streamsadmin@myhost RegularExpression]$ streamtool mkinstance -i sample --numhosts 1 --template developer
      CDISC0040I The system is creating the sample@streamsadmin instance.
      CDISC0001I The sample@streamsadmin instance was created.
      [streamsadmin@myhost RegularExpression]$ streamtool startinstance -i sample
      CDISC0059I The system is starting the sample@streamsadmin instance.
      CDISC0078I The system is starting the runtime services on 1 hosts.
      CDISC0056I The system is starting the distributed name service on the myhost host. The distributed name service has 1 partitions and 1 replications.
      CDISC0057I The system is setting the NameServiceUrl property of the instance to DN:myhost.localdomain:42080, which is the URL of the distributed name service that is running.
      CDISC0061I The system is starting in parallel the runtime services of 1 management hosts.
      CDISC0003I The sample@streamsadmin instance was started.
      [streamsadmin@myhost RegularExpression]$ streamtool submitjob -i sample output/sample.DateTimeFormatter.adl
      CDISC0079I The system is submitting 1 applications to the sample@streamsadmin instance.
      CDISC0080I Job ID 0 was submitted for the application that is stored at the following path: output/sample.DateTimeFormatter.adl.
      CDISC0020I Submitted job IDs: 0
      [streamsadmin@myhost RegularExpression]$ streamtool lspes -i sample
      Instance: sample@streamsadmin
      Id State RC Healthy Host PID JobId JobName Operators
      0 Running - yes myhost 4393 0 sample::DateTimeFormatter RawDataR,...
      [streamsadmin@myhost RegularExpression]$ streamtool canceljob -i sample 0
      CDISC0021I Job ID 0 of the sample@streamsadmin instance was cancelled.
      [streamsadmin@myhost RegularExpression]$ streamtool stopinstance -i sample
      CDISC0063I The system is stopping the runtime services of the sample@streamsadmin instance.
      CDISC0050I The system is stopping the hc service on the myhost host.
      CDISC0050I The system is stopping the sws service on the myhost host.
      CDISC0050I The system is stopping the sam service on the myhost host.
      CDISC0050I The system is stopping the sch service on the myhost host.
      CDISC0050I The system is stopping the srm service on the myhost host.
      CDISC0050I The system is stopping the aas service on the myhost host.
      CDISC0068I The system is stopping in parallel the runtime services of 1 hosts.
      CDISC0054I The system is stopping in parallel the distributed name services of the following 1 hosts:
      myhost
      CDISC0055I The system is resetting the NameServiceUrl property of the instance because the distributed name service is not running.
      CDISC0004I The sample@streamsadmin instance was stopped.
      [streamsadmin@myhost RegularExpression]$ streamtool rminstance -i sample
      CDISC1008W Do you want to remove the sample@streamsadmin instance? Enter "y" to continue or "n" to cancel: y
      CDISC0018I The cleanup of the log files of the sample@streamsadmin instance was successful.
      CDISC0005I The sample@streamsadmin instance was removed.
      [streamsadmin@myhost RegularExpression]$


      I then submitted my application, but 4 of its 18 PEs are in Running status yet unhealthy. Until last week, this application ran with all PEs healthy.
      I have two questions:
      1) How do the Streams management processes determine whether a PE is healthy or unhealthy?
      2) Can the application still run while a PE is unhealthy but Running?
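
To keep track of which PEs are in the Running-but-unhealthy situation, the `streamtool lspes` output can be filtered with a short script. This is only a sketch in Python: it assumes the column layout shown in the transcripts in this thread (Id, State, RC, Healthy, Host, PID, JobId, JobName, Operators) and that no column value contains spaces. The names `PeStatus`, `parse_lspes` and `unhealthy_but_running` are made up for illustration, and the second data row in the sample is hypothetical.

```python
from typing import NamedTuple


class PeStatus(NamedTuple):
    pe_id: str
    state: str
    healthy: bool
    host: str
    job_name: str


def parse_lspes(output):
    """Parse text shaped like the `streamtool lspes` output shown in this thread."""
    rows = []
    in_table = False
    for line in output.splitlines():
        fields = line.split(None, 8)
        if fields[:4] == ["Id", "State", "RC", "Healthy"]:
            in_table = True  # data rows follow the header line
            continue
        if in_table and len(fields) >= 8:
            rows.append(
                PeStatus(fields[0], fields[1], fields[3] == "yes", fields[4], fields[7])
            )
    return rows


def unhealthy_but_running(rows):
    """Return the PEs whose state is Running while the Healthy column is not 'yes'."""
    return [row for row in rows if row.state == "Running" and not row.healthy]


if __name__ == "__main__":
    # First data row is from the transcript above; the second row is hypothetical.
    sample = """Instance: sample@streamsadmin
Id State RC Healthy Host PID JobId JobName Operators
0 Running - yes myhost 4393 0 sample::DateTimeFormatter RawDataR,...
1 Running - no myhost 4401 0 sample::DateTimeFormatter Sink
"""
    for pe in unhealthy_but_running(parse_lspes(sample)):
        print(pe.pe_id, pe.host, pe.job_name)
```

The text passed to `parse_lspes` would come from capturing the output of `streamtool lspes -i <instance>`; how that output is captured is left out here.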

      I appreciate your support.

      Thanks,

  • Kevin_Foster

    Re: PE is unhealthy, got NameService::not_found error

    ‏2013-02-25T15:16:25Z  in response to Saruton
    Can you confirm that your last operator in the 1st job is producing tuples, and whether the first operator of your 2nd job is receiving those tuples?

    I would also recheck the schemas of the export and import statements, although I'd be surprised if that was the problem.

    -Kevin
    • Saruton

      Re: PE is unhealthy, got NameService::not_found error

      ‏2013-02-27T02:02:30Z  in response to Kevin_Foster
      Thanks for your continued support!
      The error log was captured while the application was running on its own, with no other application running.

      Today, the application runs successfully without any unhealthy operators. Hmm... I haven't changed the SPL code...

      I'm going to close this question tentatively, but I would like to ask one more thing: by what criteria do the Streams management processes determine whether a PE is healthy or unhealthy?
      • SystemAdmin

        Re: PE is unhealthy, got NameService::not_found error

        ‏2013-02-27T04:00:45Z  in response to Saruton
        If all of a PE's connections are established, the PE is healthy; otherwise, it is unhealthy.
        An unhealthy PE is still running, but data may not flow properly on some of its ports.

        Jingdong
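
The rule described in this reply can be written down as a toy model. This Python sketch only illustrates the stated rule (healthy means every connection is up, and unhealthy does not mean stopped). The classes `Port` and `ProcessingElement` and their fields are hypothetical and are not the Streams runtime API. Treating a PE with no connections as healthy is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    name: str
    connected: bool


@dataclass
class ProcessingElement:
    pe_id: int
    running: bool
    ports: List[Port] = field(default_factory=list)

    def is_healthy(self):
        # Healthy only if every connection is established.
        # all() of an empty list is True, so a PE without ports counts as healthy here.
        return all(port.connected for port in self.ports)

    def disconnected_ports(self):
        # Ports on which data may not flow while the PE keeps running.
        return [port.name for port in self.ports if not port.connected]


if __name__ == "__main__":
    pe = ProcessingElement(
        pe_id=1,
        running=True,
        ports=[Port("in0", True), Port("out0", False)],
    )
    print(pe.running, pe.is_healthy(), pe.disconnected_ports())
```

In this model a PE can be running and unhealthy at the same time, which matches the Running-but-unhealthy status reported earlier in the thread.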
        • Saruton

          Re: PE is unhealthy, got NameService::not_found error

          ‏2013-02-28T02:04:12Z  in response to SystemAdmin
          Thanks for your reply.
          I understand now.