Topic
  • 7 replies
  • Latest Post - ‏2013-09-16T15:22:23Z by ENER_Faber_McMullen
ENER_Faber_McMullen
18 Posts

Pinned topic No hadoopPath supplied

‏2013-09-11T16:00:02Z |

I have modified the sample Streams app ScanReadWrite to fit my needs, but it will not compile. Here is Main.spl:

use com.ibm.streams.bigdata.hdfs::*;

composite Main {
    graph
        stream<rstring afilename> filenamestream = HDFSDirectoryScan() {
            param
                directory: "hdfs://linuxvm:9000/input";
                useVersionOneApi: false;
        }

        stream<rstring writer, int32 counter> inputFiles = HDFSFileSource(filenamestream) {
            param
                format: txt;
                useVersionOneApi: false;
        }

        // Have a filename for a multi-file context
        () as HdfsFileOutputFileParam = HDFSFileSink(inputFiles) {
            param
                format: txt;
                useVersionOneApi: false;
                file: "hdfs://linuxvm:9000/output/usePlaceholders_%PEID_%PELAUNCHNUM_%FILENUM.txt";
        }

        // Use this exact filename.
        () as HdfsFileOutputFileParamPlain = HDFSFileSink(inputFiles) {
            param
                format: txt;
                useVersionOneApi: false;
                file: "hdfs://linuxvm:9000/output/newNothingAdded.txt";
        }

        () as FileOutput = FileSink(filenamestream) {
            param
                format: txt;
                file: "filenames.out";
                flush: 1u;
        }

        () as ReadInput = FileSink(inputFiles) {
            param
                file: "readData.txt";
                format: txt;
                flush: 1u;
        }
}

I also created the file hdfsconfig.txt and placed it into the etc directory relative to Main.spl. Here it is.

hdfsport=9000
hdfshost=linuxvm
hdfsuser=biadmin

I tried to compile the app, but I get the following error:

---- SPL Build for project ScanReadWrite started ---- September 11, 2013 10:58:09 AM CDT

Building main composite: Main using build configuration: Distributed

/opt/ibm/InfoSphereStreams/bin/sc -M Main --output-directory=output/Main/Distributed --data-directory=data -t /opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.db:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.etl:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.messaging:/opt/ibm/InfoSphereStreams/toolkits/deprecated/com.ibm.streams.text_deprecated:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.timeseries:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.ssb.inet:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.financial:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.bigdata:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.rproject:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.cep:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.text:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.mining:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.geospatial:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.inet --no-toolkit-indexing --no-mixed-mode-preprocessing

Checking the constraints.
Creating the types.
Creating the functions.
Creating the operators.
CDISP9164E ERROR: No hadoopPath supplied
Main.spl:27:6: CDISP0232E ERROR: An error was encountered during the generation of the code for the filenamestream operator.
CDISP9164E ERROR: No hadoopPath supplied
Main.spl:32:9: CDISP0232E ERROR: An error was encountered during the generation of the code for the inputFiles operator.
CDISP9164E ERROR: No hadoopPath supplied
Creating the processing elements.
Main.spl:38:11: CDISP0232E ERROR: An error was encountered during the generation of the code for the HdfsFileOutputFileParam operator.

CDISP0092E ERROR: Because of previous compilation errors, the compile process cannot continue.

---- SPL Build for project ScanReadWrite completed in 4.779 seconds ----

Can someone tell me what the error message "CDISP9164E ERROR: No hadoopPath supplied" means? I have no idea what I'm doing.

  • Stan
    76 Posts

    Re: No hadoopPath supplied

    ‏2013-09-11T21:16:29Z  

    Refer to this section of the documentation:  http://pic.dhe.ibm.com/infocenter/streams/v3r1/topic/com.ibm.swg.im.infosphere.streams.big-data-toolkit.doc/doc/configuringtoolkit.html

    HADOOP_PATH (and other variables) need to be set so that the compiler can find the required Hadoop libraries and configuration information.
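
    On a BigInsights system the variables might be set like this (paths are an assumption based on a default install; adjust them to your environment):

    ```shell
    # Assumed default BigInsights install location -- adjust as needed.
    export BIGINSIGHTS_HOME=/opt/ibm/biginsights
    export HADOOP_HOME=$BIGINSIGHTS_HOME/IHC

    # Sanity check: the libhdfs header the toolkit compiles against should exist.
    ls "$HADOOP_HOME/src/c++/libhdfs/hdfs.h"
    ```

    Put these in the shell profile of the user that runs sc so they are set for every build.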

  • ENER_Faber_McMullen
    18 Posts

    Re: No hadoopPath supplied

    ‏2013-09-12T14:26:42Z  
    • Stan
    • ‏2013-09-11T21:16:29Z

    Refer to this section of the documentation:  http://pic.dhe.ibm.com/infocenter/streams/v3r1/topic/com.ibm.swg.im.infosphere.streams.big-data-toolkit.doc/doc/configuringtoolkit.html

    HADOOP_PATH (and other variables) need to be set so that the compiler can find the required Hadoop libraries and configuration information.

    I set the following environment variables.

    export BIGINSIGHTS_HOME=/opt/ibm/biginsights
    export HADOOP_HOME=$BIGINSIGHTS_HOME/IHC/

    I tried compiling it again, but now the generated C++ fails to compile.

    ---- SPL Build for project ScanReadWrite started ---- September 12, 2013 9:19:39 AM CDT

    Building main composite: Main using build configuration: Distributed

    /opt/ibm/InfoSphereStreams/bin/sc -M Main --output-directory=output/Main/Distributed --data-directory=data -t /opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.db:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.etl:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.messaging:/opt/ibm/InfoSphereStreams/toolkits/deprecated/com.ibm.streams.text_deprecated:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.timeseries:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.ssb.inet:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.financial:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.bigdata:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.rproject:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.cep:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.text:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.mining:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.geospatial:/opt/ibm/InfoSphereStreams/toolkits/com.ibm.streams.inet --no-toolkit-indexing --no-mixed-mode-preprocessing

    Checking the constraints.
    Creating the types.
    Creating the functions.
    Creating the operators.
    Creating the processing elements.
    Creating the application model.
    Building the binaries.
     [CXX-type] enum{txt,line,formatstring}
     [CXX-type] enum{line,file,txt}
     [CXX-type] tuple<rstring writer,int32 counter>
     [CXX-type] enum{csv,txt,bin,block,line}
     [CXX-type] tuple<rstring afilename>
     [CXX-operator] filenamestream
    /opt/ibm/biginsights/IHC//src/c++/libhdfs/hdfs.h: In member function 'void* SPL::_Operator::filenamestream$OP::getHdfsPtr()':
    /opt/ibm/biginsights/IHC//src/c++/libhdfs/hdfs.h:106: error: too many arguments to function 'void* hdfsConnectAsUser(const char*, tPort, const char*)'
    src/operator/filenamestream.cpp:180: error: at this point in file
    make: *** [build/operator/filenamestream.o] Error 1
    CDISP0141E ERROR: The compilation of the generated code failed.

    ---- SPL Build for project ScanReadWrite completed in 13.234 seconds ----

    Is there a compatibility issue? I am using Streams 3.1 and BigInsights 2.0.

  • ENER_Faber_McMullen
    18 Posts

    Re: No hadoopPath supplied

    ‏2013-09-12T18:35:49Z  
    • Stan
    • ‏2013-09-11T21:16:29Z

    Refer to this section of the documentation:  http://pic.dhe.ibm.com/infocenter/streams/v3r1/topic/com.ibm.swg.im.infosphere.streams.big-data-toolkit.doc/doc/configuringtoolkit.html

    HADOOP_PATH (and other variables) need to be set so that the compiler can find the required Hadoop libraries and configuration information.

    I am also working on another SPL app that uses HDFSFileSink. Here is the operator.

    () as logConversation = HDFSFileSink(fullSchema) { // Added by Faber McMullen
        param
            file : "hdfs://linuxvm:9000/input/%FILENUM%TIME.txt" ;
            format : txt ;
    }

    It compiles fine. But, when I run it, I get this error message.

    Exception in thread "Thread-13" java.lang.NoClassDefFoundError: org.apache.hadoop.conf.Configuration
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:434)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:660)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:358)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:626)
    Can't construct instance of class org.apache.hadoop.conf.Configuration
    WARNING: Unable to find resource bundle 'BigdataResource'.  Continuing with default messages.
    12 Sep 2013 13:26:41.697 [19502] ERROR #splapplog,J[0],P[0],logConversation,HdfsCommon M[logConversation.cpp:getHdfsPtr:296]  - Could not access HDFS file system on host linuxvm on  port 9,000.
    12 Sep 2013 13:26:41.700 [19502] ERROR #splapplog,J[0],P[0],logConversation,spl_pe M[PEImpl.cpp:logTerminatingException:1217]  - CDISR5033E: An exception occurred during the execution of the logConversation operator. Processing element number 0 is terminating.
    12 Sep 2013 13:26:41.701 [19502] ERROR #splapptrc,J[0],P[0],logConversation,spl_operator M[PEImpl.cpp:instantiateOperators:463]  - CDISR5030E: An exception occurred during the execution of the logConversation operator. The exception is: Could not connect to HDFS
    12 Sep 2013 13:26:41.713 [19502] ERROR #splapptrc,J[0],P[0],logConversation,spl_pe M[PEImpl.cpp:process:675]  - CDISR5079E: An exception occurred during the processing of the processing element. The error is: Could not connect to HDFS.
    12 Sep 2013 13:26:41.713 [19502] ERROR #splapptrc,J[0],P[0],logConversation,spl_operator M[PEImpl.cpp:process:696]  - CDISR5053E: Runtime failures occurred in the following operators: logConversation.

    What should I do? I have no idea what I'm doing.

  • Stan
    76 Posts

    Re: No hadoopPath supplied

    ‏2013-09-12T23:26:47Z  

    I am also working on another SPL app that uses HDFSFileSink. Here is the operator.

    () as logConversation = HDFSFileSink(fullSchema) {//Added by Faber McMullen
                            param
                                file : "hdfs://linuxvm:9000/input/%FILENUM%TIME.txt" ;
                                format : txt ;
        }

    It compiles fine. But, when I run it, I get this error message.

    Exception in thread "Thread-13" java.lang.NoClassDefFoundError: org.apache.hadoop.conf.Configuration
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:434)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:660)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:358)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:626)
    Can't construct instance of class org.apache.hadoop.conf.Configuration
    WARNING: Unable to find resource bundle 'BigdataResource'.  Continuing with default messages.
    12 Sep 2013 13:26:41.697 [19502] ERROR #splapplog,J[0],P[0],logConversation,HdfsCommon M[logConversation.cpp:getHdfsPtr:296]  - Could not access HDFS file system on host linuxvm on  port 9,000.
    12 Sep 2013 13:26:41.700 [19502] ERROR #splapplog,J[0],P[0],logConversation,spl_pe M[PEImpl.cpp:logTerminatingException:1217]  - CDISR5033E: An exception occurred during the execution of the logConversation operator. Processing element number 0 is terminating.
    12 Sep 2013 13:26:41.701 [19502] ERROR #splapptrc,J[0],P[0],logConversation,spl_operator M[PEImpl.cpp:instantiateOperators:463]  - CDISR5030E: An exception occurred during the execution of the logConversation operator. The exception is: Could not connect to HDFS
    12 Sep 2013 13:26:41.713 [19502] ERROR #splapptrc,J[0],P[0],logConversation,spl_pe M[PEImpl.cpp:process:675]  - CDISR5079E: An exception occurred during the processing of the processing element. The error is: Could not connect to HDFS.
    12 Sep 2013 13:26:41.713 [19502] ERROR #splapptrc,J[0],P[0],logConversation,spl_operator M[PEImpl.cpp:process:696]  - CDISR5053E: Runtime failures occurred in the following operators: logConversation.

    What should I do? I have no idea what I'm doing.

    The error below indicates that the compiler is expecting a different version of Hadoop than the one it finds at HADOOP_HOME. The signature (number of parameters) of hdfsConnectAsUser has changed between Hadoop versions.

    ERROR: /opt/ibm/biginsights/IHC//src/c++/libhdfs/hdfs.h: In member function 'void* SPL::_Operator::filenamestream$OP::getHdfsPtr()':
    /opt/ibm/biginsights/IHC//src/c++/libhdfs/hdfs.h:106: error: too many arguments to function 'void* hdfsConnectAsUser(const char*, tPort, const char*)'

    To correct this, do not specify useVersionOneApi in your SPL HDFS operators; it should not be needed.

    From the docs:

    useVersionOneApi
    This parameter value has a boolean data type. If the parameter value is true, the operator uses Hadoop version 1.0.0. If the parameter value is false, the operator uses Hadoop version 0.20.2. If you do not specify this parameter, it defaults to the value true.
    If this parameter value does not match the product that is installed in the location specified by the HADOOP_HOME environment variable, a compile-time error occurs.
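
    Applied to the original Main.spl, the scan operator would simply drop the parameter and rely on the default (a sketch; the directory value is taken from the original post):

    ```
    stream<rstring afilename> filenamestream = HDFSDirectoryScan() {
        param
            directory: "hdfs://linuxvm:9000/input";
            // useVersionOneApi omitted: it defaults to true (Hadoop 1.0.0 API),
            // which should match the Hadoop found at HADOOP_HOME
    }
    ```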

     

    The following error indicates that classes in the Hadoop API are not being found. Check the setting of HADOOP_HOME and the permissions/access rights to the files there.

    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration

    It should be found in $HADOOP_HOME/hadoop-core.jar
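
    A quick way to verify this (the jar name pattern is an assumption; it varies by Hadoop release):

    ```shell
    # List the jar contents and look for the class the runtime cannot find.
    jar tf "$HADOOP_HOME"/hadoop-core*.jar | grep 'org/apache/hadoop/conf/Configuration.class'
    ```

    If grep prints nothing, the class is not in that jar and the runtime classpath will not find it either.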

     

  • ENER_Faber_McMullen
    18 Posts

    Re: No hadoopPath supplied

    ‏2013-09-13T19:12:44Z  
    • Stan
    • ‏2013-09-12T23:26:47Z

    The error below indicates that the compiler is expecting a different version of Hadoop than the one it finds at HADOOP_HOME. The signature (number of parameters) of hdfsConnectAsUser has changed between Hadoop versions.

    ERROR: /opt/ibm/biginsights/IHC//src/c++/libhdfs/hdfs.h: In member function 'void* SPL::_Operator::filenamestream$OP::getHdfsPtr()':
    /opt/ibm/biginsights/IHC//src/c++/libhdfs/hdfs.h:106: error: too many arguments to function 'void* hdfsConnectAsUser(const char*, tPort, const char*)'

    To correct this, do not specify useVersionOneApi in your SPL HDFS operators; it should not be needed.

    From the docs:

    useVersionOneApi
    This parameter value has a boolean data type. If the parameter value is true, the operator uses Hadoop version 1.0.0. If the parameter value is false, the operator uses Hadoop version 0.20.2. If you do not specify this parameter, it defaults to the value true.
    If this parameter value does not match the product that is installed in the location specified by the HADOOP_HOME environment variable, a compile-time error occurs.

     

    The following error indicates that classes in the Hadoop API are not being found. Check the setting of HADOOP_HOME and the permissions/access rights to the files there.

    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration

    It should be found in $HADOOP_HOME/hadoop-core.jar

     

    Thanks for your help. With this HDFSFileSink, I was able to get it to put data into Hadoop.

    use com.ibm.streams.bigdata.hdfs::*;

    () as logConversation = HDFSFileSink(JSONArray) { // Added by Faber McMullen
        param
            format : txt ;
            file : "hdfs://linuxvm:9000/output/%FILENUM%TIME.txt" ;
    }

    Unfortunately, when I use the com.ibm.ssb.inet toolkit in this app, it doesn't work anymore. I added the following import and operator to the app.

    use com.ibm.ssb.inet.rest::*;

    () as outputJSON = HTTPTupleView(keywords as In2; CommaDelim as In4; JSONArray as In7) {
            window
                In2 : tumbling, punct();
                In4 : sliding, count(3000), count(1);
                In7 : sliding, count(3000), count(1);
            param
                port : 3421;
            config
                placement : partitionColocation("jetty3421");
        }

    Here is the error message I got.

    Exception in thread "Thread-7" java.lang.NoClassDefFoundError: org.apache.hadoop.conf.Configuration
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:434)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:660)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:358)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:626)
    Can't construct instance of class org.apache.hadoop.conf.Configuration
    13 Sep 2013 14:08:20.837 [20578] ERROR #splapplog,J[0],P[0],logConversation,HdfsCommon M[logConversation.cpp:getHdfsPtr:296]  - Could not access HDFS file system on host linuxvm on  port 9,000.
    13 Sep 2013 14:08:20.892 [20578] ERROR #splapplog,J[0],P[0],logConversation,spl_pe M[PEImpl.cpp:logTerminatingException:1217]  - CDISR5033E: An exception occurred during the execution of the logConversation operator. Processing element number 0 is terminating.
    13 Sep 2013 14:08:20.893 [20578] ERROR #splapptrc,J[0],P[0],logConversation,spl_operator M[PEImpl.cpp:instantiateOperators:463]  - CDISR5030E: An exception occurred during the execution of the logConversation operator. The exception is: Could not connect to HDFS
    13 Sep 2013 14:08:20.893 [20578] ERROR #splapptrc,J[0],P[0],logConversation,spl_pe M[PEImpl.cpp:process:675]  - CDISR5079E: An exception occurred during the processing of the processing element. The error is: Could not connect to HDFS.
    13 Sep 2013 14:08:20.893 [20578] ERROR #splapptrc,J[0],P[0],logConversation,spl_operator M[PEImpl.cpp:process:696]  - CDISR5053E: Runtime failures occurred in the following operators: logConversation.

    I am thinking that somehow the com.ibm.ssb.inet toolkit interferes with loading the Hadoop classes.

  • Kevin_Foster
    98 Posts

    Re: No hadoopPath supplied

    ‏2013-09-14T19:32:44Z  

    Thanks for your help. With this HDFSFileSink, I was able to get it to put data into Hadoop.

    use com.ibm.streams.bigdata.hdfs::*;

    () as logConversation = HDFSFileSink(JSONArray) {//Added by Faber McMullen
                            param
                                format : txt ;
                                file : "hdfs://linuxvm:9000/output/%FILENUM%TIME.txt" ;
        }

    Unfortunately, when I use the com.ibm.ssb.inet toolkit in this app, it doesn't work anymore. I added the following import and operator to the app.

    use com.ibm.ssb.inet.rest::*;

    () as outputJSON = HTTPTupleView(keywords as In2; CommaDelim as In4; JSONArray as In7) {
            window
                In2 : tumbling, punct();
                In4 : sliding, count(3000), count(1);
                In7 : sliding, count(3000), count(1);
            param
                port : 3421;
            config
                placement : partitionColocation("jetty3421");
        }

    Here is the error message I got.

    Exception in thread "Thread-7" java.lang.NoClassDefFoundError: org.apache.hadoop.conf.Configuration
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:434)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:660)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:358)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:626)
    Can't construct instance of class org.apache.hadoop.conf.Configuration
    13 Sep 2013 14:08:20.837 [20578] ERROR #splapplog,J[0],P[0],logConversation,HdfsCommon M[logConversation.cpp:getHdfsPtr:296]  - Could not access HDFS file system on host linuxvm on  port 9,000.
    13 Sep 2013 14:08:20.892 [20578] ERROR #splapplog,J[0],P[0],logConversation,spl_pe M[PEImpl.cpp:logTerminatingException:1217]  - CDISR5033E: An exception occurred during the execution of the logConversation operator. Processing element number 0 is terminating.
    13 Sep 2013 14:08:20.893 [20578] ERROR #splapptrc,J[0],P[0],logConversation,spl_operator M[PEImpl.cpp:instantiateOperators:463]  - CDISR5030E: An exception occurred during the execution of the logConversation operator. The exception is: Could not connect to HDFS
    13 Sep 2013 14:08:20.893 [20578] ERROR #splapptrc,J[0],P[0],logConversation,spl_pe M[PEImpl.cpp:process:675]  - CDISR5079E: An exception occurred during the processing of the processing element. The error is: Could not connect to HDFS.
    13 Sep 2013 14:08:20.893 [20578] ERROR #splapptrc,J[0],P[0],logConversation,spl_operator M[PEImpl.cpp:process:696]  - CDISR5053E: Runtime failures occurred in the following operators: logConversation.

    I am thinking that somehow the com.ibm.ssb.inet toolkit interferes with loading the Hadoop classes.

    Please confirm that you are running in distributed mode, and that you have separate PEs for the HDFSFileSink and HTTPTupleView operators.

    I also recall reading, once upon a time, about someone who had Java conflicts within the same job and solved them with an Export and Import between jobs to get the needed separation. But sorry, I can't find that information either online or in my brain anywhere now...

    -Kevin
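
    For what it's worth, the Export/Import separation Kevin mentions would look roughly like this (a sketch only; the topic property and stream names are made up, and the imported stream type must match the exported one):

    ```
    // Job 1: runs HDFSFileSink, and also publishes the stream
    () as exported = Export(JSONArray) {
        param
            properties : { topic = "jsonArray" };
    }

    // Job 2: a separate job (and therefore a separate PE/JVM) that
    // subscribes to the stream and feeds HTTPTupleView
    stream<rstring jsonString> imported = Import() {
        param
            subscription : topic == "jsonArray";
    }
    ```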

  • ENER_Faber_McMullen
    18 Posts

    Re: No hadoopPath supplied

    ‏2013-09-16T15:22:23Z  

    Please confirm that you are running in distributed mode, and that you have separate PEs for the HDFSFileSink and HTTPTupleView operators.

    I also recall reading, once upon a time, about someone who had Java conflicts within the same job and solved them with an Export and Import between jobs to get the needed separation. But sorry, I can't find that information either online or in my brain anywhere now...

    -Kevin

    Up until now, I have been using standalone mode. I tried it in distributed mode, and it worked. Thanks for the help.