Topic
  • 23 replies
  • Latest Post - 2012-04-10T16:18:48Z by SystemAdmin
victor42

TCPSource and packet parsing

2012-04-04T19:57:09Z
Hi All,
I have a couple of questions about parsing the data in IP packets fetched with TCPSource.
The packet format looks like the following:
uint32 timestamp,
char12 tag,         // 12-byte string
uint32 packetSize,  // number of bytes in the received packet
char[] payload      // variable-size payload

1. How can I read char12? If I "pack" char12 into an "rstring", how would the system know to read only 12 bytes?
2. I am only interested in fetching timestamp, tag, and packetSize, so if I declare
stream<uint32 timestamp, rstring tag, uint32 size> fetch = TCPSource()
will it skip the rest of the packet and wait for the next packet to come?

Thanks,
-V
  • mendell
    Re: TCPSource and packet parsing

    2012-04-04T21:04:29Z
    Unfortunately, there isn't an easy answer to this one. TCPSource can read the data, but none of its formats gives you the layout you want. The supported formats are:
    • bin: internal SPL binary encoding (will read TCPSink format: bin output)
    • csv: comma separated values (in text)
    • txt: format like an SPL tuple: { a = 5, b = "hi"}
    • line: read one line of text into an rstring
    • block: read blockSize bytes into a blob

    The only one that works for you is block. You can read each block into a blob:

    stream<blob data> Input = TCPSource()
    {
        param
            format    : block;
            blockSize : 4096u;
            // address, port, etc
    }


    After that you have a stream of blocks coming out of the operator. You will have to write a primitive operator (in C++ or Java) that will take the incoming data and convert that into SPL tuples:
    
    stream<uint32 ts, rstring[12] tag, uint32 size> Data = MyOperator(Input) 
    { 
    }
    


    This operator will have to pull apart the data in the blob, setting the various fields in a tuple, and then will call submit (otuple, 0); to send the tuple downstream. The only interesting part is that the blocks in the blob necessarily contain complete packets, so you will have to be able to save your state, and continue when the next blob arrives.
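A minimal plain-Java sketch of that state-saving step follows. It is not the Streams operator API: `BlockReassembler` and `RecordHeader` are hypothetical names, and each call to `accept` stands for the bytes of one incoming blob. Because block boundaries need not match record boundaries, a partially received header and the count of payload bytes still to skip are kept in fields between calls. It assumes big-endian (network order) uint32 fields, a NUL-padded ASCII tag, and a packetSize that counts only the payload bytes after the 20-byte header; adjust `decode` if the sender differs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Header fields of one record; uint32 values are widened to long.
final class RecordHeader {
    final long timestamp;
    final String tag;
    final long packetSize;

    RecordHeader(long timestamp, String tag, long packetSize) {
        this.timestamp = timestamp;
        this.tag = tag;
        this.packetSize = packetSize;
    }
}

// Hypothetical helper: feed it each blob's bytes in arrival order.
// Assumptions: big-endian uint32 fields, NUL-padded ASCII tag,
// packetSize = number of payload bytes after the 20-byte header.
final class BlockReassembler {
    static final int HEADER_SIZE = 4 + 12 + 4;

    private final byte[] header = new byte[HEADER_SIZE]; // partially received header
    private int headerFill = 0;                           // header bytes collected so far
    private long skipRemaining = 0;                       // payload bytes still to discard

    List<RecordHeader> accept(byte[] chunk) {
        List<RecordHeader> complete = new ArrayList<>();
        int pos = 0;
        while (pos < chunk.length) {
            if (skipRemaining > 0) {
                // Discard payload bytes; a payload may span several chunks.
                int n = (int) Math.min(skipRemaining, chunk.length - pos);
                pos += n;
                skipRemaining -= n;
            } else {
                // Copy as much of the header as this chunk provides.
                int n = Math.min(HEADER_SIZE - headerFill, chunk.length - pos);
                System.arraycopy(chunk, pos, header, headerFill, n);
                pos += n;
                headerFill += n;
                if (headerFill == HEADER_SIZE) {
                    RecordHeader h = decode(header);
                    complete.add(h);
                    skipRemaining = h.packetSize;
                    headerFill = 0;
                }
            }
        }
        return complete; // unfinished state carries over to the next call
    }

    private static RecordHeader decode(byte[] h) {
        ByteBuffer bb = ByteBuffer.wrap(h).order(ByteOrder.BIG_ENDIAN);
        long ts = Integer.toUnsignedLong(bb.getInt());
        byte[] tag = new byte[12];
        bb.get(tag);
        long size = Integer.toUnsignedLong(bb.getInt());
        int len = 0;
        while (len < tag.length && tag[len] != 0) {
            len++; // the tag ends at the first NUL pad byte
        }
        return new RecordHeader(ts, new String(tag, 0, len, StandardCharsets.US_ASCII), size);
    }
}
```

Since only the header fields are wanted, payload bytes are counted and dropped rather than buffered, so the saved state stays at 20 header bytes plus two counters however large a payload is.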

    You can write a particular operator that knows the attribute types of your output stream, and does something like:
    
    
    // Do the hard work to grab the values from the blob ...
    // Now create the tuple and send it downstream
    OPort0Type otuple(tsValue, tagValue, sizeValue);
    submit(otuple, 0);
    


    Mark
  • Jim Sharpe
    Re: TCPSource and packet parsing

    2012-04-04T21:10:44Z
    In reply to mendell (2012-04-04T21:04:29Z):
    Although it's implied by the context, I think you meant to say "the blocks in the blob do not necessarily contain complete packets".
  • victor42
    Re: TCPSource and packet parsing

    2012-04-04T21:57:09Z
    In reply to mendell (2012-04-04T21:04:29Z):
    Mark,
    Thanks very much for the post. Since I do not know the size of a packet in advance, what should the size of the "blob" be?
    If the maximum length of an IP packet is 65535 bytes, can it be:

    stream<blob data> Input = TCPSource() {
        param
            format    : block;
            blockSize : 65535u;
            // address, port, etc
    }

    i.e., will TCPSource read one packet at a time?

    Also, how can I cast the "block" in native code?

    Thanks again,
    -V
  • mendell
    Re: TCPSource and packet parsing

    2012-04-04T22:03:30Z
    In reply to victor42 (2012-04-04T21:57:09Z):
    An SPL blob contains a bunch of raw bytes and a length. The TCPSource that I mentioned would read the TCP connection 4K bytes at a time, so it would take several tuples containing blobs to get all of a 64K packet. TCP delivers a byte stream, so the block boundaries will not line up with your packet boundaries.

    In C++, the tuple would contain an attribute of type SPL::blob. You can use member functions on it to find the length and access the data within the blob, which you can then use to create the output tuple.

    A (perhaps better) alternative would be to create your own source operator that reads from a TCP socket and generates tuples from that. That would allow you to do reads of the right size from the TCP connection and not worry about having to combine blocks of data. The only tricky part in a source is remembering to check occasionally for PE shutdown, so that you can shut down in a timely manner.
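For the custom-source route, the sketch below shows one way (plain Java, not the Streams operator API) to read from a TCP socket while still noticing a shutdown request. `StoppableSocketReader`, its `stop()` method and the `onRecord` callback are hypothetical stand-ins for however your operator learns about PE shutdown and submits tuples. `Socket.setSoTimeout` makes a blocked read throw `SocketTimeoutException` periodically, which gives the loop a chance to re-check the stop flag; the records here are fixed-size (for example, the 20-byte header) to keep the example short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.function.Consumer;

// Hypothetical read loop for a hand-written source operator.
// Delivers fixed-size records to onRecord until stop() is called
// or the peer closes the connection.
final class StoppableSocketReader implements Runnable {
    private final Socket socket;
    private final int recordSize;             // e.g. 20 for the header in this thread
    private final Consumer<byte[]> onRecord;  // stand-in for "build a tuple and submit it"
    private volatile boolean stopRequested;   // set from the shutdown path

    StoppableSocketReader(Socket socket, int recordSize, Consumer<byte[]> onRecord)
            throws IOException {
        if (recordSize <= 0) {
            throw new IllegalArgumentException("recordSize must be positive");
        }
        this.socket = socket;
        this.recordSize = recordSize;
        this.onRecord = onRecord;
        socket.setSoTimeout(500); // a blocked read wakes up at least every 500 ms
    }

    void stop() {
        stopRequested = true;
    }

    @Override
    public void run() {
        byte[] record = new byte[recordSize];
        int filled = 0; // bytes of the current record received so far
        try {
            InputStream in = socket.getInputStream();
            while (!stopRequested) {
                int n;
                try {
                    n = in.read(record, filled, recordSize - filled);
                } catch (SocketTimeoutException timeout) {
                    continue; // no data yet; the socket is still usable, re-check the flag
                }
                if (n < 0) {
                    break; // peer closed the connection
                }
                filled += n;
                if (filled == recordSize) {
                    onRecord.accept(record.clone()); // hand over a copy, reuse the buffer
                    filled = 0;
                }
            }
        } catch (IOException e) {
            // a real operator would log this and perhaps reconnect
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do while shutting down
            }
        }
    }
}
```

A possible way to run it is `Thread t = new Thread(reader); t.setDaemon(true); t.start();`, then `reader.stop(); t.join();` on shutdown. Marking the thread as a daemon only keeps it from holding the JVM open; the stop flag is what lets it finish cleanly. The partial-record count `filled` is tracked by hand because a timeout can fire in the middle of a record, and restarting `DataInputStream.readFully` at that point would lose track of the bytes already read.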
  • victor42
    Re: TCPSource and packet parsing

    2012-04-04T22:12:26Z
    In reply to mendell (2012-04-04T21:04:29Z):
    Wouldn't I be able to read 20 bytes of text into my native operator?

    stream<line data> Input = TCPSource () {
    param format : block;
    blockSize: 4096;
    // address, port, etc
    }
  • mendell
    Re: TCPSource and packet parsing

    2012-04-04T22:30:30Z
    In reply to victor42 (2012-04-04T22:12:26Z):
    It sounds to me that your input is in binary, not text. You want to read 20 bytes into your native operator, and then use the packetsize information to tell you how much data to read and ignore, and then you can try reading the next packet.
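Mark's read-20-bytes-then-skip approach can be sketched in plain Java with `java.io.DataInputStream`: `readInt` reads four bytes as a big-endian `int`, and `readFully` blocks until exactly the requested number of bytes has arrived, which is also one answer to the original char12 question (the reader, not the type, decides to take 12 bytes). `HeaderReader` and `readHeader` are hypothetical names; the sketch assumes big-endian uint32 fields, a NUL-padded ASCII tag, and a packetSize that counts only the payload after the header. Wrap a socket's input stream, or any other `InputStream`, in the `DataInputStream`.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the fixed header described in the question.
// Assumptions: big-endian uint32 fields, NUL-padded ASCII tag, and
// packetSize = number of payload bytes that follow the 20-byte header.
final class HeaderReader {
    static final int TAG_LENGTH = 12;

    // The three fields the question wants; uint32 values are widened to long.
    static final class Header {
        final long timestamp;
        final String tag;
        final long packetSize;

        Header(long timestamp, String tag, long packetSize) {
            this.timestamp = timestamp;
            this.tag = tag;
            this.packetSize = packetSize;
        }
    }

    // Reads one header, then discards the payload so the stream is positioned
    // at the next record. Throws EOFException when the stream ends, including mid-record.
    static Header readHeader(DataInputStream in) throws IOException {
        long timestamp = Integer.toUnsignedLong(in.readInt());  // 4 bytes
        byte[] rawTag = new byte[TAG_LENGTH];
        in.readFully(rawTag);                                   // exactly 12 bytes
        long packetSize = Integer.toUnsignedLong(in.readInt()); // 4 bytes
        skipFully(in, packetSize);
        int len = 0;
        while (len < TAG_LENGTH && rawTag[len] != 0) {
            len++; // the tag ends at the first NUL pad byte (assumption)
        }
        return new Header(timestamp, new String(rawTag, 0, len, StandardCharsets.US_ASCII), packetSize);
    }

    // InputStream.skip may skip fewer bytes than asked, so loop until done.
    private static void skipFully(DataInputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() >= 0) {
                n--; // skip made no progress; consume a single byte instead
            } else {
                throw new EOFException("stream ended inside a payload");
            }
        }
    }
}
```

Calling `readHeader` in a loop yields one header per record and leaves the stream at the start of the next record each time.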

    Mark
  • victor42
    Re: TCPSource and packet parsing

    2012-04-05T13:11:00Z
    In reply to mendell (2012-04-04T22:30:30Z):
    Can't I treat the text as a byte[] and pass it into the native code?
  • victor42
    Re: TCPSource and packet parsing

    2012-04-05T14:46:37Z
    In reply to mendell (2012-04-04T22:30:30Z):
    Hi Mark,
    How would I implement an alternative to TCPSource as a Java operator?
    I looked through the documentation and samples but didn't find specific information about getting data from a socket.
    It seems I have to implement
    public void process(StreamingInput<Tuple> port, Tuple tuple) in my operator,
    but how can I "connect" that input to the actual logic of reading the socket?
    A "skeleton" to give me some idea would be greatly appreciated.
    Thanks,
    -V
  • mendell
    Re: TCPSource and packet parsing

    2012-04-05T19:32:51Z
    In reply to victor42 (2012-04-05T13:11:00Z):
    No, because the line format reads up to a newline and then discards the newline, which won't work for you. You can treat an rstring as a set of bytes, but I don't think the contents of the rstring will be what you want.
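To see why a line-oriented reader cannot carry this header, note that any uint32 field whose encoding contains the byte 0x0A looks like a line terminator. The hypothetical helper below (plain Java, assuming the sender writes big-endian integers) encodes a header and lists the offsets of such bytes.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check: where would a line-oriented reader see a newline?
// Assumes the sender writes the uint32 fields big-endian.
final class NewlineCheck {
    static List<Integer> newlineOffsets(int timestamp, byte[] tag12, int packetSize) {
        if (tag12.length != 12) {
            throw new IllegalArgumentException("tag must be exactly 12 bytes");
        }
        byte[] header = ByteBuffer.allocate(20) // ByteBuffer is big-endian by default
                .putInt(timestamp)
                .put(tag12)
                .putInt(packetSize)
                .array();
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < header.length; i++) {
            if (header[i] == '\n') { // 0x0A, the line terminator
                offsets.add(i);
            }
        }
        return offsets;
    }
}
```

For example, `newlineOffsets(10, new byte[12], 266)` returns `[3, 19]`: a timestamp of 10 is encoded as `00 00 00 0A` and a packetSize of 266 as `00 00 01 0A`, so by Mark's description a line reader would split this header at both offsets and drop those two bytes.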
  • mendell
    Re: TCPSource and packet parsing

    2012-04-05T19:37:27Z
    In reply to victor42 (2012-04-05T14:46:37Z):
    That prototype is not the right one for a source. If you want to write a Java Source operator, look at the samples in
    Java Samples

    Connecting to a TCP socket will have to be done using the Java interfaces to TCP sockets. That is out of the scope of SPL/Streams.
  • victor42
    Re: TCPSource and packet parsing

    2012-04-06T17:04:12Z
    In reply to mendell (2012-04-05T19:37:27Z):
    Any idea how to create the operator descriptor, e.g. RandomBeacon.xml?
    The tutorial omits this part, but the sample won't compile without it.
  • hnasgaard
    Re: TCPSource and packet parsing

    2012-04-06T17:53:47Z
    In reply to victor42 (2012-04-06T17:04:12Z):
    Refer to the SPL Operator Model Reference. It should have all the information you need to complete the operator model.
  • victor42
    Re: TCPSource and packet parsing

    2012-04-06T18:24:27Z
    In reply to hnasgaard (2012-04-06T17:53:47Z):
    When I try to define a "Beacon-like" operator that takes no parameters or input ports (see attached), I get the following error:

    ISP0180E Error while loading operator model, details: An operator with more than one input port cannot have a punctuation preserving output port without specifying the window punctuation port.
  • hnasgaard
    Re: TCPSource and packet parsing

    2012-04-06T18:41:11Z
    In reply to victor42 (2012-04-06T18:24:27Z):
    You have windowPunctuationOutputMode set to Preserving, which says it will preserve the punctuations from the input ports, of which there are none. Note that in the Beacon operator, it is set to Generating, which says it will generate punctuations for some condition. Since you are creating a source operator, you can specify either Free or Generating.
  • victor42
    Re: TCPSource and packet parsing

    2012-04-06T18:47:48Z
    In reply to hnasgaard (2012-04-06T17:53:47Z):
    Never mind. Now I get CDISP0053E ERROR: Unknown identifier 'StatusSocketReader' when I try to call it from SPL:

    composite MagicO {
        graph
            stream<rstring rawObservation> RawObservations = StatusSocketReader() {}
    }
  • victor42
    Re: TCPSource and packet parsing

    2012-04-06T18:54:24Z
    In reply to hnasgaard (2012-04-06T18:41:11Z):
    Sorry for the hasty post. To clarify: I changed the name of my operator to StatusSocketReader everywhere and fixed the punctuation problem in the descriptor, and now it seems that this class cannot be found.
  • hnasgaard
    Re: TCPSource and packet parsing

    2012-04-06T19:42:05Z
    In reply to victor42 (2012-04-06T18:47:48Z):
    Have you added a use statement for the namespace containing your operator?
  • victor42
    Re: TCPSource and packet parsing

    2012-04-06T20:23:19Z
    In reply to hnasgaard (2012-04-06T19:42:05Z):
    Not sure. I've been using the 'DirectoryLister' sample as a template and didn't see anything like that there.
  • victor42
    Re: TCPSource and packet parsing

    2012-04-06T20:28:10Z
    In reply to hnasgaard (2012-04-06T19:42:05Z):
    The 'use' statement doesn't seem to change anything. Even if I put the Java operator in the same namespace as the SPL code, I see the same error.
  • victor42
    Re: TCPSource and packet parsing

    2012-04-06T20:49:45Z
    In reply to hnasgaard (2012-04-06T19:42:05Z):
    The Java operator's namespace doesn't even appear in 'toolkit.xml'.
  • mendell
    Re: TCPSource and packet parsing

    2012-04-07T01:25:19Z
    In reply to victor42 (2012-04-06T20:49:45Z):
    Is the operator defined in the right directory? It has to be in a directory with the same name as the operator model XML file.

    For example:
    foo/foo.xml
  • victor42
    Re: TCPSource and packet parsing

    2012-04-09T14:00:44Z
    In reply to mendell (2012-04-05T19:37:27Z):
    The samples are not helpful at all - there's no source code for the actual implementation.
    Should my socket listener be implemented as a thread, and if so, as a "daemon" thread?
  • SystemAdmin
    Re: TCPSource and packet parsing

    2012-04-10T16:18:48Z
    In reply to victor42 (2012-04-09T14:00:44Z):
    The javadoc for the Java operator samples, here:

    http://publib.boulder.ibm.com/infocenter/streams/v2r0/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.streams.javadoc.samples.doc%2Fdoc%2Findex.html

    states:

    Samples are contained in
    $STREAMS_INSTALL/lib/com.ibm.streams.operator.samples.jar

    com.ibm.streams.operator.samples.jar contains the source for the samples. If you include it in the build path of an Eclipse Java project, you should be able to see the Java source code of the Java samples.

    Java operators are also in this sample toolkit:

    $STREAMS_INSTALL/samples/spl/feature/JavaOperators

    This sample toolkit contains a Java primitive sample 'DirectoryLister' in the namespace 'sample'.
    Its operator model file is thus in
    sample/DirectoryLister/DirectoryLister.xml
    relative to the root of the toolkit.