Topic: question about language
  • 8 replies
  • Latest Post - 2013-08-28T01:23:50Z by liqing063

liqing063
5 Posts

2013-08-19T07:52:33Z

Hi all,

I am new to InfoSphere Streams. I have been assigned a new project and have run into a problem.

I wonder whether Streams can read a csv file containing Chinese text. If so, which string type should I use: rstring, ustring, or something else?

Thank you so much for your help!

  • BruceGlassford
    71 Posts

    Re: question about language

    2013-08-19T08:47:44Z

    It depends on the character set encoding of the source file. For UTF-8, use rstring; for UTF-16, use ustring.
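    To see why the choice matters, this small Python sketch (Python rather than SPL, used only to show byte layouts) encodes the same Chinese text both ways. The UTF-8 form corresponds to what an rstring holds and the UTF-16 form to a ustring; the helper name encoding_sizes is made up for this illustration.

```python
def encoding_sizes(text: str) -> tuple[int, int, int]:
    """Return (characters, UTF-8 bytes, UTF-16 code units) for text."""
    chars = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    # utf-16-le adds no byte-order mark; each UTF-16 code unit is 2 bytes
    utf16_units = len(text.encode("utf-16-le")) // 2
    return chars, utf8_bytes, utf16_units


chars, u8, u16 = encoding_sizes("你好,世界")
print(f"{chars} characters, {u8} UTF-8 bytes, {u16} UTF-16 code units")
# prints: 5 characters, 13 UTF-8 bytes, 5 UTF-16 code units
```

    Each of these Chinese characters takes three bytes in UTF-8 but a single code unit in UTF-16, so a byte count and a character count can differ considerably for Chinese text.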

  • liqing063
    5 Posts

    Re: question about language

    2013-08-20T05:19:30Z

    Thanks for your response!

    When I used rstring for Chinese characters, an alert was shown. Does rstring have a limit on the number of characters?

  • DanDebrunner
    6 Posts

    Re: question about language

    2013-08-20T20:38:49Z

    Make sure you set the encoding parameter of the FileSource operator to match the encoding of your source file.

    http://pic.dhe.ibm.com/infocenter/streams/v3r1/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.streams.spl-standard-toolkit-reference.doc%2Fdoc%2Ffilesource.html

    >When I used rstring for Chinese characters, an alert was shown.

    Can you explain in more detail what you were trying to do and what alert you saw?
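    Dan's point about the encoding parameter can be illustrated outside Streams. The Python sketch below is an analogy, not FileSource itself: it assumes a hypothetical csv file saved in GBK (a common Chinese encoding; the thread never says which encoding the file actually used) and shows that decoding with the matching codec recovers the text, while reading the same bytes as UTF-8 fails. decode_line is a made-up helper.

```python
def decode_line(data: bytes, encoding: str) -> str:
    """Decode raw file bytes with the named encoding (a rough analogue of
    setting FileSource's encoding parameter)."""
    return data.decode(encoding)


# Hypothetical file contents: one csv line saved in GBK.
raw = "你好,世界\n".encode("gbk")

# Matching encoding: the Chinese fields come back intact.
fields = decode_line(raw, "gbk").strip().split(",")
print(fields)  # ['你好', '世界']

# Mismatched encoding: these GBK bytes are not valid UTF-8, so decoding fails.
try:
    decode_line(raw, "utf-8")
except UnicodeDecodeError as err:
    print("decode error:", err.reason)
```

    In Streams terms, the equivalent fix is to set the FileSource encoding parameter to whatever encoding the csv file was actually saved in.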

  • liqing063
    5 Posts

    Re: question about language

    2013-08-23T08:13:56Z

    Hi Dan,

    I input a set of Chinese characters as a string and set the tuple attribute type to rstring, ustring, or rstring[100], but none of these types could read the Chinese characters.

    Is some other setup needed to read these characters?

  • DanDebrunner
    6 Posts

    Re: question about language

    2013-08-23T14:04:49Z

    Can you provide the SPL code you are running and explain exactly what you are doing?

    When you say you "input a set of Chinese characters as a string", I'm not sure what you are doing.

      Are you using a FileSource with the Chinese characters in a csv file? If so, you need to ensure that the encoding parameter of the FileSource operator matches the encoding of the csv file; the rstring attribute will then hold a sequence of UTF-8 bytes.

      Are you trying to put an SPL string literal in a source file? If so, there are two ways to create a string literal with Chinese characters. The first is to use an editor that supports Chinese characters and make sure the source file is saved with UTF-8 encoding. The second is to use Unicode escape sequences in string literals:

    String literals can use escape sequences of the form \uhhhh, where the four hexadecimal digits hhhh specify a character. For example, "pi\u00f1ata"u uses the escape \u00f1 to specify a ñ with a tilde on top in a ustring.

    http://pic.dhe.ibm.com/infocenter/streams/v3r1/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.streams.spl-language-specification.doc%2Fdoc%2Fprimitivetypes.html
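    Python string literals accept the same \uhhhh escape form, so the sketch below (Python, for illustration only) checks the documentation's piñata example and spells Chinese characters as escapes. The helper to_u_escapes is hypothetical and handles only characters in the Basic Multilingual Plane, which includes common Chinese characters.

```python
def to_u_escapes(text: str) -> str:
    """Spell each character as a \\uhhhh escape (BMP characters only)."""
    return "".join(f"\\u{ord(ch):04x}" for ch in text)


# \u00f1 is ñ, as in the documentation's "pi\u00f1ata"u example.
print("pi\u00f1ata")         # piñata
print("\u4f60\u597d")        # 你好
print(to_u_escapes("世界"))  # \u4e16\u754c
```

    By analogy with the documented "pi\u00f1ata"u example, "\u4f60\u597d"u should denote 你好 as an SPL ustring literal; this follows the documented escape rule rather than a tested SPL program.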

    "none of these types could read the Chinese characters"

    What do you mean by "read" here? Are you getting an error?

  • Kevin_Foster
    98 Posts

    Re: question about language

    2013-08-23T21:18:43Z

    As an example, this program works fine for me:

    namespace com.mycompany.projectA ;

    composite Main
    {
        graph
            // Read each line of input.csv as a tuple with two ustring attributes
            (stream<ustring firstAttribute, ustring secondAttribute> SomeData)
                as FileSource_1 = FileSource()
            {
                param
                    file : "input.csv" ;
                    format : csv ;
            }

            // Write each tuple back out, quoting the string attributes
            () as FileSink_2 = FileSink(SomeData)
            {
                param
                    file : "output.dat" ;
                    quoteStrings : true;
                    flush : 1u;   // flush the output file after every tuple
            }

    }

    takes this input file:

    hello, world

    你好,世界

    and produces this output as expected:

    "hello","world"

    "你好","世界"
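    For anyone who wants to sanity-check that expected output without a Streams instance, here is a rough Python analogue of the FileSource → FileSink pipeline. It is not how Streams works internally: skipinitialspace=True is an assumption chosen to reproduce the dropped space in "hello, world", and requote_csv is a made-up name.

```python
import csv
import io


def requote_csv(text: str) -> str:
    """Parse csv text and write it back with every field quoted, roughly
    mirroring FileSource(format: csv) -> FileSink(quoteStrings: true)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines, which csv.reader yields as []
            writer.writerow(row)
    return out.getvalue()


sample = "hello, world\n你好,世界\n"
print(requote_csv(sample), end="")
# prints:
# "hello","world"
# "你好","世界"
```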
     

    -Kevin

  • liqing063
    5 Posts

    Re: question about language

    2013-08-28T01:23:50Z

    Hi all,

    Thanks for all of your help!

    I have set up my project like Kevin's, and it works!

    Best wishes!