Topic
4 replies Latest Post - ‏2014-08-12T17:13:03Z by RichardBrieck
RichardBrieck
RichardBrieck
17 Posts
ACCEPTED ANSWER

Pinned topic How do you change the CDC DataStage output record delimiter?

‏2014-07-28T21:58:20Z |

Hello,

I need to change the output record delimiter for CDC DataStage 10.2.  We have column data with embedded  ^M,Linefeeds.  We've successfully used the java user exist for formatting and changed field separators and column delimiters.  We see no place to handle the record delimiter in the java sample exit.  If we can't change it,  a row of data gets split in two.  My java programmer says some other DataStage code apart from the sample exit changes it and he doesn' thave access to it.  Does anyone know how to do this?

Thanks,

Richard

  • Rphilo
    Rphilo
    340 Posts
    ACCEPTED ANSWER

    Re: How do you change the CDC DataStage output record delimiter?

    ‏2014-07-29T18:57:32Z  in response to RichardBrieck

    Richard

    Do you need to retain the embedded linefeeds in the data? if not then you could have a derived column with the linefeeds stripped out. You could use he %REPLACE expression if you can find a way of expressing the linefeeds, or use a user exit in the derived expression otherwise.

    I cannot see a way of changing the record delimiter in the user exit formatter. If you need to retain the embedded linefeeds you might need to resort to replacing the embedded linefeeds with some special string in a derived column and then having logic in Datastage to replace the special string with the linefeeds at the point of the final update. An alternative to replacing the linefeeds would be to strip them out and record where they had been stripped out in a second derived column.

    Regards

    Robert

    • RichardBrieck
      RichardBrieck
      17 Posts
      ACCEPTED ANSWER

      Re: How do you change the CDC DataStage output record delimiter?

      ‏2014-07-31T18:58:08Z  in response to Rphilo

      Robert,

      In this case we don't need to keep the linefeeds.    I am using the CDC Management console.  How would I implement this?  I don't see much in the documentation about coding with these expressions and where I would enter them on the screen.

      I would like to do two things:

      1) Remove the linefeeds from the column data and replicate the edited data.

      2) Sometimes I would just like to null out the data in the column and send that.

      Our target is a data stage flat file.

       

      Thanks,

      Richard

      • Rphilo
        Rphilo
        340 Posts
        ACCEPTED ANSWER

        Re: How do you change the CDC DataStage output record delimiter?

        ‏2014-07-31T22:53:18Z  in response to RichardBrieck

        Richard

        You would create a  derived column to have an expression evaluated. As your target is DataStage you cannot use a derived expression mapped directly to a target column, so you have to use a derived column evaluated on the source. This is described in the documentation (on-line and downloaded from Passport Advantage).

        http://pic.dhe.ibm.com/infocenter/iidr/v10r2m0/index.jsp

        Here is the section on creating a derived column.

        Adding and mapping derived columns to target columns

        Derived columns let you move the processing of an expression from the target instance to the source instance.

        For example, you may have already defined an expression that concatenates the values of two source columns, FIRSTNAME and LASTNAME, and mapped this expression to a target column named called FULLNAME. When you start replication, InfoSphere® CDC evaluates the expression on the target instance and stores the results in the FULLNAME target column.

        However, it may become necessary in your environment to move the processing of this expression to the source instance. Using this same example, you can build a derived column on the source table named FULLNAME and define an expression that concatenates the values of the two source columns FIRSTNAME and LASTNAME. You can then map the derived column named FULLNAME to the target column named FULLNAME. When you start replication, InfoSphere CDC evaluates the expression on the source instance and sends the results to the target column.

        You can also create a derived column to:

        • Extract characters from string data by using functions such as %SUBSTRING, and then store the result in a derived column. For example, you can extract a person's initial from a column named FIRSTNAME, by using the expression SUBSTRING(FIRSTNAME,1,1).
        • Call a stored procedure that you have configured in a user exit program. You can specify an expression that contains a valid call to the %STPROC column manipulation function. If you are calling a stored procedure that is not owned by the InfoSphere CDC user, you must provide the name in the form <schema>.<stored procedure name>.
        • Retrieve information from a lookup table using %GETCOL. You can create a derived column on the source table using %GETCOL so that you can retrieve data from a lookup table. You can then map the source table using one-to-many consolidation.

        In each of these scenarios, you can then map the derived column to the appropriate target column. InfoSphere CDC will evaluate the expression on the source table and send the results to the target column.

        Many databases have column name length limitations, which can affect how some expressions, user exits, and derived columns are handled. Column name length limitations can cause InfoSphere CDC to describe a column alias to the target when the source column name length exceeds the column name length limit on the target. The restriction is 30 characters for most InfoSphere CDC compatible databases (Informix®, IBM® DB2® LUW, Oracle, or Sybase prior to version 15.0), with two exceptions:

        • Microsoft SQL Server - 128 characters
        • Sybase 15.0 and later - 255 characters

        Since the derived expression is evaluated each time a change is replicated to the target table, the complexity of the expression can affect overall performance.

        Here is the section on the %REPLACE expression

        Character substitution-%REPLACE

        Use this function when you want InfoSphere® CDC to replace leading, trailing, or all occurrences of a specified character with another character during replication. It provides a character-by-character replacement. You can use this function to replace leading blank characters with zeros.

        Syntax

        
        %REPLACE(
        
        <parm>
        , 
        
        "<type>"
        , 
        
        "<str1>"
        , 
        
        "<str2>"
        )
        

        Parameters

        • parm-Specifies a character column, literal, or column function that returns a character string. If parm is NULL, this function returns NULL.
        • type-Specifies the substitution type. You must enclose values of this parameter in double quotes.
          • *ALL-Replaces all occurrences of the specified character.
          • *TRAIL-Replaces all trailing occurrences of the specified character.
          • *LEAD-Replaces all leading occurrences of the specified character.
        • str1-Specifies the set of characters to be replaced.
        • str2-Specifies the set of characters to replace those specified in str1 . If the number of characters in str1 is greater than that in str2 , the extraneous characters in str1 are deleted in the result. If the number of characters in str1 is less than that in strthe extraneous characters in str2 are ignored. If you specify str2 as two consecutive double quotes ("") , you can remove instances of a character from str1.

        Result data type

        Character. Returns NULL if parm is NULL.

        Examples

        %REPLACE(CUSTNO, "*LEAD", " ", "0")

        Replaces all leading blank characters in the CUSTNO column with zeros.

        %REPLACE(CUSTNO, "*LEAD", "ABC", "123")

        Replaces all leading occurrences of "A" with "1", "B" with "2", and "C" with "3" in the CUSTNO column. Evaluation begins with the first character and continues until a character other than "A", "B" or "C" is found. For example, if a column value is "AC7777", this example returns "137777". If a column value is "ADC7777", this example returns "1DC7777".

        %REPLACE(PHONENO, "*ALL", " ", "")

        Removes all blank characters in the PHONENO column. This example illustrates how the %REPLACE function can be used to remove a character from a string instead of replacing it with another character.

        %REPLACE(PHONENO, "*LEAD", " ", "")

        Removes all leading blank characters in the PHONENO column. This example is similar to the previous example, except that only leading blank characters are removed. To remove leading blank characters, you can also use the %LTRIM function. To remove trailing blank characters, use the %RTRIM function.

        %REPLACE(PARTNO, "*TRAIL", "ACY", "acy")

        Replaces all trailing occurrences of "A" with "a", "C" with "c", and "Y" with "y" in the PARTNO column. Evaluation begins with the last character and continues until a character other than "A", "C" or "Y" is found. For example, if a column value is "2361ACY", this example returns "2361acy". If a column value is "2361ADY", this example returns "2361ADy".

        %REPLACE("259899", "*ALL", "29", "4")

        Returns "458". Replaces all occurrences of "2" with "4", and removes all occurrences of "9".

        %REPLACE("259899", "*ALL", "2", "45")

        Returns "459899". Replaces all occurrences of "2" with "4". Does not remove occurrences of "5".

        %REPLACE(PRODDESC, "*ALL", "<<0>><<13>>", "<<32>><<32>>")

        On platform that use the ASCII character set, this functions replaces all occurrences of NUL and carriage return in the PRODDESC column with blank characters.

        -----------------------------------------------

        As you can see there is a very useful example for stripping out the character you want to suppress and have the other characters replicated.

        Converting the whole column to null if it contains an embedded linefeed is more tricky as there is not an expression to test for the occurrence of a given character. You would need to define a user exit to do this, called in an expression with %USERFUNC for example.

        Regards

        Robert

         

         

         

         

         

         

         

        • RichardBrieck
          RichardBrieck
          17 Posts
          ACCEPTED ANSWER

          Re: How do you change the CDC DataStage output record delimiter?

          ‏2014-08-12T17:13:03Z  in response to Rphilo

          Thanks Robert.  The solution using a derived column and the REPLACE function did  fine.

          I used the expression

          %REPLACE(DESC_TEXT,"*ALL","<<10>>"," ")

          to remove the linefeeds.

           

          Richard