String functions

  • The target variable for each string function must be a string and must have already been declared (see STRING).
  • Multiple arguments in a list must be separated by commas.
  • When two strings are compared, the case in which they are entered is significant. The LOWER and UPCASE functions are useful for making comparisons of strings regardless of case.
  • String functions that include a byte position or count argument or return a byte position or count may return different results in Unicode mode than in code page mode. For example, é is one byte in code page mode but is two bytes in Unicode mode; so résumé is six bytes in code page mode and eight bytes in Unicode mode.
  • In Unicode mode, trailing blanks are always removed from the values of string variables in string functions unless explicitly preserved with the NTRIM function.
  • In code page mode, trailing blanks are always preserved in the values of string variables unless explicitly removed with the RTRIM function.

For more information on Unicode mode, see UNICODE Subcommand.

CHAR.INDEX. CHAR.INDEX(haystack, needle[, divisor]). Numeric. Returns a number indicating the character position of the first occurrence of needle in haystack. The optional third argument, divisor, is a number of characters used to divide needle into separate strings. Each substring is used for searching and the function returns the first occurrence of any of the substrings. For example, CHAR.INDEX(var1, 'abcd') will return the value of the starting position of the complete string "abcd" in the string variable var1; CHAR.INDEX(var1, 'abcd', 1) will return the value of the position of the first occurrence of any of the values in the string; and CHAR.INDEX(var1, 'abcd', 2) will return the value of the first occurrence of either "ab" or "cd". Divisor must be a positive integer and must divide evenly into the length of needle. Returns 0 if needle does not occur within haystack.

CHAR.LENGTH. CHAR.LENGTH(strexpr). Numeric. Returns the length of strexpr in characters, with any trailing blanks removed.

CHAR.LPAD. CHAR.LPAD(strexpr1,length[,strexpr2]). String. Left-pads strexpr1 to make its length the value specified by length using as many complete copies as will fit of strexpr2 as the padding string. The value of length represents the number of characters and must be a positive integer. If the optional argument strexpr2 is omitted, the value is padded with blank spaces.

CHAR.MBLEN. CHAR.MBLEN(strexpr,pos). Numeric. Returns the number of bytes in the character at character position pos of strexpr.

CHAR.RINDEX. CHAR.RINDEX(haystack,needle[,divisor]). Numeric. Returns an integer that indicates the starting character position of the last occurrence of the string needle in the string haystack. The optional third argument, divisor, is the number of characters used to divide needle into separate strings. For example, CHAR.RINDEX(var1, 'abcd') will return the starting position of the last occurrence of the entire string "abcd" in the variable var1; CHAR.RINDEX(var1, 'abcd', 1) will return the value of the position of the last occurrence of any of the values in the string; and CHAR.RINDEX(var1, 'abcd', 2) will return the value of the starting position of the last occurrence of either "ab" or "cd". Divisor must be a positive integer and must divide evenly into the length of needle. If needle is not found, the value 0 is returned.

CHAR.RPAD. CHAR.RPAD(strexpr1,length[,strexpr2]). String. Right-pads strexpr1 with strexpr2 to extend it to the length given by length using as many complete copies as will fit of strexpr2 as the padding string. The value of length represents the number of characters and must be a positive integer. The optional third argument strexpr2 is a quoted string or an expression that resolves to a string. If strepxr2 is omitted, the value is padded with blanks.

CHAR.SUBSTR. CHAR.SUBSTR(strexpr,pos[,length]). String. Returns the substring beginning at character position pos of strexpr. The optional third argument represents the number of characters in the substring. If the optional argument length is omitted, returns the substring beginning at character position pos of strexpr and running to the end of strexpr. For example CHAR.SUBSTR('abcd', 2) returns 'bcd' and CHAR.SUBSTR('abcd', 2, 2) returns 'bc'. (Note: Use the SUBSTR function instead of CHAR.SUBSTR if you want to use the function on the left side of an equals sign to replace a substring.)

CONCAT. CONCAT(strexpr,strexpr[,..]). String. Returns a string that is the concatenation of all its arguments, which must evaluate to strings. This function requires two or more arguments. In code page mode, if strexpr is a string variable, use RTRIM if you only want the actual string value without the right-padding to the defined variable width. For example, CONCAT(RTRIM(stringvar1), RTRIM(stringvar2)).

LENGTH. LENGTH(strexpr). Numeric. Returns the length of strexpr in bytes, which must be a string expression. For string variables, in Unicode mode this is the number of bytes in each value, excluding trailing blanks, but in code page mode this is the defined variable length, including trailing blanks. To get the length (in bytes) without trailing blanks in code page mode, use LENGTH(RTRIM(strexpr)).

LOWER. LOWER(strexpr). String. Returns strexpr with uppercase letters changed to lowercase and other characters unchanged. The argument can be a string variable or a value. For example, LOWER(name1) returns charles if the value of name1 is Charles.

LTRIM. LTRIM(strexpr[,char]). String. Returns strexpr with any leading instances of char removed. If char is not specified, leading blanks are removed. Char must resolve to a single character.

MAX. MAX(value,value[,..]). Numeric or string. Returns the maximum value of its arguments that have valid values. This function requires two or more arguments. For numeric values, you can specify a minimum number of valid arguments for this function to be evaluated.

MIN. MIN(value,value[,..]). Numeric or string. Returns the minimum value of its arguments that have valid, nonmissing values. This function requires two or more arguments. For numeric values, you can specify a minimum number of valid arguments for this function to be evaluated.

MBLEN.BYTE. MBLEN.BYTE(strexpr,pos). Numeric. Returns the number of bytes in the character at byte position pos of strexpr.

NORMALIZE. NORMALIZE(strexp). String. Returns the normalized version of strexp. In Unicode mode, it returns Unicode NFC. In code page mode, it has no effect and returns strexp unmodified. The length of the result may be different from the length of the input.

NTRIM. NTRIM(varname). Returns the value of varname, without removing trailing blanks. The value of varname must be a variable name; it cannot be an expression.

REPLACE. REPLACE(a1, a2, a3[, a4]). String. In a1, instances of a2 are replaced with a3. The optional argument a4 specifies the number of occurrences to replace; if a4 is omitted, all occurrences are replaced. Arguments a1, a2, and a3 must resolve to string values (literal strings enclosed in quotes or string variables), and the optional argument a4 must resolve to a non-negative integer. For example, REPLACE("abcabc", "a", "x") returns a value of "xbcxbc" and REPLACE("abcabc", "a", "x", 1) returns a value of "xbcabc".

RTRIM. RTRIM(strexpr[,char]). String. Trims trailing instances of char within strexpr. The optional second argument char is a single quoted character or an expression that yields a single character. If char is omitted, trailing blanks are trimmed.

STRUNC. STRUNC(strexp, length). String. Returns strexp truncated to length (in bytes) and then trimmed of any trailing blanks. Truncation removes any fragment of a character that would be truncated.

UPCASE. UPCASE(strexpr). String. Returns strexpr with lowercase letters changed to uppercase and other characters unchanged.

Deprecated string functions

The following functions provide functionality similar to the newer CHAR functions, but they operate at the byte level rather than the character level. For example, the INDEX function returns the byte position of needle within haystack, whereas CHAR.INDEX returns the character position. These functions are supported primarily for compatibility with previous releases.

INDEX. INDEX(haystack,needle[,divisor]). Numeric. Returns a number that indicates the byte position of the first occurrence of needle in haystack. The optional third argument, divisor, is a number of bytes used to divide needle into separate strings. Each substring is used for searching and the function returns the first occurrence of any of the substrings. Divisor must be a positive integer and must divide evenly into the length of needle. Returns 0 if needle does not occur within haystack.

LPAD. LPAD(strexpr1,length[,strexpr2]). String. Left-pads strexpr1 to make its length the value specified by length using as many complete copies as will fit of strexpr2 as the padding string. The value of length represents the number of bytes and must be a positive integer. If the optional argument strexpr2 is omitted, the value is padded with blank spaces.

RINDEX. RINDEX(haystack,needle[,divisor]). Numeric. Returns an integer that indicates the starting byte position of the last occurrence of the string needle in the string haystack. The optional third argument, divisor, is the number of bytes used to divide needle into separate strings. Divisor must be a positive integer and must divide evenly into the length of needle. If needle is not found, the value 0 is returned.

RPAD. RPAD(strexpr1,length[,strexpr2]). String. Right-pads strexpr1 with strexpr2 to extend it to the length given by length using as many complete copies as will fit of strexpr2 as the padding string. The value of length represents the number of bytes and must be a positive integer. The optional third argument strexpr2 is a quoted string or an expression that resolves to a string. If strepxr2 is omitted, the value is padded with blanks.

SUBSTR. SUBSTR(strexpr,pos[,length]). String. Returns the substring beginning at byte position pos of strexpr. The optional third argument represents the number of bytes in the substring. If the optional argument length is omitted, returns the substring beginning at byte position pos of strexpr and running to the end of strexpr. When used on the left side of an equals sign, the substring is replaced by the string specified on the right side of the equals sign. The rest of the original string remains intact. For example, SUBSTR(ALPHA6,3,1)='*' changes the third character of all values for ALPHA6 to *. If the replacement string is longer or shorter than the substring, the replacement is truncated or padded with blanks on the right to an equal length.

Example

STRING stringVar1 stringVar2 stringVar3 (A22).
COMPUTE stringVar1='  Does this'.
COMPUTE stringVar2='ting work?'.
COMPUTE stringVar3=
  CONCAT(RTRIM(LTRIM(stringVar1)), " ", 
  REPLACE(stringVar2, "ting", "thing")).
  • The CONCAT function concatenates the values of stringVar1 and stringVar2, inserting a space as a literal string (" ") between them.
  • The RTRIM function strips off trailing blanks from stringVar1. In code page mode, this is necessary to eliminate excessive space between the two concatenated string values because in code page mode all string variable values are automatically right-padded to the defined width of the string variables. In Unicode mode, this has no effect because trailing blanks are automatically removed from string variable values in Unicode mode.
  • The LTRIM function removes the leading spaces from the beginning of the value of stringVar1.
  • The REPLACE function replaces the misspelled "ting" with "thing" in stringVar2.

The final result is a string value of "Does this thing work?"

Example

This example extracts the numeric components from a string telephone number into three numeric variables.

DATA LIST FREE (",") /telephone (A16).
BEGIN DATA
111-222-3333
222 - 333 - 4444
333-444-5555
444 - 555-6666
555-666-0707
END DATA.
STRING #telstr(A16).
COMPUTE #telstr = telephone.
VECTOR tel(3,f4).
LOOP #i = 1 to 2.
- COMPUTE #dash = CHAR.INDEX(#telstr,"-").
- COMPUTE tel(#i) = NUMBER(CHAR.SUBSTR(#telstr,1,#dash-1),f10).
- COMPUTE #telstr = CHAR.SUBSTR(#telstr,#dash+1).
END LOOP.
COMPUTE tel(3) = NUMBER(#telstr,f10).
EXECUTE.
FORMATS tel1 tel2 (N3) tel3 (N4).
  • A temporary (scratch) string variable, #telstr, is declared and set to the value of the original string telephone number.
  • The VECTOR command creates three numeric variables--tel1, tel2, and tel3--and creates a vector containing those variables.
  • The LOOP structure iterates twice to produce the values for tel1 and tel2.
  • COMPUTE #dash = CHAR.INDEX(#telstr,"-") creates another temporary variable, #dash, that contains the position of the first dash in the string value.
  • On the first iteration, COMPUTE tel(#i) = NUMBER(CHAR.SUBSTR(#telstr,1,#dash-1),f10) extracts everything prior to the first dash, converts it to a number, and sets tel1 to that value.
  • COMPUTE #telstr = CHAR.SUBSTR(#telstr,#dash+1) then sets #telstr to the remaining portion of the string value after the first dash.
  • On the second iteration, COMPUTE #dash... sets #dash to the position of the “first” dash in the modified value of #telstr. Since the area code and the original first dash have been removed from #telstr, this is the position of the dash between the exchange and the number.
  • COMPUTE tel(#)... sets tel2 to the numeric value of everything up to the “first” dash in the modified version of #telstr, which is everything after the first dash and before the second dash in the original string value.
  • COMPUTE #telstr... then sets #telstr to the remaining segment of the string value--everything after the “first” dash in the modified value, which is everything after the second dash in the original value.
  • After the two loop iterations are complete, COMPUTE tel(3) = NUMBER(#telstr,f10) sets tel3 to the numeric value of the final segment of the original string value.