C++ Native Functions: spl.string

This page documents native functions that can be invoked from SPL, including the SPL interfaces that can be used to invoke each of the native functions.

Functions

[N] public void appendM (mutable rstring[N] str, rstring value)

Append an rstring to a bounded rstring.

Parameters
str

The bounded rstring to be appended to.

value

The rstring to be appended.

[N,M] public void appendM (mutable rstring[N] str, rstring[M] value)

Append a bounded rstring to a bounded rstring.

Parameters
str

The bounded rstring to be appended to.

value

The bounded rstring to be appended.

public rstring concat (rstring lstr, rstring rstr)

Concatenate two rstring strings, returning an rstring.

Parameters
lstr

Input rstring.

rstr

Input rstring.

Returns

The concatenation of lstr and rstr, with lstr first.

[N] public rstring concat (rstring lstr, rstring[N] rstr)

Concatenate an rstring with a bounded rstring, returning an rstring.

Parameters
lstr

Input rstring.

rstr

Input bounded rstring.

Returns

The concatenation of lstr and rstr, with lstr first.

[N] public rstring concat (rstring[N] lstr, rstring rstr)

Concatenate a bounded rstring with an rstring, returning an rstring.

Parameters
lstr

Input bounded rstring.

rstr

Input rstring.

Returns

The concatenation of lstr and rstr, with lstr first.

[N,M] public rstring concat (rstring[N] lstr, rstring[M] rstr)

Concatenate two bounded rstring strings, returning an rstring.

Parameters
lstr

Input bounded rstring.

rstr

Input bounded rstring.

Returns

The concatenation of lstr and rstr, with lstr first.

public rstring convertFromBlob (blob b)

Convert a blob to an rstring.

Parameters
b

The blob to be converted.

Returns

An rstring with the characters from the blob.

public list<uint8> convertFromUtf8 (rstring str, rstring enc)

Convert a UTF-8 encoded string into data as raw bytes with a specified encoding.

Parameters
str

Input string (UTF-8 encoded).

enc

Data encoding scheme (e.g., "UTF-16") for the output.

Returns

Data as raw bytes with the specified encoding.

public rstring convertFromUtf8ToString (rstring str, rstring enc)

Convert a UTF-8 encoded string into a string with a specified encoding.

Parameters
str

Input string (UTF-8 encoded).

enc

Data encoding scheme (e.g., "UTF-16") for the output.

Returns

Data as an rstring with the specified encoding.

public blob convertToBlob (rstring str)

Convert an rstring to a blob.

Parameters
str

String to be converted.

Returns

A blob with the characters from the rstring.

public rstring convertToUtf8 (list<uint8> val, rstring enc)

Convert data given as raw bytes with a specified encoding into a UTF-8 encoded string.

Parameters
val

Input data.

enc

Input data encoding scheme (e.g., "UTF-16").

Returns

The UTF-8 encoded string representation.

[N] public rstring convertToUtf8 (list<uint8>[N] val, rstring enc)

Convert data given as raw bytes with a specified encoding into a UTF-8 encoded string.

Parameters
val

Input data.

enc

Input data encoding scheme (e.g., "UTF-16").

Returns

The UTF-8 encoded string representation.

public rstring convertToUtf8 (rstring val, rstring enc)

Convert data given as the raw bytes of an rstring with a specified encoding into a UTF-8 encoded string.

Parameters
val

Input data.

enc

Input data encoding scheme (e.g., "UTF-16").

Returns

The UTF-8 encoded string representation.

public list<rstring> csvTokenize (rstring str)

Tokenize a string containing comma-separated values.

Parameters
str

Input string in comma-separated values format, like "string1","string2","string3",...,"stringN".

Returns

A list of the tokens in the string, where the returned tokens are string literals (use interpretRStringLiteral() or interpretUStringLiteral() to interpret them).

public list<ustring> csvTokenize (ustring str)

Tokenize a string containing comma-separated values.

Parameters
str

Input string in comma-separated values format, like "string1","string2","string3",...,"stringN".

Returns

A list of the tokens in the string, where the returned tokens are string literals (use interpretRStringLiteral() or interpretUStringLiteral() to interpret them).

<string T> public int32 findFirst (T str, T ps)

Find a substring in a string, starting at index 0, where 0 is the first logical character in the input string.

Parameters
str

Input string.

ps

Substring to be found.

Returns

Index of the first match, -1 if no match.

<string T> public int32 findFirst (T str, T ps, int32 sidx)

Find a substring in a string.

Parameters
str

Input string.

ps

Substring to be found.

sidx

Start index of the search, where 0 is the first logical character in the input string.

Returns

Index of the first match, -1 if no match.
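
The return convention above (a 0-based index, or -1 when nothing is found) can be illustrated with a small C++ sketch built on std::string::find. The function name sketchFindFirst is hypothetical and is not part of the SPL runtime; the sketch only models byte strings and assumes a non-negative start index.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for findFirst(str, ps, sidx) on byte strings.
// Returns the index of the first match at or after sidx, or -1 if none.
std::int32_t sketchFindFirst(const std::string& str, const std::string& ps,
                             std::int32_t sidx = 0) {
    assert(sidx >= 0);  // assumption: negative start indexes are not modeled
    const std::size_t pos = str.find(ps, static_cast<std::size_t>(sidx));
    return pos == std::string::npos ? -1 : static_cast<std::int32_t>(pos);
}
```

With this sketch, searching "hello world" for "o" gives 4, and the same search starting at index 5 gives 7.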

<string T> public int32 findFirstNotOf (T str, T ps)

Find a non-matching character in a string, starting at index 0, where 0 is the first logical character in the input string.

Parameters
str

Input string.

ps

List of characters to look for.

Returns

Index of the first non-match, -1 if all match.

<string T> public int32 findFirstNotOf (T str, T ps, int32 sidx)

Find a non-matching character in a string.

Parameters
str

Input string.

ps

List of characters to look for.

sidx

Start index of the search, where 0 is the first logical character in the input string.

Returns

Index of the first non-match, -1 if all match.
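
As a rough illustration of the "first non-match" semantics, the following C++ sketch uses std::string::find_first_not_of. The name sketchFindFirstNotOf is hypothetical, and the sketch assumes byte strings and a non-negative start index.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for findFirstNotOf(str, ps, sidx).
// Returns the index of the first character at or after sidx that is
// NOT contained in ps, or -1 if every remaining character is in ps.
std::int32_t sketchFindFirstNotOf(const std::string& str, const std::string& ps,
                                  std::int32_t sidx = 0) {
    assert(sidx >= 0);  // assumption: negative start indexes are not modeled
    const std::size_t pos = str.find_first_not_of(ps, static_cast<std::size_t>(sidx));
    return pos == std::string::npos ? -1 : static_cast<std::int32_t>(pos);
}
```

A typical use is locating the first non-blank character: for "   abc" and the character list " ", the sketch returns 3.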

<string T> public int32 findFirstOf (T str, T ps)

Find a matching character in a string, starting at index 0, where 0 is the first logical character in the input string.

Parameters
str

Input string.

ps

List of characters to look for.

Returns

Index of the first match, -1 if no match.

<string T> public int32 findFirstOf (T str, T ps, int32 sidx)

Find a matching character in a string.

Parameters
str

Input string.

ps

List of characters to look for.

sidx

Start index of the search, where 0 is the first logical character in the input string.

Returns

Index of the first match, -1 if no match.
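
The character-list matching described here can be approximated in C++ with std::string::find_first_of. The name sketchFindFirstOf is hypothetical; the sketch models byte strings only and assumes a non-negative start index.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for findFirstOf(str, ps, sidx).
// Returns the index of the first character at or after sidx that is
// contained in ps, or -1 if there is none.
std::int32_t sketchFindFirstOf(const std::string& str, const std::string& ps,
                               std::int32_t sidx = 0) {
    assert(sidx >= 0);  // assumption: negative start indexes are not modeled
    const std::size_t pos = str.find_first_of(ps, static_cast<std::size_t>(sidx));
    return pos == std::string::npos ? -1 : static_cast<std::int32_t>(pos);
}
```

For the input "a=1;b=2" and the character list ";=", the sketch returns 1 when searching from the start and 3 when searching from index 2.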

<string T> public int32 findLast (T str, T ps, int32 sidx)

Find the last occurrence of a substring in a string.

Parameters
str

Input string.

ps

Substring to be found.

sidx

Index of the last character to be considered, where 0 is the first logical character in the input string.

Returns

Index of the last match, -1 if no match.
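
A backward search of this kind can be sketched in C++ with std::string::rfind. The name sketchFindLast is hypothetical. Note one assumption: this page does not say whether a multi-character match must merely begin at or before sidx or must also end there; the sketch follows std::string::rfind, where the match must begin at or before sidx.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for findLast(str, ps, sidx) on byte strings.
// Returns the index of the last match that begins at or before sidx,
// or -1 if there is none (std::string::rfind semantics, an assumption).
std::int32_t sketchFindLast(const std::string& str, const std::string& ps,
                            std::int32_t sidx) {
    assert(sidx >= 0);  // assumption: negative indexes are not modeled
    const std::size_t pos = str.rfind(ps, static_cast<std::size_t>(sidx));
    return pos == std::string::npos ? -1 : static_cast<std::int32_t>(pos);
}
```

For "a.b.c" and ".", the sketch returns 3 with sidx set to 4, and 1 with sidx set to 2.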

<string T> public int32 findLastNotOf (T str, T ps, int32 sidx)

Find the last non-matching character in a string.

Parameters
str

Input string.

ps

List of characters to look for.

sidx

Index of the last character to be considered, where 0 is the first logical character in the input string.

Returns

Index of the last non-match, -1 if all match.

<string T> public int32 findLastOf (T str, T ps, int32 sidx)

Find the last matching character in a string.

Parameters
str

Input string.

ps

List of characters to look for.

sidx

Index of the last character to be considered, where 0 is the first logical character in the input string.

Returns

Index of the last match, -1 if no match.

public rstring interpretRStringLiteral (rstring str)

Interpret a string literal stored in a UTF-8 encoded string as an rstring.

Parameters
str

String to be interpreted.

Returns

Interpreted string, with escaped characters replaced with their raw values.

Throws
SPLRuntimeInvalidArgumentException

If the string is not enclosed within double quotes, or if the encoding has errors.

public ustring interpretUStringLiteral (rstring str)

Interpret a string literal stored in a UTF-8 encoded string as a ustring.

Parameters
str

String to be interpreted.

Returns

Interpreted string, with escaped characters replaced with their raw values.

Throws
SPLRuntimeInvalidArgumentException

If str is not enclosed within double quotes, or if the encoding has errors.

<string T> public int32 length (T str)

Get the length of a string (the number of raw bytes).

Parameters
str

Input string.

Returns

Length of the input string.
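
Because the length is counted in raw bytes, an rstring that holds multi-byte UTF-8 text can report a length larger than its number of visible characters. The C++ sketch below contrasts the two counts; byteLength and codePointCount are hypothetical helper names, and the code point count assumes well-formed UTF-8 input.

```cpp
#include <cstdint>
#include <string>

// Number of raw bytes, which is what the entry above describes.
std::int32_t byteLength(const std::string& s) {
    return static_cast<std::int32_t>(s.size());
}

// Number of UTF-8 code points: count every byte that is not a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
std::int32_t codePointCount(const std::string& utf8) {
    std::int32_t n = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the UTF-8 bytes of "café" (written in C++ as "caf\xC3\xA9"), byteLength gives 5 while codePointCount gives 4.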

<string T> public T lower (T str)

Convert a string to lowercase.

Parameters
str

Input string.

Returns

Input string in lowercase.

<string T> public T ltrim (T str, T t)

Remove the leading characters from a string.

Parameters
str

Input string.

t

Characters to be removed from the left side of the input string.

Returns

Input string with leading characters in t removed.
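
The parameter t is a set of characters, not a prefix string: every leading character that appears anywhere in t is dropped. A minimal C++ sketch of that behavior, using the hypothetical name sketchLtrim and modeling byte strings only:

```cpp
#include <string>

// Hypothetical stand-in for ltrim(str, t).
// Drops leading characters for as long as they appear in the set t.
std::string sketchLtrim(const std::string& str, const std::string& t) {
    const std::size_t first = str.find_first_not_of(t);
    // If every character is in t (or str is empty), nothing is left.
    return first == std::string::npos ? std::string() : str.substr(first);
}
```

With t set to " \t", the input "  \tabc " becomes "abc "; the trailing space is kept because only the left side is trimmed.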

public rstring makeRStringLiteral (rstring str)

Make a string literal from an rstring.

Parameters
str

The rstring to be converted into a string literal.

Returns

A string literal, enclosed in double quotes, with non-ASCII characters escaped, stored as a UTF-8 encoded string.

public rstring makeUStringLiteral (ustring str)

Make a string literal from a ustring.

Parameters
str

The ustring to be converted into a string literal.

Returns

A string literal, enclosed in double quotes, with non-ASCII characters escaped, stored as a UTF-8 encoded string.

public list<rstring> regexMatch (rstring str, rstring patt)

Match a string with a regular expression, using POSIX Extended Regular Expressions (ERE), including sub-expression (capture group) matching using parentheses "()".

Note: ERE matching is used by "grep -E", and has less functionality than Perl regular expressions. This function is currently implemented using the system C++ library for rstring and the ICU library for ustring.

Parameters
str

Input string.

patt

Search pattern.

Returns

List of matches, consisting of the whole matched string, if any, followed by any sub-expression matches, in order.

Throws
SPLRuntimeInvalidArgumentException

If the search pattern is an invalid regular expression.
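
The shape of the returned list (whole match first, then each capture group in order) can be illustrated with std::regex in its POSIX extended mode. This is only a sketch under assumptions: sketchRegexMatch is a hypothetical name, the sketch searches for a match anywhere in the input (this page does not state whether the whole string must match), and an invalid pattern surfaces as std::regex_error rather than SPLRuntimeInvalidArgumentException.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for regexMatch(str, patt) on byte strings.
// Returns {whole match, group 1, group 2, ...}, or an empty list
// when the pattern does not match.
std::vector<std::string> sketchRegexMatch(const std::string& str,
                                          const std::string& patt) {
    // std::regex::extended selects POSIX ERE grammar; the constructor
    // throws std::regex_error if patt is not a valid expression.
    const std::regex re(patt, std::regex::extended);
    std::smatch m;
    std::vector<std::string> out;
    if (std::regex_search(str, m, re)) {
        for (const auto& sub : m) out.push_back(sub.str());
    }
    return out;
}
```

Matching "key=value" against "([a-z]+)=([a-z]+)" yields three entries in this sketch: "key=value", "key", and "value".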

public list<ustring> regexMatch (ustring str, ustring patt)

Match a string with a regular expression, using POSIX Extended Regular Expressions (ERE), including sub-expression (capture group) matching using parentheses "()".

Note: ERE matching is used by "grep -E", and has less functionality than Perl regular expressions. This function is currently implemented using the system C++ library for rstring and the ICU library for ustring.

Parameters
str

Input string.

patt

Search pattern.

Returns

List of matches, consisting of the whole matched string, if any, followed by any sub-expression matches, in order.

Throws
SPLRuntimeInvalidArgumentException

If the search pattern is an invalid regular expression.

public list<rstring> regexMatchPerl (rstring str, rstring patt)

Match a string with a regular expression, using a Perl regular expression, including sub-expression (capture group) matching using parentheses "()".

Note: Perl regular expression matching is used by "grep -P", and has more functionality than POSIX Extended Regular Expressions. This function is currently implemented using the Boost C++ library.

Parameters
str

Input string.

patt

Search pattern (using Boost Perl regular expression syntax).

Returns

List of matches, consisting of the whole matched string, if any, followed by any sub-expression matches, in order.

Throws
SPLRuntimeInvalidArgumentException

If the search pattern is an invalid regular expression.

public list<ustring> regexMatchPerl (ustring str, ustring patt)

Match a string with a regular expression, using a Perl regular expression, including sub-expression (capture group) matching using parentheses "()".

Note: Perl regular expression matching is used by "grep -P", and has more functionality than POSIX Extended Regular Expressions. This function is currently implemented using the Boost C++ library.

Parameters
str

Input string.

patt

Search pattern (using Boost Perl regular expression syntax).

Returns

List of matches, consisting of the whole matched string, if any, followed by any sub-expression matches, in order.

Throws
SPLRuntimeInvalidArgumentException

If the search pattern is an invalid regular expression.

public rstring regexReplace (rstring str, rstring searchPatt, rstring substPatt, boolean global)

Match and replace a string with a regular expression, using POSIX Extended Regular Expressions (ERE), including sub-expression (capture group) matching using parentheses "()".

Note: ERE matching is used by "grep -E", and has less functionality than Perl regular expressions. Match and replace is similar to using "sed". This function is currently implemented using the system C++ library for rstring and the ICU library for ustring.

Parameters
str

Input string.

searchPatt

Search pattern.

substPatt

Replacement pattern (for the rstring version, matched sub-expressions can be specified through "\n", as in "\1"; for the ustring version, "$n", as in "$1", may be used).

global

Whether replacement is to be done globally.

Returns

Resulting string with replacements.

Throws
SPLRuntimeInvalidArgumentException

If the search pattern is an invalid regular expression.
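
The effect of the global flag and of "\n" back-references in the replacement pattern can be sketched with std::regex_replace. The name sketchRegexReplace is hypothetical, and the sketch makes two assumptions: it uses the standard library's sed-style format flag to get "\1"-style references, and it reports an invalid pattern as std::regex_error rather than SPLRuntimeInvalidArgumentException.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the rstring regexReplace.
std::string sketchRegexReplace(const std::string& str,
                               const std::string& searchPatt,
                               const std::string& substPatt,
                               bool global) {
    const std::regex re(searchPatt, std::regex::extended);  // POSIX ERE grammar
    // format_sed makes "\1", "\2", ... refer to capture groups.
    std::regex_constants::match_flag_type flags = std::regex_constants::format_sed;
    if (!global) {
        // Replace only the first match when global is false.
        flags = flags | std::regex_constants::format_first_only;
    }
    return std::regex_replace(str, re, substPatt, flags);
}
```

In this sketch, replacing "-" with "+" in "a-b-c" gives "a+b+c" when global is true and "a+b-c" when it is false. Swapping two words with the pattern "([a-z]+) ([a-z]+)" and the replacement "\2 \1" (written "\\2 \\1" in C++ source) turns "john smith" into "smith john".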

public ustring regexReplace (ustring str, ustring searchPatt, ustring substPatt, boolean global)

Match and replace a string with a regular expression, using POSIX Extended Regular Expressions (ERE), including sub-expression (capture group) matching using parentheses "()".

Note: ERE matching is used by "grep -E", and has less functionality than Perl regular expressions. Match and replace is similar to using "sed". This function is currently implemented using the system C++ library for rstring and the ICU library for ustring.

Parameters
str

Input string.

searchPatt

Search pattern.

substPatt

Replacement pattern (for the rstring version, matched sub-expressions can be specified through "\n", as in "\1"; for the ustring version, "$n", as in "$1", may be used).

global

Whether replacement is to be done globally.

Returns

Resulting string with replacements.

Throws
SPLRuntimeInvalidArgumentException

If the search pattern is an invalid regular expression.

public rstring regexReplacePerl (rstring str, rstring searchPatt, rstring substPatt, boolean global)

Match and replace a string with a regular expression, using a Perl regular expression, including sub-expression (capture group) matching using parentheses "()".

Note: Perl regular expression matching is used by "grep -P", and has more functionality than POSIX Extended Regular Expressions. Match and replace is similar to using "sed". This function is currently implemented using the Boost C++ library.

Parameters
str

Input string.

searchPatt

Search pattern (using Boost Perl regular expression syntax).

substPatt

Replacement pattern (matched sub-expressions can be specified through "\n", as in "\1").

global

Whether replacement is to be done globally.

Returns

Resulting string with replacements.

Throws
SPLRuntimeInvalidArgumentException

If the search pattern is an invalid regular expression.

public ustring regexReplacePerl (ustring str, ustring searchPatt, ustring substPatt, boolean global)

Match and replace a string with a regular expression, using a Perl regular expression, including sub-expression (capture group) matching using parentheses "()".

Note: Perl regular expression matching is used by "grep -P", and has more functionality than POSIX Extended Regular Expressions. Match and replace is similar to using "sed". This function is currently implemented using the Boost C++ library.

Parameters
str

Input string.

searchPatt

Search pattern (using Boost Perl regular expression syntax).

substPatt

Replacement pattern (matched sub-expressions can be specified through "\n", as in "\1").

global

Whether replacement is to be done globally.

Returns

Resulting string with replacements.

Throws
SPLRuntimeInvalidArgumentException

If the search pattern is an invalid regular expression.

<string T> public T rtrim (T str, T t)

Remove the trailing characters from a string.

Parameters
str

Input string.

t

Characters to be removed from the right side of the input string.

Returns

Input string with trailing characters in t removed.

<string T> public T substring (T str, int32 sidx, int32 slen)

Get a substring from a string.

Parameters
str

Input string.

sidx

Start index of the substring, where 0 is the first logical character in the input string.

slen

Number of characters in the substring.

Returns

Substring of the input string.
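
A minimal C++ sketch of the (sidx, slen) convention, built on std::string::substr. The name sketchSubstring is hypothetical. This page does not say what happens when sidx lies past the end of the input; returning an empty string in that case is an arbitrary choice made for the sketch, not documented behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for substring(str, sidx, slen) on byte strings.
std::string sketchSubstring(const std::string& str, std::int32_t sidx,
                            std::int32_t slen) {
    assert(sidx >= 0 && slen >= 0);  // assumption: negatives are not modeled
    const std::size_t start = static_cast<std::size_t>(sidx);
    // Assumption: a start index past the end yields an empty string.
    if (start > str.size()) return std::string();
    // substr clamps the length to the characters actually available.
    return str.substr(start, static_cast<std::size_t>(slen));
}
```

With this sketch, the substring of "hello world" at index 6 with length 5 is "world", and asking for 10 characters of "hello" from index 3 gives "lo".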

<string T> public uint32 toCharacterCode (T str)

Convert the first character of a string to its corresponding ASCII code.

Parameters
str

Input string.

Returns

ASCII code of the first character of the string.
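
For a byte string, the conversion amounts to reading the first byte as an unsigned value. A small C++ sketch, with the hypothetical name sketchToCharacterCode; the behavior for an empty input is not described on this page, so the sketch simply requires a non-empty string.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for toCharacterCode(str) on byte strings.
std::uint32_t sketchToCharacterCode(const std::string& str) {
    assert(!str.empty());  // assumption: empty input is not modeled
    // Go through unsigned char so bytes above 127 do not turn negative.
    return static_cast<unsigned char>(str[0]);
}
```

For example, the sketch returns 65 for "A" and 97 for "abc".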

public list<rstring> tokenize (rstring str, rstring delim, boolean keepEmptyTokens)

Tokenize a string.

Parameters
str

Input string.

delim

Delimiters to use. Each character in delim is a possible delimiter.

keepEmptyTokens

Keep empty tokens in the result.

Returns

A list of the tokens in the string.

public void tokenize (rstring str, rstring delim, boolean keepEmptyTokens, boolean multipleDelim, mutable list<rstring> tokens)

Tokenize a string.

Parameters
str

Input string.

delim

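Delimiters to use. Whether delim is treated as a set of delimiter characters or as one delimiter string depends on multipleDelim.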

keepEmptyTokens

Keep empty tokens in the result.

multipleDelim

If 'true', each character in delim is used to tokenize. If 'false', the whole string in delim is used as a single delimiter. This parameter matters only if delim contains more than one character.

tokens

Output list of the tokens in the string.
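
The interaction between keepEmptyTokens and multipleDelim is easier to see in code. The C++ sketch below is one plausible reading of the parameter descriptions above, not the SPL implementation: sketchTokenize is a hypothetical name, it returns the list instead of filling a mutable argument, and its handling of an empty delim (the input is returned as a single token) is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the five-argument tokenize on byte strings.
std::vector<std::string> sketchTokenize(const std::string& str,
                                        const std::string& delim,
                                        bool keepEmptyTokens,
                                        bool multipleDelim) {
    std::vector<std::string> tokens;
    if (delim.empty()) {  // assumption: no delimiter means one token
        if (keepEmptyTokens || !str.empty()) tokens.push_back(str);
        return tokens;
    }
    // In multipleDelim mode a delimiter is one character wide;
    // otherwise it is the whole delim string.
    const std::size_t step = multipleDelim ? 1 : delim.size();
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = multipleDelim ? str.find_first_of(delim, start)
                                              : str.find(delim, start);
        const std::string tok =
            str.substr(start, pos == std::string::npos ? std::string::npos
                                                       : pos - start);
        if (keepEmptyTokens || !tok.empty()) tokens.push_back(tok);
        if (pos == std::string::npos) break;
        start = pos + step;
    }
    return tokens;
}
```

In this sketch, "a,b;;c" with delim ",;" and multipleDelim set to true splits into "a", "b", "c" (plus an empty token between "b" and "c" when keepEmptyTokens is true). With multipleDelim set to false and delim "::", the input "a::b:c" splits into "a" and "b:c", because a lone ":" is not a delimiter in that mode.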

public list<ustring> tokenize (ustring str, ustring delim, boolean keepEmptyTokens)

Tokenize a string.

Parameters
str

Input string.

delim

Delimiters to use. Each character in delim is a possible delimiter.

keepEmptyTokens

Keep empty tokens in the result.

Returns

A list of the tokens in the string.

public void tokenize (ustring str, ustring delim, boolean keepEmptyTokens, boolean multipleDelim, mutable list<ustring> tokens)

Tokenize a string.

Parameters
str

Input string.

delim

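Delimiters to use. Whether delim is treated as a set of delimiter characters or as one delimiter string depends on multipleDelim.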

keepEmptyTokens

Keep empty tokens in the result.

multipleDelim

If 'true', each character in delim is used to tokenize. If 'false', the whole string in delim is used as a single delimiter. This parameter matters only if delim contains more than one character.

tokens

Output list of the tokens in the string.

<string T> public T trim (T str, T t)

Remove the leading and trailing characters from a string.

Parameters
str

Input string.

t

Characters to be removed from both sides of the input string.

Returns

Input string with leading and trailing characters in t removed.
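
As with ltrim and rtrim, t is a set of characters rather than a literal prefix or suffix. A minimal C++ sketch of two-sided trimming, using the hypothetical name sketchTrim and modeling byte strings only:

```cpp
#include <string>

// Hypothetical stand-in for trim(str, t).
// Removes characters contained in t from both ends of str;
// characters in the middle are left untouched.
std::string sketchTrim(const std::string& str, const std::string& t) {
    const std::size_t first = str.find_first_not_of(t);
    if (first == std::string::npos) return std::string();  // nothing survives
    const std::size_t last = str.find_last_not_of(t);
    return str.substr(first, last - first + 1);
}
```

With t set to "-", the input "--a-b--" becomes "a-b": the inner "-" stays because trimming stops at the first character on each side that is not in t.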

<string T> public T upper (T str)

Convert a string to uppercase.

Parameters
str

Input string.

Returns

Input string in uppercase.