ugrep Command
Purpose
Searches for a Unicode pattern in a file.
Syntax
ugrep unicode_hex_notation_pattern [ -i loose_match_unicode_hex_notation_pattern ] File ...
Description
The ugrep command searches an input file for characters that match the specified hexadecimal representation of the Unicode-defined code point of a character.
The person who writes the regular expression might not be using a Unicode character set, and a keyboard might not be able to produce the Unicode-defined code point of every character in the major written languages. Therefore, the search pattern must be specified as the hexadecimal representation of the Unicode-defined code point of a character.
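The relationship between a character and its hexadecimal notations can be sketched in Python. This is an illustration only and is not part of the ugrep command; the function name hex_notations is hypothetical, and the notation styles are the ones that this page shows (\uXXXX, \UXXXXXXXX, \x{...}, and \u{...}).

```python
def hex_notations(ch: str) -> dict:
    """Return hexadecimal notations for the code point of one character.

    Hypothetical helper for illustration; it only formats strings and
    does not call ugrep.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    cp = ord(ch)  # the Unicode-defined code point as an integer
    forms = {
        "\\U": f"\\U{cp:08X}",        # eight hex digits, zero padded
        "\\x{}": f"\\x{{{cp:X}}}",    # braces, no padding
        "\\u{}": f"\\u{{{cp:X}}}",    # braces, no padding
    }
    if cp <= 0xFFFF:
        # The four-digit form can only express code points up to U+FFFF.
        forms["\\u"] = f"\\u{cp:04X}"
    return forms


print(hex_notations("\U0001D11E"))  # the musical symbol used on this page
print(hex_notations("\u6211"))      # the character 我
```

The resulting strings can be pasted into a pattern; a character above U+FFFF has no four-digit \u form, which is why that entry is omitted for it.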
Flags
| Item | Description |
|---|---|
| unicode_hex_notation_pattern | Specifies the hexadecimal representation of the Unicode-defined code point of a character. For example, to represent the 𝄞 character, whose Unicode-defined code point is U+1D11E, the value of unicode_hex_notation_pattern can be \U0001D11E, \x{1D11E}, or \u{1D11E}. |
| -i loose_match_unicode_hex_notation_pattern | Specifies that the search is based on a loose match of the specified Unicode hex notation pattern. Most regular expression engines offer case-insensitive matching as the only form of loose matching. In that case, the engine must account for the large range of cased Unicode characters outside the ASCII range. |
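To show why case-insensitive matching must reach beyond ASCII, the following Python sketch compares strings by Unicode case folding. It is an analogy for loose matching, not a description of how ugrep implements the -i flag; the function name loose_equal is hypothetical.

```python
def loose_equal(a: str, b: str) -> bool:
    """Compare two strings case-insensitively by Unicode case folding.

    Hypothetical helper; str.casefold() covers cased characters outside
    the ASCII range as well as A-Z and a-z.
    """
    return a.casefold() == b.casefold()


capital = "\U00010425"  # a Deseret capital letter, far outside ASCII
small = "\U0001044D"    # the corresponding Deseret small letter

print(loose_equal(capital, small))  # the two differ only in case
print(loose_equal("A", "a"))        # the familiar ASCII case
```

An engine that lowercases only the ASCII letters would treat the two Deseret characters as different, which is the situation that the flag description warns about.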
Exit Status
This command returns the following exit values:
| Item | Description |
|---|---|
| 0 | A match was found. |
| 1 | No match was found. |
| >1 | A syntax error was found or a file was inaccessible (even if matches were found). |
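A calling script can branch on these exit values. The following Python sketch maps a value to the meaning in the table above; the function name describe_exit_status is hypothetical, and the sketch does not run ugrep itself.

```python
def describe_exit_status(code: int) -> str:
    """Translate a ugrep exit value into the meaning listed in the table.

    Hypothetical helper; negative values are rejected because the table
    does not define them.
    """
    if code == 0:
        return "match found"
    if code == 1:
        return "no match found"
    if code > 1:
        # Reported even if some matches were found.
        return "syntax error or inaccessible file"
    raise ValueError(f"unexpected exit value: {code}")


for value in (0, 1, 2):
    print(value, describe_exit_status(value))
```

On a system where the command is installed, the value could come from the returncode attribute of the result of subprocess.run; that usage is an assumption and is not shown here.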
Examples
- To search the regex_test.txt file for the character 我, whose Unicode-defined code point is U+6211 and whose hexadecimal representation is \u6211, enter the following command:
    ugrep "\u6211" regex_test.txt
  To search for multiple characters, list the hexadecimal representations of their Unicode-defined code points without any spaces. For example, to search for the characters घ and र in the regex_test.txt file, enter the following command:
    ugrep "\u0918\u0930" regex_test.txt
- To search the regex_test.txt file for any character in the range between the code points U+6200 and U+6300, enter the following command:
    ugrep "[\u6200-\u6300]" regex_test.txt
  To search the regex_test.txt file for any character in the range between the code points U+0000 and U+10FFFF except uppercase letters, where -- removes the characters that follow it from the set, enter the following command:
    ugrep "[\u0000-\U0010FFFF--\p{Lu}]" regex_test.txt
- To do a loose match search for the character 𐐥, whose Unicode-defined code point is U+10425 and whose hexadecimal representation is \U00010425, enter the following command:
    ugrep -i "\U00010425" regex_test.txt
- To search the regex_test.txt file for a number with decimal digits, enter the following command:
    ugrep "\p{Nd}" regex_test.txt
  where Nd is a Unicode character property for numbers with decimal digits.
- To search the regex_test.txt file for Hiragana characters in the Japanese language, enter the following command:
    ugrep "\p{Hiragana}" regex_test.txt
- To search the regex_test.txt file for uppercase letters, lowercase letters, or numbers by using Unicode properties, enter the following commands:
    ugrep "\p{Ll}" regex_test.txt
  where the property Ll matches lowercase letters in Unicode and includes lowercase letters from all languages.
    ugrep "\p{Lu}" regex_test.txt
  where the property Lu matches uppercase letters in Unicode and includes uppercase letters from all languages.
    ugrep "\p{L}" regex_test.txt
  Or
    ugrep "\p{letter}" regex_test.txt
  where the properties L and letter match all letters in Unicode. The search by using the L property includes uppercase letters, lowercase letters, and connector characters. However, the search by using the letter property includes only the uppercase and lowercase letters.
    ugrep "[\p{L}||\p{Nd}]" regex_test.txt
  where the property Nd matches numeric digits in Unicode and includes numeric digits from all languages.
- To search for characters of the Latin script by using the script property, enter the following command:
    ugrep "\p{script=Latin}" regex_test.txt
  You can search for characters of any script by setting the value of the script property to that script.
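The range and property searches above can be approximated in Python to preview which characters a pattern is meant to select. This is a sketch under stated assumptions: it uses the standard re and unicodedata modules rather than ugrep, the function name chars_with_category is hypothetical, and the standard library has no equivalent of the script property, so only the range and the general categories (Lu, Ll, L, Nd) are shown.

```python
import re
import unicodedata


def chars_with_category(text: str, category: str) -> list:
    """Return the characters whose general category starts with `category`.

    Hypothetical helper: "Lu" selects uppercase letters, "Ll" lowercase
    letters, "Nd" decimal digits, and "L" every letter category.
    """
    return [ch for ch in text if unicodedata.category(ch).startswith(category)]


# Sample text: A, é, 5, an Arabic-Indic digit three, and the character 我.
sample = "A\u00e95\u0663\u6211"

print(chars_with_category(sample, "Lu"))  # uppercase letters only
print(chars_with_category(sample, "Nd"))  # decimal digits from any script
print(chars_with_category(sample, "L"))   # all letters

# A code point range, written with the same \u notation as the examples.
print(re.findall(r"[\u6200-\u6300]", sample))
```

Checking a sample string this way can help confirm the intended code points before the pattern is passed to the command.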
Files
| Item | Description |
|---|---|
| /usr/bin/ugrep | Contains the ugrep command. |