Characters and tokens in SQL

The most basic elements of SQL syntax are characters and tokens. Tokens are the basic syntactical units of the SQL language.

Characters

The basic symbols of keywords and operators in the SQL language are characters. Characters are classified as letters, digits, or special characters.

  • A letter is any of the 26 uppercase (A–Z) and 26 lowercase (a–z) letters of the English alphabet.1
  • A digit is any one of the characters 0–9.
  • A special character is any character other than a letter or a digit.

Tokens

The basic syntactical units of the SQL language are called tokens. A token consists of one or more characters of which none are blanks, control characters, or characters within a string constant or delimited identifier.

Tokens are classified as ordinary or delimiter tokens:

  • An ordinary token is a numeric constant, an ordinary identifier, a host identifier, or a keyword. The following examples are ordinary tokens:
        1        .1        +2        SELECT        E        3
  • A delimiter token is a string constant, a delimited identifier, an operator symbol, or any of the special characters shown in the syntax diagrams. A question mark (?) is also a delimiter token when it serves as a parameter marker, as explained in PREPARE statement. The following examples include delimiter tokens:
        ,        'string'        "fld1"        =        .

Spaces

A space is a sequence of one or more blank characters.

Control characters

A control character is a special character that is used for string alignment. Treated similar to a space, a control character does not cause a particular action to occur. The following table shows the control characters that Db2 recognizes and their hexadecimal values.

Start of change
Table 1. Hexadecimal values for the control characters that Db2 recognizes
Control character EBCDIC hex value UTF-8 hex value UTF-16 hex value
Tab 05 09 U+0009
Form feed 0C 0C U+000C
Carriage return 0D 0D U+000D
New line 15 C285 U+0085
Line feed 25 0A U+000A
DBCS space - - U+3000
End of change

Tokens, other than string constants and certain delimited identifiers, must not include a control character or space. A control character or space can follow a token. A delimiter token, control character, or a space must follow every ordinary token. If the syntax does not allow a delimiter token to follow an ordinary token, a control character or a space must follow that ordinary token.

Trigraph characters

The left bracket ([) and right bracket (]) characters are used in syntax to refer to an array element. Those characters cannot be specified with some CCSIDs. The following trigraphs can be used as an alternative way to specify left and right brackets:

  • The string ??( can be specified in place of a left bracket ([).
  • The string ??) can be specified in place of a right bracket (]).

Comments

Dynamic SQL statements can include SQL comments. Static SQL statements can include host language comments or SQL comments. Comments can be specified wherever a space can be specified, except within a delimiter token or between the keywords EXEC and SQL. In Java™, SQL comments are not allowed within embedded Java expressions. There are two types of SQL comments:

Simple comments
Simple comments are introduced with two consecutive hyphens (--). Simple comments cannot continue past the end of the line.
Bracketed comments
Bracketed comments are introduced with /* and end with */. A bracketed comment can continue past the end of the line.
The following example shows how to include comments in an SQL statement within a C program. The example uses both simple and bracketed comments:
   EXEC SQL
     CREATE VIEW PRJ_MAXPER         --projects with most support personnel
       /*
        * Returns number and name of the project
        */
      AS SELECT PROJNO, PROJNAME     -- number and name of project
           FROM DSN8910.PROJ
        /*
         * E21 is the systems support dept code
         */
      WHERE DEPTNO = 'E21'           -- systems support dept code
      AND PRSTAFF > 1;

For more information, see SQL comments.

Uppercase and lowercase characters

A token in an SQL statement can include lowercase letters, but lowercase letters in an ordinary token are folded to uppercase. However, lowercase letters are folded to uppercase in a C or Java program only if the appropriate precompiler option is specified. Delimiter tokens are never folded to uppercase.

For example, the following two statements are equivalent after folder:
   select * from DSN8C10.EMP where lastname = 'Smith';
   SELECT * FROM DSN8C10.EMP WHERE LASTNAME = 'Smith';
1 Letters also include three code points reserved as alphabetic extenders for national languages ($, #, and @ in the United States). These three code points (X'5B', X'7B', and X'7C') should be avoided because they represent different characters depending on the CCSID.