Case-sensitive search configuration with Presto (Java)
When you deal with character data, case sensitivity is important when you search for specific matches or patterns. However, not all databases and query engines behave in the same way. Some are case-insensitive by default, while others are not. The case-sensitive search can lead to unexpected results if the case of the data is not considered.
watsonx.data on IBM Software Hub
Symptoms
Following is an example of why you might need to take steps for case-sensitive search configuration.
mysql> select * from state where name='iowa';
+------+----+--------------+
| name | id | abbreviation |
+------+----+--------------+
| Iowa | 19 | IA |
+------+----+--------------+
1 row in set (0.00 sec)
MySQL is case-insensitive by default. Even though MySQL column contains the capitalized string ‘Iowa’, it still matched the query’s restriction of ‘Iowa’ that is acceptable. In some use cases, it might lead to unexpected results.
presto:demo> select * from state where name='iowa';
name | id | abbreviation
------+----+--------------
(0 rows)
Query 20201120_151345_00001_wjx6r, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:07 [1 rows, 0B] [0 rows/s, 0B/s]
presto:demo> select * from state where name='Iowa';
name | id | abbreviation
------+----+--------------
Iowa | 19 | IA
(1 row)
You get a match with ‘Iowa’, and not with ‘iowa’. Presto (Java) made this data source (MySQL) case-sensitive, even though it is the same database in both examples, with default configurations used.
Resolving the problem
Reconfigure case-sensitivity:
With an RDBMS like MySQL, you can configure the collation setting to control if you want case sensitivity or not. You can set the collation at the database creation or table creation level as a part of the CREATE statement. Or you can use ALTER to change the collation of a database, table or individual column.
Presto (Java) is not a database and does not store data to support collation and has no configuration parameter that controls the case-sensitivity. To manage case-sensitivity in Presto (Java), and mimic collation, you can rewrite the query to force case insensitivity explicitly by using:
- Simple
lower()
orupper()
functions
select * from state where lower(name)='california';
name
------------
california
California
CALIFORNIA
(3 rows)
This query matched any uppercase or lowercase combination in the table, mimicking case insensitivity.
Or regular expressions.
select * from state where regexp_like(name, '(?i)california');
name
------------
california
California
CALIFORNIA
(3 rows)
The regular expression syntax (?i) means that matches are case-insensitive.