Contents


Using SQL bind variables for application performance and security

Compare the performance and security benefits of using bind variables, substitution variables, and literals in SQL statements

Comments

When you use SQL to communicate data between your web application and a database, you have the option to include the literal data in an SQL statement or use bind variables. Bind variables are placeholders for actual values in SQL statements. Writing SQL statements with bind variables rather than substitution variables or literals minimizes processing time and can improve application performance by 20 to 30 percent. Using bind variables can also help prevent an SQL injection attack.

This article compares the performance and security benefits of using bind variables (also known as bind parameters or dynamic parameters) rather than substitution variables or literals in SQL statements. I briefly introduce bind variables, then demonstrate how they're used in SQL statements and show the resulting performance improvement. I also show you how to use bind variables to effectively deter an SQL injection attack in a sample Java application.

Overview of bind variables

A bind variable consists of a variable indicated by a placeholder character such as a question mark (?), :name, or @name. The placeholder character depends on the SQL database server that you use. You provide the actual value of the placeholder at runtime, just before the SQL statement is executed.

How bind variables improve application performance

In most relational databases, an SQL statement is processed in three steps:

  1. Parsing the SQL statement: Verifies the SQL statement syntax and access rights and builds the best (optimized) execution plan for the SQL statement. Placeholder variables are not known at this point.
  2. Binding variables: Where the API provides the actual values for placeholders.
  3. Execution: Done with the selected execution plan and actual value of the placeholder variables.

Each time an SQL statement is sent to a database, an exact text match is performed to see if the statement is already present in the shared pool. If no matching statement is found, the database performs a hard parse of that statement. If a matching statement is found, then the database initiates a soft parse.

  • In a hard parse, the SQL statement needs to be parsed, checked for syntax errors, checked for correctness of table names and column names, and optimized to find the best execution plan.
  • In a soft parse, the SQL statement already exists in a shared pool, so very little processing is required for access rights and session verification.

Using bind variables enables soft parsing, which means that less processing time is spent on choosing an optimized execution plan. Information about how to process the SQL statement has been saved, along with the SQL statement itself, in a shared pool.

Comparing SQL statements

There are distinct performance advantages to using bind variables. For one, holding many similar but unique SQL statements in a shared pool is a waste of memory; writing one statement with interchangeable variables is more efficient. It is also time-consuming to parse SQL statements. Reducing the number of hard parses minimizes CPU usage.

Let's compare the respective performance impact of using literals, substitution variables, and bind variables.

Literals

Literals are values directly passed to an SQL query. For instance, you could execute the two queries in Listing 1 with literals:

Listing 1. Two SQL queries with literals
SQL> SELECT * FROM dual WHERE dummy = 'dummy_literal1';
no rows selected

SQL> SELECT * FROM dual WHERE dummy = 'dummy_literal2';
no rows selected

Listing 2 shows the queries in a shared pool:

Listing 2. Shared pool with literals
SQL> SELECT sql_text,
  2         executions
  3  FROM   v$sql
  4  WHERE  INSTR(sql_text, 'SELECT * FROM dual WHERE dummy') > 0
  5  AND    INSTR(sql_text, 'sql_text') = 0
  6  ORDER BY sql_text;

And Listing 3 shows the resulting output:

Listing 3. Output of SQL queries with literals
SQL_TEXT                                                     EXECUTIONS
------------------------------------------------------------ ----------
SELECT * FROM dual WHERE dummy = 'dummy_literal1'                1
SELECT * FROM dual WHERE dummy = 'dummy_literal2'                1
2 rows selected.

In this case, both queries were parsed separately, so there are two separate queries in the shared pool.

Substitution variables

When a substitution variable is used in a statement, an input value is requested and rewrites the statement to include it. The rewritten statement is passed to the database. As a result, the database server knows nothing of the substitution variable. Listing 4 shows an example of substitution variables:

Listing 4. Two SQL queries with substitution variables
SQL> SELECT * FROM dual WHERE dummy = '&dummy';
Enter value for dummy: dummy_substitution1
old   1: SELECT * FROM dual WHERE dummy = '&dummy'
new   1: SELECT * FROM dual WHERE dummy = 'dummy_substitution1'
no rows selected

SQL> SELECT * FROM dual WHERE dummy = '&dummy';
Enter value for dummy: dummy_substitution2
old   1: SELECT * FROM dual WHERE dummy = '&dummy'
new   1: SELECT * FROM dual WHERE dummy = ' dummy_substitution2'
no rows selected

Listing 5 shows the queries in a shared pool:

Listing 5. Shared pool with substitution variables
SQL> SELECT sql_text,
  2         executions
  3  FROM   v$sql
  4  WHERE  INSTR(sql_text, 'SELECT * FROM dual WHERE dummy') > 0
  5  AND    INSTR(sql_text, 'sql_text') = 0
  6  ORDER BY sql_text;

Listing 6 shows the resulting output:

Listing 6. Output of SQL queries with substitution variables
SQL_TEXT                                                     EXECUTIONS
------------------------------------------------------------ ----------
SELECT * FROM dual WHERE dummy = 'dummy_substitution1'             1
SELECT * FROM dual WHERE dummy = 'dummy_substitution2'             1
2 rows selected.

Once again, both statements were parsed separately, and there are two separate queries in the shared pool. As far as the database server is concerned, literals and substitution variables are the same.

Bind variables

Now I show how using bind variables affects the shared pool. Listing 7 follows the same format as the previous ones:

Listing 7. Two SQL queries with bind variables
SQL> VARIABLE dummy VARCHAR2(30);

SQL> EXEC :dummy := 'dummy_bind1';
PL/SQL procedure successfully completed.

SQL> SELECT * FROM dual WHERE dummy = :dummy;
no rows selected

SQL> EXEC :dummy := 'dummy_bind2';
PL/SQL procedure successfully completed.

SQL> SELECT * FROM dual WHERE dummy = :dummy;
no rows selected

Listing 8 is a snapshot of the shared pool:

Listing 8. Shared pool with bind variables
SQL> SELECT sql_text,
  2         executions
  3  FROM   v$sql
  4  WHERE  INSTR(sql_text, 'SELECT * FROM dual WHERE dummy') > 0
  5  AND    INSTR(sql_text, 'sql_text') = 0
  6  ORDER BY sql_text;

Listing 9 shows the output:

Listing 9. Output of SQL queries with bind variables
SQL_TEXT                                                     EXECUTIONS
------------------------------------------------------------ ----------
SELECT * FROM dual WHERE dummy = :dummy                               2
1 row selected.

In this case, as you can see, the same SQL statement was executed twice, so that only one SQL query resides in the shared pool.

SQL parsing and performance

As previously mentioned, CPU usage varies based on the type of SQL parsing required. Literals and substitution variables require hard parsing, which consumes more CPU cycles than the soft parsing required for bind variables.

For the following code listings, I retrieved CPU usage measurements from V$MYSTAT by querying the parse time cpu statistic. This statistic represents the total CPU time used for parsing (hard or soft) in tens of milliseconds. The statements present in the shared pool are also displayed.

Listing 10 is a statement without bind variables:

Listing 10. A statement without bind variables
SQL> DECLARE
  2    l_dummy  dual.dummy%TYPE;
  3  BEGIN
  4    FOR i IN 1 .. 10 LOOP
  5      BEGIN
  6        EXECUTE IMMEDIATE 'SELECT dummy FROM dual WHERE dummy = ''' || TO_CHAR(i) || ''''
  7        INTO l_dummy;
  8      EXCEPTION
  9        WHEN NO_DATA_FOUND THEN
 10          NULL;
 11      END;
 12    END LOOP;
 13  END;
 14  /

The PL/SQL procedure successfully completes. Next, Listing 11 executes the procedure and checks the CPU usage:

Listing 11. CPU usage without bind variables
SQL> SELECT sn.name, ms.value
  2  FROM   v$mystat ms, v$statname sn
  3  WHERE  ms.statistic# = sn.statistic#
  4  AND    sn.name       = 'parse time cpu';
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
parse time cpu                                                           63

1 row selected.

The results show that 630 milliseconds of CPU time were used for parsing during the session.

Listing 12 shows the resulting executions in the shared pool:

Listing 12. Executions in the shared pool
SQL> SELECT sql_text,
  2         executions
  3  FROM   v$sql
  4  WHERE  INSTR(sql_text, 'SELECT dummy FROM dual WHERE dummy') > 0
  5  AND    INSTR(sql_text, 'sql_text') = 0
  6  AND    INSTR(sql_text, 'DECLARE') = 0
  7  ORDER BY sql_text;
SQL_TEXT                                                     EXECUTIONS
------------------------------------------------------------ ----------
SELECT dummy FROM dual WHERE dummy = '1'                              1
SELECT dummy FROM dual WHERE dummy = '10'                             1
SELECT dummy FROM dual WHERE dummy = '2'                              1
SELECT dummy FROM dual WHERE dummy = '3'                              1
SELECT dummy FROM dual WHERE dummy = '4'                              1
SELECT dummy FROM dual WHERE dummy = '5'                              1
SELECT dummy FROM dual WHERE dummy = '6'                              1
SELECT dummy FROM dual WHERE dummy = '7'                              1
SELECT dummy FROM dual WHERE dummy = '8'                              1
SELECT dummy FROM dual WHERE dummy = '9'                              1
10 rows selected.

Note that the shared pool contains 10 similar statements using literals.

Using bind variables

Next, I run the same queries using bind variables. Listing 13 shows the same statement from Listing 10 with the substitution of bind variables.

Listing 13. An SQL statement with bind variables
SQL> DECLARE
  2    l_dummy  dual.dummy%TYPE;
  3  BEGIN
  4    FOR i IN 1 .. 10 LOOP
  5      BEGIN
  6        EXECUTE IMMEDIATE 'SELECT dummy FROM dual WHERE dummy = TO_CHAR(:dummy)'
  7        INTO l_dummy USING i;
  8      EXCEPTION
  9        WHEN NO_DATA_FOUND THEN
 10          NULL;
 11      END;
 12    END LOOP;
 13  END;
 14  /

The PL/SQL procedure successfully completes. When I execute the procedure in Listing 13 and check the CPU usage, I get the result displayed in Listing 14:

Listing 14. CPU usage with bind variables
SQL> SELECT sn.name, ms.value
  2  FROM   v$mystat ms, v$statname sn
  3  WHERE  ms.statistic# = sn.statistic#
  4  AND    sn.name       = 'parse time cpu';
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
parse time cpu                                                           40
1 row selected.

The results show that 400 milliseconds of CPU time were used on parsing during the session. That is less than two thirds the amount used in the previous example. Now let's check the shared pool.

Listing 15. Shared pool with bind variables
SQL> SELECT sql_text,
  2         executions
  3  FROM   v$sql
  4  WHERE  INSTR(sql_text, 'SELECT dummy FROM dual WHERE dummy') > 0
  5  AND    INSTR(sql_text, 'sql_text') = 0
  6  AND    INSTR(sql_text, 'DECLARE') = 0
  7  ORDER BY sql_text;
SQL_TEXT                                                     EXECUTIONS
------------------------------------------------------------ ----------
SELECT dummy FROM dual WHERE dummy = TO_CHAR(:dummy)                 10
1 row selected.

In Listing 15, as expected, there is only a single statement in the shared pool.

These examples clearly demonstrate that replacing literals with bind variables saves both memory usage and CPU cycles. In these cases, performance was enhanced by approximately 30 percent.

Next, I show the use of bind variables in context, measuring the performance advantage of using bind variables for SQL statements in a sample Java program.

Performance-tuning SQL statements

Listing 16 represents a typical Java program where the SQL statements have been written using literals. A new SQL statement is created for every loop. Each time the loop encounters a new value, a new SQL query is created and executed.

Listing 16. A basic SQL query without bind variables
sql = "SELECT t.name FROM hr.employees t WHERE employee_id = ";

System.out.println("Start: " + new Date());

for(int i=0; i<10000; i++)
    {
        statement = connection.createStatement();
        resultset = statement.executeQuery(sql + Integer.toString(i));
        if(resultset.next())
        {
            name = resultset.getString("name");
            doSomething(name);
        }
        resultset.close();
        statement.close();
    }

System.out.println("End: " + new Date());

It took approximately 11 seconds to execute this code. Now let's rewrite the code using prepared statements and bind variables.

Prepared statements with bind variables

In Listing 17, a query is sent to the server with a bind variable defined. During execution, we bind the Java variable "i" to the SQL statement. We are thus able to use the same execution plan for 10,000 queries, which improves performance by minimizing SQL parsing.

Listing 17. A basic SQL statement with bind variables
sql = "SELECT t.name FROM hr.employees t WHERE employee_id = ?";
System.out.println("Start: " + new Date());
for(int i=0; i<10000; i++)
    {
        statement = connection.prepareStatement(sql);
        statement.setInt(1, i);
        resultset = statement.executeQuery();
        if(resultset.next())
        {
            name = resultset.getString("name");
            doSomething(name);
        }
        resultset.close();
        statement.close();
    }
            System.out.println("End: " + new Date());

This code took approximately seven seconds to execute. Note, however, that the code creates a new statement for each loop. We can improve on this result by creating just one statement and reusing it for each loop, as shown in Listing 18:

Listing 18. Reusing a statement
sql = "SELECT t.name FROM hr.employees t WHERE employee_id = ?";
statement = connection.prepareStatement(sql);
System.out.println("Start: " + new Date());
for(int i=0; i<10000; i++)
    {
        statement.setInt(1, i);
        resultset = statement.executeQuery();
        if(resultset.next())
        {
            name = resultset.getString("name");
            doSomething(name);
        }
        resultset.close();
    }
            System.out.println("End: " + new Date());

statement.close();

It took approximately 4 seconds for this Java code to perform the same SQL operation as the original code, which took 11 seconds.

Types of SQL injection attacks

SQL injection attacks were rated the number one security threat by the Open Web Application Security Project in 2013 (see Related topics). In an SQL injection attack, malicious SQL statements are inserted via an entry field into a web application's database, in order to force the application to execute them. For an SQL injection attack to work, the application code must be vulnerable to user input. SQL injection attacks take advantage of an application's vulnerability to user input that is either incorrectly filtered for string literal escape characters with embedded SQL statements, or input that is not strongly typed.

The following section discusses the two types of security vulnerability that can promote an SQL injection attack.

Incorrectly filtered escape characters

In the first type of attack, a hacker puts text that includes escape characters and embedded SQL statements into a web application form field or query attribute. If the web application doesn't filter out the escape characters, the text, with the malicious SQL statements, is passed into the database for execution.

The following line of code illustrates this vulnerability:

statement := "SELECT * FROM emp WHERE emp_name = '" + empName + "';"

If empName is set from a web application's form field, the attacker could enter the following in the empName field:

' or '1'='1

If the web application code doesn't escape the single-quote (') character, it is included in the SQL statement as-is, resulting in the following new SQL statement:

SELECT * FROM emp WHERE emp_name = '' or '1'='1';

When this code is executed, it will return all the data from the table emp, because the '1'='1' in the WHERE clause is always true. The attacker will successfully retrieve data about every employee in the database.

SQL comments and statements

Another common attack is to inject comments maliciously into an SQL statement, blocking the rest of the query from being executed. Three types of SQL comment can be injected, as shown here:

' or '1'='1' -- '
' or '1'='1' ({ '
' or '1'='1' /* '

Any of the previous three inputs maliciously injected into an SQL statement would block the remainder of the query.

Attackers can also add malicious SQL statements to the end of an existing statement. For example, the value of empName in the following statement would cause the emp table to be deleted. In an API that allowed multiple statements, the selection of all the data from the userinfo table would also be deleted.

a';DROP TABLE emp; SELECT * FROM userinfo WHERE 't' = 't

This input would render the final SQL statement as follows:

SELECT * FROM emp  WHERE emp_name = 'a';DROP TABLE emp; SELECT * FROM userinfo WHERE 't' = 't';

Incorrect type handling

The second type of SQL injection attack occurs when user input isn't validated for its data type. For instance, a programmer might fail to validate user input for a numeric field, as shown here:

statement := "SELECT * FROM userinfo WHERE id = " + id_var + ";"

The accepted value of id_var is numeric, but there is no validation prior to setting it into the SQL query. If the variable id_var were associated with an application form, an attacker could set it as shown:

1;DROP TABLE users, yielding the following SQL: 

SELECT * FROM userinfo WHERE id=1;DROP TABLE users;

If successfully passed, this statement would cause the SQL database server to delete the users table from the database.

Preventing SQL injection attacks

When a bind variable is passed as an argument to an SQL prepared statement, it is automatically escaped by the JDBC driver. The resulting escaped strings treat the variable as user data and cannot be interpreted by the SQL database server as an SQL statement. Therefore, any user-supplied data must be escaped before it is added to an SQL statement. Listing 19 shows the bind variable user ID being added to an SQL statement:

Listing 19. Prepared statements with bind variables
String selectStatement = "SELECT * FROM User WHERE userId = ? ";
PreparedStatement prepStmt = con.prepareStatement(selectStatement);
prepStmt.setString(1, userId);
ResultSet rs = prepStmt.executeQuery();

The setString method of the prepStmt object escapes the userID string and adds it to the SQL statement. Any malicious SQL statements passed in through the userID variable is rendered as non-executable in the escaped version of the input.

Bind variables in the database

I conclude with a quick review of the syntax for bind variables in common database programming environments.

In SQL*Plus, you would use bind variables as shown in Listing 20:

Listing 20. Bind variables in SQL*Plus
SQL> variable deptno number
SQL> exec :deptno := 10
SQL> select * from emp where deptno = :deptno;

PL/SQL handles most issues to do with bind variables. Take, for example, the bit of PL/SQL in Listing 21:

Listing 21. Bind variables in PL/SQL
create or replace procedure dsal(p_empno in number)
as
  begin
    update emp
    set sal=sal*2
    where empno = p_empno;
    commit;
  end;
/

Note that every reference to a PL/SQL variable is a bind variable.

In the case of Microsoft SQL Server, you would use sp_executesql and the bind variable would be something like: @var1. Listing 22 shows an example:

Listing 22. Bind variables in Microsoft SQL Server
DECLARE @IntVariable int;
DECLARE @SQLString nvarchar(500);
DECLARE @ParmDefinition nvarchar(500);

/* Build the SQL string one time.*/
SET @SQLString =
     N'SELECT BusinessEntityID, NationalIDNumber, JobTitle, LoginID
       FROM AdventureWorks2008R2.HumanResources.Employee 
       WHERE BusinessEntityID = @BusinessEntityID';
SET @ParmDefinition = N'@BusinessEntityID tinyint';
/* Execute the string with the first parameter value. */
SET @IntVariable = 197;
EXECUTE sp_executesql @SQLString, @ParmDefinition,
                      @BusinessEntityID = @IntVariable;
/* Execute the same string with the second parameter value. */
SET @IntVariable = 109;
EXECUTE sp_executesql @SQLString, @ParmDefinition,
                      @BusinessEntityID = @IntVariable;

Listing 23 is a prepared statement in MySQL Server with a single bind variable:

Listing 23. A bind variable in a MySQL Server prepared statement
mysql> prepare stmt from 
    -> 'select count(*) from information_schema.schemata where schema_name = ?';
Query OK, 0 rows affected (0.00 sec)
Statement prepared
mysql> set @schema := 'test';
Query OK, 0 rows affected (0.00 sec)
mysql> execute stmt using @schema;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

Listing 24 is a prepared statement with multiple bind variables:

Listing 24. Multiple bind variables in a MySQL Server prepared statement
mysql> prepare stmt from 
    -> 'select count(*) 
    -> from information_schema.schemata 
    -> where schema_name = ? or schema_name = ?';
Query OK, 0 rows affected (0.00 sec)
Statement prepared
mysql> execute stmt 
    -> using @schema1,@schema2
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

In conclusion

In this article, I introduced bind variables and used code samples to demonstrate how they compare to substitution variables and literals in terms of performance impact. As you've seen, using bind variables in your SQL statements can improve performance measurements of SQL execution by up to 30 percent. Using bind variables is also a known defense against SQL injection attacks (see Related topics). I demonstrated several types of SQL injection attack and showed you how to use bind variables in a number of common database programming environments.


Downloadable resources


Related topics


Comments

Sign in or register to add and subscribe to comments.

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Security
ArticleID=960121
ArticleTitle=Using SQL bind variables for application performance and security
publish-date=01142014