Topic
  • 5 replies
  • Latest Post - ‏2007-03-14T21:20:06Z by Jean_Anderson
SystemAdmin
SystemAdmin
1525 Posts

Pinned topic multi-record insert performance

‏2007-03-12T21:54:18Z |
Hello,

I was wondering if there is a more efficient way to handle large multi-record inserts. For example in Oracle I can bulk load data from ProC using an array of C Structure with one insert statement like this:

INSERT INTO SSJ
VALUES (:tmp_ssj_data);

Where tmp_ssj_data is an array (100s or 1000s) of structured records corresponding a table's column values. This greatly enhances performance for large data inserts.

Is there any way in Cloudscape to do something similar? I am already aware of importing from a file. But I am looking for something that I can run from a Java app using an array of object records that doesn't require dumping to and reading from a file. That wouldn't seem to buy much in terms of performance.

Any thought on the matter or experience with larger data inserts appreciated.

Thanks,

  • Derek
Updated on 2007-03-14T21:20:06Z at 2007-03-14T21:20:06Z by Jean_Anderson
  • Jean_Anderson
    Jean_Anderson
    179 Posts

    Re: multi-record insert performance

    ‏2007-03-12T22:14:12Z  
    Have you thought of maybe using JDBC batch updates [1]? Here's the intro example from [1]:

    codeStatement stmt = con.createStatement();
    con.setAutoCommit(false);

    stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
    stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
    stmt.addBatch("INSERT INTO emp_dept VALUES (1000, '260')");

    int [] updateCounts = stmt.executeBatch();[/code]
    That article notes that "JDBC drivers are not required to support batch updates", but Derby does support batch updates. Code ideas are in the Derby functional tests [2].

    I hope this helps,

    -jean

    [1] http://java.sun.com/j2se/1.4.2/docs/guide/jdbc/getstart/statement.html#1000712
    [2] https://svn.apache.org/repos/asf/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/jdbcapi/BatchUpdateTest.java
  • SystemAdmin
    SystemAdmin
    1525 Posts

    Re: multi-record insert performance

    ‏2007-03-13T15:40:52Z  
    Have you thought of maybe using JDBC batch updates [1]? Here's the intro example from [1]:

    codeStatement stmt = con.createStatement();
    con.setAutoCommit(false);

    stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
    stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
    stmt.addBatch("INSERT INTO emp_dept VALUES (1000, '260')");

    int [] updateCounts = stmt.executeBatch();[/code]
    That article notes that "JDBC drivers are not required to support batch updates", but Derby does support batch updates. Code ideas are in the Derby functional tests [2].

    I hope this helps,

    -jean

    [1] http://java.sun.com/j2se/1.4.2/docs/guide/jdbc/getstart/statement.html#1000712
    [2] https://svn.apache.org/repos/asf/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/jdbcapi/BatchUpdateTest.java
    Jean,

    I think what I am really looking for is a way to pass in a parameter list/array of 'Values' to the insert statement for one table.

    Our current Oracle usage:
    INSERT INTO SSJ
    VALUES (:tmp_ssj_data); //where tmp_ssj_data is an array of records

    For Cloudscape this may include a a prepared statements with dynamic parameters.
    The Cloudscape online documentation shows:

    INSERT INTO t VALUES (?)
    • dynamic parameter assumed to be the type
    • of the only column in table t

    So lets say table t has three integer columns. Now you have 1000 rows to insert into t that look like this:

    (1,1,1), (2,2,2), (3,3,3), (4,4,4), ...

    How would you pass these in as VALUES for a single insert statement?

    From the Apache Derby docs it shows that the VALUES expression can contain multiple rows. So in the above example I was thinking I could interate the array of 1000 records and create a formatted string which looks like:
    "(1,1,1), (2,2,2), (3,3,3), (4,4,4)"
    Then pass in the string some how to the prepared statement.

    Could that work with a prepared statement? Or is there a more efficient way to perform this task without making 1000 seperate insert calls?

    Thanks,

    Derek
  • Jean_Anderson
    Jean_Anderson
    179 Posts

    Re: multi-record insert performance

    ‏2007-03-14T00:28:33Z  
    Jean,

    I think what I am really looking for is a way to pass in a parameter list/array of 'Values' to the insert statement for one table.

    Our current Oracle usage:
    INSERT INTO SSJ
    VALUES (:tmp_ssj_data); //where tmp_ssj_data is an array of records

    For Cloudscape this may include a a prepared statements with dynamic parameters.
    The Cloudscape online documentation shows:

    INSERT INTO t VALUES (?)
    • dynamic parameter assumed to be the type
    • of the only column in table t

    So lets say table t has three integer columns. Now you have 1000 rows to insert into t that look like this:

    (1,1,1), (2,2,2), (3,3,3), (4,4,4), ...

    How would you pass these in as VALUES for a single insert statement?

    From the Apache Derby docs it shows that the VALUES expression can contain multiple rows. So in the above example I was thinking I could interate the array of 1000 records and create a formatted string which looks like:
    "(1,1,1), (2,2,2), (3,3,3), (4,4,4)"
    Then pass in the string some how to the prepared statement.

    Could that work with a prepared statement? Or is there a more efficient way to perform this task without making 1000 seperate insert calls?

    Thanks,

    Derek
    Hi, Derek,

    Derby doesn't support Oracle-style array syntax.

    Where you wrote this:
    > From the Apache Derby docs it shows that the VALUES expression can contain
    > multiple rows. So in the above example I was thinking I could interate the
    > array of 1000 records and create a formatted string which looks like:
    > "(1,1,1), (2,2,2), (3,3,3), (4,4,4)"
    > Then pass in the string some how to the prepared statement.

    The formatted string for prepareStatement() would include parameter markers:code"(?,?,?), (?,?,?), (?,?,?), (?,?,?)"[/code]Then you would set the values for the parameter markers based on values in your array. For example, let's say you want to execute this statement with these integer values:
    codeinsert into foo values (100,101,102),(200,201,202),(300,301,301);[/code]
    But you want to use parameter markers instead. I think that would look something like this:codeps = prepareStatement(
    "insert into foo values (?,?,?), (?,?,?), (?,?,?), (?,?,?)");
    ps.setInt(1, 100);
    ps.setInt(2, 101);
    ps.setInt(3, 102);
    ps.setInt(4, 200);
    ps.setInt(5, 201);
    ps.setInt(6, 202);
    ps.setInt(7, 300);
    ps.setInt(8, 301);
    ps.setInt(9, 302);
    ps.executeUpdate();[/code]
    You could set the values from variables (including an index into your array).
    One of the derby tests does a multi-row insert with parameter markers, which might be close to what you want [1]:
    codeString ins3 = "INSERT INTO T_MAIN(V) VALUES (?), (?), (?)";
    ...
    ps = prepareStatement(ins3);
    setRandomValue(r, ps, 1, jdbcType, precision);
    setRandomValue(r, ps, 2, jdbcType, precision);
    setRandomValue(r, ps, 3, jdbcType, precision);
    ps.executeUpdate();[/code]Does this help at all?

    regards,

    -jean

    [1] https://svn.apache.org/repos/asf/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/lang/TriggerTest.java
  • SystemAdmin
    SystemAdmin
    1525 Posts

    Re: multi-record insert performance

    ‏2007-03-14T20:18:20Z  
    Hi, Derek,

    Derby doesn't support Oracle-style array syntax.

    Where you wrote this:
    > From the Apache Derby docs it shows that the VALUES expression can contain
    > multiple rows. So in the above example I was thinking I could interate the
    > array of 1000 records and create a formatted string which looks like:
    > "(1,1,1), (2,2,2), (3,3,3), (4,4,4)"
    > Then pass in the string some how to the prepared statement.

    The formatted string for prepareStatement() would include parameter markers:code"(?,?,?), (?,?,?), (?,?,?), (?,?,?)"[/code]Then you would set the values for the parameter markers based on values in your array. For example, let's say you want to execute this statement with these integer values:
    codeinsert into foo values (100,101,102),(200,201,202),(300,301,301);[/code]
    But you want to use parameter markers instead. I think that would look something like this:codeps = prepareStatement(
    "insert into foo values (?,?,?), (?,?,?), (?,?,?), (?,?,?)");
    ps.setInt(1, 100);
    ps.setInt(2, 101);
    ps.setInt(3, 102);
    ps.setInt(4, 200);
    ps.setInt(5, 201);
    ps.setInt(6, 202);
    ps.setInt(7, 300);
    ps.setInt(8, 301);
    ps.setInt(9, 302);
    ps.executeUpdate();[/code]
    You could set the values from variables (including an index into your array).
    One of the derby tests does a multi-row insert with parameter markers, which might be close to what you want [1]:
    codeString ins3 = "INSERT INTO T_MAIN(V) VALUES (?), (?), (?)";
    ...
    ps = prepareStatement(ins3);
    setRandomValue(r, ps, 1, jdbcType, precision);
    setRandomValue(r, ps, 2, jdbcType, precision);
    setRandomValue(r, ps, 3, jdbcType, precision);
    ps.executeUpdate();[/code]Does this help at all?

    regards,

    -jean

    [1] https://svn.apache.org/repos/asf/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/lang/TriggerTest.java
    Jean,

    Thanks for the help. You have given me some good ideas and resources. I think I may try the batch insert method which you directed me toward. I posted a question about this in the Apache Derby mailing list as well. Another member there recommneded using this batch approach which I thought I would share here:

    [i]PreparedStatement pstmt = connection.prepareStatement("INSERT INTO t VALUES (?, ?, ?)");
    for (int i=1; i<1000; i++) {
    pstmt.setInt(1, i);
    pstmt.setInt(2, i);
    pstmt.setInt(3, i);
    pstmt.addBatch();
    }
    int[] results = pstmt.executeBatch();[/i]

    I will have to try this and compare the timings to the single INSERT string approach below. I may also have to adjust the commit number up/down to achieve optimal performance.

    Iterate an array of 1000 records and create a VALUES clause String/StringBuffer to look like:
    [i]String records = "(1,1,1), (2,2,2), (3,3,3), (4,4,4), (1000, 1000, 1000)";[/i]
    Then append the insert statement with the records and execute.
    [i]String query = "INSERT INTO t VALUES " + records;
    Execute query;[/i]
  • Jean_Anderson
    Jean_Anderson
    179 Posts

    Re: multi-record insert performance

    ‏2007-03-14T21:20:06Z  
    Jean,

    Thanks for the help. You have given me some good ideas and resources. I think I may try the batch insert method which you directed me toward. I posted a question about this in the Apache Derby mailing list as well. Another member there recommneded using this batch approach which I thought I would share here:

    [i]PreparedStatement pstmt = connection.prepareStatement("INSERT INTO t VALUES (?, ?, ?)");
    for (int i=1; i<1000; i++) {
    pstmt.setInt(1, i);
    pstmt.setInt(2, i);
    pstmt.setInt(3, i);
    pstmt.addBatch();
    }
    int[] results = pstmt.executeBatch();[/i]

    I will have to try this and compare the timings to the single INSERT string approach below. I may also have to adjust the commit number up/down to achieve optimal performance.

    Iterate an array of 1000 records and create a VALUES clause String/StringBuffer to look like:
    [i]String records = "(1,1,1), (2,2,2), (3,3,3), (4,4,4), (1000, 1000, 1000)";[/i]
    Then append the insert statement with the records and execute.
    [i]String query = "INSERT INTO t VALUES " + records;
    Execute query;[/i]
    Hi, Derek,

    The batch strategy sounds great -- go for it.

    You can get tips on batch handling from the Derby tests (and save yourself some spin-up time); for example, this one:

    https://svn.apache.org/repos/asf/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/jdbcapi/BatchUpdateTest.java

    regards,

    -jean