2 replies, latest post 2013-07-30T14:00:47Z by atimpo
atimpo

Pinned topic Data Studio and large result sets

2013-07-23T21:10:32Z

I have customers who need to pull large (> 500k row) result sets from a data warehouse (DB2 UDB 9.7 on AIX servers).

I've found that we can only pull about 300-400k rows (depending on the number and size of the columns) before the tool slows to a crawl.

I figured that this was due to Java heap management, so I opened PMR 22184,442,000 about it.

So, on my RHEL 6 machine (which runs Data Studio 4.1 at the moment), I performed the following steps to enable large pages for the VMM.

  • put these lines in /etc/sysctl.conf and ran sysctl -p

vm.nr_hugepages = 1280

kernel.shmall = 68719476736

kernel.shmmax = 68719476736

  • added the -Xlp parameter to eclipse.ini
  • changed these lines in eclipse.ini

-Xms512m
-Xmx5120m
-XX:MaxPermSize=512M
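To confirm the JVM actually honors settings like these, a scratch Java program can print the effective max heap and the raw JVM arguments when launched with the same flags (a sketch; the class name HeapCheck is made up, and this is not something run inside Data Studio):

```java
import java.lang.management.ManagementFactory;

// Scratch program to verify which heap settings a JVM picked up.
// Launch it with the same flags as in eclipse.ini, e.g.:
//     java -Xms512m -Xmx5120m HeapCheck
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reflects the effective -Xmx value for this JVM
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Effective max heap: ~" + maxMb + " MB");
        // getInputArguments() lists the JVM options as actually parsed
        System.out.println("JVM args: "
            + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```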

 

Even after enabling large pages like this, and increasing the heap, when the tool pulls > 300k rows, these things seem to happen:

  • the query runs and pulls the data into memory.
  • "Closing JDBC connection" appears in the status window.  This message stays there for a long time, or sometimes forever.
  • The eclipse process physical memory footprint grows slowly up to the limit in -Xmx.
  • Tens of minutes go by with the tool frozen while the heap grows.
  • Nothing appears in the workspace folder .metadata/.plugins/org.eclipse.datatools.sqltools.result.
  • If I am lucky enough not to exhaust the heap, the result set tab eventually appears and I can use the tool. A file also shows up in the workspace folder .metadata/.plugins/org.eclipse.datatools.sqltools.result. It's as if the tool won't write the file until it finishes working over the Java heap.
  • If I do exhaust the heap, the JVM may crash, or a java.lang.OutOfMemoryError appears in a dialog box.
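To illustrate the pattern in those bullets, here is a minimal sketch (made-up row shape and class name, not Data Studio code) of how retaining every fetched row as Java objects makes heap use grow linearly with row count, so the in-heap copy ends up much larger than the raw data:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the failure mode: a result-set grid that holds every row in
// memory retains one String object per cell, so heap use scales with rows.
public class HeapGrowthSketch {

    // Build rowCount rows of cols string cells, the way a grid might
    // retain them; every cell is a distinct String object.
    public static List<String[]> buildRows(int rowCount, int cols) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            String[] row = new String[cols];
            for (int c = 0; c < cols; c++) {
                row[c] = "value-" + i + "-" + c;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = buildRows(100_000, 10); // ~1M small String objects
        long usedMb = (Runtime.getRuntime().totalMemory()
                     - Runtime.getRuntime().freeMemory()) / (1024 * 1024);
        System.out.println(rows.size() + " rows retained, ~" + usedMb
                         + " MB heap in use");
    }
}
```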

Does anyone know how to fix this?  Is it because the result set itself is a java object?

Would any changes to -Xgcpolicy:gencon be of use?

My customers are very unhappy with this behavior.

On the Windows x86 platform, they can't pull even 100k rows without the heap growing to 1024M, after which the same behavior described above occurs.


  • Brian_Laube

    Re: Data Studio and large result sets

    2013-07-30T13:54:12Z in response to atimpo

    I see you do not have a good answer yet on how to make IBM Data Studio handle very large result sets, so I will offer my opinion.

    My question is: why would anyone want to bring 500,000+ rows into IBM Data Studio and display them there? That is a very large result set, and viewing it in the results grid would be very cumbersome.

    Whenever I hear of requests to retrieve very large result sets for later processing by another tool, I always ask the requestor if we could make a more sophisticated SQL WHERE clause to reduce the quantity of rows returned. It is better to discard rows via the SQL WHERE clause than in some later processing step.
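    As a sketch of that advice (the helper name and the SQL below are illustrative, not anyone's real schema): DB2's FETCH FIRST clause, or JDBC's Statement.setMaxRows, can put a hard cap on the rows a query returns before any later processing sees them.

```java
// Two ways to cap how many rows a query can hand back.
public class RowCap {
    // DB2 supports appending FETCH FIRST n ROWS ONLY to a SELECT.
    public static String capped(String sql, int maxRows) {
        return sql + " FETCH FIRST " + maxRows + " ROWS ONLY";
    }

    public static void main(String[] args) {
        System.out.println(capped(
            "SELECT id, total FROM myschema.orders"
            + " WHERE order_date >= '2013-01-01'", 10000));
        // On the JDBC side, java.sql.Statement offers a similar safety net:
        //     stmt.setMaxRows(10000);  // driver drops rows past the cap
    }
}
```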

    If your requirement is to extract all the rows from a table and produce a comma-delimited output file, then you can use the IBM Data Studio "extract" function. It appears to retrieve the entire contents of a table and put it straight into a file. This function runs relatively quickly on my workstation. I assume it is fast because it does not retrieve the results into IBM Data Studio and display them there; the result just passes through the tool and into the file.
    > I found the "extract" function under the data source explorer -> ... -> schema -> table -> right click on the table and choose the "extract" option from the pop-up.
    >> The "extract" only appears to work with plain tables. It would be nice if you could "extract" the result of any (complex) SQL statement/script straight to a file, but I don't see that option available.
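    The pass-through behavior described above can be sketched in plain JDBC terms (the names StreamExport and streamToCsv are illustrative, not Data Studio internals): write each row out as it arrives and let it be garbage-collected, so heap use stays flat no matter how many rows come back.

```java
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Sketch: stream a JDBC result set straight to a file instead of
// materializing it in the Java heap.
public class StreamExport {

    // Join one row's column values into a naive comma-separated line
    // (no quoting/escaping; a real export must handle embedded commas).
    public static String csvLine(String[] values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) line.append(',');
            line.append(values[i] == null ? "" : values[i]);
        }
        return line.toString();
    }

    // Walk the result set once, writing each row as it arrives. Rows are
    // written and discarded, so heap use stays flat regardless of row count.
    // Callers should create the Statement with a modest fetch size, e.g.
    //     stmt.setFetchSize(1000);
    // so the driver pulls rows in batches rather than all at once.
    public static long streamToCsv(ResultSet rs, Writer out)
            throws SQLException, java.io.IOException {
        ResultSetMetaData md = rs.getMetaData();
        int cols = md.getColumnCount();
        long rows = 0;
        while (rs.next()) {
            String[] vals = new String[cols];
            for (int c = 1; c <= cols; c++) {
                vals[c - 1] = rs.getString(c);
            }
            out.write(csvLine(vals));
            out.write('\n');
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        // No database here; just show the row formatting on sample values.
        System.out.println(csvLine(new String[]{"1", "Smith", null, "2013-07-23"}));
    }
}
```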

    Depending upon your requirements, you could consider the IBM Data Server (runtime) client software, which includes an "export" function that can be run from a command prompt. The input can be any SQL statement. The output result set can be produced in two formats.
    (I do not regularly use this function....)
    http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0008303.html

    Good Luck!
     

    • atimpo

      Re: Data Studio and large result sets

      2013-07-30T14:00:47Z in response to Brian_Laube

      I agree that people shouldn't need to pull in large result sets, but that's what the customer wants in this case. I think I will have to start looking at training customers on the EXPORT command, and figure out how to make the data-transfer step from the database servers simple for them.

      I just wanted to share my experiences in case the Data Studio developers wanted to try to address this in some manner down the road.

      By the way, devs, I like the improvements in 4.1 - kudos to you. I know as developers you tend to hear more negatives than positives, so I wanted to send some positive feedback!