CSV data source

If you use a CSV file to import data into a data source, ensure that the CSV file complies with the following formatting and content restrictions.

CSV formatting and content restrictions

  • The first line of the CSV file must contain a list of the property names, which are separated by commas.
  • Each line after the first line of the CSV file must contain an identical number of values to match the property names in the first line.
  • There must be no extra commas at the end of any line and the number of commas on each line must be the same.
  • Apostrophes and quotation marks are not supported, with one exception: double quotation marks are supported for values in multiline format and multipolygon format, for example, "MULTIPOINT ((10 40), (40 30), (20 20), (30 10))".
  • When a CSV file is processed by the system, the name is changed, for example, Test3_processed_123.csv.
  • To update an existing CSV file data source, place a file with the original name in the directory on the application server. Every line of the CSV file is processed for each file that is received, so you might want to remove existing entries to avoid duplication.
  • If you configure a property to be used as an ID, you must ensure that the CSV file does not contain data items with duplicate IDs. If imported CSV data does contain data items with duplicate IDs, extra data items might be added or existing data items might be updated incorrectly.
  • If you do not specify a time and date format for a data source that acquires its data from a CSV file, the default format that is applied is yyyy-MM-dd HH:mm:ss.
  • Every line in the CSV file, including the last line, must end with a newline character (\n).
  • If the solution cannot process an incoming date value, the data import process continues and uses the current system time in place of that date value.
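
Several of the restrictions above can be checked before a file is placed on the application server. The following Python sketch is illustrative only and is not part of the product: the check_csv function, its id_column and date_column parameters, and the wording of the messages are hypothetical names chosen for this example. It assumes that no value spans more than one physical line, and it approximates the default yyyy-MM-dd HH:mm:ss pattern with the Python format %Y-%m-%d %H:%M:%S.

```python
import csv
from datetime import datetime

# Python approximation of the default yyyy-MM-dd HH:mm:ss pattern (assumption).
DEFAULT_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_csv(text, id_column=None, date_column=None):
    """Return a list of problems found in CSV text; an empty list means none."""
    problems = []
    lines = text.splitlines(keepends=True)
    if not lines:
        return ["file is empty"]

    # Only the last line can lack a newline after splitting on line endings.
    if not lines[-1].endswith("\n"):
        problems.append(f"line {len(lines)}: no newline character at end of line")

    # The first line holds the property names.
    header = next(csv.reader([lines[0].rstrip("\r\n")]))
    seen_ids = set()

    for number, line in enumerate(lines, start=1):
        raw = line.rstrip("\r\n")
        if raw.endswith(","):
            problems.append(f"line {number}: extra comma at end of line")
        if "'" in raw:
            problems.append(f"line {number}: apostrophe found")

        # csv.reader keeps commas inside double-quoted values together,
        # so a quoted MULTIPOINT value counts as one value.
        values = next(csv.reader([raw]))
        if len(values) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} values, found {len(values)}"
            )
            continue
        if number == 1:
            continue

        record = dict(zip(header, values))
        if id_column in record:
            if record[id_column] in seen_ids:
                problems.append(f"line {number}: duplicate ID {record[id_column]!r}")
            seen_ids.add(record[id_column])
        if date_column in record:
            try:
                datetime.strptime(record[date_column], DEFAULT_DATE_FORMAT)
            except ValueError:
                problems.append(
                    f"line {number}: date {record[date_column]!r} "
                    "does not match the default format"
                )
    return problems
```

To run the check on a file, read it with open(path, newline="") so that the original line endings are preserved, and pass the resulting text to check_csv. The sketch reports problems only; it does not modify the file.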

Increasing the transaction timeout value for large CSV files

If you plan to import large CSV files, it might be necessary to increase the EJB transaction timeout value that is configured in the WebSphere® Application Server Liberty Profile liberty_installation/user/servers/defaultServer/server.xml file. In the following example, the transaction timeout value is set to 1200 seconds:
<transaction totalTranLifetimeTimeout="1200s"></transaction>
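
To confirm which value is currently configured, the server.xml file can be inspected with a short script. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the transaction element is a direct child of the root element of server.xml and that the value is either a bare number of seconds or a combination of h, m, and s units such as 1200s or 1h30m.

```python
import re
import xml.etree.ElementTree as ET

SECONDS_PER_UNIT = {"h": 3600, "m": 60, "s": 1}


def duration_to_seconds(value):
    """Convert a duration such as '1200s', '20m', or '1h30m' to seconds."""
    value = value.strip()
    if value.isdigit():
        # Assumption: a bare number is a number of seconds.
        return int(value)
    parts = re.findall(r"(\d+)([hms])", value)
    # Reject input that contains anything other than number-unit pairs.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unrecognised duration: {value!r}")
    return sum(int(n) * SECONDS_PER_UNIT[u] for n, u in parts)


def transaction_timeout_seconds(server_xml_text):
    """Return the configured timeout in seconds, or None if it is not set."""
    root = ET.fromstring(server_xml_text)
    element = root.find("transaction")
    if element is None or "totalTranLifetimeTimeout" not in element.attrib:
        return None
    return duration_to_seconds(element.attrib["totalTranLifetimeTimeout"])
```

A result of None means that the file does not set the value explicitly, in which case the server uses its own default.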
If a transaction times out while CSV data is being imported, an error message that is similar to the following example is written to the messages.log file:
[6/26/15 16:18:30:001 IST] 00000076 com.ibm.tx.jta.impl.TimeoutManager I WTRN0006W: Transaction ........ has timed out after 120 seconds.
[6/26/15 16:18:30:002 IST] 00000076 com.ibm.tx.jta.impl.TimeoutManager I WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[LargeThreadPool-thread-52,5,LargeThreadPool Thread Group]. The stack trace of this thread when the timeout occurred was:
 java.net.SocketInputStream.socketRead0(Native Method) 
 java.net.SocketInputStream.read(SocketInputStream.java:152) 
 java.net.SocketInputStream.read(SocketInputStream.java:122)
 com.ibm.db2.jcc.t4.z.b(z.java:199)
 com.ibm.db2.jcc.t4.z.c(z.java:289)
 com.ibm.db2.jcc.t4.z.c(z.java:402)
 com.ibm.db2.jcc.t4.z.v(z.java:1170)
 com.ibm.db2.jcc.t4.cb.d(cb.java:54)
 com.ibm.db2.jcc.t4.q.c(q.java:44)
 com.ibm.db2.jcc.t4.sb.j(sb.java:147)
 com.ibm.db2.jcc.am.yn.ib(yn.java:2119)
 com.ibm.db2.jcc.am.zn.b(zn.java:4295)
 com.ibm.db2.jcc.am.zn.cc(zn.java:720)
 com.ibm.db2.jcc.am.zn.executeQuery(zn.java:694)
 com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeQuery(WSJdbcPreparedStatement.java:552)
 com.ibm.ioc.datareceiver.types.IDataReceiver.getObjectIDs(IDataReceiver.java:733)
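
To check whether an import was affected by this timeout, the messages.log file can be searched for the WTRN0006W message. The following Python sketch is illustrative only: the function name is hypothetical, and the pattern is based solely on the wording of the example message above, so it might need adjusting for other server versions or locales.

```python
import re

# Pattern derived from the example WTRN0006W message (assumption).
TIMEOUT_PATTERN = re.compile(
    r"WTRN0006W: Transaction .*? has timed out after (\d+) seconds"
)


def find_transaction_timeouts(log_text):
    """Return the timeout, in seconds, reported by each WTRN0006W message."""
    # Collapse all whitespace so that a message that wraps across
    # several lines is still matched.
    joined = " ".join(log_text.split())
    return [int(seconds) for seconds in TIMEOUT_PATTERN.findall(joined)]
```

If the reported number of seconds matches the value that is configured in server.xml, the import ran for longer than the configured transaction timeout, and a larger value might be needed.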