The problematic was :
- Import of huge data list ; from 500K itemes to 1.5M items
- where items are ordered and contains a date,
- and items contains 5 double.
The whole datas (not only those lists) have to be request in many ways and they are not all known, so we choose to test a relational database system to offer large type of request .
As we have a Mysql 5.0 server install we decide to use this database.
To be able to progress we define many use case. The main use cases are :
- Start from a text file containing datas (500k lines, tab separated) and Import those file in mysql
- implies convert file to object
- implies RDB INSERT request
- Read those 500k items
- implies RDB SELECT
- Write in an another place those 500k items
- implies convert SELECT result to object
- implies RDB INSERT
The last two points fake a standard process for us : get datas, process results with previous datas, and finally push result.
So we try many thing, using jdbc bridge and we progress while we read many post and article about "Fast INSERT INTO " ; The list below is show starting from the worst performance in our case to the best.
- using jdbc statment and default autocommit
- one request for one line, commit each time
- using jdbc statment , autocomit to false and then commit
- one request for one line, commit each the end of lines process
- using jdbc statment, addbatch, autocomit to false and then commit
- one request for many lines lines (we use a batch size to preserve memory and spread RDB load) , commit
- using jdbc prepareStatment and addbatch, autocomit to false and then commit
- use RDB optimization when dealing many times the same request.
- using LOAD DATA LOCAL INFILE using and ImputStream
- we convert the file in an imput stream :
- adding an order
- converting date format
- giving an extenal id
Statement stmt = con.createStatement();
String statementText = "LOAD DATA LOCAL INFILE 'file.txt' "
+ "INTO TABLE tableEntry "
+ " (date, a, b, c, d, e, idx) "
+ " SET FK_ID ="
+ getExternalId();
InputStream is = readDataAsImpuStream(source);
((com.mysql.jdbc.Statement) stmt).setLocalInfileInputStream(is);
stmt.execute(statementText);
The request LOAD DATA LOCAL INFILE is specific to mysql and allows the best performance is our case (time divide by 15). We have other thing to do optimize again our request time like :
- RBD server setting
- Modifyng table structure to reduce the number of fields
- ....
No comments:
Post a Comment