Process (very) large list in batches

On a weekly basis my application needs to synchronize a flat text file of about 900,000 rows (and growing) into the database. I use the FlatFileInterface module from the App Store and a separate "sync" table before importing the data into the "production" table. The process looks like this:

1. The flat-file import takes about 20 minutes to process the huge file into the "sync" table. Fair enough.
2. A separate process synchronizes the contents of the "sync" table into the "production" table. Every row in the "sync" table is checked against the "production" table: if it already exists there, the "production" row is marked checked = true so it is kept; if it does not exist, it is created in the "production" table with checked = true.
3. After that, all rows still marked checked = false are deleted from the "production" table.

Step 2 alone takes about 45 (!!!) minutes. My question is: at step 2 I currently process all 900,000 rows from the "sync" table into the "production" table as one big list. Is there a way to do this in separate, smaller batches?
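The mark-and-sweep logic described in steps 2 and 3 can be sketched language-agnostically. This is a minimal Python illustration, not Mendix microflow logic; the `sync_keys` and `production` structures are hypothetical stand-ins for the two tables:

```python
def synchronize(sync_keys, production):
    """Merge sync rows into production, then delete rows not seen this run.

    production maps a row key to a dict holding a 'checked' flag,
    mimicking the checked attribute on the production table.
    """
    # Reset all flags so only rows present in this week's file survive.
    for row in production.values():
        row["checked"] = False

    for key in sync_keys:
        if key in production:
            production[key]["checked"] = True    # existing row: keep it
        else:
            production[key] = {"checked": True}  # new row: create it

    # Sweep: drop production rows the sync file no longer contains.
    for key in [k for k, v in production.items() if not v["checked"]]:
        del production[key]
    return production
```

Whatever batching is applied, the sweep in the last step has to run only after every sync row has been processed, otherwise rows that simply have not been reached yet would be deleted.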
2 answers

This process not only takes 45 minutes but also a lot of memory.

Use offset and limit to retrieve chunks of data and process them. After each chunk, call the Community Commons 'EndTransaction' action. This saves the data to the database and releases the memory. Then increase the offset and repeat until the retrieve returns an empty list.

The limit can be somewhere between 1,000 and 5,000.

This will still take a fair amount of time, but it consumes far less memory, and if the process fails at some point, everything committed via 'EndTransaction' up to that point is already stored in the database. If you do it all in one transaction, a failure at any point means nothing is saved at all.
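The loop described above can be sketched as follows. This is a generic Python illustration, assuming a hypothetical `fetch(offset, limit)` that returns up to `limit` rows, a `process(row)` step, and an `end_transaction()` callback that commits and frees memory (standing in for the Community Commons 'EndTransaction' action):

```python
def process_in_batches(fetch, process, end_transaction, batch_size=1000):
    """Process rows in batches of batch_size, committing after each batch.

    batch_size of 1,000-5,000 matches the range suggested above.
    """
    offset = 0
    while True:
        batch = fetch(offset, batch_size)
        if not batch:           # empty retrieve: all rows handled
            break
        for row in batch:
            process(row)
        end_transaction()       # commit this batch and release memory
        offset += batch_size
```

One caveat: incrementing the offset assumes the processing does not change which rows the retrieve matches. If processed rows drop out of the query result, the offset should stay at 0 instead, or rows will be skipped.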


See also the documentation here:




The documentation needs an update. The page that comes closest is now this one:

But indeed the documentation needs to be improved again. It is a pity they removed the old page.