Performant and Robust Data Synchronization Engine - Batch Sizes and Commit Best Practices

Question: I need help transferring large amounts of data into the Mendix database efficiently. What is the recommended approach for this task?

Example Context: We have successfully integrated non-persistent entities using the External Database Connector and are using Mendix interfaces for the data transfer, which gives high performance during data import. However, we have encountered cache issues in the deployment environment when trying to batch-write the data.

Error Message: We have tried batch-writing data in Mendix with commit actions in Java, but hit cache usage issues for large data sets.

Desired Outcome: We are looking for an optimal batch size for handling over 1 million object writes, a pattern that balances batch write speed against cache usage, and monitoring and error handling mechanisms for batch write sessions to ensure smooth data transfer and data integrity.

Feel free to share your ideas and experiences related to this topic.
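To make the last point concrete, here is a minimal sketch of the kind of per-batch monitoring and error handling we have in mind around the Java commit action. It is illustrative only: the log node name, the commitBatch placeholder, and the logged heap figure are assumptions, not our actual implementation.

```java
// Sketch: per-batch monitoring and error handling around a commit in a
// Mendix Java action. Names are illustrative assumptions.
import java.util.List;

import com.mendix.core.Core;
import com.mendix.logging.ILogNode;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.systemwideinterfaces.core.IMendixObject;

public class BatchMonitor {
    private static final ILogNode LOG = Core.getLogger("DataSync"); // assumed log node name

    public static void commitWithMonitoring(IContext context,
                                            List<IMendixObject> batch,
                                            int batchNumber) {
        long start = System.currentTimeMillis();
        try {
            commitBatch(context, batch); // placeholder for the actual commit logic
            LOG.info(String.format("Batch %d: committed %d objects in %d ms, free heap %d MB",
                    batchNumber, batch.size(), System.currentTimeMillis() - start,
                    Runtime.getRuntime().freeMemory() / (1024 * 1024)));
        } catch (Exception e) {
            // Record which batch failed so the run can be retried or resumed from there.
            LOG.error("Batch " + batchNumber + " failed after "
                    + (System.currentTimeMillis() - start) + " ms", e);
            throw new RuntimeException("Data sync aborted at batch " + batchNumber, e);
        }
    }

    private static void commitBatch(IContext context, List<IMendixObject> batch) throws Exception {
        Core.commit(context, batch); // or Core.commitWithoutEvents(...) if events are not needed
    }
}
```

The intent is that every batch logs its size, duration, and remaining heap, and a failure identifies the batch number so the session can be resumed rather than restarted from scratch.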
asked
2 answers

Hi,

I have had a ticket open with Mendix on a similar case: an API returns at most 1,000 objects per call, but we need to download 2.5 million records. The microflow uses the batch pattern, with an EndTransaction after each commit of a batch list. Even so, processing slowed down and memory usage kept climbing.

I changed it to process at most 5 batches per run (configurable) and then start a fresh instance via the task queue, so a single run handles no more than 5 batches. The result is good performance. How many batches fit in one run depends on the number of NPEs you retrieve per entry: a single entry can contain many child objects.

Also, for the NPEs in your REST import mapping, do not use a regular association from the child to the parent; use a reference set with owner 'Both' or a reference set from the owner to the child. Note that the new REST client cannot handle that. As Mendix R&D told me, retrieving the list of NPEs under a parent is not very efficient with a child-to-parent association, because the parent does not hold the child GUIDs the way it does with a reference set.

What also helps is deleting the NPEs after processing them into your persistent entities, at the end of each batch iteration.
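In Java terms, the overall pattern looks roughly like this. It is only a sketch: fetchPage and mapAndCommitPage are hypothetical placeholders for the REST call and the mapping into persistent entities, the microflow and queue names are made up, and executeInBackground is the Mendix 9+ task-queue call as far as I know. In my project this is modelled as a microflow rather than a Java action.

```java
// Sketch: process at most N batches per run, delete the NPEs after each
// batch, then hand off the remainder to a fresh task-queue run.
import java.util.Collections;
import java.util.List;

import com.mendix.core.Core;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.systemwideinterfaces.core.IMendixObject;

public class SyncRunner {
    private static final int MAX_BATCHES_PER_RUN = 5; // configurable

    public static void runOnce(IContext context, long startOffset) throws Exception {
        long offset = startOffset;
        for (int i = 0; i < MAX_BATCHES_PER_RUN; i++) {
            List<IMendixObject> npes = fetchPage(context, offset); // one API page of up to 1,000 records
            if (npes.isEmpty()) {
                return; // nothing left to process
            }
            mapAndCommitPage(context, npes);   // copy into persistent entities and commit
            Core.delete(context, npes);        // delete the NPEs so they do not pile up in memory
            context.endTransaction();          // release the committed batch from the cache
            context.startTransaction();
            offset += npes.size();
        }
        // More data remains: enqueue a fresh run instead of continuing in this one,
        // so memory use stays flat across all 2.5 million records.
        // Microflow and queue names are assumptions; executeInBackground requires Mendix 9+.
        Core.microflowCall("MyModule.SUB_SyncRun")
            .withParam("StartOffset", offset)
            .executeInBackground(context, "SyncQueue");
    }

    private static List<IMendixObject> fetchPage(IContext context, long offset) {
        // Placeholder: call the REST endpoint and map one page into NPEs.
        return Collections.emptyList();
    }

    private static void mapAndCommitPage(IContext context, List<IMendixObject> npes) {
        // Placeholder: map the NPEs to persistent entities and commit them.
    }
}
```

Capping the work per run and handing the rest to a fresh task-queue instance is what keeps memory flat over millions of records.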

answered

You should consider using the batch pattern described in a Mendix learning path. Committing a large number of objects at once can cause heap size and cache issues, which is likely what you are experiencing. Instead, commit the records in smaller chunks. Based on my experience, around 3,000 objects per commit tends to work very efficiently while keeping memory usage under control.
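As a rough sketch of that chunked-commit idea in a Java action (the 3,000 chunk size and the log node name are just starting points to tune, and this assumes the objects are already available in a list):

```java
// Sketch: commit a large list in fixed-size chunks, ending the transaction
// after each chunk so the runtime can release the committed objects.
import java.util.ArrayList;
import java.util.List;

import com.mendix.core.Core;
import com.mendix.logging.ILogNode;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.systemwideinterfaces.core.IMendixObject;

public class ChunkedCommit {
    private static final ILogNode LOG = Core.getLogger("DataSync"); // assumed log node name

    public static void commitInChunks(IContext context, List<IMendixObject> objects,
                                      int chunkSize) throws Exception {
        List<IMendixObject> chunk = new ArrayList<>(chunkSize);
        int total = 0;
        for (IMendixObject obj : objects) {
            chunk.add(obj);
            if (chunk.size() == chunkSize) {
                total = flush(context, chunk, total);
            }
        }
        total = flush(context, chunk, total); // commit any remainder
        LOG.info("Finished: " + total + " objects committed in chunks of " + chunkSize);
    }

    private static int flush(IContext context, List<IMendixObject> chunk, int total) throws Exception {
        if (chunk.isEmpty()) {
            return total;
        }
        Core.commit(context, chunk);
        // End the current transaction and start a new one so the committed
        // objects can be released from the runtime cache (same idea as the
        // Community Commons EndTransaction/StartTransaction actions).
        context.endTransaction();
        context.startTransaction();
        total += chunk.size();
        chunk.clear();
        return total;
    }
}
// Usage inside a Java action, e.g.: ChunkedCommit.commitInChunks(getContext(), myObjects, 3000);
```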

answered