Efficiently Downloading ~100K User Records in Mendix: Best Practices for Performance & Scalability

Hi Mendix Community, I'm working on a project where I need to download a large dataset of approximately 100,000 user records. I want to ensure the process is efficient, reliable, and follows best practices for performance and scalability.
asked

Hi Elamathi! If you're working with a large dataset (e.g. 100,000+ user records), batching combined with the Task Queue module is a reliable and scalable approach.

1. Use Task Queue for Background Processing

  • Offload the export logic to a background task using the Task Queue module (see the sketch after this list).

  • This keeps the UI responsive and avoids timeouts during file generation.
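
For illustration, here is a minimal Java sketch of enqueueing the export microflow on a task queue. It assumes Mendix 9+ (where a microflow call can be submitted to a named queue from Java) and uses placeholder names — MyModule.SUB_ExportUsers and ExportQueue — for the microflow and queue in your own app:

```java
import com.mendix.core.Core;
import com.mendix.systemwideinterfaces.core.IContext;

public class EnqueueUserExport {
    /**
     * Submit the export microflow to a task queue so it runs in the
     * background instead of blocking the user's request.
     * "MyModule.SUB_ExportUsers" and "ExportQueue" are placeholders.
     */
    public static void enqueue(IContext context) throws Exception {
        Core.microflowCall("MyModule.SUB_ExportUsers")
            .executeInBackground(context, "ExportQueue");
    }
}
```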

2. Implement Batching Logic

  • Within the background microflow (run by Task Queue), process data in batches (e.g. 1,000–5,000 records at a time).

  • Use offset and limit in XPath retrieves, or keep a marker (e.g. the last processed ID); a batching sketch follows below.
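
A minimal batching sketch, assuming a Java action context, a batch size of 1,000, and the standard Core XPath query API; the entity (System.User) and the processing step are placeholders:

```java
import java.util.List;

import com.mendix.core.Core;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.systemwideinterfaces.core.IMendixObject;

public class BatchedUserRetrieve {
    private static final int BATCH_SIZE = 1000;

    public static void processAllUsers(IContext context) throws Exception {
        int offset = 0;
        while (true) {
            // Retrieve one page of records. Sort on a stable attribute in
            // your real query so offset-based paging stays deterministic.
            List<IMendixObject> batch = Core.createXPathQuery("//System.User")
                    .setAmount(BATCH_SIZE)
                    .setOffset(offset)
                    .execute(context);
            if (batch.isEmpty()) {
                break; // all records processed
            }
            // ... append this batch to the export file (see step 3) ...

            // Per-batch logging, as suggested in the tips below.
            Core.getLogger("UserExport").info(
                    "Processed batch at offset " + offset + " (" + batch.size() + " records)");
            offset += batch.size();
        }
    }
}
```

With the marker approach you would instead constrain the XPath on the last processed ID rather than increasing an offset, which is more robust if records are added or removed mid-export.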

3. Exporting to File

  • Use CSV for performance (lighter than Excel).

  • Either:

    • Use the CSV Exporter module, or

    • Build a simple file appender using StringBuilder, then write the result to a FileDocument (see the sketch after this list).

  • Commit data after each batch using commitInSeparateTransaction (from Community Commons) to reduce memory pressure.
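
A minimal sketch of the StringBuilder approach for one batch, assuming a hypothetical Name attribute on the exported entity and a plain System.FileDocument as the target:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

import com.mendix.core.Core;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.systemwideinterfaces.core.IMendixObject;

public class CsvUserExport {
    public static void writeCsv(IContext context, List<IMendixObject> users) throws Exception {
        StringBuilder csv = new StringBuilder("Name\n");
        for (IMendixObject user : users) {
            // Quote the value and escape embedded quotes to keep the CSV valid.
            String name = (String) user.getValue(context, "Name");
            csv.append('"').append(name.replace("\"", "\"\"")).append("\"\n");
        }
        // Store the content on a new FileDocument so it can be downloaded;
        // storeFileDocumentContent persists the contents on the object.
        IMendixObject file = Core.instantiate(context, "System.FileDocument");
        file.setValue(context, "Name", "users.csv");
        Core.storeFileDocumentContent(context, file,
                new ByteArrayInputStream(csv.toString().getBytes(StandardCharsets.UTF_8)));
    }
}
```

In a full batched export you would append each batch to a temporary file (or stream into the FileDocument once at the end) rather than building one 100K-row string in memory.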

4. User Notification

  • When the task completes, update a status flag or send a message to notify the user (see the sketch after this list).

  • Provide a download link, or auto-download the file via its association once it's ready.
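
As a rough sketch of the status-flag option, assuming a hypothetical ExportJob entity with a Status string attribute that the user's page displays:

```java
import com.mendix.core.Core;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.systemwideinterfaces.core.IMendixObject;

public class MarkExportDone {
    public static void markDone(IContext context, IMendixObject exportJob) throws Exception {
        // Flip the flag and commit, so the new status (and the download
        // link next to it) appears when the user's page refreshes.
        exportJob.setValue(context, "Status", "Done");
        Core.commit(context, exportJob);
    }
}
```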

Additional Tips:

  • Avoid retrieve-all: never load all 100K records at once.

  • Add indexes: make sure your retrieve queries are backed by database indexes on the attributes you filter and sort on.

  • Log each batch to monitor progress and help with debugging.

I hope this helps! :)

answered