Using Queue for processing large tables

Hello all, I am fairly new to queues. I am trying to implement the Queue app (https://appstore.home.mendix.com/link/app/106628). I have followed the instructions and managed to create the jobs, add them to the queues, and have them executed. However, I am wondering whether I implemented it correctly for my use case. I have a large table with objects (500,000+), and I want to do something with those objects at a certain time every day. To avoid memory problems I want to process them using queues, handling 1,000 objects at a time. I created an extra entity (1-to-1 association) with the Job where I store the offset and the number of objects to process. When the job is executed, the table is queried and the objects are processed. I tried this in a sample project with a static table and it worked. However, I am not sure this is the correct approach. What if a record is added to the table during execution, for instance? I don't think I have a guarantee that all the records in the table will be processed. So my question to you all: how should this be implemented?
asked
1 answer

You could add a DateTime attribute (LastProcessed) to the entity you are going to process.

At the start of your processing microflow, create a DateTime variable (ProcessStarted) with the value of CurrentDateTime.

Then, in your retrieve of the objects to process, set the offset to 0 and the amount to 1000. Your XPath constraint should be [LastProcessed != $ProcessStarted].

Create a list to commit the changed (processed) objects.

While iterating over the list, change each object's LastProcessed attribute to the DateTime variable (ProcessStarted) you created in the first action of your microflow.

After the loop, commit the processed list and then count the retrieved list. If the count is 1000, you know there may still be more records to retrieve, and you merge back to the point where you do the retrieve. Since the already committed objects will no longer be returned by the XPath constraint, you are sure that you always get a fresh list of unprocessed objects.

If objects are added to the database while the process is running, they will be picked up by the next retrieve, since their LastProcessed value won't be equal to the DateTime variable (ProcessStarted).
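For reference, here is the loop expressed as a rough Python sketch. The functions `retrieve_batch` and `commit_batch`, and the snake_case attribute names, are hypothetical stand-ins for the microflow Retrieve and Commit actions and the LastProcessed/ProcessStarted names above; they are not real Mendix APIs.

```python
from datetime import datetime, timezone

BATCH_SIZE = 1000  # the "amount" of the retrieve

def process_all(retrieve_batch, commit_batch):
    """Sketch of the microflow loop described above.

    retrieve_batch(constraint, offset, amount) and commit_batch(objects)
    are placeholders for the Retrieve and Commit microflow actions.
    """
    # The ProcessStarted variable, set once at the start of the run.
    process_started = datetime.now(timezone.utc)

    while True:
        # Offset stays 0 on every pass: committed objects no longer match
        # the constraint, so each retrieve returns a fresh slice of
        # unprocessed records.
        batch = retrieve_batch(
            constraint=lambda obj: obj.last_processed != process_started,
            offset=0,
            amount=BATCH_SIZE,
        )

        for obj in batch:
            # ... do the actual processing work on obj here ...
            obj.last_processed = process_started  # mark as processed

        commit_batch(batch)

        # Fewer than BATCH_SIZE results means the table is exhausted;
        # exactly BATCH_SIZE means we merge back and retrieve again.
        if len(batch) < BATCH_SIZE:
            break
```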

I hope this helps you on your way.

answered