Best‑practice CDC and data extraction pattern from Mendix to Databricks for large datasets

Hi Mendix Community,

We are designing a Databricks Lakehouse for a Mendix application that manages large‑volume data. The application supports multiple Lines of Business (LOBs), each mapped to a separate schema.

Requirements

Initial load (Day 1): Execute a scheduled job to perform a one‑time full extract, generating a single consolidated JSON or CSV file per LOB/schema, and publish it to Azure Blob Storage.

Incremental load (Day 2+): On each scheduled run, extract only delta changes since the previous run and publish them as a single JSON or CSV file to Azure Blob Storage.

Databricks will consume these files to build and maintain the data warehouse.

Constraint

The Mendix database is very large, and data extraction already has noticeable latency. The solution must avoid impacting runtime performance and prevent heavy database load.

Questions

Are there Mendix‑recommended or supported patterns for full and incremental data extraction or Change Data Capture (CDC)?

Would database‑level CDC, event‑driven approaches, or Mendix‑supported integrations be more appropriate for this scenario?

Are there known best practices for generating single consolidated files per scheduler run at scale?

Any guidance or reference architectures would be appreciated.
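To make the Day 2+ requirement concrete, here is a minimal sketch of a watermark-based delta extract in Python. It is an illustration under assumptions, not a Mendix-supported pattern: the `changed_date` field, the `EPOCH` starting watermark, and the helper names `extract_delta` and `to_single_csv` are all hypothetical, and the rows are plain in-memory dictionaries rather than results of a real database query or an Azure Blob Storage upload.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical starting watermark: using it on the first run yields a full extract.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def extract_delta(rows, last_watermark):
    """Return the rows changed after last_watermark and the new watermark.

    Assumes every row carries a timezone-aware 'changed_date' value
    (a hypothetical field name used only for this sketch).
    """
    changed = [r for r in rows if r["changed_date"] > last_watermark]
    # If nothing changed, keep the old watermark so no window is skipped.
    new_watermark = max(
        (r["changed_date"] for r in changed), default=last_watermark
    )
    return changed, new_watermark


def to_single_csv(rows, fieldnames):
    """Serialize the rows into one CSV document, one file per scheduler run."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for r in rows:
        # Write timestamps in ISO 8601 so downstream readers can parse them.
        writer.writerow({**r, "changed_date": r["changed_date"].isoformat()})
    return buf.getvalue()
```

Calling `extract_delta(rows, EPOCH)` on the first run returns every row (the Day 1 full extract); later runs pass the watermark persisted from the previous run. One known limitation of this approach is that a timestamp watermark cannot see hard deletes, which is part of why database-level CDC is raised as an alternative in the questions above.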
2 answers

Hi Yogesh, we're working on a new feature in the Mendix platform that provides CDC-like data replication. Can you contact me at andrej.koelewijn@mendix.com so we can set up a call to discuss your requirements? Thanks, Andrej


Hi Yogesh, can you send me an email at andrej.koelewijn@siemens.com? We are working on out-of-the-box CDC replication in the platform. We can schedule a call to discuss. Thanks, Andrej
