Query behavior for Generalization/Specialization

Hey all, I've been running some tests on how Mendix performs under different circumstances (bit of time to kill). In part, this means testing common knowledge like 'it is faster to commit in a list than in an iterator' (it is. a lot).  Another is 'it is faster to retrieve a generalization than its specialization'.  This is also true. About twice as fast. But not for the reason I had expected. My (unsubstantiated) assumption had been that when retrieving a specialization, the generalization would also automatically be retrieved, but when retrieving a generalization, no specialization would be retrieved until needed (through a cast, for example).  Turns out that's not quite how things seem to work, and I find myself confused. So, here's hoping that a greater mind than my own can shed some light on this mystery: Picture 1: Domain model for test case   Originally, this part of my test concerned the behavior and performance of specialization/generalization vs. two entities with a 1-1 relation. As part of the test, I ran two microflows; one to retrieve specializations (1000 records) and one to retrieve generalizations (again 1000 records). Unsurprisingly, retrieving the specialization took about twice as long. But when I started looking at the ConnectionBus_Retrieve logs, I noticed something odd:   Data table TestModule.ChildSpecial (1000 row(s)) Column names: 1 testmodule$childspecial.ID PRIMARY 2 Attribute9 3 Attribute8 4 Attribute7 5 Attribute6 6 Attribute10 7 Attribute5 8 Attribute4 9 Attribute3 10 Attribute2 11 Attribute1 Row 1: 1: [ ] [MendixIdentifier:: id=5348024558003465 objectType=TestModule.ChildSpecial entityID=19] 2: [ ] 2158865 3: [ ] 2158865 4: [ ] Thu Apr 18 08:56:11 CEST 1996 5: [ ] Thu Apr 18 08:56:11 CEST 1996 6: [ ] 2158865 7: [ ] Thu Apr 18 08:56:11 CEST 1996 8: [ ] true 9: [ ] NULL 10: [ ] l0BSMQszfJ1YGo2plg2X 11: [ ] l0BSMQszfJ1YGo2plg2X  Picture 2: Mendix log extract for specialization retrieve   Data table TestModule.ParentGeneral (1000 row(s)) Column names: 1 testmodule$parentgeneral.ID PRIMARY 2 submetaobjectname 3 Attribute5 4 Attribute4 5 Attribute3 6 Attribute2 7 Attribute1 8 TestModule.ChildSpecial.Attribute9 9 TestModule.ChildSpecial.Attribute8 10 TestModule.ChildSpecial.Attribute7 11 TestModule.ChildSpecial.Attribute6 12 TestModule.ChildSpecial.Attribute10 Row 1: 1: [ ] [MendixIdentifier:: id=5348024558003465 objectType=TestModule.ChildSpecial entityID=19] 2: [ ] TestModule.ChildSpecial 3: [ ] Thu Apr 18 08:56:11 CEST 1996 4: [ ] true 5: [ ] NULL 6: [ ] l0BSMQszfJ1YGo2plg2X 7: [ ] l0BSMQszfJ1YGo2plg2X 8: [ ] 2158865 9: [ ] 2158865 10: [ ] Thu Apr 18 08:56:11 CEST 1996 11: [ ] Thu Apr 18 08:56:11 CEST 1996 12: [ ] 2158865 Picture 3: Mendix log extract for generalization retrieve   As you can see, both queries appear to return the entire 10 attribute specialization/generalization combination. Debugging the retrieve microflow also shows that all attributes are available.  Picture 4: Test microflow debug   When I dig into query logs of the database level, I get the following: 2018-05-30 11:39:35.482 CEST [15988] LOG: execute <unnamed>: SELECT "testmodule$childspecial"."id", "testmodule$childspecial"."attribute9", "testmodule$childspecial"."attribute8", "testmodule$childspecial"."attribute7", "testmodule$childspecial"."attribute6", "testmodule$childspecial"."attribute10", "dj1testmodule$parentgeneral"."attribute5", "dj1testmodule$parentgeneral"."attribute4", "dj1testmodule$parentgeneral"."attribute3", "dj1testmodule$parentgeneral"."attribute2", "dj1testmodule$parentgeneral"."attribute1" FROM "testmodule$childspecial" INNER JOIN "testmodule$parentgeneral" "dj1testmodule$parentgeneral" ON "dj1testmodule$parentgeneral"."id" = "testmodule$childspecial"."id" ORDER BY "testmodule$childspecial"."id" ASC LIMIT 1000 OFFSET 500000 2018-05-30 11:39:35.928 CEST [15988] LOG: duration: 446.324 ms Picture 5: Database query for specialization retrieve   2018-05-30 11:39:58.225 CEST [15988] LOG: execute <unnamed>: SELECT "testmodule$parentgeneral"."id", "testmodule$parentgeneral"."submetaobjectname", "testmodule$parentgeneral"."attribute5", "testmodule$parentgeneral"."attribute4", "testmodule$parentgeneral"."attribute3", "testmodule$parentgeneral"."attribute2", "testmodule$parentgeneral"."attribute1" FROM "testmodule$parentgeneral" ORDER BY "testmodule$parentgeneral"."id" ASC LIMIT 1000 OFFSET 500000 2018-05-30 11:39:58.378 CEST [15988] LOG: duration: 153.329 ms 2018-05-30 11:39:58.382 CEST [15988] LOG: duration: 0.968 ms parse <unnamed>: SELECT "testmodule$childspecial"."id", "testmodule$childspecial"."attribute9", "testmodule$childspecial"."attribute8", "testmodule$childspecial"."attribute7", "testmodule$childspecial"."attribute6", "testmodule$childspecial"."attribute10" FROM "testmodule$childspecial" WHERE "testmodule$childspecial"."id" IN (5348024558003465, ...[1000x internal ID, removed for brevity sake]... 5348024558004464) 2018-05-30 11:39:58.383 CEST [15988] LOG: duration: 0.683 ms bind <unnamed>: SELECT "testmodule$childspecial"."id", Picture 6: Database query for generalization retrieve   I presume I am either overlooking something, or have made a mistake in my test, because this outcome seems illogical on several levels: 1) Why automatically retrieve the specialization with the generalization? It seems impractical from a performance perspective, as in many situations you won't need it. <-- see update.  2) The query for retrieving the generalization seems a lot less efficient (join vs. retrieve via id), but is clearly faster, why is that? 3) If the query for retrieving the generalization is about twice as fast and yields the same data; why is the less efficient query used to retrieve the specialization?   I'm genuinely curious as to what's behind all this, so any insight is welcome :)   [UPDATE 30/05] The cause of the observed behavior seems to be the way Mendix translates OQL to SQL.  In case of a Datagrid, this seems to go as expected. For generalization retrieves via Microflows and Listviews, the OQL query is always translated into two SQL queries  (three for Listview, including a 'count'): A select statement on the generalization table, and after that a select for all the reference IDs for the specializations.   This may have a specific reason, or be an opportunity for some performance optimization. Perhaps someone from Mendix can clarify.  The other mystery remains though: Why does this two-step query actually seem faster than the inner join used for retrieval of the specialization?   [30/05 13:23 edit note: added image of debug microflow]
3 answers

Each Entity in Mendix gets its corresponding table in the database. Each non-virtual attribute of that entity becomes a field in the corresponding table. In case of a generalization then the actual entity instance is partially stored in the generalization table and partially in the specialization table.


Let's take the example of the `Gen` and `Spec` entities (where `Spec` inherits from `Gen`).


If we would execute an XPath to get all the `Spec` instances (`//MyModule.Spec`) then we need to read from the `Spec` table the attributes defined on the `Spec` entity and from the `Gen` table the attributes defined on the `Gen` entity. We know that for each record in `Spec` there is a corresponding record in `Gen`. As such we can use a single query to read that information using an INNER JOIN. 


If we would execute an XPath to get all the `Gen` instances, then we actually mean to say: give me all the `Gen` instances and all the `Spec` instances (that's the meaning of inheritance, a `Spec` instance is also a `Gen` instance). However, some records available in `Gen` might not be available in `Spec` (we can't invert that same rule to say that all `Gen` instances are also `Spec` instances). Also, there might be cases where there is a `Spec2` instance also inheriting from `Gen`. So that's why we need multiple queries. We start with querying `Gen`. This gives us all the `Gen` instances from which we can deduct which instances are of a more specific type (the `Spec` instances for example). For those instances we fire additional queries (one per specialization entity) to load the attributes defined on those entities. This results in those queries with an `IN` statement having a lot of ID's in it).


The main reason to do that directly is because you actually asked to load the full entity instance (by using the XPath query). The full entity instance is an instance with the attributes from all tables belonging to that entity and inheritance structure. If you would defer loading that, you might do it on demand, but that gets actually worse if you iterate over the entity instances (as it would have to query the `Spec` table per instance).


The chosen approach (INNER JOIN where possible, multiple queries where necessary) is done because usually when there are less records in the database, then the latency of multiple queries quickly outgrows the speed of separate queries. Although I have to admit that I am a bit surprised that the database is apparently more efficient in a big IN statement execution than an INNER JOIN. I'll have to do a bit more investigation into that why this is the case.


Does this answer your questions? 


[Edit: 2018-06-12] 

I did some investigations in the time spend on those queries. I created a situation with `Gen` and `Spec` entities containing 1000 `Spec` entities. When I query all `Gen` entity instances (so executing two queries) my query performs around twice as slow as when I query all `Spec` entity instances (executing one query with INNER JOIN). This is exactly the opposite of what you measured!

So I looked again at your example and noticed one important difference. You have a LIMIT of 1.000 records in your query, but also an OFFSET of 500.000 records. Now, this is interesting. When executing the INNER JOIN query, the database will most probably first join _all_ the records of `Gen` with the corresponding records of `Spec` and only then drop the first 500.000 records. This is obviously slower then loading 1000 specific records from the `Spec`table. I think this is the main reason for the performance difference. I think when you try this again without the OFFSET, you will see different numbers.



1) It usually does not. For this domain

The GEN OQL"SELECT MyFirstModule.gen/* FROM MyFirstModule.gen LIMIT 20" results in

SELECT "myfirstmodule$gen"."id",
 FROM "myfirstmodule$gen"
 ORDER BY "myfirstmodule$gen"."id" ASC LIMIT 20

The SPEC OQL "SELECT MyFirstModule.spec/* FROM MyFirstModule.spec LIMIT 20" results in

"SELECT "myfirstmodule$spec"."id",
 FROM "myfirstmodule$spec"
 INNER JOIN "myfirstmodule$gen" "dj1myfirstmodule$gen" ON "dj1myfirstmodule$gen"."id" = "myfirstmodule$spec"."id"
 ORDER BY "myfirstmodule$spec"."id" ASC LIMIT 20"

So there is something wrong with your test. Sure you got the correct logs? Picture 6 contains more then just the retrieval of GEN. The second part (

2018-05-30 11:39:58.382

is retrieval of SPEC. Probably because your app requests it.

Try again with a simpler model and more separated testcases.

2) and 3) will probably become clearer with your simpler model


A little more detail on what I'm seeing, in response to @Tim van Steenbergen (thanks for pitching in btw). For the record; I'm pretty sure there's something wrong with my test. I'm just not sure what. Hence the post :)

Here's a step-by-step of what I've got.

The domain model (above). Other than having a few more attributes, it doesn't look different from what Tim is using.

The microflow (above). Called from a button. This is literally the entire test. No further actions are called, no pages are shown. 


The retrieve action (above). RunCount constant is set to 1000. Offset is set to 500k (halfway the 1000k DB table).

So far, it all looks about as basic as I can make it. 

The Mendix ConnectionBus_Retrieve shows me the following:

The log output for with ConnectionBus_Retrieve set to trace (above) shows six records that relate to my query.

The first log record (above) shows the output of the request analyzer. Nothing unexpected there.


The second log record (above) shows the incoming query, with what I interpret as a query ID: [071ab9c3-358f-3d20-8d42-7fadc1b7c6e0]. Still looks like a standard retrieve on the ParentGeneral table.

The third log record (above) shows the OQL translation of the Xpath query. Again, nothing unexpected.

 The fourth log record (above) shows the SQL translation of the OQL query. Still nothing unexpected.

The fifth log record (above) shows a second SQL query for the same original query ID. This is the retrieve of the ChildSpecial table (or Tim's SPEC). As I read it, this query originates from the same Xpath / OQL, though I can't for the life of me figure out why.

The sixth log record (above) shows the outcome, similar to the one in my original post. It includes the attributes for both ParentGeneral and ChildSpecial. 


I'll set up another test and try to further simplify things. Still, Apart from the limit/offset, it feels about as basic as it can be. 


[Edit note: resized pictures]