Oracle Multi-Value Index and ORDER BY Pagination queries

In the previous post, I highlighted a case where relational SQL databases struggle with optimizing index access with joins. In contrast, a document model using MongoDB's multi-key indexes efficiently retrieves only the necessary data for the results. SQL databases implemented document datatypes, storing them in binary JSON columns in SQL tables, but that's not enough to compete as a document database. It's essential to look at the JSON path indexing possibilities? A JSON document may include nested sub-documents, and you must index paths that may go through arrays. Oracle 21c introduced multi-value indexes for JSON. Let's test it with the same example from the previous post where MongoDB reads only the necessary index entries, filtering upfront thanks to embedding a One-to-Many instead of joining. Example As I demonstrated it on MongoDB, I used the MongoDB-compatible API of Oracle 23c to use the same syntax, which saves me migrating to the SQL/JSON syntax: const bulkOps = []; for (let i = 0; i db.orders.find( { country_id: 1, order_details: { $elemMatch: { product_id: 15 } } } ).sort({ created_at: -1 }).limit(10).explain(`executionStatistics`) { queryPlanner: { plannerVersion: 1, namespace: 'ora.orders', indexFilterSet: false, parsedQuery: { '$query': { '$and': [ { country_id: { '$numberOnly': 1 } }, { 'order_details[*]': { product_id: { '$numberOnly': 15 } } } ] }, '$orderby': { '$fields': [ { path: 'created_at', order: 'desc', sortByMinMax: true } ], '$lax': true } }, rewrittenQuery: { '$and': [ { '$query': { '$and': [ { country_id: { '$numberOnly': 1 } }, { 'order_details[*]': { product_id: { '$numberOnly': 15 } } } ] } }, { '$orderby': { '$fields': [ { path: 'created_at', order: 'desc', sortByMinMax: true } ], '$lax': true } } ] }, winningPlan: { stage: 'SELECT STATEMENT', inputStage: { stage: 'COUNT', options: 'STOPKEY', columns: '"from$_subquery$_002"."DATA"[JSON,8200], "from$_subquery$_002"."RAWTOHEX("RESID")"[VARCHAR2,4000], "from$_subquery$_002"."ETAG"[RAW,16]', filterType: 'filter', filter: 'ROWNUM

Apr 11, 2025 - 08:41
 0
Oracle Multi-Value Index and ORDER BY Pagination queries

In the previous post, I highlighted a case where relational SQL databases struggle with optimizing index access with joins. In contrast, a document model using MongoDB's multi-key indexes efficiently retrieves only the necessary data for the results.

SQL databases implemented document datatypes, storing them in binary JSON columns in SQL tables, but that's not enough to compete as a document database. It's essential to look at the JSON path indexing possibilities?
A JSON document may include nested sub-documents, and you must index paths that may go through arrays. Oracle 21c introduced multi-value indexes for JSON. Let's test it with the same example from the previous post where MongoDB reads only the necessary index entries, filtering upfront thanks to embedding a One-to-Many instead of joining.

Example

As I demonstrated it on MongoDB, I used the MongoDB-compatible API of Oracle 23c to use the same syntax, which saves me migrating to the SQL/JSON syntax:

const bulkOps = [];
for (let i = 0; i < 1000000; i++) {
  const orderDetails = [];
  for (let line = 1; line <= 10; line++) {
    orderDetails.push({
      line: line,
      product_id: Math.floor(Math.log2(1 + i * Math.random())),
      quantity: Math.floor(100 * Math.random()),
    });
  }
  bulkOps.push({
    insertOne: {
      document: {
        country_id: Math.floor(10 * Math.random()),
        created_at: new Date(),
        order_details: orderDetails
      }
    }
  });
}
db.orders.bulkWrite(bulkOps);

db.orders.createIndex(
 { "country_id": 1, "order_details .product_id": 1, "created_at": -1 }
);

The MongoDB emulation on Oracle Database has converted this to the following SQL table with a multi-value index:

CREATE JSON COLLECTION TABLE "ORA"."orders" 
   ( "SYS_NC00005$" RAW(4000) GENERATED ALWAYS AS ( JSON_MKMVI(JSON_TABLE( "DATA", '$' PRESENT ON EMPTY MINIMAL CROSS PRODUCT WITH ERROR ON PARALLEL ARRAYS COLUMNS( NESTED PATH '$."country_id"[*]' COLUMNS( "K0" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH ) , NESTED PATH '$."order_details"[*]' COLUMNS( NESTED PATH '$."product_id"[*]' COLUMNS( "K1" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH ) ) , NESTED PATH '$."created_at"[*]' COLUMNS( "K2" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH  DESC ) ) )  AS "K0","K1","K2" DESC )) VIRTUAL , 
    "SYS_NC00006$" RAW(4000) GENERATED ALWAYS AS (JSON_QUERY("DATA" FORMAT OSON , '$."order_details"[*]."product_id"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)) VIRTUAL , 
    "SYS_NC00007$" RAW(6001) GENERATED ALWAYS AS (JSON_QUERY("DATA" FORMAT OSON , '$."created_at"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE DESC )) VIRTUAL 
   )  DEFAULT COLLATION "USING_NLS_COMP" ;

CREATE MULTIVALUE INDEX "ORA"."$ora:orders.country_id_1_order_details.product_id_1_created_at_-1" ON "ORA"."orders" (JSON_MKMVI(JSON_TABLE( "DATA", '$' PRESENT ON EMPTY MINIMAL CROSS PRODUCT WITH ERROR ON PARALLEL ARRAYS COLUMNS( NESTED PATH '$."country_id"[*]' COLUMNS( "K0" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH ) , NESTED PATH '$."order_details"[*]' COLUMNS( NESTED PATH '$."product_id"[*]' COLUMNS( "K1" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH ) ) , NESTED PATH '$."created_at"[*]' COLUMNS( "K2" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH  DESC ) ) )  AS "K0","K1","K2" DESC ), JSON_QUERY("DATA" FORMAT OSON , '$."order_details"[*]."product_id"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE), JSON_QUERY("DATA" FORMAT OSON , '$."created_at"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE DESC ) DESC) 
  ;

The execution plan provided by the translation layer shows some SQL, but not the execution statistics:


ora> db.orders.find( { country_id: 1, order_details: { $elemMatch: { product_id: 15 } } } ).sort({ created_at: -1 }).limit(10).explain(`executionStatistics`)

{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'ora.orders',
    indexFilterSet: false,
    parsedQuery: {
      '$query': {
        '$and': [
          { country_id: { '$numberOnly': 1 } },
          { 'order_details[*]': { product_id: { '$numberOnly': 15 } } }
        ]
      },
      '$orderby': {
        '$fields': [ { path: 'created_at', order: 'desc', sortByMinMax: true } ],
        '$lax': true
      }
    },
    rewrittenQuery: {
      '$and': [
        {
          '$query': {
            '$and': [
              { country_id: { '$numberOnly': 1 } },
              {
                'order_details[*]': { product_id: { '$numberOnly': 15 } }
              }
            ]
          }
        },
        {
          '$orderby': {
            '$fields': [
              { path: 'created_at', order: 'desc', sortByMinMax: true }
            ],
            '$lax': true
          }
        }
      ]
    },
    winningPlan: {
      stage: 'SELECT STATEMENT',
      inputStage: {
        stage: 'COUNT',
        options: 'STOPKEY',
        columns: '"from$_subquery$_002"."DATA"[JSON,8200], "from$_subquery$_002"."RAWTOHEX("RESID")"[VARCHAR2,4000], "from$_subquery$_002"."ETAG"[RAW,16]',
        filterType: 'filter',
        filter: 'ROWNUM<=10',
        inputStage: {
          stage: 'VIEW',
          columns: '"from$_subquery$_002"."DATA"[JSON,8200], "from$_subquery$_002"."RAWTOHEX("RESID")"[VARCHAR2,4000], "from$_subquery$_002"."ETAG"[RAW,16]',
          inputStage: {
            stage: 'SORT',
            options: 'ORDER BY STOPKEY',
            columns: `(#keys=1) JSON_VALUE(JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.created_at[*].max()' RETURNING JSON WITHOUT ARRAY WRAPPER NULL ON ERROR TYPE(LAX) ) FORMAT OSON , '$' RETURNING ANY ORA_RAWCOMPARE(32767) ERROR ON ERROR TYPE(LAX) )[32767], "DATA" /*+ LOB_BY_VALUE */ [JSON,8200], JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$._id' RETURNING ANY ORA_RAWCOMPARE(2000) NO ARRAY ERROR ON ERROR TYPE(LAX) )[RAW,2000], "ETAG"[RAW,16]`,
            filterType: 'filter',
            filter: 'ROWNUM<=10',
            path: "$.created_at[*].max()'",
            inputStage: {
              stage: 'TABLE ACCESS',
              options: 'BY INDEX ROWID BATCHED',
              source: 'orders',
              columns: `"DATA" /*+ LOB_BY_VALUE */ [JSON,8200], JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$._id' RETURNING ANY ORA_RAWCOMPARE(2000) NO ARRAY ERROR ON ERROR TYPE(LAX) )[RAW,2000], "ETAG"[RAW,16]`,
              path: "$._id'",
              inputStage: {
                stage: 'HASH',
                options: 'UNIQUE',
                columns: '(#keys=2) "orders".ROWID[ROWID,10], SYSVARCOL[8]',
                inputStage: {
                  stage: 'INDEX',
                  options: 'RANGE SCAN (MULTI VALUE)',
                  source: '$ora:orders.country_id_1_order_details.product_id_1_created_at_-1',
                  columns: `"orders".ROWID[ROWID,10], JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."country_id"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)[RAW,4000], JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."order_details"[*]."product_id"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)[RAW,4000], SYSVARCOL[8]`,
                  filterType: 'access',
                  filter: `JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."country_id"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)=SYS_CONS_ANY_SCALAR(:1, 3) AND JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."order_details"[*]."product_id"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)=SYS_CONS_ANY_SCALAR(:2, 3)`
                }
              }
            }
          }
        }
      }
    },
    rejectPlans: []
  },
  serverInfo: { host: 'localhost', port: 27017, version: '4.2.14' },
  ok: 1
}
ora> 

However, It's the SQL engine behind, and I captured the SQL statement from GV$SQL:


SELECT /*+ FIRST_ROWS(10) monitor */
    "DATA",
    RAWTOHEX("RESID"),
    "ETAG"
FROM
    "ORA"."orders"
WHERE
    JSON_EXISTS ( "DATA", '$?( 
        (@.country_id.numberOnly() == $B0)
        &&
        ( exists(@.order_details[*]?( (@.product_id.numberOnly() == $B1) )) ) 
    )'
    PASSING :1 AS "B0",
    :2 AS "B1" TYPE ( STRICT ) )
ORDER BY
    JSON_QUERY("DATA", '$.created_at[*].max()') DESC NULLS LAST
FETCH NEXT 10 ROWS ONLY

I have added the /* monitor */ hint to get the SQL execution plan with execution statistics:

Image description

Unlike MongoDB, which reads only ten documents with such .find().sort().limit(10) query, Oracle's multi-value index had to read all index entries (53K Rows) and documents, and sorted them, before returning the Top-10. The index was used for filtering on the WHERE clause, but not for ordering on the ORDER BY FETCH NEXT 10 ROWS ONLY despite having "created_at" in the key next to the equiality filter columns.

Possible explanation

I tried to understand if it is a limitation of Oracle multi-value indexes, or the query planner. The internals of Oracle multi-value indexes are not documented, but there's the patent that describes the idea: https://patents.google.com/patent/US11640380B2

My understanding from it is that the following document:

{
  "_id": ObjectId('67f1a477aabaf2dad73f4791'),
  "country_id": 1,
  "created_at": ISODate('2025-04-05T21:45:21.546Z'),
  "order_details": [
    { "line": 1, "product_id": 19, "quantity": 40 },
    { "line": 2, "product_id": 15, "quantity": 10 },
    { "line": 3, "product_id": 18, "quantity": 75 },
    { "line": 4, "product_id": 15, "quantity": 50 }
  ]
}

with the following index:

db.orders.createIndex(
 { "country_id": 1, "order_details.product_id": 1, "created_at": -1 }
);

results in the following index entries where # is the ordinal position in the "order_details" array:

+------------+--------------------+---------------------------+
| country_id |  order_details     | created_at                |
|            +---------------+----+                           |
|            | .product_id   | #  |                           |
+------------|---------------|----|---------------------------+
| 1          |  19           | 0  | 2025-04-05T21:45:21.546Z  |
| 1          |  15           | 1  | 2025-04-05T21:45:21.546Z  |
| 1          |  18           | 2  | 2025-04-05T21:45:21.546Z  |
| 1          |  15           | 3  | 2025-04-05T21:45:21.546Z  |
+------------|---------------|----|---------------------------+

By applying an equality predicate on "country_id" and "product_id," and considering that "created_at" is at the same level and not nested in "order_details," the optimizer could potentially recognize that the entries are sorted by "created_at". Note that it is possible that the ordinal position is not present as I see no FOR ORDINALITY clause in the CREATE MULTIVALUE INDEX statement.

The index entries are ordered and a TABLE ACCESS BY INDEX ROWID should preserve the order, as it is not BATCHED to reduce scattered reads, so a blocking sort operation could be avoided. This could enable the query to read only the necessary documents, but it is not what I observe from the execution plan. An HASH UNIQUE operation breaks the ordering in between. It is true that multiple entries may be returned for the same document and we want the document only once, but I would have expected a SORT UNIQUE NOSORT to preserve the ordering.

Note that all these are guesses, I've run this on the autonomous managed service where I do not have access to the datafiles to look at the internals of the multi-value index.

Conclusion

In exploring multi-value indexes in Oracle for sorting and filtering nested JSON data, a key performance differences emerges compared to MongoDB. Oracle's indexes manage complex JSON structures but struggle with key ordering, requiring an additional blocking sort, which is problematic in pagination queries. This was tested on Oracle 23ai (23.7.0.25.03) showing a MongoDB version of 4.2.14 and may change in the future as the multi-value indexes apparently store the entries in order.

Thus, when evaluating a SQL database to be used as a document database, it's vital to consider both efficient JSON storage and indexing options. A comment in the previous post shows that it's the same with PostgreSQL, as GIN indexes, required for JSON paths though arrays, does not return entries in order.