Jonathan Lewis

Just another Oracle weblog

Object_id

Sun, 2024-02-11 12:23

This is a note that will probably be of no practical use to anyone – but someone presented me with the question so I thought I’d publish the answer I gave:

Do you know how object ID is assigned? It doesn’t look as if a sequence is used

I’m fairly sure the mechanism has changed over versions. (Including an enhancement in 23c where the object number of a dropped (and purged) object can be reused.)

I don’t really know what Oracle does, but I do know that there is an object in obj$ called “_NEXT_OBJECT” and Oracle uses that as a control mechanism for the dataobj# and obj# (data_object_id and object_id) in some way. I think Oracle uses the row a bit like the row used by a sequence in seq$ – the dataobj# is bumped by a small value (seems to be 5) each time it is reached (cp. seq$.highwater for a sequence) and it’s possible that the obj# is used to record the instance number of the instance that bumped it. The instance then (I think) has a small cache of obj# values it can use before it has to read and update the “_NEXT_OBJECT” row again.
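
If you want a quick look at that control row you can query it directly (as SYS); this is just an inspection sketch based on the description above, showing the current obj# and dataobj# values it holds:

select  obj#, dataobj#, name
from    sys.obj$
where   name = '_NEXT_OBJECT'
/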

Footnote.

You might note that this description means that it is the dataobj# that actually drives the generation of a new obj# / object_id. You can demonstrate this most easily (if you have sole access to the database) by:

  • creating a table,
  • checking its object_id and data_object_id (which will match),
  • moving it a couple of times (which will increment the data_object_id – and only the data_object_id – each time),
  • creating another table.

The second table will have an object_id that is one more than the current data_object_id of the first table.
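
Here’s a minimal sketch of that demonstration (the table names are arbitrary, and you need to be the only session creating objects for the arithmetic to be this tidy):

create table t_demo1 (n1 number);

select  object_id, data_object_id
from    user_objects
where   object_name = 'T_DEMO1'
/

alter table t_demo1 move;
alter table t_demo1 move;

select  object_id, data_object_id
from    user_objects
where   object_name = 'T_DEMO1'
/

create table t_demo2 (n1 number);

select  object_id, data_object_id
from    user_objects
where   object_name = 'T_DEMO2'
/

The second table’s object_id should be one more than the data_object_id that the first table reached after the two moves.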

Descending Bug

Thu, 2023-12-21 09:12

This is another example of defects in the code to handle descending columns in indexes, with the added feature that the problem is addressed somewhere between 19.4 and 19.10 (it’s present in 19.3, gone in 19.11) – which means that if you upgrade from one of the earlier versions to a recent RU some of your production code may return rows in a different order. On the plus side, it will be the correct order rather than the previously incorrect order. It’s likely that if your code was exhibiting this bug you would have noticed it before the code got to production, so this note is more a confirmation than a realistic warning of a future threat.

The bug originally showed up in a thread on the Oracle developer forum more than a year ago, but I was prompted to finish and publish this note after seeing an article on deadlocks by Franck Pachot where the behaviour of his demonstration code could vary with the version of Oracle because of this bug.

Here’s some code to create a demonstration data set:

rem
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2022
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0       Right order
rem             19.3.0.0        Wrong order
rem             12.2.0.1        Wrong order
rem
rem     Notes
rem     From 12.1.0.2 to ca. 19.3(++?) the optimizer loses a "sort order by" 
rem     operation when a "descending" index meets an in-list iterator.
rem     

create table t1 
as
with generator as (
        select  rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum                                  id,
        substr(dbms_random.string('U',6),1,6)   v1,
        rpad('x',100,'x')                       padding
from
        generator
/

alter table t1 modify v1 not null;

update t1 set v1 = 'BRAVO'      where id = 5000;
update t1 set v1 = 'MIKE'       where id = 1000;
update t1 set v1 = 'YANKEE'     where id = 9000;

create index t1_i1 on t1(v1 desc);

I’ve created a table with a column generated as short random strings, then set three rows scattered through that table to specific values, and created an index on that column – but the index is defined with the column descending.

(Reminder: if all the columns in an index are declared as descending then all you’ve done is waste space and introduce an opportunity for the optimizer to go wrong – descending columns in indexes only add value if the index uses a combination of ascending and descending columns).
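
As a quick sketch of the case where a descending column does earn its keep (hypothetical table and column names, nothing to do with the demo that follows): a mixed ordering requirement can be satisfied by walking a suitably declared index, so no separate “sort order by” step is needed – whether the optimizer actually picks the index is, of course, still a matter of costing:

create index ord_i1 on orders(customer_id, order_date desc);

select  customer_id, order_id, order_date
from    orders
order by
        customer_id, order_date desc
/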

Here’s a simple query – with the results when executed from SQL*Plus in 12.2.0.1. Note, particularly, the order by clause, the order of the results, and the body of the execution plan:

set serveroutput off

select  v1, id
from    t1
where   v1 in (
                'MIKE',
                'YANKEE',
                'BRAVO'
        ) 
order by 
        v1
;

select * from table(dbms_xplan.display_cursor(format=>'outline'));


V1                               ID
------------------------ ----------
YANKEE                         9000
MIKE                           1000
BRAVO                          5000

3 rows selected.


SQL_ID  6mpvantc0m4ap, child number 0
-------------------------------------
select v1, id from t1 where v1 in (   'MIKE',   'YANKEE',   'BRAVO'  )
order by  v1

Plan hash value: 4226741654

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |       |       |    22 (100)|          |
|   1 |  INLIST ITERATOR              |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1    |     3 |    33 |    22   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN DESCENDING| T1_I1 |    40 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

The most important point, of course, is that the result set is in the wrong order. It’s interesting to note that there is no “sort order by” operation and that the index range scan is described as “descending”. A brief pause for thought suggests that if you do a descending range scan of a “descending index” then the results ought to come out in ascending order, which might explain why the optimizer thought it could eliminate the sort operation. However, that thought isn’t necessarily valid since the “inlist iterator” means Oracle should be executing “column = constant” once for each value in the list, which would make the ascending/descending nature of the index fairly irrelevant (for this “single-column” example).

When I created the same data set and ran the same query on 19.11.0.0 I got exactly the same execution plan, including matching Predicate Information and Outline Data (apart from the db_version and optimizer_features_enable values, of course), but the result set was in the right order. (It was still wrong in a test against 19.3, so the fix must have appeared somewhere in the 19.4 to 19.11 range.)

Workaround

In this example one of the ways to work around the problem (in 12.2) was to add the index() hint (which is equivalent to the index_rs_asc() hint) to the query, resulting in the following plan (again identical in 12c and 19c):

SQL_ID  6x3ajwf41x91x, child number 0
-------------------------------------
select  /*+ index(t1 t1_i1) */  v1, id from t1 where v1 in (   'MIKE',
 'YANKEE',   'BRAVO'  ) order by  v1

Plan hash value: 1337030419

-----------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |       |       |    23 (100)|          |
|   1 |  SORT ORDER BY                        |       |     3 |    33 |    23   (5)| 00:00:01 |
|   2 |   INLIST ITERATOR                     |       |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     3 |    33 |    22   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_I1 |    40 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

The “index range scan” operation is no longer “descending”, and we have a “sort order by” operation. You’ll note that, thanks to the blocking sort operation, the table access is now “batched”.

Best Guess

The way Oracle handles an IN-list is to start by reducing it to a sorted list of distinct items, before iterating through each item in turn. Then, if there is an order by clause that matches the order of the sorted in-list and Oracle can walk the index in the right order, it can avoid a “sort order by” operation.

I’m guessing that there may be two separate optimizer strategies in the “descending columns” case that have collided and effectively cancelled each other out:

  • Hypothetical Strategy 1: If there is a “descending index” that can be range scanned for the data the in-list should be sorted in descending order before iterating. (There is a flaw in this suggestion – see below)
  • Hypothetical strategy 2: Because the query has an order by (ascending) clause the index scan should be in descending order to avoid a sort operation.

The flaw in the first suggestion is that the Predicate Information suggests that it’s not true. This is what you get in every case (though the operation number changes to 4 when the plan includes a “sort order by” operation):

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00004$"=HEXTORAW('BDADBEA9B0FF') OR
              "T1"."SYS_NC00004$"=HEXTORAW('B2B6B4BAFF') OR
              "T1"."SYS_NC00004$"=HEXTORAW('A6BEB1B4BABAFF')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00004$")='BRAVO' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00004$")='MIKE' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00004$")='YANKEE'))

As you can see the values appearing in the access() predicate are the one’s complements of BRAVO, MIKE and YANKEE in that order; in no case was the order reversed, and previous experience says that predicates are used in the order they appear in the Predicate Information.

On the other hand, it’s arguable that the three predicate values should have been reported (in some form) at the inlist iterator operation – so this may be a case where the simplest strategy for presenting the plan doesn’t match the actual activity of the plan.

Post script

If I change the unhinted query to “order by v1 desc” the rows are reported in ascending order in 12.2.0.1, but in the correct descending order in 19.11.

SDU size etc.

Tue, 2023-12-19 05:38

I sketched the first draft of this note on 7th Feb 2010, then forgot about it until some time in July 2019 which is when I expanded enough of it to publish as a demonstration of how the session data unit (sdu / default_sdu_size) parameters in SQL*Net affected the number of waits for “SQL*Net more data from dblink / to client”. Once again, though, I failed to complete the note I had started to write 9 years previously.

I recently (Sept 2023) posted a tweet linking to the July note and received a reply asking me about “the role and impact of setting TCP send_buf_size and recv_buf_size parameters in sqlnet”. The answer to that question was in the rest of the draft, so I’m finally completing the note I started A Mole of years ago (okay, a silly reference to “The Secret Diary of Adrian Mole Aged 13¾”). Since that time, of course, there’s been a lot of change; in particular (for the purposes of this note) machine memories have got bigger, numbers of concurrent users have increased, and various defaults for SQL*Net parameters have changed – possibly to the extent that this note may have become irrelevant for most administrators.

Layers

If you execute a query in SQL*Plus to pull data from an Oracle database there are essentially three nested layers to the communication where the rate of flow of data can be affected by Oracle:

  • At the highest level you can set an arraysize in SQL*Plus to specify the number of rows that should pass from the server to the client in each fetch call. (Other client programs have something similar, of course – sometimes described as a “prefetch” setting.)
  • At the next level down you can set the SDU (session data unit) size in the SQL*Net configuration files to specify the maximum size of the SQL*Net data packets that can pass between server and client.
  • Finally there is the MTU (maximum transmission unit) which is the size of the largest data packet that can pass between the two ends of a tcp/ip (or UDP etc.) connection.

Until the arrival of the 9KB “jumbo frames” [the MOS note is a little elderly, with recent updates] the typical MTU for a tcp/ip link was roughly 1,400 bytes, which is not very large if you want to pull thousands (or millions) of rows across a network, especially if the network layer expects an acknowledgement from the far end after every single packet (as I think it did when I first started using Oracle 6 on a small HP9000).

Imagine yourself back in Oracle 6 days, running a query from SQL*Plus to fetch 480 “reasonably sized” rows and you’ve previously executed “set arraysize 100”:

  1. The Oracle server says: here’s the first 100 rows (which happens to total about 22KB)
  2. The server-side SQL*Net layer says: I can take your first 8KB (because that’s my SDU size) and send it to the tcp layer
  3. The server-side tcp layer says: I’ll send 1,400 bytes in the first packet and wait for the far end to ACK. Then it repeats this step 5 more times, waiting for the client tcp layer to ACK each packet.
  4. The client-side SQL*Net layer finally receives enough tcp packets for the client tcp layer to reconstruct the SDU packet and passes it up to the SQL*Plus client process.
  5. Repeat from step 2 until the first 100 rows have arrived at the client – at which point the client asks for the next 100.

In the bad old days performance could get much worse because of the Nagle algorithm, which aimed to make network communications more efficient by pausing briefly before sending a “small” network packet in case a little more data was about to come down the stack. So if the final SDU packet from a fetch was “small” the network layer would pause – but there would never be any more data from the server-side SQL*Net layer until the next fetch call from the client. This is the origin and purpose of the sqlnet.ora parameter tcp.nodelay = yes. My very first presentation to a user group included a review of a client/server query that was “very slow” if the array size was 15, but much faster if it was 14 or 16 thanks to this effect of “a last little bit”.

The frequency of the ACK is, I think, where the send_buf_size and recv_buf_size – and their predecessor the SQL*Net parameter TDU (transmission data unit) – come into play. Essentially the network layer will still send packets of the MTU size, but will not expect an ACK until it has forwarded the current contents of the send_buf_size. The exact details of how this all works probably depend in some way on what acknowledgement the SQL*Net layer expects for each SDU packet, but in outline the following steps take place:

  1. Server process receives a fetch call
  2. Server process fills one SDU packet (possibly stopping if the next complete row won’t fit) then waits for one of “fetch next array” (if the SDU packet included the last piece of the array) or “give me the next packet” if not.
  3. SQL*Net copies as much of the SDU packet as will fit into the network send buffer
  4. the network layer sends the buffer content to the client machine using MTU-sized chunks, waiting for ACK only after sending the last chunk
  5. repeat from step 3 until the entire SDU packet has been passed to the client
  6. Repeat from step 2 if the server was waiting for “SQL*Net more data to client” else go to 1

You can appreciate from this that the minimum “waste” of resources and time occurs when the entire array fetch size fits inside the SDU size (with a little spare for SQL*Net overheads), and the entire SDU size fits inside the send_buf_size (with a little spare for tcp/ip overheads). In this case a single fetch would result in only one SQL*Net wait for the fetch array to travel (no waits for “more data”) and only one tcp wait for ACK after streaming out MTU packets to send the entire SDU packet/send_buf_size.

There are two possible downsides to this “ideal” configuration:

  • Lots of sessions (specifically their shadow processes) could end up allocating a large chunk of memory for a tcp buffer that they never really needed – and I’ve seen lots of systems in the last few years with thousands of sessions connected, but an average of less than 25 sessions active.
  • A session that actually used a very large send_buf_size could (I speculate) adversely affect the latency (time to get on the wire) for all the sessions that had a requirement for turning around multiple small messages. (This, of course, is just another little detail to consider in the impedance mismatch between OLTP and DW systems).

The first can be handled by ensuring that processes that could really benefit from a larger SDU size and send_buf_size connect to a specially defined service name set up in the (client) tnsnames.ora and (server) listener.ora.
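
For example, the client-side tnsnames.ora might hold an entry like the following for a dedicated bulk-transfer service – this is just a sketch: the service name, host and sizes are hypothetical, and the server would need matching settings (e.g. default_sdu_size in the server’s sqlnet.ora, or SDU in the listener.ora entry for the service):

bulk_move =
  (description =
    (sdu = 65535)
    (send_buf_size = 1048576)
    (recv_buf_size = 1048576)
    (address = (protocol = tcp)(host = myhost)(port = 1521))
    (connect_data = (service_name = bulk_move_svc))
  )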

We know that when a client connects to the server through SQL*Net they negotiate the SDU size of the connection as the smaller of the client’s and server’s settings. I don’t know what happens if the receive buffer at one end is different from the send buffer at the other, or whether it even matters – but it’s something that could be worth researching.

Warning

Many years ago I had some subtle and sophisticated experiments that I set up between a Windows PC running SQL*Plus and my personal HP9000 running the Oracle server to investigate what I could do to minimise network chatter due to the client/server interaction. To a large degree this meant tweaking SQL*Net parameters, enabling the 10079 trace, and watching the HP’s version of netstat for messages sent / messages received.

I haven’t yet managed to get down to creating similar experiments between two virtual Linux machines running on a Windows host, so any comments about what goes on at the level below SQL*Net (i.e. the MTU, TDU, and xxxx_buf_size) are still conjecture in need of testing and confirmation.

Footnote

There are a number of statistics in v$sysstat (though not necessarily in v$sesstat – statistics 2010 to 2036 are “missing” in 19.11 – and not necessarily enabled by default) about this level of network activity that the interested reader might want to examine. There are also some figures in v$osstat about network configuration.
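
For a quick starting point (a sketch – nothing version-specific assumed) you can pick out the obvious network-related figures by name:

select  name, value
from    v$sysstat
where   name like 'SQL*Net%'
or      name like 'bytes%client'
or      name like 'bytes%dblink'
order by
        name
/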

tl;dr

When moving data from server to client

  • The server generates “packets” dictated by the client array fetch size
  • SQL*Net exchanges packets limited by the negotiated SDU size
  • tcp sends packets limited by the MTU (max transmission unit)
  • (Assumption): The sending tcp layer expects acknowledgements from the receiving tcp layer only after a volume limited by the sending send_buf_size (possibly further limited by the receiving recv_buf_size).

To minimise the time spent in “non-data” chatter on the wire when transferring large volumes of data you could define a service that allows the SDU (plus a little overhead) to fit inside the send/receive buffer size, and an SDU large enough to cope with a “reasonably large” array fetch in a single SQL*Net round trip. (Increasing the MTU at the O/S level may also be possible, and you could also look at the possibility of using “jumbo frames” for tcp.)

Remember that the send/receive buffers are per session, so be careful that you don’t end up with thousands of sessions that have allocated a couple of megabytes they don’t need – you’re allowed to create multiple services with different configurations for the same database so your OLTP users could attach through one service and your DSS/DW/etc. users through another.

sys_op_descend()

Mon, 2023-11-27 03:46

When you specify that a column used in an index should be a “descending” column Oracle uses the internal, undocumented, sys_op_descend() function to derive a value to store for that column in the index leaf block.

For many years I’ve claimed that this function simply takes the one’s-complement of a character value and appends 0xff to the result before storing it. This is nothing like the whole story and in a recent comment to a note on descending indexes and finding max() values I was given another part of the story in response to my confession that I did not know why a varchar2(128) had turned into a raw(193) – it’s the length not the rawness that puzzled me – when subject to sys_op_descend().

Here’s a little script to generate some data that helps to explain what’s going on.

rem
rem     Script:         sys_op_descend.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem

create table t1 (
        v1      varchar2(10),
        nv1     nvarchar2(10)
)
/

insert into t1 values('A','A');
insert into t1 values('AB','AB');
insert into t1 values('ABC','ABC');
commit;

create table t2 (
        v1      varchar2(10),
        nv1     nvarchar2(10)
)
/

insert into t2 values(chr(0), chr(0));
insert into t2 values(chr(1), chr(1));
insert into t2 values(chr(2), chr(2));
insert into t2 values(chr(3), chr(3));
insert into t2 values(chr(4), chr(4));
commit;

There’s nothing particularly special about the two tables and data I’ve inserted, I’m just trying to generate patterns that make it easy to see what’s going on; and I’ll be comparing a varchar2() and an nvarchar2() because my nvarchar2() column uses a multi-byte character set.

Let’s look at the (nearly raw) data from t1 where I’ve stored growing strings of printable characters. Here’s a simple query and its result set:

set linesize 50
break on row

select
        dump(v1,16)                     v1,
        dump(sys_op_descend(v1), 16)    v1_desc,
        dump(nv1,16)                    nv1,
        dump(sys_op_descend(nv1), 16)   nv1_desc
from
        t1
/


V1
--------------------------------------------------
V1_DESC
--------------------------------------------------
NV1
--------------------------------------------------
NV1_DESC
--------------------------------------------------
Typ=1 Len=1: 41
Typ=23 Len=2: be,ff
Typ=1 Len=2: 0,41
Typ=23 Len=4: fe,fb,be,ff

Typ=1 Len=2: 41,42
Typ=23 Len=3: be,bd,ff
Typ=1 Len=4: 0,41,0,42
Typ=23 Len=7: fe,fb,be,fe,fb,bd,ff

Typ=1 Len=3: 41,42,43
Typ=23 Len=4: be,bd,bc,ff
Typ=1 Len=6: 0,41,0,42,0,43
Typ=23 Len=10: fe,fb,be,fe,fb,bd,fe,fb,bc,ff


If you start with the first two columns of the output rows you can see: ‘A’ == 0x41, which becomes 0xbe, 0xff, following the “one’s complement with an appended 0xff” rule. The same pattern is visible for ‘AB’ and ‘ABC’.

When you look at the 3rd and 4th columns of each row (the nvarchar2), ‘A’ is now a 2-byte value (0x00, 0x41) which turns into the four bytes: 0xfe, 0xfb, 0xbe, 0xff. The value 0xbe is recognisable as the one’s-complement of 0x41 that appeared for the varchar2() values – but the 0x00 in the original nvarchar2() value seems to have turned into a two-byte 0xfe, 0xfb rather than the “expected” 0xff.

Why doesn’t Oracle use the “right” one’s complement for zero? Maybe because 0xff is the byte that’s supposed to mark the end of the string, and it’s important to avoid the risk of ambiguity. But now we have a new problem: Oracle is using 0xfe as the first of two bytes to represent the “descending” zero, and 0xfe is the one’s-complement of 0x01. So how is Oracle working around the fact that it would be a bad idea to have two possible meanings for the value 0xfe? That’s where the second data set comes in; here’s the same query, with results, run against the t2 table:

select
        dump(v1,16)                     v1,
        dump(sys_op_descend(v1), 16)    v1_desc,
        dump(nv1,16)                    nv1,
        dump(sys_op_descend(nv1), 16)   nv1_desc
from
        t2
/

V1
--------------------------------------------------
V1_DESC
--------------------------------------------------
NV1
--------------------------------------------------
NV1_DESC
--------------------------------------------------
Typ=1 Len=1: 0
Typ=23 Len=3: fe,fe,ff
Typ=1 Len=2: 0,0
Typ=23 Len=3: fe,fd,ff

Typ=1 Len=1: 1
Typ=23 Len=3: fe,fa,ff
Typ=1 Len=2: 0,1
Typ=23 Len=3: fe,fc,ff

Typ=1 Len=1: 2
Typ=23 Len=2: fd,ff
Typ=1 Len=2: 0,2
Typ=23 Len=4: fe,fb,fd,ff

Typ=1 Len=1: 3
Typ=23 Len=2: fc,ff
Typ=1 Len=2: 0,3
Typ=23 Len=4: fe,fb,fc,ff

Typ=1 Len=1: 4
Typ=23 Len=2: fb,ff
Typ=1 Len=2: 0,4
Typ=23 Len=4: fe,fb,fb,ff

Looking at the last three groups of 4 lines we can see the varchar2() column following the “one’s complement, append 0xff” rule and the nvarchar2() following the additional “use 0xfe 0xfb for zero” rule; but for chr(0) and chr(1) the dumps need some further untangling. With the tests so far all we can say with any confidence is that “if you see the 0xfe byte then a different pattern is about to appear briefly”.

I don’t really need to do any more experiments to guess why, in my previous note, the descending varchar2(128) was projected as raw(193) (though you might want to play around to see what happens with strings of several consecutive zeros or ones). I suppose the size reported could vary with character set, but if I have a two-byte fixed width character set and most of the data corresponds to basic ASCII characters then I’ll have a lot of characters where every other byte is a zero that encodes into two bytes when descending – so it makes sense to use for the projection a size derived as: 64 * 1 + 64 * 2 + 1 (0xff) = 193. Q.E.D.
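
If you want to see the length Oracle actually declares for the supporting hidden column (rather than reading it from a plan projection), you can create a throwaway descending index and look at user_tab_cols – a small sketch, with an arbitrary table name:

create table t_len (v1 varchar2(128));
create index t_len_i1 on t_len(v1 desc);

select  column_name, data_type, data_length
from    user_tab_cols
where   table_name = 'T_LEN'
and     hidden_column = 'YES'
/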

Index sizing

An important side effect of this improved understanding is the realisation of what can happen to the size of an index when declared descending. For a simple demonstration, here’s a table with 4 columns and 50,000 rows copied from all_objects.object_name, using the nvarchar2() type for one of the pairs of copies.

create table t1a (
        nva nvarchar2(128),
        nvd nvarchar2(128),
        va  varchar2(128),
        vd  varchar2(128)
)
/

insert into t1a
select  object_name, object_name, object_name, object_name
from    all_objects
where
        rownum <= 50000
;

create index t1a_nva on t1a(nva);
create index t1a_nvd on t1a(nvd desc);
create index t1a_va on t1a(va);
create index t1a_vd on t1a(vd desc);

execute dbms_stats.gather_table_stats(user,'t1a')

select index_name, leaf_blocks
from
        user_indexes
where
        table_name = 'T1A'
order by
        index_name
/

select 
        column_name, avg_col_len 
from    user_tab_cols 
where   table_name = 'T1A' 
order by 
        column_name
/


INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
T1A_NVA                      590
T1A_NVD                      854
T1A_VA                       336
T1A_VD                       343

4 rows selected.

COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
NVA                           74
NVD                           74
SYS_NC00005$                 111
SYS_NC00006$                  39
VA                            38
VD                            38

6 rows selected.

As you can see, the descending varchar2() index (backed by column sys_nc00006$) is only slightly larger than the corresponding ascending index, but the descending nvarchar2() index (backed by column sys_nc00005$) has increased by something much closer to 50% in size because half the bytes in each object_name were zeroes that have been replaced by the two-byte 0xfe 0xfb. This is much worse than the “one byte per descending column per row” that I’ve been claiming for the last 20 or 30 years.

gby_pushdown

Mon, 2023-11-20 12:42

This is a note that’s been awaiting completion for nearly 10 years. It’s about a feature (or, at least, a hint for the feature) that appeared in 10.2.0.5 to control some of the inter-process messaging that takes place in parallel execution.

It’s a very simple idea that can make a significant difference in CPU usage for large parallel queries – can you aggregate the raw data before distributing it between slave sets (minimising the volume of traffic), or should you simply distribute the data and aggregate late to avoid aggregating twice? The strategy of aggregating early is known as “group by pushdown”.

I’ll start with a script to create a simple data set and a trivial query with a parallel hint:

rem
rem     Script:         gby_pushdown.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem
rem     Last tested 
rem             19.11.0.0
rem             12.2.0.1
rem

set linesize 180
set pagesize 60
set trimspool on
set tab off

create table t1 
as 
select 
        * 
from 
        all_objects 
where 
        rownum <= 50000
/


alter session set statistics_level = all;

set serveroutput off

prompt  ==================================================
prompt  Default (for this data) pushdown and hash group by
prompt  ==================================================

set feedback only

select 
        /*+ 
                qb_name(main)
                parallel(t1 2) 
--              no_gby_pushdown(@main)
--              no_use_hash_gby_for_pushdown(@main)
        */ 
        owner, object_type, count(*) ct
from 
        t1 
group by 
        owner, object_type
order by
        owner, object_type
/

set feedback on
select * from table(dbms_xplan.display_cursor(format=>'outline allstats cost hint_report -predicate'));

In the absence of any hints (apart from the qb_name() and parallel() hints), the plan I get from the query is as follows:

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |    77 (100)|     96 |00:00:00.07 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     96 |00:00:00.07 |       5 |      0 | 11264 | 11264 |     1/0/0|
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |    276 |    77   (8)|     96 |00:00:00.01 |       0 |      0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |    276 |    77   (8)|    121 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |    276 |    77   (8)|    121 |00:00:00.04 |    1043 |    991 |  1079K|  1079K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.02 |    1043 |    991 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  50000 |    73   (3)|  50000 |00:00:00.01 |    1043 |    991 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      GBY_PUSHDOWN(@"MAIN")
      USE_HASH_GBY_FOR_PUSHDOWN(@"MAIN")
      END_OUTLINE_DATA
  */

You’ll notice in the Outline Data that Oracle has recorded the use_hash_gby_for_pushdown() hint and the gby_pushdown() hint. I’ll be repeating the query disabling each of these hints in turn – which is why the negative versions of the hints appear as comments in my original query.

If you look at operation 6 of the plan you can see that the optimizer has chosen to use a hash group by to aggregate the selected rows, reducing 50,000 rows to 121 rows. We could query v$pq_tqstat, or run the SQL Monitor report to get more detail about how much work each PX process did, but it’s fairly safe to assume that it was shared reasonably evenly between the two processes.

After aggregating their data the first layer of PX processes distributes the results by range (operation 5, PX Send Range) to the second layer of PX processes, which re-aggregate the much reduced data set. At this point Oracle chooses to aggregate by sorting (Sort Group By) as this will deal with the order by clause at the same time. (Note: you could tell Oracle to split the aggregation and ordering by adding a use_hash_aggregation hint to the query.)

With my data set it’s fairly clear that it’s a good idea to do this “two stage” aggregation because the initial raw data is reduced by a very large factor by the first layer of PX processes before they forward the results – and the reduction in inter-process messaging is likely to be a good idea.

There may be cases, of course, where some feature of the data pattern means that two-stage aggregation is a good idea, but aggregating by a sort is more efficient than an aggregation by hashing – a cardinality or clustering estimate might have persuaded the optimizer to make the wrong choice – so let’s see what happens to the plan if we enable the no_use_hash_gby_for_pushdown() hint:

Plan hash value: 3954212205

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |    77 (100)|     96 |00:00:00.14 |       5 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |     96 |00:00:00.14 |       5 | 11264 | 11264 |     1/0/0|
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |    276 |    77   (8)|     96 |00:00:00.03 |       0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |    276 |    77   (8)|    120 |00:00:00.03 |       0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       SORT GROUP BY      |          |      2 |    276 |    77   (8)|    120 |00:00:00.13 |    1043 |  9216 |  9216 |     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.07 |    1043 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  50000 |    73   (3)|  50000 |00:00:00.02 |    1043 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

It doesn’t really make any difference in this very small test case, though the sorting does seem to have taken a fraction of a second more CPU. The key change is that operation 6 has become a Sort Group By.

There is an interesting detail to look out for, though, in the Outline Data and Hint Report:

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      GBY_PUSHDOWN(@"MAIN")
      END_OUTLINE_DATA
  */

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   1 -  MAIN
           -  qb_name(main)

   8 -  MAIN / T1@MAIN
           -  parallel(t1 2)

I added the hint /*+ no_use_hash_gby_for_pushdown(@main) */ to the query, and the hint has clearly worked; but it’s not reported in the Hint Report, and it doesn’t appear in the Outline Data. This suggests that if you created an SQL Baseline for this query with this hint in place the plan would not reproduce because the SQL Baseline would be missing the critical hint. (To be tested – left as an exercise for readers.)
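
If you do want to run that experiment, a minimal sketch (assuming you’ve picked up the sql_id of the hinted statement from v$sql – the substitution variable here is just a placeholder) would be to load the cached cursor into a baseline and then repeat the query to see which plan appears:

declare
        n       pls_integer;
begin
        -- load the current cursor (and its plan) as an accepted baseline
        n := dbms_spm.load_plans_from_cursor_cache(sql_id => '&m_sql_id');
        dbms_output.put_line('Plans loaded: ' || n);
end;
/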

The next step is to enable the no_gby_pushdown() hint. For the purposes of the note this is the important one. It’s also the one that you are more likely to use as it’s fairly obvious (if you know the data) when it’s a good idea to use it. (In some cases, of course, the optimizer might have avoided “group by pushdown” when it should have used it – in which case you’d be using the gby_pushdown() hint.) Here’s the plan when I block “group by pushdown”.

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |        |    77 (100)|     96 |00:00:00.08 |       5 |       |       |          |
|   1 |  PX COORDINATOR         |          |      1 |        |            |     96 |00:00:00.08 |       5 |  6144 |  6144 |     1/0/0|
|   2 |   PX SEND QC (ORDER)    | :TQ10001 |      0 |    276 |    77   (8)|      0 |00:00:00.01 |       0 |       |       |          |
|   3 |    SORT GROUP BY        |          |      2 |    276 |    77   (8)|     96 |00:00:00.11 |       0 |  4096 |  4096 |     2/0/0|
|   4 |     PX RECEIVE          |          |      2 |  50000 |    73   (3)|  50000 |00:00:00.07 |       0 |       |       |          |
|   5 |      PX SEND RANGE      | :TQ10000 |      0 |  50000 |    73   (3)|      0 |00:00:00.01 |       0 |       |       |          |
|   6 |       PX BLOCK ITERATOR |          |      4 |  50000 |    73   (3)|  50091 |00:00:00.02 |    1051 |       |       |          |
|*  7 |        TABLE ACCESS FULL| T1       |     28 |  50000 |    73   (3)|  50091 |00:00:00.01 |    1051 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      END_OUTLINE_DATA
  */

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3
---------------------------------------------------------------------------
   1 -  MAIN
           -  no_gby_pushdown(@main)
           -  qb_name(main)

   7 -  MAIN / T1@MAIN
           -  parallel(t1 2)

Key details to highlight here are:

  • There’s only one aggregation step, appearing at operation 3 after the PX SEND/RECEIVE
  • 50,000 rows are passed from slave set 1 (operation 6) to slave set 2 (operation 4).
  • The no_gby_pushdown(@main) does appear in the Hint Report
  • The no_gby_pushdown(@main) doesn’t appear in the Outline Data (which is, again, a little worrying).

Again, with this small data set, the change in plan isn’t going to make much difference to the end user, but you may find cases where one of the two strategies is clearly the better choice for keeping the user sufficiently happy while saving machine resources.

Summary

When using parallel query the optimizer may choose to aggregate the data in two steps so that the current rowsource is reduced by one set of PX processes before it is passed to the next set of PX processes that does a second aggregation step. This tends to be a good idea if the first set of processes can reduce the size of the data set by a large factor and save a lot of machine resources on the subsequent inter-process communication; but it is a bad idea if the data doesn’t reduce and a large volume of data ends up being aggregated in both sets of PX processes.

You can over-ride the optimizer’s choice with the [no_]gby_pushdown() hint.

Session Activity Stats

Mon, 2023-11-20 09:35

A little detail to remember when trouble-shooting at the session level – some of the information summarised in the Instance Activity figures (v$sysstat) is not available in the Session Activity figures (v$sesstat / v$mystat). The difference goes right down to the x$ objects, and here are two versions of a simple query I wrote for 19c to check for the missing statistics (you’ll have to be able to connect as SYS to get results from the first one):

rem
rem     Script:         mystat_missing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem     Purpose:        
rem
rem     Last tested 
rem             23.3.0.0
rem             19.11.0.0
rem             12.2.0.1

spool mystat_missing

set linesize 132
set pagesize 60
set trimspool on
set tab off


select 
        usd.indx, usd.ksusdnam, usd.ksusdcls
from 
        x$ksusd usd
where
        usd.indx not in (
                select  sta.indx 
                from    x$ksumysta      sta
        )
/

select
        sys.statistic#, sys.name, sys.class
from
        v$sysstat sys
where
        sys.statistic# not in (
                select  mst.statistic# 
                from    v$mystat        mst
        )
/

spool off

If you run the script against 12.2.0.1 you’ll find that there are no “missing” session stats, but when you upgrade to 19c (in my case 19.11.0.0) you’ll find a couple of dozen statistics reported. This was the output I got:

STATISTIC# NAME                                                                  CLASS
---------- ---------------------------------------------------------------- ----------
      2010 Instance Statistic test case                                            384
      2011 cell RDMA reads unused                                                  264
      2012 cell RDMA writes                                                        264
      2013 cell pmem cache read hits unused                                        264
      2014 cell pmem cache writes                                                  264
      2015 NS transactions                                                         257
      2016 NS transactions timed out                                               257
      2017 NS transactions interrupted                                             257
      2018 NS transactions not using all standbys                                  257
      2019 NS transactions skipping send                                           257
      2020 NS transaction setups                                                   257
      2021 NS transaction bytes loaded                                             320
      2022 NS transaction bytes to network                                         320
      2023 NS transaction bytes relogged                                           322
      2024 NS transaction bytes logged                                             322
      2025 NS transaction send time                                                320
      2026 NS transaction setup time                                               320
      2027 NS transaction confirm time                                             320
      2028 NS recovery timeout interrupts                                          384
      2029 NS recovery DTC full interrupts                                         384
      2030 NS recovery fetch requests made                                         384
      2031 NS recovery fetch ranges received                                       384
      2032 NS recovery fetch requested bytes                                       384
      2033 NS recovery fetch received bytes                                        384
      2034 NS recovery fetch received time                                         384
      2035 NS recovery fetch requests remade                                       384

26 rows selected.

Running the query against 23c Free (23.3) on the Developer VM created by Oracle, the number of “missing” statistics jumped to 1,052 – so I won’t list them. Given the size of the list I did a quick check to remind myself of how many statistics were actually present in v$sysstat, and that produced an interesting comparison.

--
--      19.11.0.0
--
SQL> select count(*) ct_19c from v$sysstat;

    CT_19C
----------
      2036

--
--      23.3.0.0 (on premises)
--
SQL> select count(*) ct_23c from v$sysstat;

    CT_23C
----------
      2661

So there are 600-ish extra stats available in 23c, but 1,000-ish stats that don’t appear at the session level. So if you’ve been taking advantage of some of the “new” 19c stats to help with trouble-shooting you may find that they disappear on the upgrade. More on that later.

If you look at the output I’ve listed above you might spot that all the missing stats satisfy the predicate “class >= 256”. In fact, the class is a bit vector, and a more appropriate predicate would be: “bitand(class,256) = 256”. Either option gives you a fairly lazy way to do any checks you might be interested in. For example, after setting up a database link from a 19c instance to a 23c instance, I ran the following query to find out how many of the statistics that were visible in the 19c v$sesstat had changed their class to become invisible in the 23c v$sesstat.

select name from v$sysstat where class < 256
intersect
select name from V$sysstat@oracle23 where class >= 256
/


NAME
----------------------------------------------------------------
...
SMON posted for dropping temp segment
SMON posted for instance recovery
SMON posted for txn recovery for other instances
SMON posted for undo segment recovery
SMON posted for undo segment shrink
TBS Extension: bytes extended
TBS Extension: files extended
TBS Extension: tasks created
TBS Extension: tasks executed
...
commit cleanout failures: block lost
commit cleanout failures: buffer being written
commit cleanout failures: callback failure
commit cleanout failures: cannot pin
commit cleanout failures: delayed log
commit cleanout failures: flashback
commit cleanout failures: hot backup in progress
commit cleanout failures: write disabled
...
db corrupt blocks detected
db corrupt blocks recovered
...


502 rows selected.

Of the 502 stats a very large fraction were about In Memory (prefixed IM), and there were a number that looked as if they were only relevant to background processes and therefore (to some extent, possibly) not of interest when debugging user activity. It’s also possible that some of the statistics would fall into different classes if certain features (e.g. hybrid columnar compression) were linked into the executable.
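
If you want a quick profile of the session-invisible statistics by class – using the bitand() form of the predicate mentioned above, and the same oracle23 database link – a simple sketch is:

select  class, count(*)
from    v$sysstat@oracle23
where   bitand(class,256) = 256
group by
        class
order by
        class
/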

Another query that might be of interest is a typical “what’s changed” query. What statistics are available in 23c that aren’t in 19c (and vice versa):

select
        name, version
from    (
        select name, '23c' version from v$sysstat@oracle23
        minus
        select name, '23c' from V$sysstat
        union all
        select name, '19c' version from v$sysstat
        minus
        select name, '19c' from V$sysstat@oracle23
        )
order by
        version desc, name
/

Again there are so many that I don’t think it would be helpful to reproduce my results, but I’ll just say that there were 663 stats in 23c that weren’t in 19c, and 38 stats in 19c that weren’t in 23c. Of course, it’s possible (and I didn’t check carefully) that some of these new/”disappearing” statistics show up only because they’ve had a name change as the version changed.

A couple of the new 23c stats that I like the look of (and would like to see at the session level) are:

user commits PL/SQL
user commits Autonomous

I’m sure that there are more that will reveal themselves as (occasionally) useful over time, and further investigation is left to the reader.

Swap_Join_Inputs

Mon, 2023-11-06 08:47

This is just a short note (I hope) prompted by a conversation on the Oracle-L list server. A query from a 3rd party application was doing a very large full tablescan with hash join when it should have been doing a high precision index driven nested loop join, and the poor choice of plan was due to an optimizer defect when handling column groups (fixed in 23c) when one or more of the columns involved is always, or nearly always, null.

As a follow-up the owner of the problem asked what hints should go into an SQL Patch to make the optimizer use the nested loop. There’s a simple guideline that usually works for this type of “wrong join” problem: report the “Outline Data” from the current execution plan; find the relevant join hint(s) (in this case a use_hash() hint and a full() hint), change those join hint(s) (e.g. use_nl(), index()), and write the entire edited outline data into an SQL Patch watching out for a recently fixed defect in the SQL patch code.

There are, however, various refinements that add complexity to this strategy, as you can appreciate from a note I wrote some years ago about how to hint a hash join properly. This note is an example of handling one such refinement.

The query was a little complex, and the optimizer had unnested a subquery that consisted of a union all of 4 branches, and one of those branches had contributed a very large cardinality estimate to the total for the view, so the optimizer had chosen a hash join between the unnested subquery and a large table. Stripped to a bare minimum that part of the plan looked like this:

HASH JOIN
      VIEW                   VW_NSO_1
      TABLE ACCESS FULL      PO_LINE_LOCATIONS_ALL        

A quick scan of the Outline Data found the join hint (converted to lower case, with quotes removed): use_hash(@sel$ac90cd92 vw_nso_1@sel$ac90cd92), and an over-hasty response might be to convert the use_hash to a use_nl and leave it at that – except for three possible warnings:

  1. we wanted to see a nested loop into po_line_locations_all, so the suggested use_nl() hint would be looping into the wrong “table”
  2. the plan shows the view vw_nso_1 as the build table, while the reported hint is about vw_nso_1 being the second table in the join order
  3. there are further references to vw_nso_1 (and to po_line_locations_all) in the outline data

Here’s the complete set of original hints that might have been relevant to this particular part of the plan:

full(@sel$ac90cd92 po_line_locations_all@sel$2)
no_access(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
leading(@sel$ac90cd92 po_line_locations_all@sel$2 vw_nso_1@sel$ac90cd92)
use_hash(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
swap_join_inputs(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)

index_rs_asc(@sel$b584fdd1 po_line_locations_all@sel$16 (......))
batch_table_access_by_rowid(@sel$b584fdd1 po_line_locations_all@sel$16)

index_rs_asc(@sel$5ed1c707 po_line_locations_all@sel$13 (......))
batch_table_access_by_rowid(@sel$5ed1c707 po_line_locations_all@sel$13)

index_rs_asc(@sel$2f35337b po_line_locations_all@sel$10 (......))
batch_table_access_by_rowid(@sel$2f35337b po_line_locations_all@sel$10)

index_rs_asc(@sel$1b7d9ae9 po_line_locations_all@sel$7 (......))

This is where knowing about the “Query Block / Object Alias” section of the execution plan is important – I’ve split the list into several parts based on the query block (@sel$xxxxxxxxxx) they are aimed at, and it’s only the first 5 we need to worry about.

Conveniently this part of the plan is a self-contained query block (@sel$ac90cd92) and we can see why we have an apparent contradiction between vw_nso_1 being the second table in the join order while being the build table: it’s second because of the leading() hint which dictates the join order, but it becomes the build table, hence appearing to be the first table in the join order, because of the swap_join_inputs() hint.

What we want is a join order where vw_nso_1 really is the first table in the join order, followed by a nested loop join into po_line_locations_all, using an index (not the full tablescan that the current hints dictate). It would probably be a good idea to get rid of the now-redundant swap_join_inputs() hint at the same time because that hint applies only to hash joins. So I think we need to replace the 5 hints above with the following 4 hints:

no_access(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
leading(@sel$ac90cd92 vw_nso_1@sel$ac90cd92 po_line_locations_all@sel$2)
use_nl(@sel$ac90cd92 po_line_locations_all@sel$2)
index(@sel$ac90cd92 po_line_locations_all@sel$2(line_location_id))

The index hint references column line_location_id because that’s the access predicate used in the original hash join and I’m assuming that there is an index that starts with that column. It’s always a bit hit and miss with hinting and it might have been sufficient (as a first attempt) to use the index hint without trying to reference a specific index, and there might be good reasons for adding more columns to the list, or simply naming the index rather than describing it.

It’s quite likely that if this change in the hints is sufficient the resulting Outline Data would look a little different anyway; in particular the index() hint that I’ve suggested might get expanded to index_rs_asc(), and there might be a batch_table_access_by_rowid() added. Basically you do test runs until you get the result you want and then use the resulting Outline Data for the patch (although, occasionally, you still find that the Outline Data doesn’t reproduce the plan that it came from).
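
For reference, once the test runs have produced the plan you want, a minimal sketch of creating the patch in 19c might look like the following – the sql_id is a placeholder, and I’ve shown only the four suggested hints rather than the full edited Outline Data that you’d normally paste in (this assumes the dbms_sqldiag.create_sql_patch() overload that accepts a sql_id, which is available in recent versions):

declare
        v_name  varchar2(128);
begin
        v_name := dbms_sqldiag.create_sql_patch(
                sql_id          => '&m_sql_id',
                hint_text       => q'{
                        no_access(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
                        leading(@sel$ac90cd92 vw_nso_1@sel$ac90cd92 po_line_locations_all@sel$2)
                        use_nl(@sel$ac90cd92 po_line_locations_all@sel$2)
                        index(@sel$ac90cd92 po_line_locations_all@sel$2(line_location_id))
                }',
                name            => 'nl_po_line_locations_patch'
        );
        dbms_output.put_line('Patch created: ' || v_name);
end;
/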

Frivolous Footnote

There were 75 hints totalling 3,191 bytes in the original Outline Data. If the text gets too long and messy for you to cope with when you create the patch you can probably remove all the double quotes, all the table names from the fully qualified column names in indexes, all the outline() and outline_leaf() hints, all the opt_param() hints that reflect system level parameter changes and the begin/end_outline_data hints and the ignore_optim_embedded_hints hint. You could also change long index descriptions to index names and, if you’re like me, change it all to lower case anyway because I hate reading in capitals – and if you do change it all to lower case you have to remove the quotes. When I did all this to the original outline data the result was 1,350 bytes for 30 hints.

Descending max()

Wed, 2023-11-01 09:54

I’ve written a few notes about problems with “descending” indexes in the past (the word is in quotes because it’s not the index that’s defined as descending, it’s a proper subset of the columns of the index). A silly example came up recently on twitter where the cost of an execution plan changed so dramatically that it had to be hinted when the query changed from “order by list of column names” to “order by list of numeric positions”.

Just to make things a little messier, the index was not just “descending” but the column that had been declared as descending was actually a to_char() of a date column. I won’t reproduce the example here as the tweet links to a pastebin drop of a self-contained example. What I have, instead, is an even simpler example of a “descending” index producing a most unsuitable plan – even when you try to hint around it.

Here’s the demonstration that I’ve run on 19.11 and 23.3:

rem
rem     Script:         ind_desc_max_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table t1 
as 
select  * 
from    all_objects
where   rownum <= 10000
/
  
alter table t1 modify object_name not null;

-- create index t1_i1a on t1(object_name);
create index t1_i1d on t1(object_name desc);

execute dbms_stats.gather_table_stats(user,'t1',cascade=>true)

alter session set statistics_level=all;
set serveroutput off

select  max(object_name)
from    t1
/

select * from table(dbms_xplan.display_cursor(format=>'cost allstats last hint_report projection'));

set serveroutput on
alter session set statistics_level = typical;

A couple of details to mention:

  • The code includes lines to create two indexes, one ascending and one descending. When I run the code I create only one of them.
  • I gather stats after creating the index – this is for the benefit of the descending index only, which needs stats collected on the underlying hidden column definition that Oracle creates to support it.
  • There’s a call to define object_name as not null – this is for the benefit of 23c. In 19c the view has several columns which carry forward their underlying not null declarations. In 23c none of the view columns has a not null declaration.

If I create the ascending index, 19c and 23c both produce the following plan:

Plan hash value: 1421318352

------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |        |      1 |        |     2 (100)|      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE            |        |      1 |      1 |            |      1 |00:00:00.01 |       2 |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T1_I1A |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) MAX("OBJECT_NAME")[128]
   2 - "OBJECT_NAME"[VARCHAR2,128]

If I create the descending index the plan changes (19c and 23c behave the same way, the following plan is from 23c):

-------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |      1 |        |     8 (100)|      1 |00:00:00.01 |      49 |
|   1 |  SORT AGGREGATE       |        |      1 |      1 |            |      1 |00:00:00.01 |      49 |
|   2 |   INDEX FAST FULL SCAN| T1_I1D |      1 |  10000 |     8   (0)|  10000 |00:00:00.01 |      49 |
-------------------------------------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) MAX(SYS_OP_UNDESCEND("T1"."SYS_NC00029$"))[128]
   2 - "T1"."SYS_NC00029$"[RAW,193]

There was a slight difference between versions – there are a couple of extra columns in the 23c view, so the hidden column referenced in the Projection Information was sys_nc00027$ in 19c compared to sys_nc00029$ in 23c. I don’t know why the length is reported as 193 – I would have expected it to be 129 (since it’s going to hold the one’s-complement of the object_name and a trailing 0xFF byte).
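
If you want to see the hidden column for yourself (and confirm which sys_nc name your version uses), a quick query against user_tab_cols will show it – this is just a sketch I’d run after creating the descending index:

select  column_name, data_type, data_length, hidden_column
from    user_tab_cols
where   table_name = 'T1'
order by
        internal_column_id
/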

The critical point, of course, is that the query is no longer using the special min/max path, it’s doing an index fast full scan, scanning through 49 buffers instead of accessing just the 2 buffers the min/max needed.

I added a no_index_ffs(t1) hint to see what would happen if I tried to block the bad path: Oracle did a tablescan; so I also added an index(t1) hint to see if that would help and got the following plan:

--------------------------------------------------------------------------------------------------
| Id  | Operation        | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |      1 |        |    46 (100)|      1 |00:00:00.01 |      46 |
|   1 |  SORT AGGREGATE  |        |      1 |      1 |            |      1 |00:00:00.01 |      46 |
|   2 |   INDEX FULL SCAN| T1_I1D |      1 |  10000 |    46   (0)|  10000 |00:00:00.01 |      46 |
--------------------------------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) MAX(SYS_OP_UNDESCEND("T1"."SYS_NC00029$"))[128]
   2 - "T1"."SYS_NC00029$"[RAW,193]

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
   2 -  SEL$1 / "T1"@"SEL$1"
           -  index(t1)

As you can see from the Hint Report, the optimizer accepted and used my hint. But it hasn’t used the min/max optimisation, it’s done an index full scan, walking through 46 buffers in index order, which could well be more resource-intensive than the fast full scan.

I tried various tricks to get back the min/max optimisation, and I did finally manage to achieve it – but it’s not a realistic solution so don’t copy it. Here’s the SQL and plan:

select
        /*+ index(t1) */
        utl_raw.cast_to_varchar2(
                sys_op_undescend(
                        min(sys_op_descend(object_name))
                )
        )
from    t1
/

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |      1 |        |    46 (100)|      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE             |        |      1 |      1 |            |      1 |00:00:00.01 |       2 |
|   2 |   FIRST ROW                 |        |      1 |  10000 |    46   (0)|      1 |00:00:00.01 |       2 |
|   3 |    INDEX FULL SCAN (MIN/MAX)| T1_I1D |      1 |  10000 |    46   (0)|      1 |00:00:00.01 |       2 |
-------------------------------------------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) MIN("T1"."SYS_NC00029$")[193]
   2 - "T1"."SYS_NC00029$"[RAW,193]
   3 - "T1"."SYS_NC00029$"[RAW,193]

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
   3 -  SEL$1 / "T1"@"SEL$1"
           -  index(t1)

There is an “extra” first row operator in the plan but you can see that we’re back to a min/max optimisation accessing only 2 buffers to get the result we want. (And it was the same result.)

  1. For 23c the indexed column is the hidden column SYS_NC00029$, and I know that the optimizer will recognise the equivalence between the column and the expression sys_op_descend(object_name) so it’s been fooled into using the min/max optimisation on the index.
  2. If I want the maximum object name I want the minimum sys_op_descend(object_name). Possibly the first row operation appears because the optimizer doesn’t know that the function call will always return a non-null result.
  3. Once I’ve found the minimum I need to reverse the effect of the sys_op_descend() – which is where the sys_op_undescend() comes in, but that returns a raw value, so I’ve had to call a utl_raw function to convert the raw to a varchar2(). Watch out, though, because it’s going to be a maximum length varchar2().
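
As a quick sanity check (not part of the original test, just a sketch) you can confirm that sys_op_undescend() really does reverse sys_op_descend() by round-tripping a few values – the two columns should match row for row:

select
        object_name,
        utl_raw.cast_to_varchar2(
                sys_op_undescend(sys_op_descend(object_name))
        )       round_trip
from    t1
where   rownum <= 5
/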

If I can make it happen without even changing the shape of the query the optimizer ought to be able to make it happen – but there’s probably a little bit of generic index-usage code that’s currently getting in the way.

Summary

We’ve seen the pattern fairly often: indexes with a mix of ascending and descending columns can be very helpful in specific cases, but we shouldn’t be surprised when a mechanism that appears for “ascending only” indexes doesn’t work perfectly for an index with some descending columns.

Push Group by

Wed, 2023-11-01 05:40

Jump to summary.

A new optimizer feature that appears in 23c (probably not 21c) is the ability to push group by clauses into union all set operations. This will happen unhinted, but can be hinted with the highly memorable [no_]push_gby_into_union_all() hint that appeared in 23.1.0.0 according to v$sql_hint, and the feature can be disabled by setting the (equally memorable) hidden parameter _optimizer_push_gby_into_union_all to false.

From a couple of simple experiments it looks as if the hint should be used to identify query blocks where you want an aggregation (group by) that appears “outside” a union all (inline) view to happen “inside” the view. Here’s a trivial demonstration that I’ve run on 23.3:

rem
rem     Script:         push_gby_ua.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2023
rem
rem     Last tested 
rem             23.3.0.0
rem

create table t1 
as 
select  *
from    all_objects 
where   rownum <= 50000
;

set linesize 156
set pagesize 60
set trimspool on
set tab off
set serveroutput off

alter session set statistics_level = all;

select
        /*+
                -- qb_name(main)
                push_gby_into_union_all(@sel$2)
                no_push_gby_into_union_all(@sel$3)
        */
        owner, count(*)
from    (
        select /*  qb_name(u1) */ owner from t1 where owner = 'SYS'
        union all
        select /*  qb_name(u2) */ owner from t1 where owner = 'PUBLIC'
)       
group by owner
/

select * from table(dbms_xplan.display_cursor(format=>'allstats last cost outline alias hint_report qbregistry qbregistry_graph '));

All I’ve done here is create a table that copies 50,000 rows from the view all_objects, then execute a query that reports the number of objects for owners SYS and PUBLIC by selecting the two sets of objects separately and aggregating a union all of those sets.

For maximum visibility I’ve shown the positive and negative versions of the hint – the aggregation doesn’t have to apply to all the branches of the view and it’s not unknown for the optimizer to make the wrong choices if it hasn’t managed to produce a good cost estimate.

Here’s the execution plan (with some of the bits removed) that I got from 23.3 for this test:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |      1 |        |   295 (100)|      2 |00:00:00.03 |    2232 |   1114 |
|   1 |  HASH GROUP BY          |      |      1 |     15 |   295   (4)|      2 |00:00:00.03 |    2232 |   1114 |
|   2 |   VIEW                  |      |      1 |   3334 |   293   (3)|   9288 |00:00:00.03 |    2232 |   1114 |
|   3 |    UNION-ALL            |      |      1 |   3334 |   293   (3)|   9288 |00:00:00.03 |    2232 |   1114 |
|   4 |     SORT GROUP BY NOSORT|      |      1 |      1 |   147   (3)|      1 |00:00:00.02 |    1116 |   1114 |
|*  5 |      TABLE ACCESS FULL  | T1   |      1 |   3333 |   147   (3)|  39724 |00:00:00.01 |    1116 |   1114 |
|*  6 |     TABLE ACCESS FULL   | T1   |      1 |   3333 |   147   (3)|   9287 |00:00:00.01 |    1116 |      0 |
----------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('23.1.0')
      DB_VERSION('23.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$FC1F66D1")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SET$22FBD6DA")
      PUSH_GBY_INTO_UNION_ALL(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SET$1")
      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1" GROUP_BY)
      FULL(@"SEL$3" "T1"@"SEL$3")
      FULL(@"SEL$FC1F66D1" "T1"@"SEL$2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("OWNER"='SYS')
   6 - filter("OWNER"='PUBLIC')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   4 -  SEL$FC1F66D1
           -  push_gby_into_union_all(@sel$2)

   6 -  SEL$3
           -  no_push_gby_into_union_all(@sel$3)

It’s interesting to note that the Hint Report tells us that both my hints were valid (and used); but the Outline Data echoes only one of them (the “positive” push_gby_into_union_all). Because I’ve used the same table twice it’s not instantly clear that the optimizer has pushed the subquery that I had specified, but if you check the Predicate Information you can confirm that the SYS data has been aggregated inside the union all and the PUBLIC data has been passed up to the union all operator without aggregation. (In the absence of the hints both data sets would have been aggregated early.)

Here, in comparison, is the plan (slightly reduced, and with the qbregistry options removed) that I got from 19.11.0.0

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |   265 (100)|      2 |00:00:00.09 |    1986 |    991 |       |       |          |
|   1 |  HASH GROUP BY       |      |      1 |     13 |   265   (4)|      2 |00:00:00.09 |    1986 |    991 |  1422K|  1422K|  653K (0)|
|   2 |   VIEW               |      |      1 |   7692 |   263   (3)|  48446 |00:00:00.07 |    1986 |    991 |       |       |          |
|   3 |    UNION-ALL         |      |      1 |        |            |  48446 |00:00:00.06 |    1986 |    991 |       |       |          |
|*  4 |     TABLE ACCESS FULL| T1   |      1 |   3846 |   131   (3)|  42034 |00:00:00.02 |     993 |    991 |       |       |          |
|*  5 |     TABLE ACCESS FULL| T1   |      1 |   3846 |   131   (3)|   6412 |00:00:00.01 |     993 |      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("OWNER"='SYS')
   5 - filter("OWNER"='PUBLIC')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         E -  push_gby_into_union_all

As you can see, 19.11 treats the hint as an error and both subqueries against t1 pass their rows up to the union all without aggregation. The 19.11 plan also gives you some idea of why it can be worth pushing the group by: 23.3 doesn’t report any memory used for either of the aggregation operations that take place, while the postponed (or, rather, unpushed) aggregation in 19.11 reports 1.4M of memory used. As a general principle we might expect several small aggregations to have a lower peak of memory usage than one large aggregation. There’s also a CPU benefit when Oracle doesn’t have to push lots of rows up through a couple of operations.

In fact the absence of memory-related columns in the 23.3 plan is a little suspect and I may have to examine it further. It may simply be the case that the size of the “small allocation” that doesn’t get reported in earlier versions has been increased to (best guess) 1MB; it may be that dbms_xplan in 23c has got a little bug that omits that part of the report.

Summary

Oracle 23c has a new transformation that will probably help to reduce memory and CPU consumption when it comes into play. Queries that aggregate over union all views may change plans to push the aggregation into some or all of the separate subqueries inside the union.

The feature is cost-based but you can over-ride the optimizer’s choice of which subqueries should be aggregated early with the hint [no_]push_gby_into_union_all(@qbname). The feature can also be disabled completely by setting the hidden parameter _optimizer_push_gby_into_union_all to false.
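
For reference, this is the sort of syntax I’d use (a sketch only – the usual warnings about fiddling with hidden parameters apply): either an opt_param() hint for a single statement, or an alter session for everything in the session:

select
        /*+ opt_param('_optimizer_push_gby_into_union_all','false') */
        owner, count(*)
from    (
        select owner from t1 where owner = 'SYS'
        union all
        select owner from t1 where owner = 'PUBLIC'
        )
group by owner
/

alter session set "_optimizer_push_gby_into_union_all" = false;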

Addendum

It occurred to me that since the optimizer will transform an IN-list to a list of equalities with OR, and is also capable of using OR-expansion, there might be cases where an aggregate based on an IN-list could go through the two steps and then benefit from this new feature, for example:

select
        sts, count(*) ct
from    t1
where   sts in ('B','C')
group by
        sts
/

-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |       |       |     3 (100)|          |
|   1 |  SORT GROUP BY NOSORT|        |     2 |     4 |     3   (0)| 00:00:01 |
|   2 |   INLIST ITERATOR    |        |       |       |            |          |
|*  3 |    INDEX RANGE SCAN  | T1_I1A |   100 |   200 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("STS"='B' OR "STS"='C'))

Alas, no. Although we can see the rewrite of the IN-list, the optimizer doesn’t then use OR-expansion. And when I added the hint /*+ or_expand */ to try to push Oracle in the right direction the Hint Report told me:

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         U -  or_expand / No valid predicate for OR expansion

Maybe in the next release.

SQL_trace 23

Fri, 2023-10-27 06:46

That’s not the 23rd article I’ve written on sql_trace, it’s just that there’s a little “catch-up” detail about sql_trace that I recently discovered in 23.2 (though it might also be working in 21c or even in recent versions of 19c).

Although levels 4 and 8 (binds and waits) of the SQL trace event are commonly known, and the more modern style of calls to sql_trace with the added plan_stat options is fairly well known, it’s only in 23c (possibly 21c) that all the plan_stat options get implemented in dbms_session and dbms_monitor. From some time back in the 11g timeline we’ve been able to execute calls like:

alter system set events '
        sql_trace[SQL:9tz4qu4rj9rdp]
        bind=false,
        wait=true,
        plan_stat=all_executions
'
;

Taking advantage of the “oradebug doc” command (which can only be run by a highly privileged user), you could discover the available options for the plan_stat:

SQL> oradebug doc event name sql_trace

sql_trace: event for sql trace

Usage
-------
sql_trace
   wait       < false | true >,
   bind       < false | true >,
   plan_stat  < never | first_execution | all_executions | adaptive >,
   level      <ub4>

The default value for plan_stat is first_execution, which is usually adequate, but if you were trying to find out why a particular query sometimes runs very slowly compared to usual you might want to set it to all_executions. If the query executes extremely frequently, though, and produces a fairly lengthy execution plan, you might decide to set the value to adaptive (see end notes) which ought to limit the frequency with which the execution plan is dumped into the trace file.
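
So, for example, a session-level version of the earlier event syntax using the adaptive option might look like this (a sketch – I’ve simply dropped the SQL_ID filter and changed the plan_stat value):

alter session set events '
        sql_trace
        bind=false,
        wait=true,
        plan_stat=adaptive
'
;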

Note: the threat of very long plans for very quick executions of an extremely popular statement comes from two directions – the extra “execution” time to get the plan written to the trace file, and the total size of all the files dumped to the trace directory. Depending on your requirements you might be able to use the “UTS trace” setup to limit the volume kept on disc.

If you check the packages dbms_session and dbms_monitor you will find the following procedures:

SQL> desc dbms_session
...
PROCEDURE SESSION_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT
...

SQL> desc dbms_monitor
...
PROCEDURE CLIENT_ID_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 CLIENT_ID                      VARCHAR2                IN
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE DATABASE_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 INSTANCE_NAME                  VARCHAR2                IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE SERV_MOD_ACT_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SERVICE_NAME                   VARCHAR2                IN
 MODULE_NAME                    VARCHAR2                IN     DEFAULT
 ACTION_NAME                    VARCHAR2                IN     DEFAULT
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 INSTANCE_NAME                  VARCHAR2                IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT

PROCEDURE SESSION_TRACE_ENABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SESSION_ID                     BINARY_INTEGER          IN     DEFAULT
 SERIAL_NUM                     BINARY_INTEGER          IN     DEFAULT
 WAITS                          BOOLEAN                 IN     DEFAULT
 BINDS                          BOOLEAN                 IN     DEFAULT
 PLAN_STAT                      VARCHAR2                IN     DEFAULT
...

Chris Antognini wrote an article some years ago listing the “10046 trace” levels and the relationships between the available numeric levels and the named parameter format (adaptive = 64). In the note he pointed out that adaptive was not available as a plan_stat value in dbms_session and dbms_monitor, and that Enhancement Request #14499199 had been raised to correct this omission. Since then I’ve run a little script from time to time to check if the ER has been fulfilled:

column  sid     new_value my_sid
column  serial# new_value my_serial

select  sid, serial#
from    v$session       ses
where   ses.sid = (
                select  ms.sid
                from    v$mystat ms
                where   rownum = 1
        )
;

execute dbms_monitor.session_trace_enable( &my_sid, &my_serial, waits=>false, binds=>false, plan_stat=>'adaptive')

execute dbms_session.session_trace_enable(waits=>false, binds=>false, plan_stat=>'adaptive');

execute dbms_monitor.session_trace_disable( &my_sid, &my_serial)
execute dbms_session.session_trace_disable;

This depends on the user having suitable privileges on the packages and on a couple of dynamic performance views, and all that happens is that it succeeds (or fails) to enable tracing. Here’s the output from 19.11.0.0 of the two calls to enable:

BEGIN dbms_monitor.session_trace_enable(         40,      56799, waits=>false, binds=>false, plan_stat=>'adaptive'); END;

*
ERROR at line 1:
ORA-30449: syntax error in parameter
ORA-06512: at "SYS.DBMS_MONITOR", line 123
ORA-06512: at line 1


BEGIN dbms_session.session_trace_enable(waits=>false, binds=>false, plan_stat=>'adaptive'); END;

*
ERROR at line 1:
ORA-30449: syntax error in parameter
ORA-06512: at "SYS.DBMS_SESSION", line 343
ORA-06512: at line 1

Then on the upgrade to 23.2.0.0, the boring but satisfying:

PL/SQL procedure successfully completed.

PL/SQL procedure successfully completed.

Readers with versions in the gap between 19.11.0.0 and 23.2.0.0 are invited to run a check and report the results in the comments to narrow down when the enhancement became available.

plan_stat=adaptive

There are a couple of notes on MOS about the meaning/effect of this setting. They’re not 100% consistent with each other.

Doc ID 21154.1 says:

This dumps the STAT information if a SQL took more than about 1 minute thereby giving information for the more expensive SQLs and for different executions of such SQLs.

Doc ID 8328200 (relating to Bug 8328200) says:

“adaptive” mode dumps STAT lines in SQL trace for every minute of dbtime per shared cursor.

As you can see you can read the two statements as trying to say the same thing, but there is some ambiguity in both statements, and some need for clarification of terms. So I’ve done a couple of simple experiments – basically running a PL/SQL loop that executes the same statement 120 times, where each execution takes a little less than 3 seconds to complete.
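
The driver was nothing sophisticated – something like the following sketch (the table and query here are just placeholders of my own; all that matters is that a single execution of the embedded statement takes a few seconds):

declare
        m_ct    number;
begin
        for i in 1..120 loop
                select  count(*)
                into    m_ct
                from    t1 t1a, t1 t1b          -- placeholder: any query taking a few seconds
                where   t1a.object_id = t1b.object_id
                ;
        end loop;
end;
/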

Cutting the top and tail from each trace file left me with 120 “FETCH” lines; a few of these lines were immediately followed by STAT lines, but most were followed by CLOSE lines with no STAT lines reported for the execution. Here are the first few lines of the results from a trace file generated by 23.2 from a call to “grep -A+1 FETCH {tracefile}”:

FETCH #140175454862744:c=2737978,e=2814244,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36476287935
STAT #140175454862744 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=277822 pr=0 pw=0 str=1 time=2814245 dr=0 dw=0 us)'
--
FETCH #140175454862744:c=2758633,e=2885235,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36479173822
CLOSE #140175454862744:c=2,e=1,dep=1,type=3,tim=36479173972
--

Piping this into “grep -B+1 STAT” extracted just the FETCH/STAT pairs, which I could then pass through “grep FETCH” to get the output that I wanted to see: the “tim=” values for only those FETCH calls that were followed by STAT lines:

grep -A+1 FETCH {tracefile}  | grep -B+1 STAT  |  grep FETCH
 
FETCH #140175454862744:c=2737978,e=2814244,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36476287935
FETCH #140175454862744:c=2716296,e=2782323,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36541290925
FETCH #140175454862744:c=2804165,e=2916326,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36610338064
FETCH #140175454862744:c=2677000,e=2737197,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36676533249
FETCH #140175454862744:c=2722436,e=2796966,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36744048403
FETCH #140175454862744:c=2725066,e=2792661,p=0,cr=277822,cu=0,mis=0,r=1,dep=1,og=1,plh=3153623734,tim=36811536449

If you do the arithmetic with the tim= values you’ll find that the time between the executions that reported their execution plan in STAT lines varies from about 65 seconds to 69 seconds. The results from 19.11 were similar, varying from 64.9 seconds to 67.9 seconds. So the dumping of STAT lines is dependent on spending (approximately) one minute of dbtime in execution (as both MOS Documents indicate), but it’s once every minute, and it’s time that can be accumulated over many executions of the statement. Part of the deviation from one minute comes, of course, from the fact that the STAT lines are dumped only as the cursor closes.

The other detail that needs investigation is the interpretation of the clause “per shared cursor”. If I ran my test from two separate sessions concurrently would I be accumulating “db minutes” twice as fast and dumping STAT lines twice as frequently? The answer was yes – providing the two sessions acquired the same child cursor in v$sql.

To demonstrate this, I ran two concurrent copies of the PL/SQL loop, starting one session about 65 seconds after the other. My first trace file showed its second set of STAT lines after about 63 seconds then roughly every 33 seconds, and the second trace file (starting about one minute late) showed its second set of STAT lines after about 33 seconds, repeating that interval between dumps until nearly the end of its run when the last gap (after the first session had finished) stretched to 50 seconds.
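
If you want to check that two sessions really are sharing a single child cursor you can run a quick query like this while the test is running (a sketch – substitute the statement’s sql_id):

select  sql_id, child_number, plan_hash_value, executions, users_executing
from    v$sql
where   sql_id = '&m_sql_id'
/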

Is it good enough?

Maybe, maybe not. It gives you a sample of execution plans for queries that accumulate “a lot” of time while using a particular child cursor, and that might be sufficient for some purposes.

If you have a frequently executed query that usually takes a couple of seconds but occasionally takes 90 seconds then that 90 second execution will show up in the trace file, either because it’s a different plan, and gets dumped on the first execution, or because it will have increased the shared execution time by more than 60 seconds.

If you have a query that executes extremely frequently and takes a couple of hundredths of a second each time but occasionally takes 5 seconds, the adaptive option may not help. If the slow execution is due to a change of plan you’ll see the plan (as with the 90 second case); but if the plan doesn’t change all the adaptive option does is improve your chances of seeing the stats of that one slow execution – it has to take place in the little window where its execution time just tips the running total over the next 60 second limit.

Remove Subquery

Wed, 2023-10-18 08:33

This is a note that echoes a feature (dating back at least as far as 10.2.0.4) that I’ve described in a previous post on Simpletalk. I’m raising it here for three reasons:

  • first it says something about optimising SQL by rewriting it
  • secondly it advertises the hugely under-used feature of analytic functions
  • thirdly I’ve not seen anyone in the various forums asking about a particular pattern of surprising execution plans that they couldn’t understand

This last reason makes me wonder whether there are lots of queries in the wild that need a small “cosmetic” change to allow the optimizer to transform them into something completely different.

My example is based on the SH demo schema – the query I’ll be demonstrating came (I think) from a presentation given by Jože Senegačnik about 15 years and 7 major versions ago (10gR2) – so I’ll start with a bit of text to recreate a couple of tables from that schema as it was a few years ago.

rem
rem     Script:         remove_aggr_subq.sql
rem     Author:         Joze Senegacnik / Jonathan Lewis
rem     Dated:          June 2008
rem

create table products(
        prod_id                 number(6,0)     not null,
        prod_name               varchar2(50)    not null,
        prod_desc               varchar2(4000)  not null,
        prod_subcategory        varchar2(50)    not null,
        prod_subcategory_id     number          not null,
        prod_subcategory_desc   varchar2(2000)  not null,
        prod_category           varchar2(50)    not null,
        prod_category_id        number          not null,
        prod_category_desc      varchar2(2000)  not null,
        prod_weight_class       number(3,0)     not null,
        prod_unit_of_measure    varchar2(20),
        prod_pack_size          varchar2(30)    not null,
        supplier_id             number(6,0)     not null,
        prod_status             varchar2(20)    not null,
        prod_list_price         number(8,2)     not null,
        prod_min_price          number(8,2)     not null,
        prod_total              varchar2(13)    not null,
        prod_total_id           number          not null,
        prod_src_id             number,
        prod_eff_from           date,
        prod_eff_to             date,
        prod_valid              varchar2(1),
        constraint products_pk primary key (prod_id)
   )
;

create table sales (
        prod_id         number          not null,
        cust_id         number          not null,
        time_id         date            not null,
        channel_id      number          not null,
        promo_id        number          not null,
        quantity_sold   number(10,2)    not null,
        amount_sold     number(10,2)    not null,
        constraint sales_product_fk foreign key (prod_id)
                references products (prod_id)
   ) 
;

It’s a long time since I loaded, or even looked at, the SH schema but I assume the key details that I need will still be the same. All I’ve got is a products table with a declared primary key of prod_id, and a sales table with a prod_id column declared as not null with a foreign key constraint to the products table. Both tables have not null declarations on most columns.

Imagine writing a query to report all sales where the quantity_sold is less than the average quantity_sold for the corresponding product. The “obvious” choice of SQL for this would be something like:

select
        /*+
                qb_name(main)
                dynamic_sampling(prd 0) 
                dynamic_sampling(sal1 0) 
                no_adaptive_plan
        */
        prd.prod_id, prd.prod_name, 
        sal1.time_id, sal1.quantity_sold
from
        products    prd,
        sales       sal1
where
        sal1.prod_id = prd.prod_id
and     sal1.quantity_sold < (
                        select
                                /*+ 
                                        qb_name(subq)
                                        dynamic_sampling(sal2 0)
                                */
                                avg(sal2.quantity_sold)
                        from    sales   sal2
                        where   sal2.prod_id = sal1.prod_id
                )
;

I’ve used hints to block dynamic sampling and adaptive plans, and I’ve used the qb_name() hint to name the two query blocks. The subquery calculates the average quantity_sold for the correlated prod_id and we’re probably assuming Oracle will execute the subquery for each row in the sales (sal1) table with savings from scalar subquery caching – especially if we were to create a “value-added” foreign key index of (prod_id, quantity_sold). Here’s the execution plan I got from the query in its current form:

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     4 |   404 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN            |          |     4 |   404 |     7  (15)| 00:00:01 |
|*  2 |   HASH JOIN           |          |     4 |   244 |     5  (20)| 00:00:01 |
|   3 |    VIEW               | VW_SQ_1  |    82 |  2132 |     3  (34)| 00:00:01 |
|   4 |     HASH GROUP BY     |          |    82 |  2132 |     3  (34)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| SALES    |    82 |  2132 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL  | SALES    |    82 |  2870 |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS FULL   | PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   2 - access("ITEM_1"="SAL1"."PROD_ID")
       filter("SAL1"."QUANTITY_SOLD"<"AVG(SAL2.QUANTITY_SOLD)")

The optimizer has decided it would make sense to unnest the subquery, generate an aggregate rowsource of the sales data, then eliminate the unwanted sales rows through a hash join to this aggregate rowsource. In passing, take note particularly of the Predicate Information for operation 2 – it’s a reminder that hash joins apply only for equality predicates, so the check against average quantity_sold has to take place as a filter predicate after Oracle has found the correct average by probing the build table.

This unnesting will be appropriate for many cases of subquery usage but we could block it and force Oracle to do something that looked more like our original “for each row” visualisation by adding the hint no_unnest(@subq) to the hints at the top of the query (note how we can address the hint to a specific query block). The effect of this is to produce the following execution plan:

--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     2 |   150 |    86   (0)| 00:00:01 |
|*  1 |  FILTER             |          |       |       |            |          |
|*  2 |   HASH JOIN         |          |    82 |  6150 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| SALES    |    82 |  2870 |     2   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE    |          |     1 |    26 |            |          |
|*  6 |    TABLE ACCESS FULL| SALES    |     1 |    26 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("SAL1"."QUANTITY_SOLD"< (SELECT /*+ NO_UNNEST QB_NAME
              ("SUBQ") */ SUM("SAL2"."QUANTITY_SOLD")/COUNT("SAL2"."QUANTITY_SOLD")
              FROM "SALES" "SAL2" WHERE "SAL2"."PROD_ID"=:B1))
   2 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   6 - filter("SAL2"."PROD_ID"=:B1)

Now we can see that Oracle is using the subquery as a “filter subquery” as we had imagined it. For each row surviving the simple hash join between products and sales Oracle will execute the subquery (unless the relevant information is already in the scalar subquery cache). A little detail that may surprise users who are less familiar with execution plans is the appearance of the bind variable (:B1) in the predicate for operation 6 – this is the optimizer reminding you that the correlating predicate in the subquery uses a value that will be unknown until run-time when it arrives (repeatedly with constantly changing values) from the main query block.
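
If this filter subquery strategy were the one we wanted in production, the “value-added” foreign key index I mentioned earlier is what would keep each execution of the subquery cheap – a sketch (the index name is my own invention):

create index sales_prod_qty_fk on sales(prod_id, quantity_sold);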

Again, we can understand that this pattern will probably be suitable for some circumstances, but we may want to control where in the plan the subquery is used – for some queries it might be more efficient to execute the subquery before we do the join. We can tell the optimizer to do this by adding the hint push_subq(@subq) after the no_unnest(@subq) hint, in my case producing the following plan:

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     4 |   300 |     6   (0)| 00:00:01 |
|*  1 |  HASH JOIN           |          |     4 |   300 |     4   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | SALES    |     4 |   140 |     2   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |          |     1 |    26 |            |          |
|*  4 |     TABLE ACCESS FULL| SALES    |     1 |    26 |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN
   2 - MAIN / SAL1@MAIN
   3 - SUBQ
   4 - SUBQ / SAL2@SUBQ
   5 - MAIN / PRD@MAIN

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")
   2 - filter("SAL1"."QUANTITY_SOLD"< (SELECT /*+ PUSH_SUBQ NO_UNNEST
              QB_NAME ("SUBQ") */ SUM("SAL2"."QUANTITY_SOLD")/COUNT("SAL2"."QUANTITY_SO
              LD") FROM "SALES" "SAL2" WHERE "SAL2"."PROD_ID"=:B1))
   4 - filter("SAL2"."PROD_ID"=:B1)

The first thing that stands out in this plan is that the sales table has become the build (first) table in the hash join and the products table has become the probe table. (In fact this is mainly because (a) there’s no data/statistics, (b) I’ve blocked dynamic sampling, and (c) Oracle has used a few of its standard guesses.)

Another point that stands out is the “staggered” position of operation 3. I’ve included the ‘alias’ format option for the execution plan so that you can see that operations 3 and 4 represent the original subq query block with no transformation. Filter predicates using subqueries often produce odd little “twists” in execution plans which would be hard to explain if you followed the basic “first child first” rule and forgot to check whether there were any separate query blocks that needed to be walked in isolation.

“Cosmetic” effects

I said the query I started with was the “obvious” choice. I didn’t offer any justification for “obvious”, but the query shows an almost perfect translation of the condition “sales quantity less than the average sales quantity for the matching product”, and correlating on the prod_id from the sales row you’re looking at (i.e. sal1.prod_id) seems a highly intuitive choice.

However – prod_id is a foreign key to the products table, and the main query block includes the join predicate “sal1.prod_id = prd.prod_id” so, based on transitive closure, all the not null constraints, and the foreign key constraint, we should be happy to make the following, logically valid, minor edit to the original query (and it’s so minor I’ve highlighted the critical line in case you miss it):

select
        /*+
                qb_name(main)
                dynamic_sampling(prd 0) 
                dynamic_sampling(sal1 0) 
                no_adaptive_plan
        */
        prd.prod_id, prd.prod_name, 
        sal1.time_id, sal1.quantity_sold
from
        products prd, sales sal1
where
        sal1.prod_id = prd.prod_id
and     sal1.quantity_sold < (
                        select
                                /*+ 
                                        qb_name(subq)
                                        dynamic_sampling(sal2 0)
                                */
                                avg(sal2.quantity_sold)
                        from    sales   sal2
                        where   sal2.prod_id = prd.prod_id
                )
;

Here’s the resulting execution plan following this apparently trivial and logically irrelevant change:

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    82 |  6068 |     5  (20)| 00:00:01 |
|*  1 |  VIEW                | VW_WIF_1 |    82 |  6068 |     5  (20)| 00:00:01 |
|   2 |   WINDOW SORT        |          |    82 |  7134 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN         |          |    82 |  7134 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| SALES    |    82 |  3854 |     2   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| PRODUCTS |    82 |  3280 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("VW_COL_5" IS NOT NULL)
   3 - access("SAL1"."PROD_ID"="PRD"."PROD_ID")

The plan has been reduced to a simple two-table join, has lost the correlated subquery, and shows signs of an analytic (window) function being used somehow to handle the comparison with the average. This is a feature enabled by the (very old) parameter _remove_aggr_subquery, which defaults to true.

It’s not immediately visible what the optimizer is doing – the predicate vw_col_5 is not null clearly relates to the internal view vw_wif_1 (Windows Inline Function?), but we have to look at the appropriate “unparsed query” from the CBO/10053 trace to find out why that predicate gives us the right answer. This, with a lot of cosmetic editing, is the transformed query that Oracle finally optimised:

select 
        /*+ qb_name (main) */
        vw_wif_1.item_1         prod_id,
        vw_wif_1.item_2         prod_name,
        vw_wif_1.item_3         time_id,
        vw_wif_1.item_4         quantity_sold 
from    (
        select 
                prd.prod_id             item_1,
                prd.prod_name           item_2,
                sal1.time_id            item_3,
                sal1.quantity_sold      item_4,
                case 
                        when    sal1.quantity_sold < avg(sal1.quantity_sold) over ( partition by sal1.prod_id) 
                        then    sal1.rowid 
                end                     vw_col_5 
        from
                test_user.sales         sal1,
                test_user.products      prd 
        where 
                sal1.prod_id = prd.prod_id
        )       vw_wif_1 
where 
        vw_wif_1.vw_col_5 is not null
;

It’s interesting, and a little surprising, that the code uses a case clause to generate a rowid which is then tested for null. It seems to add complexity that a person would not introduce if you had asked them to do the rewrite by hand, but maybe it comes from a generic framework that addresses more subtle examples.

I said at the start of the note that I’ve not seen anyone asking about this “surprising” pattern of plan; that wasn’t quite true. I have found one question on the Oracle forums dated 2012 using version 10.2.0.4. It’s been possible for a long time – so possibly the reason that questions are so rare is that the correct choice of correlating predicate is rarely made and the “intuitive” choice doesn’t allow the transformation to take place. (It’s also possible that it appears so rarely because so few systems make proper use of constraints.)

Most significantly, though, it’s been possible to write this type of windowing code by hand since version 8.1.6, but it’s only in the last few years that responses to questions about max(subquery), avg(subquery) etc. have been suggestions to rewrite with analytic functions. Maybe some of the answers should have been “change the correlating column”.
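
For comparison, this is roughly the shape of the hand-written analytic rewrite that those answers tend to suggest (a sketch – it’s not the optimizer’s transformation, just the “by hand” equivalent):

select  prod_id, prod_name, time_id, quantity_sold
from    (
        select
                prd.prod_id, prd.prod_name,
                sal1.time_id, sal1.quantity_sold,
                avg(sal1.quantity_sold) over (partition by sal1.prod_id) avg_qty
        from
                products prd, sales sal1
        where
                sal1.prod_id = prd.prod_id
        )
where   quantity_sold < avg_qty
/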

Further Observations

As with all optimisation strategies, it’s possible that Oracle will use this feature when it shouldn’t or fail to use it when it should. Unfortunately there is no hint like /*+ [no_]remove_aggr_subq(@subq) */, though aggregate subquery removal seems to be embedded with subquery unnesting, so the /*+ no_unnest() */ hint might be sufficient to block the feature; I don’t know what you can do, though, to force it to happen if you think it’s legal but isn’t happening – other than rewriting the query by hand, of course.

If the no_unnest() hint doesn’t work you could use the hint /*+ opt_param('_remove_aggr_subquery','false') */ to disable the feature completely for the duration of the query, and you may find that it even works as an SQL Patch.

If you’re interested in CBO trace files, the mnemonic (which doesn’t appear in the legend near the start of the trace) for the feature is “RSW” (maybe “remove subquery windowing”?). There aren’t many explicit strings relating to this mnemonic in the binary, but the following are visible:

RSW: Not valid for subquery removal %s (#%d)
RSW:  Valid for having clause subquery removal %s (#%d)
RSW:  Valid for correlated where clause SQ removal %s (#%d)
RSW:  Valid for uncorrelated where clause SQ removal %s (#%d)

My example falls into the category “Valid for correlated where clause SQ removal”, so there’s scope for someone else to discover what a query, its plan, and its unparsed query look like when a query falls into one of the other two categories.
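
If anyone wants a starting point for the having clause case, something along these lines might be worth testing against the same two tables (this is only my guess at a shape that might qualify – I haven’t checked whether it triggers the transformation):

select
        prd.prod_id, avg(sal1.quantity_sold) avg_qty
from
        products prd, sales sal1
where
        sal1.prod_id = prd.prod_id
group by
        prd.prod_id
having
        avg(sal1.quantity_sold) > (
                select  avg(sal2.quantity_sold)
                from    sales   sal2
        )
/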

Summary

For certain patterns of query involving aggregate subqueries as predicates the optimizer has been able to use a feature known as aggregate subquery removal to effect the intent of the subquery through the use of an analytical (window) function.

This feature has been available since (at least) 10.2.0.4 and you may already have several queries where the optimizer would use it if the correlating predicate was appropriate – in particular if the predicate is currently joining to the child end of a foreign key constraint then you need to test the effect of joining it to the parent end.

no_merge() #JoelKallmanday

Wed, 2023-10-11 03:10

This is a second follow-up to the video Connor McDonald produced a few days ago about the risks of depending on “current tricks” to control the order of predicate evaluation, which went on to show how some requirements for the pattern “check condition B only for rows which have already satisfied condition A” could be handled through a case expression.

The first follow-up note highlighted the problem of a significant change in the handling of CTEs (a.k.a. “with” subqueries); the point I want to discuss in this note is about the step where Connor tried to use a no_merge() hint in a failed attempt to isolate some activity inside an inline view and made the comment that: “no_merge isn’t explicitly designed for inline views of this type”.

Here’s the SQL to create the data, and a version of the query that uses an inline view rather than a CTE:

rem
rem     Script:         ignore_no_merge.sql
rem     Author:         Connor McDonald / Jonathan Lewis
rem     Dated:          Oct 2023
rem
rem
rem     Last tested 
rem             23.2.0.0
rem             19.11.0.0
rem

drop table t purge;

create table t as
select * from dba_objects
/

insert into t ( object_id, owner, object_name)
values (-1,'BAD','BAD')
/

commit;

select
        /*+ no_merge(@inline) */
        count(object_name)
from    (
        select  /*+ qb_name(inline) */
                *
        from    t
        where object_id > 0
        )       my_view
where   sqrt(object_id) between 1 and 10
/

I’ve used a query block name for my inline view, then referenced that name in a no_merge() hint in the main query block to ensure that there’s no possible ambiguity about where the hint should apply. When I run the query in either 19.11 or 23.2 Oracle (often – but not always, thanks to statistical effects) raises the error “ORA-01428: Argument -1 is out of range.” which rather suggests that the no_merge() has been ignored and that the optimizer has managed to transform the query into a single query block combining the predicates rather than forcing the inline query block to eliminate the threat before the main query block evaluates the square root.
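
As an aside, the case expression strategy that Connor demonstrated sidesteps the problem completely because a case expression guarantees the order of evaluation; a sketch of the idea against this table:

select
        count(object_name)
from    t
where   case when object_id > 0 then sqrt(object_id) end between 1 and 10
/

The interesting question here, though, is what has happened to the no_merge() hint.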

If you check the hidden parameters you will find a pair that go a long way back in Oracle’s history (at least 8i for the first of the pair and 10g for the second) relating to view merging:

_complex_view_merging    enable complex view merging 
_simple_view_merging     control simple view merging performed by the optimizer

For a very long time I’ve believed that the no_merge() hint was supposed to block complex view merging but not block simple view merging (and the view merging required for our query is an example of simple view merging). I’m no longer sure why I believed this: maybe it used to be true in the very early days, or maybe I just assumed that complex and simple view merging worked differently; either way I don’t have any scripts that I might have used to test the idea.

In fact it’s quite simple to show that Oracle hasn’t ignored the hint – even though it’s easy to assume that it has because the run-time response doesn’t seem to have changed. But take a look at the execution plan for the query which, pulled from memory with the alias and outline format options, looks like this:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |   193 (100)|          |
|   1 |  SORT AGGREGATE     |      |     1 |    66 |            |          |
|   2 |   VIEW              |      |   184 | 12144 |   193   (5)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T    |   184 |  7360 |   193   (5)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - INLINE / MY_VIEW@SEL$1
   3 - INLINE / T@INLINE

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"INLINE")
      OUTLINE_LEAF(@"SEL$1")
      NO_ACCESS(@"SEL$1" "MY_VIEW"@"SEL$1")
      FULL(@"INLINE" "T"@"INLINE")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter((SQRT("T"."OBJECT_ID")>=1 AND SQRT("T"."OBJECT_ID")<=10
              AND "OBJECT_ID">0))

As you can see, the inline view has suffered some type of transformation effect that results in both the object_id and sqrt() predicates being applied during the tablescan – but the query block named inline still appears in the plan, and we still have an object called my_view.

Compare this with the plan we get if we don’t use the no_merge() hint:

----------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   193 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    40 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   184 |  7360 |   193   (5)| 00:00:01 |
---------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$5843E819
   2 - SEL$5843E819 / T@INLINE

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$5843E819")
      MERGE(@"INLINE" >"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"INLINE")
      FULL(@"SEL$5843E819" "T"@"INLINE")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((SQRT("T"."OBJECT_ID")>=1 AND SQRT("T"."OBJECT_ID")<=10
              AND "OBJECT_ID">0))

In this case we can see the optimizer’s strategy in the merge(@inline>sel$1) hint in the Outline Data: a new query block name has appeared (derived from the (default) sel$1 combined with inline) and my inline view my_view has disappeared from the plan.

So how come the optimizer has managed to combine predicates from two different query blocks without merging those query blocks? The answer is in another hidden parameter: _optimizer_filter_pushdown, which defaults to true. The optimizer has pushed the sqrt() predicate down into the inline view. We can avoid the problem by setting the parameter to false, for example:

select
        /*+ 
                no_merge(@inline) 
                opt_param('_optimizer_filter_pushdown','false')
        */
        count(object_name)
...

select * from table(dbms_xplan.display_cursor(format=>'outline alias'))
/

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |   190 (100)|          |
|   1 |  SORT AGGREGATE     |      |     1 |    79 |            |          |
|*  2 |   VIEW              |      | 73570 |  5675K|   190   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T    | 73570 |  2873K|   190   (4)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - INLINE / MY_VIEW@SEL$1
   3 - INLINE / T@INLINE

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"INLINE")
      OUTLINE_LEAF(@"SEL$1")
      NO_ACCESS(@"SEL$1" "MY_VIEW"@"SEL$1")
      FULL(@"INLINE" "T"@"INLINE")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((SQRT("OBJECT_ID")>=1 AND SQRT("OBJECT_ID")<=10))
   3 - filter("OBJECT_ID">0)


The overall shape of the plan doesn't change, but we now see the sqrt() predicate being applied at operation 2 (after the table data has been safely filtered at operation 3). We can also see that the Rows estimate has changed dramatically: the 73,570 rows extracted from the table is reasonable, but the optimizer seems to have forgotten to make any allowance for the impact of the sqrt() predicates; it should have reported the same 184 rows that we see in the earlier plans (the usual 5% of 5% for "I haven't got a clue what a range predicate using a function will do").

For this example I've used the opt_param() hint to disable filter pushdown; you could, in principle, set the parameter at the session or system level. Whatever you do, the effect does not get captured in the Outline Data, so if you can't rewrite the SQL (and if you could, you might opt for a safer strategy anyway) your best bet might be to create an SQL Patch to inject the hint.
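
A minimal sketch of what that patch creation might look like (the patch name is made up, and you would substitute the sql_id of the real statement):

declare
        patch_name      varchar2(128);
begin
        patch_name := dbms_sqldiag.create_sql_patch(
                sql_id          => '&m_sql_id',         -- sql_id of the problem statement
                name            => 'no_filter_pushdown',
                hint_text       => q'{opt_param('_optimizer_filter_pushdown','false')}'
        );
        dbms_output.put_line('Patch created: ' || patch_name);
end;
/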

Summary

When you use the no_merge() hint to block view merging you may get the impression that it hasn’t worked because the optimizer can still do filter predicate pushdown, so a predicate that you wanted to apply “outside” the view still gets applied “inside” the view. Don’t assume you know what the plan looks like just because the outcome of running the query hasn’t changed – check the actual execution plan and especially the Predicate Information. You may need to set _optimizer_filter_pushdown to false to achieve the “delayed” application of your predicate, possibly by means of an SQL Patch.

deprecation warning

Sat, 2023-10-07 16:07

As the years roll by, Oracle Corp. introduces new procedures (and functions) to replace older ones that were inefficient, gave away too much privilege, or simply used bad naming standards. In relatively recent versions Oracle Corp. has introduced a pragma in the declaration of such procedures/functions that allows their usage to be flagged as deprecated, with a custom warning or error message (for example, suggesting the relevant replacement). Here's a tiny extract from the script dbmssess.sql (located in $ORACLE_HOME/rdbms/admin) that creates the package dbms_session (though if you're on an old version of Oracle it's in dbmsutil.sql).

  function is_role_enabled(rolename varchar2) return boolean;
  pragma deprecate(IS_ROLE_ENABLED, 'Use DBMS_SESSION.CURRENT_IS_ROLE_ENABLED 
                   or DBMS_SESSION.SESSION_IS_ROLE_ENABLED instead.');
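
The same pragma is available for application code, of course. Here's a minimal sketch (with made-up names) of how you might flag one of your own procedures as deprecated:

create or replace package my_api as

        procedure old_proc;
        pragma deprecate(old_proc, 'Use my_api.new_proc instead.');

        procedure new_proc;

end my_api;
/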

And here’s a simple script demonstrating how you can make the effect of that pragma visible:

rem
rem     Script:         deprecation_warning.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2023
rem 

alter session set plsql_warnings = 'enable:all';

create or replace procedure test_deprecate 
authid current_user
as
begin
        if 
                dbms_session.is_role_enabled('test_role')
        then
                dbms_output.put_line('Enabled');
        end if;
end;
/

set linesize 120
show errors

alter session set plsql_warnings = 'disable:all';

You’ll notice that I’ve set plsql_warnings to allow Oracle to display any warnings in response to the show errors command. Here’s the result I got from running this script from SQL*Plus on 23.2 (the only version I have to hand at present).

Session altered.


SP2-0804: Procedure created with compilation warnings

Errors for PROCEDURE TEST_DEPRECATE:

LINE/COL ERROR
-------- ----------------------------------------------------------------------------------------------------
6/3      PLW-06020: reference to a deprecated entity: IS_ROLE_ENABLED declared in unit DBMS_SESSION[117,12].
         Use DBMS_SESSION.CURRENT_IS_ROLE_ENABLED
         or DBMS_SESSION.SESSION_IS_ROLE_ENABLED instead.



Session altered.

SQL*Plus reports that the procedure has been created “with compilation warnings”, so it will be saved and will execute, but could be corrected in some way.

The call to show errors tells us what the warning was, and we can see the name and location of the object that is deprecated and the message that was the 2nd parameter to the pragma.

You’ll notice that the error reported is PLW-06020, so we could change the “alter session” command to read:

alter session set plsql_warnings = 'error:6020';

This would turn “deprecation” into a fatal error, and SQL*Plus would report:

Warning: Procedure created with compilation errors.

As a final (and strategic) option – that you should consider using from time to time on all your PL/SQL code – you could enable just PLW-06020 as a warning that should be raised during compilation, so you know where you are using deprecated procedures or functions but still end up with compiled code.

alter session set plsql_warnings = 'enable:6020';
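
If you prefer to manage warning settings from PL/SQL rather than through alter session, the dbms_warning package should (I believe) achieve the same effect. A minimal sketch:

begin
        dbms_warning.add_warning_setting_num(
                warning_number  => 6020,
                warning_value   => 'ENABLE',
                scope           => 'SESSION'
        );
end;
/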

CTE Upgrade

Thu, 2023-10-05 10:17

The "common table expression" (CTE), also known as the "with clause" or "factored subquery", has been the target of an optimizer upgrade in recent versions of Oracle that may cause a problem for some people – including, possibly, a few of my former clients and readers who may have adopted a suggestion I made for working around particular performance problems.

It was a recent video by Connor McDonald that brought the change to my attention, so I'm going to use parts of his script to demonstrate the effect. We start with a very simple table, and a requirement to avoid an Oracle error:

rem
rem     Script:         with_clause_pushdown.sql
rem     Author:         Connor McDonald / Jonathan Lewis
rem     Dated:          Oct 2023
rem     Purpose:        
rem
rem     Last tested 
rem             23.2.0.0        Pushdown 
rem             21.8.0.0        Pushdown (Connor)
rem             19.21.0.0       Pushdown (Connor)
rem             19.11.0.0       No pushdown
rem

drop table t purge;

create table t as
select * from dba_objects
/

insert into t ( object_id, owner, object_name)
values (-1,'BAD','BAD');
commit;

prompt  =====================================
prompt  The original query crashes (probably)
prompt  =====================================

select  count(object_name)
from    t
where
        object_id > 0
and     sqrt(object_id) between 1 and 10
/

I have a query that will fail if it tries to take the square root for the row where object_id = -1 (Oracle hasn’t implemented complex numbers, or even imaginary ones). But the query should be okay because I’ve got a predicate that filters out all the rows where the object_id is not positive. Try running the query, though, and you’ll (probably) find that Oracle responds with “ORA-01428: argument ‘-1’ is out of range”.

The optimizer decides the order in which it wants to apply the predicates, so you can't guarantee that the order you need will be the order used. This used to be a problem that showed up fairly frequently on various Oracle forums at a time when applications were often guilty of storing numeric data in character columns and ran into problems with queries that had predicates like "character_column = numeric_constant". These would sometimes fail because of an implicit to_number() being applied to a row where the column value was not numeric, resulting in the ORA-01722 conversion error. There used to be several ways to bypass this type of problem, none of them particularly desirable but all perfectly adequate as temporary (one hoped) workarounds. Here's one such workaround:

with pos as (
        select /*+ materialize */ *
        from   t
        where  object_id > 0 
) 
select  count(object_name)
from    pos
where   sqrt(object_id) between 1 and 10
/

We put the "protective" predicate into a "with" subquery and materialize the subquery so that the call to sqrt() is applied to an (internal) global temporary table that holds only the "safe" rows. When Connor demonstrated this method the query crashed with the same ORA-01428 (sqrt(-1)) error that the original query had exhibited. This led Connor to extol the virtues of avoiding dirty tricks to fool the optimizer, because an enhancement might appear that makes the dirty trick fail; he then demonstrated an alternative, elegant restatement of the query that couldn't be called a dirty trick.

As someone not employed by Oracle Corp. my response was irritation that the optimizer was doing something that (clearly) it shouldn't, and to wonder whether it was deliberate or a bug that could be worked around. The first step, of course, was to repeat Connor's test on my default setup of Oracle – which happened to be 19.11 – to find that the optimizer did exactly what I expected and produced the right answer rather than an error. Exchanging email with Connor I learned that he had tested on 19.21 and 21.8 – so something must have changed between 19.11 and 19.21. [Ed: see comment #4 below – reader reports show that the change appeared in 19.21]

I don't have a 21c VM handy on my laptop but I do have the 23cFREE developer VM (not yet upgraded to 23.3), so I started that up, constructed the model, and started work on the (newly) problematic query. Here are the execution plans from both versions of Oracle, 19.11 first, followed by 23.2:

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                             |     1 |    79 |   759   (1)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                             |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665C_11934AEE |       |       |            |          |
|*  3 |    TABLE ACCESS FULL                     | T                           | 73563 |  9482K|   398   (1)| 00:00:01 |
|   4 |   SORT AGGREGATE                         |                             |     1 |    79 |            |          |
|*  5 |    VIEW                                  |                             | 73563 |  5675K|   361   (1)| 00:00:01 |
|   6 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D665C_11934AEE | 73563 |  9482K|   361   (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |     1 |    79 |   238   (5)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6681_1D6D63A |       |       |            |          |
|*  3 |    TABLE ACCESS FULL                     | T                          |   211 |  8229 |   236   (5)| 00:00:01 |
|   4 |   SORT AGGREGATE                         |                            |     1 |    79 |            |          |
|*  5 |    VIEW                                  |                            |   211 | 16669 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6681_1D6D63A |   211 |  8229 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

The shape of the plan doesn't change in the upgrade – but 23c crashed with error ORA-01428 while Oracle 19c produced the correct, expected result. There's a clue about why this happened in the plans, but there's a terrible flaw in the way I reported the execution plans: where's the Predicate Information?

Predicate Information (identified by operation id):    -- 19.11 version
---------------------------------------------------
   3 - filter("OBJECT_ID">0)
   5 - filter(SQRT("OBJECT_ID")>=1 AND SQRT("OBJECT_ID")<=10)

Predicate Information (identified by operation id):    -- 23c version
---------------------------------------------------
   3 - filter(SQRT("T"."OBJECT_ID")>=1 AND SQRT("T"."OBJECT_ID")<=10 AND "OBJECT_ID">0)
   5 - filter(SQRT("OBJECT_ID")>=1 AND SQRT("OBJECT_ID")<=10)

If you compare the filter() predicate information for operation 3 you can see that the optimizer in 23c has pushed the predicate from the main query block into the CTE's query block (and, frankly, I'd call that a design error). The clue in the main body of the plan that this had happened was in the Rows column – 73,563 for 19.11 but only 211 for 23.2: the former looks about right for a copy of dba_objects, the latter is clearly much smaller than you might expect. If you're wondering why the sqrt() predicate is repeated at operation 5, you're not alone – I can't think of a good reason why it's there.

Workaround

Yes, I am going to tell you how to work around a (nominal) optimizer enhancement. This change may slip through unnoticed in testing and only be discovered, too late, in production, so some people may need a short term hack to deal with it. (Setting optimizer_features_enable to 19.1.0 didn’t make the problem go away on my system.)

Recognising that Oracle was invoking some form of filter predicate "pushdown" I searched the lists of parameters, SQL hints, and fix controls for possible contenders. There don't appear to be any (new) hints related to this behaviour, but there is a parameter _optimizer_wc_filter_pushdown in 23.2 that doesn't exist in 19.11, and it defaults to true; the description of this parameter is "enable/disable with clause filter predicate pushdown". Setting this parameter to false (alter session/system, or through an opt_param() hint) does what we need.
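
As an illustration, this is how the hint might be attached to the materialized-CTE workaround shown earlier (I've simply added it to the main query block):

with pos as (
        select /*+ materialize */ *
        from   t
        where  object_id > 0
)
select  /*+ opt_param('_optimizer_wc_filter_pushdown','false') */
        count(object_name)
from    pos
where   sqrt(object_id) between 1 and 10
/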

There are a couple of fix controls that also appear to be related, but I haven’t examined them yet:

30235754  QA control parameter for with clause filter predicate pushdown
29773708  with clause filter predicate pushdown

Again, these are not present in 19.11.

Summary

Somewhere in the 19c and 21c timelines an enhancement to the optimizer allows Oracle to minimise the size of materialized CTEs by pushing filter predicates from the query blocks that use the CTE into the CTE definition. This could result in some existing queries failing (probably with a “conversion error”) because they depended on the CTE applying some “limiting” predicates that stopped illegal values from reaching a later predicate.

If you are caught by this enhancement you can disable the feature by setting the parameter _optimizer_wc_filter_pushdown to false for the session, system, or specific query (you may be able to do the last through an SQL Patch). You ought to seek a strategic fix, however, and Connor's video shows how you may be able to bypass the materialized CTE trick completely by using a CASE expression.
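
I haven't reproduced Connor's SQL here, but a minimal sketch of the kind of case expression rewrite he describes might look like this; the case expression returns null (which is not "between 1 and 10") for the non-positive ids, so sqrt() is never called for them:

select  count(object_name)
from    t
where   case
                when object_id > 0 then sqrt(object_id)
        end between 1 and 10
/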

Case Study

Tue, 2023-09-05 10:14

A recent post on the Oracle SQL and PL/SQL forum presented the following query, with a complaint that it produced the wrong results:

select  count(*) 
from    all_synonyms
left join 
        all_objects b 
on      (b.owner,object_name)=(select table_owner,table_name from dual)
;

This caused a little confusion to start with since the opening complaint was that the query sometimes produced the wrong results and sometimes produced Oracle error ORA-01799: a column may not be outer-joined to a subquery. The difference appears, of course, because the restriction applies to older versions of Oracle but was lifted by 12c.

The other point of confusion is that there's no easy way to tell from the output that the result is wrong – especially when the count is in the thousands. This point was addressed by the OP supplying an image of the first rows (and a subset of the columns) for the query when "count(*)" was changed to just "*" (and the "all_" views were switched to the "dba_" views): the output for columns dba_objects.owner and dba_objects.object_name reported the same pair of values for every row – in the case of the OP this was "SYS" and "DUAL".

Investigation

You could attack this problem from several directions: you could just edit the query to avoid the problem (the structure is a little curious); you could investigate the 10053 trace file to see what the optimizer is doing at every step of the way; or you could try to simplify the model and see if the problem still appears. I created table my_synonyms as a copy of the data in dba_synonyms and my_objects as a copy of the data in dba_objects, and created an index on my_objects(owner, object_name), so my query turned into:

select
        *
from    
        my_synonyms   syn
left join 
        my_objects    obj
on 
        (obj.owner, obj.object_name) = (
                select /*+ qb_name(dual_bug) */ syn.table_owner, syn.table_name from dual
        )
/

You’ll notice that I’ve given both tables a “meaningful” alias and used the aliases for every column, and I’ve also added a query block name (qb_name) to the subquery against dual – because I’d assumed that the subquery probably played a key role in messing the optimizer up (I was wrong) and I wanted to be able to track it easily.

The query produced the wrong results. In my case every row reported the object owner and name as "SYS" / "PRINT_TABLE" – selecting 11,615 rows. The basic execution plan, pulled from memory using dbms_xplan.display_cursor(), looked like this:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       | 46548 (100)|          |
|   1 |  MERGE JOIN OUTER                      |                 | 11615 |  6261K| 46548   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL                    | MY_SYNONYMS     | 11615 |   805K|    18   (6)| 00:00:01 |
|   3 |   BUFFER SORT                          |                 |     1 |   481 | 46530   (1)| 00:00:02 |
|   4 |    VIEW                                | VW_LAT_881C048D |     1 |   481 |     4   (0)| 00:00:01 |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| MY_OBJECTS      |     1 |   132 |     4   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | OBJ_I1          |     1 |       |     3   (0)| 00:00:01 |
|   7 |       FAST DUAL                        |                 |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJ"."OWNER"= AND "OBJ"."OBJECT_NAME"=)

The optimizer had transformed the query into a form using a lateral view – and there have been bugs with lateral views in the past, particularly relating to decorrelation – so maybe the problem was there, and a few tests with various hints, fix_controls, or optimizer parameter settings to disable certain features might identify its source. Before doing that, though, I thought I'd just get a small result set so that I could check the results and execution plan in detail: I added a predicate "rownum <= 12", re-executed with rowsource execution stats enabled, and pulled the plan including the Query Block Names and Object Aliases, Outline Data, Column Projection Information, and not forgetting the kitchen sink.

The query produced results that looked as if they were likely to be correct – I no longer had a repeating, incorrect, object owner and name on every row. I also had a plan that had changed to:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |      1 |        |     12 |00:00:00.01 |      24 |
|*  1 |  COUNT STOPKEY                         |                 |      1 |        |     12 |00:00:00.01 |      24 |
|   2 |   NESTED LOOPS OUTER                   |                 |      1 |     12 |     12 |00:00:00.01 |      24 |
|   3 |    TABLE ACCESS FULL                   | MY_SYNONYMS     |      1 |     12 |      7 |00:00:00.01 |       3 |
|   4 |    VIEW                                | VW_LAT_881C048D |      7 |      1 |     12 |00:00:00.01 |      21 |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| MY_OBJECTS      |      7 |      1 |     12 |00:00:00.01 |      21 |
|*  6 |      INDEX RANGE SCAN                  | OBJ_I1          |      7 |      1 |     12 |00:00:00.01 |      17 |
|   7 |       FAST DUAL                        |                 |      7 |      1 |      7 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------------------------

There’s a “count stopkey” operation to handle the “rownum <= 12” predicate, of course; but most significantly the “merge join outer” operation has changed to a “nested loop outer” operation.

Repeating the test but with "rownum <= 1200" I still got results that looked correct, and still got a "nested loop outer". A third attempt, with "rownum <= 30000", switched back to wrong results and a "merge join outer" (with a "count stopkey"). So – initial conclusion – it's the implementation of the merge join that has gone wrong.

Resolution

If the only problem is the choice of join mechanism we can patch the code (by hand or with an SQL Patch) to bypass the problem. Given the simple pattern that produces the problem I should be able to find two critical hints in the Outline Data: a leading() hint that dictates the order (my_synonyms, vw_lat_xxxxxxxx) – the user will presumably have a different name generated for their lateral view – and a use_merge() or use_nl() hint that dictates the mechanism used to join vw_lat_xxxxxxxx to my_synonyms.

In both plans the Outline Data showed me the following leading() hint (cosmetically adjusted):

leading(@sel$d9e17d64 syn@sel$1 vw_lat_6b6b5ecb@sel$6b6b5ecb)

This was then followed by a use_nl() hint in the correct case:

use_nl(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)

and by a use_merge_cartesian() hint in the erroneous case (so it's possibly just the "cartesian" bit that's going wrong):

use_merge_cartesian(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)

So, as a final test, I edited the query to include the leading() and use_nl() hints, and ran it.

select  /*+
                leading(@sel$d9e17d64 syn@sel$1 vw_lat_6b6b5ecb@sel$6b6b5ecb)
                use_nl(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)
        */
        *
from    
        my_synonyms syn
left join 
        my_objects obj
on 
        (obj.owner, obj.object_name) = (
                select /*+ qb_name(dual_bug) */ syn.table_owner, syn.table_name from dual
        )
/

The problem of the repetitious "SYS"/"PRINT_TABLE" disappeared, and the total number of rows selected increased from 11,615 to 12,369 (mostly due to packages and package bodies having the same owner and name, and so doubling up a number of rows from the dba_synonyms table).
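
If the SQL itself can't be edited (third-party code, say) the same two hints could, in principle, be packaged as an SQL Patch. A minimal sketch, re-using the query block and view names from my test (yours would differ), with a made-up patch name and a placeholder for the sql_id:

declare
        patch_name      varchar2(128);
begin
        patch_name := dbms_sqldiag.create_sql_patch(
                sql_id          => '&m_sql_id',         -- sql_id of the problem query
                name            => 'lateral_use_nl_patch',
                hint_text       => 'leading(@sel$d9e17d64 syn@sel$1 vw_lat_6b6b5ecb@sel$6b6b5ecb) ' ||
                                   'use_nl(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)'
        );
        dbms_output.put_line('Patch created: ' || patch_name);
end;
/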

Summary

It’s almost always a good idea to simplify a messy problem.

Always check the execution plan, including Predicate Information, Outline Data and Query Block / Object Alias details.

Although I didn’t use the results in this case, executing with rowsource execution statistics enabled can be very helpful.

This case focused on a detail that looked as if a “newish” feature might be guilty; but the simple model suggested it’s an implementation detail of an old feature (possibly reached because the optimizer has applied a new feature). We can work around it very easily, and we can build a very simple test case to raise with Oracle in an SR.

force_match patch

Sat, 2023-08-19 08:56

A recent post on one of the forums (needs a MOS account) asked if it was possible to change a batch of SQL Plan Profiles from “exact match” to “force match” without going through the hassle of dropping and recreating them. This took me back a couple of years to an occasion when I was asked if it was possible to make an SQL Patch “force match” – the answer was “yes” and the method is pretty simple (once you’ve done the research). There are two methods – the dangerous one and the “safe” one, but both are hacks and there’s no reason for Oracle Corp. to support you if either goes wrong.

Background

When you create an SQL Patch Oracle calls the same PL/SQL code that it uses to create SQL Plan Profiles, so the method applies to both features.

  • Patches and profiles are referenced through an SQL_Handle, but the handle is derived from a signature, and the signature comes from v$sql, which holds two relevant columns: exact_matching_signature and force_matching_signature.
  • All the components of the profile or patch are keyed through the signature – and if a profile is set to force_match Oracle uses the force_matching_signature as the signature, otherwise it uses the exact_matching_signature. Patches, of course, only ever use the exact_matching_signature.
  • The SQL_handle consists of a text prefix attached to the "rawtohex()" representation of the signature, viz:
sql_handle = 'SQL_' || to_char(signature,'fm0xxxxxxxxxxxxxxx')
  • Finally, for SQL Plan Profiles there is a flag on the profile “header” record that tells Oracle whether it is exact matching or force matching.

So, to convert an SQL Plan Profile or an SQL Patch from exact matching to force matching all you have to do is find the two signatures for the statement, then update the data dictionary to change the SQL_Handle and flag on the header, and change the signature on all the connected tables!

If you’re not keen on updating the data dictionary directly there’s a “safer” option.

  • create the appropriate staging table (see dbms_sqldiag (patches) or dbms_spm (profiles))
  • pack (export to) the staging table
  • update the staging table with relevant signature, handle and flag values.
  • unpack (import from) the staging table (the “replace” option doesn’t seem to work correctly for SQL Patches so you’ll have to delete the existing SQL Patches)
Demonstration (sql_patch)
rem
rem     Script:         patch_hack_force_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct  2021 / Nov 2022
rem

create table t1 as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4    -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(n1, n2);

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format=>'-hint_report'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------

SQL_ID  ayxnhrqzd38g3, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15

Plan hash value: 2432955788

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=15 AND "N2"=15)


I’ve started by creating an indexed table and executing a query that clearly ought to use the index. I’ve flushed the shared_pool before running the query and reported the actual plan with the “-hint_report” format so that I don’t get a complaint about my “find” and “this” hints.

column exact_signature new_value m_exact_signature
column force_signature new_value m_force_signature

column exact_handle new_value m_exact_handle
column force_handle new_value m_force_handle

column sql_id new_value m_sql_id

select
        sql_id, child_number,
        to_char(exact_matching_signature,'999999999999999999999') exact_signature,
        'SQL_' || to_char(exact_matching_signature,'fm0xxxxxxxxxxxxxxx') exact_handle,
        to_char(force_matching_signature,'999999999999999999999') force_signature,
        'SQL_' || to_char(force_matching_signature,'fm0xxxxxxxxxxxxxxx') force_handle
from    v$sql
where   sql_text like 'selec%find this%'
and     sql_text not like '%v$sql%'
;

SQL_ID        CHILD_NUMBER EXACT_SIGNATURE        EXACT_HANDLE          FORCE_SIGNATURE        FORCE_HANDLE
------------- ------------ ---------------------- --------------------- ---------------------- ---------------------
ayxnhrqzd38g3            0   14028721741576327483 SQL_c2b00818bb74d13b     2882966267652493885 SQL_28025a8bbadc523d


After checking the plan (I have plenty of "prompt" comments and "pause" lines in the full text) the next step is to pick out the sql_id and the two signatures from v$sql, reporting the signatures both in their original numeric form and after "conversion" to SQL Handle form. After that I create a patch, and the best patch I can make to demonstrate the point is one that forces the optimizer into a full tablescan. You'll notice that I've started by dropping it with a "when others then null" exception – that's just a lazy bit of code to deal with the pre-existence of the patch if I've crashed the program on a previous execution.

set serveroutput on

begin
        begin
                dbms_sqldiag.drop_sql_patch('patch_hack_force');
        exception
                when others then null;
        end;

        dbms_output.put_Line(
                'Patch result is: ' ||
                dbms_sqldiag.create_sql_patch(
                        sql_id  => '&m_sql_id',
                        name    => 'patch_hack_force',
                        hint_text => 'full(t1@sel$1)',
                        validate => true
                )
        );

end;
/

Patch result is: patch_hack_force

PL/SQL procedure successfully completed.

So now we need to see what Oracle has stored as the patch, and whether it works.

column signature format 999999999999999999999

select  force_matching, signature, sql_text 
from    dba_sql_patches
where   name = 'patch_hack_force'
;

FOR              SIGNATURE SQL_TEXT
--- ---------------------- --------------------------------------------------------------------------------
NO    14028721741576327483 select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15

1 row selected.

So we have an SQL Patch in the data dictionary; it’s not “force matching”, and its signature does match the value we reported above as the “exact signature”. Now we can test the effects of the patch – starting with a flush of the shared_pool so that we don’t pick up the previous execution plan by accident.

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

We re-execute the original query to see if the plan changes and the SQL Patch is reported, then execute a "similar enough" query to see if the patch has any effect on it:

  COUNT(*)
----------
         1

1 row selected.

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
SQL_ID  ayxnhrqzd38g3, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    25 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("N1"=15 AND "N2"=15))

Note
-----
   - SQL patch "patch_hack_force" used for this statement


23 rows selected.


  COUNT(*)
----------
         1

1 row selected.


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  gqjb8pp35cnyp, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16

Plan hash value: 2432955788

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N1"=16 AND "N2"=16)

As you can see the plan for the original query is now using a full tablescan and the SQL Patch is reported as used; but only for an exact matching query. The extremely similar query where the 15’s have been changed to 16’s still uses an index range scan and (trust me) didn’t have any note mentioning the SQL Patch.

Time for hacking
execute dbms_sqldiag.create_stgtab_sqlpatch(table_name         => 'my_sql_patches')
execute dbms_sqldiag.pack_stgtab_sqlpatch  (staging_table_name => 'my_sql_patches', patch_name =>'patch_hack_force')

select signature, sql_handle, sqlflags from my_sql_patches;

             SIGNATURE SQL_HANDLE                       SQLFLAGS
---------------------- ------------------------------ ----------
  14028721741576327483 SQL_c2b00818bb74d13b                    0

update my_sql_patches set
        signature = &m_force_signature,
        sql_handle = '&m_force_handle',
        sqlflags = 1
where
        signature = &m_exact_signature
and     sql_handle = '&m_exact_handle'
;

1 row updated.

You'll notice that I've been cautious with my update to the staging table – I've updated it only if the signature and sql_handle both match the values I extracted with my query against v$sql above.

So now we need to import the contents of the staging table and overwrite the original SQL Patch. Unfortunately for patches it seems as if you have to drop the original – using the "replace" option without first dropping it resulted in error ORA-13841:

ORA-13841: SQL profile named patch_hack_force already exists for a different signature/category pair

You might note particularly that this refers to an SQL profile, not an SQL patch. Just one of several indications that the SQL Profile and SQL Patch code have a significant overlap.

begin
        dbms_sqldiag.drop_sql_patch('patch_hack_force');
exception
        when others then null;
end;
/

execute dbms_sqldiag.unpack_stgtab_sqlpatch  (staging_table_name => 'my_sql_patches', patch_name =>'patch_hack_force', replace=>true)

select  force_matching, signature, sql_text 
from    dba_sql_patches
where   name = 'patch_hack_force'
;

FOR              SIGNATURE SQL_TEXT
--- ---------------------- --------------------------------------------------------------------------------
YES    2882966267652493885 select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15

After the unpack you can see that the patch is now force matching, and its signature matches the force_matching_signature from the original query against v$sql above. The only thing left to do is check whether the "similar enough" query now uses the SQL Patch:

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

  COUNT(*)
----------
         1

1 row selected.


PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
SQL_ID  gqjb8pp35cnyp, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    25 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("N1"=16 AND "N2"=16))

Note
-----
   - SQL patch "patch_hack_force" used for this statement

Footnotes

It's worth reminding everyone that, until 19.12, force-matching doesn't take place for statements that have a mixture of bind variables and literals (even if the bind variables are only in the select list).

You need various privileges to run through the example I’ve shown. To make sure everything worked as expected I dropped and created a specific user to run the tests each time. Here’s the script I used to do this:

drop user patch_hacker cascade;

create user patch_hacker identified by patch_hacker;

grant create session to patch_hacker;
grant create table to patch_hacker;

alter user patch_hacker default tablespace test_8k_assm;
alter user patch_hacker quota unlimited on test_8k_assm;

grant execute on dbms_sqldiag to patch_hacker;
grant administer sql management object to patch_hacker;

grant select on dba_sql_patches to patch_hacker;

grant execute on dbms_xplan to patch_hacker;
grant select on v_$sql to patch_hacker;
grant select on v_$session to patch_hacker;
grant select on v_$sql_plan to patch_hacker;
grant select on v_$sql_plan_statistics_all to patch_hacker;

grant alter session to patch_hacker;
grant alter system to patch_hacker;

The privileges to select on the various v$ (v_$) views are necessary to take full advantage of dbms_xplan.display_cursor(). The alter system/session privileges were for testing only (enabling trace, flushing the shared pool). And, of course, you'll need to pick your own tablespace for setting a quota and making it the default.

The only thing that worries me about using this strategy is the privilege on “administer sql management object” – is this something that requires the diagnostic and performance packs?

For convenience, the following drop-down includes the complete SQL script, with pauses, comments, prompts, etc. (and a call to setenv.sql which simply sets up a load of column and format definitions, purges the recyclebin, and does general tidying).

rem
rem     Script:         patch_hack_force_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2021 / Nov 2022
rem     Purpose:        
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem     Not tested
rem             19.3.0.0
rem             18.3.0.0
rem             12.2.0.1
rem
rem     Notes:
rem     How to hack an SQL Patch to be "force matching"
rem
rem     See also:
rem     https://jonathanlewis.wordpress.com/2023/08/19/force_match-patch/
rem     https://community.oracle.com/mosc/discussion/4547670/change-existing-sql-profile-s-to-force-match-true-without-recreating
rem

drop table t1;
drop table my_sql_patches;

begin
        dbms_sqldiag.drop_sql_patch('patch_hack_force');
exception
        when others then null;
--        when others then raise;
end;
/

@@setup

create table t1
as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4    -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(n1, n2);

spool patch_hack_force_2a.lst

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format=>'-hint_report'));

-- pause Press return for sql_id etc.

column exact_signature new_value m_exact_signature
column force_signature new_value m_force_signature

column exact_handle new_value m_exact_handle
column force_handle new_value m_force_handle

column sql_id new_value m_sql_id

select
        sql_id, child_number,
        to_char(exact_matching_signature,'999999999999999999999') exact_signature,
        'SQL_' || to_char(exact_matching_signature,'fm0xxxxxxxxxxxxxxx') exact_handle,
        to_char(force_matching_signature,'999999999999999999999') force_signature,
        'SQL_' || to_char(force_matching_signature,'fm0xxxxxxxxxxxxxxx') force_handle
from    v$sql
where   sql_text like 'selec%find this%'
and     sql_text not like '%v$sql%'
;

-- pause press return

set serveroutput on

begin
        begin
                dbms_sqldiag.drop_sql_patch('patch_hack_force');
        exception
                when others then null;
        end;

        dbms_output.put_Line(
                'Patch result is: ' ||
                dbms_sqldiag.create_sql_patch(
                        sql_id  => '&m_sql_id',
                        name    => 'patch_hack_force',
                        hint_text => 'full(t1@sel$1)',
                        validate => true
                )
        );

end;
/

column signature format 999999999999999999999

select  force_matching, signature, sql_text 
from    dba_sql_patches
where   name = 'patch_hack_force'
;

pause Did we create a patch ?

alter system flush shared_pool;
set serveroutput off

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

prompt  ==============================
prompt  Patch NOT used for 16/16 query
prompt  ==============================

pause press return

prompt  About to create patch table
execute dbms_sqldiag.create_stgtab_sqlpatch(table_name         => 'my_sql_patches')

prompt  About to pack patch table
execute dbms_sqldiag.pack_stgtab_sqlpatch  (staging_table_name => 'my_sql_patches', patch_name =>'patch_hack_force')


prompt  ===============================================
prompt  check and update the patch in the staging table
prompt  ===============================================

select signature, sql_handle, sqlflags from my_sql_patches;

-- set verify on

update my_sql_patches set
        signature = &m_force_signature,
        sql_handle = '&m_force_handle',
        sqlflags = 1
where
        signature = &m_exact_signature
and     sql_handle = '&m_exact_handle'
;

commit;

prompt  =================================================
prompt  Dropping and replacing patch that has been hacked
prompt  =================================================

begin
        dbms_sqldiag.drop_sql_patch('patch_hack_force');
exception
        when others then null;
end;
/

execute dbms_sqldiag.unpack_stgtab_sqlpatch  (staging_table_name => 'my_sql_patches', patch_name =>'patch_hack_force', replace=>true)

select  force_matching, signature, sql_text 
from    dba_sql_patches
where   name = 'patch_hack_force'
;

pause   Is the patch now force matching?

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor(format => '-hint_report'));

prompt  ==============================
prompt  Patch now used for 16/16 query
prompt  ==============================

drop table my_sql_patches purge;

begin
        dbms_sqldiag.drop_sql_patch('patch_hack_force');
exception
        when others then null;
end;
/


spool off

set echo off
set autotrace off
set serveroutput on

alter session set events '10046 trace name context off';
alter session set statistics_level = typical;
alter session set tracefile_identifier = '';

begin
        dbms_application_info.set_module(
                module_name     => null,
                action_name     => null
        );

        dbms_application_info.set_client_info(
                client_info     => null
        );
end;
/


set doc off
doc


#

xmltype

Tue, 2023-08-08 02:13

This is a follow-up to a comment on an earlier post about LOBs and sizing. The comment raised a question about finding the stored size of xmltype data (which is essentially a LOB enhanced to include “methods”) particularly when it’s declared with compression.

This is one of those questions where you do a little searching on the Internet and MOS to see if you can get any clues, then start doing a little guessing and checking, hoping to get an answer which is at least reasonably close most of the time. So here’s a starting point for addressing the question – a script to create some data:

rem
rem     Script: xml_size_2.sql
rem     Author:  Jonathan Lewis
rem     Dated:  June 2010 / July 2023
rem

create table t1(
        id              number,
        clob_col        xmltype,
        bin_col         xmltype,
        binc_col        xmltype
) 
xmltype clob_col store as clob
xmltype bin_col  store as binary xml
xmltype binc_col store as binary xml(compress)
;

insert /*+ append */ into t1
select 
        rownum, 
        sys.xmltype(other_xml, null, null, null),
        sys.xmltype(other_xml, null, null, null),
        sys.xmltype(other_xml, null, null, null) 
from
        v$sql_plan 
where
        other_xml is not null
;

commit;

All I've done is create a table with an id column and three possible ways of storing an xmltype: first as a CLOB, then as a (new-ish) binary XML, and finally (in column binc_col) as a binary XML with a storage clause specifying compression.

Once I’ve defined the table, I need to create or find some XML to store in it and an easy place to find some is in the other_xml column of v$sql_plan. If you’ve never looked at the other_xml column from v$sql_plan, here’s a sample row:

select 
        other_xml 
from 
        v$sql_plan 
where 
        other_xml is not null 
and     dbms_lob.getlength(other_xml) < 400 
and     rownum = 1
/

OTHER_XML
--------------------------------------------------------------------------------
<other_xml><info type="db_version">19.0.0.0</info><info type="parse_schema"><![C
DATA["SYS"]]></info><info type="plan_hash_full">273779573</info><info type="plan
_hash_2">0</info><stats type="compilation"><stat name="bg">0</stat></stats><qb_r
egistry><q o="2" f="y"><n><![CDATA[INS$1]]></n><f><h><t><![CDATA[TABPART$]]></t>
<s><![CDATA[INS$1]]></s></h></f></q></qb_registry></other_xml>

In this case it’s just a couple of headline details about the optimizer environment and the query block registry. When I populated my table this value ended up in the row with id = 1383, and I’ll be using the row a few more times in this note.

Now that we’ve got some stored data we can try a few obvious queries that might tell us something about the size of what we’ve stored. First a call to dbms_lob.getlength(), trying to extract the XMLtype as a CLOB:

select 
        dbms_lob.getlength(t.clob_col.getclobval()) c1,
        dbms_lob.getlength(t.bin_col.getclobval())  c2,
        dbms_lob.getlength(t.binc_col.getclobval()) c3
from    t1 t 
where   id = 1383
;

        C1         C2         C3
---------- ---------- ----------
       382        484        484


Two details stand out:

  • The binary XML columns are reported with the same length whether or not they’ve been declared with compression – Oracle is decompressing and converting to a CLOB before calculating the length.
  • The binary XML version of the data seems to be longer than the CLOB version; but that's a difference between stored content and displayed form, as we can easily show.

Take a look at the actual other_xml value I reported further up the page – it’s just a continuous stream of characters with no convenient formatting, then take a look at what I get if I report the clob_col and bin_col using the getclobval() function (without passing the result to getlength()):

select
        t.clob_col.getclobval() c1,
        t.bin_col.getclobval()  c2
from    t1 t
where   id = 1383
;

C1                                                                               C2
-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
<other_xml><info type="db_version">19.0.0.0</info><info type="parse_schema"><![C <other_xml>
DATA["SYS"]]></info><info type="plan_hash_full">273779573</info><info type="plan   <info type="db_version">19.0.0.0</info>
_hash_2">0</info><stats type="compilation"><stat name="bg">0</stat></stats><qb_r   <info type="parse_schema"><![CDATA["SYS"]]></info>
egistry><q o="2" f="y"><n><![CDATA[INS$1]]></n><f><h><t><![CDATA[TABPART$]]></t>   <info type="plan_hash_full">273779573</info>
<s><![CDATA[INS$1]]></s></h></f></q></qb_registry></other_xml>                     <info type="plan_hash_2">0</info>
                                                                                   <stats type="compilation">
                                                                                     <stat name="bg">0</stat>
                                                                                   </stats>
                                                                                   <qb_registry>
                                                                                     <q o="2" f="y">
                                                                                       <n><![CDATA[INS$1]]></n>
                                                                                       <f>
                                                                                         <h>
                                                                                           <t><![CDATA[TABPART$]]></t>
                                                                                           <s><![CDATA[INS$1]]></s>
                                                                                         </h>
                                                                                       </f>
                                                                                     </q>
                                                                                   </qb_registry>
                                                                                 </other_xml>

The output format for the CLOB version of the XML looks exactly as it did when I reported the other_xml from v$sql_plan, but the output format of the BINARY XML version has been formatted intelligently (and prettily) with lots of spaces used to make the start and end of balancing tags easy to see. Those spaces aren't stored in the column; they're injected in real time by the internal code for handling binary XML types – and that bit of code is also used to supply the input to getclobval(), so the length we're seeing with our call to dbms_lob.getlength() is the length of that space-padded XML, not the length of the stored XML.
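
If you want to measure the length without that injected whitespace you could (I think) serialize the XML explicitly and request no indentation. A sketch, whose output I haven't reproduced here:

select
        dbms_lob.getlength(
                xmlserialize(document t.bin_col as clob no indent)
        )       bin_no_indent
from    t1 t
where   id = 1383
/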

In passing, there have been several examples published of code to produce a readable version of v$sql_plan.other_xml. In the dim and distant past it took some messy and slightly convoluted strategies, but a fairly recent option is simply to tell Oracle to treat it as a binary XML:

select  xmltype.createxml(other_xml) 
from    v$sql_plan
where   other_xml is not null
and     dbms_lob.getlength(other_xml) < 400 
and     rownum = 1
;

XMLTYPE.CREATEXML(OTHER_XML)
--------------------------------------------------------------------------------
<other_xml>
  <info type="db_version">19.0.0.0</info>
  <info type="parse_schema"><![CDATA["SYS"]]></info>
  <info type="plan_hash_full">1417780948</info>
  <info type="plan_hash_2">0</info>
  <stats type="compilation">
    <stat name="bg">0</stat>
  </stats>
  <qb_registry>
    <q o="2" f="y">
      <n><![CDATA[INS$1]]></n>
      <f>
        <h>
          <t><![CDATA[OBJERROR$]]></t>
          <s><![CDATA[INS$1]]></s>
        </h>
      </f>
    </q>
  </qb_registry>
</other_xml>

Now that we’ve been redirected to think about the “unadorned” XML data, let’s see what happens if we try to extract and measure just the XML data in each column:

select
        dbms_lob.getlength(t.clob_col.xmldata)  x1,
        dbms_lob.getlength(t.bin_col.xmldata)   x2,
        dbms_lob.getlength(t.binc_col.xmldata)  x3
from    t1 t 
where   id = 1383
;

        X1         X2         X3
---------- ---------- ----------
       382        208        208

The reported length for the XMLtype that was stored as a CLOB column hasn’t changed, and that’s not surprising since the CLOB value is the XML and the column type tells Oracle how to treat the content. (There are a couple of gotchas about that number anyway but we’ll come to that later.)

On the other hand we can see that the “binary XML” version of the data is much smaller than the character version (though, again, the compressed value seems to have been decompressed before reporting). Here’s what the content of the binary XML data looks like when we report the value rather than the length:

SQL> select t.bin_col.xmldata from t1 t  where id = 1383;

XMLDATA
--------------------------------------------------------------------------------
9F01639E000000C83EA4C82BBAC0090EAF64625F76657273696F6E0731392E302E302E30D9D7D5C0
0B0EAF70617273655F736368656D61A6052253595322D9D5C00D0EAF706C616E5F686173685F6675
6C6C08323733373739353733D9D5C00A0EAF706C616E5F686173685F320030D9D8C83B88C00A0EAF
636F6D70696C6174696F6EC8715BC00162EE62670030D9D9C857E5C87240C000409E32C0005FAC79
C877D4A605494E532431D9C82425C800A6C8708DA6085441425041525424D9C83BEDA605494E5324
31D9D9D9D9D9D9A0

If you count the characters you’ll find there are 416 – but t.bin_col.xmldata is a BLOB so the value has gone through a rawtohex() conversion for display purposes, which means every byte of the BLOB is represented by a two character hexadecimal value.

So it looks as if the standard LOB and XML handling functions are not going to be much help in telling us about the actual stored size of the data, especially if we're interested in the compressed version. What about the vsize() or sys_op_opnsize() functions that Oracle used to use (and still uses, depending on version and estimate_percent) in the SQL executed by calls to dbms_stats to calculate the avg_col_len in user_tab_columns?

select
        vsize(clob_col) v1,
        vsize(bin_col)  v2,
        vsize(binc_col) v3
from    t1 t 
where   id = 1383
;

        V1         V2         V3
---------- ---------- ----------
       901        344        353

select
        sys_op_opnsize(clob_col) o1,
        sys_op_opnsize(bin_col)  o2,
        sys_op_opnsize(binc_col) o3
from    t1 t 
where   id = 1383
;
        O1         O2         O3
---------- ---------- ----------
       901        344        353

This looks a little strange – vsize() and sys_op_opnsize() produce the same results as each other, but they claim that the CLOB column is much larger than we've seen previously, and they also say that the compressed binary XML is longer than the non-compressed column!

The CLOB anomaly is easy to explain – Oracle has rewritten the original other_xml value using a two-byte fixed-width character set (which turns 382 bytes into 764 bytes), and then there's the typical CLOB overhead to worry about; both vsize() and sys_op_opnsize() handle that overhead badly, so an extra 140 bytes shouldn't cause panic, just irritation. The reported sizes of the binary XML columns are also significantly larger than the getlength(xmldata) query suggested – but (as a ballpark figure) the difference is the same 140 bytes we saw with the CLOB column.

The oddity of an XML value getting longer after compression might be explained by the simple fact that sometimes that just happens with compression algorithms – and our careful choice of a very short initial value might have increased the chance of that happening – so we ought to check a few other larger values before we worry too much.

select
        id,
        vsize(clob_col) v1,
        vsize(bin_col)  v2,
        vsize(binc_col) v3
from    t1 t
where   id in (14, 31, 67, 116)
;

        ID         V1         V2         V3
---------- ---------- ---------- ----------
        14       3807       1223        792
        31       3795       1217        791
        67       3495       1110        729
       116       3403       1096        732

The sample suggests that we had, indeed, picked a fairly special value – which means there may be “edge cases” that we have to worry about if we want a complete picture of how to estimate the storage requirements of binary XML.

Compression

Whatever other problems we have with finding “the size” of XML types, it looks as if we don’t have a way of finding any details about the compressed content other than doing block dumps (and following LOB Locators if necessary) unless, perhaps, we can work out how Oracle is compressing the XML and emulate that compression.

How might we (or Oracle) compress a byte stream? The first simple guess comes from remembering that Oracle gave us the utl_compress package, offering Lempel-Ziv (LZ) compression, a very long time ago.

describe utl_compress (edited)

FUNCTION LZ_COMPRESS RETURNS RAW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SRC                            RAW                     IN
 QUALITY                        BINARY_INTEGER          IN     DEFAULT

FUNCTION LZ_COMPRESS RETURNS BLOB
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SRC                            BLOB                    IN
 QUALITY                        BINARY_INTEGER          IN     DEFAULT

FUNCTION LZ_COMPRESS RETURNS BLOB
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SRC                            BINARY FILE LOB         IN
 QUALITY                        BINARY_INTEGER          IN     DEFAULT

The lz_compress() function in this package is conveniently overloaded, so we don't even have to worry about picking exactly the right function for our data, but we may have to test some variations of the quality parameter, which exists to allow us to trade speed against level of compression – fastest is 1, slowest but most compressed is 9, and the default is 6.

Here’s a starting point for out investigations – extract the XML data and see how well it compresses.

select
        id,
        dbms_rowid.rowid_relative_fno(rowid)   rel_file_no,
        dbms_rowid.rowid_block_number(rowid)   block_no,
        dbms_lob.getlength(t.clob_col.xmldata) clob_len,
        dbms_lob.getlength(t.bin_col.xmldata)  bin_len,
        dbms_lob.getlength(utl_compress.lz_compress(t.bin_col.xmldata,1))   comp_len1,
        dbms_lob.getlength(utl_compress.lz_compress(t.bin_col.xmldata,6))   comp_len6,
        dbms_lob.getlength(utl_compress.lz_compress(t.bin_col.xmldata,9))   comp_len9
from
        t1 t
order by
        id
/

I’ve included two calls to the dbms_rowid package to give the block address of each row, and three calls to the utl_compress.lz_compress function passing their return values to a call to get the length of the resulting BLOB. Here’s a small (carefully chosen) subset of the results:

        ID REL_FILE_NO   BLOCK_NO   CLOB_LEN    BIN_LEN  COMP_LEN1  COMP_LEN6  COMP_LEN9
---------- ----------- ---------- ---------- ---------- ---------- ---------- ----------
...
        33          36       1050       2152       1411        640        626        626
        34          36       1050       2355       1578        693        670        670
        35          36       1050       2687       1467        670        636        636
...
        64          36       1062      10799       6317       1883       1608       1607
        65          36       1062        892        536        438        429        429
...
        76          36       1070       1049        609        479        476        476
        77          36       1070        972        553        445        441        441
...
        84          36       1073       2349       1385        726        693        692
        85          36       1073       2382       1397        734        702        701
        86          36       1073        838        482        411        407        407
...
       157          36       1109       1055        628        464        460        459
       158          36       1109       1130        682        490        483        483
...
       467          36       2308      14795       9337       2752       2252       2240
       468          36       2308      14796       9338       2752       2252       2241


You’ll notice that I’ve picked blocks that have more than one row in them. After picking the rows I’ve listed above I then dumped the corresponding blocks (alter system dump datafile X block Y;) and extracted just those lines from the resulting trace files that report column lengths (grep “col [0-3]” tracefilename). I’ll start with block 36,1109 (highlighted) as that’s an easy one to talk about. I’ll comment beside each line using the less well known Oracle “double dash” comment prefix:

grep "col  [0-3]" {tracefile name}

col  0: [ 3]  c2 02 3a          -- decimal 157
col  1: [2141]  -- getlength() shows 1055.  2141 = 2 * 1055 + 31
col  2: [659]   -- getlength() shows 628.   659 - 628 = 31
col  3: [493]   -- lzcompress(,1) = 464.    493 - 464 = 29

col  0: [ 3]  c2 02 3b        -- decimal 158
col  1: [2291]  -- getlength() shows 1130.  2291 = 2 * 1130 + 31
col  2: [713]   -- getlength() shows 682.   713 - 682 = 31
col  3: [519]   -- lzcompress(,1) = 490.    519 - 490 = 29

There’s a degree of consistency in the results that suggests it will be worth spending time looking more closely at utl_compress.lzcompress(binc_col,1). So let’s look at the entire dump of id = 157:

tl: 3309 fb: --H-FL-- lb: 0x0  cc: 4
col  0: [ 3]  c2 02 3a
col  1: [2141]
 00 70 00 01 02 0c 80 80 00 02 00 00 00 01 00 00(0c 77 42 12)08 49 48 90 08
 43 01 00{08 3e}01 00 3c 00 6f 00 74 00 68 00 65 00 72 00 5f 00 78 00 6d 00
 6c 00 3e 00 3c 00 69 00 6e 00 66 00 6f 00 20 00 74 00 79 00 70 00 65 00 3d
...
 69 00 6e 00 65 00 5f 00 64 00 61 00 74 00 61 00 3e 00 3c 00 2f 00 6f 00 74
 00 68 00 65 00 72 00 5f 00 78 00 6d 00 6c 00 3e
col  2: [659]
 00 70 00 01 01 0c 00 80 00 01 00 00 00 01 00 00(0c 77 42 13)02 7f 48 90 02
 79 01 00{02 74}01 9f 01 63 9e 00 00 00 c8 3e a4 c8 2b ba c0 09 0e af 64 62
 5f 76 65 72 73 69 6f 6e 07 31 39 2e 30 2e 30 2e 30 d9 d7 d5 c0 0b 0e af 70
...
 54 45 53 49 41 4e 28 40 22 53 45 4c 24 31 22 20 22 54 53 22 40 22 53 45 4c
 24 31 22 29 d9 d8 d9 d9 a0
col  3: [493]
 00 70 00 01 01 0c 00 80 00 01 00 00 00 01 00 00(0c 77 42 14)01 d9 40 d2 80
 00 05{01 cb}21 81 05{01 c4}11 00 02 74 01{01 c4}78 01[55 90 cd 6e]d3 40 14
 85 1d a4 0a 52 44 57 88 0d 9b 68 1a e4 40 a5 c8 36 04 c9 aa 54 65 5c 4f d3
...
 5f 36 1f f2 48 8e b5 a9 d3 29 e7 96 31 c4 97 1d 85 bb cf 08 f4 88 d7 21 b0
 8d 3d 4e 18 c5 4e 39 59 76 cb[8b 7f f9 0b]2d d7 db b3

I’ve put brackets around a 4 byte group in each column that I can recognize as a component of the LOB ID (generated from the sequence idgen1$) – the positioning of these 4 bytes shows some stability in the way that the columns are stored as LOBs.

I’ve also put curly brackets {} around 2-byte groups that look about right as the lengths of the LOBs. For the clob_col 0x083e = 2,110 (decimal), which is the byte-size of the XML data. For the bin_col 0x0274 = 628 (decimal) which is the size of the XML data, and its position matches the position of the byte-size reported by the CLOB column. For the compressed binc_col we can find the value 0x01cb = 459 (decimal) in almost the same place, and it’s almost the value the value we’re hoping for, but we can also see nearby two occurrences of the value 0x01c4 = 452 (decimal) which are sufficiently close in position and value to looks as if they are something to do with the size of the XML data – so we may have some trouble if we try to use prior information about basic CLOBs and BLOBs to work out what’s going on with this compressed XML data.

Feeling a little pessimistic, I'm still going to try comparing the raw dump of the column from the trace file with the complete result from utl_compress.lz_compress(t.bin_col.xmldata,1). I've shown about 120 bytes from the column dump above, so here's a similar sized chunk from the output of lz_compress(), which I've edited to put a space between each pair of characters, report 25 bytes per line, and introduce a break and step-down in two of the lines:

SQL> l
  1  select
  2  	utl_compress.lz_compress(t.bin_col.xmldata,1) comp
  3  from	t1 t
  4* where	id = 157
SQL> /

1F 8B 08 00 00 00 00 00 04 03
                              55 90 CD 6E D3 40 14 85 1D A4 0A 52 44 57 88
0D 9B 68 1A E4 40 A5 C8 36 04 C9 AA 54 65 5C 4F D3 41 89 5D 79 26 FC 4A 5C
39 A9 4B A2 3A 89 1B B7 20 DE 80 0D AF D0 B2 40 5E 21 75 CF CA CB 79 00 A2
...
D3 29 E7 96 31 C4 97 1D 85 BB CF 08 F4 88 D7 21 B0 8D 3D 4E 18 C5 4E 39 59
76 CB 8B 7F F9 0B
                  D1 E8 8A 8E 74 02 00 00

If you look at the start and end bytes of the section I’ve isolated and highlighted (55 90 cd 6e) .. (8b 7f f9 0b) and go back up to the dump file lines 19 and 23 (highlighted) you’ll see those groups of 4 bytes surrounded by square brackets, and see that the bytes between them agree with the bytes in the lz_compress() output.

The length of the lz_compress() output was 464 bytes, and we can now see 18 bytes of "wrapper" (the 10-byte header and 8-byte trailer set apart in the layout above), leaving 446 bytes of compressed data. Converting to hexadecimal, 446 = 0x01be – and it would be nice if we could tie that up with the 0x01c4 that we spotted earlier, except it seems to be 6 bytes too short. But take another look at line 19 (highlighted) of the trace file dump – between the 2nd occurrence of "01 c4" and the 4-byte starting point of the lz_compress() data we have two bytes, then on line 23 (highlighted) we have 4 more bytes immediately after the 4-byte ending point of the lz_compress() data. It seems reasonable to assume that the value 0x01c4 includes those 6 bytes.
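
If you want to check the arithmetic, a trivial query like the following (nothing here is specific to the test data, it's just the decimal/hex conversion) confirms the numbers:

select
        464 - 18                          compressed_bytes,
        to_char(464 - 18,     'fmXXXX')   hex_value,        -- 1BE
        to_char(464 - 18 + 6, 'fmXXXX')   hex_plus_6        -- 1C4, the value embedded in the column dump
from
        dual
;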

Summary (so far)

Headline: Ignoring edge cases, Oracle appears to be using utl_compress.lz_compress(,1) on the binary XMLtype when storing it compressed, and this could give us a good indication of how to estimate the space we would use (or could save) by compressing binary XML types.

There are, however, a number of messy details that might be difficult to unravel. It looks as if the 18 (wrapper) bytes in the compressed data are removed before the resulting BLOB is stored, and 6 (wrapper) bytes are added. For "small" compressed XMLtypes the BLOB overheads then come to a further 41 bytes – which includes 2 bytes giving the length of the BLOB content. No doubt there are numerous (small and large) variations on this outline affected by the actual size of the compressed XML.
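
Pulling those observations together, here's a sketch of a query you could use to get a rough estimate of the stored size of a compressed binary XMLType – the 18, 6 and 41 byte adjustments come only from the handful of samples above, so treat the result as a ballpark figure, not a guarantee:

select
        id,
        dbms_lob.getlength(t.bin_col.xmldata)           xml_bytes,
        dbms_lob.getlength(
                utl_compress.lz_compress(t.bin_col.xmldata, 1)
        ) - 18 + 6 + 41                                 estimated_stored_bytes
from
        t1 t
order by
        id
;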

Upgrades – 23c

In the earliest versions of Oracle that included LOB types, block dumps included a symbolic dump of any LOB columns. The feature disappeared in 12c and still hadn't reappeared in 19c, which is why I've been messing around picking bytes out of streams of raw data.

From time to time I have lobbied (pun intended) for the symbolic dump to be restored. After getting this far with my draft note it occurred to me to re-run my test on a 23c instance – and the dump is back! So I'm going to continue the investigation in a different version of Oracle, and write the results up in a follow-up blog note.

Parallel DML

Thu, 2023-07-13 11:35

Here’s a note that was prompted by a report about a delete that started up lots of PX processes but apparently did a serial delete even though parallel DML had been forced (alter session force parallel DML parallel 10). The description of “serial” was subsequently refined to explain that “only one of the 48 spawned sessions was doing all the work / IO etc.”, so this was not a case of parallel DML being deliberately disabled by Oracle, leaving the QC to do the work.

Unfortunately the OP didn’t supply an execution plan (despite 3 people suggesting that this would be helpful), so I’ve dug out an old script that might explain the observed behaviour. It’s based on a simple copy of data from the view all_objects and tries to delete a large fraction of that data using a simple parallel delete statement.

rem
rem     Script:         para_dml_lock.sql
rem     Dated:          June 2010
rem     Author:         Jonathan Lewis
rem

create table t1 as select * from all_objects;
-- create index t1_i1 on t1(owner, object_id, object_name);
-- create index t1_i2 on t1(object_name, owner, object_id);

alter system flush buffer_cache;

spool para_dml_lock

set linesize 180
set pagesize 60
set trimspool on
set serveroutput off

alter session force parallel dml;
alter session set statistics_level = all;
-- alter session set events '10046 trace name context forever, level 8';

delete
         /*+ 
                parallel(t1,2)
                gather_plan_statistics 
        */
from    t1 
where   owner = 'SYS'
;

select * from table(dbms_xplan.display_cursor(format=>'parallel allstats'));

break on  qcsid skip 1 on server_group on server_set on server# on sid skip 1

select
        ses.qcsid,
        ses.server_group,
        ses.server_set,
        ses.server#,
        lck.sid,
        lck.type,
        lck.id1, lck.id2,
        lck.lmode, lck.request,
        lck.ctime, lck.block
from
        V$lock          lck,
        v$px_session    ses
where
        ses.sid = lck.sid
-- and     lck.sid in (select sid from v$session where username = 'TEST_USER')
order by
        case when ses.qcsid = ses.sid then 0 else 1 end,
        ses.qcsid,ses.server_group, ses.server_set, ses.server#,
        lck.sid, lck.type desc , lck.id1, lck.id2
;

spool off

set serveroutput on
alter session set events '10046 trace name context off';
alter session set statistics_level = typical;

You’ll see that I’ve allowed for code to create a couple of indexes as that may affect the way that Oracle behaves; I’ve also enabled and reported rowsource execution statistics so that I can get an idea of the volume of data moving through each process. The query against v$lock and v$px_session is the reason I wrote the script 13 years ago – there are some unexpected locking details when parallel DML is involved, but I won’t be looking at those in this note.

The Tablescan path

Here’s the execution plan from 19.11.0.0 when there are no indexes on the table. (The table has about 64,000 rows spread over nearly 1,300 blocks):

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |          |      1 |        |        |      |            |      4 |00:00:00.49 |      13 |      3 |
|   1 |  PX COORDINATOR       |          |      1 |        |        |      |            |      4 |00:00:00.49 |      13 |      3 |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |      0 |   2465 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |
|   3 |    DELETE             | T1       |      2 |        |  Q1,00 | PCWP |            |      0 |00:00:00.94 |   54061 |   1288 |
|   4 |     PX BLOCK ITERATOR |          |      2 |   2465 |  Q1,00 | PCWC |            |  44733 |00:00:00.14 |    1345 |   1267 |
|*  5 |      TABLE ACCESS FULL| T1       |     26 |   2465 |  Q1,00 | PCWP |            |  44733 |00:00:00.05 |    1345 |   1267 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(:Z>=:Z AND :Z<=:Z)
       filter("OWNER"='SYS')

Note
-----
   - Degree of Parallelism is 2 because of hint

There’s nothing terribly interesting in this plan. We can see that the delete (operation 3) took place in parallel because it appears as a child to the PX send QC (operation 2) which means the PX processes did the deletion before sending the query coordinator messages to tell it what they’d done. We can also infer (before reading the Note) that the degree of parallelism was 2 because the delete operation reports 2 in the Starts column. The 26 starts for the Table Access Full (operation 5) is also something we might expect (for tablescans and index fast full scans) as Oracle has a basic target of 13 rowid ranges per process.

One important “anomaly” is that the plan reports a final A-Time of 0.49 seconds while the delete at operation 3 reports 0.94 seconds. Time collected by enabling rowsource execution statistics should always be treated a little cautiously, but in this case the explanation of the contradiction is simple: 0.94 is the sum of the elapsed times seen by the two PX processes, 0.49 is the elapsed time from the viewpoint of the query coordinator.

Index Effects

Here’s the plan if we repeat the experiment but include the first of the two index creation statements:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT                    |          |      1 |        |        |      |            |      2 |00:00:00.66 |      10 |      3 |       |       |          |
|   1 |  PX COORDINATOR                     |          |      1 |        |        |      |            |      2 |00:00:00.66 |      10 |      3 |   337K|   337K|     1/0/0|
|   2 |   PX SEND QC (RANDOM)               | :TQ10002 |      0 |   2465 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    INDEX MAINTENANCE                | T1       |      2 |        |  Q1,02 | PCWP |            |      2 |00:00:00.31 |    2290 |      9 |       |       |          |
|   4 |     PX RECEIVE                      |          |      2 |   2465 |  Q1,02 | PCWP |            |  44735 |00:00:00.15 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE                  | :TQ10001 |      0 |   2465 |  Q1,01 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       DELETE                        | T1       |      2 |        |  Q1,01 | PCWP |            |      0 |00:00:01.14 |   53228 |   1068 |       |       |          |
|   7 |        PX RECEIVE                   |          |      2 |   2465 |  Q1,01 | PCWP |            |  44733 |00:00:00.18 |       0 |      0 |       |       |          |
|   8 |         PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |   2465 |  Q1,00 | S->P | HASH (BLOCK|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   9 |          PX SELECTOR                |          |      2 |        |  Q1,00 | SCWC |            |  44733 |00:00:00.04 |     380 |    380 |       |       |          |
|* 10 |           INDEX RANGE SCAN          | T1_I1    |      1 |   2465 |  Q1,00 | SCWP |            |  44733 |00:00:00.02 |     380 |    380 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  10 - access("OWNER"='SYS')

Note
-----
   - Degree of Parallelism is 2 because of hint

The first thing to notice, perhaps, is that the driving operation for the delete is an index range scan rather than a full tablescan. The range scan is called by a PX SELECTOR operation, which means only one PX process is used to do it – so the range scan is, indeed, serialized; in the past (prior to 12c, I think) we might have seen serialization like this handled by the query coordinator. Having walked the index in order, the single PX process uses a hash distribution (PX SEND HASH (BLOCK ADDRESS)) of the block ids that it has derived from the rowids it identified in the index – and it's at this point that the work switches from serial to parallel (IN-OUT showing S->P).

The purpose of the hash distribution by block id is to ensure that the PX processes that do the actual deletion don’t end up colliding on table blocks and introducing buffer busy waits and problems with ITL waits. Because I flushed the buffer cache I can say that the 1,068 blocks read by the delete operation correspond to the number of table blocks that hold rows where owner = ‘SYS’ and if I enable the extended SQL trace (10046, level 8) I can see that the blocks read by the two relevant PX processes are made up of two disjoint sets.

This pattern of behaviour may, of course, explain why the OP thought the delete was running serially and only one PX process was working: with 48 processes live (which means DOP 24 for this index-driven plan) and only one PX process doing a range scan and spreading table blocks across 24 processes, a relatively small delete might give the impression that the one process called by the PX Selector was the only one doing anything.

Moving on up the plan we can see that we now do a range-based distribution to the “next” set of PX processes that handle the index maintenance. Range-based so that the two processes don’t collide on index leaf blocks (except, perhaps, for one leaf block where the two ranges meet – that’s a detail to be investigated). Once the index has been updated the PX processes send a completion message to the QC.

It’s interesting to note that when we delete 44,733 rows from the table it takes 53,228 buffer gets, but when we maintain (delete) the same number of index entries it takes only 2,290 buffer gets. This suggests that the table deletes might be operating as single row deletes while the index deletes are array (block) based.

There are always more things to check, of course, when investigating a mechanism – taking a snapshot of the session stats is a necessity if, for example, you want to find out what the buffer gets were for. I didn’t do that, but one other thing I did check was the contents of the view v$pq_tqstat so that I could address the question of whether the PX SEND HASH (BLOCK ADDRESS) operation actually spread the work well or whether all the rowid information went to a single PX process in the next set, making the plan effectively operate like a serial plan.
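
For reference, the basic query is something like the following (a sketch – v$pq_tqstat has to be queried from the same session, immediately after the parallel statement completes, and the two percentage columns in the output below are derived values, e.g. from ratio_to_report(), rather than columns of the view):

select
        dfo_number, tq_id, server_type, instance, process,
        num_rows, bytes, waits, timeouts, avg_latency
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, process
;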

DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P000                 44733    2877765    100.00     100.00          0          0           0
                                             1 P001                     0         48      0.00       0.00          0          0           0
                      Consumer               1 P002                 23394    1499012     52.30      52.09         29         26           0
                                             1 P003                 21339    1378801     47.70      47.91         30         27           0

                    1 Ranger                 1 QC                       2     180625    100.00     100.00          5          2           0
                      Producer               1 P002                 24565    1968590     52.30      52.14          2          1           0
                                             1 P003                 22407    1806894     47.70      47.86          2          1           0
                      Consumer               1 P000                 22353    1749518     49.97      48.67         34         27           0
                                             1 P001                 22382    1845341     50.03      51.33         42         35           0

                    2 Producer               1 P000                     1        154     50.00      50.00         13          4           0
                                             1 P001                     1        154     50.00      50.00         11          4           0
                      Consumer               1 QC                       2        308    100.00     100.00         63         19           0


Reading the table queue IDs (TQ_ID) from top to bottom we can see that P000 was obviously the selected process that ran the index range scan, and it split the data a little unevenly between P002 and P003. I don’t understand how P002 and P003 can report more rows forwarded to P000 and P001 for index maintenance, but it seems a little suspicious that the excess is extremely close to 5%, and then the excess disappears when you check the rows consumed by P000 and P001. Check the number of rows and bytes consumed by the QC in the guise of a Ranger, though: 180,625 is a lot of bytes for 2 rows, so maybe that’s actually 2 packets (one per PX process), and maybe it’s the QC that has consumed the 5% excess as its ranging sample.

More Indexes

Naturally there are all sorts of variations on this delete that we could investigate – what happens with partitioned tables, what if they have local indexes, what about global and globally partitioned indexes? Do IOTs and bitmap indexes have any interesting effects? What about domain indexes of different types? I’m not going to push the envelope in any exciting direction, but it’s worth checking what happens in the very common case where there are multiple indexes – how much changes when I re-run the test but create the second index as well.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT                    |          |      1 |        |        |      |            |      2 |00:00:01.10 |      10 |      3 |       |       |          |
|   1 |  PX COORDINATOR                     |          |      1 |        |        |      |            |      2 |00:00:01.10 |      10 |      3 |   337K|   337K|     2/0/0|
|   2 |   PX SEND QC (RANDOM)               | :TQ10002 |      0 |   2465 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    INDEX MAINTENANCE                | T1       |      2 |        |  Q1,02 | PCWP |            |      2 |00:00:00.97 |    4836 |    482 |       |       |          |
|   4 |     PX RECEIVE                      |          |      2 |   2465 |  Q1,02 | PCWP |            |  89468 |00:00:00.30 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE                  | :TQ10001 |      0 |   2465 |  Q1,01 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       DELETE                        | T1       |      2 |        |  Q1,01 | PCWP |            |      0 |00:00:01.57 |   53207 |   1068 |       |       |          |
|   7 |        PX RECEIVE                   |          |      2 |   2465 |  Q1,01 | PCWP |            |  44733 |00:00:00.18 |       0 |      0 |       |       |          |
|   8 |         PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |   2465 |  Q1,00 | S->P | HASH (BLOCK|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   9 |          PX SELECTOR                |          |      2 |        |  Q1,00 | SCWC |            |  44733 |00:00:00.04 |     380 |    380 |       |       |          |
|* 10 |           INDEX RANGE SCAN          | T1_I1    |      1 |   2465 |  Q1,00 | SCWP |            |  44733 |00:00:00.02 |     380 |    380 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The only significant differences between this plan and the previous plan appear at the operations relating to index maintenance. We see the A-Rows value for the PX RECEIVE at operation 4 has doubled (there are now 2 sets of index entries to handle rather than 1), and the number of buffer visits and disk reads have gone up from 2,290 and 9 respectively to 4,836 and 482 respectively as the 2nd index has to be updated (remember the first index would have been well cached at this point because it was used to identify the rows to be deleted).

Summary

There are many reasons why Oracle will refuse to use parallel DML, even when hinted or “forced”. You can, however, be fooled into thinking that it’s not using parallel DML when it is, but the pattern of the data and work distribution has introduced (possibly temporarily) an extreme skew in the diagnostic data you’re looking at. It’s best to model what you’re doing at a small scale so that you can investigate the options that you are likely to see before you run into an unusual case in production.

Footnote

I said that I’d written this little script some time ago to check on the locking that took place for operations like a parallel update etc. The results may be a little surprising if you’ve not thought about them before. What locks can you hold on a table if someone else is holding an exclusive lock? You might think there was a clue in the word “exclusive”, but perhaps not really. Here’s the state of the locks I was holding for the indexed case after I had done the delete but before the commit (the tablescan case is very similar but reports only 2 PX sessions):

     QCSID SERVER_GROUP SERVER_SET    SERVER#        SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- ------------ ---------- ---------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
        30                                            30 TX     589843      21080          6          0          1          0
                                                         TM     181671          0          6          0          1          0
                                                         PS          1          0          4          0          1          0
                                                         PS          1          1          4          0          1          0
                                                         PS          1          2          4          0          1          0
                                                         PS          1          3          4          0          1          0
                                                         AE        134    3792595          4          0          7          0

                      1          1          1        192 TX     458756      18683          6          0          1          0
                                                         TM     181671          0          1          0          1          0
                                                         TM     181671          1          4          0          1          0
                                                         PS          1          0          4          0          1          0

                                            2         17 TX     262167      18430          6          0          1          0
                                                         TM     181671          0          1          0          1          0
                                                         TM     181671          1          4          0          1          0
                                                         PS          1          1          4          0          1          0

                                 2          1         41 TX     196620      20448          6          0          1          0
                                                         TM     181671          0          1          0          1          0
                                                         TM     181671          1          4          0          1          0
                                                         PS          1          2          4          0          1          0

                                            2        207 TX     655383      20651          6          0          1          0
                                                         TM     181671          0          1          0          1          0
                                                         TM     181671          1          4          0          1          0
                                                         PS          1          3          4          0          1          0

The AE lock is the Editioning lock and isn’t currently of interest.

The QC and the 4 PX sessions all hold separate TX (transaction) locks in exclusive mode. The QC also holds 4 PS locks, which represent the locks on the PX processes (the id1 / id2 values correspond to the inst_id / server_name columns in gv$px_process). These PS locks are held in share mode and are also held by the PX sessions – one each – identifying which process they correspond to. Finally, the QC is holding an exclusive lock on the table (TM / 181671, mode 6).

Despite the “exclusiveness” of the QC’s TM lock, the PX sessions are also holding locks on that table – in fact two locks each, one in mode 1 the other in mode 4. However you will note that the mode 1 lock is held on a TM lock with the (usual) id2 value of 0; while the mode 4 lock is held on a TM lock with the (unusual) id2 value of 1.

The main point of this footnote – which was once planned as the complete article based on the script – is simply to point out that when you start playing around with features that are new to you, it’s a good idea to investigate some of the basic v$ views to find out what unusual variations you are likely to see if you put the feature into production.

Parallel DML – not

Wed, 2023-07-12 08:21

A fairly common question about parallel DML is: “Why isn’t it happening?” A fairly common (correct) answer to this question is: “Because you haven’t enabled it.” Unlike parallel query and parallel DDL, which are enabled by default, parallel DML could cause strange deadlocking side-effects, so you have to enable it explicitly before you use it (and disable it afterwards) with statements like:

alter session enable parallel dml;
alter session force parallel dml;
alter session force parallel dml parallel 3;
alter session disable parallel dml;

There are, however, many reasons why Oracle will disallow parallel DML (e.g. this example from 2013) and some of them are listed in the documentation (Link is for 19c). Bear in mind that some things that are listed are no longer true following updates, or are only true in particular circumstances; and some things that block parallel DML are not listed, or are described in such a way that they don’t appear to be listed.

To keep myself up to date I search the executable from time to time for “revealing phrases”, starting with a Linux execution of “strings -a oracle”. For parallel DML I piped the results through “grep -n -i PDML”. I won’t show you the entire (307 line) output from the call to grep, but I will pick out a set of consecutive lines which look very promising:

1827860:PDML disabled in session
1827861:no intrapart PDML and instead-of trigger on view
1827862:no intrapart PDML and only one partition
1827863:no intrapart PDML and two partitions or greater
1827865:PDML disabled for MERGE using conventional insert
1827866:no PDML for domain index during delete/merge/update 
1827867:no PDML on global temp table
1827868:default PDML not allowed
1827869:no PDML for domain indexes that is not parallel ready 
--      always PIDL on cursor temp table  
1827871:no PDML. IPD disabled on table with bitmap index
1827872:PDML disabled due to MTI with conventional load
1827873:PDML enabled in session
--      non-partitioned IOT
--      table locks disabled
1827876:DML violated a PDML restriction
1827877:kkfdtpdml
1827878:ctxflg:%x ctxxyfl:%x pdml %sallowed
1827879:object reference. PDML disallowed. kctflg:0x%x
1827880:arrays. PDML disallowed. kctflg:0x%x
1827881:nested tables. PDML disallowed. kctflg:0x%x
1827882:adts. PDML disallowed. kctflg:0x%x
1827883:delete with retry on row change hint. PDML disallowed
1827884:PDML disallowed for insert-values
1827885:PDML disallowed: Memoptimize for Write insert
--      MTI uniq constr violation                                   [ed: MTI = Multi-table insert ??]
1827887:PDML disallowed: conv insert into temp tab
1827888:PDML disallowed:conv insert into single frag && !IPD
1827889:ignore/change dupkey hint on INSERT. PDML disallowed
1827890:kkfdpdml_restrictions_10
1827891:PDML disallowed: MTI and HWM brokering not allowed
1827892:Complex ADT: PDML disallowed.
1827893:arrays. PDML disallowed. kctflg: 0x%x
--      arrays. PIDL disallowed. Not stored as securefile.
1827895:nested tables. PDML disallowed. kctflg: 0x%x
1827896:typed subquery. PDML disallowed. kctflg: 0x%x
1827897:object references. PDML disallowed. kctflg: 0x%x
1827898:kkfdpdml_restrictions: PDML allowed
1827899:PDML disallowed: Transaction duration Private Temp Table
1827900:Table with ADT columns and VPD Policy: PDML disallowed
1827901:update with retry on row change hint. PDML disallowed
1827902:serializable update. PDML disallowed
1827903: not in parallel long to lob. PDML disallowed
1827904:kkfdpdml_restrictions
--      Basicfile LOB column
1827906:remote table. PDML disallowed
1827907:fro is a view. PDML disallowed
1827908:PDML disallowed: query block can produce at most one row

You’ll notice that there are a number of occurrences of “PDML disallowed” in this list, which makes it look as if it may be the definitive list of notes that appear either in the 10053 trace file, or the Notes section of the execution plan, or in the PX trace. There were, however, a few gaps as far as “consecutive” lines were concerned, so I went back to the strings output to fill them – these are the lines marked with “--”.

I’m not going to expand on the content or relevance of the list, it’s just something to leave in place as a quick clue to check if you ever find that your DML isn’t running parallel when you expect it to – do your circumstances match any of the conditions above?
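
Since these strings can turn up in the Note section of an execution plan, a quick sanity check along the following lines (a sketch only, re-using the t1 table from the previous note) may tell you immediately which restriction you’ve hit:

alter session enable parallel dml;

explain plan for
delete  /*+ parallel(t1 4) */
from    t1
where   owner = 'SYS'
;

select * from table(dbms_xplan.display(format => 'basic +note'));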

Case Study

Mon, 2023-06-26 05:32

Here’s an example of reading an execution plan that appeared on one of the Oracle forums recently. It’s a simple example, made harder by the initial (cosmetic) presentation of the problem. Essentially the owner has said:

  • Here’s a query that’s taking too long.
  • Here’s a bit of the query that seems to be the slow bit
  • Here’s the live execution plan with rowsource execution stats enabled.

The query hadn’t arrived in an easily readable format – possibly because the OP didn’t know how to get a fixed font code display in the forum – and to make things harder the select list included several scalar subqueries and the text used the “commas at the start of line” convention. (Moreover it wasn’t immediately obvious that the first few lines of SQL shown were the first few lines of the problematic scalar subquery and not the start of the entire query.)

The OP had, however, supplied a link to a fixed format text of the execution plan, including the Query Block / Alias information, the Outline information (which showed they were running 19c with optimizer features set to 11.2.0.4), the Predicate information, and the Column Projection information.

I’ve extracted the key details from this output showing just the operations that claim to be responsible for most of the time of the execution. As you might guess, this is the sub-plan for the in-line scalar subquery that the OP had identified as the problem part; in a follow-up posting the OP commented that with this subquery in place the query took one hour but with the subquery removed it “completed faster”.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name                          | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  25 |  SORT UNIQUE                                  |                               |   8149 |      1 |    46 |   145   (2)| 00:00:01 |   1414 |00:19:44.48 |      91M|    176K|  2048 |  2048 |     2/0/0|
|  26 |   WINDOW SORT                                 |                               |   8149 |      1 |    46 |   145   (2)| 00:00:01 |  37811 |00:19:44.47 |      91M|    176K|   214K|   214K|     2/0/0|
|  27 |    NESTED LOOPS                               |                               |   8149 |        |       |            |          |  37811 |00:19:58.14 |      91M|    176K|       |       |          |
|  28 |     NESTED LOOPS                              |                               |   8149 |      1 |    46 |   143   (0)| 00:00:01 |     17M|00:14:44.21 |      73M|  90450 |       |       |          |
|  29 |      TABLE ACCESS BY INDEX ROWID              | OE_ORDER_LINES_ALL            |   8149 |      1 |    31 |   139   (0)| 00:00:01 |     17M|00:10:17.69 |      19M|   8604 |       |       |          |
|* 30 |       INDEX RANGE SCAN                        | XXOM_OE_ORDER_LINES_N100      |   8149 |    142 |       |     4   (0)| 00:00:01 |     17M|00:00:20.96 |     100K|   6570 |       |       |          |
|* 31 |      INDEX RANGE SCAN                         | XXOM_3LP_SYM_ORA_ORDER_N11    |     17M|      1 |       |     3   (0)| 00:00:01 |     17M|00:03:45.52 |      54M|  81846 |       |       |          |
|* 32 |     TABLE ACCESS BY INDEX ROWID               | XXOM_3LP_SYM_ORA_ORDER_LINES  |     17M|      1 |    15 |     4   (0)| 00:00:01 |  37811 |00:05:59.72 |      19M|  86126 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  30 - access("OL"."INVENTORY_ITEM_ID"=:B1 AND "OL"."SHIP_FROM_ORG_ID"=:B2)
  31 - access("ML"."LINE_ID"="OL"."LINE_ID" AND "ML"."ORG_ID"="OL"."ORG_ID")
  32 - filter("MODEL_STRING" IS NOT NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
  25 - (#keys=1) FIRST_VALUE("MODEL_STRING") OVER ( PARTITION BY "OL"."INVENTORY_ITEM_ID" ORDER BY "OL"."CREATION_DATE" RANGE  BETWEEN  UNBOUNDED  PRECEDING  AND  CURRENT ROW )[240]

  26 - (#keys=3) "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "MODEL_STRING"[VARCHAR2,240], "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "ML"."LINE_ID"[NUMBER,22],
       "ML"."ORG_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22], "ML".ROWID[ROWID,10], FIRST_VALUE("MODEL_STRING") OVER ( PARTITION BY "OL"."INVENTORY_ITEM_ID" ORDER BY
       "OL"."CREATION_DATE" RANGE  BETWEEN  UNBOUNDED  PRECEDING  AND  CURRENT ROW )[240]

  27 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22],
       "ML".ROWID[ROWID,10], "ML"."ORG_ID"[NUMBER,22], "MODEL_STRING"[VARCHAR2,240], "ML"."LINE_ID"[NUMBER,22]
  28 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22],
       "ML".ROWID[ROWID,10], "ML"."LINE_ID"[NUMBER,22], "ML"."ORG_ID"[NUMBER,22]
  29 - "OL".ROWID[ROWID,10], "OL"."LINE_ID"[NUMBER,22], "OL"."CREATION_DATE"[DATE,7], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22], "OL"."ORG_ID"[NUMBER,22]
  30 - "OL".ROWID[ROWID,10], "OL"."INVENTORY_ITEM_ID"[NUMBER,22], "OL"."SHIP_FROM_ORG_ID"[NUMBER,22]
  31 - "ML".ROWID[ROWID,10], "ML"."LINE_ID"[NUMBER,22], "ML"."ORG_ID"[NUMBER,22]
  32 - "ML".ROWID[ROWID,10], "MODEL_STRING"[VARCHAR2,240]

As I said in my reply, you have to be careful of two traps with execution plans when rowsource execution stats (gather_plan_statistics) are enabled – first, that enabling the feature can add a lot of overhead and be very misleading if the query includes a large number of very short operations; secondly, that the workload and time information doesn’t always appear exactly where you feel it ought to be.

In this case we’ve got the OP saying the query completed in an hour while the report suggests that this part of the plan took 19 and three-quarter hours to complete! It seems likely that the discrepancy is a reporting error due to the side effect of the 17 million starts of operations 31 and 32, with a granularity error far in excess of the actual run-time.

If you want to check the rest of the execution plan, the driving body of the query (operations 123 to 175) has a few steps that report a few seconds with a total time of 26.90 seconds, and none of the other scalar subquery subplans report more than a few seconds. So it seems fairly clear that the necessary strategy for improving the performance of the whole thing is to find a way of improving the performance of this scalar subquery.

Note: addressing just the scalar subquery “in a vacuum” may be possible and may be sufficient. If this doesn’t work then the next – and more complex – strategy would be to take a wider view of the query and see if there were some way to get the optimizer to treat the scalar subquery differently – e.g. passing more access/filter predicates into the subquery, or finding a way of unnesting the subquery so that a brute force mechanism could be used once rather than the basic nested loop being used 8,149 times.

Analysis of subquery

I’ve shown the extracted plan for the critical subquery further up the page; here’s the in-line scalar subquery (with some cosmetic editing) that appeared in the original posting:

(
select distinct 
        first_value(model_string) over (
                partition by ol.inventory_item_id 
                order by ol.creation_date
        )
from
        xxom.xxom_3lp_sym_ora_order_lines       ml,
        apps.oe_order_lines_all                 ol
where 
        ol.inventory_item_id = pl.item_id                       -- po_lines_all
and     ol.ship_from_org_id  = pll.ship_to_organization_id      -- po_line_locations_all
and     ml.line_id           = ol.line_id
and     ml.org_id            = ol.org_id
and     model_string is not null
)       model_string,

I’ve highlighted the lines with the predicates that correlate the subquery back to the main query. You’ll note that two tables in the main query drive the subquery, which means, in theory, that the join order and join method of those tables in the main query could (thanks to the way that scalar subquery caching is implemented) affect the number of times the subquery is called. For the purposes of this note I’m not going to worry about that detail, though.

The plan shows a simple nested loop, followed by a window sort (for the first_value() call), then a sort unique (for the distinct). We can see from the Starts column that the subquery is called 8,149 times, and something that stands out quite sharply is that the index probe and table access for the second table in the nested loop (xxom_3lp_sym_ora_order_lines) happen roughly 17 million times – i.e. each execution of the subquery identifies an average of roughly 2,000 rows in oe_order_lines_all, and each of those rows then drives a probe of xxom_3lp_sym_ora_order_lines. Any tiny operation happening millions of times is likely to introduce some problematic side-effects, so it’s clearly worth having a look at this area of the plan (the fact that the values reported in the A-Time column are extreme just makes it stand out even more).

You have to be very cautious about the A-Time column when enabling rowsource execution statistics; it’s notorious for introducing double-counting and measurement error. For my own purposes I would ignore the silly figures at this point, but for the purposes of the case study I’ll make the suggestion that the 19 hours 58 minutes at operation 27 (nested loop) is the 14 hours and 44 minutes at operation 28 plus the 5 hours and 59 minutes at operation 32 (effectively double counting); then (leaving you to find the numbers and operations) 14:44 is 10:17 + 3:45 (double counting (again) with granularity errors); then 10:17 is double counting (yet again) 3:45 and 5:59; finally those last two figures have a measurement/granularity overhead that is larger than the actual time spent anyway. So ignore the exact details of the timing (once it’s given you a pointer), and look at the work done and the output achieved.

Critically, we access xxom_3lp_sym_ora_order_lines 17 million times by rowid at operation 32 to return just 37,811 rows, and we also note that these 37,811 rows are the (8,149 versions of the) result set that is sorted for the first_value() function and then sorted again for the uniqueness operation. So a basic optimization strategy would be to see if we can reduce that 17 million to 37,811 earlier in the plan. Possibly there’s a different index that we could use that would allow us either to visit only the table rows we need (dropping the table access starts down to 37,811) or even to avoid visiting the table at all.

Option 1

Aiming for the simplest option – what do we do at the table: we have a filter predicate filter(“MODEL_STRING” IS NOT NULL), and the projection information says we’re only interested in “ML”.ROWID[ROWID,10], “MODEL_STRING”[VARCHAR2,240]. To sum up – if we can add model_string as a column of the index currently used then we could eliminate operation 32 entirely. The Outline Data (which I haven’t reproduced) tells us that the index in question has three columns (line_id, org_id, assembly_type), so we would have to decide whether to include the model_string as column 3 or column 4 – which we can’t do at this point because we have no information about how this index might otherwise be used.
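
As a sketch (the existing column list comes from the Outline Data, the index name is my invention, and whether model_string goes in position 3 or 4 is exactly the open question mentioned above):

create index xxom.xxom_3lp_sym_ora_order_n11a on
        xxom.xxom_3lp_sym_ora_order_lines(line_id, org_id, assembly_type, model_string)
;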

Option 2

Adding a column to the index may eliminate some 17M table visits, 86K physical reads and 19M buffer gets, but we still probe the index 17M times, taking 81K physical reads and 54M buffer gets; we also introduce the possibility of other queries performing less well because of the impact of the size and placement of the extra column – so we could consider a better approach: can we introduce a function-based index, or an index on virtual columns, that would be much safer and also much smaller, and then modify the code to join through this index?

We access the table 17M times and find that the model_string is null in all but a few thousand cases. Does this give us a true picture of the whole table, or is it just a coincidence in this case? If this is indicative of the whole table then an index defined as follows could be very effective:

(
        case when model_string is null then null else line_id end,
        case when model_string is null then null else org_id end,
	model_string
)

(For all recent versions of Oracle I would prefer to create two invisible, virtual columns, then create the index on the virtual columns and model_string – there’s a sketch of this below. I would also investigate the meaning of the line_id and org_id columns since there may be scope for basic index compression on one or both of these columns, which might dictate a better order for the first two columns.)

With this definition a row with a null model_string will not appear in the index; then, with a corresponding modification to the SQL, the work of probing the index would be reduced to a minimum – though it would still have to happen 17M times!
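
Here’s a sketch of the virtual column variant (the column and index names are my invention, and invisible columns need 12c or later):

alter table xxom.xxom_3lp_sym_ora_order_lines
        add (
                ms_line_id invisible generated always as
                        (case when model_string is null then null else line_id end) virtual,
                ms_org_id  invisible generated always as
                        (case when model_string is null then null else org_id  end) virtual
        )
;

create index xxom.xxom_3lp_sym_ml_idx on
        xxom.xxom_3lp_sym_ora_order_lines(ms_line_id, ms_org_id, model_string)
;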

Option 3

It’s possible that the performance will be good enough after eliminating the probe of the second table and the corresponding nested loop operation, and reducing the work needed for the index probe; but it may be necessary to do far more. The fact that the available information suggests the model_string is often null – so that almost all of the 17M probes return no data – prompts what might be the next (in terms of complexity) strategy to investigate: would it be possible to visit the xxom_3lp_sym_ora_order_lines table first and the oe_order_lines_all table second, and only (we hope) a few thousand times?

Option 4

It doesn’t look as if a simple rewrite would be possible, but it’s very hard to tell given the number of tables in the query and the number of scalar subqueries in the select list and my ignorance of what all the tables represent and what the code is supposed to do, so I’m not going to think about that.

Instead I’ll go back to a point that I made at the start of this note: system is 19c but the optimizer features have been set back to 11.2.0.4. In 12.2.0.1 (possibly 12.1.0.1) Oracle introduced code to unnest scalar subqueries in the select list and it’s possible that an internal rewrite might manage to do something sensible with this subquery.

If it hadn’t been done already I’d try testing this query with the default optimizer features. If that was a disaster I’d repeat the exercise but start adding /*+ no_unnest */ hints to the scalar subqueries – maybe all but the critical one as a first test. I’d probably have to use some of the hints from the Outline Data of the 11g plan to disable some of the more esoteric 19c features.
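
Schematically – the table, column, and alias names below are all invented – the select list would end up looking something like this, with the hint applied to every scalar subquery except the one I wanted the optimizer to unnest:

rem     Hypothetical sketch showing where the no_unnest hints would go.

select
        oh.order_number,
        (select /*+ no_unnest */ max(sh.status_code)
         from   status_history sh
         where  sh.order_id = oh.order_id
        )       latest_status
from
        order_headers oh
where
        oh.org_id = 9082
;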

If I didn’t get anywhere with the default 19c features I’d also do a few experiments with optimizer_features_enable set to 12.2.0.1 (which, I think, is the earliest version that supported scalar subquery unnesting – though it might have been 12.1.x.x).
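
Those experiments need nothing more than a session-level change before re-running the test (or, to restrict the change to a single statement, the equivalent optimizer_features_enable() hint):

alter session set optimizer_features_enable = '12.2.0.1';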

Conclusion

A detailed execution plan, with runtime statistics generated through the gather_plan_statistics hint, gives a very clear indication of where most of the query time is (apparently) spent. When a very small operation executes a very large number of times the effects of granularity errors often exaggerate the times, so a little caution is always needed – especially where nested loops are concerned.
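
As a reminder, the standard pattern for capturing that information from SQL*Plus is the following (the trivial query here is just a stand-in for the real statement):

set serveroutput off

select  /*+ gather_plan_statistics */
        count(*)
from    dual
;

select  *
from    table(dbms_xplan.display_cursor(format => 'allstats last'));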

In the interests of keeping any changes as simple as possible, a nice strategy is to ask two questions:

  • can I make this operation happen less frequently?
  • can I make this operation less resource-intensive each time it happens?

In our case, adding a column to an existing index could make one of the critical operations (table access by rowid) disappear completely; then, in a refinement of that localised approach, creating a function-based index (or index on virtual columns) might also make the remaining operation (index range scan) much less expensive.

Beyond that it was not immediately obvious how a simple rewrite could make the whole query more efficient, but allowing the optimizer to use some of its newer features might produce an internal rewrite that transforms a “row by row” process of calling scalar subqueries with embedded nested loops into a “bulk” process that unnests the subquery and allows Oracle to find the relevant data for all the rows in a single operation before combining with a hash join.

Footnote

There is a small anomaly in the query that may have had some impact on the optimizer’s choice of execution path, but I didn’t want to get bogged down in a detail that might be completely irrelevant when looking at a plan that flags up such a huge timing skew. The oddity appears in the Predicate information at operation 141 of the execution plan:

141 - filter(("PH"."ORG_ID"=9082 AND ((-1)=:BUYER OR "PH"."AGENT_ID"=:BUYER) AND (SYS_OP_C2C(COALESCE("PH"."CLOSED_CODE",'OPEN'))=U'OPEN' OR
SYS_OP_C2C(COALESCE("PH"."CLOSED_CODE",'OPEN'))=U'CLOSED')))

The sys_op_c2c() function calls tell us that there must be some “national language” activity going on somewhere in this query, and looking back at the full text of the query we can see it in these predicates:

AND COALESCE(ph.closed_code, 'OPEN') IN (N'OPEN',N'CLOSED')
AND PLL.SHIP_TO_ORGANIZATION_ID IN (N'12389',N'12390',N'12469',N'9083',N'12549')

If you’re wondering why I haven’t said anything about sys_op_c2c() function calls appearing elsewhere in the predicate information, it’s because they don’t appear anywhere else. Here (at operation 152) is the ship_to_organization_id predicate, showing the character literals being coerced to numeric values:

 152 - access("MP"."ORGANIZATION_ID"="PLL"."SHIP_TO_ORGANIZATION_ID")
       filter(("MP"."ORGANIZATION_ID"=9083 OR "MP"."ORGANIZATION_ID"=12389 OR "MP"."ORGANIZATION_ID"=12390 OR "MP"."ORGANIZATION_ID"=12469 OR "MP"."ORGANIZATION_ID"=12549))

Character-set conversions and storing numbers as characters are mechanisms that make it hard for the optimizer to produce a reasonable cardinality estimate – and bad estimates usually produce bad plans.
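
If the application code can be changed, the cheap fix is simply to supply literals of the right type so that neither the sys_op_c2c() calls nor the implicit character-to-number conversions are needed – something like the following (assuming, of course, that ship_to_organization_id really is a numeric column):

and     coalesce(ph.closed_code, 'OPEN')        in ('OPEN', 'CLOSED')
and     pll.ship_to_organization_id             in (12389, 12390, 12469, 9083, 12549)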
