Home » SQL & PL/SQL » SQL & PL/SQL » Join purchases to customers (19c)
Join purchases to customers [message #687660] Mon, 01 May 2023 06:27 Go to next message
Unclefool
Messages: 82
Registered: August 2021
Member
I have some code below, which is working fine. I am having difficulty figuring out the syntax to join purchases customer_id to the customer table in order to display the first_name and last_name after the customer_id. Any help would be appreciated. Below is my sample data and test case


ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'DD-MON-YYYY  HH24:MI:SS.FF';

 ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';


CREATE TABLE customers 
(CUSTOMER_ID, FIRST_NAME, LAST_NAME) AS
SELECT 1, 'Faith', 'Mazzarone' FROM DUAL UNION ALL
SELECT 2, 'John', 'Henry' FROM DUAL UNION ALL
SELECT 3, 'Lisa', 'Saladino' FROM DUAL;


CREATE TABLE purchases(
    ORDER_ID NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
    CUSTOMER_ID NUMBER, 
   PURCHASE_DATE TIMESTAMP
);


INSERT  INTO purchases
(CUSTOMER_ID, PURCHASE_DATE) 
SELECT 1, TIMESTAMP '2023-04-03 11:41:25' + NUMTODSINTERVAL ( LEVEL * 2, 'DAY') FROM  dual CONNECT BY  LEVEL <= 6 UNION ALL
SELECT 2, TIMESTAMP '2023-04-03 01:41:25' + NUMTODSINTERVAL ( LEVEL * 1, 'DAY') FROM  dual CONNECT BY  LEVEL <= 7 UNION ALL
SELECT 3, TIMESTAMP '2023-04-23 21:31:25' + NUMTODSINTERVAL ( LEVEL * 1, 'DAY') FROM  dual CONNECT BY  LEVEL <= 7 UNION ALL
SELECT 3, TIMESTAMP '2023-03-12 21:44:35' + NUMTODSINTERVAL ( LEVEL * 1, 'DAY') FROM    dual
CONNECT BY  LEVEL <=20;

WITH t as (
    select distinct CUSTOMER_ID, trunc(PURCHASE_DATE) dat 
    from purchases
)
,tt as (
    select t.*
    ,row_number() over (partition by CUSTOMER_ID order by dat) rn
    from t
)
select CUSTOMER_ID, min(dat) start_date,max(dat) end_date, count(*) num_days
from tt
group by CUSTOMER_ID,dat-rn
having count(*) >= 10

Re: Join purchases to customers [message #687661 is a reply to message #687660] Mon, 01 May 2023 09:34 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Here is one method that just continues what you already have.

C##SCOTT@XE_21.3.0.0.0> WITH t as (
  2  	 select distinct CUSTOMER_ID, trunc(PURCHASE_DATE) dat
  3  	 from purchases
  4  )
  5  ,tt as (
  6  	 select t.*
  7  	 ,row_number() over (partition by CUSTOMER_ID order by dat) rn
  8  	 from t
  9  )
 10  ,ttt as (
 11  select CUSTOMER_ID, min(dat) start_date,max(dat) end_date, count(*) num_days
 12  from tt
 13  group by CUSTOMER_ID,dat-rn
 14  having count(*) >= 10
 15  )
 16  select c.customer_id, c.first_name, c.last_name,
 17  	    ttt.start_date, ttt.end_date, ttt.num_days
 18  from   customers c, ttt
 19  where  c.customer_id = ttt.customer_id
 20  /

CUSTOMER_ID FIRST LAST_NAME START_DATE           END_DATE               NUM_DAYS
----------- ----- --------- -------------------- -------------------- ----------
          3 Lisa  Saladino  13-MAR-2023 00:00:00 01-APR-2023 00:00:00         20

1 row selected.

or eliminating some sub-queries:

C##SCOTT@XE_21.3.0.0.0> WITH p as (
  2  	 select distinct customer_id, trunc(purchase_date) dat,
  3  		dense_rank() over (partition by CUSTOMER_ID order by trunc(purchase_date)) rn
  4  	 from	purchases
  5  )
  6  select c.customer_id, c.first_name, c.last_name,
  7  	    min(p.dat) start_date, max(p.dat) end_date, count(*) num_days
  8  from   customers c, p
  9  where  c.customer_id = p.customer_id
 10  group  by c.customer_id, c.first_name, c.last_name, p.dat-p.rn
 11  having count(*) >=10
 12  /

CUSTOMER_ID FIRST LAST_NAME START_DATE           END_DATE               NUM_DAYS
----------- ----- --------- -------------------- -------------------- ----------
          3 Lisa  Saladino  13-MAR-2023 00:00:00 01-APR-2023 00:00:00         20

1 row selected.

[Updated on: Mon, 01 May 2023 12:07]

Report message to a moderator

Re: Join purchases to customers [message #687662 is a reply to message #687661] Mon, 01 May 2023 12:25 Go to previous messageGo to next message
Unclefool
Messages: 82
Registered: August 2021
Member
Barbara thanks
Re: Join purchases to customers [message #687663 is a reply to message #687661] Mon, 01 May 2023 12:36 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3269
Registered: January 2010
Location: Connecticut, USA
Senior Member
Match recognize solution:

select  m.customer_id,
        c.first_name,
        c.last_name,
        m.start_date,
        m.end_date,
        m.end_date - m.start_date + 1 num_days
  from  purchases
  match_recognize(
                  partition by customer_id
                  order by purchase_date
                  measures
                    trunc(first(purchase_date)) start_date,
                    trunc(last(purchase_date)) end_date
                  one row per match
                  pattern(p{10,})
                  define p as prev(purchase_date) is null or trunc(purchase_date) - trunc(prev(purchase_date)) = 1
                 ) m,
        customers c
  where c.customer_id = m.customer_id
/

CUSTOMER_ID FIRST LAST_NAME START_DATE           END_DATE               NUM_DAYS
----------- ----- --------- -------------------- -------------------- ----------
          3 Lisa  Saladino  13-MAR-2023 00:00:00 01-APR-2023 00:00:00         20

SQL>
SY.
Re: Join purchases to customers [message #687664 is a reply to message #687663] Mon, 01 May 2023 13:41 Go to previous message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
I have yet to try using match_recognize other than adapting some simple example.  To me it is kind of like French.  When I read it I can understand it.  However, if I am trying to write it, I have trouble thinking of the words or syntax.  I like that using match_recognize can reduce a lot of code with sub-queries to just one query.  Just for fun, I created some indexes and gathered some statistics, and ran explain plans to compare results.  I only included the indexes below that the optimizer chose to use.  It looks like it opted for compound indexes on all columns so that it can just scan the indexes without even scanning the tables in some cases.  It chose those over indexes on individual columns.  I found it even chose the index with the order_id that is not used in the queries over one without it.  The explain plans show that my initial reduction of sub-queries reduced the steps taken and the match_recognize did so even more.  I am presuming that better access and fewer steps would result in a faster query on a data set large enough to see a difference.


C##SCOTT@XE_21.3.0.0.0> -- addition of indexes and gathering of statistics for testing:
C##SCOTT@XE_21.3.0.0.0> CREATE INDEX c_idx ON customers (customer_id, last_name, first_name)
  2  /

Index created.

C##SCOTT@XE_21.3.0.0.0> CREATE INDEX p_idx ON purchases (customer_id, purchase_date, order_id)
  2  /

Index created.

C##SCOTT@XE_21.3.0.0.0> EXEC DBMS_STATS.GATHER_TABLE_STATS (USER, 'CUSTOMERS')

PL/SQL procedure successfully completed.

C##SCOTT@XE_21.3.0.0.0> EXEC DBMS_STATS.GATHER_TABLE_STATS (USER, 'PURCHASES')

PL/SQL procedure successfully completed.

C##SCOTT@XE_21.3.0.0.0> -- minimal modification by me:
C##SCOTT@XE_21.3.0.0.0> explain plan for
  2  WITH t as (
  3  	 select distinct CUSTOMER_ID, trunc(PURCHASE_DATE) dat
  4  	 from purchases
  5  )
  6  ,tt as (
  7  	 select t.*
  8  	 ,row_number() over (partition by CUSTOMER_ID order by dat) rn
  9  	 from t
 10  )
 11  ,ttt as (
 12  select CUSTOMER_ID, min(dat) start_date,max(dat) end_date, count(*) num_days
 13  from tt
 14  group by CUSTOMER_ID,dat-rn
 15  having count(*) >= 10
 16  )
 17  select c.customer_id, c.first_name, c.last_name,
 18  	    ttt.start_date, ttt.end_date, ttt.num_days
 19  from   customers c, ttt
 20  where  c.customer_id = ttt.customer_id
 21  /

Explained.

C##SCOTT@XE_21.3.0.0.0> select * from table (dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT                                                                                                                 
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4164497899                                                                                                       
                                                                                                                                  
---------------------------------------------------------------------------------                                                 
| Id  | Operation               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |                                                 
---------------------------------------------------------------------------------                                                 
|   0 | SELECT STATEMENT        |       |     1 |    60 |     5  (60)| 00:00:01 |                                                 
|   1 |  NESTED LOOPS           |       |     1 |    60 |     5  (60)| 00:00:01 |                                                 
|   2 |   VIEW                  |       |     1 |    44 |     4  (75)| 00:00:01 |                                                 
|*  3 |    FILTER               |       |       |       |            |          |                                                 
|   4 |     HASH GROUP BY       |       |     1 |    22 |     4  (75)| 00:00:01 |                                                 
|   5 |      VIEW               |       |    40 |   880 |     3  (67)| 00:00:01 |                                                 
|   6 |       WINDOW SORT       |       |    40 |   360 |     3  (67)| 00:00:01 |                                                 
|   7 |        VIEW             |       |    40 |   360 |     2  (50)| 00:00:01 |                                                 
|   8 |         HASH UNIQUE     |       |    40 |   560 |     2  (50)| 00:00:01 |                                                 
|   9 |          INDEX FULL SCAN| P_IDX |    40 |   560 |     1   (0)| 00:00:01 |                                                 
|* 10 |   INDEX RANGE SCAN      | C_IDX |     1 |    16 |     1   (0)| 00:00:01 |                                                 
---------------------------------------------------------------------------------                                                 
                                                                                                                                  
Predicate Information (identified by operation id):                                                                               
---------------------------------------------------                                                                               
                                                                                                                                  
   3 - filter(COUNT(*)>=10)                                                                                                       
  10 - access("C"."CUSTOMER_ID"="TTT"."CUSTOMER_ID")                                                                              
                                                                                                                                  
Note                                                                                                                              
-----                                                                                                                             
   - this is an adaptive plan                                                                                                     

27 rows selected.

C##SCOTT@XE_21.3.0.0.0> -- fewer sub-queries by me:
C##SCOTT@XE_21.3.0.0.0> explain plan for
  2  WITH p as (
  3  	 select distinct customer_id, trunc(purchase_date) dat,
  4  		dense_rank() over (partition by CUSTOMER_ID order by trunc(purchase_date)) rn
  5  	 from	purchases
  6  )
  7  select c.customer_id, c.first_name, c.last_name,
  8  	    min(p.dat) start_date, max(p.dat) end_date, count(*) num_days
  9  from   customers c, p
 10  where  c.customer_id = p.customer_id
 11  group  by c.customer_id, c.first_name, c.last_name, p.dat-p.rn
 12  having count(*) >=10
 13  /

Explained.

C##SCOTT@XE_21.3.0.0.0> select * from table (dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT                                                                                                                 
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3136216344                                                                                                       
                                                                                                                                  
-------------------------------------------------------------------------------                                                   
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |                                                   
-------------------------------------------------------------------------------                                                   
|   0 | SELECT STATEMENT      |       |     1 |    38 |     5  (60)| 00:00:01 |                                                   
|*  1 |  FILTER               |       |       |       |            |          |                                                   
|   2 |   HASH GROUP BY       |       |     1 |    38 |     5  (60)| 00:00:01 |                                                   
|*  3 |    HASH JOIN          |       |    40 |  1520 |     4  (50)| 00:00:01 |                                                   
|   4 |     INDEX FULL SCAN   | C_IDX |     3 |    48 |     1   (0)| 00:00:01 |                                                   
|   5 |     VIEW              |       |    40 |   880 |     3  (67)| 00:00:01 |                                                   
|   6 |      HASH UNIQUE      |       |    40 |   560 |     3  (67)| 00:00:01 |                                                   
|   7 |       WINDOW SORT     |       |    40 |   560 |     3  (67)| 00:00:01 |                                                   
|   8 |        INDEX FULL SCAN| P_IDX |    40 |   560 |     1   (0)| 00:00:01 |                                                   
-------------------------------------------------------------------------------                                                   
                                                                                                                                  
Predicate Information (identified by operation id):                                                                               
---------------------------------------------------                                                                               
                                                                                                                                  
   1 - filter(COUNT(*)>=10)                                                                                                       
   3 - access("C"."CUSTOMER_ID"="P"."CUSTOMER_ID")                                                                                

21 rows selected.

C##SCOTT@XE_21.3.0.0.0> -- query using match_recognize by Solomon:
C##SCOTT@XE_21.3.0.0.0> explain plan for
  2  select  m.customer_id,
  3  	     c.first_name,
  4  	     c.last_name,
  5  	     m.start_date,
  6  	     m.end_date,
  7  	     m.end_date - m.start_date + 1 num_days
  8    from  purchases
  9    match_recognize(
 10  		       partition by customer_id
 11  		       order by purchase_date
 12  		       measures
 13  			 trunc(first(purchase_date)) start_date,
 14  			 trunc(last(purchase_date)) end_date
 15  		       one row per match
 16  		       pattern(p{10,})
 17  		       define p as prev(purchase_date) is null or trunc(purchase_date) - trunc(prev(purchase_date)) = 1
 18  		      ) m,
 19  	     customers c
 20    where c.customer_id = m.customer_id
 21  /

Explained.

C##SCOTT@XE_21.3.0.0.0> select * from table (dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT                                                                                                                 
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2832019574                                                                                                       
                                                                                                                                  
------------------------------------------------------------------------------------------------------------------                
| Id  | Operation                                                | Name  | Rows  | Bytes | Cost (%CPU)| Time     |                
------------------------------------------------------------------------------------------------------------------                
|   0 | SELECT STATEMENT                                         |       |    40 |  1640 |     2   (0)| 00:00:01 |                
|*  1 |  HASH JOIN                                               |       |    40 |  1640 |     2   (0)| 00:00:01 |                
|   2 |   INDEX FULL SCAN                                        | C_IDX |     3 |    48 |     1   (0)| 00:00:01 |                
|   3 |   VIEW                                                   |       |    40 |  1000 |     1   (0)| 00:00:01 |                
|   4 |    BUFFER SORT                                           |       |    40 |   560 |     1   (0)| 00:00:01 |                
|   5 |     MATCH RECOGNIZE BUFFER DETERMINISTIC FINITE AUTOMATON|       |    40 |   560 |     1   (0)| 00:00:01 |                
|   6 |      INDEX FULL SCAN                                     | P_IDX |    40 |   560 |     1   (0)| 00:00:01 |                
------------------------------------------------------------------------------------------------------------------                
                                                                                                                                  
Predicate Information (identified by operation id):                                                                               
---------------------------------------------------                                                                               
                                                                                                                                  
   1 - access("C"."CUSTOMER_ID"="M"."CUSTOMER_ID")                                                                                

18 rows selected.
Previous Topic: Group By (merged)
Next Topic: elapsed time for a id
Goto Forum:
  


Current Time: Thu Mar 28 15:15:33 CDT 2024