Home » SQL & PL/SQL » SQL & PL/SQL » remove duplicate values by date (all)
remove duplicate values by date [message #666850] Tue, 28 November 2017 14:23 Go to next message
laredoeneko
Messages: 21
Registered: November 2017
Junior Member
Hello
I have a table in which a process inserts the status of some elements and records the insertion date for each execution. this procedure inserts one row per element 1 time a day, but sometimes the process fails and inserts on the same day 2 times the same register (with different values) being valid only the last one.
something like this:

Element status insert_date
a UP 11/27/2017 13:00
b UP 11/27/2017 13:00
c UP 11/27/2017 13:00
a UP 11/28/2017 13:00
b UP 11/28/2017 13:00
c UP 11/28/2017 13:00
a DOWN 11/28/2017 13:05
b UP 11/28/2017 13:05
c DOWN 11/28/2017 13:05

I wanna a query to obtain the last differents rows of the last execution, normally with sysdate -1 in query it is valid, but when the process fail the i obtain 2 execution for this days, but i wanna only the last execution
result:

a DOWN 11/28/2017 13:05
b UP 11/28/2017 13:05
c DOWN 11/28/2017 13:05

my table has around 4 millions rows day by day.
thank you very much for your help
Re: remove duplicate values by date [message #666852 is a reply to message #666850] Tue, 28 November 2017 14:59 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Please read and follow the forum guidelines, to enable us to help you:
OraFAQ Forum Guide
How to use {code} tags and make your code easier to read

>my table has around 4 millions rows day by day.
so you want only 3 rows selected out of the 4 millions rows for each day?

Re: remove duplicate values by date [message #666857 is a reply to message #666850] Tue, 28 November 2017 23:08 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
-- test data:
SCOTT@orcl_12.1.0.2.0> select * from your_table order by insert_date, element
  2  /

ELEMENT STATUS INSERT_DATE
------- ------ -------------------
a       UP     11/27/2017 13:00:00
b       UP     11/27/2017 13:00:00
c       UP     11/27/2017 13:00:00
a       UP     11/28/2017 13:00:00
b       UP     11/28/2017 13:00:00
c       UP     11/28/2017 13:00:00
a       DOWN   11/28/2017 13:05:00
b       UP     11/28/2017 13:05:00
c       DOWN   11/28/2017 13:05:00

9 rows selected.

-- query:
SCOTT@orcl_12.1.0.2.0> select element,
  2  	    max (status) keep (dense_rank last order by insert_date) status,
  3  	    max (insert_date) insert_date
  4  from   your_table
  5  group  by element
  6  order  by insert_date, element
  7  /

ELEMENT STATUS INSERT_DATE
------- ------ -------------------
a       DOWN   11/28/2017 13:05:00
b       UP     11/28/2017 13:05:00
c       DOWN   11/28/2017 13:05:00

3 rows selected.

Re: remove duplicate values by date [message #666869 is a reply to message #666852] Wed, 29 November 2017 14:10 Go to previous messageGo to next message
laredoeneko
Messages: 21
Registered: November 2017
Junior Member
this is only an example, i wanna obtain all values for the last execution, if in one day there are 5 execution, i only want the last one.
Re: remove duplicate values by date [message #666871 is a reply to message #666857] Wed, 29 November 2017 14:43 Go to previous messageGo to next message
laredoeneko
Messages: 21
Registered: November 2017
Junior Member
thank you barbara, works fine, but i have one question.
what is the differente between:
select element,
	    max (status) keep (dense_rank last order by insert_date) status,
 	    max (insert_date) insert_date
  from   your_table
  group  by element
 order  by insert_date, element
and this other:
select element,
	     status,
 	    max (insert_date) insert_date
  from   your_table
  group  by element
 order  by insert_date, element;
i think that in my case, the result is the same, i have the values for the last execution.
best regard
César

[Updated on: Wed, 29 November 2017 14:44]

Report message to a moderator

Re: remove duplicate values by date [message #666876 is a reply to message #666871] Wed, 29 November 2017 21:34 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
The second query that you posted should produce an error. Please post a copy and paste of what you actually ran.
Re: remove duplicate values by date [message #666878 is a reply to message #666876] Thu, 30 November 2017 02:30 Go to previous messageGo to next message
laredoeneko
Messages: 21
Registered: November 2017
Junior Member
you are right there was a mistake
SQL> set linesize 10000
SQL> r
  1  select element,
  2           status,
  3           max (insert_date) insert_date
  4    from   test
  5    group  by element,status
  6*  order  by insert_date, element

ELEME STATU INSERT_DATE
----- ----- ---------------------------------------------------------------------------
a     UP    28-NOV-17 01.00.00.000000 PM
c     UP    28-NOV-17 01.00.00.000000 PM
a     DOWN  28-NOV-17 01.05.00.000000 PM
b     UP    28-NOV-17 01.05.00.000000 PM
c     DOWN  28-NOV-17 01.05.00.000000 PM

SQL> 
this is not valid, i think that will be like this

SQL>  select * from test a
  2   where insert_date = ( select max(insert_date) from test b
  3          where a.element=b.element
  4          and insert_date>(sysdate -2
  5           ));

ELEME STATU INSERT_DATE
----- ----- ---------------------------------------------------------------------------
a     DOWN  28-NOV-17 01.05.00.000000 PM
b     UP    28-NOV-17 01.05.00.000000 PM
c     DOWN  28-NOV-17 01.05.00.000000 PM

SQL> 

but i donot know if you query is more efficient

  1   select * from test a
  2   where insert_date = ( select max(insert_date) from test b
  3          where a.element=b.element
  4          and insert_date>(sysdate -2
  5*          ))


Execution Plan
----------------------------------------------------------
Plan hash value: 4262546159

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |   222 |     6   (0)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL | TEST |     9 |   333 |     3   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |      |     1 |    25 |            |          |
|*  4 |    TABLE ACCESS FULL| TEST |     1 |    25 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("INSERT_DATE"= (SELECT MAX("INSERT_DATE") FROM "TEST" "B"
              WHERE "B"."ELEMENT"=:B1 AND "INSERT_DATE">SYSDATE@!-2))
   4 - filter("B"."ELEMENT"=:B1 AND "INSERT_DATE">SYSDATE@!-2)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         29  consistent gets
          0  physical reads
          0  redo size
        770  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed

SQL> 
SQL> select element,
  2          max (status) keep (dense_rank last order by insert_date) status,
  3           max (insert_date) insert_date
  4   from   test
  5  group  by element
  6  order  by insert_date, element
  7  /


Execution Plan
----------------------------------------------------------
Plan hash value: 2184325130

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |   148 |     5  (40)| 00:00:01 |
|   1 |  SORT ORDER BY      |      |     4 |   148 |     5  (40)| 00:00:01 |
|   2 |   SORT GROUP BY     |      |     4 |   148 |     5  (40)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST |     4 |   148 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        770  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          3  rows processed

SQL> 
Re: remove duplicate values by date [message #666882 is a reply to message #666878] Thu, 30 November 2017 04:08 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
If you are going to compare the queries, then you should use the same criteria in both. Yours limits the rows with "and insert_date>(sysdate -2)" and mine has an order by clause. You should limit both or neither and order both or neither for comparison. For a valid comparison, you would need to test multiple runs of both on the same large table and check the times and try to eliminate outside factors, such as network traffic. It appears you have tested on a small sample. I also don't know what indexes you may have. In general, you want to minimize full table scans and filters. The first run of each query will be slower than subsequent runs as Oracle will re-use what is in the SGA for subsequent runs.

Re: remove duplicate values by date [message #667026 is a reply to message #666882] Tue, 05 December 2017 07:10 Go to previous message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
why not stop it from ever happening?

create unique index test_u1 on test(element,status,trunc(insert_date));

You will only be allowed one entry for each element and Up/DOWN per date.
Previous Topic: SQL Query
Next Topic: Term Match with Jaro-Winkler logic
Goto Forum:
  


Current Time: Thu Mar 28 07:09:39 CDT 2024