13 Replies - 355 Views - Last Post: 10 January 2019 - 05:59 AM

#1 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6796
  • View blog
  • Posts: 28,083
  • Joined: 12-December 12

delete earlier records on same day

Posted 09 January 2019 - 03:34 AM

I have a table of company recommendations. There can be more than one record/recommendation on the same day, but having earlier time elements. I want to remove the records from earlier in the day, keeping only the most recent (for that day).

What is wrong with my statement please? For a company there are 3 records on the same day (with different times) but when I execute the statement it only removes the first of these, leaving 2 records.

	-- remove duplicate dated entries in tblRecommendationsHistory
	-- keeping the most recent
	DELETE FROM tblRecommendationsHistory 
		WHERE [Date] NOT IN(SELECT Max([Date]) 
		FROM tblRecommendationsHistory GROUP BY companyid, CONVERT(Date, [Date]))


Is This A Good Question/Topic? 0
  • +

Replies To: delete earlier records on same day

#2 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6796
  • View blog
  • Posts: 28,083
  • Joined: 12-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 03:41 AM

In the image the circled item has been removed, but not the one following it.
(The amendment of values to NULL happens subsequently so is just a distraction.)

Posted Image
Was This Post Helpful? 0
  • +
  • -

#3 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6796
  • View blog
  • Posts: 28,083
  • Joined: 12-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 04:06 AM

I believe it is because I am ignoring the companyid when choosing to delete records based on time.

I think this is what I need:

	-- remove duplicate dated entries in tblRecommendationsHistory
	-- keeping the most recent
	;WITH latestDatesCTE AS (
		SELECT CompanyID, MAX([Date]) AS MaxDate FROM tblRecommendationsHistory 
			GROUP BY companyid, CONVERT(Date, [Date])
	)
	DELETE RH FROM tblRecommendationsHistory RH 
		INNER JOIN latestDatesCTE LD ON RH.CompanyID = LD.CompanyID
		WHERE (CONVERT(Date, RH.[Date]) = CONVERT(Date, LD.[MaxDate])) -- same day...
		AND (RH.[Date] <> LD.MaxDate) -- different time

Looks a bit clumsy though... I am sure there is a neater way to write this.
Was This Post Helpful? 0
  • +
  • -

#4 baavgai   User is offline

  • Dreaming Coder
  • member icon


Reputation: 7434
  • View blog
  • Posts: 15,411
  • Joined: 16-October 07

Re: delete earlier records on same day

Posted 09 January 2019 - 06:42 AM

View Postandrewsw, on 09 January 2019 - 06:06 AM, said:

I believe it is because I am ignoring the companyid when choosing to delete records based on time.

Bingo!

This should return all the ones you want to keep:
SELECT a.*
    FROM tblRecommendationsHistory a
        JOIN (SELECT companyid, Max([Date]) as [Date] FROM tblRecommendationsHistory) b
            ON a.companyid = b.companyid AND a.[Date] = b.[Date]



Conversely, this should return all the ones you want to kill:
SELECT a.*
    FROM tblRecommendationsHistory a
        LEFT JOIN (SELECT companyid, Max([Date]) as [Date] FROM tblRecommendationsHistory) b
            ON a.companyid = b.companyid AND a.[Date] = b.[Date]
    WHERE b.companyid IS NULL



If those are, indeed, the ones you want to kill, then this should kill them:
DELETE tblRecommendationsHistory
    FROM tblRecommendationsHistory a
        JOIN (
            SELECT a.companyid, a.[Date]
                FROM tblRecommendationsHistory a
                    LEFT JOIN (SELECT companyid, Max([Date]) as [Date] FROM tblRecommendationsHistory) b
                        ON a.companyid = b.companyid AND a.[Date] = b.[Date]
                WHERE b.companyid IS NULL
        ) b ON a.companyid = b.companyid AND a.[Date] = b.[Date]



It should be reasonably obvious that this will let you see what's getting killed and what's not:
SELECT 
    (case when b.companyid IS NULL then 'KILL' else 'KEEP' end) as State, a.*
    FROM tblRecommendationsHistory a
        LEFT JOIN (SELECT companyid, Max([Date]) as [Date] FROM tblRecommendationsHistory) b
            ON a.companyid = b.companyid AND a.[Date] = b.[Date]
    order by a.companyid, a.[Date] desc


Was This Post Helpful? 1
  • +
  • -

#5 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6796
  • View blog
  • Posts: 28,083
  • Joined: 12-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 06:57 AM

Thank you very much for your insights once again ;)

Although... those statements are incorrect as they stand because there is no GROUP BY clause to enable the display of companyid alongside Max().

Quote

Column 'tblRecommendationsHistory.CompanyID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

I am doubtful, though, because it looks like they would work with the Max() per company, rather than for each company and day, so they might delete all rows for a company except one?
Was This Post Helpful? 0
  • +
  • -

#6 astonecipher   User is offline

  • Senior Systems Engineer
  • member icon

Reputation: 2854
  • View blog
  • Posts: 11,194
  • Joined: 03-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 09:04 AM

We are dealing with a similar issue, maybe. How are your records being imported? Is it a direct add or an import functionality?
Was This Post Helpful? 0
  • +
  • -

#7 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6796
  • View blog
  • Posts: 28,083
  • Joined: 12-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 09:13 AM

Hello.

I am not certain about your question, perhaps you are indicating that there could be an index used to prevent duplicates being added?

Anyway, to respond to your question, I run an SP (which runs another SP and function for each company) that amasses all the data from several tables (and XML/ metadata ;)). There may have been more than one review on a date and, as the summary data is collated I need to, during the process, discard the additional rows (for a date and company).

I don't think I can/would use a multi-field unique index because I think the process will change at some point, either allowing or preventing reviews on the same day.
Was This Post Helpful? 0
  • +
  • -

#8 astonecipher   User is offline

  • Senior Systems Engineer
  • member icon

Reputation: 2854
  • View blog
  • Posts: 11,194
  • Joined: 03-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 09:27 AM

In essences that was what I was asking. We have an issue where duplicate records are being import via a job that runs. To fix it we had to see if a record was already "in use" before allowing the import to claim any material. But, it sounds like a similar issue.
Was This Post Helpful? 0
  • +
  • -

#9 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6796
  • View blog
  • Posts: 28,083
  • Joined: 12-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 09:35 AM

Essentially, during the process I import data to a temporary table. I have just then introduced this additional step, and my statement above, to remove duplicates on the same day, before continuing with the processing.

I suppose it depends on your/the particular process as to whether it is better, or more convenient, to try and prevent the creation or insertion of duplicates, or to get rid of them at a convenient stage.



Ideally, if the duplicates - same day records - should not be permitted, then it is better to prevent them at the earliest point, going back to the unique index idea.
Was This Post Helpful? 0
  • +
  • -

#10 astonecipher   User is offline

  • Senior Systems Engineer
  • member icon

Reputation: 2854
  • View blog
  • Posts: 11,194
  • Joined: 03-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 10:03 AM

We just refuse to import the duplicates into the system, but ours is for manufacturing processes. So, everything being made has to be tied to a Workorder. The issue we were having is, that the same products were being bound to multiple workorders when they shouldn't be. Part of it is retraining, and the fail-safe is to not allow the system to do so and to alert someone that they need to fix their shit.
Was This Post Helpful? 0
  • +
  • -

#11 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6796
  • View blog
  • Posts: 28,083
  • Joined: 12-December 12

Re: delete earlier records on same day

Posted 09 January 2019 - 10:19 AM

Yes, if people fix their shit at the outset then we wouldn't have to jump through hoops ;)
Was This Post Helpful? 0
  • +
  • -

#12 baavgai   User is offline

  • Dreaming Coder
  • member icon


Reputation: 7434
  • View blog
  • Posts: 15,411
  • Joined: 16-October 07

Re: delete earlier records on same day

Posted 09 January 2019 - 10:50 AM

View Postandrewsw, on 09 January 2019 - 08:57 AM, said:

Although... those statements are incorrect as they stand because there is no GROUP BY clause to enable the display of companyid alongside Max().

You are correct. Sorry about that.

Ok, quick demo:
create table #TtId (
  TtId int identity(1,1) not null primary key,
  EventId int not null,
  EventDt datetime
)

insert into #TtId(EventId) values (1)
insert into #TtId(EventId) values (2)
insert into #TtId(EventId) values (3)

insert into #TtId(EventId) select EventId from #TtId
insert into #TtId(EventId) select EventId from #TtId

update #TtId set EventDt = '1/1/2001'
update #TtId set EventDt = EventDt + TtId

print 'table state'
select * from #TtId 

print 'keep chuck list'
SELECT
    (case when b.EventId IS NULL then 'KILL' else 'KEEP' end) as State, a.*
    FROM #TtId a
        LEFT JOIN (select EventId, max(EventDt) as EventDt from #TtId group by EventId) b
            ON a.EventId = b.EventId AND a.EventDt = b.EventDt
    order by a.EventId, a.EventDt desc

print 'chuck list list'
SELECT a.EventId, a.EventDt
    FROM #TtId a
        LEFT JOIN (select EventId, max(EventDt) as EventDt from #TtId group by EventId) b
            ON a.EventId = b.EventId AND a.EventDt = b.EventDt
    WHERE b.EventId IS NULL

print 'kill it'
delete #TtId
  from #TtId a
    join (
      SELECT a.EventId, a.EventDt
          FROM #TtId a
              LEFT JOIN (select EventId, max(EventDt) as EventDt from #TtId group by EventId) b
                  ON a.EventId = b.EventId AND a.EventDt = b.EventDt
          WHERE b.EventId IS NULL
      ) b ON a.EventId = b.EventId AND a.EventDt = b.EventDt

print 'table state 2'
select * from #TtId 



Results:
(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)

(3 row(s) affected)

(6 row(s) affected)

(12 row(s) affected)

(12 row(s) affected)
table state
TtId        EventId     EventDt
----------- ----------- -----------------------
1           1           2001-01-02 00:00:00.000
2           2           2001-01-03 00:00:00.000
3           3           2001-01-04 00:00:00.000
4           1           2001-01-05 00:00:00.000
5           2           2001-01-06 00:00:00.000
6           3           2001-01-07 00:00:00.000
7           1           2001-01-08 00:00:00.000
8           2           2001-01-09 00:00:00.000
9           3           2001-01-10 00:00:00.000
10          1           2001-01-11 00:00:00.000
11          2           2001-01-12 00:00:00.000
12          3           2001-01-13 00:00:00.000

(12 row(s) affected)

keep chuck list
State TtId        EventId     EventDt
----- ----------- ----------- -----------------------
KEEP  10          1           2001-01-11 00:00:00.000
KILL  7           1           2001-01-08 00:00:00.000
KILL  4           1           2001-01-05 00:00:00.000
KILL  1           1           2001-01-02 00:00:00.000
KEEP  11          2           2001-01-12 00:00:00.000
KILL  8           2           2001-01-09 00:00:00.000
KILL  5           2           2001-01-06 00:00:00.000
KILL  2           2           2001-01-03 00:00:00.000
KEEP  12          3           2001-01-13 00:00:00.000
KILL  9           3           2001-01-10 00:00:00.000
KILL  6           3           2001-01-07 00:00:00.000
KILL  3           3           2001-01-04 00:00:00.000

(12 row(s) affected)

chuck list list
EventId     EventDt
----------- -----------------------
1           2001-01-02 00:00:00.000
2           2001-01-03 00:00:00.000
3           2001-01-04 00:00:00.000
1           2001-01-05 00:00:00.000
2           2001-01-06 00:00:00.000
3           2001-01-07 00:00:00.000
1           2001-01-08 00:00:00.000
2           2001-01-09 00:00:00.000
3           2001-01-10 00:00:00.000

(9 row(s) affected)

kill it

(9 row(s) affected)
table state 2
TtId        EventId     EventDt
----------- ----------- -----------------------
10          1           2001-01-11 00:00:00.000
11          2           2001-01-12 00:00:00.000
12          3           2001-01-13 00:00:00.000

(3 row(s) affected)


Was This Post Helpful? 0
  • +
  • -

#13 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6796
  • View blog
  • Posts: 28,083
  • Joined: 12-December 12

Re: delete earlier records on same day

Posted 10 January 2019 - 02:12 AM

Thank you again. Erm, still not correct though ;) although all the ingredients are there (cushioning ;)).

That is keeping the latest record for each company. We end up with a row for each company.
I want to keep all the records... except where there is more than one record for a particular company on the same day. (Yes, a little tricky to describe; see my earlier screenshot which is for 1 particular company.)

Try this simplification:

CompanyId Day
1         Mon 1pm
1         Mon 2pm
1         Mon 3.15pm
1         Tue
1         Wed
2         Mon
2         Wed

For company 1 I keep Tue and Wed's records, but only keep the 3.15pm record on Mon. I keep all of company 2's records.

My solution posted above is this
-- remove duplicate dated entries in tblRecommendationsHistory
-- keeping the most recent
;WITH latestDatesCTE AS (
	SELECT CompanyID, MAX([Date]) AS MaxDate FROM tblRecommendationsHistory 
		GROUP BY companyid, CONVERT(Date, [Date])
)
DELETE RH FROM tblRecommendationsHistory RH 
	INNER JOIN latestDatesCTE LD ON RH.CompanyID = LD.CompanyID
	WHERE (CONVERT(Date, RH.[Date]) = CONVERT(Date, LD.[MaxDate])) -- same day...
	AND (RH.[Date] <> LD.MaxDate) -- different time

It is the grouping by CONVERT(Date, [Date]) that is significant. This groups by the date, without time, then finds the maximum datetime value within each group - for each day. The where clause is matching records by day, but discounting times that aren't the max.
Was This Post Helpful? 0
  • +
  • -

#14 baavgai   User is offline

  • Dreaming Coder
  • member icon


Reputation: 7434
  • View blog
  • Posts: 15,411
  • Joined: 16-October 07

Re: delete earlier records on same day

Posted 10 January 2019 - 05:59 AM

Ah! Ok. You're not keying off of just date. Representative sample data really helps.

So, to my understanding:
create table #CompanyDay (
  CompanyId int not null,
  Dt datetime not null
)

insert into #CompanyDay
  values (1, '2001-01-07 13:00'),
  (1, '2001-01-07 14:00'),
  (1, '2001-01-07 15:15'),
  (1, '2001-01-08 12:00'),
  (1, '2001-01-09 12:00'),
  (2, '2001-01-07 12:00'),
  (2, '2001-01-09 12:00')

print 'data with drops flagged'
select a.CompanyId, a.Dt, convert(date, a.Dt)as [Day],
    (select count(*) from #CompanyDay b where b.CompanyId=a.CompanyId and convert(date, b.Dt)=convert(date, a.Dt) and b.Dt>a.Dt) as KillMe
  from #CompanyDay a
  order by a.CompanyId, a.Dt desc

print 'killem'
delete #CompanyDay
  from #CompanyDay a
    join(
      select a.CompanyId, a.Dt
      from #CompanyDay a
      where (select count(*) from #CompanyDay b where b.CompanyId=a.CompanyId and convert(date, b.Dt)=convert(date, a.Dt) and b.Dt>a.Dt)<>0
      ) b on b.CompanyId=a.CompanyId and b.Dt=a.Dt

print 'survivors'
select * from #CompanyDay a order by a.CompanyId, a.Dt desc



Result:
data with drops flagged
CompanyId   Dt                      Day        KillMe
----------- ----------------------- ---------- -----------
1           2001-01-09 12:00:00.000 2001-01-09 0
1           2001-01-08 12:00:00.000 2001-01-08 0
1           2001-01-07 15:15:00.000 2001-01-07 0
1           2001-01-07 14:00:00.000 2001-01-07 1
1           2001-01-07 13:00:00.000 2001-01-07 2
2           2001-01-09 12:00:00.000 2001-01-09 0
2           2001-01-07 12:00:00.000 2001-01-07 0

(7 row(s) affected)

killem

(2 row(s) affected)
survivors
CompanyId   Dt
----------- -----------------------
1           2001-01-09 12:00:00.000
1           2001-01-08 12:00:00.000
1           2001-01-07 15:15:00.000
2           2001-01-09 12:00:00.000
2           2001-01-07 12:00:00.000

(5 row(s) affected)



While that one is pretty simple, a group by is usually more efficient, so:
print 'to keep'
select a.CompanyId, max(a.Dt) from #CompanyDay a group by a.CompanyId, convert(date, a.Dt)

print 'to kill'
select a.*
  from #CompanyDay a
    left join (select a.CompanyId, max(a.Dt) as Dt from #CompanyDay a group by a.CompanyId, convert(date, a.Dt)) b
      on b.CompanyId=a.CompanyId and b.Dt=a.Dt
  where b.CompanyId is null


print 'killem'
delete #CompanyDay
  from #CompanyDay a
    join(
      select a.CompanyId, a.Dt
        from #CompanyDay a
          left join (select a.CompanyId, max(a.Dt) as Dt from #CompanyDay a group by a.CompanyId, convert(date, a.Dt)) b
            on b.CompanyId=a.CompanyId and b.Dt=a.Dt
        where b.CompanyId is null
      ) b on b.CompanyId=a.CompanyId and b.Dt=a.Dt

print 'survivors'
select * from #CompanyDay a order by a.CompanyId, a.Dt desc



Result:
to keep
CompanyId   
----------- -----------------------
1           2001-01-07 15:15:00.000
2           2001-01-07 12:00:00.000
1           2001-01-08 12:00:00.000
1           2001-01-09 12:00:00.000
2           2001-01-09 12:00:00.000

(5 row(s) affected)

to kill
CompanyId   Dt
----------- -----------------------
1           2001-01-07 13:00:00.000
1           2001-01-07 14:00:00.000

(2 row(s) affected)

killem

(2 row(s) affected)
survivors
CompanyId   Dt
----------- -----------------------
1           2001-01-09 12:00:00.000
1           2001-01-08 12:00:00.000
1           2001-01-07 15:15:00.000
2           2001-01-09 12:00:00.000
2           2001-01-07 12:00:00.000

(5 row(s) affected)


Was This Post Helpful? 1
  • +
  • -

Page 1 of 1