5 Replies - 1886 Views - Last Post: 21 July 2018 - 01:51 AM

#1 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6823
  • View blog
  • Posts: 28,271
  • Joined: 12-December 12

merge duplicates

Posted 29 June 2018 - 01:01 AM

In the screenshot I need to merge the duplicate dated rows into one, so the NULL value will become 64. (There are many more of these pairs.) I will then need to delete the redundant row. (If I give a dummy value -999 in the row to delete this should make the deletion easy.)

Posted Image

One issue I have is that the pairs don't appear in the same order, the NULL value might be second.

I found an example of a use of MERGE but I'm not sure that this will be an applicable approach?

MERGE prospects
USING
(
SELECT 
max(id) id,
companyname, 
max(address) address,
max(city) city,
max(zipcode) zipcode, 
max(total_cars) total_cars 
FROM prospects
GROUP BY companyname
) NewData ON prospects.id = NewData.id
WHEN MATCHED THEN 
UPDATE SET 
prospects.companyname = NewData.companyname, 
prospects.address = NewData.address,
prospects.city = NewData.city, 
prospects.zipcode = NewData.zipcode,
prospects.total_cars = NewData.total_cars 
WHEN NOT MATCHED BY SOURCE THEN DELETE;
GO


I am able to write a statement that will compare rows with their preceding value (which I am using to, for example, fill down values) but will this help if the duplicated rows could be in either order?

Is This A Good Question/Topic? 0
  • +

Replies To: merge duplicates

#2 cfoley   User is offline

  • Cabbage
  • member icon

Reputation: 2392
  • View blog
  • Posts: 5,025
  • Joined: 11-December 07

Re: merge duplicates

Posted 19 July 2018 - 03:24 PM

If I understand you;r question, then I think this will detect all the rows to delete, and put whichever part of the join has null to the alias a

SELECT a.*, b.*
FROM prospects a
JOIN prospects b ON a.Date = b.Date
WHERE a.HistoryId <> b.HistoryId
AND a.TargetPrice IS NULL


If so then this might delete those rows.

DELETE 
FROM prospects a
JOIN prospects b ON a.Date = b.Date
WHERE a.HistoryId <> b.HistoryId
AND a.TargetPrice IS NULL


Proceed with caution. I have no idea what I am doing.
Was This Post Helpful? 2
  • +
  • -

#3 modi123_1   User is online

  • Suitor #2
  • member icon



Reputation: 15356
  • View blog
  • Posts: 61,554
  • Joined: 12-June 08

Re: merge duplicates

Posted 19 July 2018 - 04:02 PM

Is it just on the date, or which columns need to match before merging?
Was This Post Helpful? 0
  • +
  • -

#4 andrewsw   User is offline

  • never lube your breaks
  • member icon

Reputation: 6823
  • View blog
  • Posts: 28,271
  • Joined: 12-December 12

Re: merge duplicates

Posted 20 July 2018 - 12:45 AM

Hello and thank you both.

I posted this a little while back and the specification changed slightly. Essentially, there could be more than one entry for a particular company on a particular day and the requirement is just to keep the most recent entry - merging of values is not necessary.

Because we have no time element I can use the HistoryId to determine the 'latest' entry. The following deletes records 'below' the latest entry:

DELETE FROM tblSome
    WHERE HistoryId NOT IN(SELECT Max(HistoryId) 
    FROM tblSome GROUP BY CompanyId, [Date])


Quote

Proceed with caution. I have no idea what I am doing.

Is that a Trumpism?
Was This Post Helpful? 1
  • +
  • -

#5 cfoley   User is offline

  • Cabbage
  • member icon

Reputation: 2392
  • View blog
  • Posts: 5,025
  • Joined: 11-December 07

Re: merge duplicates

Posted 20 July 2018 - 12:00 PM

It's more a reflection of my terror when running SQL against production systems.
Was This Post Helpful? 1
  • +
  • -

#6 baavgai   User is offline

  • Dreaming Coder
  • member icon


Reputation: 7500
  • View blog
  • Posts: 15,541
  • Joined: 16-October 07

Re: merge duplicates

Posted 21 July 2018 - 01:51 AM

Hmm... because I'm bored and because IN makes me twitch.

If these are the ones you want to keep:
SELECT CompanyId, [Date], Max(HistoryId) as HistoryId
  FROM tblSome
  GROUP BY CompanyId, [Date]



Then these are the ones you want to kill:
SELECT a.*
  FROM tblSome a
    LEFT JOIN (
      SELECT Max(HistoryId) as HistoryId
        FROM tblSome
        GROUP BY CompanyId, [Date]
    ) b ON a.HistoryId=b.HistoryId
  WHERE b.HistoryId IS NULL




And this is how they die:
DELETE tblSome
  FROM tblSome a
    LEFT JOIN (
      SELECT Max(HistoryId) as HistoryId
        FROM tblSome
        GROUP BY CompanyId, [Date]
    ) b ON a.HistoryId=b.HistoryId
  WHERE b.HistoryId IS NULL


Was This Post Helpful? 2
  • +
  • -

Page 1 of 1