Exploring a List Without Duplicates

#1 andrewsw

Posted 16 April 2016 - 05:15 AM

A Not Quite Perfect Set

I was exploring the possibility of creating a "set" for students. That is, a List that will not allow duplicate entries, where a duplicate is identified by some combination of properties. For this example (prompted by a recent thread) the combination of FirstName, LastName and MobileNo identifies a duplicate.

A more direct way to achieve this might be to explore HashSet(T) or (from .NET 4) SortedSet(T). These should be preferred for performance reasons.
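As a point of comparison, here is a minimal sketch of the HashSet route. The module name HashSetSketch and the comparer name StudentKeyComparer are made up for this illustration; only HashSet(Of T), its comparer-accepting constructor and the Boolean returned by Add come from the framework.

```vbnet
Imports System
Imports System.Collections.Generic

Module HashSetSketch

    Class Student
        Public Property FirstName As String
        Public Property LastName As String
        Public Property MobileNo As String
    End Class

    ' Hypothetical comparer: students are equal when all three properties match.
    Class StudentKeyComparer
        Implements IEqualityComparer(Of Student)

        Public Overloads Function Equals(x As Student, y As Student) As Boolean _
            Implements IEqualityComparer(Of Student).Equals
            ' If either side is Nothing they are equal only when both are.
            If x Is Nothing OrElse y Is Nothing Then Return x Is y
            Return x.FirstName = y.FirstName AndAlso
                   x.LastName = y.LastName AndAlso
                   x.MobileNo = y.MobileNo
        End Function

        Public Overloads Function GetHashCode(obj As Student) As Integer _
            Implements IEqualityComparer(Of Student).GetHashCode
            If obj Is Nothing Then Return 0
            ' Built from the same three properties that Equals compares.
            Return (obj.FirstName & "|" & obj.LastName & "|" & obj.MobileNo).GetHashCode()
        End Function
    End Class

    Sub Main()
        Dim students As New HashSet(Of Student)(New StudentKeyComparer())

        ' Add returns False, rather than throwing, when the item is already present.
        Dim firstAdded As Boolean = students.Add(
            New Student With {.FirstName = "Bob", .LastName = "Bones", .MobileNo = "34"})
        Dim secondAdded As Boolean = students.Add(
            New Student With {.FirstName = "Bob", .LastName = "Bones", .MobileNo = "34"})

        Console.WriteLine(firstAdded)       ' True
        Console.WriteLine(secondAdded)      ' False
        Console.WriteLine(students.Count)   ' 1
    End Sub
End Module
```

Note that a HashSet does not preserve a List's indexing, which is one reason a List-based exploration can still be worth doing.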



I am not proposing my approach as a perfect, or complete, solution. There are many ways to approach this, and much more that could be added. This is an exploration. It is a tutorial though (rather than a blog entry, which I considered) as there are a number of things that can be learnt from it, particularly concerning the use of an IEqualityComparer and INotifyPropertyChanged.

Some useful pages:

C#: LINQ and IEqualityComparer
IEquatable(T).Equals Method (T) (System)
Walkthrough: Creating Your Own Collection Class
Having fun with custom collections! - CodeProject

Collection - In Depth (C#)

There's a lot of C# at these links, but I'm sure you could find equivalent VB versions if you needed to and searched further.



I'm using the IEqualityComparer interface to specify the combination that counts as a duplicate, and then injecting this comparer into a (slightly customized) List(Of Student). This approach was prompted by the MSDN page. I believe this is dependency injection; more specifically, constructor injection. So it might be possible to pursue these ideas to create a custom generic collection that behaves as a Set, where the comparer specifies which combination of properties identifies a duplicate for our current, particular, Set. For our example, we could create and use a different comparer and the code would still work. (This isn't something you would typically need to do with a list of "things" (entities), but it would demonstrate the principles.)
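To make the "different comparer, same code" point concrete, here is a rough sketch of a generic variant. UniqueList(Of T) and TryAdd are hypothetical names invented for this example, and the built-in StringComparer instances stand in for custom comparers purely to keep the sketch short.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module GenericSetSketch

    ' Hypothetical generic list that rejects duplicates, where "duplicate"
    ' is whatever the injected comparer says it is.
    Class UniqueList(Of T)
        Inherits List(Of T)

        Private ReadOnly _comparer As IEqualityComparer(Of T)

        ' Constructor injection: the list cannot exist without a comparer.
        Public Sub New(comparer As IEqualityComparer(Of T))
            If comparer Is Nothing Then Throw New ArgumentNullException("comparer")
            _comparer = comparer
        End Sub

        Public Overloads Sub Add(item As T)
            ' Enumerable.Contains (LINQ) is the overload that accepts a comparer.
            If Enumerable.Contains(Me, item, _comparer) Then
                Throw New ArgumentException("Item already exists.")
            End If
            MyBase.Add(item)
        End Sub
    End Class

    ' Helper for the demo: reports whether the add was accepted.
    Function TryAdd(list As UniqueList(Of String), value As String) As Boolean
        Try
            list.Add(value)
            Return True
        Catch ex As ArgumentException
            Return False
        End Try
    End Function

    Sub Main()
        ' Same class, two different notions of "duplicate".
        Dim ignoreCase As New UniqueList(Of String)(StringComparer.OrdinalIgnoreCase)
        Dim exact As New UniqueList(Of String)(StringComparer.Ordinal)

        ignoreCase.Add("Bones")
        exact.Add("Bones")

        Console.WriteLine(TryAdd(ignoreCase, "BONES"))  ' False
        Console.WriteLine(TryAdd(exact, "BONES"))       ' True
    End Sub
End Module
```

One caveat with this design: an Overloads (shadowing) Add is bypassed when the object is used through a List(Of T) or IList(Of T) reference, and AddRange and Insert are not guarded at all. Inheriting from Collection(Of T) and overriding InsertItem is probably a sturdier base if the idea is taken further.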

The code follows; it is a Console Application. The StudentEqualityComparer supplies the Equals method, and an instance of it is injected into the StudentList class. The overloaded (or shadowing) Add method of this class first calls Contains, passing the comparer, to check whether the new Student already exists in the list according to the combination FirstName, LastName, MobileNo. If the student already exists then an ArgumentException is thrown; otherwise, the base-class Add method is called and the Student is added to the list.
Module Module1

    Sub Main()
        Dim comparer As New StudentEqualityComparer()
        Dim students As New StudentList(comparer)
        Dim person As Student = Nothing
        Try
            'students.Add(New Object())  'InvalidCastException

            person = New Student() With {.ID = 1, .FirstName = "Bob", .LastName = "Bones", .MobileNo = "34"}
            students.Add(person)
            person = New Student() With {.ID = 2, .FirstName = "Mary", .LastName = "Jones", .MobileNo = "37"}
            students.Add(person)
            person = New Student() With {.ID = 4, .FirstName = "Bob", .LastName = "Bones", .MobileNo = "34"}
            students.Add(person)
            'the following person won't be added either
            person = New Student() With {.ID = 7, .FirstName = "Terry", .LastName = "Lean", .MobileNo = "20"}
            students.Add(person)

        Catch ae As ArgumentException
            'ToString() gives both branches of If the same type (needed with Option Strict On)
            Console.WriteLine("{0} not added to list,", If(person Is Nothing, "-", person.ToString()))
            Console.WriteLine("Reason: {0}", ae.Message)
        End Try

        Console.WriteLine("There are {0} students.", students.Count)

        Console.ReadKey()
    End Sub

    Class Student
        Property ID As Integer
        Property FirstName As String
        Property LastName As String
        Property MobileNo As String

        Public Overrides Function ToString() As String
            Return String.Format("{0}, {1} (M: {2})", Me.LastName, Me.FirstName, Me.MobileNo)
        End Function
    End Class

    Class StudentEqualityComparer
        Implements IEqualityComparer(Of Student)

        'Overloads avoids the warning about shadowing Object.Equals
        Public Overloads Function Equals(x As Student, y As Student) As Boolean Implements IEqualityComparer(Of Student).Equals
            If x Is Nothing AndAlso y Is Nothing Then
                Return True
            ElseIf x Is Nothing OrElse y Is Nothing Then
                Return False
            End If
            'duplicates share all three identifying properties
            Return x.FirstName = y.FirstName AndAlso
                   x.LastName = y.LastName AndAlso
                   x.MobileNo = y.MobileNo
        End Function

        Public Overloads Function GetHashCode(obj As Student) As Integer Implements IEqualityComparer(Of Student).GetHashCode
            'built from the same properties that Equals compares
            Return (obj.FirstName & obj.LastName & obj.MobileNo).GetHashCode()
        End Function
    End Class

    Class StudentList
        Inherits List(Of Student)

        Private _comparer As IEqualityComparer(Of Student)

        Private Sub New()
            'hide default constructor
        End Sub

        Public Sub New(comparer As IEqualityComparer(Of Student))
            _comparer = comparer
        End Sub

        Public Overloads Sub Add(ByVal obj As Student)
            If MyBase.Contains(obj, _comparer) Then
                'Contains will use the supplied EqualityComparer
                Throw New ArgumentException("Student already exists.")
            Else
                MyBase.Add(obj)
            End If
        End Sub
    End Class
End Module


Notice that the whole sequence of new students is added in a single Try block, so the Exception raised for the duplicate also skips every student that follows it. In reality, students would either be added one by one, or a batch of new students might be wrapped in a transaction, so that if one fails (to be added) they all fail; that is, the transaction is rolled back.

This works, and I was happy with it. It then occurred to me that there is nothing to prevent a student record being edited so that it then becomes a duplicate. The code in the spoiler does just this.

Spoiler
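The spoiler's contents are not reproduced here. The following is only a guess at the kind of code it held, modelled on the edit that appears in the full listing further down; the module name EditCreatesDuplicate is invented for this sketch, and the classes are trimmed copies of the ones above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module EditCreatesDuplicate

    Class Student
        Public Property ID As Integer
        Public Property FirstName As String
        Public Property LastName As String
        Public Property MobileNo As String
    End Class

    Class StudentEqualityComparer
        Implements IEqualityComparer(Of Student)

        Public Overloads Function Equals(x As Student, y As Student) As Boolean _
            Implements IEqualityComparer(Of Student).Equals
            If x Is Nothing OrElse y Is Nothing Then Return x Is y
            Return x.FirstName = y.FirstName AndAlso
                   x.LastName = y.LastName AndAlso
                   x.MobileNo = y.MobileNo
        End Function

        Public Overloads Function GetHashCode(obj As Student) As Integer _
            Implements IEqualityComparer(Of Student).GetHashCode
            Return (obj.FirstName & obj.LastName & obj.MobileNo).GetHashCode()
        End Function
    End Class

    Class StudentList
        Inherits List(Of Student)

        Private ReadOnly _comparer As IEqualityComparer(Of Student)

        Public Sub New(comparer As IEqualityComparer(Of Student))
            _comparer = comparer
        End Sub

        Public Overloads Sub Add(obj As Student)
            If Enumerable.Contains(Me, obj, _comparer) Then
                Throw New ArgumentException("Student already exists.")
            End If
            MyBase.Add(obj)
        End Sub
    End Class

    Sub Main()
        Dim comparer As New StudentEqualityComparer()
        Dim students As New StudentList(comparer)
        students.Add(New Student With {.ID = 1, .FirstName = "Bob", .LastName = "Bones", .MobileNo = "34"})
        students.Add(New Student With {.ID = 4, .FirstName = "Bob", .LastName = "Bones", .MobileNo = "300"})

        ' Editing an existing record never goes through Add, so nothing objects.
        students.Single(Function(x) x.ID = 4).MobileNo = "34"

        Console.WriteLine(students.Count)                       ' 2
        Console.WriteLine(students.Distinct(comparer).Count())  ' 1
    End Sub
End Module
```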


To prevent this duplication I decided to implement INotifyPropertyChanged on the Student class, and print a message (or raise an Exception) if editing of a record creates a duplicate.

Basically, with INotifyPropertyChanged, each Student will notify (broadcast) to anyone who is listening that one of its properties has changed.

This is a better, event-driven, approach than writing code in an 'Edit' method to check all the existing records for a duplicate. Such a method would have to store all the revised details first, check for a duplicate, and decide what to do next if one is found. If we later wanted a slightly different edit method we'd have to reproduce a similar sequence, which would likely lead to inconsistencies (and errors). With an event-driven approach we can throw an Exception whenever an edit would create a duplicate record. The code that attempted the edit can catch the Exception and do whatever it needs to get back on track: ask the user again, exit the current code, or even exit the application.
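The idea of catching the Exception at the point of the edit can be sketched as follows. This is an illustration only: RejectThirtyFour is a made-up stand-in for the list's duplicate check, and TryEdit is a hypothetical helper. It relies on the fact that RaiseEvent calls its handlers synchronously, so an Exception thrown in a handler surfaces at the line that set the property.

```vbnet
Imports System
Imports System.ComponentModel

Module EventExceptionSketch

    Class Student
        Implements INotifyPropertyChanged

        Public Event PropertyChanged As PropertyChangedEventHandler _
            Implements INotifyPropertyChanged.PropertyChanged

        Private _mobileNo As String

        Public Property MobileNo As String
            Get
                Return _mobileNo
            End Get
            Set(value As String)
                If value <> _mobileNo Then
                    _mobileNo = value
                    ' Handlers run here, before the setter returns.
                    RaiseEvent PropertyChanged(Me, New PropertyChangedEventArgs("MobileNo"))
                End If
            End Set
        End Property
    End Class

    ' Hypothetical listener: pretends that mobile "34" already belongs to someone.
    Sub RejectThirtyFour(sender As Object, e As PropertyChangedEventArgs)
        If DirectCast(sender, Student).MobileNo = "34" Then
            Throw New InvalidOperationException("Edit would create a duplicate.")
        End If
    End Sub

    ' The editing code wraps the assignment and decides how to recover.
    Function TryEdit(target As Student, mobile As String) As String
        Try
            target.MobileNo = mobile
            Return "ok"
        Catch ex As InvalidOperationException
            Return ex.Message
        End Try
    End Function

    Sub Main()
        Dim bob As New Student With {.MobileNo = "300"}
        AddHandler bob.PropertyChanged, AddressOf RejectThirtyFour

        Console.WriteLine(TryEdit(bob, "301"))  ' ok
        Console.WriteLine(TryEdit(bob, "34"))   ' Edit would create a duplicate.
        Console.WriteLine(bob.MobileNo)         ' 34
    End Sub
End Module
```

The last line shows the catch: by the time the handler runs the new value has already been stored, so throwing reports the problem but does not undo the edit.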

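[A Console Application isn't ideal for demonstrating this approach. RaiseEvent calls its handlers synchronously, so the check itself runs before the property setter returns, but a console program has no natural place to report the problem and let the user try again.]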

It is the list class (StudentList) that needs to check for duplicates, as it holds the list of entries, which is why I used AddHandler for every new Student added to the list. With more formal data binding this (individually adding handlers) shouldn't be necessary, as changes can propagate through the list (depending on the collection-type).

Here's the full code. Don't forget to add Imports System.ComponentModel at the beginning for INotifyPropertyChanged.
Imports System.ComponentModel   'INotifyPropertyChanged

Module Module1

    Sub Main()
        Dim comparer As New StudentEqualityComparer()
        Dim students As New StudentList(comparer)
        Dim person As Student = Nothing
        Try
            'students.Add(New Object())  'InvalidCastException

            person = New Student() With {.ID = 1, .FirstName = "Bob", .LastName = "Bones", .MobileNo = "34"}
            students.Add(person)
            person = New Student() With {.ID = 2, .FirstName = "Mary", .LastName = "Jones", .MobileNo = "37"}
            students.Add(person)
            person = New Student() With {.ID = 4, .FirstName = "Bob", .LastName = "Bones", .MobileNo = "34"}
            students.Add(person)
            'the following person won't be added either
            person = New Student() With {.ID = 7, .FirstName = "Terry", .LastName = "Lean", .MobileNo = "20"}
            students.Add(person)

        Catch ae As ArgumentException
            'ToString() gives both branches of If the same type (needed with Option Strict On)
            Console.WriteLine("{0} not added to list,", If(person Is Nothing, "-", person.ToString()))
            Console.WriteLine("Reason: {0}", ae.Message)
        End Try

        Console.WriteLine("There are {0} students.", students.Count)

        'now add the two missing students; Bob gets a different mobile so that he is no longer a duplicate
        person = New Student() With {.ID = 4, .FirstName = "Bob", .LastName = "Bones", .MobileNo = "300"}
        students.Add(person)
        person = New Student() With {.ID = 7, .FirstName = "Terry", .LastName = "Lean", .MobileNo = "20"}
        students.Add(person)

        Console.WriteLine("There are {0} students.", students.Count)        '4

        students.Single(Function(x) x.ID = 4).MobileNo = "34"

        Console.ReadKey()
    End Sub

    Class Student : Implements INotifyPropertyChanged

        Public Event PropertyChanged As PropertyChangedEventHandler Implements INotifyPropertyChanged.PropertyChanged

        Private _ID As Integer
        Private _FirstName As String
        Private _LastName As String
        Private _MobileNo As String

        Public Property ID() As Integer
            Get
                Return _ID
            End Get
            Set(ByVal value As Integer)
                If value <> _ID Then
                    _ID = value
                    NotifyPropertyChanged("ID")
                End If
            End Set
        End Property

        Public Property FirstName() As String
            Get
                Return _FirstName
            End Get
            Set(ByVal value As String)
                If value <> _FirstName Then
                    _FirstName = value
                    NotifyPropertyChanged("FirstName")
                End If
            End Set
        End Property

        Public Property LastName() As String
            Get
                Return _LastName
            End Get
            Set(ByVal value As String)
                If value <> _LastName Then
                    _LastName = value
                    NotifyPropertyChanged("LastName")
                End If
            End Set
        End Property

        Public Property MobileNo() As String
            Get
                Return _MobileNo
            End Get
            Set(ByVal value As String)
                If value <> _MobileNo Then
                    _MobileNo = value
                    NotifyPropertyChanged("MobileNo")
                End If
            End Set
        End Property

        Private Sub NotifyPropertyChanged(ByVal info As String)
            RaiseEvent PropertyChanged(Me, New PropertyChangedEventArgs(info))
        End Sub

        Public Overrides Function ToString() As String
            Return String.Format("{0}, {1} (M: {2})", Me.LastName, Me.FirstName, Me.MobileNo)
        End Function
    End Class

    Class StudentEqualityComparer : Implements IEqualityComparer(Of Student)

        'Overloads avoids the warning about shadowing Object.Equals
        Public Overloads Function Equals(x As Student, y As Student) As Boolean Implements IEqualityComparer(Of Student).Equals
            If x Is Nothing AndAlso y Is Nothing Then
                Return True
            ElseIf x Is Nothing OrElse y Is Nothing Then
                Return False
            End If
            'duplicates share all three identifying properties
            Return x.FirstName = y.FirstName AndAlso
                   x.LastName = y.LastName AndAlso
                   x.MobileNo = y.MobileNo
        End Function

        Public Overloads Function GetHashCode(obj As Student) As Integer Implements IEqualityComparer(Of Student).GetHashCode
            'built from the same properties that Equals compares
            Return (obj.FirstName & obj.LastName & obj.MobileNo).GetHashCode()
        End Function
    End Class

    Class StudentList : Inherits List(Of Student)

        Private _comparer As IEqualityComparer(Of Student)

        Private Sub New()
            'hide default constructor
        End Sub

        Public Sub New(comparer As IEqualityComparer(Of Student))
            _comparer = comparer
        End Sub

        Public Overloads Sub Add(ByVal obj As Student)
            If MyBase.Contains(obj, _comparer) Then
                'Contains will use the supplied EqualityComparer
                Throw New ArgumentException("Student already exists.")
            Else
                MyBase.Add(obj)
                AddHandler obj.PropertyChanged, AddressOf CheckDuplicates
            End If
        End Sub

        Private Sub CheckDuplicates(sender As Object, e As PropertyChangedEventArgs)
            Dim current As Student = DirectCast(sender, Student)
            'Dim counter As Integer = MyBase.Where(Function(x)
            '                                          Return x.FirstName = current.FirstName AndAlso
            '                                            x.LastName = current.LastName AndAlso
            '                                            x.MobileNo = current.MobileNo
            '                                      End Function).Count()

            If MyBase.Distinct(_comparer).Count < MyBase.Count Then
                Debug.Print("With {0} {1} (M: {2}) there are now duplicates.",
                            current.FirstName, current.LastName, current.MobileNo)
                'or throw exception
            End If
        End Sub
    End Class
End Module


Apart from implementing INotifyPropertyChanged in the Student class, the significant changes are in the StudentList class. Firstly, in the Add method:
            MyBase.Add(obj)
            AddHandler obj.PropertyChanged, AddressOf CheckDuplicates


then the CheckDuplicates method:
        Private Sub CheckDuplicates(sender As Object, e As PropertyChangedEventArgs)
            Dim current As Student = DirectCast(sender, Student)
            'Dim counter As Integer = MyBase.Where(Function(x)
            '                                          Return x.FirstName = current.FirstName AndAlso
            '                                            x.LastName = current.LastName AndAlso
            '                                            x.MobileNo = current.MobileNo
            '                                      End Function).Count()

            If MyBase.Distinct(_comparer).Count < MyBase.Count Then
                Debug.Print("With {0} {1} (M: {2}) there are now duplicates.",
                            current.FirstName, current.LastName, current.MobileNo)
                'or throw exception
            End If
        End Sub


(The code still allows the duplication; I'm just printing a message to the Debug/Immediate Window to prove that the duplication was discovered. In reality, an Exception would be thrown. Because the handler runs synchronously inside the property setter, that Exception would surface at the line that made the edit, and this simple Console Application has no Try block there to handle it sensibly.)

As you can see, I've commented out the use of the Where() enumerable method and am using Distinct() instead. The predicate in Where() replicates the comparer that we've already built, which is not what we want. The Distinct method re-uses the comparer that we've injected to discover duplicates. If the number of distinct records is smaller than the total number of records then there must be some duplicates.

This is not quite perfect in my book. The use of Distinct (or GroupBy) is quite common for this type of exercise. However, we already know which record is being edited (current), so I'd rather we were just looking for duplicates for this particular Student, and stop when we've found one. This is possible, but requires a more extensive example (and more interfaces implemented) than this tutorial provides. (It is "beyond the scope of this tutorial", as they say.)
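For what it's worth, one simple way to look only for a duplicate of the current record, stopping at the first match, is to combine Any() with the injected comparer. This is just a sketch and not necessarily the more extensive design alluded to above; HasDuplicateOf is a hypothetical helper, and strings with a case-insensitive comparer stand in for students to keep it short.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module FirstDuplicateSketch

    ' Hypothetical helper: True as soon as some OTHER item equals current
    ' according to the supplied comparer. Any() stops at the first match.
    Function HasDuplicateOf(Of T As Class)(items As IEnumerable(Of T),
                                           current As T,
                                           comparer As IEqualityComparer(Of T)) As Boolean
        ' IsNot skips the record being edited, which is itself in the list.
        Return items.Any(Function(x) x IsNot current AndAlso comparer.Equals(x, current))
    End Function

    Sub Main()
        Dim a As String = "Bones"
        Dim b As String = "BONES"
        Dim c As String = "Jones"
        Dim names As New List(Of String) From {a, b, c}

        Console.WriteLine(HasDuplicateOf(Of String)(names, a, StringComparer.OrdinalIgnoreCase))  ' True
        Console.WriteLine(HasDuplicateOf(Of String)(names, c, StringComparer.OrdinalIgnoreCase))  ' False
    End Sub
End Module
```

Inside CheckDuplicates the call would presumably look like HasDuplicateOf(Me, current, _comparer), which re-uses the injected comparer just as Distinct does.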

It also isn't perfect (or complete) because, although we can throw an Exception, we still haven't incorporated a way to reject the edits for the current Student that 'would' create a duplicate record. We can retrieve the ID to know which Student we are referring to, but we need to store and reinstate the previous details. I'll leave you to consider this. (Should we reject the one value that creates a duplicate, or all of the edited details?) In a 'real' application though, there would be a database or other persistent storage involved, and so recovery of the original details is likely to be easier.

I hope that you found this interesting, and that it encourages you to explore further.
