Help Comparing Objects

36 Replies - 1153 Views - Last Post: 13 December 2018 - 08:29 PM

#31 Skydiver

  • Code herder

Reputation: 6765
  • Posts: 23,069
  • Joined: 05-May 12

Posted 11 December 2018 - 10:49 PM

I finally had a chance to look at the code in post #26. As you've seen from my conclusions in post #25, I don't really think you need to do any reflection. Reflection is slow. Avoid it if you can. But if you really feel that you must use reflection, you can cache the results of the GetProperty calls for later use (e.g. lines 44-50). It's not like the types or property types of the objects being compared are going to change on the fly.
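As a rough illustration of the caching idea, here is a minimal sketch. The PropertyCache class and its method names are hypothetical and are not taken from the code in post #26; it assumes the properties are looked up by name at runtime.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Hypothetical helper: remembers PropertyInfo lookups per (type, property name)
// so that GetProperty is called at most once per distinct pair.
public static class PropertyCache
{
    private static readonly ConcurrentDictionary<(Type, string), PropertyInfo> Cache =
        new ConcurrentDictionary<(Type, string), PropertyInfo>();

    // Returns the public property with that name, or null if the type has none.
    public static PropertyInfo Get(Type type, string name)
    {
        return Cache.GetOrAdd((type, name), key => key.Item1.GetProperty(key.Item2));
    }

    // Reads the named property from the instance; null instance or unknown property gives null.
    public static object GetValue(object instance, string name)
    {
        if (instance == null)
        {
            return null;
        }

        PropertyInfo property = Get(instance.GetType(), name);
        return property?.GetValue(instance);
    }
}
```

A call such as PropertyCache.GetValue(student, "SSN") then pays for the reflection lookup only the first time that type and property name are seen.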

Calling Equals() on line 52 may accidentally force boxing/unboxing, slowing things down further. The recommended way is to first check whether the value implements IEquatable<T> or IComparable<T> and then use those interfaces instead of calling Object.Equals() right off the bat. Again, the results of checking for IEquatable<T> or IComparable<T> support can be cached for later use.
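One way to get this behavior without hand-written interface checks is EqualityComparer<T>.Default, which uses IEquatable<T> when T implements it and otherwise falls back to Object.Equals. The sketch below assumes the values are available with their static type T; values returned by PropertyInfo.GetValue are already boxed, so it would not help there. The Ssn struct and TypedComparer class are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type used only for illustration.
public readonly struct Ssn : IEquatable<Ssn>
{
    public Ssn(string digits)
    {
        Digits = digits;
    }

    public string Digits { get; }

    // Strongly typed comparison: the argument is not boxed.
    public bool Equals(Ssn other)
    {
        return string.Equals(Digits, other.Digits, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return obj is Ssn other && Equals(other);
    }

    public override int GetHashCode()
    {
        return Digits == null ? 0 : Digits.GetHashCode();
    }
}

public static class TypedComparer
{
    // EqualityComparer<T>.Default picks IEquatable<T>.Equals when T implements it.
    public static bool AreEqual<T>(T left, T right)
    {
        return EqualityComparer<T>.Default.Equals(left, right);
    }
}
```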

#32 Skydiver

Posted 11 December 2018 - 11:20 PM

Here are my thoughts so far:

If we had M HighSchoolStudents, N MiddleSchoolStudents, and F Fields, on the surface it seems inevitable that you'll need to do MxN student comparisons, and each of those student comparisons will involve F fields, to get us MxNxF total comparisons. I think that this is part of what is making this look so complex and overwhelming.

For now let's say that those F field comparisons are a fixed cost O(1), so that leaves us with MxN student comparisons, which would be O(n²) complexity. So I'm thinking, what if that inevitable O(n²) complexity were operating on a much smaller n than our original M or N? What if, as a pre-processing step, we picked the highest weight field (preferably a field that was always a straight equality comparison and didn't need fuzzy comparisons), and then sorted both lists by a canonical value of that field?

For example, say we picked SSN as something that absolutely must match, but we allow fuzzy matching. Let's say that the canonical form of an SSN is just all digits with the dashes stripped. So we basically run through all the items in both lists, compute the canonical SSN, and then sort by it. Now as we process all the HighSchoolStudents and look at the canonical SSN, if we see that it starts with a 1 but the first MiddleSchoolStudent's canonical SSN starts with a 3, then we know that we have no hope of matching this particular HighSchoolStudent and we can move on to the next one. This also applies to ranges. If the HighSchoolStudent's canonical SSN starts with a 3, we only need to compare the MiddleSchoolStudents that also start with 3. Once we hit the ones that start with 4, we've exhausted the potential matches.
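The sort-then-scan idea can be sketched roughly as follows. This is only an illustration under stated assumptions: SortedMatcher and its methods are hypothetical names, the inputs are raw SSN strings rather than student objects, and a "candidate" here means the canonical SSNs are exactly equal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedMatcher
{
    // Canonical form assumed in the post: ASCII digits only, everything else stripped.
    public static string CanonicalSsn(string ssn)
    {
        return new string((ssn ?? string.Empty).Where(c => c >= '0' && c <= '9').ToArray());
    }

    // Sorts both sides by canonical SSN, then walks them together so that each
    // left item is only compared against the right items that can still match.
    public static List<(string Left, string Right)> FindCandidates(
        IEnumerable<string> leftSsns, IEnumerable<string> rightSsns)
    {
        var left = leftSsns.Select(s => (Key: CanonicalSsn(s), Raw: s))
                           .OrderBy(x => x.Key, StringComparer.Ordinal).ToList();
        var right = rightSsns.Select(s => (Key: CanonicalSsn(s), Raw: s))
                             .OrderBy(x => x.Key, StringComparer.Ordinal).ToList();

        var result = new List<(string Left, string Right)>();
        int i = 0, j = 0;
        while (i < left.Count && j < right.Count)
        {
            int order = string.CompareOrdinal(left[i].Key, right[j].Key);
            if (order < 0)
            {
                i++;            // left key is too small to match anything that remains
            }
            else if (order > 0)
            {
                j++;            // right key is too small to match anything that remains
            }
            else
            {
                // Pair this left item with the whole run of equal keys on the right.
                for (int k = j; k < right.Count && right[k].Key == left[i].Key; k++)
                {
                    result.Add((left[i].Raw, right[k].Raw));
                }
                i++;
            }
        }
        return result;
    }
}
```

In real code the tuples would carry the student objects instead of the raw strings, and each candidate pair would then go on to the remaining field comparisons.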

As a side note, databases are really good at this type of thing for trimming down the potential candidates. If you are allowed to create temporary database tables, then let the DB do the hard work of sorting. All you need to do is ensure that an index for that canonical field value exists, and ask for the data back sorted by that field. Better yet, you can add query criteria to get back an even smaller set.

Once you have a potential match via canonical values, you narrow down whether it is an exact match. Then you move on to compare the other fields of the potential match and either score as you go along, or discard the potential match because a field doesn't match at all.

A note about the canonical values discussed above: If the field doesn't even have UseFuzzy enabled, then there isn't even a need to compute a canonical value. Simply sort by that field, and then you'll know when you've reached the point where the chance of matching has definitely gone to zero.

#33 Skydiver

Posted 11 December 2018 - 11:31 PM

To go further with that idea of reducing the number of records that need to be compared, it would be great if the 100% matches could be removed from the lists once a 100% match has been found. There is no point comparing against a student if that student has already been classified as a match.

And then originally, we said just consider the F comparisons to be a fixed cost. But if we can cause the comparisons to short circuit early, then not all F comparisons need to be done, thereby saving time.
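A minimal sketch of short-circuiting, assuming each field has a weight and a flag saying whether it absolutely must match. FieldRule and ShortCircuitScorer are hypothetical names; the thread's real field class is not shown in full, so treat this as one possible shape, not the actual design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rule type for illustration only.
public sealed class FieldRule<T>
{
    public FieldRule(string name, int weight, bool mustMatch, Func<T, T, bool> matches)
    {
        Name = name;
        Weight = weight;
        MustMatch = mustMatch;
        Matches = matches;
    }

    public string Name { get; }
    public int Weight { get; }
    public bool MustMatch { get; }
    public Func<T, T, bool> Matches { get; }
}

public static class ShortCircuitScorer
{
    // Do this once, outside the comparison loop: must-match rules first, heaviest first.
    public static List<FieldRule<T>> OrderForShortCircuit<T>(IEnumerable<FieldRule<T>> rules)
    {
        return rules.OrderByDescending(r => r.MustMatch)
                    .ThenByDescending(r => r.Weight)
                    .ToList();
    }

    // Returns the accumulated weight, or null as soon as a must-match field differs,
    // so the remaining fields are never evaluated for a hopeless pair.
    public static int? Score<T>(T left, T right, IReadOnlyList<FieldRule<T>> orderedRules)
    {
        int score = 0;
        foreach (var rule in orderedRules)
        {
            if (rule.Matches(left, right))
            {
                score += rule.Weight;
            }
            else if (rule.MustMatch)
            {
                return null;
            }
        }
        return score;
    }
}
```

Putting the must-match fields first means that a pair which cannot match is usually rejected after a single field comparison instead of all F.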

And lastly, once the data is sorted, you have opportunities for parallel data processing, since you can have one thread checking all the SSNs that start with 1, another checking all that start with 2, etc.
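As a sketch of that partitioning, the following groups canonical keys by their first character and processes each group with Parallel.ForEach. BucketedMatcher and the isMatch delegate are hypothetical. It also assumes that a fuzzy match never changes the first character of the canonical key; if it can, this bucketing would miss matches.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BucketedMatcher
{
    // Compares every left key only against the right keys that share its first character,
    // handling the buckets in parallel.
    public static List<(string Left, string Right)> Match(
        IEnumerable<string> leftKeys,
        IEnumerable<string> rightKeys,
        Func<string, string, bool> isMatch)
    {
        // The lookup is built once and only read afterwards, so sharing it across threads is safe.
        var rightBuckets = rightKeys.Where(k => !string.IsNullOrEmpty(k)).ToLookup(k => k[0]);
        var leftBuckets = leftKeys.Where(k => !string.IsNullOrEmpty(k)).GroupBy(k => k[0]).ToList();

        var matches = new ConcurrentBag<(string Left, string Right)>();
        Parallel.ForEach(leftBuckets, bucket =>
        {
            // The nested loop is still quadratic, but only within one bucket.
            foreach (var left in bucket)
            {
                foreach (var right in rightBuckets[bucket.Key])
                {
                    if (isMatch(left, right))
                    {
                        matches.Add((left, right));
                    }
                }
            }
        });

        // ConcurrentBag has no defined order, so sort for a stable result.
        return matches.OrderBy(m => m.Left, StringComparer.Ordinal)
                      .ThenBy(m => m.Right, StringComparer.Ordinal)
                      .ToList();
    }
}
```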

#34 fearfulsc2

  • D.I.C Head

Reputation: 15
  • Posts: 243
  • Joined: 25-May 16

Posted 12 December 2018 - 06:48 AM

The Fields from the Fields class will change on the fly depending on the group.

HighSchoolStudent and MiddleSchoolStudent are just one pair of the objects we compare.

We also have other pairs such as HighSchoolClasses and MiddleSchoolClasses, or HighSchoolStaff and MiddleSchoolStaff. They are part of a different Group, which will in turn have different possible fields.

Maybe I could cache them all?

Also, I had not thought about using IEquatable or IComparable.

As for the O(n²) complexity, I was thinking of possibly using LINQ to join the two objects together. But an issue may arise if the values are not equal character for character, for example SSN: 000-11-2222 vs. 000112222.
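A LINQ join can still work if both sides are joined on a canonical form of the key rather than on the raw text. The sketch below is only an illustration: JoinExample is a hypothetical name, the inputs are raw SSN strings, and "canonical" is assumed to mean digits only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class JoinExample
{
    // Assumed canonical form: keep ASCII digits, drop dashes, spaces and anything else.
    public static string Canonical(string ssn)
    {
        return new string((ssn ?? string.Empty).Where(c => c >= '0' && c <= '9').ToArray());
    }

    // Joins on the canonical key, so "000-11-2222" pairs with "000112222".
    public static List<(string High, string Middle)> JoinBySsn(
        IEnumerable<string> high, IEnumerable<string> middle)
    {
        return (from h in high
                join m in middle on Canonical(h) equals Canonical(m)
                select (High: h, Middle: m)).ToList();
    }
}
```

Enumerable.Join builds a hash lookup over the inner sequence, so this avoids comparing every item against every other item for the key itself.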

So maybe get both objects and then sort them by SSN or some other unique identifier and then look at the first character ranging from 0-9 or A-Z and put those in threads as you have stated.

That could work. Then anything that is a 100% match can be removed but still logged, and we only worry about the ones that don't match at all or that partially match but have different punctuation, etc.

Also, I took this code from an example I saw that did recursive comparison for complex types:

// Requires: using System; using System.Collections.Generic; using System.Linq;

// "Primitive" here means "compare with Equals() and do not recurse into it".
private static bool IsPrimitiveType(Type type)
{
    return
        type == typeof(object) ||
        type == typeof(String) ||
        type == typeof(Char) ||
        type == typeof(Boolean) ||
        type == typeof(Byte) ||
        type == typeof(Int16) ||
        type == typeof(Int32) ||
        type == typeof(Int64) ||
        type == typeof(UInt16) ||
        type == typeof(UInt32) ||
        type == typeof(UInt64) ||
        type == typeof(IntPtr) ||
        type == typeof(Single) ||
        type == typeof(Double) ||
        type == typeof(DateTime) ||
        type == typeof(Decimal);
}

// Returns the names (dotted for nested objects) of the properties whose values differ.
private static IList<string> GetDifferingProperties(object source, object target)
{
    var sourceType = source.GetType();
    var sourceProperties = sourceType.GetProperties();
    var targetType = target.GetType();
    var targetProperties = targetType.GetProperties();

    var result = new List<string>();

    foreach (var property in
        (from s in sourceProperties
         from t in targetProperties
         where s.Name == t.Name &&
         s.PropertyType == t.PropertyType &&
         // skip indexers: GetValue(obj, null) throws for them
         s.GetIndexParameters().Length == 0 &&
         t.GetIndexParameters().Length == 0 &&
         !Equals(s.GetValue(source, null), t.GetValue(target, null))
         select new { Source = s, Target = t }))
    {
        // primitive types are defined above.
        // Up to us to decide on what is primitive or not
        if (IsPrimitiveType(property.Source.PropertyType))
        {
            result.Add(property.Source.Name);
        }
        else
        {
            var sourceValue = property.Source.GetValue(source, null);
            var targetValue = property.Target.GetValue(target, null);

            // Equals() already said the values differ, so if either side is null
            // report the property itself instead of recursing into a null.
            if (sourceValue == null || targetValue == null)
            {
                result.Add(property.Source.Name);
                continue;
            }

            foreach (var subProperty in GetDifferingProperties(sourceValue, targetValue))
            {
                result.Add(property.Source.Name + "." + subProperty);
            }
        }
    }

    return result;
}


#35 fearfulsc2

Posted 12 December 2018 - 06:54 AM

What makes this really difficult is that I'm trying to make this as generic as possible, because we have many other objects we will be comparing, as I noted above. So while some objects can be compared using the SSN, we may have other objects that are compared by SchoolName, EIN (Employer Identification Number), Serial Number, and so on.

I am trying to make it as generic as possible, but I may have to do some method overloading for a few of these and write a generic compare class/method.

#36 fearfulsc2

Posted 13 December 2018 - 06:43 AM

Maybe someone can help me out with this? I am almost finished, but I am running into a problem when the list in one object is larger than the list in the other object:

if (highSchoolStudents == null || middleSchoolStudents == null)
{
    if (highSchoolStudents != null && middleSchoolStudents == null)
    {
        count = highSchoolStudents.Count();
        for (int i = 0; i < count; i++)
        {
            // loggingHistories is the object I am creating to log the differences/similarities, irrelevant for now
            loggingHistories = this.Compare(disparityFields, highSchoolStudents[i], null);
        }
    }
    else if (highSchoolStudents == null && middleSchoolStudents != null)
    {
        count = middleSchoolStudents.Count();
        for (int i = 0; i < count; i++)
        {
            loggingHistories = this.Compare(disparityFields, null, middleSchoolStudents[i]);
        }
    }
    else
    {
        disparityHistories = null;
    }
}
else
{
    if (highSchoolStudents.Count >= middleSchoolStudents.Count)
        count = highSchoolStudents.Count();
    else
        count = middleSchoolStudents.Count();

    // run into problems here when objects are different sizes.
    for (int i = 0; i < count; i++)
    {
        loggingHistories = this.Compare(disparityFields, highSchoolStudents[i], middleSchoolStudents[i]);
    }
}


private IList<LoggingHistory> Compare(IList<DisparityField> disparityFields, object highSchool, object middleSchool)
{
    IList<LoggingHistory> loggingHistories = new List<LoggingHistory>();
    bool y = false;
    foreach (var field in disparityFields)
    {
        if (field.Compare)
        {
            // will change later to be more consistent since I've been using HighSchool first and MiddleSchool second.
            y = CompareHelper.Compare(middleSchool, highSchool, field);
        }
        else
        {
            continue;
        }
    }
    return loggingHistories;
}



So once I use up the count of one list but not the other, I overstep the index in the smaller list and an exception is thrown.
For example, one list has a size of 2 and the other a size of 4. Once I get to the third iteration, an exception is thrown because the smaller list no longer has anything.

This post has been edited by fearfulsc2: 13 December 2018 - 06:52 AM


#37 Skydiver

Posted 13 December 2018 - 08:29 PM

The MxN comparisons I originally mentioned imply that you would have a structure like:
for i = 0..M-1
    for j = 0..N-1
        Compare listM[i] against listN[j]



Notice that it doesn't matter if M > N, M < N, M == N, M == 0, N == 0, or M == N == 0.
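In C#, that nested structure might look roughly like the sketch below. PairwiseComparer and the compare delegate are hypothetical stand-ins for the Compare method in post #36. Treating a null list as empty is an assumption made here for convenience.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseComparer
{
    // Compares every item of the left list against every item of the right list.
    // The two lists never share an index, so their sizes are independent.
    public static List<TResult> CompareAll<TLeft, TRight, TResult>(
        IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TRight, TResult> compare)
    {
        // Assumption: a null list is treated the same as an empty one.
        var leftItems = (left ?? Enumerable.Empty<TLeft>()).ToList();
        var rightItems = (right ?? Enumerable.Empty<TRight>()).ToList();

        var results = new List<TResult>(leftItems.Count * rightItems.Count);
        foreach (var l in leftItems)
        {
            foreach (var r in rightItems)
            {
                results.Add(compare(l, r));
            }
        }
        return results;
    }
}
```

With lists of size 2 and 4 this performs 8 comparisons and never indexes past the end of either list. Note that an empty or null list yields no comparisons at all, so reporting unmatched items, as the null branches in post #36 do, would need a separate pass.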