Help Comparing Objects


36 Replies - 1157 Views - Last Post: 13 December 2018 - 08:29 PM

#16 Skydiver

Re: Help Comparing Objects

Posted 10 December 2018 - 04:27 PM

Wait... So by some miracle, the field names are going to be exact matches? How does that happen? I can see some data having "LastName", while others may have "Surname", "FamilyName", or "Last_Name".

Or are you also going to try to fuzzy match the field names as well? If the field names are manually set, then why not use the same object types instead of having this weird HighSchoolStudent being compared to a MiddleSchoolStudent?

#17 fearfulsc2

Re: Help Comparing Objects

Posted 10 December 2018 - 06:22 PM

It's a requirement I have to follow, and I'm blocked on it.

Let me try to give more context to make it clearer.

public class HighSchoolStudent
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Ssn { get; set; }
    public HighSchoolStudentAddress Address { get; set; }
}

public class MiddleSchoolStudent
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Ssn { get; set; }
    public MiddleSchoolStudentAddress Address { get; set; }
}

public class DifferenceFields
{
    public string GroupCode { get; set; }
    public string FieldCode { get; set; }
    public string FieldName { get; set; }
    public bool Compare { get; set; }
}

public class DifferenceGroups
{
    public string GroupCode { get; set; }
    public string GroupName { get; set; }
}



This is a simplified version of what I'm actually working with, but I hope it makes my question clearer. Let's say the DifferenceFields class gets this set of data from the database:
GroupCode  FieldCode  FieldName               Compare
Student    FirstName  First Name              1
Student    LastName   Last Name               1
Student    Ssn        Social Security Number  1
Student    Address    Student Address         0




The goal is to compare HighSchoolStudent and MiddleSchoolStudent property by property to find any mismatches between the two objects, based on the field data and on whether we actually want to compare each field. The last row of the table shows that we want to skip the Address comparison, while all the other fields should be checked.

The lists of HighSchoolStudents and MiddleSchoolStudents may be the same size or different sizes.
I want to be able to see which ones match and which ones differ. There's also a more complex requirement: checking whether the Ssn (in this example, at least) of one equals the other. For example, the high school student has Ssn 123456789 and the middle school student has Ssn 123-45-6789. On the first comparison they are not equal, so they do not match, but we then want to do a second comparison to see whether they match after removing special characters. In this case they would match, and I would give them a match rating under 100%, since it wasn't a perfect match from the start.
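The two-pass check described above (exact comparison first, then a comparison after stripping special characters) could be sketched like this. The class name and the partial score of 0.8 are assumptions for illustration; the real rating for a normalized match would be a business decision.

```csharp
using System;
using System.Linq;

// Hypothetical helper for the two-pass comparison: exact first, then normalized.
public static class SsnMatcher
{
    // Assumed score for values that only match after normalization.
    public const double NormalizedMatchScore = 0.8;

    // Keeps only letters and digits, so "123-45-6789" becomes "123456789".
    public static string Normalize(string value) =>
        value == null ? null : new string(value.Where(char.IsLetterOrDigit).ToArray());

    // 1.0 for an exact match, NormalizedMatchScore for a match after
    // removing special characters, 0.0 otherwise.
    public static double Score(string a, string b)
    {
        if (string.Equals(a, b, StringComparison.Ordinal))
            return 1.0;
        if (a == null || b == null)
            return 0.0;
        return Normalize(a) == Normalize(b) ? NormalizedMatchScore : 0.0;
    }
}
```

The same helper would work for names (e.g. hyphens or apostrophes), although whether stripping those is acceptable depends on the field.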

This post has been edited by Skydiver: 10 December 2018 - 09:20 PM
Reason for edit:: Put comparison table into code tags to preserve formatting.


#18 Skydiver

Re: Help Comparing Objects

Posted 10 December 2018 - 09:15 PM

I'll need to read that description a few more times to grok it.

In the meantime, the following questions and thoughts have popped into my head:

Do those special comparisons also apply to the names? E.g. hyphenated vs. non-hyphenated (Neville-Smith vs. NevilleSmith)? Spaces vs. hyphens (Neville-Smith vs. Neville Smith)? Spaces vs. no spaces (Running Bear vs. RunningBear)? Apostrophe vs. no apostrophe (O'connor vs. Oconnor)?

(Did you know that the Washington State DMV only accepts one-word last names: hyphenated last names, names with periods, names with apostrophes, or names without spaces? In other words, they don't support names like "St. John" or "dela Cruz", but will accept "St.John" or "Delacruz".)

What about suffixes like "Junior" vs "Jr."?

Personally, I think for the various fields, you'll want to use the Levenshtein distance.
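A standard two-row dynamic-programming implementation of the Levenshtein distance, plus a similarity score normalized by the longer string's length, might look like this. The class name is hypothetical, and turning the distance into a 0..1 score this way is one common choice, not the only one.

```csharp
using System;

public static class Levenshtein
{
    // Edit distance (insertions, deletions, substitutions), keeping only
    // the previous and current rows of the DP table in memory.
    public static int Distance(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            prev[j] = j; // distance from "" to b[0..j)

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // distance from a[0..i) to ""
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.Length];
    }

    // Similarity in [0, 1]: 1 - distance / length of the longer string.
    public static double Similarity(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        int max = Math.Max(a.Length, b.Length);
        return max == 0 ? 1.0 : 1.0 - (double)Distance(a, b) / max;
    }
}
```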

#19 fearfulsc2

Re: Help Comparing Objects

Posted 10 December 2018 - 09:19 PM

Yes, it'll apply to any field we want to compare using fuzzy comparisons. The Fields class will also have a UseFuzzy boolean property.

This post has been edited by fearfulsc2: 10 December 2018 - 09:19 PM


#20 Skydiver

Re: Help Comparing Objects

Posted 10 December 2018 - 09:34 PM

Okay, I'm beginning to wrap my head around the goals.

So assuming that the last row of your table had a "1" instead of a "0", that means you also want to be able to compare HighSchoolStudent.Address with MiddleSchoolStudent.Address, even though they are of types HighSchoolStudentAddress and MiddleSchoolStudentAddress, respectively, simply because both have fields named "Address". Is that correct?

If types can be dissimilar, does that mean you also need to support comparing birth dates if they were defined like:
class HighSchoolStudent
{
    DateTime BirthDate { get; set; }
}

class MiddleSchoolStudent
{
    string BirthDate { get; set; }
}



or

class HighSchoolStudent
{
    DateTime BirthDate { get; set; }
}

class CustomDate
{
    int Month { get; set; }
    int Day { get; set; }
    int Year { get; set; }
}

class MiddleSchoolStudent
{
    CustomDate BirthDate { get; set; }
}



#21 fearfulsc2

Re: Help Comparing Objects

Posted 10 December 2018 - 09:46 PM

For the most part, yes. If the last row had a 1 instead of a 0, you would have to compare that as well and see if the Month/Day/Year/Time matched up or how similar they may have been.

If there is a big difference, such as a wrong year, month, or day, then there would be no match.
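Using Skydiver's hypothetical CustomDate shape from the previous post, one way to follow that rule is to convert both sides to a canonical DateTime and compare only the date part. The public members and the class names below are assumptions for illustration.

```csharp
using System;

// Hypothetical CustomDate, mirroring the earlier example (members made public).
public class CustomDate
{
    public int Month { get; set; }
    public int Day { get; set; }
    public int Year { get; set; }
}

public static class BirthDateMatcher
{
    // Canonical form: a DateTime at midnight. Throws for invalid dates such as Month = 13.
    public static DateTime ToDateTime(CustomDate d) => new DateTime(d.Year, d.Month, d.Day);

    // A wrong year, month, or day means no match; the time of day is ignored.
    public static bool IsMatch(DateTime birthDate, CustomDate other) =>
        birthDate.Date == ToDateTime(other);

    // Number of agreeing components (0 to 3), useful for reporting how close they were.
    public static int MatchingParts(DateTime birthDate, CustomDate other) =>
        (birthDate.Year == other.Year ? 1 : 0)
        + (birthDate.Month == other.Month ? 1 : 0)
        + (birthDate.Day == other.Day ? 1 : 0);
}
```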

But we are assuming that all the types will match 1-to-1, with the exception of things like Address.

We would then have to compare the contents of those objects as well.

And we are again under the assumption that the property names are the same such as "Address, City, State, Zip" and we will compare those to see if there is a match.

#22 Skydiver

Re: Help Comparing Objects

Posted 10 December 2018 - 10:15 PM

How much of the comparison logic do you need to implement yourself, vs. how much can you call back to the provider of the data to perform the comparison for you and simply return a match score? For example, since you have no way of knowing that the string BirthDate is in Russian while the custom-date BirthDate is a class with three fields, are you allowed to call back your caller to either convert both to some canonical form, or to compare these two birth dates for you?
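One way to allow that kind of callback is to let the caller register a scoring function per field code, falling back to exact equality. The registry class and its methods are hypothetical, a sketch of the idea rather than an existing API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: callers plug in a scoring function per field code,
// so the comparison engine does not need to understand every data format.
public class FieldComparerRegistry
{
    private readonly Dictionary<string, Func<object, object, double>> _comparers =
        new Dictionary<string, Func<object, object, double>>(StringComparer.OrdinalIgnoreCase);

    // Registers (or replaces) the scoring function for a field; scores are in [0, 1].
    public void Register(string fieldCode, Func<object, object, double> comparer) =>
        _comparers[fieldCode] = comparer;

    // Uses the registered comparer if there is one, otherwise exact equality.
    public double Score(string fieldCode, object left, object right) =>
        _comparers.TryGetValue(fieldCode, out var comparer)
            ? comparer(left, right)
            : (object.Equals(left, right) ? 1.0 : 0.0);
}
```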

#23 Skydiver

Re: Help Comparing Objects

Posted 10 December 2018 - 10:23 PM

Is there a threshold score below which it is not worth continuing to compare fields, so you simply declare a mismatch? For example, if the first name is an 85% match and the last name is also an 85% match, do you still go on to check the SSN or birthday? Is the score a probability of a match, or an absolute accuracy score?
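If such a threshold exists, the comparison can stop as soon as the best achievable score drops below it. This sketch assumes equally weighted fields scored from 0 to 1 and a caller-chosen threshold; the class and method names are hypothetical. Passing the scorers as delegates means fields after the cut-off are never evaluated.

```csharp
using System;
using System.Collections.Generic;

public static class ThresholdScorer
{
    // Averages equally weighted field scores (each in [0, 1]), evaluating them one at a time.
    // After each field it checks the best possible final average (every remaining field
    // scoring 1.0); if even that is below the threshold, it stops and returns null.
    public static double? AverageOrAbandon(IReadOnlyList<Func<double>> fieldScorers, double threshold)
    {
        int n = fieldScorers.Count;
        if (n == 0)
            return 1.0; // nothing to compare
        double total = 0;
        for (int i = 0; i < n; i++)
        {
            total += fieldScorers[i]();
            int remaining = n - i - 1;
            if ((total + remaining) / n < threshold)
                return null; // the threshold can no longer be reached
        }
        return total / n;
    }
}
```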

#24 fearfulsc2

Re: Help Comparing Objects

Posted 10 December 2018 - 10:26 PM

We are going under the assumption that they will send us all the fields we require from them.

So we have an API that they can make a POST request to, sending us an object that meets our specifications. We take that object and insert its data into our database. We have a separate schema in our database where we keep our own set of data.

We take the data from one schema and the data from the other and we compare them.

That's the basis of it. We will always know what we are receiving with respect to data types and so forth, but we need to see whether we have some, all, or none of the data in our own system, and whether it matches completely, partially, or not at all.

We have to compare each field because we need to know how many of the fields match and how many do not. Eventually we will show the disparities on the front end, and the user who sees that information will then take action to update the data in our system so that it matches.
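Since the goal is to count how many fields match and show the disparities, each comparison could produce one row per field. This is a minimal sketch using exact equality; the FieldDifference and DifferenceReport names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row in the difference report: one per compared field.
public class FieldDifference
{
    public string FieldName { get; set; }
    public object OurValue { get; set; }
    public object TheirValue { get; set; }
    public bool IsMatch { get; set; }
}

public static class DifferenceReport
{
    // Builds one row per (field name, our value, their value), using exact equality.
    public static List<FieldDifference> Build(IEnumerable<(string Name, object Ours, object Theirs)> fields) =>
        fields.Select(f => new FieldDifference
        {
            FieldName = f.Name,
            OurValue = f.Ours,
            TheirValue = f.Theirs,
            IsMatch = object.Equals(f.Ours, f.Theirs)
        }).ToList();

    // Number of fields that did not match, e.g. for a front-end summary.
    public static int MismatchCount(IEnumerable<FieldDifference> report) =>
        report.Count(d => !d.IsMatch);
}
```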

#25 Skydiver

Re: Help Comparing Objects

Posted 11 December 2018 - 11:49 AM

Assuming that HighSchoolStudent is your authoritative database record, and MiddleSchoolStudent is your clients' uploaded data record, I'm coming to the following conclusions:
  • Reflection is not really needed to access the fields in HighSchoolStudent, because this is well-known data that you control.
  • Reflection may not be needed to access the fields in MiddleSchoolStudent, because it was you who imported the data into your candidate database, so you must have created the ORM mapping between the MiddleSchoolStudent class and your candidate database.
  • Reflection can be replaced with a delegate that looks like T GetFieldValue<T>(object obj).
  • Even if you don't have an ORM for the candidate database, you could simply use an IDataReader, implement a lambda that looks like obj => ((IDataReader)obj)["fieldname"], and use it as the delegate above.
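Replacing reflection with delegates could look like a table of hand-written accessors keyed by field code. The lowercase field codes and the trimmed-down HighSchoolStudent class are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Trimmed-down version of the class from this thread.
public class HighSchoolStudent
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Ssn { get; set; }
}

// Field accessors written by hand instead of looked up through reflection.
// Keys are assumed field codes; lookups ignore case.
public static class HighSchoolStudentFields
{
    public static readonly Dictionary<string, Func<HighSchoolStudent, object>> Accessors =
        new Dictionary<string, Func<HighSchoolStudent, object>>(StringComparer.OrdinalIgnoreCase)
        {
            ["firstname"] = s => s.FirstName,
            ["lastname"] = s => s.LastName,
            ["ssn"] = s => s.Ssn,
        };
}
```

The compiler checks every accessor, so a renamed property breaks the build instead of silently returning null at runtime.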


Please correct me if I'm missing something.

#26 fearfulsc2

Re: Help Comparing Objects

Posted 11 December 2018 - 12:13 PM

That does make quite a bit of sense.

They are not linked yet, so HighSchoolStudent and MiddleSchoolStudent are not mapped.

I might still have to use reflection, but maybe not.

The field codes in the database are all lowercase (I guess I can change the values to match the property names).

So the field data would look like this:

GroupCode  FieldCode  FieldName   Compare  UseFuzzy
Student    firstname  First Name  1        1
Student    lastname   Last Name   1        1
Student    ssn        Social #    1        1
Student    address    Address     1        0

firstname, lastname, ssn: an 80% match if they only match after removing special characters.
address: a 0% match if it is not an exact match.

public class HighSchoolStudent
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Ssn { get; set; }
    public int AddressId { get; set; }
    public HighSchoolStudentAddress Address { get; set; }
}

public class MiddleSchoolStudent
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Ssn { get; set; }
    public int AddressId { get; set; }
    public MiddleSchoolStudentAddress Address { get; set; }
}

public class Fields
{
    public string GroupCode { get; set; }
    public string FieldCode { get; set; }
    public string FieldName { get; set; }
    public bool Compare { get; set; }
    public bool UseFuzzy { get; set; }
}

public class HighSchoolStudentAddress
{
    public int AddressId { get; set; }
    public string Street1 { get; set; }
    public string Street2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

public class MiddleSchoolStudentAddress
{
    public int AddressId { get; set; }
    public string Street1 { get; set; }
    public string Street2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}




This is what I've managed so far; suggestions are welcome if you think something would be better:
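// Iterate up to the length of the longer list (the lists may have different sizes).
int count = Math.Max(middleSchoolStudent.Count, highSchoolStudent.Count);

for (int i = 0; i < count; i++)
{
    // No counterpart at this index: this record is unmatched (not handled yet).
    // Note that pairing by index assumes both lists are sorted the same way.
    if (i >= middleSchoolStudent.Count || i >= highSchoolStudent.Count)
        continue;

    foreach (var field in fields)
    {
        if (!field.Compare)
            continue; // this field is excluded from the comparison

        // Compare individual students, not the whole lists.
        bool isMatch = CompareHelper.Compare(middleSchoolStudent[i], highSchoolStudent[i], field);

        if (!isMatch && field.UseFuzzy)
        {
            // did not implement yet
        }
    }
}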





using System.Reflection;

public class CompareHelper
{
    private const BindingFlags PropertyFlags =
        BindingFlags.IgnoreCase | BindingFlags.Public | BindingFlags.Instance;

    public CompareHelper() // default constructor, may use later for dependency injection
    {
    }

    // Compares the property named by field.FieldCode (case-insensitively) on both objects.
    public static bool Compare(object middleSchoolStudent, object highSchoolStudent, Fields field)
    {
        var propertyName = field.FieldCode;
        var property1 = middleSchoolStudent.GetType().GetProperty(propertyName, PropertyFlags);
        var property2 = highSchoolStudent.GetType().GetProperty(propertyName, PropertyFlags);

        // A property that is missing on either side cannot be compared.
        if (property1 == null || property2 == null)
            return false;

        // object.Equals tolerates null values, unlike calling .Equals on a null value.
        // Nested objects such as Address are compared by reference unless their type overrides Equals.
        return object.Equals(property1.GetValue(middleSchoolStudent), property2.GetValue(highSchoolStudent));
    }
}



So far, my code can determine whether the objects are equal, except for objects nested inside an object. I have not implemented the UseFuzzy part yet, which checks for a match after removing some characters.
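For the nested objects, one option is to recurse into any property that is neither a value type nor a string, matching properties by name as the thread assumes. This is a reflection-based sketch; the DeepCompare name and the "Address.City" path format are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class DeepCompare
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    // Compares public instance properties that share a name (case-insensitive).
    // Value types and strings are compared with object.Equals; other reference types
    // (e.g. an Address object) are compared recursively, property by property.
    // Returns the paths of differing properties, e.g. "Address.City".
    // Note: there is no protection against cyclic object graphs.
    public static List<string> Differences(object left, object right, string prefix = "")
    {
        var diffs = new List<string>();
        foreach (var leftProp in left.GetType().GetProperties(Flags))
        {
            if (leftProp.GetIndexParameters().Length > 0)
                continue; // skip indexers
            var rightProp = right.GetType().GetProperty(leftProp.Name, Flags | BindingFlags.IgnoreCase);
            if (rightProp == null)
                continue; // no property with this name on the other side

            var path = prefix + leftProp.Name;
            var l = leftProp.GetValue(left);
            var r = rightProp.GetValue(right);

            if (l == null || r == null || IsSimple(l.GetType()))
            {
                if (!object.Equals(l, r))
                    diffs.Add(path);
            }
            else
            {
                diffs.AddRange(Differences(l, r, path + "."));
            }
        }
        return diffs;
    }

    private static bool IsSimple(Type type) => type.IsValueType || type == typeof(string);
}
```

Because properties are matched by name, this also works when the two address classes are different types (HighSchoolStudentAddress vs. MiddleSchoolStudentAddress) with the same property names.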

If you have any better suggestions, I'm open to hearing them.

My other concern is what happens when these lists have different sizes, and how to make sure we pair up the corresponding records.
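When the lists have different sizes, pairing by a key instead of by position avoids the problem entirely: matched pairs, client-only records, and our-only records fall out naturally. This sketch assumes a unique, non-null key such as a normalized SSN exists on both sides; the RecordPairing name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordPairing
{
    // Pairs two lists of possibly different sizes by a key (e.g. a normalized SSN)
    // instead of by position. Assumes keys are non-null and unique within each list;
    // ToDictionary throws on duplicate keys on the right-hand side.
    public static (List<(TLeft Left, TRight Right)> Pairs, List<TLeft> OnlyLeft, List<TRight> OnlyRight)
        PairByKey<TLeft, TRight>(
            IEnumerable<TLeft> left, IEnumerable<TRight> right,
            Func<TLeft, string> leftKey, Func<TRight, string> rightKey)
    {
        var rightByKey = right.ToDictionary(rightKey);
        var matchedKeys = new HashSet<string>();
        var pairs = new List<(TLeft Left, TRight Right)>();
        var onlyLeft = new List<TLeft>();

        foreach (var item in left)
        {
            var key = leftKey(item);
            if (rightByKey.TryGetValue(key, out var match))
            {
                pairs.Add((item, match));
                matchedKeys.Add(key);
            }
            else
            {
                onlyLeft.Add(item); // no counterpart on the right
            }
        }

        // Right-hand records that were never paired.
        var onlyRight = rightByKey.Where(kv => !matchedKeys.Contains(kv.Key))
                                  .Select(kv => kv.Value)
                                  .ToList();
        return (pairs, onlyLeft, onlyRight);
    }
}
```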

This post has been edited by fearfulsc2: 11 December 2018 - 12:33 PM


#27 Skydiver

Re: Help Comparing Objects

Posted 11 December 2018 - 07:12 PM

I'll have to read that code later when I get to a bigger screen. I can't retain enough in my tired brain while scrolling up/down and left/right on my phone.

In the meantime, I have two more questions:
  • What happens to 100% matches? Are they removed from the pool of HighSchoolStudents and MiddleSchoolStudents? Or can more than one MiddleSchoolStudent have a 100% match with a HighSchoolStudent?
  • Why the obsession with the list of HighSchoolStudents having a different number of items than the list of MiddleSchoolStudents? You've mentioned it a couple of times. Is there some kind of requirement or constraint that you haven't stated yet?


#28 fearfulsc2

Re: Help Comparing Objects

Posted 11 December 2018 - 08:25 PM

Okay, I'll give the full scope of the requirement, since I think that will make the picture clearer.

Let's say our system has some number of High School Students: possibly none at all, and possibly fewer or more than the number of Middle School Students our client will send us.

Our client sends us a list of Middle School Students, and we store it in our database for the time being. Eventually, we will need to put them all into our own system and integrate them into our High School system.

So now we have two tables with students in both categories: High School and Middle School.

We then take the lists of students from both systems and determine which ones are the same, which are new (not in our system), and which are similar (the same student, but maybe with different punctuation, phone number, etc.).

So we compare those students and see which ones are a 100% match. Students that match 100% are written to our logging table, recording that the student matches along with the values from both systems.

If the student does not exist at all in our system, our system logs that the client has a value but we don't.

If the student exists in our system but is a "fuzzy" match, then we log that the student is a match but that we have to make corrections so that the student matches exactly.
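The three logging outcomes described above could be captured as an enum plus a small classification rule. The names, and the rule that every compared field scoring 1.0 means an exact match, are assumptions based on this description.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical outcomes matching the three logging cases described above.
public enum MatchStatus
{
    ExactMatch,          // 100% match: log the values from both systems
    MissingInOurSystem,  // the client has the student, we do not
    FuzzyMatch           // same student, but our data needs corrections
}

public static class MatchClassifier
{
    // fieldScores are per-field scores in [0, 1] for a student found on both sides.
    public static MatchStatus Classify(bool existsInOurSystem, IEnumerable<double> fieldScores)
    {
        if (!existsInOurSystem)
            return MatchStatus.MissingInOurSystem;
        return fieldScores.All(score => score >= 1.0)
            ? MatchStatus.ExactMatch
            : MatchStatus.FuzzyMatch;
    }
}
```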

On the front end (which we don't need to worry about), the user will see how many differences were found for each student.

I hope that makes a little bit of sense. I might have explained it in a very confusing way.

#29 Skydiver

Re: Help Comparing Objects

Posted 11 December 2018 - 09:18 PM

Do some fields have more weight than others? Let's say that address mattered (unlike the sample scenario we've been working with above). If two students' names are nowhere close to each other (e.g. "Richard Nixon" vs. "Barack Obama"), but they have the same address of "1700 Pennsylvania Ave.", would they still be considered a match? Or is that decision left to a human (or later some AI), so that all the code needs to do is note a 0% name match but a 100% address match?

I'm asking because I'm wondering whether there is a way to order the compared fields from most restrictive to loosest. Comparing the most restrictive field first would speed up the comparison of the two lists, because it quickly tells us whether it is even worth comparing the other fields.
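If fields do carry different weights, a weighted average is one simple way to combine per-field scores into a single number. The weights are hypothetical inputs (e.g. more weight on SSN than on address), and the class name is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WeightedScore
{
    // Weighted average of per-field scores: sum(weight * score) / sum(weight).
    public static double Combine(IEnumerable<(double Score, double Weight)> fields)
    {
        var list = fields.ToList();
        double totalWeight = list.Sum(f => f.Weight);
        if (totalWeight <= 0)
            throw new ArgumentException("Total weight must be positive.", nameof(fields));
        return list.Sum(f => f.Score * f.Weight) / totalWeight;
    }
}
```

For example, an SSN match (score 1.0, weight 3) combined with a name mismatch (score 0.0, weight 1) gives 0.75.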

#30 fearfulsc2

Re: Help Comparing Objects

Posted 11 December 2018 - 10:18 PM

The Fields class sort of does that, but not really: it has an OrderBy property, and the field names are sorted by it.

In this instance, for the names, we would have a unique identifier such as a Social Security number. But right now, nothing says to compare on the unique values first to see whether any of those match.
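Combining the existing OrderBy property with a "unique identifiers first" rule could be a single LINQ query. This assumes OrderBy is an integer sort key and that the set of unique field codes (e.g. "ssn") is supplied by configuration; FieldOrdering is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Fields
{
    public string FieldCode { get; set; }
    public bool Compare { get; set; }
    public int OrderBy { get; set; } // assumed to be an integer sort key
}

public static class FieldOrdering
{
    // Keeps only fields flagged for comparison, puts unique-identifier fields
    // first, then sorts by the OrderBy column (OrderBy/ThenBy are stable).
    public static List<Fields> ComparisonOrder(IEnumerable<Fields> fields, ISet<string> uniqueFieldCodes) =>
        fields.Where(f => f.Compare)
              .OrderBy(f => uniqueFieldCodes.Contains(f.FieldCode) ? 0 : 1)
              .ThenBy(f => f.OrderBy)
              .ToList();
}
```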
