Subscribe to A Programmer's Casual Blog

## An Interesting Revisit to basic algibra notation while working with DataSets

So... A DataSet is a place where you can store Tables of information, right? Well I was working on a project where I needed to find the difference between two tables (held in two different DataSets, so when I say DataSet, you may take it also to mean it's DataTable).

I was floundering on the issue, because, if you haven't worked with DataSets before, they're kind of a bitch. You need to specially use their .Copy() method to make a new one, because anytime you store a DataSet in a variable, it's actually just a reference to the same DataSet in memory. To me that's wicked bizarre, I mean, what if you made an int variable...

```int myInt = 5;

int secondInt = myInt;

secondInt = 3;

```

Yeah, and what if setting secondInt to 3 also made myInt return 3 as well!? That's crazy! But that's how DataSets are. DataSets are full of weird things like that which I I could probably fill up a blog post about, but we'll leave at that for now.

So anyway, I'm trying to take DataSet A and see what it has in its table which DataSet B doesn't have. I eventually figured out, you could express this problem in a mathematic notation --it's good to know all that high school math wasn't a complete waste =p

```A = a new DataSet that has the most recent datarow in it

B = an older DataSet that doesn't have the latest datarow in it

...

A = B + Additions - Deletions

...

```

This was a groundbreaking revelation for me! I got excited and then pooped. Then when I came back I pondered this question:

```In C#, how could I possibly solve for Additions across two DataSets?

```

After some searching, I found my way to:
(don't bother analyzing this, I just explain it)
```        public static DataSet GetNewRecords(DataSet dsOld, DataSet dsNew)  // dsOld would be B and dsNew A
{
DataSet ut = new DataSet();   // upload this

// having a primary key is the only way to get the merges to work.
dsOld.Tables[0].PrimaryKey = new DataColumn[] { dsOld.Tables[0].Columns["P_Id"] };
dsNew.Tables[0].PrimaryKey = new DataColumn[] { dsNew.Tables[0].Columns["P_Id"] };

DataSet delta = new DataSet();

delta.Merge(dsOld, true);
delta.AcceptChanges();
delta.Merge(dsNew, true);   // also try with true parameter

// this is where the algo gets impossibly weird.  Read the GetChanges Parameter, I'm getting all rows that didn't change...
DataSet changes = delta.GetChanges(DataRowState.Unchanged);
// this must work because the new rows of dsNew didn't get modified, since they didn't exist in dsOld, they were simply coppied (which doesn't count as changed?)

return changes; // returns records originally only in dsNew
}

```

OK, don't mind that C# code, basically all it can do is tell me if dsNew contained records that dsOld didn't contain. What about deletions? Well, it's screwy about deletions and so the returned dataset will contain additions and deletions...
Not very practical, because there's no way to tell deletions from additions based on the DataSet returned. But at least I made progress.

I wrote down in my comments, modifying my notation to be more clear:
```// dsNew =  dsOld + Additions - Deletions
//
// Additions = A.Merge(B); A.GetChanges(DataRowState.Unchanged);     {I thought this worked better at the time}
//
// Deletions = (dsNew - dsOld) - Additions
//

```

And so once I got to that last line I was like, "WOOT! I can solve for each individual factor of the expression to the right of the equal sign, therefor I can solve the left!"

```//   (dsNew - dsOld) != dsOld.Merge(dsNew); dsOld.GetChanges(DataRowState.Unchanged);!!???
//
//   (dsNew - dsOld) = dsAdditions + dsDeletions ??? wtf ???

```

I was wrong because, how the heck do you actually solve (dsNew - dsOld)? My merge method fails to work as I realized while making notes. If you have deletions they get returned in the return DataSet along with the additions =(

So I wrote in...
```// A - B = A.Minus(B)

```

Then I began writing an extension function that actually worked the way I wrote it above.

Here's my actual journal, which might be cool to look at on a blog:

At the bottom, instead of using a plain old equal sign, I decided to go with hash rocket since equals didn't feel right at the time.

Here's the code that eventuated:
```    public static class DataSetExtensions
{

public static DataSet Minus(this DataSet mainDataSet, DataSet subtractorDataSet)
{
DataSet Result = mainDataSet.Copy();

string formatterText = ""; // this is used to format the svl query on the dataset    eg "{0} = {1}"
string[] myColumnNamesAndVals = new string[subtractorDataSet.Tables[0].Columns.Count*2];
for (int i = 0; i < subtractorDataSet.Tables[0].Columns.Count*2; i = i + 2)
{
myColumnNamesAndVals[i] = subtractorDataSet.Tables[0].Columns[i/2].ColumnName;

formatterText += " {" + i + "} = {" + (i + 1) + "} AND";
}
formatterText = formatterText.Substring(1, formatterText.Length - 4);

foreach (DataRow subtractorRow in subtractorDataSet.Tables[0].Rows)
{
// select query:  index = 1, data = bla
List<string> nullDateOrCurrencyColumnNames = new List<string>();
bool doNulls = false;

for (int i = 0; i < subtractorDataSet.Tables[0].Columns.Count; i++)
{
string rowsType = subtractorDataSet.Tables[0].Columns[i].DataType.Name.ToLower();

bool skip = false;

string rowsDataAsString = "";

switch (rowsType)
{
case "autoincrement":
rowsDataAsString = Convert.ToString(subtractorRow[i]);
break;
case "datetime":
if (subtractorRow[i].GetType().Name == "DBNull")
{
//throw new Exception("You have a null date field in one of your databases, and I don't know what to do to select a null date filed in a dataset.  Sry =(");
// Alt:
//
//rowsDataAsString = "\"\"";
myColumnNamesAndVals[(i * 2)] = myColumnNamesAndVals[((i - 1) * 2)];
myColumnNamesAndVals[(i * 2) + 1] = myColumnNamesAndVals[((i - 1) * 2) + 1];
skip = true;

doNulls = true;
}
else
{
DateTime theDate = (DateTime)subtractorRow[i];
rowsDataAsString = "#" + theDate.ToShortDateString() + "#";
}
break;
case "int32":
rowsDataAsString = Convert.ToString(subtractorRow[i]);
break;
case "decimal":
rowsDataAsString = Convert.ToString(subtractorRow[i]);
if (rowsDataAsString == "")
{
myColumnNamesAndVals[(i * 2)] = myColumnNamesAndVals[((i - 1) * 2)];
myColumnNamesAndVals[(i * 2) + 1] = myColumnNamesAndVals[((i - 1) * 2) + 1];
skip = true;

doNulls = true;
}
break;
case "double":
rowsDataAsString = Convert.ToString(subtractorRow[i]);
break;
case "string":
rowsDataAsString = "'" + (string)subtractorRow[i] + "'";
break;
//case "yesno":
//        rowsDataAsString = Convert.ToString(subtractorRow[i]);
//        break;
case "boolean":
if (subtractorRow[i].GetType().Name == "DBNull")
{
break;
}
rowsDataAsString = Convert.ToString(subtractorRow[i]);
break;
default:
throw new Exception("unexpected datatype.  Add this type to the case switch in the Minus extention.");
}

if (!skip)
myColumnNamesAndVals[(i * 2) + 1] = rowsDataAsString;
}

string svl = string.Format(formatterText,   //  eg  "{0} = {1}, {2} = {3}"
myColumnNamesAndVals);                  //  eg  { "Index", "1", "Data", "bla" }

DataRow[] HitRows = Result.Tables[0].Select(svl);

foreach (DataRow rw in HitRows)
{
// I could check for nulls here manually, but that might be a sloppy method
if (doNulls)
foreach (string nullDate in nullDateOrCurrencyColumnNames)
{
if (rw[nullDate].GetType().Name == "DBNull")
{
rw.Delete();
break;
}
}
else
rw.Delete();
}
}
Result.AcceptChanges();

return Result;
}
}

```

Because of the way databases are, this function might throw an exception. I'm going to ask in the forums for some feedback and maybe make some corrections.

Anyway, that was my fun day writing a DataSet extension. All thoughts are welcome below =)

Here's another notebook image of a page that I used during testing. As you can see, I wrote some debugging notes down at the bottom.

### 0 Comments On This Entry

There are no Trackbacks for this entry

S M T W T F S
1234
567891011
12131415161718
1920212223 24 25
262728293031