Page 1 of 1

LINQ Group By and Aggregates Rate Topic: -----

#1 andrewsw  Icon User is online

  • bin deployable
  • member icon

Reputation: 6283
  • View blog
  • Posts: 25,130
  • Joined: 12-December 12

Posted 26 February 2016 - 08:30 AM

Examples demonstrating how to use LINQ to group data and aggregate (Sum, Max, etc.) within the groups.

Here is the sample data I am using:

Attached Image

The corresponding SQL statement is this:
SELECT Office, Department, 
Sum(Salary) AS SumSalary, Max(Salary) AS MaxSalary
FROM tblStaff
GROUP BY Office, Department;


which produces the following resultset:

Attached Image

To obtain the same results using LINQ is not very obvious, and documentation and articles for this aren't great. Here are some useful links though:

Group By Clause
Aggregate Clause
Enumerable.GroupBy(Of TSource, TKey, TElement) Method (IEnumerable(Of TSource), Func(Of TSource, TKey), Func(Of TSource, TElement))
Enumerable.GroupBy(Of TSource, TKey, TElement, TResult) Method (IEnumerable(Of TSource), Func(Of TSource, TKey), Func(Of TSource, TElement), Func(Of TKey, IEnumerable(Of TElement), TResult))
LINQ Group by with multiple properties in VB.Net
Using GroupBy, Count and Sum in LINQ Lambda Expressions

(Finding information using VB, rather than C#, is also a little tricky.)

Using a Console Application, let's set up the data using a simple Staff class and List(Of Staff).
Module Module1

    Class Staff
        Property ID As Integer
        Property Office As String
        Property Department As String
        Property Salary As Decimal
    End Class

    Sub Main()
        Dim team As New List(Of Staff)

        team.Add(New Staff With {.ID = 101, .Office = "London", .Department = "Admin", .Salary = 24000})
        team.Add(New Staff With {.ID = 103, .Office = "London", .Department = "Admin", .Salary = 23500})
        team.Add(New Staff With {.ID = 104, .Office = "Leeds", .Department = "Admin", .Salary = 23500})
        team.Add(New Staff With {.ID = 107, .Office = "London", .Department = "Accounts", .Salary = 22900})
        team.Add(New Staff With {.ID = 109, .Office = "Leeds", .Department = "Accounts", .Salary = 22700})
        team.Add(New Staff With {.ID = 112, .Office = "Leeds", .Department = "Admin", .Salary = 22650})
        team.Add(New Staff With {.ID = 113, .Office = "London", .Department = "Admin", .Salary = 22000})
        team.Add(New Staff With {.ID = 120, .Office = "London", .Department = "Admin", .Salary = 22000})


I present 5 examples which essentially do the same thing (although version 3 is a simpler example of version 4). The first 2 use LINQ expressions (LINQ query operators), the remaining 3 use Enumerable Methods. You should choose the approach that you find the clearest, or is easiest to extend if you need to.

LINQ - Fluent and Query Expression - Is there any benefit(s) of one over other?
        Dim summary1 = From member In team
                       Group member By keys = New With {Key member.Office, Key member.Department}
                       Into Group
                       Select New With {.office = keys.Office, .dept = keys.Department, _
                                        .sum = Group.Sum(Function(x) x.Salary), _
                                        .max = Group.Max(Function(x) x.Salary)}

        For Each item In summary1
            Console.WriteLine("{0} {1} {2} {3}", item.office, item.dept, _
                              item.sum.ToString("c0"), item.max.ToString("c0"))
        Next


I'll attempt some notes, but confess that it is not the easiest process to describe.

Group member By keys = .. The identifier 'keys' is used to identify (to reference) the anonymous type that provides the keys, the groupings. If there were only a single grouping then you could write (for example) Group By OfficeName = member.Office.

It is a common mistake (and one that I fell for) to forget to include the word Key.

Into Group This is a query continuation. Essentially, it gathers the results to a single reference point, Group (a keyword), that can be used further in the query expression.

The Select statement then collects the required output into another anonymous type. The identifier 'keys' is used to obtain the names of each Office and Department, aggregates (Sum and Max) are available via the Group continuation.

The second example, which I prefer, uses an alias (gp) for the Group and applies aggregate functions to the Group in a single (Into) clause.
        Dim summary2 = From member In team
                       Group member By keys = New With {Key member.Office, Key member.Department}
                       Into gp = Group, sm = Sum(member.Salary), mx = Max(member.Salary)
                       Select New With {.office = keys.Office, .dept = keys.Department, _
                                        .sum = sm, .max = mx}

        For Each item In summary2
            Console.WriteLine("{0} {1} {2} {3}", item.office, item.dept, _
                              item.sum.ToString("c0"), item.max.ToString("c0"))
        Next


I cannot comment on any performance difference between the two versions, although my suspicion is that they will resolve to the same statement/ the same execution plan. You'll need to investigate this (or see if anyone comments below) if it concerns you.

Here is the first, simple, version using an Enumerable Method (GroupBy).
        'with single key
        Dim summary3 = team.GroupBy(Function(x) x.Office, Function(x) x.Salary, _
                                    Function(off, sal) New With {.key = off, .sumSal = sal.Sum()})

        For Each item In summary3
            Console.WriteLine("{0} {1}", item.key, item.sumSal.ToString("c0"))
        Next
        'London 114,400
        'Leeds 68,850


This uses a single key, a single grouping, on the Offices. I also just output the Sum but the Max could be included as well.

Notice that the third argument of (this version of) GroupBy refers to the two preceding arguments. The documentation uses the term resultSelector for this third argument, and it references the (first) keySelector and (second) elementSelector. My primitive description is:

  • the first argument specifies the key (or keys), the grouping
  • the second identifies the field (the elements) that will be aggregated
  • the third references the Group specified by the first two arguments, to identify the results, the output

The next version extends the previous example to group by more than one field, using an anonymous type to specify the keys.
        Dim summary4 = team.GroupBy(Function(x) New With {Key x.Office, Key x.Department},
                                    Function(x) x.Salary, _
                                    Function(keys, sal) New With {.key1 = keys.Office, _
                                                                  .key2 = keys.Department, _
                                                                  .sum = sal.Sum(), _
                                                                  .max = sal.Max()})

        For Each item In summary4
            Console.WriteLine("{0} {1} {2} {3}", item.key1, item.key2, _
                              item.sum.ToString("c0"), item.max.ToString("c0"))
        Next


The following final version uses GroupBy() and Select() separately; I also took the opportunity to demonstrate sorting the output using OrderBy() and ThenBy().
        Dim summary5 = team.GroupBy(Function(x) New With {Key x.Office, Key x.Department}) _
                       .Select(Function(y) New With {.office = y.Key.Office, _
                                                     .dept = y.Key.Department, _
                                                     .sum = y.Sum(Function(x) x.Salary), _
                                                     .max = y.Max(Function(x) x.Salary)}) _
                                             .OrderBy(Function(x) x.office) _
                                             .ThenBy(Function(x) x.dept)

        For Each item In summary5
            Console.WriteLine("{0} {1} {2} {3}", item.office, item.dept, _
                              item.sum.ToString("c0"), item.max.ToString("c0"))
        Next


This is the version I would turn to if I wanted to aggregate across more than one field (Salary). The two previous versions (or one of the other GroupBy overloads) could probably achieve this by introducing another anonymous type (for the elementSelector) but I'll leave this as an exercise.

Here is the full code in a spoiler:

Spoiler

This isn't the easiest process to describe (and get working). Hopefully, the examples are useful if you ever need to do this.

Is This A Good Question/Topic? 2
  • +

Page 1 of 1