Page 1 of 1

LINQ by Example 4: Query Expression Syntax

#1 andrewsw  Icon User is online

  • Fire giant boob nipple gun!
  • member icon

Reputation: 3475
  • View blog
  • Posts: 11,813
  • Joined: 12-December 12

Posted 30 August 2014 - 08:18 AM

This part of my tutorial-sequence introduces query expression syntax (from.. where..) by means of examples that still use the same core-data and code-template from Part 1.

Other tutorials in this sequence:

LINQ by Example 1: Enumerable Methods A-L
LINQ by Example 2: Enumerable Methods M-Z
LINQ by Example 3: Methods Using IEqualityComparer
LINQ by Example 5: LINQ to XML Querying

I want to stress that this is an introduction to this subject, by means of fairly straight-forward examples. Query expressions are more complex than the enumerable methods (query operators) that they are converted into. You should follow-up this tutorial with further study.

Microsoft links:

LINQ Query Expressions (C# Programming Guide) :MSDN
Query Expression Basics (C# Programming Guide) :MSDN

This link maps enumerable methods to expression syntax:
Query Expression Syntax for Standard Query Operators :MSDN

Query Keywords (C# Reference) :MSDN

This last link provides more examples for each of the query-clauses, so is a reasonable next-step after completing this tutorial.

A word about our sponsor:
Spoiler

Microsoft recommmends (from the first link above):

MS said:

As a rule when you write LINQ queries, we recommend that you use query syntax whenever possible and method syntax whenever necessary. There is no semantic or performance difference between the two different forms. Query expressions are often more readable than equivalent expressions written in method syntax.

I agree that query expressions are more readable, and more expressive. However, for most people, the methods are easier to use and to understand.




It is not usually the methods themselves that people struggle with, but the lambdas. This is not specific to either LINQ or .NET, but a common stumbling block in most programming languages (that have them). If this describes you then please be reassured: they ARE tricky to get your head around, but invaluable once you've grasped them. [I like to think of them as a 'projection', and the C# syntax '=>' helps with this notion.]




The ideal, as implied in the Microsoft statement, is to be comfortable with both forms.

Query Expression Clauses

The clauses and keywords used in query expressions are demonstrated in the sections that follow this one, but I've gathered them here for reference.

In brackets are (contextual, keywords) which are also listed afterwards.

from clause - A query expression must begin with a from clause. (in, into)
let clause
join clause (in, into, on, equals)
where clause
orderby clause (ascending, descending)
group clause (by)
select clause - A query expression must terminate with either a select clause or a group clause. (into)

Query expressions must begin with from and end with either a select or group clause. Other than this, there is not a strict order that the clauses must follow. This is unlike SQL statements, which have a clearly-defined order.

A picture:
Spoiler

Contextual keywords

ascending/ descending, by, equals, in, into, on

The into contextual keyword can be used to create a temporary identifier to store the results of a group, join or select clause into a new identifier.




The Tutorial

Simple Sequence
Simple Sequence, Where
Ordering (Sorting)
Let's Create a New Range Variable
Sequence With Inner Sequence
Independent Sequences
Grouping
Joining
Into

Rather than providing examples, and sections, based on each of the query-clauses, I start with simple queries and proceed to more sophisticated examples. In the process, though, the clauses still appear more-or-less in this order:

from, select, where, orderby, let, group, join (and the into keyword)

I recommend that you study at least the first three sections (preferably all of them) in order, as they contain important descriptions and definitions that are assumed in the remaining sections.

Simple Sequence
    var simpleFrom = from star in rainbow
                     select star;

    foreach (string item in simpleFrom) {
        Console.WriteLine("{0}, ", item);
    }
    // Rod, Jane, Freddy, Bungle, Zippy, George,


As before, you can copy this code into the code-template from Part 1 of this tutorial.

simpleFrom is a query variable, sometimes just called a query. Note that it does not store the results of executing the query expression, it stores the query itself. The results are only obtained when iterating them using a foreach statement (cf. lazy evaluation).

star is a range variable. It represents each successive element in the source sequence. rainbow is the data source, or sequence. star is only in scope (exists) within the query; effectively, it is a local variable (local to the query).

select star provides our output, which is another sequence. In this case it is a sequence of the same type of objects that are contained in the data source (the rainbow array). (For this very simple example this new sequence happens to contain the same number of elements as the original rainbow array - a where clause could reduce this number.)




The Microsoft pages state that the output could be a singleton (a scalar) value, such as a count of the number of elements in the source. I think this is misleading:
    int countEm = (from member in staff select member).Count();


The end result might be an integer, after using Count(), but the query expression itself still results in a sequence.

[If someone provides an example of a query expression that returns a singleton value then I'll happily correct this part of the tutorial.]



    var simpleFromStaffNames = from member in staff
                               select member.name;

    foreach (string name in simpleFromStaffNames) {
        Console.WriteLine("{0}", name);
    }


The output in this case is a sequence of strings, the names of the staff-members. Because the type is known we could use IEnumerable<string> instead of var.

Similarly, if we weren't using an array of anonymous-types, but had instead created a Staff class, then we could be using Enumerable<Staff> for many of our queries.

In this example we cannot use select member.name, member.salary; to return these two pairs of values. Instead, the select clause can be used to transform source data into sequences of new types. This transformation is also named a projection. In the following example, the select clause projects a sequence of anonymous types which contains only a subset of the fields in the original element.
    var simpleFromStaff = from member in staff
                          select new { member.name, member.salary };

    foreach (var item in simpleFromStaff) {
        Console.WriteLine("{0} {1:C0}", item.name, item.salary);
    }


This could also be written:
    var simpleFromStaff = from member in staff
                          select new { Name = member.name, Salary = member.salary };

    foreach (var item in simpleFromStaff) {
        Console.WriteLine("{0} {1:C0}", item.Name, item.Salary);
    }


which is comparable to providing column-aliases in SQL statements.

If you were to do this:
    var justTheStaff = from member in staff select member;
    // (this is pretty pointless without a 'where' clause to reduce 
    // the number of records (elements) returned)


then this is equivalent to a SQL statement of select * from sometable;. Neither is recommended. It is preferable that you return only the fields (properties) that you need. It is therefore very common to select (to project) an anonymous type.

Note that I haven't used anonymous types for a couple of the simpler staff-examples that follow; it might be a useful exercise for you to modify these examples so that they do select anonymous types.

Simple Sequence, Where

MSDN said:

The where clause is used in a query expression to specify which elements from the data source will be returned in the query expression. It applies a Boolean condition (predicate) to each source element (referenced by the range variable) and returns those for which the specified condition is true. A single query expression may contain multiple where clauses and a single clause may contain multiple predicate subexpressions.

where clause (C# Reference) :MSDN

In simpler terms, the where clause reduces the number of elements returned from the data source. Any elements that don't meet the where condition are simply skipped.
    var simpleFromWhere = from num in nos
                          where num > 30
                          select num;

    foreach (int i in simpleFromWhere) {
        Console.Write(i + ", ");
    }
    // 34, 40, 31, 32, 34, 40, 50,


Here's another example using '&&':
    var simpleFromWhereStaff = from member in staff
                               where member.name.Length > 10 && member.salary != 20000
                               select member;

    foreach (var item in simpleFromWhereStaff) {
        Console.WriteLine("{0} {1:C0}", item.name, item.salary);
    }


Ordering (Sorting)

orderby is the simplest clause. We can sort by a number of different columns, using either the ascending or descending keyword. If sorting by more than one column then each is separated by a comma, and they are sorted from left to right. Here are some examples:
    var orderNos = from x in nos
                   orderby x descending
                   select x;

    foreach (int x in orderNos) {
        Console.WriteLine("{0}, ", x);
    }
    // 50, 40, 40, 34, 34, 32, 31, 22, 22, 21, 17, 10,

    var orderByLen = from name in rainbow
                     orderby name.Length
                     select name;

    foreach (string name in orderByLen) {
        Console.WriteLine(name);
    }
    //Rod
    //Jane
    //Zippy
    //Freddy
    //Bungle
    //George

    var orderStaff = from member in staff
                     orderby member.salary descending, member.name ascending
                     select member;

    foreach (var person in orderStaff) {
        Console.WriteLine("{0:C0} {1}", person.salary, person.name);
    }
    //£28,000 Dave Diddly
    //£27,000 Mary Pickles
    //£22,500 Liz Elbow
    //£22,000 Mary Muggins
    //£20,000 Bob Bones
    //£18,000 Robert Piccalilli
    // (give two staff the same salary to test this)


Let's Create a New Range Variable

let clause (C# Reference) :MSDN

MS said:

In a query expression, it is sometimes useful to store the result of a sub-expression in order to use it in subsequent clauses. You can do this with the let keyword, which creates a new range variable and initializes it with the result of the expression you supply. Once initialized with a value, the range variable cannot be used to store another value. However, if the range variable holds a queryable type, it can be queried.

(my emphasis)

the following example lists all staff-members whose lastname starts with a vowel (there is only one).
    var letVowels = from member in staff
                    // create an enumerable type that we can query
                    let names = member.name.Split(' ')
                    from individualName in names
                    let lowerName = individualName.ToLower()
                    where "aeiou".Contains(lowerName[0])
                    select individualName;

    foreach (string item in letVowels) {
        Console.WriteLine("{0} starts with a vowel.", item);
    }
    // "Elbow starts with a vowel." (modify staff names to test)


The benefit of the lowerName range-variable isn't apparent in my example. The example from the docs is more obvious:
    from word in words
    let w = word.ToLower()
    where w[0] == 'a' || w[0] == 'e'
        || w[0] == 'i' || w[0] == 'o'
        || w[0] == 'u'
    select word;


Without the variable 'w', ToLower() would have to applied to each occurrence of w[0] in the where clause.

Notice in my example the two uses of let. names is an enumerable, a collection of strings obtained using Split(). lowerName is a single value, for each enumerated element.

Someone might correct me, but I don't see let as having a direct, single, equivalent in a SQL statement. It behaves as either a subquery, or like a variable within a stored procedure.

Sequence With Inner Sequence

Elements in a sequence may themselves be a sequence. We can access the inner sequences using more than one from clause. Microsoft calls this a compound from clause.

Create this array:
    var deptsInner = new[] { 
        new { dept = "Marketing", 
        staff = new[] { new { name = "B Bones" }, new { name = "M Pickles" }, new { name = "R Piccalilli" } } } ,
        new { dept = "Sales", 
        staff = new[] { new { name = "M Muggins" }, new { name = "L Elbow" } } } ,
        new { dept = "Accounts", 
        staff = new[] { new { name = "D Diddly" } } } };


    var compoundFrom = from x in deptsInner
                       from y in x.staff
                       select new { Dept = x.dept, Name = y.name };

    foreach (var item in compoundFrom) {
        Console.WriteLine("{0} {1}", item.Dept, item.Name);
    }
    //Marketing B Bones
    //Marketing M Pickles
    //Marketing R Piccalilli
    //Sales M Muggins
    //Sales L Elbow
    //Accounts D Diddly


This example (using the same array) includes a where clause:
    var compoundFromWhere = from x in deptsInner
                            from y in x.staff
                            where y.name.StartsWith("M")
                            select new { Dept = x.dept, Name = y.name };

    foreach (var item in compoundFromWhere) {
        Console.WriteLine("{0} {1}", item.Dept, item.Name);
    }           // the two Mary's.


Independent Sequences

Two different sequences will normally share a common value (a field, in database terms) that relates them. See the section on Joining below. However, there are occasions when it is useful to merge two sequences, even though they have no value in common. Some examples of this are to number rows, to generate sets of rows, or perhaps to generate calendar dates.

If one sequence has 5 elements, and the other has 3, then combining them results in 15 (5x3) rows. In database terms this uses a cross join, and the process creates a cross product of the two sequences. That is, because there is no connection between the two tables, each row in the first table is matched with each row in the second table.

Note: Although the ability to do this can be useful, it is not a very common need. If you ever find yourself unexpectedly generating millions of rows, it is because you haven't established a valid connection (a join) between tables.

Using an array { 1, 2, 3 } and the six rainbow-members generates 18 rows:
    int[] indices = { 1, 2, 3 };
    var multipleFrom = from ind in indices
                       from name in rainbow
                       select new { ind, name };

    foreach (var member in multipleFrom) {
        Console.WriteLine("{0} {1}", member.ind, member.name);
    }
    //1 Rod (cross-product)
    //1 Jane
    //1 Freddy
    //1 Bungle
    //1 Zippy
    //1 George
    //2 Rod
    //2 Jane
    //2 Freddy
    //2 Bungle
    //2 Zippy
    //2 George
    //3 Rod
    //3 Jane
    //3 Freddy
    //3 Bungle
    //3 Zippy
    //3 George


The following variation uses a where clause to reduce this list to only those rainbow-members whose names begin with either 'B', 'F' or 'G', and only for the array values of 1 and 2.
    var multipleFromFiltered = from ind in indices
                               from name in rainbow
                               where ind < 3 && "BFG".Contains(name[0])
                               select new { ind, name };

    foreach (var member in multipleFromFiltered) {
        Console.WriteLine("{0} {1}", member.ind, member.name);
    }
    //1 Freddy
    //1 Bungle
    //1 George
    //2 Freddy
    //2 Bungle
    //2 George


Grouping

MS said:

Use the group clause to produce a sequence of groups organized by a key that you specify. The key can be any data type.

More formally:

MS said:

The group clause returns a sequence of IGrouping<TKey, TElement> objects that contain zero or more items that match the key value for the group. For example, you can group a sequence of strings according to the first letter in each string. In this case, the first letter is the key and has a type char, and is stored in the Key property of each IGrouping<TKey, TElement> object. The compiler infers the type of the key.


The following example groups the rainbow-members according to the length of their names:
    var groupByLen = from member in rainbow
                     group member by member.Length;
    // a query expression can end with a group clause.

    foreach (var agroup in groupByLen) {
        // or: foreach (IGrouping<int, string> agroup in GroupByLen) {
        Console.WriteLine("Group length: {0}", agroup.Key);
        foreach (var member in agroup) {
            Console.WriteLine("\t{0}", member);
        }
    }
    //Group length: 3
    //  Rod
    //Group length: 4
    //  Jane
    //Group length: 6
    //  Freddy
    //  Bungle
    //  George
    //Group length: 5
    //  Zippy


Note that the Key, for this example, as an integer - the various lengths of the names.

The following example groups staff-members according to whether their salary is above or below the average salary. In this case the Key is a boolean: there are two groups, one for true and one for false.
    double avgSal = staff.Average(x => x.salary);
    Console.WriteLine("Average Salary {0:C}", avgSal);

    var groupAverageBool = from member in staff
                           group member by member.salary > avgSal;  // bool

    foreach (var salGroup in groupAverageBool) {
        Console.WriteLine(salGroup.Key ? "Above average:" : "Equal or below average:");
        foreach (var member in salGroup) {
            Console.WriteLine("{0} ({1:C})", member.name, member.salary);
        }
    }
    //Average Salary £22,916.67
    //Equal or below average:
    //Bob Bones (£20,000.00)
    //Mary Muggins (£22,000.00)
    //Liz Elbow (£22,500.00)
    //Robert Piccalilli (£18,000.00)
    //Above average:
    //Dave Diddly (£28,000.00)
    //Mary Pickles (£27,000.00)


(I kept the example simple by first working out the average.)

Joining

Two sequences will usually share a common value, a field in database terms. (Compare this with the Independent Sequences section above.) We use a join clause to combine matching data from both tables.

The join clause only directly supports an inner join, an equijoin. That is, it will only output rows where a value in one table's column matches (equals) the corresponding value in the other table's column. See this MSDN page for examples of other join types.

This example uses the default inner join to obtain data from the staff and dept arrays, where a staff-member's deptid appears in the depts-array.
    var innerJoin = from dept in depts
                    join member in staff
                        on dept.deptid equals member.deptid
                    select new { Department = dept.dept, StaffMember = member.name };

    foreach (var item in innerJoin) {
        Console.WriteLine("{0} : {1}", item.Department, item.StaffMember);
    }
    //Marketing : Bob Bones
    //Marketing : Mary Pickles
    //Marketing : Robert Piccalilli
    //Sales : Mary Muggins
    //Sales : Liz Elbow
    //Accounts : Dave Diddly


Left Outer Join

MSDN said:

In a left outer join, all the elements in the left source sequence are returned, even if no matching elements are in the right sequence. To perform a left outer join in LINQ, use the DefaultIfEmpty method in combination with a group join to specify a default right-side element to produce if a left-side element has no matches. You can use null as the default value for any reference type, or you can specify a user-defined default type.

            var leftOuter = from x in depts
                            join y in staff on x.deptid equals y.deptid into staffGroup
                            from item in staffGroup.DefaultIfEmpty(new {
                                name = "Missing", salary = 0, deptid = -1, grade = 0
                            })
                            select new { Department = x.dept, Name = item.name };
            // groups have been flattened
            foreach (var item in leftOuter) {
                Console.WriteLine("{0} {1}", item.Department, item.Name);
            }
            //Marketing Bob Bones
            //Marketing Mary Pickles
            //Marketing Robert Piccalilli
            //Sales Mary Muggins
            //Sales Liz Elbow
            //Accounts Dave Diddly
            //Human Resources Missing <<< an outer join


join with into creates the group join.

A Visual Explanation of SQL Joins

Into

MSDN said:

The into contextual keyword can be used to create a temporary identifier to store the results of a group, join or select clause into a new identifier. This identifier can itself be a generator for additional query commands. When used in a group or select clause, the use of the new identifier is sometimes referred to as a continuation.


Roughly, into something means that something holds the result of the query up to that point which can then serve as the source for further clauses in the query. (It has similarities to the let clause.)

This example counts the number of staff whose salaries fall within £5,000 intervals.
    var tallyInto = from member in staff
                    group member.salary by member.salary / 5000 into salGroup
                    where salGroup.Count() >= 2
                    select new { LowerSalary = salGroup.Key, NoOfStaff = salGroup.Count() };

    foreach (var item in tallyInto) {
        Console.WriteLine("Lower Salary: {0:C0} No of Staff: {1}",
            item.LowerSalary * 5000, item.NoOfStaff);
    }
    //Lower Salary: £20,000 No of Staff: 3
    //Lower Salary: £25,000 No of Staff: 2
    // (Robert is omitted because the count of £15k-£20k is only 1.)


into salGroup creates the identifier salGroup which stores the Keys £15,000, £20,000, £25,000, along with the members whose salaries fall within these boundaries. The continuation (the where clause) reduces these groups to only keep those where the count of members is >= 2.

This post has been edited by andrewsw: 04 September 2014 - 04:14 PM


Is This A Good Question/Topic? 0
  • +

Page 1 of 1