Introductory LINQ Tutorial

Getting into LINQ

Introduction

LINQ is a feature-rich and powerful set of extensions to C#, introduced with C# 3.0 and .NET Framework 3.5. Long story extremely short and only partially accurate, LINQ lets you query and manipulate lists of items in ways that until now we've had to write ourselves using loops.
Given that we're now targeting .NET 3.5 in Talon, we can start taking advantage of its features.
Today we're going to scratch the surface of one aspect of LINQ.
There's a lot to cover, and lots of background info that you probably need, but let's just dive in with an example of what LINQ can do.

What’s happening?

Here's a section of actual pre-3.5 production code with names changed:

    List<Foo> list = new List<Foo>();
    foreach (FooContainer foos in this.fooContainerDictionary.Values)
    {
        if (foos != null &&
            foos.Foo != null &&
            foos.BarContainerList != null)
        {
            foreach (BarContainer barContainer in foos.BarContainerList)
            {
                if (barContainer.Bar != null)
                {
                    list.Add(foos.Foo);
                    break;
                }
            }
        }
    }
    return list;

What is this code doing?
We create a list and loop through a dictionary's values, each of which holds a list of its own. For each dictionary value, if its list has at least one item that meets some criteria, we add another member of that dictionary value to the new list.
Ouch.
The intent is to get the Foo from every FooContainer that has at least one non-null Bar in any of its BarContainers. The comments actually tell us that, sort of, but there's no easy way to tell just by examining the code.
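
To play with the snippet it needs some types around it. The real production classes aren't shown here, so the class shapes, the FoosWithAnyBar and SampleData names, and the sample data below are all guesses made up for illustration. Treat this as a rough sketch of the loop version, not the actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins: the real types are not shown in this tutorial.
public class Foo { public string Name { get; set; } }
public class Bar { }
public class BarContainer { public Bar Bar { get; set; } }
public class FooContainer
{
    public Foo Foo { get; set; }
    public List<BarContainer> BarContainerList { get; set; }
}

public static class LoopVersion
{
    // The original loop, with the dictionary passed in instead of read from a field.
    public static List<Foo> FoosWithAnyBar(Dictionary<int, FooContainer> fooContainerDictionary)
    {
        List<Foo> list = new List<Foo>();
        foreach (FooContainer foos in fooContainerDictionary.Values)
        {
            if (foos != null && foos.Foo != null && foos.BarContainerList != null)
            {
                foreach (BarContainer barContainer in foos.BarContainerList)
                {
                    if (barContainer.Bar != null)
                    {
                        list.Add(foos.Foo);
                        break; // one non-null Bar is enough
                    }
                }
            }
        }
        return list;
    }

    // Made-up data covering each branch of the outer and inner checks.
    public static Dictionary<int, FooContainer> SampleData()
    {
        Dictionary<int, FooContainer> d = new Dictionary<int, FooContainer>();
        d[1] = new FooContainer
        {
            Foo = new Foo { Name = "kept" },
            BarContainerList = new List<BarContainer>
            {
                new BarContainer(),
                new BarContainer { Bar = new Bar() }
            }
        };
        d[2] = new FooContainer
        {
            Foo = new Foo { Name = "all Bars null" },
            BarContainerList = new List<BarContainer> { new BarContainer() }
        };
        d[3] = new FooContainer { Foo = new Foo { Name = "no list" } };
        d[4] = null;
        return d;
    }

    public static void Main()
    {
        foreach (Foo foo in FoosWithAnyBar(SampleData()))
        {
            Console.WriteLine(foo.Name); // prints "kept"
        }
    }
}
```

Only the first entry survives: it is the only one with a non-null container, a non-null Foo, and at least one BarContainer whose Bar is set.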

Anytime

Let's clean it up a bit with LINQ. LINQ provides a number of extensions to lists, so let's start by applying the Any extension to the BarContainerList list.

    List<Foo> list = new List<Foo>();
    foreach (FooContainer foos in this.fooContainerDictionary.Values)
    {
        if (foos != null &&
            foos.Foo != null &&
            foos.BarContainerList != null)
        {
            if (foos.BarContainerList.Any())
            {
                list.Add(foos.Foo);
            }
        }
    }
    return list;

So, instead of looping through the BarContainers to see if there are any, we just call Any, and if it returns true, we add the Foo to our new list.
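
It's worth seeing what Any does when it is called with no arguments: it only asks whether the list has any items at all, so a list holding nothing but a null still counts. Here is a small sketch on plain strings; the AnyWithoutCriteria class and HasItems method are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyWithoutCriteria
{
    // True when the sequence has at least one element, whatever that element is.
    public static bool HasItems(IEnumerable<string> items)
    {
        return items.Any();
    }

    public static void Main()
    {
        Console.WriteLine(HasItems(new List<string>()));            // False
        Console.WriteLine(HasItems(new List<string> { null }));     // True
        Console.WriteLine(HasItems(new List<string> { "a", "b" })); // True
    }
}
```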

Anyhow

But we haven't given Any any criteria yet.

    List<Foo> list = new List<Foo>();
    foreach (FooContainer foos in this.fooContainerDictionary.Values)
    {
        if (foos != null &&
            foos.Foo != null &&
            foos.BarContainerList != null)
        {
            if (foos.BarContainerList.Any(delegate(BarContainer barContainer) { return barContainer.Bar != null; }))
            {
                list.Add(foos.Foo);
            }
        }
    }
    return list;

So we give Any a delegate that accepts a BarContainer called barContainer and returns true if its Bar is non-null. Any calls the delegate on each item in the list in turn. As soon as our delegate returns true for one of the items, Any stops and returns true, just like the break in the original loop; if it never does, Any returns false.
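
To watch Any hand items to the delegate one at a time, we can count the calls. This is a minimal sketch on a list of strings; the AnyWithDelegate class and CallsUntilAnswer method are hypothetical names for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyWithDelegate
{
    // Returns how many times the delegate ran before Any produced its answer.
    public static int CallsUntilAnswer(List<string> items)
    {
        int calls = 0;
        items.Any(delegate(string s) { calls++; return s != null; });
        return calls;
    }

    public static void Main()
    {
        // The second item is the first non-null one, so Any stops there.
        Console.WriteLine(CallsUntilAnswer(new List<string> { null, "x", null, "y" })); // 2

        // No item matches, so every item is checked.
        Console.WriteLine(CallsUntilAnswer(new List<string> { null, null }));           // 2
    }
}
```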

Short and sweet

So that clears up our intent, but it's kinda ugly. Fortunately, the compiler is pretty smart. It already knows the type of the items in the list, so C# 3.0 provides a convenient shorthand.

    List<Foo> list = new List<Foo>();
    foreach (FooContainer foos in this.fooContainerDictionary.Values)
    {
        if (foos != null &&
            foos.Foo != null &&
            foos.BarContainerList != null)
        {
            if (foos.BarContainerList.Any(barContainer => barContainer.Bar != null))
            {
                list.Add(foos.Foo);
            }
        }
    }
    return list;

For those interested in such things, the weird looking thing in Any is called a lambda expression. The LHS contains just a name for a variable local to the lambda expression; Any will populate that variable with each item in the list, one at a time. The RHS contains an expression that returns true or false. The expression can use that variable or any other variable the outer method has access to. We don't need to here, but we are free to.
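
As a small illustration of a lambda reaching for a variable from the outer method, here is a sketch where the RHS uses both its own variable and a method parameter. The LambdaCapture class, the AnyLongerThan method and the sample words are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LambdaCapture
{
    public static bool AnyLongerThan(List<string> words, int minLength)
    {
        // w is the lambda's own variable; minLength comes from the enclosing method.
        return words.Any(w => w.Length > minLength);
    }

    public static void Main()
    {
        List<string> words = new List<string> { "to", "list", "query" };
        Console.WriteLine(AnyLongerThan(words, 4)); // True: "query" has 5 characters
        Console.WriteLine(AnyLongerThan(words, 5)); // False: nothing is longer than 5
    }
}
```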

Shorter and sweeter

We could shorten it further if we wanted.

    List<Foo> list = new List<Foo>();
    foreach (FooContainer foos in this.fooContainerDictionary.Values)
    {
        if (foos != null &&
            foos.Foo != null &&
            foos.BarContainerList != null)
        {
            if (foos.BarContainerList.Any(b => b.Bar != null))
            {
                list.Add(foos.Foo);
            }
        }
    }
    return list;

Leaving the nest

This gives us nested ifs, so we could simplify a bit further.

    List<Foo> list = new List<Foo>();
    foreach (FooContainer foos in this.fooContainerDictionary.Values)
    {
        if (foos != null &&
            foos.Foo != null &&
            foos.BarContainerList != null &&
            foos.BarContainerList.Any(b => b.Bar != null))
        {
            list.Add(foos.Foo);
        }
    }
    return list;

So that's a bit cleaner. We've reduced our LOCs and expressed our intent more clearly. Any isn't the only extension of this kind: there are also All, Contains and Count, plus Sum, Average, Min and Max for number-based lists.
But we can do better. The Q in LINQ is for query. It lets you use C# kinda like SQL.
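
As a quick aside before we get to queries, the other extensions mentioned above (All, Contains, Count, Sum, Average, Min, Max) can be tried on a small array of numbers. The SiblingExtensions class, the Describe method and the sample scores are invented for this sketch.

```csharp
using System;
using System.Linq;

public static class SiblingExtensions
{
    // Packs the result of each extension into one line of text.
    public static string Describe(int[] scores)
    {
        return string.Format(
            "all>0={0} has9={1} count>3={2} sum={3} avg={4} min={5} max={6}",
            scores.All(s => s > 0),    // does every item pass the test?
            scores.Contains(9),        // is this exact value present?
            scores.Count(s => s > 3),  // how many items pass the test?
            scores.Sum(),
            scores.Average(),
            scores.Min(),
            scores.Max());
    }

    public static void Main()
    {
        int[] scores = { 2, 9, 4 };
        // prints: all>0=True has9=True count>3=2 sum=15 avg=5 min=2 max=9
        Console.WriteLine(Describe(scores));
    }
}
```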

Q is for query

So let's take a crack at using it to replace the outer loop. Instead of looping through the dictionary values, we'll use the Where extension method, building the query up one line at a time.

this.fooContainerDictionary.Values
      .Where(foos => foos != null)
      .Where(foos => foos.Foo != null)
      .Where(foos => foos.BarContainerList != null)
      .Where(foos => foos.BarContainerList.Any(b => b.Bar != null))

  1. So we start with the same dictionary that we were looping over before, and we get all the values out of it. So far nothing new.
  2. Next we'll use the Where method to get the non-null ones, just like the first condition of the if in the original. I chose to call the local variable foos, but we could have used f or fc or blah. The RHS of the lambda expression must return a bool that indicates whether or not to include the item in the results.
  3. Now, just like our if statement in the original, we only want to look at the FooContainers that have a non-null Foo and a non-null BarContainerList, so let's add two lines for that. So far all we've done is describe the FooContainers that meet all the conditions in the original method's outer if statement.
  4. Now there's one more criterion: the original method's inner foreach and if. Fortunately, we've already LINQified that, so we just need to put it inside a Where call. So yes, we've got LINQ inside LINQ. This gives us the FooContainers that meet all the conditions of the original method's multiple if and foreach statements.
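
The steps above rely on each Where only seeing the items that the previous Where let through, which is why the null check has to come first. Here is a small sketch of that on plain strings, next to the single-Where version using &&. The ChainedWhere class, its method names and the sample data are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainedWhere
{
    public static IEnumerable<string> Chained(IEnumerable<string> items)
    {
        return items
            .Where(s => s != null)      // filters first, so the next lambda never sees null
            .Where(s => s.Length > 0);
    }

    // Same filter in one Where; && short-circuits just like the original if statement.
    public static IEnumerable<string> Combined(IEnumerable<string> items)
    {
        return items.Where(s => s != null && s.Length > 0);
    }

    public static void Main()
    {
        string[] items = { "a", null, "", "bc" };
        foreach (string s in Chained(items))
        {
            Console.WriteLine(s); // prints "a" then "bc"
        }
        Console.WriteLine(Chained(items).SequenceEqual(Combined(items))); // True
    }
}
```

Which form to prefer is mostly taste: one Where per condition reads like a checklist, while a single Where with && stays closer to the original if.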

Making the list

So what's left? We just need to get those items into a list. They're almost in a list. What this statement returns is an IEnumerable<FooContainer>. This is important because nothing has been evaluated yet. The dictionary's Values property has returned a collection, but all the Where calls have done is build up a query; they haven't processed the collection. Nothing has been looped through or passed to a lambda expression, and no list has been generated in memory. When dealing with large lists, this is nice, and maybe we'll get into how to take advantage of that another time. For now, we'll just slap a ToList on the end and a return at the beginning.
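
If you'd like to see for yourself that nothing runs until ToList is called, a counter inside the lambda makes it visible. This is a minimal sketch on a list of ints; the DeferredExecution class and Run method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecution
{
    // Returns { lambda calls before ToList, lambda calls after ToList, result count }.
    public static int[] Run()
    {
        List<int> numbers = new List<int> { 1, 2, 3 };
        int calls = 0;

        // Building the query does not call the lambda.
        IEnumerable<int> evens = numbers.Where(n => { calls++; return n % 2 == 0; });
        int callsBefore = calls;

        // Added after the query was built, but before it is evaluated.
        numbers.Add(4);

        // ToList walks the source now, so it sees all four numbers.
        List<int> result = evens.ToList();

        return new int[] { callsBefore, calls, result.Count };
    }

    public static void Main()
    {
        int[] r = Run();
        Console.WriteLine("{0} {1} {2}", r[0], r[1], r[2]); // 0 4 2
    }
}
```

The lambda has run zero times before ToList and four times after it, and the 4 added after the query was written still ends up in the result.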

return this.fooContainerDictionary.Values
      .Where(foos => foos != null)
      .Where(foos => foos.Foo != null)
      .Where(foos => foos.BarContainerList != null)
      .Where(foos => foos.BarContainerList.Any(b => b.Bar != null))
      .ToList();

ToList evaluates the query and returns a list of FooContainers.

You can always get what you want

But it turns out I've forgotten something. If we were in VS, I'd have noticed that the original method didn't return the FooContainers, it returned each FooContainer's Foo member. I guess we'll have to loop through our new list and create a new list of Foo from it.
Or, we can take advantage of the Select method.

return this.fooContainerDictionary.Values
      .Where(foos => foos != null)
      .Where(foos => foos.Foo != null)
      .Where(foos => foos.BarContainerList != null)
      .Where(foos => foos.BarContainerList.Any(b => b.Bar != null))
      .Select(foos => foos.Foo)
      .ToList();

Select takes a lambda, but instead of returning true/false, it returns, well, whatever we want. The ToList will now create a list of Foos.
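
To make the "whatever we want" part concrete, here is a small sketch where Select turns strings into ints, so the resulting list has a different item type than the source. The SelectExample class, the Lengths method and the sample words are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectExample
{
    // Where keeps or drops items; Select replaces each kept item with something else.
    public static List<int> Lengths(IEnumerable<string> words)
    {
        return words
            .Where(w => w != null)
            .Select(w => w.Length)   // string in, int out
            .ToList();
    }

    public static void Main()
    {
        foreach (int n in Lengths(new string[] { "linq", null, "any" }))
        {
            Console.WriteLine(n); // prints 4 then 3
        }
    }
}
```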
Compared to the original we’ve saved a few lines of code and got rid of the nested loops. But more importantly we've clarified our intent purely through the language features. Anyone can look at this and quickly grasp the intent without working through loops and declarations in their head. I don't know if this qualifies as self-documenting code, but it's much closer than our original.

Mmm… sugar

Finally, LINQ provides some syntactic sugar, known as query syntax, that allows us to write that as this:

    var res = from p in this.fooContainerDictionary.Values
              where p != null
              where p.Foo != null
              where p.BarContainerList != null
              where p.BarContainerList.Any(b => b.Bar != null)
              select p.Foo;
    return res.ToList();

Look familiar?
Note that the initial line doesn't create a list or do any processing of the list. It just creates a query. Only when we call ToList does it actually evaluate anything.
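
To check that the sugar and the method calls really are the same thing, here is a small sketch that writes one filter both ways and compares the results. The QueryVersusMethod class, its method names and the sample words are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryVersusMethod
{
    public static IEnumerable<string> WithQuerySyntax(IEnumerable<string> words)
    {
        return from w in words
               where w != null
               where w.Length > 3
               select w.ToUpperInvariant();
    }

    public static IEnumerable<string> WithMethodSyntax(IEnumerable<string> words)
    {
        return words
            .Where(w => w != null)
            .Where(w => w.Length > 3)
            .Select(w => w.ToUpperInvariant());
    }

    public static void Main()
    {
        string[] words = { "linq", null, "any", "where" };
        Console.WriteLine(string.Join(",", WithQuerySyntax(words).ToArray()));            // LINQ,WHERE
        Console.WriteLine(WithQuerySyntax(words).SequenceEqual(WithMethodSyntax(words))); // True
    }
}
```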

One liner

And you can do it in one statement if you want:

return (from p in this.fooContainerDictionary.Values
        where p != null
        where p.Foo != null
        where p.BarContainerList != null
        where p.BarContainerList.Any(b => b.Bar != null)
        select p.Foo).ToList();

Before and After

Take some time to compare this to the original. The LINQ version is smaller, cleaner, no loops, no breaks. And it's understandable at a glance, at least once you've seen a lambda expression.

Conclusion

As I said at the beginning, this just scratches the surface. LINQ provides joins and aggregations, list flattening, unions, sorting, group by. Counting overloads, I think there are over a hundred extension methods it provides.
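
For a small taste of three of those, here is a sketch of sorting (OrderBy), grouping (GroupBy) and list flattening (SelectMany) on made-up data. The MoreOperators class and its method names are hypothetical; joins and unions are left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MoreOperators
{
    // Sorting: shortest words first; ties keep their original order.
    public static string SortedByLength(string[] words)
    {
        return string.Join(",", words.OrderBy(w => w.Length).ToArray());
    }

    // Grouping: one group per distinct length, reported as "length:count".
    public static string CountsByLength(string[] words)
    {
        return string.Join(",", words.GroupBy(w => w.Length)
                                     .Select(g => g.Key + ":" + g.Count())
                                     .ToArray());
    }

    // Flattening: a list of lists becomes one sequence.
    public static string Flattened(List<List<int>> nested)
    {
        return string.Join(",", nested.SelectMany(inner => inner)
                                      .Select(n => n.ToString())
                                      .ToArray());
    }

    public static void Main()
    {
        string[] words = { "pear", "fig", "apple", "kiwi" };
        Console.WriteLine(SortedByLength(words)); // fig,pear,kiwi,apple
        Console.WriteLine(CountsByLength(words)); // 4:2,3:1,5:1

        List<List<int>> nested = new List<List<int>>
        {
            new List<int> { 1, 2 },
            new List<int> { 3 }
        };
        Console.WriteLine(Flattened(nested));     // 1,2,3
    }
}
```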
And there are a number of flavours of LINQ:

  • LINQ to Objects, which we just used
  • LINQ to SQL, which allows you to execute queries against a database in an object oriented manner
  • LINQ to XML, which allows you to query, load, validate, serialize and manipulate XML documents
  • a few crazy ones I don't understand

There are a number of great resources online, but here's one to get you started.
http://www.hookedonlinq.com/LINQtoObjects5MinuteOverview.ashx
And if you want examples of everything from simple to crazy:
http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License