🔗

Benchmarking madness: IEnumerable

2012-08-12

What is the best way to know if an IEnumerable<T> contains at least one element?
I mean, what is the most optimized way to know this information?

Maybe the first thing that came to my mind was to do some .Count() > 0 check, but if you think of it, the Count() extension method would have to enumerate the entire collection to know its length. This seems a little overkill.

What about .FirstOrDefault() != null then? Seems better. But looks ugly to me.
(and worse, on a value type, you would have to use default(T) instead of null…)

After some research I realized I had also forgotten the .Any() method, which in fact seems to be done for this matter.
But at this point, I wasn’t sure of anything anymore…

Use The Source, Luke

All these functions are defined as extension methods for the IEnumerable<T> interface, but are they optimized for more derived interfaces and implementations?
(for instance, the ICollection<T> interface and its Count property)

Let’s download ILSpy to see what all these functions are actually doing…

Here is the implementation of the Count() Linq extension method from the .NET Framework 4.0:

public static int Count<TSource>(this IEnumerable<TSource> source)
{
	if (source == null)
	{
		throw Error.ArgumentNull("source");
	}
	ICollection<TSource> collection = source as ICollection<TSource>;
	if (collection != null)
	{
		return collection.Count;
	}
	ICollection collection2 = source as ICollection;
	if (collection2 != null)
	{
		return collection2.Count;
	}
	int num = 0;
	checked
	{
		using (IEnumerator<TSource> enumerator = source.GetEnumerator())
		{
			while (enumerator.MoveNext())
			{
				num++;
			}
		}
		return num;
	}
}

How interesting… so the function is actually testing the ICollection<T> case I mentioned earlier. 🙂

What about the other functions?

Here is the FirstOrDefault() function:

public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source)
{
	if (source == null)
	{
		throw Error.ArgumentNull("source");
	}
	IList<TSource> list = source as IList<TSource>;
	if (list != null)
	{
		if (list.Count > 0)
		{
			return list[0];
		}
	}
	else
	{
		using (IEnumerator<TSource> enumerator = source.GetEnumerator())
		{
			if (enumerator.MoveNext())
			{
				return enumerator.Current;
			}
		}
	}
	return default(TSource);
}

…optimized in the same way as Count()
(The IList<T> interface implements the ICollection<T> interface, which in turn, implements the IEnumerable<T> interface)

And here is the Any() function:

public static bool Any<TSource>(this IEnumerable<TSource> source)
{
	if (source == null)
	{
		throw Error.ArgumentNull("source");
	}
	using (IEnumerator<TSource> enumerator = source.GetEnumerator())
	{
		if (enumerator.MoveNext())
		{
			return true;
		}
	}
	return false;
}

What is this?
This one does not even bother to check if the source is an ICollection<T>!
I was expecting to see a special case for collections, checking if Count > 0, but nothing!
I suppose Microsoft have benchmarked this case, and found out it was not necessary…

But I’m stubborn. And bored.

Let’s benchmark!

I could’t believe comparing an integer was slower than calling an iterator.
So I wrote my one implementation, and did some benchmarks… 🙂

public static bool Empty<TSource>(this IEnumerable<TSource> source)
{
	if (source == null)
	{
		throw new ArgumentNullException("source");
	}
	ICollection<TSource> collection = source as ICollection<TSource>;
	if (collection != null)
	{
		return collection.Count <= 0;
	}
	else
	{
		using (IEnumerator<TSource> enumerator = source.GetEnumerator())
		{
			if (enumerator.MoveNext())
			{
				return false;
			}
		}
	}
	return true;
}

The benchmarking class:

public static class Benchmarks
{
	public static void BenchmarkAll<T>(IEnumerable<T> source)
	{
		Dictionary<Action<IEnumerable<T>>, long> results = new Dictionary<Action<IEnumerable<T>>, long>();
		Console.WriteLine();
		results.Add(BenchmarkAny, Benchmark(BenchmarkAny, source));
		results.Add(BenchmarkCount, Benchmark(BenchmarkCount, source));
		results.Add(BenchmarkFirstOrDefault, Benchmark(BenchmarkFirstOrDefault, source));
		results.Add(BenchmarkCustomEmpty, Benchmark(BenchmarkCustomEmpty, source));
		var fastest = results.OrderBy(pair => pair.Value).First();
		var second = results.OrderBy(pair => pair.Value).ElementAt(1);
		Console.WriteLine("\t{0} was the fastest function, with {1}ms.", fastest.Key.Method.Name, fastest.Value);
		Console.WriteLine("\t{0} was the second fastest function, with {1}ms.", second.Key.Method.Name, second.Value);
		Console.WriteLine();
	}
	
	private static long Benchmark<T>(Action<IEnumerable<T>> function, IEnumerable<T> source)
	{
		const string formatStart = "Starting {0} iterations with {1}...";
		const string formatEnd = "Finished in {0} ms.";
		Console.WriteLine(formatStart, iterations, function.Method.Name);
		Stopwatch stopwatch = Stopwatch.StartNew();
		function(source);
		stopwatch.Stop();
		Console.WriteLine(formatEnd, stopwatch.ElapsedMilliseconds);
		return stopwatch.ElapsedMilliseconds;
	}
	
	private const int iterations = 100000000;
	
	private static void BenchmarkCount<T>(IEnumerable<T> source)
	{
		bool isEmpty = false;
		for (int i = 0; i < iterations; i++)
		{
			isEmpty = source.Count() <= 0;
		}
	}
	
	private static void BenchmarkAny<T>(IEnumerable<T> source)
	{
		bool isEmpty = false;
		for (int i = 0; i < iterations; i++)
		{
			isEmpty = !source.Any();
		}
	}
	
	private static void BenchmarkFirstOrDefault<T>(IEnumerable<T> source)
	{
		bool isEmpty = false;
		for (int i = 0; i < iterations; i++)
		{
			isEmpty = source.FirstOrDefault().Equals(default(T));
		}
	}
	
	private static void BenchmarkCustomEmpty<T>(IEnumerable<T> source)
	{
		bool isEmpty = false;
		for (int i = 0; i < iterations; i++)
		{
			isEmpty = source.Empty();
		}
	}
}

Throw an eye, I must confess I’m pretty proud of it ^^

Results

Let’s begin with the most simple (i.e. that does not implement ICollection<T>) IEnumerable<T> implementation:

4 items, IEnumerable<T>

Starting 100000000 iterations with BenchmarkAny…  
Finished in 2870 ms.  
Starting 100000000 iterations with BenchmarkCount…  
Finished in 6276 ms.  
Starting 100000000 iterations with BenchmarkFirstOrDefault…  
Finished in 4764 ms.  
Starting 100000000 iterations with BenchmarkCustomEmpty…  
Finished in 4597 ms.  
    BenchmarkAny was the fastest function, with 2870ms.  
    BenchmarkCustomEmpty was the second fastest function, with 4597ms.

40 items, IEnumerable<T>

Starting 100000000 iterations with BenchmarkAny…  
Finished in 2800 ms.  
Starting 100000000 iterations with BenchmarkCount…  
Finished in 24723 ms.  
Starting 100000000 iterations with BenchmarkFirstOrDefault…  
Finished in 4885 ms.  
Starting 100000000 iterations with BenchmarkCustomEmpty…  
Finished in 4597 ms.  
    BenchmarkAny was the fastest function, with 2800ms.  
    BenchmarkCustomEmpty was the second fastest function, with 4597ms.

The first thing to notice is the time it took to complete the Count() benchmark with 40 items. Almost ten time more than the fastest method!
The second is that Any() is the fastest method, followed by my own Empty() method.

What about the results for the ICollection<T> and IList<T> implementations now?

4 items, ICollection<T>

Starting 100000000 iterations with BenchmarkAny…  
Finished in 4463 ms.  
Starting 100000000 iterations with BenchmarkCount…  
Finished in 1808 ms.  
Starting 100000000 iterations with BenchmarkFirstOrDefault…  
Finished in 6522 ms.  
Starting 100000000 iterations with BenchmarkCustomEmpty…  
Finished in 2579 ms.  
    BenchmarkCount was the fastest function, with 1808ms.  
    BenchmarkCustomEmpty was the second fastest function, with 2579ms.

40 items, ICollection<T>

Starting 100000000 iterations with BenchmarkAny…  
Finished in 4412 ms.  
Starting 100000000 iterations with BenchmarkCount…  
Finished in 1766 ms.  
Starting 100000000 iterations with BenchmarkFirstOrDefault…  
Finished in 6119 ms.  
Starting 100000000 iterations with BenchmarkCustomEmpty…  
Finished in 2331 ms.  
    BenchmarkCount was the fastest function, with 1766ms.  
    BenchmarkCustomEmpty was the second fastest function, with 2331ms.

4 items, IList<T>

Starting 100000000 iterations with BenchmarkAny…  
Finished in 4117 ms.  
Starting 100000000 iterations with BenchmarkCount…  
Finished in 2075 ms.  
Starting 100000000 iterations with BenchmarkFirstOrDefault…  
Finished in 2752 ms.  
Starting 100000000 iterations with BenchmarkCustomEmpty…  
Finished in 2519 ms.  
    BenchmarkCount was the fastest function, with 2075ms.  
    BenchmarkCustomEmpty was the second fastest function, with 2519ms.

40 items, IList<T>

Starting 100000000 iterations with BenchmarkAny…  
Finished in 4108 ms.  
Starting 100000000 iterations with BenchmarkCount…  
Finished in 2051 ms.  
Starting 100000000 iterations with BenchmarkFirstOrDefault…  
Finished in 2735 ms.  
Starting 100000000 iterations with BenchmarkCustomEmpty…  
Finished in 2517 ms.  
    BenchmarkCount was the fastest function, with 2051ms.  
    BenchmarkCustomEmpty was the second fastest function, with 2517ms.

In all cases, and thanks to the optimizations we saw earlier, the Count() method is the fastest way to know if an ICollection<T> contains at least one item.

Followed closely by my Empty() method.

There is more…

All theses benchmarks are nice when you know on which implementation you are working, but the purpose of interfaces is precisely to abstract from the implementation!

Most of the IEnumerable<T> implementations in the .NET Framework also implement ICollection<T>, and I actually had a hard time to find a class only implementing IEnumerable<T>. The most obvious one, which I used in my benchmarks, is String, implementing IEnumerable<char>. Almost all the others are pretty obscure and specific classes.
Based on that, the usage of the Count() method seems a good choice in almost all cases.

There is still a thing bothering me though, what about Linq queries?
Linq is used very often in C#, and Linq works on IEnumerable<T>.

For example, the Where() method returns an IEnumerable<T>.

We don’t know the underlying implementation, and it should not matter. But when you are working on huge collections, or if you loop millions of times over small ones, it does matter.
See my benchmarks, it is the difference between three seconds and thirty.

And as you guessed, the Where() method returns a WhereListIterator<T> which does not implement ICollection<T>!

4 items, linq where query

Starting 100000000 iterations with BenchmarkAny…  
Finished in 6732 ms.  
Starting 100000000 iterations with BenchmarkCount…  
Finished in 15568 ms.  
Starting 100000000 iterations with BenchmarkFirstOrDefault…  
Finished in 8824 ms.  
Starting 100000000 iterations with BenchmarkCustomEmpty…  
Finished in 7904 ms.  
    BenchmarkAny was the fastest function, with 6732ms.  
    BenchmarkCustomEmpty was the second fastest function, with 7904ms.

40 items, linq where query

Starting 100000000 iterations with BenchmarkAny…  
Finished in 6772 ms.  
Starting 100000000 iterations with BenchmarkCount…  
Finished in 93932 ms.  
Starting 100000000 iterations with BenchmarkFirstOrDefault…  
Finished in 9042 ms.  
Starting 100000000 iterations with BenchmarkCustomEmpty…  
Finished in 8311 ms.  
    BenchmarkAny was the fastest function, with 6772ms.  
    BenchmarkCustomEmpty was the second fastest function, with 8311ms.

As in the first benchmark, the Any() method is the fastest.
Still followed closely, compared to the Count() method, by my own Empty() method.

Final thoughts

I wonder why the Any() implementation is not optimized in a similar way as Count() or FirstOrDefault(). It may be designed to be the most effective with IEnumerable<T> not also implementing ICollection<T>… or it might be an oversight?

Anyway, I think my Empty() method is a nice trade-off, if you seek the best performances without knowing, or without wanting to know, the underlying implementation.

But to be honest, I’m not sure the additional bother of including and using code not native to the Framework is worth it… I would rather stick to the Any() method, and lose a few miliseconds.

Still, it was fun. 🙂

TL;DR:
Do not use the Count() extension method to test if an IEnumerable<T> is empty.
Instead, use the Any() method.

0 comment



Formatting cheat sheet.
The current page url links to a specific comment.
The comment is shown highlighted below in context.

    JavaScript is required to see the comments. Sorry...