Some time ago I was struggling with LINQ and the left outer join. Out of the box, the LINQ extension methods provide Join (an inner join) but no left outer join method.
I worked out the following implementation. It is the fastest one I came up with.
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    /// <summary>
    /// Produces a left outer join of two IEnumerable collections.
    /// </summary>
    /// <typeparam name="TOuter">The type of the elements of the outer
    /// (first) sequence.</typeparam>
    /// <typeparam name="TInner">The type of the elements of the inner
    /// (second) sequence, which is joined to the first one.</typeparam>
    /// <typeparam name="TKey">The type of the key used in both
    /// types.</typeparam>
    /// <typeparam name="TResult">Type stored in the resulting
    /// IEnumerable collection.</typeparam>
    /// <param name="outer">The outer collection.</param>
    /// <param name="inner">The inner collection.</param>
    /// <param name="outerKeySelector">A function to extract the join
    /// key from each element of the first sequence.</param>
    /// <param name="innerKeySelector">A function to extract the join
    /// key from each element of the second sequence.</param>
    /// <param name="joinMethod">The method that takes an element from
    /// the first sequence and an element from the second sequence (null
    /// when there is no match) and builds the result element for the
    /// joined sequence.</param>
    /// <returns>One result element per matching pair, plus one per outer
    /// element that has no match.</returns>
    public static IEnumerable<TResult> JoinLeftOuter<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> joinMethod) where TInner : class
    {
        var result = outer
            .GroupJoin(
                inner,
                outerKeySelector,
                innerKeySelector,
                // Pair each outer element with its (possibly empty) group of matches.
                (o, i) => new
                {
                    outer = o,
                    innerGrouped = i
                })
            .SelectMany(
                // DefaultIfEmpty() supplies a single null for an outer element without matches.
                e => e.innerGrouped.DefaultIfEmpty().Select(r => joinMethod(e.outer, r)));
        return result;
    }
}
This uses the GroupJoin method and then flattens the results with SelectMany. The joinMethod parameter is a function that takes one element from each sequence and builds the resulting element, which makes it easy to transform the data into the desired class.
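To make the two steps easier to see, here is a minimal sketch that runs them separately on a tiny data set, without the extension method. The Item class and the GroupJoinSketch class are hypothetical names made up for this illustration; only GroupJoin, DefaultIfEmpty and SelectMany come from LINQ itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type, used only for this illustration.
public class Item
{
    public int Id { get; set; }
    public string Data { get; set; }
}

public static class GroupJoinSketch
{
    // Step 1: GroupJoin pairs each outer element with the group of matching
    // inner elements (the group is empty when nothing matches).
    public static List<string> Grouped(IEnumerable<Item> outer, IEnumerable<Item> inner)
    {
        return outer
            .GroupJoin(inner, o => o.Id, i => i.Id,
                (o, matches) => o.Id + ": " + matches.Count() + " match(es)")
            .ToList();
    }

    // Step 2: DefaultIfEmpty turns an empty group into a single null, and
    // SelectMany flattens the groups into one row per (outer, inner) pair.
    public static List<string> Flattened(IEnumerable<Item> outer, IEnumerable<Item> inner)
    {
        return outer
            .GroupJoin(inner, o => o.Id, i => i.Id,
                (o, matches) => new { Outer = o, Matches = matches })
            .SelectMany(g => g.Matches.DefaultIfEmpty(),
                (g, i) => g.Outer.Id + " -> " + (i == null ? "(none)" : i.Data))
            .ToList();
    }

    public static void Main()
    {
        var outer = new[] { new Item { Id = 1, Data = "a" }, new Item { Id = 2, Data = "b" } };
        var inner = new[] { new Item { Id = 2, Data = "x" }, new Item { Id = 2, Data = "y" } };

        Grouped(outer, inner).ForEach(Console.WriteLine);
        // 1: 0 match(es)
        // 2: 2 match(es)

        Flattened(outer, inner).ForEach(Console.WriteLine);
        // 1 -> (none)
        // 2 -> x
        // 2 -> y
    }
}
```

The first listing shows that the unmatched outer element survives GroupJoin with an empty group; the second shows that DefaultIfEmpty is what keeps it alive through the flattening.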
A sample usage:
// DataClass is assumed to be a simple class with two public members:
// class DataClass { public int id; public string data; }
List<DataClass> ll = new List<DataClass>();
ll.Add(new DataClass() { id = 1, data = "aaa" });
ll.Add(new DataClass() { id = 2, data = "aaa" });
ll.Add(new DataClass() { id = 3, data = "aaa" });
ll.Add(new DataClass() { id = 4, data = "aaa" });
List<DataClass> lr = new List<DataClass>();
lr.Add(new DataClass() { id = 2, data = "rr_aaa" });
lr.Add(new DataClass() { id = 3, data = "rr_bbb" });
lr.Add(new DataClass() { id = 3, data = "rr_ccc" });
lr.Add(new DataClass() { id = 5, data = "rr_ccc" });
var res = ll.JoinLeftOuter(
    lr,
    l => l.id,
    l => l.id,
    (o, i) => new { id = o.id, data = o.data, inner = i });
foreach (var e in res)
{
    // inner is null for elements of ll that have no match in lr.
    Console.WriteLine(
        e.id + ", " +
        e.data +
        (
            e.inner == null ?
            "" :
            ", " + e.inner.id + ", " + e.inner.data));
}
The output is as follows:
1, aaa
2, aaa, 2, rr_aaa
3, aaa, 3, rr_bbb
3, aaa, 3, rr_ccc
4, aaa
Note how nicely the outer join worked:
- ids 1 and 4 from the first sequence have no matching elements in the second sequence, yet they still appear in the result
- id 3 from the first sequence is matched with both elements with id 3 from the second sequence
- id 5 from the second sequence does not appear, because it has no matching element in the first sequence
The method is pretty fast. For 10,000 elements in the first sequence and 30,000 elements in the second sequence, enumerating the result with ToList() takes around 60 milliseconds on a moderate laptop.
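If you want to check the timing on your own machine, a rough sketch along these lines should do. It is not the original measurement code: the Row class, the JoinTimingSketch class and above all the key distribution are assumptions made for this example, and the elapsed time will vary with hardware and runtime. The extension method is repeated in compact form so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical element type, used only for this illustration.
public class Row
{
    public int Id { get; set; }
    public string Data { get; set; }
}

public static class JoinTimingSketch
{
    // Compact copy of the extension method from this post.
    public static IEnumerable<TResult> JoinLeftOuter<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> joinMethod) where TInner : class
    {
        return outer
            .GroupJoin(inner, outerKeySelector, innerKeySelector,
                (o, i) => new { outer = o, innerGrouped = i })
            .SelectMany(e => e.innerGrouped.DefaultIfEmpty().Select(r => joinMethod(e.outer, r)));
    }

    // Outer ids run from 0 to count - 1.
    public static List<Row> BuildOuter(int count) =>
        Enumerable.Range(0, count).Select(n => new Row { Id = n, Data = "o" + n }).ToList();

    // Assumed distribution: inner ids cycle through 5000..19999, so with
    // 10,000 outer and 30,000 inner rows, outer ids 0..4999 have no match
    // and outer ids 5000..9999 have two matches each.
    public static List<Row> BuildInner(int count) =>
        Enumerable.Range(0, count).Select(n => new Row { Id = n % 15000 + 5000, Data = "i" + n }).ToList();

    // Materializes the join with ToList() and returns the number of rows.
    public static int CountJoinedRows(List<Row> outer, List<Row> inner) =>
        outer.JoinLeftOuter(inner, o => o.Id, i => i.Id, (o, i) => new { o, i }).ToList().Count;

    public static void Main()
    {
        var outer = BuildOuter(10000);
        var inner = BuildInner(30000);

        // Warm-up run, so that JIT compilation is not part of the measurement.
        CountJoinedRows(outer, inner);

        var watch = Stopwatch.StartNew();
        int rows = CountJoinedRows(outer, inner);
        watch.Stop();

        Console.WriteLine(rows + " rows in " + watch.ElapsedMilliseconds + " ms");
    }
}
```

With this particular data the join yields 15,000 rows: 5,000 unmatched outer elements plus 5,000 outer elements with two matches each. A different key distribution changes both the row count and the timing.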
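Note how joinMethod is declared, as a Func<TOuter, TInner, TResult>, and the role it plays: it receives each outer element together with a matching inner element (or null when there is no match) and builds the result element.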
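As an illustration of that role, the sketch below passes a named method instead of a lambda and maps the joined pairs into a dedicated result class. Customer, Order, ReportLine and JoinMethodSketch are hypothetical names invented for this example; the extension method is again repeated in compact form so that the code compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, used only for this illustration.
public class Customer { public int Id; public string Name; }
public class Order { public int CustomerId; public string Product; }
public class ReportLine { public string Name; public string Product; }

public static class JoinMethodSketch
{
    // Compact copy of the extension method from this post.
    public static IEnumerable<TResult> JoinLeftOuter<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> joinMethod) where TInner : class
    {
        return outer
            .GroupJoin(inner, outerKeySelector, innerKeySelector,
                (o, i) => new { outer = o, innerGrouped = i })
            .SelectMany(e => e.innerGrouped.DefaultIfEmpty().Select(r => joinMethod(e.outer, r)));
    }

    // A joinMethod written as a named method. The second argument is null
    // for a customer without orders, so it is checked before use.
    public static ReportLine ToReportLine(Customer customer, Order order)
    {
        return new ReportLine
        {
            Name = customer.Name,
            Product = order == null ? "(no orders)" : order.Product
        };
    }

    public static List<ReportLine> BuildReport(IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        // The delegate type mirrors the joinMethod parameter: Func<TOuter, TInner, TResult>.
        Func<Customer, Order, ReportLine> joinMethod = ToReportLine;
        return customers
            .JoinLeftOuter(orders, c => c.Id, o => o.CustomerId, joinMethod)
            .ToList();
    }

    public static void Main()
    {
        var customers = new[]
        {
            new Customer { Id = 1, Name = "Ann" },
            new Customer { Id = 2, Name = "Bob" }
        };
        var orders = new[]
        {
            new Order { CustomerId = 1, Product = "book" },
            new Order { CustomerId = 1, Product = "pen" }
        };

        foreach (var line in BuildReport(customers, orders))
        {
            Console.WriteLine(line.Name + ": " + line.Product);
        }
        // Ann: book
        // Ann: pen
        // Bob: (no orders)
    }
}
```

Keeping the null check inside the joinMethod means callers get ready-to-use result objects and never have to deal with the missing inner element themselves.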
Have fun and feel free to re-use.