Архивы: Join

LINQ Join vs. Contains

Хозяйке на заметку.
В данном примере всё выполняется в памяти приложения.

http://stackoverflow.com/questions/16824510/select-multiple-records-based-on-list-of-ids-with-linq

Solution with .Where and .Contains has complexity of O(N square). Simple .Join should have a lot better performance (close to O(N) due to hashing). So the correct code is:

_dataContext.UserProfile.Join(idList, up => up.ID, id => id, (up, id) => up);

And now result of my measurement. I generated 100 000 UserProfiles and 100 000 ids. Join took 32ms and .Where with .Contains took 2 minutes and 19 seconds! I used pure IEnumerable for this testing to prove my statement. If you use List instead of IEnumerable, .Where and .Contains will be faster. Anyway the difference is significant. The fastest .Where .Contains is with Set<>. All it depends on complexity of underlying coletions for .Contains. Look at this post to learn about linq complexity.Look at my test sample below:

    private static void Main(string[] args)
    {
        var userProfiles = GenerateUserProfiles();
        var idList = GenerateIds();
        var stopWatch = new Stopwatch();
        stopWatch.Start();
        userProfiles.Join(idList, up => up.ID, id => id, (up, id) => up).ToArray();
        Console.WriteLine("Elapsed .Join time: {0}", stopWatch.Elapsed);
        stopWatch.Restart();
        userProfiles.Where(up => idList.Contains(up.ID)).ToArray();
        Console.WriteLine("Elapsed .Where .Contains time: {0}", stopWatch.Elapsed);
        Console.ReadLine();
    }
 
    private static IEnumerable<int> GenerateIds()
    {
       // var result = new List<int>();
        for (int i = 100000; i > 0; i--)
        {
            yield return i;
        }
    }
 
    private static IEnumerable<UserProfile> GenerateUserProfiles()
    {
        for (int i = 0; i < 100000; i++)
        {
            yield return new UserProfile {ID = i};
        }
    }

Console output:

Elapsed .Join time: 00:00:00.0322546
Elapsed .Where .Contains time: 00:02:19.4072107

А вот про SQL

http://stackoverflow.com/questions/1200295/sql-join-vs-in-performance

If the joining column is UNIQUE and marked as such, both these queries yield the same plan in SQL Server.

If it’s not, then IN is faster than JOIN on DISTINCT.

SQL Joins

Как же так тут до сих пор нет этой картинки! Срочно исправляю.