
I was working on a bit of code for a personal project when I came upon the need to generate checksums for a large number of files. First off, let me say I have already solved this problem using System.Threading.Tasks.Parallel (.NET, C#), which behaves as I would expect. What I expected from Tasks was similar: given a list of tasks, several checksums running simultaneously, without them necessarily completing in order. In other words, if I put a small file (perhaps 10 MB) last and a 5 GB file first, the last one should finish first, because it takes significantly less time to process.
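The question doesn't show the Parallel-based solution, so the following is only a hypothetical sketch of that approach, not the asker's code. In-memory buffers of different sizes stand in for the files, and `SHA256.HashData` and `Convert.ToHexString` assume .NET 5 or later (C# 9 top-level statements). For real files you would open a FileStream per file and call `ComputeHash(stream)` on a hash object instead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical inputs: in-memory buffers of different sizes stand in for files.
(string Name, byte[] Data)[] inputs =
{
    ("large", new byte[32 * 1024 * 1024]),
    ("medium", new byte[8 * 1024 * 1024]),
    ("small", new byte[1024]),
};

var hashes = new ConcurrentDictionary<string, string>();

// Items can be processed on different thread-pool threads, so on a multi-core
// machine the small buffer typically doesn't wait for the large one listed first.
Parallel.ForEach(inputs, input =>
{
    byte[] digest = SHA256.HashData(input.Data);   // .NET 5+ one-shot SHA-256
    string hex = Convert.ToHexString(digest);      // .NET 5+, uppercase hex
    hashes[input.Name] = hex;                      // thread-safe dictionary write
    Console.WriteLine($"{input.Name} -> {hex}");
});
```

`Convert.ToHexString` produces the same uppercase string as the question's `BitConverter.ToString(...).Replace("-", String.Empty)`.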

Here is a very simple example:

static async Task MainAsync()
{
    await GetChecksum(1,@"E:\Files\ISO\5gbfile.iso");
    await GetChecksum(2,@"E:\Files\ISO\4gbfile.iso");
    await GetChecksum(3,@"E:\Files\ISO\3gbfile.iso");
    await GetChecksum(4,@"E:\Files\ISO\10mbfile.iso");
}

And the GetChecksum function:

static async Task<string> GetChecksum(int index, string file)
{
    using (FileStream stream = File.OpenRead(file))
    using (SHA256 sha = SHA256.Create())   // dispose the hash object as well as the stream
    {
        // HashAlgorithm.ComputeHashAsync(Stream) is built in from .NET 5 on.
        byte[] ret = await sha.ComputeHashAsync(stream);
        System.Console.WriteLine($"{index} -> {file}");
        var hash = BitConverter.ToString(ret).Replace("-", String.Empty);
        System.Console.WriteLine($" ::{hash}");
        return hash;
    }
}

According to this article: https://msdn.microsoft.com/en-us/library/hh696703.aspx

Which states:

The method creates and starts three tasks of type Task<TResult>, where TResult is an integer. As each task finishes, DisplayResults displays the task's URL and the length of the downloaded contents. Because the tasks are running asynchronously, the order in which the results appear might differ from the order in which they were declared.

However, that is not what I experience with this example: each one finishes in the order it was called. I realize this example doesn't use parallel processing, which I assume means it runs on a single processor, but given that the last file takes 2 seconds to process and the first takes 2 minutes, I would still expect the smallest one to finish first.

Can somebody explain this behavior? I just want to understand what's going on behind the scenes with async and await when they're used like this.

Brandon
  • Indeed you are correct. But why does that simple difference cause them to process in order? I have seen it mentioned that when you call await, the code that follows is essentially processed as a continuation, which would make sense and explain this to some extent. I don't know whether that is correct, but even the code from the article makes consecutive calls, which seems to contradict that statement. – Brandon Apr 05 '16 at 01:11
  • Would it be too much to ask that someone point out which line(s) you are talking about? – JimmyJames Apr 05 '16 at 01:14
  • He's talking about "await [function]" vs. defining the function in a Task variable and then calling "await [variable]", which produces the behavior I described as expected. – Brandon Apr 05 '16 at 01:15
  • Thanks. Very interesting. I await the answer with bated breath. – JimmyJames Apr 05 '16 at 01:22

1 Answer


When you call it like this:

await GetChecksum(1,@"E:\Files\ISO\5gbfile.iso");
await GetChecksum(2,@"E:\Files\ISO\4gbfile.iso");
await GetChecksum(3,@"E:\Files\ISO\3gbfile.iso");
await GetChecksum(4,@"E:\Files\ISO\10mbfile.iso");

It creates the first task, then waits for it to complete, then creates the second task, then waits for it to complete, etc.
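A minimal simulation makes the sequential behavior visible. This is a sketch, not the original program: the hypothetical `FakeChecksumAsync` uses `Task.Delay` in place of hashing, with a longer delay standing in for a bigger file, and C# 9 top-level statements (.NET 5 or later) are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Records the order in which the simulated checksums complete.
var completionOrder = new List<int>();

// Hypothetical stand-in for GetChecksum: Task.Delay replaces the hashing,
// with a longer delay playing the role of a bigger file.
async Task<string> FakeChecksumAsync(int index, int milliseconds)
{
    await Task.Delay(milliseconds);
    completionOrder.Add(index);
    Console.WriteLine($"{index} finished");
    return $"hash{index}";
}

// Same shape as the question's MainAsync: each call is awaited before the
// next call is even made, so the four operations run strictly one after another.
await FakeChecksumAsync(1, 400);
await FakeChecksumAsync(2, 300);
await FakeChecksumAsync(3, 200);
await FakeChecksumAsync(4, 10);

Console.WriteLine(string.Join(", ", completionOrder)); // prints 1, 2, 3, 4
```

Even though the fourth call has by far the shortest delay, it cannot finish first: it hasn't been started while the others are running.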

When you call it this way:

Task<string> task1 = GetChecksum(1,@"E:\Files\ISO\5gbfile.iso");
Task<string> task2 = GetChecksum(2,@"E:\Files\ISO\4gbfile.iso");
Task<string> task3 = GetChecksum(3,@"E:\Files\ISO\3gbfile.iso");
Task<string> task4 = GetChecksum(4,@"E:\Files\ISO\10mbfile.iso");

string checksum1 = await task1;
string checksum2 = await task2;
string checksum3 = await task3;
string checksum4 = await task4;

It creates all the tasks and starts them running concurrently, then waits for the first one to complete, then for the second one, and so on. Syntax matters: the method stops executing statements at the point where you call await until the awaited task finishes.
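A minimal simulation of this second version, under the same assumptions as a sketch: the hypothetical `FakeChecksumAsync` uses `Task.Delay` instead of hashing a file, and C# 9 top-level statements (.NET 5 or later) are assumed. Each task records when it finishes, just as `GetChecksum` prints its own result.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Thread-safe record of the order in which the simulated checksums complete.
var completionOrder = new ConcurrentQueue<int>();

// Hypothetical stand-in for GetChecksum; a longer delay plays a bigger file.
async Task<string> FakeChecksumAsync(int index, int milliseconds)
{
    await Task.Delay(milliseconds);
    completionOrder.Enqueue(index);   // recorded when this task finishes
    Console.WriteLine($"{index} finished");
    return $"hash{index}";
}

// All four calls are made before the first await, so the four delays overlap.
Task<string> task1 = FakeChecksumAsync(1, 800);
Task<string> task2 = FakeChecksumAsync(2, 600);
Task<string> task3 = FakeChecksumAsync(3, 400);
Task<string> task4 = FakeChecksumAsync(4, 10);

// The results are still read in declaration order...
string checksum1 = await task1;
string checksum2 = await task2;
string checksum3 = await task3;
string checksum4 = await task4;

// ...but the tasks themselves finished shortest-first (4, 3, 2, 1 with these delays).
Console.WriteLine(string.Join(", ", completionOrder));
```

If you only need all the results, `string[] all = await Task.WhenAll(task1, task2, task3, task4);` awaits every task in one statement and returns the results in the order the tasks were passed.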

Karl Bielefeldt
  • I get what you're saying. But I'm curious: if it blocks and starts waiting as you say, which would seem to be the case in my example, why doesn't it do the same in your example? Does it somehow recognize that the await is on the predefined variable, and move on to the next await right away? I'm guessing this has something to do with how it's compiled? – Brandon Apr 05 '16 at 01:30
  • It's because the `await` on the first line keeps it from getting to the second line to call the next `GetChecksum`. In the correct example, all the `GetChecksum`s are called before the line with the first `await`. – Karl Bielefeldt Apr 05 '16 at 01:33
  • Great explanation thanks! I assumed the function was just getting added like an anonymous function and it was called when you call the await. But that makes very little sense now that I think about it! hah! Thanks again. :) – Brandon Apr 05 '16 at 01:36
  • So is the accepted answer here wrong or misleading? http://programmers.stackexchange.com/questions/183576/asyncawait-sync?rq=1 My first thought was that the await would, you know, await but that link put me off that track. – JimmyJames Apr 05 '16 at 01:37
  • Jimmy, it's not wrong. The function is apparently called when the Task is created, not with the call to await. – Brandon Apr 05 '16 at 01:40
  • @JimmyJames, blocking was the wrong choice of words. It effectively won't run anything else in that function, but it doesn't actually block the thread it currently occupies. – Karl Bielefeldt Apr 05 '16 at 01:45
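The two points in these comments (an async method starts running as soon as it is called, and its `await` hands control back to the caller instead of blocking) can be seen in a small sketch. `WorkAsync` is a hypothetical method, `Task.Delay` stands in for real asynchronous work, and C# 9 top-level statements (.NET 5 or later) are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();

// Hypothetical async method used only to trace the order of events.
async Task<int> WorkAsync()
{
    log.Add("WorkAsync: started");   // runs synchronously, at call time
    await Task.Delay(100);           // returns an incomplete Task to the caller here
    log.Add("WorkAsync: resumed");   // continuation, runs after the delay
    return 42;
}

Task<int> task = WorkAsync();          // the method body has already started by now
log.Add("caller: got the task back");  // the caller was not held up by the delay
int result = await task;               // only here does the caller wait for the result
log.Add($"caller: result {result}");

Console.WriteLine(string.Join(Environment.NewLine, log));
```

The log shows "started" before "got the task back": calling the method runs its body up to the first `await`, which is why the order of calls and `await`s in the answer matters.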