
Summary

I am developing a WAV file format reader under WinRT, and for this I need to read an arbitrary number of structs made of primitive types such as int, uint, float and so on.

In desktop development one would rely on BinaryReader; in WinRT it has been replaced by DataReader, which works asynchronously.

Problem

I cannot grasp how to use this new class, since an intermediate buffer must now be filled using LoadAsync() prior to calling reading methods such as ReadInt32().

In contrast, with the old BinaryReader there was no notion of having to fill an intermediate buffer prior to reading primitives from the source.

Every example I have seen on the web is 'naive' in the sense that it reads the entire source stream into memory, but in my case a WAV file can be hundreds of megabytes, possibly gigabytes.

I have sketched the following helper methods, which pre-fill the intermediate buffer with only what's needed and basically free me from calling LoadAsync explicitly before every read from the stream:

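using System;
using System.IO;
using System.Threading.Tasks;
using Windows.Storage.Streams;

internal static class DataReaderExtensions
{
    public static async Task<string> ReadStringAsync(this DataReader reader, uint length)
    {
        await LoadAsync(reader, length);
        // 'length' is in code units; for ASCII text such as WAV chunk ids that equals bytes.
        return reader.ReadString(length);
    }

    // Ensures at least 'length' bytes are buffered, loading only the bytes that are missing.
    private static async Task LoadAsync(DataReader reader, uint length)
    {
        if (reader.UnconsumedBufferLength >= length) return;
        uint missing = length - reader.UnconsumedBufferLength;
        uint u = await reader.LoadAsync(missing);
        if (u < missing) throw new EndOfStreamException();
    }
}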
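The same pattern extends to the fixed-size primitives a WAV header needs. The sketch below is illustrative only: the class and method names (`WavDataReaderExtensions`, `MissingBytes`, `EnsureLoadedAsync`, `ReadUInt32Async`, ...) are hypothetical, and each method tops up the buffer only by the bytes not already sitting in `UnconsumedBufferLength` before doing the synchronous read.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Windows.Storage.Streams;

// Hypothetical helpers: make sure enough bytes are buffered,
// then perform the ordinary synchronous DataReader read.
internal static class WavDataReaderExtensions
{
    // How many more bytes must be loaded so that 'needed' bytes are buffered.
    public static uint MissingBytes(uint unconsumed, uint needed) =>
        needed > unconsumed ? needed - unconsumed : 0;

    public static async Task EnsureLoadedAsync(this DataReader reader, uint needed)
    {
        uint missing = MissingBytes(reader.UnconsumedBufferLength, needed);
        if (missing == 0) return; // already buffered, no I/O needed

        uint loaded = await reader.LoadAsync(missing);
        if (loaded < missing)
            throw new EndOfStreamException($"Expected {missing} more bytes, got {loaded}.");
    }

    public static async Task<ushort> ReadUInt16Async(this DataReader reader)
    {
        await reader.EnsureLoadedAsync(sizeof(ushort));
        return reader.ReadUInt16();
    }

    public static async Task<uint> ReadUInt32Async(this DataReader reader)
    {
        await reader.EnsureLoadedAsync(sizeof(uint));
        return reader.ReadUInt32();
    }

    public static async Task<int> ReadInt32Async(this DataReader reader)
    {
        await reader.EnsureLoadedAsync(sizeof(int));
        return reader.ReadInt32();
    }

    public static async Task<float> ReadSingleAsync(this DataReader reader)
    {
        await reader.EnsureLoadedAsync(sizeof(float));
        return reader.ReadSingle();
    }
}
```

WAV stores multi-byte fields little-endian, while DataReader's documented default `ByteOrder` is `BigEndian`, so a reader using these helpers should set `reader.ByteOrder = ByteOrder.LittleEndian;` before the first read.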

But I'm not entirely sure whether it is the way to go when using DataReader.

Question

How is one supposed to pre-fill the intermediate buffer in my case?

  • Should one load only the needed amount, as shown above?
  • Or should one load a constant size (e.g. 65536 bytes), keep track of the read position, and pre-fetch more on larger requests? (basically wrapping a DataReader in a helper class)

EDIT

Looking at the BinaryReader source code, there doesn't seem to be any magic behind the scenes, i.e. bytes are fetched on demand. So in my case, even if it sounds a bit silly to read primitives asynchronously, I guess it's the simplest and safest way to do it. The alternative means wrapping a DataReader, tracking the read position and managing an intermediate buffer, all without being able to derive from DataReader, since public WinRT types must be sealed. I'm not sure it is worth it for the outcome.

Unfortunately the sources of the WinMD assemblies are unavailable; it would have been interesting to see how Microsoft does it, since these newer types can be used as the older types through extension methods.

aybe
  • Unclear what you're asking. Are you really asking for a "pattern" that is more "popular"? – Robert Harvey Sep 21 '15 at 02:09
  • Not more 'popular' but something more effective as I find what I've done a little weird. – aybe Sep 21 '15 at 02:15
  • It's probably written with this granularity to give you more control. On the face of it, reading data without using it right away (i.e. waiting for it) seems not so useful unless you have some fine-grained control over the process. – Robert Harvey Sep 21 '15 at 02:30
  • I'm sorry but I don't really get what you mean in your 2nd sentence (re. waiting for it). – aybe Sep 21 '15 at 02:45
  • That's what an ordinary synchronous call is. In any case, what you've written doesn't seem like unreasonable code. – Robert Harvey Sep 21 '15 at 03:34
  • Actually, I think your code has problems. Look at your first example. When you call `reader.ReadBytes(bytes);` it's going to convert into a synchronous call anyway. – Robert Harvey Sep 21 '15 at 05:41
  • Well, it will end up wrapped in such a block anyway :) Even if I move that code 'above' into the public method, it would end up being annotated with `async` as well, or did I miss something? By the way, I have reformulated my question and asked for it to be reopened; hopefully it makes more sense now. – aybe Sep 21 '15 at 20:54
  • Yes, but now you're starting to see the need for an intermediate buffer, right? It gives your code a chance to do something useful while you wait for that buffer to be filled. – Robert Harvey Sep 21 '15 at 21:21
  • I'm left in the dust with your first sentence ... I re-read it 10 times and still don't grasp the usefulness of an intermediate buffer. And the 2nd sentence makes it even more confusing: what am I supposed to do besides waiting for it? I mean, code execution at this point will resume only when it's done, so how can I do something else in the meantime? Or are you talking about keeping the UI responsive and reporting progress, for instance (doing something, somewhere else)? – aybe Sep 21 '15 at 21:44
  • When you set up an async method, what you're doing essentially is returning a *promise* (i.e. a Task). When you interrogate that promise for its value, it will then block until the value becomes available (i.e. the read completes). So it follows that creating an async method that immediately asks for the promise's value is not particularly useful, since it's essentially the equivalent of a blocking call anyway. To make it useful, you have to put some productive work between the call and asking for the result. – Robert Harvey Sep 21 '15 at 22:42
  • Alright I understand now ! I guess I'll exercise myself tomorrow on this aspect because it really makes sense. Thank you ! – aybe Sep 22 '15 at 01:03
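To make the last comment concrete, here is a minimal sketch, assuming the data is processed in fixed-size blocks: the next `LoadAsync` is started before the current block is processed, so I/O and processing overlap. `OverlappedReading`, `ProcessBlock` (a stand-in that just sums bytes) and the 65536-byte block size are hypothetical; the buffered bytes are copied out with `ReadBytes` first, so the DataReader is never read while a load is pending.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Storage.Streams;

internal static class OverlappedReading
{
    // Placeholder for real work (e.g. decoding samples); here it just sums the bytes.
    public static long ProcessBlock(byte[] block)
    {
        long sum = 0;
        foreach (byte b in block) sum += b;
        return sum;
    }

    public static async Task<long> ProcessStreamAsync(IInputStream input, uint blockSize = 65536)
    {
        // Disposing the reader also closes 'input'.
        using (var reader = new DataReader(input))
        {
            long total = 0;
            uint loaded = await reader.LoadAsync(blockSize);
            while (loaded > 0)
            {
                // Copy the buffered bytes out so the reader is free for the next load.
                var block = new byte[reader.UnconsumedBufferLength];
                reader.ReadBytes(block);

                // Start the next I/O without awaiting it yet...
                Task<uint> next = reader.LoadAsync(blockSize).AsTask();

                // ...and do useful work on the current block in the meantime.
                total += ProcessBlock(block);

                loaded = await next; // 0 once the end of the stream is reached
            }
            return total;
        }
    }
}
```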

1 Answer


Should one load only the needed amount, as shown above?

You should load into the buffer all that you can feasibly expect to process with the code that follows. In the DataReader documentation example, they read the entire stream into the buffer, because they are going to process it all immediately.

The reason for the buffer is that I/O is (usually) slow. The amount of data you specify is loaded up front into the memory buffer with asynchronous I/O. You can then read it without waiting for I/O on every read, which is good for performance: batching I/O improves throughput on many devices (e.g. mechanical hard drives). Your code's execution is suspended (due to async/await) until the I/O is finished, so it isn't tying up CPU cycles.
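As an illustration of batching, the 12-byte RIFF header at the start of a WAV file (the ASCII id "RIFF", a little-endian uint32 size, then the ASCII form type "WAVE") can be loaded with a single `LoadAsync` and then parsed with plain synchronous reads. This is a minimal sketch; `RiffHeader` and `ReadRiffHeaderAsync` are hypothetical names, and it assumes the reader's buffer is empty when called.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Windows.Storage.Streams;

internal static class RiffHeader
{
    public const uint Size = 12; // "RIFF" + uint32 size + "WAVE"

    // Returns the RIFF chunk size (file length minus 8 bytes).
    public static async Task<uint> ReadRiffHeaderAsync(DataReader reader)
    {
        reader.ByteOrder = ByteOrder.LittleEndian;     // WAV fields are little-endian
        reader.UnicodeEncoding = UnicodeEncoding.Utf8; // ASCII ids are valid UTF-8

        // One I/O round trip for the whole header...
        uint loaded = await reader.LoadAsync(Size);
        if (loaded < Size)
            throw new EndOfStreamException("File too short for a RIFF header.");

        // ...then synchronous reads served from the in-memory buffer.
        string riff = reader.ReadString(4);
        uint riffSize = reader.ReadUInt32();
        string wave = reader.ReadString(4);

        if (riff != "RIFF" || wave != "WAVE")
            throw new FormatException("Not a RIFF/WAVE file.");
        return riffSize;
    }
}
```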

Or should one load a constant size (e.g. 65536 bytes), keep track of the read position, and pre-fetch more on larger requests? (basically wrapping a DataReader in a helper class)

Sometimes the data will be too large to load into memory all at once. .NET itself sets a memory limit of 2GB per object (well, sort of). So if the data you are reading is close to 2GB, you will definitely want to keep track of the stream's read position and read only part of the file into the buffer. Once you get to the end of the buffer, fill it back up from the next read position and continue processing. Repeat as necessary until you've processed the whole file.
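The refill loop described above might look like the following minimal sketch. It assumes an `IRandomAccessStream` whose `Size` is known and a hypothetical `processChunk` callback; `ChunkedReading`, `NextLoadSize` and the 65536-byte default (the figure from the question) are illustrative, not a recommendation. Only one chunk is held in the DataReader's buffer at a time, so memory use stays bounded regardless of file size.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Windows.Storage.Streams;

internal static class ChunkedReading
{
    // Size of the next load: a full chunk, or whatever is left of the stream.
    public static uint NextLoadSize(ulong remaining, uint chunkSize) =>
        (uint)Math.Min((ulong)chunkSize, remaining);

    public static async Task ReadInChunksAsync(
        IRandomAccessStream stream, Action<byte[]> processChunk, uint chunkSize = 65536)
    {
        // Disposing the reader closes the input stream obtained from GetInputStreamAt.
        using (var reader = new DataReader(stream.GetInputStreamAt(0)))
        {
            ulong remaining = stream.Size;
            while (remaining > 0)
            {
                uint toLoad = NextLoadSize(remaining, chunkSize);
                uint loaded = await reader.LoadAsync(toLoad); // refill the buffer
                if (loaded == 0) throw new EndOfStreamException();

                var chunk = new byte[loaded];
                reader.ReadBytes(chunk);                      // consume it entirely
                processChunk(chunk);
                remaining -= loaded;
            }
        }
    }
}
```

A real WAV reader would also have to deal with a sample frame straddling two chunks, for example by choosing `chunkSize` as a multiple of the format's block alignment.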

Kasey Speakman