I have a large number of record types derived from a binary format specification. So far, I’ve written a computation expression builder that lets me read structures from the files easily:
type Data = { Value: … } // Data record

// This function takes a ReadOnlyMemory<byte> and maybe returns a record or a wrapped exception.
let readData : ReadOnlyMemory<byte> -> Result<Data option, exn> =
    parser {
        let decode algorithm bytes =
            … // Some code to transform the bytes
        let! algorithm = readUInt32LE 0 // The algorithm value from the first 4 bytes in little endian order
        let! length = readUInt32LE 4    // The length of bytes to read for the value
        if length > 0 then
            let! value = readBytes 8 248 >=> decode algorithm // The actual data described by the bytes
            return { Value = value }
    }
The nice thing about this approach is that I can easily convert the format specification tables stored in a spreadsheet into parsers as F# computation expressions for every kind of record defined, plus some additional code here and there for validation logic (like above). A lot of the messiness of matching and conditional statements goes away with computation expressions, and I get imperative-style code with the brevity of F# syntax. (Notice that the if statement in the code above has no corresponding else.)
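For reference, this is roughly the shape of builder I’m talking about; the Parser<'T> representation and member bodies are a simplified stand-in, not my actual implementation. The Zero member is what lets the if stand without an else, since a failed condition simply yields no record:

open System

// A parser is a function from the input bytes to a Result-wrapped optional value,
// matching the signature of readData above.
type Parser<'T> = ReadOnlyMemory<byte> -> Result<'T option, exn>

type ParserBuilder () =
    member _.Bind (p: Parser<'a>, f: 'a -> Parser<'b>) : Parser<'b> =
        fun bytes ->
            match p bytes with
            | Ok (Some value) -> f value bytes // feed the parsed value to the rest of the expression
            | Ok None -> Ok None               // nothing parsed, so the whole record is absent
            | Error e -> Error e               // propagate the wrapped exception
    member _.Return (value: 'a) : Parser<'a> =
        fun _ -> Ok (Some value)
    // Zero is what the missing else branch desugars to: no record at all.
    member _.Zero () : Parser<'a> =
        fun _ -> Ok None

let parser = ParserBuilder ()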
However, it’s not clear to me how best to do the reverse: taking records and serializing them into bytes. As in the above example, the byte representation can vary in length, and there are other considerations a writer must be aware of:
- Variable-length: the byte representation is not necessarily fixed-length, although many records do have a fixed length.
- Context: the byte representation for some types changes depending on where the bytes are written, what parent type they point to, and sometimes even bytes further ahead. (I’ve got one type where the encoder must process all the bytes and then go back to the first byte position to write the algorithm identifier, so the resulting byte sequences are not always written sequentially; see the sketch after this list.)
- Order: some records have a concept of pointers to parents, children, or siblings, so the order of writing is also important.
- Size: the resulting file sizes range from a megabyte to hundreds of gigabytes.
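To make the Context point concrete, here is a minimal, builder-free sketch of that back-patching case (writeRecord and chooseAlgorithm are hypothetical names, not part of my code): the payload goes in first, and the algorithm identifier is patched into the leading bytes only after all the data has been seen.

open System
open System.Buffers.Binary

// Hypothetical example: reserve a 4-byte header slot, write the payload after it,
// then backtrack and fill in the algorithm identifier once all the bytes are known.
let writeRecord (payload: byte[]) (chooseAlgorithm: byte[] -> uint32) : byte[] =
    let buffer : byte[] = Array.zeroCreate (4 + payload.Length)
    payload.CopyTo (buffer, 4)                  // the data is written first
    let algorithm = chooseAlgorithm payload     // only determinable after processing the payload
    BinaryPrimitives.WriteUInt32LittleEndian (Span (buffer, 0, 4), algorithm)
    buffer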
I’ve given some cursory thought to it and come up with the following:
- A computation expression builder that caches all the write operations and returns a newly initialized byte array/Memory once the length of the final byte representation is known (a rough sketch of what I mean follows this list):
  let encode algorithm bytes = // This is defined outside of the computation expression because the expression is …

  let serialize data context =
      serializer {
          let algorithm = if context … then … else …
          do! writeUInt32LE 0 algorithm
          let length = if algorithm … then … else …
          do! writeUInt32LE 4 length
          do! writeBytesTo 8 <=< encode algorithm <| data.Value
          return Array.zeroCreate <| sizeof<uint32> + sizeof<uint32> + length
      }
- An optimized version of the above for serializations with a known fixed size or small upper bound.
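Here is roughly what I mean by the first option; again, the Write/Serializer types and the helpers are illustrative stand-ins, not my actual builder. Each do! only queues a deferred write, and nothing touches real memory until return supplies the buffer and replays the queue:

open System
open System.Buffers.Binary

type Write = byte[] -> unit                     // a single deferred write into the final buffer
type Serializer<'T> = ResizeArray<Write> -> 'T  // accumulates deferred writes, eventually yields the buffer

type SerializerBuilder () =
    member _.Bind (m: Serializer<'a>, f: 'a -> Serializer<'b>) : Serializer<'b> =
        fun writes -> f (m writes) writes
    // Return receives the freshly allocated buffer and replays every cached write into it.
    member _.Return (buffer: byte[]) : Serializer<byte[]> =
        fun writes ->
            for write in writes do write buffer
            buffer

let serializer = SerializerBuilder ()

// A deferred little-endian uint32 write at a fixed offset (illustrative helper).
let writeUInt32LE offset (value: uint32) : Serializer<unit> =
    fun writes ->
        writes.Add (fun buffer ->
            BinaryPrimitives.WriteUInt32LittleEndian (Span (buffer, offset, 4), value))

// Running a serializer hands it an empty queue and lets return do the allocation and replay.
let run (s: Serializer<byte[]>) : byte[] = s (ResizeArray ())

Calling run on the result of serialize data context then yields the final byte array.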
I’ve implemented the above with working results, but on second thought, the resulting computation expressions are not very intuitive: the return statement at the very end creates the buffer that the preceding do! statements write to, and the builder type behind the computation expression has to do a lot of extra work to make this possible.
Something tells me that I’m barking up the wrong tree here. If I want to pursue code with a high signal-to-noise ratio without significantly impacting clarity or performance, what is a better way?