
I'm not sure if this is true in general, but all Intel HEX files I have seen (from Atmel Studio, STM32CubeIDE and MPLAB) use data records with a length of 16 bytes. Even when the addresses written to are completely sequential, there is just a long run of 16-byte records. This results in a lot of redundancy, because the preamble with the memory address is transmitted again and again.

Why is it not common to use longer data records? Even if we wanted to stick to power-of-two data lengths, we could easily fit 128 bytes into one record with the one-byte length field and reduce the total amount of transmitted data by roughly a quarter.
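
As a rough sanity check of that number, here is a little Python sketch. It assumes only plain data records and a two-character line ending; a real file also contains EOF and extended-address records, which change the totals slightly.

    def record_chars(data_len, line_ending=2):
        # ':' + LL + AAAA + TT + 2*data_len data characters + CC (+ line ending)
        return 1 + 2 + 4 + 2 + 2 * data_len + 2 + line_ending

    def file_chars(payload, data_len):
        # characters needed to encode 'payload' bytes using records of 'data_len' bytes
        full, rest = divmod(payload, data_len)
        return full * record_chars(data_len) + (record_chars(rest) if rest else 0)

    payload = 64 * 1024  # e.g. a 64 KiB image
    small, large = file_chars(payload, 16), file_chars(payload, 128)
    print(f"16-byte records : {small} characters")
    print(f"128-byte records: {large} characters")
    print(f"saving          : {100 * (1 - large / small):.1f} %")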

Sometimes I see even shorter records, still without skipping any memory addresses. In the following example I added "-" to make it easier to distinguish the separate fields (:LL-AAAA-TT-...-CC) from each other.

:10-0160-00-3D0B00083D0B00083D0B000800000000-9F
:10-0170-00-3D0B00083D0B00083D0B000800000000-8F
:0C-0180-00-3D0B00083D0B00083D0B0008-83
:10-018C-00-10B5054C237833B9044B13B10448AFF3-C5
:10-019C-00-00800123237010BD0C00002000000000-23

There are a lot of data records with 16 data bytes, always increasing the address by 16 (0x10). And then there is one record with only 12 bytes (0x0C), with the next address increasing only by 12.
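
For reference, here is a minimal Python sketch that parses such data records and verifies the checksum (the checksum byte is chosen so that all record bytes sum to zero modulo 256). It only handles single, well-formed records, just to make the field layout explicit:

    def parse_record(line):
        # field layout: ':' LL AAAA TT <data> CC
        assert line.startswith(':')
        raw = bytes.fromhex(line[1:])
        length, addr, rtype = raw[0], int.from_bytes(raw[1:3], 'big'), raw[3]
        data = raw[4:4 + length]
        # the checksum byte makes the sum of all record bytes 0 modulo 256
        assert sum(raw) & 0xFF == 0, "checksum mismatch"
        return length, addr, rtype, data

    for line in (":100160003D0B00083D0B00083D0B0008000000009F",
                 ":0C0180003D0B00083D0B00083D0B000883"):
        n, addr, rtype, data = parse_record(line)
        print(f"type {rtype:02X} @ 0x{addr:04X}: {n} data bytes")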

Can somebody shed some light on this behaviour?
Because gcc for both STM32 and AVR controllers as well as the MPLAB XC compiler for dsPIC show the same behaviour, I don't think it is something architecture- or compiler-specific.

Is this just "evolved historically" to keep compatible to older parsers? Or does the checksum become too inefficent for longer records? And for the even shorter records I can only imagine it has something to do with the linker? Are the shorter data records the end of some logical block and the linker is for some reason not filling up the record with data from the next block? Or is there some other reason?

jusaca
  • Consider that the format goes back to the SSI/MSI TTL era, when there were memories as small as 16x4 bits (74189); you didn't want to make programmers any bigger or more expensive than necessary. –  Feb 05 '21 at 14:07

3 Answers


One could say "for historical reasons", but it is also good practice, for these reasons:

  • Each line has a checksum. Applying the checksum to a smaller number of bytes makes it less likely that errors cancel each other out in the checksum;
  • There are bootloaders that accept hex files directly. In those cases the bootloader checks each input line for validity before storing it in memory.
    Long lines would require that the microcontroller have a large line buffer, which small devices often do not have.
  • When opening the file in a text editor, each line still fits on the screen.
  • 16 bytes are easy to count in hex, and it is a power of 2.
    32 bytes would require 64 characters for the data alone, to which you still need to add the header and the checksum. That would be close to 80 characters, which is the typical screen/editor width, so not ideal for viewing (see the sketch after this list).
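
To put numbers on the buffer and screen-width arguments, a short sketch of the line length as a function of the record's payload (data records only, line ending not counted):

    def line_length(data_bytes):
        # ':' + LL + AAAA + TT + 2*n data characters + CC
        return 1 + 2 + 4 + 2 + 2 * data_bytes + 2

    for n in (16, 32, 128, 255):
        print(f"{n:3d} data bytes -> {line_length(n)} characters per line")
    # 16 -> 43, 32 -> 75, 128 -> 267, 255 -> 521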

Related to the historical reasons, we can also note that it is easier for engineers and technicians to always find the same familiar format that they know how to decode visually. With more data on a single line you cannot just scroll through the data; your eyes would have to go from left to right all the time, which is less efficient (this is also one reason why newspapers have narrow columns).

There are drawbacks. For example, the file takes up more space on disk. And those who are not happy with the drawbacks create and/or use other formats.

le_top

The Intel Hexadecimal Object File Format Specification rev. A from 1988 acknowledges that representing binary data as ASCII makes it possible to store binary files on non-binary media such as punch cards.

The maximum data record size is 255 bytes, but even a 32-byte data record would not fit on a 70-column punch card or data terminal, so the next smaller useful size for a record is 16 bytes. That size could well have become common for compatibility reasons and easy readability, as the address increments by 0x10 per record.

So the format was not designed for speed or minimal overhead in the first place, but for maximum portability and compatibility between equipment for storage and transmission.

As to why there are shorter records: they appear where there is a change of block in the program, such as the end of a code area and the start of a data area. The linker sections are processed one at a time, and it is up to the user to determine which sections are to be included in the hex file. So the linked program, e.g. an .elf file, is not first converted to a raw binary and the raw binary then converted to a .hex file; the .elf sections are processed one by one. If you want, make a raw binary from the .elf file first; that raw binary can then be converted to a .hex file, and it will not contain any sign of the different sections, just the same continuous data as the raw binary.
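
If you want to see where those section boundaries end up in a hex file, here is a rough Python sketch. It only looks at type-00 data records, ignores extended-address records (types 02/04), assumes the nominal record length is 16 bytes, and "firmware.hex" is just a placeholder file name:

    def data_records(path):
        # yield (address, data length) for every type-00 record in the file
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line.startswith(':'):
                    continue
                raw = bytes.fromhex(line[1:])
                length, addr, rtype = raw[0], int.from_bytes(raw[1:3], 'big'), raw[3]
                if rtype == 0x00:
                    yield addr, length

    def report_breaks(path):
        prev_end = None
        for addr, length in data_records(path):
            if prev_end is not None and addr != prev_end:
                print(f"address gap before 0x{addr:04X}")
            if length != 0x10:
                print(f"short record ({length} bytes) at 0x{addr:04X}")
            prev_end = addr + length

    # report_breaks("firmware.hex")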

Justme

It's hard to say exactly, but in the early days (and this file format goes WAY BACK to the beginning of "computer time") it was chosen so that lines would fit on a punch card or on a TTY (i.e. Teletype) output line. It was also designed to be "digestible" by devices with very small memories (think 1024 bytes of RAM), which were common at the time.

Today it's more likely that there is simply an internal buffer that is filled up and then written out, and that is what you get. In some cases there may be a configurable parameter to change this, but in this specific case I'm not aware of one.

jwh20