Which error mechanism would serve best for large data?

Question

I have a microcontroller (mcu) communicating with Raspberry Pi (rpi) using SPI communication. The mcu is a slave device and rpi is the master device. I am transmitting ~800 kB of data from mcu to rpi. I want to use an error detection mechanism to be sure that I don't lose any data when sending over to rpi.

My requirement is that I want the data transfer to be fast (<10 seconds) even after adding additional error checking bits. Currently, I am storing the data on rpi in a file, so the file handling + spi data transfer time is about 6 seconds. I am using 4MBits/sec baud rate (that's the max baud rate I can go with).

My question is: what kind of error detection function can I use for such large data? I don't want to calculate the error-checking bits for every 32 bits or fewer; instead, I want to calculate the detection bits over a larger block in one go, for example 1 KB.

I looked at the CRC technique but could not find enough information on using it for a large amount of data. Is it possible to generate a 32-bit/16-bit/8-bit CRC for a large amount of data? Further, what other error detection techniques could I use?

Thanks.

Generally: the less data used for calculation, the better the CRC. For these large amounts of data only CRC32 is viable and you should break it up into smaller packets. If possible, you could also create much more reliable error checks in hardware (XOR gates or daisy-chained shift registers feeding data back to the MCU etc). — Lundin, Feb 17 '22 at 07:30
I agree with @Kartman. Either SHA256 or MD5 hash. (I have done much prior consulting on commercial products with SHA256, so I'd probably start there because of prior coding experience.) But a hash won't "error-correct." So if you need error-correction then you need to say so because that takes you in a different direction. — jonk, Feb 17 '22 at 07:45
Got it, SHA256 and MD5 look like good suggestions. Do they have any limit on the size of the data for which they generate the hash value? — Gagan Batra, Feb 17 '22 at 07:54
@Lundin, Thanks for the suggestion regarding error checks in hardware but I would want to consider an additional hardware if the software solutions don't work out. Further, you mentioned about CRC32, does that mean that CRC32 can only be calculated for 32 bit data and not for like 64 bits or 128 bits data? — Gagan Batra, Feb 17 '22 at 07:57
[How do I ask a good question?](https://electronics.stackexchange.com/help/how-to-ask) — Andy aka, Feb 17 '22 at 09:12
What is the question really? Yes, of course it is possible. You can transfer data in blocks and if you want each block of for example 1 kilobyte can have a CRC which is sure to be small enough block for standard CRC32 to be effective. — Justme, Feb 17 '22 at 11:05
Look at this Q&A thread: https://electronics.stackexchange.com/questions/597248/ways-to-send-data-with-crc-validation/597259#597259. Don't see any reason to repeat everything here. — SteveSh, Feb 17 '22 at 11:56

TonyM · Accepted Answer · 2022-02-17T18:10:10.367

Let's take a step back to why you'd use error checking.

If you intend to pass data between an MCU and a Raspberry Pi and want to error-check it, you must have some receiver response actions planned for when an error is found. Your current question doesn't say what.

The options for receiver response actions are some form of:

(1) Stop when error found. Another send can be tried later e.g. at user request.
(2) Request a resend of some/all data (a 'retry') during the transfer, extending the communications time.
(3) Discard the received data and do nothing.

Option (2) schemes are very common. That's what the TCP part of TCP/IP uses, as seen in communications over the Internet. Option (1) is seen in Windows disk file copies, after some option (2) retries have failed. Option (3) is seen in audio/video streaming.

Examining option (2) in more detail...

The block size to apply error-checking to depends on:

(a) The required reliability of the error checking method.
(b) The acceptable retry time.
(c) The acceptable overhead for the appended checksum on the data.

The checksum algorithm will have a specified Hamming distance that gives you its reliability. (In this context, a Hamming distance of n means that every pattern of up to n-1 bit errors within a data block is guaranteed to be detected.) Use it on too long a data block and the receiver could get the same checksum on corrupted data (with multiple bit errors) as the transmitter got with clean data.

Also, the longer the block the checksum is applied to, the longer the retransmit time of that data for a retry.

Going the other way, appending checksums to too-short blocks of data will pointlessly slow communications. Inserting a CRC16 for every 64 data bits creates a 25% overhead, for no protection benefit over every 256 bits or more.
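As a rough illustration of that trade-off, the sketch below works out the checksum overhead and the time to send (or resend) one block. It assumes the 4 Mbit/s link speed from the question and ignores inter-byte gaps and any handshake time; the function names are made up for this example.

```python
# Assumption: raw 4 Mbit/s link as stated in the question, no protocol gaps.
LINK_BITS_PER_SECOND = 4_000_000


def crc_overhead(data_bits, crc_bits):
    """Fraction of extra bits added by appending a CRC to one block."""
    return crc_bits / data_bits


def block_transfer_seconds(data_bits, crc_bits,
                           bits_per_second=LINK_BITS_PER_SECOND):
    """Time to send one block plus its CRC; also the cost of one retry."""
    return (data_bits + crc_bits) / bits_per_second


for data_bits, crc_bits in ((64, 16), (256, 16), (8192, 32), (32768, 32)):
    overhead = crc_overhead(data_bits, crc_bits) * 100
    millis = block_transfer_seconds(data_bits, crc_bits) * 1000
    print(f"{data_bits:6d} data bits + CRC-{crc_bits}: "
          f"{overhead:6.2f}% overhead, {millis:.3f} ms per (re)send")
```

The first row reproduces the 25% figure above; the longer blocks show the overhead becoming negligible while the retry cost grows.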

To calculate this...

For each algorithm you're considering, you have to carry out an assessment. Here, I would start with CRC-32. This is a family of algorithms that work through the data bit by bit to produce a 32-bit 'digest' - a calculation result. There are different CRC-32 algorithms (polynomials) with different error-detection performance and you can research these on the internet. Software routines for it are plentiful and relatively fast, especially when table-indexed forms are used to operate on multiple bits at once.
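A minimal sketch of the table-indexed form, processing one byte per step instead of one bit. It uses the common zlib/Ethernet CRC-32 polynomial purely because the result can be cross-checked against Python's `zlib.crc32`; that choice is an assumption for the example, not a recommendation of that polynomial.

```python
import zlib

# Assumption: the reflected zlib/Ethernet polynomial (0x04C11DB7 bit-reversed).
POLY_REFLECTED = 0xEDB88320


def make_table(poly=POLY_REFLECTED):
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):  # bit-at-a-time step, done once per table entry
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_table()


def crc32_table(data, crc=0):
    """Table-driven CRC-32; pass the previous result as crc to continue."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


sample = b"123456789"
print(hex(crc32_table(sample)), hex(zlib.crc32(sample)))
```

The same 32-bit result comes out whether the input is 9 bytes or 1 KB; the input length affects the strength of the protection, not the size of the CRC.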

A good starting point is Philip Koopman's publicly-available 'Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks' and '32-Bit Cyclic Redundancy Codes for Internet Applications'. You can search for these, read and understand them then refer to this summary webpage, which also has links to algorithm software etc.

From the latter, for example, you'll see that CRC-32 with polynomial 0x9960034c achieves a Hamming distance of 6 for up to 32,738 bits of data (30 bits short of 4 KB), so every pattern of up to 5 bit errors in a block of that length is detected. If you want to find more and more bit errors in data, the data length must be progressively shorter, as is shown there.

It's a trade-off. So you need to plan how your system will operate, the block size, and any retry scheme you can use. Applying a CRC-32 across your entire 800,000-odd bytes (around 6,250 Kbit) would give weak protection. Applying it to 4 KB blocks would be valuable.
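To put numbers on the 4 KB option, here is a sketch that splits an 800,000-byte payload into 4 KB blocks, attaches a CRC-32 to each, and then finds which block a single flipped bit landed in. It uses `zlib.crc32` (the zlib polynomial, not 0x9960034c) only because it ships with Python; the payload contents and helper names are invented for the example.

```python
import zlib

BLOCK_SIZE = 4096      # bytes of data per block (4 KB)
TOTAL_BYTES = 800_000  # approximate transfer size from the question


def split_with_crc(payload, block_size=BLOCK_SIZE):
    """Return a list of (chunk, crc32_of_chunk) pairs."""
    return [
        (payload[i:i + block_size], zlib.crc32(payload[i:i + block_size]))
        for i in range(0, len(payload), block_size)
    ]


def failed_blocks(blocks):
    """Indices of blocks whose data no longer matches the stored CRC."""
    return [i for i, (chunk, crc) in enumerate(blocks)
            if zlib.crc32(chunk) != crc]


payload = bytes(range(256)) * (TOTAL_BYTES // 256)  # dummy test data
blocks = split_with_crc(payload)
crc_overhead_bytes = 4 * len(blocks)

# Simulate one bit error in block 100 during transfer.
damaged = list(blocks)
chunk, crc = damaged[100]
damaged[100] = (bytes([chunk[0] ^ 0x01]) + chunk[1:], crc)

print(len(blocks), "blocks,", crc_overhead_bytes, "CRC bytes in total")
print("blocks to resend:", failed_blocks(damaged))
```

Only the damaged 4 KB block has to be resent, and the total CRC overhead stays under 0.1% of the payload.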

There have been plenty of standard schemes for handshaking/retry. Your own scheme could be as simple as:

Tx a block of: [blockNum] & [blocksInTotal] & [data] & [CRC32]
Rx checks CRC32, responds with byte code: sendNext, sendAgain or abort
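The two lines above could be sketched as follows on the receiving side. Several details here are assumptions made for the example and are not specified above: 16-bit little-endian block counters, a CRC that also covers the header, the numeric values of the response codes, and a retry limit that decides between sendAgain and abort.

```python
import struct
import zlib

HEADER = struct.Struct("<HH")  # blockNum, blocksInTotal (assumed 16-bit each)
TRAILER = struct.Struct("<I")  # CRC32 over header + data (assumed coverage)

# Hypothetical one-byte response codes.
SEND_NEXT, SEND_AGAIN, ABORT = 0x06, 0x15, 0x18


def build_frame(block_num, blocks_in_total, data):
    """Tx side: [blockNum][blocksInTotal][data][CRC32]."""
    body = HEADER.pack(block_num, blocks_in_total) + data
    return body + TRAILER.pack(zlib.crc32(body))


def check_frame(frame, expected_block_num, retries_left):
    """Rx side: return (response_code, data or None)."""
    if len(frame) >= HEADER.size + TRAILER.size:
        body, trailer = frame[:-TRAILER.size], frame[-TRAILER.size:]
        (received_crc,) = TRAILER.unpack(trailer)
        if zlib.crc32(body) == received_crc:
            block_num, _blocks_in_total = HEADER.unpack_from(body)
            if block_num == expected_block_num:
                return SEND_NEXT, body[HEADER.size:]
    # Bad length, bad CRC or unexpected block: retry until the budget runs out.
    return (SEND_AGAIN if retries_left > 0 else ABORT), None


frame = build_frame(3, 196, b"example block data")
print(check_frame(frame, expected_block_num=3, retries_left=2))
```

Covering the header with the CRC means a corrupted block number is caught in the same check as corrupted data.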
