Let's step back and look at why you'd use error checking.
If you intend to pass data between an MCU and a Raspberry Pi and error-check it, you must plan what the receiver will do when it finds an error. Your current question doesn't say.
The options for receiver response actions are some form of:
- (1) Stop when an error is found. Another send can be tried later, e.g. at user request.
- (2) Request a resend of some/all data (a 'retry') during the transfer, extending the communications time.
- (3) Discard the received data and do nothing.
Option (2) schemes are very common. That's what the TCP part of TCP/IP uses, as seen in communications over the Internet. Option (1) is seen in Windows disk file copies, after some option (2) retries have failed. Option (3) is seen in audio/video streaming.
Examining option (2) in more detail...
The block size to apply error checking to depends on:
- (a) The required reliability of the error checking method.
- (b) The acceptable retry time.
- (c) The acceptable overhead for the appended checksum on the data.
The checksum algorithm will have a specified Hamming distance (HD) that gives you its reliability. (In this context, a code with Hamming distance HD is guaranteed to detect every error of up to HD - 1 bits within a data block, up to a specified maximum block length.) Apply it to too long a data block and corrupted data (with multiple bit errors) could give the receiver the same checksum that the transmitter calculated on the clean data, so the error would go undetected.
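To make the idea concrete, here is a small Python experiment (my own illustration, not from the Koopman papers). It uses the standard CRC-32 from Python's `zlib` module, flips every single bit and every pair of bits in a 64-byte block, and counts how many corruptions leave the checksum unchanged. It only exercises 1- and 2-bit errors on a short block, well inside this CRC's guaranteed range, so it says nothing about longer blocks or more errors per block.

```python
import itertools
import zlib


def flip_bits(block: bytes, positions) -> bytes:
    """Return a copy of block with the given bit positions inverted."""
    corrupted = bytearray(block)
    for pos in positions:
        corrupted[pos // 8] ^= 1 << (pos % 8)
    return bytes(corrupted)


def undetected_errors(block: bytes, n_errors: int) -> int:
    """Count n-bit error patterns that leave the CRC-32 unchanged."""
    clean_crc = zlib.crc32(block)
    n_bits = len(block) * 8
    misses = 0
    for positions in itertools.combinations(range(n_bits), n_errors):
        if zlib.crc32(flip_bits(block, positions)) == clean_crc:
            misses += 1
    return misses


if __name__ == "__main__":
    block = bytes(range(64))  # 64 bytes = 512 bits of sample data
    for n in (1, 2):
        # Prints 0 for both: every such error changes the CRC-32.
        print(f"{n}-bit errors undetected: {undetected_errors(block, n)}")
```

Exhaustively testing larger error counts or longer blocks quickly becomes infeasible, which is why published HD tables are the practical way to choose a block size.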
Also, the longer the block the checksum is applied to, the longer the retransmit time of that data for a retry.
Going the other way, appending checksums to too-short blocks of data will pointlessly slow communications. Inserting a CRC16 after every 64 data bits adds 25% overhead (16/64), with no protection benefit over one CRC16 per 256 bits or more.
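The overhead and retry-time trade-offs in (b) and (c) are simple arithmetic. This is a sketch; the 115200 baud link and 8N1 UART framing (10 line bits per byte) are assumptions for illustration, not details from the question.

```python
def crc_overhead_percent(data_bits: int, crc_bits: int) -> float:
    """Checksum overhead as a percentage of the data it protects."""
    return 100.0 * crc_bits / data_bits


def retry_seconds(block_bytes: int, crc_bytes: int, baud: int,
                  bits_per_byte: int = 10) -> float:
    """Time to resend one block (data + CRC) over a serial link.

    bits_per_byte=10 assumes 8N1 UART framing (start + 8 data + stop).
    """
    return (block_bytes + crc_bytes) * bits_per_byte / baud


if __name__ == "__main__":
    print(crc_overhead_percent(64, 16))   # CRC16 per 64 bits: 25.0
    print(crc_overhead_percent(256, 16))  # CRC16 per 256 bits: 6.25
    # 4 KB block plus CRC32 at an assumed 115200 baud: about 0.356 s
    print(f"{retry_seconds(4096, 4, 115200):.3f} s")
```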
To calculate this...
For each algorithm you're considering, you have to carry out an assessment. Here, I would start with CRC-32. The name covers a family of algorithms that process the data bit by bit to produce a 32-bit 'digest' (the calculation result). Different CRC-32 polynomials give different error-detection performance, and you can research these on the internet. Software routines for it are plentiful and relatively fast, especially table-driven forms that process several bits (typically a byte) per step.
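As a sketch of the bitwise versus table-driven difference, here are both forms in Python for one specific CRC-32: the common IEEE 802.3 / zlib variant (reflected polynomial 0xEDB88320). That polynomial is my choice for the example because it is easy to check against `zlib.crc32`; it is not the Koopman polynomial discussed below, though the same structure works for any 32-bit polynomial.

```python
import zlib

POLY = 0xEDB88320  # reflected form of the IEEE 802.3 CRC-32 polynomial 0x04C11DB7


def crc32_bitwise(data: bytes) -> int:
    """Process one bit per inner step: simple but slow."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def make_table() -> list[int]:
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc32_table(data: bytes) -> int:
    """Process 8 bits per step using the 256-entry lookup table."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"  # standard CRC check string
    # All three print cbf43926, the published check value for this CRC-32.
    print(f"{crc32_bitwise(msg):08x} {crc32_table(msg):08x} {zlib.crc32(msg):08x}")
```

The table costs 1 KB (256 x 32-bit entries), which is usually affordable even on a small MCU in exchange for roughly eight times fewer loop iterations.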
A good starting point is Philip Koopman's publicly available 'Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks' and '32-Bit Cyclic Redundancy Codes for Internet Applications'. You can search for these, read and understand them, then refer to this summary webpage, which also has links to algorithm software etc.
From the latter, for example, you'll see that CRC-32 with polynomial 0x9960034c can reliably find up to 6 bit errors in 32,738 bits of data (30 bits short of 4 KB). To detect more bit errors per block, the data length must be progressively shorter, as shown there.
It's a trade-off. So you need to plan how your system will operate, including the block size and any retry scheme you can use. Applying a CRC-32 across your entire 800,000-odd bytes (about 6,250 Kbit, or roughly 781 KB) would give weak protection. Applying it to 4 KB blocks would be valuable.
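Splitting the data into 4 KB blocks, each with its own CRC-32, might look like this in Python. It is a sketch that uses `zlib.crc32` (the IEEE polynomial) as a stand-in; to use a polynomial such as 0x9960034c you would substitute your own table-driven routine. The 800,000-byte payload of zeros is placeholder data.

```python
import zlib

BLOCK_SIZE = 4096  # 4 KB per block


def split_with_crcs(payload: bytes, block_size: int = BLOCK_SIZE):
    """Split payload into fixed-size blocks, each paired with its own CRC-32.

    The final block is shorter if len(payload) isn't a multiple of block_size.
    """
    blocks = []
    for start in range(0, len(payload), block_size):
        block = payload[start:start + block_size]
        blocks.append((block, zlib.crc32(block)))
    return blocks


if __name__ == "__main__":
    payload = bytes(800_000)  # placeholder for ~800,000 bytes of real data
    blocks = split_with_crcs(payload)
    # Prints: 196 blocks; last block is 1280 bytes
    print(len(blocks), "blocks; last block is", len(blocks[-1][0]), "bytes")
    # Prints: CRC overhead: 784 bytes (about 0.1% of the payload)
    print("CRC overhead:", 4 * len(blocks), "bytes")
```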
There are plenty of standard handshaking/retry schemes. Your own scheme could be as simple as:
- Tx a block of: [blockNum] & [blocksInTotal] & [data] & [CRC32]
- Rx checks the CRC32 and responds with a byte code: sendNext, sendAgain or abort
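A minimal Python sketch of that scheme follows. Several details are assumptions of mine, not part of the scheme above: blockNum and blocksInTotal are big-endian 16-bit fields, the CRC-32 (from `zlib`) covers the header as well as the data, the receiver already has the whole frame in hand, the response byte values are borrowed from the ASCII ACK/NAK/CAN codes, and `transact` and `MAX_RETRIES` are hypothetical names.

```python
import struct
import zlib
from typing import Callable

# Hypothetical response codes; any three distinct byte values would do.
SEND_NEXT = 0x06   # ASCII ACK
SEND_AGAIN = 0x15  # ASCII NAK
ABORT = 0x18       # ASCII CAN

HEADER = struct.Struct(">HH")  # assumed: blockNum, blocksInTotal as uint16
CRC = struct.Struct(">I")      # CRC-32 appended as big-endian uint32
MAX_RETRIES = 3                # hypothetical limit before giving up


def build_frame(block_num: int, blocks_total: int, data: bytes) -> bytes:
    """Tx side: [blockNum] & [blocksInTotal] & [data] & [CRC32]."""
    body = HEADER.pack(block_num, blocks_total) + data
    return body + CRC.pack(zlib.crc32(body))


def check_frame(frame: bytes, expected_block: int) -> int:
    """Rx side: verify the CRC-32 and choose a response code."""
    if len(frame) < HEADER.size + CRC.size:
        return SEND_AGAIN  # truncated frame
    body = frame[:-CRC.size]
    (received_crc,) = CRC.unpack(frame[-CRC.size:])
    if zlib.crc32(body) != received_crc:
        return SEND_AGAIN  # corrupted in transit: ask for a retry
    block_num, _ = HEADER.unpack(body[:HEADER.size])
    if block_num != expected_block:
        return ABORT  # out of sequence: this simple sketch just gives up
    return SEND_NEXT


def send_block(transact: Callable[[bytes], int], block_num: int,
               blocks_total: int, data: bytes) -> bool:
    """Send one block, retrying on SEND_AGAIN.

    transact is a hypothetical callable that transmits a frame and
    returns the receiver's one-byte response code.
    """
    frame = build_frame(block_num, blocks_total, data)
    for _ in range(1 + MAX_RETRIES):
        reply = transact(frame)
        if reply == SEND_NEXT:
            return True
        if reply == ABORT:
            return False
    return False  # retries exhausted: stop, as in option (1)
```

Covering the header with the CRC means a corrupted blockNum is caught as a CRC failure rather than being mistaken for an out-of-sequence block. A real link would also need a timeout and a way to find frame boundaries (e.g. a length field), which this sketch leaves out.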