What is a faster alternative to a CRC?

Question

I'm doing some data transmission from a dsPIC to a PC and I'm doing an 8-bit CRC to every block of 512 bytes to make sure there are no errors. With my CRC code enabled I get about 33KB/s, without it I get 67KB/s.

What are some alternative error detection algorithms to check out that would be faster?

How is the CRC itself implemented? Bitwise? Then switch to a table-based method. Bytewise? Consider the space, complexity, and time tradeoff involved in increasing the table size to, say, 16 bits (which would operate on two bytes at once, but would take 64KB of table storage). — Aidan Cully, Jul 26 '11 at 21:56
I only have 16KB on RAM and 128KB of ROM, so a 64KB table isn't an option. — FigBug, Jul 26 '11 at 21:58
So you're using a 256-byte table? or bitwise CRC? If you're doing bitwise, bytewise (with a 256-byte table) would be 8 times faster. — Aidan Cully, Jul 26 '11 at 22:01
67kb/s to 33kb/s? I'm not sure what your other processing involves, but that sounds like quite a bit of overhead, even for a PIC. Maybe there are some other problems impeding your performance? — Rei Miyasaka, Jul 26 '11 at 23:13
depending on how you have implemented the bitwise CRC calc, it may be possible to make it go faster. I have a (fairly) fast implementation I've used a number of times. If you are interested I can dig it out. — quickly_now, Jul 27 '11 at 01:08
@FigBug: If you post your implementation that might make it easier for us to help out. (Though that might be better for codereview.stackexchange.com) — Billy ONeal, Jul 27 '11 at 01:21

Mark Booth · Accepted Answer · 2020-09-17T09:24:44.420

46

While there may be faster options than CRC, such as Fletcher, if you use them then you are likely to end up sacrificing some degree of error detection capability. Depending on what your performance and error detection requirements are, an alternative may be to use CRC code optimised to your application instead.

For a comparison of CRC with other options, see the excellent answer by Martin Thompson.

One option to help with this is pycrc which is a tool (written in python¹) which can generate C source code for dozens of combinations of crc model and algorithm. This allows you to optimise speed and size for your own application by selecting and benchmarking different combinations. ^{1: Requires Python 2.6 or later.}

It supports the crc-8 model, but also supports crc-5, crc-16 and crc-32 amongst others. As for algorithms, it supports bit-by-bit, bit-by-bit-fast and table-driven.

For example (downloading the archive):

$ wget --quiet http://sourceforge.net/projects/pycrc/files/pycrc/pycrc-0.8/pycrc-0.8.tar.gz/download
$ tar -xf pycrc-0.8.tar.gz
$ cd pycrc-0.8
$ ./pycrc.py --model=crc-8 --algorithm=bit-by-bit      --generate c -o crc8-byb.c
$ ./pycrc.py --model=crc-8 --algorithm=bit-by-bit-fast --generate c -o crc8-bybf.c
$ ./pycrc.py --model=crc-8 --algorithm=table-driven    --generate c -o crc8-table.c
$ ./pycrc.py --model=crc-16 --algorithm=table-driven   --generate c -o crc16-table.c
$ wc *.c
   72   256  1790 crc8-byb.c
   54   190  1392 crc8-bybf.c
   66   433  2966 crc8-table.c
  101   515  4094 crc16-table.c
  293  1394 10242 total

You can even do funky things like specify using dual nibble lookups (with a 16 byte look-up table) rather than single byte look-up, with 256 byte look-up table.

For example (cloning the git repository):

$ git clone http://github.com/tpircher/pycrc.git
$ cd pycrc
$ git branch
* master
$ git describe
v0.8-3-g7a041cd
$ ./pycrc.py --model=crc-8 --algorithm=table-driven --table-idx-width=4 --generate c -o crc8-table4.c
$ wc crc8-table4.c
  53  211 1562 crc8-table4.c

Given your memory and speed constraints, this option may well be the best compromise between speed and code size. The only way to be sure would be to benchmark it though.

The pycrc git repository is on github, as is its issue tracker, but it can also be downloaded from sourceforge.

edited Sep 17 '20 at 09:24

answered Jul 27 '11 at 14:59

Mark Booth

14,214
3
40
79

I don't believe most people writing things for the PIC are using C, but this might work if so. – Billy ONeal Jul 27 '11 at 16:47
5

@Billy - Really? I don't think I've come across anyone developing for PIC commercially who wasn't using C. *I* certainly don't have the patience for assembler these days and well structured C can end up pretty compact. – Mark Booth Jul 27 '11 at 17:12
I'm using a dsPIC and I am using C. – FigBug Jul 27 '11 at 19:31
@FigBug - Thanks, glad you like my answer. If you run some benchmark tests, feel free to edit my answer with your results. I'd love to know just how much difference each of the algorithms makes in terms of your application throughput and memory footprint. – Mark Booth Jul 28 '11 at 00:09
1

Another vote for pyCrc here. using it in various projects with different constraints and it is just great. – Vicky Jan 24 '13 at 11:32
Unless I am mistaken, CRC-x can only guarantee error detection for as many bits as the hash length (e.g. 32 bits in CRC-32). And I believe this is limited to burst errors. If there are more errors than that, it is only as good as a typical hash (and possibly _slightly_ worse because CRC-32 isn't as well distributed as modern 32-bit hashes). But CRC-32 has decent collision statistics. – bryc Mar 17 '19 at 23:41
I'm not sure how your comment helps me improve my answer @bryc, but I've added a link to [Martin Thompson's answer](https://softwareengineering.stackexchange.com/a/184668/22493) which does address this. – Mark Booth Mar 18 '19 at 11:14
I was commenting on your mention that CRC has better error-detection capabilities than alternatives. I think you should elaborate on that. That is only true when there are very few bitwise errors. It is based on hamming distance. See Koopman's work on choosing the right polynomial: https://users.ece.cmu.edu/~koopman/crc/. I just think its important to know that CRC is not magic - it works like your typical credit card check digit. But when error rates are beyond HD, CRC is no more effective than any decent hash, some of which can be quite fast. – bryc Mar 18 '19 at 16:03
I still don't see how your comment helps me improve my answer. I've already added a link to [Martin Thompson's answer](https://softwareengineering.stackexchange.com/a/184668/22493) which addresses this better than I can. If you really have more to say than this, can I suggest that you write your own answer. Repeated comments like this are bordering on harassment. – Mark Booth Mar 20 '19 at 15:02

score 16 · Answer 2 · edited Mar 31 '21 at 14:54

The Effectiveness of Checksums for Embedded Networks by Theresa C. Maxino is a really good paper comparing the performance of various checksums and CRCs in an embedded context.

Some quotes from the conclusions (based on studies of undetected error probabilities):

When burst errors dominate

XOR, two’s complement addition, and CRC checksums provide better error detection performance than one’s complement addition, Fletcher, and Adler checksums.

In other applications

a “good” CRC polynomial, whenever possible, should be used for error detection purposes

If computation cost is very constrained

(as in your case), use (in order of effectiveness):

Fletcher's checksum
one's complement addition
two's complement addition

Other quotes:

The Fletcher checksum has lower computational cost than the Adler checksum and, contrary to popular belief, is also more effective in most situations.

and

There is generally no reason to continue the common practice of using an XOR checksum in new designs, because it has the same software computational cost as an addition-based checksum but is only about half as effective at detecting errors.

As a bonus, a Fletcher checksum is very easy to implement. – RubberDuck Sep 03 '16 at 10:51 — RubberDuck, Sep 03 '16 at 10:51

score 11 · Answer 3 · answered Jul 26 '11 at 21:53

11

Simple one bit parity (basically XORing the data over itself over and over) is about as fast as one can get. You do lose a lot of the error checking of a CRC though.

In pseudocode:

char checksum = 0;
for each (char c in buffer)
{
    checksum ^= c;
    SendToPC(c);
}
SendToPc(checksum);

answered Jul 26 '11 at 21:53

Billy ONeal

8,073
6
43
57

1

I looked into this a while ago. I *believe* that summing instead of xor actually works a little better. (normally, sum everything, and then transmit the 2's complement of the sum as the checksum. On the receiver, sum everything including that received checksum. The result is 0 if its all good, and non-0 otherwise.) – quickly_now Jul 27 '11 at 01:10
1

@quickly: I don't think there's a significant difference between those two -- neither method provides all that good an assurance that things have not been corrupted. If add is faster on the target architecture by all means use that instead. – Billy ONeal Jul 27 '11 at 01:20
7

I remembered: The major difference between ADD and XOR is that there is less detectability of multiple bit errors. In the case of a stream of bytes, errors in the same bit position are cancelled out using XOR. When using ADD, the propagation of bits up through a checksum byte means that this case is more detectable. (However, multiple bit errors in different bits spread through the stream of bytes is likely to be less detectable - depending on the circumstances at the time). Any checksum arrangement like this is TERRIBLE for multiple-bit errors, so its a fairly minor argument. – quickly_now Jul 27 '11 at 06:52
XOR is a lot less helpful than CRC. – Jul 27 '11 at 16:27
3

@Thorbjørn: I believe I acknowledged that in my answer. :) – Billy ONeal Jul 27 '11 at 16:39
"There is generally no reason to continue the common practice of using an XOR checksum in new designs, because it has the same software computational cost as an addition-based checksum but is only about half as effective at detecting errors." http://www.ece.cmu.edu/~koopman/thesis/maxino_ms.pdf – Martin Thompson Jan 24 '13 at 11:43
Addition is technically more effective at scrambling or distributing bits, because XOR just flips bits, while addition is a bit more involved and utilizes a carry flag. They are quite close and one may be preferred in different situations, but if all you have is 8-bit registers and addition isn't significantly slower, you might as well use addition. That being said, XOR isn't a position-dependent hash unless you incorporate bit shifts. – bryc Mar 17 '19 at 23:45

score 5 · Answer 4 · answered Jul 26 '11 at 21:51

5

I'm not aware of anything that's as effective at error detection as a CRC and faster - if there were, people would be using it instead.

You could try a simple checksum, but that's far less likely to detect errors.

answered Jul 26 '11 at 21:51

Bob Murphy

16,028
3
51
77

2

I'm willing to give up a effectiveness for speed. – FigBug Jul 26 '11 at 21:59

score 5 · Answer 5 · edited Jun 16 '20 at 10:01

The Adler checksum should be sufficient for checking for transmission distortions. It's used by the Zlib compression library, and was adopted by the Java 3D Mobile Graphics Standard to provide a fast but effective data integrity check.

From the wikipedia page:

An Adler-32 checksum is obtained by calculating two 16-bit checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all bytes in the string plus one, and B is the sum of the individual values of A from each step.

At the beginning of an Adler-32 run, A is initialized to 1, B to 0. The sums are done modulo 65521 (the largest prime number smaller than 2^16 or 65536). The bytes are stored in network order (big endian), B occupying the two most significant bytes.

The function may be expressed as

 A = 1 + D1 + D2 + ... + Dn (mod 65521)
 B = (1 + D1) + (1 + D1 + D2) + ... + (1 + D1 + D2 + ... + Dn) (mod 65521)
   = n×D1 + (n-1)×D2 + (n-2)×D3 + ... + Dn + n (mod 65521)

 Adler-32(D) = B × 65536 + A

where D is the string of bytes for which the checksum is to be calculated, and n is the length of D.

Do note that Adler32 is almost useless for short runs of data. Up to about 180 bytes, it produces numerous collisions. — greyfade, Jul 27 '11 at 16:13
+1 -- a reasonable middle ground between a CRC and simple bit parity. — Billy ONeal, Jul 27 '11 at 16:46
@greyfade - FigBug mentioned using 512 byte blocks, so this shouldn't be a problem for the OP. Good to have it noted for people with other requirements though. — Mark Booth, Jul 27 '11 at 17:10

score 3 · Answer 6 · answered Jul 27 '11 at 00:45

Well the checksum logic itself is good and people can help with faster algorithms.

If you want to improve the speed of your component, you might need to look at changing your overall technique to seperate out the transfer component from the validation component.

If you have these as two independant items (on different threads) then you can get your full speed of transfer and only resend failed packets.

Algorithm would look something like:

Server breaks up into known packet sizes (say 1K chunks). Puts them in a queue of "to be sent".
Each packet is sent over with a 16 or 32 bit ID AND its checksum.
The client receives each packet and puts it in a queue to be processed.
On a seperate thread the client takes of a packet at a time, does the validation.
- On success it adds it to the final collection of packetss (in ID order) being
- On a fail it reports the failed ID back to the server, which queues up that packet to be resent.
Once you have recieved and validated the packets and you have the IDs in the correct sequnce (starting at 1) you can start writing these to disk (or do what ever is required).

This will let you trasmit at the highest possible speed and if you play with your packet size you can work out the optimium fail rate VS validate/resend rate.

score 2 · Answer 7 · answered Jul 26 '11 at 23:17

Checksums are traditional

(reduce #'+ stream)

XOR as given above would work as well

(reduce #'XOR stream)

A slightly more elaborate (slower) scheme is the standard parity check for serial connections.

At this level, you are trading correctness for speed. These will occasionally fail.

At the next most sophisticated level, you can use some crc/hash type stuff.

Another design would be to increase the size of the block used for the stream.

You should have an estimation of actual error rate to tune your algorithm selection and parameters for block size.

score 0 · Answer 8 · answered Mar 31 '21 at 15:58

0

Table-driven CRC algorithms should be extremely fast because no actual computation is involved. I would also suggest using at least a 16- bit CRC.

answered Mar 31 '21 at 15:58

Mike Robinson

1,765
4
10

Mecki · Answer 9 · 2021-05-05T14:50:35.887

Many people think Adler32 is a very fast checksum because it is a simple one but that is only partly true. Adler32 is certainly faster than CRC32 but some hash functons are even faster, like Murmur3F or FNVJ32/FHVJ64. See this comparison chart. And hash functions can also be used for checksumming.

And not only are certain checksums faster, at the same time they produce way better results than Adler32, even better than CRC. See this answer to a similar question on Software Engineering. The more random the outcome, the less likely two different blocks of data will result in the same checksum and the better the bits are evenly distributed in the final checksum result (this is important if you plan to truncate the result).

Meanwhile even much faster alternatives exist. Please checkout this answer on Stack Overflow. The fastest hash probably available that is also a very good one, is XXH3 but it can only be implemented that fast on CPUs with SSE2 support. The crazy thing is: It is faster than a plain memory copy. Of course, only if the data is already in the CPU cache (everything else would be magic).

So plenty of better alternatives exist, the question is only, which of these can be implemented on the dsPIC side and will it still be faster when implemented there? As many alternatives are not fast in general, they are fast because they are optimized for the way how modern desktop/server CPUs are processing data.

I don't know dsPICs, and this question is old, but many modern MCUs have dedicated hardware for checksum calculations. — jaskij, May 06 '21 at 08:19

What is a faster alternative to a CRC?

9 Answers9

When burst errors dominate

In other applications

If computation cost is very constrained