Is it possible (and practical) to search a string for arbitrary-length repeating patterns?

Question

I've recently developed a huge interest in cryptography, and I'm exploring some of the weaknesses of ECB-mode block ciphers. A common attack scenario involves encrypted cookies, whose fields can be represented as (relatively) short hex strings.

Up until now, I've relied on my eyes to pick out repeating blocks, but this is rather tedious. I'm wondering what kind of algorithms (if any) could help me automate my search for repeating patterns within a string.

Can anybody point me in the right direction?

@jk. I'm vaguely familiar with Fourier transforms from signal processing, but could you explain how they would be applied in the present case? I'm afraid my understanding of them is very superficial. — Louis Thibault, Nov 05 '12 at 09:47
In signal processing terms, what you want to perform is called an autocorrelation. It allows you to determine if there are similarity in a signal with itself, at different times. http://en.wikipedia.org/wiki/Autocorrelation for more information. :) — Bjarke Freund-Hansen, Nov 05 '12 at 09:54
I guess an autocorrelation + a fourier transform could tell you if there are repeating patterns of similarity, and at which frequency. — Bjarke Freund-Hansen, Nov 05 '12 at 09:55
When you say "relatively", how short do you mean? Additionally, what is the appropriate block size? (This piques my curiosity, though I've dabbled most with 1900 and earlier cryptography) — , Nov 05 '12 at 15:30
@MichaelT, "relatively" means cookie-field sized. A few kb at most, and I don't mind spending tens of minutes crunching the numbers. Block size is 128 bits in most cases. To be fair, I haven't been doing any real-world stuff -- just toying around on my own system. — Louis Thibault, Nov 05 '12 at 19:58

score 5 · Answer 1 · answered Nov 06 '12 at 16:22

Donald Knuth wrote a lot about string matching pattern theory. A lot of algorithms are very efficient in time and hidden constant. You can found some good algorithms here:

http://delab.csd.auth.gr/~dimitris/courses/cpp_fall05/books/SIAM_JNL_Comp_77_KMP_string_matching.pdf

Jerry Coffin · Accepted Answer · 2012-11-06T23:15:59.420

There are two widely-used algorithms for this task: the Index of Coincidence (IC) and Kasinski's method. Googling for either should yield a fair number of hits. Kasiski's method is is older -- at least IMO, the IC is a more effective, especially if you're working with a limited amount of ciphertext. Kasiski's method can work well, but generally requires more ciphertext before it becomes effective.

I should probably add: both of these are really intended for attacking polyalphabetic substitution ciphers. The first step in each is attempting to find the length of the key being used. In your case, working with block ciphers in ECB mode, you probably already know the block size (which is probably the closest equivalent of the key length in a polyalphabetic substitution cipher). What you're left with is largely a matter of just walking through the blocks, and counting how many times a block occurs to see if you can find repetitions. That being the case, it's probably about as easy to skip past most of the algorithm, and just use something like a hash table to count the number of times different block values occur.

Most normal string search algorithms (e.g., Knuth-Morris-Pratt, Boyer-Moore-Horspool) are of little use in this pursuit.

Is it possible (and practical) to search a string for arbitrary-length repeating patterns?

2 Answers2