I need to search a long string of binary string data (couple of megabytes worth of binary digits) for patterns of binary digits. There are about 100 000 different patterns to search for. Each is a maximum of 10 bits/character length patterns of 1's and 0's. I then need to display the most common pattern of all of these.
The binary data is fed in real time to some sort of text file or database. Its essentially a large block of binary digits that I need to search for a couple thousand different patterns in the fastest most effective way possible. Each new digit that comes in does so in about like 10 seconds time one after the other.
So for example, the following string might come in:
10101110000110001011011111000101011010111101110100000011011101010101010111011100101010101011000101110011011111110001010010000110101011100000110011100101001010011001011010110101101010001010101010100010101010010100101001010101010111010101010010100101010011101010101010101001001010110101
and I need to find whats the most common one out of these 100 000 patterns, such as 0001100111
, 0110011011
, 1100011
and so on.
What is the best way to do this on real time data being fed to a text file would be? Does not really need to be exactly real time however I would rather it be as real time as possible. Id also rather it be a straight console affair so as to make it as latency free as possible however I can adapt as it doesn't really need to be terminal either.
I've got about 1 year of programming experience in Bash. Would a simple bash script be fast enough for 100 000 of such patterns? But all those thousands of patterns makes me fear that just a simple bash script solution might be a little too slow and outdated. Or can SQL be really faster than just a straight bash script? I've heard about binary matching algorithms but I dont really know what those are and it seems a little out of my league at the moment, however I'm willing to get to the bottom if it if its truly the most effective way. What's the best way to do this ?