3

I'm having trouble finding an algorithm that matches (or fails to match) a substring of a string to a suffix from a list of suffixes. The hits I'm finding are for suffix trees and suffix arrays, but they are not quite what I am looking for.

My particular problem is a fully qualified hostname - possibly malformed - and trying to match the suffix provided by the banned suffix list published by Mozilla. As a concrete example, given w.x.y.z.example.com, I need to identify it as hostname w, domain example.com, and know that example.com is valid domain.

As a degenerative example, I may need to take example.com and know that its a domain with no host, and that matching it to *.com would be wrong. And the negative cases only get worse when you factor in suffixes like pvt.k12.ma.us, nasushiobara.tochigi.jp, 公司.cn and السعودیة.

The eventual application will be DNS hostname matching in X509 certificates. Hence the reason there may be malformed input from miscreants.

I think this algorithm should run in m log n. log n for each lookup into the [sorted] list of suffixes, and m for each DNS label.

I think it may be a case of not seeing the forest through the trees. Does anyone know of an efficient algorithm for matching substrings from a list of suffixes?

0 Answers0