Hacker Newsnew | past | comments | ask | show | jobs | submitlogin



It solves a slightly different problem, searching for one substring instead of searching for one of many, so it would not be fair to compare the two.


Yes, and if you need the best of both there is the Commentz-Walter algorithm from 1979:

> Like the Aho–Corasick string matching algorithm, it can search for multiple patterns at once. It combines ideas from Aho–Corasick with the fast matching of the Boyer–Moore string search algorithm. For a text of length n and maximum pattern length of m, its worst-case running time is O(m·n), though the average case is often much better.

> GNU grep implements a string matching algorithm very similar to Commentz-Walter.

https://en.wikipedia.org/wiki/Commentz-Walter_algorithm


Commentz-Walter doesn't work that well in practice. As you add more patterns, the likelihood that there is any meaningful byte skipping goes way down. So it might be useful for a small number of patterns, but at that point, there are vectorized algorithms that will do (a lot) better. (Like Teddy from Hyperscan.)


Thank you very much for this reply. I was considering implementing it for a program I’m writing, but in light of that it’s clearly not worth it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: