Congratulations! You have reached a significant milestone in your programming career — implementing an advanced string-matching algorithm from scratch! Here is a quick recap of what we accomplished together throughout this lesson:
- Quick recap of naive pattern-matching and its limits - using a brute-force sequential approach to check for a pattern match against all substrings that performed reasonably well until we ran into a rather large input.
- Looked for ways to speed up pattern-matching by revisiting hashing — assigning a somewhat random and unique value to a pattern to find it in a given text quickly.
- Explored various ideas involving hashing, starting off with assigning ASCII values to characters, which quickly led to problems when different strings collided to the same hash value.
- Improved on the previous idea so that characters are assigned prime numbers instead of ASCII values which gave a unique prime factorization that led to a unique hash value for most strings (except palindromes).
- Implemented the key idea of rolling hashing where we recovered the hash value of the next substring from that of the previous substring by dividing out the hash of the old character and multiplying in the hash of the new character, thereby performing way fewer calculations in the process!
- Implemented the idea of a Polynomial Hash which takes the best of both addition and multiplication (including higher powers of a number) to create a much better hash function for pattern-matching.
- Combined the ideas of rolling hashing and polynomial hashing to implement the Polynomial Rolling Hash, which allowed us to quickly calculate the hash values of subsequent substrings from that of previous substrings, thereby allowing faster pattern-matching.
- Brought all these ideas together to implement the Rabin-Karp Algorithm and tested it on large hidden inputs where naive pattern-matching would have simply timed out.
- Counted occurrences of a large pattern in an even larger text significantly faster using the Rabin-Karp algorithm on the same input that Naive Pattern-Matching wasn’t able to handle so well, and also explored the limits of Rabin-Karp.
We hope you learned a lot from this lesson and have deepened your understanding of an advanced string algorithm while also improving your ability to think algorithmically. This isn’t the end of the road though! Next up is content the Knuth-Morris Pratt algorithm, an even more advanced pattern-matching algorithm that can find a pattern even faster!