In this lesson, we implemented the Knuth-Morris-Pratt Algorithm in Python to count the number of occurrences of a pattern in a given text. Even though we started out with a couple of brute-force approaches that resulted in large running times, we were able to employ some clever tricks and observations that allowed us to condense the underlying rudimentary details into something quite short and elegant! Here is a quick recap of what we did in this lesson:
- Reviewed the simplest way to find a pattern in a text using a brute-force algorithm.
- Explored ideas to improve over the brute-force algorithm by noticing that we can use information about the pattern — specifically, the prefixes and suffixes of the pattern — to skip characters in the text where the pattern cannot match.
- Accomplished this goal by computing the prefix function by finding up to each index the length of the longest proper suffix that is also a proper prefix in the pattern.
- Identified some clever manipulations to reduce the running time of computing the prefix function — all the way from O(n^3) down to O(n).
- Used the pre-computed values in the prefix function to perform faster pattern-matching via the Knuth-Morris-Pratt algorithm and observed its faster performance in comparison to Naive Pattern-Matching and the Rabin-Karp Algorithm.
We hope you learned a lot in this lesson and also enjoyed the process of algorithmic thinking! It is no small feat to be able to grasp the inner workings of complicated algorithms like Knuth-Morris-Pratt. Some of the greatest minds in history came up with these algorithms after decades of work!
If something feels like it hasn’t fully clicked yet, you are not alone! Feel free to come back to this lesson and go through it as many times as you need before things start making sense. In the meantime, you can head over to work on Projects to test and apply your understanding.