Learn

Let’s bring everything together to create the Knuth-Morris-Pratt algorithm!

The pseudocode for the KMP algorithm is extremely similar to that of the computation of the prefix function. In both cases, we are matching strings against each other using the same idea — the prefix function is a result of matching the pattern against itself, whereas the KMP algorithm uses the values of the prefix function to match the pattern against the text.

During the computation of the prefix function, we were interested in finding out whether the next character forms a new valid suffix.

In the KMP algorithm, we are now interested in finding out the same for each character in the text. Whenever possible, we align a valid prefix of the pattern with its corresponding valid suffix in the text. This allows us to skip over characters we know cannot produce a match.

The pseudocode breakdown looks something like this:

  • We start the algorithm by matching the pattern against the text character-by-character.
  • If there is a mismatch, we look for a shorter valid prefix that can potentially align with an existing valid suffix among the characters that have already matched (similar to how we computed the prefix function).
  • However, unlike last time, we don’t have to look for the length of a valid prefix all over again — we have already computed it and stored the corresponding length in the prefix function!
  • Finally, if at any point the number of matched characters equals the length of the pattern, we will have found an occurrence of the pattern in the text.
  • We can repeat the algorithm to find the next occurrence by looking for a shorter valid prefix length.

Instructions

1.

Create a function called kmp_algorithm() that takes in two parameters pattern and text. Inside the function, create the following variables:

  • m to store the length of the pattern,
  • n to store the length of text,
  • a list pi to hold the prefix function computed on pattern,
  • a variable j to store the number of characters matched so far,
  • and a variable count to store the number of occurrences of pattern in text.
2.

Create a for loop to iterate through each index i of text. Inside this loop, create another while loop that runs as long as the following two conditions are True:

  1. j > 0
  2. text[i] != pattern[j]

Inside this while loop, make sure to reset the number of matched characters to the length of a shorter valid prefix by retrieving this value from pi i.e. j = pi[j - 1].

3.

Upon exiting the while loop, create an if condition to check whether text[i] equals pattern[j]. If it does, increase the number of matched characters by 1 i.e. j += 1.

4.

If the number of matched characters equals the length of the pattern:

  • Increase count by 1.
  • Reset the number of matched characters to the length of a shorter valid suffix.

Return the number of occurrences of pattern in text.

Great job! You’ve successfully implemented one of the most difficult algorithms in Python! In the next exercise, we will revisit our old friend Rabin-Karp to see how the two algorithms perform against each other.

Take this course for free

Mini Info Outline Icon
By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.

Or sign up using:

Already have an account?