Learn

We now have all the tools in place to complete the full implementation of the Rabin-Karp algorithm! All that is left is to compute the hash of the pattern and compare it with substrings of the same length in the text. If they hash to the same value, it is safe to assume a match has been found. Here is what the pseudocode might look like:

• Find the first substring in `text` equal in length to `pattern`.
• Compare its polynomial hash with that of `pattern`.
• If they are equal, increase `occurrences` by `1`.
• Iterate through remaining substrings in order
• If the polynomial hash of any substring matches that of `pattern`, increase `occurrences` by `1`.
• Return `occurrences`.

### Instructions

1.

Complete the function `rabin_karp_algorithm()` so that it returns the number of occurrences of `pattern` in `text`.

The polynomial hash of `pattern` and the polynomial hash of the first substring in `text` equal in length to `pattern`, i.e. `polynomial_hash(text[: pattern_length])` Have been computed for you: `substring_hash` and `pattern_hash`. If their hash values match, increase the number of `occurrences` by `1`.

Just like how you did in the last exercise, use a `for` loop to iterate through the remaining subsequent substrings in `text` equal in length to `pattern`.

Compute the polynomial rolling hash of each of these substrings inside the `for` loop, and compare them individually with the polynomial hash of `pattern`. Each time the hash values match, increase the number of occurrences by `1`.

Click the hint for a guided walkthrough of the Rabin-Karp algorithm code.

2.

Great job with getting `rabin_karp_algorithm()` to work! Now we can come full circle and put it to the test against the same pattern and text that took `naive_pattern_matching()` a really long time to figure out back in Exercise 2.

To see if `rabin_karp_algorithm()` runs faster on the same input, create a string having `1000` repetitions of the character `A` and store it in a variable `pattern`. Then, create another string having `100000` repetitions of the character `A` and store it in a variable `text`.

Print `rabin_karp_algorithm()` by passing `pattern` and `text` as arguments to the function and notice how long it takes to count the number of occurrences of `pattern` in `text`. Did it run significantly faster?

3.

However, if you keep increasing the length of `pattern` and `text`, even `rabin_karp_algorithm()` will start to reach its limit. To see this for yourself, leave the value of `pattern` as it is but add another `0` to the number of characters `A` in `text` i.e. `1000000` to make it a million characters long.

Print `rabin_karp_algorithm()` by passing `pattern` and `text` as arguments to the function and notice how long it takes to count the number of occurrences of `pattern` in `text`!

That felt like an eternity and it seems like it’s the end of the road when it comes to dealing with gigantic strings. Quite the opposite, however!

Thanks to breakthroughs in computer science algorithms, there is a more advanced string-search algorithm called the Knuth-Morris-Pratt Algorithm that can outperform Rabin-Karp in counting occurrences of a single pattern in a large text. You will learn about it in a future lesson.