Learn

It seemed like we were making decent progress, but our hash function collided to the same value! Indeed, there is only a limited set of values that we can generate using ASCII values. This is a problem because our pattern-matching algorithm can erroneously produce a match between a pattern and a substring even when they are different from one another! In other words, our hash function is not that great at producing unique values for unique strings. Can we do better?

The collision occurred because even though the product of ASCII values seemed unique on the surface, it turned out to have the same prime factors, which resulted in the same value. What if we use prime numbers instead of ASCII values? This would guarantee that every pattern has its own unique prime factorization.

In other words, instead of assigning a unique ASCII value to each character, we assign a unique prime number in increasing order as follows:

{'A': 2, 'B': 3, 'C': 5, ..., 'X': 89, 'Y': 97, 'Z': 101}

Since 'A' is the first character, we assign it the first prime number: 2. Similarly, 'B' takes on the next prime number, 3, and so on. The complete list of the first twenty-six prime numbers is as follows:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

Just like how we multiplied ASCII values for each character in a substring to get its ASCII Hash, we will multiply the prime-number values (or just prime values) for each character in a substring to get its prime hash. We’ve already defined the first twenty-six prime numbers in the code, so let’s get started!

Instructions

1.

Create a dictionary called prime_values that maps each character in uppercase to its corresponding prime value using the ord() function and the prime_value() function you have implemented. Print the dictionary prime_valuesto see the mapping from characters to prime values for yourself!

2.

Use string slicing and a for loop to access all the substrings of length 4 in uppercase. Find the prime hash of each of these substrings using the prime_hash() function you have just implemented.

Create a dictionary prime_hash_values and store the mapping of each substring of length 4 to its corresponding prime hash in this dictionary. Print out prime_hash_values to see the results!

3.

Challenge: Can you still find a pair of strings that are distinct from each other but still have the same prime hash?

Find two such strings and store them in the variables string1 and string2.

Create a dictionary colliding_hash_values that maps string1 and string2 to the same hash value. Print out colliding_hash_values to see the results!

4.

So close! Turns out reversing the string once again produced a collision. However, apart from that, our prime hash function seems to be doing pretty well for most strings that we can encounter except for palindromic strings. Hold that thought for now and we will come back to address it later because there is a much more subtle and important trick to learn using this reasonably effective hash function in the next exercise!

Take this course for free

Mini Info Outline Icon
By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.

Or sign up using:

Already have an account?