In this course you will learn the conceptual foundation of building and using hash maps. We will work through the different functions and architecture necessary to build your own hash map.

A data structure’s main utility is allowing for data to be represented in a way that resembles the way people will use that data. In some cases, the primary function of that data is that it will be sequenced through like a list and so we use a data structure that allows for easier iteration, like a linked list. In others, the usefulness comes from specifying interrelationships within the data.

In the case of tabular data there is a relationship between the elements of a row. Each column corresponds to a different feature of the row. Let’s consider the following table:
<div class="narrative-table-container">

| <font color=#52b1db>State</font> | <font color=#52b1db>State Flower</font>
| --- | --- |
|Alabama|Camellia|
|Hawaii|Hibiscus|
|Mississippi|Magnolia|
|New York|Rose|
|West Virginia|Rhododendron|
</div>
<br>

Each `State` on the left corresponds to a specific `State Flower` given on the right. For instance, "New York" corresponds to "Rose". This kind of table, with only two columns, represents a special relationship that mathematicians would call a “map”. This table maps states to state flowers, but many other relationships can be modeled with maps. 


Tables

Table showing authors compared by number of novels written

Being a _map_ means relating two pieces of information, but a map also has one further requirement. Let's consider the following table:

<div class="narrative-table-container">

| <font color=#52b1db>Musician</font> | <font color=#52b1db>State of Birth</font>
| --- | --- |
|Miles Davis|Illinois|
|John Coltrane|North Carolina|
|Duke Ellington|Ohio|
|Dizzy Gillespie|South Carolina|
|Thelonious Monk|North Carolina|

</div>
<br>

In the above table we map different jazz musicians to the state where they were born. When talking about a map we describe the inputs (jazz musicians, in this case) as the _keys_ to the map. The output (here the state of origin) is said to be the _value_ at a given key.

In order for a relationship to be a map, every key that is used can only be the key to a single value. In this example every musician can only have one state that they were born in, so it works. There doesn't need to be a value for every possible key; there just can't be more than one value for a given key. For instance, Miles Davis can’t have been born in both Illinois and Kentucky.

If we looked at this relationship the other way, with states as the keys and jazz musicians born in a given state as values, this would not be a map. In the example above, if we look at "North Carolina" and try to get _the_ jazz musician from that state, we'll find it very difficult to do. Our relationship would give two different outputs: "John Coltrane" and "Thelonious Monk".

We would still be able to describe that relationship with a table, but it wouldn't be a map, and so we can't save such a relationship using a hash map. 

Maps

In the case of a map between two things, we don’t really care about the exact _sequence_ of the data. We only care that a given input, when fed into the map, gives the accurate output. Developing a data structure that performs this is tricky because computers care much more about values than relationships. A computer doesn’t really care to memorize the astrological signs of all of our friends, so we need to trick the computer into caring.
 
We perform this trick using a structure that our computer is already familiar with, an array. An array uses indices to keep track of values in memory, so we'll need a way of turning each key in our map to an index in our array.

Imagine we want our computer to remember that our good friend Joan McNeil is a Libra. We take her name, and we turn that name into a number. Let's say that the number we correspond with the name "Joan McNeil" is 17. We find the 17th index of the array we're using to store our map and save the value (Libra) there.

How did we get 17, though? We use a special function that turns data like the string "Joan McNeil" into a number. This function is called a _hashing function_, or a hash function. Hashing functions are useful in many domains, but for our data structure the most important aspect is that a hashing function returns an array index as output.


Hash Map Methodology

A hash function takes a string (or some other type of data) as input and returns an array index as output. In order for it to return an array index, our hash map implementation needs to know the size of our array. If the array we are saving values into only has 4 slots, our hash map's hashing method should not return an index larger than 3, the last valid index.

In order for our hash map implementation to guarantee that it returns an index that fits into the underlying array, the hash function will first compute a value using some scoring metric: this is the hash value, hash code, or just the _hash_. Our hash map implementation then takes that hash value <a href="https://en.wikipedia.org/wiki/Modulo_operation" target="_blank" rel="noopener noreferrer">mod</a> the size of the array. This guarantees that the value returned by the hash function can be used as an index into the array we're using.

It is actually a defining feature of all hash functions that they greatly reduce any possible inputs (any string you can imagine) into a much smaller range of potential outputs (an integer smaller than the size of our array). For this reason, hash functions are also known as _compression functions_. 

Much like an image that has been shrunk to a lower resolution, the output of a hash function contains less data than the input. Because of this, hashing is not a reversible process. With just a hash value it is impossible to know for sure the key that was plugged into the hashing function.


Hash Functions

You might be thinking at this point that we’ve glossed over a very important aspect of a hash table here. We’ve mentioned that a hash function is necessary, and described some features of what a hash function does, but never really given an implementation of a hash function that does not feel like a toy example.

Part of this is because a hash function needs to be simple by design. Performing complex mathematical calculations that our hash table needs to compute every time it wants to assign or retrieve a value for a key will significantly damage a hash table’s performance for two things that it should be able to do quickly.

Hash functions also need to be able to take whatever types of data we want to use as a key. We only discussed strings, a very common use case, but it’s possible to use numbers as hash table keys as well.

A very common hash function for integers, for example, is to perform the modulo operation on the integer to make sure it’s less than the size of the underlying array. If the integer is already small enough to be an index into the array, there's nothing to be done.

Many hash function implementations for strings take advantage of the fact that strings are represented internally as numerical data. Frequently a hash function will perform a shift of the data bitwise, which is computationally simple for a computer to do but also can predictably assign numbers to strings.


How to Write a Hash Function

Now that we have all of the main ingredients for a hash map, let's put them all together. First, we need some sort of associated data that we’re hoping to preserve. Second, we need an array of a fixed size to insert our data into. Lastly, we need a hash function that translates the keys of our map into indexes into the array. The storage location at the index given by a hash is called the _hash bucket_.

Let’s use the following example for our hash map:


<div class="narrative-table-container">

| <font color=#52b1db>Key: Album Name</font> | <font color=#52b1db>Value: Release Year</font>
| --- | --- |
|The Low End Theory|1991|
|Midnight Marauders|1993|
|Beats, Rhymes and Life|1996|
|The Love Movement|1998|

</div>
<br>

Our map here relates several A Tribe Called Quest studio albums to the year they were released. We’ll need an array of at least size 4 to contain all of these elements, and a way to turn each album name into an index into that array.

For each album name, find that album’s hash by performing the following calculation:

```
hash_value = ((# of lowercase 'a's in album name) + (# of lowercase 'e's in album name))
```

And then take that hash and calculate an array index by performing `hash_value mod 4`. Following these steps we get the following schema:

<div class="narrative-table-container">

| <font color=#52b1db>Album Name</font> | <font color=#52b1db>Hash</font> | <font color=#52b1db>Hash mod 4</font> | <font color=#52b1db>Release Year</font>
| --- | --- | --- | --- |
|The Low End Theory|2|2|1991|
|Midnight Marauders|3|3|1993|
|Beats, Rhymes and Life|5|1|1996|
|The Love Movement|4|0|1998|
</div>
<br>

First, the key is translated into the hash using our hashing function. Then, our hash map performs modulo arithmetic to turn the hash into an array index. 
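The calculation in the table can be reproduced with a short Python sketch. The function name `album_hash` and the variable names are illustrative assumptions, not part of the lesson; only the counting rule and the array size of 4 come from the text.

```python
# A minimal sketch of the toy hash described above (names are hypothetical).
def album_hash(name):
    # Hash = number of lowercase 'a's plus number of lowercase 'e's.
    return name.count('a') + name.count('e')

ARRAY_SIZE = 4
albums = {
    'The Low End Theory': 1991,
    'Midnight Marauders': 1993,
    'Beats, Rhymes and Life': 1996,
    'The Love Movement': 1998,
}

array = [None] * ARRAY_SIZE
for name, year in albums.items():
    index = album_hash(name) % ARRAY_SIZE  # hash, then modulo the array size
    array[index] = year

print(array)  # [1998, 1996, 1991, 1993]
```

Each release year lands in the bucket given by the "Hash mod 4" column.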

Basic Hash Maps

Remember hash functions are designed to compress data from a large number of possible keys to a much smaller range. Because of this compression, it’s likely that our hash function might produce the same hash for two different keys. This is known as a _hash collision_. There are several strategies for resolving hash collisions.

The first strategy we're going to learn about is called _separate chaining_. The separate chaining strategy resolves collisions by updating the underlying data structure. Instead of an array of values that are mapped to by hashes, it could be an array of linked lists!

Collisions

A hash map with a linked list separate chaining strategy follows a similar flow to the hash maps that have been described so far. The user wants to assign a value to a key in the map. The hash map takes the key and transforms it into a hash code. The hash code is then converted into an index to an array using the modulus operation. If the value of the array at the hash function's returned index is empty, a new linked list is created with the value as the first element of the linked list. If a linked list already exists at the address, append the value to the linked list given.

This is effective for hash functions that are particularly good at giving unique indices, so the linked lists never get very long. But in the worst-case scenario, where the hash function gives all keys the same index, lookup performance is only as good as it would be on a linked list. Hash maps are frequently employed because looking up a value (for a given key) is quick. Looking up a value in a linked list is much slower than a perfect, collision-free hash map of the same size. A hash map that uses separate chaining with linked lists but experiences frequent collisions loses one of its most essential features.

Separate Chaining

A hash collision resolution strategy like separate chaining involves assigning two keys with the same hash to different parts of the underlying data structure. How do we know which values relate back to which keys?  If the linked list at the array index given by the hash has multiple elements, they would be indistinguishable to someone with just the key.

If we save both the key and the value, then we will be able to check against the saved key when we’re accessing data in a hash map. By saving the key with the value, we can avoid situations in which two keys have the same hash code where we might not be able to distinguish which value goes with a given key.

Now, when we go to read or write a value for a key we do the following: calculate the hash for the key, find the appropriate index for that hash, and begin iterating through our linked list. For each element, if the saved key is the same as our key, return the value. Otherwise, continue iterating through the list comparing the keys saved in that list with our key.
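The read and write steps above might look roughly like the following sketch. To stay short it assumes plain Python lists as the chains instead of linked lists, and it uses a made-up hash (`toy_hash`, the sum of the key's byte values); both are simplifications for illustration.

```python
# A minimal sketch of separate chaining with saved keys.
# Assumptions: Python lists stand in for linked lists; toy_hash is illustrative.
ARRAY_SIZE = 2  # deliberately tiny so that three keys must collide somewhere
buckets = [[] for _ in range(ARRAY_SIZE)]  # one chain per array index

def toy_hash(key):
    return sum(key.encode())

def assign(key, value):
    chain = buckets[toy_hash(key) % ARRAY_SIZE]
    for pair in chain:
        if pair[0] == key:       # saved key matches ours: overwrite the value
            pair[1] = value
            return
    chain.append([key, value])   # key not in this chain yet: add the pair

def retrieve(key):
    chain = buckets[toy_hash(key) % ARRAY_SIZE]
    for saved_key, value in chain:
        if saved_key == key:     # compare saved keys, not just the index
            return value
    return None

assign('Miles Davis', 'Illinois')
assign('John Coltrane', 'North Carolina')
assign('Duke Ellington', 'Ohio')
print(retrieve('John Coltrane'))  # North Carolina
```

Because the key is stored next to each value, colliding entries in the same chain stay distinguishable.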

Saving Keys

Another popular hash collision strategy is called _open addressing_. In open addressing we stick to the array as our underlying data structure, but we continue looking for a new index to save our data if the first result of our hash function has a different key's data.

A common method of open addressing is called _probing_. Probing means continuing to find new array indices in a fixed sequence until an empty index is found.

Suppose we want to associate famous horses with their owners. We want our first key, “Bucephalus”, to store our first value, “Alexander the Great”. Our hash function returns an array index 3 and so we save “Alexander the Great”, along with our key “Bucephalus”, into the array at index 3. 

After that, we want to store the owner of "Seabiscuit", "Charles Howard". Unfortunately "Seabiscuit" also has a hash value of 3. Our probing method adds one to the hash value and tells us to continue looking at index 4. Since index 4 is open we store "Charles Howard" into the array at index 4. Because "Seabiscuit" has a hash of 3 but "Charles Howard" is located at index 4, we must also save "Seabiscuit" into the array at that index.

When we attempt to look up "Seabiscuit" in our Horse Owner's Hash Map, we first check the array at index 3. Upon noticing that our key (Seabiscuit) is different from the key sitting in index 3 (Bucephalus), we realize that this can't be the value we were looking for at all. Only by continuing to the next index do we check the key and notice that at index 4 our key matches the key saved into the index 4 bucket. Realizing that index 4 has the key "Seabiscuit" means we can retrieve the information at that location, Seabiscuit's owner's name: Charles Howard.
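The horse example can be traced with a small sketch. It assumes an array of size 8 and a stand-in function `toy_index` that always returns 3, purely to force the collision described above; a real hash function would not behave this way.

```python
# A minimal sketch of linear probing, mirroring the horse-owner example.
# Assumptions: ARRAY_SIZE and toy_index are illustrative, not from the lesson.
ARRAY_SIZE = 8
array = [None] * ARRAY_SIZE

def toy_index(key):
    # Stand-in for "hash, then modulo": every key starts at index 3.
    return 3

def assign(key, value):
    index = toy_index(key)
    for _ in range(ARRAY_SIZE):
        if array[index] is None or array[index][0] == key:
            array[index] = [key, value]   # save the key alongside the value
            return index
        index = (index + 1) % ARRAY_SIZE  # occupied by another key: probe on
    raise RuntimeError('hash map is full')

def retrieve(key):
    index = toy_index(key)
    for _ in range(ARRAY_SIZE):
        if array[index] is None:
            return None                   # empty cell: the key was never stored
        if array[index][0] == key:
            return array[index][1]
        index = (index + 1) % ARRAY_SIZE
    return None

print(assign('Bucephalus', 'Alexander the Great'))  # 3
print(assign('Seabiscuit', 'Charles Howard'))       # 4
print(retrieve('Seabiscuit'))                       # Charles Howard
```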


Open Addressing: Linear Probing

There are more sophisticated ways to find the next address after a hash collision, although anything too calculation-intensive would negatively affect a hash table's performance. Linear probing systems, for instance, could jump by five steps instead of one step.

In a quadratic probing open addressing system, we add increasingly large numbers to the hash code. At the first collision we just add 1, but if the hash collides there too we add 4, and the third time we add 9. Having a probe sequence change over time like this helps avoid clustering.

_Clustering_ is what happens when a single hash collision causes additional hash collisions. Imagine a hash collision triggers a linear probing sequence to assign a value to the next hash bucket over. Any key that would hash to this “next bucket” will now collide with a key that, in a sense, doesn’t belong to that bucket anyway.

As a result the new key needs to be assigned to the next, next bucket over. This propagates the problem because now there are two hash buckets taken up by key-value pairs that were assigned as a result of a hash collision, displacing further pairs of information we might want to save to the table.
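To compare the probing schemes, the following sketch lists the indices each one would visit. The function names, the starting index of 3, and the array size of 16 are assumptions chosen for illustration.

```python
# A minimal sketch comparing probe sequences (all names are hypothetical).
def linear_probe(start, array_size, attempts, step=1):
    # The i-th attempt looks `i * step` slots past the starting index.
    return [(start + i * step) % array_size for i in range(attempts)]

def quadratic_probe(start, array_size, attempts):
    # The i-th attempt looks `i * i` slots past the starting index: +1, +4, +9...
    return [(start + i * i) % array_size for i in range(attempts)]

print(linear_probe(3, 16, 4))          # [3, 4, 5, 6]
print(linear_probe(3, 16, 4, step=5))  # [3, 8, 13, 2]
print(quadratic_probe(3, 16, 4))       # [3, 4, 7, 12]
```

With a step of one, colliding keys fill adjacent buckets, which is the clustering pattern described above; the quadratic sequence spreads them further apart.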

Other Open Addressing Techniques

We've learned together what a hash map is and how to create one. Let's go over the concepts presented in this lesson.

A hash map is:
- Built on top of an array using a special indexing system.
- A key-value storage with fast assignments and lookup.
- A table that represents a map from a set of keys to a set of values.

Hash maps accomplish all this by using a hash function, which turns a key into an index into the underlying array.

A hash collision is when a hash function returns the same index for two different keys.

There are different hash collision strategies. Two important ones are separate chaining, where each array index points to a different data structure, and open addressing, where a collision triggers a probing sequence to find where to store the value for a given key.

Review

Hash Maps: Conceptual

Learn about hash maps, the efficient key-value storage used in many different programming languages, and then implement one yourself!

Learn Hash Maps

Learn to implement your own hash map in Python!

Hash maps are efficient key-value stores. They are capable of assigning and retrieving data in the fastest way possible for a data structure. This is because the underlying data structure that they use is an array. A value is stored at an array index determined by plugging the key into a hash function.

Python doesn’t have a built-in fixed-size array type like the ones in lower-level languages; its lists can grow and shrink freely. We are going to simulate an array by creating a list and keeping track of the size of the list with an additional integer variable. This will allow us to design something that resembles a hash map. This is somewhat elaborate for the actual storage of a key-value pair, but it helps to remember that the purpose of this lesson is to gain a deeper understanding of the structure as it is constructed. For real-world use cases in which a key-value store is needed, Python offers a built-in hash table implementation with dictionaries.
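A minimal sketch of this starting point might look like the following. The attribute names `array_size` and `array` follow the ones used in the project instructions later in this material; the example size of 4 is arbitrary.

```python
# A minimal sketch: a list of fixed length standing in for an array.
class HashMap:
    def __init__(self, size):
        self.array_size = size       # tracked separately so we can enforce it
        self.array = [None] * size   # every slot starts out empty

hash_map = HashMap(4)
print(hash_map.array)  # [None, None, None, None]
```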

Creating the Hash Map Class

The necessary ingredient in the hash map recipe is the hashing function. A hashing function takes a key and returns an index into the underlying array. 

Hash functions need to be fast to compute so that access and retrieval can be done fast.

Creating the Hashing Function

Hashing functions return a wide range of integers. In order to transform these values into useful indices for our array we need a compression function. A compression function uses modular arithmetic to calculate an array index for a hash map when given a hash code.
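One way the hashing and compression steps could be written is sketched below. Summing the encoded bytes of the key is the simple hash used in the project later in this material; it is chosen for clarity, not for quality.

```python
# A minimal sketch of a hash method plus a compression method.
class HashMap:
    def __init__(self, size):
        self.array_size = size
        self.array = [None] * size

    def hash(self, key):
        # key.encode() yields bytes; summing them gives one integer hash code.
        return sum(key.encode())

    def compress(self, hash_code):
        # Modular arithmetic folds any hash code into a valid array index.
        return hash_code % self.array_size

hash_map = HashMap(4)
code = hash_map.hash('abc')
print(code)                     # 294
print(hash_map.compress(code))  # 2
```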

Creating the Compression Function

A data structure that is unable to contain data is a sad sight indeed. We need to put together all the other steps we’ve taken: plug the key into the hash function, plug the hash code into the compression function, use the array index to find the place in the array, and finally set the value of the array to the value we want.

Defining the Setter

There is a natural expectation after placing an item into a bag that we will later be able to remove the item from that bag. Otherwise we have created a hole. Let’s implement retrieval for our hash map.
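Putting the pieces together, a first version of the setter and getter might look like this sketch. It deliberately ignores collisions, so two keys that compress to the same index would overwrite each other. The method names `.assign()` and `.retrieve()` match those used elsewhere in this material.

```python
# A minimal sketch of assignment and retrieval without collision handling.
class HashMap:
    def __init__(self, size):
        self.array_size = size
        self.array = [None] * size

    def hash(self, key):
        return sum(key.encode())

    def compress(self, hash_code):
        return hash_code % self.array_size

    def assign(self, key, value):
        array_index = self.compress(self.hash(key))  # key -> hash -> index
        self.array[array_index] = value

    def retrieve(self, key):
        array_index = self.compress(self.hash(key))
        return self.array[array_index]

state_flowers = HashMap(10)
state_flowers.assign('Alabama', 'Camellia')
print(state_flowers.retrieve('Alabama'))  # Camellia
```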

Defining the Getter

Since we have the basic functionality of a hash map, let’s create a test instance of one for us to make sure everything works as expected.

Creating an Instance

Our hash and compression functions together can result in collisions. This is when two different keys resolve to the same array index. In our current implementation, all keys that resolve to the same index are treated as if they are the same key.

Our first step in implementing a collision strategy is updating our `.assign()` and `.retrieve()` methods to set the value with the key and check the key before retrieving a value.

Handling Collisions in the Setter

When we retrieve hash map values, we also need to be aware of the fact that two keys could point to the same array index.

Handling Collisions in the Getter

Now we’re going to implement an open addressing system so our hash map can resolve collisions. In open addressing systems, we check the array at the address given by our hashing function. One of three things can happen:
 - The address points to an empty cell.
 - The cell holds a value for the key we are getting/setting.
 - The cell holds a value for a different key.

In the first case, this means that the hash map does not have a value for the key and no collision resolution needs to happen. Notice that this does not work if we want to be able to delete keys in our hash map. There are strategies for deleting pairs from a hash map (see <a href="https://en.wikipedia.org/wiki/Lazy_deletion">Lazy Deletion</a>) but we will not be investigating these.

In the second case, we've found the value for our key-value pair!

In the third case, we need to use our collision strategy to check whether our key is stored somewhere else (it may or may not be), so we should recalculate the index into our array.
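The three cases can be sketched in code as follows. This is one possible approach, not necessarily the lesson's own: it assumes the hash method takes a hypothetical `collisions` count and adds it to the hash code, so each collision moves the index along by one.

```python
# A minimal sketch of open addressing in both the setter and the getter.
# Assumption: the `collisions` parameter is an illustrative design choice.
class HashMap:
    def __init__(self, size):
        self.array_size = size
        self.array = [None] * size

    def hash(self, key, collisions=0):
        return sum(key.encode()) + collisions

    def compress(self, hash_code):
        return hash_code % self.array_size

    def assign(self, key, value):
        for collisions in range(self.array_size):
            index = self.compress(self.hash(key, collisions))
            cell = self.array[index]
            if cell is None or cell[0] == key:  # empty cell, or our own key
                self.array[index] = [key, value]
                return
            # Otherwise the cell holds a different key: recalculate the index.
        raise RuntimeError('hash map is full')

    def retrieve(self, key):
        for collisions in range(self.array_size):
            index = self.compress(self.hash(key, collisions))
            cell = self.array[index]
            if cell is None:    # empty cell: no value stored for this key
                return None
            if cell[0] == key:  # found the pair for our key
                return cell[1]
        return None

hash_map = HashMap(4)
hash_map.assign('ab', 'first')   # 'ab' and 'ba' have the same byte sum,
hash_map.assign('ba', 'second')  # so they collide at the same index
print(hash_map.retrieve('ab'))   # first
print(hash_map.retrieve('ba'))   # second
```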

Open Addressing

Now let's use our open addressing scheme in the setter for our `HashMap`.

Open Addressing in the Setter

With everything in our setter taken care of, we want to make sure that when we retrieve our value we're retrieving the correct value.

Open Addressing in the Getter

Now that we have all of the functionality of a Hash Map, it's time to review what we've learned!

Hash Maps: Python

It performs a complicated numerical calculation.

Two different inputs can never have the same output.

Using the modulus operator, usually via a compression method.

By dividing the hash code into four different possibilities and choosing the one that's empty.

A hash map picks the next available space in the underlying array.

Looks for another cell in the underlying array to add the value to.

Adds the value to an underlying linked list implementation.

Ignores the assignment and waits for the next method call.

Explores hash maps concepts. Implementation, structure, and collision strategies.

Use a Hash Map with a separate chain of Linked Lists to store the language of flowers. For every flower, save its meaning with Blossom!

The underlying data structure for Blossom is going to be a key-value store that uses the common names for flowers as the key and saves the floral meaning of the flower as the value. 

In order to implement this functionality, we’re going to build out a hash map with separate chains of linked lists at every index.

First, let’s define our HashMap class.

The first thing that we’ll need for our hash map is an array. Python’s lists behave similarly to arrays, but we’ll need to keep track of and enforce the list’s size to make the resemblance stronger.

Give `HashMap` a constructor that takes a `size` parameter. Save `size` into `self.array_size`. 

After that, create a list of `None` objects of length `size` and save it into `self.array`. 

In order to implement a hash map, we need to implement four different methods. 

The first two are the internal methods needed to perform the basic responsibilities of a hash map: `.hash()` and `.compress()`. 

The next two are the external methods someone interacting with the hash map will use: `.assign()` and `.retrieve()`.

Let’s start by implementing a basic hash function. When the key is a string (which is true for all of Blossom’s keys) we’ll need to calculate a number for that string. Let’s sum up the character encodings of each character in the string and use that.

Define a method called `.hash()` that takes both `self` and `key` as parameters.

Calculate the hash code for the `key` by calling `key.encode()` and performing the sum on the resulting list-like object.

Now that we have a hash function that returns a number, we’ll also need a compression function that reduces this number into an array index.

Define a `.compress()` method that takes a `hash_code` parameter. Return the result of calculating the remainder of dividing `hash_code` by `self.array_size`.

With our hash and compression functions written, all we need to create a basic hash map are our `.assign()` and `.retrieve()` methods. Let’s start with `.assign()`.

Define a `.assign()` method that takes three parameters: `self`, `key`, and `value`. Get the hash code by plugging `key` into `.hash()` and then get the array index by plugging the resulting hash code into `.compress()`. Save the result into the variable `array_index`.

In the array, at the address `array_index`, save both the key and the value as a list: `[key, value]`.

Now that we have an assignment function, let’s also build out our retrieval function.

Define a `.retrieve()` method that takes two parameters: `self` and `key`. 

`.retrieve()` should find the hash code for `key` by plugging it into `.hash()` and then find the array index by plugging that hash code into `.compress()`. 

Save that index into a variable called `array_index`.

Save the value of `self.array` at `array_index` into a variable called `payload`. 

If `payload` is not `None`, then we know it's a list that looks like `[key, value]`.

Check the first item (`payload[0]`) and compare it with `key`. If they are the same, return the second item in `payload` (the value!).

If `payload` is `None` or the first item is not the same as `key`, return `None`.

Let's add in the separate chaining aspect of our algorithm. Import the linked list and node library by calling

```py
from linked_list import Node, LinkedList
```

at the top of **script.py**.

In `HashMap.__init__`, find the line where we created a list of `None` objects.

Change this so that `self.array` instead is a list of `LinkedList`s. 

The resulting `self.array` should be a Python list of `LinkedList` objects. Make sure to instantiate them.

In `.assign()`, we're going to be replacing the assign logic after getting the `array_index` from the `.hash()` and `.compress()` methods.

Create a new `Node` object with value `[key, value]`. Assign that `Node` object to a variable called `payload`.

We'll need to check if the key exists in the LinkedList before we add our new payload to it. Save `self.array[array_index]` into the variable `list_at_array`.

Iterate through `list_at_array` using a `for` loop. For every item in `list_at_array`, check if the key (the element at index `0`) is the same as the key we're trying to assign.

If we do find a key at one of the items in the linked list, overwrite its value with `value`.

If we've iterated through the list and not found our key, we need to add it.

Remove the line where we assign
```py
self.array[array_index] = [key, value]
```
And change it so that we use `list_at_array.insert()` to insert the `payload` to our chained list.

Now we're going to update `.retrieve()` to use separate chaining. We're going to rewrite the code after we get our `array_index`.

Using the `array_index` variable, get the `LinkedList` object at that index in `self.array`. Before we called this `payload` but since it represents a different type of object, let's name it something different.

Save the result into a variable called `list_at_index`.

Iterate through the linked list similarly to how `.assign()` did, checking the key in each part of the list to see if it's the same as our key.

If you do find the key, return the value (at index `1` in the node's value), otherwise return `None`!
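The separate chaining steps above could come together as in the sketch below. The project imports `Node` and `LinkedList` from `linked_list`, whose code is not shown here, so this sketch defines small stand-ins; their attribute names and head insertion behavior are assumptions. The flower meanings are placeholders, not real definitions.

```python
# A minimal sketch of a hash map with separate chaining.
# Assumption: Node and LinkedList are simplified stand-ins for linked_list.
class Node:
    def __init__(self, value):
        self.value = value
        self.next_node = None

class LinkedList:
    def __init__(self):
        self.head_node = None

    def insert(self, new_node):
        new_node.next_node = self.head_node  # new node becomes the head
        self.head_node = new_node

    def __iter__(self):
        current = self.head_node
        while current is not None:
            yield current.value              # yields each [key, value] list
            current = current.next_node

class HashMap:
    def __init__(self, size):
        self.array_size = size
        self.array = [LinkedList() for _ in range(size)]

    def hash(self, key):
        return sum(key.encode())

    def compress(self, hash_code):
        return hash_code % self.array_size

    def assign(self, key, value):
        list_at_array = self.array[self.compress(self.hash(key))]
        for item in list_at_array:
            if item[0] == key:   # key already stored: overwrite its value
                item[1] = value
                return
        list_at_array.insert(Node([key, value]))

    def retrieve(self, key):
        list_at_index = self.array[self.compress(self.hash(key))]
        for item in list_at_index:
            if item[0] == key:
                return item[1]
        return None

blossom = HashMap(2)  # two buckets, three keys: a collision is guaranteed
blossom.assign('daisy', 'placeholder meaning 1')
blossom.assign('rose', 'placeholder meaning 2')
blossom.assign('tulip', 'placeholder meaning 3')
print(blossom.retrieve('daisy'))  # placeholder meaning 1
```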

Now let's add in some flower definitions! Use

```py
from blossom_lib import flower_definitions 
```

to import the flower definitions.

Now let's create a new instance of our `HashMap` called `blossom`. Make the underlying list of our new `HashMap` the same length as `flower_definitions`.

Now, for every element of `flower_definitions`, assign the value (index 1) to its key (index 0) using `blossom.assign()`.

Now use our app! Look up a flower's meaning using `blossom.retrieve('daisy')`. Try printing it out!

Does it work? Next, try looking up another flower. Is the flower you're looking for missing? How would you add it in?

If you are stuck on the project or would like to see an experienced developer work through the project, watch the following project walkthrough video!

<iframe width="300" height="200" src="https://www.youtube.com/embed/pJycHIBqPNg" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Blossom

We'll cover the fundamental aspects of what makes a tree.


Trees are an essential data structure for storing hierarchical data with a directed flow. 

Similar to linked lists and graphs, trees are composed of nodes which hold data. The diagram represents nodes as rectangles and data as text.

Nodes also store references to zero or more **other tree nodes**. Data moves **down** from node to node. We depict those references as lines drawn between rectangles.

Trees are often displayed with a single node at the top and connected nodes branching downwards.



Trees Introduction

Trees grow downwards in computer science, and a _root_ node is at the very top. The root of this tree is `/photos`.

`/photos` references two other nodes: `/safari` and `/wedding`. `/safari` and `/wedding` are _children_ or _child_ nodes of `/photos`. 

Conversely, `/photos` is a _parent_ node because it **has child nodes**. 

`/safari` and `/wedding` share the same parent node, which makes them _siblings_.

Note that the `/safari` node is both a child (to `/photos`) **and** a parent (to `lion.jpg` and `giraffe.jpg`). It's extremely common to have nodes act as both parent and child to different nodes within a tree.

When a node has no children, we refer to it as a _leaf_ node. 



Diagram of a tree, as described in the exercise

Tree Detail

Trees come in various shapes and sizes depending on the dataset modeled.

Some are wide, with parent nodes referencing many child nodes. 

Some are deep, with many parent-child relationships. 

Trees can be both wide and deep, but each node will only ever have **at most** one parent; otherwise, they wouldn't be trees! 

Each time we move from a parent to a child, we're moving down a _level_. Depending on the orientation we refer to this as the _depth_ (counting levels down from the root node) or _height_ (counting levels up from a leaf node).



Tree Varietals


Constraints are placed on the data or node arrangement of a tree to solve difficult problems like efficient search. 

A _binary tree_ is a type of tree where each parent can have **no more than two children**, known as the _left child_ and _right child_. 

Further constraints make a _binary search tree_:

* Left child values must be less than their parent.
* Right child values must be greater than their parent.

The constraints of a binary search tree allow us to search the tree efficiently. When the tree is balanced, at each node we can discard roughly **half** of the remaining possible values!

Let's walk through locating the value `31`.

1. Start at the root: `39`
2. `31` < `39`, we move to the left child: `23`
3. `23` < `31`, we move to the right child: `35`
4. `31` < `35`, we move to the left child: `31`
5. We found the value `31`!

In a dataset of **fifteen** elements, we only made **three** comparisons. What a deal!
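The walkthrough above can be mirrored in a short sketch. The class `BSTNode` and function `search` are hypothetical names, and the tree built here contains only the four nodes named in the steps, not the full fifteen-element dataset.

```python
# A minimal sketch of binary search tree lookup (names are hypothetical).
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def search(node, target):
    """Return the list of values visited, or None if target is absent."""
    path = []
    while node is not None:
        path.append(node.value)
        if target == node.value:
            return path
        # Smaller targets can only be on the left, larger ones on the right.
        node = node.left if target < node.value else node.right
    return None

# Only the nodes along the path from the walkthrough are built here.
root = BSTNode(39, left=BSTNode(23, right=BSTNode(35, left=BSTNode(31))))
print(search(root, 31))  # [39, 23, 35, 31]
```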





Binary Search Tree


Trees are useful for modeling data that has a hierarchical relationship which moves in the direction from parent to child. No child node will have more than one parent.

To recap some terms:

* `root`: A node which has no parent. One per tree.
* `parent`: A node which references other nodes.
* `child`: Nodes referenced by other nodes.
* `sibling`: Nodes which have the same parent.
* `leaf`: Nodes which have no children.
* `level`: A node's position in the tree, measured as height or depth. Root nodes are at level 1, their children are at level 2, and so on.



Tree Review

Trees: Conceptual

Many different data types in different nodes.

Many parent-child connections with few sibling nodes.


Before we start building (planting?) our trees, let's do a quick inventory of what we'll need in our Python implementation. We're going to make the class `TreeNode`.

`TreeNodes`:
- have a value
- have a reference to zero or more other `TreeNodes`
- can add a node as a child
- can remove a child 
- can _traverse_ (or travel through) connected nodes

Let's start by defining our `TreeNode` class. We'll begin with having our node store a value, and additional functionality can be layered on in the following exercises.

Tree Implementation I: Planting Seeds

We have a working `TreeNode` class, but there's no time to enjoy a refreshing glass of lemonade. Trees are all about data hierarchy, and we need a parent-child relationship to make that work.

To review: child nodes are held as references by another instance of `TreeNode`, known as the parent node. 

```py
parent = TreeNode('CEO')
child = TreeNode('Executive Assistant')
print(parent.children)
# []
parent.add_child(child)
print(parent.children)
# [child]
```

We'll store the references to child nodes in a Python list and define an `add_child` method on our `TreeNode` class which will add nodes to that list.
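A minimal sketch of this step might look like the following. The parameter name `child_node` is an assumption; the `value` and `children` attributes follow the description above.

```python
# A minimal sketch of a tree node that can store children.
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []  # references to child TreeNode instances

    def add_child(self, child_node):
        self.children.append(child_node)

parent = TreeNode('CEO')
child = TreeNode('Executive Assistant')
parent.add_child(child)
print([node.value for node in parent.children])  # ['Executive Assistant']
```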



Tree Implementation II: Think of the Children

Let's explore how to remove nodes from a tree. Remember, child nodes are held in a list within the parent node. To remove a child, we need to remove that node from the list.

We want the following functionality:

```py
print(root.children)
# [child_a, child_b, child_c]
root.remove_child(child_b)
print(root.children)
# [child_a, child_c]
```

* Call `.remove_child()` on a specific node.
* Pass another node as an argument.
* Remove from `.children` any nodes which match the argument node.
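The three steps above can be sketched with a plain loop. This version assumes "match" means "is the same object", so it compares nodes with `is not`. The variable names below are illustrative.

```py
class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def add_child(self, child_node):
    self.children.append(child_node)

  def remove_child(self, child_node):
    # keep every child except the one passed in
    remaining = []
    for child in self.children:
      if child is not child_node:
        remaining.append(child)
    self.children = remaining

root = TreeNode('root')
child_a = TreeNode('a')
child_b = TreeNode('b')
child_c = TreeNode('c')
for node in (child_a, child_b, child_c):
  root.add_child(node)

root.remove_child(child_b)
print([child.value for child in root.children])
```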



Tree Implementation III: Pruning

Trees are an abstract idea that we're making concrete in Python. When implementing these abstract data structures, it's important to leverage the features of your language. 

Let's refactor `.remove_child()` to use a Python list comprehension. As a quick refresher on list comprehensions:

```py
nums = [1, 2, 3, 4, 5]
evens = [num for num in nums if num % 2 == 0]
# [2, 4]
```
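Applied to our tree, the same filtering idea might look like the sketch below. It rebuilds `children` with every node except the one being removed, again assuming nodes are compared by identity.

```py
class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def add_child(self, child_node):
    self.children.append(child_node)

  def remove_child(self, child_node):
    # build a new list that leaves out the node being removed
    self.children = [
      child for child in self.children
      if child is not child_node
    ]

root = TreeNode('root')
keep = TreeNode('keep')
prune = TreeNode('prune')
root.add_child(keep)
root.add_child(prune)

root.remove_child(prune)
print([child.value for child in root.children])
```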

Tree Implementation IIIa: Tuning the Pruning

Our implementation has covered adding and removing nodes. Let's expand the functionality and add the ability to move through connected nodes. 

Tree traversal is a standard operation for finding nodes with a specific value or printing all the nodes available in a tree. 

We'd like to do the following:

```py
root = TreeNode('Founder')
child_a = TreeNode('VP of Bananas')
child_b = TreeNode('Executive Assistant')

root.add_child(child_a)
root.add_child(child_b)

root.traverse()
# prints "Founder", "VP of Bananas", "Executive Assistant"

```
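A first, deliberately limited sketch of `.traverse()` prints the node's own value and then the value of each direct child. Returning the list of visited values is an addition made here so the result is easy to check; it is not required by the lesson.

```py
class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def add_child(self, child_node):
    self.children.append(child_node)

  def traverse(self):
    # visit this node, then only its direct children (one level deep)
    visited = [self.value]
    print(self.value)
    for child in self.children:
      print(child.value)
      visited.append(child.value)
    return visited  # returned only for convenience in this sketch

root = TreeNode('Founder')
root.add_child(TreeNode('VP of Bananas'))
root.add_child(TreeNode('Executive Assistant'))
values = root.traverse()
```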



Tree Implementation IV: Traversing

Our implementation of tree traversal has a slight hiccup. Trees grow many levels deep, but we've only accounted for one parent-child relationship.

How is this a problem?

```py
root = TreeNode('Founder')
child_a = TreeNode('VP of Bananas')
child_b = TreeNode('Executive Assistant')
child_c = TreeNode('Banana R & D')

# adding children to the root
root.add_child(child_a)
root.add_child(child_b)

# assigning child_c to child_a creates an additional level in the tree
child_a.add_child(child_c)

root.traverse()
# prints "Founder", "VP of Bananas", "Executive Assistant"
```

"VP of Bananas" is a child to "Founder", **and** a parent to "Banana R & D". `.traverse()` only goes one level deep which leaves out "Banana R & D". Pull on your gardening gloves and let's fix that!

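One possible fix, sketched under the assumption that any visiting order is acceptable, keeps a list of nodes still to visit and adds each visited node's children to it. Because `.pop()` takes from the end of the list, the most recently added child is visited first. The returned list is an addition for easy checking and is not part of the lesson's interface.

```py
class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def add_child(self, child_node):
    self.children.append(child_node)

  def traverse(self):
    nodes_to_visit = [self]
    visited = []
    while len(nodes_to_visit) > 0:
      # take the most recently added node
      current_node = nodes_to_visit.pop()
      print(current_node.value)
      visited.append(current_node.value)
      # queue up this node's children so deeper levels are reached
      nodes_to_visit += current_node.children
    return visited  # returned only for convenience in this sketch

root = TreeNode('Founder')
child_a = TreeNode('VP of Bananas')
child_b = TreeNode('Executive Assistant')
child_c = TreeNode('Banana R & D')
root.add_child(child_a)
root.add_child(child_b)
child_a.add_child(child_c)

order = root.traverse()
# order is ['Founder', 'Executive Assistant', 'VP of Bananas', 'Banana R & D']
```

Every node is now reached, including "Banana R & D" on the third level.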


Tree Implementation V: Traversing Root to Leaf

Congratulations, you have implemented a tree in Python.

For review, in our implementation:
- Trees are a Python class called `TreeNode`.
- A `TreeNode` has two properties, `value` and `children`.
- Nodes hold any type of data inside `value`.
- `children` is a list, which can be empty or hold other instances of `TreeNode`.
- We add to `children` by using the list method `.append`.
- We remove from `children` by filtering the list.

Trees: Python

```py
self.children = [
  child for child in self.children 
  if child is not child_node
]
```


class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def remove_child(self, child_node):
    ??????


class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def add_child(self, child_node):
    ??????

```py
current_node = nodes_to_visit.pop()
nodes_to_visit += current_node.children
```

```py
current_node = nodes_to_visit[-1]
nodes_to_visit += current_node.children
```

```py
current_node = nodes_to_visit.pop()
nodes_to_visit = current_node.children
```

class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def traverse(self):
    nodes_to_visit = [self]

    while len(nodes_to_visit) > 0:
      ??????



```py
if child_node in self.children:  
  return
```

```py
if self.children.includes(child_node):  
  return
```

```py
if self.children != child_node:  
  return
```

class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def add_child(self, child_node):
    ??????
    self.children.append(child_node)


```py
if len(self.children) == 2:
  return
```

```py
if len(self.children) > 2:
  return
```

```py
if len(self.children) < 2:
  return
```

class TreeNode:
  def __init__(self, value):
    self.value = value
    self.children = []

  def add_child(self, child_node):
    self.children.append(child_node) 
  
  def traverse(self):
    nodes_to_visit = [self]
    while len(nodes_to_visit) > 0:
      current_node = nodes_to_visit.pop()
      nodes_to_visit += current_node.children

root = TreeNode("A")
first_child = TreeNode("B")
second_child = TreeNode("C")

root.add_child(first_child)
root.add_child(second_child)

root.traverse()

Write an interactive Choose Your Own Adventure game using the Tree data structure.

This project will be heavily interactive. 

To get in the spirit, call `print()` inside of **script.py** to display "Once upon a time..." in the console.

Great! You'll need to save your changes as you go. 

Click "Save", then, inside of the terminal, enter `python3 script.py`. This will run the file.

You should see the contents of your print statement in the console.

Wonderful! Our application will use the tree data structure to keep track of the different paths a user can choose in their story. 

Define a `TreeNode` class.

Our `TreeNode` class will keep track of two things:
- a portion of the story.
- the choices a user can make to progress in the story.

Within `TreeNode`, define an `__init__()` method that takes `self` and `story_piece` as parameters.

Inside of `__init__()`, assign `story_piece` to `self.story_piece`. Also assign `self.choices` to be an empty Python list.
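A sketch of the class at this stage might look like the following. The variable `opening` and its text are placeholders for illustration.

```py
class TreeNode:
  def __init__(self, story_piece):
    # the text for this portion of the story
    self.story_piece = story_piece
    # child nodes the user can choose between; empty for now
    self.choices = []

# placeholder node to confirm the attributes are set
opening = TreeNode("Once upon a time...")
print(opening.story_piece)
print(opening.choices)
```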

Our market research indicates users are clamoring for a wilderness tale. Let's get the story started... 

Declare a variable `story_root` and assign it to an instance of `TreeNode` with the following text: 
```py
"""
You are in a forest clearing. There is a path to the left.
A bear emerges from the trees and roars!
Do you: 
1 ) Roar back!
2 ) Run to the left...
"""
```

Test out that we're on the right path by printing `story_root.story_piece` near the bottom of **script.py**. 

In the terminal, run `python3 script.py`.

A Choose-Your-Own-Adventure wouldn't be any fun if it weren't interactive. Let's explore how we can take input from the user. Outside of the `TreeNode` class, declare a variable `user_choice` and assign it to `input("What is your name? ")`

Immediately below `input()`, print out `user_choice` and click "Save". 

Inside of the terminal type `python3 script.py` to run our program. 

You should see the argument given to `input()`, "What is your name? ", printed to the screen. 

The terminal is waiting for your response. Type in your name and press the enter key.

Did you see your name printed out? 

This is how users will progress through our Choose-Your-Own-Adventure: they'll enter a number to select one of the displayed choices.
 
Experiment a few times typing in different things. Comment out or delete those lines of code so they don't run and clutter up the terminal.

Every good story has a beginning, middle, and end. Let's alter our `TreeNode` class so we can add the middle and end. 

We'll need an `.add_child()` method defined within `TreeNode` that has `self` and `node` as parameters.

We're treating each node in the tree as a piece of the story. It's a Choose-Your-Own-Adventure story, so there are multiple paths the user can take. 

Store the argument passed into `add_child()` inside of `self.choices`.

Let's add a few more pieces of the story. 

Declare `choice_a` and assign it to a new instance of `TreeNode` with the following argument: 
```py
"""
The bear is startled and runs away.
Do you:
1 ) Shout 'Sorry bear!'
2 ) Yell 'Hooray!'
"""
```

Declare `choice_b` and assign it to a new instance of `TreeNode` with the following argument: 
```py
"""
You come across a clearing full of flowers. 
The bear follows you and asks 'what gives?'
Do you:
1 ) Gasp 'A talking bear!'
2 ) Explain that the bear scared you.
"""
```

Call `add_child()` on `story_root` and pass `choice_a` as an argument.

Call `add_child()` on `story_root` and pass `choice_b` as an argument.

Wonderful! Now our story has a beginning in the variable `story_root`, and two choices available for a middle section stored inside of `story_root.choices`.
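Put together, the structure so far could be sketched as follows. The story strings are shortened placeholders rather than the full text from the steps above.

```py
class TreeNode:
  def __init__(self, story_piece):
    self.story_piece = story_piece
    self.choices = []

  def add_child(self, node):
    # each child is one path the user can take from this point
    self.choices.append(node)

# shortened placeholder text; the project uses longer multi-line strings
story_root = TreeNode("You are in a forest clearing...")
choice_a = TreeNode("The bear is startled and runs away...")
choice_b = TreeNode("You come across a clearing full of flowers...")

story_root.add_child(choice_a)
story_root.add_child(choice_b)
print(len(story_root.choices))
```

The order of the `add_child()` calls matters: `choice_a` ends up at index `0` and `choice_b` at index `1`.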



Let's add some functionality to our `TreeNode` class so we can move through the story. 

Within `TreeNode`, define `.traverse()`. It should only take `self` as an argument. 

Inside of `.traverse()`, declare a variable `story_node` and assign it to `self`. This variable will track the current portion of the story. 

Call `print()` with `story_node.story_piece` as an argument.

Test out our `.traverse()` method by calling it on `story_root`. 

Click "Save" and run our script by typing `python3 script.py` in the terminal. 

You should see the beginning of the story printed out.

We want to take user input and progress through the story as long as there are story choices remaining. 

Let's break it down:
- print out the `story_node.story_piece`.
- while `story_node.choices` has nodes inside...
- prompt the user for a choice.
- set `story_node` to be the user's story choice.
- repeat until the story is over!

Let's write out the while loop that will make up the rest of `.traverse()`. 

The loop should run as long as `story_node.choices` **is not equivalent to** an empty list. 

If `story_node.choices` is an empty list then we know the story is over.

Inside of the while loop, declare a variable `choice` and set it to `input()` with the argument: `"Enter 1 or 2 to continue the story: "`. 

This is how we will collect the user's choice for how to progress through the story.

Let's use a conditional to ensure the user is entering valid input. 

If `choice` is `not in` a list with the valid options, then print out a message asking them to enter a valid choice: `1` or `2`.

Now let's write the branch of the conditional where the user has made a valid choice. 

We collected input from the terminal, so even if they entered `1`, it will be a string (`str`). Convert it to an integer (`int`) so we can use this choice as an index.

Declare a variable `chosen_index` and assign it to `int()` with `choice` passed as an argument.

We're having our users enter `1` or `2`, but the `story_node.choices` are at index `0` or `1`. 

Reassign `chosen_index` to be one less than it was before.

Declare a variable `chosen_child` and assign it to the appropriate choice from `story_node.choices`. 

Use `chosen_index` to access the correct element!

`chosen_child` is now our current portion of the story because the user selected this child node. 

Use `print()` to display `chosen_child.story_piece`.

Finally, set `story_node` to be `chosen_child`.

Our while loop will keep checking `story_node` to see if there are more choices to be made in our story.
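The whole loop could be sketched as below. Three things here are assumptions added for illustration and are not part of the project steps: the `ask` parameter (which defaults to the built-in `input()` so answers can be scripted), the returned final node, and the shortened story strings. The sketch also assumes every node with choices has exactly two of them.

```py
class TreeNode:
  def __init__(self, story_piece):
    self.story_piece = story_piece
    self.choices = []

  def add_child(self, node):
    self.choices.append(node)

  def traverse(self, ask=input):
    # `ask` is a hypothetical hook; by default it is the built-in input()
    story_node = self
    print(story_node.story_piece)
    while story_node.choices != []:
      choice = ask("Enter 1 or 2 to continue the story: ")
      # input arrives as a string, so compare against strings
      if choice not in ["1", "2"]:
        print("Please enter a valid choice: 1 or 2.")
      else:
        # users type 1 or 2, but list indexes are 0 or 1
        chosen_index = int(choice) - 1
        chosen_child = story_node.choices[chosen_index]
        print(chosen_child.story_piece)
        story_node = chosen_child
    return story_node  # returned only for convenience in this sketch

# shortened placeholder story
story_root = TreeNode("A bear roars! 1) Roar back 2) Run left")
choice_a = TreeNode("The bear runs away.")
choice_b = TreeNode("The bear follows you.")
story_root.add_child(choice_a)
story_root.add_child(choice_b)

# scripted answers: one invalid entry, then choice 2
scripted_answers = iter(["7", "2"])
ending = story_root.traverse(ask=lambda prompt: next(scripted_answers))
```

Calling `story_root.traverse()` with no argument reads from the terminal instead, as the project steps describe.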

Congratulations! We have a functioning Choose-Your-Own-Adventure. 

At the bottom of `script.py`, call `.traverse()` on `story_root`. 

Click "Save", and run `python3 script.py` in the terminal. 

You should be able to progress through one level of the story!

Our functionality is all in place; all we have left to do is finish the story. Let's create the child nodes for `choice_a`.

Declare a variable `choice_a_1` and assign it to an instance of `TreeNode` with the following string as an argument:
```py
"""
The bear returns and tells you it's been a rough week. After making peace with
a talking bear, he shows you the way out of the forest.

YOU HAVE ESCAPED THE WILDERNESS.
"""
```

Declare a variable `choice_a_2` and assign it to an instance of `TreeNode` with the following string as an argument:
```py
"""
The bear returns and tells you that bullying is not okay before leaving you alone
in the wilderness.

YOU REMAIN LOST.
"""
```

`choice_a_1` and `choice_a_2` should be child nodes to `choice_a`. 

Call `.add_child()` on `choice_a` to set up the relationship between these nodes. 

Click "Save" and in the terminal run `python3 script.py`. Navigate through these new sections of the story.

Now let's create the child nodes for `choice_b`. 

Declare a variable `choice_b_1` and assign it to an instance of `TreeNode` with the following string as an argument:
```py
"""
The bear is unamused. After smelling the flowers, it turns around and leaves you alone.

YOU REMAIN LOST.
"""
```

Declare a variable `choice_b_2` and assign it to an instance of `TreeNode` with the following string as an argument:
```py
"""
The bear understands and apologizes for startling you. Your new friend shows you a 
path leading out of the forest.

YOU HAVE ESCAPED THE WILDERNESS.
"""
```

Set up the appropriate relationship for `choice_b_1` and `choice_b_2`; they should be child nodes of `choice_b`.

Our story is complete. Click "Save" and run `python3 script.py` in the terminal. 

Have fun navigating through the different branches of the story!

Choose Your Own Adventure: Wilderness Escape

When you need to efficiently maintain a maximum or minimum value in a dataset.

When you need to create a relationship of connections between data in your dataset.

When you need to maintain an ordering of when elements entered the dataset.

When you need to keep a mapping of String values pointing to other values.

Every value below the "root" will be lesser than the value in the "root".

The root can only be replaced by a lesser value.

The "root" or first element is the largest value and every child is a lesser value than their parent.

The "root" or first element is the smallest value and every child is a greater value than their parent.

The "root" or first element is the sum of both children, and every child is half of their parent's value.

The "root" or first element is an even number and every child is an odd number.

While there is a parent for an element and the parent is greater, the element swaps locations with the parent. 

While there is a parent for an element and the parent is lesser, the element swaps locations with the parent. 

While there is a child for an element and the child is lesser, the element swaps locations with the parent. 

While there is a child for an element and the child is greater, the element swaps locations with the parent. 

While there is a child for an element and the child is greater, the element swaps locations with the child. 

While there is a child for an element and the child is lesser, the element swaps locations with the child. 

The "root" node, or first element in the array.

The element least like the other elements in the dataset.

The element at the bottom right of the tree, or last element in the array.

The element with the mean value of the dataset.

Quiz covering the underlying concepts of the Heap abstract data type.

Heaps: Conceptual

Conceptual overview of the heap data structure, covering its essential properties and how to maintain them.


Heaps are used to maintain a maximum or minimum value in a dataset. Our examples use numbers since they are straightforward to compare, but heaps have many practical applications.

Imagine you have a demanding boss (hopefully this is theoretical!). They always want **the most important** thing done. Of course, once you finish the most important task, another one takes its place. 

You can manage this problem using a **priority queue** to ensure you're always working on the most pressing assignment. Heaps are commonly used to create a priority queue.

A heap that tracks the maximum value is a _max-heap_, and one that tracks the minimum value is a _min-heap_. We will focus on min-heaps, but the concepts for a max-heap are nearly identical.

Think of the min-heap as a binary tree with two qualities:

* The root is the **minimum value** of the dataset. 
* Every child's value is **greater than or equal to its parent**.

These two properties are the defining characteristics of the min-heap. By maintaining these two properties, we can efficiently retrieve and update the minimum value.
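To make the two properties concrete, here is a minimal sketch of a min-heap stored in a Python list, where the parent of the element at index `i` sits at index `(i - 1) // 2`. The class name `MinHeap`, its method names, and the list-based layout are assumptions made for this illustration.

```py
class MinHeap:
  def __init__(self):
    # items[0] is the root, which is always the minimum
    self.items = []

  def add(self, value):
    # place the new value at the bottom of the heap
    self.items.append(value)
    idx = len(self.items) - 1
    # while there is a parent and it is greater, swap upward
    while idx > 0:
      parent_idx = (idx - 1) // 2
      if self.items[parent_idx] <= self.items[idx]:
        break
      self.items[parent_idx], self.items[idx] = (
        self.items[idx], self.items[parent_idx]
      )
      idx = parent_idx

  def peek_min(self):
    # the minimum is available without searching the dataset
    return self.items[0] if self.items else None

heap = MinHeap()
for priority in [42, 10, 73, 3, 18]:
  heap.add(priority)
print(heap.peek_min())
# 3
```

This sketch only covers adding values and reading the minimum. Removing the minimum needs a matching step that swaps downward, which is not shown here.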



Introduction to Heaps

### Why Learn Complex Data Structures?

These data structures use a layer of abstraction to make specific operations much more straightforward. They're designed as solutions for problems that don't require linear iteration, but have more nuanced requirements.

### Take-Away Skills:

This course introduces the theory and implementation of abstract data structures. After this course, you'll be ready to solve advanced algorithmic problems like path-finding and maintaining priority queues.

### Notes on Prerequisites:

This course is a continuation of our Linear Data Structures syllabus, which introduces other data structures you might encounter. Since you'll be implementing these data structures in Python, we recommend you take our Python curriculum to become familiar with the language.



Discover and design new data structures that follow abstract rule-based systems by building out graphs, hash-maps, and heaps.