When we use BeautifulSoup to select HTML elements, we often want to grab the text inside an element so that we can analyze it. We can call .get_text() on any tag to retrieve the text it contains.
<h1 class="results">Search Results for: <span class='searchTerm'>Funfetti</span></h1>
If this is the HTML that was used to create the soup object, we can call soup.get_text(), which will return:
'Search Results for: Funfetti'
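The call above can be reproduced with a minimal sketch. It assumes the bs4 package is installed and uses Python's built-in "html.parser"; the variable names html and text are illustrative and not part of the lesson.

```python
from bs4 import BeautifulSoup

# The HTML snippet from the lesson, parsed with the standard-library parser.
html = """<h1 class="results">Search Results for: <span class='searchTerm'>Funfetti</span></h1>"""
soup = BeautifulSoup(html, "html.parser")

# get_text() concatenates every string found inside the tag it is called on,
# including strings in nested tags such as the <span>.
text = soup.get_text()
print(text)  # Search Results for: Funfetti
```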
Notice that this combined the text inside the outer h1 tag with the text contained in the span tag inside of it! With get_text(), both of these strings come back as part of one longer string. If we wanted to separate the text from different tags, we could specify a separator. Calling soup.get_text('|') uses a | character to separate them.
Now, the call returns:
'Search Results for: |Funfetti'
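As a small sketch of the separator in action, the snippet below passes '|' to get_text() and then splits the result back into one string per tag. The names separated and pieces are illustrative assumptions.

```python
from bs4 import BeautifulSoup

html = """<h1 class="results">Search Results for: <span class='searchTerm'>Funfetti</span></h1>"""
soup = BeautifulSoup(html, "html.parser")

# The first argument of get_text() is the separator placed between
# each piece of text that came from a different string in the tree.
separated = soup.get_text("|")
print(separated)  # Search Results for: |Funfetti

# Splitting on the same character recovers the individual pieces.
pieces = separated.split("|")
print(pieces)  # ['Search Results for: ', 'Funfetti']
```

Note that the trailing space after "for:" is kept, because get_text() does not strip whitespace unless asked to.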
After the loop, print out turtle_data. So far, we have been storing each name as the whole p tag that contains it.
Instead, let’s call get_text() on the turtle_name element and store the result as the key of our dictionary.
Instead of associating each turtle with an empty list, let’s have each turtle associated with a list of the stats that are available on their page.
It looks like each piece of information is in an li element on the turtle’s page.
Find the ul element on the page, and get all of the text in it, separated by a '|' character so that we can easily split out each attribute later.
Store the resulting string in turtle_data[turtle_name] instead of storing an empty list there.
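The steps above might look roughly like the sketch below. The real turtle pages are not shown here, so turtle_html is a hypothetical stand-in, and the "name" class, the example stats, and the variable names turtle_page and stats are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one turtle's page; the real markup may differ.
turtle_html = (
    '<p class="name">Aesop</p>'
    '<ul><li>AGE: 7 Years Old</li><li>WEIGHT: 6 lbs</li><li>SEX: Female</li></ul>'
)
turtle_page = BeautifulSoup(turtle_html, "html.parser")

turtle_data = {}

# Assumed selector: the name sits in a <p class="name"> element.
# Calling get_text() gives a plain string key instead of the whole tag.
turtle_name = turtle_page.select_one(".name").get_text()

# Collect the text of every <li> inside the <ul>, joined by '|'.
stats = turtle_page.find("ul").get_text("|")
turtle_data[turtle_name] = stats

print(turtle_data)
# {'Aesop': 'AGE: 7 Years Old|WEIGHT: 6 lbs|SEX: Female'}
```

If the real page has newlines or indentation between the li tags, those whitespace strings would also appear between separators; this sketch avoids that by keeping the hypothetical markup on one line.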
When we store the list of info in each
turtle_data[turtle_name], separate out each list element again by splitting on