Skip to Content
Learn
Web Scraping with Beautiful Soup
Reading Text

When we use BeautifulSoup to select HTML elements, we often want to grab the text inside of the element, so that we can analyze it. We can use .get_text() to retrieve the text inside of whatever tag we want to call it on.

<h1 class="results">Search Results for: <span class='searchTerm'>Funfetti</span></h1>

If this is the HTML that has been used to create the soup object, we can make the call:

soup.get_text()

Which will return:

'Search Results for: Funfetti'

Notice that this combined the text inside of the outer h1 tag with the text contained in the span tag inside of it! Using get_text(), it looks like both of these strings are part of just one longer string. If we wanted to separate out the texts from different tags, we could specify a separator character. This command would use a . character to separate:

soup.get_text('|')

Now, the command returns:

'Search Results for: |Funfetti'

Instructions

1.

After the loop, print out turtle_data. We have been storing the names as the whole p tag containing the name.

Instead, let’s call get_text() on the turtle_name element and store the result as the key of our dictionary instead.

2.

Instead of associating each turtle with an empty list, let’s have each turtle associated with a list of the stats that are available on their page.

It looks like each piece of information is in a li element on the turtle’s page.

Get the ul element on the page, and get all of the text in it, separated by a '|' character so that we can easily split out each attribute later.

Store the resulting string in turtle_data[turtle_name] instead of storing an empty list there.

3.

When we store the list of info in each turtle_data[turtle_name], separate out each list element again by splitting on '|'.

Folder Icon

Sign up to start coding

Already have an account?