After importing a data file, we’re often eager to get started with exploratory data analysis. Unfortunately, raw data is often messy data. If we don’t take some time to inspect and clean a dataset, we may get meaningless or misleading results.

For example, let’s take a look at this table of monthly sales:

Suppose we want to know the sales associated with `january`

. If we just dive in and filter the table on `month equals january`

, we’ll get a value of 500.

But there are several other rows of the original table that ought to be associated with `january`

! Unfortunately, Excel can’t tell that `jan`

and `january`

really mean the same thing. Even harder to spot is the fact that one of the rows labeled `jan`

has three spaces at the end. We can’t see this extra whitespace easily, but Excel will treat it as a different category.

In a small dataset, these inconsistencies can be edited by hand. For larger datasets, Excel provides text cleaning functions that we can write once and then drag down a column like any other formula. You can even nest these functions (like in the last slide to the right), though nesting is outside the scope of this course.

We’ve placed a slideshow demonstrating common text-cleaning functions in the learning environment. When you’re ready to try these out yourself, move on to the next set of instructions!

### Instructions

When you’re ready to practice what you’ve learned, download and work through our interactive exercise spreadsheet. Feel free to download our solutions spreadsheet to compare your work to ours!

**Reflection Question**: in the spreadsheet, why did we trim the data first?