Let’s start by exploring numeric data! In the learning environment, we’ve loaded a dataset containing the number of new cars produced in the US by year (from the Bureau of Transportation Statistics). Often, exploring numeric data starts with answering basic statistical questions about the data such as
- what is the largest number of new cars in a year?
- what is the smallest number of new cars in a year?
- what is the average number of new cars in a year?
Excel can help us answer all of these. But first, we’ll have to know a little more about how cells in a spreadsheet connect to each other. The connections between cells are based on cell “addresses” or “references”.
Cell addresses are made up of a column letter
followed by a row number
. For example, the cell containing the number 10 in the following image is named B2 since it is in column B and row 2. The number 10 is the content or value of the cell.
In addition to numbers, cells can also contain formulas, which are how we’ll calculate our maximum, minimum, and average. A formula in Excel always starts with the =
sign, so Excel knows that it should do some kind of computation.
For example, since B2 contains the number 10, the formula =2+B2
would produce the value 12
, since 2+10 = 12
. The cell name B2 gets replaced by its value when Excel does the computation.
We can also do subtraction, division, and multiplication in formulas using
-
for subtraction/
for division*
for multiplication
This last one might seem like an unusual symbol, but it is pretty commonly used for multiplication on computers.
Here’s an example of using *
for multiplication. What would you change if you wanted to multiply B2 by 3?
Now, you might be thinking to yourself that you could multiply 2
by 10
without a spreadsheet. That’s right! But the magic of Excel formulas is that they update automatically when data changes. If the cell B2
was changed to contain the number 4
, the cell C2
would automatically update to contain the number 8
, since 2*4 = 8
. These dynamic formulas are the real power of Excel since they let us automatically update analyses with new data.
Ranges of Cells
To calculate an average we need to reference a whole collection of cells instead of a single cell at a time. To reference a range of data, we type the name of the first cell in the range (usually the top-left cell), a colon, and then the name of the last cell (usually the bottom-right cell).
For example, the range B1:C4
includes the cells in Columns B-C and Rows 1-4.
To calculate the maximum, minimum, and average of this range we use special Excel functions =MAX()
, =MIN()
, and =AVERAGE()
, and we we put the range containing the data between the parentheses.
In the example in the learning environment, the data is contained in the range B2:B7
, so the full formulas for calculating these three statistics are
=MAX(B2:B7)
=MIN(B2:B7)
=AVERAGE(B2:B7)
To display the average in cell B8
, we
- Select cell
B8
- Type the formula into the cell
- Select
return
orenter
You can watch the full process in the slideshow in the learning environment. When you are ready to practice this yourself, follow the next set of instructions!
Instructions
When you’re ready to practice what you’ve learned, download and work through our interactive exercise spreadsheet. A couple of important points to keep in mind:
- We’ve changed some of the default Excel formatting to make these easier to follow and more interactive. You’ll learn how to do formatting in later lessons.
- We’ve locked all cells except the ones we’re asking you to create formulas in. This is so we can automatically correct your answers and give you feedback without worrying that some data has been changed accidentally.
- If you are having technical issues with the Excel spreadsheets, check out this post in our forums for possible solutions!
- If you get stuck, feel free to download our solutions spreadsheet
- When you’re done with the exercises, come back here and answer the reflection question!
Reflection question: when you restricted the range for the average, the average increased. Why do you think that happened, and when would you prefer to use one average over the other?