### Summary Statistics in R

Because R is primarily a language for statistical computing, functions for summary statistics come standard with base R.

- Use `mean()` and `median()` to calculate the average of a vector.
- Use `min()`, `max()`, and `range()` to see the range of a vector.
- Use `sd()` or `var()` to calculate the spread of a vector.
- Use `table()` to view the frequency of each value in a vector.

```
## AVERAGE
mean(dat) #mean
median(dat) #median
## RANGE
min(dat) #minimum value
max(dat) #maximum value
range(dat) #minimum and maximum
## SPREAD
sd(dat) #standard deviation
var(dat) #variance
## FREQUENCY
table(dat) #frequency of each value
```
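As a small worked sketch of the functions above, the following uses a made-up numeric vector (the values in `dat` are hypothetical and chosen only so the results are easy to check by hand):

```r
# Hypothetical sample data; any numeric vector would work here
dat <- c(2, 4, 4, 5, 7, 9, 4)

avg <- mean(dat)     # 5, since the values sum to 35 across 7 observations
mid <- median(dat)   # 4, the middle value once the vector is sorted
rng <- range(dat)    # c(2, 9): minimum and maximum together
v   <- var(dat)      # sample variance, which divides by n - 1
s   <- sd(dat)       # standard deviation, the square root of the variance
freq <- table(dat)   # counts how often each distinct value occurs

avg
mid
rng
freq
```

Note that `var()` and `sd()` compute the sample versions (dividing by `n - 1`), not the population versions.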

### ggplot() Initializes a ggplot Object

Invoking the `ggplot()` function returns an object that serves as the base of a ggplot2 visualization.

```
library(ggplot2) #load the ggplot2 package
viz <- ggplot()
viz # renders blank plot
```

Data is bound to a ggplot2 visualization by passing a data frame as the first argument in the `ggplot()` function call. Layers are added to the plot object by chaining function calls after `ggplot()` with the `+` operator. These functions have access to the data frame and can refer to its columns by name.

For example, consider a data frame `sales` with the columns `cost` and `profit`. To bind the data frame `sales` to the `ggplot()` object when it is initialized:

```
viz <- ggplot(data=sales) +
  geom_point(aes(x=cost, y=profit))
viz # renders plot
```

In the example above:

- The ggplot object, or canvas, was initialized with the data frame `sales` assigned to it.
- The subsequent `geom_point` layer used the `cost` and `profit` columns to define the scales of the axes for that particular geom. Notice that it referred to those columns by their column names.
- The variable name of the ggplot object is stated on its own line so the plot is rendered.

### ggplot2 Aesthetics

In ggplot2, aesthetics are the instructions that determine the visual properties of a plot and its geometries.

Examples of ggplot2 aesthetics include:

- scales for the x and y axes
- color of the data points on the plot based on a property or on a color preference
- the size or shape of different geometries

Aesthetics are set either manually or by *aesthetic mappings*. Aesthetic mappings “map” variables from the bound data frame to visual properties in the plot. These mappings can be provided in two ways using the `aes()` mapping function:

- At the canvas level: All subsequent layers on the canvas will inherit the aesthetic mappings defined when the ggplot object was created with `ggplot()`.
- At the geom level: Only that layer will use the aesthetic mappings provided.

For example, the following code assigns `aes()` mappings for the `x` and `y` scales at the canvas level:

```
viz <- ggplot(data=airquality, aes(x=Ozone, y=Temp)) +
  geom_point() +
  geom_smooth()
```

In the example above:

- The aesthetic mapping is wrapped in the `aes()` aesthetic mapping function and passed as an additional argument to `ggplot()`.
- Both of the subsequent geom layers, `geom_point()` and `geom_smooth()`, use the scales defined inside the aesthetic mapping assigned at the canvas level.

You could create the same plot by setting the aesthetics at the geom level, as follows:

```
viz <- ggplot(data=airquality) +
  geom_point(aes(x=Ozone, y=Temp)) +
  geom_smooth(aes(x=Ozone, y=Temp))
```
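The difference between mapping an aesthetic and setting it manually can be sketched as follows. This is an illustrative example rather than part of the original snippets: it assumes the ggplot2 package is installed, uses the `Month` column of the built-in `airquality` data frame, and the color name and point size are arbitrary choices.

```r
library(ggplot2)

# Mapped aesthetic: color depends on a column, so it goes inside aes()
viz_mapped <- ggplot(data = airquality, aes(x = Ozone, y = Temp)) +
  geom_point(aes(color = factor(Month)))

# Manually set aesthetic: one fixed value for every point, so it goes outside aes()
viz_manual <- ggplot(data = airquality, aes(x = Ozone, y = Temp)) +
  geom_point(color = "steelblue", size = 2)

viz_mapped  # renders points colored by month, with a legend
viz_manual  # renders points in a single color, with no legend
```

Rows of `airquality` with missing `Ozone` values are dropped from the plot with a warning.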

### Creating Regression Models in R

The `lm()` function creates a linear regression model in R. The `glm()` function fits generalized linear models; with the argument `family = binomial`, it creates a logistic regression model.

These functions take a formula `Y ~ X`, where `Y` is the outcome variable and `X` is the predictor variable. We can add additional predictor variables using `+`.

A summary of these models can be printed using the `summary()` function.

```
## Linear regression model
temp_lm <- lm(temp ~ month + region, data = world)
summary(temp_lm) #print summary
## Logistic regression model
## family = binomial is required; without it glm() fits an ordinary linear model
winning_glm <- glm(win ~ ranking + home + starting_players, family = binomial, data = team)
summary(winning_glm) #print summary
```
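For a version that runs without any external data, here is a minimal sketch using R's built-in `mtcars` data frame. The choice of columns (`mpg`, `wt`, `hp`, and the 0/1 column `am`) is only for illustration.

```r
# Linear regression: fuel economy modeled by weight and horsepower
mpg_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(mpg_lm)  # prints coefficients, standard errors, and R-squared

# Logistic regression: the 0/1 transmission indicator modeled by weight;
# family = binomial is what makes glm() fit a logistic model
am_glm <- glm(am ~ wt, family = binomial, data = mtcars)
summary(am_glm)  # coefficients are reported on the log-odds scale

coef(mpg_lm)  # named vector with entries (Intercept), wt, hp
```

Each term on the right-hand side of the formula produces a coefficient in the fitted model, alongside the intercept.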

### Making Predictions from Regression Objects in R

To make predictions of the outcome variable using a regression model, we need a dataset whose column names match the names of the predictor variables in the model. Once we have established the data to make predictions about, we can use the `predict()` function to generate predictions. This will produce one predicted outcome for each observation in the new dataset.

```
## Create linear regression model
lm1 <- lm(y ~ x1 + x2 + x3, data = dat)
## Establish data to make predictions about
pred_data <- data.frame(
  x1 = c(0, 1, -1),
  x2 = c(1, 6, 5),
  x3 = c(10, -4, 9)
)
## Make predictions
predict(lm1, pred_data)
```
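The same approach works for a logistic regression model, with one detail worth knowing: by default, `predict()` on a `glm` object returns values on the scale of the linear predictor (log-odds), and `type = "response"` returns probabilities instead. The sketch below assumes the built-in `mtcars` data; the weights in `new_cars` are made-up values for illustration.

```r
# Fit a logistic regression on built-in data
am_glm <- glm(am ~ wt, family = binomial, data = mtcars)

# New observations; the column name must match the predictor name wt
new_cars <- data.frame(wt = c(2.0, 3.0, 4.0))

# Default predictions are on the log-odds scale
log_odds <- predict(am_glm, newdata = new_cars)

# type = "response" converts them to probabilities between 0 and 1
probs <- predict(am_glm, newdata = new_cars, type = "response")

log_odds
probs
```

As with `lm()`, the result has one prediction per row of the new data frame.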