Because R is primarily statistical software, functions for summary statistics come standard with a base R installation.

- Use `mean()` and `median()` to calculate the average of a vector.
- Use `min()`, `max()`, and `range()` to see the range of a vector.
- Use `sd()` or `var()` to calculate the spread of a vector.
- Use `table()` to view the frequency of each value in a vector.

    ## AVERAGE
    mean(dat)    # mean
    median(dat)  # median

    ## RANGE
    min(dat)     # minimum value
    max(dat)     # maximum value
    range(dat)   # minimum and maximum

    ## SPREAD
    sd(dat)      # standard deviation
    var(dat)     # variance

    ## FREQUENCY
    table(dat)   # frequency of each value
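As a concrete illustration, the sketch below runs these functions on a small vector. The values in `dat` are invented for the example and do not come from any real dataset.

```r
# Hypothetical data, invented for illustration.
dat <- c(3, 5, 5, 7, 10)

avg     <- mean(dat)     # 6
mid     <- median(dat)   # 5
lo_hi   <- range(dat)    # 3 10
spread  <- var(dat)      # 7 (sample variance, divides by n - 1)
std_dev <- sd(dat)       # sqrt(7), about 2.65
counts  <- table(dat)    # 5 appears twice; 3, 7, and 10 appear once
```

Note that `var()` and `sd()` compute the sample variance and sample standard deviation, so they divide by `n - 1` rather than `n`.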

Invoking the `ggplot()` function returns an object that serves as the base of a ggplot2 visualization.

    library(ggplot2)

    viz <- ggplot()
    viz  # renders blank plot

Data is bound to a ggplot2 visualization by passing a data frame as the first argument in the `ggplot()` function call. Layers are added to the plot object by appending function calls to `ggplot()` with the `+` operator. These functions have access to the data frame and can use its column names as variables.

For example, consider a data frame `sales` with the columns `cost` and `profit`. The following code binds `sales` to the ggplot object as it is initialized and adds a layer of points:

    viz <- ggplot(data = sales) +
      geom_point(aes(x = cost, y = profit))
    viz  # renders plot

In the example above:

- The ggplot object, or canvas, was initialized with the data frame `sales` assigned to it.
- The subsequent `geom_point` layer used the `cost` and `profit` columns to define the scales of the axes for that particular geom. Notice that it referred to those columns by their column names.
- The variable name of the ggplot object, `viz`, is stated on its own line so that the plot is printed and therefore viewable.
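To make the example above runnable, the sketch below first builds a small `sales` data frame. The numbers are invented placeholders; only the column names `cost` and `profit` come from the text.

```r
library(ggplot2)

# Hypothetical sales figures, invented for illustration.
sales <- data.frame(
  cost   = c(10, 20, 30, 40),
  profit = c(4, 9, 11, 18)
)

# Bind the data frame to the canvas, then add a layer of points.
viz <- ggplot(data = sales) +
  geom_point(aes(x = cost, y = profit))
viz  # printing the object renders the scatter plot
```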

In ggplot2, aesthetics are the instructions that determine the visual properties of a plot and its geometries.

Examples of ggplot2 aesthetics include:

- scales for the x and y axes
- color of the data points on the plot based on a property or on a color preference
- the size or shape of different geometries

Aesthetics are set either manually or by *aesthetic mappings*. Aesthetic mappings “map” variables from the bound data frame to visual properties in the plot. These mappings are provided in two ways using the `aes()` mapping function:

- At the canvas level: All subsequent layers on the canvas will inherit the aesthetic mappings defined when the ggplot object was created with `ggplot()`.
- At the geom level: Only that layer will use the aesthetic mappings provided.
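The difference between a mapped aesthetic and a manually set one can be sketched with R's built-in `airquality` data frame. The choice of the `Wind`, `Temp`, and `Month` columns and of the color `"steelblue"` is arbitrary and only for illustration.

```r
library(ggplot2)

# Mapped aesthetic: color is inside aes(), so it varies with a column
# of the bound data frame (one color per month).
mapped <- ggplot(data = airquality) +
  geom_point(aes(x = Wind, y = Temp, color = factor(Month)))

# Manually set aesthetic: color is outside aes(), so every point
# gets the same fixed color.
manual <- ggplot(data = airquality) +
  geom_point(aes(x = Wind, y = Temp), color = "steelblue")
```

Placing an argument inside `aes()` ties it to the data; placing it outside `aes()` sets a constant value for the whole layer.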

For example, the following code assigns `aes()` mappings for the `x` and `y` scales at the canvas level:

    viz <- ggplot(data = airquality, aes(x = Ozone, y = Temp)) +
      geom_point() +
      geom_smooth()

In the example above:

- The aesthetic mapping is wrapped in the `aes()` aesthetic mapping function as an additional argument to `ggplot()`.
- Both of the subsequent geom layers, `geom_point()` and `geom_smooth()`, use the scales defined inside the aesthetic mapping assigned at the canvas level.

You could create the same plot by setting the aesthetics at the geom level, as follows:

    viz <- ggplot(data = airquality) +
      geom_point(aes(x = Ozone, y = Temp)) +
      geom_smooth(aes(x = Ozone, y = Temp))
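The two levels can also be combined. In the sketch below, which uses the `Wind` and `Month` columns of `airquality` as arbitrary choices, `x` and `y` are mapped at the canvas level while `color` is mapped only inside `geom_point()`. The smoothing layer therefore inherits the axes but not the color grouping.

```r
library(ggplot2)

mixed <- ggplot(data = airquality, aes(x = Wind, y = Temp)) +
  # Geom-level mapping: only the points are colored by month.
  geom_point(aes(color = factor(Month))) +
  # Inherits x and y from the canvas; fits one line through all points.
  geom_smooth(method = "lm", formula = y ~ x)
mixed
```

Had `color = factor(Month)` been placed in the `ggplot()` call instead, `geom_smooth()` would inherit it and draw a separate line per month.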

The `lm()` function fits a linear regression model in R. The `glm()` function fits a generalized linear model; called with `family = binomial`, it fits a logistic regression model.

These functions take a formula `Y ~ X`, where `Y` is the outcome variable and `X` is the predictor variable. We can add additional predictor variables using `+`.

A summary of these models can be printed using the `summary()` function.

    ## Linear regression model
    temp_lm <- lm(temp ~ month + region, data = world)
    summary(temp_lm)  # print summary

    ## Logistic regression model
    # family = binomial is required; glm() defaults to a gaussian family
    winning_glm <- glm(win ~ ranking + home + starting_players,
                       data = team, family = binomial)
    summary(winning_glm)  # print summary
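The `world` and `team` data frames above are not defined in this text, so the following sketch swaps in the `mtcars` dataset that ships with R to give a runnable version. The choice of columns is purely illustrative.

```r
# Linear regression: fuel economy modeled by weight and horsepower.
mpg_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(mpg_lm)

# Logistic regression: am is coded 0/1, so it can serve as a binary outcome.
am_glm <- glm(am ~ wt, data = mtcars, family = binomial)
summary(am_glm)

# Estimated coefficients as a named vector: (Intercept), wt, hp.
coef(mpg_lm)
```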

To make predictions of the outcome variable using a regression model, we need a dataset whose column names match the names of the predictor variables in the model. Once we have established the data to make predictions about, we can use the `predict()` function to generate predictions. This produces one predicted outcome for each observation in the new dataset.

    ## Create linear regression model
    lm1 <- lm(y ~ x1 + x2 + x3, data = dat)

    ## Establish data to make predictions about
    pred_data <- data.frame(x1 = c(0, 1, -1),
                            x2 = c(1, 6, 5),
                            x3 = c(10, -4, 9))

    ## Make predictions
    predict(lm1, pred_data)
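Since `dat` is not defined in this text, here is a runnable sketch of the same workflow using the built-in `mtcars` dataset. The model and the new observations in `new_cars` are invented for illustration. The last lines recompute the predictions by hand to show what `predict()` does for a linear model.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# New observations; column names must match the predictors wt and hp.
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5),
                       hp = c(100, 150, 200))

# One predicted mpg per row of new_cars.
preds <- predict(fit, newdata = new_cars)

# The same values computed by hand: intercept + b_wt * wt + b_hp * hp.
b <- coef(fit)
by_hand <- b[["(Intercept)"]] + b[["wt"]] * new_cars$wt + b[["hp"]] * new_cars$hp
```

For a logistic model fitted with `glm()`, `predict()` returns values on the log-odds scale by default; passing `type = "response"` returns predicted probabilities instead.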