Because R is primarily a statistical computing language, functions for summary statistics are built into base R.
- Use mean() and median() to calculate the average of a vector.
- Use min(), max(), and range() to see the range of a vector.
- Use sd() or var() to calculate the spread of a vector.
- Use table() to view the frequency of each value in a vector.

    ## AVERAGE
    mean(dat)    # mean
    median(dat)  # median

    ## RANGE
    min(dat)     # minimum value
    max(dat)     # maximum value
    range(dat)   # minimum and maximum

    ## SPREAD
    sd(dat)      # standard deviation
    var(dat)     # variance

    ## FREQUENCY
    table(dat)   # frequency of each value
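As a quick worked example, the sketch below applies the summary functions to a small vector. The values in dat are made up purely for illustration and are not taken from any dataset in this document.

```r
# Hypothetical data, chosen only for illustration
dat <- c(2, 4, 4, 5, 7, 9)

## AVERAGE
avg <- mean(dat)    # sum of the values divided by their count
mid <- median(dat)  # middle value; with six values, the mean of the two middle ones

## RANGE
rng <- range(dat)   # vector of length 2: c(min(dat), max(dat))

## SPREAD
v <- var(dat)       # sample variance (divides by n - 1)
s <- sd(dat)        # standard deviation, the square root of var(dat)

## FREQUENCY
freq <- table(dat)  # count of each distinct value

print(c(mean = avg, median = mid))
print(rng)
print(c(var = v, sd = s))
print(freq)
```

Note that var() and sd() compute the sample versions of these statistics, so they divide by n - 1 rather than n.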
The lm() function fits a linear regression model in R. The glm() function fits a generalized linear model; with family = binomial it fits a logistic regression model.
These functions take a formula Y ~ X, where Y is the outcome variable and X is the predictor variable. We can add additional predictor variables using +.
A summary of these models can be printed using the summary() function.

    ## Linear regression model
    temp_lm <- lm(temp ~ month + region, data = world)
    summary(temp_lm)      # print summary

    ## Logistic regression model (family = binomial is required;
    ## the default family is gaussian, which fits a linear model)
    winning_glm <- glm(win ~ ranking + home + starting_players,
                       data = team, family = binomial)
    summary(winning_glm)  # print summary
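The data frames in the snippet above are not defined here, so the following is a minimal runnable sketch of the same pattern. It assumes the built-in mtcars dataset as a stand-in; the choice of variables (mpg, wt, hp, am) is only for illustration.

```r
# mtcars ships with base R (datasets package), so no extra setup is needed

## Linear regression: fuel economy as a function of weight and horsepower
mpg_lm <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(mpg_lm))  # coefficients, standard errors, R-squared
print(coef(mpg_lm))     # named vector: (Intercept), wt, hp

## Logistic regression: am is coded 0/1 (automatic/manual transmission)
am_glm <- glm(am ~ wt, data = mtcars, family = binomial)
print(summary(am_glm))  # coefficients are on the log-odds scale
print(coef(am_glm))
```

The formula interface is the same for both functions; only the family argument distinguishes the logistic model from the linear one.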
To make predictions of the outcome variable using a regression model, we need a dataset whose column names match the names of the predictor variables in the model. Once we have established the data to make predictions about, we can use the predict() function to generate predictions. This produces one predicted outcome for each observation in the new dataset.
    ## Create linear regression model
    lm1 <- lm(y ~ x1 + x2 + x3, data = dat)

    ## Establish data to make predictions about
    pred_data <- data.frame(x1 = c(0, 1, -1),
                            x2 = c(1, 6, 5),
                            x3 = c(10, -4, 9))

    ## Make predictions
    predict(lm1, pred_data)
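A runnable version of the prediction workflow might look like the sketch below. It assumes the built-in mtcars dataset, and the rows in new_cars are hypothetical values invented for illustration. It also shows one detail worth knowing for logistic models: predict() on a glm returns values on the log-odds scale unless type = "response" is given.

```r
# Fit the models on the built-in mtcars data
mpg_lm <- lm(mpg ~ wt + hp, data = mtcars)
am_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical new cars; column names must match the predictors (wt, hp)
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5),
                       hp = c(90, 120, 180))

# One predicted mpg per row of new_cars
mpg_pred <- predict(mpg_lm, newdata = new_cars)
print(mpg_pred)

# For a glm, type = "response" returns probabilities between 0 and 1
# instead of the default log-odds
am_prob <- predict(am_glm, newdata = new_cars, type = "response")
print(am_prob)
```

Columns in the new data that a model does not use (hp for am_glm here) are simply ignored.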
Invoking the ggplot() function returns an object that serves as the base of a ggplot2 visualization.

    library(ggplot2)

    viz <- ggplot()
    viz # renders blank plot
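To make the idea of a plot object concrete, the short sketch below inspects the object before rendering it. It assumes the ggplot2 package is installed; accessing viz$layers is a common way to inspect a plot, shown here only as an illustration.

```r
library(ggplot2)

viz <- ggplot()                     # builds the plot object; nothing is drawn yet
is_plot <- inherits(viz, "ggplot")  # checks that viz is a ggplot object
n_layers <- length(viz$layers)      # number of layers added so far

print(is_plot)
print(n_layers)
print(viz)                          # explicit equivalent of typing viz at the console
```

Because the plot is an ordinary R object, it can be stored, extended with more layers, and rendered later.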
Data is bound to a ggplot2 visualization by passing a data frame as the first argument in the ggplot() function call. Layers can be added to the plot object by adding function calls after ggplot() with a + sign. These functions have access to the data frame and can use the column names as variables.
For example, consider a data frame sales with the columns cost and profit. The following code assigns sales to the ggplot() object when it is initialized and adds a point layer:

    viz <- ggplot(data = sales) +
      geom_point(aes(x = cost, y = profit))
    viz # renders plot
In the example above:
- The ggplot() object was initialized with the data frame sales assigned to it.
- The geom_point layer used the cost and profit columns to define the scales of the axes for that particular geom. Notice that it referred to those columns by their column names.

In ggplot2, aesthetics are the instructions that determine the visual properties of a plot and its geometries.
Examples of ggplot2 aesthetics include the x and y positions, color, fill, shape, and size.
Aesthetics are set either manually or by aesthetic mappings. Aesthetic mappings “map” variables from the bound data frame to visual properties in the plot. These mappings are provided in two ways using the aes() mapping function:
- At the canvas level: mappings passed to ggplot() are inherited by all subsequent layers.
- At the geom level: mappings passed to a geom function apply only to that layer.

For example, the following code assigns aes() mappings for the x and y scales at the canvas level:
    viz <- ggplot(data = airquality, aes(x = Ozone, y = Temp)) +
      geom_point() +
      geom_smooth()
In the example above:
- The aes() aesthetic mapping function was passed as an additional argument to ggplot().
- geom_point() and geom_smooth() use the scales defined inside the aesthetic mapping assigned at the canvas level.

You could create the same plot by setting the aesthetics at the geom level, as follows:
    viz <- ggplot(data = airquality) +
      geom_point(aes(x = Ozone, y = Temp)) +
      geom_smooth(aes(x = Ozone, y = Temp))
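The difference between setting an aesthetic manually and mapping it from data can be sketched as follows. This is an illustrative example, not part of the original plots: the filtered data frame aq, the color choices, the object names, and method = "lm" are all assumptions made for the sketch.

```r
library(ggplot2)

# Drop rows with a missing Ozone value so no rows are removed with a warning
aq <- airquality[!is.na(airquality$Ozone), ]

# Canvas-level mapping: both geoms inherit x and y.
# color = "steelblue" sits outside aes(), so it is set manually
# to a fixed value rather than mapped from a column.
viz_fixed <- ggplot(data = aq, aes(x = Ozone, y = Temp)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm")

# Geom-level mapping: color is mapped from the Month column,
# but only for the points; geom_smooth() still fits one line overall.
viz_mapped <- ggplot(data = aq, aes(x = Ozone, y = Temp)) +
  geom_point(aes(color = factor(Month))) +
  geom_smooth(method = "lm")

print(viz_fixed)   # renders plot
print(viz_mapped)  # renders plot
```

Mapping color at the geom level keeps it local to the points. Moving aes(color = factor(Month)) into the ggplot() call would instead pass it to geom_smooth() as well, producing one fitted line per month.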