Because R is primarily statistical software, summary statistics come standard with base R. Use:
mean() or median() to calculate the average of a vector.
min(), max(), or range() to see the range of a vector.
sd() or var() to calculate the spread of a vector.
table() to view the frequency of each value in a vector.
## AVERAGE
mean(dat)   # mean
median(dat) # median

## RANGE
min(dat)   # minimum value
max(dat)   # maximum value
range(dat) # minimum and maximum

## SPREAD
sd(dat)  # standard deviation
var(dat) # variance

## FREQUENCY
table(dat) # frequency of each value
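As a quick illustration, the sketch below applies these functions to a small vector. The values in dat are hypothetical, chosen only so that the results are easy to check by hand.

```r
# Hypothetical sample data, chosen so the results are easy to verify by hand
dat <- c(2, 4, 4, 5, 7, 9, 4)

avg    <- mean(dat)    # sum is 35 over 7 values
mid    <- median(dat)  # middle of the sorted values 2 4 4 4 5 7 9
bounds <- range(dat)   # same as c(min(dat), max(dat))
spread <- var(dat)     # sample variance (divides by n - 1)
counts <- table(dat)   # how many times each distinct value occurs

# sd() is the square root of var()
all.equal(sd(dat), sqrt(spread))
```

Note that var() and sd() use the sample (n - 1) denominator, which matters for small vectors like this one.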
The ggplot() function returns an object that serves as the base of a ggplot2 visualization.
viz <- ggplot()
viz # renders blank plot
Data is bound to a ggplot2 visualization by passing a data frame as the first argument in the ggplot() function call. Layers can be added to the plot object by adding function calls after ggplot() with a + (plus) sign. These functions have access to the data frame and can use the column names as variables.
For example, consider a data frame sales with the columns cost and profit. The following code assigns the data frame sales to the ggplot() object when it is initialized:
viz <- ggplot(data = sales) +
  geom_point(aes(x = cost, y = profit))
viz # renders plot
In the example above:
The ggplot object, or canvas, was initialized with the data frame sales assigned to it.
The subsequent geom_point layer used the cost and profit columns to define the scales of the axes for that particular geom. Notice that it referred to those columns with their column names.
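To try this end to end, a minimal sketch follows. It assumes the ggplot2 package is installed, and the contents of sales are made up for illustration, since the original does not define them.

```r
library(ggplot2)

# Hypothetical data: these values are invented for illustration
sales <- data.frame(
  cost   = c(10, 20, 30, 40, 50),
  profit = c(3, 8, 12, 15, 22)
)

# Bind the data frame at the canvas level, then add a point layer
viz <- ggplot(data = sales) +
  geom_point(aes(x = cost, y = profit))

# Each `+` appends a layer to the plot object
length(viz$layers)

viz # renders the scatter plot
```

Because the layer looks up cost and profit in the bound data frame, the column names are written bare, without quotes or a sales$ prefix.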
In ggplot2, aesthetics are the instructions that determine the visual properties of a plot and its geometries.
Examples of ggplot2 aesthetics include the x and y positions, color, fill, size, and shape.
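Aesthetics are set either manually or by aesthetic mappings. Aesthetic mappings "map" variables from the bound data frame to visual properties in the plot. These mappings are provided in two ways using the aes() mapping function:
At the canvas level, inside the ggplot() call, where all subsequent layers use the mapping.
At the geom level, inside an individual geom call, where only that layer uses the mapping.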
For example, the following code assigns aes() mappings for the x and y scales at the canvas level:
viz <- ggplot(data = airquality, aes(x = Ozone, y = Temp)) +
  geom_point() +
  geom_smooth()
In the example above:
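The aesthetic mapping is wrapped in the aes() aesthetic mapping function as an additional argument to ggplot().
Both of the subsequent geom layers, geom_point() and geom_smooth(), use the scales defined inside the aesthetic mapping assigned at the canvas level.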
You could create the same plot by setting the aesthetics at the geom level, as follows:
viz <- ggplot(data = airquality) +
  geom_point(aes(x = Ozone, y = Temp)) +
  geom_smooth(aes(x = Ozone, y = Temp))
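The two levels can also be combined, and a geom can take a manually set aesthetic alongside mapped ones. The sketch below is one possible combination, assuming ggplot2 is installed; coloring the points by Month is an illustrative choice, not something the original example does.

```r
library(ggplot2)

# airquality ships with base R (datasets package)
viz <- ggplot(data = airquality, aes(x = Ozone, y = Temp)) +
  # geom-level mapping: only the points are colored by month
  geom_point(aes(color = factor(Month))) +
  # manual aesthetic: a fixed value, set outside aes()
  geom_smooth(color = "black")

viz # renders points colored by month with a single black trend line
```

The x and y mappings come from the canvas for both layers, while the color mapping applies only to the layer that declares it.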
The lm() function creates a linear regression model in R. The glm() function creates a generalized linear model, which is a logistic regression model when it is called with family = binomial.
These functions take a formula Y ~ X, where Y is the outcome variable and X is the predictor variable. We can add additional predictor variables using +. A summary of these models can be printed using the summary() function.
## Linear regression model
temp_lm <- lm(temp ~ month + region, data = world)
summary(temp_lm) # print summary

## Logistic regression model
## family = binomial is required; without it, glm() fits a Gaussian (linear) model
winning_glm <- glm(win ~ ranking + home + starting_players, data = team, family = binomial)
summary(winning_glm) # print summary
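The world and team data frames above are not defined here, so a runnable variant may help. The sketch below swaps in the mtcars dataset that ships with base R; the choice of variables is purely illustrative.

```r
# mtcars ships with base R; the variable choices are illustrative only

# Linear regression: fuel economy as a function of weight and horsepower
mpg_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(mpg_lm)

# Logistic regression: am is coded 0/1 (automatic/manual),
# and family = binomial is what makes glm() fit a logistic model
am_glm <- glm(am ~ wt, data = mtcars, family = binomial)
summary(am_glm)

# One coefficient per term in the formula, plus the intercept
coef(mpg_lm)
```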
To make predictions of the outcome variable using a regression model, we need a dataset whose column names match the names of the predictor variables in the model. Once we have established the data to make predictions about, we can use the predict() function to generate predictions. This will produce one predicted outcome for each observation in the new dataset.
## Create linear regression model
lm1 <- lm(y ~ x1 + x2 + x3, data = dat)

## Establish data to make predictions about
pred_data <- data.frame(
  x1 = c(0, 1, -1),
  x2 = c(1, 6, 5),
  x3 = c(10, -4, 9)
)

## Make predictions
predict(lm1, pred_data)
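Since dat is not defined above, here is a self-contained sketch of the same workflow using mtcars from base R. The new_cars values are hypothetical. For a glm() model, predict() returns values on the link (log-odds) scale by default, so type = "response" is used to get probabilities.

```r
# Fit the models on mtcars (ships with base R)
mpg_lm <- lm(mpg ~ wt + hp, data = mtcars)
am_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical new observations; column names must match the predictors
new_cars <- data.frame(
  wt = c(2.5, 3.5),
  hp = c(100, 150)
)

# One predicted mpg per row of new_cars
mpg_pred <- predict(mpg_lm, newdata = new_cars)

# type = "response" returns probabilities instead of log-odds
am_prob <- predict(am_glm, newdata = new_cars, type = "response")

mpg_pred
am_prob
```

Extra columns in the new data are ignored, so the same new_cars data frame works for both models even though am_glm only uses wt.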