A data frame is an R object that stores tabular data in rows and columns. You can think of a data frame as a spreadsheet or as a SQL table. While data frames can be created directly in R, they are usually populated with data imported from a CSV file, an Excel spreadsheet, or a SQL query.
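As a rough illustration of importing tabular data, the sketch below reads a small piece of CSV text into a data frame with base R's read.csv(). The CSV content and the names csv_text and people are made up for this example; in practice you would usually pass a file path instead of inline text.

```r
# Hypothetical CSV content, supplied inline so the example needs no file on disk.
csv_text <- "name,age
Ana,30
Ben,25"

# read.csv() parses the text into a data frame.
# The `text` argument is used here in place of a file path.
people <- read.csv(text = csv_text)

# Print the data frame and confirm what kind of object it is.
print(people)
print(is.data.frame(people))
```

The first line of the CSV text becomes the column names, and each following line becomes one row.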
Each column has a name and stores the values of one variable. Each row contains a set of values, one from each column. The columns of a data frame can hold different types of data, such as numeric, character, or logical values, and any column can contain NA to mark a missing value.
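To make the column types concrete, here is a minimal sketch that builds a small data frame and checks the type of each column. The data frame measurements and all of its values are hypothetical, chosen only to show one column of each type. Note that NA marks a missing value inside a column; the column itself keeps its type.

```r
# Hypothetical values chosen only to illustrate column types.
measurements <- data.frame(
  height_cm = c(170.5, 162.0, NA),   # numeric, with one missing value
  label     = c("a", "b", "c"),      # character
  passed    = c(TRUE, FALSE, TRUE),  # logical
  stringsAsFactors = FALSE           # keep text as character on older R versions
)

# Show the class of every column.
print(sapply(measurements, class))

# Show which entries of the numeric column are missing.
print(is.na(measurements$height_cm))
```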
A data frame containing the address, age and name of students in a class could look like this:
| address | age | name |
|---|---|---|
| 123 Main St. | 34 | John Smith |
| 456 Maple Ave. | 28 | Jane Doe |
| 789 Broadway | 51 | Joe Schmo |

As seen in the first row, the column names of this data frame are address, age, and name.
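The student table above could be built by hand with base R's data.frame() function. This is only a sketch: the variable name students is chosen for illustration, and the column names address, age, and name are assumed from the description of the table.

```r
# Build the student table column by column.
# Column names are assumed to be address, age, and name.
students <- data.frame(
  address = c("123 Main St.", "456 Maple Ave.", "789 Broadway"),
  age     = c(34, 28, 51),
  name    = c("John Smith", "Jane Doe", "Joe Schmo"),
  stringsAsFactors = FALSE  # keep text as character on older R versions
)

# Column names of the data frame.
print(names(students))

# Number of rows and columns.
print(dim(students))

# The first row: one value from each column.
print(students[1, ])
```

Each vector passed to data.frame() becomes one column, so all of the vectors need to have the same length.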
Note: when working with dplyr, you might see functions that take a data frame as an argument and output something called a tibble. Tibbles are a modern version of data frames in R, and they operate in essentially the same way. The terms tibble and data frame are often used interchangeably. Here on Codecademy we will use the term data frame!
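The relationship between tibbles and data frames can be checked directly. The sketch below assumes the tibble package may or may not be installed, so the conversion is guarded; the names students_df and students_tbl are hypothetical.

```r
# A plain data frame with a single column, for illustration only.
students_df <- data.frame(age = c(34, 28, 51))

# Only attempt the conversion if the tibble package is available.
if (requireNamespace("tibble", quietly = TRUE)) {
  # as_tibble() converts a data frame into a tibble.
  students_tbl <- tibble::as_tibble(students_df)

  # A tibble carries extra classes but is still a data frame.
  print(class(students_tbl))
  print(is.data.frame(students_tbl))
} else {
  message("The tibble package is not installed; skipping the conversion.")
}
```

Because a tibble is still a data frame, code that expects a data frame will generally accept a tibble as well.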
The code in notebook.Rmd loads a data frame named songs that contains data about 7 songs from popular music groups (you’ll learn how to load a data frame yourself shortly).
Type songs in the empty code block and run the code to view the data frame. Make sure to click the arrow to explore each column!