StreetEasy is New York City’s leading real estate marketplace — from studios to high-rises, Brooklyn Heights to Harlem.
In this lesson, you will be working with a dataset that contains a sample of 5,000 rental listings in Manhattan, Brooklyn, and Queens, active on StreetEasy in June 2016.
It has the following columns:
- rental_id: rental ID
- rent: price of rent in dollars
- bedrooms: number of bedrooms
- bathrooms: number of bathrooms
- size_sqft: size in square feet
- min_to_subway: distance from subway station in minutes
- floor: floor number
- building_age_yrs: building’s age in years
- no_fee: does it have a broker fee? (0 for fee, 1 for no fee)
- has_roofdeck: does it have a roof deck? (0 for no, 1 for yes)
- has_washer_dryer: does it have washer/dryer in unit? (0/1)
- has_doorman: does it have a doorman? (0/1)
- has_elevator: does it have an elevator? (0/1)
- has_dishwasher: does it have a dishwasher? (0/1)
- has_patio: does it have a patio? (0/1)
- has_gym: does the building have a gym? (0/1)
- neighborhood: (ex: Greenpoint)
- borough: (ex: Brooklyn)
More information about this dataset can be found in the StreetEasy Dataset article.
Let’s start by doing exploratory data analysis to understand the dataset better. We have broken the dataset for you into:
Instructions
First, pick a borough out of the three (Manhattan, Brooklyn, and Queens) that you are most interested in!
We are going to import the dataset and store it in a variable called df.
To import, we will need to run this snippet:
pd.read_csv("path")
Replace path with one of the three URLs above.
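As a rough sketch of the import step, the snippet below reads a CSV into df with pd.read_csv(). Because the real borough files are not reproduced here, it assumes a tiny in-memory CSV with made-up rows and only a subset of the columns; in the lesson, the argument to pd.read_csv() would be the URL of the borough you picked.

```python
import io

import pandas as pd

# Hypothetical stand-in for one borough file: the rows and values below
# are invented for illustration and use only some of the real columns.
sample_csv = io.StringIO(
    "rental_id,rent,bedrooms,min_to_subway,neighborhood,borough\n"
    "1,2500,1,4,Greenpoint,Brooklyn\n"
    "2,3100,2,9,Williamsburg,Brooklyn\n"
    "3,1900,0,2,Bushwick,Brooklyn\n"
)

# pd.read_csv accepts a file path, a URL, or a file-like object.
df = pd.read_csv(sample_csv)

# Quick sanity checks: number of rows and columns, and the column names.
print(df.shape)
print(df.columns.tolist())
```

Checking df.shape and df.columns right after loading is a cheap way to confirm that the file was parsed with the expected header row.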
Let’s take a look at the first few rows using df.head():
- How far is the apartment in the third row from a subway station?
- Which neighborhood is it in?
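One possible way to answer these two questions in code, rather than by reading the printed table, is to select the third row by position. The sketch below assumes a small DataFrame with invented values in place of the real data; the third row sits at position 2 because positions start at 0.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (illustration only).
df = pd.DataFrame(
    {
        "rental_id": [1, 2, 3, 4],
        "rent": [2500, 3100, 1900, 4200],
        "min_to_subway": [4, 9, 2, 6],
        "neighborhood": ["Greenpoint", "Williamsburg", "Bushwick", "Astoria"],
    }
)

# head() returns the first five rows by default (here, all four).
print(df.head())

# iloc selects by integer position, so the third row is position 2.
third_row = df.iloc[2]
print(third_row["min_to_subway"])
print(third_row["neighborhood"])
```

With the real dataset the printed values will differ, but the same two lookups on df.iloc[2] apply.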