Functional programming is widely applicable in data science because higher-order functions can be used to process data files efficiently. One of the most common data file formats is CSV (comma-separated values). In this exercise and the next, we will work with the three higher-order functions to process data contained in a CSV file. For this exercise, we will use a CSV file containing housing data as an example.
The file zillow.csv contains housing data collected by the American real-estate company Zillow for the city of Tallahassee, Florida. In this exercise, we will use map() to import the data from the CSV file and represent it using a namedtuple.
The data in the file is represented by the following table (which shows the first five lines):
Index | Square footage | Year | List price (USD) |
---|---|---|---|
1 | 2222 | 1981 | 250000 |
2 | 1628 | 2009 | 185000 |
3 | 3824 | 1954 | 399000 |
4 | 1137 | 1993 | 150000 |
5 | 3560 | 1973 | 315000 |
To work with this file, we must first open it and create a reader object (an iterator) that will read the file:
import csv
with open('zillow.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
A CSV file can contain millions of lines of data. Importing the entire contents of such a file at once is impractical, as it would occupy too much RAM and degrade program performance. To avoid importing all the data at once, reader is an iterator object that maintains a pointer into the file and advances through the data one record at a time, each time next(reader) is called.
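The lazy behavior described above can be illustrated with a small sketch. It is not part of the exercise: the in-memory `sample` object is a hypothetical stand-in for zillow.csv, built from the first three records of the table, and it assumes the data has no header row.

```python
import csv
import io

# Hypothetical in-memory stand-in for zillow.csv (assumes no header row)
sample = io.StringIO(
    "1,2222,1981,250000\n"
    "2,1628,2009,185000\n"
    "3,3824,1954,399000\n"
)

reader = csv.reader(sample, delimiter=',')

# Each call to next() reads exactly one record and advances the pointer
first = next(reader)
second = next(reader)

print(first)   # ['1', '2222', '1981', '250000']
print(second)  # ['2', '1628', '2009', '185000']

# Only the records that have not been read yet are still available
remaining = list(reader)
print(len(remaining))  # 1
```

Note that every field comes back as a string, and that records already consumed are not produced again.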
We create a namedtuple to represent each record:
from collections import namedtuple
house = namedtuple("house", ["index", "square_footage", "year", "list_price"])
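As a brief illustration of why a namedtuple is convenient here, the sketch below builds one record by hand and reads its fields. The variable `h1` and its values (taken from the first row of the table) are used only for demonstration.

```python
from collections import namedtuple

house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

# Build a single record by hand, using the first row of the table
h1 = house(1, 2222, 1981, 250000)

# Fields can be read by name or by position
print(h1.square_footage)  # 2222
print(h1[3])              # 250000

# A namedtuple is still a tuple, so it can be unpacked
index, sqft, year, price = h1
print(year)               # 1981
```

Access by name makes later processing code easier to read than indexing into a plain tuple.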
We use the map() function to read in each line of data and store it in a tuple. Each record arrives as a list of strings regardless of the intended types of its fields; therefore, we must cast each element to convert it to the proper type. The order of the fields is dictated by the first line of the CSV file.
This is done like so:
# Read record into namedtuple: house(index, square_footage, year, list_price)
h = map(lambda x: house(int(x[0]), int(x[1]), int(x[2]), int(x[3])), reader)
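If the first line of a CSV file holds column names rather than data, passing it to int() would raise a ValueError. The sketch below shows one possible way to handle that case, under the assumption that a header row is present; the in-memory `data` object and its header text are hypothetical, and the real zillow.csv may be laid out differently.

```python
import csv
import io
from collections import namedtuple

house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

# Hypothetical data WITH a header row (an assumption for illustration)
data = io.StringIO(
    "index,square_footage,year,list_price\n"
    "1,2222,1981,250000\n"
    "2,1628,2009,185000\n"
)

reader = csv.reader(data)

# Consume the header first so int() never sees the column names
header = next(reader)

h = map(lambda x: house(int(x[0]), int(x[1]), int(x[2]), int(x[3])), reader)

first_house = next(h)
print(header)            # ['index', 'square_footage', 'year', 'list_price']
print(first_house.year)  # 1981
```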
Because the amount of data in the zillow file is small, it is acceptable to create a tuple called houses and populate it with all of the individual house tuples, like so:
houses = tuple(h)
print(houses)
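The steps of this exercise can be put together in one self-contained sketch. It assumes a small in-memory stand-in for zillow.csv (the hypothetical `data` object, with no header row) so that it runs without the file. It also shows that map() is lazy: no record is converted until the tuple is built, and the map object is empty after one pass.

```python
import csv
import io
from collections import namedtuple

house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

# Hypothetical stand-in for zillow.csv (assumes no header row)
data = io.StringIO(
    "1,2222,1981,250000\n"
    "2,1628,2009,185000\n"
    "3,3824,1954,399000\n"
)

reader = csv.reader(data)

# Nothing is read yet: map() only describes the conversion
h = map(lambda x: house(int(x[0]), int(x[1]), int(x[2]), int(x[3])), reader)

# Building the tuple forces map() to consume every record from the reader
houses = tuple(h)

print(len(houses))           # 3
print(houses[1].list_price)  # 185000

# The map object is now exhausted, so a second pass yields nothing
leftover = tuple(h)
print(leftover)              # ()
```

When working with a real file, the tuple has to be built while the file is still open (inside the with block), because the reader pulls lines from the file only on demand.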
In the next exercise, we will see how we can use the higher-order functions to process data contained in the CSV file.
Instructions
Create a namedtuple called tree to store the records of the provided CSV file trees.csv. Each tree entry in the CSV file stores the following data: index, width (in inches), height (in feet), and volume (in ft³).
Create an iterator called mapper that will “map” the records to tuples of type tree.
Note: the index and height are of type int; the width and volume are of type float.
Uncomment the following lines after creating mapper:
#trees = tuple(mapper)
#print(trees)