Learn

Functional programming is widely applicable in data science because higher-order functions can be used to process data files efficiently. One of the most common data file formats is CSV (comma-separated values). In this exercise and the next, we will use the three higher-order functions to process data contained in a CSV file, using a file of housing data as an example.

The file zillow.csv contains housing data collected by the American real-estate company Zillow for the city of Tallahassee, Florida. In this exercise, we will use map() to import the data from the CSV file and represent each record as a namedtuple.

The data in the file is represented by the following table (which shows the first five lines):

Index   Square footage   Year   List price (USD)
1       2222             1981   250000
2       1628             2009   185000
3       3824             1954   399000
4       1137             1993   150000
5       3560             1973   315000

To work with this file, we must first open it and create a reader object (an iterator) that will read the file:

    import csv

    with open('zillow.csv', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='|')

A CSV file can contain millions of lines of data. Loading the entire contents of such a file at once is impractical because it would occupy too much RAM and degrade program performance. To avoid this, reader is an iterator object that maintains a pointer into the file and reads the next row only when next(reader) is called.
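The row-at-a-time behavior described above can be seen in a minimal sketch. It assumes an in-memory stand-in for the file (an io.StringIO holding the first three rows from the table above, comma-separated), so it runs without zillow.csv. The name sample_file is illustrative and not part of the exercise.

```python
import csv
import io

# In-memory stand-in for zillow.csv (assumption: comma-separated rows,
# taken from the first three records in the table above).
sample_file = io.StringIO(
    "1,2222,1981,250000\n"
    "2,1628,2009,185000\n"
    "3,3824,1954,399000\n"
)

reader = csv.reader(sample_file, delimiter=',')

# Each call to next() reads and parses exactly one row.
first_row = next(reader)
second_row = next(reader)

print(first_row)   # ['1', '2222', '1981', '250000']
print(second_row)  # ['2', '1628', '2009', '185000']

# Iterating again picks up where the reader left off.
remaining_rows = list(reader)
print(remaining_rows)
```

Note that every value comes back as a string, which is why a conversion step is needed later.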

We create a namedtuple to represent each record:

    from collections import namedtuple

    house = namedtuple("house", ["index", "square_footage", "year", "list_price"])
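As a quick illustration of what this namedtuple provides, the sketch below builds one record by hand from the first row of the table above. The variable name first_house is illustrative only.

```python
from collections import namedtuple

house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

# Build a single record by hand, using the first row of the table.
first_house = house(index=1, square_footage=2222, year=1981, list_price=250000)

print(first_house.year)  # access by field name: 1981
print(first_house[3])    # access by position: 250000
print(first_house)
```

Fields can be read by name or by position, and like any tuple the record is immutable.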

We use the map() function to read each line of data and store it in a house tuple. Each row arrives as a list of strings regardless of the intended types, so we must convert each element to its proper type. The order of the data is dictated by the first line in the CSV file.

This is done like so:

    # Read each record into a namedtuple: house(index, square_footage, year, list_price)
    h = map(lambda x: house(int(x[0]), int(x[1]), int(x[2]), int(x[3])), reader)

Because the amount of data in the zillow file is small, it is fine to create a tuple called houses and populate it with all of the individual house tuples, like so:

    houses = tuple(h)
    print(houses)
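Putting the steps together, the following is a minimal end-to-end sketch rather than the exercise's exact code. It assumes an in-memory stand-in for zillow.csv (the illustrative sample_file, holding three rows from the table above) so that it runs on its own. Because map() is lazy, no row is read until its result is consumed, so the tuple is built while the file is still open.

```python
import csv
import io
from collections import namedtuple

house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

# Stand-in for zillow.csv (assumption: comma-separated rows, no header line).
sample_file = io.StringIO(
    "1,2222,1981,250000\n"
    "2,1628,2009,185000\n"
    "3,3824,1954,399000\n"
)

with sample_file as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    # map() is lazy: nothing is read from the file at this point.
    h = map(lambda x: house(int(x[0]), int(x[1]), int(x[2]), int(x[3])), reader)
    # Consume the iterator while the file is still open.
    houses = tuple(h)

print(len(houses))           # 3
print(houses[0].list_price)  # 250000
```

For a file with millions of rows, one would instead iterate over h and handle one record at a time rather than materializing the whole tuple.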

In the next exercise, we will see how we can use the higher-order functions to process data contained in the CSV file.

Instructions

1. Create a namedtuple called tree intended to store tuples from the records of the provided CSV file trees.csv. The CSV file stores the following data for each tree entry:

  • index
  • width (inches)
  • height (feet)
  • volume (cubic feet)

2. Create an iterator called mapper that will “map” the records to tuples of type tree.

Note: the index and height are of type int; the width and volume are of type float.

Uncomment the following lines after creating mapper:

    #trees = tuple(mapper)
    #print(trees)
