Date Time Data in Python

Codecademy Team
Learn how to work with date and time data in Python.

Introduction and motivation

When working with data, a common type of information you might be handling is date and time data. For example, this could be the timestamp of a purchase order in an e-commerce dataset, or the start and end date of an event for a calendar application.

Let’s take a look at the example dataset below, which contains events data with start and end date information. We used Python’s pandas library to create the DataFrame:

import pandas as pd
df = pd.DataFrame(data={
'event_id': [1, 2, 3],
'event_start': ['2021-01-01', '2021-02-01', '2021-03-01'],
'event_end': ['2021-01-10', '2021-02-20', '2021-03-30']
})
event_id event_start event_end
0 1 2021-01-01 2021-01-10
1 2 2021-02-01 2021-02-20
2 3 2021-03-01 2021-03-30

By default, pandas interprets the inputted dates as strings. We can verify that by checking the type of any of the values in the event_start or event_end column:

# Check type of first date in the event_start column
print(type(df.event_start[0]))
<class 'str'>

While this may not seem like an issue at first glance, there are many disadvantages to storing dates as plain text. For example, we would not be able to easily perform common operations like finding the duration between dates when they are stored as strings:

# This results in a TypeError
print(df.event_end - df.event_start)

But not to worry! The good news is that Python offers a special datetime type that makes it much easier to work with date and time data, and pandas has a corresponding Timestamp object with equivalent functionality. We can use pd.to_datetime() to cast the event_start and event_end columns to datetime and verify that each value is now a Timestamp object:

df.event_start = pd.to_datetime(df.event_start)
df.event_end = pd.to_datetime(df.event_end)
# Check type of first date in the event_start column
print(type(df.event_start[0]))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

Now, we can easily find the duration between the start and end date columns using the subtraction operator:

print(df.event_end - df.event_start)
0    9 days
1   19 days
2   29 days
dtype: timedelta64[ns]

This only scratches the surface of what we can do with datetimes! In the rest of the article, we will take a closer look at Python’s built-in datetime module, how to use it, and how it translates to working with pandas DataFrames.

Importing the datetime module

Python’s datetime module provides several classes for working with date and time data. The one we will be focusing on in this article is the datetime class, but there are other types available. See the datetime module documentation for more details.

We can use the following line of code to import the datetime class from the datetime module. Note that the first datetime refers to the module and the second refers to the class:

from datetime import datetime

Creating a datetime object

To create a new instance of the datetime class, we need to supply the year, month, and day arguments, and optionally any time information, otherwise midnight will be assumed. This creates an object that represents the specified date and time.

For example, let’s create a datetime object representing the start of the 21st century, January 1st of 2000:

# First 3 arguments are year, month, and day
twentyfirst_century = datetime(2000, 1, 1)

This would be equivalent to specifying 0 for the hour, minute, and second arguments, since 0 is already the default value for these arguments:

# Next 3 arguments are hour, minute, and second
twentyfirst_century = datetime(2000, 1, 1, 0, 0, 0)

We can print twentyfirst_century to view a string representation of the object:

print(twentyfirst_century)
2000-01-01 00:00:00

Great! Now that we’ve created a datetime object, we can access all the attributes and methods available to it. See the datetime class documentation for more details.

For example, we can obtain each individual component of the datetime object from the following attributes:

  • twentyfirst_century.year: Returns 2000
  • twentyfirst_century.month: Returns 1
  • twentyfirst_century.day: Returns 1
  • twentyfirst_century.hour: Returns 0
  • twentyfirst_century.minute: Returns 0
  • twentyfirst_century.second: Returns 0

We can also get the day of the week of the datetime object by calling one of the following instance methods:

  • twentyfirst_century.weekday(): Returns 5 (where Monday is 0 and Sunday is 6)
  • twentyfirst_century.isoweekday(): Returns 6 (where Monday is 1 and Sunday is 7)

As seen, the first day of the 21st century was a Saturday.

Getting the current date and time

The datetime class also offers a handy method .now() for getting the current local date and time. Because this is a class method, we can call it on the datetime class, rather than an instance of the class. This method returns a datetime object representing the current date and time, which we store in current_datetime below:

current_datetime = datetime.now()

Again, we can use the print() function to view a string representation of the datetime object:

print(current_datetime)
2021-06-01 02:12:35.773280

Finding duration between two datetime objects

Now that we have two datetime objects, twentyfirst_century and current_datetime, let’s try finding the duration between them. As previewed in the introduction of this article, having datetime objects make operations like this very simple! All we’ll need to do is subtract the dates to get the duration:

print(current_datetime - twentyfirst_century)
7822 days, 2:12:35.773280

Parsing dates and times

Something else that the datetime class offers is the functionality to parse a string containing date and time information into a datetime object. This is really useful if you have existing date and time data that you want to convert into datetime type.

Parsing dates and times can be done using the class method .strptime(). The first argument to the method is the string containing the date and time information, and the second argument specifies how the string should be parsed. The latter can be done with the help of format codes like the ones shown in the table below. More information about format codes can be found in the documentation.

Code Meaning Example
%d Two-digit day 01-31
%a Weekday abbreviation Sun
%A Weekday Sunday
%m Two-digit month 01-12
%b Month abbreviation Jan
%B Month January
%y Two-digit year 09
%Y Four-digit year 2009

In the example below, we are parsing the string 'Jan 1, 2000' by telling .strptime() that the string is formatted as '%b %d, %Y', where %b is the abbreviated month name, %d is the day of the month, and %Y is the four-digit year:

parsed_date = datetime.strptime('Jan 1, 2000', '%b %d, %Y')

We can verify that the parsed_date above was correctly parsed by printing out the object:

print(parsed_date)
2000-01-01 00:00:00

Working with dates and times in a pandas DataFrame

Awesome! Now that we have a better idea of Python’s datetime class, let’s circle back and see how we can work with dates and times in the context of pandas DataFrames.

In the first example we saw with the events data, pandas was able to successfully parse our dates and convert the date columns to datetime using pd.to_datetime() — but that might not always be the case. If automatic parsing fails, we can manually provide the format argument for pd.to_datetime(), similar to how we would for datetime.strptime():

df = pd.DataFrame(data={
'event_id': [1, 2, 3],
'event_start': ['01.01.2021', '01.02.2021', '01.03.2021'],
'event_end': ['10.01.2021', '20.02.2021', '30.03.2021']
})
# Manually specify how to parse the datetime columns
df.event_start = pd.to_datetime(df.event_start, format='%d.%m.%Y')
df.event_end = pd.to_datetime(df.event_end, format='%d.%m.%Y')

Once we have the date columns in the right type, we can perform operations like subtracting the start and end date columns to find the duration, similar to how we can with native Python datetime objects. Let’s store this information in a new column in our DataFrame:

df['event_duration'] = df.event_end - df.event_start
event_id event_start event_end event_duration
0 1 2021-01-01 2021-01-10 9 days
1 2 2021-02-01 2021-02-20 19 days
2 3 2021-03-01 2021-03-30 29 days

What else can we do with the datetime columns? As it turns out, pandas Series has a handy .dt accessor that allows us to easily access datetime properties for a column. Below, we will create new columns in our DataFrame to store the end date’s year, month, day, and day of week info as accessed through the accessor:

df['end_year'] = df.event_end.dt.year
df['end_month'] = df.event_end.dt.month
df['end_day'] = df.event_end.dt.day
df['end_weekday'] = df.event_end.dt.weekday
df['end_day_name'] = df.event_end.dt.day_name()
event_id event_start event_end event_duration end_year end_month end_day end_weekday end_day_name
0 1 2021-01-01 2021-01-10 9 days 2021 1 10 6 Sunday
1 2 2021-02-01 2021-02-20 19 days 2021 2 20 5 Saturday
2 3 2021-03-01 2021-03-30 29 days 2021 3 30 1 Tuesday

Note that the properties accessible via .dt are not exactly equivalent to the attributes and methods available for a datetime object. Rather, they provide a convenient way of returning datetime information for all values in the Series. On the other hand, if we look at each individual value that make up a datetime column, those are more analogous to Python datetime objects.

Recall that each value in a datetime column is a pandas Timestamp object, which is described to be the pandas equivalent of Python’s datetime object. As such, Timestamps come with all the same attributes and methods that we are familiar with from a datetime object.

For example, we can call the .weekday() and .isoweekday() methods on each Timestamp the same way we could for a datetime object:

# Call methods on first date in the event_start column, which is a Timestamp object
print(df.event_start[0].weekday())
print(df.event_start[0].isoweekday())
4
5

We can also apply this to an entire column in the DataFrame by using a lambda function:

df['start_weekday'] = df.event_start.apply(lambda x: x.weekday())
df['start_isoweekday'] = df.event_start.apply(lambda x: x.isoweekday())
event_id event_start event_end event_duration end_year end_month end_day end_weekday end_day_name start_weekday start_isoweekday
0 1 2021-01-01 2021-01-10 9 days 2021 1 10 6 Sunday 4 5
1 2 2021-02-01 2021-02-20 19 days 2021 2 20 5 Saturday 0 1
2 3 2021-03-01 2021-03-30 29 days 2021 3 30 1 Tuesday 0 1

Conclusion

As we can see, Python’s datetime module is a powerful tool that can make working with date and time information simple and efficient. Pandas also offers equivalent capabilities for handling datetime data in the context of a DataFrame. Without these datetime types, working with date and time data as plain text in Python can get real messy, real quick!

This article serves only as an introduction to what can be accomplished using datetime. If you’d like to learn more, feel free to check out the Python datetime module documentation and pandas “Time series / date functionality” guide.