Date Time Data in Python
Introduction and motivation
When working with data, a common type of information you might be handling is date and time data. For example, this could be the timestamp of a purchase order in an e-commerce dataset, or the start and end date of an event for a calendar application.
Let’s take a look at the example dataset below, which contains events data with start and end date information. We used Python’s pandas library to create the DataFrame:
import pandas as pddf = pd.DataFrame(data={'event_id': [1, 2, 3],'event_start': ['2021-01-01', '2021-02-01', '2021-03-01'],'event_end': ['2021-01-10', '2021-02-20', '2021-03-30']})
Output:
event_id | event_start | event_end | |
---|---|---|---|
0 | 1 | 2021-01-01 | 2021-01-10 |
1 | 2 | 2021-02-01 | 2021-02-20 |
2 | 3 | 2021-03-01 | 2021-03-30 |
By default, pandas interprets the inputted dates as strings. We can verify that by checking the type of any of the values in the event_start
or event_end
column:
# Check type of first date in the event_start columnprint(type(df.event_start[0]))
Output:
<class 'str'>
While this may not seem like an issue at first glance, there are many disadvantages to storing dates as plain text. For example, we would not be able to easily perform common operations like finding the duration between dates when they are stored as strings:
# This results in a TypeErrorprint(df.event_end - df.event_start)
But not to worry! The good news is that Python offers a special datetime
type that makes it much easier to work with date and time data, and pandas has a corresponding Timestamp object with equivalent functionality. We can use pd.to_datetime()
to cast the event_start
and event_end
columns to datetime and verify that each value is now a Timestamp object:
df.event_start = pd.to_datetime(df.event_start)df.event_end = pd.to_datetime(df.event_end)# Check type of first date in the event_start columnprint(type(df.event_start[0]))
Output:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Now, we can easily find the duration between the start and end date columns using the subtraction operator:
print(df.event_end - df.event_start)
Output:
0 9 days
1 19 days
2 29 days
dtype: timedelta64[ns]
This only scratches the surface of what we can do with datetimes! In the rest of the article, we will take a closer look at Python’s built-in datetime
module, how to use it, and how it translates to working with pandas DataFrames.
Importing the datetime
module
Python’s datetime
module provides several classes for working with date and time data. The one we will be focusing on in this article is the datetime
class, but there are other types available. See the datetime
module documentation for more details.
We can use the following line of code to import the datetime
class from the datetime
module. Note that the first datetime
refers to the module and the second refers to the class:
from datetime import datetime
Creating a datetime
object
To create a new instance of the datetime
class, we need to supply the year
, month
, and day
arguments, and optionally any time information, otherwise midnight will be assumed. This creates an object that represents the specified date and time.
For example, let’s create a datetime
object representing the start of the 21st century, January 1st of 2000:
# First 3 arguments are year, month, and daytwentyfirst_century = datetime(2000, 1, 1)
This would be equivalent to specifying 0
for the hour
, minute
, and second
arguments, since 0
is already the default value for these arguments:
# Next 3 arguments are hour, minute, and secondtwentyfirst_century = datetime(2000, 1, 1, 0, 0, 0)
We can print twentyfirst_century
to view a string representation of the object:
print(twentyfirst_century)
Output:
2000-01-01 00:00:00
Great! Now that we’ve created a datetime
object, we can access all the attributes and methods available to it. See the datetime
class documentation for more details.
For example, we can obtain each individual component of the datetime
object from the following attributes:
twentyfirst_century.year
: Returns2000
twentyfirst_century.month
: Returns1
twentyfirst_century.day
: Returns1
twentyfirst_century.hour
: Returns0
twentyfirst_century.minute
: Returns0
twentyfirst_century.second
: Returns0
We can also get the day of the week of the datetime
object by calling one of the following instance methods:
twentyfirst_century.weekday()
: Returns5
(where Monday is0
and Sunday is6
)twentyfirst_century.isoweekday()
: Returns6
(where Monday is1
and Sunday is7
)
As seen, the first day of the 21st century was a Saturday.
Getting the current date and time
The datetime
class also offers a handy method .now()
for getting the current local date and time. Because this is a class method, we can call it on the datetime
class, rather than an instance of the class. This method returns a datetime
object representing the current date and time, which we store in current_datetime
below:
current_datetime = datetime.now()
Again, we can use the print()
function to view a string representation of the datetime
object:
print(current_datetime)
Output:
2021-06-01 02:12:35.773280
Finding duration between two datetime
objects
Now that we have two datetime
objects, twentyfirst_century
and current_datetime
, let’s try finding the duration between them. As previewed in the introduction of this article, having datetime
objects make operations like this very simple! All we’ll need to do is subtract the dates to get the duration:
print(current_datetime - twentyfirst_century)
Output:
7822 days, 2:12:35.773280
Parsing dates and times
Something else that the datetime
class offers is the functionality to parse a string containing date and time information into a datetime
object. This is really useful if you have existing date and time data that you want to convert into datetime
type.
Parsing dates and times can be done using the class method .strptime()
. The first argument to the method is the string containing the date and time information, and the second argument specifies how the string should be parsed. The latter can be done with the help of format codes like the ones shown in the table below. More information about format codes can be found in the documentation.
Code | Meaning | Example |
---|---|---|
%d | Two-digit day | 01-31 |
%a | Weekday abbreviation | Sun |
%A | Weekday | Sunday |
%m | Two-digit month | 01-12 |
%b | Month abbreviation | Jan |
%B | Month | January |
%y | Two-digit year | 09 |
%Y | Four-digit year | 2009 |
In the example below, we are parsing the string 'Jan 1, 2000'
by telling .strptime()
that the string is formatted as '%b %d, %Y'
, where %b
is the abbreviated month name, %d
is the day of the month, and %Y
is the four-digit year:
parsed_date = datetime.strptime('Jan 1, 2000', '%b %d, %Y')
We can verify that the parsed_date
above was correctly parsed by printing out the object:
print(parsed_date)
Output:
2000-01-01 00:00:00
Working with dates and times in a pandas DataFrame
Awesome! Now that we have a better idea of Python’s datetime
class, let’s circle back and see how we can work with dates and times in the context of pandas DataFrames.
In the first example we saw with the events data, pandas was able to successfully parse our dates and convert the date columns to datetime using pd.to_datetime()
— but that might not always be the case. If automatic parsing fails, we can manually provide the format
argument for pd.to_datetime()
, similar to how we would for datetime.strptime()
:
df = pd.DataFrame(data={'event_id': [1, 2, 3],'event_start': ['01.01.2021', '01.02.2021', '01.03.2021'],'event_end': ['10.01.2021', '20.02.2021', '30.03.2021']})# Manually specify how to parse the datetime columnsdf.event_start = pd.to_datetime(df.event_start, format='%d.%m.%Y')df.event_end = pd.to_datetime(df.event_end, format='%d.%m.%Y')
Once we have the date columns in the right type, we can perform operations like subtracting the start and end date columns to find the duration, similar to how we can with native Python datetime
objects. Let’s store this information in a new column in our DataFrame:
df['event_duration'] = df.event_end - df.event_start
Output:
event_id | event_start | event_end | event_duration | |
---|---|---|---|---|
0 | 1 | 2021-01-01 | 2021-01-10 | 9 days |
1 | 2 | 2021-02-01 | 2021-02-20 | 19 days |
2 | 3 | 2021-03-01 | 2021-03-30 | 29 days |
What else can we do with the datetime columns? As it turns out, pandas Series has a handy .dt
accessor that allows us to easily access datetime properties for a column. Below, we will create new columns in our DataFrame to store the end date’s year, month, day, and day of week info as accessed through the accessor:
df['end_year'] = df.event_end.dt.yeardf['end_month'] = df.event_end.dt.monthdf['end_day'] = df.event_end.dt.daydf['end_weekday'] = df.event_end.dt.weekdaydf['end_day_name'] = df.event_end.dt.day_name()
Output:
event_id | event_start | event_end | event_duration | end_year | end_month | end_day | end_weekday | end_day_name | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2021-01-01 | 2021-01-10 | 9 days | 2021 | 1 | 10 | 6 | Sunday |
1 | 2 | 2021-02-01 | 2021-02-20 | 19 days | 2021 | 2 | 20 | 5 | Saturday |
2 | 3 | 2021-03-01 | 2021-03-30 | 29 days | 2021 | 3 | 30 | 1 | Tuesday |
Note that the properties accessible via .dt
are not exactly equivalent to the attributes and methods available for a datetime
object. Rather, they provide a convenient way of returning datetime information for all values in the Series. On the other hand, if we look at each individual value that make up a datetime column, those are more analogous to Python datetime
objects.
Recall that each value in a datetime column is a pandas Timestamp object, which is described to be the pandas equivalent of Python’s datetime
object. As such, Timestamps come with all the same attributes and methods that we are familiar with from a datetime
object.
For example, we can call the .weekday()
and .isoweekday()
methods on each Timestamp the same way we could for a datetime
object:
# Call methods on first date in the event_start column, which is a Timestamp objectprint(df.event_start[0].weekday())print(df.event_start[0].isoweekday())
Output:
4
5
We can also apply this to an entire column in the DataFrame by using a lambda function:
df['start_weekday'] = df.event_start.apply(lambda x: x.weekday())df['start_isoweekday'] = df.event_start.apply(lambda x: x.isoweekday())
Output:
event_id | event_start | event_end | event_duration | end_year | end_month | end_day | end_weekday | end_day_name | start_weekday | start_isoweekday | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2021-01-01 | 2021-01-10 | 9 days | 2021 | 1 | 10 | 6 | Sunday | 4 | 5 |
1 | 2 | 2021-02-01 | 2021-02-20 | 19 days | 2021 | 2 | 20 | 5 | Saturday | 0 | 1 |
2 | 3 | 2021-03-01 | 2021-03-30 | 29 days | 2021 | 3 | 30 | 1 | Tuesday | 0 | 1 |
Conclusion
As we can see, Python’s datetime
module is a powerful tool that can make working with date and time information simple and efficient. Pandas also offers equivalent capabilities for handling datetime data in the context of a DataFrame. Without these datetime types, working with date and time data as plain text in Python can get real messy, real quick!
This article serves only as an introduction to what can be accomplished using datetime. If you’d like to learn more, feel free to check out the Python datetime
module documentation and pandas “Time series / date functionality” guide.
Author
'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'
Meet the full team