Articles

A Guide to Python Data Structures

Learn the fundamentals of Python data structures in this comprehensive guide, covering different types, examples, and ideal scenarios for using them efficiently.

In programming, data is the backbone of every application. Whether we’re creating a web service, analyzing data, or building a game, the way we organize and manage our data plays a crucial role in our program’s performance and efficiency. This is where data structures come in.

In this guide, we’ll explore the importance of data structures in programming and take a close look at Python’s built-in and user-defined data structures. Besides that, we’ll also offer insights on how to choose the most appropriate data structure based on the problem we’re solving.

Let’s start by discussing what data structures are in Python.

What are data structures in Python?

In general, data structures are specialized formats used to store, organize, and retrieve data efficiently. Python, popular for its simplicity and readability, offers a variety of powerful data structures that help developers handle data in a clean and efficient manner. From organizing items in a list to managing complex relationships in a graph, Python’s built-in and user-defined data structures provide the tools needed to solve a wide range of problems.

Using the right data structure is critical to building efficient and scalable software. The structure we choose determines how quickly and effectively our program can store, access, and manipulate data. A well-chosen data structure can drastically reduce execution time and improve responsiveness, while a poor choice can result in weak performance or even system failures under heavy loads.

Here is a tree map that visualizes the classification of data structures in Python:

A tree map that visualizes the classification of data structures in Python

In the next section, let’s discover the different types of data structures in Python.

Related Course

Linear Data Structures

Learn about virtualization of computer memory by building the fundamental data structures of computer science: lists, stacks, and queues.Try it for free

Types of data structures in Python

Python provides a handful of data structures to handle data in different ways depending on the problem at hand. These data structures fall into two main categories:

  • Built-in data structures
  • User-defined data structures

By understanding both built-in and user-defined data structures, we can choose or create the best structure for our problem, ensuring that our program runs smoothly and efficiently.

Built-in data structures

Built-in data structures are the core tools provided by Python’s standard library. These structures are highly optimized and are used in almost every Python program. They include:

  • Lists
  • Tuples
  • Dictionaries
  • Sets

Each built-in data structure has specific strengths and use cases. For example, lists are great for maintaining a sequence of elements that may need to change. At the same time, sets are perfect when we need to eliminate duplicates or perform mathematical set operations.

Key Characteristics:

  • Integrated into the Python language: No need to import external libraries
  • High-level abstractions: Simple syntax and easy-to-read
  • Optimized performance: Fast and memory-efficient for most standard operations
  • Dynamic typing: Elements can be of different data types
  • Versatile: Support slicing, iteration, and various built-in methods

User-defined data structures

While built-in data structures are powerful, they may not always be sufficient for solving more complex or specialized problems. In such cases, Python allows developers to create their own data structures using classes. These user-defined data structures provide greater control over how data is stored and accessed. They include:

  • Arrays
  • Linked Lists
  • Stacks
  • Queues
  • Trees
  • Graphs
  • Hashmaps

These structures are especially useful when working with algorithms or systems that require specific behaviors or constraints that built-in types can’t fully support.

Key Characteristics:

  • Customizable: We define how data is stored, linked, and accessed
  • Efficient for specific use cases: Can be optimized for speed, memory, or special operations
  • Based on OOP principles: Typically implemented using classes and objects
  • Better control: Provide fine-tuned control over data manipulation
  • Support for complex relationships: Ideal for hierarchical or networked data (e.g., trees, graphs)

Now that we’ve got a good idea about built-in and user-defined data structures in Python, let’s go one step further and explore their different types one by one.

Primarily, let’s learn about the different types of built-in data structures in Python.

Built-in data structures in Python

Let’s start by discussing what lists are and how they work in Python.

Lists

A list in Python is an ordered, mutable collection of items. It is one of the most commonly used data structures in Python and can store elements of different data types, including other lists. Lists are defined using square brackets [ ] with items separated by commas.

fruits = ["apple", "banana", "cherry"]

Key Characteristics:

  • Ordered: The order of the items is preserved. Each item has a fixed index starting from 0.
  • Mutable: We can add, remove, or modify elements after the list has been created.
  • Dynamic sizing: Lists can increase or decrease in size dynamically - no need to declare their size ahead of time.
  • Heterogeneous: Lists can contain a mix of data types (e.g., integers, strings, other lists).
  • Indexable and iterable: We can access elements using indices and loop through them using loops like for or while.
  • Supports nesting: Lists can contain other lists or even more complex structures like dictionaries.

Example

Here is an example that demonstrates the usage of lists in Python:

# Creating a list
numbers = [1, 2, 3, 4, 5]
# Accessing an element
print(numbers[0])
# Modifying an element
numbers[2] = 10
print(numbers[2])
# Adding an element
numbers.append(6)
print(numbers)
# Removing an element
numbers.remove(4)
print(numbers)
# Iterating through the list
for num in numbers:
print(num)

Here is the output:

1
10
[1, 2, 10, 4, 5, 6]
[1, 2, 10, 5, 6]
1
2
10
5
6

Tuples

A tuple in Python is an ordered, immutable collection of elements. Like lists, tuples can hold a variety of data types, including other tuples or complex objects. Tuples are defined using parentheses ( ) with elements separated by commas. While they resemble lists, the key difference is that tuples cannot be modified after they are created.

coordinates = (10.5, 20.3)

Key Characteristics:

  • Ordered: Tuples maintain the element order, and each item can be accessed using its index.
  • Immutable: Once a tuple is created, its contents cannot be changed. This makes tuples usable as keys in dictionaries.
  • Faster than lists: Due to immutability, tuples are slightly more memory-efficient and perform better in situations where data doesn’t need to change.
  • Can contain duplicates: Tuples can hold repeated values.
  • Support nesting: Tuples can contain other tuples, lists, dictionaries, etc.
  • Indexable and iterable: Elements can be accessed using indexing and iterated through with loops.

Example

Here is an example that demonstrates the usage of tuples in Python:

# Creating a tuple
person = ("Alice", 30, "Engineer")
# Accessing an element
print(person[0])
# Iterating through the tuple
for item in person:
print(item)

Here is the output:

Alice
Alice
30
Engineer

Dictionaries

A dictionary in Python is an unordered, mutable collection of key-value pairs. Each key is unique and can be utilized to access its associated value. Dictionaries are incredibly powerful for representing structured data and performing fast lookups. They are defined using curly braces { } with pairs in the format key: value.

person = {"name": "Alice", "age": 30, "profession": "Engineer"}

Key Characteristics:

  • Unordered (as of older versions): In Python versions prior to 3.7, dictionaries did not maintain insertion order. From Python 3.7 onward, insertion order is preserved.
  • Key-value mapping: Data is stored in pairs where each key maps to a specific value.
  • Mutable: Values can be added, updated, or deleted after creating the dictionary.
  • Keys must be unique and immutable: Common key types include strings, numbers, and tuples. Mutable types like lists cannot be used as keys.
  • Efficient lookups: Dictionary access time is nearly constant (O(1)) for most operations, making them excellent for large datasets.
  • Flexible data types: Both keys and values can be of any data type, including other dictionaries.

Example

Here is an example that demonstrates the usage of dictionaries in Python:

# Creating a dictionary
student = { "name": "John", "age": 21, "grades": [88, 92, 79] }
# Accessing a value
print(student["name"])
# Modifying a value
student["age"] = 22
print(student["age"])
# Adding a new key-value pair
student["major"] = "Computer Science"
print(student)
# Removing a key-value pair
del student["grades"]
print(student)
# Iterating through the dictionary
for key, value in student.items():
print(f"{key}: {value}")

Here is the output:

John
22
{'name': 'John', 'age': 22, 'grades': [88, 92, 79], 'major': 'Computer Science'}
{'name': 'John', 'age': 22, 'major': 'Computer Science'}
name: John
age: 22
major: Computer Science

Sets

A set in Python is an unordered, mutable collection of unique elements. Sets are useful when we want to store a group of items without any duplicates and don’t need to maintain any specific order. They are defined using curly braces { } or the set() constructor, with items separated by commas.

unique_numbers = {1, 2, 3, 4}

Key Characteristics:

  • Unordered: The elements in a set have no defined order, and their positions may change unpredictably.
  • No duplicates: A set automatically eliminates duplicate values, making it ideal for filtering unique items.
  • Mutable: We can add or remove elements after the set is created.
  • Non-indexable: Unlike lists or tuples, we cannot access set elements by index.
  • Optimized for membership tests: Checking if a value exists in a set is very fast, typically faster than doing the same with a list.
  • Supports mathematical set operations: Sets provide built-in methods for union, intersection, difference, and symmetric difference.

Example

Here is an example that demonstrates the usage of sets in Python:

# Creating a set
fruits = {"apple", "banana", "cherry"}
# Adding an element
fruits.add("orange")
print(fruits)
# Removing an element
fruits.remove("banana")
print(fruits)
# Membership test
if "cherry" in fruits:
print("Cherry is in the set")
# Set operations
set1 = {1, 2, 3}
set2 = {3, 4, 5}
print(set1.union(set2))
print(set1.intersection(set2))
print(set1.difference(set2))

Here is the output:

{'orange', 'banana', 'apple', 'cherry'}
{'orange', 'apple', 'cherry'}
Cherry is in the set
{1, 2, 3, 4, 5}
{3}
{1, 2}

With the different types of built-in data structures covered, let’s focus on understanding the different types of user-defined data structures in Python in the next section.

User-defined data structures in Python

We’ll start with arrays, one of the most widely used data structures in Python.

Arrays

An array is a collection of elements of the same data type, stored in contiguous memory locations. While Python’s built-in list is similar to an array and often used interchangeably, Python also supports more formal arrays through modules like array (for basic array operations) and NumPy (for advanced array operations).

The key difference between a Python list and an array (especially when using the array module) is that arrays enforce a uniform data type and offer more efficient storage and faster numeric operations.

import array
numbers = array.array("i", [11, 12, 13, 14, 15]) # "i" denotes an array of integers

Key Characteristics:

  • Homogeneous data: All elements must be of the same data type (e.g., all integers or all floats).
  • Efficient memory usage: Arrays store data more compactly than lists, making them more memory-efficient.
  • Faster numerical operations: Especially with libraries like NumPy, arrays support vectorized operations and are significantly faster than lists for mathematical computations.
  • Index-based access: Array elements can be accessed using indices.
  • Supports slicing and iteration: Arrays can be sliced and looped over.

Example

Here is an example that demonstrates the usage of arrays in Python:

import array
# Creating an array of integers
arr = array.array("i", [10, 20, 30, 40])
# Accessing an element
print(arr[0])
# Modifying an element
arr[1] = 25
print(arr[1])
# Adding an element
arr.append(50)
print(arr)
# Removing an element
arr.remove(30)
print(arr)
# Iterating through the array
for num in arr:
print(num)

Here is the output:

10
25
array('i', [10, 25, 30, 40, 50])
array('i', [10, 25, 40, 50])
10
25
40
50

Linked lists

A linked list is a data structure in which elements, called nodes, are connected using pointers. Each node contains two parts: the data and a reference (or pointer) to the next node in the sequence. Unlike arrays, linked lists do not store elements in contiguous memory locations. This makes them more flexible for certain operations, such as inserting or deleting elements without needing to shift others.

There are several types of linked lists:

  • Singly linked list: Each node only refers to the next node.
  • Doubly linked list: Each node refers to the previous and next nodes.
  • Circular linked list: The last node refers to the first node.

Key Characteristics:

  • Dynamic size: Can grow or shrink easily during runtime.
  • Efficient insertion/deletion: Especially efficient when adding or removing nodes from the beginning or middle.
  • Sequential access: Unlike arrays, elements must be accessed sequentially - no random indexing.
  • Uses more memory: Due to the storage of pointers alongside data.
  • Great for implementing other data structures: Such as stacks, queues, and graphs.

Example

Here is the implementation of a singly linked list in Python:

class Node:
def __init__(self, data):
self.data = data
self.next = None
class LinkedList:
def __init__(self):
self.head = None
def insert(self, data):
new_node = Node(data)
if not self.head:
self.head = new_node
return
current = self.head
while current.next:
current = current.next
current.next = new_node
def display(self):
current = self.head
while current:
print(current.data, end=' -> ')
current = current.next
print("None")
# Example usage
ll = LinkedList()
ll.insert(10)
ll.insert(20)
ll.insert(30)
ll.display()

Here is the output:

10 -> 20 -> 30 -> None

Stacks

A stack is a linear data structure that operates on the Last In, First Out (LIFO) principle. This means the last element that is added to the stack is the first one to be removed. Stacks can be implemented using Python lists or linked lists and are widely used in situations where we need to keep track of operations or data in reverse order.

Think of a stack like a stack of plates - only the topmost plate can be removed or accessed first.

Key Characteristics:

  • LIFO behavior: The last item pushed to the stack is the first one to be popped off.
  • Two primary operations:
    • push: Insert an item at the top of the stack.
    • pop: Discard an item from the top of the stack.
  • Peek/top operation: View the item at the top without removing it.
  • Multiple use cases: Used in recursion, backtracking, and parsing algorithms.

Example

Here is the implementation of a stack using lists in Python:

stack = []
# Pushing elements onto the stack
stack.append(1)
stack.append(2)
stack.append(3)
# Peeking at the top element
print("Top element:", stack[-1])
# Popping elements from the stack
print("Popped:", stack.pop())
print("Popped:", stack.pop())
# Current stack
print("Current stack:", stack)

Here is the output:

Top element: 3
Popped: 3
Popped: 2
Current stack: [1]

Queues

A queue is a linear data structure that utilizes the First In, First Out (FIFO) principle. In a queue, the first element that is added is the first one to be removed - just like people waiting in line. New elements are added at the rear (end), and elements are removed from the front (beginning).

Queues can be implemented using lists, collections.deque, or linked lists in Python, with deque being the most efficient and recommended option for performance.

Key Characteristics:

  • FIFO behavior: The element inserted first is the one removed first.
  • Two main operations:
    • enqueue: Insert an item at the end of the queue.
    • dequeue: Discard an item from the front of the queue.
  • Efficient with deque: The collections.deque class offers O(1) time complexity for append and popleft operations.
  • Multiple use cases: Used in task scheduling, buffering, and simulations.

Here is the implementation of a queue using collections.deque in Python:

from collections import deque
# Creating a queue
queue = deque()
# Enqueue elements
queue.append("A")
queue.append("B")
queue.append("C")
# Peeking at the front element
print("Front element:", queue[0])
# Dequeue elements
print("Dequeued:", queue.popleft())
print("Dequeued:", queue.popleft())
# Current queue
print("Current queue:", list(queue))

Here is the output:

Front element: A
Dequeued: A
Dequeued: B
Current queue: ['C']

Trees

A tree is a hierarchical data structure containing nodes connected by edges. It starts with a root node, and each node may have zero or more child nodes. Unlike linear structures such as arrays or linked lists, trees branch out in multiple directions, making them ideal for representing hierarchical data like file systems, organizational charts, or XML/HTML documents.

There are various types of trees, including:

  • Binary trees: Each node has at most two children (left and right).
  • Binary search trees (BST): A binary tree with the left child holding smaller values and the right child holding larger values.
  • Balanced trees (e.g., AVL trees, Red-black trees): Maintain optimal tree height for efficient operations.
  • Heaps: A specialized tree used to implement priority queues.

Besides them, there are tries, B-trees, and more for advanced use cases.

Key Characteristics:

  • Hierarchical structure: Nodes are arranged in levels - root, parent, child, and leaf.
  • Root node: The top-most node without a parent.
  • Leaf nodes: Nodes with no children.
  • Non-linear: Unlike lists or arrays, data in trees is organized in branches.
  • Efficient operations: Searching, insertion, and deletion can be optimized depending on the type of tree (e.g., O(log n) for balanced BST).
  • Recursively defined: Each subtree is itself a tree.

Here is the implementation of a binary tree using classes in Python:

class Node:
def __init__(self, data):
self.data = data
self.left = None
self.right = None
# Creating nodes
root = Node(10)
root.left = Node(5)
root.right = Node(15)
# Accessing data
print("Root:", root.data)
print("Left child:", root.left.data)
print("Right child:", root.right.data)

Here is the output:

Root: 10
Left child: 5
Right child: 15

Graphs

A graph is a non-linear data structure that includes a set of nodes (also known as vertices) connected by edges. Graphs are used to model relationships between pairs of objects and are especially useful for representing networks such as social connections, computer networks, and maps.

Graphs can be:

  • Directed or undirected: In directed graphs, edges have a direction (from one vertex to another); in undirected graphs, they do not.
  • Weighted or unweighted: Edges may carry weights (e.g., distances or costs) or not.
  • Cyclic or acyclic: Graphs may contain cycles (a path from a node back to itself) or not.

Key Characteristics:

  • Non-linear: Unlike lists or trees, there’s no hierarchical or linear order.
  • Flexible structure: Can represent many-to-many relationships.
  • Nodes (vertices): Entities in the graph.
  • Edges: Connections between nodes, which may be directed and/or weighted.
  • Efficient modeling of complex systems: Graphs are ideal for network representation.

Here is the implementation of an undirected graph using a dictionary of lists in Python:

class Graph:
def __init__(self):
self.graph = {}
def add_edge(self, node1, node2):
self.graph.setdefault(node1, []).append(node2)
self.graph.setdefault(node2, []).append(node1)
def display(self):
for node in self.graph:
print(f"{node}: {self.graph[node]}")
# Example usage
g = Graph()
g.add_edge("A", "B")
g.add_edge("A", "C")
g.add_edge("B", "D")
g.display()

Here is the output:

A: ['B', 'C']
B: ['A', 'D']
C: ['A']
D: ['B']

Hashmaps

A hashmap is a data structure that stores data in key-value pairs and allows for fast data retrieval based on the key. It uses a hash function for computing an index (or hash code) into an array of slots, from which the desired value can be found. In Python, the built-in dict type is essentially a highly optimized implementation of a hashmap.

While Python dictionaries are typically used in practice, understanding hashmaps as a concept is crucial for grasping how efficient lookups, insertions, and deletions are achieved.

Key Characteristics:

  • Key-value mapping: Each value is associated with a unique key.
  • Fast lookup: Average-case time complexity of O(1) for insertion, deletion, and access.
  • Hash function: Maps keys to indices where their values are stored.
  • Handles collisions: Uses techniques like chaining or open addressing when two keys hash to the same index.
  • Unordered (as of older versions): In Python versions prior to 3.7, hashmaps did not maintain insertion order. From Python 3.7 onward, insertion order is preserved.
  • Mutable: Keys must be immutable types (e.g., strings, numbers, tuples), but values can be of any type.

Here is an example that demonstrates the usage of hashmaps in Python:

# Creating a hashmap using a dictionary
person = {
"name": "Alice",
"age": 30,
"city": "New York"
}
# Accessing values
print(person["name"])
# Adding a new key-value pair
person["email"] = "[email protected]"
print(person)
# Updating a value
person["age"] = 31
print(person["age"])
# Deleting a key-value pair
del person["city"]
print(person)
# Iterating over keys and values
for key, value in person.items():
print(f"{key}: {value}")

Here is the output:

Alice
{'name': 'Alice', 'age': 30, 'city': 'New York', 'email': '[email protected]'}
31
{'name': 'Alice', 'age': 31, 'email': '[email protected]'}
name: Alice
age: 31

We’re now done discussing the different types of built-in and user-defined data structures In Python. Next, let’s have a look at some factors that we should consider while choosing a data structure.

Choosing the right data structure

Selecting a suitable data structure is critical in solving any programming problem efficiently. The right choice can drastically improve our code’s performance, readability, and maintainability.

Here is a table that shows the recommended data structure(s) for different use cases:

Requirement Recommended Data Structure
Fast lookup by key Dictionary (Hashmap)
Maintaining insertion order List or collections.OrderedDict
Storing unique elements Set
Undo/redo functionality Stack
Task scheduling (FIFO) Queue (collections.deque)
Hierarchical data Tree
Network connections or maps Graph
Immutable sequence Tuple

By carefully understanding the problem at hand and evaluating the strengths and trade-offs of each data structure, we can write more efficient and robust Python programs.

Conclusion

In this guide, we covered the basics of data structures in Python, including built-in types like lists, tuples, dictionaries, and sets, and user-defined types such as arrays, linked lists, stacks, queues, trees, graphs, and hashmaps. We also discussed how to choose the right data structure based on our specific programming needs.

Understanding and effectively using data structures is crucial for writing optimized Python code. As we continue to develop in Python, knowing when and how to use these structures will significantly impact the quality and performance of our applications.

If you want to learn more about data structures in Python, check out the Learn Data Structures and Algorithms with Python course on Codecademy.

Codecademy Team

'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'

Meet the full team