A Complete Guide to Python Generators
What are Python generators?
What if a function could generate a massive sequence of numbers without storing them all in memory? Instead of loading everything at once, it could produce each value only when needed, making it far more efficient. That’s precisely how Python generators handle large-scale data processing.
Generators in Python are a special type of iterator that produces values on demand rather than storing them in memory. Unlike lists, which allocate memory for all their elements upfront, generators yield values one at a time, making them ideal for processing large datasets, streaming data, or working with infinite sequences.
A typical scenario where generators shine is reading large files. Instead of loading an entire file into memory, a generator can read and process it line by line, significantly reducing memory consumption. Similarly, developers frequently use generators in real-time data streams, pagination systems, and computations that require intermediate values without storing the entire dataset in memory.
Next, let’s explore how to create a generator and understand its syntax.
Creating a generator in Python
A generator in Python is defined as a regular function but uses the yield keyword to generate values one at a time. Each time yield is encountered, the generator produces a value and pauses execution, preserving its state until the next value is requested.
The syntax for creating a generator in Python is:
```python
def generator_function():
    yield value  # Produces a value and pauses execution
```
To understand the above syntax, let’s look at an example.
A common way to generate a sequence of numbers is by returning a list:
```python
def count_up_to_list(n):
    return list(range(1, n + 1))

# Using the function
numbers = count_up_to_list(5)
print(numbers)
```
While this approach works, it has a drawback: for large values of n, storing all numbers in memory at once can be inefficient. Instead of keeping the entire list in memory, it would be better to generate values only when needed.
Generators solve this problem. Instead of returning a complete list, a generator yields values one at a time:
```python
def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

# Using the generator
counter = count_up_to(5)
for num in counter:
    print(num)
```
Both versions produce the numbers 1 through 5. The generator loop prints:

```
1
2
3
4
5
```
When using generators, the function doesn’t create a list. Instead, it pauses at each yield and returns the next number only when requested. This makes generators far more memory-efficient, especially for large sequences or streaming data.
Now that we’ve explored how to create generators and their advantages, let’s look at how yield and next() control the execution flow in a generator.
Using next() in Python generators
The next() function in Python fetches values from a generator. Here’s how it interacts with yield:

First call to next(): Executes the function up to the first yield, returns the value, and pauses execution.
Subsequent calls to next(): Resume execution right after the previous yield, run until the next yield, and return the new value.
When there are no more yield statements: A StopIteration exception is raised.
With this in mind, consider the following example:
```python
def count_up_to(n):
    count = 1
    while count <= n:
        yield count  # Pauses execution and returns the current count
        count += 1   # Resumes from here in the next call

# Using the generator
counter = count_up_to(3)

# Fetching values manually using next()
print(next(counter))  # Output: 1
print(next(counter))  # Output: 2
print(next(counter))  # Output: 3
print(next(counter))  # Raises StopIteration
```
Calling count_up_to(3) does not immediately execute the function but returns a generator object. Values are retrieved using next() until there are no more yield statements, at which point a StopIteration exception is raised.
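To see the first point in action, here is a quick sketch, reusing count_up_to() from above, that prints the object returned by the call before any values are produced:

```python
counter = count_up_to(3)

print(counter)        # Output: <generator object count_up_to at 0x...> (memory address varies)
print(type(counter))  # Output: <class 'generator'>
```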
To avoid the StopIteration exception, generators are typically used inside loops, which automatically handle this exception:
```python
for num in counter:
    print(num)
```
Using loops ensures the generator is fully iterated without needing explicit calls to next().
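Alternatively, next() accepts an optional default value that is returned instead of raising StopIteration once the generator is exhausted. A minimal sketch, again reusing count_up_to() from above:

```python
counter = count_up_to(2)

print(next(counter, None))  # Output: 1
print(next(counter, None))  # Output: 2
print(next(counter, None))  # Output: None (exhausted, default returned instead of StopIteration)
```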
Now that we’ve seen how generators work using yield and next(), let’s explore a more concise way to create them using generator expressions.
Python generator expression
Python provides generator expressions, also called generator comprehensions, as a concise way to create generators without defining a function or using the yield keyword. Instead, they use parentheses () to define an iterable sequence.
The syntax of a generator expression is:
```python
generator = (expression for item in iterable if condition)
```
Python generator expressions have a similar structure to list comprehensions but generate values one at a time instead of storing them in memory.
Let’s look at an example. A generator expression that doubles the numbers in a range looks like this:
```python
numbers = (x * 2 for x in range(5))

print(next(numbers))  # Output: 0
print(next(numbers))  # Output: 2
print(next(numbers))  # Output: 4
```
In this example, each call to next(numbers) retrieves the next computed value, just like a generator function with yield.
If we were to use a list comprehension instead, it would look like this:
```python
# List comprehension (creates a full list in memory)
nums_list = [x * 2 for x in range(5)]

# Generator expression (generates values on demand)
nums_gen = (x * 2 for x in range(5))

print(sum(nums_list))  # Works normally
print(sum(nums_gen))   # Also works, but without storing all values
```
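To make the memory difference concrete, here is a small sketch comparing the two with sys.getsizeof(); the exact byte counts vary by Python version and platform, but the gap is always dramatic:

```python
import sys

# A list comprehension materializes all 100,000 values at once
nums_list = [x * 2 for x in range(100_000)]

# A generator expression only creates a small generator object
nums_gen = (x * 2 for x in range(100_000))

print(sys.getsizeof(nums_list))  # Hundreds of kilobytes (exact size varies)
print(sys.getsizeof(nums_gen))   # A couple of hundred bytes, regardless of the range size
```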
Now that we’ve explored generator expressions and how they offer a memory-efficient alternative to list comprehensions, let’s discuss real-world applications of Python generators.
Applications of generators
Generators play a crucial role in real-world scenarios that require efficient processing of large amounts of data.
Here are some common real-world use cases for generators:
1. Processing large files line by line
Instead of loading an entire file into memory, generators can read one line at a time, significantly reducing memory usage.
Here is an example where a generator function reads a large file efficiently:
```python
def read_large_file(file_path):
    with open(file_path, "r") as file:
        for line in file:
            yield line.strip()

for line in read_large_file("large_text_file.txt"):
    print(line)  # Processes one line at a time
```
2. Streaming API responses
When consuming data from an API, using a generator prevents loading all responses at once.
Here is an example where a generator is used to efficiently process streaming API data without loading all responses into memory at once:
```python
import requests

def stream_api_data(url):
    response = requests.get(url, stream=True)
    for line in response.iter_lines():
        yield line

for data in stream_api_data("https://api.example.com/stream"):
    print(data)  # Processes each chunk lazily
```
3. Implementing infinite sequences
Generators enable producing endless sequences without exhausting memory.
Here is an example where a generator is used to produce an infinite sequence of Fibonacci numbers without exhausting memory:
```python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib_gen = fibonacci()
for _ in range(10):
    print(next(fib_gen))  # Outputs Fibonacci numbers one at a time
```
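Because the sequence never ends, the generator must only be consumed a bounded number of times. Besides the range() loop above, itertools.islice() is a convenient way to take just a slice of an infinite generator; a short sketch:

```python
from itertools import islice

# Take the first 10 Fibonacci numbers without ever materializing the full sequence
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```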
4. Lazy-loading data in applications
Many features in an application, such as database query results or paginated data, benefit from generators for lazy evaluation instead of loading everything at once.
Here is an example where a generator is used to fetch paginated API data efficiently, processing results lazily instead of loading everything at once:
```python
import requests

def fetch_paginated_data(api_url):
    page = 1
    while True:
        response = requests.get(f"{api_url}?page={page}")
        data = response.json()
        if not data["results"]:  # Stop if no more results
            break
        yield from data["results"]  # Yield each item lazily
        page += 1  # Move to the next page

# Example usage
api_url = "https://api.example.com/data"
for item in fetch_paginated_data(api_url):
    print(item)  # Process each item lazily
```
Now that we understand how powerful generators can be in real-world scenarios, we need to learn how to leverage their benefits and write efficient, maintainable code. Let’s explore key guidelines for using generators effectively.
Best practices for using generators
It is essential to follow best practices for performance, readability, and error handling to get the most out of generators:
Keep generators simple: Focus on yielding values without modifying external data. Each generator should have a single, clear purpose.
Handle errors carefully: Use try-except to prevent generators from stopping unexpectedly when processing large datasets (see the sketch after this list).
Use generators for large data: Generators efficiently handle big files or long sequences.
Iterate only when needed: Generators don’t store values, so looping over them multiple times exhausts them, making subsequent iterations yield no results. To reuse values, consider converting the generator to a list or another iterable data structure. For example:

```python
gen = (x for x in range(3))
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] (empty because the generator is exhausted)

gen_list = list(x for x in range(3))
print(gen_list)  # [0, 1, 2]
print(gen_list)  # [0, 1, 2] (can be reused)
```

Combine small generators: Chain multiple generators instead of making one complex function, as shown in the sketch after this list.
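To illustrate the error-handling and chaining points, here is a minimal sketch; the generator names and the pipeline are illustrative (reusing the large_text_file.txt example from earlier), not a prescribed pattern:

```python
def read_lines(file_path):
    """Yield stripped lines from a file, ending gracefully if the file can't be read."""
    try:
        with open(file_path, "r") as file:
            for line in file:
                yield line.strip()
    except OSError as error:
        # Report the problem and stop yielding instead of crashing the consumer
        print(f"Could not read {file_path}: {error}")

def non_empty(lines):
    """Yield only the lines that contain text."""
    for line in lines:
        if line:
            yield line

def numbered(lines):
    """Yield lines prefixed with a running line number."""
    for index, line in enumerate(lines, start=1):
        yield f"{index}: {line}"

# Chain the small generators into a pipeline; nothing is read until iteration begins
for line in numbered(non_empty(read_lines("large_text_file.txt"))):
    print(line)
```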
Conclusion
Python generators make it possible to work efficiently with large datasets, streaming data, and infinite sequences without exhausting memory. By yielding values one at a time, they enable better performance and resource management.
Understanding when to use generators, how to write them effectively, and following best practices ensures code remains efficient and maintainable. Whether processing large files, handling API responses, or implementing lazy-loaded data, generators are a powerful tool every Python developer should have in their toolkit.
To learn more Python concepts, check out the Learn Python 3 course on Codecademy.