How to Run and Use OpenAI’s GPT-OSS Locally
GPT-OSS is OpenAI’s first family of open-weight models, released in 20b and 120b parameter variants. You can download and run these models locally with tools like Ollama. In this guide, we will show you how to set up and run GPT-OSS locally, then test the models with real examples, such as a reasoning problem, a coding challenge, and multilingual translation, to understand their use cases.
What is GPT-OSS?
OpenAI’s GPT-OSS series is their first open-weight release since GPT-2, and it’s a big one. There are two models: gpt-oss-20b and gpt-oss-120b, licensed under Apache 2.0, so you can run them locally, fine-tune them, and even use them commercially without jumping through legal hoops. This freedom makes them appealing to developers, researchers, and businesses that want complete control over their AI stack.
gpt-oss-20b
OpenAI’s gpt-oss-20b packs 21 billion parameters and punches above its weight in reasoning and language tasks. It outperforms models like o3-mini on some benchmarks yet runs comfortably on a single consumer-grade machine with 16 GB of memory. It’s ready for quick local deployment with Ollama, vLLM, or Apple’s Metal platform, making it perfect for lightweight assistants, edge devices, and private, responsive AI apps.
gpt-oss-120b
The flagship of the series, OpenAI’s gpt-oss-120b, carries 117 billion parameters and delivers performance on par with proprietary heavyweights like o4-mini in demanding reasoning tests. Optimized to run on a single 80 GB GPU, it’s a good option for research labs and enterprise teams that want cutting-edge AI in-house. It excels at complex workflows, tool use, multi-step reasoning, and fine-grained control over “thinking” depth. The Apache 2.0 license gives teams the flexibility to customize and deeply inspect the model.
Let’s start by understanding how to run these models.
Run the GPT-OSS models locally
OpenAI designed its OSS models for accessible local usage. Here are the top three ways to run the OSS models locally:
- Using Ollama
- Hugging Face + Transformers
- LM Studio

Let’s explore them one by one.
Using Ollama to run the GPT-OSS models
To download the GPT-OSS models using Ollama, follow these steps:
Step 1: Download the Ollama app from its official website.

Step 2: Once done, open your terminal and pull the required model:
ollama pull gpt-oss:20b    # to use the 20b model
ollama pull gpt-oss:120b   # to use the 120b model
Step 3: Finally, open Ollama on your device, and you’ll see the models installed:

You can start interacting with the models here.
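Beyond the chat window, Ollama also exposes a local REST API (on port 11434 by default). As a minimal sketch, the helper below builds the JSON body for its /api/chat endpoint; with Ollama running, you would POST this payload with any HTTP client:

```python
import json

def build_ollama_chat_request(prompt, model="gpt-oss:20b"):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete response instead of chunks
    }

payload = build_ollama_chat_request("Explain MoE routing in one sentence.")
# With Ollama running, POST this to http://localhost:11434/api/chat
print(json.dumps(payload, indent=2))
```

The same payload shape works for the larger model; just set `model="gpt-oss:120b"`.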
Using Hugging Face to install GPT-OSS models
To run the GPT-OSS models with Hugging Face, you can either use the quick command-line interface (CLI) method or a more flexible Python-based approach. Here are both approaches:
Using the transformers CLI
This is the easiest way to get started if you want to run the model locally and interact with it from a browser or terminal.
Step 1: Start by installing the required libraries:
pip install transformers torch
Step 2: With the libraries installed, we now need to serve the model locally:
transformers serve
Step 3: To open a chat interface, use the command as follows:
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
This will launch a local endpoint at localhost:8000 where you can chat with the model directly. It’s fast to set up and handy for quick testing.
Using Python code
If you’re building a Python app or want more flexibility in how you interact with the model, the pipeline method is better suited. Here’s how to use it:
```py
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

response = pipe(
    [{"role": "user", "content": "Explain chain-of-thought reasoning."}],
    max_new_tokens=100,
)

print(response[0]["generated_text"])
```
This method loads the model in your Python environment and gives you direct access to input/output for deeper integration into apps, scripts, or experiments.
Installing the GPT-OSS models using LM Studio
To install the GPT-OSS models using LM Studio, follow these steps:
Step 1: Install LM Studio from its official website.
Step 2: Once it is installed, run the following commands to get the models:
lms get openai/gpt-oss-20b     # to get the 20b variant
lms get openai/gpt-oss-120b    # to get the 120b variant
You’ll get a complete GUI for chatting, adjusting reasoning effort, and launching a mini local API.
Now that we have the GPT-OSS models installed, let’s move on to how these models actually perform in real-world experiments.
How to use the GPT-OSS models locally
Let’s demonstrate gpt-oss-20b’s capabilities through reasoning, coding, math, and multilingual task examples as follows:
gpt-oss-20b reasoning and problem-solving example
GPT-OSS models are designed with transparent reasoning capabilities that show their “thinking” process. Let’s test this with a classic multi-step math problem that requires logical deduction:
Our prompt:
A train leaves Station A at 8:00 AM at 60 km/h. Another train leaves Station B at 9:00 AM at 80 km/h on the same track towards Station A. They're 200 km apart. When do they meet?
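Before looking at the model’s answer, we can sanity-check the expected result with a few lines of arithmetic (by 9:00 AM the first train has already covered part of the gap):

```python
# Sanity-check the expected answer to the train problem.
distance = 200               # km between the stations
speed_a, speed_b = 60, 80    # km/h
head_start = 1.0             # train A departs one hour earlier (8:00 vs 9:00)

gap_at_9 = distance - speed_a * head_start  # 140 km left when train B departs
closing_speed = speed_a + speed_b           # 140 km/h, since they approach each other
hours_after_9 = gap_at_9 / closing_speed    # 1.0 hour

print(f"They meet {hours_after_9:g} h after 9:00 AM, i.e. at 10:00 AM,")
print(f"{speed_a * (head_start + hours_after_9):g} km from Station A.")
```

So the model should conclude that the trains meet at 10:00 AM, 120 km from Station A.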
The output given by gpt-oss-20b:

Testing gpt-oss-20b on a Python coding task
One key strength of GPT-OSS models is generating well-documented, production-ready code. Let’s see how it handles a common programming challenge with proper documentation and edge cases:
Our prompt:
Write a Python function to check if a string is a palindrome, ignoring case and non-alphanumeric characters.
The response by the model:
Here's a compact, well‑documented function that does exactly what you asked for:

```py
import re
from typing import Any


def is_palindrome(s: Any) -> bool:
    """
    Return True if s is a palindrome when case is ignored and all
    non-alphanumeric characters are removed.

    Parameters
    ----------
    s : Any
        The value to test. It will be converted to a string
        using ``str(s)`` so you can pass integers, bytes, etc.

    Returns
    -------
    bool
        True if the cleaned string reads the same forwards and
        backwards, False otherwise.
    """
    # Convert to string (handles ints, bytes, etc.)
    text = str(s)
    # Keep only alphanumerics and lower-case everything
    cleaned = re.sub(r'[^A-Za-z0-9]', '', text).lower()
    # A string is a palindrome if it equals its reverse
    return cleaned == cleaned[::-1]


# ---- Example usage ----
if __name__ == "__main__":
    test_cases = [
        "Racecar",
        "A man, a plan, a canal: Panama!",
        "No lemon, no melon",
        "Hello, World!",
        12321,               # numeric palindrome
        b"Madam, I'm Adam",  # bytes
        "12345",             # not a palindrome
    ]
    for t in test_cases:
        print(f"{t!r:30} -> {is_palindrome(t)}")
```

How it works:
1. Convert to string – `str(s)` ensures the function accepts any input type.
2. Remove non-alphanumerics – `re.sub(r'[^A-Za-z0-9]', '', text)` strips out everything except letters and digits.
3. Normalize case – `.lower()` turns everything to lowercase so "Racecar" and "racecar" are treated the same.
4. Check palindrome – Comparing the cleaned string to its reverse (`cleaned[::-1]`) tells us whether it reads the same forwards and backwards.

Complexity:
- Time: O(n), where n is the length of the input string.
- Space: O(n) for the cleaned version (you could also do a two-pointer comparison in place to reduce space to O(1)).

Feel free to tweak the regex if you want to support Unicode alphanumerics (`r'\W'` or `\p{L}` with the `regex` module) or any other custom rule set.
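The model’s complexity note mentions that a two-pointer, in-place comparison can cut the extra space to O(1); here is a minimal sketch of that variant:

```python
def is_palindrome_two_pointer(s) -> bool:
    """O(1) extra-space variant: walk inward from both ends,
    skipping characters that are not alphanumeric."""
    text = str(s)
    left, right = 0, len(text) - 1
    while left < right:
        if not text[left].isalnum():
            left += 1                 # skip punctuation/whitespace on the left
        elif not text[right].isalnum():
            right -= 1                # skip punctuation/whitespace on the right
        elif text[left].lower() != text[right].lower():
            return False              # mismatch: not a palindrome
        else:
            left += 1
            right -= 1
    return True

print(is_palindrome_two_pointer("A man, a plan, a canal: Panama!"))  # True
print(is_palindrome_two_pointer("Hello, World!"))                    # False
```

It trades the cleaned copy of the string for a slightly longer loop, which matters only for very large inputs.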
Multilingual capabilities of gpt-oss-20b
Unlike some local models that struggle with non-English languages, GPT-OSS was trained on diverse multilingual data. Let’s test its translation accuracy across different language families:
Our prompt:
Translate "Where is the nearest train station?" to French, Hindi, and Japanese.
The output generated by the model:

Overall, gpt-oss-20b handles a wide range of tasks well, from reasoning to coding. So, what are the key features of GPT-OSS models?
What features do the GPT-OSS models have?
OpenAI’s GPT‑OSS models are built to be powerful, flexible, and transparent for developers and researchers who value control and reasoning. Here’s why they stand out:
Open-Weight & Commercially Usable: These GPT-OSS models are licensed under the Apache 2.0 license, which enables downloading, modifying, fine‑tuning, and deploying them even in commercial environments without legal or API restrictions.
Chain-of-Thought Reasoning (CoT): These models expose their step-by-step reasoning process, making the logic transparent and easier to debug and trace.
Configurable Reasoning Effort: Depending on the complexity and needs of the task, you can balance speed and depth by choosing low, medium, or high reasoning levels.
Agentic Capabilities & Tool Use: These models are built to act: they support function calling, web browsing, and Python execution, which are essential for agent-based workflows.
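To make the tool-use point concrete, here is a sketch of what a tool definition and dispatcher can look like. The JSON-schema format below is the convention accepted by OpenAI-compatible runtimes such as Ollama and LM Studio; the `get_weather` function and its stub implementation are hypothetical examples, not part of GPT-OSS itself:

```python
import json

# Illustrative tool definition in the JSON-schema style used by
# OpenAI-compatible runtimes. The function name and parameters
# are made up for this example.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def dispatch(tool_call):
    """When the model returns a tool call, route it to local Python
    code; the result would then be fed back as a tool message."""
    if tool_call["name"] == "get_weather":
        args = json.loads(tool_call["arguments"])
        return f"Sunny in {args['city']}"  # stub in place of a real weather API
    raise ValueError(f"Unknown tool: {tool_call['name']}")

# Simulate the tool call the model might emit for
# "What's the weather in Paris?"
print(dispatch({"name": "get_weather", "arguments": '{"city": "Paris"}'}))
```

In a real agent loop, you would pass `get_weather_tool` in the request, watch for a tool call in the response, run `dispatch`, and send the result back to the model.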
With the foundation in place, let’s see what really powers these models and how they run differently.
Architecture of the GPT-OSS models
The gpt-oss-20b and gpt-oss-120b models are built on a Transformer backbone using a Mixture-of-Experts (MoE) architecture, which dramatically improves efficiency. Instead of activating the entire model for every token, MoE routes each token through just a few specialized “experts”, keeping compute low with minimal compromise in quality.

This dynamic routing allows models to scale up to massive sizes like 120b parameters while maintaining a reasonable compute footprint. Let’s break down their technical specs:
| Specs | gpt-oss-120b | gpt-oss-20b |
|---|---|---|
| Layers | 36 | 24 |
| Total parameters | 117B | 21B |
| Experts per layer | 128 | 32 |
| Active experts per token | 4 | 4 |
| Active parameters/token | ~5.1B | ~3.6B |
| Memory required | 80 GB GPU (native MXFP4 quantization) | 16 GB (native MXFP4 quantization) |
| Model size on disk | ~65 GB | ~14 GB |
| Context length | 131,072 tokens (128K) | 131,072 tokens (128K) |
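The efficiency win is easy to quantify from the table: for each token, only a small fraction of the total weights does any work.

```python
# Back-of-envelope MoE math using the spec table above:
# total vs. active parameters per token.
specs = {
    "gpt-oss-120b": {"total": 117e9, "active": 5.1e9},
    "gpt-oss-20b":  {"total": 21e9,  "active": 3.6e9},
}

for name, s in specs.items():
    frac = s["active"] / s["total"]
    print(f"{name}: ~{frac:.0%} of parameters active per token")
```

Roughly 4% of gpt-oss-120b and 17% of gpt-oss-20b is active per token, which is why the per-token compute cost sits far below what the headline parameter counts suggest.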
These specs give an idea of the hardware needed to run the GPT-OSS models efficiently. Now let’s see how gpt-oss-20b compares to LLaMA 3 and Mistral 7B head-to-head.
gpt-oss-20b vs LLaMA 3 vs Mistral 7B: Model comparison
Now let’s see how gpt-oss-20b performs when stacked against two of the most talked-about open-source models today: LLaMA 3 and Mistral 7B.
| Feature | gpt-oss-20b | Llama 3 | Mistral 7B |
|---|---|---|---|
| Architecture | Mixture of Experts (MoE) | Decoder-only Transformer | Decoder-only Transformer (dense) |
| Parameter count | 21B (32 experts, 4 active per token) | 8B / 70B variants | 7B (dense) |
| Tokenizer | o200k_harmony (tiktoken-based BPE, ~200K vocab) | Tiktoken-based BPE (128K vocab) | BPE (32K vocab) |
| Training data | Not fully disclosed; emphasis on STEM, coding, and general knowledge | 15T tokens from curated sources | Not disclosed |
| License | Apache 2.0 | Custom Meta license | Apache 2.0 |
| Context length | 128K | 8K | 32K (v0.2; 8K with sliding-window attention in v0.1) |
At the end of the day, there’s no one-size-fits-all winner. The best open model depends on what you’re optimizing for: speed, accuracy, size, or flexibility.
Conclusion
Running GPT-OSS locally gives you powerful AI capabilities without API costs or internet dependency. Whether you choose Ollama for simplicity, Hugging Face for flexibility, or LM Studio for a complete GUI experience, you can have OpenAI-quality models running on your hardware in minutes.
The key takeaways for running GPT-OSS locally:
- Start with gpt-oss-20b if you have 16 GB+ RAM: it’s fast and capable for most tasks.
- Use Ollama for the quickest setup, with just two terminal commands.
- Choose Hugging Face if you need Python integration for custom applications.
- Pick LM Studio if you prefer a visual interface with built-in chat features.

GPT-OSS represents a major shift toward accessible, powerful AI that you fully control.
With Apache 2.0 licensing and strong performance across reasoning, coding, and multilingual tasks, these models make enterprise-grade AI available to everyone - from individual developers to large organizations seeking data privacy and cost control.
Ready to explore advanced LLM applications? Check out Codecademy’s Build Chatbots with Python course.
Frequently asked questions
1. What does GPT stand for?
GPT stands for Generative Pre‑trained Transformer, a family of models based on the Transformer architecture that are pre-trained to generate text and perform a wide range of language tasks.
2. What is gpt-oss-120b?
gpt-oss-120b is an open-weight GPT model from OpenAI featuring a 117-billion parameter Mixture‑of‑Experts (MoE) architecture. It’s released under the Apache 2.0 license, fully compatible with the Hugging Face Transformers ecosystem, and built for high-performance reasoning, instruction-following workflows, and local deployment.
3. What is the full form of GPT Transformer?
The full form of GPT Transformer is Generative Pre‑trained Transformer, combining two ideas:
Generative: The model generates text step-by-step, and
Pre‑trained: It’s trained on large amounts of text before fine-tuning.
And Transformer refers to the underlying neural architecture using self-attention for efficient sequence modeling.
4. What is an OpenAI open-weight model?
An open-weight model is one where the model’s parameters (weights) are publicly available, allowing anyone to download, inspect, modify, or fine-tune the model on their own infrastructure. GPT‑OSS models are OpenAI’s first open-weight releases since GPT‑2.
5. What is the difference between GPT and Transformer?
A Transformer is a foundational deep learning architecture using self-attention to process sequences, introduced in “Attention Is All You Need” in 2017. GPT, on the other hand, is a specific type of model built on the Transformer decoder pre-trained for generation tasks. In short: all GPTs are Transformers, but not all Transformers are GPTs.
6. Does OpenAI have local models?
Yes, gpt-oss-20b and gpt-oss-120b are OpenAI’s first locally runnable models since GPT-2, available for free download and offline use under the Apache 2.0 license.
7. Can I install GPT-OSS on Windows/Mac?
Yes, GPT-OSS runs on Windows, Mac, and Linux using Ollama, Hugging Face, or LM Studio. Mac users need 16GB+ unified memory for the 20B model.