Scikit-Learn Cheatsheet

Published Aug 1, 2018. Updated Mar 5, 2025.
Explore key Scikit-Learn commands for machine learning, covering regression, classification, clustering, and model validation in Python.


Scikit-learn is a Python library that provides many unsupervised and supervised learning algorithms. It’s built on NumPy, SciPy, and Matplotlib, and it works well with pandas, so much of the technology may already be familiar!

As you build robust machine learning programs, it’s helpful to have the sklearn commands in one place in case you forget them.

Linear regression in Scikit-Learn

Import and create the model:

from sklearn.linear_model import LinearRegression
your_model = LinearRegression()

Fit:

your_model.fit(x_training_data, y_training_data)
  • .coef_: contains the coefficients
  • .intercept_: contains the intercept

Predict:

predictions = your_model.predict(your_x_data)
  • .score(x_data, y_data): returns the coefficient of determination R² of the model's predictions on the given data
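
The steps above can be put together in a minimal sketch. The data here is synthetic and made up for illustration (points generated from y = 3x + 2 with no noise), so the fitted values should recover that line almost exactly; real data will not fit this cleanly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3*x + 2, no noise
x_training_data = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y_training_data = 3.0 * x_training_data.ravel() + 2.0

your_model = LinearRegression()
your_model.fit(x_training_data, y_training_data)

slope = your_model.coef_[0]        # one coefficient per feature
intercept = your_model.intercept_  # a single float for a single target
r_squared = your_model.score(x_training_data, y_training_data)

# Predict for an unseen x value
predictions = your_model.predict(np.array([[5.0]]))
print(slope, intercept, r_squared, predictions)
```

Note that the x data must be two-dimensional, with one row per sample and one column per feature, even when there is only one feature.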

Naive Bayes classification in Scikit-Learn

Import and create the model:

from sklearn.naive_bayes import MultinomialNB
your_model = MultinomialNB()

Fit:

your_model.fit(x_training_data, y_training_data)

Predict:

# Returns an array of predicted classes - one prediction for every data point
predictions = your_model.predict(your_x_data)
# For every data point, returns an array of probabilities, one per class,
# in the order given by your_model.classes_
probabilities = your_model.predict_proba(your_x_data)
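
As a rough end-to-end sketch, MultinomialNB expects non-negative count-like features, such as word counts. The tiny dataset below is hypothetical: each row holds the counts of two made-up words, "free" and "meeting", and the labels "spam" and "ham" are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts: columns = counts of ["free", "meeting"]
x_training_data = np.array([[3, 0], [2, 0], [4, 1], [0, 3], [0, 2], [1, 4]])
y_training_data = np.array(["spam", "spam", "spam", "ham", "ham", "ham"])

your_model = MultinomialNB()
your_model.fit(x_training_data, y_training_data)

your_x_data = np.array([[5, 0], [0, 5]])
predictions = your_model.predict(your_x_data)
probabilities = your_model.predict_proba(your_x_data)

print(your_model.classes_)  # column order used by predict_proba
print(predictions)
print(probabilities)        # each row sums to 1
```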

K-nearest neighbors (KNN) in Scikit-Learn

Import and create the model:

from sklearn.neighbors import KNeighborsClassifier
your_model = KNeighborsClassifier()

Fit:

your_model.fit(x_training_data, y_training_data)

Predict:

# Returns an array of predicted classes - one prediction for every data point
predictions = your_model.predict(your_x_data)
# For every data point, returns an array of probabilities, one per class,
# in the order given by your_model.classes_
probabilities = your_model.predict_proba(your_x_data)
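
A small sketch of the full KNN workflow follows. The two well-separated groups of points are made up for illustration, and n_neighbors=3 is an arbitrary choice here (the scikit-learn default is 5). With uniform weights, predict_proba reports the fraction of the nearest neighbors that belong to each class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up 2D points forming two well-separated groups
x_training_data = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                            [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
y_training_data = np.array([0, 0, 0, 1, 1, 1])

# n_neighbors=3 is an illustrative choice, not a recommendation
your_model = KNeighborsClassifier(n_neighbors=3)
your_model.fit(x_training_data, y_training_data)

your_x_data = np.array([[1.2, 1.1], [8.8, 8.4]])
predictions = your_model.predict(your_x_data)
probabilities = your_model.predict_proba(your_x_data)

print(predictions)
print(probabilities)
```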

K-means clustering in Scikit-Learn

Import and create the model:

from sklearn.cluster import KMeans
your_model = KMeans(n_clusters=4, init='random')
  • n_clusters: number of clusters to form and number of centroids to generate
  • init: method for initialization
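    • k-means++: picks initial centroids that are spread apart, which usually speeds up convergence [default]
    • random: picks the initial centroids at random from the data points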
  • random_state: the seed used by the random number generator [optional]

Fit:

your_model.fit(x_training_data)

Predict:

predictions = your_model.predict(your_x_data)
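
The sketch below runs K-means on two made-up blobs of points. The data, n_clusters=2, and n_init=10 are illustrative assumptions. Because cluster numbering is arbitrary, the example compares labels with each other rather than against fixed values.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two made-up blobs, centered near (1, 1) and (9, 9)
x_training_data = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
                            [9.0, 9.0], [9.2, 8.8], [8.8, 9.2]])

# random_state makes the run repeatable; n_init sets how many restarts are tried
your_model = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
your_model.fit(x_training_data)

labels = your_model.labels_            # cluster index assigned to each training point
centers = your_model.cluster_centers_  # one centroid per cluster

# Assign new points to the nearest learned centroid
predictions = your_model.predict(np.array([[1.1, 0.9], [9.1, 9.1]]))
print(labels, centers, predictions)
```

Since K-means is unsupervised, fit takes only the x data; there are no labels to pass in.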

Validating a machine learning model

Import and print accuracy, recall, precision, and F1 score:

from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
print(accuracy_score(true_labels, guesses))
print(recall_score(true_labels, guesses))
print(precision_score(true_labels, guesses))
print(f1_score(true_labels, guesses))

Import and print the confusion matrix:

from sklearn.metrics import confusion_matrix
print(confusion_matrix(true_labels, guesses))
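
To make the metrics concrete, here is a small worked sketch with made-up binary labels. It contains 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four scores come out to 0.75.

```python
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, confusion_matrix)

# Made-up binary labels for illustration
true_labels = [1, 0, 1, 1, 0, 1, 0, 0]
guesses     = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(true_labels, guesses)    # (TP + TN) / total = 6/8
recall = recall_score(true_labels, guesses)        # TP / (TP + FN) = 3/4
precision = precision_score(true_labels, guesses)  # TP / (TP + FP) = 3/4
f1 = f1_score(true_labels, guesses)                # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
matrix = confusion_matrix(true_labels, guesses)
print(accuracy, recall, precision, f1)
print(matrix)
```

For labels with more than two classes, recall_score, precision_score, and f1_score need an explicit average argument (for example average="macro"), because their default of average="binary" only applies to binary targets.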

Splitting data into training and test sets

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, test_size=0.2)
  • train_size: the proportion of the dataset to include in the train split
  • test_size: the proportion of the dataset to include in the test split
  • random_state: the seed used by the random number generator [optional]
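
The following sketch shows the split on a made-up dataset of 10 samples, and how fixing random_state makes the shuffle repeatable. The array contents and the seed value 42 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each
x = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.8, test_size=0.2, random_state=42
)
print(x_train.shape, x_test.shape)  # 8 training rows, 2 test rows

# Reusing the same seed reproduces the same split
x_train_again, x_test_again, _, _ = train_test_split(
    x, y, train_size=0.8, test_size=0.2, random_state=42
)
print(np.array_equal(x_test, x_test_again))
```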

Conclusion

Scikit-Learn provides a powerful and user-friendly framework for implementing machine learning models in Python. From regression and classification to clustering and model validation, it simplifies complex tasks with efficient built-in functions. Whether you’re training models, making predictions, or evaluating performance, Scikit-Learn equips you with the tools needed to build robust machine learning applications.

If you’re interested in learning more about Scikit-Learn and its applications in machine learning, please check out our AI Catalog of articles!

Happy Coding!

Codecademy Team
