Key Concepts

Review core concepts you need to learn to master this subject

Spark Overview

Spark is a framework designed to process large amounts of data. Originally created to build data pipelines for machine learning (ML) workloads, Spark can query, transform, and analyze big data across a variety of data systems.

RDDs with PySpark
Lesson 1 of 1
  1. Apache Spark is a framework that allows us to work with big data. But how do we tell Spark what to do with our data? In this lesson, we’ll get familiar with using PySpark (the Python API for Spark)…
  2. The entry point to Spark is called a SparkSession. There are many possible configurations for a SparkSession, but for now, we will simply start a new session and save it as spark: from pyspark…
  3. RDDs may seem more complicated than DataFrames, but we can also manipulate RDDs using Spark transformations. Transformations are functions that take an RDD as input and will always output a new…
  4. You may have noticed that the transformation executed rather quickly! That’s because it didn’t execute at all. Unlike transformations in Pandas, which we call eager, transformations in Spark ar…
  5. The reduce() function we used previously is a powerful aggregation tool, but there are limitations to the operations it can apply to RDDs. Namely, the function passed to reduce() must be commutative and associative…
  6. By now we’ve talked endlessly about the benefits of distributing our data across multiple nodes and allowing for parallel processing, but what happens when we don’t want our data to be distributed?…
  7. You’ve broadcast a dictionary over to your nodes, and everything went well! You’re now curious as to how many east versus west coast entries there are. We could attempt to create a couple variabl…
  8. Congratulations! You’ve just finished your first coding adventure with PySpark! In this lesson, we learned that: * RDDs are the foundational data structure of Spark * RDDs are fault-tolerant, parti…

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory
