Congratulations! You’ve just finished your first coding adventure with PySpark! In this lesson, we learned that:

  • RDDs are the foundational data structure of Spark
  • RDDs are fault-tolerant, partitioned, and operated on in parallel
  • Transformations are lazy and do not execute until an action is called
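
To make the last point concrete, here is a minimal plain-Python sketch of lazy evaluation. It is an analogy only, not Spark's API: the `LazyPipeline` class and the `double` helper are hypothetical names invented for illustration. Transformations merely record a step, and nothing runs until the `collect()` action is called.

```python
class LazyPipeline:
    """Toy stand-in for an RDD (hypothetical, not part of PySpark)."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: return a new pipeline with one more recorded step.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also only recorded, never executed here.
        return LazyPipeline(self._data, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: replay every recorded step now and return the result.
        items = self._data
        for kind, fn in self._steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items


calls = []


def double(x):
    calls.append(x)  # record each time the function actually runs
    return x * 2


pipeline = LazyPipeline([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # no work has been done yet
result = pipeline.collect()       # the action triggers double() on every element
```

Real RDDs follow the same pattern at cluster scale: `map` and `filter` build up a plan, and an action such as `collect()` or `count()` is what launches the job.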

We also learned how to:

  • Transform and summarize RDDs with transformations and actions
  • Send information to all nodes with broadcast variables
  • Debug jobs by counting events across nodes with accumulator variables
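
The three skills above can be combined in one short sketch. It assumes an already-running `SparkContext` is passed in as `sc`; the function name `total_by_region` and the sample data layout are hypothetical, chosen only for illustration. The calls it relies on (`sc.broadcast`, `sc.accumulator`, `sc.parallelize`, `map`, `reduceByKey`, `collect`) are standard PySpark RDD methods.

```python
def total_by_region(sc, records, region_names):
    """Sum amounts per region name.

    sc           -- an existing SparkContext (assumed to be supplied by the caller)
    records      -- list of (region_code, amount) pairs (hypothetical layout)
    region_names -- small dict mapping region_code -> readable name
    """
    # Broadcast variable: ship the small lookup table to every node once.
    names = sc.broadcast(region_names)
    # Accumulator: count records whose code is missing from the lookup.
    unknown = sc.accumulator(0)

    def to_named_pair(record):
        code, amount = record
        if code not in names.value:
            unknown.add(1)  # tasks can only add; the driver reads the total
            return ("unknown", amount)
        return (names.value[code], amount)

    totals = (
        sc.parallelize(records)
        .map(to_named_pair)                # transformation (lazy)
        .reduceByKey(lambda a, b: a + b)   # transformation (lazy)
        .collect()                         # action: runs the job
    )
    # Read the accumulator only after an action has run.
    return dict(totals), unknown.value
```

In a local session, `sc` would typically come from `SparkSession.builder.getOrCreate().sparkContext`. Because accumulator updates made inside a transformation can be applied more than once if a task is re-run, the count is best treated as a debugging aid rather than an exact result.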
