Congratulations! You’ve just finished your first coding adventure with PySpark! In this lesson, we learned that:

  • RDDs are the foundational data structure of Spark
  • RDDs are fault-tolerant, partitioned, and operated on in parallel
  • Transformations are lazy and do not execute until an action is called

We also learned how to:

  • Transform and summarize RDDs with transformations and actions
  • Send information to all nodes with broadcast variables
  • Debug jobs by counting events across nodes with accumulator variables

