This content is from the fall 2016 version of this course. Please refer to the most recent version of the course instead.

cm018 - November 23, 2016

Overview

  • Illustrate the split-apply-combine analytical pattern
  • Define parallel processing
  • Introduce Hadoop and Spark as distributed computing platforms
  • Introduce the sparklyr package
  • Demonstrate how to use sparklyr for machine learning using the Titanic data set

To do for Monday

  • Final projects

This work is licensed under the CC BY-NC 4.0 Creative Commons License.