Module 2

Overview | Module 2

Welcome to Module 2 | Exploratory Data Science! Here, you will become cloud native: able to do everything you currently do on-premises (and more). You will also start your team project work and begin exploratory data analysis.

Specifically, this module will focus on developing key skills and toolsets needed to scale analysis using cloud computing. We will discuss reproducible research, provide an introduction to cloud computing tools on AWS, expand upon fundamentals of parallel computing, learn about finding and using data on the cloud, and begin our team project development.

By the end of this module, you will be familiar with and conversant in the following areas:

  • Reproducible research tools, techniques, and tactics.
  • The key building blocks of cloud computing.
  • Data input/output on the cloud.
  • The basics of data manipulation.

Specifically, by the end of the module, you will have accomplished the following:

  • Launched, committed to, and collaborated on development in a GitLab repository for a project.
  • Used Overleaf and linked Overleaf to GitLab.
  • Built and used a container.
  • Managed running containers in a cloud environment.
  • Managed credentials on AWS.
  • Deployed and used an S3 bucket and an EC2 instance.
  • Imported and exported data to, from, and across cloud resources.
  • Deployed code in parallel across a single compute instance.
  • Developed a team project research question.
  • Presented an exploratory data analysis of the data used to answer that research question.