Module 2

Syllabus | Module 2

Day 1 | Monday, March 4, 2024

Module 2 Introduction (45 min)

0900-0945 March 4, 2024

Content | Zoom | Recording

  • Welcome Back
  • Interim Check In
  • Where we are in Earth Systems Data Science in the Cloud
  • Course Goals and Objectives
  • Module Goals and Objectives
  • Course Logistics

Going to Cloud9 (60 min)

1000-1100 March 4, 2024

Content | Zoom | Recording

  • Introduction to Cloud9
  • AWS Credentials Management
  • Connecting Git
  • Installing Packages

To the Cloud | Introduction to AWS (60 min)

1100-1200 March 4, 2024

Content | Zoom | Recording

  • Intro to AWS
  • AWS Services:
    • EC2
    • S3
    • SageMaker
    • Lambda

Lunch and Learn

1200-1300 March 4, 2024

Zoom

  • Individual and Team Progress Check In

Setting up a Team Project Repo (60 min)

1300-1400 March 4, 2024

Content | Zoom | Recording

  • Initializing GitLab Repos
  • Collaborations
  • Issues & Branching
  • SSH Setup

Day 2 | Tuesday, March 5, 2024

Input/Output (I/O) (60 min)

0900-1000 March 5, 2024

Content | Zoom | Recording

  • Data Formats
  • I/O on the Cloud
  • Foundations for Performant Data Science

Team Project Check In (30 min)

1000-1030 March 5, 2024

Content | Zoom | Recording

  • Team Name
  • Project Ideation
  • Project Idea Curation

Containers: Reproducible Computing Environments (60 min)

1030-1130 March 5, 2024

Content | Zoom | Recording

  • Containers & Containerization
  • Dependency Management
  • Deployment, Use, and Sharing
  • Using Containers on Cloud9

Rich Signell | Visiting Speaker (60 min)

1130-1230 March 5, 2024

Content | Zoom | Recording

  • Pangeo: A community platform for open, reproducible, and scalable geoscience

Lunch and Learn

1230-1300 March 5, 2024

Zoom

  • Individual and Team Progress Check In

Programmatic Cloud Access (60 min)

1300-1400 March 5, 2024

Content | Zoom | Recording

  • AWS CLI
  • Boto3
  • Other tools

Team Project Play (60 min)

1400-1500 March 5, 2024

Content | Zoom | Recording

  • Intro to Exploratory Data Analysis
  • Finding Data
  • Getting Data on the Cloud

Day 3 | Wednesday, March 6, 2024

Team Project Work (120 min)

0900-1100 March 6, 2024

Zoom | Recording

  • Review from Yesterday
  • Finding Data
  • Getting Data on the Cloud
  • Exploratory Data Analysis

Managing Containers (60 min)

1100-1200 March 6, 2024

Content | Zoom | Recording

  • Finding Running Containers
  • Entering Running Containers
  • Removing Running Containers

Lunch and Learn

1200-1300 March 6, 2024

Zoom

  • EDA Questions and Check In

I/O in Python (60 min)

1300-1400 March 6, 2024

Content | Zoom | Recording

  • Lazy Loading Constructs
  • Tabular Data
  • Gridded Data

Overleaf & LaTeX: Production Publishing (60 min)

1400-1500 March 6, 2024

Content | Zoom | Recording

  • Overleaf Introduction
  • Overleaf Configuration
  • Accelerating Collaboration & Publication

Day 4 | Thursday, March 7, 2024

Introduction to Data Cleaning (60 min)

0900-1000 March 7, 2024

Content | Zoom

  • Grammar of Data
  • Order of Operations
  • Time Complexity (Big O Notation)
  • Functional Programming (Mapping)
  • Building a Pipeline
  • Troubleshooting
  • Performance

Parallel Computing in Python | Single Machine (120 min)

1000-1200 March 7, 2024

Content | Zoom

  • Paradigms
  • Templates
  • Multiprocessing, Polars, Dask

Lunch and Learn

1200-1300 March 7, 2024

Zoom

  • Team Project Check In

Team Presentation Training (120 min)

1300-1500 March 7, 2024

Content | Zoom

  • Presentation Goals
  • Clear Communication
  • Primacy, Frequency, and Recency
  • Body Language
  • START Method
  • Training

Day 5 | Friday, March 8, 2024

Team Presentations | EDA (120 min)

0900-1100 March 8, 2024

Zoom | Recording

  • Capstone Presentations and Feedback

Module Wrap Up (30 min)

1100-1130 March 8, 2024

Content | Zoom | Recording

  • Closing
  • Architecting Data Product Development Based on EDA
  • Interim Period
  • Next Steps