Syllabus | Module 2

Day 1 | Monday, February 10, 2025

Module 2 Introduction (45 min)

0900-0945 February 10, 2025

Content | Word | PDF | Zoom | Recording

  • Welcome Back
  • Interim Check In
  • Where we are in Earth Systems Data Science in the Cloud
  • Course Goals and Objectives
  • Module Goals and Objectives
  • Course Logistics

Coding on the Cloud (60 min)

1000-1100 February 10, 2025

Content | Word | PDF | Zoom | Recording

  • Introduction to Coder
  • Configuration
  • AWS Credentials Management
  • Connecting Git
  • Installing Packages

To the Cloud | Introduction to AWS (60 min)

1100-1200 February 10, 2025

Content | Word | PDF | Zoom | Recording

  • Intro to AWS
  • AWS Services:
    • EC2
    • S3
    • SageMaker
    • Lambda

Lunch and Learn

1200-1300 February 10, 2025

Zoom

  • Individual and Team Progress Check In

Beginning a Project (30 min)

1300-1330 February 10, 2025

Content | Word | PDF | Zoom | Recording

  • Defining a Project
  • Choosing a Language
  • Finding Data
  • Accessing Data
  • Introduction to Data Formats

Setting up a Team Project Repo (60 min)

1330-1430 February 10, 2025

Content | Word | PDF | Zoom | Recording

  • Initializing GitLab Repos
  • Collaborations
  • Issues & Branching
  • SSH Setup

Day 2 | Tuesday, February 11, 2025

Input/Output (I/O) (60 min)

0900-1000 February 11, 2025

Content | Word | PDF | Zoom | Recording

  • Data Formats
  • I/O on the Cloud
  • Foundations for Performant Data Science

Team Project Check In (30 min)

1000-1030 February 11, 2025

Content | Word | PDF | Zoom | Recording

  • Team Name
  • Project Ideation
  • Project Idea Curation

Containers: Reproducible Computing Environments (60 min)

1030-1130 February 11, 2025

Content | Word | PDF | Zoom | Recording

  • Containers & Containerization
  • Dependency Management
  • Deployment, Use, and Sharing
  • Using Containers on Cloud9

Foundations of Parallel Computing (30 min)

1130-1200 February 11, 2025

Content | Word | PDF | Zoom | Recording

  • What is parallel computing?
  • Units of parallelization
  • MapReduce
  • Why MapReduce changed the world
  • The two key types of Parallel Computing

Lunch and Learn

1200-1300 February 11, 2025

Zoom

  • Individual and Team Progress Check In

Programmatic Cloud Access (60 min)

1300-1400 February 11, 2025

Content | Word | PDF | Zoom | Recording

  • AWS CLI
  • Boto3
  • Other tools

Overleaf & LaTeX: Production Publishing (30 min)

1400-1430 February 11, 2025

Content | Word | PDF | Zoom

  • Overleaf Introduction
  • Overleaf Configuration
  • Accelerating collaboration & publication

Team Project Play (60 min)

1430-1530 February 11, 2025

Content | Word | PDF | Zoom | Recording

  • Intro to Exploratory Data Analysis
  • Finding Data
  • Getting Data on the Cloud

Day 3 | Wednesday, February 12, 2025

Team Project Work (90 min)

0900-1030 February 12, 2025

Zoom

  • Review from Yesterday
  • Finding Data
  • Getting Data on the Cloud
  • Exploratory Data Analysis

Managing Containers (90 min)

1030-1200 February 12, 2025

Content | Word | PDF | Zoom | Recording

  • Finding Running Containers
  • Entering Running Containers
  • Removing Running Containers

Lunch and Learn

1200-1300 February 12, 2025

Zoom

  • EDA Questions and Check In

Team Time / NCICS Meeting (60 min)

1300-1400 February 12, 2025

  • NCICS Staff Meeting
  • Team Time for Non-NCICS participants

Introduction to DataViz in Python (60 min)

1400-1500 February 12, 2025

Content | Word | PDF | Zoom | Recording

  • Base principles of DataViz
  • Intro to Matplotlib
  • Matplotlib API

Day 4 | Thursday, February 13, 2025

Introduction to Data Cleaning (60 min)

0900-1000 February 13, 2025

Content | Word | PDF | Zoom | Recording

  • Grammar of Data
  • Order of Operations
  • Time Complexity (Big O Notation)
  • Functional Programming (Mapping)
  • Building a Pipeline
  • Troubleshooting
  • Performance

Parallel Computing in Python | Single Machine (120 min)

1000-1200 February 13, 2025

Content | Word | PDF | Zoom | Recording

  • Paradigms
  • Templates
  • Arrays & Tabular Data

Lunch and Learn

1200-1300 February 13, 2025

Zoom

  • Team Project Check In

Team Presentation Training (120 min)

1300-1500 February 13, 2025

Content | Word | PDF | Zoom | Recording

  • Presentation Goals
  • Clear Communication
  • Primacy, Frequency, and Recency
  • Body Language
  • START Method
  • Training

Day 5 | Friday, February 14, 2025

Team Presentations | EDA (120 min)

0900-1100 February 14, 2025

Zoom

  • Capstone Presentations and Feedback

Module Wrap Up (30 min)

1100-1130 February 14, 2025

Content | Word | PDF | Zoom

  • Closing
  • Architecting Data Product Development based on EDA
  • Interim Period
  • Next Steps