Syllabus | Module 2
Day 1 | Monday, February 10, 2025
Module 2 Introduction (45 min)
- Welcome Back
- Interim Check In
- Where we are in Earth Systems Data Science in the Cloud
- Course Goals and Objectives
- Module Goals and Objectives
- Course Logistics
Coding on the Cloud (60 min)
- Introduction to Coder
- Configuration
- AWS Credentials Management
- Connecting Git
- Installing Packages
To the Cloud | Introduction to AWS (60 min)
- Intro to AWS
- AWS Services:
  - EC2
  - S3
  - SageMaker
  - Lambda
Lunch and Learn
1200-1300 February 10, 2025
- Individual and Team Progress Check In
Beginning a Project (30 min)
- Defining a Project
- Choosing a Language
- Finding Data
- Accessing Data
- Introduction to Data Formats
Setting up a Team Project Repo (60 min)
- Initializing GitLab Repos
- Collaborations
- Issues & Branching
- SSH Setup
Day 2 | Tuesday, February 11, 2025
Input/Output (I/O) (60 min)
- Data Formats
- I/O on the Cloud
- Foundations for Performant Data Science
Team Project Check In (30 min)
- Team Name
- Project Ideation
- Project Idea Curation
Containers: Reproducible Computing Environments (60 min)
- Containers & Containerization
- Dependency Management
- Deployment, Use, and Sharing
- Using Containers on Cloud9
Foundations of Parallel Computing (30 min)
- What Is Parallel Computing?
- Units of Parallelization
- MapReduce
- Why MapReduce Changed the World
- The Two Key Types of Parallel Computing
Lunch and Learn
1200-1300 February 11, 2025
- Individual and Team Progress Check In
Programmatic Cloud Access (60 min)
- AWS CLI
- Boto3
- Other tools
Overleaf & LaTeX: Production Publishing (30 min)
- Overleaf Introduction
- Overleaf Configuration
- Accelerating Collaboration & Publication
Team Project Play (60 min)
- Intro to Exploratory Data Analysis
- Finding Data
- Getting Data on the Cloud
Day 3 | Wednesday, February 12, 2025
Team Project Work (90 min)
0900-1030 February 12, 2025
- Review from Yesterday
- Finding Data
- Getting Data on the Cloud
- Exploratory Data Analysis
Managing Containers (90 min)
- Finding Running Containers
- Entering Running Containers
- Removing Running Containers
Lunch and Learn
1200-1300 February 12, 2025
- EDA Questions and Check In
Team Time / NCICS Meeting (60 min)
1300-1400 February 12, 2025
- NCICS Staff Meeting
- Team Time for Non-NCICS participants
Introduction to DataViz in Python (60 min)
- Basic Principles of DataViz
- Intro to Matplotlib
- Matplotlib API
Day 4 | Thursday, February 13, 2025
Introduction to Data Cleaning (60 min)
- Grammar of Data
- Order of Operations
- Time Complexity (Big O Notation)
- Functional Programming (Mapping)
- Building a Pipeline
- Troubleshooting
- Performance
Parallel Computing in Python | Single Machine (120 min)
- Paradigms
- Templates
- Arrays & Tabular Data
Lunch and Learn
1200-1300 February 13, 2025
- Team Project Check In
Team Presentation Training (120 min)
- Presentation Goals
- Clear Communication
- Primacy, Frequency, and Recency
- Body Language
- START Method
- Training
Day 5 | Friday, February 14, 2025
Team Presentations | EDA (120 min)
0900-1100 February 14, 2025
- Capstone Presentations and Feedback
Module Wrap Up (30 min)
- Closing
- Architecting Data Product Development based on EDA
- Interim Period
- Next Steps