Module 3 Syllabus
Upcoming Cohort
Cohort 4 of the Earth Systems Data Science in the Cloud Course will go through Module 2 the week of October 21, 2024. The information below contains the detailed course materials from the previous cohort and will be updated with the Cohort 4 schedule shortly.
Day 1 | Monday, March 25, 2024
Module 3 Introduction (30 min)
- Welcome Back
- Interim Check In
- Where we are in Earth Systems Data Science in the Cloud
- Course Goals and Objectives
- Module Goals and Objectives
- Course Logistics
AI/ML Overview (60 min)
- All of Statistics
- Understanding & Prediction
- Bayesian & Frequentist
- Machine Learning & AI
- Unsupervised, Semi-Supervised, Supervised Machine Learning
- Classification and Regression
- Deep Learning and Not-so-Deep Learning
- Types of data: Image, Text, Gridded, Tabular
- Manual and Automated ML
Demo Day | Parallelization & ML (90 min)
- Machine Learning: The Bad, the Good, and the Awesome!
- Scale: Data Processing, Training, and Inference
Lunch and Learn
- Individual and Team Progress Check In
Model Development Workflow (90 min)
- Goals
- Metrics
- Training and Testing Data
- Preparing your data
- Information Leakage
- Order of Operations
- Training
- Cross validation!
- Optimization
- Fitting
- Out of Sample Performance
- Workflow Evaluation
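The order of operations above can be illustrated with a small, from-scratch Python sketch (not course material): hold out a test set first, fit preprocessing on training folds only so no information leaks from validation data, cross-validate, and score the test set once at the end. The helper names (`train_test_split`, `fit_standardizer`, `fit_line`, `k_fold_indices`, `cross_validated_mse`) and the noise-free toy data are hypothetical; in practice scikit-learn's `Pipeline` combined with `cross_val_score` handles the same bookkeeping.

```python
import random
from statistics import mean, pstdev


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out a test set before any preprocessing is fit."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_standardizer(values):
    """Learn scaling parameters; call this on training values only."""
    sigma = pstdev(values) or 1.0  # guard against a constant feature
    return mean(values), sigma


def apply_standardizer(values, params):
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def cross_validated_mse(xs, ys, k=5):
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        # Fit the scaler on the training fold only, then apply it to the validation fold.
        params = fit_standardizer([xs[i] for i in train])
        x_tr = apply_standardizer([xs[i] for i in train], params)
        x_va = apply_standardizer([xs[i] for i in val], params)
        a, b = fit_line(x_tr, [ys[i] for i in train])
        scores.append(mean([(a + b * x - ys[i]) ** 2 for x, i in zip(x_va, val)]))
    return mean(scores)


# Toy, noise-free data (hypothetical): y = 3x + 2.
rows = [(float(x), 3.0 * x + 2.0) for x in range(50)]
train, test = train_test_split(rows)
xs, ys = [r[0] for r in train], [r[1] for r in train]
cv_mse = cross_validated_mse(xs, ys, k=5)

# Only after model selection: refit on all training data and score the test set once.
params = fit_standardizer(xs)
a, b = fit_line(apply_standardizer(xs, params), ys)
test_x = apply_standardizer([r[0] for r in test], params)
test_mse = mean([(a + b * x - r[1]) ** 2 for x, r in zip(test_x, test)])
```

The test set is touched exactly once; anything computed from it earlier (even a mean for scaling) would make the out-of-sample score optimistic.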
Team Project Module Goals (30 min)
- Team Project Check-In
- Team Project Goals
- Presentation
- Report
Team Project Work - Data Processing (60 min)
- Data Preprocessing
Day 2 | Tuesday, March 26, 2024
Feature Engineering (60 min)
- Transformation
- Imputation
- Actual transformation
- One-hot encoding
- Outliers
- Scaling
- Extraction
- Dimension Reduction
- Clustering
- Binning
- Creation
- Indexing
- Orthogonal Matching
- Lagging
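A few of the feature engineering operations listed above (imputation, one-hot encoding, scaling, binning, lagging) can be sketched in plain Python as follows. The function names and the temperature and land-cover values are hypothetical, and mean imputation and min-max scaling are just one choice among several for each step.

```python
from statistics import mean


def impute_mean(values):
    """Imputation: replace missing values (None) with the mean of observed values."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """One-hot encoding: one 0/1 column per distinct category, in sorted order."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]


def min_max_scale(values):
    """Scaling to [0, 1]; in a real workflow, take min/max from training data only."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def bin_values(values, edges):
    """Binning: index of the first edge a value falls below, or len(edges) if none."""
    return [next((i for i, e in enumerate(edges) if v < e), len(edges)) for v in values]


def lag(values, k=1):
    """Lagging: shift a time series k steps; the first k entries have no history (None)."""
    n = len(values)
    return [None] * min(k, n) + values[: max(n - k, 0)]


# Hypothetical example data.
temps = [12.0, None, 15.0, 18.0]
land_cover = ["forest", "urban", "forest", "water"]

filled = impute_mean(temps)              # [12.0, 15.0, 15.0, 18.0]
scaled = min_max_scale(filled)           # [0.0, 0.5, 0.5, 1.0]
levels, encoded = one_hot(land_cover)    # ['forest', 'urban', 'water']
bins = bin_values(filled, [14.0, 17.0])  # [0, 1, 1, 2]
lagged = lag(filled, 1)                  # [None, 12.0, 15.0, 15.0]
```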
Multi-Worker Parallelization | Dask (60 min)
- Starting a Cluster
- Running a Cluster
- Troubleshooting a Cluster
- Shutting Down a Cluster
Building Blocks of Machine Learning (60 min)
- Linear Regression
- Logistic Regression
- K-Means Clustering
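As a sketch of K-Means Clustering, Lloyd's algorithm alternates an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its points) until the centroids stop changing. The fixed initial centroids below are an assumption made for reproducibility; library implementations such as scikit-learn's `KMeans` default to k-means++ initialization instead.

```python
from math import dist


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids


# Two well-separated hypothetical groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
```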
Lunch and Learn
- Machine Learning Background Discussion
Machine Learning Metrics (30 min)
- What they mean and how to interpret/implement them
- Classification Metrics
- Regression Metrics
- Bias Variance Tradeoff
- Statistical modeling criteria
- Unbalanced data considerations
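The classification and regression metrics above, and why unbalanced data needs care, can be illustrated with a short Python sketch. On a hypothetical dataset with 95 negatives and 5 positives, a model that always predicts "negative" scores 95% accuracy yet has zero recall. Returning 0.0 when a ratio is undefined is a convention chosen here, not a universal rule.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # undefined ratio reported as 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))  # penalizes large errors more
    return {"mae": mae, "rmse": rmse}


# Unbalanced classes: 95 negatives, 5 positives, and a model that always says 0.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100
clf = classification_metrics(y_true, always_negative)

# One large miss raises RMSE well above MAE.
reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0])
```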
Team Project Time
- Feature Engineering
Day 3 | Wednesday, March 27, 2024
The Tool Landscape (30 min)
- scikit-learn
- Darts
- Deep Learning: TensorFlow, PyTorch, Keras
- Hugging Face
- AutoML Frameworks (AutoGluon, SageMaker Canvas)
Serverless Multi-Worker Parallelization (150 min)
- Container Orchestration
- Cloud services
- Servers vs Serverless
- Lambda, Fargate, Lithops, Coiled, Modal
Lunch and Learn
Explainable AI (60 min)
- What is XAI?
- Principles
- Applications
- Techniques
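One widely used model-agnostic XAI technique is permutation feature importance: shuffle one feature column, re-score the model, and treat the increase in error as that feature's importance. The sketch below is a from-scratch version using mean squared error; the toy model and data are hypothetical, and scikit-learn offers a maintained implementation in `sklearn.inspection.permutation_importance`. In practice importances should be computed on held-out data.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j; all other features keep their original values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model that uses feature 0 and ignores feature 1.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(lambda row: 2.0 * row[0], X, y)
```

Because the toy model ignores feature 1, shuffling it leaves every prediction unchanged, so its importance is exactly zero.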
Team Project Work
- Beginning ML
Day 4 | Thursday, March 28, 2024
Cross Validation (60 min)
- Best Practices
- Strategies
- Validating across Space and Time
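Validating across space and time usually means replacing random folds with splits that respect dependence in the data: forward chaining (train on the past, validate on the next block) for time series, and group or block splits so that samples from the same region never appear in both training and validation. The sketch below assumes integer sample indices and hypothetical region labels; it assigns groups to folds round-robin, which does not balance fold sizes. scikit-learn's `TimeSeriesSplit` and `GroupKFold` provide maintained versions of both ideas.

```python
def forward_chaining_splits(n, n_splits, min_train):
    """Expanding-window splits for time-ordered samples: train on the past, validate on the next block."""
    block = (n - min_train) // n_splits
    if block < 1:
        raise ValueError("not enough samples for the requested splits")
    for s in range(n_splits):
        end = min_train + s * block
        yield list(range(end)), list(range(end, end + block))


def group_k_fold(groups, k):
    """Spatial blocking: all samples sharing a group label (e.g. a region) land in the same fold."""
    fold_of = {g: i % k for i, g in enumerate(sorted(set(groups)))}
    for f in range(k):
        train = [i for i, g in enumerate(groups) if fold_of[g] != f]
        val = [i for i, g in enumerate(groups) if fold_of[g] == f]
        yield train, val


# Time: 10 ordered samples, at least 4 in the first training window.
time_splits = list(forward_chaining_splits(10, n_splits=3, min_train=4))
# Space: samples tagged with hypothetical region IDs.
regions = ["A", "A", "B", "B", "C", "C", "D"]
space_splits = list(group_k_fold(regions, k=2))
```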
Team Project Development (120 min)
- Report development
- Presentation development
Lunch and Learn
- Team Project Check In
ML Algorithms and Approaches (90 min)
- Unsupervised
- Semi-supervised
- Supervised
Team Presentation Training (90 min)
- Clear Communication
- Body Language
- Storytelling
- Methods Focus
- Takeaways
Day 5 | Friday, March 29, 2024
Team Presentations | DPD (120 min)
- Capstone Presentations and Feedback
Module Wrap Up (30 min)
- Closing
- Moving to Model Development
- Interim Period
- Next Steps