Syllabus | Module 3
Day 1 | Monday, March 3, 2025
Module 3 Introduction (30 min)
- Welcome Back
- Interim Check-In
- Where we are in Earth Systems Data Science in the Cloud
- Course Goals and Objectives
- Module Goals and Objectives
- Course Logistics
AI/ML Overview (60 min)
- All of Statistics
- Understanding & Prediction
- Bayesian & Frequentist
- Machine Learning & AI
- Unsupervised, Semi-Supervised, Supervised Machine Learning
- Classification and Regression
- Deep Learning and Not-so-Deep Learning
- Types of Data: Image, Text, Gridded, Tabular
- Manual and Automated ML
Demo Day | Parallelization & ML (90 min)
- Machine Learning: The Bad, the Good, and the Awesome!
- Scale: Data Processing, Training, and Inference
Lunch and Learn
- Individual and Team Progress Check-In
Model Development Workflow (90 min)
- Goals
- Metrics
- Training and Testing Data
- Preparing Your Data
- Information Leakage
- Order of Operations
- Training
- Cross-Validation!
- Optimization
- Fitting
- Out of Sample Performance
- Workflow Evaluation
Team Project Module Goals (30 min)
- Team Project Check-In
- Team Project Goals
- Presentation
- Report
Team Project Work - Data Processing (60 min)
- Data Preprocessing
Day 2 | Tuesday, March 4, 2025
Feature Engineering (60 min)
- Transformation
- Imputation
- Actual Transformation
- One-Hot Encoding
- Outliers
- Scaling
- Extraction
- Dimension Reduction
- Clustering
- Binning
- Creation
- Indexing
- Orthogonal Matching
- Lagging
Multi-Worker Parallelization | Dask (60 min)
- Starting a Cluster
- Running a Cluster
- Troubleshooting a Cluster
- Shutting Down a Cluster
Building Blocks of Machine Learning (60 min)
- Linear Regression
- Logistic Regression
- K-Means Clustering
Lunch and Learn
- Machine Learning Background Discussion
Machine Learning Metrics (30 min)
- What they mean and how to interpret/implement them
- Classification Metrics
- Regression Metrics
- Bias-Variance Tradeoff
- Statistical Modeling Criteria
- Imbalanced Data Considerations
Team Project Time
- Feature Engineering
Day 3 | Wednesday, March 5, 2025
The Tool Landscape (30 min)
- scikit-learn
- Darts
- Deep Learning: TensorFlow, PyTorch, Keras
- Hugging Face
- AutoML Frameworks (Gluon, Canvas)
Serverless Multi-Worker Parallelization (150 min)
- Container Orchestration
- Cloud Services
- Servers vs. Serverless
- Lambda, Fargate, Lithops, Coiled, Modal
Lunch and Learn
Explainable AI (60 min)
- What is XAI?
- Principles
- Applications
- Techniques
Team Project Work
- Beginning ML
Day 4 | Thursday, March 6, 2025
Cross-Validation (60 min)
- Best Practices
- Strategies
- Validating across Space and Time
Team Project Development (120 min)
- Report Development
- Presentation Development
Lunch and Learn
- Team Project Check-In
ML Algorithms and Approaches (90 min)
- Unsupervised
- Semi-supervised
- Supervised
Team Presentation Training (90 min)
- Clear Communication
- Body Language
- Storytelling
- Methods Focus
- Takeaways
Day 5 | Friday, March 7, 2025
Team Presentations | DPD (120 min)
- Capstone Presentations and Feedback
Module Wrap Up (30 min)
- Closing
- Moving to Model Development
- Interim Period
- Next Steps