Syllabus | Module 3
Day 1 | Monday, March 3, 2025
Module 3 Introduction (45 min)
- Welcome Back
- Interim Check-In
- Where we are in Earth Systems Data Science in the Cloud
- Course Goals and Objectives
- Module Goals and Objectives
- Course Logistics
AI/ML Overview (30 min)
- All of Statistics
- Understanding & Prediction
- Bayesian & Frequentist
- Machine Learning & AI
- Unsupervised, Semi-Supervised, Supervised Machine Learning
- Classification and Regression
- Deep Learning and Not-so-Deep Learning
- Types of data: Image, Text, Gridded, Tabular
- Manual and Automated ML
Demo Day | Parallelization & ML (90 min)
- Machine Learning: The Bad, the Good, and the Awesome!
- Scale: Data Processing, Training, and Inference
Lunch and Learn
1200-1300 March 3, 2025
- Individual and Team Progress Check-In
Model Development Workflow (90 min)
- Goals
- Metrics
- Training and Testing Data
- Preparing your data
- Information Leakage
- Order of Operations
- Training
- Cross validation!
- Optimization
- Fitting
- Out of Sample Performance
- Workflow Evaluation
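As an illustration of the "Information Leakage" and "Order of Operations" topics above, here is a minimal sketch in plain Python. It assumes a simple standardization step, and the helper names (`train_test_split`, `fit_scaler`, `apply_scaler`) and the temperature values are hypothetical, not course material. The point is the order: split first, learn preprocessing statistics from the training data only, then reuse them on the test data.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Learn mean and standard deviation from the given (training) values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero
    return mean, std


def apply_scaler(values, mean, std):
    """Standardize values with previously learned statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical daily temperatures (degrees C), for illustration only.
temperatures = [12.1, 13.4, 11.8, 15.2, 16.0, 14.3,
                13.9, 17.5, 18.1, 12.7, 15.8, 16.4]

train, test = train_test_split(temperatures)   # 1. split first
mean, std = fit_scaler(train)                  # 2. fit on training data only
train_scaled = apply_scaler(train, mean, std)  # 3. transform training data
test_scaled = apply_scaler(test, mean, std)    # 4. reuse training statistics
```

Fitting the scaler on all twelve values before splitting would let information from the test set influence the training features, which is the leakage this ordering is meant to avoid.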
Team Project Module Goals (30 min)
- Team Project Check-In
- Team Project Goals
- Presentation
- Report
Team Project Work - Data Processing (60 min)
1500-1600 March 3, 2025
- Data Preprocessing
Day 2 | Tuesday, March 4, 2025
Feature Engineering (60 min)
- Transformation
  - Imputation
  - Actual Transformation
  - One-hot Encoding
  - Outliers
  - Scaling
- Extraction
  - Dimension Reduction
  - Clustering
  - Binning
- Creation
  - Indexing
  - Orthogonal Matching
  - Lagging
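Two of the topics above, one-hot encoding and lagging, can be sketched in a few lines of plain Python. The function names and the sample values are hypothetical and only meant to show the shape of the output. In practice a data-frame library would typically handle both.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def lag(series, k):
    """Shift a series forward by k steps; the first k entries become None."""
    if not 0 <= k <= len(series):
        raise ValueError("k must be between 0 and len(series)")
    return [None] * k + list(series[:len(series) - k])


# Hypothetical categorical feature.
land_cover = ["forest", "urban", "water", "forest"]
categories, encoded = one_hot(land_cover)
# categories -> ['forest', 'urban', 'water']
# encoded    -> [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]

# Hypothetical daily rainfall (mm); the lag-1 feature is yesterday's value.
rainfall = [0.0, 2.5, 1.0, 4.2]
rainfall_lag1 = lag(rainfall, 1)
# rainfall_lag1 -> [None, 0.0, 2.5, 1.0]
```

The leading `None` values mark rows with no history. They would need to be dropped or imputed before model fitting.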
Machine Learning Metrics (60 min)
- What they mean and how to interpret/implement them
- Classification Metrics
- Regression Metrics
- Bias Variance Tradeoff
- Stats Modeling criteria
- Unbalanced data considerations
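To make the classification metrics and the unbalanced-data consideration concrete, the following sketch computes accuracy, precision, and recall from scratch. The labels are made up for illustration, and returning 0.0 when a denominator is zero is an assumed convention, not a standard every library follows.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical unbalanced labels: 2 positives out of 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
# metrics -> {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5}
```

Accuracy looks reasonable at 0.8 even though only half of the positive cases are found, which is why precision and recall are usually reported alongside it for unbalanced data.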
Team Project Time (60 min)
1100-1200 March 4, 2025
Lunch and Learn
1200-1300 March 4, 2025
- Machine Learning Background Discussion
Building Blocks of Machine Learning (90 min)
- Linear Regression
- Logistic Regression
- K-Means Clustering
Team Project Time (90 min)
1430-1600 March 4, 2025
- Feature Engineering
Day 3 | Wednesday, March 5, 2025
The Tool Landscape (30 min)
- scikit-learn
- Darts
- Deep Learning: TensorFlow, PyTorch, Keras
- Hugging Face
- AutoML Frameworks (Gluon, Canvas)
Serverless Multi-Worker Parallelization (150 min)
- Container Orchestration
- Cloud services
- Servers vs Serverless
- Lambda, Fargate, Lithops, Coiled, Modal
Lunch and Learn
1200-1300 March 5, 2025
NCICS Staff Meeting / Team Time (60 min)
1300-1400 March 5, 2025
- Finish Data Preprocessing
Team Project Work (120 min)
1400-1600 March 5, 2025
- Beginning ML
Day 4 | Thursday, March 6, 2025
Cross Validation (60 min)
- Best Practices
- Strategies
- Validating across Space and Time
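For validating across time, one common strategy is an expanding-window split, in which every test block comes strictly after the data used for training. The sketch below is a plain-Python illustration under that assumption. The function name and parameters are hypothetical, and spatial validation (for example, holding out whole regions) would need a different grouping scheme.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices); each test block follows its training data."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Ten time-ordered samples, three folds, two test samples per fold.
splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
# splits[0] -> ([0, 1, 2, 3], [4, 5])
# splits[1] -> ([0, 1, 2, 3, 4, 5], [6, 7])
# splits[2] -> ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])
```

Unlike a shuffled k-fold split, no fold trains on observations that occur after the ones it is tested on.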
Serverless Multi-Worker Parallelization (60 min)
- Container Orchestration
- Cloud services
- Servers vs Serverless
- Lambda, Fargate, Lithops, Coiled, Modal
Team Project Time / NCEI Team Meeting (60 min)
1100-1200 March 6, 2025
- Finish Data Preprocessing
Lunch and Learn (60 min)
1300-1400 March 6, 2025
- Team Project Check-In
Explainable AI (60 min)
- What is XAI?
- Principles
- Applications
- Techniques
ML Algorithms and Approaches (60 min)
- Unsupervised
- Semi-supervised
- Supervised
Team Presentation Training (60 min)
1500-1600 March 6, 2025
- Clear Communication
- Body Language
- Storytelling
- Methods Focus
- Takeaways
Day 5 | Friday, March 7, 2025
Team Presentations | DPD (120 min)
- Capstone Presentations and Feedback
Module Wrap Up (30 min)
- Closing
- Moving to Model Development
- Interim Period
- Next Steps