teaching_datascience – Wangkun Xu Homepage

Guest Lecturer: Data Science and Digitalization in Energy Sector, MSc in Sustainable Energy Futures, Imperial College London

Course code: MSc in Sustainable Energy Futures
Affiliation: Energy Futures Lab, Imperial College London
Level: Master
Role: Lecturing with Dr. Daphne Tuncer.
Year: 2025-2026
Description: This is a master's-level short course in which I work with students on basic data science and machine learning techniques for the energy sector through hands-on exercises. The module consists of a 2-hour lecture and a 2-hour tutorial.

Course Material

2025-2026 [lecture notes].
2025-2026 [github tutorial].

Module Aims

Equip students with a practical ML “toolkit” for energy-system data problems (forecasting, classification, anomaly detection, clustering), and the ability to reason about generalization, model selection, and data quality in real pipelines.

Intended learning outcomes (ILOs)

By the end, students should be able to:

Formulate ML tasks in energy systems (regression vs classification; supervised vs unsupervised).
Train and interpret baseline models: linear regression, logistic regression, and SVM (concept + when to use).
Apply PCA and k-means for dimensionality reduction, visualization, and clustering on unlabeled data.
Use appropriate performance metrics (MSE; TPR/FPR/TNR/FNR; F1) and explain tradeoffs.
Diagnose underfitting/overfitting, explain model capacity, and apply regularization.
Conduct hyperparameter tuning using a validation set / grid search, and justify correct data splitting.
Describe (at a high level) reinforcement learning and when it is relevant (coverage: overview only).
Understand the motivation and concept of foundation models, with emphasis on time-series foundation models and Chronos.
Identify key energy data sources (market/load/generation; weather) and articulate “garbage in, garbage out” risks.
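
As a minimal sketch of the tuning and data-splitting outcome above (not taken from the course material): the example holds out a test set before any tuning, then lets scikit-learn's GridSearchCV pick SVM hyperparameters. It uses 5-fold cross-validation on the training portion in place of a single validation set. The toy dataset, the grid values, and the split ratio are all illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy, non-linearly separable data (illustrative only, not energy data).
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

# Hold out a test set first; it is never touched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so it is re-fit on each training fold,
# which avoids leaking validation-fold statistics into the model.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical grid; make_pipeline names the SVC step "svc".
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.1, 1, 10]}

search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_             # chosen on validation folds only
val_accuracy = search.best_score_             # mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)  # reported once, at the end

print(best_params, round(val_accuracy, 3), round(test_accuracy, 3))
```

The point of the design is that the test score is computed once, after the hyperparameters are fixed, so it stays an honest estimate of generalization.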

Content outline

Part A — ML framing for energy systems
- Why ML now: renewables, inverter-dominated grids, big data availability (RTU/PMU/smart meters, etc.).
- Task taxonomy: regression vs classification; supervised vs unsupervised; RL overview.
Part B — Supervised learning essentials
- Linear regression (incl. nonlinear features / polynomial regression; MSE & least squares).
- Logistic regression (sigmoid, cross-entropy, gradient descent intuition).
- SVM (max-margin idea; convex formulation; hyperparameters; practical use via sklearn).
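
The logistic regression item in Part B can be illustrated with a short NumPy sketch, assuming plain batch gradient descent on the mean cross-entropy loss. The synthetic two-blob data, the learning rate, and the step count are illustrative assumptions, not the lecture's own example.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy; clipping avoids log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def fit_logistic(X, y, lr=0.1, n_steps=500):
    """Batch gradient descent; returns weights (bias last) and loss history."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    losses = []
    for _ in range(n_steps):
        p = sigmoid(Xb @ w)
        losses.append(cross_entropy(y, p))
        grad = Xb.T @ (p - y) / len(y)  # gradient of the mean cross-entropy
        w -= lr * grad
    return w, losses


# Synthetic two-class data: two Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, losses = fit_logistic(X, y)
Xb = np.hstack([X, np.ones((len(X), 1))])
pred = (sigmoid(Xb @ w) >= 0.5).astype(float)
accuracy = np.mean(pred == y)

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, accuracy {accuracy:.2f}")
```

With zero initial weights every predicted probability is 0.5, so the first loss equals log 2; the loss then falls as the weights move along the negative gradient.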
Part C — Unsupervised learning essentials
- PCA (dimensionality reduction, visualization; why unsupervised).
- k-means (algorithm steps; choosing k for anomaly/normal separation).
Part D — Model evaluation & generalization
- Regression metrics (MSE) and classification metrics (TPR/FPR/TNR/FNR; F1).
- Capacity, underfit vs overfit, and regularization (weight decay).
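
For the classification metrics in Part D, a small sketch of how the four rates and F1 follow from confusion-matrix counts. The helper name `binary_rates` and the toy labels are hypothetical; the function assumes both classes occur in `y_true` and that at least one positive is predicted, so no denominator is zero.

```python
import numpy as np


def binary_rates(y_true, y_pred):
    """TPR/FNR/FPR/TNR, precision and F1 for 0/1 labels (1 = positive)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))    # positives correctly flagged
    tn = int(np.sum(~y_true & ~y_pred))  # negatives correctly passed
    fp = int(np.sum(~y_true & y_pred))   # false alarms
    fn = int(np.sum(y_true & ~y_pred))   # missed positives
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return {
        "TPR": tpr,
        "FNR": 1.0 - tpr,
        "FPR": fpr,
        "TNR": 1.0 - fpr,
        "precision": precision,
        "F1": 2 * precision * tpr / (precision + tpr),
    }


# Toy example: 1 = anomalous meter reading, 0 = normal (made-up labels).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
rates = binary_rates(y_true, y_pred)
print(rates)
```

Here TP = 3, FN = 1, FP = 1 and TN = 5, so TPR = 0.75 and FPR = 1/6. Lowering a detection threshold would typically raise both, which is the tradeoff these metrics expose.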
Part E — Special topics
- No Free Lunch intuition + why foundation models “shift the tradeoff”.
- Time-series foundation models; Chronos overview and energy forecasting example.
- Data: where to obtain energy + weather datasets (ENTSO-E transparency, ERA5; regional operators).

Tutorial Outline (on GitHub)

Repo: [github].

What students do:

svm.py: Binary classification with SVM on make_moons (non-linearly separable toy dataset).
pca.py: PCA dimensionality reduction + 2D visualization on the Wine dataset.
kmeans+pca.py: k-means clustering on Iris, then PCA to 2D for visualization.
chronos_example.py: Energy price forecasting using Amazon Chronos (zero-shot time-series forecasting).
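
The following is a rough sketch of what the PCA and k-means exercises involve, written independently of the repository's scripts (which may differ in preprocessing, parameters, and plotting). Standardizing the features first and using three clusters for Iris are assumptions made here; the plotting step is left out.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris, load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# PCA on Wine: standardize, then project 13 features down to 2 components.
X_wine = StandardScaler().fit_transform(load_wine().data)
pca_wine = PCA(n_components=2)
wine_2d = pca_wine.fit_transform(X_wine)
explained = pca_wine.explained_variance_ratio_.sum()

# k-means on Iris: cluster in the full feature space, then use PCA only
# to obtain 2D coordinates for visualizing the cluster assignments.
X_iris = StandardScaler().fit_transform(load_iris().data)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_iris)
iris_2d = PCA(n_components=2).fit_transform(X_iris)

print(wine_2d.shape, round(explained, 3), iris_2d.shape, sorted(set(labels)))
```

A scatter plot of `wine_2d` colored by class, or of `iris_2d` colored by `labels`, would be the natural next step with matplotlib.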

How students run it:

Recommended: Google Colab (no local setup).
Local: install dependencies:
- Core: numpy, matplotlib, scikit-learn
- Chronos-only: pandas[pyarrow], chronos-forecasting

Skills the tutorial reinforces:

End-to-end sklearn pattern: load data → preprocess → fit → evaluate → visualize
How hyperparameters change decision boundaries / clusters / embeddings (SVM, k-means)
Practical intuition for representation learning via PCA
Modern angle: zero-shot forecasting with a time-series foundation model (using Amazon Chronos as an example).