Lectures
You can download the lectures here. We will try to upload each lecture before its corresponding class; links for lectures that have not yet happened are likely broken.
-
01 - Deep Learning Concepts and Course Logistics
tl;dr: We will introduce the topic of deep learning, a bit of its history, and the impact it has had. Then we'll go over the course logistics: the lecture topics, homework assignments, and projects.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 1
-
02 - Supervised Learning
tl;dr: We go a little deeper into supervised learning, introducing terminology and illustrating with a simple example of a linear model; a small worked example follows the readings below.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 2
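As a concrete warm-up, here is the kind of linear model the lecture illustrates with, fit by least squares; the data and "true" parameters are invented for illustration.

```python
# Fit a 1D linear model y = w*x + b by least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)  # ground truth: w=2, b=0.5

# Closed-form least squares via the design matrix [x, 1].
X = np.stack([x, np.ones_like(x)], axis=1)
w, b = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"fitted w={w:.3f}, b={b:.3f}")  # close to 2.0 and 0.5
```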
-
03 - Loss Functions
tl;dr: In this lecture, we will motivate and derive common loss functions from a probabilistic viewpoint using maximum likelihood estimation; a numerical check of the Gaussian case follows the readings.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 5
- Mathematics for Machine Learning, Chapters 6 and 8, especially 8.1.3
- Maximum Likelihood Estimation Examples from an MIT course. This video walks through MLE examples for binomial and normal distributions, but does not pull out equivalent loss functions like we did in lecture.
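As a sanity check on the lecture's central point, here is a small made-up example showing that the Gaussian negative log-likelihood and mean squared error differ only by a constant and a scale, so minimizing one minimizes the other.

```python
# Gaussian MLE reduces to least squares: NLL = N/2 log(2*pi*sigma^2) + N/(2*sigma^2) * MSE.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=50)                       # observed targets
y_hat = y + rng.normal(scale=0.3, size=50)    # stand-in model predictions
sigma = 1.0
N = len(y)

nll = 0.5 * np.sum((y - y_hat) ** 2) / sigma**2 + 0.5 * N * np.log(2 * np.pi * sigma**2)
mse = np.mean((y - y_hat) ** 2)

# The two expressions agree exactly.
print(nll, 0.5 * N * np.log(2 * np.pi * sigma**2) + N / (2 * sigma**2) * mse)
```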
-
04 - Gradient Descent
tl;dr: In this lecture, we consider gradient descent as a general technique for training parametric models, and preview how it enables joint training of differentiable components (see the sketch below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 3
- Mathematics for Machine Learning, Chapter 7
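A minimal sketch of the idea, assuming a least-squares linear model as in the earlier lectures; the data is invented for illustration.

```python
# Gradient descent on mean squared error for a linear model.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.1
for step in range(200):
    grad = 2 / len(y) * X.T @ (X @ w - y)  # d/dw of mean((Xw - y)^2)
    w -= lr * grad                         # step against the gradient
print(w)  # approaches [1.0, -2.0, 0.5]
```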
-
05 - Shallow Networks
tl;dr: In this lecture we consider networks with one layer of hidden units and explore their representational power (see the sketch below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 3
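A minimal sketch of such a network with arbitrary sizes and weights, showing the piecewise-linear functions one hidden layer of ReLU units produces.

```python
# A shallow network: scalar input -> D ReLU hidden units -> scalar output.
# With D hidden units, the function is piecewise linear with up to D "joints".
import numpy as np

def shallow(x, W1, b1, W2, b2):
    h = np.maximum(0.0, x[:, None] * W1 + b1)  # hidden activations, shape (N, D)
    return h @ W2 + b2                         # linear read-out

rng = np.random.default_rng(0)
D = 5
W1, b1 = rng.normal(size=D), rng.normal(size=D)
W2, b2 = rng.normal(size=D), rng.normal()

x = np.linspace(-2, 2, 9)
print(shallow(x, W1, b1, W2, b2))  # piecewise-linear in x
```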
-
06 - Shared Compute Cluster Tutorial
tl;dr: In this lecture, IS&T staff will introduce the shared compute cluster, which you will use for your larger model-training projects.
[slides] [lecture recording]
-
07 - Deep Networks
tl;dr: We dive into deep networks by composing two shallow networks and visualizing their representational capabilities. We then generalize to fully connected networks with two or more layers of hidden units, and compare the modeling efficiency of deep versus shallow networks (see the sketch below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 4
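A sketch of the composition idea with invented shapes and weights: stacking a second hidden layer on a shallow network yields a deep network, and each added layer can fold the input space, multiplying the number of linear regions.

```python
# Two fully connected ReLU layers composed, plus a linear read-out.
import numpy as np

def layer(h, W, b):
    return np.maximum(0.0, h @ W + b)  # fully connected + ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))                            # four scalar inputs
W1, b1 = rng.normal(size=(1, 6)), rng.normal(size=6)
W2, b2 = rng.normal(size=(6, 6)), rng.normal(size=6)
W3, b3 = rng.normal(size=(6, 1)), rng.normal(size=1)

h1 = layer(x, W1, b1)    # first shallow network body
h2 = layer(h1, W2, b2)   # second shallow network composed on top
y = h2 @ W3 + b3         # linear output layer
print(y.ravel())
```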
-
08 - Fitting Models
tl;dr: In this lecture we look at different ways of minimizing the loss function for a model given a training dataset. We'll formally define gradient descent, show the advantages of stochastic gradient descent, and finally see how momentum and normalized gradients (Adam) can improve model training further; the three update rules are sketched below.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 6
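The three update rules written as plain functions; the hyperparameter values are common defaults, not prescriptions from the lecture.

```python
# Plain SGD, momentum, and Adam update steps for one parameter vector.
import numpy as np

def sgd_step(w, grad, lr=0.1):
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    v = beta * v + grad                      # accumulate a running direction
    return w - lr * v, v

def adam_step(w, m, s, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad             # first moment (mean of gradients)
    s = b2 * s + (1 - b2) * grad ** 2        # second moment (uncentered variance)
    m_hat, s_hat = m / (1 - b1 ** t), s / (1 - b2 ** t)  # bias correction
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Quick check on f(w) = w^2 (gradient 2w): each rule heads toward the minimum.
w_sgd, w_mom, v, w_adam, m, s = 5.0, 5.0, 0.0, 5.0, 0.0, 0.0
for t in range(1, 201):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_mom, v = momentum_step(w_mom, v, 2 * w_mom)
    w_adam, m, s = adam_step(w_adam, m, s, 2 * w_adam, t, lr=0.05)
print(w_sgd, w_mom, w_adam)
```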
-
09 - Backpropagation
tl;dr: In this lecture we show how to efficiently calculate gradients over more complex functions like deep neural networks using backpropagation; a hand-worked example follows the readings.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Sections 7.1 - 7.4
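Backpropagation worked by hand on a tiny two-layer network with arbitrary shapes, checked against a finite-difference estimate of one gradient entry.

```python
# Manual backprop for h = relu(W1 x), loss = 0.5 * (W2 h - y)^2.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4)

def loss(W1, W2):
    h = np.maximum(0.0, W1 @ x)
    return 0.5 * (W2 @ h - y) ** 2

# Forward pass, storing intermediates for the backward pass.
a = W1 @ x
h = np.maximum(0.0, a)
e = W2 @ h - y

# Backward pass: chain rule, layer by layer.
dW2 = e * h
dh = e * W2
dW1 = np.outer(dh * (a > 0), x)   # ReLU gate: gradient flows only where a > 0

# Finite-difference check of one entry of dW1.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
print(dW1[0, 0], (loss(W1p, W2) - loss(W1, W2)) / eps)  # should agree closely
```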
-
10 - Initialization
tl;dr: In this lecture we show how to use gradients to derive better initialization schemes (see the sketch below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Section 7.5
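A small experiment with arbitrary layer sizes illustrating the effect: He-style initialization (variance 2/fan_in for ReLU layers) keeps activation magnitudes at a stable scale across depth, while smaller or larger variances make them vanish or explode.

```python
# Push one input through a deep ReLU stack under three initialization scales.
import numpy as np

rng = np.random.default_rng(0)
D, depth = 256, 20
h = rng.normal(size=D)

for var, label in [(1.0 / D, "too small"), (2.0 / D, "He"), (4.0 / D, "too large")]:
    a = h.copy()
    for _ in range(depth):
        W = rng.normal(scale=np.sqrt(var), size=(D, D))
        a = np.maximum(0.0, W @ a)
    print(f"{label:>9}: mean |activation| after {depth} layers = {np.abs(a).mean():.3g}")
```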
-
11 - Measuring Performance
tl;dr: We look at measuring model performance during training, the importance of test sets, and how noise, bias, and variance affect training outcomes.
[slides] [annotated slides] [lecture recording]
-
12 - Regularization
tl;dr: We explain explicit and implicit regularization techniques and how they help models generalize (a ridge-regression sketch follows below).
[slides] [annotated slides] [lecture recording]
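One explicit example of the idea, using ridge regression since its L2-regularized solution has a closed form; the data is made up.

```python
# Ridge regression: minimize ||Xw - y||^2 + lam * ||w||^2 (closed form).
# Larger penalties shrink the fitted weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=30)  # only feature 0 matters

for lam in [0.0, 1.0, 10.0]:
    w = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
    print(f"lambda={lam:>4}: ||w|| = {np.linalg.norm(w):.3f}")
```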
-
13 - Convolutional Neural Networks
tl;dr: We cover 1D and 2D convolutional neural networks along with subsampling and upsampling operations; a bare-bones 1D convolution is sketched below.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 10
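A bare-bones 1D convolution written out directly, so the weight sharing is visible: the same small kernel slides across every position of the input ("valid" padding here).

```python
import numpy as np

def conv1d(x, kernel):
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
edge = np.array([-1.0, 0.0, 1.0])   # a simple edge-detecting kernel
print(conv1d(x, edge))              # responds where the signal changes
```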
-
14 - Residual Networks
tl;dr: In this lecture we introduce residual networks, the types of problems they solve, and why we need batch normalization, then review some example residual network architectures (a minimal block is sketched below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 11
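A minimal residual block in PyTorch; this is an illustrative pattern, not a specific published architecture.

```python
# Two conv + batch-norm layers, plus the skip connection that lets
# gradients bypass the block.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # the residual (skip) connection

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)    # same shape in and out
```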
-
15 - Recurrent Neural Networks
tl;dr: In this lecture we introduce recurrent neural networks, starting with the plain vanilla RNN and the problem of vanishing gradients, then covering LSTMs, GRUs, and batch normalization (see the sketch below).
[slides]
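A plain vanilla RNN unrolled by hand with arbitrary sizes; the repeated application of the same recurrent weights at every step is also the source of vanishing and exploding gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_h = 4, 8
Wx = rng.normal(size=(D_h, D_in))
Wh = rng.normal(size=(D_h, D_h)) * 0.3   # recurrent weights, reused each step
b = np.zeros(D_h)

h = np.zeros(D_h)                            # initial hidden state
for x_t in rng.normal(size=(10, D_in)):      # a length-10 input sequence
    h = np.tanh(Wx @ x_t + Wh @ h + b)       # same weights at every time step
print(h)
```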
-
16 - Attention and Transformers
tl;dr: In this lecture we cover the attention mechanism used to handle longer, variable-length contexts, and present the transformer architecture, self-attention, and the variations for encoder, decoder, and encoder-decoder models; scaled dot-product attention is sketched below.
[slides] [annotated slides] [lecture recording]
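Scaled dot-product attention in a few lines of NumPy (shapes arbitrary): each query takes a weighted average of the values, with weights derived from query-key similarity.

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarity, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))    # 5 queries
K = rng.normal(size=(7, 16))    # 7 keys
V = rng.normal(size=(7, 16))    # 7 values
print(attention(Q, K, V).shape) # (5, 16): one output per query
```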
-
17 - Transformers Part 2
tl;dr: We continue our review of the transformer architecture, picking up with decoders and encoder-decoder architectures, then discussing scaling to large contexts, tokenization, and embeddings.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 12
- The Illustrated Transformer (optional)
-
18 - Training, Tuning and Evaluating LLMs
tl;dr: In this lecture we go through the entire LLM training process, starting with pretraining and then fine-tuning. We'll also discuss how to evaluate LLMs, as well as a parameter-efficient fine-tuning technique called LoRA (sketched below).
[slides] [annotated slides] [lecture recording]
Readings:
- See slides for references
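A sketch of the LoRA idea under invented shapes: the pretrained weight matrix is frozen, and only a low-rank update is trained, so far fewer parameters change during fine-tuning.

```python
# Freeze W; learn a rank-r update B @ A, training r*(d_in + d_out)
# parameters instead of d_in*d_out.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 8
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01     # trainable
B = np.zeros((d_out, r))                  # trainable; zero so training starts at W

x = rng.normal(size=d_in)
y = W @ x + B @ (A @ x)                   # adapted forward pass
print(y.shape, f"trainable params: {A.size + B.size} vs {W.size}")
```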
-
19 - Vision & Multimodal Transformers
tl;dr: In this lecture we'll survey vision and multimodal transformers through three papers.
[slides] [annotated slides] [lecture recording]
Readings:
- See slides for references
-
20 - Adversarial Inputs and Generative Adversarial Models
tl;dr: In this lecture we introduce adversarial inputs. We will then dive into Generative Adversarial Networks (GANs) and their applications. We will also discuss the challenges and limitations of GANs and some of the recent advances in the field; a one-step training sketch follows the readings.
[slides] [annotated slides] [lecture recording]
Readings:
- Intriguing Properties of Neural Networks
- Robustness and Generalization via Generative Adversarial Training
- Adversarial Examples are Not Bugs, They are Features
- Generative Adversarial Nets
- A Style-Based Generator Architecture for Generative Adversarial Networks
- Understanding Deep Learning, Chapter 15
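A bare-bones sketch of one GAN training step on invented toy data; the generator and discriminator here are tiny stand-ins for real architectures.

```python
# One discriminator step (real -> 1, fake -> 0) and one generator step
# (make the discriminator label fakes as real: the non-saturating loss).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))   # noise -> sample
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0          # stand-in "real" data
z = torch.randn(64, 2)

# Discriminator step (generator output detached so only D updates).
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step.
loss_g = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(loss_d.item(), loss_g.item())
```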
-
21 - Unsupervised Learning and Variational Autoencoders
tl;dr: In this lecture we revisit unsupervised learning in the context of generative models and then dive into Variational Autoencoders. After reviewing unsupervised learning, we look at autoencoders and their ability to compress inputs into a lower-dimensional latent space. We'll see why they don't make good generative models on their own, then generalize to VAEs, and finish with some examples of VAE generative output (the reparameterization trick is sketched below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Variational Autoencoders
- Understanding Deep Learning, Chapter 14; Chapter 17 (optional)
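The core of a VAE in a few lines: the reparameterization trick and the KL term, with stand-in encoder outputs and a stand-in reconstruction error.

```python
# z = mu + sigma * eps lets gradients flow through the sampling step.
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

mu = torch.zeros(8, requires_grad=True)       # stand-in encoder outputs
logvar = torch.zeros(8, requires_grad=True)
z = reparameterize(mu, logvar)                # differentiable sample

recon = ((z - 1.0) ** 2).sum()                # stand-in reconstruction error
kl = -0.5 * torch.sum(1 + logvar - mu**2 - logvar.exp())  # KL to N(0, I) prior
(recon + kl).backward()
print(mu.grad is not None and logvar.grad is not None)    # gradients reach the encoder
```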
-
22 - Diffusion Models
tl;dr: In this lecture, we consider diffusion models, the current best practice for image generation; the forward noising process is sketched below.
[slides] [annotated slides] [lecture recording]
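A sketch of the forward (noising) process in closed form, using a common linear beta schedule; all values are illustrative.

```python
# x_t blends the clean sample with Gaussian noise; the signal fraction
# sqrt(alpha_bar_t) shrinks as t grows.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)                    # a clean "data" sample
betas = np.linspace(1e-4, 0.02, 1000)      # a common linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)        # cumulative signal retention

for t in [0, 250, 500, 999]:
    eps = rng.normal(size=4)               # fresh Gaussian noise
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    print(f"t={t:>3}: signal fraction {np.sqrt(alpha_bar[t]):.3f}, |x_t| {np.linalg.norm(x_t):.2f}")
```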
-
23 - Latent Diffusion Models
tl;dr: In this lecture, we continue with latent diffusion models, which run the diffusion process in a learned latent space to make image generation more efficient.
[slides] [annotated slides] [lecture recording]
-
24 - Using Pre-Trained Models
tl;dr: In this lecture we explore using pre-trained models as a foundation for classification and controlled generation (a frozen-backbone sketch follows below).
[slides] [annotated slides] [lecture recording]
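A generic sketch of the pattern, with no specific pretrained checkpoint assumed: freeze a backbone's parameters and train only a new task-specific head on top of its features.

```python
import torch
import torch.nn as nn

# Pretend `backbone` holds pre-trained weights; freeze them.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(64, 10)                     # new classifier for the target task
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x, labels = torch.randn(16, 32), torch.randint(0, 10, (16,))
logits = head(backbone(x))                   # frozen features -> trainable head
loss = nn.functional.cross_entropy(logits, labels)
opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```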
-
25 - Data Preparation and Augmentation
tl;dr: In this lecture we investigate different strategies for preparing data to facilitate learning and generalization; two common augmentations are sketched below.
[slides] [annotated slides] [lecture recording]
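Two standard image augmentations written out directly on a stand-in image array: each training pass sees a slightly different view of the same image, which acts as implicit regularization.

```python
# Random horizontal flip and random crop.
import numpy as np

rng = np.random.default_rng(0)
img = rng.uniform(size=(32, 32, 3))   # a stand-in 32x32 RGB image

def augment(img, crop=28):
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                    # random horizontal flip
    i = rng.integers(0, img.shape[0] - crop + 1)
    j = rng.integers(0, img.shape[1] - crop + 1)
    return img[i:i + crop, j:j + crop, :]        # random crop

print(augment(img).shape)   # (28, 28, 3)
```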
-
26 - Reasoning and World Models
tl;dr: In this lecture we examine claims that models have world models and can reason, and attempts to encourage such behavior.
[slides] [annotated slides] [lecture recording]
Readings:
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Large Language Models are Zero-Shot Reasoners
- Learning to Reason with LLMs
- OpenAI Harmony Response Format
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
- AlphaGeometry: An Olympiad-level AI system for geometry
- How Does A Blind Model See The Earth?
- Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
- V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
-
27 - Graph Neural Networks
tl;dr: In this lecture we introduce graph neural networks: their matrix representations, graph-level classification and regression, and how to define graph convolutional network layers (one layer is sketched below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 13
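One GCN-style layer on a toy graph, with the graph, features, and weights all invented: mix each node's features with its neighbors' via a normalized adjacency matrix, then apply a shared linear map and nonlinearity.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],      # adjacency matrix of a 4-node path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                          # add self-loops
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))       # symmetric normalization D^-1/2 A D^-1/2

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                    # node features
W = rng.normal(size=(3, 5))                    # shared layer weights
H = np.maximum(0.0, A_norm @ X @ W)            # one graph convolutional layer
print(H.shape)                                 # (4, 5): new features per node
```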
-
28 - Deep Reinforcement Learning
tl;dr: In this lecture we will introduce reinforcement learning and how deep learning fits into it; a tabular warm-up example follows the readings.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 19
- Data Center Cooling using Model-Predictive Control
- Human-level control through deep reinforcement learning (access via BU)
- Mastering the game of Go without human knowledge (access via BU)
- Gran Turismo Sophy
- Craig Sherstan Keynote at Computers and Games 2024, re: Gran Turismo Sophy
- Aligning language models to follow instructions
- Training language models to follow instructions with human feedback
- RLHF: Reinforcement Learning from Human Feedback
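As a warm-up to the deep variants in the readings, here is tabular Q-learning on a made-up five-state chain with a reward only at the right end; deep reinforcement learning replaces the table with a neural network.

```python
# Epsilon-greedy tabular Q-learning on a 5-state chain.
import numpy as np

n_states = 5
Q = np.zeros((n_states, 2))           # actions: 0 = left, 1 = right
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.1, 0.9, 0.3

for episode in range(300):
    s = 0
    while s != n_states - 1:
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # The Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))   # "right" ends up preferred in every state
```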
