Lectures
You can download the lectures here. We will try to upload each lecture before its corresponding class; links for future lectures listed below are likely broken until then.
-
01 - Deep Learning Concepts and Course Logistics
tl;dr: We will introduce the topic of deep learning, a bit about its history, and the impact it has had. Then we'll go over the course logistics, the lecture topics, homework assignments, and projects.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 1
-
02 - Supervised Learning
tl;dr: We go a little deeper into supervised learning, introducing terminology and illustrating with a simple example of a linear model.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 2
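Not part of the assigned readings, but as a quick preview, here is a minimal NumPy sketch of the kind of simple linear model the lecture uses as its illustration; the data and variable names are invented for the example:

```python
import numpy as np

# Toy 1D dataset: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)

# Linear model y_hat = phi0 + phi1 * x, fit by least squares.
X = np.stack([np.ones_like(x), x], axis=1)   # design matrix with bias column
phi, *_ = np.linalg.lstsq(X, y, rcond=None)  # closed-form least-squares fit
print("intercept, slope:", phi)              # should be close to (1.0, 2.0)
```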
-
03 - Loss Functions
tl;dr: In this lecture, we will motivate and derive common loss functions from a probabilistic viewpoint using maximum likelihood estimation.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 5
- Mathematics for Machine Learning, Chapters 6 and 8, especially 8.1.3
- Maximum Likelihood Estimation Examples from an MIT course. This video walks through MLE examples for the binomial and normal distributions, but does not derive the equivalent loss functions as we do in lecture.
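As an illustrative sketch (not from the course materials), the following NumPy snippet shows the MLE-to-loss correspondence the lecture develops: the Gaussian negative log-likelihood reduces to squared error up to constants, and the Bernoulli one to binary cross-entropy. All names and values are made up for the example:

```python
import numpy as np

# Negative log-likelihood of targets y under a Gaussian with mean y_hat and
# fixed variance sigma^2.  Up to constants, minimizing it is the same as
# minimizing squared error.
def gaussian_nll(y, y_hat, sigma=1.0):
    n = len(y)
    return (n / 2) * np.log(2 * np.pi * sigma**2) \
        + np.sum((y - y_hat) ** 2) / (2 * sigma**2)

# A Bernoulli likelihood gives binary cross-entropy the same way.
def bernoulli_nll(y, p_hat, eps=1e-12):
    p_hat = np.clip(p_hat, eps, 1 - eps)
    return -np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

y = np.array([0.5, 1.2, -0.3])
print(gaussian_nll(y, np.zeros(3)))   # grows with squared error
print(bernoulli_nll(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.8])))
```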
-
04 - Gradient Descent
tl;dr: In this lecture, we consider gradient descent as a general technique for training parametric models, and preview how it enables joint training of differentiable components.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 3
- Mathematics for Machine Learning, Chapter 7
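For a concrete (unofficial) picture of the update rule, here is a toy gradient descent loop in NumPy fitting a linear model; the hand-derived gradient and the hyperparameters are illustrative choices, not prescribed by the course:

```python
import numpy as np

# Gradient descent on a least-squares loss L(phi) = mean((X phi - y)^2).
rng = np.random.default_rng(0)
X = np.stack([np.ones(100), rng.uniform(0, 1, 100)], axis=1)
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(100)

phi = np.zeros(2)                             # initial parameters
lr = 0.5                                      # step size (learning rate)
for step in range(200):
    grad = 2 * X.T @ (X @ phi - y) / len(y)   # dL/dphi, derived by hand
    phi -= lr * grad                          # step downhill
print(phi)                                    # approaches the least-squares fit
```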
-
05 - Shallow Networks
tl;dr: In this lecture we consider networks with one layer of hidden units and explore their representational power.
[slides]
Readings:
- Understanding Deep Learning, Chapter 3
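Here is a rough NumPy sketch of a shallow network in the textbook's form f(x) = phi0 + sum_k phi_k a[theta_k0 + theta_k1 x], with a ReLU activation; the parameter values are random and purely illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# A shallow network with one input, one output, and K hidden units.
# Each ReLU hidden unit contributes one "joint" to a piecewise-linear curve.
def shallow_net(x, theta, phi):
    h = relu(theta[:, 0] + theta[:, 1] * x[:, None])  # hidden activations
    return phi[0] + h @ phi[1:]

rng = np.random.default_rng(0)
K = 5
theta = rng.standard_normal((K, 2))   # per-unit (offset, slope)
phi = rng.standard_normal(K + 1)      # output bias and weights
x = np.linspace(-2, 2, 9)
print(shallow_net(x, theta, phi))     # a piecewise-linear function of x
```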
-
06 - Shared Compute Cluster Tutorial
tl;dr: In this lecture, IS&T staff will introduce the shared compute cluster, which you will use for your larger model-training projects.
Readings:
-
07 - Deep Networks
tl;dr: We dive into deep networks by composing two shallow networks and visualizing their representational capabilities. We then generalize to fully connected networks with two or more layers of hidden units, and compare the modeling efficiency of deep versus shallow networks.
[slides]
Readings:
- Understanding Deep Learning, Chapter 4
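As a small unofficial sketch of the composition idea, here is a fully connected network with two hidden layers written as repeated affine-plus-ReLU maps (sizes and values are arbitrary):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# A fully connected network as repeated composition: each hidden layer applies
# an affine map followed by ReLU; the output layer is affine only.
def deep_net(x, weights, biases):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)                  # hidden layer
    return h @ weights[-1] + biases[-1]      # linear output layer

rng = np.random.default_rng(0)
sizes = [2, 8, 8, 1]                         # input, two hidden layers, output
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
print(deep_net(rng.standard_normal((4, 2)), weights, biases))
```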
-
08 - Fitting Models
tl;dr: In this lecture we look at different ways of minimizing the loss function for a model given a training dataset. We'll formally define gradient descent, show the advantages of stochastic gradient descent, and finally see how momentum and normalized gradients (Adam) can improve model training further.
[slides]
Readings:
- Understanding Deep Learning, Chapter 6
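The update rules themselves are compact enough to sketch; the following is illustrative (not the course's reference implementation) and shows plain SGD, momentum, and Adam side by side:

```python
import numpy as np

# Plain SGD: step against the (mini-batch) gradient.
def sgd(phi, grad, lr=0.01):
    return phi - lr * grad

# Momentum: step against a running average of gradients.
def sgd_momentum(phi, grad, v, lr=0.01, beta=0.9):
    v = beta * v + (1 - beta) * grad
    return phi - lr * v, v

# Adam: normalize the step using first and second moment estimates.
def adam(phi, grad, m, s, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad        # first moment (mean)
    s = b2 * s + (1 - b2) * grad**2     # second moment (uncentered variance)
    m_hat = m / (1 - b1**t)             # bias correction for zero init
    s_hat = s / (1 - b2**t)
    return phi - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

phi = np.ones(3)
print(sgd(phi, 2 * phi))                # one step on L = ||phi||^2
phi_m, v = sgd_momentum(phi, 2 * phi, np.zeros(3))
phi_a, m, s = adam(phi, 2 * phi, np.zeros(3), np.zeros(3), t=1)
print(phi_m, phi_a)
```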
-
09 - Backpropagation and Initialization
tl;dr: In this lecture we show how to efficiently calculate gradients over more complex functions like deep neural networks using backpropagation, and then reason about the resulting gradients to derive better initialization schemes.
[slides]
Readings:
- Understanding Deep Learning, Chapters 7.1 - 7.6
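A hand-rolled, deliberately tiny example of backpropagation on a two-layer ReLU network, to make the cache-then-chain-rule structure concrete; all shapes and values are invented:

```python
import numpy as np

# Forward pass caches intermediates; backward pass reuses them to compute
# every gradient in one sweep via the chain rule.
rng = np.random.default_rng(0)
x, y = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W1, b1 = rng.standard_normal((3, 5)) * 0.5, np.zeros(5)
W2, b2 = rng.standard_normal((5, 1)) * 0.5, np.zeros(1)

# Forward pass (cache z1 and h1 for the backward pass).
z1 = x @ W1 + b1
h1 = np.maximum(z1, 0.0)
y_hat = h1 @ W2 + b2
loss = np.mean((y_hat - y) ** 2)

# Backward pass: chain rule from the loss back toward the inputs.
d_yhat = 2 * (y_hat - y) / len(y)
dW2, db2 = h1.T @ d_yhat, d_yhat.sum(0)
d_h1 = d_yhat @ W2.T
d_z1 = d_h1 * (z1 > 0)                # ReLU gates the gradient
dW1, db1 = x.T @ d_z1, d_z1.sum(0)
print(loss, dW1.shape, dW2.shape)
```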
-
10 - Measuring Performance
tl;dr: We look at measuring model performance, the importance of test sets, and how noise, bias, and variance play a role in training outcomes.
[slides]
Readings:
- Understanding Deep Learning, Chapter 8
-
11 - Regularization
tl;dr: We explain explicit and implicit regularization techniques and how they help models generalize.
[slides]
Readings:
- Understanding Deep Learning, Chapter 9
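As a minimal sketch of explicit regularization (assuming a linear model and squared-error loss for simplicity), L2 weight decay just adds a penalty term to the loss and a corresponding term to the gradient:

```python
import numpy as np

# L2 regularization: add lam * ||phi||^2 to the loss, which adds
# 2 * lam * phi to the gradient and shrinks weights toward zero.
def l2_regularized_loss_and_grad(X, y, phi, lam):
    resid = X @ phi - y
    loss = np.mean(resid**2) + lam * np.sum(phi**2)
    grad = 2 * X.T @ resid / len(y) + 2 * lam * phi
    return loss, grad

rng = np.random.default_rng(0)
X, y = rng.standard_normal((20, 5)), rng.standard_normal(20)
phi = np.zeros(5)
for _ in range(500):
    loss, grad = l2_regularized_loss_and_grad(X, y, phi, lam=0.1)
    phi -= 0.05 * grad
print(loss, phi)   # larger lam gives smaller-norm solutions
```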
-
12 - Convolutional Neural Networks
tl;dr: We cover 1D and 2D convolutional neural networks along with subsampling and upsampling operations.
[slides]
Readings:
- Understanding Deep Learning, Chapter 10
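A toy 1D convolution, written as an explicit loop to emphasize the weight sharing; a strided version doubles as the subsampling operation (the kernel and input are illustrative):

```python
import numpy as np

# A 1D convolutional layer as weight sharing: the same kernel slides along
# the input, so the layer has kernel_size parameters instead of one weight
# per pair of positions.
def conv1d(x, kernel, stride=1):
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([x[i * stride:i * stride + k] @ kernel
                     for i in range(out_len)])

x = np.arange(8, dtype=float)
print(conv1d(x, np.array([1.0, 0.0, -1.0])))             # difference detector
print(conv1d(x, np.array([1.0, 0.0, -1.0]), stride=2))   # stride-2 subsampling
```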
-
13 - Residual Networks
tl;dr: In this lecture we introduce residual networks, the types of problems they solve, and why we need batch normalization, and then review some example residual network architectures.
[slides]
Readings:
- Understanding Deep Learning, Chapter 11
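Here is a rough sketch of a residual block in NumPy, pairing the skip connection with a (training-mode-only) batch normalization; the block's internal structure and sizes are simplified for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def batchnorm(h, eps=1e-5):
    # Normalize each channel over the batch (training-mode statistics only).
    return (h - h.mean(0)) / np.sqrt(h.var(0) + eps)

# A residual block computes h + f(h): the skip connection lets gradients
# flow through the identity path regardless of how f behaves.
def residual_block(h, W1, W2):
    f = relu(batchnorm(h) @ W1) @ W2
    return h + f

rng = np.random.default_rng(0)
h = rng.standard_normal((16, 8))
W1, W2 = rng.standard_normal((8, 8)) * 0.1, rng.standard_normal((8, 8)) * 0.1
print(residual_block(h, W1, W2).shape)   # (16, 8), same shape as the input
```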
-
14 - Recurrent Neural Networks
tl;dr: In this lecture we introduce recurrent neural networks, starting with the plain vanilla RNN, then the problem of vanishing gradients, LSTMs and GRUs, and batch normalization.
[slides]
Readings:
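A minimal vanilla-RNN forward pass in NumPy, to make the shared-cell recurrence concrete; the repeated multiplication by the recurrent weights is also where the vanishing-gradient problem comes from (all sizes here are illustrative):

```python
import numpy as np

# A vanilla RNN: one shared cell applied at every time step, with a hidden
# state carrying information forward through the sequence.
def rnn_forward(xs, Wx, Wh, b):
    h = np.zeros(Wh.shape[0])
    hidden_states = []
    for x in xs:                          # one step per sequence element
        h = np.tanh(Wx @ x + Wh @ h + b)
        hidden_states.append(h)
    return np.array(hidden_states)

rng = np.random.default_rng(0)
xs = rng.standard_normal((10, 4))         # sequence of 10 four-dim inputs
Wx = rng.standard_normal((6, 4))
Wh = rng.standard_normal((6, 6)) * 0.5
print(rnn_forward(xs, Wx, Wh, np.zeros(6)).shape)   # (10, 6)
```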
-
15 - Transformers Part 1
tl;dr: In this lecture we cover the transformer architecture, starting with the motivation for a new type of model, then the concept and implementation of self-attention, and finally the full transformer architecture for encoder, decoder, and encoder-decoder models.
[slides]
Readings:
- Understanding Deep Learning, Chapter 12
- Optional: The Illustrated Transformer
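Self-attention is compact enough to sketch directly; the following NumPy snippet implements single-head scaled dot-product self-attention (dimensions and weights are illustrative, and real transformers add multiple heads, masking, and position information):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product self-attention: every position emits a query, key, and
# value; outputs are value averages weighted by query-key similarity.
def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])   # (n, n) similarity matrix
    return softmax(scores) @ V               # each row mixes all positions

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))              # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (5, 8)
```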
-
16 - Transformers Part 2
tl;dr: In this lecture we continue our review of the transformer architecture: we finish the discussion of decoder and encoder-decoder architectures, then cover scaling to long contexts, and finally tokenization and embeddings.
[slides]
Readings:
- Understanding Deep Learning, Chapter 12
- Optional: The Illustrated Transformer
-
17 - Vision & Multimodal Transformers
tl;dr: In this lecture we'll survey vision and multimodal transformers through three papers.
[slides]
Readings:
- See slides for references
-
18 - Training, Tuning and Evaluating LLMs
tl;dr: In this lecture we go through the entire LLM training process, starting with pretraining and then fine-tuning. We'll also discuss how to evaluate LLMs, as well as a parameter-efficient fine-tuning technique called LoRA.
[slides]
Readings:
- See slides for references
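As an unofficial sketch of the LoRA idea (not a reference implementation): freeze the pretrained weight matrix and learn only a low-rank update, so the trainable parameter count scales with the rank rather than the layer size:

```python
import numpy as np

# LoRA: keep pretrained W frozen and learn a rank-r update B @ A, so a
# d x d layer needs 2 * d * r trainable numbers instead of d * d.
d, r = 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pretrained weights
A = rng.standard_normal((r, d)) * 0.01   # trainable rank-r factor
B = np.zeros((d, r))                     # zero-init so training starts at W
alpha = 16.0                             # scaling hyperparameter

x = rng.standard_normal(d)
y = W @ x + (alpha / r) * (B @ (A @ x))  # adapted forward pass
print(y.shape, f"trainable: {A.size + B.size} vs frozen: {W.size}")
```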
-
20 - Adversarial Inputs and Generative Adversarial Models
tl;dr: In this lecture we introduce adversarial inputs. We then dive into Generative Adversarial Networks (GANs) and their applications, and discuss the challenges and limitations of GANs along with some recent advances in the field.
[slides]
Readings:
- Understanding Deep Learning, Chapters 14 and 15
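The adversarial objective itself fits in a few lines; this sketch (with invented logits in place of real networks) computes the standard discriminator loss and the non-saturating generator loss from discriminator outputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The discriminator is trained to label real data 1 and fakes 0.
def discriminator_loss(real_logits, fake_logits):
    return -np.mean(np.log(sigmoid(real_logits) + 1e-12)) \
           - np.mean(np.log(1 - sigmoid(fake_logits) + 1e-12))

# "Non-saturating" generator loss: maximize log D(G(z)) instead of
# minimizing log(1 - D(G(z))), for stronger gradients early in training.
def generator_loss(fake_logits):
    return -np.mean(np.log(sigmoid(fake_logits) + 1e-12))

real = np.array([2.0, 1.5, 3.0])    # confident "real" logits
fake = np.array([-1.0, -2.0, 0.5])  # mostly "fake" logits
print(discriminator_loss(real, fake), generator_loss(fake))
```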
-
21 - Unsupervised Learning and Variational Autoencoders
tl;dr: In this lecture we revisit the concept of unsupervised learning in the context of generative models and then dive into Variational Autoencoders (VAEs). After reviewing unsupervised learning, we look at autoencoders and their ability to compress inputs into a latent space. We'll see why they don't make good generative models, then generalize to VAEs, and finish with some examples of VAE generative output.
[slides]
Readings:
- Understanding Variational Autoencoders
- Understanding Deep Learning, Chapter 17 (optional)
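Two VAE ingredients are easy to sketch without a full model: the reparameterization trick and the closed-form KL term against a standard-normal prior (the values here are random placeholders, not a trained encoder's outputs):

```python
import numpy as np

# (1) The encoder outputs a distribution (mu, log_var) rather than a point,
#     sampled with the reparameterization trick so gradients can flow.
# (2) A KL term pulls the latent distribution toward the N(0, I) prior
#     that we sample from at generation time.
rng = np.random.default_rng(0)
mu = rng.standard_normal(4)              # encoder mean for one input
log_var = rng.standard_normal(4) * 0.1   # encoder log-variance

eps = rng.standard_normal(4)
z = mu + np.exp(0.5 * log_var) * eps     # reparameterized latent sample

# KL( N(mu, var) || N(0, I) ) in closed form.
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
print(z, kl)
```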
-
22 - Diffusion Models
tl;dr: In this lecture, we consider diffusion models, currently the state of the art for image generation.
[slides]
Readings:
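As a rough illustration of the forward (noising) process with a simple linear variance schedule (the schedule and dimensions are illustrative assumptions, not the lecture's specific choices):

```python
import numpy as np

# Diffusion forward process in closed form: for a variance schedule beta_t,
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
# The model is trained to predict eps from x_t; sampling runs the process
# in reverse.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.standard_normal(8)              # a "clean" data point
t = 500
eps = rng.standard_normal(8)
xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
print(alpha_bar[t], xt)                  # mostly noise by t = 500
```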
-
23 - Graph Neural Networks
tl;dr: In this lecture we introduce graph neural networks: their matrix representations, how to perform graph-level classification and regression, and how to define graph convolutional network layers.
[slides]
Readings:
- Understanding Deep Learning, Chapter 13
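A single graph convolutional layer in the common Kipf-and-Welling form, as a rough sketch; the tiny path graph and the mean-pool readout at the end are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# One GCN layer: add self-loops to the adjacency matrix, symmetrically
# normalize by degree, then mix neighbor features through a weight matrix.
def gcn_layer(A, H, W):
    A_hat = A + np.eye(len(A))                 # adjacency with self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))                # node features
W = rng.standard_normal((4, 2))
Z = gcn_layer(A, H, W)
print(Z.mean(0))   # mean-pool node embeddings for a graph-level readout
```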
-
24 - Using Pre-Trained Models
tl;dr: In this lecture we explore the use of pre-trained models as a foundation for classification and controlled generation.
[slides]
Readings:
- TBD
-
25 - Scaling Concerns
tl;dr: In this lecture we examine the empirical tradeoffs captured by the so-called scaling laws, and other ways to manage the cost of scale.
[slides]
Readings:
- TBD
-
26 - Reasoning and World Models
tl;dr: In this lecture we examine claims that models have world models and can reason, and attempts to encourage such behavior.
[slides]
Readings:
- TBD
-
27 - Object Detection and Segmentation
tl;dr: In this lecture we investigate the application of deep learning models to object detection and segmentation.
[slides]
Readings:
- TBD
-
28 - Benchmarks
tl;dr: In this lecture we review the history of AI benchmarks and how they have driven model development so far.
[slides]
Readings:
- TBD