Lectures
You can download the lectures here. We will try to upload each lecture before its corresponding class; links for lectures that have not yet happened are likely broken.
-
01 - Deep Learning Concepts and Course Logistics
tl;dr: We will introduce the topic of deep learning, a bit of its history, and the impact it has had. Then we'll go over the course logistics: the lecture topics, homework assignments, and projects.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 1
-
02 - Supervised Learning
tl;dr: We go a little deeper into supervised learning, introducing terminology and illustrating with a simple example of a linear model; a small worked example follows the readings below.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 2
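As a concrete warm-up, here is the kind of linear model the lecture illustrates with, fit by least squares; the data and "true" parameters are invented for illustration.

```python
# Fit a 1D linear model y = w*x + b by least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)  # ground truth: w=2, b=0.5

# Closed-form least squares via the design matrix [x, 1].
X = np.stack([x, np.ones_like(x)], axis=1)
w, b = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"fitted w={w:.3f}, b={b:.3f}")  # close to 2.0 and 0.5
```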
-
03 - Loss Functions
tl;dr: In this lecture, we will motivate and derive common loss functions from a probabilistic viewpoint using maximum likelihood estimation; a numerical check of the Gaussian case follows the readings.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 5
- Mathematics for Machine Learning, Chapters 6 and 8, especially 8.1.3
- Maximum Likelihood Estimation Examples from an MIT course. This video walks through MLE examples for binomial and normal distributions, but does not pull out equivalent loss functions like we did in lecture.
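As a sanity check on the lecture's central point, here is a small made-up example showing that the Gaussian negative log-likelihood and mean squared error differ only by a constant and a scale, so minimizing one minimizes the other.

```python
# Gaussian MLE reduces to least squares: NLL = N/2 log(2*pi*sigma^2) + N/(2*sigma^2) * MSE.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=50)                       # observed targets
y_hat = y + rng.normal(scale=0.3, size=50)    # stand-in model predictions
sigma = 1.0
N = len(y)

nll = 0.5 * np.sum((y - y_hat) ** 2) / sigma**2 + 0.5 * N * np.log(2 * np.pi * sigma**2)
mse = np.mean((y - y_hat) ** 2)

# The two expressions agree exactly.
print(nll, 0.5 * N * np.log(2 * np.pi * sigma**2) + N / (2 * sigma**2) * mse)
```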
-
04 - Gradient Descent
tl;dr: In this lecture, we consider gradient descent as a general technique for training parametric models, and preview how it enables joint training of differentiable components (see the sketch below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 3
- Mathematics for Machine Learning, Chapter 7
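A minimal sketch of the idea, assuming a least-squares linear model as in the earlier lectures; the data is invented for illustration.

```python
# Gradient descent on mean squared error for a linear model.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.1
for step in range(200):
    grad = 2 / len(y) * X.T @ (X @ w - y)  # d/dw of mean((Xw - y)^2)
    w -= lr * grad                         # step against the gradient
print(w)  # approaches [1.0, -2.0, 0.5]
```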
-
05 - Shallow Networks
tl;dr: In this lecture we consider networks with one layer of hidden units and explore their representational power (see the sketch below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 3
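A minimal sketch of such a network with arbitrary sizes and weights, showing the piecewise-linear functions one hidden layer of ReLU units produces.

```python
# A shallow network: scalar input -> D ReLU hidden units -> scalar output.
# With D hidden units, the function is piecewise linear with up to D "joints".
import numpy as np

def shallow(x, W1, b1, W2, b2):
    h = np.maximum(0.0, x[:, None] * W1 + b1)  # hidden activations, shape (N, D)
    return h @ W2 + b2                         # linear read-out

rng = np.random.default_rng(0)
D = 5
W1, b1 = rng.normal(size=D), rng.normal(size=D)
W2, b2 = rng.normal(size=D), rng.normal()

x = np.linspace(-2, 2, 9)
print(shallow(x, W1, b1, W2, b2))  # piecewise-linear in x
```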
-
06 - Shared Compute Cluster Tutorial
tl;dr: In this lecture, IS&T staff will introduce the shared compute cluster, which you will use for your larger model-training projects.
[slides] [lecture recording]
-
07 - Deep Networks
tl;dr: We dive into deep networks by composing two shallow networks and visualizing their representational capabilities. We then generalize to fully connected networks with two or more layers of hidden units, and compare the modeling efficiency of deep versus shallow networks (see the sketch below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 4
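A sketch of the composition idea with invented shapes and weights: stacking a second hidden layer on a shallow network yields a deep network, and each added layer can fold the input space, multiplying the number of linear regions.

```python
# Two fully connected ReLU layers composed, plus a linear read-out.
import numpy as np

def layer(h, W, b):
    return np.maximum(0.0, h @ W + b)  # fully connected + ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))                            # four scalar inputs
W1, b1 = rng.normal(size=(1, 6)), rng.normal(size=6)
W2, b2 = rng.normal(size=(6, 6)), rng.normal(size=6)
W3, b3 = rng.normal(size=(6, 1)), rng.normal(size=1)

h1 = layer(x, W1, b1)    # first shallow network body
h2 = layer(h1, W2, b2)   # second shallow network composed on top
y = h2 @ W3 + b3         # linear output layer
print(y.ravel())
```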
-
08 - Fitting Models
tl;dr: In this lecture we look at different ways of minimizing the loss function for a model given a training dataset. We'll formally define gradient descent, show the advantages of stochastic gradient descent, and finally see how momentum and normalized gradients (Adam) can improve model training further; the three update rules are sketched below.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 6
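The three update rules written as plain functions; the hyperparameter values are common defaults, not prescriptions from the lecture.

```python
# Plain SGD, momentum, and Adam update steps for one parameter vector.
import numpy as np

def sgd_step(w, grad, lr=0.1):
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    v = beta * v + grad                      # accumulate a running direction
    return w - lr * v, v

def adam_step(w, m, s, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad             # first moment (mean of gradients)
    s = b2 * s + (1 - b2) * grad ** 2        # second moment (uncentered variance)
    m_hat, s_hat = m / (1 - b1 ** t), s / (1 - b2 ** t)  # bias correction
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Quick check on f(w) = w^2 (gradient 2w): each rule heads toward the minimum.
w_sgd, w_mom, v, w_adam, m, s = 5.0, 5.0, 0.0, 5.0, 0.0, 0.0
for t in range(1, 201):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_mom, v = momentum_step(w_mom, v, 2 * w_mom)
    w_adam, m, s = adam_step(w_adam, m, s, 2 * w_adam, t, lr=0.05)
print(w_sgd, w_mom, w_adam)
```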
-
09 - Backpropagation
tl;dr: In this lecture we show how to efficiently calculate gradients over more complex functions like deep neural networks using backpropagation; a hand-worked example follows the readings.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Sections 7.1 - 7.4
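Backpropagation worked by hand on a tiny two-layer network with arbitrary shapes, checked against a finite-difference estimate of one gradient entry.

```python
# Manual backprop for h = relu(W1 x), loss = 0.5 * (W2 h - y)^2.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4)

def loss(W1, W2):
    h = np.maximum(0.0, W1 @ x)
    return 0.5 * (W2 @ h - y) ** 2

# Forward pass, storing intermediates for the backward pass.
a = W1 @ x
h = np.maximum(0.0, a)
e = W2 @ h - y

# Backward pass: chain rule, layer by layer.
dW2 = e * h
dh = e * W2
dW1 = np.outer(dh * (a > 0), x)   # ReLU gate: gradient flows only where a > 0

# Finite-difference check of one entry of dW1.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
print(dW1[0, 0], (loss(W1p, W2) - loss(W1, W2)) / eps)  # should agree closely
```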
-
10 - Initialization
tl;dr: In this lecture we show how to use gradients to derive better initialization schemes (see the sketch below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Section 7.5
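A small experiment with arbitrary layer sizes illustrating the effect: He-style initialization (variance 2/fan_in for ReLU layers) keeps activation magnitudes at a stable scale across depth, while smaller or larger variances make them vanish or explode.

```python
# Push one input through a deep ReLU stack under three initialization scales.
import numpy as np

rng = np.random.default_rng(0)
D, depth = 256, 20
h = rng.normal(size=D)

for var, label in [(1.0 / D, "too small"), (2.0 / D, "He"), (4.0 / D, "too large")]:
    a = h.copy()
    for _ in range(depth):
        W = rng.normal(scale=np.sqrt(var), size=(D, D))
        a = np.maximum(0.0, W @ a)
    print(f"{label:>9}: mean |activation| after {depth} layers = {np.abs(a).mean():.3g}")
```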
-
11 - Measuring Performance
tl;dr: We look at measuring model performance during training, the importance of test sets, and how noise, bias, and variance affect training outcomes.
[slides] [annotated slides] [lecture recording]
-
12 - Regularization
tl;dr: We explain explicit and implicit regularization techniques and how they help models generalize (a ridge-regression sketch follows below).
[slides] [annotated slides] [lecture recording]
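One explicit example of the idea, using ridge regression since its L2-regularized solution has a closed form; the data is made up.

```python
# Ridge regression: minimize ||Xw - y||^2 + lam * ||w||^2 (closed form).
# Larger penalties shrink the fitted weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=30)  # only feature 0 matters

for lam in [0.0, 1.0, 10.0]:
    w = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
    print(f"lambda={lam:>4}: ||w|| = {np.linalg.norm(w):.3f}")
```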
-
13 - Convolutional Neural Networks
tl;dr: We cover 1D and 2D convolutional neural networks along with subsampling and upsampling operations; a bare-bones 1D convolution is sketched below.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 10
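A bare-bones 1D convolution written out directly, so the weight sharing is visible: the same small kernel slides across every position of the input ("valid" padding here).

```python
import numpy as np

def conv1d(x, kernel):
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
edge = np.array([-1.0, 0.0, 1.0])   # a simple edge-detecting kernel
print(conv1d(x, edge))              # responds where the signal changes
```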
-
14 - Residual Networks
tl;dr: In this lecture we introduce residual networks, the types of problems they solve, and why we need batch normalization, then review some example residual network architectures (a minimal block is sketched below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 11
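A minimal residual block in PyTorch; this is an illustrative pattern, not a specific published architecture.

```python
# Two conv + batch-norm layers, plus the skip connection that lets
# gradients bypass the block.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # the residual (skip) connection

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)    # same shape in and out
```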
-
15 - Recurrent Neural Networks
tl;dr: In this lecture we introduce recurrent neural networks, starting with the plain vanilla RNN and the problem of vanishing gradients, then covering LSTMs, GRUs, and batch normalization (see the sketch below).
[slides]
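A plain vanilla RNN unrolled by hand with arbitrary sizes; the repeated application of the same recurrent weights at every step is also the source of vanishing and exploding gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_h = 4, 8
Wx = rng.normal(size=(D_h, D_in))
Wh = rng.normal(size=(D_h, D_h)) * 0.3   # recurrent weights, reused each step
b = np.zeros(D_h)

h = np.zeros(D_h)                            # initial hidden state
for x_t in rng.normal(size=(10, D_in)):      # a length-10 input sequence
    h = np.tanh(Wx @ x_t + Wh @ h + b)       # same weights at every time step
print(h)
```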
-
16 - Attention and Transformers
tl;dr: In this lecture we cover the attention mechanism used to handle longer, variable-length contexts, and present the transformer architecture, self-attention, and the variations for encoder, decoder, and encoder-decoder models; scaled dot-product attention is sketched below.
[slides] [annotated slides] [lecture recording]
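Scaled dot-product attention in a few lines of NumPy (shapes arbitrary): each query takes a weighted average of the values, with weights derived from query-key similarity.

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarity, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))    # 5 queries
K = rng.normal(size=(7, 16))    # 7 keys
V = rng.normal(size=(7, 16))    # 7 values
print(attention(Q, K, V).shape) # (5, 16): one output per query
```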
-
17 - Transformers Part 2
tl;dr: We continue our review of the transformer architecture, picking up with decoders and encoder-decoder architectures, then discussing scaling to large contexts, tokenization, and embeddings.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 12
- The Illustrated Transformer (optional)
-
18 - Training, Tuning and Evaluating LLMs
tl;dr: In this lecture we go through the entire LLM training process, starting with pretraining and then fine-tuning. We'll also discuss how to evaluate LLMs, as well as a parameter-efficient fine-tuning technique called LoRA (sketched below).
[slides] [annotated slides] [lecture recording]
Readings:
- See slides for references
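A sketch of the LoRA idea under invented shapes: the pretrained weight matrix is frozen, and only a low-rank update is trained, so far fewer parameters change during fine-tuning.

```python
# Freeze W; learn a rank-r update B @ A, training r*(d_in + d_out)
# parameters instead of d_in*d_out.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 8
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01     # trainable
B = np.zeros((d_out, r))                  # trainable; zero so training starts at W

x = rng.normal(size=d_in)
y = W @ x + B @ (A @ x)                   # adapted forward pass
print(y.shape, f"trainable params: {A.size + B.size} vs {W.size}")
```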
-
19 - Vision & Multimodal Transformers
tl;dr: In this lecture we'll survey vision and multimodal transformers through three papers.
[slides] [annotated slides] [lecture recording]
Readings:
- See slides for references
-
20 - Adversarial Inputs and Generative Adversarial Models
tl;dr: In this lecture we introduce adversarial inputs. We will then dive into Generative Adversarial Networks (GANs) and their applications. We will also discuss the challenges and limitations of GANs and some of the recent advances in the field; a one-step training sketch follows the readings.
[slides] [annotated slides] [lecture recording]
Readings:
- Intriguing Properties of Neural Networks
- Robustness and Generalization via Generative Adversarial Training
- Adversarial Examples are Not Bugs, They are Features
- Generative Adversarial Nets
- A Style-Based Generator Architecture for Generative Adversarial Networks
- Understanding Deep Learning, Chapter 15
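A bare-bones sketch of one GAN training step on invented toy data; the generator and discriminator here are tiny stand-ins for real architectures.

```python
# One discriminator step (real -> 1, fake -> 0) and one generator step
# (make the discriminator label fakes as real: the non-saturating loss).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))   # noise -> sample
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0          # stand-in "real" data
z = torch.randn(64, 2)

# Discriminator step (generator output detached so only D updates).
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step.
loss_g = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(loss_d.item(), loss_g.item())
```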
-
21 - Unsupervised Learning and Variational Autoencoders
tl;dr: In this lecture we revisit unsupervised learning in the context of generative models and then dive into Variational Autoencoders. After reviewing unsupervised learning, we look at autoencoders and their ability to compress inputs into a lower-dimensional latent space. We'll see why they don't make good generative models on their own, then generalize to VAEs, and finish with some examples of VAE generative output (the reparameterization trick is sketched below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Variational Autoencoders
- Understanding Deep Learning, Chapter 14; Chapter 17 (optional)
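The core of a VAE in a few lines: the reparameterization trick and the KL term, with stand-in encoder outputs and a stand-in reconstruction error.

```python
# z = mu + sigma * eps lets gradients flow through the sampling step.
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

mu = torch.zeros(8, requires_grad=True)       # stand-in encoder outputs
logvar = torch.zeros(8, requires_grad=True)
z = reparameterize(mu, logvar)                # differentiable sample

recon = ((z - 1.0) ** 2).sum()                # stand-in reconstruction error
kl = -0.5 * torch.sum(1 + logvar - mu**2 - logvar.exp())  # KL to N(0, I) prior
(recon + kl).backward()
print(mu.grad is not None and logvar.grad is not None)    # gradients reach the encoder
```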
-
22 - Diffusion Models
tl;dr: In this lecture, we consider diffusion models, the current best practice for image generation; the forward noising process is sketched below.
[slides] [annotated slides] [lecture recording]
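A sketch of the forward (noising) process in closed form, using a common linear beta schedule; all values are illustrative.

```python
# x_t blends the clean sample with Gaussian noise; the signal fraction
# sqrt(alpha_bar_t) shrinks as t grows.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)                    # a clean "data" sample
betas = np.linspace(1e-4, 0.02, 1000)      # a common linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)        # cumulative signal retention

for t in [0, 250, 500, 999]:
    eps = rng.normal(size=4)               # fresh Gaussian noise
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    print(f"t={t:>3}: signal fraction {np.sqrt(alpha_bar[t]):.3f}, |x_t| {np.linalg.norm(x_t):.2f}")
```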
-
23 - Latent Diffusion Models
tl;dr: In this lecture, we continue with latent diffusion models, which run the diffusion process in a learned latent space to make image generation more efficient.
[slides] [annotated slides] [lecture recording]
-
24 - Using Pre-Trained Models
tl;dr: In this lecture we explore using pre-trained models as a foundation for classification and controlled generation (a frozen-backbone sketch follows below).
[slides] [annotated slides] [lecture recording]
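A generic sketch of the pattern, with no specific pretrained checkpoint assumed: freeze a backbone's parameters and train only a new task-specific head on top of its features.

```python
import torch
import torch.nn as nn

# Pretend `backbone` holds pre-trained weights; freeze them.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(64, 10)                     # new classifier for the target task
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x, labels = torch.randn(16, 32), torch.randint(0, 10, (16,))
logits = head(backbone(x))                   # frozen features -> trainable head
loss = nn.functional.cross_entropy(logits, labels)
opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```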
-
25 - Data Preparation and Augmentation
tl;dr: In this lecture we investigate different strategies for preparing data to facilitate learning and generalization; two common augmentations are sketched below.
[slides] [annotated slides] [lecture recording]
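Two standard image augmentations written out directly on a stand-in image array: each training pass sees a slightly different view of the same image, which acts as implicit regularization.

```python
# Random horizontal flip and random crop.
import numpy as np

rng = np.random.default_rng(0)
img = rng.uniform(size=(32, 32, 3))   # a stand-in 32x32 RGB image

def augment(img, crop=28):
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                    # random horizontal flip
    i = rng.integers(0, img.shape[0] - crop + 1)
    j = rng.integers(0, img.shape[1] - crop + 1)
    return img[i:i + crop, j:j + crop, :]        # random crop

print(augment(img).shape)   # (28, 28, 3)
```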
-
26 - Reasoning and World Models
tl;dr: In this lecture we examine claims that models have world models and can reason, and attempts to encourage such behavior.
[slides] [annotated slides] [lecture recording]
Readings:
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Large Language Models are Zero-Shot Reasoners
- Learning to Reason with LLMs
- OpenAI Harmony Response Format
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
- AlphaGeometry: An Olympiad-level AI system for geometry
- How Does A Blind Model See The Earth?
- Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
- V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
-
27 - Graph Neural Networks
tl;dr: In this lecture we introduce graph neural networks: their matrix representations, graph-level classification and regression, and how to define graph convolutional network layers (one layer is sketched below).
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 13
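One GCN-style layer on a toy graph, with the graph, features, and weights all invented: mix each node's features with its neighbors' via a normalized adjacency matrix, then apply a shared linear map and nonlinearity.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],      # adjacency matrix of a 4-node path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                          # add self-loops
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))       # symmetric normalization D^-1/2 A D^-1/2

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                    # node features
W = rng.normal(size=(3, 5))                    # shared layer weights
H = np.maximum(0.0, A_norm @ X @ W)            # one graph convolutional layer
print(H.shape)                                 # (4, 5): new features per node
```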
-
28 - Deep Reinforcement Learning
tl;dr: In this lecture we will introduce reinforcement learning and how deep learning fits into it; a tabular warm-up example follows the readings.
[slides] [annotated slides] [lecture recording]
Readings:
- Understanding Deep Learning, Chapter 19
- Data Center Cooling using Model-Predictive Control
- Human-level control through deep reinforcement learning (access via BU)
- Mastering the game of Go without human knowledge (access via BU)
- Gran Turismo Sophy
- Craig Sherstan Keynote at Computers and Games 2024, re: Gran Turismo Sophy
- Aligning language models to follow instructions
- Training language models to follow instructions with human feedback
- RLHF: Reinforcement Learning from Human Feedback
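As a warm-up to the deep variants in the readings, here is tabular Q-learning on a made-up five-state chain with a reward only at the right end; deep reinforcement learning replaces the table with a neural network.

```python
# Epsilon-greedy tabular Q-learning on a 5-state chain.
import numpy as np

n_states = 5
Q = np.zeros((n_states, 2))           # actions: 0 = left, 1 = right
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.1, 0.9, 0.3

for episode in range(300):
    s = 0
    while s != n_states - 1:
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # The Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))   # "right" ends up preferred in every state
```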
