You can download the lectures here. We will try to upload each lecture before its corresponding class, so links for future lectures listed below may not work yet.

  • 01 - Deep Learning Concepts and Course Logistics
    tl;dr: We will introduce the topic of deep learning, a bit about its history, and the impact it has had. Then we'll go over the course logistics, the lecture topics, homework assignments, and projects.
    [slides] [annotated slides] [lecture recording]

    Readings:

  • 02 - Supervised Learning
    tl;dr: We go a little deeper into supervised learning, introducing terminology and illustrating with a simple example of a linear model.
    [slides] [annotated slides] [lecture recording]
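
    As a small illustration of the kind of linear model we'll use (a sketch of ours, not taken from the slides; all variable names are made up), fitting y ≈ w*x + b by least squares:

      import numpy as np

      # Toy 1D dataset: inputs x and noisy targets y (illustrative values only).
      rng = np.random.default_rng(0)
      x = np.linspace(0, 1, 50)
      y = 2.0 * x + 0.5 + 0.1 * rng.standard_normal(50)

      # Fit y ≈ w*x + b by ordinary least squares on the design matrix [x, 1].
      X = np.stack([x, np.ones_like(x)], axis=1)
      w, b = np.linalg.lstsq(X, y, rcond=None)[0]
      print(f"learned w={w:.2f}, b={b:.2f}")   # should land near 2.0 and 0.5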

    Readings:

  • 03 - Loss Functions
    tl;dr: In this lecture, we will motivate and derive common loss functions from a probabilistic viewpoint using maximum likelihood estimation.
    [slides] [annotated slides] [lecture recording]
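
    As a preview sketch (ours, not the lecture's code; names are illustrative): for a Bernoulli output, the negative log-likelihood is exactly the binary cross-entropy loss, so maximizing likelihood and minimizing this loss are the same thing.

      import numpy as np

      def binary_cross_entropy(p, y):
          """Negative log-likelihood of labels y under Bernoulli(p) predictions."""
          eps = 1e-12  # avoid log(0)
          return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

      y = np.array([1.0, 0.0, 1.0, 1.0])   # observed labels
      p = np.array([0.9, 0.2, 0.8, 0.6])   # model's predicted probabilities
      print(binary_cross_entropy(p, y))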

    Readings:

  • 04 - Gradient Descent
    tl;dr: In this lecture, we consider gradient descent as a general technique for training parametric models, and preview how it enables joint training of differentiable components.
    [slides] [annotated slides] [lecture recording]
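
    The basic update the lecture builds on, as a minimal sketch (function and parameter names are ours):

      import numpy as np

      def gradient_descent(grad_fn, theta, lr=0.1, steps=100):
          """Repeatedly step the parameters against the gradient of the loss."""
          for _ in range(steps):
              theta = theta - lr * grad_fn(theta)
          return theta

      # Example: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
      print(gradient_descent(lambda t: 2 * t, np.array([3.0, -2.0])))  # converges toward [0, 0]
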
  • 05 - Shallow Networks
    tl;dr: In this lecture we consider networks with one layer of hidden units and explore their representational power.
    [slides]
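
    A sketch of a one-hidden-layer network (our own toy example; the random weights and the ReLU choice are ours):

      import numpy as np

      def shallow_net(x, W1, b1, w2, b2):
          """One layer of hidden units with ReLU, followed by a linear read-out."""
          h = np.maximum(0.0, W1 @ x + b1)   # hidden activations
          return w2 @ h + b2                 # scalar output

      rng = np.random.default_rng(0)
      x = rng.standard_normal(3)                          # 3 inputs
      W1, b1 = rng.standard_normal((5, 3)), np.zeros(5)   # 5 hidden units
      w2, b2 = rng.standard_normal(5), 0.0
      print(shallow_net(x, W1, b1, w2, b2))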

    Readings:

  • 06 - Shared Compute Cluster Tutorial
    tl;dr: In this lecture, IS&T staff will introduce us to the shared compute cluster, which will be used for your larger model-training projects.

  • 07 - Deep Networks
    tl;dr: We dive into deep networks by composing two shallow networks and visualizing their representational capabilities. We then generalize to fully connected networks with two or more layers of hidden units. We'll compare the modeling efficiency of deep and shallow networks.
    [slides]

    Readings:

  • 08 - Fitting Models
    tl;dr: In this lecture we look at different ways of minimizing the loss function for a model given a training dataset. We'll formally define gradient descent, then show the advantages of stochastic gradient descent, and finally see how momentum and normalized gradients (Adam) can improve model training further.
    [slides]
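
    The three update rules in one compressed sketch (ours; the hyperparameter values are common defaults, not prescriptions from the slides):

      import numpy as np

      def sgd_step(theta, g, lr=0.01):
          return theta - lr * g

      def momentum_step(theta, g, v, lr=0.01, beta=0.9):
          v = beta * v + g                  # running average of past gradients
          return theta - lr * v, v

      def adam_step(theta, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
          m = b1 * m + (1 - b1) * g         # first moment (momentum)
          v = b2 * v + (1 - b2) * g**2      # second moment (normalizes the step)
          m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)  # bias correction
          return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v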

    Readings:

  • 09 - Backpropagation and Initialization
    tl;dr: In this lecture we show how to efficiently calculate gradients over more complex functions like deep neural networks using backpropagation, and then reason about the resulting gradients to derive better initialization schemes.
    [slides]
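
    A sketch of backpropagation through a tiny two-layer network with a squared-error loss (all names ours); note how each layer's gradient reuses the gradient of the layer after it:

      import numpy as np

      def forward_backward(x, y, W1, b1, W2, b2):
          # Forward pass, keeping the intermediates the backward pass needs.
          z1 = W1 @ x + b1
          h = np.maximum(0.0, z1)           # ReLU
          y_hat = W2 @ h + b2
          loss = 0.5 * np.sum((y_hat - y) ** 2)

          # Backward pass: chain rule, from the loss back toward the input.
          d_yhat = y_hat - y
          dW2, db2 = np.outer(d_yhat, h), d_yhat
          d_h = W2.T @ d_yhat
          d_z1 = d_h * (z1 > 0)             # gradient gated by the ReLU
          dW1, db1 = np.outer(d_z1, x), d_z1
          return loss, (dW1, db1, dW2, db2)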

    Readings:

  • 10 - Measuring Performance
    tl;dr: We look at measuring model performance during training, the importance of test sets, and how noise, bias, and variance play a role in training outcomes.
    [slides]

    Readings:

  • 11 - Regularization
    tl;dr: We explain explicit and implicit regularization techniques and how they help models generalize.
    [slides]
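
    The simplest explicit example, as a sketch (ours): L2 regularization just adds a penalty on the weights to the data loss (the lambda value is illustrative).

      import numpy as np

      def regularized_loss(data_loss, weights, lam=1e-4):
          """Explicit L2 regularization: penalize large weights to discourage overfitting."""
          return data_loss + lam * sum(np.sum(W ** 2) for W in weights)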

    Readings:

  • 12 - Convolutional Neural Networks
    tl;dr: We cover 1D and 2D convolutional neural networks along with subsampling and upsampling operations.
    [slides]
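
    A sketch of the 1D convolution at the heart of the lecture (no padding or stride; names ours):

      import numpy as np

      def conv1d(x, w, b=0.0):
          """Slide kernel w along signal x; the same weights are reused at every position."""
          k = len(w)
          return np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])

      x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
      w = np.array([1.0, 0.0, -1.0])        # a simple edge-detecting kernel
      print(conv1d(x, w))                   # [-2. -2. -2.]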

    Readings:

  • 13 - Residual Networks
    tl;dr: In this lecture we introduce residual networks, the types of problems they solve, why we need batch normalization and then review some example residual network architectures.
    [slides]
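
    The core idea in a few lines (a plain-numpy sketch of ours; batch normalization is omitted here for brevity):

      import numpy as np

      def residual_block(x, W1, W2):
          """Compute f(x) and add it back to x, so the block only has to learn a residual."""
          h = np.maximum(0.0, W1 @ x)       # transformation + ReLU
          return x + W2 @ h                 # skip connection: output = x + f(x)

      rng = np.random.default_rng(0)
      x = rng.standard_normal(4)
      W1, W2 = rng.standard_normal((8, 4)), rng.standard_normal((4, 8))
      print(residual_block(x, W1, W2))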

    Readings:

  • 14 - Recurrent Neural Networks
    tl;dr: In this lecture we introduce recurrent neural networks, starting with the plain vanilla RNN and the problem of vanishing gradients, then covering LSTMs, GRUs, and batch normalization.
    [slides]
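
    A sketch of the plain vanilla recurrence (names ours): the same weights are applied at every time step, which is also where the vanishing-gradient problem comes from.

      import numpy as np

      def rnn_forward(xs, h0, Wxh, Whh, b):
          """Vanilla RNN: each hidden state depends on the current input and the previous state."""
          h, hs = h0, []
          for x in xs:
              h = np.tanh(Wxh @ x + Whh @ h + b)
              hs.append(h)
          return hs

      rng = np.random.default_rng(0)
      xs = [rng.standard_normal(3) for _ in range(4)]   # 4 time steps, 3 features each
      Wxh, Whh, b = rng.standard_normal((5, 3)), rng.standard_normal((5, 5)), np.zeros(5)
      print(rnn_forward(xs, np.zeros(5), Wxh, Whh, b)[-1])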

    Readings:

  • 15 - Transformers Part 1
    tl;dr: In this lecture we cover the transformer architecture, starting with the motivation for a new type of model, then the concept and implementation of self-attention, and finally the full transformer architecture for encoder, decoder, and encoder-decoder type models.
    [slides]
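
    A sketch of single-head, unmasked self-attention, the core computation of this lecture (projection names are ours):

      import numpy as np

      def softmax(z):
          z = z - z.max(axis=-1, keepdims=True)
          e = np.exp(z)
          return e / e.sum(axis=-1, keepdims=True)

      def self_attention(X, Wq, Wk, Wv):
          """Every position attends to every position, weighted by query-key similarity."""
          Q, K, V = X @ Wq, X @ Wk, X @ Wv
          scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot-product attention
          return softmax(scores) @ V

      rng = np.random.default_rng(0)
      X = rng.standard_normal((6, 8))               # 6 tokens, model dimension 8
      Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
      print(self_attention(X, Wq, Wk, Wv).shape)    # (6, 8)
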
  • 16 - Transformers Part 2
    tl;dr: In this lecture we continue to review the transformer architecture. We continue the discussion of decoders and encoder-decoder architectures, then discuss scaling to large contexts, and finish with tokenization and embeddings.
    [slides]
  • 17 - Vision & Multimodal Transformers
    tl;dr: In this lecture we'll cover vision and multimodal transformers as a survey of three papers.
    [slides]

    Readings:

    • See slides for references
  • 18 - Training, Tuning and Evaluating LLMs
    tl;dr: In this lecture we go through the entire LLM training process, starting with pretraining and then fine-tuning. We'll also discuss how to evaluate LLMs, as well as a parameter-efficient fine-tuning technique called LoRA.
    [slides]
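
    A sketch of the LoRA idea (ours, not the lecture's code): freeze the pretrained weight matrix and learn only a low-rank update B @ A.

      import numpy as np

      def lora_forward(x, W_frozen, A, B, alpha=1.0):
          """Effective weight is W + alpha * B @ A, but only A and B are trained."""
          return (W_frozen + alpha * B @ A) @ x

      d, r = 16, 2                                  # rank r << d keeps the update cheap
      rng = np.random.default_rng(0)
      W_frozen = rng.standard_normal((d, d))        # pretrained weights, kept fixed
      A = rng.standard_normal((r, d)) * 0.01        # trainable low-rank factor
      B = np.zeros((d, r))                          # zero init: no change at the start
      print(lora_forward(rng.standard_normal(d), W_frozen, A, B).shape)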

    Readings:

    • See slides for references
  • 20 - Adversarial Inputs and Generative Adversarial Models
    tl;dr: In this lecture we introduce adversarial inputs. We will then dive into Generative Adversarial Networks (GANs) and their applications. We will also discuss the challenges and limitations of GANs and some of the recent advances in the field.
    [slides]
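
    A sketch of the two competing objectives (non-saturating form; notation is ours): the discriminator tries to tell real from generated samples, while the generator tries to fool it.

      import numpy as np

      def gan_losses(d_real, d_fake):
          """d_real = D(x), d_fake = D(G(z)); both are probabilities in (0, 1)."""
          eps = 1e-12
          loss_d = -np.mean(np.log(d_real + eps) + np.log(1 - d_fake + eps))
          loss_g = -np.mean(np.log(d_fake + eps))   # non-saturating generator loss
          return loss_d, loss_g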

    Readings:

  • 21 - Unsupervised Learning and Variational Autoencoders
    tl;dr: In this lecture we revisit the concept of unsupervised learning in the context of generative models and then dive into Variational Autoencoders. After reviewing unsupervised learning, we look at autoencoders and their ability to reduce inputs to a lower-dimensional latent space. We'll see why they don't make good generative models and then generalize to VAEs. We'll finish with some examples of generative output from VAEs.
    [slides]
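
    A sketch of the step that turns an autoencoder into a VAE (ours; names are illustrative): sample the latent from a learned Gaussian with the reparameterization trick, and pull that Gaussian toward the prior with a KL penalty.

      import numpy as np

      def sample_latent(mu, log_var, rng):
          """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
          eps = rng.standard_normal(mu.shape)
          return mu + np.exp(0.5 * log_var) * eps

      def kl_to_standard_normal(mu, log_var):
          """KL(q(z|x) || N(0, I)), the regularizer that shapes the latent space."""
          return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
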
  • 22 - Diffusion Models
    tl;dr: In this lecture, we consider diffusion models, the current best practice for image generation.
    [slides]
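
    A sketch of the forward (noising) process that diffusion models learn to invert (ours; the schedule value is illustrative):

      import numpy as np

      def noisy_sample(x0, alpha_bar_t, rng):
          """Jump straight to step t: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*noise."""
          eps = rng.standard_normal(x0.shape)
          return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps, eps

      # The network is trained to predict eps from x_t; generation runs this in reverse.
      rng = np.random.default_rng(0)
      x0 = rng.standard_normal((8, 8))              # stand-in for an image
      x_t, eps = noisy_sample(x0, alpha_bar_t=0.3, rng=rng)
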
  • 23 - Graph Neural Networks
    tl;dr: In this lecture we introduce graph neural networks, define their matrix representations, and show how to do graph-level classification and regression and how to define graph convolutional network layers.
    [slides]
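
    A sketch of one graph-convolution layer in matrix form (ours, using a common symmetric normalization; names are illustrative):

      import numpy as np

      def gcn_layer(A, X, W):
          """Average each node's neighborhood (including itself), then transform and apply ReLU."""
          A_hat = A + np.eye(A.shape[0])            # add self-loops
          D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
          return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)

      A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
      X = np.eye(3)                                 # one-hot node features
      W = np.ones((3, 2))                           # tiny weight matrix
      print(gcn_layer(A, X, W))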

    Readings:

  • 24 - Using Pre-Trained Models
    tl;dr: In this lecture we explore the use of pre-trained models as a foundation for classification and controlled generation.
    [slides]

    Readings:

    • TBD
  • 25 - Scaling Concerns
    tl;dr: In this lecture we examine the empirical tradeoffs of the so-called scaling laws and other ways to manage cost scaling.
    [slides]

    Readings:

    • TBD
  • 26 - Reasoning and World Models
    tl;dr: In this lecture we examine claims that models have world models and can reason, and attempts to encourage such behavior.
    [slides]

    Readings:

    • TBD
  • 27 - Object Detection and Segmentation
    tl;dr: In this lecture we investigate the application of deep learning models to object detection and segmentation.
    [slides]

    Readings:

    • TBD
  • 28 - Benchmarks
    tl;dr: In this lecture we review the history of AI benchmarks and how they have driven model development so far.
    [slides]

    Readings:

    • TBD