Recent Advances in Unsupervised Learning: Fundamental Limits and Efficient Algorithms

Abstract: Our understanding of unsupervised learning problems has progressed rapidly over the past decade. A synergy of diverse ideas from high-dimensional statistics, computer science, and statistical physics has dramatically advanced the state of the art, leading to powerful new algorithms and precise characterizations of the associated information-theoretic limits. In addition, there is substantial emerging evidence for the existence of computational/statistical gaps: parameter regimes where signal recovery is information-theoretically possible but is expected to be out of reach for efficient algorithms. In this tutorial series, we will introduce some of the core techniques driving this recent progress, presenting them in the context of a few concrete examples. We will end with a discussion of ongoing efforts in this domain and review several open problems.

  • Nov 14: Lecture 1, 10 AM – 12 PM, Groseclose 402 (Coffee and snacks provided)
    • The first lecture will introduce three emblematic problems: community detection, low-rank matrix estimation, and community detection with high-dimensional node covariates. We will introduce the main statistical questions (detection, recovery, and optimal recovery) in the context of these examples. Finally, we will discuss a few key tools for characterizing the information-theoretic limits of these problems.
  • Nov 15: Lecture 2, 11 AM – 1 PM, ISyE Main Room 228 (Lunch provided)
    • The second lecture will focus on algorithms for recovering the underlying signals. Specifically, we will discuss spectral algorithms and convex optimization methods based on semidefinite programming (for a rough illustration of the spectral approach, see the sketch after the schedule below).
      • In addition, the speaker will hold office hours for discussion on Nov 15, from 3 PM to 4:30 PM, in Groseclose 404.
  • Nov 17: Lecture 3, 9 AM – 11 AM, ISyE Main Room 126 (Coffee and snacks provided)
    • The third lecture will explore optimal recovery and the use of algorithms based on mean-field variational approximations. Finally, we will discuss the notion of computational/statistical gaps in these problems.
  • Nov 18: Lecture 4, 10 AM – 12 PM, Groseclose 402 (Lunch provided)
    • We will move beyond the current state of the art and discuss some directions for future research. Possible themes include the universality of these algorithms (with respect to the distribution of the observations), spectral/SDP algorithms for problems with multi-view data, and information-theoretic thresholds for models with general community structure.
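
As a rough illustration of the spectral methods discussed in Lecture 2, below is a minimal sketch (not the speaker's code) of spectral clustering on a synthetic two-community stochastic block model. The parameters n, p, and q and the sign-of-the-second-eigenvector rule are illustrative assumptions, not a prescription from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-community stochastic block model (SBM) instance:
# n nodes in two equal communities; edges appear independently with
# probability p within a community and q < p across communities.
n, p, q = 200, 0.10, 0.02
labels = np.repeat([1, -1], n // 2)                  # planted community labels
probs = np.where(np.equal.outer(labels, labels), p, q)
A = (rng.random((n, n)) < probs).astype(float)
A = np.triu(A, 1)
A = A + A.T                                          # symmetric, no self-loops

# Spectral clustering: classify nodes by the sign of the eigenvector
# attached to the second-largest eigenvalue of A. The top eigenvector
# mostly reflects average degree; the second one correlates with the
# partition once p - q is large enough.
eigvals, eigvecs = np.linalg.eigh(A)                 # eigenvalues in ascending order
v2 = eigvecs[:, -2]
estimate = np.sign(v2)

# Agreement with the truth, up to the global sign ambiguity of the labels.
overlap = abs(np.mean(estimate * labels))
print(f"overlap with planted partition: {overlap:.2f}")
```

When p is well separated from q, the second eigenvector aligns with the planted partition; this basic phenomenon is the starting point that the lecture's spectral and SDP algorithms sharpen and extend.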

Bio: Subhabrata Sen is an Assistant Professor of Statistics at Harvard University. Prior to Harvard, he was a Schramm postdoctoral fellow at Microsoft Research New England and MIT. He obtained his PhD from the Stanford Department of Statistics in 2017. His research lies at the intersection of applied probability, statistics, and machine learning; his interests include high-dimensional and non-parametric statistics, random graphs, and inference on networks. Subhabrata has received the Probability Dissertation Award for his thesis, an AMS-Simons Travel Grant, and an honorable mention for the Bernoulli Society New Researcher Award (2018). He was a long-term visitor at the Simons Institute for the Theory of Computing in Fall 2021 and Fall 2022.

To continue receiving all AI4OPT seminar announcements, please sign up for the mailing list at https://lists.isye.gatech.edu/mailman/listinfo/ai4opt-seminars.

For past seminars, see https://www.ai4opt.org/seminars/past-seminars.