---
title: Schedule
layout: default
---
<h1>AISTATS 2017 Program of Events</h1>
<h2>Best Paper Awards</h2>
<a href="http://proceedings.mlr.press/v54/newling17a.html">A Sub-Quadratic Exact Medoid Algorithm</a> <br>
<font color=red>James Newling, Francois Fleuret</font><br><br>
<a href="http://proceedings.mlr.press/v54/bahmani17a.html">Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation</a><br>
<font color=red>Sohail Bahmani, Justin Romberg</font><br><br>
<a href="http://proceedings.mlr.press/v54/naesseth17a.html">Reparameterization Gradients through Acceptance- Rejection Sampling Algorithms</a><br>
<font color=red>Christian Naesseth, Francisco Ruiz, Scott Linderman, David Blei</font><br><br>
<h2>19-Apr (Wed)</h2>
<font color=blue><b>18:30-20:30</b> Registration Desk <br><br></font>
<h2>20-Apr (Thu)</h2>
<b>7:30-8:50</b> Breakfast, Windows on the Green & Chart Room <br><br>
<font color=blue><b>8:00-10:00</b> Registration Desk <br><br></font>
<font color=green><b>8:50-9:00</b> Welcome and award announcement<br><br></font>
<b>9:00-10:00</b><font color=red> Invited Talk: Csaba Szepesvari. Crystal Ballroom 1, 2</font><br>
<b>Stochastic linear bandits. </b>
<a href="#foo1" onclick="toggle_visibility('foo1');">See abstract.</a> <a href="csaba-talk.pptx">See slides</a>.
<div id="foo1" style="display:none;">
<i>Learning and decision making often conflict: A decision maker who is uncertain about its environment may need to choose actions whose main benefit is to gain information rather than gaining reward. The dilemma between exploration and exploitation is at the heart of bandit problems. In this talk I will focus on the so-called stochastic linear bandits where the payoff function is assumed to possess a linear structure, an assumption that proved to be extremely effective elsewhere in machine learning. Here we look into how the linear structure can be exploited in bandits. I will discuss existing results and open questions. The issues discussed will include limits of performance of bandit algorithms, how to design efficient and effective algorithms, and how to exploit additional information like sparsity.<br>
<font color=blue><u>Bio</u>: Csaba Szepesvari is interested in designing principled learning algorithms for agents that learn from and control their environments. He is the author or coauthor of two books on the topic, as well as nearly 200 journal and conference papers. He is best known as the co-inventor of "UCT", a tree-search algorithm that inspired much subsequent work in AI, search and optimization, and serves, among other things, as the basis of the search component in AlphaGo, Deepmind's Go program that defeated a top human Go player in the beginning of 2016. His work on UCT was recently recognized by the "Test of Time" award at ECML'2016. Csaba serves as an action editor of the Journal of Machine Learning Research and the Machine Learning Journal. He also served as a co-chair for COLT and ALT and has served as a senior PC member for first-tier AI and machine learning conferences for too many years. Currently Csaba is a Professor at the Department of Computing Science of the University of Alberta and a Principal Investigator of the Alberta Machine Intelligence Institute (AMII). From August, he will be joining Deepmind, London.</font></i>
</div>
<br><br>
<b>10:00-10:30</b> Coffee Break, Crystal Atrium<br><br>
<b>10:30-12:10</b> <u>Online Learning</u>, Crystal Ballroom 1, 2<br>
<i>Session Chair: Csaba Szepesvari<br></i>
63 Linear Thompson Sampling Revisited<br>
217 Horde of Bandits using Gaussian Markov Random Fields<br>
225 The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits<br>
304 Improved Strongly Adaptive Online Learning using Coin Betting<br><br>
<b>12:10-2:00</b> Lunch on your own<br><br>
<font color=blue><b>1:00-3:00</b> Registration Desk <br><br></font>
<b>2:00-3:40</b> <u>Nonparametric methods</u>, Crystal Ballroom 1, 2 <br>
<i>Session Chair: Byron Boots<br></i>
97 Poisson intensity estimation with reproducing kernels<br>
249 Attributing Hacks<br>
248 Regression Uncertainty on the Grassmannian<br>
401 Modal-set estimation with an application to clustering<br><br>
<b>3:40-4:10</b> Coffee Break, Crystal Atrium <br><br>
<b>4:10-7:00</b> Poster Session (with light snacks), Crystal Ballroom 3, 4<br>
<a href="#foo2" onclick="toggle_visibility('foo2');">See poster list.</a>
<div id="foo2" style="display:none;">
<font size="1">
tP01: 352 Scalable variational inference for super resolution microscopy<br>
tP02: 342 Complementary Sum Sampling for Likelihood Approximation in Large Scale Classification<br>
tP03: 249 Attributing Hacks<br>
tP04: 317 Fairness Constraints: Mechanisms for Fair Classification<br>
tP05: 116 Encrypted accelerated least squares regression<br>
tP06: 294 DP-EM: Differentially Private Expectation Maximization<br>
tP07: 539 Greedy Direction Method of Multiplier for MAP Inference of Large Output Domain<br>
tP08: 328 A Learning Theory of Ranking Aggregation<br>
tP09: 406 Relativistic Monte Carlo <br>
tP10: 119 Gray-box inference for structured Gaussian process models<br>
tP11: 11 Clustering from Multiple Uncertain Experts<br>
tP12: 139 Combinatorial Topic Models using Small-Variance Asymptotics<br>
tP13: 212 High-dimensional Time Series Clustering via Cross-Predictability<br>
tP14: 516 Convergence rate of stochastic k-means<br>
tP15: 183 Fast column generation for atomic norm regularization<br>
tP16: 492 Binary and Multi-Bit Coding for Stable Random Projections<br>
tP17: 30 On the learnability of fully-connected neural networks<br>
tP18: 362 Fast rates with high probability in exp-concave statistical learning<br>
tP19: 526 Learning Graphical Games from Behavioral Data: Sufficient and Necessary Conditions<br>
tP20: 441 Inference Compilation and Universal Probabilistic Programming<br>
tP21: 199 Gradient Boosting on Stochastic Data Streams<br>
tP22: 114 Localized Lasso for High-Dimensional Regression<br>
tP23: 366 Learning with feature feedback: from theory to practice<br>
tP24: 47 Consistent and Efficient Nonparametric Different-Feature Selection<br>
tP25: 148 Rapid Mixing Swendsen-Wang Sampler for Stochastic Partitioned Attractive Models<br>
tP26: 215 Learning Nonparametric Forest Graphical Models with Prior Information<br>
tP27: 301 Efficient Algorithm for Sparse Tensor-variate Gaussian Graphical Models via Gradient Descent<br>
tP28: 383 Belief Propagation in Conditional RBMs for Structured Prediction<br>
tP29: 393 A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models<br>
tP30: 422 Learning the Network Structure of Heterogeneous Data via Pairwise Exponential Markov Random Fields<br>
tP31: 99 Generalized Pseudolikelihood Methods for Inverse Covariance Estimation<br>
tP32: 327 A New Class of Private Chi-Square Hypothesis Tests<br>
tP33: 32 An Information-Theoretic Route from Generalization in Expectation to Generalization in Probability<br>
tP34: 405 Local Group Invariant Representations via Orbit Embeddings<br>
tP35: 439 Spatial Decompositions for Large Scale SVMs<br>
tP36: 488 Distributed Adaptive Sampling for Kernel Matrix Approximation<br>
tP37: 97 Poisson intensity estimation with reproducing kernels<br>
tP38: 268 Scalable Learning of Non-Decomposable Objectives<br>
tP39: 414 Fast Classification with Binary Prototypes<br>
tP40: 497 Label Filters for Large Scale Multilabel Classification<br>
tP41: 150 Efficient Rank Aggregation via Lehmer Codes<br>
tP42: 325 A Unified Computational and Statistical Framework for Nonconvex Low-rank Matrix Estimation<br>
tP43: 45 Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis<br>
tP44: 469 Optimal Recovery of Tensor Slices<br>
tP45: 290 Hit-and-Run for Sampling and Planning in Non-Convex Spaces<br>
tP46: 312 Black-box Importance Sampling<br>
tP47: 361 Sequential Graph Matching with Sequential Monte Carlo<br>
tP48: 134 On the Troll-Trust Model for Edge Sign Prediction in Social Networks<br>
tP49: 153 Nonlinear ICA of Temporally Dependent Stationary Sources<br>
tP50: 216 Sparse Randomized Partition Trees for Nearest Neighbor Search<br>
tP51: 341 Structured adaptive and random spinners for fast machine learning computations<br>
tP52: 401 Modal-set estimation with an application to clustering<br>
tP53: 435 Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning<br>
tP54: 138 Online Optimization of Smoothed Piecewise Constant Functions<br>
tP55: 178 Contextual Bandits with Latent Confounders: An NMF Approach<br>
tP56: 200 Online Learning and Blackwell Approachability with Partial Monitoring: Optimal Convergence Rates<br>
tP57: 217 Horde of Bandits using Gaussian Markov Random Fields<br>
tP58: 221 Trading off Rewards and Errors in Multi-Armed Bandits<br>
tP59: 225 The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits<br>
tP60: 250 Unsupervised Sequential Sensor Acquisition<br>
tP61: 304 Improved Strongly Adaptive Online Learning using Coin Betting<br>
tP62: 35 Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection<br>
tP63: 63 Linear Thompson Sampling Revisited<br>
tP64: 96 Regret Bounds for Lifelong Learning<br>
tP65: 106 Regret Bounds for Transfer Learning in Bayesian Optimisation<br>
tP66: 111 Scaling Submodular Maximization via Pruned Submodularity Graphs<br>
tP67: 23 Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach<br>
tP68: 360 Linear Convergence of Stochastic Frank Wolfe Variants<br>
tP69: 386 Finite-sum Composition Optimization via Variance Reduced Gradient Descent<br>
tP70: 40 Guaranteed Non-convex Optimization: Submodular Maximization over Continuous Domains<br>
tP71: 467 Initialization and Coordinate Optimization for Multi-way Matching<br>
tP72: 52 Less than a Single Pass: Stochastically Controlled Stochastic Gradient<br>
tP73: 521 Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition<br>
tP74: 193 Exploration-Exploitation in MDPs with Options<br>
tP75: 515 Value-Aware Loss Function for Model-based Reinforcement Learning<br>
tP76: 84 Learning Nash Equilibrium for General-Sum Markov Games from Batch Data<br>
tP77: 324 Frequency Domain Predictive Modelling with Aggregated Data<br>
tP78: 394 Communication-efficient Distributed Sparse Linear Discriminant Analysis<br>
tP79: 402 Compressed Least Squares Regression revisited<br>
tP80: 219 Random projection design for scalable implicit smoothing of randomly observed stochastic processes<br>
tP81: 78 Learning Theory for Conditional Risk Minimization<br>
tP82: 248 Regression Uncertainty on the Grassmannian<br>
tP83: 299 Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems<br>
tP84: 55 Learning Time Series Detection Models from Temporally Imprecise Labels<br>
</font>
</div>
<br><br>
<h2>21-Apr (Fri)</h2>
<b>7:30-9:00</b> Breakfast, Windows on the Green & Chart Room <br><br>
<font color=blue><b>8:00-10:00</b> Registration Desk <br><br></font>
<b>9:00-10:00</b><font color="red"> Invited Talk, Cynthia Rudin, Crystal Ballroom 1, 2</font><br>
<b>What Are We Afraid Of?: Computational Hardness vs the Holy Grail of Interpretability in Machine Learning.</b>
<a href="#foo3" onclick="toggle_visibility('foo3');">See abstract.</a> <a href="http://prezi.com/6i5xnwf-snwf/?utm_campaign=share&rc=ex0share&utm_medium=copy">See slides</a>.
<div id="foo3" style="display:none;">
<i>
Is there always a tradeoff between accuracy and interpretability? This is a very old AI question. Many people have claimed that they have investigated the answer to this question, but it is not clear that these attempts have been truly serious. If we try to investigate this claim by comparing interpretable modeling algorithms (like decision trees - say CART, C4.5) to a black box method that optimizes only accuracy (SVM or neural networks), we will not find the answer. This is not a fulfilling comparison - the methods for producing interpretable models are greedy myopic methods with no global objective, whereas the black box algorithms have global objectives and principled optimization routines. In order to actually answer this question, we would have to compare an "optimal" interpretable model to an optimal black box model. This means we actually need optimality for interpretable models. This, of course, leads to computational hardness, which scares us. On the other hand, we have computing power like never before. So do we truly know what we are afraid of any more?
In this talk I will discuss algorithms for interpretable machine learning. Some of these algorithms are designed to create certificates of nearness to optimality. I will focus on some of our most recent work, including (1) work on optimal rule list models using customized bounds and data structures (these are an alternative to CART), and (2) work on optimal scoring systems (alternatives to logistic regression + rounding). Further, since we have methods that can produce optimal or near-optimal models, we can use them to produce interesting new forms of interpretable models. These new forms were simply not possible before, since they are almost impossible to produce using traditional techniques (like greedy splitting and pruning). In particular: (3) Falling rule lists, (4) Causal falling rule lists, and (5) Cost-effective treatment regimes. Work on (1) is joint with postdoc Elaine Angelino, students Nicholas Larus-Stone and Daniel Alabi, and colleague Margo Seltzer. Work on (2) is joint with student Berk Ustun. Work on (3) and (4) is joint with students Fulton Wang and Chaofan Chen, and (5) is an AISTATS 2017 paper that is joint work with student Himabindu Lakkaraju.<br>
<font color=blue>
<u>Bio:</u> Cynthia Rudin is an associate professor of computer science and electrical and computer engineering at Duke University, and directs the Prediction Analysis Lab. Her interests are in machine learning, data mining, applied statistics, and knowledge discovery (Big Data). Her application areas are in energy grid reliability, healthcare, and computational criminology. Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. She holds an undergraduate degree from the University at Buffalo where she received the College of Arts and Sciences Outstanding Senior Award in Sciences and Mathematics, and three separate outstanding senior awards from the departments of physics, music, and mathematics. She received a PhD in applied and computational mathematics from Princeton University. She is the recipient of the 2013 and 2016 INFORMS Innovative Applications in Analytics Awards, an NSF CAREER award, was named as one of the "Top 40 Under 40" by Poets and Quants in 2015, and was named by Businessinsider.com as one of the 12 most impressive professors at MIT in 2015. Work from her lab has won 10 best paper awards in the last 5 years. Her work has been featured in Businessweek, The Wall Street Journal, the New York Times, the Boston Globe, the Times of London, Fox News (Fox & Friends), the Toronto Star, WIRED Science, U.S. News and World Report, Slashdot, CIO magazine, Boston Public Radio, and on the cover of IEEE Computer. She is past chair of the INFORMS Data Mining Section, and is currently chair-elect of the Statistical Learning and Data Science section of the American Statistical Association.
</font></i>
</div>
<br><br>
<b>10:00-10:30</b> Coffee Break, Crystal Atrium<br><br>
<b>10:30-12:10</b> <u>Theory</u>, Crystal Ballroom 1, 2<br>
<i>Session Chair: Sanjoy Dasgupta</i><br>
94 Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation<br>
68 A Sub-Quadratic Exact Medoid Algorithm<br>
456 On the Interpretability of Conditional Probability Estimates in the Agnostic Setting<br>
209 Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers<br><br>
<b>12:10-2:00</b> Lunch on your own<br><br>
<font color=blue><b>1:00-3:00</b> Registration Desk <br><br></font>
<b>2:00-3:40</b> <u>Approximate Inference and MCMC</u>, Crystal Ballroom 1, 2 <br>
<i>Session Chair: Simon Lacoste-Julien</i><br>
51 Annular Augmentation Sampling<br>
101 Removing Phase Transitions from Gibbs Measures<br>
170 Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms<br>
174 Asymptotically exact inference in differentiable generative models<br><br>
<b>3:40-4:10</b> Coffee Break, Crystal Atrium <br><br>
<b>4:10-7:00</b> Poster Session (with light snacks), Crystal Ballroom 3, 4 <br>
<a href="#foo4" onclick="toggle_visibility('foo4');">See poster list.</a>
<div id="foo4" style="display:none;">
<font size="1">
fP01: 82 Near-optimal Bayesian Active Learning with Correlated and Noisy Tests<br>
fP02: 9 Large-Scale Data-Dependent Kernel Approximation<br>
fP03: 86 Distance Covariance Analysis<br>
fP04: 228 Rank Aggregation and Prediction with Item Features<br>
fP05: 420 Signal-based Bayesian Seismic Monitoring<br>
fP06: 60 Learning Cost-Effective and Interpretable Treatment Regimes<br>
fP07: 170 Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms<br>
fP08: 174 Asymptotically exact inference in differentiable generative models<br>
fP09: 288 Conjugate-Computation Variational Inference: Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models<br>
fP10: 196 Local Perturb-and-MAP for Structured Prediction<br>
fP11: 51 Annular Augmentation Sampling<br>
fP12: 104 Performance Bounds for Graphical Record Linkage<br>
fP13: 180 Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets<br>
fP14: 273 CPSG-MCMC: Clustering-Based Preprocessing method for Stochastic Gradient MCMC<br>
fP15: 298 On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior<br>
fP16: 345 Learning Optimal Interventions<br>
fP17: 419 Learning Structured Weight Uncertainty in Bayesian Neural Networks<br>
fP18: 429 Discovering and Exploiting Additive Structure for Bayesian Optimization<br>
fP19: 211 Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes<br>
fP20: 416 Prediction Performance After Learning in Gaussian Process Regression<br>
fP21: 129 A Framework for Optimal Matching for Causal Inference<br>
fP22: 523 Robust Causal Estimation in the Large-Sample Limit without Strict Faithfulness<br>
fP23: 182 Least-Squares Log-Density Gradient Clustering for Riemannian Manifolds<br>
fP24: 68 A Sub-Quadratic Exact Medoid Algorithm<br>
fP25: 117 Random Consensus Robust PCA<br>
fP26: 224 Adaptive ADMM with Spectral Penalty Parameter Selection<br>
fP27: 94 Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation<br>
fP28: 214 Data Driven Resource Allocation for Distributed Learning<br>
fP29: 278 Comparison-Based Nearest Neighbor Search<br>
fP30: 363 Generalization Error of Invariant Classifiers<br>
fP31: 456 On the Interpretability of Conditional Probability Estimates in the Agnostic Setting<br>
fP32: 141 ConvNets with Smooth Adaptive Activation Functions for Regression<br>
fP33: 404 Diverse Neural Network Learns True Target Functions<br>
fP34: 209 Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers<br>
fP35: 329 Anomaly Detection in Extreme Regions via Empirical MV-sets on the Sphere<br>
fP36: 69 Minimax density estimation for growing dimension<br>
fP37: 540 Scalable Greedy Feature Selection via Weak Submodularity<br>
fP38: 449 Information Projection and Approximate Inference for Structured Sparse Variables<br>
fP39: 227 Dynamic Collaborative Filtering With Compound Poisson Factorization<br>
fP40: 242 Information-theoretic limits of Bayesian network structure learning<br>
fP41: 3 Conditions beyond treewidth for tightness of higher-order LP relaxations<br>
fP42: 347 A Lower Bound on the Partition Function of Attractive Graphical Models in the Continuous Case<br>
fP43: 531 Non-Count Symmetries in Boolean & Multi-Valued Prob. Graphical Models<br>
fP44: 504 Sequential Multiple Hypothesis Testing with Type I Error Control<br>
fP45: 22 Lower Bounds on Active Learning for Graphical Model Selection<br>
fP46: 498 Learning from Conditional Distributions via Dual Embeddings<br>
fP47: 13 Online Nonnegative Matrix Factorization with General Divergences<br>
fP48: 161 Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines<br>
fP49: 384 Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data<br>
fP50: 417 Communication-Efficient Learning of Deep Networks from Decentralized Data<br>
fP51: 520 Automated Inference with Adaptive Batches<br>
fP52: 190 Bayesian Hybrid Matrix Factorisation for Data Integration<br>
fP53: 192 Co-Occurring Directions Sketching for Approximate Matrix Multiply<br>
fP54: 205 Tensor Decompositions via Two-Mode Higher-Order SVD (HOSVD)<br>
fP55: 442 Active Positive Semidefinite Matrix Completion: Algorithms, Theory and Applications<br>
fP56: 245 Markov Chain Truncation for Doubly-Intractable Inference<br>
fP57: 101 Removing Phase Transitions from Gibbs Measures<br>
fP58: 484 Distribution of Gaussian Process Arc Lengths<br>
fP59: 76 Estimating Density Ridges by Direct Estimation of Density-Derivative-Ratios<br>
fP60: 213 Minimax Approach to Variable Fidelity Data Interpolation<br>
fP61: 132 Stochastic Rank-1 Bandits<br>
fP62: 26 Sparse Accelerated Exponential Weights<br>
fP63: 479 Efficient Online Multiclass Prediction on Graphs via Surrogate Losses<br>
fP64: 124 Frank-Wolfe Algorithms for Saddle Point Problems<br>
fP65: 167 Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot<br>
fP66: 175 Decentralized Collaborative Learning of Personalized Models over Networks<br>
fP67: 20 ASAGA: Asynchronous Parallel SAGA<br>
fP68: 264 A Stochastic Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization<br>
fP69: 282 A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe<br>
fP70: 284 Faster Coordinate Descent via Adaptive Importance Sampling<br>
fP71: 375 Tracking Objects with Higher Order Interactions via Delayed Column Generation<br>
fP72: 399 Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage<br>
fP73: 367 Optimistic Planning for the Stochastic Knapsack Problem<br>
fP74: 410 Thompson Sampling for Linear-Quadratic Control Problems<br>
fP75: 239 Robust and Efficient Computation of Eigenvectors in a Generalized Spectral Method for Constrained Clustering<br>
fP76: 302 Minimax-optimal semi-supervised regression on unknown manifolds<br>
fP77: 2 Minimax Gaussian Classification & Clustering<br>
fP78: 372 Identifying groups of strongly correlated variables through Smoothed Ordered Weighted L_1-norms<br>
fP79: 494 Spectral Methods for Correlated Topic Models<br>
fP80: 131 Quantifying the accuracy of approximate diffusions and Markov chains<br>
fP81: 267 Hierarchically-partitioned Gaussian Process Approximation<br>
fP82: 459 Linking Micro Event History to Macro Prediction in Point Process Models<br>
fP83: 507 A Maximum Matching Algorithm for Basis Selection in Spectral Learning<br>
</font>
</div>
<br><br>
<b>7:15-9:00</b> <font color=red>Dinner Buffet, Panorama Ballroom<br><br></font>
<h2>22-Apr (Sat)</h2>
<b>7:30-9:00</b> Breakfast, Panorama Ballroom C, D & Terrace <br><br>
<font color=blue><b>8:00-10:00</b> Registration Desk <br><br></font>
<b>9:00-10:00</b> <font color=red>Invited Talk: Sanjoy Dasgupta. Panorama Ballroom A, B <br></font>
<b>Towards a Theory of Interactive Learning.</b>
<a href="#foo5" onclick="toggle_visibility('foo5');">See abstract.</a> <a href="sanjoy-talk.pdf">See slides</a>.
<div id="foo5" style="display:none;">
<i>"Interactive learning" refers to scenarios in which a learning agent (human or machine) engages with an information-bearing agent or system (for instance, a human expert) with the goal of efficiently arriving at a useful model. Examples include: active learning of classifiers; automated teaching systems; augmenting unsupervised learning with interactive post-editing; and so on. In particular, such interaction is a basic mechanism by which we can communicate our needs and preferences to the computers that play an increasing role in our lives.
It would be helpful to have unifying mathematical frameworks that can provide a basis for evaluating interactive schemes, and that supply generic interaction algorithms. I will describe one such mathematical framework, which covers a fairly broad range of situations, and illustrate how it yields algorithms for interactive hierarchical clustering and interactive topic modeling.<br>
<font color=blue><u>Bio</u>: Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD at UC Berkeley in 2000. He works on algorithms for machine learning, with a focus on unsupervised and interactive learning. He is the author of a textbook, 'Algorithms', with Christos Papadimitriou and Umesh Vazirani. He was program co-chair for the Conference on Learning Theory (COLT) in 2009 and for the International Conference on Machine Learning (ICML) in 2013.
</font></i></div>
<br><br>
<b>10:00-10:30</b> Coffee Break, Panorama Foyer<br><br>
<b>10:30-12:10</b> <u>Bayesian Methods</u>, Panorama Ballroom A, B<br>
<i>Session Chair: Rebecca Steorts</i><br>
420 Signal-based Bayesian Seismic Monitoring<br>
180 Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets<br>
82 Near-optimal Bayesian Active Learning with Correlated and Noisy Tests<br>
298 On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior<br><br>
<b>12:10-1:30</b> Lunch on your own <b>(note shorter lunch)</b> <br><br>
<b>1:30-3:10</b> <u>Large-scale learning</u>, Panorama Ballroom A, B <br>
<i>Session Chair: Pradeep Ravikumar</i><br>
417 Communication-Efficient Learning of Deep Networks from Decentralized Data<br>
520 Automated Inference with Adaptive Batches<br>
224 Adaptive ADMM with Spectral Penalty Parameter Selection<br>
372 Identifying groups of strongly correlated variables through Smoothed Ordered Weighted L_1-norms<br><br>
<b>3:10-3:40</b> Coffee Break, Panorama Foyer <br><br>
<b>3:40-5:20</b> <u>Sketching</u>, Panorama Ballroom A, B <br>
<i>Session Chair: Anastasios (Tasos) Kyrillidis</i><br>
384 Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data<br>
399 Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage<br>
192 Co-Occurring Directions Sketching for Approximate Matrix Multiply<br>
117 Random Consensus Robust PCA<br><br>
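<!-- The "See abstract." and "See poster list." links above rely on a toggle_visibility() helper.
     The script below is a minimal sketch of what that helper is assumed to do; if the default
     layout already defines toggle_visibility(), this block is redundant and can be dropped. -->
<script type="text/javascript">
// Toggle a collapsible section (e.g. an abstract or poster-list div) between hidden and shown.
function toggle_visibility(id) {
  var el = document.getElementById(id);
  if (!el) return;
  // The collapsible divs start with inline style="display:none;", so flip between 'none' and 'block'.
  el.style.display = (el.style.display === 'none') ? 'block' : 'none';
}
</script>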