Multimarginal Flow Matching with Adversarially Learnt Interpolants; Kviman, Oskar and Tamogashev, Kirill and Branchini, Nicola and Elvira, Víctor and Lagergren, Jens and Malkin, Nikolay. In 2nd edition of Frontiers in Probabilistic Inference: Learning meets Sampling (NeurIPS workshop).
Existing multimarginal flow matching (FM) methods either do not scale well with dimension or encourage trajectories to pass through intermediate marginal samples, rather than the intermediate distributions. We learn a parameterised interpolant for FM via a GAN-inspired loss, which addresses these shortcomings.
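For intuition, here is a minimal PyTorch sketch of flow matching with a parameterised interpolant. The boundary-respecting form x_t = (1-t) x0 + t x1 + t(1-t) g_η(x0, x1) is an illustrative assumption, not necessarily the paper's parameterisation (in particular, g is kept t-independent here so the time derivative is available in closed form), and the GAN-inspired loss that trains the interpolant is omitted:

```python
import torch
import torch.nn as nn

class Interpolant(nn.Module):
    """x_t = (1-t) x0 + t x1 + t(1-t) g_eta(x0, x1).
    The t(1-t) factor pins the endpoints (x_0 = x0, x_1 = x1); keeping
    g independent of t gives a closed-form time derivative."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * dim, hidden), nn.SiLU(),
                               nn.Linear(hidden, dim))

    def forward(self, x0, x1, t):
        g = self.g(torch.cat([x0, x1], dim=-1))
        xt = (1 - t) * x0 + t * x1 + t * (1 - t) * g
        dxt_dt = x1 - x0 + (1 - 2 * t) * g  # exact d x_t / dt
        return xt, dxt_dt

def fm_loss(v_net, interp, x0, x1):
    """Conditional flow matching: regress v_theta(x_t, t) onto d x_t / dt."""
    t = torch.rand(x0.shape[0], 1)
    xt, dxt_dt = interp(x0, x1, t)
    return ((v_net(torch.cat([xt, t], dim=-1)) - dxt_dt) ** 2).mean()

dim = 2
interp = Interpolant(dim)
v_net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
x0, x1 = torch.randn(128, dim), torch.randn(128, dim) + 3.0  # toy endpoint samples
fm_loss(v_net, interp, x0, x1).backward()
# In the paper's setting, the interpolant itself is trained with a separate
# GAN-inspired objective so that intermediate marginals are matched; that
# adversarial update is omitted here.
```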
TL;DR: to estimate µ = E_p[f(θ)] when p's normalizing constant is unknown, instead of doing MCMC on p(θ) or even on p(θ)|f(θ)|, or learning a parametric q(θ), we try MCMC directly on p(θ)|f(θ) - µ|, which is the asymptotic-variance-minimizing proposal.
Note: we cannot do MCMC straightforwardly, as p(θ)|f(θ) - µ| cannot be evaluated: it contains µ, the quantity of interest! So, we propose a simple iterative scheme that works: form an initial estimate µ₀; run a chain on the *approximation* p(θ)|f(θ) - µ₀|; re-estimate µ with SNIS; and keep iterating. I'm quite excited about extending this work. An imprecision in the current paper (to be fixed soon, and in the upcoming journal paper) is that each time we need to plug in the global estimate of µ, not the local one built with the ``current'' MCMC samples. A CLT for the final combined estimates is coming.
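A minimal sketch of the loop on a 1D toy problem (Gaussian target, f(θ) = θ, so the true µ is 0); the random-walk kernel, step size, and the small ε guarding the log are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
log_p = lambda x: -0.5 * x**2          # unnormalized log target (standard Gaussian)
f = lambda x: x

def rwm(log_density, x0, n_steps, step=0.5):
    """Plain random-walk Metropolis on an unnormalized log density."""
    x, out = x0, np.empty(n_steps)
    lp = log_density(x)
    for i in range(n_steps):
        y = x + step * rng.standard_normal()
        lq = log_density(y)
        if np.log(rng.random()) < lq - lp:
            x, lp = y, lq
        out[i] = x
    return out

mu = 0.5  # initial estimate mu_0
for it in range(5):
    # chain on the approximation p(x)|f(x) - mu|, with the abs in log space
    log_prop = lambda x: log_p(x) + np.log(np.abs(f(x) - mu) + 1e-12)
    xs = rwm(log_prop, x0=mu + 1.0, n_steps=5000)
    # SNIS back to p: weights w = p / (p |f - mu|), i.e. w = 1 / |f - mu|
    w = 1.0 / (np.abs(f(xs) - mu) + 1e-12)
    # (per the note above, one should really plug in the *global* estimate here)
    mu = np.sum(w * f(xs)) / np.sum(w)
    print(f"iteration {it}: mu = {mu:.3f}")  # true value is 0
```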
Importance sampling with mixture models is all over the place (even where you don't see it). Subtractive mixture models (SMMs) - mixture models with negative weights - are super cool and can model complex distributions more efficiently. It would be great to use them for IS, but sampling from them is a pain: it requires costly autoregressive inverse transform sampling. We propose an estimator that exploits the fact that an SMM is a difference of two regular mixture models, so that we can do IS and scale to higher dimensions.
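To illustrate the trick in its simplest form (this is one natural instantiation, hedged as a sketch; the paper's exact estimator may differ): since the SMM q ∝ A₊q₊ - A₋q₋ with q₊, q₋ ordinary, easy-to-sample mixtures, any integral against q splits into two signed integrals, each estimable with samples from q₊ or q₋ alone, so no sampling from the SMM itself is ever needed:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Subtractive mixture proposal q_smm = A+ * q_plus - A- * q_minus, built from
# two ordinary Gaussian mixtures (toy 1D example; parameters chosen so that
# q_smm stays positive everywhere).
A_plus, A_minus = 1.2, 0.1
q_plus = lambda x: 0.5 * norm.pdf(x, -1, 2) + 0.5 * norm.pdf(x, 1, 2)
q_minus = lambda x: norm.pdf(x, 0, 0.6)
q_smm = lambda x: A_plus * q_plus(x) - A_minus * q_minus(x)

log_p = lambda x: -0.5 * (x - 1.0)**2   # unnormalized target, N(1, 1)
f = lambda x: x

def draw_mixtures(n):
    """Sample from the two *ordinary* mixtures only."""
    comp = rng.integers(0, 2, n)
    xp = np.where(comp == 0, rng.normal(-1, 2, n), rng.normal(1, 2, n))
    xm = rng.normal(0, 0.6, n)
    return xp, xm

def signed_mean(h, xp, xm):
    """Integral of h(x) q_smm(x) dx = A+ E_{q+}[h] - A- E_{q-}[h]."""
    return A_plus * h(xp).mean() - A_minus * h(xm).mean()

xp, xm = draw_mixtures(100_000)
w = lambda x: np.exp(log_p(x)) / q_smm(x)        # IS weights p / q_smm
mu_hat = signed_mean(lambda x: w(x) * f(x), xp, xm) / signed_mean(w, xp, xm)
print("SNIS estimate of E_p[f]:", mu_hat)         # true value is 1.0
```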
To estimate posterior expectations *consistently*, we need to use self-normalized importance sampling (or MCMC, but SNIS has a better variance lower bound). SNIS is a ratio of two IS estimators. Typical diagnostics forget this, and only look at the IS weights of the numerator or denominator separately. We know, though, that the statistical dependence between the two estimators affects performance. Here, we try to capture this information with the concept of tail dependence of random variables, which applies in heavy-tailed scenarios. A journal extension is ongoing.
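For a flavour of the diagnostic (a sketch of the general concept, not the paper's exact statistic): empirically estimate the upper tail dependence between the denominator terms w_i and numerator terms w_i f(x_i), i.e. how often one is in its top tail given the other is:

```python
import numpy as np

def empirical_tail_dependence(a, b, u=0.95):
    """Empirical upper tail-dependence between samples a and b:
    P(b above its u-quantile | a above its u-quantile). Values near 1 mean
    the SNIS numerator and denominator blow up together."""
    qa, qb = np.quantile(a, u), np.quantile(b, u)
    return np.mean((a > qa) & (b > qb)) / np.mean(a > qa)

rng = np.random.default_rng(2)
# Toy SNIS setup: heavy-ish tailed weights, f correlated with the weights.
x = rng.standard_t(df=3, size=50_000)
w = np.exp(0.5 * np.abs(x))          # stand-in importance weights
num, den = w * x**2, w               # numerator and denominator terms
print("tail dependence:", empirical_tail_dependence(den, num, u=0.99))
```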
The self-normalized IS estimator is widely used to estimate expectations with intractable normalizing constants, for example in Bayesian leave-one-out cross-validation or likelihood-free inference. In this paper, we propose a framework for understanding when SNIS works and when it does not, together with a generalization that allows us to overcome its limitations, with connections to continuous optimal transport. See the paper abstract for more info.
Many adaptive IS (and some VI) methods are based on matching the moments of a target distribution. When the target has heavy tails, these moments can be undefined, or their estimation can have high variance. We propose an AIS method that overcomes this by matching the moments of a (lighter-tailed) modified target, obtained by exponentiating the target to a power α. Despite this, the procedure actually minimizes the α-divergence between the proposal and the true target. Note: many previous works propose AIS methods with heavy-tailed *proposals*, which are not necessarily suitable for heavy-tailed *targets*.
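A simplified sketch of the mechanic (the Gaussian proposal family, the Cauchy-like target, and the exact form of the tempered weights are all assumptions for illustration): weight samples by (p/q)^α, which is SNIS for a lighter-tailed modified target whose moments are well defined, then moment-match the proposal to those weights:

```python
import numpy as np

rng = np.random.default_rng(3)
log_p = lambda x: -np.log1p(x**2)   # Cauchy-like target: no moments to match!
alpha = 0.5

mean, std = 5.0, 1.0                # initial Gaussian proposal, badly placed
for it in range(30):
    x = mean + std * rng.standard_normal(5000)
    log_q = -0.5 * ((x - mean) / std) ** 2 - np.log(std)
    # Tempered weights w = (p/q)^alpha: SNIS for the lighter-tailed modified
    # target proportional to p^alpha q^(1-alpha), whose moments exist.
    log_w = alpha * (log_p(x) - log_q)
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    mean = np.sum(w * x)                          # weighted moment matching
    std = np.sqrt(np.sum(w * (x - mean) ** 2))
print(f"adapted proposal: N({mean:.2f}, {std:.2f}^2)")  # drifts toward the target's body near 0
```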
Variational Resampling; Kviman, Oskar and Branchini, Nicola and Elvira, Víctor and Lagergren, Jens. In 27th Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, 2024.
A very neat idea stemming from Oskar's Master's thesis (he's impressive, isn't he?): when we resample in PFs, we usually would like the resulting equally-weighted distribution of the resampled particles to be ``close'' in some sense to the distribution before resampling (which was, in general, unequally weighted).
Usually, resampling schemes enforce this by requiring that the number of times a particle gets replicated is, on average, equal to its weight in the pre-resampling distribution. What we do here instead is optimize the number of times each particle gets replicated so as to minimize a divergence between the post-resampling and pre-resampling distributions directly, with a very smart algorithm, again entirely due to Oskar!
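A sketch of the idea, with a naive greedy search standing in for Oskar's actual algorithm: pick integer replication counts c with Σc_i = N so that the post-resampling empirical distribution c/N is close in KL to the weights w:

```python
import numpy as np

def kl_counts_to_weights(c, w, N):
    """KL( c/N || w ), with the 0 log 0 = 0 convention."""
    p = c / N
    m = p > 0
    return np.sum(p[m] * (np.log(p[m]) - np.log(w[m])))

def greedy_variational_resampling(w, N):
    """Replication counts c (sum c = N) chosen to make c/N close to w in KL.
    Start from floor(N w), then place the leftover copies greedily; this is a
    naive stand-in, not the paper's algorithm."""
    c = np.floor(N * w).astype(int)
    for _ in range(N - c.sum()):
        kls = []
        for i in range(len(w)):
            c[i] += 1
            kls.append(kl_counts_to_weights(c, w, N))
            c[i] -= 1
        c[int(np.argmin(kls))] += 1
    return c

rng = np.random.default_rng(4)
w = rng.dirichlet(np.ones(10))          # toy normalized particle weights
c = greedy_variational_resampling(w, N=10)
print("weights:", np.round(w, 2))
print("counts :", c, "| KL:", kl_counts_to_weights(c, w, 10))
```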
Causal optimal transport of abstractions; Felekis, Yorgos and Zennaro, Fabio and Branchini, Nicola and Damoulas, Theodoros. In 3rd Conference on Causal Learning and Reasoning (CLeaR 2024).
The task of causal abstraction involves finding a mapping (a measurable transport map) between structural causal models (SCMs) and their corresponding "abstracted versions", which can be simplified or coarser SCMs (fewer variables or different functional relationships). We consider the problem of learning causal abstractions from data, and propose a framework that does so without specifying parametric relationships for the SCM functions. The method involves a multimarginal OT problem (roughly speaking, with as many marginals as there are considered interventions) with soft constraints and a cost function encoding knowledge of the underlying causal DAGs. Nicely, the soft constraints have a do-calculus interpretation.
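To give a feel for the computational building block only (this is vanilla two-marginal entropic OT between samples from a low-level variable and its coarser abstraction; the multimarginal formulation, the soft constraints, and the DAG-informed cost are the paper's contributions and are not reproduced here):

```python
import numpy as np

def sinkhorn(a, b, C, reg=1.0, n_iter=200):
    """Entropic OT: coupling P minimizing <P, C> - reg * H(P)
    with marginals a and b, via plain Sinkhorn iterations."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(5)
# Toy: samples from a base SCM variable and from its coarser abstraction
x = rng.normal(0.0, 1.0, (30, 1))           # low-level samples
y = np.sign(rng.normal(0.0, 1.0, (20, 1)))  # abstracted (binarized) samples
C = (x - y.T) ** 2                           # squared-distance cost
P = sinkhorn(np.ones(30) / 30, np.ones(20) / 20, C)
print("coupling row sums (should be ~1/30):", P.sum(axis=1)[:3])
```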
A kind of journal extension of the earlier ``optimized APF'' paper, where we present a perspective on PFs emphasizing that, at each iteration, we want to select a mixture proposal (a mixture arises naturally as the proposal in the PF context) that is close to a mixture target. Methods in the literature match these term-by-term, while with this view it is possible to conceive of new methods that directly match the two mixtures. To be honest, we should have done way more in this direction to show that this can be useful. Still, I do think the perspective is interesting; maybe someone will come up with a smart way to learn a mixture that is close to the ``optimal'' one.
Causal Entropy Optimization; Branchini, Nicola and Aglietti, Virginia and Dhir, Neil and Damoulas, Theodoros. In 26th Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, 2023.
In this paper, we studied the problem of "causal global optimization": finding the optimal intervention, i.e., the minimizer over several causal effects (that is, we consider possibly intervening on many different subsets of variables). When the underlying causal graph is not known, the first step is to study what happens if we assume any one of the possible graphs is the true one and run CBO (causal Bayesian optimization) as normal. We studied what the effect of this kind of incorrect causal assumption is for optimization purposes. Further, since in many cases the underlying function can be optimized efficiently even if the graph is not fully known, we designed an acquisition function that automatically trades off optimizing the effect and learning the structure.
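For intuition only, a hedged sketch of one ingredient: averaging a standard acquisition (expected improvement) over a posterior on candidate graphs, so graph uncertainty enters the intervention choice. The `gp_predict` interface and graph names are hypothetical, and the paper's actual acquisition additionally rewards information gain about the graph, which is omitted here:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Standard EI for minimization, given GP posterior mean/std at a point."""
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def graph_averaged_ei(x, graph_posterior, gp_predict, best):
    """Weight per-graph EI by the posterior probability of each graph.
    gp_predict(g, x) -> (mu, sigma) is a hypothetical per-graph GP model."""
    return sum(p_g * expected_improvement(*gp_predict(g, x), best)
               for g, p_g in graph_posterior.items())

# Toy usage with made-up surrogate models for two candidate graphs
gp_predict = lambda g, x: (np.sin(x) if g == "g1" else np.cos(x), 0.5)
posterior = {"g1": 0.7, "g2": 0.3}
xs = np.linspace(0, 2 * np.pi, 100)
acq = [graph_averaged_ei(x, posterior, gp_predict, best=-0.5) for x in xs]
print("best candidate intervention value:", xs[int(np.argmax(acq))])
```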
In this paper, we wanted to improve on the Auxiliary Particle Filter (APF), which is designed for estimating the likelihood in sequential latent variable models with very informative observations. This algorithm, however, still has severe drawbacks; among others, the resampling weights are chosen independently, i.e., each particle chooses its own without "knowing" what the others are doing.
We devise a new way to optimize these resampling weights by viewing them as mixture weights of an importance sampling mixture proposal. It turns out that choosing the mixture weights to minimize the resulting empirical variance of the importance weights leads to a convex optimization problem (see the sketch below).
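A sketch of the convex problem in a toy setup with Gaussian mixture components, the PF bookkeeping stripped away (the fixed evaluation points and the second moment as the objective are simplifying assumptions): choose mixture weights β on the simplex to minimize the empirical second moment of the importance weights, which is convex in β since t ↦ (c/t)² is convex for t > 0 and β ↦ q_β(x) is linear:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)
# Toy setup: target pi and K candidate mixture components q_k.
pi_pdf = lambda x: norm.pdf(x, 1.0, 0.7)
locs = np.array([-2.0, 0.0, 2.0])
Q = lambda x: norm.pdf(x[:, None], locs[None, :], 1.0)   # (n, K) component pdfs

x = rng.uniform(-5, 5, 2000)          # fixed evaluation points (drawn once)
P, Qx = pi_pdf(x), Q(x)

def second_moment(beta):
    """Empirical second moment of w(x) = pi(x) / q_beta(x); convex in beta
    on the simplex."""
    return np.mean((P / (Qx @ beta)) ** 2)

K = len(locs)
res = minimize(second_moment, np.ones(K) / K, method="SLSQP",
               bounds=[(1e-6, 1.0)] * K,
               constraints={"type": "eq", "fun": lambda b: b.sum() - 1.0})
print("optimized mixture weights:", np.round(res.x, 3))
# Most mass should land on the component at 2.0, closest to the target at 1.0.
```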