# Bayesian numerical analysis

Distributing points $\{x_i\}_{i=1}^n$ on the sphere so as to minimize the mean square error

$\mathbb{E}\left[\left(q_n(f) - \int_{\mathbb{S}^2}f(s)\,ds\right)^2\right]$

of the quadrature formula $q_n(f) =\frac{1}{n}\sum_{i=1}^n f(x_i)$, where $f$ is a centered Gaussian process with covariance function $C(x,y) = \exp(\langle x, y \rangle)$. Shown are $n=6, 12, 23$.
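For a fixed point set, this mean square error has a closed form in terms of the covariance function, and the spherical integrals in it can be estimated by Monte Carlo. Below is a minimal sketch (my own code; `sphere_points` and `quadrature_mse` are hypothetical names), assuming $ds$ denotes the normalized uniform measure on $\mathbb{S}^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere_points(n):
    # n points drawn uniformly on the unit sphere S^2
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def quadrature_mse(x, m=20_000):
    # For a centered GP f with covariance C(x, y) = exp(<x, y>), the mean
    # square error of q_n(f) = (1/n) sum_i f(x_i) expands to
    #   (1/n^2) sum_{i,j} C(x_i, x_j)
    #   - (2/n) sum_i  ∫ C(x_i, s) ds
    #   + ∫∫ C(s, t) ds dt,
    # with ds the normalized uniform measure on S^2; the two integrals
    # are estimated here by Monte Carlo with m uniform samples.
    n = len(x)
    s, t = sphere_points(m), sphere_points(m)
    term1 = np.exp(x @ x.T).sum() / n**2
    term2 = 2.0 * np.exp(x @ s.T).mean()
    term3 = np.exp(np.sum(s * t, axis=1)).mean()
    return term1 - term2 + term3
```

With points drawn uniformly at random the error decays roughly like $1/n$; optimized configurations such as those shown should do better.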

# Sampling Lipschitz Continuous Densities

A simple and efficient algorithm for generating random variates from the class of Lipschitz continuous densities is described. A MATLAB implementation is freely available on GitHub.
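The actual algorithm lives in the MATLAB code on GitHub; as a stand-in, here is a generic rejection-sampling sketch in Python that only uses the Lipschitz bound (my own construction, not necessarily the method described in the note):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_lipschitz(f, L, n, m=64):
    # Generic rejection sampler for a density f on [0, 1] with Lipschitz
    # constant L (a sketch; not necessarily the algorithm of the note).
    # On each of m cells of width h, f is bounded above by its value at
    # the cell midpoint plus L*h/2, which gives a piecewise-constant
    # envelope to sample from.
    h = 1.0 / m
    centers = (np.arange(m) + 0.5) * h
    bounds = f(centers) + L * h / 2.0     # upper bound of f on each cell
    probs = bounds / bounds.sum()         # envelope cell probabilities
    out = []
    while len(out) < n:
        j = rng.choice(m, size=n, p=probs)               # pick a cell
        x = centers[j] + rng.uniform(-h / 2, h / 2, size=n)
        u = rng.uniform(size=n)
        out.extend(x[u * bounds[j] <= f(x)])             # accept/reject
    return np.asarray(out[:n])
```

On each cell the Lipschitz condition guarantees $f(x) \le f(c_j) + Lh/2$, so acceptance under the piecewise-constant envelope yields exact samples from $f$.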

PDF note.

# Introduction

The problem is to compute $I(f) = \int f \, d\lambda,$ where $\lambda$ is a probability measure on a space $X$ and $f : X \rightarrow \mathbb{R}$ is integrable. If $\{X_n\}$ is a sequence of independent random variables distributed according to $\lambda$, then $I(f)$ can be approximated by

$I_n(f) = \frac{1}{n}\sum_{i=1}^n f(X_i),$

which is called a Monte Carlo estimator.
In practice, it may be difficult to generate $X_n \sim \lambda$. One then prefers to introduce a measure $\mu$, with $\lambda$ absolutely continuous with respect to $\mu$, so that

$I_n(f;\mu) = \frac{1}{n}\sum_{i=1}^n f(Y_i) \tfrac{d\lambda}{d\mu}(Y_i), \quad Y_i \sim^{ind.} \mu,$

is an estimate of $I(f)$ that is more convenient to compute. This technique, known as importance sampling, can also be used to improve the quality of the estimator $I_n$, for example by reducing its variance.
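As a toy illustration (my own example, not from the post), take $\lambda = N(0,1)$ and $f(x) = \mathbb{1}\{x > 3\}$, so that $I(f) = P(Z > 3) \approx 0.00135$; the proposal $\mu = N(3,1)$ concentrates samples where $f$ is nonzero and sharply reduces the variance:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def phi(x, mean=0.0):
    # density of a N(mean, 1) random variable
    return np.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def importance_estimate(n=100_000):
    # I_n(f; mu) = (1/n) sum_i f(Y_i) (dlambda/dmu)(Y_i), with Y_i ~ mu.
    # Here lambda = N(0, 1), f(x) = 1{x > 3}, and the proposal
    # mu = N(3, 1) puts its mass where f is nonzero.
    y = rng.normal(3.0, 1.0, size=n)
    f = (y > 3.0).astype(float)
    w = phi(y) / phi(y, mean=3.0)  # Radon-Nikodym derivative dlambda/dmu
    return float(np.mean(f * w))
```

A plain Monte Carlo estimate with the same $n$ would see only a handful of samples beyond 3; the importance-sampling estimator uses every draw.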

# Bayesian binary classification using partitions of unity

A Sunday afternoon project. I think it is possible to obtain topological guarantees for the reconstruction of the classification boundary using the Nash–Tognoli theorem.

# 1. The problem

Points $x_i$, $i=1,\dots, N$, are distributed on some space $\mathbb{M}$, and each is associated with a label $\ell_i\in \{0,1\}$. It is assumed that

$\ell_i \sim \text{Ber}(p(x_i)), \qquad (1)$

where $p: \mathbb{M} \rightarrow [0,1]$ is some unknown integrable function. Estimating $p$ from the data $\{(x_i,\ell_i)\,|\, i=1,2,\dots, N\}$ allows us to predict $\ell_{N+1}$ given $x_{N+1}$.
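As a quick illustration of model (1) on $\mathbb{M} = [0,1]$, here is a sketch (my own construction, not necessarily the method of this post) that simulates labels from a known $p$ and recovers it with local averages weighted by a partition of unity made of triangular bumps:

```python
import numpy as np

rng = np.random.default_rng(2)

def hats(x, centers, width):
    # triangular bumps psi_j; on [0, 1] they sum to 1 (renormalized as a
    # guard at the boundary)
    w = np.maximum(0.0, 1.0 - np.abs(x[:, None] - centers[None, :]) / width)
    return w / w.sum(axis=1, keepdims=True)

def fit_predict(x_train, labels, x_new, m=20):
    centers = np.linspace(0.0, 1.0, m)
    width = 1.0 / (m - 1)
    psi = hats(x_train, centers, width)
    # local estimate p_j: psi_j-weighted average of the labels
    p_j = (psi * labels[:, None]).sum(axis=0) / psi.sum(axis=0)
    return hats(x_new, centers, width) @ p_j

# simulate model (1) with a known p, then recover it from the labels
p = lambda x: 0.05 + 0.45 * (1.0 + np.sin(2.0 * np.pi * x))
x = rng.uniform(size=20_000)
ell = (rng.uniform(size=20_000) < p(x)).astype(float)
x_new = np.linspace(0.05, 0.95, 50)
estimate = fit_predict(x, ell, x_new)
```

Since each $\hat p_j$ is a convex combination of labels, the estimate automatically lands in $[0,1]$.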

# Bayesian learning

Friday, July 28, at 17:00
Rutherford Physics Building, Room 118, McGill

Next week, I’ll be talking about Bayesian learning at the Mathematical Congress of the Americas and at the Canadian Undergraduate Mathematics Conference. These are somewhat challenging talks: I need to sell the idea of Bayesian statistics to a general mathematical audience (which knows nothing about it), interest them in some tough problems of Bayesian nonparametrics, and then present some of our research results. This must be done in under 20 minutes.

To make the presentation more intuitive and accessible, I borrowed some language from machine learning. I’m talking about learning rather than inference, uncertain knowledge rather than subjective belief, and “asymptotic correctness” rather than consistency. These are essentially synonymous, although some authors might use them in different ways. This should not cause problems for this introductory talk.

# A limit

Félix Locas presented this problem to me.

Let $r(n) = \lfloor \log_2 \frac{n}{\log_2 n} \rfloor$. Show that

$\lim_{n \rightarrow \infty} \left( \log 2+\sum_{k=1}^{r(n)} \frac{1}{k(k+1) 2^k} \right)^n = 1.$

## My solution

The series $\sum_{k=1}^{\infty} \frac{1}{k(k+1) 2^k}$ is easy to calculate. It is, for instance, the difference between the integrals of geometric series:

$\sum_{k=1}^\infty \frac{1}{k(k+1) 2^k} = \sum_{k=1}^\infty \frac{1}{k 2^k} - \sum_{k=1}^\infty \frac{1}{(k+1) 2^k} = \log 2 - (2\log 2 - 1) = 1-\log 2.$
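A quick numerical sanity check (my own script) confirms the value of the series and shows how slowly the expression approaches its limit:

```python
import math

# partial sums of the series converge to 1 - log 2
s = sum(1.0 / (k * (k + 1) * 2**k) for k in range(1, 60))

def r(n):
    return math.floor(math.log2(n / math.log2(n)))

def a(n):
    # the quantity whose limit is claimed to be 1
    base = math.log(2) + sum(1.0 / (k * (k + 1) * 2**k)
                             for k in range(1, r(n) + 1))
    return base ** n
```

Since the base equals $1 - \sum_{k > r(n)} \frac{1}{k(k+1)2^k} < 1$, the sequence approaches 1 from below, and quite slowly: $a(10^6)$ is still only around $0.9$.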

# The discretization trick

I explain the discretization trick that I mentioned in my previous post (Posterior consistency under possible misspecification). I think it was introduced by Walker (New approaches to Bayesian consistency, 2004).

Let $\mathbb{F}$ be a set of densities and let $\Pi$ be a prior on $\mathbb{F}$. If $x_1, x_2, x_3, \dots$ follows some distribution $P_0$ having density $f_0$, then the posterior distribution of $\Pi$ can be written as

$\Pi(A | x_1, \dots, x_n) \propto \int_A \prod_{i=1}^n f(x_i) \Pi(df).$

The discretization trick is to find densities $f_{1}, f_2, f_3, \dots$ in the convex hull of $A$ (taken in the space of all densities) such that

$\int_A \prod_{i=1}^n f(x_i) \,\Pi(df) = \prod_{i=1}^n f_i(x_i)\, \Pi(A).$
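One construction that achieves this identity (my sketch of what I believe is Walker's predictive argument; the notation is mine) takes $f_{n+1}$ to be the posterior predictive density given $x_1, \dots, x_n$, with the prior restricted to $A$:

```latex
f_1(x) = \frac{\int_A f(x)\,\Pi(df)}{\Pi(A)},
\qquad
f_{n+1}(x) = \frac{\int_A f(x) \prod_{i=1}^n f(x_i)\,\Pi(df)}
                  {\int_A \prod_{i=1}^n f(x_i)\,\Pi(df)}.
% Writing Z_n = \int_A \prod_{i=1}^n f(x_i)\,\Pi(df), we have
% Z_n = f_n(x_n) Z_{n-1}, and telescoping down to Z_0 = \Pi(A) gives
\int_A \prod_{i=1}^n f(x_i)\,\Pi(df) = \Pi(A)\,\prod_{i=1}^n f_i(x_i).
```

Each $f_{n+1}$ is a convex combination of densities in $A$, as required, and the identity follows by telescoping.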

# Posterior consistency under (possible) misspecification

We assume, without too much loss of generality, that our priors are discrete. When dealing with Hellinger separable density spaces, it is possible to discretize posterior distributions to study consistency (see this post about it).

Let $\Pi$ be a prior on a countable space $\mathcal{N} = \{f_1, f_2, f_3, \dots\}$ of probability density functions, with $\Pi(f) > 0$ for all $f \in \mathcal{N}$. Data $X_1, X_2, X_3, \dots$ follows (independently) some unknown distribution $P_0$ with density $f_0$.

We denote by $D_{KL}(f_0, f) = \int f_0 \log\frac{f_0}{f}$ the Kullback-Leibler divergence and we let $D_{\frac{1}{2}}(f_0, f) = 1 - \int \sqrt{f_0 f}$ be half of the squared Hellinger distance.

The following theorem states that the posterior distribution of $\Pi$ accumulates in Hellinger neighborhoods of $f_0$, assuming the prior is root-summable (i.e. $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha < \infty$ for some $\alpha > 0$). In the well-specified case (i.e. $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f) = 0$), the posterior accumulates in any neighborhood of $f_0$. In the misspecified case, small neighborhoods of $f_0$ could be empty, but the posterior distribution still accumulates in sufficiently large neighborhoods (how large exactly is a function of $\alpha$ and $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f)$).

# The choice of prior in Bayesian nonparametrics – part 2

See part 1. Most proofs are omitted; I’ll post them with the complete PDF later this week.

# The structure of $\mathcal{M}$

Recall that $\mathbb{M}$ is a Polish space (i.e. a complete and separable metric space). It is endowed with its Borel $\sigma$-algebra $\mathfrak{B}$, which is the smallest family of subsets of $\mathbb{M}$ that contains its topology and is closed under complementation and countable unions. All subsets of $\mathbb{M}$ considered in the following are assumed to belong to $\mathfrak{B}$. A probability measure on $\mathbb{M}$ is a countably additive function $\mu : \mathfrak{B} \rightarrow [0,1]$ such that for any countable partition $A_1, A_2, A_3, \dots$ of $\mathbb{M}$ we have $\sum_{i=1}^\infty \mu(A_i) = 1.$ The set $\mathcal{M}$ consists of all such probability measures.

Note that since $\mathbb{M}$ is complete and separable, every probability measure $\mu \in \mathcal{M}$ is regular (and tight). This means that the measure of any $A\subset \mathbb{M}$ can be well approximated by the measures of compact subsets of $A$, as well as by the measures of open supersets of $A$:

$\mu(A) = \sup \left\{\mu(K) \,|\, K \subset A \text{ is compact}\right\}\\ = \inf \left\{\mu(U) \,|\, U \supset A \text{ is open}\right\}.$

## Metrics on $\mathcal{M}$

Let me review some facts. A natural metric used to compare the mass allocation of two measures $\mu, \nu \in \mathcal{M}$ is the total variation distance, defined by

$\|\mu - \nu\|_{TV} = \sup_{A \subset \mathbb{M}}|\mu(A) - \nu(A)|.$
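For two measures on a finite set (a toy case, my own illustration), this supremum is attained at $A = \{i \,|\, p_i > q_i\}$ and equals half the $\ell_1$ distance between the probability vectors:

```python
import numpy as np

def tv_distance(p, q):
    # sup_A |mu(A) - nu(A)| for discrete probability vectors p and q;
    # the supremum is attained at A = {i : p_i > q_i}, which gives half
    # the l1 distance (the positive and negative parts of p - q have
    # equal mass, since both vectors sum to 1).
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * float(np.abs(p - q).sum())
```

So `tv_distance([1, 0], [0, 1])` gives 1, the maximum possible value, attained by mutually singular measures.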