Constrained semiparametric modelling (for directional statistics)



Angular data arises in many scientific fields, such as in experimental biology for the study of animal orientation, and in bioinformatics in relation to the protein structure prediction problem.



The statistical analysis of this data requires adapted tools such as 2\pi-periodic density models. Fernandez-Duran (Biometrics, 60(2), 2004) proposed non-negative trigonometric sums (i.e. non-negative trigonometric polynomials) as a flexible family of circular distributions. However, the coefficients of trigonometric polynomials expressed in the standard basis 1, \cos(x), \sin(x), \dots are difficult to interpret and we do not see how an informative prior could be specified through this parametrization. Moreover, the use of this basis was criticized by Ferreira et al. (Bayesian Analysis, 3(2), 2008) as resulting in a “wiggly approximation, unlikely to be useful in most real applications”.

Trigonometric density basis

Here, we suggest the use of a density basis of the trigonometric polynomials and argue it is well suited to statistical applications. In particular, coefficients of trigonometric densities expressed in this basis possess an intuitive geometric interpretation. Furthermore, we show how “wiggliness” can be precisely controlled using this basis and how another geometric constraint, periodic unimodality, can be enforced [first proposition on the poster]. To ensure that nothing is lost by using this basis, we also show that the whole model consists of precisely all positive trigonometric densities, together with the basis functions [first theorem on the poster].

Prior specification

Priors can be specified on the coefficients of mixtures in our basis and on the degree of the trigonometric polynomials to be used. Through the interpretability of the coefficients and the shape-preserving properties of the basis, different types of prior knowledge may be incorporated. Together with an approximate understanding of mass allocation, these include:

  • periodic unimodality;
  • bounds on total variation; and
  • knowledge of the marginal distributions (in the multivariate case).

The priors obtained this way are part of a well-studied family called sieve priors, including the well-known Bernstein-Dirichlet prior, and are finite mixtures with an unknown number of components. Most results and interpretations about the Bernstein-Dirichlet prior (see Petrone & Wasserman (J. R. Stat. Soc. B., 64(1), 2002), Kruijer & Van der Vaart (J. Stat. Plan. Inference, 138(7), 2008), McVinish et al. (Scand. J. Statist., 36(2), 2009)) carry over to the priors we consider, but we do not discuss them further.

Approximation-theoretic framework

Our density models arise as the image of “shape-preserving” linear approximation operators. This approximation-theoretic relationship is used to obtain a notably large prior Kullback-Leibler support and ensures strong posterior consistency at every bounded (not necessarily continuous) density. The result partly relies on known properties of sieve priors, as well as general consistency results (Walker (Ann. Statist., 32(5), 2004)), but extends known results by removing a usual continuity hypothesis on the densities at which consistency is achieved (see Wu & Ghosal (Electron. J. Stat., 2, 2008), Petrone & Veronese (Statistica Sinica, 20, 2010)). For contraction rates, higher order smoothness conditions are usually required (see Shen & Ghosal (Scand. J. Statist., 42(4), 2015)).

For example, consider the prior induced by the random density

T_n \mathcal{D} := \sum_i \mathcal{D}(R_{i,n}) C_{i,n},\qquad (1)

where \mathcal{D} is a Dirichlet process, n is distributed on \mathbb{N}, the C_{i,n} are the basis densities and \{R_{i,n}\}_i is a partition of the circle. This prior is strongly posterior consistent at every bounded density provided that the associated operator

T_n : f \mapsto \sum_i C_{i,n} \int_{R_{i,n}} f

is such that \|T_n f - f\|_\infty \rightarrow 0 for all continuous f.
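To make the operator concrete, here is a minimal numerical sketch. It assumes a specific form for the basis densities, C_{j,n}(x) \propto (1 + \cos(x - 2\pi j/(2n+1)))^n for j = 0, \dots, 2n, and takes R_{j,n} to be the arc of half-width \pi/(2n+1) centred at 2\pi j/(2n+1). Both choices, as well as the function names, are assumptions made for illustration and are not fixed by the text above.

```python
import numpy as np
from math import comb, pi

def basis(n, x):
    """Rows are C_{j,n}(x), j = 0..2n, under the ASSUMED form
    (1 + cos(x - theta_j))^n, normalised to integrate to one."""
    theta = 2 * pi * np.arange(2 * n + 1) / (2 * n + 1)
    norm = 2 * pi * comb(2 * n, n) / 2 ** n  # integral of (1 + cos u)^n over the circle
    return (1 + np.cos(x[None, :] - theta[:, None])) ** n / norm

def arc_masses(f, n, m=200):
    """Midpoint-rule approximation of the integral of f over each arc R_{j,n}."""
    h = pi / (2 * n + 1)  # half-width of each arc (ASSUMED equal-length partition)
    theta = 2 * pi * np.arange(2 * n + 1) / (2 * n + 1)
    offsets = ((np.arange(m) + 0.5) / m) * 2 * h - h  # midpoints in [-h, h]
    return f(theta[:, None] + offsets[None, :]).mean(axis=1) * 2 * h

def T(f, n, x):
    """Evaluate T_n f = sum_j C_{j,n} * (integral of f over R_{j,n}) at the points x."""
    return arc_masses(f, n) @ basis(n, x)

# Example: sup-norm error for a smooth density on the circle.
x = np.linspace(0.0, 2 * pi, 721)
f = lambda t: (1 + np.cos(t)) / (2 * pi)
for n in (5, 20, 40):
    print(n, np.max(np.abs(T(f, n, x) - f(x))))
```

Under these assumptions the weights sum to one and the basis functions sum to a constant, so T_n maps densities to densities and fixes the uniform density. The printed sup-norm errors decrease as n grows.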

More generally, let \mathbb{F} be a set of bounded densities on some compact metric space \mathbb{M}, let T_n : L^1(\mathbb{M}) \rightarrow L^1(\mathbb{M}), n \in \mathbb{N}, be a sequence of operators that are:

  • shape preserving: T_n maps densities to densities and T_n(\mathbb{F}) \subset \mathbb{F}; and
  • approximating: \|T_n f - f\|_\infty \rightarrow 0 for all continuous f;

and finally let \Pi_n be priors on T_n(\mathbb{F}) with full support. A sieve prior on \mathbb{F} is defined by

\Pi : A \mapsto \sum_n \rho(n) \Pi_n(A \cap T_n(\mathbb{F})).

If 0 < \rho(n) < Ce^{-c d_n} for some increasing sequence d_n bounding the dimensions of T_n (\mathbb{F}), then the posterior distribution of \Pi is strongly consistent at each density of \mathbb{F}.

The approximation theory literature is rich in such operators. The theorem shows that they provide strongly consistent priors on arbitrary density spaces simply given priors \Pi_n on T_n(\mathbb{F}).
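
As a worked check of the decay condition, suppose (an assumption made only for this example) that T_n(\mathbb{F}) lies in the trigonometric polynomials of degree at most n, so that we may take d_n = 2n+1. A geometric prior \rho(n) = (1-q) q^n with 0 < q < 1 then qualifies: setting c = -\tfrac{1}{2} \log q and C = q^{-1/2},

C e^{-c d_n} = q^{-1/2} q^{(2n+1)/2} = q^n > (1-q) q^n = \rho(n) > 0.

A \text{Poiss}(\lambda) prior on n also qualifies, for any c > 0, since

\rho(n) e^{2cn} = e^{-\lambda} \frac{(\lambda e^{2c})^n}{n!} < e^{\lambda (e^{2c} - 1)},

so that \rho(n) < C e^{-c(2n+1)} with C = e^{\lambda (e^{2c} - 1) + c}.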

Basic density estimation:


A thousand samples (grey histogram) were drawn from the density in orange. The prior is defined by (1) with the Dirichlet process centered on the uniform density and with a precision parameter of 2. The degree n is distributed as a \text{Poiss}(15). The blue line is the posterior mean, the dark blue shaded region is a 50% pointwise credible region around the median, and the light blue shaded region is a 90% credible region.
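
A draw from this prior can be sketched as follows. The sketch assumes that the basis densities take the form C_{j,n}(x) \propto (1 + \cos(x - 2\pi j/(2n+1)))^n and that the partition consists of 2n+1 arcs of equal length. Under that assumption, the Dirichlet process centered on the uniform density with precision 2 assigns the arcs a Dirichlet(2/(2n+1), \dots, 2/(2n+1)) vector of masses. The function name and its arguments are hypothetical.

```python
import numpy as np
from math import comb, pi

def sample_prior_density(rng, x, mean_degree=15.0, precision=2.0):
    """Draw one random density from the prior (1), evaluated at the points x.

    ASSUMPTIONS: equal-length arcs R_{j,n} and basis densities proportional
    to (1 + cos(x - theta_j))^n; neither is specified in the text."""
    n = int(rng.poisson(mean_degree))             # degree n ~ Poiss(15)
    k = 2 * n + 1                                 # number of arcs / basis functions
    w = rng.dirichlet(np.full(k, precision / k))  # DP masses of the k equal arcs
    theta = 2 * pi * np.arange(k) / k
    norm = 2 * pi * comb(2 * n, n) / 2 ** n       # normalising constant of the basis
    C = (1 + np.cos(x[None, :] - theta[:, None])) ** n / norm
    return n, w @ C

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2 * pi, 720, endpoint=False)
n, density = sample_prior_density(rng, x)
print(n, density.mean() * 2 * pi)  # degree drawn, and the total mass of the draw
```

Posterior computation is not covered by this sketch; it only shows what a single prior draw looks like under the stated assumptions.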


Talk (20 minutes) at the séminaire du 5e.

I present the Weierstrass approximation theorem for periodic functions, using a basis of the trigonometric polynomials recently suggested by Róth et al. (2009). This basis lends itself naturally to our application.

Weierstrass approximation theorem.
Let f : \mathbb{R} \rightarrow \mathbb{R} be a 2\pi-periodic function. If f is continuous, then one can construct trigonometric polynomials f_1, f_2, f_3, \dots such that

f(x) = \sum_{i=1}^{\infty} f_i(x)

and such that the series above converges uniformly.

This theorem comes up in several areas: in topology to prove Brouwer's fixed-point theorem, in geometry for the isoperimetric inequality, and in algebraic geometry for the Nash-Tognoli theorem. It implies that \{1, \cos(x), \sin(x), \cos(2x), \sin(2x), \dots\}, as an orthonormal system, is complete in L^2(\mathbb{S}^1). More generally, it is used to reduce a problem about continuous functions to a problem about polynomials, where differential calculus and linear algebra apply. Constructive proofs of the theorem also provide tools for regression and for the reconstruction of curves and surfaces.

Constructive approximation of compact hypersurfaces

PDF text.

Let M \subset \mathbb{R}^k be a compact manifold of codimension 1. We show that M can be well approximated by a part of an algebraic manifold.

Theorem 1.
For all \varepsilon > 0, there exists c > 0 and a polynomial function P defined on \mathbb{R}^k such that N := P^{-1}(0) \cap (-c,c)^k is diffeomorphic to M and

\sup_{x \in N} \inf_{y \in M} \|x - y\| < \varepsilon.

This was proved by Seifert in a 1936 German-language paper. It was later generalized to the Nash-Tognoli theorem, which implies that non-singular real algebraic sets have precisely the same topological invariants as compact manifolds [1].

Here, however, our motivations are more elementary and practical. Our proof of Theorem 1 points towards constructive approximation processes and shows the separability of the space of all compact hypersurfaces, under an appropriate topology. This is relevant in statistics and computer graphics, for smooth hypersurface regression and reconstruction. These applications require growing sequences of finite-dimensional search spaces of smooth manifolds.

In preparation for our proof, in section 2, we also discuss the Jordan-Brouwer theorem, the orientability of M and that M = f^{-1}(0) for some function f having regular value 0. We show that these three facts are essentially equivalent, in the sense that any one can be quite easily obtained from another.


Constructive approximation on compact manifolds

I presented this (pdf, in French) in a short talk for a differential topology course.

I also dabbled with (pdf, in French) the approximation of compact hypersurfaces. I wasn’t able to get a constructive result in time, so I left it as a very rough draft. [I posted a much improved follow-up in April.] In the document, I sketch a proof of the following.

Theorem. Let M be a compact hypersurface of \mathbb{R}^k. There exists a sequence of polynomials \{P_n\} defined on a compact subset of \mathbb{R}^k such that for n sufficiently large, P_n^{-1}(0) is a hypersurface and

\text{dist}(P_n^{-1}(0), M) \rightarrow 0.

Linear approximation operators and statistical models

We discuss the approximation properties of sequences of linear operators T_n mapping densities to densities. We give conditions for their convergence, make their general form explicit, obtain rates of convergence and generalise the index parameter to obtain nets \{T_n\}_{n \in N}.

Notation. Let (\mathbb{M}, d) be a compact metric space, equipped with a finite measure \mu defined on its Borel \sigma-algebra, and denote by \mathcal{F} \subset L^1 the set of all essentially bounded probability densities on \mathbb{M}. The set \mathcal{F} is then a complete separable metric space under the total variation distance, which is proportional to \|f-g\|_1 = \int |f-g| \, d\mu.

In Bayesian statistics, it is of interest to specify a probability measure P on \mathcal{F}, representing uncertainty about which distribution of \mathcal{F} is generating independent observations x_i \in \mathbb{M}. The problem is that \mathcal{F} is usually rather big: by Baire’s category theorem, if \mathbb{M} is not a finite set of points, then \mathcal{F} cannot be written as a countable union of finite-dimensional subspaces. To help in prior elicitation, that is, to help a statistician specify P, we may decompose \mathcal{F} into simpler parts.

Here, I discuss how to obtain a sequence of approximating finite dimensional sieves \mathcal{S}_n \subset \mathcal{F}, such that \cup_n \mathcal{S}_n is dense in \mathcal{F}. A prior P on \mathcal{F} may then be specified as the countable mixture

P = \sum _{n \geq 1} \alpha_n P_{\mathcal{S}_n}, \quad \alpha_n \geq 0,\, \sum_n \alpha_n = 1,

where P_{\mathcal{S}_n} is a prior on \mathcal{S}_n for all n.

Let me emphasize that the following ideas are elementary. Some may be found, with more or less generality, in analysis and approximation theory textbooks. It is, however, interesting to collect the facts relevant to statistical applications.

1. The basics

The finite dimensional sieves \mathcal{S}_n take the form

\mathcal{S}_n =  \left\{ \sum_{i=0}^{m_n} c_i \phi_{i,n} \right\}, \quad m_n \in \mathbb{N}

where the \phi_{i,n} are densities and the coefficients c_i range through some set which we assume contains the simplex \Delta_n = \left\{ (c_i) : \sum c_i = 1,\, c_i \geq 0 \right\}.
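
For intuition, here is a small sketch of one such sieve. It takes \mathbb{M} = [0,1] with Lebesgue measure and the histogram densities \phi_{i,n} = n \mathbf{1}_{[i/n, (i+1)/n)}, i = 0, \dots, n-1 (so m_n = n-1). This choice of \phi_{i,n}, the test density f(x) = 2x and the function names are all assumptions made for illustration only.

```python
import numpy as np

def histogram_projection(f, n, grid):
    """Return coefficients c_i (approximate mass of f on bin i) and the values
    of the mixture sum_i c_i * phi_{i,n} on the grid.

    ASSUMPTION: phi_{i,n} = n on [i/n, (i+1)/n) and 0 elsewhere."""
    bins = np.minimum((grid * n).astype(int), n - 1)  # bin index of each grid point
    dx = 1.0 / grid.size                              # spacing of the midpoint grid
    c = np.bincount(bins, weights=f(grid) * dx, minlength=n)
    return c, n * c[bins]

grid = (np.arange(6400) + 0.5) / 6400   # midpoint grid on [0, 1]
f = lambda x: 2.0 * x                   # a bounded density on [0, 1]

def l1_distance(n):
    """Approximate L^1 distance between f and its element of the sieve S_n."""
    _, g = histogram_projection(f, n, grid)
    return np.mean(np.abs(g - f(grid)))

for n in (4, 16, 64):
    c, _ = histogram_projection(f, n, grid)
    print(n, c.sum(), l1_distance(n))
```

The coefficients lie in the simplex (non-negative, summing to one), and the L^1 distance to f shrinks as n grows. This is the kind of behaviour needed for \cup_n \mathcal{S}_n to be dense.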

The following lemma gives sufficient conditions for \cup_n \mathcal{S}_n to be dense in \mathcal{F}, with the total variation distance.
