Bits of string: compositional data

(§) The article A flexible Bayesian tool for CoDa mixed models (Martínez-Minaya and Rue 2024) looks promising; it frames compositional count data in a way that makes it possible to fit models using INLA.

(§) The dimension of log ratio vectors matters. Compare the CLR vector $(-1, 1, 0)$ with $(-1, 1, 0, 0, \ldots)$ (a hundred zeros appended). The former maps to roughly $(0.09, 0.67, 0.24)$ in the simplex, while the latter maps to roughly $(0.004, 0.03, 0.01, 0.01, \ldots)$, which is practically uniform. The log ratios between coordinates 1 and 2 (or 1 and 3) remain the same, however.
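A quick numeric check (a Python/numpy sketch; `inv_clr` is my name for the inverse CLR, i.e. the closure of the exponentials):

```python
import numpy as np

def inv_clr(x):
    """Inverse CLR: exponentiate, then apply the closure (normalize to sum 1)."""
    e = np.exp(x)
    return e / e.sum()

short = np.array([-1.0, 1.0, 0.0])
long = np.concatenate([[-1.0, 1.0], np.zeros(100)])

print(inv_clr(short))     # ~ [0.090 0.665 0.245]
print(inv_clr(long)[:4])  # ~ [0.0036 0.0264 0.0097 0.0097], nearly uniform
print(long[1] - long[0])  # the 1:2 log ratio is 2 in both vectors
```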

(§) Suppose we’re simulating (count) data where a subcomposition made up of $n_1$ components is active and the remaining $n_2$ components are inactive. We could start with log ratios (not centered, but relative to some unknown denominator) and set the inactive variables to all 0, so we have a log ratio vector $x = (x_1, \ldots, x_{n_1}, 0, 0, \ldots, 0)$. What size should the $x_i$ be?

As the note above suggests, this should depend on dimension. We could decide that on average we want some set proportion, $p$, of counts to fall in the active subcomposition; let’s say $p = 1/4$. Then the probabilities (for a multinomial draw) corresponding to the active parts should sum to $1/4$. Since we invert log ratios by $\text{lr}^{-1}(x) = \mathcal{C}(e^{x_1}, \ldots, e^{x_{n_1}}, 1, 1, \ldots, 1)$, where $\mathcal{C}$ is the closure operator, it seems that we want $3\sum e^{x_i} = n_2 \ (*)$ on average.
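To spell out where $(*)$ comes from: after closure, the active mass is the sum of the active exponentials over the total, and setting that to $p = 1/4$ rearranges to $(*)$:

$$ \frac{\sum_{i=1}^{n_1} e^{x_i}}{\sum_{i=1}^{n_1} e^{x_i} + n_2} = \frac{1}{4} \iff 3 \sum_{i=1}^{n_1} e^{x_i} = n_2. $$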

We’ll draw the $X_i$ independently, $X_i \sim N(0, \sigma)$, and choose $\sigma$ so that $(*)$ above holds on average. If $Y \sim N(\mu, \sigma)$ then $\text{E}\, e^Y = e^{\mu + \sigma^2/2}$ (the mean of a log-normal), so taking the expectation of $(*)$ we get

$$ 3 n_1 e^{\sigma^2/2} = n_2 \implies \sigma = \sqrt{2 \ln \frac{n_2}{3 n_1}}. $$
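A minimal simulation sketch (Python/numpy; the values $n_1 = 5$, $n_2 = 200$ are just illustrative choices of mine) to check that $(*)$ holds on average. Note the realized active proportion averages a bit below $p$, since the expectation was taken of the unnormalized mass rather than of the ratio:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 5, 200                  # needs n2 > 3*n1 for sigma to be real
sigma = np.sqrt(2 * np.log(n2 / (3 * n1)))

# many replicates of the active log ratios
x = rng.normal(0.0, sigma, size=(100_000, n1))
s = np.exp(x).sum(axis=1)        # unnormalized active mass, sum of e^{x_i}

print(3 * s.mean())              # ~ n2 = 200, i.e. (*) holds on average
print((s / (s + n2)).mean())     # below p = 0.25: s/(s+n2) is concave in s,
                                 # so Jensen pulls the mean proportion down
```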

(§) The sum-to-one constraint forces a negative correlation (when something goes up, something else must go down), but the effect can often be ignored in high dimensions under mild assumptions. Following Townes et al. (2019), if we assume

$$ y \sim \text{multinomial}(\pi, n) $$

with $y = (y_1, \ldots, y_k)$ and $\pi = (\pi_1, \ldots, \pi_k)$, the marginals are

$$ y_i \sim \text{binomial}(\pi_i, n), $$

with $\mathbb{E}\, y_i = \mu_i = n\pi_i$ and $\text{var}(y_i) = n\pi_i(1-\pi_i) = \mu_i - \mu_i^2/n$. This last identity hints at the well-known small-$p$, big-$n$ Poisson horse-kick approximation to the binomial.
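A quick empirical check of the marginal moments (Python/numpy sketch; the equal probabilities $\pi_i = 1/k$ and the sizes are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 50, 1_000
pi = np.full(k, 1 / k)
y = rng.multinomial(n, pi, size=200_000)

mu = n * pi[0]
print(y[:, 0].mean(), mu)             # ~ 20.0
print(y[:, 0].var(), mu - mu**2 / n)  # ~ 19.6: variance just under the mean,
                                      # i.e. nearly Poisson for small pi, big n
```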

More interesting for our purpose is the expression for the correlation between two compositional parts,

$$ \text{cor}(y_i, y_j) = -\sqrt{\frac{\pi_i \pi_j}{(1-\pi_i)(1-\pi_j)}}, $$

which is practically zero if both $\pi_i$ and $\pi_j$ are small. In some settings it might be quite reasonable to assume that $\pi_i \approx 1/k$, i.e. that the parts are roughly of similar magnitude. Then $\pi_i$ gets quite small as $k$ (the dimension) grows.
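Checking the vanishing correlation under the $\pi_i \approx 1/k$ assumption (Python/numpy sketch; the $k$ values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
for k in (5, 50, 500):
    pi = np.full(k, 1 / k)
    theory = -np.sqrt(pi[0] * pi[1] / ((1 - pi[0]) * (1 - pi[1])))
    y = rng.multinomial(1_000, pi, size=100_000)
    emp = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
    print(f"k={k:4d}  theory={theory:+.4f}  empirical={emp:+.4f}")
# k=5 gives -0.25; by k=500 the correlation is ~ -0.002, practically zero
```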

Note to self: but the big question: what if we assume some prior over $\pi$, and that prior has a covariance of its own? Where does that covariance go?

Backlinks:

Martínez-Minaya, Joaquín, and Håvard Rue. 2024. “A Flexible Bayesian Tool for CoDa Mixed Models: Logistic-Normal Distribution with Dirichlet Covariance.” Statistics and Computing 34 (3): 116. https://doi.org/10.1007/s11222-024-10427-3.
Townes, F. William, Stephanie C. Hicks, Martin J. Aryee, and Rafael A. Irizarry. 2019. “Feature Selection and Dimension Reduction for Single-Cell RNA-Seq Based on a Multinomial Model.” Genome Biology 20 (1): 295. https://doi.org/10.1186/s13059-019-1861-6.
this file last touched 2026.03.16