(§) The article A flexible Bayesian tool for CoDa mixed models (Martínez-Minaya and Rue 2024) looks promising; it frames compositional count data in a way that makes the models fittable with INLA.
(§) The dimension of log-ratio vectors matters. Compare the inverse CLR of a short log-ratio vector with that of the same vector padded by a hundred zeros: the former puts most of its mass on the original parts of the simplex, while the latter spreads mass almost uniformly across the padded coordinates. The log ratios between coordinates 1 and 2 (or 1 and 3) remain the same, however.
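To make the dilution concrete, here is a minimal sketch; the particular vector $(1, -1, 0)$ and the helper name `inv_clr` are my own illustration, not from the article:

```python
import math

def inv_clr(z):
    """Invert a CLR/log-ratio vector: exponentiate, then close (normalize) to the simplex."""
    w = [math.exp(v) for v in z]
    s = sum(w)
    return [v / s for v in w]

# A short log-ratio vector vs. the same vector padded with a hundred zeros.
z_short = [1.0, -1.0, 0.0]
z_padded = z_short + [0.0] * 100

p_short = inv_clr(z_short)
p_padded = inv_clr(z_padded)

# Padding pushes the composition toward uniform: the first coordinate's
# share collapses from roughly 2/3 to under 3%...
print(p_short[0], p_padded[0])

# ...yet the pairwise log ratio between coordinates 1 and 2 is unchanged (= 2).
print(math.log(p_short[0] / p_short[1]),
      math.log(p_padded[0] / p_padded[1]))
```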
(§) Suppose we’re simulating (count) data where a subcomposition made up of $k$ of the $D$ components is active and the remaining $D - k$ components are inactive. We could start with log ratios $z_1, \dots, z_k$ (not centered, but relative to some unknown denominator) and set the inactive variables to all-0, so we have a log-ratio vector $z = (z_1, \dots, z_k, 0, \dots, 0)$. What size should the $z_i$ be?
As the note above suggests, this should depend on the dimension. We could decide that on average we want some set proportion, call it $\alpha$, of the counts to fall in the active subcomposition. Then the probabilities (for a multinomial draw) corresponding to the active parts should sum to $\alpha$. Since we invert log ratios by $p = \mathcal{C}(\exp(z))$, where $\mathcal{C}$ is the closure operator (divide by the total), and each inactive part contributes $e^0 = 1$ to that total, it seems that we want

$$\frac{\sum_{i=1}^{k} e^{z_i}}{\sum_{i=1}^{k} e^{z_i} + (D - k)} = \alpha$$

on average.
We’ll draw the $z_i \sim \mathcal{N}(\mu, \sigma^2)$ independently and choose $\mu$ so that the above is true on average. If $z \sim \mathcal{N}(\mu, \sigma^2)$ then $\mathbb{E}[e^z] = e^{\mu + \sigma^2/2}$ (cf. this), so if we take the expectation of the numerator we get $k\, e^{\mu + \sigma^2/2}$. Treating the expectation of the ratio as the ratio of expectations and solving for $\mu$ gives

$$\mu = \log \frac{\alpha\,(D - k)}{(1 - \alpha)\, k} - \frac{\sigma^2}{2}.$$
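A quick simulation sketch of this recipe, setting $\mu = \log\frac{\alpha (D-k)}{(1-\alpha) k} - \sigma^2/2$ via the lognormal mean identity and checking the average active mass; the helper names and the particular values of $\alpha$, $k$, $D$, $\sigma$ are my own choices:

```python
import math
import random

def mu_for_target(alpha, k, D, sigma):
    """Mean for the active log ratios so the active subcomposition carries
    mass alpha in expectation (uses E[e^z] = exp(mu + sigma^2/2))."""
    return math.log(alpha * (D - k) / ((1 - alpha) * k)) - sigma**2 / 2

def active_mass(z_active, D):
    """Mass on the active parts after inverting the log ratios; each of the
    D - k inactive parts contributes e^0 = 1 to the closure denominator."""
    s = sum(math.exp(z) for z in z_active)
    return s / (s + (D - len(z_active)))

random.seed(0)
alpha, k, D, sigma = 0.5, 5, 100, 0.5
mu = mu_for_target(alpha, k, D, sigma)

draws = [active_mass([random.gauss(mu, sigma) for _ in range(k)], D)
         for _ in range(20000)]
avg = sum(draws) / len(draws)
print(round(avg, 3))  # close to alpha, up to the E[ratio] != ratio-of-E bias
```

The Monte Carlo mean lands slightly below $\alpha$, a reminder that swapping the expectation and the ratio is only an approximation (Jensen), though a mild one for moderate $\sigma$.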
(§) The sum-to-one constraint forces a negative correlation (something goes up and something else must go down), but the effect may be negligible in high dimension under mild assumptions. From Townes et al. (2019), if we assume

$$y \sim \text{Multinomial}(n, \pi)$$

with $\pi = (\pi_1, \dots, \pi_D)$ and $\sum_{i=1}^{D} \pi_i = 1$, the marginals are

$$y_i \sim \text{Binomial}(n, \pi_i)$$

with $\mathbb{E}[y_i] = n \pi_i$ and $\operatorname{Var}(y_i) = n \pi_i (1 - \pi_i) \approx n \pi_i$ for small $\pi_i$. This last identity hints at the well-known small-$\pi$, big-$n$ Poisson ("horse kick") approximation to the binomial.
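That Poisson approximation is easy to check numerically; a minimal sketch in which the parameters `n` and `pi` are arbitrary illustrative choices:

```python
import math

def binom_pmf(n, p, x):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(lam, x):
    return math.exp(-lam) * lam**x / math.factorial(x)

# Small pi, big n: the binomial marginal is nearly Poisson(n * pi).
n, pi = 10000, 0.001  # so lambda = n * pi = 10
tv = 0.5 * sum(abs(binom_pmf(n, pi, x) - pois_pmf(n * pi, x))
               for x in range(60))
print(tv)  # total variation distance; Le Cam's bound guarantees it is <= n * pi**2 = 0.01
```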
More interesting for our purpose is the expression for the covariance between two compositional parts, $\operatorname{Cov}(y_i, y_j) = -n \pi_i \pi_j$, or on the correlation scale

$$\operatorname{corr}(y_i, y_j) = -\sqrt{\frac{\pi_i \pi_j}{(1 - \pi_i)(1 - \pi_j)}},$$

which, if both $\pi_i$ and $\pi_j$ are small, is practically zero. In some settings it might be quite reasonable to assume that $\pi_i \approx \pi_j \approx 1/D$, i.e. the parts are roughly of similar magnitude. Then the correlation is about $-1/(D-1)$, which gets quite small as $D$ (the dimension) grows.
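A numeric sketch of how this vanishing-correlation argument scales, using the multinomial correlation formula from Townes et al. (2019), $-\sqrt{\pi_i \pi_j / ((1-\pi_i)(1-\pi_j))}$ (the function name is mine):

```python
import math

def multinomial_corr(pi_i, pi_j):
    """Correlation between two multinomial counts (Townes et al. 2019)."""
    return -math.sqrt(pi_i * pi_j / ((1 - pi_i) * (1 - pi_j)))

# Under pi_i = pi_j = 1/D the correlation is exactly -1/(D - 1),
# so it shrinks toward zero as the dimension grows.
for D in (5, 50, 500):
    print(D, round(multinomial_corr(1 / D, 1 / D), 4))
```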
Note to self: but big question, what if we assume some prior over $\pi$ and that prior has a covariance structure of its own? Where does it go?