(§) The article A flexible Bayesian tool for CoDa mixed models (Martínez-Minaya and Rue 2024) looks promising; it frames compositional count data in a way that makes it possible to fit with INLA.
(§) The dimension of log ratio vectors matters. Compare the CLR vector $(x_1, x_2, x_3)$ with the same vector padded by a hundred zeros, $(x_1, x_2, x_3, 0, \dots, 0)$. Inverting (exponentiate, then close), the former is a point in the three-part simplex determined by the $x_i$, while the latter is a 103-part composition that is practically uniform: each appended zero contributes $\exp(0) = 1$, so the hundred inactive parts dominate the total mass. The log ratios between coordinates 1 and 2 (or 1 and 3) remain the same, however.
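A quick numeric sketch of this effect (the values $x = (0.1, 0.2, 0.3)$ are hypothetical; the inversion is just a softmax):

```python
import numpy as np

def inv_clr(x):
    """Invert a (c)lr vector: exponentiate and apply the closure
    (normalize onto the simplex) -- i.e. a softmax."""
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

x_short = np.array([0.1, 0.2, 0.3])                 # hypothetical lr vector
x_long = np.concatenate([x_short, np.zeros(100)])   # same, padded with 100 zeros

p_short = inv_clr(x_short)
p_long = inv_clr(x_long)

# Log ratios between coordinates 1 and 2 survive the padding...
print(np.log(p_short[0] / p_short[1]), np.log(p_long[0] / p_long[1]))  # both ~ -0.1
# ...but the padded composition is nearly uniform: its largest part is tiny
# (uniform over 103 parts would be ~0.0097).
print(p_long.max())
```

The padding never changes pairwise log ratios, but it radically reshapes the simplex point you get back.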
(§) Suppose we’re simulating (count) data where a subcomposition made up of $k$ components is active and the remaining $D - k$ components are inactive. We could start with log ratios $x_1, \dots, x_k$ (not centered, but relative to some unknown denominator) and set the inactive variables all to 0, so we have a log ratio vector $x = (x_1, \dots, x_k, 0, \dots, 0)$. What size should the $x_i$ be?
As the note above suggests, this should depend on the dimension. We could decide that on average we want some set proportion, $p$, of counts to fall in the active subcomposition. Then the probabilities (for a multinomial draw) corresponding to the active parts should sum to $p$. Since we invert log ratios by $\pi = \mathcal{C}(\exp x)$, where $\mathcal{C}$ is the closure operator, it seems that we want $$\frac{\sum_{i=1}^{k} e^{x_i}}{\sum_{i=1}^{k} e^{x_i} + (D - k)} = p$$ on average.
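A sanity check of that identity (the sizes $k = 3$, $D = 103$ and the draw for the active log ratios are assumptions, not from the note):

```python
import numpy as np

rng = np.random.default_rng(0)
k, D = 3, 103                                    # assumed: k active parts out of D
x_active = rng.normal(0.0, 1.0, size=k)          # hypothetical active log ratios
x = np.concatenate([x_active, np.zeros(D - k)])  # inactive parts set to 0

pi = np.exp(x) / np.exp(x).sum()                 # invert: closure of exp(x)

# The active probability mass equals sum(e^x_i) / (sum(e^x_i) + (D - k)),
# since each inactive part contributes exp(0) = 1 to the total.
s = np.exp(x_active).sum()
print(pi[:k].sum(), s / (s + (D - k)))           # identical
```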
We’ll draw $x_i \sim \mathcal{N}(\mu, \sigma^2)$ independently and choose $\mu$ so that the above is true on average. If $x \sim \mathcal{N}(\mu, \sigma^2)$ then $E[e^x] = e^{\mu + \sigma^2/2}$ (the mean of a log-normal), so if we take the expectation of the numerator $\sum_{i=1}^{k} e^{x_i}$ we get $k\, e^{\mu + \sigma^2/2}$. Plugging this in and solving for $\mu$ gives $$\mu = \log \frac{p\,(D - k)}{(1 - p)\,k} - \frac{\sigma^2}{2}.$$
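Putting the pieces together, a simulation sketch. The concrete choices $p = 0.9$, $\sigma = 1$, $k = 3$, $D = 103$ are all hypothetical, picked only to exercise the formula:

```python
import numpy as np

rng = np.random.default_rng(1)
k, D, p, sigma = 3, 103, 0.9, 1.0   # hypothetical sizes and target proportion

# Solve k*exp(mu + sigma^2/2) / (k*exp(mu + sigma^2/2) + (D - k)) = p for mu:
mu = np.log(p * (D - k) / ((1 - p) * k)) - sigma**2 / 2

# Check the expectation argument: E[sum_i e^{x_i}] = k * exp(mu + sigma^2/2),
# so plugging the Monte Carlo mean of the numerator into the ratio gives ~p.
x = rng.normal(mu, sigma, size=(100_000, k))
s = np.exp(x).sum(axis=1)
print(s.mean() / (s.mean() + (D - k)))   # ~ 0.9

# One multinomial draw from a single simulated composition:
x_full = np.concatenate([rng.normal(mu, sigma, size=k), np.zeros(D - k)])
pi = np.exp(x_full) / np.exp(x_full).sum()
counts = rng.multinomial(1000, pi)
print(counts[:k].sum() / 1000)           # active share of the counts
```

Note the hedge: the expectation is matched in the numerator, not in the ratio itself, so individual realizations of the active share can wander around $p$.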
Backlinks: