The *posterior* distribution $p(z \mid y)$ is a distribution over $z$ alone, once we have conditioned on the observed data $y$. And $q(z)$ is also a distribution over $z$. So after fixing the value of $y$, it does make sense to compare these two distributions over $z$ and ask about the divergence between them.

The *joint* distribution $p(y, z)$, on the other hand, is a bivariate distribution over all the possible values of $y$ and $z$ together.
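To make the distinction concrete, here is a minimal sketch with a small discrete model. The joint table, the observed value, and the candidate `q` are all made-up numbers for illustration, not anything from the original post. Conditioning on a fixed $y$ picks out one row of the joint and renormalizes it, which leaves a distribution over $z$ alone that can be compared with $q(z)$.

```python
import numpy as np

# Hypothetical joint p(y, z): rows index 2 values of y, columns index 3 values of z.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.05, 0.25]])

def posterior(joint, y):
    """Return p(z | y): the row for the observed y, renormalized to sum to 1."""
    row = joint[y]
    return row / row.sum()

def kl(q, p):
    """KL(q || p) for discrete distributions with strictly positive entries."""
    return float(np.sum(q * np.log(q / p)))

y_observed = 0                          # fix the observed data
post = posterior(joint, y_observed)     # a distribution over z alone
q = np.array([0.3, 0.4, 0.3])           # a candidate approximation q(z)
divergence = kl(q, post)                # well defined: both are distributions over z
```

Changing `y_observed` gives a different posterior and therefore a different best `q`, which is the sense in which `q` still depends on the data even though it is written as a function of $z$ only.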

– Calculate the integral analytically,

– Use Monte Carlo to sample from the posterior $p(z \mid y)$ and use empirical averages of the sample to approximate any expectation we want,

– Or use variational methods to optimize an approximation $q(z)$.
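The three options above can be compared on a toy problem. The following is only a sketch under assumed conditions: a conjugate Gaussian model ($z \sim N(0, 1)$, $y \mid z \sim N(z, \texttt{noise\_var})$) chosen because its posterior is known in closed form, with hypothetical values for `y` and `noise_var`. The Monte Carlo step samples the posterior directly, which is possible only because this toy posterior is known; in a realistic model one would need something like MCMC instead. The variational step fits a Gaussian $q(z) = N(m, v)$ by plain gradient ascent on the ELBO.

```python
import numpy as np

# Toy model (an assumption for illustration): z ~ N(0, 1), y | z ~ N(z, noise_var).
y, noise_var = 2.0, 1.0

# 1) Analytic: the posterior p(z | y) is Gaussian with known mean and variance.
post_var = noise_var / (1.0 + noise_var)
post_mean = y / (1.0 + noise_var)

# 2) Monte Carlo: draw samples from the posterior and average them.
#    Direct sampling works here only because the toy posterior is known.
rng = np.random.default_rng(0)
samples = rng.normal(post_mean, np.sqrt(post_var), size=100_000)
mc_mean = samples.mean()

# 3) Variational: fit q(z) = N(m, v) by gradient ascent on the ELBO, where
#    ELBO(m, v) = -0.5*(m^2 + v) - 0.5*((y - m)^2 + v)/noise_var + 0.5*log(v) + const.
#    The variance is parameterized as v = exp(log_v) so it stays positive.
m, log_v = 0.0, 0.0
lr = 0.1
for _ in range(500):
    v = np.exp(log_v)
    grad_m = -m + (y - m) / noise_var
    grad_log_v = 0.5 - 0.5 * v * (1.0 + 1.0 / noise_var)
    m += lr * grad_m
    log_v += lr * grad_log_v
vi_mean, vi_var = m, np.exp(log_v)
```

Because the true posterior here is itself Gaussian, the variational fit can match it exactly; with a non-Gaussian posterior, `vi_mean` and `vi_var` would only describe the closest member of the chosen family.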

I have some minor questions regarding the approximation of the posterior p(y|z).

How do we approximate a conditional probability p(y|z), which should in fact be a function of (y, z), by a single-variable function q(z)? I am somewhat confused about this part. Any comments? Thanks!