0. Notation sets

\(X_{1:D}\), where we introduce the Matlab-like notation 1 : D to denote the set {1, 2, . . . , D}.

1. Prob & Stats Terminologies

1.1 Normal

2. Matrix Terminologies

3. Opt Terms.

4. VI terms

5. Math Facts


Here are some equalities worth keeping in mind:


\(|a-b| = a(1-2b)+b\) for \(a,b \in\) {\( 0,1\)}
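Since \(a\) and \(b\) each take only two values, the identity can be verified by exhaustive check. A minimal Python sketch:

```python
# Brute-force check of |a - b| = a(1 - 2b) + b over all four
# combinations of a, b in {0, 1}.
for a in (0, 1):
    for b in (0, 1):
        assert abs(a - b) == a * (1 - 2 * b) + b
```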


6. Inequalities

Here are some inequalities worth keeping in mind for your research:


\(\sum_{t=1}^T\frac{1}{t} \leq \log(T)+1\)
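A quick numerical sanity check of the harmonic-sum bound for a few values of \(T\):

```python
import math

# Check that sum_{t=1}^T 1/t <= log(T) + 1 for several T.
for T in (1, 10, 100, 10_000):
    harmonic = sum(1.0 / t for t in range(1, T + 1))
    assert harmonic <= math.log(T) + 1
```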


\(1-x \leq e^{-x}, \forall x \geq 0\)

Cauchy-Schwarz inequality

For any vectors \(u, v\) in an inner-product space, \(|\langle u, v \rangle| \leq \|u\| \, \|v\|\).


Jensen’s inequality

If \(f\) is a real-valued convex function and \(x\) is a random variable, then $f(\mathbb{E} x) \leq \mathbb{E}f(x)$. A more detailed explanation can be found [here].

Hoeffding’s lemma

If \(U\) is a zero-mean random variable bounded almost surely as \(a \leq U \leq b\), then for all \(\lambda \in \mathbb{R}\), $\mathbb{E}[\exp(\lambda U)] \leq \exp\left(\frac{\lambda^2(b-a)^2}{8}\right)$.
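As a numerical sanity check, take the classic case of a centred Bernoulli(\(p\)) variable \(U = B - p\), which is bounded in \([-p, 1-p]\), so \(b - a = 1\) and the bound reads \(\mathbb{E}[\exp(\lambda U)] \leq \exp(\lambda^2/8)\). A minimal Python sketch:

```python
import math

def mgf_centred_bernoulli(p, lam):
    # E[exp(lam * U)] for U = B - p, B ~ Bernoulli(p),
    # computed exactly from the two-point distribution.
    return p * math.exp(lam * (1 - p)) + (1 - p) * math.exp(-lam * p)

# Hoeffding's lemma with (b - a) = 1: MGF <= exp(lam^2 / 8).
for p in (0.1, 0.3, 0.5, 0.9):
    for lam in (-3.0, -0.5, 0.5, 2.0):
        assert mgf_centred_bernoulli(p, lam) <= math.exp(lam**2 / 8) + 1e-12
```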

Markov’s inequality

If \(U\) is a non-negative random variable on \(\mathbb{R}\), then for all \(t>0\)

$Pr(U>t) \leq \frac{1}{t} \mathbb{E}[U]$


where the inequality relies on the fact that \(U\) is non-negative.
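A small Monte Carlo illustration (an Exponential(1) variable, chosen here just as an example of a non-negative \(U\) with mean 1):

```python
import random

random.seed(0)

# Empirical check of Pr(U > t) <= E[U] / t for U ~ Exponential(1).
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
mean = sum(samples) / n

for t in (1.0, 2.0, 5.0):
    tail = sum(s > t for s in samples) / n
    assert tail <= mean / t
```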

Chebyshev’s inequality

If \(Z\) is a random variable on \(\mathbb{R}\) with mean \(\mu\) and variance \(\sigma^2\), then

$Pr(|Z- \mu| \geq \sigma t)\leq \frac{1}{t^2}$


Hint: apply Markov’s inequality to the non-negative random variable \((Z-\mu)^2\), noting that \(\Pr(|Z-\mu| \geq \sigma t) = \Pr((Z-\mu)^2 \geq \sigma^2 t^2) \leq \frac{\mathbb{E}[(Z-\mu)^2]}{\sigma^2 t^2} = \frac{1}{t^2}\).
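An empirical check with \(Z \sim \mathrm{Uniform}(0,1)\), which has \(\mu = 1/2\) and \(\sigma^2 = 1/12\) (a minimal sketch, distribution chosen only for illustration):

```python
import random

random.seed(1)

# Empirical check of Pr(|Z - mu| >= sigma * t) <= 1 / t^2
# for Z ~ Uniform(0, 1): mu = 0.5, sigma = sqrt(1/12).
n = 100_000
samples = [random.random() for _ in range(n)]
mu = 0.5
sigma = (1 / 12) ** 0.5

for t in (1.5, 2.0, 3.0):
    tail = sum(abs(z - mu) >= sigma * t for z in samples) / n
    assert tail <= 1 / t**2
```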

Chernoff’s Bounding method

Let \(Z\) be a random variable on \(\mathbb{R}\). Then for all \(t>0\)

$Pr(Z\geq t) \leq \inf_{s>0} e^{-st}M_Z(s)$

where \(M_Z\) is the moment-generating function of \(Z\).


For any \(s>0\) we can use Markov’s inequality to obtain:

$Pr(Z \geq t) = Pr(sZ \geq st) = Pr(e^{sZ} \geq e^{st}) \leq e^{-st}\mathbb{E}[e^{sZ}] = e^{-st}M_Z(s)$

Since \(s>0\) was arbitrary, taking the infimum over \(s>0\) completes the proof.
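For a standard normal \(Z\), the MGF is \(M_Z(s) = e^{s^2/2}\), so \(\inf_{s>0} e^{-st}M_Z(s) = e^{-t^2/2}\) (attained at \(s = t\)). A Monte Carlo sketch comparing the true tail against this Chernoff bound:

```python
import math
import random

random.seed(2)

# Chernoff bound for standard normal Z: Pr(Z >= t) <= exp(-t^2 / 2),
# obtained by minimizing e^{-st} * M_Z(s) = e^{-st + s^2/2} at s = t.
n = 200_000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]

for t in (1.0, 2.0, 3.0):
    tail = sum(z >= t for z in samples) / n
    assert tail <= math.exp(-t**2 / 2)
```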


Conditional independence

Two events \(A\) and \(B\) are conditionally independent given \(C\) if \(\Pr(A, B \mid C) = \Pr(A \mid C)\Pr(B \mid C)\).

The proof can be found in (Hoff 2009) Section 2.3.

Dirac measure

(Murphy, 2012) Equation 2.41.

Covariance and correlation

If \(\mathrm{Cov}(A, B) = 0\), then \(A\) and \(B\) are uncorrelated (this is the definition of uncorrelated).

Independence implies zero covariance, but not vice versa: uncorrelated does not imply independent.

See (Murphy, 2012) Section 2.5.1.
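The standard counterexample: \(X\) uniform on \(\{-1, 0, 1\}\) and \(Y = X^2\) are uncorrelated but clearly dependent. A minimal Python check:

```python
# X uniform on {-1, 0, 1}, Y = X^2: uncorrelated but dependent.
xs = [-1, 0, 1]
px = 1 / 3

ex = sum(x * px for x in xs)             # E[X]  = 0
ey = sum(x**2 * px for x in xs)          # E[Y]  = 2/3
exy = sum(x * x**2 * px for x in xs)     # E[XY] = E[X^3] = 0
cov = exy - ex * ey
assert abs(cov) < 1e-12                  # Cov(X, Y) = 0: uncorrelated

# ...yet not independent: Pr(X=1, Y=1) = 1/3, while
# Pr(X=1) * Pr(Y=1) = (1/3) * (2/3) = 2/9.
p_joint = px
p_y1 = 2 / 3
assert abs(p_joint - p_y1 * px) > 1e-12
```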