Bayesian Networks
These are exam preparation notes, subpar in quality and certainly not of divine quality.
See the index with all articles in this series
Bayes Rule
\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]Inference
Bayesian Interference is about inferring probabilities from prior probabilities.
Given a set of prior events, a bayesian network estimates the probability for another event.
Justification of using heuristics:
In real-world scenarios you never know the true probability of events. To somehow try to estimate \(P\) it makes sense to look at historic events and infer its statistical probability into the future.
To get these probabilities, a set of of so-called atomic events. Atomic events a value for each random variable and are mutually exclusive.
The problem with bayesian networks is that a fully connected network has an upper bound of connections of \(2^N\). This makes it impractical to precalculate all inferred probabilties.
This is why we need further prior knowledge. The additional prior knowledge in this case is the causality between events. If the causality between events is known or can be ruled out, this affects the interconnections and drastically reduces the number of edges.
From unconditional to conditional
\[P(x|\underline e) = \frac{P(x,\underline e)}{P(\underline e)} = \alpha P(x, \underline e) =\alpha \sum_\underline y P(x,\underline e, \underline y)\]\(\alpha\) can be found using:
\[\frac{1}{\alpha}=P(\underline e)=\sum_{x,\underline y} P(x,\underline e, \underline y)\]Summing up over x and y is posisble because they both sum to one, leaving only \(P(\underline e)\)
Conditional independence
X and Y are conditionally independent given Z if
\[P(X,Y|Z) = P(X|Z)P(Y|Z)\]alternative notation
\[X тлл Y | Z\]Markov Blanket
If the Markov Blanket of a node A is given, A is conditionally independent.
Members of the Markov blanket are:
- parents,
- children,
- parents of the children
Topological Sorting
Sort a graph in an order, so that no node comes prior to a node pointing to it.
Message Passing
In bipartite trees there are three phases of message passing
- request
- collect
- distribute
Unaffected nodes are marginialized out. If at a junction, multiply with both edges.
Note: The reason the bipartite tree is undirected is because bayesian probabilities are reversible. Also the nodes \(f_i\) are already aware of the direction.
Junction Trees
A junction tree (aka clique tree, join tree) is a decomposed graph that is now a tree with special properties. This makes solving certain problems, especially interference easier.
Not all acyclic graphs map to trees. Junction trees only exist iff graph is decomposable.
A separator separates two sets of nodes A and B if every path from A to B has to path through the separator.
A proper decomposition A,B,C if C is separator to A and B, C is complete.
A set of nodes is complete if all nodes are fully interconnected.
A clique is a maximally complete subgraph.
[A moral graph is the] counterpart of a directed acyclic graph is formed by adding edges between all pairs of nodes that have a common child, and then making all edges in the graph undirected.
Composing the tree
- construct DAG
- convert to moral graph
- construct a chordal graph
- identify cliques
- construct bipartite graph
- create junction tree
- Do message passing