# A general algorithmic comparison: Generative vs Discriminative models

January 30, 2019


When it comes to a general bioinformatics task, there are always a number of tools that we have to evaluate based on their underlying algorithms. Moreover, when we are about to build a model of our own, we again have to pick from a variety of ML models. How do we know which models may fit a specific biological context better? Can we derive a rule of thumb for this model selection?


Let's start with a general comparison: most ML models can be broadly classified as either generative or discriminative. We can characterize the two classes by the following attributes:


Intuition:

• Generative model learns the process that generates the observed data. It goes back and learns 'everything' about the distribution most likely to have produced what we observed.

• Discriminative model directly discriminates among the observed data regardless of the underlying generation process. The model learns the boundary instead of the distribution.

Probability model:

• Generative model learns the joint probability P(X,Y), which factorizes in a Bayesian fashion as P(X,Y) = P(X|Y)P(Y); Bayes' rule then recovers the predictive distribution P(Y|X) = P(X|Y)P(Y)/P(X).

• Discriminative model learns the conditional probability P(Y|X=x) directly.
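As a toy numerical illustration (a hypothetical 2×2 count table, not from any real dataset), the two routes can be compared directly: the generative route builds the joint P(X,Y) and applies Bayes' rule, while the discriminative route conditions on X without ever modeling P(X).

```python
import numpy as np

# Hypothetical counts: rows are X=0/1, columns are Y=0/1.
counts = np.array([[30, 10],
                   [ 5, 55]])
n = counts.sum()

# Generative route: estimate the full joint P(X, Y) ...
joint = counts / n                       # P(X, Y)
p_y = joint.sum(axis=0)                  # marginal P(Y)
p_x_given_y = joint / p_y                # class-conditional P(X | Y)

# ... then recover predictions via Bayes' rule: P(Y|X) = P(X|Y)P(Y)/P(X)
p_x = joint.sum(axis=1, keepdims=True)
p_y_given_x_gen = p_x_given_y * p_y / p_x

# Discriminative route: estimate P(Y | X) directly, never touching P(X)
p_y_given_x_disc = counts / counts.sum(axis=1, keepdims=True)

print(np.allclose(p_y_given_x_gen, p_y_given_x_disc))  # True: same target, different route
```

With complete counts the two routes agree exactly; they differ in what they must estimate, which is where the data-efficiency and bias trade-offs below come from.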

Parametric / Nonparametric:

• Generative model is mostly parametric, since it models the underlying distribution with a fixed set of parameters.

• Discriminative model is mostly nonparametric. Note that, by definition, a model with an unfixed number of parameters is called nonparametric.

Example models:

• Generative / nonparametric: GMM, which learns Gaussian distributions and has an unfixed number of latent variables (one membership per data point, so they grow with the sample size)

• Generative / parametric: various Bayes-based models

• Discriminative / parametric: GLM and logistic regression (LDA, though used for classification, is strictly speaking generative, since it fits class-conditional Gaussians)

• Discriminative / nonparametric: KNN, SVM, K-Means and CART
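The parametric/nonparametric split can be made concrete with a small sketch (toy 1-D data, illustrative only): a least-squares line keeps exactly two parameters however much data it sees, while a KNN-style model is just the stored training set, whose size grows with the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parametric: a fitted line has exactly 2 parameters, regardless of n.
for n in (10, 1000):
    x = rng.normal(size=n)
    y = 2 * x + 1 + rng.normal(scale=0.1, size=n)
    coeffs = np.polyfit(x, y, deg=1)   # slope and intercept only
    assert coeffs.shape == (2,)

# Nonparametric: a 1-NN "model" is the training set itself, so its
# effective parameter count grows with the sample size.
def one_nn_predict(x_train, y_train, x_new):
    return y_train[np.argmin(np.abs(x_train - x_new))]

x_train = np.linspace(-1, 1, 50)        # toy training data
y_train = (x_train > 0).astype(int)     # label = sign of x
print(one_nn_predict(x_train, y_train, 0.3))   # → 1 (nearest stored point is positive)
```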

Resistance to outlier:

• Generative model shows strong resistance to outliers, thanks to the soft assignment given by the probability distribution.

• Discriminative model is relatively weak against outliers, due to its hard classification boundary.

Training data amount:

• Generative model needs less data, thanks to the prior knowledge encoded in the assumed distribution.

• Discriminative model needs more data.

Bias:

• Generative model has high bias (underfitting), because the assumed distribution is a generalization that may not match the data.

• Discriminative model has low bias

Variance:

• Generative model has low variance (as we know, bias and variance trade off against each other)

• Discriminative model tends to have high variance (overfitting). That is why regularization and bagging / boosting are used to rein in or average out the variance.
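A quick numerical sketch of why averaging helps (idealized: the predictors here are drawn independently, which real bagged models built on bootstrap resamples only approximate):

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend each draw is one high-variance model's prediction of a true
# value of 0.0, with unit noise.
true_value = 0.0
single_preds = rng.normal(loc=true_value, scale=1.0, size=10_000)

# Bagging-style averaging: combine 25 independent noisy predictors.
ensemble_preds = rng.normal(loc=true_value, scale=1.0,
                            size=(10_000, 25)).mean(axis=1)

print(round(single_preds.var(), 2))    # close to 1.0
print(round(ensemble_preds.var(), 3))  # close to 1/25 = 0.04
```

Averaging 25 independent predictors cuts the variance by a factor of 25; correlation between real bootstrap replicates makes the practical gain smaller, but the direction is the same.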


The consensus states that discriminative classifiers, as the more 'direct' method, almost always outperform generative ones given enough data. However, the best choice ultimately depends on which assumptions our data fit better. Let's look at two examples.


An example of Generative vs Discriminative

K-Means and the Gaussian mixture model (GMM) are exemplified here. Both methods are used in unsupervised clustering. Since the latent variables (the membership of each data point) are unknown and direct MLE is intractable, both apply an EM strategy in a very similar fashion to approach the optimal solution.


K-Means: As a discriminative & nonparametric method, K-Means iteratively assigns data memberships based on WCSS (within-cluster sum of squares), then updates each centroid to its optimal fitting position (the mean of its members).
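That loop can be sketched minimally in NumPy (a toy implementation, not any particular library's): a hard E-step by squared distance, then a mean-update M-step.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Hard EM: E-step assigns each point to its nearest centroid
    (minimizing WCSS), M-step moves each centroid to its cluster mean."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # E-step: hard assignment by squared Euclidean distance
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # M-step: recompute centroids (keep the old one if a cluster empties)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids

# Two well-separated synthetic blobs, for illustration only
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-5, 1, size=(100, 2)),
               rng.normal(+5, 1, size=(100, 2))])
labels, centroids = kmeans(X, k=2)
print(np.sort(centroids[:, 0]))   # one centroid near x = -5, one near x = +5
```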


GMM: As a generative & nonparametric model, a Gaussian mixture iteratively assigns soft data memberships (the posterior responsibilities under the current Gaussians), then updates the distribution parameters to their optimal fitting positions.
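For contrast, here is the same EM loop with soft assignments: a minimal 1-D Gaussian mixture sketch (illustrative, not a production implementation; the deterministic min/max initialization is an arbitrary choice for reproducibility).

```python
import numpy as np

def gmm_em_1d(x, k=2, iters=100):
    """Soft EM for a 1-D Gaussian mixture: the E-step computes posterior
    responsibilities (soft memberships), the M-step re-estimates means,
    variances and mixing weights from those responsibilities."""
    mu = np.linspace(x.min(), x.max(), k)   # simple deterministic init
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-4, 1, 300), rng.normal(4, 1, 300)])
mu, var, pi = gmm_em_1d(x)
print(np.sort(mu))   # component means recover roughly -4 and +4
```

Note that the only structural difference from the K-Means loop is the soft `resp` matrix in place of a hard `labels` vector, which is exactly the soft-assignment robustness discussed above.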


In terms of constraints, a Gaussian mixture of course requires the data to be normally distributed. K-Means, on the other hand, has a number of picky constraints of its own.


Variance of distribution is spherical

In fact, K-Means implicitly requires the data to be normally distributed as well, with spherical (equal in every direction) variance. The figure below shows the odd clusters K-Means produces from non-normally distributed data.


Equal cluster size and scale

Besides, K-Means requires the clusters to have equal size and similar scale. Otherwise, the boundary tends to be biased toward the side with more data or the larger scale, as in the middle figure below.


In the real world, those assumptions required by K-Means are rarely perfectly met. On the other hand, GMM appears to be more robust.

However, both K-Means and GMM have an apparent drawback: both apply EM, which is heuristic and may converge to a local optimum. Let's further consider the applicability of deterministic and heuristic methods.


An example of Deterministic vs Heuristic

We compare SciClone's heuristic method, the Variational Bayes mixture model (VB), and PyClone's deterministic method, Markov Chain Monte Carlo (MCMC). VB, as a generative & nonparametric & heuristic method, applies EM in a Bayesian fashion. Like the GMM described earlier, it iteratively optimizes the posterior distribution P(A|B) over the latent variables:

P(A|B) = P(B|A)P(A) / P(B)

where P(A) is the prior and P(B) is the probability of the observations.


Beta and Dirichlet distributions as priors of VB

Beta or Dirichlet distributions are usually used as the prior for the following reasons:

• Conjugate prior: For unsupervised clustering, the likelihood is usually a binary model (binomial, Bernoulli or negative binomial) or a multiclass model (multinomial or categorical). Conjugacy keeps the posterior in the same distribution family as the prior; hence Beta for the binary case and Dirichlet for the multiclass case.

• 0–1 support: Ranging from 0 to 1, they naturally represent the probability of a probability.

• Convenient update: The posterior updates conveniently when new evidence comes in. With a Beta(α, β) prior, observing a successes and b failures turns Bayes' theorem into a simple counter update: the posterior is Beta(α+a, β+b).
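That counter-style update can be sketched in a few lines (toy numbers; `update` and `beta_mean` are just illustrative helper names):

```python
# Beta–Bernoulli conjugate update: after observing s successes and
# f failures, the Beta(alpha, beta) prior becomes Beta(alpha+s, beta+f).
def update(alpha, beta, successes, failures):
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    # Mean of a Beta distribution: alpha / (alpha + beta)
    return alpha / (alpha + beta)

a, b = 1.0, 1.0                             # uniform Beta(1, 1) prior
a, b = update(a, b, successes=7, failures=3)
print(a, b)                                 # Beta(8, 4) posterior
print(beta_mean(a, b))                      # posterior mean 8/12 ≈ 0.667
```

No integrals are ever computed: conjugacy reduces the full Bayes update to incrementing two counters, which is exactly why VB favors these priors.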


While VB generatively & heuristically approximates the distribution that generates the observable data, PyClone applies MCMC, which samples directly (non-generatively) from the exact posterior. Given enough samples, PyClone therefore approaches the globally optimal solution rather than a local one.
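To make the sampling idea concrete: this is not PyClone's actual sampler, just a toy random-walk Metropolis chain targeting an arbitrary Beta(8, 4) "posterior". Note there is no optimization step at all; the accept/reject rule alone makes the chain explore the exact target distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(theta):
    # Unnormalized log-density of Beta(8, 4): theta^7 * (1-theta)^3 on (0, 1)
    if not (0 < theta < 1):
        return -np.inf
    return 7 * np.log(theta) + 3 * np.log(1 - theta)

# Random-walk Metropolis: propose a nearby point, accept with
# probability min(1, target(prop)/target(current)).
theta, samples = 0.5, []
for _ in range(50_000):
    prop = theta + rng.normal(scale=0.1)
    if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
        theta = prop
    samples.append(theta)

samples = np.array(samples[5_000:])   # discard burn-in
print(round(samples.mean(), 3))       # close to the true Beta(8, 4) mean, 8/12
```

The trade-off the post describes is visible here: the chain needs tens of thousands of density evaluations to characterize one scalar posterior, which is why MCMC gets expensive at scale while VB's optimization stays cheap.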


PyClone's MCMC is computationally intensive, so VB is often used as a surrogate when the dataset is large and MCMC is impractical. For clonal evolution inference, however, the sample size is easily bearable by MCMC.


I listed these two examples to illustrate that each method has its own pros and cons when fitted to a specific biological context. Given the huge number of methods available, trial and error over every statistical method is a heavy burden. By understanding the general characteristics of the broad model classes, we can first narrow the choice down and then test our data against individual classifiers. After all, which model works better depends on the exact constraints our data meet.
