# Survival analysis and logistic regression

June 4, 2019

Survival analysis and logistic regression share many similarities. Using logistic regression for survival data is not strictly correct, because it ignores censoring, but it provides an intuitive way into survival models.

Logistic regression

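Let's recall how logistic regression is done. It models the logit (the log odds) of the outcome probability $p$ as a linear function of the variables:

$$\operatorname{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$

The logarithm makes the odds symmetric around 0 (that is, around odds of 1) and unbounded in both directions, and in large samples the estimated log odds ratio is approximately normally distributed.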

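The coefficients are estimated by maximum likelihood estimation (MLE), which searches over the coefficient values of the logit function for the ones that maximize the log likelihood of all data points. The probability and the logit are converted into each other as follows, where the $\beta$ are the coefficients and the $x$ are the variables:

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$$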

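A coefficient of the logit function is the change in the log odds of the outcome for a one-unit change in the variable, so its exponential is the odds ratio. The larger the absolute value of the coefficient, the more the variable affects the outcome, provided the variables are on comparable scales.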
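As a minimal sketch of the fitting step, the block below maximizes the log likelihood by Newton-Raphson for a single variable, using only the standard library. The function names and the toy data are hypothetical and chosen for illustration. With a binary variable, the fitted coefficient should match the log odds ratio of the 2x2 table.

```python
import math

def sigmoid(z):
    """Inverse logit: map a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def fit_logistic(x, y, n_iter=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on the log likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient (g) and information matrix (h) of the log likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information * step = gradient).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: x = 1 if exposed, y = 1 if the event occurred.
# Unexposed: 3 events out of 10. Exposed: 6 events out of 10.
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)        # odds ratio for exposed vs. unexposed
baseline_risk = sigmoid(b0)      # fitted probability when x = 0
```

Here `odds_ratio` can be checked by hand against the table: (6/4) / (3/7).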

Survival analysis

The main goal of survival analysis is to see whether a candidate variable affects patient survival. The hypothesis test is performed on the coefficient, which is the log hazard ratio of the given variable, just as a logistic regression coefficient is a log odds ratio.

KM-estimator

The Kaplan-Meier (KM) estimator and the Cox model are the usual tools for survival analysis. With the KM estimator, the log-rank test determines the significance of a variable's influence on survival. The log-rank test is used instead of a t-test or Wilcoxon rank-sum test because those tests cannot handle censored data, and the t-test also relies on a parametric assumption that is not guaranteed. However, the KM estimator has the following limitations:

• The KM estimator and the log-rank test are non-parametric, so they do not give a model of how a variable changes the hazard

• KM only works on a single categorical variable (strata). When the data are forcibly divided into too many strata, you may lose power.

• The log-rank test only tells the weight of evidence that the strata differ in risk, not the magnitude of the difference (the hazard ratio)
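The KM estimator and the two-group log-rank test can be sketched in a few lines of standard-library Python. This is an illustrative implementation under stated assumptions (two groups coded 0 and 1, event indicator 1 = event, 0 = censored), and the toy data are hypothetical. It returns only a chi-squared statistic and a p-value, which shows the last limitation: no hazard ratio comes out of it.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    """
    surv, curve = 1.0, []
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank(times, events, groups):
    """Two-group log-rank chi-squared statistic (df = 1) and its p-value."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        n = sum(1 for ti in times if ti >= t)  # at risk in both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        # Observed minus expected events in group 1 at this event time.
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical toy data: six patients, one censored at t = 5.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
groups = [1, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
chi2, p_value = logrank(times, events, groups)
```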

Cox model

The Cox model, on the other hand, tests multiple variables simultaneously under the following assumptions:

• Independent observations (individuals)

• Independent censoring

• Proportional hazards
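For reference, the proportional hazards assumption is usually written as below. The notation is assumed here for illustration: $h_0(t)$ is an unspecified baseline hazard, the $x_j$ are the variables and the $\beta_j$ are their coefficients.

$$h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_k x_k)$$

Under this form, a one-unit increase in $x_j$ multiplies the hazard by $e^{\beta_j}$ at every time $t$, which is why $e^{\beta_j}$ is read as a hazard ratio that does not change over time.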

The Wald test, likelihood ratio test and score test compare two nested models (regressions) under the null hypothesis that the larger model is no different from the smaller one. For the overall test of a Cox model, the smaller model has no variables, so the null hypothesis is that all of the coefficients are 0. Intuitively, these tests tell how well the combination of the given variables predicts patients' survival.

All three statistics follow a chi-squared distribution under the null hypothesis and are asymptotically equivalent. When the sample size is small, the likelihood ratio test is generally preferred.

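The Wald test compares the squared difference between the estimated and hypothesized parameter values with the variance of the estimate. For a single coefficient:

$$W = \frac{(\hat\beta - \beta_0)^2}{\widehat{\operatorname{Var}}(\hat\beta)} = \left(\frac{\hat\beta - \beta_0}{\operatorname{SE}(\hat\beta)}\right)^2$$

This is a squared z-statistic, which follows a chi-squared distribution when the estimate is normally distributed. Because the SE of a coefficient comes from large-sample MLE theory, as the square root of the corresponding diagonal element of the estimated variance-covariance matrix, a z-value is used instead of a t-value.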

In a univariate Cox model, the Wald test on the variable is equal to the Wald test of the overall model.

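The score test looks at the slope of the log likelihood at the hypothesized parameter value. At the estimated value (the MLE) the slope is 0, so a steep slope at the hypothesized value means it lies far from the estimate. For a single coefficient it simplifies to:

$$S = \frac{U(\beta_0)^2}{I(\beta_0)}$$

where the numerator contains $U(\beta_0)$, the slope (first derivative) of the log likelihood with respect to the coefficient at the hypothesized value, and $I(\beta_0)$ is the information, the negative second derivative at the same point.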

In a univariate Cox model with a single two-group variable, the score test is equal to the log-rank test (exactly so when there are no tied event times).

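The likelihood ratio test compares the log likelihood at the estimated and at the hypothesized parameter values. The simplified version is:

$$LR = 2\left[\ell(\hat\beta) - \ell(\beta_0)\right]$$

where $\ell$ is the log likelihood (for the Cox model, the log partial likelihood).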

Note that these overall tests should be distinguished from the hypothesis test done on each variable. The latter determines, while holding the other variables fixed, whether the given variable affects survival. When a Cox model is fitted on a single variable, the Wald test on the variable and on the overall model are the same (df = 1).
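To make the three overall tests concrete, here is a minimal standard-library sketch for a univariate Cox model. It assumes a single variable, no tied event times and a null value of 0, and it fits the coefficient by Newton-Raphson on the log partial likelihood. The function name and the toy data are hypothetical. With a binary variable, the slope at 0 is the observed minus expected event count of the log-rank test, so the score statistic reproduces the log-rank chi-squared.

```python
import math

def cox_univariate(times, events, x, n_iter=20):
    """Univariate Cox fit by Newton-Raphson; assumes no tied event times."""
    order = sorted(range(len(times)), key=lambda i: times[i])

    def loglik_score_info(beta):
        ll = score = info = 0.0
        for pos, i in enumerate(order):
            if events[i] != 1:
                continue  # censored subjects only contribute to risk sets
            risk = order[pos:]  # subjects still at risk at this event time
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            ll += beta * x[i] - math.log(s0)   # log partial likelihood
            score += x[i] - s1 / s0            # its slope
            info += s2 / s0 - (s1 / s0) ** 2   # minus its second derivative
        return ll, score, info

    ll0, u0, i0 = loglik_score_info(0.0)  # everything at the null value
    beta = 0.0
    for _ in range(n_iter):
        _, u, info = loglik_score_info(beta)
        beta += u / info
    ll_hat, _, info_hat = loglik_score_info(beta)
    return {
        "beta": beta,
        "hazard_ratio": math.exp(beta),
        "wald": beta ** 2 * info_hat,   # (beta_hat - 0)^2 / Var(beta_hat)
        "score": u0 ** 2 / i0,          # slope at 0, scaled by information at 0
        "lr": 2.0 * (ll_hat - ll0),     # twice the gain in log likelihood
    }

# Hypothetical toy data: six patients, one censored at t = 5.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
x = [1, 1, 0, 1, 0, 0]

result = cox_univariate(times, events, x)
```

On a sample this small the three statistics differ noticeably, which is what "asymptotically equivalent" allows for: they agree only as the sample grows.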