The survBootOutliers package provides two novel outlier detection methods for survival data.
Both methods perform outlier detection in a
multivariate setting, using Cox regression as the model and the concordance c-index as the measure of goodness
of fit. The first method is a single-step procedure that computes a delete-1 statistic based on bootstrap
hypothesis testing of the increase in the concordance c-index. The second method is a sequential
procedure that maximizes the c-index of the model through a greedy one-step-ahead search. Finally, we use
both methods to perform robust estimation for the Cox regression, removing from the regression a fraction of
the data ranked by their measure of outlyingness. Preliminary results on three different datasets show that this
improves both the estimation of the Cox regression coefficients and the predictive ability of the model.
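
Assuming the usual definition due to Harrell, the c-index is the proportion of comparable pairs of subjects whose predicted risks are ordered consistently with their observed times: a pair is comparable when the subject with the shorter time had an observed event, and it is concordant when that subject also has the higher predicted risk. The minimal R sketch below computes it by brute force in $O(n^2)$ time. The function \texttt{harrell\_c} is a hypothetical helper, not part of survBootOutliers, and pairs with tied times are simply skipped, a simplification that the survival package's \texttt{concordance()} handles explicitly.

\begin{Schunk}
\begin{Sinput}
# Hypothetical helper (not part of survBootOutliers): Harrell's C-index
# computed by brute force over all pairs.
#   time   - observed follow-up times
#   status - 1 if the event was observed, 0 if censored
#   risk   - predicted risk score (higher = earlier expected event),
#            e.g. the linear predictor of a fitted Cox model
harrell_c <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  for (i in seq_along(time)) {
    if (status[i] != 1) next      # the shorter time must be an observed event
    for (j in seq_along(time)) {
      if (time[j] > time[i]) {    # j outlived i (tied times are skipped)
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5   # tied predictions count as half
        }
      }
    }
  }
  concordant / comparable
}

# Toy example: four subjects, the third one censored.
time   <- c(2, 4, 6, 8)
status <- c(1, 1, 0, 1)
harrell_c(time, status, risk = c(4, 3, 2, 1))  # perfectly ordered risks: 1
harrell_c(time, status, risk = c(4, 1, 2, 3))  # 3 of 5 pairs concordant: 0.6
\end{Sinput}
\end{Schunk}

For a fitted \texttt{coxph} model, the survival package reports the c-index in the \texttt{concordance} component of \texttt{summary()}; the delete-1 statistic described above is built on the change in this value when a single observation is removed.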

This package provides three new methods for outlier detection in a survival analysis context.

The first method, OSD (One Step Deletion), is a sequential procedure that maximizes the c-index of a fitted Cox regression through a greedy one-step-ahead search. At each step, the observation whose removal yields the largest increase in concordance is permanently deleted from the dataset. The algorithm stops once $k$ observations have been removed, and these are considered the most outlying ones.

The second and third methods are based on the bootstrap. The second method, BHT (Bootstrap Hypothesis Testing), removes each observation in turn from the dataset and draws $B$ bootstrap samples from the remaining data; a hypothesis test is then performed on whether the $B$ resulting concordance variations are larger than zero. The observations with the lowest p-values are considered the most outlying.

The third method, DBHT (Dual Bootstrap Hypothesis Testing), draws $2B$ bootstrap samples for each observation: $B$ samples with the observation absent, as in BHT, and $B$ samples in which the observation under test is deliberately inserted. The hypothesis test also differs: the two resulting histograms of the concordance are tested for inequality. For non-outlying observations the two histograms are expected to be similar, whereas for outlying observations the histogram obtained when the observation is absent is expected to have a higher average concordance.
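
The OSD and DBHT procedures described above can be sketched in a few lines of R on top of the survival package. This is an illustration under stated assumptions, not the survBootOutliers implementation: the helpers \texttt{cox\_c}, \texttt{osd\_sketch} and \texttt{dbht\_sketch} are hypothetical, the c-index is read from \texttt{summary.coxph}, the \texttt{lung} data shipped with survival stands in for the package's example data, both DBHT bootstrap samples are assumed to keep the original sample size, and, since the text does not name the test used to compare the two DBHT histograms, a one-sided Wilcoxon rank-sum test (\texttt{wilcox.test}) is substituted for it.

\begin{Schunk}
\begin{Sinput}
library(survival)

# Hypothetical helper: C-index of a Cox model fitted to `data`,
# as reported by summary.coxph.
cox_c <- function(formula, data) {
  fit <- coxph(formula, data = data)
  unname(summary(fit)$concordance[1])
}

# OSD-style greedy search (sketch): at each step, permanently delete the
# observation whose removal gives the largest C-index; stop after k deletions.
# Returns row indices, most outlying first.
osd_sketch <- function(formula, data, k) {
  remaining <- seq_len(nrow(data))
  removed <- integer(0)
  for (step in seq_len(k)) {
    c_without <- vapply(remaining, function(i) {
      cox_c(formula, data[setdiff(remaining, i), , drop = FALSE])
    }, numeric(1))
    best <- remaining[which.max(c_without)]
    removed <- c(removed, best)
    remaining <- setdiff(remaining, best)
  }
  removed
}

# DBHT-style test (sketch): for each candidate i, compare B bootstrap C-indices
# with i absent against B bootstrap C-indices with i deliberately inserted.
# The one-sided Wilcoxon test is an assumed stand-in for the package's test.
# Returns one p-value per candidate; low values flag likely outliers.
dbht_sketch <- function(formula, data, B = 50,
                        candidates = seq_len(nrow(data))) {
  n <- nrow(data)
  vapply(candidates, function(i) {
    rest <- data[-i, , drop = FALSE]
    c_absent <- replicate(B, cox_c(formula,
      rest[sample.int(n - 1, n, replace = TRUE), , drop = FALSE]))
    c_present <- replicate(B, cox_c(formula,
      rbind(rest[sample.int(n - 1, n - 1, replace = TRUE), , drop = FALSE],
            data[i, , drop = FALSE])))
    wilcox.test(c_absent, c_present, alternative = "greater",
                exact = FALSE)$p.value
  }, numeric(1))
}

# Usage on a subset of the lung data (status: 1 = censored, 2 = dead).
set.seed(1)
dat <- lung[1:60, c("time", "status", "age", "sex")]
f <- Surv(time, status) ~ age + sex
osd_sketch(f, dat, k = 2)                      # two most outlying rows
dbht_sketch(f, dat, B = 20, candidates = 1:3)  # p-values for rows 1 to 3
\end{Sinput}
\end{Schunk}

Each OSD step refits the model once per remaining observation, so removing $k$ observations costs roughly $kn$ fits, while DBHT costs $2B$ fits per observation tested; this is why the example restricts DBHT to three candidates and a small $B$.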
\title{survBootOutliers: An R package for outlier detection in survival analysis}
The package also provides three more traditional methods, based on martingale residuals, deviance residuals and the Cox likelihood displacement.
These methods are based on the master's thesis ``Outlier detection in survival analysis'' at Instituto Superior Técnico, evaluated in May 2015. More details can be found in the \href{https://fenix.tecnico.ulisboa.pt/downloadFile/844820067124612/dissertacao.pdf}{full text of the thesis}.
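
The three traditional criteria listed above can be computed with the survival package. In the sketch below, which uses the \texttt{lung} data shipped with survival rather than the package's example data, martingale and deviance residuals come directly from \texttt{residuals.coxph}, while the likelihood displacement $LD_i = 2\,[\ell(\hat\beta) - \ell(\hat\beta_{(i)})]$, where $\ell$ is the full-data partial log-likelihood and $\hat\beta_{(i)}$ the estimate obtained without observation $i$, is computed by brute-force refitting. The helper \texttt{cox\_loglik} is hypothetical; it evaluates the partial log-likelihood with Breslow ties, so the model is fitted with \texttt{ties = "breslow"} to match. survBootOutliers may compute these criteria differently.

\begin{Schunk}
\begin{Sinput}
library(survival)

# Example data: lung from survival (status: 1 = censored, 2 = dead).
dat <- lung[, c("time", "status", "age", "sex")]
f   <- Surv(time, status) ~ age + sex
fit <- coxph(f, data = dat, ties = "breslow")

# Martingale and deviance residuals come directly from residuals.coxph.
r_mart <- residuals(fit, type = "martingale")
r_dev  <- residuals(fit, type = "deviance")

# Hypothetical helper: Cox partial log-likelihood with Breslow ties,
# evaluated at an arbitrary coefficient vector `beta`.
cox_loglik <- function(beta, time, event, X) {
  eta <- drop(X %*% beta)
  sum(vapply(which(event), function(i) {
    eta[i] - log(sum(exp(eta[time >= time[i]])))   # risk set: time >= t_i
  }, numeric(1)))
}

# Likelihood displacement LD_i = 2 * [l(beta_hat) - l(beta_hat_(-i))]:
# l is the full-data partial log-likelihood and beta_hat_(-i) is refitted
# without observation i (one refit per observation).
X     <- as.matrix(dat[, c("age", "sex")])
event <- dat$status == 2
l_hat <- cox_loglik(coef(fit), dat$time, event, X)
ld <- vapply(seq_len(nrow(dat)), function(i) {
  beta_i <- coef(coxph(f, data = dat[-i, ], ties = "breslow"))
  2 * (l_hat - cox_loglik(beta_i, dat$time, event, X))
}, numeric(1))

# Most outlying observations under each criterion (row indices).
head(order(abs(r_dev), decreasing = TRUE))
head(order(ld, decreasing = TRUE))
\end{Sinput}
\end{Schunk}

Refitting costs one Cox fit per observation; \texttt{residuals(fit, type = "dfbeta")} returns the approximate change in the coefficients when each observation is dropped, which offers a cheaper one-step alternative.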
\section{Example data}
The well-known Worcester Heart Attack Study data set is provided with the package and is used here as the example:
\begin{Schunk}
\begin{Sinput}
> library(survBootOutliers)
...
...