Title: | Robust Outliers Detection |
Version: | 0.0.0.3 |
Description: | Detects outliers using robust methods: the Median Absolute Deviation (MAD) for univariate outliers (Leys, Ley, Klein, Bernard, & Licata, 2013, <doi:10.1016/j.jesp.2013.03.013>) and the Mahalanobis-Minimum Covariance Determinant (MMCD) for multivariate outliers (Leys, Klein, Dominicy, & Ley, 2018, <doi:10.1016/j.jesp.2017.09.011>). The better-known but less robust Mahalanobis distance method is also provided, for comparison purposes only. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 6.1.1 |
Depends: | R (≥ 2.10) |
BugReports: | https://github.com/mdelacre/Routliers/issues |
Suggests: | knitr, rmarkdown, testthat |
Imports: | MASS, stats, graphics, ggplot2 |
NeedsCompilation: | no |
Packaged: | 2019-05-22 15:31:07 UTC; Administrateur |
Author: | Marie Delacre [aut, cre], Olivier Klein [aut] |
Maintainer: | Marie Delacre <marie.delacre@ulb.ac.be> |
Repository: | CRAN |
Date/Publication: | 2019-05-23 08:30:03 UTC |
Routliers: Robust Outliers Detection
Description
Detects outliers using robust methods: the Median Absolute Deviation (MAD) for univariate outliers (Leys, Ley, Klein, Bernard, & Licata, 2013, <doi:10.1016/j.jesp.2013.03.013>) and the Mahalanobis-Minimum Covariance Determinant (MMCD) for multivariate outliers (Leys, Klein, Dominicy, & Ley, 2018, <doi:10.1016/j.jesp.2017.09.011>). The better-known but less robust Mahalanobis distance method is also provided, for comparison purposes only.
Author(s)
Maintainer: Marie Delacre marie.delacre@ulb.ac.be
Authors:
Olivier Klein Klein.Olivier@ulb.ac.be
See Also
Useful links:
Report bugs at https://github.com/mdelacre/Routliers/issues
Data collected the day after the terrorist attacks in Brussels (on the morning of 22 March 2016) assessing the Sense of Coherence, anxiety and depression symptoms of 2077 subjects (1056 were in Brussels during the terrorist attacks, and 1021 were not).
Description
The Sense of Coherence was assessed with the SOC-13 (Antonovsky, 1987), which has 13 items rated on a 7-point Likert scale. Anxiety and depression were assessed with the HSCL-25 (Derogatis, Lipman, Rickels, Uhlenhuth & Covi, 1974). Subjects indicated on a 4-point Likert scale how much they were bothered or upset by each trouble during the last 14 days (1 = not at all; 2 = a little; 3 = quite a bit; 4 = a lot).
Usage
data(Attacks)
Format
A data frame with 2077 rows and 46 variables:
- age
age of participants, in years
- presencebxl
whether participants were present in Brussels during the terrorist attacks; 1 = yes; -1 = no
- genre
participant gender, 1 = female; -1 = male
- soc1
You have the feeling that you do not really care about what is going on around you: 1 = Very rarely or rarely; 7 = Often
- soc1r
item 1 reversed
- soc2
Has it happened in the past that you were surprised by the behaviour of people you thought you knew very well?: 1 = Never; 7 = Always
- soc2r
item 2 reversed
- soc3
Has it happened that people you were counting on disappointed you?: 1 = Never; 7 = Always
- soc3r
item 3 reversed
- soc4
Until now, your life: 1 = Has had no clear goals or purpose at all; 7 = Has had very clear goals and purpose
- soc5
Do you have the feeling that you are being treated unfairly?: 1 = Very often; 7 = Very rarely or never
- soc6
Do you have the feeling that you are in an unfamiliar situation and do not know what to do?: 1 = Very often; 7 = Very rarely or never
- soc7
Doing the things you do every day is: 1 = A source of pleasure and satisfaction; 7 = A source of deep pain and boredom
- soc7r
item 7 reversed
- soc8
Do you have confused ideas or feelings?: 1 = Very often; 7 = Very rarely or never
- soc9
Do you ever have inner feelings that you would rather not have?: 1 = Very often; 7 = Very rarely or never
- soc10
Many people (even those with a strong character) sometimes feel like losers. Have you ever felt this way in the past?: 1 = Never; 7 = Very often
- soc10r
item 10 reversed
- soc11
When something happens, you generally find that: 1 = You overestimate or underestimate its importance; 7 = You see things in the right proportion
- soc12
Do you have the feeling that the things you do in your daily life have little meaning?: 1 = Very often; 7 = Very rarely or never
- soc13
You have the feeling that you are not sure you can keep yourself under control: 1 = Very often; 7 = Very rarely or never
- hsc1
Headache
- hsc2
Trembling
- hsc3
Tiredness or dizziness
- hsc4
Nervousness, inner agitation
- hsc5
Sudden fear for no particular reason
- hsc6
Constantly fearful or anxious
- hsc7
Heart racing
- hsc8
Feeling tense, stressed
- hsc9
Attack of anxiety or panic
- hsc10
So restless that it is hard to sit still
- hsc11
Lack of energy, everything goes more slowly than usual
- hsc12
Blames oneself easily
- hsc13
Cries easily
- hsc14
Thinks about killing oneself
- hsc15
Poor appetite
- hsc16
Sleep problems
- hsc17
Feeling of hopelessness when thinking about the future
- hsc18
Discouraged, gloomy
- hsc19
Feeling of loneliness
- hsc20
Loss of sexual interest and desire
- hsc21
Feeling of being trapped or held captive
- hsc22
Restless or worries a lot
- hsc23
No interest in anything
- hsc24
Feeling that everything is tiring
- hsc25
Feeling of being useless
Details
Items were administered in French.
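The reversed items (soc1r, soc2r, soc3r, soc7r, soc10r) mirror the original 7-point responses so that all items point in the same direction before they are averaged. The sketch below assumes the conventional rule for a 1 to 7 scale, reversed = 8 - raw. That rule is an assumption and is not stated in this manual, and `reverse_item` is a hypothetical helper, not a Routliers function.

```r
# Hypothetical helper (not part of Routliers): reverse-score a Likert item.
# Assumes the conventional rule reversed = (min + max) - raw.
reverse_item <- function(x, min_score = 1, max_score = 7) {
  (min_score + max_score) - x
}

raw <- c(1, 4, 7, NA)          # toy responses, not taken from Attacks
reversed <- reverse_item(raw)  # missing responses stay missing
print(reversed)
```

If the assumption holds for this dataset, comparing `reverse_item(Attacks$soc1)` with `Attacks$soc1r` would confirm it.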
Study five of Rogers, T. & Milkman, K. L. (2016). Reminders through association. Psychological Science, 27, 973-986.
Description
Participants answer many questions (in an 11-page survey). For 5 questions (indicated by $$ at the beginning of the question), they are told that there is a correct answer and that they will earn $0.06 if they provide this correct answer. At the beginning of the experiment, they are also told that they will earn a $0.60 bonus if they choose answer E on the last question (whether or not this is the correct answer).
Usage
data(Intention)
Format
- age
age
- choice
Did participants choose to have a reminder? (1 = yes; 0 = no). Note that in conditions 2 and 4, participants had no choice; therefore, 0 is coded for all subjects in these two conditions
- Condition
Experimental condition:
Condition 1 = free-reminder-through-association condition: participants read that they can choose to have (for free) an image of an elephant (presented on screen) appear at the bottom of page 11 as a reminder to select answer E;
Condition 2 = no-reminder condition: no reminders;
Condition 3 = costly-reminder-through-association condition: participants read that if they pay $0.03, an image of an elephant (presented on screen) will appear at the bottom of page 11 as a reminder to select answer E;
Condition 4 = forced-reminder-through-association condition: participants read that an image of an elephant (presented on screen) will appear at the bottom of page 11 as a reminder to select answer E.
- correct
Did participants earn $0.60 bonus? (1 = yes; 0 = no)
- dup
No available information
- fee_for_reminder
How much was paid for a reminder? ($0.00 or $0.03)
- filter_.
No available information
- final_problem
Earned money for answering E on the last question: $0.00 (if E was not selected) or $0.60 (if E was selected)
- gender
Gender; 0 = male; 1 = female
- id
participant id
- plus
Money earned at the beginning ($0.06 for all participants)
- problem1
First question for which participants earn a $0.03 bonus if they provide the correct answer
- problem2
Second question for which participants earn a $0.03 bonus if they provide the correct answer
- problem3
Third question for which participants earn a $0.03 bonus if they provide the correct answer
- problem4
Fourth question for which participants earn a $0.03 bonus if they provide the correct answer
- problem5
Fifth question for which participants earn a $0.03 bonus if they provide the correct answer
- Total_Amount_Earned
Intention$final_problem minus Intention$fee_for_reminder. There are 4 possible outcomes: (1) -$0.03, if a reminder was paid for and answer E was not selected on the last question; (2) $0.00, if no reminder was paid for and answer E was not selected on the last question; (3) $0.57, if a reminder was paid for and answer E was selected on the last question; (4) $0.60, if no reminder was paid for and answer E was selected on the last question
- Total_Amount_Earned_if.forced.to.pay.for.cue
Equals Intention$Total_Amount_Earned in all but one condition: in condition 1 (free-reminder-through-association condition), Intention$Total_Amount_Earned_if.forced.to.pay.for.cue = Intention$Total_Amount_Earned - 0.03
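The four possible values of Total_Amount_Earned follow directly from the subtraction described above. The sketch below reproduces them on a toy grid of fee and payoff values; it does not read the Intention data, and the `outcomes` data frame is introduced here only for illustration.

```r
# Toy grid: every combination of reminder fee and last-question payoff.
outcomes <- expand.grid(fee_for_reminder = c(0.03, 0),
                        final_problem    = c(0, 0.60))

# Same arithmetic as described for Total_Amount_Earned.
outcomes$Total_Amount_Earned <- outcomes$final_problem - outcomes$fee_for_reminder

print(outcomes)
```

The rows appear in the same order as outcomes (1) to (4) in the description of Total_Amount_Earned.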
Replication of Experiments Evaluating Impact of Psychological Distance on Moral Judgment (Eyal, Liberman & Trope, 2008; Gong & Medin, 2012) Study 2
Description
For 6 scenarios, participants have to evaluate the wrongness of actions on a scale ranging from 1 (not ok) to 5 (completely ok).
Contributors: Biljana Jokic and Iris Zezelj.
OSF link: https://osf.io/8wqvc/
Usage
data(Morality)
Format
A data frame with 145 rows and 10 columns:
- number
participant id
- Orig_rep
Is participant English or Serbian?
- social_distance
Is the person in the scenario someone participants know (e.g. a colleague or neighbor)?
- swing_r
A girl pushing another kid off a swing because she really wants to use it before going home
- flag_r
A woman cutting up a national flag into small pieces and using it to clean her house
- hands_r
A man eating his food with his hands, like most of his family members, also in public, after he washes them
- mother_r
A loving man who promised his dying mother that he would visit her grave every week but didn't keep his promise because he was very busy
- kiss_r
Two cousins kissing each other passionately on the mouth, in secret, because they are in love
- dog_r
Eating our dog that was hit by a car in front of our house and was killed
- mean_judge_r
average judgment across all scenarios
MAD function to detect outliers
Description
Detecting univariate outliers using the robust median absolute deviation
Usage
outliers_mad(x, b, threshold, na.rm)
Arguments
x |
vector of values from which we want to compute outliers |
b |
constant depending on the assumed distribution underlying the data, that equals 1/Q(0.75). When the normal distribution is assumed, the constant 1.4826 is used (and it makes the MAD and SD of normal distributions comparable). |
threshold |
the number of MADs from the median beyond which a value is considered an outlier |
na.rm |
set whether Missing Values should be excluded (na.rm = TRUE) or not (na.rm = FALSE) - defaults to TRUE |
Value
Returns Call, median, MAD, limits of acceptable range of values, number of outliers
Examples
#### Run outliers_mad
x <- runif(150,-100,100)
outliers_mad(x, b = 1.4826,threshold = 3,na.rm = TRUE)
#### Results can be stored in an object.
data(Intention)
res1 <- outliers_mad(Intention$age)
# Moreover, elements can be extracted from the result,
# such as all the extremely high values,
# which are sorted in ascending order.
names(res1) # lists the elements available for extraction
#### The function should be applied to a dimension (scale score) rather than to isolated items
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6",
"soc7r","soc8","soc9","soc10r","soc11","soc12","soc13")])
res=outliers_mad(x = SOC)
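The decision rule behind these outputs can be sketched in base R: a value is treated as an outlier when it lies more than `threshold` MADs from the median, with the MAD scaled by `b`. The helper `mad_limits` and the names of its list elements are hypothetical and only illustrate the rule of Leys et al. (2013); they are not the internals or the return structure of outliers_mad.

```r
# Hypothetical helper (not part of Routliers): median +/- threshold * MAD rule.
mad_limits <- function(x, b = 1.4826, threshold = 3, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  med <- stats::median(x)
  # stats::mad() multiplies the raw median absolute deviation by `constant`
  mad_val <- stats::mad(x, center = med, constant = b)
  limits <- c(lower = med - threshold * mad_val,
              upper = med + threshold * mad_val)
  list(median = med, MAD = mad_val, limits = limits,
       outliers = x[x < limits[["lower"]] | x > limits[["upper"]]])
}

# Under normality, b = 1/Q(0.75) is the inverse of the 75th percentile
# of the standard normal distribution.
b_normal <- 1 / stats::qnorm(0.75)

x <- c(1, 2, 3, 4, 5, 6, 7, 100)  # toy data with one extreme value
res <- mad_limits(x)
print(res)
```

Because the median and the MAD are barely affected by the extreme value, the limits stay close to the bulk of the data, which is the motivation for preferring this rule to one based on the mean and standard deviation.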
mahalanobis function to detect outliers
Description
Detecting multivariate outliers using the Mahalanobis distance
Usage
outliers_mahalanobis(x, alpha, na.rm)
Arguments
x |
matrix of bivariate values from which we want to compute outliers |
alpha |
nominal type I error probability (by default .01) |
na.rm |
set whether Missing Values should be excluded (na.rm = TRUE) or not (na.rm = FALSE) - defaults to TRUE |
Value
Returns Call, Max distance, number of outliers
Examples
#### Run outliers_mahalanobis
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6","soc7r",
"soc8","soc9","soc10r","soc11","soc12","soc13")])
HSC <- rowMeans(Attacks[,22:46])
res <- outliers_mahalanobis(x = cbind(SOC,HSC), na.rm = TRUE)
# A list of elements can be extracted from the function,
# such as the position of outliers in the dataset
# and the coordinates of outliers
res$outliers_pos
res$outliers_val
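For comparison, the classical Mahalanobis rule can be sketched with base R alone. The code below assumes that an observation is flagged when its squared Mahalanobis distance exceeds the chi-square quantile at 1 - alpha with as many degrees of freedom as there are columns; this is the textbook rule and is assumed, not confirmed, to match what outliers_mahalanobis does internally. The variable names are local to the sketch.

```r
# Toy bivariate data with one clearly extreme observation (row 11).
c1 <- c(1, 4, 3, 6, 5, 2, 1, 3, 2, 4, 7, 3, 6, 3, 4, 6)
c2 <- c(1, 3, 4, 6, 5, 7, 1, 4, 3, 7, 50, 8, 8, 15, 10, 6)
x <- cbind(c1, c2)
alpha <- 0.01

# stats::mahalanobis() returns SQUARED distances from the centre.
d2 <- stats::mahalanobis(x, center = colMeans(x), cov = stats::cov(x))

# Chi-square cutoff with df = number of variables.
cutoff <- stats::qchisq(1 - alpha, df = ncol(x))

flagged_pos <- which(d2 > cutoff)           # row positions of flagged points
flagged_val <- x[flagged_pos, , drop = FALSE]  # their coordinates
print(flagged_val)
```

Note that the mean vector and covariance matrix used here are themselves computed from all observations, outliers included, which is why this method is described as less robust.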
MCD function to detect outliers
Description
Detecting multivariate outliers using the Minimum Covariance Determinant approach
Usage
outliers_mcd(x, h, alpha, na.rm)
Arguments
x |
matrix of bivariate values from which we want to compute outliers |
h |
proportion of dataset to use in order to compute sample means and covariances |
alpha |
nominal type I error probability (by default .01) |
na.rm |
set whether Missing Values should be excluded (na.rm = TRUE) or not (na.rm = FALSE) - defaults to TRUE |
Value
Returns Call, Max distance, number of outliers
Examples
#### Run outliers_mcd
# The default is to use 75% of the dataset in order to compute sample means and covariances
# This proportion equals 1 - breakdown point (i.e. h = .75 <--> breakdown point = .25)
# This breakdown point is recommended by Leys et al. (2018)
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6","soc7r",
"soc8","soc9","soc10r","soc11","soc12","soc13")])
HSC <- rowMeans(Attacks[,22:46])
res <- outliers_mcd(x = cbind(SOC,HSC), h = .75)
res
# Moreover, a list of elements can be extracted from the function,
# such as the position of outliers in the dataset
# and the coordinates of outliers
res$outliers_pos
res$outliers_val
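The idea behind the MCD approach can be sketched with `MASS::cov.mcd` (MASS is listed under Imports): the centre and covariance are estimated from the most concentrated subset of the data, and distances are then computed from those robust estimates. Mapping `h` to `quantile.used = floor(h * nrow(x))` and using a chi-square cutoff are assumptions made for this illustration, not a description of the internals of outliers_mcd.

```r
# Toy bivariate data with one clearly extreme observation (row 11).
c1 <- c(1, 4, 3, 6, 5, 2, 1, 3, 2, 4, 7, 3, 6, 3, 4, 6)
c2 <- c(1, 3, 4, 6, 5, 7, 1, 4, 3, 7, 50, 8, 8, 15, 10, 6)
x <- cbind(c1, c2)
h <- 0.75
alpha <- 0.01

set.seed(1)  # cov.mcd may resample subsets on larger datasets
# Robust centre and covariance from a subset of about h * n observations.
fit <- MASS::cov.mcd(x, quantile.used = floor(h * nrow(x)))

# Squared distances of ALL observations from the robust estimates.
d2_mcd <- stats::mahalanobis(x, center = fit$center, cov = fit$cov)
cutoff <- stats::qchisq(1 - alpha, df = ncol(x))

flagged_pos <- which(d2_mcd > cutoff)
print(flagged_pos)
```

Because the extreme observation does not contribute to the robust covariance, its distance is much larger than under the classical Mahalanobis rule.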
Plotting function for the mad
Description
plotting data and highlighting univariate outliers detected with the outliers_mad function
Usage
plot_outliers_mad(res, x, pos_display = FALSE)
Arguments
res |
result of the outliers_mad function from which we want to create a plot |
x |
data from which the outliers_mad function was performed |
pos_display |
set whether the position of outliers in the dataset should be displayed on the graph (pos_display = TRUE) or not (pos_display = FALSE) |
Value
None
Examples
#### Run outliers_mad and perform plot_outliers_mad on the result
data(Intention)
res=outliers_mad(Intention$age)
plot_outliers_mad(res,x=Intention$age)
### when the number of outliers is small, one can display the outliers position in the dataset
x=c(rnorm(10),3)
res2=outliers_mad(x)
plot_outliers_mad(res2,x,pos_display=TRUE)
Plotting function for the Mahalanobis distance approach
Description
plotting data and highlighting multivariate outliers detected with the mahalanobis distance approach
Usage
plot_outliers_mahalanobis(res, x, pos_display = FALSE)
Arguments
res |
result of the outliers_mahalanobis function from which we want to create a plot |
x |
matrix of multivariate values from which we want to compute outliers. The last column of the matrix is treated as the dependent variable (DV) for the regression line. |
pos_display |
set whether the position of outliers in the dataset should be displayed on the graph (pos_display = TRUE) or not (pos_display = FALSE) |
Details
Plots the data and highlights the multivariate outliers detected with the Mahalanobis distance approach. Additionally, the plot shows two regression lines: the first is fitted to all data and the second to all observations except the detected outliers. This makes it possible to see how much the outliers influence the regression line.
Value
None
Examples
#### Run plot_outliers_mahalanobis
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6",
"soc7r","soc8","soc9","soc10r","soc11","soc12","soc13")])
HSC <- rowMeans(Attacks[,22:46])
res <- outliers_mahalanobis(x = cbind(SOC,HSC))
plot_outliers_mahalanobis(res, x = cbind(SOC,HSC))
# it's also possible to display the position of the multivariate outliers on the graph
# preferably, when the number of multivariate outliers is not too high
c1 <- c(1,4,3,6,5,2,1,3,2,4,7,3,6,3,4,6)
c2 <- c(1,3,4,6,5,7,1,4,3,7,50,8,8,15,10,6)
res2 <- outliers_mahalanobis(x = cbind(c1,c2))
plot_outliers_mahalanobis(res2, x = cbind(c1,c2),pos_display = TRUE)
# When no outliers are detected, only one regression line is displayed
c3 <- c(1,4,3,6,5)
c4 <- c(1,3,4,6,5)
res3 <- outliers_mahalanobis(x = cbind(c3,c4))
plot_outliers_mahalanobis(res3,x = cbind(c3,c4))
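The comparison of the two regression lines described in the Details can also be made numerically. The sketch below fits the last column on the first with `stats::lm`, once on all rows and once without a flagged row. Treating row 11 as the only flagged observation is an assumption made for illustration; in practice the positions would come from the detection function.

```r
# Toy bivariate data; the last column (c2) plays the role of the DV.
c1 <- c(1, 4, 3, 6, 5, 2, 1, 3, 2, 4, 7, 3, 6, 3, 4, 6)
c2 <- c(1, 3, 4, 6, 5, 7, 1, 4, 3, 7, 50, 8, 8, 15, 10, 6)
dat <- data.frame(c1, c2)

flagged <- 11  # assumed outlier position, hard-coded for illustration

fit_all   <- stats::lm(c2 ~ c1, data = dat)              # all observations
fit_clean <- stats::lm(c2 ~ c1, data = dat[-flagged, ])  # outlier removed

slopes <- c(all = unname(stats::coef(fit_all)["c1"]),
            without_outliers = unname(stats::coef(fit_clean)["c1"]))
print(slopes)
```

A large gap between the two slopes indicates that the flagged observation has a strong influence on the fitted line.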
Plotting function for the MCD
Description
Plots the data and highlights the multivariate outliers detected with the MCD function. Additionally, the plot shows two regression lines: the first is fitted to all data and the second to all observations except the detected outliers. This makes it possible to see how much the outliers influence the regression line.
Usage
plot_outliers_mcd(res, x, pos_display = FALSE)
Arguments
res |
result of the outliers_mcd function from which we want to create a plot |
x |
matrix of multivariate values from which we want to compute outliers. Last column of the matrix is considered as the DV in the regression line. |
pos_display |
set whether the position of outliers in the dataset should be displayed on the graph (pos_display = TRUE) or not (pos_display = FALSE) |
Value
None
Examples
#### Run plot_outliers_mcd
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6",
"soc7r","soc8","soc9","soc10r","soc11","soc12","soc13")])
HSC <- rowMeans(Attacks[,22:46])
res <- outliers_mcd(x = cbind(SOC,HSC),na.rm=TRUE,h=.75)
plot_outliers_mcd(res,x = cbind(SOC,HSC))
# it's also possible to display the position of the multivariate outliers on the graph
# preferably, when the number of multivariate outliers is not too high
c1 <- c(1,4,3,6,5,2,1,3,2,4,7,3,6,3,4,6)
c2 <- c(1,3,4,6,5,7,1,4,3,7,50,8,8,15,10,6)
res2 <- outliers_mcd(x = cbind(c1,c2),na.rm=TRUE)
plot_outliers_mcd(res2, x=cbind(c1,c2),pos_display=TRUE)
# When no outliers are detected, only one regression line is displayed
c3 <- c(1,2,3,1,4,3,5,5)
c4 <- c(1,2,3,1,5,3,5,5)
res3 <- outliers_mcd(x = cbind(c3,c4),na.rm=TRUE)
plot_outliers_mcd(res3,x=cbind(c3,c4),pos_display=TRUE)