atypical_values analyses columns containing character data and checks whether they could be stored as a different data type. Three analyses are available: "integer" flags integer vectors that were converted to strings; "boolean" flags vectors containing 'true' and 'false' or 'yes' and 'no' (in various abbreviations and capitalizations), which suggests the vector could be converted to logical; and "numeric" flags numeric values written with a comma instead of a dot as the decimal separator. The results are returned as a list with one element per performed analysis, each indexed by column name.
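The three checks can be approximated with simple pattern matching. The sketch below is a hypothetical re-implementation for illustration only, not the package's actual code; the helper names (looks_integer, looks_boolean, looks_comma_numeric) and the exact regular expressions are assumptions. The boolean helper returns the same 0/1/2 coding that the package reports.

```r
# Hypothetical helpers approximating the three analyses of atypical_values().
# The patterns are assumptions, not the package's own implementation.

# "integer": every non-missing value is an optionally signed run of digits
looks_integer <- function(x) {
  x <- trimws(as.character(x[!is.na(x)]))
  length(x) > 0 && all(grepl("^[+-]?[0-9]+$", x))
}

# "boolean": 1 if all values are t/true/f/false, 2 if all are y/yes/n/no,
# 0 otherwise; matching ignores case and surrounding whitespace
looks_boolean <- function(x) {
  x <- tolower(trimws(as.character(x[!is.na(x)])))
  if (length(x) == 0) return(0)
  if (all(x %in% c("t", "true", "f", "false"))) return(1)
  if (all(x %in% c("y", "yes", "n", "no"))) return(2)
  0
}

# "numeric": every non-missing value is a decimal number that uses a comma
# as the decimal separator (an optional exponent such as e-05 is allowed)
looks_comma_numeric <- function(x) {
  x <- trimws(as.character(x[!is.na(x)]))
  length(x) > 0 && all(grepl("^[+-]?[0-9]*,[0-9]+([eE][+-]?[0-9]+)?$", x))
}

# Usage on a small data frame of character columns
df <- data.frame(
  a = c("1,5", "-0,25"),
  d = c("yES", "n"),
  g = c("12", "7"),
  stringsAsFactors = FALSE
)
vapply(df, looks_integer, logical(1))        # a = FALSE, d = FALSE, g = TRUE
vapply(df, looks_boolean, numeric(1))        # a = 0, d = 2, g = 0
vapply(df, looks_comma_numeric, logical(1))  # a = TRUE, d = FALSE, g = FALSE
```

Requiring every non-missing value to match keeps the sketch conservative: a column that mixes comma and dot decimals, such as column b in the examples, is not flagged as comma-numeric by these helpers.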

atypical_values(df, variables, analyses)

Arguments

df

- A data frame

variables

- A character vector of the names of the columns in df to analyse. When omitted, all columns are analysed, as in the first example.

analyses

- A character vector of the analyses to perform: any of "integer", "boolean" and "numeric". When omitted, all three are performed, as in the second example.

Value

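A list with one element per performed analysis. The "integer" and "numeric" elements are named logical vectors that are TRUE for columns which could be converted to that type. The "boolean" element is a named numeric vector with 1 for columns containing 'true'/'false' values, 2 for columns containing 'yes'/'no' values, and 0 otherwise.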
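Because the result is a plain named list, it can drive type conversion directly. The sketch below is a hypothetical helper, convert_flagged(), which is not part of the package; it assumes the list structure described above and uses a hand-built result list in place of a real atypical_values() call so that it runs on its own.

```r
# Hypothetical helper: convert the columns flagged in a result list shaped
# like the documented return value of atypical_values().
convert_flagged <- function(df, res) {
  if (!is.null(res$integer)) {
    for (col in names(res$integer)[res$integer]) {
      # as.character() first so factor columns are parsed by label, not code
      df[[col]] <- as.integer(as.character(df[[col]]))
    }
  }
  if (!is.null(res$numeric)) {
    for (col in names(res$numeric)[res$numeric]) {
      # replace the decimal comma with a dot before parsing
      df[[col]] <- as.numeric(gsub(",", ".", as.character(df[[col]]), fixed = TRUE))
    }
  }
  if (!is.null(res$boolean)) {
    for (col in names(res$boolean)[res$boolean > 0]) {
      v <- tolower(trimws(as.character(df[[col]])))
      # t/true and y/yes become TRUE, other values FALSE; NA stays NA
      df[[col]] <- ifelse(is.na(v), NA, v %in% c("t", "true", "y", "yes"))
    }
  }
  df
}

# Hand-built input mirroring the documented result structure
df <- data.frame(
  a = c("1,5", "-0,25"),
  d = c("yES", "n"),
  g = c("12", "7"),
  stringsAsFactors = FALSE
)
res <- list(
  integer = c(a = FALSE, d = FALSE, g = TRUE),
  boolean = c(a = 0, d = 2, g = 0),
  numeric = c(a = TRUE, d = FALSE, g = FALSE)
)
out <- convert_flagged(df, res)
str(out)  # a becomes numeric, d logical, g integer
```

Mapping 'yes' to TRUE and 'no' to FALSE is an assumption of this sketch; the package itself only reports which coding (1 or 2) a column appears to use.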

Examples

yes_no <- c("yES", "n", "y", "No", "yes", "nO")
true_false <- c("f", "t", "TrUe", "FaLsE")

df <- data.frame(
  a = gsub("\\.", ",", as.character(rnorm(10))),
  b = c(gsub("\\.", ",", as.character(rnorm(5))), rnorm(5)),
  c = as.character(sample(c(TRUE, FALSE), 10, replace = TRUE)),
  d = sample(yes_no, 10, replace = TRUE),
  e = sample(true_false, 10, replace = TRUE),
  f = c(sample(yes_no, 5, replace = TRUE), sample(true_false, 5, replace = TRUE)),
  g = as.character(sample(1:100, 10)),
  stringsAsFactors = FALSE
)
atypical_values(df, analyses = c("integer", "boolean"))
#> $integer
#>     a     b     c     d     e     f     g 
#> FALSE FALSE FALSE FALSE FALSE FALSE  TRUE 
#> 
#> $boolean
#> a b c d e f g 
#> 0 0 1 2 1 0 0 
#> 
atypical_values(df, variables = c("a", "d", "e"))
#> $integer
#>     a     d     e 
#> FALSE FALSE FALSE 
#> 
#> $boolean
#> a d e 
#> 0 2 1 
#> 
#> $numeric
#>     a     d     e 
#>  TRUE FALSE FALSE 
#>