R/atypical_values.R
atypical_values.Rd
`atypical_values()` analyses the character columns of a data frame and checks whether they could be stored as a different data type. Three analyses are available: "integer" flags columns that hold integers converted to strings; "boolean" flags columns that contain 'true' and 'false' or 'yes' and 'no' values, in any abbreviation or capitalization, which suggests the column could be converted to a boolean; "numeric" flags columns in which numeric values were written with a comma instead of a dot as the decimal separator. The result is a list with one element per performed analysis, each element named by column.
atypical_values(df, variables, analyses)
- df: A data frame.
- variables: A character vector with the names of the columns in the data frame on which the analyses will be performed.
- analyses: A character vector with the names of the analyses to perform on the data frame ("integer", "boolean", "numeric").
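A list with one named element per performed analysis: a logical vector for "integer" and for "numeric", and a numeric vector for "boolean" in which 1 indicates 'true'/'false' values, 2 indicates 'yes'/'no' values, and 0 indicates neither.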
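The three checks described above can be sketched in base R as follows. This is a minimal illustration only, not the package's actual implementation: the helper names `looks_integer()`, `looks_comma_numeric()` and `boolean_code()` are hypothetical, and the exact matching rules (accepted abbreviations, handling of `NA`, whitespace, exponents) are assumptions.

```r
# Hypothetical helpers; the real atypical_values() may use different rules.

# TRUE if every non-missing value is an integer written as a string.
looks_integer <- function(x) {
  x <- trimws(x[!is.na(x)])
  length(x) > 0 && all(grepl("^[+-]?[0-9]+$", x))
}

# TRUE if every non-missing value is a number that uses a comma
# as the decimal separator (an optional exponent is allowed).
looks_comma_numeric <- function(x) {
  x <- trimws(x[!is.na(x)])
  length(x) > 0 && all(grepl("^[+-]?[0-9]+,[0-9]+([eE][+-]?[0-9]+)?$", x))
}

# 1 for 'true'/'false' style values, 2 for 'yes'/'no' style values,
# 0 otherwise (including a mix of the two styles).
boolean_code <- function(x) {
  x <- tolower(trimws(x[!is.na(x)]))
  if (length(x) == 0) return(0)
  if (all(x %in% c("true", "false", "t", "f"))) return(1)
  if (all(x %in% c("yes", "no", "y", "n"))) return(2)
  0
}

# Small deterministic data set to try the helpers on.
demo <- data.frame(
  a = c("0,5", "-1,25", "3,0"),
  d = c("yES", "n", "No"),
  e = c("TrUe", "f", "t"),
  g = c("3", "14", "-7"),
  stringsAsFactors = FALSE
)

checks <- list(
  integer = vapply(demo, looks_integer, logical(1)),
  boolean = vapply(demo, boolean_code, numeric(1)),
  numeric = vapply(demo, looks_comma_numeric, logical(1))
)
print(checks)
```

Under these assumed rules a column that mixes 'yes'/'no' and 'true'/'false' values gets code 0, which matches the behaviour shown for column `f` in the examples.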
yes_no <- c("yES", "n",'y',"No",'yes',"nO")
true_false <- c('f','t','TrUe','FaLsE')
df <- data.frame(
'a' = gsub('\\.',',',as.character(rnorm(10))),
'b' = c(gsub('\\.',',',as.character(rnorm(5))),rnorm(5)),
'c' = as.character(sample(c(TRUE,FALSE), size=10, replace=TRUE)),
'd' = sample(yes_no, 10, replace=TRUE),
'e' = sample(true_false,10, replace=TRUE),
'f' = c(sample(yes_no,5, replace=TRUE),sample(true_false,5, replace=TRUE)),
'g' = as.character(sample(1:100,10))
)
atypical_values(df,analyses = c("integer","boolean"))
#> $integer
#> a b c d e f g
#> FALSE FALSE FALSE FALSE FALSE FALSE TRUE
#>
#> $boolean
#> a b c d e f g
#> 0 0 1 2 1 0 0
#>
atypical_values(df, variables = c('a','d','e'))
#> $integer
#> a d e
#> FALSE FALSE FALSE
#>
#> $boolean
#> a d e
#> 0 2 1
#>
#> $numeric
#> a d e
#> TRUE FALSE FALSE
#>
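One possible way to act on a result of this shape is to convert the flagged columns. The sketch below does not call the package: `res` is a hand-written list that only imitates the structure of the output above, and the conversion rules (for example, which strings count as `TRUE`) are assumptions.

```r
# Hand-written stand-in for a result of atypical_values(); values are illustrative.
res <- list(
  integer = c(a = FALSE, d = FALSE, g = TRUE),
  boolean = c(a = 0, d = 2, g = 0),
  numeric = c(a = TRUE, d = FALSE, g = FALSE)
)

dat <- data.frame(
  a = c("0,5", "-1,25"),
  d = c("yES", "n"),
  g = c("3", "14"),
  stringsAsFactors = FALSE
)

# Columns flagged as integers stored as strings.
for (col in names(res$integer)[res$integer]) {
  dat[[col]] <- as.integer(dat[[col]])
}

# Columns flagged as numbers with a decimal comma: swap the comma for a dot first.
for (col in names(res$numeric)[res$numeric]) {
  dat[[col]] <- as.numeric(sub(",", ".", dat[[col]], fixed = TRUE))
}

# Columns flagged as boolean (code 1 or 2): treat these spellings as TRUE.
# Note that missing values would become FALSE with this simple rule.
for (col in names(res$boolean)[res$boolean > 0]) {
  dat[[col]] <- tolower(trimws(dat[[col]])) %in% c("true", "t", "yes", "y")
}

str(dat)
```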