`redundant_cols()` identifies columns that appear redundant and, if requested, deletes them.

redundant_cols(
  df,
  variables = colnames(df),
  correlated = FALSE,
  corr_treshold = NULL,
  delete = FALSE
)

Arguments

df

Data frame.

variables

Character vector with the names of the variables to consider. By default, all columns of `df` are used.

correlated

If `TRUE`, the function also treats highly correlated variables as redundant columns and deletes them when `delete` is `TRUE`. Defaults to `FALSE`.

corr_treshold

Number between 0 and 1 that defines high correlation. If left as `NULL`, `0.9` is used, meaning columns with correlation above `0.9` are treated as redundant.

delete

If `TRUE`, the function returns the data frame with the redundant columns removed. The returned data frame still contains the columns that were not listed in `variables`. Defaults to `FALSE`.
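
As a rough illustration of what `correlated` and `corr_treshold` describe, the sketch below flags numeric columns by pairwise correlation using only base R. It is not the package's implementation: the helper `flag_correlated()` is hypothetical, and the choices to use the absolute Pearson correlation and to flag the earlier column of each pair are assumptions.

```r
# Hypothetical helper, not part of toRpEDA: flags a numeric column when its
# absolute Pearson correlation with a LATER column exceeds `threshold`.
flag_correlated <- function(df, threshold = 0.9) {
  # keep numeric columns only, since cor() needs numeric input
  num <- df[vapply(df, is.numeric, logical(1))]
  cm <- abs(cor(num))
  # zero the lower triangle and diagonal so each pair is checked once
  cm[lower.tri(cm, diag = TRUE)] <- 0
  # a row is flagged if any remaining entry is above the threshold
  names(num)[apply(cm, 1, function(r) any(r > threshold, na.rm = TRUE))]
}

df <- mtcars
df$variable <- df$hp * 5            # perfectly correlated with hp
flagged <- flag_correlated(df)      # default threshold of 0.9
flagged
```

Lowering `threshold` flags more columns, which mirrors the role of `corr_treshold` described above.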

Value

If `delete` is `FALSE`, a character vector with the names of the redundant columns. Otherwise, the data frame with the redundant columns removed. If there are no redundant columns, `character(0)` is returned.
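
The relationship between the two return types can be sketched in base R. This is only an illustration under stated assumptions: the names `variables`, `flagged`, and `cleaned` are hypothetical, and only a static-column check is performed. It shows that columns left out of `variables` are not checked but are still kept in the resulting data frame.

```r
df <- iris
df$static <- 5

# Assumption for illustration: only these columns are checked.
variables <- c("Sepal.Length", "static")

# Names of checked columns that hold a single distinct value.
is_static <- vapply(df[variables], function(x) length(unique(x)) <= 1, logical(1))
flagged <- variables[is_static]

# Drop the flagged columns; unchecked columns are kept as they are.
cleaned <- df[, setdiff(names(df), flagged), drop = FALSE]
names(cleaned)
```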

Examples

library("toRpEDA")

# finding index and static columns
df <- iris
df$index <- 1:NROW(df)
df$static <- 5
redundant_cols(df)
#> [1] "index"  "static"

# deleting redundant columns
df <- redundant_cols(df, delete = TRUE)

# finding highly correlated columns
df <- mtcars
df$variable <- df$hp * 5
redundant_cols(df, correlated = TRUE)
#> [1] "cyl" "hp" 
redundant_cols(df, corr_treshold = 0.8)
#> [1] "mpg"  "cyl"  "disp" "hp"  

# finding duplicated columns
df <- iris
df$something <- df$Sepal.Length
redundant_cols(df)
#> [1] "something"
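
The index, static, and duplicated columns found in the examples above can be approximated with base R alone. The sketch below is not how `redundant_cols()` is implemented; the helpers `is_static()`, `is_index()`, and `simple_redundant()` are hypothetical, and the definitions used (a single distinct value, the sequence 1..n, an exact copy of an earlier column) are assumptions.

```r
# Hypothetical helpers, not part of toRpEDA.

# Static column: at most one distinct value.
is_static <- function(x) length(unique(x)) <= 1

# Index column: numeric and equal to 1, 2, ..., n.
is_index <- function(x) is.numeric(x) && isTRUE(all(x == seq_along(x)))

simple_redundant <- function(df) {
  static <- vapply(df, is_static, logical(1))
  index  <- vapply(df, is_index, logical(1))
  # duplicated() on a list marks columns identical to an earlier column
  dup    <- duplicated(as.list(df))
  names(df)[static | index | dup]
}

df <- iris
df$index <- seq_len(nrow(df))
df$static <- 5
df$something <- df$Sepal.Length
found <- simple_redundant(df)
found
```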