3

I'm trying to translate the following syntax found in tidyverse into base R as a function, though I'm having difficulties following the same output.

Here's the syntax:

x <- function(x) {x %>% 
    select(where(negate(is.numeric))) %>% 
    map_dfc(~ model.matrix(~ .x -1) %>% 
              as_tibble) %>% 
    rename_all(~ str_remove(., "\\.x")) 
}

I understand that select can be represented as indexing within a dataframe such as x[,]. As for the pipe function %>%, I can just index a function within a variable i.e. x <- ...

I can manage to transfer select(where(negate(is.numeric)))

into:

x <- function(x){
  x[, !sapply(x, is.numeric)]
  
}

Though, this makes it difficult, as I'm thinking it can be replaced with a conditional argument:

 map_dfc(~ model.matrix(~ .x -1)

Here's the expected output with some example data:

# A tibble: 12 x 5
   black brown white female  male
   <dbl> <dbl> <dbl>  <dbl> <dbl>
 1     1     0     0      1     0
 2     1     0     0      1     0
 3     1     0     0      1     0
 4     1     0     0      1     0
 5     0     0     1      1     0
 6     0     0     1      1     0
 7     0     0     1      0     1
 8     0     0     1      0     1
 9     0     1     0      0     1
10     0     1     0      0     1
11     0     1     0      0     1
12     0     1     0      0     1

reproducible code:

structure(list(wgt = c(64L, 71L, 53L, 67L, 55L, 58L, 77L, 57L, 
56L, 51L, 76L, 68L), hgt = c(57L, 59L, 49L, 62L, 51L, 50L, 55L, 
48L, 42L, 42L, 61L, 57L), age = c(8L, 10L, 6L, 11L, 8L, 7L, 10L, 
9L, 10L, 6L, 12L, 9L), id = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 2L, 2L, 2L, 2L), .Label = c("black", "brown", "white"
), class = "factor"), sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female", "male"), class = "factor")), class = "data.frame", row.names = c(NA, 
-12L))
5

Calling your input data xx,

onehot = function(data) {
  x = Filter(Negate(is.numeric), data)
  x = as.data.frame(Reduce(cbind, lapply(x, function(col) model.matrix(~ . - 1, data = data.frame(col)))))
  setNames(x, sub(pattern = "^col", replacement = "", names(x)))
}

onehot(xx)
#    black brown white female male
# 1      1     0     0      1    0
# 2      1     0     0      1    0
# 3      1     0     0      1    0
# 4      1     0     0      1    0
# 5      0     0     1      1    0
# 6      0     0     1      1    0
# 7      0     0     1      0    1
# 8      0     0     1      0    1
# 9      0     1     0      0    1
# 10     0     1     0      0    1
# 11     0     1     0      0    1
# 12     0     1     0      0    1

There are other packages that do one-hot encoding like this, see here for some examples, but the above is all base.

0
6

1) If input is the input data frame, define a model matrix function mm and lapply it to the non-numeric columns and put them together into a single data frame. Finally fix up the names.

mm <- function(x) model.matrix(~ x - 1)
result <- do.call("data.frame", lapply(Filter(Negate(is.numeric), input), mm))
names(result) <- sub(".*\\.x", "", names(result))
result

giving:

   black brown white female male
1      1     0     0      1    0
2      1     0     0      1    0
3      1     0     0      1    0
4      1     0     0      1    0
5      0     0     1      1    0
6      0     0     1      1    0
7      0     0     1      0    1
8      0     0     1      0    1
9      0     1     0      0    1
10     0     1     0      0    1
11     0     1     0      0    1
12     0     1     0      0    1

2) To make it similar to the tidyverse version we can use the Bizarro pipe which does not require any packages.

input ->.;
  Filter(Negate(is.numeric), .) ->.;
  lapply(., function(x) model.matrix(~ x - 1)) ->.;
  do.call("data.frame", .) ->.;
  setNames(., sub(".*\\.x", "", names(.))) -> result
result

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.