Writing conditional filter statements in dplyr

Somehow only recently did I realise that you can use if statements directly within the filter function from R’s dplyr library. This lets you create conditional filter criteria that filter on different variables depending on some other condition external to the function call.

For instance, you can change what you filter for by referencing another, unrelated variable in your code. There are various ways to get the same effect without this technique, and I have happily used them in the past, so it doesn’t open up exciting new opportunities as such. But it does feel more concise and readable than what I previously did.

For anyone else who also somehow overlooked that this would work in a way that now feels retrospectively obvious: imagine you have a variable called my_filter.

Now imagine you wanted to filter the famous built-in mtcars dataset such that if my_filter was set to “low” you are returned only the cars with 4 cylinders, if set to “medium” only those with 6 cylinders, and finally if set to “high” (or anything else) only those with 8 cylinders. This will do that:

library(dplyr)

my_filter <- "low" # try "medium" or anything else to see the outcomes change

filter(
  mtcars,
  if (my_filter == "low") {
    cyl == 4
  } else if (my_filter == "medium") {
    cyl == 6
  } else {
    cyl == 8
  }
)
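As a quick sanity check, here is a minimal sketch that wraps the same filter in a function and counts the rows returned for each setting. The name filter_by_level is hypothetical, made up purely for this illustration.

```r
library(dplyr)

# filter_by_level() is a hypothetical helper, not part of dplyr
filter_by_level <- function(my_filter) {
  filter(
    mtcars,
    if (my_filter == "low") {
      cyl == 4
    } else if (my_filter == "medium") {
      cyl == 6
    } else {
      cyl == 8
    }
  )
}

nrow(filter_by_level("low"))     # 11 four-cylinder cars
nrow(filter_by_level("medium"))  # 7 six-cylinder cars
nrow(filter_by_level("high"))    # 14 eight-cylinder cars
```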

I’ve picked a really simple example to illustrate the above. If you’re always conditioning on the same variable then you can also just conditionalise the outcome. For instance you can rewrite the last statement above as:

filter(
  mtcars,
  cyl == 
    if (my_filter == "low") { 4 }
    else if (my_filter == "medium") { 6 }
    else { 8 }
)

Which is more readable in this instance.
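If only the comparison value changes, base R’s switch() is another option. This is just a sketch of an alternative, not something the examples above rely on: the final unnamed argument acts as the fallback when my_filter matches none of the names.

```r
library(dplyr)

my_filter <- "medium"

# switch() returns the value whose name matches my_filter;
# the trailing unnamed 8 is used when nothing matches
six_cyl <- filter(
  mtcars,
  cyl == switch(my_filter, low = 4, medium = 6, 8)
)

unique(six_cyl$cyl)  # 6
```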

But the advantage of the first method is that you can concisely filter on different variables conditionally. This, for example, shows the cars with at least 200 horsepower if my_filter reads “hp” but the cars with 8 cylinders if it reads “cyl”:

my_filter <- "hp" # or "cyl"

filter(
  mtcars,
  if (my_filter == "hp") {
    hp >= 200
  } else if (my_filter == "cyl") {
    cyl == 8
  }
)

You can use the fact that a variable always equals itself (as long as it contains no missing values) if you want to create logic such as “if my_filter is set to low then return only cars with 4 cylinders, otherwise don’t apply any filter at all”. Here’s an example of that:

filter(
  mtcars,
  if (my_filter == "low") {
    cyl == 4
  } else {
    cyl == cyl
  }
)
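One caveat with the self-comparison trick: in R, NA == NA evaluates to NA, and filter drops rows where the condition is NA. A hedged alternative is to return a plain TRUE from the else branch, which filter recycles to every row. The with_na data frame below is a made-up example, since mtcars itself has no missing values.

```r
library(dplyr)

my_filter <- "anything else"

all_cars <- filter(
  mtcars,
  if (my_filter == "low") {
    cyl == 4
  } else {
    TRUE  # a single TRUE is recycled to every row, so nothing is dropped
  }
)

nrow(all_cars) == nrow(mtcars)  # TRUE

# Made-up data with a missing value to show the difference
with_na <- data.frame(x = c(1, NA))
nrow(filter(with_na, x == x))  # 1, the NA row is dropped
nrow(filter(with_na, TRUE))    # 2, every row is kept
```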

Inline if_else statements don’t work quite the same way, so this for instance will not work:

filter(
  mtcars,
  if_else(my_filter == "low",  cyl == 4,  cyl == 6)
)
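My understanding of why: if_else is vectorised and sizes its result from the condition. Here the condition is a single value while each branch holds one value per row, so the pieces cannot be lined up. A small sketch that captures the failure with tryCatch rather than halting the script:

```r
library(dplyr)

my_filter <- "low"

# The condition has length 1 but each branch has 32 values (one per row),
# so if_else() raises an error, which tryCatch() converts to a marker string
result <- tryCatch(
  filter(mtcars, if_else(my_filter == "low", cyl == 4, cyl == 6)),
  error = function(e) "error"
)

result  # "error"
```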

But case_when style statements are fine. This works (the .default argument needs dplyr 1.1.0 or later):

filter(
  mtcars,
  case_when(my_filter == "low" ~ cyl == 4,
            my_filter == "medium" ~ cyl == 6,
            my_filter == "high" ~ cyl == 8,
            .default = cyl == cyl)
)
