`%in%` and `%notin%` operators in R

Renesh Bedre 6 minute read

Page content

Introduction
%in% to check the value in a vector
%in% to check the value in a Data Frame
%in% and %notin% to filter (subset) Data Frames based on multiple values (with dplyr)
%in% and %notin% to remove columns from Data Frames
%in% to select columns from Data Frames
%in% to compare two Data Frames
Comparison of %in% and == operators
Create %notin% operator (opposite to %in%)

Introduction

%in% is a built-in infix operator for value matching. It is the infix form of the match function: x %in% y is defined as match(x, y, nomatch = 0) > 0.
%in% returns a logical vector (TRUE or FALSE, never NA) that indicates, for each element of its left operand, whether a match was found in the right operand. The output has the same length as the left operand.
For two vectors x and y, the syntax is: x %in% y
%in% operates on vectors. A Data Frame is handled column by column (each column is a vector).
%notin% is not a built-in operator, but it can be created by negating the %in% operator (see below)
Help for the %in% operator: ?"%in%"
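
The relationship with match can be checked directly. The following is a minimal sketch that assumes the definition given on R's help page for match; the vectors x and y are made up for illustration.

```r
x <- c(1, 5, NA, 7)
y <- c(5, 7, 9)

# match() returns the position of each x value in y, or NA when absent
match(x, y)
# [1] NA  1 NA  2

# %in% turns those positions into TRUE/FALSE and never returns NA
in_result    <- x %in% y
match_result <- match(x, y, nomatch = 0) > 0

in_result
# [1] FALSE  TRUE FALSE  TRUE

identical(in_result, match_result)
# [1] TRUE
```

Note that the NA in x gives FALSE, not NA, because it has no match in y.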

Here are examples of how to use %in% to manipulate vectors and Data Frames in R.

`%in%` to check the value in a vector

%in% checks whether a value is present in a vector and returns TRUE or FALSE

x <- c(1,5,10,20,20,24,45)

# check whether a number is in vector x
20 %in% x
[1] TRUE

# check whether each value of one vector is in another vector
y <- c(5,45)
y %in% x
[1] TRUE TRUE

# check sequence of numbers in other sequence (find any overlapping numbers)
1:5 %in% 4:7
[1] FALSE FALSE FALSE  TRUE  TRUE

# find common numbers (intersection)
x[x %in% y]  # similar to intersect(x,y)
[1]  5 45

# check which characters are present in another sequence of characters (find any overlapping characters)
LETTERS[1:5] %in% LETTERS[4:7]
[1] FALSE FALSE FALSE  TRUE  TRUE
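
Indexing with %in% and intersect are similar but not identical when the left vector contains repeated values. The sketch below reuses x from above with an illustrative y to show the difference.

```r
x <- c(1, 5, 10, 20, 20, 24, 45)
y <- c(20, 45)

# logical indexing keeps every matching element, including repeats
kept <- x[x %in% y]
kept
# [1] 20 20 45

# intersect() treats its inputs as sets and drops duplicates
common <- intersect(x, y)
common
# [1] 20 45
```

Use the indexing form when repeats in x should be preserved, and intersect when only the distinct shared values are needed.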

For a large vector (say, a vector with 1000 values), printing every TRUE or FALSE is impractical. You can combine the %in% operator with the any, all, or which functions to summarize the result

x <- 1:1000
y <- 900:2000

# check if the x and y vectors have any values in common
any(x %in% y)
[1] TRUE

# check if all values of x are present in y
all(x %in% y)
[1] FALSE

# get the indexes in a of the values that are also in b
a <- 1:10
b <- 6:200

which(a %in% b)
[1]  6  7  8  9 10
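
Two related summaries may also be useful: counting the matches, and finding where the matches sit in the right operand. This is a small sketch that reuses a and b from above; the variable names n_common, pos_in_a, and pos_in_b are illustrative.

```r
a <- 1:10
b <- 6:200

# number of elements of a that occur in b (each TRUE counts as 1)
n_common <- sum(a %in% b)
n_common
# [1] 5

# positions in a (the left operand) of the matching elements
pos_in_a <- which(a %in% b)

# positions in b (the right operand) of those same values
pos_in_b <- match(a[pos_in_a], b)
pos_in_b
# [1] 1 2 3 4 5
```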

`%in%` to check the value in a Data Frame

Create a Data Frame

df <- data.frame(col1 = c("A", "B", "C"),
  col2 = c(1, 2, 3),
  col3 = c(0.1, 0.2, 0.3))
df
# output
  col1 col2 col3
1    A    1  0.1
2    B    2  0.2
3    C    3  0.3

Check if a value is present in a Data Frame column

'B' %in% df$col1
[1] TRUE

# check each col1 value against 'B'
df$col1  %in% 'B'
[1] FALSE  TRUE FALSE

Check vector values in a Data Frame and update the matching values:

# check vector values in a Data Frame
lapply(df, `%in%`, c(1, 4, 0.1))
# output
$col1
[1] FALSE FALSE FALSE

$col2
[1]  TRUE FALSE FALSE

$col3
[1]  TRUE FALSE FALSE

# find and replace with 0 (on a copy, so df stays unchanged for the later examples)
df_zero <- df
df_zero[sapply(df_zero, `%in%`, c(1, 4, 0.1))] <- 0
df_zero
# output
  col1 col2 col3
1    A    0  0.0
2    B    2  0.2
3    C    3  0.3
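
The col1 column is all FALSE in the lapply output above because its letters never equal any of the numbers. When the two operands have different types, the comparison is made after conversion to a common type (here, text), so a character "1" would match the number 1. A small sketch, with illustrative variable names:

```r
# a character value is compared with the numbers after conversion to text
text_one <- "1" %in% c(1, 4, 0.1)
text_one
# [1] TRUE

# letters never match the numbers, which is why col1 is all FALSE
letters_in <- c("A", "B", "C") %in% c(1, 4, 0.1)
letters_in
# [1] FALSE FALSE FALSE
```

This matters when a column that looks numeric was read in as text: matches can still succeed, which may or may not be what you intend.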

`%in%` and `%notin%` to filter (subset) Data Frames based on multiple values (with dplyr)

Filter (subset) the Data Frame to rows where col1 matches any of multiple values:

library(dplyr)
df  %>% filter(col1 %in% c('A', 'B'))  # same as df[df$col1 %in% c('A', 'B'),]
# output
  col1 col2 col3
1    A    1  0.1
2    B    2  0.2

Filter (subset) the Data Frame to rows where col1 does not match the given values, using %notin%:

# You first need to create the %notin% operator (see the end of this article)
library(dplyr)
df  %>% filter(col1 %notin% 'C')
# output
  col1 col2 col3
1    A    1  0.1
2    B    2  0.2
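
The same exclusion filter can be written in base R without dplyr and without defining %notin%, by negating the %in% expression. This sketch rebuilds df as defined earlier; not_c and only_b are illustrative names.

```r
df <- data.frame(col1 = c("A", "B", "C"),
  col2 = c(1, 2, 3),
  col3 = c(0.1, 0.2, 0.3))

# keep rows whose col1 is NOT in the given set, using plain negation
not_c <- df[!(df$col1 %in% "C"), ]
not_c
#   col1 col2 col3
# 1    A    1  0.1
# 2    B    2  0.2

# exclude several values at once
only_b <- df[!(df$col1 %in% c("A", "C")), ]
only_b
#   col1 col2 col3
# 2    B    2  0.2
```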

`%in%` and `%notin%` to remove column from Data Frames

%in% and %notin% can be used to remove single or multiple columns from Data Frames

# remove col2
df[ , !(names(df) %in% "col2")]
# output
  col1 col3
1    A  0.1
2    B  0.2
3    C  0.3

#  %notin% operator to remove columns. 
#  You need to first create %notin% operator (see below at the end of this article)
df[ , (names(df) %notin% "col2")]
# output
  col1 col3
1    A  0.1
2    B  0.2
3    C  0.3

# to remove multiple columns, use a vector of column names such as c("col2", "col3")
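
Removing multiple columns can be sketched as follows. One detail to watch: when only a single column remains, base R simplifies the result to a vector unless drop = FALSE is given. The example rebuilds df as defined earlier; kept and collapsed are illustrative names.

```r
df <- data.frame(col1 = c("A", "B", "C"),
  col2 = c(1, 2, 3),
  col3 = c(0.1, 0.2, 0.3))

# remove col2 and col3; drop = FALSE keeps the result a data frame
# even though only one column is left
kept <- df[ , !(names(df) %in% c("col2", "col3")), drop = FALSE]
kept
#   col1
# 1    A
# 2    B
# 3    C

# without drop = FALSE the single remaining column collapses to a vector
collapsed <- df[ , !(names(df) %in% c("col2", "col3"))]
is.data.frame(collapsed)
# [1] FALSE
```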

`%in%` to select columns from Data Frames

%in% can be used to select single or multiple columns from Data Frames

# select single column
df[ ,(names(df) %in% "col3"), drop=FALSE]
# output
  col3
1  0.1
2  0.2
3  0.3

# select multiple columns
df[ ,(names(df) %in% c("col1", "col2"))]
# output
  col1 col2
1    A    1
2    B    2
3    C    3

`%in%` to compare two Data Frames

%in% can be used to compare two Data Frames and subset one of them based on matching column values. This works like a filtering (semi) join: it selects all records from one Data Frame whose column values are present in another Data Frame, without adding any columns from the other Data Frame

Create another Data Frame:

df2 <- data.frame(col1 = c("A", "B", "D", "E"),
  col4 = c(100, 200, 300, 400),
  col5 = c("a", "b", "c", "d"))

df2
# output
  col1 col4 col5
1    A  100    a
2    B  200    b
3    D  300    c
4    E  400    d

Now compare df and df2 to get all records from df2 whose col1 values match col1 values in df (similar to a semi join of tables: unmatched rows are dropped and no columns from df are added),

subset(df2, df2$col1 %in% df$col1)
# output
  col1 col4 col5
1    A  100    a
2    B  200    b
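
To see how this filter differs from a real join, it can be compared with base R's merge. The sketch below rebuilds df and df2 as defined above; matched, joined, and unmatched are illustrative names.

```r
df <- data.frame(col1 = c("A", "B", "C"),
  col2 = c(1, 2, 3),
  col3 = c(0.1, 0.2, 0.3))
df2 <- data.frame(col1 = c("A", "B", "D", "E"),
  col4 = c(100, 200, 300, 400),
  col5 = c("a", "b", "c", "d"))

# filtering with %in% keeps only the columns of df2
matched <- df2[df2$col1 %in% df$col1, ]
names(matched)
# [1] "col1" "col4" "col5"

# merge() performs a join and brings in the columns of df as well
joined <- merge(df2, df, by = "col1")
names(joined)
# [1] "col1" "col4" "col5" "col2" "col3"

# negating %in% gives the rows of df2 that have no match in df
unmatched <- df2[!(df2$col1 %in% df$col1), ]
as.character(unmatched$col1)
# [1] "D" "E"
```

The %in% form is enough when the goal is only to keep or drop rows; merge is the better fit when columns from both Data Frames are needed.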

Comparison of `%in%` and `==` operators

The == operator compares two vectors element-wise (the first value of one vector is compared with the first value of the other, the second with the second, and so on), whereas %in% compares each value of the left vector against all values of the right vector.
With the == operator, the left and right operands should have the same length (or one of them should have length 1). Otherwise R recycles the shorter vector and warns if the longer length is not a multiple of the shorter one. The %in% operator does not require operands of the same length.

x <- c(1, 2, 3)
y <- c(3, 2, 1)

x %in% y
[1] TRUE TRUE TRUE

x == y
[1] FALSE  TRUE FALSE

# compare and get indexes of two vectors
a <- c(1, 2, 9, 2)
b <- c(1, 2, 3, 4, 5)

# == compares position by position, so it only finds the first two indexes
which(a == b)
[1] 1 2
Warning message:
In a == b : longer object length is not a multiple of shorter object length

# %in% operator found all matched value indexes
which(a %in% b)
[1] 1 2 4
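
The two operators also differ in how they treat missing values: == propagates NA, while %in% never returns NA. A minimal sketch with an illustrative vector v:

```r
v <- c(1, NA, 3)

# == propagates NA: comparing a missing value gives NA
eq <- v == 1
eq
# [1]  TRUE    NA FALSE

# %in% never returns NA; the missing value simply has no match in 1
in_one <- v %in% 1
in_one
# [1]  TRUE FALSE FALSE

# NA does match when NA itself is in the right operand
in_na <- v %in% c(1, NA)
in_na
# [1]  TRUE  TRUE FALSE
```

This makes %in% convenient inside subsetting, where an NA in the logical index would otherwise produce NA rows.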

Create `%notin%` operator (opposite to `%in%`)

%notin% is the opposite of the %in% operator. It is not built-in, but it can be created by applying the Negate function to %in%.

You can get the same result without defining %notin% by putting ! in front of the whole %in% expression, e.g. !(x %in% y)

`%notin%` <- Negate(`%in%`)

Check the value in a vector using %notin%

x <- c(1,5,10,20,20,24,45)

# check that a number is not in vector x
50 %notin%  x # same as !(50 %in%  x) 
[1] TRUE
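
If you prefer a definition that spells out what the operator does, %notin% can also be written as an ordinary function. This is a sketch of an alternative, not the definition used above; the argument names x and table are chosen to mirror match.

```r
# alternative definition, equivalent in effect to Negate(`%in%`)
`%notin%` <- function(x, table) !(x %in% table)

x <- c(1, 5, 10, 20, 20, 24, 45)

# 50 is absent from x, 20 is present
res <- c(50, 20) %notin% x
res
# [1]  TRUE FALSE

# like %in%, the result has the length of the left operand
length(res)
# [1] 2
```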

Update values of a Data Frame to NA where the values do not match,

# create data frame
df <- data.frame(col1 = c("A", "B", "C"),
  col2 = c(1, 2, 3),
  col3 = c(0.1, 0.2, 0.3))

# update values of the Data Frame to NA where they do not match c(2, 3)
df[sapply(df, `%notin%`, c(2, 3))] <- NA  # same as df[!sapply(df, `%in%`, c(2, 3))] <- NA
df
# output
  col1 col2 col3
1 <NA>   NA   NA
2 <NA>    2   NA
3 <NA>    3   NA

This work is licensed under a Creative Commons Attribution 4.0 International License
