Functions in R – apply(), mapply(), tapply(), lapply()


Functions in R

This post introduces several basic R functions that apply another function over the elements of a data structure. They simplify a lot of day-to-day work.

apply()

The apply() function applies a function to the margins (rows, columns or higher dimensions) of an array or matrix.

The syntax of the apply() function is:

apply(X, MARGIN, FUN, …)

X = an array, including a matrix.

MARGIN = a vector giving the subscripts which the function will be applied over (1 for rows, 2 for columns, c(1, 2) for both).

FUN = the function to be applied.

… = optional arguments to FUN.

We create a matrix of 200 random numbers with 20 rows and 10 columns. The rnorm() function generates the random numbers from a normal distribution.

x <- matrix(rnorm(200), 20,10)

To apply a function over rows, we use MARGIN = 1. Here apply() calculates the sum of each row of the matrix x.

row_sums <- apply(x, 1, sum)

row_sums

We can calculate the mean of each of the 20 rows of the matrix by using the mean function in apply().

row_means <- apply(x, 1, mean)

We can find the column-wise sums by using MARGIN = 2 and the sum function.

col_sums <- apply(x, 2, sum)

col_sums

We can also find the mean of each column by using the mean function.

col_means <- apply(x, 2, mean)

col_means
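
For sums and means specifically, base R also provides dedicated functions. The sketch below uses a small hand-made matrix (the names m_small, row_totals and col_avgs are chosen here purely for illustration) to suggest that apply() with sum or mean gives the same values as rowSums() and colMeans():

```r
# Small deterministic matrix so the results are easy to check by hand
m_small <- matrix(1:6, nrow = 2, ncol = 3)

# apply() over rows (MARGIN = 1) and over columns (MARGIN = 2)
row_totals <- apply(m_small, 1, sum)   # 1 + 3 + 5 and 2 + 4 + 6
col_avgs   <- apply(m_small, 2, mean)  # mean of each column

# The dedicated base R helpers compute the same values
all.equal(row_totals, rowSums(m_small))
all.equal(col_avgs, colMeans(m_small))
```

On large matrices the dedicated functions are usually the faster choice, while apply() remains the general tool for arbitrary functions.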

We create a three-dimensional array a, made up of two 2 x 2 matrices, as:

a <- array(1:8, c(2,2,2))

a

We can find the mean over the first dimension (rows) by using the following code:

apply(a, 1,mean)

We can find the mean over the second dimension (columns) by using the following code:

apply(a, 2, mean)

We can also find the mean for each combination of row and column by using MARGIN = c(1, 2):

apply(a, c(1, 2), mean)

We can also find the sum over the third dimension. This calculates the sum of each of the two matrices separately.

apply(a,3,sum)

The output shows that the sum of the first matrix is 10 and the sum of the second matrix is 26.
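
As a rough check of how MARGIN selects dimensions, the sketch below rebuilds a 2 x 2 x 2 array from the values 1 to 8 (the variable names are illustrative only) and applies functions over several margins:

```r
# Two 2 x 2 matrices stacked along the third dimension
arr <- array(1:8, c(2, 2, 2))

# One sum per matrix (third dimension)
slice_sums <- apply(arr, 3, sum)       # 10 26

# One mean per row index, taken across columns and matrices
row_avgs <- apply(arr, 1, mean)        # 4 5

# One sum per (row, column) cell, added across the two matrices
cell_sums <- apply(arr, c(1, 2), sum)  # 2 x 2 matrix

slice_sums
row_avgs
cell_sums
```

Each margin that is kept becomes a dimension of the result, which is why MARGIN = c(1, 2) returns a matrix rather than a vector.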

lapply()

lapply() returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. It takes three arguments: a list X, a function FUN and any further arguments to pass on to FUN.

We create a list x with two elements: “a” stores the integers 1 to 5 and “b” stores 10 random numbers.

x <- list(a = 1:5, b = rnorm(10))

We want to find the mean of each element of the list x:

lapply(x, mean)

We apply the function runif() to the vector “x”. Because x is not a list, lapply() first coerces it to a list and then applies runif() to every element of x.

x <- 1:4

lapply(x,runif)

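runif(n) generates n random numbers from a uniform distribution, between 0 and 1 by default. So the result is a list in which the first element holds one random number, the second holds two, and so on.

We can also specify the min and max parameters to generate random numbers between those limits.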

lapply(x, runif, min=0, max=10)
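
The following sketch illustrates how the extra arguments are forwarded to runif(). The seed value and the names draws and in_range are arbitrary choices made here for reproducibility and illustration:

```r
set.seed(42)  # arbitrary seed, only so that repeated runs match

# Each element of 1:4 becomes the first argument (n) of runif();
# min and max are passed on unchanged to every call
draws <- lapply(1:4, runif, min = 0, max = 10)

# Element i of the result holds i random numbers
lengths(draws)

# Every generated value lies within the requested limits
in_range <- all(unlist(draws) >= 0 & unlist(draws) <= 10)
in_range
```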

We create a list object “m” that contains two matrices:

m <- list(a= matrix(1:4, 2,2), b = matrix(1:6, 3,2))

We define an anonymous function, function(x) x[,1], to apply to the list elements, which are matrices.

The function returns the first column of each matrix.

lapply(m,function(x) x[,1])
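
A small sketch of what this call returns, using the same two matrices under the illustrative name mats. The second call is an assumed variation, not from the original post, showing that a named argument after FUN is passed on to the anonymous function:

```r
mats <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

# First column of every matrix; the list names are preserved
first_cols <- lapply(mats, function(mat) mat[, 1])
first_cols$a  # 1 2
first_cols$b  # 1 2 3

# Extra arguments after FUN are forwarded, here the column number j
second_cols <- lapply(mats, function(mat, j) mat[, j], j = 2)
second_cols$a  # 3 4
second_cols$b  # 4 5 6
```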

We create a list “x” of four elements. The element “a” contains the integers 1 to 4 and the element “b” contains 10 standard normal random numbers. The element “c” contains 20 random numbers with mean 1. The element “d” contains 100 random numbers with mean 5.

x <- list(a= 1:4, b = rnorm(10), c=rnorm(20,1), d= rnorm(100,5))

We find the mean value of each element of the list.

lapply(x, mean)

unlist()

unlist() flattens the list output into a vector.

unlist(lapply(x, mean))

sapply()

sapply() works like lapply(), but simplifies the output to a vector or matrix where possible.

sapply(x, mean)
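
To make the relationship between lapply(), unlist() and sapply() concrete, here is a minimal sketch on a fixed list (vals, as_list and as_vec are illustrative names), so that the means can be checked by hand:

```r
vals <- list(a = 1:4, b = c(2, 4, 6), c = c(10, 20))

# lapply() always returns a list
as_list <- lapply(vals, mean)

# sapply() simplifies the same results to a named numeric vector
as_vec <- sapply(vals, mean)   # a = 2.5, b = 4, c = 15

# Flattening the lapply() result gives the same values and names
all.equal(as_vec, unlist(as_list))
```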

We create a matrix “m” of 30 rows and 3 columns. We use cbind() to combine, as columns, 30 random numbers with mean 0, 30 random numbers with mean 2 and 30 random numbers with mean 5.

m <- cbind(rnorm(30, 0), rnorm(30, 2), rnorm(30, 5))

We find the mean of the first column as:

mean(m[,1])

We find the mean of the second column as:

mean(m[,2])

We find the mean of the third column as:

mean(m[,3])

We can find the column-wise means of the matrix in one call by using function(x) mean(m[,x]).

The function returns the mean of column x of the matrix “m”, and sapply() calls it for each column index from 1 to 3.

sapply(1:3, function(x) mean(m[,x]))

We use the length() function to count the number of elements.

The function function(x) length(x[x<0]) counts the number of elements in each column of the matrix whose value is less than 0.

apply(m, 2, function(x) length(x[x<0]))

We use function(x) mean(x[x>0]) to find the mean of the values greater than 0 in each column of the matrix “m”.

apply(m, 2, function(x) mean(x[x>0]))
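
Because “m” is random, the results above change from run to run. The sketch below repeats both calls on a small fixed matrix (m_demo and the other names are illustrative) and adds one assumed alternative, colSums() on a logical matrix, which counts the same thing:

```r
# Hand-made matrix so the counts can be verified by eye
m_demo <- cbind(c(-1, 2, -3), c(4, 5, 6), c(-7, -8, 9))

# Count negative values per column, as in the text
neg_counts <- apply(m_demo, 2, function(x) length(x[x < 0]))  # 2 0 2

# m_demo < 0 is a logical matrix, and TRUE counts as 1 when summed
neg_counts_alt <- colSums(m_demo < 0)

# Mean of the strictly positive values per column
pos_means <- apply(m_demo, 2, function(x) mean(x[x > 0]))     # 2 5 9

# A column without positive values would give mean(numeric(0)), i.e. NaN
empty_mean <- mean(numeric(0))
```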

We want to find the squares of the numbers 1 to 3. The simplify argument of sapply() controls whether the output is returned as a list or as a vector. If simplify = TRUE (the default), the output is simplified to a vector or matrix where possible. If simplify = FALSE, the output is returned as a list.

Here we use simplify = FALSE, so the output is shown in list form.

sapply(1:3, function(x) x^2, simplify = FALSE)
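
The three output shapes can be compared side by side. This is a sketch with illustrative names. The third call, where every result has length 2, is an added example and not from the original post:

```r
# Default: results of length 1 are simplified to a vector
squares_vec <- sapply(1:3, function(x) x^2)                     # 1 4 9

# simplify = FALSE keeps the list, exactly like lapply()
squares_list <- sapply(1:3, function(x) x^2, simplify = FALSE)

# Results of equal length greater than 1 are simplified to a matrix,
# with one column per input value
sq_cube <- sapply(1:3, function(x) c(square = x^2, cube = x^3))

squares_vec
squares_list
sq_cube
```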

tapply()

The tapply() function applies a function to each group of values defined by a category (a factor).

We check the arguments of the tapply function as:

str(tapply)

INDEX = a list of one or more factors, each of the same length as X

We use the mtcars dataset to demonstrate the tapply function.

We check the details of the mtcars dataset.

?mtcars

We want to calculate the average weight of the cars for each number of cylinders.

tapply(mtcars$wt,mtcars$cyl,mean)
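
One way to convince yourself of what tapply() computes is to reproduce a single group by hand. The sketch below does this for the four-cylinder cars (wt_by_cyl and manual_4 are illustrative names):

```r
# Mean weight per cylinder count; the result is named by the group values
wt_by_cyl <- tapply(mtcars$wt, mtcars$cyl, mean)
names(wt_by_cyl)  # "4" "6" "8"

# The same number for one group, computed by subsetting manually
manual_4 <- mean(mtcars$wt[mtcars$cyl == 4])

all.equal(wt_by_cyl[["4"]], manual_4)
```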

We create a vector of 30 random numbers (10 standard normal, 10 uniform and 10 normal with mean 1) as:

x <- c(rnorm(10), runif(10), rnorm(10,1))

We create a factor variable by using the gl() function. Its first argument is the number of levels and its second argument is the number of replications of each level. We create a factor with three levels, each repeated 10 times, so that each block of 10 random numbers in x gets its own level.

f <- gl(3, 10)

We calculate the average of each group.

tapply(x, f, mean)
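
To see how gl() lays out the levels, the sketch below pairs the same factor with the fixed values 1 to 30 instead of random numbers (f_demo, x_demo and group_means are illustrative names), so that the group means are predictable:

```r
f_demo <- gl(3, 10)

levels(f_demo)  # "1" "2" "3"
table(f_demo)   # each level occurs 10 times, in consecutive blocks

# Values 1..10 fall in level 1, 11..20 in level 2, 21..30 in level 3
x_demo <- 1:30
group_means <- tapply(x_demo, f_demo, mean)  # 5.5 15.5 25.5
group_means
```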

We create a data frame “a” as a combination of the “x” and “y” vectors.

x <- 1:20

y<- factor(rep(letters[1:5], each= 4))

a<-data.frame(x,y)

We calculate the sum of the “x” values for each level of the factor “y”.

tapply(a$x, a$y, sum)

We now use the iris dataset. We attach the iris dataset, so that its columns can be referenced by name, by using this code:

attach(iris)

We can view the iris dataset as:

View(iris)

We check the structure of the iris dataset.

str(iris)

We calculate the average Petal.Length of each Species. Here, Species is a factor variable.

tapply(iris$Petal.Length, iris$Species, mean)

by()

by() is the counterpart of tapply() for data frames: it splits a data frame by one or more factors and applies a function to each subset.

The syntax of by() is:

by(data, INDICES, FUN, …, simplify = TRUE)

We calculate the average of the four numeric columns of the iris data for each Species.

by(iris[,1:4], iris$Species,colMeans)
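
The printed by object is convenient to read but less convenient to compute with. As a sketch (all variable names are illustrative), one group can be extracted and checked against a manual subset, and do.call(rbind, ...) is a common idiom for stacking the per-group results into a matrix:

```r
by_species <- by(iris[, 1:4], iris$Species, colMeans)

# Extract one group's result by its level name
setosa_means <- by_species[["setosa"]]

# The same values computed by subsetting the data frame manually
setosa_manual <- colMeans(iris[iris$Species == "setosa", 1:4])
all.equal(setosa_means, setosa_manual)

# Stack the three result vectors into a 3 x 4 matrix (one row per species)
means_table <- do.call(rbind, by_species)
means_table
```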

mapply()

The mapply() function is a multivariate version of sapply(): it applies a function to the first elements of each argument, then to the second elements, and so on.

We apply the rep function to replicate values. The first argument is the function to apply. The second argument is a vector of values to pass to the function. The third argument gives the number of times to replicate each value. So the values 1 to 4 are replicated 4, 3, 2 and 1 times respectively.

mapply(rep, 1:4, 4:1)
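
Written out call by call, the mapply() line above corresponds to four separate rep() calls. The sketch below (reps and by_hand are illustrative names) builds both versions for comparison:

```r
# mapply() pairs up the elements: rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
reps <- mapply(rep, 1:4, 4:1)

# The results differ in length, so they cannot be simplified and stay a list
is.list(reps)
lengths(reps)  # 4 3 2 1

# The same list assembled by hand
by_hand <- list(rep(1L, 4), rep(2L, 3), rep(3L, 2), rep(4L, 1))
all.equal(reps, by_hand)
```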

We create two list objects as:

blue<- list(a = c(1:10), b = c(11:20))

red <- list(c = c(21:30), d = c(31:40))

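We calculate the element-wise sum of the vectors “a” and “b” from the list “blue” and the vectors “c” and “d” from the list “red”. The result is a vector of 10 values, each of which is the sum of the corresponding elements of the four vectors.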

mapply(sum, blue$a,blue$b,red$c, red$d)

We can also sum the two lists element-wise. We sum “a” with “c” and “b” with “d”. In the output, “a” and “b” label the totals of “a” with “c” and of “b” with “d” respectively.

mapply(sum,blue,red)
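
The two calls differ in what counts as an element. The sketch below recreates both lists so that it runs on its own (elementwise and pairwise are illustrative names) and notes the arithmetic behind each result:

```r
blue <- list(a = 1:10, b = 11:20)
red  <- list(c = 21:30, d = 31:40)

# Four vectors of length 10: sum() is called once per position,
# so position i gives i + (10 + i) + (20 + i) + (30 + i) = 60 + 4 * i
elementwise <- mapply(sum, blue$a, blue$b, red$c, red$d)  # 64 68 ... 100

# For plain addition this matches vectorised arithmetic
all.equal(elementwise, blue$a + blue$b + red$c + red$d)

# Two lists of length 2: sum() is called once per pair of list elements
# a: sum(1:10) + sum(21:30) = 55 + 255 = 310
# b: sum(11:20) + sum(31:40) = 155 + 355 = 510
pairwise <- mapply(sum, blue, red)
pairwise
```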

split()

The syntax of split() is:

split(x, f, drop = FALSE, …)

It divides the data in the vector x into the groups defined by f.

x – vector or data frame containing values to be divided into groups

f – a factor (or a list of factors) that defines the groups

drop – whether levels that do not occur should be dropped

x <- c(rnorm(10), runif(10), rnorm(10,1) )

f <- gl(3,10)

We split the vector “x” by the factor “f”. The result is a list with one element per level of “f”, each holding the values of “x” that belong to that level.

split(x, f)
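
On a small fixed vector the grouping is easy to follow. The sketch below (all names and the level labels are illustrative) also suggests that split() followed by sapply() gives the same group means as tapply():

```r
x_small <- c(5, 1, 4, 2, 6, 3)
f_small <- gl(2, 3, labels = c("first", "second"))

# One list element per level, holding the matching values of x_small
parts <- split(x_small, f_small)
parts$first   # 5 1 4
parts$second  # 2 6 3

# Group means via split() + sapply(), and directly via tapply()
means_split  <- sapply(parts, mean)
means_tapply <- tapply(x_small, f_small, mean)

# Compare the plain values (tapply() returns a 1-d array, sapply() a vector)
all.equal(unname(means_split), as.vector(means_tapply))
```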

We now use the airquality dataset, so we first check the description of the dataset.

?airquality

head(airquality)

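We split the dataset by the Month variable. The result is a list with one data frame per month.

s <- split(airquality, airquality$Month)

We calculate the average of each column for every data frame in the list “s”. The argument na.rm = TRUE is passed on to colMeans(), so that missing values are ignored.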

lapply(s, colMeans, na.rm = TRUE)
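
The effect of na.rm = TRUE can be seen by comparing against a call without it. In the sketch below (by_month, may_raw and monthly_means are illustrative names), sapply() is used instead of lapply() as an assumed convenience, so that the monthly means come back as one matrix:

```r
by_month <- split(airquality, airquality$Month)
names(by_month)  # "5" "6" "7" "8" "9"

# Without na.rm, a column containing missing values has a mean of NA
may_raw <- colMeans(by_month[["5"]])
may_raw[["Ozone"]]

# With na.rm = TRUE forwarded to colMeans(), the NAs are skipped;
# sapply() arranges the result with one column per month
monthly_means <- sapply(by_month, colMeans, na.rm = TRUE)
monthly_means
```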
