SQL in R

Install sqldf package as below :

install.packages(“sqldf”)

We load sqldf package as :

library(sqldf)

We are working on UCBAdmissions dataset. We load UCBAdmissions dataset .

data(“UCBAdmissions”)

The data contains students admission status as Admit , Gender , Dept as Department , Freq as Frequency of students.

We convert dataset to data frame .

ucb <- as.data.frame(UCBAdmissions)

We are using sqldf() to run SQL commands. We select all columns of ucb .

sqldf(“select * from ucb”)

We can use conditional statement by using where statement . We want to see all Female observation. We select all columns from ucb data frame where Gender is equal to “Female“.

sqldf(“select * from ucb where Gender = ‘Female'”)

We want to see all Admitted students.

sqldf(“select * from ucb where Admit = ‘Admitted'”)

It shows admitted students department-wise in decreasing number of students admitted.

sqldf(“select * from ucb where Admit = ‘Admitted’ order by Freq DESC”)

We want to see list of departments .

sqldf(“select distinct Dept from ucb”)

Output:

It shows total number of students admitted .

sqldf(“select sum(Freq) from ucb where Admit = ‘Admitted'”)

Output:

It shows total number of students rejected in admission.

sqldf(“select sum(Freq) from ucb where Admit = ‘Rejected'”)

Output:

It shows total number of males got admitted .

sqldf(“select sum(Freq) as total_dudes from ucb where Admit = ‘Admitted’ AND Gender = ‘Male'”)

Output:

It shows total frequency of females who are rejected.

sqldf(“select sum(Freq) as total_ladies from ucb where Admit = ‘Rejected’ AND Gender = ‘Female'”)

Output:

It shows average number of admitted student per department .

sqldf(“select Dept, avg(Freq) as average_admitted from ucb where Admit = ‘Admitted’ group by Dept”)

Output:

It shows minimum number of observations of students rejected .

sqldf(“select min(Freq) from ucb where Admit = ‘Rejected'”)

Output:

We create a majors data frame as:

majors <- data.frame(major = c(“math”, “biology”, “engineering”, “computer science”, “history”, “architecture”), Dept = c(LETTERS[1:5], “Other”), Faculty = round(runif(6, min = 10, max = 30)))

It shows majors dataset as:

sqldf(“select * from majors”)

Output:

It shows frequency count in between 25 and 120.

sqldf(“select * from ucb where Freq between 25 AND 120”)

Output:

It shows all gender observations. We used “%” to match one or more characters .

sqldf(“select * from ucb where Gender Like ‘%male%'”)

Output:

We used “Ma%” which means Gender starts with Ma should be selected .

sqldf(“select * from ucb where Gender Like ‘Ma%'”)

Output:

It shows observations where Gender is Female and Freq is greater than 125.

sqldf(“select * from ucb where Gender = ‘Female’ AND Freq >= 125 “)

Output:

We are using nested statements . First , the inner query inside “()” should executed first. Then , external query executed. The internal query is ” select max(Freq) from ucb where Admit = ‘Admitted’)” . It shows maximum frequency of admitted students .

In external query , it shows department where frequency is maximum. The output shows department name which have maximum number of frequency of admitted students.

sqldf(“select Dept from ucb where Freq = (select max(Freq) from ucb where Admit = ‘Admitted’)”)

Output:

We have internal query as ” select max(Freq) from ucb where Gender = ‘Female’)” . We want to see maximum number of frequency of females . So , in external query we want to see department where frequency is maximum. The output shows department name where females have large frequency .

sqldf(“select Dept from ucb where Freq = (select max(Freq) from ucb where Gender = ‘Female’)”)

Output:

We want to check department name having maximum number of frequency in majors dataset.

sqldf(“select Dept from majors where Faculty = (select max(Faculty) from majors)”)

Output:

We want to join ucb and majors dataset where both have common departments.

sqldf(“select * from ucb inner join majors on ucb.Dept = majors.Dept”)

Output:

We are applying left joint to returns all rows from the left dataset , and rows have common department from right dataset. It shows all rows from ucb dataset and rows with common department from majors dataset.

sqldf(“select * from ucb left join majors on ucb.Dept = majors.Dept”)

Output:

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top