Install sqldf package as below :
install.packages(“sqldf”)
We load sqldf package as :
library(sqldf)
We are working on UCBAdmissions dataset. We load UCBAdmissions dataset .
data(“UCBAdmissions”)
The data contains students admission status as Admit , Gender , Dept as Department , Freq as Frequency of students.
We convert dataset to data frame .
ucb <- as.data.frame(UCBAdmissions)
We are using sqldf() to run SQL commands. We select all columns of ucb .
sqldf(“select * from ucb”)
We can use conditional statement by using where statement . We want to see all Female observation. We select all columns from ucb data frame where Gender is equal to “Female“.
sqldf(“select * from ucb where Gender = ‘Female'”)
We want to see all Admitted students.
sqldf(“select * from ucb where Admit = ‘Admitted'”)
It shows admitted students department-wise in decreasing number of students admitted.
sqldf(“select * from ucb where Admit = ‘Admitted’ order by Freq DESC”)
We want to see list of departments .
sqldf(“select distinct Dept from ucb”)
Output:
It shows total number of students admitted .
sqldf(“select sum(Freq) from ucb where Admit = ‘Admitted'”)
Output:
It shows total number of students rejected in admission.
sqldf(“select sum(Freq) from ucb where Admit = ‘Rejected'”)
Output:
It shows total number of males got admitted .
sqldf(“select sum(Freq) as total_dudes from ucb where Admit = ‘Admitted’ AND Gender = ‘Male'”)
Output:
It shows total frequency of females who are rejected.
sqldf(“select sum(Freq) as total_ladies from ucb where Admit = ‘Rejected’ AND Gender = ‘Female'”)
Output:
It shows average number of admitted student per department .
sqldf(“select Dept, avg(Freq) as average_admitted from ucb where Admit = ‘Admitted’ group by Dept”)
Output:
It shows minimum number of observations of students rejected .
sqldf(“select min(Freq) from ucb where Admit = ‘Rejected'”)
Output:
We create a majors data frame as:
majors <- data.frame(major = c(“math”, “biology”, “engineering”, “computer science”, “history”, “architecture”), Dept = c(LETTERS[1:5], “Other”), Faculty = round(runif(6, min = 10, max = 30)))
It shows majors dataset as:
sqldf(“select * from majors”)
Output:
It shows frequency count in between 25 and 120.
sqldf(“select * from ucb where Freq between 25 AND 120”)
Output:
It shows all gender observations. We used “%” to match one or more characters .
sqldf(“select * from ucb where Gender Like ‘%male%'”)
Output:
We used “Ma%” which means Gender starts with Ma should be selected .
sqldf(“select * from ucb where Gender Like ‘Ma%'”)
Output:
It shows observations where Gender is Female and Freq is greater than 125.
sqldf(“select * from ucb where Gender = ‘Female’ AND Freq >= 125 “)
Output:
We are using nested statements . First , the inner query inside “()” should executed first. Then , external query executed. The internal query is ” select max(Freq) from ucb where Admit = ‘Admitted’)” . It shows maximum frequency of admitted students .
In external query , it shows department where frequency is maximum. The output shows department name which have maximum number of frequency of admitted students.
sqldf(“select Dept from ucb where Freq = (select max(Freq) from ucb where Admit = ‘Admitted’)”)
Output:
We have internal query as ” select max(Freq) from ucb where Gender = ‘Female’)” . We want to see maximum number of frequency of females . So , in external query we want to see department where frequency is maximum. The output shows department name where females have large frequency .
sqldf(“select Dept from ucb where Freq = (select max(Freq) from ucb where Gender = ‘Female’)”)
Output:
We want to check department name having maximum number of frequency in majors dataset.
sqldf(“select Dept from majors where Faculty = (select max(Faculty) from majors)”)
Output:
We want to join ucb and majors dataset where both have common departments.
sqldf(“select * from ucb inner join majors on ucb.Dept = majors.Dept”)
Output:
We are applying left joint to returns all rows from the left dataset , and rows have common department from right dataset. It shows all rows from ucb dataset and rows with common department from majors dataset.
sqldf(“select * from ucb left join majors on ucb.Dept = majors.Dept”)
Output: