Importing in R
R provides different functions for importing different types of data files. We can import files such as .txt, .csv, SPSS (.sav), SAS (.sas7bdat), Stata (.dta) and Excel (.xlsx).
read.table()
read.table() reads a file in table format and creates a data frame from it, with rows corresponding to the lines and columns corresponding to the variables in the file.
The syntax of read.table() is:
read.table(file, header, sep, dec, skip)
file – the name of the file from which the data are to be read.
header – a logical value indicating whether the first line of the file contains the names of the variables. The default value of header is FALSE. If the first line contains the variable names, we set header = TRUE.
sep – the character that separates the values on each line. The default separator is white space.
dec – the character used in the file for decimal points. The default is ".".
skip – the number of lines of the data file to skip before beginning to read data.
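To see how these arguments work together, here is a minimal sketch. The file contents, the column names and the object name "measurements" are made up for illustration; the file is written to a temporary location so the example can be run anywhere.

```r
# Write a small semicolon-separated file with one leading note line
# and decimal commas (all values are invented for this example).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "exported from a hypothetical instrument",
  "id;height;weight",
  "1;1,75;68,2",
  "2;1,62;54,9",
  "3;1,80;77,0"
), tmp)

# skip = 1 drops the note line, header = TRUE takes the next line as names,
# sep = ";" splits the fields, dec = "," reads "1,75" as the number 1.75.
measurements <- read.table(tmp, header = TRUE, sep = ";", dec = ",", skip = 1)

str(measurements)
unlink(tmp)  # remove the temporary file
```

Without dec = ",", the height and weight columns would be read as text rather than numbers.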
We can also copy data to the clipboard and read it from there. Suppose the copied dataset is separated by ",".
We create a new object "class" to store the data from the clipboard, passing "," as the separator. The dataset has no header, so R names the variables automatically as V1, V2, V3, V4 and V5.
class <- read.table("clipboard", sep = ",")
We can also read ".data" files with read.table(). The file here is "adult.data".
We create a new variable "data" to store the dataset. The values in the file are separated by ",", so we pass this character through the sep parameter (dec is only for the decimal point). Setting strip.white = TRUE removes the space that follows each comma.
data <- read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", sep = ",", header = FALSE, strip.white = TRUE)
We can view the first 6 observations of "data" with:
head(data)
We assign column names to "data" as follows:
names(data) <- c("age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "Annual Income")
We can see the changed column names in the "data" dataset:
head(data)
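As an alternative to renaming afterwards, the names can be supplied while reading through the col.names argument of read.table(). The sketch below uses a few illustrative rows in the same comma-plus-space layout; the object name "people" and the three column names are chosen only for this example.

```r
# A tiny stand-in file: comma-separated, with a space after each comma.
tmp <- tempfile(fileext = ".data")
writeLines(c(
  "39, State-gov, 77516",
  "50, Self-emp-not-inc, 83311"
), tmp)

# col.names sets the variable names at import time;
# strip.white = TRUE trims the blank that follows each comma.
people <- read.table(
  tmp,
  sep = ",",
  header = FALSE,
  strip.white = TRUE,
  col.names = c("age", "workclass", "fnlwgt")
)

names(people)
unlink(tmp)  # remove the temporary file
```

Note that names passed this way are made syntactically valid by default, so a name such as "education-num" would become "education.num" unless check.names = FALSE is also given.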
We can also import ".dat" files. Here we use the read.delim() function to read the file.
The file is read directly from the URL shown in the code below. The syntax of the read.delim() function is:
read.delim(file, header = TRUE, sep = "\t", ...)
where file – the name of the file from which the data are to be read.
header – a logical value indicating whether the first line of the file contains the names of the variables.
sep – the character that separates the values on each line. The default separator for read.delim() is the tab character ("\t").
This file is comma-separated and has no header line, so we set sep and header explicitly:
n <- read.delim("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports-extended.dat", sep = ",", header = FALSE)
We can see the first 6 observations of "n" with:
head(n)
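For comparison, here is a small sketch of read.delim() used with its defaults on a tab-separated file that has a header line. The file contents and the object name "airports_tab" are invented for illustration.

```r
# Write a small tab-separated file with a header line.
tmp <- tempfile(fileext = ".dat")
writeLines(c(
  "code\tcity\tcountry",
  "AAA\tFirst City\tCountry One",
  "BBB\tSecond City\tCountry Two"
), tmp)

# With the defaults (header = TRUE, sep = "\t") no extra arguments are needed,
# and values that contain spaces stay in one column.
airports_tab <- read.delim(tmp)

airports_tab
unlink(tmp)  # remove the temporary file
```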
Next we read a text file with the read.table() function. The file uses "," as its separator (delimiter).
The file is read directly from the URL shown in the code below.
We create a new object "data" as:
data <- read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/00246/3D_spatial_network.txt", sep = ",")
We can see the first observations of "data" with:
head(data)
We can read ".dta" (Stata) files with the "foreign" package. To use the package, we first have to install it, which we can do with the following code:
install.packages("foreign")
We then load the "foreign" package:
library(foreign)
The file brumm.dta can be downloaded and saved to the computer first.
STATA <- read.dta(file.choose())
When we run this code, a file-selection window pops up.
We choose "brumm.dta" and then click the Open button to open the file.
The file is loaded and stored as STATA. We can check the first 6 observations of the file with the following command:
head(STATA)
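If brumm.dta is not at hand, read.dta() can still be tried without the file dialog. The following sketch writes a small made-up data frame to a temporary ".dta" file with write.dta() from the same package and reads it back; the data and object names are assumptions for this example only.

```r
library(foreign)

# A small invented data frame to stand in for a real Stata dataset.
scores <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Write it in Stata format, then read it back by path instead of file.choose().
dta_path <- tempfile(fileext = ".dta")
write.dta(scores, dta_path)
scores_back <- read.dta(dta_path)

head(scores_back)
unlink(dta_path)  # remove the temporary file
```

Passing a path directly, as here, also makes a script repeatable, since nobody has to pick the file by hand each time.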
We can also download and unzip a zipped file in R. We create an object to store the URL of the zip file:
dataset_url <- "http://s3.amazonaws.com/practice_assignment/diet_data.zip"
We download the file with the download.file() function. The syntax of the function is:
download.file(url, destfile)
where url – the link of the file to download.
destfile – the name of the file in which the downloaded file is saved.
download.file(dataset_url, "diet_data.zip")
Now we unzip the archive to extract the files from it. The first argument is the name of the zip file to extract. The second argument, exdir, names the folder in which the extracted files are stored; the folder is created if it does not exist. Five files are extracted and saved in the "new" directory.
x <- unzip("diet_data.zip", exdir = "new")
We check the current working directory, which is where the zip file and the "new" directory are stored. In this case it is the Documents folder of the computer.
getwd()
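The download and unzip steps can be wrapped in one small helper so that the file is not downloaded again on every run. This is only a sketch: the function name fetch_zip() is hypothetical, and mode = "wb" is added on the assumption that the archive should be written as a binary file (which matters on Windows).

```r
# Hypothetical helper: download a zip archive once, then extract it.
fetch_zip <- function(url, destfile, exdir) {
  # Skip the download if the archive is already on disk.
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")
  }
  # unzip() returns the paths of the extracted files.
  unzip(destfile, exdir = exdir)
}

# Example call (needs network access, so it is left commented out):
# files <- fetch_zip(dataset_url, "diet_data.zip", "new")
# basename(files)
```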
We want to read the CSV file "Andy.csv", which is stored in the "new" directory. To read CSV files, we use the read.csv() function.
andy <- read.csv(file = "new/Andy.csv")
andy
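read.csv() is essentially read.table() with header = TRUE and sep = "," as defaults (a few other defaults, such as fill and comment.char, also differ). The sketch below reads the same small file both ways; the file contents and object names are made up and do not reflect the real columns of Andy.csv.

```r
# Write a small invented CSV file with a header line.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "name,day,weight",
  "Example,1,140",
  "Example,2,139"
), tmp)

# The same file read with read.csv() and with the equivalent read.table() call.
via_csv <- read.csv(tmp)
via_table <- read.table(tmp, header = TRUE, sep = ",")

# Compare the two results.
all.equal(via_csv, via_table)
unlink(tmp)  # remove the temporary file
```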
We create a character object "rwdt" to store the data.
rwdt <- "STATE READY TOTAL
AL 36 36
AK 5 8
AZ 15 16
UT 11 11
VT 33 49
VA 108 124
WV 27 36
WI 122 125
WY 12 14"
Printing "rwdt" shows the data with "\n" wherever a line ends; "\n" represents a new line.
We check the data type of "rwdt" with:
class(rwdt)
We use the textConnection() function to open a text connection for reading the data stored in the "rwdt" object.
raw_data <- textConnection(rwdt)
Printing "raw_data" shows the class, mode, text, opened, can read and can write attributes of the connection.
We read and store the data with the read.table() function, which converts it into a data frame.
raw <- read.table(raw_data, header=TRUE)
We close the text connection with the close.connection() function.
close.connection(raw_data)
raw
We check the data type of the "raw" object. It is a data frame.
class(raw)
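For short inline data like this, read.table() also accepts the string directly through its text argument, so no connection has to be opened or closed. A minimal sketch with a shortened copy of the data (the names "rwdt_small" and "raw_small" are chosen for this example):

```r
# A shortened copy of the inline data.
rwdt_small <- "STATE READY TOTAL
AL 36 36
AK 5 8
AZ 15 16"

# text = passes the string itself instead of a file name or connection.
raw_small <- read.table(text = rwdt_small, header = TRUE)

raw_small
class(raw_small)
```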
We can also read a ".dat" file with the fread() function. To use it, we have to install the "data.table" package.
We can install the "data.table" package with the following code:
install.packages("data.table")
We can load the "data.table" package with:
library(data.table)
We can open the documentation of the fread() function with:
?fread
The file ch11b.dat is read directly from its URL:
mydat <- fread("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat")
We can see the first 6 observations of mydat with:
head(mydat)
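fread() normally detects the separator and the column types on its own. The sketch below shows this on a few made-up, space-separated lines passed through the text argument; it only runs if the "data.table" package is installed, and the object name "dt" is an assumption for this example.

```r
# Run the example only when data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  # Three invented lines: a header and two rows, separated by spaces.
  dt <- data.table::fread(text = c("x y z", "1 2.5 a", "2 3.5 b"))

  print(dt)
  # The result is a data.table, which is also a data.frame.
  print(class(dt))
}
```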
We can import a SAS dataset. First, we need to install the "sas7bdat" package.
install.packages("sas7bdat")
We load the "sas7bdat" library with this code:
library(sas7bdat)
We can import the "blood.sas7bdat" data file with this code:
mySASData <- read.sas7bdat("D:/desktop2/blood.sas7bdat")
We can see the first 6 observations of the dataset.
head(mySASData)
We can also import SPSS files in R. For this we need the "foreign" package, which has to be installed if it is not already available.
install.packages("foreign")
We load the "foreign" package with:
library(foreign)
The example file used here is dataset.sav.
We use the read.spss() function to read it. Setting the to.data.frame parameter to TRUE returns the dataset as a data frame instead of a list.
We import the dataset and store it in the "rt" object:
rt <- read.spss("D:/desktop2/dataset.sav", to.data.frame = TRUE)
We want to read the "xlsx" file format. We can use the read.xlsx() function, which is provided by the "xlsx" package.
We can install the "xlsx" package with:
install.packages("xlsx")
We can load the "xlsx" package with:
library(xlsx)
The example file used here is datasets.xlsx.
We create a new variable "exfile" to store the "xlsx" data. The sheetIndex parameter gives the index of the sheet in the workbook to read.
exfile <- read.xlsx("D:/desktop2/datasets.xlsx", sheetIndex = 1)
We can see the first 6 observations of "exfile" with:
head(exfile)