Operators
To perform computations on variables, we use different operators.
Operator   Description          Priority
+          Addition             Lowest
-          Subtraction          Lowest
*          Multiplication       Next highest
/          Division             Next highest
**         Exponentiation       Highest
-          Negation (prefix)    Highest
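We create a computed variable as follows:
x = 2+3*4;
The value of x is 14, because multiplication has a higher priority than addition: 3*4 = 12 is evaluated first, then 2 + 12.
x = 2**3 + 4 * -5;
This gives x the value -12: exponentiation and negation are evaluated first (2**3 = 8), then multiplication (4 * -5 = -20), and finally addition (8 + -20 = -12).
We can check the output as follows: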
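Here is a minimal sketch, assuming the statements are placed inside a DATA step. The data set name calc and the variables x1 to x3 are our own choices. PUT writes the results to the SAS log, and x3 adds parentheses to show that they override the default priority.

```sas
/* Hypothetical data set CALC: evaluate the expressions from the text */
data calc;
   x1 = 2 + 3*4;       /* * before +: 2 + 12 = 14 */
   x2 = 2**3 + 4*-5;   /* ** and negation first: 8 + (-20) = -12 */
   x3 = (2 + 3)*4;     /* parentheses override priority: 5*4 = 20 */
   put x1= x2= x3=;    /* writes x1=14 x2=-12 x3=20 to the log */
run;
```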
Missing values in SAS
Numeric variables: missing values are represented by a single period (.).
Character variables: missing values are represented by a blank (' ').
Special missing values are represented by a single period followed by a single letter or an underscore, such as .A, .S, .Z, or ._. They are available only for numeric variables and are used to distinguish between different types of missing values.
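To read special missing values from raw data, the MISSING statement declares which letters in the data stand for special missing values. In this sketch, the data set scores, its values, and the meanings of A (absent) and S (sick) are hypothetical. The IF statements show that .A and .S can be told apart, while the MISSING() function is true for every kind of missing value.

```sas
/* Declare that the letters A and S in raw data mean .A and .S */
missing A S;

data scores;
   length status $ 8;
   input id marks;
   /* Hypothetical meanings: .A = absent, .S = sick */
   if marks = .A then status = 'Absent';
   else if marks = .S then status = 'Sick';
   else if missing(marks) then status = 'Unknown';  /* ordinary missing (.) */
   else status = 'Present';
   datalines;
1 78
2 .
3 A
4 S
;
run;

proc print data=scores;
run;
```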
We create a new data set “abc” to store missing values in the numeric variable “marks”.
Output:
Length
The length of a variable is the number of bytes SAS allocates to store its values. By default, numeric variables are stored in 8 bytes, and character variables read with list input also get a length of 8 bytes.
We can increase or decrease the length of a variable by using the LENGTH statement.
We create a new data set to store the id and name of employees. We assign a length of 5 bytes to id and 12 bytes to name.
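Data new;
/* LENGTH comes before INPUT so the lengths are set before SAS first sees the variables */
length id 5 name $ 12;   /* id: 5-byte numeric, name: 12-byte character */
input id name;           /* name is already character, so no $ is needed here */
datalines;
12544848 Srinivasan
145752 GoravKhurana
;
run;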
Output:
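For comparison, here is a sketch that reads the same records without the LENGTH statement (the data set name new_default is our own). With list input, name gets the default length of 8 bytes, so the values are truncated to Srinivas and GoravKhu. PROC CONTENTS lists each variable with its type and length, which is a convenient way to check the lengths SAS assigned.

```sas
/* Same records, but without a LENGTH statement */
data new_default;
   input id name $;   /* list input: name gets the default length of 8 */
   datalines;
12544848 Srinivasan
145752 GoravKhurana
;
run;

/* Lists each variable with its type and length in bytes */
proc contents data=new_default;
run;
```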
Delimiter
A delimiter is a sequence of one or more characters (such as a blank, comma, or tab) that separates data values or columns.
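As a sketch of a delimiter other than a blank, the DLM= option of the INFILE statement tells SAS to split values at commas, and the DSD option treats two consecutive commas as a missing value. The data set emp_csv and its values are hypothetical.

```sas
/* Comma-separated raw data read with list input */
data emp_csv;
   infile datalines dlm=',' dsd;   /* comma delimiter; ",," means a missing value */
   input id name $ dept $;
   datalines;
1,Rick,IT
2,Dan,OPS
3,,FIN
;
run;
```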
Raw data
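Raw data is data collected from a source that has not yet been stored in a SAS data set. In a DATA step, we can read raw data that is typed directly into the program after the DATALINES statement.
We create a new data set as follows: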
Data app;
input id name$;
datalines;
102 danny
103 Sam
;
run;
We submit the program by clicking the Run (Submit) icon in the toolbar.
Output:
As you can see from the output, numeric values are displayed right-aligned and character values are displayed left-aligned.
Different input methods in SAS
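1. List Input Method
In this method, the variables are listed in the INPUT statement in the order in which they appear in the data, with a dollar sign ($) after each character variable. The same delimiter must separate every pair of adjacent values. A missing value that is simply left blank causes problems in the output, because SAS reads the next available value in its place.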
DATA TEMP;
INPUT EMPID ENAME $ DEPT $ ;
DATALINES;
1 Rick IT
2 Dan OPS
3 Tusar IT
4 Pranab OPS
5 Rasmi FIN
;
RUN;
Output:
Note the following points when using the list input method:
- Fields must be separated by at least one blank (or another delimiter).
- Fields must be read in order from left to right.
- You cannot skip or re-read fields.
- Missing values must be represented by a placeholder such as a period.
- Character values cannot contain embedded blanks.
- The default length of character values is 8 bytes. A longer value is truncated when it is written to the data set.
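To see why a placeholder is needed, here is a sketch with hypothetical data in which the height for subject 1103 is left blank. By default, SAS moves to the next line to look for the missing value, so Thomas gets height 1104 and the record for Philips is lost. The second step shows one alternative, the INFILE option MISSOVER, which sets values that are missing at the end of a line to missing instead.

```sas
/* Height for subject 1103 is left blank instead of written as a period */
data blank_missing;
   input subj name $ gender height;
   datalines;
1102 Alice 1 125
1103 Thomas 1
1104 Philips 2 140
;
run;
/* Only 2 observations: Thomas gets height=1104 and Philips is lost.
   The log notes that SAS went to a new line when INPUT reached past the end of a line. */

/* MISSOVER: values not found on the current line are set to missing */
data blank_fixed;
   infile datalines missover;
   input subj name $ gender height;
   datalines;
1102 Alice 1 125
1103 Thomas 1
1104 Philips 2 140
;
run;
```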
We create a new data set “temp” as follows:
Data temp;
input subj name $ gender height ;
datalines;
1102 Alice 1 125
1103 Thomas 1 120
1104 Philips 2 140
1105 . 2 100
1106 Alex . 145
;
run;
We used a period (.) to represent each missing value.
Output:
2. Column Input Method
When the data values are standard numeric or character values arranged in neatly defined columns, we can use the column input method, which reads each value from fixed column positions.
We create a new data set to store ID and Name values. In the INPUT statement, we list each variable name followed by its column positions: ID is read from columns 1 to 4, and Name from columns 5 to 10.
Data new;
Input ID 1-4 Name $5-10 ;
datalines;
1201Ram
1205Shyam
;
run;
Output:
The important points to note about column input are:
- Missing values can be left blank.
- Column positions determine the length of a character variable, so values can be longer than the default 8 characters and can contain embedded blanks.
- It allows part of a value to be read and allows values to be re-read.
- Spaces are not required between the data values.
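Here is a sketch of the last points, using hypothetical phone data: area re-reads columns 1 to 3 of phone, and name is read from columns 14 to 30, so it can be longer than 8 characters and can contain a blank.

```sas
/* Hypothetical phone list: columns can be read out of order and re-read */
data phones;
   input phone $ 1-12    /* full number, columns 1-12 */
         area  $ 1-3     /* re-reads the first three columns */
         name  $ 14-30;  /* 17 columns wide, embedded blank allowed */
   datalines;
415-555-0132 Maria Lopez
212-555-0199 Ken Ito
;
run;
```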
We create a new data set “first” as follows:
/*Column Input Method */
Data first;
input Id 1-5 first_name $ 6-16 last_name $ 17-23 weight 28-29 height 24-26 gender $ 30-31;
datalines;
1024 Satya      Shiv   145 58 F
1025 Ramashraya Mishra 125 45
1028 Neelkanth  Datta  120 72 M
1029 Srinivasan Akhil  158    F
;
run;
We left fields blank to represent missing values (gender for 1025 and weight for 1029).
Output:
3. Named Input Method
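In this method, each value in the raw data is preceded by its variable name and an equal sign (for example, ID=101), so the values can appear in any order. In the INPUT statement, each variable name is followed by an equal sign, and by a dollar sign ($) for character variables. The delimiter should be uniform between any pair of adjacent values.
We create a “new” data set with ID and Name: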
Data new;
input ID= Name=$;
datalines;
Name=Jack ID=101
ID=102 Name=Grammy
;
run;
Output:
4. Formatted Input Method
In this method, each value is read starting at a specified column, and an informat tells SAS how to read it.
We use the following INPUT statement for formatted input:
INPUT <pointer-control> variable informat. ;
Where
- pointer-control tells SAS at what column to start reading data value
- variable is the name of the variable being created
- informat is a special instruction that tells SAS how to read raw data values
The pointer-control appears in brackets(<>) to indicate that it is optional.
There are two pointer controls :
- The @n pointer control moves the input pointer to a specific column number n
- The +n pointer control moves the input pointer forward n columns to a column number that is relative to the current position
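Here is a minimal sketch of formatted input with informats for nonstandard values (the data set hires and its values are hypothetical): date9. reads a date such as 15JAN2020, and comma6. reads 52,000 as 52000. The FORMAT statement only controls how the values are displayed. The same columns could also be reached with relative pointers, for example @1 id 4. +1 hired date9. +1 salary comma6.

```sas
data hires;
   /* @n moves to a column; the informat says how to read the value there */
   input @1 id 4. @6 hired date9. @16 salary comma6.;
   format hired date9. salary comma8.;   /* display formats only */
   datalines;
1024 15JAN2020 52,000
1025 03MAR2021 48,500
;
run;
```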
The following SAS program uses the @n column pointer control to read ID, NAME and DEPARTMENT into a temporary data set “temp”.
In this program, we use @1 to read ID from the first column, @4 to read NAME starting at the fourth column, and @13 to read DEPARTMENT from the thirteenth column.
DATA TEMP;
INPUT @1 ID $ @4 NAME $ @13 DEPARTMENT $ ;
DATALINES;
19 Sammy    IT
152 Dan     OPS
85 Sangha   IT
142 Charu   OPS
25 Riya     FIN
;
RUN;
Output:
The following SAS program uses the @n column pointer control to read two character variables (last_name and first_name) and two numeric variables (weight and height) into a temporary data set “temp”. Because @n moves to an absolute column, the fields can be read in any order.
data temp;
input @13 last_name $8.
@6 first_name $7.
@27 height 3.
@24 weight 2.;
datalines;
1024 Alice  Smith    1 65 185 2,025
1025 Mary   Jones    2 68 158 4,065
1026 Thomas James    2    125 1,524
1028 Fred   Churchil 1 63 182 3,854
;
run;
Output:
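We use the +n relative pointer control to move forward from the current position. After last_name is read from columns 13-20, the pointer is at column 21; +3 moves it three columns ahead to column 24 to read weight. After weight (columns 24-25) is read, +1 moves one column forward to column 27 to read height.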
data temp;
input @6 first_name $7.
@13 last_name $8.
+3 weight 2.
+1 height 3.;
datalines;
1024 Alice  Smith    1 65 185 2,025
1025 Mary   Jones    2 68 158 4,065
1026 Thomas James    2    125 1,524
1028 Fred   Churchil 1 63 182 3,854
;
run;
Output:
We create a temporary data set “new” that uses +0 to start reading at the first column of the line and +1 to move forward one column to read the name variable.
Data new;
input +0 Id 3.
+1 name$;
datalines;
101 Ram
102 Shyam
;
run;
Output:
5. Modifying List Input Method
List input can be modified by using different modifiers. We use two modifiers:
- Ampersand (&): allows you to read character values that contain single embedded blanks.
- Colon (:): allows you to read nonstandard data values and character values that are longer than eight characters but contain no embedded blanks.
The Ampersand(&) Modifier
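It allows us to use list input to read character values containing single embedded blanks. Because a single blank is now part of the value, the value must be followed by at least two consecutive blanks to separate it from the next value.
In the data set below, we read city names that contain embedded blanks.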
DATA citypops;
length city $ 20;
input city & population;
DATALINES;
New York  8008278
Los Angeles  3694820
Chicago  2896016
Houston  1953631
Philadelphia  1517550
Phoenix  1321045
San Antonio  1144646
San Diego  1223400
Dallas  1188580
San Jose  894943
;
RUN;
Output:
We can also write the above program by giving the informat $12. after the ampersand instead of using a LENGTH statement:
DATA citypops;
input city & $12. population;
DATALINES;
New York  8008278
Los Angeles  3694820
Chicago  2896016
Houston  1953631
Philadelphia  1517550
Phoenix  1321045
San Antonio  1144646
San Diego  1223400
Dallas  1188580
San Jose  894943
;
RUN;
Output:
The Colon(:) Modifier
It allows us to use list input to read nonstandard data and character values that are longer than eight characters but contain no embedded blanks. The colon (:) tells SAS to read the value until a blank (or other delimiter) is encountered, using the informat that follows it.
We create a new data set that includes id and name. We use the colon (:) modifier to read names longer than eight characters. Because the colon modifier stops at the first blank, only Satya is read for the first record.
Data new;
input id name : $12.;
datalines;
102 Satya Shiv
054 Dharamendra
154 Asharam
;
run;
Output:
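The colon modifier also works with numeric informats for nonstandard data. In this sketch (the data set sales and its values are hypothetical), comma10. reads each amount up to the next blank and removes the commas, so 1,250,000 becomes 1250000.

```sas
data sales;
   /* colon: read until the next blank, using the informat that follows */
   input region : $12. amount : comma10.;
   datalines;
Northeast 1,250,000
Southwest 987,500
;
run;
```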
6. Line Pointer Control Method
In a raw data file, when the data values for one observation are spread over several records, we use line pointer controls to read them.
There are two types of line pointer controls :
- The forward slash (/) moves the pointer to the next record, a line location relative to the current one.
- The #n specifies the absolute number of line to which you want to move the pointer.
Reading Multiple Records Sequentially
We can read multiple records sequentially by using multiple INPUT statements. Each INPUT statement reads from a new record, so three INPUT statements read three lines for one observation.
We create a new data set that combines three records into one observation:
Data multi;
input ID;
input Name $15.;
input Address $24.;
datalines;
101
Satya Shiv Modi
D-175 , D.D.A. Flats
102
Rakesh verma
71/B , Durgapuri Chowk
;
run;
Output:
We use the #n line pointer control to tell SAS to advance to a specific record before reading the next data value.
We create a new data set “multi” to store ID, Name and Address. The #1 line pointer control tells SAS to go to the first record to read the ID value. Then #2 moves to the second record to read the Name value, and #3 moves to the third record to read the Address value.
Data multi;
input #1 ID
#2 Name $15.
#3 Address $24.;
datalines;
101
Satya Shiv Modi
D-175 , D.D.A. Flats
102
Rakesh verma
71/B , Durgapuri Chowk
;
run;
Output:
The forward slash (/) line pointer control tells SAS to advance the input pointer to the next record.
We tell SAS to read ID in the first record; the slash (/) then moves the pointer to the next record to read the Name value, and the second slash moves it to the next record to read the Address value.
Data new;
input ID / Name $15. / Address $24.;
datalines;
101
Satya Shiv Modi
D-175 , D.D.A. Flats
102
Rakesh verma
71/B , Durgapuri Chowk
;
run;
Output:
Reading Multiple Records Non-Sequentially
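We can also read data values non-sequentially. First, we advance to the third record to read the Address value, then we move back to the second record to read the Name value, and finally we move to the first record to read the ID value. Because the highest line pointer in the INPUT statement is #3, SAS treats every three records as one observation.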
Data multi;
input #3 Address $24.
#2 Name $15.
#1 ID;
datalines;
101
Satya Shiv Modi
D-175 , D.D.A. Flats
102
Rakesh verma
71/B , Durgapuri Chowk
;
run;
Output:
Reading Multiple Records Sequentially and Non-Sequentially
We can also combine both approaches. Here, we advance to the third record to read the Address value, then jump back to the first record to read the ID value, and then read sequentially by using the slash (/) to advance to the second record to read the Name value.
Data multi;
input #3 Address $24.
#1 ID
/ Name $15.;
datalines;
101
Satya Shiv Modi
D-175 , D.D.A. Flats
102
Rakesh verma
71/B , Durgapuri Chowk
;
run;
Output:
7. Mixed Input Method
We can mix the input methods to read data values.
We read the data as follows:
- First, we use column input to read the first field (ParkName) from columns 1 to 22.
- We use list input to read the next two fields (State and Year), because they are separated by a single blank.
- The values in the last field (Acre) start in a fixed column, so we use the @40 column pointer control to start reading at column 40.
Data parks;
input ParkName $1-22 State $ Year @40 Acre ;
datalines;
Yellowstone            ID 1872         4065493
Everglades             FL 1934         1398800
Yosemite               CA 1864         760917
Great Smoky Mountains  NC 1926         520269
Wolf Trap Farm         VA 1966         130
;
RUN;
Output: