SAS Basic Operators & Methods

Operators

To perform computations on variables, we use the following arithmetic operators.

Operator     Description        Priority

   +         Addition           Lowest

   -         Subtraction        Lowest

   *         Multiplication     Next highest

   /         Division           Next highest

   **        Exponentiation     Highest

   -         Negation           Highest

We create a computed variable as follows:

x = 2+3*4;

The value of x is 14.

x = 2**3 + 4 * -5;

This gives x the value -12, because the exponentiation and the negation are evaluated first: 8 + 4 * (-5).

We can check the output as:
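
A minimal sketch for verifying the two expressions: a DATA _NULL_ step evaluates them and writes the results to the SAS log with PUT. The variables y and z are added here only for illustration.

```sas
/* Check operator priority by writing the results to the SAS log */
data _null_;
   x = 2 + 3 * 4;        /* multiplication first: 2 + 12 */
   y = 2**3 + 4 * -5;    /* exponentiation and negation first: 8 + (-20) */
   z = (2 + 3) * 4;      /* parentheses override the default priority */
   put x= y= z=;         /* log shows: x=14 y=-12 z=20 */
run;
```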

Missing values in SAS

Numeric variable: a missing value is represented by a single period (.).

Character variable: a missing value is represented by a single blank (' ').

Special missing values are represented by a period followed by a single letter or an underscore, for example .A, .S, .Z, and ._. They are available only for numeric variables and are used to distinguish between different types of missing values.

We create a new data set “abc” to store a missing value in the numeric variable “marks”.
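
The original program is not reproduced here, so the following is only a sketch of what such a step could look like. The variable id and all data values are hypothetical; only the data set name abc and the variable marks come from the text.

```sas
/* Hypothetical data: marks is missing for the second student */
data abc;
   input id marks;
   datalines;
101 78
102 .
103 65
;
run;

/* Display the data set; the missing mark is printed as a period */
proc print data=abc;
run;
```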

Output :

Length

The length of a variable is the number of bytes SAS allocates for storing the variable. SAS uses 8 bytes as the default length of a variable.

We can increase or decrease the length of a variable by using the LENGTH statement. For a numeric variable, a shorter length reduces the largest integer that can be stored exactly; it is not a count of digits.

We create a new data set to store the id and name of employees. We assign id a length of 5 bytes and name a length of 12 bytes.

Data new;

length id 5 name $ 12;

input id name;

datalines;

12544848 Srinivasan

145752 GoravKhurana

;

run;

Output:
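
One way to confirm the lengths SAS actually assigned is PROC CONTENTS, which lists each variable with its type and length. This is a small sketch that rebuilds the data set with one of the records above so that it runs on its own.

```sas
data new;
   length id 5 name $ 12;   /* id: 5 bytes (numeric), name: 12 bytes (character) */
   input id name;
   datalines;
145752 GoravKhurana
;
run;

/* The Len column of the variable list reports 5 for id and 12 for name */
proc contents data=new;
run;
```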

Delimiter

A delimiter is a sequence of one or more characters or symbols that separates data values or columns.
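
As an illustration, the sketch below reads comma-separated values by naming the delimiter with the DLM= option of the INFILE statement. The data set name app_csv and the data lines are assumptions made for this example; a blank is the default delimiter when DLM= is not given.

```sas
data app_csv;
   infile datalines dlm=',';   /* treat a comma as the delimiter */
   input id name $;
   datalines;
102,danny
103,Sam
;
run;
```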

Raw data

Raw data is data that has been collected from a source and not yet processed. We can read raw data in a DATA step by placing it after the DATALINES statement.

We create a new data set as:

Data app;

input id name$;

datalines;

102 danny

103 Sam

;

run;

We submit the program with the Run (Submit) button in the SAS editor toolbar.

Output:

As you can see from the output, numeric data is displayed right-aligned and character data is displayed left-aligned.
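
The output referred to here can be produced with PROC PRINT. The sketch assumes the data set app created above still exists in the WORK library.

```sas
/* Print the data set app: id is right-aligned, name is left-aligned */
proc print data=app;
run;
```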

Different input methods  in SAS

1. List input Method

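In this method, the variables are listed in the INPUT statement with their data types, and the values are read in that order. The delimiter must be uniform between every pair of adjacent columns. A missing value that is not marked with a placeholder causes problems in the output.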

DATA TEMP;

INPUT   EMPID ENAME $ DEPT $ ;

DATALINES;

1 Rick  IT

2 Dan  OPS

3 Tusar  IT

4 Pranab  OPS

5 Rasmi  FIN

;

RUN;

Output :

Note the following points when using the list input method:

  • Fields must be separated by at least one blank (or another delimiter).
  • Fields must be read in order from left to right.
  • You cannot skip or re-read fields.
  • Missing values must be represented by a place holder such as a period.
  • Character values cannot contain embedded blanks.
  • The default length of character values is 8 bytes. A longer value is truncated when it is written to the data set.
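
The last point can be seen in a small sketch. The data set names short and full are hypothetical; the value Ramashraya has ten characters, so it does not fit in the default 8 bytes.

```sas
/* Default length: only the first 8 characters are kept (Ramashra) */
data short;
   input name $;
   datalines;
Ramashraya
;
run;

/* A LENGTH statement before INPUT reserves 12 bytes, so the full value is kept */
data full;
   length name $ 12;
   input name $;
   datalines;
Ramashraya
;
run;
```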

We create a new dataset ‘temp’ as:

Data temp;

input subj name $ gender height ;

datalines;

1102 Alice 1 125

1103 Thomas 1 120

1104 Philips 2 140

1105 . 2 100

1106 Alex . 145

;

run;

We used a period (.) to represent each missing value.

Output:

2. Column Input Method

When the data values are standard numeric or character values arranged in neatly defined columns, we can read them with the column input method. It reads values that are entered in fixed columns.

We create a new data set to store ID and Name values. We list the variable names in the INPUT statement and specify the corresponding column positions. ID is read from columns 1 to 4, and Name is read from columns 5 to 10.

Data new;

Input ID 1-4 Name $5-10 ;

datalines;

1201Ram

1205Shyam

;

run;

Output:

The important points to note about column input are:

  • Missing values can be left blank.
  • Column positions determine the length of a character variable, so values can exceed the default 8 characters and can contain embedded blanks.
  • It allows part of a value to be read and allows values to be re-read.
  • Spaces are not required between the data values.

We create a new data set “first” as:

/*Column Input Method  */

Data first;

input Id 1-5 first_name $ 6-16 last_name $ 17-23 weight 28-29 height 24-26 gender $ 30-31;

datalines;

1024 Satya      Shiv   145 58 F

1025 Ramashraya Mishra 125 45

1028 Neelkanth  Datta  120 72 M

1029 Srinivasan Akhil  158    F

;

run;

We used blank spaces to represent the missing values.

Output :

3. Named Input Method

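In this method, the variables are listed in the INPUT statement with their data types, each followed by an equal sign. In the data, every value is preceded by its variable name and an equal sign, so the values can appear in any order. The delimiter should be uniform between every pair of adjacent values.

We create a “new” data set with ID and Name: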

Data new;

input ID= Name=$;

datalines;

Name=Jack ID=101

ID=102 Name=Grammy

;

run;

Output:

4. Formatted Input Method

In this method, each variable is read from a fixed starting point by using an informat.

We use the following form of the INPUT statement for formatted input:

INPUT <pointer-control> variable informat. ;

Where

  • pointer-control tells SAS at which column to start reading the data value
  • variable is the name of the variable being created
  • informat is a special instruction that tells SAS how to read the raw data value

The pointer-control appears in angle brackets (<>) to indicate that it is optional.

There are two column pointer controls:

  • The @n pointer control moves the input pointer to a specific column number n
  • The +n pointer control moves the input pointer forward n columns, relative to the current position
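
As a small illustration of a pointer control combined with an informat, the sketch below reads a value that contains a comma. COMMA5. is a standard SAS informat that removes the comma while reading five columns. The data set name pay and the variable name income are assumptions; the source data does not name this last column.

```sas
data pay;
   input @1  id     4.         /* columns 1-4 */
         @31 income comma5.;   /* columns 31-35: "2,025" is read as 2025 */
   datalines;
1024 Alice  Smith    1 65 185 2,025
1025 Mary   Jones    2 68 158 4,065
;
run;
```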

The following SAS program uses the @n column pointer control to read ID, NAME and DEPARTMENT into a temporary data set “temp”.

In this program, we use @1 to read ID starting at the first column, @4 to read NAME starting at the fourth column, and @13 to read DEPARTMENT starting at the thirteenth column.

DATA TEMP;

INPUT   @1 ID $ @4 NAME $ @13 DEPARTMENT $ ;

DATALINES;

19 Sammy    IT

152 Dan      OPS

85 Sangha     IT

142 Charu   OPS

25 Riya     FIN

;

RUN;

Output:

The following SAS program uses the @n column pointer control to read two character variables (last_name and first_name) and two numeric variables (weight and height) into a temporary data set “temp”. Because @n is an absolute position, the variables can be read in any order.

data temp;

input @13 last_name $8.

      @6 first_name $7.

      @27 height 3.

      @24 weight 2.;

datalines;

1024 Alice  Smith    1 65 185 2,025

1025 Mary   Jones    2 68 158 4,065

1026 Thomas James    2    125 1,524  

1028 Fred   Churchil 1 63 182 3,854

;

run;

Output:

Here we use the +n relative pointer control to move forward from the current position. After last_name is read, +3 skips three columns to reach the weight variable, and +1 then skips one column to reach the height variable.

data temp;

input @6 first_name $7.

      @13 last_name $8.

      +3 weight 2.

      +1 height 3.;

datalines;

1024 Alice  Smith    1 65 185 2,025

1025 Mary   Jones    2 68 158 4,065

1026 Thomas James    2    125 1,524  

1028 Fred   Churchil 1 63 182 3,854

;

run;

Output:

 

We create a temporary data set “new” that uses +0 to start reading at the first column of the line and +1 to move forward one column before reading the name variable.

Data new;

input +0 Id 3.

      +1 name$;

datalines;

101 Ram

102 Shyam

;

run;  

Output:

5. Modifying List Input Method

List input can be modified by using format modifiers. We use two modifiers:

  • Ampersand (&): allows you to read character values that contain embedded blanks.
  • Colon (:): allows you to read nonstandard data values and character values that are longer than eight characters but have no embedded blanks.

The Ampersand(&) Modifier

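It allows us to use list input to read character values that contain single embedded blanks. Because a single blank no longer ends the value, each such value must be followed by two or more blanks.

We can read city names with embedded blanks in the data set below.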

DATA citypops;

length city $ 20;

input city &  population;

DATALINES;

New York  8008278

Los Angeles  3694820

Chicago  2896016

Houston  1953631

Philadelphia  1517550

Phoenix  1321045

San Antonio  1144646

San Diego  1223400

Dallas  1188580

San Jose  894943

;

RUN;

Output:

We can also write the above program with an informat in the INPUT statement instead of a LENGTH statement:

DATA citypops;

input city & $12. population;

DATALINES;

New York  8008278

Los Angeles  3694820

Chicago  2896016

Houston  1953631

Philadelphia  1517550

Phoenix  1321045

San Antonio  1144646

San Diego  1223400

Dallas  1188580

San Jose  894943

;

RUN;

Output:

The Colon(:) Modifier

It allows us to use list input to read nonstandard data and character values that are longer than eight characters but contain no embedded blanks. The colon (:) indicates that SAS reads the value until a blank (or other delimiter) is encountered.

We create a new data set that contains id and name. The colon (:) modifier reads each name up to the next blank, so the first record stores only Satya.
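
The nonstandard-data case can be sketched as follows. The data set name sales, the variable names rep and amount, and the data values are hypothetical. The colon lets the COMMA10. informat read amounts of different widths, because reading stops at the next blank rather than after a fixed number of columns.

```sas
data sales;
   input rep : $12. amount : comma10.;   /* commas are removed from amount */
   datalines;
Dharamendra 1,250
Srinivasan 12,400,000
;
run;
```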

Data new;

input id   name : $12.;

datalines;

102 Satya Shiv

054 Dharamendra

154 Asharam

;

run;

Output:

6. Line pointer control Method

When the data values for one observation are spread over several records in a raw data file, we use line pointer controls to read them.

There are two types of line pointer controls:

  • The forward slash (/) moves the pointer to the next line, relative to the current one.
  • The #n control moves the pointer to the absolute line number n.

Reading Multiple Records Sequentially

We can read multiple records sequentially by using multiple INPUT statements. Each INPUT statement reads one record, so we need one statement per record.

We create a new data set that combines three records into one observation:

Data multi;

input  ID;

input Name $15.;

input Address $24.;

datalines;

101

Satya Shiv Modi

D-175 , D.D.A. Flats

102

Rakesh verma

71/B , Durgapuri Chowk

;

run;      

Output :

The hash (#n) line pointer control tells SAS to advance to a specific record before reading the next data value.

We create a new data set “multi” to store the ID, Name and Address values. The #1 line pointer control tells SAS to go to the first record to read the ID value. Then #2 moves to the second record to read the Name value, and so on.

Data multi;

input #1 ID

      #2 Name $15.

      #3 Address $24.;

datalines;

101

Satya Shiv Modi

D-175 , D.D.A. Flats

102

Rakesh verma

71/B , Durgapuri Chowk

;

run;      

Output:

The forward slash (/) line pointer control tells SAS to advance the input pointer to the next record.

We tell SAS to read ID from the first record. The slash (/) then moves to the next record to read the Name value, and the second slash moves to the following record to read the Address value.

Data new;

input ID / Name $15. / Address $24.;

datalines;

101

Satya Shiv Modi

D-175 , D.D.A. Flats

102

Rakesh verma

71/B , Durgapuri Chowk

;

run;      

Output:

Reading Multiple Records Non-Sequentially

We can read data values in a non-sequential manner. First, we advance to the third record to read the Address value, then we move to the second record to read the Name value, and lastly we move to the first record to read the ID value.

Data multi;

input #3 Address $24.

      #2 Name $15.

      #1 ID;

datalines;

101

Satya Shiv Modi

D-175 , D.D.A. Flats

102

Rakesh verma

71/B , Durgapuri Chowk

;

run;      

Output :

Reading Multiple Records Sequentially and Non-Sequentially

We can combine sequential and non-sequential reading. Here, we advance to the third record to read the Address value, jump to the first record to read the ID value, and then read sequentially by advancing to the second record to read the Name value.

Data multi;

input #3 Address $24.

      #1 ID

      / Name $15.;

datalines;

101

Satya Shiv Modi

D-175 , D.D.A. Flats

102

Rakesh verma

71/B , Durgapuri Chowk

;

run;      

Output:

7. Mixed Input Method

We can mix the input methods to read data values.

We read the data as:

  • First, we use column input to read the first field (ParkName).
  • We use list input to read the next two fields (State and Year), as they are separated by a single blank.
  • The values in the last field (Acre) are arranged in a defined column range. We use the @40 column pointer control to specify the column to start reading from.

Data parks;

input ParkName $1-22 State $ Year @40 Acre ;

datalines;

Yellowstone           ID 1872          4065493

Everglades            FL 1934          1398800

Yosemite              CA 1864            760917

Great Smoky Mountains NC 1926          520269

Wolf Trap Farm        VA 1966                130

;

RUN;

Output:

 
