Notes on SAS

The Data Step

Subsetting IF

In a data step, a subsetting IF statement keeps only the observations for which the condition is true; all other observations are excluded from the dataset.
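* tornados_1980s is a placeholder dataset name;
data tornados_1980s;
infile FileName;
* city is character and :$20. reads up to 20 characters where a plain $ would keep only 8;
input year city :$20. damages;
* this limits input data to 1980s data;
if 1980 <= year <= 1989;
run;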
• IF .. IN statement
if year in (1980, 1981, 1982);
• AND, OR
if year = 1980 and city = 'Baltimore';
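As a minimal sketch of the subsetting IF, the step below reads a few rows with datalines in place of an external file. The dataset name and all of the values are made up for illustration only.

```sas
* Hypothetical dataset name and made-up values, for illustration only;
data tornados_1980s;
    input year city :$12. damages;
    * Only rows that pass this test are written to the dataset;
    if 1980 <= year <= 1989;
    datalines;
1979 Baltimore 120
1980 Baltimore 300
1985 Tulsa 450
1991 Omaha 80
;
run;

proc print data=tornados_1980s;
run;
```

With these rows, the printed dataset should contain only the 1980 and 1985 observations.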

infile 'filename'

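In the data step, read data from an external file with the infile statement, followed by an input statement that names the variables to read. Character variables such as city need a $ informat. Here tornados is a placeholder dataset name.
data tornados;
infile 'filename';
input year city :$20. cost;
run;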
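Real files often have a header row and comma-separated fields. A hedged sketch of how the infile statement might handle that case follows; the file path and dataset name are hypothetical, and the options assume a comma-delimited file with one header line.

```sas
* Hypothetical file path and dataset name;
data tornados;
    * dsd treats commas as delimiters and firstobs=2 skips a header line;
    infile 'C:\data\tornados.csv' dsd firstobs=2;
    input year city :$20. cost;
run;
```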

Set

• Use the set statement to create a new dataset from an existing one. The following creates a dataset of 1980s tornado data from the larger set of tornado data. Here tornados and tornados_1980s are placeholder dataset names.
data tornados_1980s;
set tornados;
if 1980 <= year <= 1989;
run;
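To try the set statement without an external file, one option is to build a small source dataset with datalines first. The sketch below assumes made-up rows and hypothetical dataset names.

```sas
* Made-up rows standing in for the larger tornado dataset;
data tornados;
    input year city :$12. cost;
    datalines;
1979 Baltimore 120
1984 Tulsa 450
1992 Omaha 80
;
run;

* Copy only the 1980s observations into a new dataset;
data tornados_1980s;
    set tornados;
    if 1980 <= year <= 1989;
run;

proc print data=tornados_1980s;
run;
```

The original tornados dataset is left unchanged; only tornados_1980s is filtered.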

The PROC Step

PROC SORT

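• The sort procedure sorts data. You can sort by multiple fields.
• You can also print the sorted data grouped by a field.
* tornados is a placeholder dataset name;
proc sort data=tornados;
by year city;
proc print data=tornados;
by year;
run;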
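A self-contained sketch of sorting and then printing by group is shown below. The data values and dataset names are made up, and the out= option is used here only as one way to keep the unsorted original.

```sas
* Made-up rows in no particular order;
data tornados;
    input year city :$12. cost;
    datalines;
1985 Tulsa 450
1980 Omaha 90
1980 Baltimore 300
;
run;

* out= writes the sorted copy to a new dataset and leaves the original as is;
proc sort data=tornados out=tornados_sorted;
    by year city;
run;

* The data must already be sorted by year for this by statement to work;
proc print data=tornados_sorted;
    by year;
run;
```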

PROC Univariate

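PROC Univariate generates descriptive statistics; the histogram statement adds a histogram of the named variable. Here tornados is a placeholder dataset name.
proc univariate data=tornados;
histogram year;
run;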

PROC means

Use proc means when you are only interested in basic descriptive statistics, such as the count, mean, standard deviation, minimum, and maximum.
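A minimal sketch of proc means follows, assuming a small made-up dataset. The class statement is one way to get the statistics separately for each group.

```sas
* Made-up rows, for illustration only;
data tornados;
    input year city :$12. cost;
    datalines;
1980 Baltimore 300
1980 Omaha 90
1985 Tulsa 450
1985 Omaha 150
;
run;

* Request specific statistics for cost, computed separately for each year;
proc means data=tornados n mean std min max;
    class year;
    var cost;
run;
```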

PROC freq

• Generates frequency tables for categorical data.
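As a sketch of proc freq, the example below builds a one-way table and a two-way table from made-up rows; the dataset name and values are assumptions.

```sas
* Made-up rows, for illustration only;
data tornados;
    input year city :$12. cost;
    datalines;
1980 Baltimore 300
1980 Omaha 90
1985 Tulsa 450
1985 Omaha 150
;
run;

proc freq data=tornados;
    * One-way table of counts and percentages for each city;
    tables city;
    * Two-way table with year as rows and city as columns;
    tables year*city;
run;
```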

PROC gplot

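PROC gplot draws plots. In the plot statement, the first variable goes on the vertical axis and the second on the horizontal axis. Here tornados is a placeholder dataset name.
proc gplot data=tornados;
plot year*cost;
title 'Year by Cost tornados';
run;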

PROC corr

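Computes the correlations among the variables listed in the var statement. Here grades is a placeholder dataset name.

proc corr data=grades;
var exam1 exam2 hwscore;
run;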

PROC reg

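PROC reg fits a linear regression. Options after the slash in the model statement request extra output:
• p: prints observed values, predicted values, and residuals
• r: same as p, plus a residual analysis (standard errors, studentized residuals, Cook's D)
• clm: 95% confidence interval for the mean response at each observation
• cli: 95% prediction interval for an individual response at each observation
Here grades is a placeholder dataset name.
proc reg data=grades;
model final=exam1 hwscore / p r cli clm;
plot final*hwscore;
run;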

Multiple Regression Analysis

Variable Selection

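SAS has several methods for selecting variables, chosen with the selection= option of the model statement. The examples below use rsquare, adjrsq, cp (Mallows' Cp), forward, stepwise, and backward selection.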
proc reg data=cdi;
model y = x1-x8 /selection=rsquare best=1;
model y = x1-x8 /selection=adjrsq best=5;
model y = x1-x8 /selection=cp best=10;
model y = x1-x8 /selection=forward slentry=0.10;
model y = x1-x8 /selection=stepwise slentry=0.10 slstay=0.10;
model y = x1-x8 /selection=backward slstay=0.10;
run;