Notes on SAS

The Data Step

Subsetting IF

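In a DATA step you can exclude observations from the output data set with a subsetting IF statement: an observation is written out only when the condition is true. Here FileName is a fileref; a quoted path also works, as in the infile section below.
data tornados_1980s;
  infile FileName;
  * list input would otherwise cap character values at 8 characters;
  length city $ 20;
  input year city $ damages;
  * this limits the data set to 1980s data;
  if 1980 <= year <= 1989;
run;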
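  • IF ... IN: keep observations whose value matches one in a list
    if year in (1980, 1981, 1982);
  • AND, OR: combine conditions (year is numeric, so compare it to a number, not a quoted string)
    if year = 1980 and city = 'Baltimore';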
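A subsetting IF behaves like deleting every observation that fails the condition. The sketch below makes that explicit with IF ... THEN DELETE; it uses DATALINES with made-up values (hypothetical, for illustration only) so it runs without an external file.

```sas
/* Hypothetical inline data (made-up values) so the step runs without an external file */
data tornados_1980s;
  length city $ 20;            /* list input would otherwise cap city at 8 characters */
  input year city $ cost;
  /* Same effect as the subsetting IF "if 1980 <= year <= 1989;":
     observations outside the range are deleted instead of written out */
  if not (1980 <= year <= 1989) then delete;
  datalines;
1979 Omaha 120
1984 Baltimore 300
1991 Tulsa 80
;
run;
```

Only the 1984 record ends up in tornados_1980s. Both forms also drop observations with a missing year, since a missing value fails the range comparison.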

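infile 'filename'

In the DATA step, read raw data from an external file with the INFILE statement; the INPUT statement that follows names the variables to read (a $ after a name marks it as character).
data tornados;
  infile 'tornados.dat';
  length city $ 20;
  input year city $ cost;
run;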
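The first example reads `infile FileName;` without quotes, which SAS treats as a fileref rather than a path; a fileref has to be assigned first with a FILENAME statement. A minimal sketch, assuming tornados.dat holds space-separated year, city, and cost values; the fileref name tordata is hypothetical.

```sas
/* Assumption: tornados.dat is a space-delimited text file in the
   current directory with one record per line: year city cost */
filename tordata 'tornados.dat';   /* tordata is a hypothetical fileref (8 characters max) */

data tornados;
  infile tordata;                  /* fileref: no quotes */
  length city $ 20;                /* avoid truncating city names to 8 characters */
  input year city $ cost;
run;
```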

Set

  • Use the SET statement to create a new data set from an existing SAS data set. The following creates a data set of 1980s tornado data from the larger tornados data set.
    data tornados_1980s;
      set tornados;
      if 1980 <= year <= 1989;
    run;
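When reading with SET, a WHERE statement can do the same filtering as the subsetting IF, but it selects observations as they are read, so rows outside the range never enter the DATA step. A sketch, assuming the tornados data set from the step above exists:

```sas
/* Alternative to the subsetting IF: filter while reading from tornados */
data tornados_1980s;
  set tornados;
  where year between 1980 and 1989;   /* inclusive on both ends */
run;
```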

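The PROC Step

PROC SORT

  • PROC SORT sorts a data set by one or more BY variables; without OUT= it replaces the input data set with the sorted version.
  • PROC PRINT can also print by a variable: a BY statement produces a separate listing for each value, and the data must already be sorted by that variable.
    proc sort data=tornados;
      by year city;
    run;
    proc print data=tornados;
      by year;
    run;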
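Two PROC SORT options that are often useful alongside multiple BY variables are OUT=, which keeps the original data set unchanged, and DESCENDING, which reverses the order of the variable immediately after it. A sketch assuming the tornados data set; tornados_sorted is a hypothetical output name.

```sas
/* OUT= writes the sorted copy to a new data set; tornados is left as-is.
   DESCENDING applies only to year here; city stays ascending */
proc sort data=tornados out=tornados_sorted;
  by descending year city;
run;

/* The BY statement must match the sort order, including DESCENDING */
proc print data=tornados_sorted;
  by descending year;
  var city cost;   /* limit the printed columns */
run;
```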

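PROC UNIVARIATE

PROC UNIVARIATE generates detailed descriptive statistics (moments, quantiles, extreme observations) for numeric variables; the HISTOGRAM statement adds a histogram.
proc univariate data=tornados;
  histogram year;
run;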
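Without a VAR statement, PROC UNIVARIATE reports on every numeric variable. A sketch that restricts the analysis to cost and overlays a fitted normal curve on the histogram, assuming the tornados data set:

```sas
proc univariate data=tornados;
  var cost;                  /* statistics tables for cost only */
  histogram cost / normal;   /* overlay a fitted normal density curve */
run;
```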

PROC MEANS

Use PROC MEANS when you need only basic descriptive statistics; by default it reports N, mean, standard deviation, minimum, and maximum for each numeric variable.
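A sketch of PROC MEANS on the tornados data set, listing the default statistics explicitly and using CLASS to get one row of statistics per year:

```sas
proc means data=tornados n mean std min max;
  class year;   /* separate statistics for each year */
  var cost;     /* analyze cost only */
run;
```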

PROC FREQ

  • Generates frequency tables (counts and percentages) for categorical variables, including cross-tabulations of two or more variables.
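A sketch of a one-way and a two-way table on the tornados data set; the asterisk in the TABLES statement requests a cross-tabulation:

```sas
proc freq data=tornados;
  tables city;                      /* one-way: count and percent per city */
  tables year*city / norow nocol;   /* year by city, row and column percents suppressed */
run;
```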

PROC GPLOT

proc gplot data=tornados;
  * plot vertical*horizontal: year on the y axis, cost on the x axis;
  plot year*cost;
  title 'Year by Cost tornados';
run;
quit;
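In newer SAS releases, PROC SGPLOT (ODS Graphics) draws the same scatter plot without SAS/GRAPH. A sketch assuming the tornados data set; x= and y= name the horizontal and vertical axes explicitly.

```sas
title 'Year by Cost tornados';
proc sgplot data=tornados;
  scatter x=cost y=year;   /* same layout as plot year*cost in GPLOT */
run;
title;                     /* clear the title so later output does not inherit it */
```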

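PROC CORR

PROC CORR computes correlation coefficients (Pearson by default) between each pair of VAR variables, along with simple statistics and p-values for testing that each correlation is zero. This example uses a separate grades data set.
proc corr data=grades;
  var exam1 exam2 hwscore;
run;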
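To correlate the predictors with one outcome instead of printing the full matrix, add a WITH statement. A sketch assuming grades also contains final (as in the PROC REG example); requesting SPEARMAN alone would drop the Pearson table, so both are listed.

```sas
proc corr data=grades pearson spearman;
  var exam1 exam2 hwscore;   /* columns of the output */
  with final;                /* rows: correlate each VAR variable with final */
run;
```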

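PROC REG

MODEL statement options used here:
  • p: prints observed values, predicted values, and residuals
  • r: everything p prints, plus standard errors of the predicted values and residuals, studentized residuals, and Cook's D
  • clm: 95% confidence limits for the mean of the dependent variable at each observation
  • cli: 95% prediction limits for an individual observation
proc reg data=grades;
  model final = exam1 hwscore / p r cli clm;
  plot final*hwscore;
run;
quit;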
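The P, R, CLM, and CLI options only print their values. To keep them for later steps, an OUTPUT statement can write them to a data set. A sketch; the data set name grades_pred and the new variable names are hypothetical.

```sas
proc reg data=grades;
  model final = exam1 hwscore;
  output out=grades_pred                   /* hypothetical output data set */
         p=yhat r=resid                    /* predicted values and residuals */
         lclm=lower_mean uclm=upper_mean   /* limits for the mean, as with CLM */
         lcl=lower_pred ucl=upper_pred;    /* prediction limits, as with CLI */
run;
quit;
```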

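Multiple Regression Analysis

Variable Selection

The SELECTION= option on the MODEL statement of PROC REG offers several variable-selection methods. For RSQUARE, ADJRSQ, and CP, BEST= limits how many models are listed (for RSQUARE, per number of predictors); SLENTRY= and SLSTAY= set the significance levels for a variable to enter or stay in the model.
proc reg data=cdi;
  model y = x1-x8 / selection=rsquare best=1;
  model y = x1-x8 / selection=adjrsq best=5;
  model y = x1-x8 / selection=cp best=10;
  model y = x1-x8 / selection=forward slentry=0.10;
  model y = x1-x8 / selection=stepwise slentry=0.10 slstay=0.10;
  model y = x1-x8 / selection=backward slstay=0.10;
run;
quit;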
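For reference, the criteria behind SELECTION=ADJRSQ and SELECTION=CP, written for a candidate model with p parameters (including the intercept) fitted to n observations, where SSE_p is that model's error sum of squares and MSE_full is the mean squared error of the model with all of x1-x8:

```latex
R^2_{a,p} = 1 - \frac{n-1}{n-p}\left(1 - R^2_p\right)

C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p)
```

Adjusted R-square penalizes extra parameters, so unlike plain R-square it can fall when a weak predictor is added; a subset whose C_p is close to p is commonly read as having little bias.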