Skip to the content.

I’ve been learning SAS, and a part of learning a new language is learning its footguns. One I’ve found relates to SAS’s merge-construct, specifically that you can omit the by statement, which leads to wildly unexpected behaviour, and even whose is completely silent. However, you can set the

OPTIONS MERGENOBY=ERROR;

to get SAS to report an error if the by statement is missing. I’m going to include that option in every SAS-script I write from now on.

Background

In SAS, merge is used to perform what is known as joins in relational databases. For example, if we have two tables

Table A:
id | Name
---+------
1  | Alice
2  | Bob
3  | Eve

Table B:
id | Country
---+---------
1  | Austria
2  | Belgium
3  | Ethiopia

Then we can merge them together by the id column to get:

merge A B; by id;
id | Name  | Country
---+-------+---------
1  | Alice | Austria
2  | Bob   | Belgium
3  | Eve   | Ethiopia

That’s all nice and dandy. It can even handle missing values:

Table C:
id | Department
---+------------
1  | Accounting
3  | Engineering

merge A C; by id;
id | Name  | Department
---+-------+------------
1  | Alice | Accounting
2  | Bob   | .
3  | Eve   | Engineering

Notice how SAS uses . to represent missing values. However, what happens if we omit the by statement? Catastrophe:

merge A C;
id | Name  | Department
---+-------+------------
1  | Alice | Accounting
3  | Bob   | Engineering
3  | Eve   | .

That is SO FAR from what I wanted to happen. What happens is that SAS simply runs through the rows of both tables and overwrites the i‘th row of the first table with the contents of the i‘th row of the the second table.

The solution

Since this is never what I want to in my work (at least to far), I can tell SAS to make a merge without a by an error:

OPTIONS MERGENOBY=ERROR;

Simply place this at the top of your SAS file, and this particular footgun will never shoot you again. Until you forget to add it.