I’ve been learning SAS, and a part of learning a new language is learning its footguns. One I’ve found relates to SAS’s merge
-construct,
specifically that you can omit the by
statement, which leads to wildly unexpected behaviour, and even whose is completely silent. However, you
can set the
OPTIONS MERGENOBY=ERROR;
to get SAS to report an error if the by
statement is missing. I’m going to include that option in every SAS-script I write from now on.
Background
In SAS, merge
is used to perform what is known as joins in relational databases. For example, if we have two tables
Table A:
id | Name
---+------
1 | Alice
2 | Bob
3 | Eve
Table B:
id | Country
---+---------
1 | Austria
2 | Belgium
3 | Ethiopia
Then we can merge
them together by the id
column to get:
merge A B; by id;
id | Name | Country
---+-------+---------
1 | Alice | Austria
2 | Bob | Belgium
3 | Eve | Ethiopia
That’s all nice and dandy. It can even handle missing values:
Table C:
id | Department
---+------------
1 | Accounting
3 | Engineering
merge A C; by id;
id | Name | Department
---+-------+------------
1 | Alice | Accounting
2 | Bob | .
3 | Eve | Engineering
Notice how SAS uses .
to represent missing values. However, what happens if we omit the by
statement? Catastrophe:
merge A C;
id | Name | Department
---+-------+------------
1 | Alice | Accounting
3 | Bob | Engineering
3 | Eve | .
That is SO FAR from what I wanted to happen. What happens is that SAS simply runs through the rows of both tables and overwrites
the i
‘th row of the first table with the contents of the i
‘th row of the the second table.
The solution
Since this is never what I want to in my work (at least to far), I can tell SAS to make a merge
without a by
an error:
OPTIONS MERGENOBY=ERROR;
Simply place this at the top of your SAS file, and this particular footgun will never shoot you again. Until you forget to add it.