The SEI CERT C Coding Standard contains three recommendations pertaining to dead code.
- MSC07-C. Detect and remove dead code
- MSC12-C. Detect and remove code that has no effect or is never executed
- MSC13-C. Detect and remove unused values
This document is about alerts that are mapped to MSC12-C. We analyze all the MSC12-C alerts mapped to git or zeek. We did not target alerts mapped to MSC07-C or MSC13-C, although some alerts might have been included in our analysis...not all SA tools broke down dead code in the same way as the SEI CERT standard.
Here we analyze and categorize all the alerts produced for git or zeek that pertain to MSC12-C.
For each category of alerts, we can assign it an attribute:
Attribute | Meaning |
---|---|
NO_EXAMPLES | There are no examples that we should repair |
LOW | There is repairable code and repairing it is low priority |
HIGH | There is repairable code and repairing it is high priority |
For these categories, we ignored alerts in sqlite3.c, mainly because sqlite3.c is external to zeek.
A zeek-only alert, but relevant to C.
auxil/broker/caf/libcaf_core/caf/detail/parser/read_bool.hpp:32
is code that defines a finite state machine, using macros. The important macro is state()
, defined in fsn.hpp:42
We cannot repair automatically because the label is inside a macro definition. (This applies to all our 'label' alerts!)
From git:builtin/difftool.c:722:
argc = parse_options(argc, argv, prefix, builtin_difftool_options,
builtin_difftool_usage, PARSE_OPT_KEEP_UNKNOWN_OPT |
PARSE_OPT_KEEP_DASHDASH);
should be:
(void) parse_options(argc, argv, prefix, builtin_difftool_options,
builtin_difftool_usage, PARSE_OPT_KEEP_UNKNOWN_OPT |
PARSE_OPT_KEEP_DASHDASH);
Assignment can be removed, the variable is never read (outside this line).
However, git often used argc
to indicate the number of program arguments that have not yet been processed, and this invariant was important to developers even if argc
is never subsequently read. Therefore while removing the assignment is safe, it is not uncontroversial.
"Redundant assignment of 'yymsp[0].minor.yy528' to itself."
From zeek:auxil/broker/3rdparty/sqlite3.c:165963
Ignore this one, as it is in sqlite3.c. (In theory, we would remove self-assignments)
From zeek:auxil/highwayhash/highwayhash/hh_avx2.h:330
const V4x64U zero = ba ^ ba;
Don't repair, this is a common way to quickly produce 0. (Or perhaps replace with "0").
Line 332 of same file:
const V4x64U ones = ba == ba; // FF .. FF
Don't repair, I suspect using true
or 1 is incorrect. (Repair should be manual here, something like ~zero
)
From zeek:auxil/broker/include/broker/span.hh:157
return subspan(num_elements - num_elements, num_elements);
Theoretically repair with zero. Not sure this is worth repairing.
In conclusion, I wouldn't repair these automatically, given how few there are. Each repair depends on the operator. The unrepaired code may be nonportable as well.
From zeek:auxil/c-ares/src/lib/ares__parse_into_addrinfo.c:188:
if (!got_cname || (got_cname && cname_only_is_enodata))
Replace with:
if (!got_cname || cname_only_is_enodata)
since (a || (!a && b)) == a || b
This might be less readable, and it is the only instance.
From zeek:auxil/c-ares/src/lib/ares_parse_caa_reply.c:135:
if (caa_curr->plength <= 0 || (int)caa_curr->plength >= rr_len - 2)
Replace with:
if (caa_curr->plength == 0 || (int)caa_curr->plength >= rr_len - 2)
However, I am inclined to ignore s/<=0/=0/ repairs, in that <= 0 is (a) harmless (b) makes the code more resillient (in case pLength becomes signed) and (c) easier to read.
Ignore this one, as it is in sqlite3.c. (In theory, we remove expressions that test negativity.)
From git:builtin/receive-pack.c:692:
const char *retval = NONCE_BAD;
...
retval = NONCE_OK;
This is similar to an EXP33-C hazard:
Consider this code:
int x; // uninitialized
...
x = 5;
Solution: If there exists a code path where x
is read w/o being initialized, then x
should be initialized when declared. This means that the EXP33-C alert is a true positive and MSC12-C would be a false positive.
On the other paw, if there is no such code path, then that means that after x
is declared (whether initialized or not) it is always initialized before being read. This means that EXP33-C is a false positive.
But MSC12-C should also be a false positive. This is because initializing a variable to a known good value is a good defense-in-depth strategy, and makes the code less brittle, in case the code is later modified to read the value without writing to it first.
In conclusion redundant initializations should never be fixed, but should be treated as false positives for MSC12-C.
This is the only message produced by rosecheckers...the others are all produced by cppcheck
From git:builtin/am.c:313
assert(!state->author_name);
Any MSC12-C alert inside an assert statement is a false positive. (perhaps we should fix rosecheckers)
Likewise MSC12-C alerts inside git's error(_(
function (macro) are also false positives.
From studying at these alerts, we have identified the following common problems:
Prominence: Not observed Example:
if (cant_happen) {
#ifdef FOO
...
}
Solution: Should be already dealt with in our conditional macro detection scripts (which might need to be more robust). See our conditional compilation document for more information.
Prominence: Not observed Example:
#ifdef FOO
static int flag = true;
#else
static int flag = false;
#endif
if (flag) {
}
Notes: Probably not detectable in the general case. Solution: We may let the user decide this. There may be alert categories where the repair should be disabled by default. But MSC12-C does not seem to suffer from this problem. (Perhaps MSC07-C alerts may be more susceptible).
In our alert data format and test documents, we describe our process of selecting alerts to try to repair. Selection is indicated by the randomness
field in the alerts.json
test suite data files. As those document indicate, the selection should be random.
But the alerts mapped to MSC12-C had to undergo different selection than alerts for the other CERT guidelines. Consequently, entries for the randomness
field have a different set of meanings and selection methods for the MSC12-C test suite data than they do for the other test suite data. For the MSC12-C test suite data, we initially had a disappointing set of test results (and of randomly selected alerts). To produce more-useful test data, we created a new sample set, reusing some specially-selected alerts from our initial set of randomly-selected alerts, and selecting other new target alerts to adjudicate.
The following table indicates the value of the randomness
field for MSC12-C alerts:
Label | Meaning |
---|---|
disqualified random | Originally selected but no longer included due to being in a now-excluded category |
random and sample | Originally selected and remains included |
sample | Not previously selected, but now part of the sample testing |
disqualified sample | Originally selected but later disqualified |
random | Originally selected, but no longer included due to being "boring" |
An alert with randomness=random
is in a targeted alert category, but one (or more) of the original randomly-selected alerts was chosen for the sample testing set, and this alert was not.
Generally, what we did was to filter our initial set of disappointing randomly-selected alerts to remove certain types of alerts. Then, we selected representatives of particular categories of alerts we decided to include. We started this first by checking if there was already at least one such alert in the randomly-selected group. If yes, then at least one of those previously-randomly-selected alerts was reselected. If not, then an alert was selected from the set of target alerts for that target category.
Specifically, for each of the MSC12-C test data files, below we provide our analysis of the alerts in the file, then describe how the sample set was selected, including information about excluded alert categories and targeted alert categories.
103 of the 131 alerts are false and disqualified, and about unused labels. (Label '.*' is not used
)
By "unused labels", we mean there are no goto
statements directed to the labels. The code was produced by yacc or bison and provided many goto
labels in order to simulate a state diagram. While we could have tried to remove these labels, we decided that this did not constitute an improvement to the code, and thus we labeled these to be false positives.
This exemplifies a potential hazard of repairing a CERT recommendation: Unlike CERT rules, the nature of recommendations implies that complying with the recommendation might not always improve the code. In theory, we could program Redemption to remove these labels.
For the sample dataset, we selected and adjudicated 5 alerts that are NOT about an unused label. (There were already 2 randomly-selected and adjudicated alerts that were not about an unused label.)
16 of the 25 alerts are about an unused argc
in:
argc = parse_options(argc, argv, ...);
The messages were Assignment of function parameter has no effect outside the function.
For the sample dataset, we:
- selected 1 of these 16 for the sample set. (The randomly-selected alerts already included 4.)
- selected and adjudicated 4 more alerts from the remaining 9 alerts out of the 25 alerts. (The randomly-selected alerts already included 1 from this set.)
That provided 5 "diverse" alerts for the sample set (which includes some randomly-selected alerts).
We decided to ignore all rosecheckers MSC12-C alerts and not to try to repair them or analyze their code. We believe they are all un-repairable. Many of them complain of expressions inside macros (Git and Zeek use multi-line macros extensively), complicating the analysis. We decided that even if some small number of these alerts are true positives, the high false-positive rate means there is too little value in these alerts to justify repairing them.
Our analysis of the rosecheckers source code's MSC12-C checker reveals the following:
rosechecker's MSC12-C alert only fires on 'expression statements' (a statement that is a full expression). To be flagged, this expression statement must not be the last statement in a basic block. Also, the expression must not be the following types: Assignment, Conditional, Pointer Dereference, Function Call, delete, addition, subtraction
.
Details are provided below of our original analysis of rosecheckers MSC12-C alerts for zeek and git.
366 of the 480 alerts have a valid (nonzero) line number. 105 have line number 0, so we consider them "disqualified" and excluded them. 9 alerts were unknown and uncategorized, at that time.
281 of the 366 alerts are about an assert
function (or a macro), including variants like ASSERT_UNUSED()
.
85 of the 366 alerts remained beyond that as candidates for the sample set, since they have a valid line number and the line has no assert function.
For the sample dataset, we:
- selected 1 of these 281 (there were 5 such alerts in the original randomly-selected set),
- selected 4 more alerts from the remaining 85 (there were 0 such alerts in the original randomly-selected set)
331 of the 721 alerts are about the standard assert()
macro.
331 of the 721 alerts are about an error()
or error_errno()
function.
17 of the 721 are about a VERIFY_CI()
macro
42 of the 721 alerts are about something besides assert
s or error
s or VERIFY_CI
For the sample dataset, we:
- selected 1 of the 331 alerts about
assert
(there was 1 such alert in the randomly-selected set) - selected 1 of the 331 alerts about
error
(there were 4 such alerts in the randomly-selected set) - selected 1 of the 17 alerts about
VERIFY_CI
- selected 2 more alerts (ideally about distinct problems)