Assertions should be fatal

Thanks to Dawson Engler for helping get us connected to Coverity last Fall. Dan Veditz and I have done several scans of Mozilla sources (Firefox branch and trunk, Thunderbird with Calendar enabled) using Coverity’s SWAT static analysis toolset.

(As an aside: I am a big fan of the Meta Compiler approach, and I have hopes of helping extend it to the Mozilla-specific domains of HTML, XUL, CSS, the DOM, and JavaScript. Some day, practical programming languages will get beyond out-Java’ing Java or out-C++’ing C++ and let us declare our invariants, instead of trying to assert them with runtime checks that duplicate and invert logic, and that are compiled out of production code.)

The good news is that our nominal error rates are respectable at first glance: as good as or better than other large open source projects. The results show many trivial redundant null checks, missing or inconsistent null checks, and the like. These numerous bugs will be filed and fixed in batches, and we haven’t classified them all yet, so the bug-filing won’t be immediate. These are low priority bugs.

There were fewer more serious errors caught, including dead code, but the most worrisome problems visible to the static analyzers were not obviously bugs. Rather, they were cases where an index into an array would be out of bounds only if assertions (NS_ASSERTION, etc.) were not seen as fatal when botched.

The toolset can be taught that NS_ASSERTION is fatal (exits the control flow graph), just as <assert.h>’s assert macro is understood by default to be fatal. But thanks to some bad ancient history in the Mozilla project, for most builds, NS_ASSERTION is not fatal!

(The bad history dates from 1999 and involved various people, mostly Netscape employees, all “trying to get their jobs done” in the face of bogus assertions from some of their fellow hackers. Instead of forcing bad assertions to be fixed, at some opportunity/social/political cost to oneself, it was too easy to go along with the minority view that “assertions should just be warnings”.

To be fair, some wanted all assertions to be combined via newer macros with run-time checks that would not compile out of production builds, but that went too far: performance-critical code with a limited set of trusted callers should not have to run-time check all indices.

Anyway, I’m sorry I gave in and went along with the assertions-as-warnings flow. That was a decision made in haste and repented at leisure!)

So back to the present. Given assertions-as-warnings, the problem becomes: do we have enough test coverage with DEBUG builds to be sure that developers will actually botch assertions whose invariants are being violated by a rare bad caller or broken data dependency? And of those few developers, how many will actually notice the warning messages in the cluttered standard output or standard error of the Mozilla-based program they’re testing? And of those, how many will actually do anything about fixing the broken assertions?

Anyone committed to reality has to believe that the odds ratios multiply into too small a fraction for us to be confident that assertions are effective “path killers” for the static analyzers to respect.

I go further: anyone committed to software quality should want fatal assertions. Assertions are proof-points, to oneself and to others reading one’s code. They are not idle wishes, or doubtful questions posed to the software gods. If your code has a bogus assertion, you should want to know why it botched, and how to fix it. You should want to keep it in a fixed form, rather than remove it, if at all possible.

Dan and I pored over the several dozen potential static and dynamic overrun errors that the Coverity tools found, and at least for the ones in Firefox, we convinced ourselves that the callers were well-behaved. So again, based on our four eyeballs and fallible brains, we believe that the tools found nothing overtly bad or exploitable.

But, we must fix this historic assertions-are-warnings botch.

Just making assertions fatal now will turn tinderboxes on Linux and Windows, at least, bright orange.

I call that a good start, and then we can close the tree, divide the labor among all hands on deck at, and fix or remove the bogus assertions. We could do this right at the start of the 1.9 alpha cycle. You may well object that this will be too disruptive. If so, please comment here proposing a better plan that actually fixes the problem in a finite number of days. We shouldn’t put this off beyond the start of the 1.9 cycle.