Mozilla 2

Mozilla has gone from open source whipping boy in 1999 to open source poster child since 2004, due in large part to the success of Firefox. For that we can thank some amount of luck with our “timing the market” — the browser market that no one knew existed, until it was suddenly clear in the summer of 2004 that Microsoft had dropped the ball and Firefox was picking it up.

But we didn’t just get lucky. Putting the regular user first with Firefox’s default UI and interaction design, building on real-world web standards instead of start-from-zero fantasy web standards, persevering against bug adversity over time, and calculated (but secondary — the apps came first) work on “the platform”, especially to aid web developers and power users via add-ons, were all necessary to our success.

In particular, Firefox couldn’t have happened without Mozilla 1.0, a release milestone about which I wrote a requirements manifesto in early 2001. Mozilla 1.0 came out in June of 2002, and finally won back reputation lost due to Netscape 6 and the big rewrite. It set the stage for Firefox, Thunderbird, and other XUL applications.

For Mozilla 1.0, the architectural die was cast in late 1998 and 1999, so the process to reach 1.0, with its stable set of APIs on which most of a browser could be built, consisted mainly of the hard work of finishing. Since then, much of the code base has been revised in some way, but always incrementally. No “big bangs.”

The current CVS trunk, which will become the Mozilla 1.9 stable branch some time in the first half of next year, contains significant rendering rearchitecture, along with lots of other important work I won’t go into right now. The graphics work and the reflow refactoring are perhaps the most aggressive changes conceivable in the Mozilla milestone process. That process, more or less still the same as described in the “mozilla 1.0 manifesto”, depends crucially on community QA and patch contributions, which must converge on a product alpha/beta/release cycle to have enough real-world testing to claim meaningful code coverage. This cycle continues in 1.9, on which Firefox 3 will be based.


So Mozilla is a large, mature, fairly conservatively maintained open source code base. What else is new?

Lots: the web as a system of interoperating, open standards is under renewed assault. Browser competition is heating up, which is great except where it portends innovations locked into proprietary stacks. The mobile device space is growing dramatically, yet power storage and dissipation limits are temporarily repealing Moore’s Law there. Worse, carriers and handset makers control all the software most users get, leaving little room for the kind of choice that led to Firefox. Let’s save the mobile thought for another time.

What about desktop systems? In four years, desktops will have polycore CPUs, with (still specialized but increasingly useful) teraflops in the GPU. Multimedia and 3D content will become more tractable by mere mortals without expensive, complicated tools, or there will be a lot of wasted power and bandwidth. Never mind 3D — just better search, better text rendering, usable video, and all kinds of user-oriented optimization tasks should soak up those hardware threads’ cycles. Whatever happens, it seems a shame to leave all that capacity to games, other proprietary software, and closed content formats.

So what should we do in Mozilla-land about this?

First, we should do Mozilla 2, targeting 2008, “after” Mozilla 1.9 — but we should start working on it very soon, next month. We should not follow 1.9 with 1.10 and run headlong into the law of diminishing returns. Second, after Mozilla 2 is well under way, we will worry about those super-duper CPUs and GPUs. I have some thoughts, but I’ll save them for another post.

Mozilla 2 means, among other things, a chance to break frozen API compatibility. That removes constraints on the current architecture and lets us eliminate old APIs and their implementations, renew and improve the APIs and code we want to keep, and realize significant runtime and code-size wins. For instance, we can get rid of RDF, which seems to be the main source of the “Mozilla ugliness” humorously decried by Steve Yegge.

For Mozilla 2, we will have a JIT-oriented JavaScript VM (details soon) that supports the forthcoming ECMAScript Edition 4 (“JS2”) language. Among the desirable characteristics of this VM will be a conservative, incremental garbage collector (GC). If it makes sense, we can use this GC module to manage DOM object memory instead of using XPCOM reference counting. We can use its conservative scanning code to assist in cycle collection. And we can JIT calls directly into DOM glue code entry points (provided no JS mutation has overridden a method property value), bypassing the powerful but relatively slow typelib-based dispatching machinery of XPConnect.
This will kick Ajax performance in Firefox up a notch or three.
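
To make the fast path concrete, here is a minimal C++ sketch of the guard-then-direct-call pattern. Every name in it (Node_childCount_glue, MethodSlot, and so on) is invented for illustration and is not real Gecko or XPConnect API: the guard checks that no script has overridden the method property, and falls back to reflective dispatch when it has.

#include <cstdio>

// Hypothetical glue entry point for a DOM getter; a real JIT would
// call the compiled glue for, say, Node.childNodes.length.
static int Node_childCount_glue(void* node) {
    (void)node;
    return 3; // stand-in for real DOM work
}

// Stand-in for the powerful but slow typelib-based dispatch
// (what XPConnect does today): always correct, never fast.
static int ReflectiveDispatch(void* node, const char* name) {
    std::printf("slow path: resolving '%s' via typelibs\n", name);
    return Node_childCount_glue(node);
}

// The current value of a JS method property on a DOM prototype.
struct MethodSlot {
    void* fn;
};

static int CallChildCount(void* node, MethodSlot* slot) {
    // Guard: no JS mutation has overridden the method property,
    // so the JIT may bypass reflection and call glue directly.
    if (slot->fn == reinterpret_cast<void*>(&Node_childCount_glue))
        return Node_childCount_glue(node);         // fast path
    return ReflectiveDispatch(node, "childCount"); // fallback
}

int main() {
    MethodSlot slot = { reinterpret_cast<void*>(&Node_childCount_glue) };
    std::printf("fast: %d\n", CallChildCount(nullptr, &slot));
    slot.fn = nullptr; // simulate a script monkey-patching the method
    std::printf("patched: %d\n", CallChildCount(nullptr, &slot));
    return 0;
}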


This is just a start.

For Mozilla 2, because we can break API compatibility where it makes sense to do so, we can and will provide better APIs, “on the outside” of Gecko — and remove XPCOM API boilerplate “on the inside”. We can translate old C++ portability veneer into standard C++ where doing so does not cost us portability, performance, or correctness. We can even switch to C++ exceptions if doing so wins in both code size and runtime performance in a fair contest.

So instead of code like this:

PRBool
nsXULDocument::OnDocumentParserError()
{
  // don't report errors that are from overlays
  if (mCurrentPrototype && mMasterPrototype != mCurrentPrototype) {
    nsCOMPtr<nsIURI> uri;
    nsresult rv = mCurrentPrototype->GetURI(getter_AddRefs(uri));
    if (NS_SUCCEEDED(rv)) {
      PRBool isChrome = IsChromeURI(uri);
      if (isChrome) {
        nsCOMPtr<nsIObserverService> os(
          do_GetService("@mozilla.org/observer-service;1"));
        if (os)
          os->NotifyObservers(uri, "xul-overlay-parsererror",
                              EmptyString().get());
      }
    }
    return PR_FALSE;
  }
  return PR_TRUE;
}

you’ll see code like this:

bool
XULDocument::OnDocumentParserError()
{
  // don't report errors that are from overlays
  if (mCurrentPrototype && mMasterPrototype != mCurrentPrototype) {
    IURI *uri = mCurrentPrototype->GetURI();
    if (IsChromeURI(uri)) {
      GetObserverService()->NotifyObservers(uri, "xul-overlay-parsererror");
    }
    return false;
  }
  return true;
}

(I’ve taken the liberty of supposing that we can lose the ns prefix from interfaces in Mozilla 2’s C++ bindings, using a proper C++ namespace [not shown] instead.) I should add that we will not break API compatibility gratuitously; some of our APIs are fine, thank you. “No big rewrites” in the sense that Joel writes about (throw it all away and start over). And much implementation code will be kept, but transformed.

Now, we can’t hope to achieve these code transformations by hand. For one thing, much of our code is not exception-safe. This is where Oink comes in. It would take an army of OCD-savants typing at 120 wpm a long time to convert all of Mozilla’s hand-crafted allocate/free and lock/unlock code patterns to RAII, but with help from Oink’s front end, Elsa, we can automate the task of rewriting the source. And (with some flow-sensitive work on the Oink side) we should be able to check that we’ve converted every last exception-unsafe case to use RAII.
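
To show the flavor of the rewrite, here is a hand-made before-and-after in standard C++, using std::mutex purely for illustration rather than Mozilla’s actual lock types; the tool’s job would be to turn the first shape into the second, across the whole tree.

#include <cstdio>
#include <mutex>

static std::mutex gCacheLock;
static bool LookupCache(int key) { return key % 2 == 0; } // stand-in

// Before: hand-crafted lock/unlock. Every early return must remember
// to unlock, and a throw between lock() and unlock() leaks the lock.
static bool FindEntryUnsafe(int key) {
    gCacheLock.lock();
    if (key < 0) {
        gCacheLock.unlock();
        return false;
    }
    bool found = LookupCache(key); // not exception-safe
    gCacheLock.unlock();
    return found;
}

// After: the mechanical RAII rewrite a tool could emit. The guard's
// destructor unlocks on every path, including exceptional ones.
static bool FindEntry(int key) {
    std::lock_guard<std::mutex> guard(gCacheLock);
    if (key < 0)
        return false;
    return LookupCache(key);
}

int main() {
    std::printf("%d %d\n", FindEntryUnsafe(4), FindEntry(4));
    return 0;
}

The transformation is purely mechanical, which is exactly what makes it a good target for automation rather than OCD-savant labor.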


Oink, or really its friend Cqual, is also good for more involved static code analysis than finding patterns to rewrite and ensuring exception safety. With the right qualifiers, we can enforce higher level safety properties such as “no format string comes from the network”, or “chrome (UI) code must sanitize content data that flows into it”.
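
As a sketch of the idea only (this is not Cqual’s actual annotation syntax), imagine taint-style qualifiers spelled as no-op macros, with a separate static checker, rather than the compiler, enforcing that tainted data never reaches a trusting sink:

// Not Cqual's real syntax: TAINTED/UNTAINTED are no-op macros here.
// A static checker, not the compiler, would verify that no TAINTED
// value flows into an UNTAINTED position.
#define TAINTED
#define UNTAINTED

#include <cstdio>
#include <string>

TAINTED static std::string ReadFromNetwork() {
    return "%s from the wire"; // hostile content, e.g. a format string
}

static void LogMessage(UNTAINTED const char* msg) {
    std::printf("%s\n", msg); // trusting sink: must never see taint
}

static UNTAINTED std::string Sanitize(TAINTED const std::string& s) {
    std::string out;
    for (char c : s)
        if (c != '%')
            out += c; // strip format directives
    return out;       // untainted by construction
}

int main() {
    TAINTED std::string raw = ReadFromNetwork();
    // LogMessage(raw.c_str()); // checker rejects: taint reaches sink
    LogMessage(Sanitize(raw).c_str()); // explicit laundering point
    return 0;
}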

I think we should extend the Oink framework to include JS2 in the analysis, for total data flow. With the type system for JS2, we’ll finally have type soundness on which to build higher-level model checkers.

Building this Oink-based tooling won’t be done in a day, and we should be careful not to overinvest if simpler techniques suffice. But from what Roc, Graydon and I have seen, Oink looks quite promising. So Mozilla 2 is not just about simplifying APIs, removing old code and XPCOM overhead, and making the source code more approachable. It’s also about material improvements to program security, which is inherently weak in all browsers implemented in languages such as C and C++. Security requires defense at every level of abstraction, from high-level JS that enforces confidentiality properties, down to buffer manipulations that should be provably memory-safe.

There is no silver bullet. Virtual machines are, as Michael Franz and others point out, a great way to reduce the size of one’s trusted computing base (TCB) and track information flow, supporting richer security models and policies — safe “mashups in the browser”.

We will optimize the JS2 VM aggressively in the Mozilla 2 timeframe. But we can’t switch to “managed C++” (neither can Microsoft, notice) for any near term competitive browser, nor is JS2 the right language for the low-level systems programming that lies on critical rendering and interaction paths in any browser.

We will combine approaches, moving as much “middleware” C++ as we can, when it’s not on any critical path and it uses only safe pointers, into JS2. Again, Oink/Elsa can help to automate this translation. I envision a checker that first scores all code against a set of costs for translating from C++ to JS2, identifying the low-hanging fruit statically. Profiling results from common user-level tasks and page-load tests would then veto any low-cost judgment that would translate C++ otherwise ripe for JS2 but actually dominating a critical path.
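
Here is a toy sketch of that two-stage filter, with invented names and made-up numbers; a static pass scores translation cost, and profile data vetoes anything hot.

#include <cstdio>
#include <string>
#include <vector>

// Hypothetical per-function facts a static analysis might collect.
struct FnInfo {
    std::string name;
    int rawPointerUses;  // unsafe pointers block translation
    int linesOfCode;
    double profileShare; // fraction of time in page-load profiles
};

// Static score: cheap-to-translate functions score low.
static double TranslationCost(const FnInfo& f) {
    return f.rawPointerUses * 100.0 + f.linesOfCode * 0.1;
}

int main() {
    std::vector<FnInfo> fns = {
        {"PrefsPanel::Render",  0,  80, 0.001}, // middleware: good candidate
        {"Layout::ReflowBlock", 0, 120, 0.220}, // cheap, but hot: vetoed
        {"Gfx::BlitRow",       12,  40, 0.300}, // unsafe pointers: skipped
    };
    for (const auto& f : fns) {
        bool lowCost = TranslationCost(f) < 50.0;
        bool onCriticalPath = f.profileShare > 0.05; // profiling veto
        if (lowCost && !onCriticalPath)
            std::printf("candidate for JS2: %s\n", f.name.c_str());
    }
    return 0;
}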

Whether we invest in C++ to JS2 automation, do it by hand, or use a hybrid of Oink-based and by-hand techniques, I don’t want to decide right now. But as with deCOMtamination, we have an opportunity to work smarter by investing in tools instead of spending programmer talent in repetitive and mostly menial labor on a very large scale. The C++ that we can’t move to JS2 can still be made more secure by combining the other key Mozilla 2 levers: conservative GC (which removes free memory read hazards), Oink-based checkers (with appropriate new qualifiers annotating our source), and dynamic 24-by-7 valgrind plus gcov tinderboxes running automated tests.

Finally, we will run at reduced privilege when we can, and otherwise use OS-enforced security mechanisms that are sound and well-documented. We should put most plugins out of process while we’re at it.


I haven’t really begun to talk about further graphics work (3D canvas) and security models (those safe browser-based mashups). Nor have I begun to discuss Firefox 4, the likely version to be based on Mozilla 2, except to say that we will keep an unbranded Firefox version building at all times as Mozilla 2 is developed. Mainly I am focusing on the Mozilla platform, on which Firefox, its add-ons, and other apps all stand or fall.

So the goals for Mozilla 2 are:

  • Clean up our APIs to be fewer, better, and “on the outside.”
  • Simplify the Mozilla codebase to make it smaller, faster, and easier to approach and maintain.
  • Take advantage of standard language features and fast paths instead of XPCOM and ad hoc code.
  • Optimize, including JIT compilation for JS2, to deliver very fast DOM access at low memory cost.
  • Enforce important safety properties at tool-time and runtime.

Oh, and isn’t it time that we get off of CVS? The best way to do that without throwing 1.9 into an uproar is to develop Mozilla 2 using a new Version Control System (VCS) that can merge with CVS (since we will want to track changes to files not being revamped at first, or at all; and we’ll probably find bugs whose fixes should flow back into 1.9). The problem with VCSes is that there are too many to choose from now. Nevertheless, looking for mostly green columns in that chart should help us make a quick decision. We don’t need “the best” or the “newest”, but we do need better merging, branching, and renaming support.

Last point: much of what I wrote here, much of my work in Mozilla, is focused on the platform, yet I noted above that we always put apps such as Firefox first, and do not claim to be “a platform” for everyone. In spite of this, people are building apps such as Songbird on top of XULRunner.
So what are we, platform or app? The answer is “both, in a virtuous cycle”. Because we serve users first and most broadly, but also developers at several layers — Web, XUL, C++ — we have both undersold and under-invested in the C++ layer. In the long run, neglecting the C++ codebase puts the app at risk. So with Mozilla 2, we’re going to balance the books.

This is enough for now; detailed roadmap and wiki work will follow. I’m intent on moving the Mozilla codebase to a true next level: cleaner, leaner, safer, with better APIs and C++ bindings, and very fast page-load and DOM performance. While this is easier said than done, it is palpably within our reach for 2008. Let’s do it.