Paren-Free

The tl;dr version

Krusty the ventriloquist

<Krusty>So, you kids want CoffeeScript, do you?</Krusty>

<script type="harmony">   // placeholder MIME type

if year > 2010 {
    syntax++
}

for i in iter {           // i is a fresh let binding!
    frob(i)
}

while lo <= hi {
    let mid = (lo + hi) / 2
    // binary search blah blah blah
}

... return [i * i for i in range(n)]   // array comprehension

</script>

No parentheses around control structure “heads”. If Go can do it, so can JS. And yes, I’m using automatic semi-colon insertion (JSLint can suck it).

There are open issues (are braces required around bodies?) but this is the twitter-friendly section. More below, after some twitter-unfriendly motivation.

Background

We had a TC39 meeting last week, graciously hosted at Apple with Ollie representing. Amid the many productive activities, Dave presented iterators as an extension to proxies.

The good news is that the committee agreed that some kind of meta-programmable iteration should be in the language.

Enumeration

Proxies had already moved to Harmony Proposal status earlier this year, but with an open issue: how to trap for (i in o) where o is a proxy with a huge (or even an infinite — rather, a lazily created and indefinite) number of properties.

js> var handler = {
    enumerate: function () { return ["a", "b", "c"]; }
};
js> var proxy = Proxy.create(handler);
js> for (var i in proxy)
    print(i);
a
b
c

The proxy handler’s fundamental enumerate trap eagerly returns an array of all property names “in” the proxy, coerced to string type if need be. Each string is required to be unique in the returned array. But for a large or lazy object, where the trapping loop may break early, eagerness hurts. Scale up and eagerness (never mind the uniqueness requirement) is fatal. TC39 agreed that a lazy-iteration derived (optional) trap was wanted.

js> var handler = {
    iterate: function () { for (var i = 0; i < 1e9; i++) yield i; } }; js> var proxy = Proxy.create(handler);
js> for (var i in proxy) {
    if (i == 3) break;
    print(i);
}
0
1
2

The iterators strawman addressed this use-case by proposing that for-in would trap to iterate if present on the handler for the proxy referenced by o, in preference to trapping to enumerate.

js> var handler = {
    enumerate: function () { return ["a", "b", "c"]; },
    iterate: function () { for (var i = 0; i < 1e9; i++) yield i; } }; js> var proxy = Proxy.create(handler);
js> for (var i in proxy) {
    if (i == 3) break;
    print(i);
}
0
1
2

To avoid switching from enumeration to iteration under a single for-in loop, once the loop has started enumerating a non-proxy, if a proxy is encountered on that object’s prototype chain, the prototype proxy’s enumerate trap will be used, not its iterate trap.

js> var handler = {
    has: function (name) { return /^[abc]$/.test(name); },
    enumerate: function () { return ["a", "b", "c"]; },
    iterate: function () { for (var i = 0; i < 1e9; i++) yield i; } }; js> var proxy = Proxy.create(handler);
js> var obj = Object.create(proxy);
js> for (var i in obj) {
    print(i);
}
a
b
c

Enumeration walks the prototype chain, and this is why a proxy might want both enumerate and iterate.

Iteration

What all this means: you can implement Pythonic iterators with proxies, and return a sequence of arbitrary values to a for-in loop that’s given the proxy directly (not on a prototype chain of a non-proxy object, as noted above). A large/lazy proxy would trap iterate instead of enumerate and return string keys, but other iterator-proxies could return Fibonacci numbers, integer ranges, or whatever the proxy implementor and consumer want. This was an intended part of the package deal.

js> function fib(n) {
    var i = 0;
    var a = 0, b = 1;
    return {
        next: function () {
            if (++i > n)
                throw StopIteration;
            [a, b] = [b, a + b];
            return a;
        }
    };
}
js> var handler = {iterate: function () { return fib(10); } };
js> var proxy = Proxy.create(handler);
js> for (var i in proxy)
    print(i);
1
1
2
3
5
8
13
21
34
55

(JS1.7 and above, implemented in both SpiderMonkey and Rhino, prefigured this proposal by supporting an unstratified iteration protocol based on Python 2.5. This JS1.7 Iterator extension is fairly popular in spite of some design flaws, and from the exercise of implementing and shipping it we’ve recognized those flaws and fixed them via proxies combined with the iterators strawman.)

The bad news is that the committee did something committees often do: try to compromise between divergent beliefs or subjective value theories.

In this case the compromise was based on the belief that for-in should not become the wanted meta-programmable iteration syntax. The argument is that for-in must always visit string-typed keys of the object, or at least whatever strings the accepted proxy enumerate trap returns in an array. If a Harmony proxy could somehow be enumerated by pre-Harmony for-in-based code, non-string values in the iteration might break the old code.

(The counter-argument is that once you let the proxy handler trap enumerate, a lot can change behind the back of old for-in-based code; also, enumeration is an underspecified mess. But these points do not completely overcome the objection about potential breakage in old code.)

Fear of Change

To fend off such breakage, we could make for-in meta-programmable only in Harmony code — any loop loaded under a pre-Harmony script tag type would not iterate a proxy.

This opt-in protection probably does not resolve the real issue, which is whether syntax can have its semantics changed much (or at all) in a mature language such as JS, which is being evolved via mostly-compatible standard versions in multi-year cycles.

I acknowledged during the meeting that we would not make progress without trying to agree on new syntax. This was too optimistic but I wanted to discover more about the divergent beliefs that made extending for-in via proxies a showstopper.

A quick whip-round the room with an empty cup managed to net us loose change from latter-day Java and C++:

for (var i : x)   // or let i, or just i for any lvalue i
     ...

as our meta-programmable “new syntax”. Bletch!

Not to worry. For-colon is probably not going to fly for some reasons I raised on es-discuss, but it also should die a deserved death as a classic bad compromise forged in the heat of a committee meeting.

The difficulty before us is precisely this how-much-to-change question.

ES5 strict mode already changes runtime semantics for existing syntax (eval of var no longer pollutes the caller’s scope; arguments does not alias formal parameters; a few others), for the better. Unfortunately, developers porting to "use strict" must test carefully, since these are meaning shifts, not new early errors.

My point is that syntactic and semantic change has happened over the last 15 years of JS, it is happening now with ES5 strict, and it will happen again.

Change is Coming

We believe that future JS, the Harmony language, must include at least one incompatible change to runtime semantics: no more global object at the top of the scope chain. Instead, programmers would have lexical scope all the way up, with the module system for populating the top scope. By default, the standard library we all know would be imported; also by default in browsers, the DOM would be there.

Can the world handle another incompatible change to the semantics of existing syntax, namely the for-in loop?

There are many trade-offs.

On the one hand, adding new syntax ensures no existing code will ever by confused, even if migrated into Harmony-type script. On the other, adding syntax hurts users and implementors in ways that combine to increase the complexity of the language non-linearly. The chances for failure to standardize and mistakes during standardization go up too.

What’s more, it will be a long time before anyone can use the new syntax on the web, whereas for-in and proxies implementing well-behaved iterators could be used much sooner, with fallback if (!window.Proxy).

Utlimately, it’s a crap shoot:

  • Play it safe, enlarge the language, freeze (and finally standardize, ahem) the semantics of the old syntax, and try to move users to the new syntax? or
  • Conserve syntax, enable developers to reform the for-in loop from its enumeration-not-iteration state?

All this is prolog. Perhaps the “play it safe” position is right. And more important, what if new syntax could be strictly more usable and have better semantics?

New Clothes and Body

Here’s my pitch: committees do not design well, period. Given a solid design, a committee with fundamental disagreements can stall or eviscerate that design out of conservatism or just nay-saying, until the proposal is hardly worth the trouble. At best, the language grows larger more quickly, with conservative add-ons instead of holistic rethinkings.

I’m to blame for some of this, since I’ve been playing the standards game with JS. Why not? It seems to be working, and the alternatives (ScreamingMonkey, another language getting into all browsers) are nowhere. But I observe that even for Harmony, and notably for ES5, much of the innovation came before the committee got together (getters, setters, let, destructuring). Other good parts of ES5 and emerging standards came from strong individual or paired designers (@awbjs, @markm, @tomvc).

And don’t get me wrong: sometimes saying “no” is the right thing. But in a committee tending a mature but still living programming language, it’s too easy to say “no” without any “but here’s a better way” follow-through. To be perfectly clear, TC39 members generally do provide such follow-through. But we are still a committee.

I want to break out of this inherently clog-prone condition.

So, given the concern about changing the meaning of for-in, and the rise of wrist-friendly “unsyntax” (Ruby, Python, CoffeeScript) over the shifted-keystroke-burdened C-family syntax represented by JS, why not make opting into Harmony enable new syntax with the desired meta-programmable semantics?

Paren-Free Heads

It would be a mistake to change syntax (and semantics) utterly. VM implementors and web developers having to straddle both syntaxes would rightly balk. There will be commerce between Harmony and pre-Harmony scripts, via the DOM and the shared JS object heap. But can we relax syntactic rules a bit, and lose two painfully-shifted, off-QWERTY-home-row characters, naming () in control structure heads?

for i in iter {
    // i is a value of any type;
}

Here’s your new syntax with new semantics!

We can simplify the iterator strawman too. If you want to iterate and not enumerate, use the new syntax. If you want to iterate keys (both “own” and any enumerable unshadowed property names on prototypes), use a helper function:

for i in keys(o) {
    // i is a string-typed key
}

The old-style for (var i in o)... loop only traps to enumerate. Large/lazy proxies? Use the new for k in keys(o) {...} form.

Are the braces required? C has parenthesized head expressions and unbraced single-statement bodies. Without parens, a C statement such as

if x
    (*pf)(y);

would be ambiguous (don’t try significant newlines on me — I’ve learned my lesson :-/). You need to mandate either parens around the head, or braces around the body (or both, but that seems like overkill).

So C requires parens around head expressions. But many style guides recommend always bracing, to ward off dangling else. Go codifies this fully, requiring braces but relieving programmers from having to parenthesize the head expression.

I swore I’d never blog at length about syntax, but here I am. Syntax matters, it’s programming language UI. Therefore it needs to be improved over time. JS is overdue for an upgrade. So my modest proposal here is: lose the head parens, require braces always.

You could argue for optional braces if there’s no particular ambiguity, e.g.

if foo
    bar();

But that will be a hard spec to write, a confusing spec to read, and educators and gurus will teach “always brace” anyway. Better to require braces.

Pythonic significant whitespace is too great a change, and bad for minified/crunched/mangled web scripts anyway. JS is a curly-brace language and it always will be.

Implicit Fresh Bindings

Another win: the position between for and in is implicitly a let binding context. You can destructure there too, but whatever names you bind, they’ll be fresh for each iteration of the loop.

This allows us to solve an old and annoying closure misfeature of JS:

js> function make() {
    var a = [];
    for (var i = 0; i < 3; i++)         a.push(function () { return i; });     return a; } js> var a = make();
js> print(a[0]());
3
js> print(a[1]());
3
js> print(a[2]());
3

Changing var to let in the C-style three-part for loop does not help.

But for-in is different, and in Harmony we (TC39) believe it should make a fresh let binding per iteration. I’m proposing that the let be implicit and obligatory. And of course the head is paren-free, so the full fix looks like this:

js> function make() {
    var a = [];
    for i in range(3) {
        a.push(function () { return i; });
    }
    return a;
}
js> var a = make();
js> print(a[0]());
0
js> print(a[1]());
1
js> print(a[2]());
2

Part of the Zen of Python: “Explicit is better than implicit.” Of course, Python has implicit block-scoped variable declarations, so this is more of a guideline, or a Zen thing, not some Western-philosophical absolute ;-). Having to declare an outer or global name in Python is therefore an exception, and painful. Like the sound of one hand slapping your face.

Of course JS shouldn’t try to bind block-scoped variables implicitly all over the place, as Python does; once again, that would be too great a change. But implicit for-in loop let-style variable declaration is winning both as sensible default, and to promulgate the closure-capture fix.

Comprehensions

When we implemented iterators and generators in JS1.7, I also threw in array comprehensions:

js> squares = [i * i for (i in range(10))];
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
js> odds = [i for (i in range(20)) if (i % 2)]
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

At first I actually implemented paren-free heads for the for-in parts in the square brackets, but when I got to the optional trailing if I balked. Too far from JS, and in practical terms, a big-enough refactoring speed-bump for anyone sugaring a for-in loop as a comprehension. But paren-free Harmony rules:

js> squares = [i * i for i in range(10)];
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
js> odds = [i for i in range(20) if i % 2]
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

The same win applies to generator expressions.

Thanks

Thanks to TC39 colleagues for their general excellence — we’re a committee but I’ll try not to hold that against any of us.

Thanks especially to @AlexRussell and @arv, who at last week’s meeting brought some attitude about improving syntax and semantics in Harmony that I fought at first (for fear of the committee opening up all design and compatibility constraints and failing to reach Harmony). Their attitude stimulated me to think outside the box, and outside the parens.

Some of you may be thinking “this is crazy!” Others of you will no doubt say “more! more!” I have some other thoughts, inspired by TC39 conversations, that could help make Harmony a better language without it being over-compatible warm beer, but I’ll save them for another post.

My point here is not to rush another syntax strawman through TC39, but to stimulate thinking. I’m serious about paren-free FTW, but I’m more serious about making Harmony better through judicious and holistic re-imaginings, not only via stolid committee goal-tending.

/be

Proxy Inception

After marinating for a few months, my JSConf.eu slides:

(Mobile/No-Flash version)

These are based directly on the excellent work of Mark Miller and Tom Van Cutsem, who developed the harmony:proxies proposal that is now approved for the next major iteration of the JavaScript standard (ECMA-262, probably edition 6 but we’ve learned the hard way not to number prematurely — anyway, approved for “ECMAScript Harmony” [my original Harmony-coining post]).

Harmony Proxies are already prototyped in Firefox 4 betas, thanks to Andreas Gal.

When I reached the “meta-level shifting” slide:

meta-level-shifting

someone in the audience tweeted about how my talk was like Inception (github-sourced simulator). Meta-meta dreams within dreams (warning: meta-to-the-4th-shifting leads to Limbo).

The money-shot slide in my view is:

selective-interception

which depicts how Proxies finally level the playing field between browser implementors using burned-into-browser-binaries C++ and web developers using downloaded JS.

It’s hard to overstate how this matters. The DOM (IE’s for sure, but all of them, back to the original I hacked in Netscape 2) suffers from its “VM territory” privileges, which have been abused to make all kinds of odd-ball “host objects”. Proxies both greatly reduce the weirdness of host objects and let JS hackers emulate and even implement such objects.

Novice JS hackers and all JS programmers happy at the base level of the language need not worry about the details of Proxies. Proxies cannot break the invariants that keep the JS lucid dream unfolding on stage. Specifically, you can’t hack traps onto an existing non-proxy object — you can only create a new proxy and start using it afresh, perhaps passing it off as a preexisting kind of object that it emulates [1].

But when you need to go backstage of the dream and change the rules without breaking the dreamer’s illusion, by interceding on every get, set, call, construct, etc., then Proxies are indispensable.

Firefox 4 is using Proxies to implement all of its security wrappers.

Long-time SpiderMonkey fans will ask “why no __noSuchMethod__” (or: why not also have a noSuchMethod or invoke trap, or a flag to get telling when it is trapping a get for the entire callee part of a call expression)? The short answer is to keep the set of handler traps minimal in terms of JS semantics (modulo scalability), which do not include “invoke-only methods”. The longer answer is on es-discuss.

/be

[1] Inside the engine, a clever trick from Smalltalk called becomes is used to swap a newborn Proxy and an existing object that has arbitrarily many live references. Thus an object requiring no behavioral intercession can avoid the overhead of traps until it escapes from a same-origin or same-thread context, and only if it does escape through a barrier will it become a trapping Proxy whose handler accesses the original object after performing access control checks or mutual exclusion.

The local jargon for such object/Proxy swapping is “brain transplants”.

Should JS have bignums?

jwz finally learns some JS and picks at an old scab that had almost healed. I reply in various comments. I include some little-known, kind-of-funny (not always ha-ha funny) history along the way to set several records straight.

The issue before us now is whether to add value types to JS, perhaps by extending proxies, so you can implement non-reference-semantics objects with operators (because without operators, what’s the point?); or just add bignums; or do nothing.

Comments welcome (to keep up with Akismet-gaming spammers — anyone have a better WP plugin to stop comment spam?).

/be

A Minute With

A Minute With Brendan is going great. I wanted to post a quick link to it for those of you who may have missed it. Good use of HTML5 <audio> too. Thanks to @Voodootikigod for producing it.

The latest episode is about ES5 strict mode, stressing the importance of verifying that "use strict"; does what you intend. Strict mode implementation in Firefox is tracked here; it’s getting close to done.

Word from a Microsoft representative at Velocity is that “if Firefox supports strict mode, IE9 will too.” Here’s hoping! If not, start writing letters or something ;-).

/be

Static Analysis FTW

One of the best “researchy” investments we’ve made at Mozilla over the last few years has been in static analysis, both for C++ (including Taras Glek‘s excellent Dehydra, with which you write the custom analysis in JS) and now for JS itself:

DoctorJS is based on Dimitris Vardoulakis‘s work this summer implementing CFA2 for JavaScript at Mozilla. Dimitris is a student at Northeastern University under Olin Shivers (who is in fact a super-hero, not a super-villain as his name might suggest). Dimitris is one of many awesome interns we’ve managed to recruit in recent summers.

Credit for web design and nodejs-wrangling go to Dave Herman and Patrick Walton.

What is static analysis (for those of you who skipped the wikipedia link in my first line)? You could think of static code analysis as running your program without actually executing it. A static analysis tool reads source code (or sometimes bytecode or machine code) and associates concrete and abstract values with program locations or slots on a stack built as it scans the code in straight-line or abstract fashion — but the tool does not actually take every branch, of course. Yet in spite of not running your program, a good static analysis can tell you non-obvious and sometimes amazing facts about your code.

This description doesn’t begin to do justice to the cleverness of the algorithms used to keep precision while not taking too long (or effectively forever), but I hope it gives a feel for how most analyses work. The real fun starts when you have higher-order functions (as JS has).

All static analyses are approximate, since only running your program will (in general) tell you what output it gives for a given input, or even whether it ever halts. But simple programs can be modeled with great precision, and even conservative static analyses that give up at some point can shed light by pointing out sketchy or buggy parts of your code. Windows’ Static Driver Verification, based on the SLAM project at MSR, is a notable success story.

It should be clear that an optimizing compiler does static analysis of several kinds in order to translate your source language into efficient instructions written in another language, perhaps physical machine code, virtual machine code for a “managed language runtime”, or another higher-level programming language (e.g. JS — see OpenLaszlo, GWT, Cappuccino, and my latest favorite, Skulpt, among many examples).

A compiler that checks types is obviously conservative (sometimes too conservative), in that it will call a program that fails to type-check an erroneous program, even if that program would have behaved well at runtime for all possible inputs. Dynamic languages are popular in large part because programmers can keep types latent in the code, with type checking done imperfectly (yet often more quickly and expressively) in the programmers’ heads and unit tests, and therefore programmers can do more with less code writing in a dynamic language than they could using a static language.

(For many common tasks; not all static languages are less expressive all the time; qualifications needed ad nauseum. I am not religious — I use static and dynamic languages all the time — and if there is one thing I’ve learned as a  programmer, it is that there is never one right language for all jobs.)

Static analysis, since it is approximate, is not going to solve every problem. But a clever analysis, making good use of all the information its author can program it to glean, can do a lot more than what conventional static languages’ type checkers can do.

For example, do you really need to write type annotations in your code for it to go fast? I’ve argued that you don’t, for example here (where I did argue for optional annotations of runtime “guards” or “contracts”, only at API boundaries — different beasts from “types” as the term is conventionally used in static languages). Let’s see how well DoctorJS does with some SunSpider (crypto-aes.js) code:

/*
 * AES Cipher function: encrypt 'input' with Rijndael algorithm
 *
 *   takes   byte-array 'input' (16 bytes)
 *           2D byte-array key schedule 'w' (Nr+1 x Nb bytes)
 *
 *   applies Nr rounds (10/12/14) using key schedule w for 'add round key' stage
 *
 *   returns byte-array encrypted value (16 bytes)
 */
function Cipher(input, w) {    // main Cipher function [§5.1]
    . . .

DoctorJS’s output includes this JSON fragment:

[{
    "name": "Cipher",
    "tagfile": "js",
    "addr": "/^function Cipher(input, w) {    \/\/ main Cipher function [§5.1]$/",
    "kind": "f",
    "type": "Array[number] function(Array[number], Array[Array[number]])",
    "lineno": "13"
},
. . .

From the type property we can see that DoctorJS figured out that that the Cipher function takes an array of numbers as its input parameter (this should be an array of bytes, but the analysis can’t yet figure that out — yet), and a second array of arrays of numbers named w (the “key schedule”). This by itself is pretty amazing.

The addr property gives a regexp to find Cipher in the crypto-aes.js source, which happens also to be a valid ctags (or jsctags) tag address.

The other properties should be self-explanatory.

The idea for DoctorJS came to me just over a week ago when I said to Dave Herman something like “we should take Dimitris’s analysis, put it on NodeJS, and make a twitter-ific web service with several formats returned by different apps, so that everyone can use the fruits of the analysis.”

Typical pointy-haired whiteboard operation by me :-P. Of course the details involved choosing to fork a process for each analysis request, since the analysis could take a while, and it is not written in “callback” or continuation-passing style (nor should it be: this concurrency vs. code simplicity trade-off is in general a false dilemma, and it’s an issue affecting Node and JS to which I’ll return soon); fixing bugs; configuring servers and proxies; and doing some fast site design. For all of this, my thanks to Dimitris, Dave, Patrick, and Zandr Milewski (who does Mozilla Labs IT).

DoctorJS is up now, and we hope people find it useful, not just a curiosity. Is there another output format for summary jsctags or type information you would prefer, which is much more concise than the JSON currently served (so it could be worthwhile adding an app to serve that other format, instead of you having to transcode)? Are there other results you would like to see, e.g. linking uses of variables to their definitions? Or even complete JSON-encoded abstract syntax trees? Did you find what look like bugs? Please let us know.

Completely separate from DoctorJS, Dehydra, and other static analysis services and tools: an online type inference mostly-static analysis for JaegerMonkey, from the always-awesome Brian Hackett. This looks promising, although it is lower priority at the moment than other JM work.

BTW, I talked to Chris Williams of JSConf fame about DoctorJS in the current episode of A Minute With Brendan. Yes, I’m now nerding out in public about hot JS and Mozilla topics every week for a minute or so. Stay tuned, it’ll be a regular thing.

/be

A Brief History of JavaScript

It’s good to be back. I let the old blog field lie fallow in order to focus on work in Ecma TC39 (JS standards), Firefox 3.5, 3.6 and 4; and recently on a new project that I’ll blog about soon.

In the mean time [UPDATE and in case the embedded video fails], here’s the video link from my JSConf 2010 surprise keynote in April. Highlights include:

  • What would happen in a battle between Chuck Norris and Bruce Campbell
  • Clips from Netsca^H^H^H^H^H^HEvil Dead 2 and Army of Darkness
  • Discussion of where JS has been and what lies in its immediate future

True facts:

  • I did meet John McCarthy of LISP fame in 1977
  • My haircut was influenced by Morrissey’s (hey, it was the late ’80s)
  • JS’s function keyword did come from AWK

/be

TraceMonkey Update

We have been busy, mostly fixing bugs for stability, but also winning a bit more performance, since TraceMonkey landed on mozilla-central, from which Firefox 3.1 alpha-stage nightly builds are built. Tonight’s builds include a fix for the bug that ilooped a SunSpider test (my apologies to those of you who suffered that bug’s bite).

But what I’m sure everyone wants to know is: how do we compare to V8?

Here are the results from head-to-head SunSpider on Windows XP on a Mac Mini and Windows Vista on a MacBook Pro, testing against last night’s Firefox automated build and yesterday’s Chrome beta:

tracemonkeyv8

We win by 1.28x and 1.19x, respectively. Maybe we should rename TraceMonkey “V10” ;-).

Ok, it’s only SunSpider, one popular yet arguably non-representative benchmark suite. We are not about to be braggy. (“Don’t be braggy” is our motto here at Mozilla ;-).)

But it’s worth digging deeper into the results. Let’s look at the ratios by test:

tmfaster

We win on the bit-banging, string, and regular expression benchmarks. We are around 4x faster at the SunSpider micro-benchmarks than V8.

This graph does show V8 cleaning our clock on a couple of recursion-heavy tests. We have a plan, to trace recursion (not just tail recursion). We simply haven’t had enough hours in the day to get to it, but it’s “next”.

This reminds me: TraceMonkey is only a few months old, excluding the Tamarin Tracing Nanojit contributed by Adobe (thanks again, Ed and co.!), which we’ve built on and enhanced with x86-64 support and other fixes. We’ve developed TraceMonkey in the open the whole way. And we’re as fast as V8 on SunSpider!

This is not a trivial feat. As we continue to trace unrecorded bytecode and operand combinations, we will only get faster. As we add recursion, trace-wise register allocation, and other optimizations, we will eliminate the losses shown above and improve our ratios linearly across the board, probably by 2 or greater.

I’ll keep updating the blog every week, as we do this work. Your comments are welcome as always.

V8 is great work, very well-engineered, with room to speed up too. (And Chrome looks good to great — the multi-process architecture is righteous, but you expected no less praise from an old Unix hacker like me.)

What spectators have to realize is that this contest is not a playoff where each contending VM is eliminated at any given hype-event point. We believe that Franz&Gal-style tracing has more “headroom” than less aggressively speculative approaches, due to its ability to specialize code, making variables constant and eliminating dead code and conditions at runtime, based on the latent types inherent in almost all JavaScript programs. If we are right, we’ll find out over the next weeks and months, and so will you all.

Anyway, we’re very much in the game and moving fast — “reports of our death are greatly exaggerated.” Stay tuned!

TraceMonkey: JavaScript Lightspeed

I’m extremely pleased to announce the launch of TraceMonkey, an evolution of Firefox’s SpiderMonkey JavaScript engine for Firefox 3.1 that uses a new kind of Just-In-Time (JIT) compiler to boost JS performance by an order of magnitude or more.

Results

Let’s cut straight to the charts. Here are the popular SunSpider macro- and micro-benchmarks average scores, plus results for an image manipulation benchmark and a test using the Sylvester 3D JS library’s matrix multiplication methods:

asbenchlg

Here are some select SunSpider micro-benchmarks, to show some near-term upper bounds on performance:

ff31wtracingsm

This chart shows speedup ratios over the SpiderMonkey interpreter, which is why “empty loop with globals” (a loop using global loop control and accumulator variables) shows a greater speedup — global variables in JavaScript, especially if undeclared by var, can be harder to optimize in an interpreter than local variables in a function.

Here are the fastest test-by-test SunSpider results, sorted from greatest speedup to least:

ff31vsff3

The lesser speedups need their own chart, or they would be dwarfed by the above results:

ff31wtracingsm

(Any slowdown is a bug we will fix; we’re in hot pursuit of the one biting binary-trees, which is heavily recursive — it will be fixed.)

With SunSpider, some of the longest-running tests are string and regular-expression monsters, and since like most JS engines, we use native (compiled C++) code for most of the work, there’s not as much speedup. Amdahl’s Law predicts that this will bound the weighted-average total Sunspider score, probably to around 2. No matter how fast we JIT the rest of the code, the total score will be . . . 2.

But this is only a start. With tracing, performance will keep going up. We have easy small linear speedup tasks remaining (better register allocation, spill reduction around built-in calls). We will trace string and regular expression code and break through the “2” barrier. We will even trace into DOM methods. The tracing JIT approach scales as you move more code into JS, or otherwise into view of the tracing machinery.

Finally, schrep created a screencast that visually demonstrates the speedup gained by TraceMonkey. These speedups are not just for micro-benchmarks. You can see and feel them.

How We Did It

We’ve been working with Andreas Gal of UC Irvine on TraceMonkey, and it has been a blast. We started a little over sixty days (and nights 😉 ago, and just yesterday, shaver pushed the results of our work into the mozilla-central Hg repository for inclusion in Firefox 3.1.

The JIT is currently pref’ed off, but you can enable it via about:config — just search for “jit” and, if you are willing to report any bugs you find, toggle the javascript.options.jit.content preference (there’s a jit.chrome pref too, for the truly adventurous).

Before TraceMonkey, for Firefox 3, we made serious performance improvements toSpiderMonkey, both to its Array code and to its interpreter. The interpreter speedups entailed two major pieces of work:

  • Making bytecode cases in the threaded interpreter even fatter, so the fast cases can stay in the interpreter function.
  • Adding a polymorphic property cache, for addressing properties found in prototype and scope objects quickly, without having to look in each object along the chain.

I will talk about the property cache and the “shape inference” it is based on in another post.

By the way, we are not letting moss grow under our interpreter’s feet. Dave Mandelin is working on a combination of inline-threading and call-threading that will take interpreter performance up another notch.

While doing this Firefox 3 work, I was reminded again of the adage:

Neurosis is doing the same thing over and over again, expecting to get a different result each time.

But this is exactly what dynamically typed language interpreters must do. Consider the + operator:

a = b + c;

Is this string concatenation, or number addition? Without static analysis (generally too costly), we can’t know ahead of time. For SpiderMonkey, we have to ask further: if number, can we keep the operands and result in machine integers of some kind?

Any interpreter will have to cope with unlikely (but allowed) overflow from int to double precision binary floating point, or even change of variable type from number to string. But this is neurotic, because for the vast majority of JS code, in spite of the freedom to mutate type of variable, types are stable. (This stability holds for other dynamic languages including Python.)

Another insight, which is key to the tracing JIT approach: if you are spending much time in JS, you are probably looping. There’s simply not enough straight line code in Firefox’s JS, or in a web app, to take that much runtime. Native code may go out to lunch, of course, but if you are spending time in JS, you’re either looping or doing recursion.

The Trace Trees approach to tracing JIT compilation that Andreas pioneered can handle loops and recursion. Everything starts in the interpreter, when TraceMonkey notices a hot loop by keeping cheap count of how often a particular backward jump (or any backward jump) has happened.

for (var i = 0; i < BIG; i++) {
    // Loop header starts here:
    if (usuallyTrue())
        commonPath();
    else
        uncommonPath();
}

Once a hot loop has been detected, TraceMonkey starts recording a trace. We use the Tamarin Tracing Nanojit to generate low-level intermediate representation instructions specialized from the SpiderMonkey bytecodes, their immediate and incoming stack operands, and the property cache “hit” case fast-lookup information.

The trace recorder completes when the loop header (see the comment in the code above) is reached by a backward jump. If the trace does not complete this way, the recorder aborts and the interpreter resumes without recording traces.

Let’s suppose the usuallyTrue() function returns true (it could return any truthy, e.g. 1 or "non-empty" — we can cope). The trace recorder emits a special guard instruction to check that the truthy condition matches, allowing native machine-code trace execution to continue if so. If the condition does not match, the guard exits (so-called “side-exits”) the trace, returning to the interpreter at the exact point in the bytecode where the guard was recorded, with all the necessary interpreter state restored.

If the interpreter sees usuallyTrue() return true, then the commonPath(); case will be traced. After that function has been traced comes the loop update part i++ (which might or might not stay in SpiderMonkey’s integer representation depending on the value of BIG — again we guard). Finally, the condition i < BIG will be recorded as a guard.

// Loop header starts here:
inlined usuallyTrue() call, with guards
guard on truthy return value
guard that the function being invoked at this point is commonPath
inlined commonPath() call, with any calls it makes inlined, guarded
i++ code, with overflow to double guard
i < BIG condition and loop-edge guard
jump back to loop header

Thus tracing is all about speculating that what the interpreter sees is what will happen next time — that the virtual machine can stop being neurotic. And as you can see, tracing JITs can inline method calls easily — just record the interpreter as it follows a JSOP_CALL instruction into an interpreted function.

One point about Trace Trees (as opposed to less structured kinds of tracing): you get function inlining without having to build interpreter frames at all, because the trace recording must reach the loop header in the outer function in order to complete. Therefore, so long as the JITted code stays “on trace”, no interpreter frames need to be built.

If the commonPath function itself contains a guard that side-exits at runtime, then (and only then) will one or more interpreter frames need to be reconstructed.

Let’s say after some number of iterations, the loop shown above side-exits at the guard for usuallyTrue() because that function returns a falsy value. We abort correctly back to the interpreter, but keep recording in case we can complete another trace back to the same loop header, and extend the first into a trace tree. This allows us to handle different paths through the control flow graph (including inlined functions) under a hot loop.

What It All Means

Pulling back from the details, a few points deserve to be called out:

  • We have, right now, x86, x86-64, and ARM support in TraceMonkey. This means we are ready for mobile and desktop target platforms out of the box.
  • As the performance keeps going up, people will write and transport code that was “too slow” to run in the browser as JS. This means the web can accommodate workloads that right now require a proprietary plugin.
  • As we trace more of the DOM and our other native code, we increase the memory-safe codebase that must be trusted not to have an exploitable bug.
  • Tracing follows only the hot paths, and builds a trace-tree cache. Cold code never gets traced or JITted, avoiding the memory bloat that whole-method JITs incur. Tracing is mobile-friendly.
  • JS-driven <canvas> rendering, with toolkits, scene graphs, game logic, etc. all in JS, are one wave of the future that is about to crest.

TraceMonkey advances us toward the Mozilla
2
future where even more Firefox code is written in JS. Firefox gets faster and safer as this process unfolds.

I believe that other browsers will follow our lead and take JS performance through current interpreter speed barriers, using just-in-time native code compilation. Beyond what TraceMonkey means for Firefox and other Mozilla projects, it heralds the JavaScript Lightspeed future we’ve all been anticipating. We are moving the goal posts and changing the game, for the benefit of all web developers.

Acknowledgments

I would like to thank Michael Franz and the rest of his group at UC Irvine, especially Michael Bebenita, Mason Chang, and Gregor Wagner; also the National Science Foundation for supporting Andreas Gal’s thesis. I’m also grateful to Ed Smith and the Tamarin Tracing team at Adobe for the TT Nanojit, which was a huge boost to developing TraceMonkey.

And of course, mad props and late night thanks to Team TraceMonkey: Andreas, Shaver, David Anderson, with valuable assists from Bob Clary, Rob Sayre, Blake Kaplan, Boris Zbarsky, and Vladimir Vukićević.

Popularity

It seems (according to one guru, but coming from this source, it’s a left-handed compliment) that JavaScript is finally popular.

To me, a nerd from a tender age, this is something between a curse and a joke. (See if you are in my camp: isn’t the green chick hotter?)

Brendan Eich convinced his pointy-haired boss at Netscape that the Navigator browser should have its own scripting language, and that only a new language would do, a new language designed and implemented in big hurry, and that no existing language should be considered for that role.

I don’t know why Doug is making up stories. He wasn’t at Netscape. He has heard my recollections about JavaScript’s birth directly, told in my keynotes at Ajax conferences. Revisionist shenanigans to advance a Microhoo C# agenda among Web developers?

Who knows, and it’s hard to care, but in this week of the tenth anniversary of mozilla.org, a project I co-founded, I mean to tell some history.

As I’ve often said, and as others at Netscape can confirm, I was recruited to Netscape with the promise of “doing Scheme” in the browser. At least client engineering management including Tom Paquin, Michael Toy, and Rick Schell, along with some guy named Marc Andreessen, were convinced that Netscape should embed a programming language, in source form, in HTML. So it was hardly a case of me selling a “pointy-haired boss” — more the reverse.

Whether that language should be Scheme was an open question, but Scheme was the bait I went for in joining Netscape. Previously, at SGI, Nick Thompson had turned me on to SICP.

What was needed was a convincing proof of concept, AKA a demo. That, I delivered, and in too-short order it was a fait accompli.

Of course, by the time I joined Netscape, and then transferred out of the server group where I had been hired based on short-term requisition scarcity games (and where I had the pleasure of working briefly with the McCool twins and Ari Luotonen; later in 1995, Ari and I would create PAC), the Oak language had been renamed Java, and Netscape was negotiating with Sun to include it in Navigator.

The big debate inside Netscape therefore became “why two languages? why not just Java?” The answer was that two languages were required to serve the two mostly-disjoint audiences in the programming ziggurat who most deserved dedicated programming languages: the component authors, who wrote in C++ or (we hoped) Java; and the “scripters”, amateur or pro, who would write code directly embedded in HTML.

Whether any existing language could be used, instead of inventing a new one, was also not something I decided. The diktat from upper engineering management was that the language must “look like Java”. That ruled out Perl, Python, and Tcl, along with Scheme. Later, in 1996, John Ousterhout came by to pitch Tk and lament the missed opportunity for Tcl.

I’m not proud, but I’m happy that I chose Scheme-ish first-class functions and Self-ish (albeit singular) prototypes as the main ingredients. The Java influences, especially y2k Date bugs but also the primitive vs. object distinction (e.g., string vs. String), were unfortunate.

Back to spring of 1995: I remember meeting Bill Joy during this period, and discussing fine points of garbage collection (card marking for efficient write barriers) with him. From the beginning, Bill grokked the idea of an easy-to-use “scripting language” as a companion to Java, analogous to VB‘s relationship to C++ in Microsoft’s platform of the mid-nineties. He was, as far as I can tell, our champion at Sun.

Kipp Hickman and I had been studying Java in April and May 1995, and Kipp had started writing his own JVM. Kipp and I wrote the first version of NSPR as a portability layer underlying his JVM, and I used it for the same purpose when prototyping “Mocha” in early-to-mid-May.

Bill convinced us to drop Kipp’s JVM because it would lack bug-for-bug compatibility with Sun’s JVM (a wise observation in those early days). By this point “Mocha” had proven itself via rapid prototyping and embedding in Netscape Navigator 2.0 , which was in its pre-alpha development phase.

The rest is perverse, merciless history. JS beat Java on the client, rivaled only by Flash, which supports an offspring of JS, ActionScript.

So back to popularity. I can take it or leave it. Nevertheless, popular Ajax libraries, often crunched and minified and link-culled into different plaintext source forms, are schlepped around the Internet constantly. Can we not share?

One idea, mooted by many folks, most recently here by Doug, entails embedding crypto-hashes in potentially very long-lived script tag attributes. Is this a good idea?

Probably not, based both on theoretical soundness concerns about crypto-hash algorithms, and on well-known poisoning attacks.

A better idea, which I heard first from Rob Sayre: support an optional “canonical URL” for the script source, via a share attribute on HTML5 <script>:

<mce:script mce_src=”https://my.edge.cached.startup.com/dojo-1.0.0.js” shared=”https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js”>
</mce:script><br />

If the browser has already downloaded the shared URL, and it still is valid according to HTTP caching rules, then it can use the cached (and pre-compiled!) script instead of downloading the src URL.

This avoids hash poisoning concerns. It requires only that the content author ensure that the src attribute name a file identical to the canonical (“popular”) version of the library named by the shared attribute. And of course, it requires that we trust the DNS. (Ulp.)

This scheme also avoids embedding inscrutable hashcodes in script tag attribute values.

Your comments are welcome.

Ok, back to JavaScript popularity. We know certain Ajax libraries are popular. Is JavaScript popular? It’s hard to say. Some Ajax developers profess (and demonstrate) love for it. Yet many curse it, including me. I still think of it as a quickie love-child of C and Self. Dr. Johnson‘s words come to mind: “the part that is good is not original, and the part that is original is not good.”

Yet here we are. The web must evolve, or die. So too with JS, wherefore ES4. About which, more anon.

Firefox 3 looks like it will be popular too, based on space and time performance metrics. More on that soon, too.

My @media Ajax Keynote

JavaScript 2 and the Open Web

Brendan Eich

Mozilla Corporation

@media Ajax London

20 Nov 2007

Herewith a hacked-up version of my S5 slides, with notes and commentary interpolated at the bottom of each slide.

Dilbert – the Big Time

dilbertjoke

See how JS is paired with Flash — poor, mundane HTML, CSS, DOM! HTML5 needs a new name.

Yoda on ES4

yodasaber

I described Doug Crockford as “the Yoda of Lambda JavaScript programming” at a mid-2006 talk he invited me to present at Yahoo!, so I thought I would start by riffing on whether the role still fits. So far, so good.

Yoda in Trouble

yodaintrouble

But three prequels later, the outlook for Yoda is not good. Large, heavy, spinning, flying-saucer-proprietary runtimes are hurtling toward him!

Enough Star Wars — Doug’s too tall for that part (even if it’s only a muppet). Let’s try a taller wizard…

The Bridge of EcmaDoom

khazad

I don’t really believe ES4 is a demon from the ancient world, of course. I’m afraid the JS-Hobbits are in trouble, though. As things stand today, Silverlight with C# or something akin (pushed via Windows Update) will raze the Shire-web, in spite of Gandalf-crock’s teachings.

Mal, Latin for “Bad”

mal

They’ll swing back to the belief that they can make people… better. And I do not hold to that.

– Mal Reynolds, Serenity

I can roleplay too: let’s see, renegade veteran from the losing side of an epic war against an evil empire… yeah, I can relate.

I really do think that JS’s multi-paradigm nature means there is no one-true-subset for all to use (whether they like it or not), and the rest — including evolutionary changes — should be kept out. I reject the idea that instead of making JS better, programmers should somehow be made “better”. The stagnation of JS1, and “little language” idolatries surrounding it (those one-true-way JS subsets), impose a big tax on developers, and drive too many of them away from the web standards and toward WPF, Flex, and the like.

Ok, enough role-playing geek fun — let’s get down to brass tacks: what’s really going on with the ES4 fracas?

Cui Bono

  • Who decides what is “better”?
  • Browser, plugin, and OS vendors?
  • Web developers?
  • All of the above, ideally
    • Without taking too long
    • Or making a mess
  • Preventing change could spare us from “worse”
  • Or help proprietary “change” to take off for the worst

Clearly ES4 will be tough to standardize. Standards often are made by insiders, established players, vendors with something to sell and so something to lose. Web standards bodies organized as pay-to-play consortia thus leave out developers and users, although vendors of course claim to represent everyone fully and fairly.

I’ve worked within such bodies and continue to try to make progress in them, but I’ve come to the conclusion that open standards need radically open standardization processes. They don’t need too many cooks, of course; they need some great chefs who work well together as a small group. Beyond this, open standards need transparency. Transparency helps developers and other categories of “users” see what is going on, give corrective feedback early and often, and if necessary try errant vendors in the court of public opinion.

Given all the challenges, the first order for ES4 work is to finish the reference implementation and spec writing process, taking into account the ongoing feedback. Beyond that, and I said this at the conference, I believe we need several productized implementations well under way, if not all but done, by the time the standard is submitted for approval (late 2008). This will take some hard work in the next ten months.

My hope is to empower developers and users, even if doing so requires sacrifice on the part of the vendors involved.

Inevitable Evolution

  • Web browsers are evolving
  • They need to, against Silverlight, AIR, and OS stacks
  • Browsers (and plugins!) need all three of
    1. Better Security
    2. Better APIs for everything (see 1)
    3. Better programming language support (see 1 and 2)
  • No two-legged stools, all three are needed

Some assert that JS1 is fine, browsers just need better APIs. Or (for security), that JS1 with incompatible runtime semantic changes and a few outright feature deletions is fine, but mainly: browsers just need better APIs. Or that Security comes first, and the world should stop until it has been achieved (i.e., utopia is an option). But I contend that JS must improve along with browser APIs and security mechanism and policy, both to serve existing and new uses, and to have a prayer of more robust APIs or significantly better security.

It’s clear from the experiences of Mozilla and security researchers I know that even a posteriori mashups built on a capability system will leak information. So information flow type systems could be explored, but again the research on hybrid techniques that do not require a priori maximum-authority judgments, which do not work on the web (think mashups in the browser without the user having to click “OK” to get rid of a dialog), is not there yet. Mashups are unplanned, emergent. Users click “OK” when they shouldn’t. These are hard, multi-disciplinary research problems.

Where programming languages can help, type systems and mutability controls are necessary, so JS1 or a pure (semantics as well as syntax) subset is not enough.

Evolving Toward “Better”

  • Security: hard problem, humans too much in the loop
  • APIs: served by WHAT-WG (Apple, Mozilla, Opera)
  • Languages: only realistic evolutionary hope is JS2

I am personally committed to working with the Google Caja team, and whoever else will help, to ensure that JS2 (with the right options, and as few as possible) is a good target for Caja. The irony is that when combined with backward compatibility imperatives, this means adding features, not removing them (for example, catchalls).

A note on names: I used JS2 in the title and the slides to conjure with the “JS” and “JavaScript” names, not to show any disrespect to ES/ECMAScript. All the books, the ‘J’ in AJAX (when it’s an acronym), the name of the language dropped most often at the @media Ajax conference, all call the language by the “JavaScript” name. Yeah, it was a marketing scam by Netscape and Sun, and it has a mixed history as a brand, but I think we are stuck with it.

Alternative Languages

  • Why not new/other programming languages?
  • JS not going away in our lifetimes
  • JS code is growing, not being rewritten
  • No room for multiple language runtimes in mobile browser
    • Apple, Mozilla, Opera attest to this in my hearing
  • One multi-language runtime? Eventually, not soon enough
    • A patent minefield…
    • How many hard problems can we (everyone!) solve at once and quickly?

This slide compresses a lot, but makes some points often missed by fans of other languages. Browsers will always need JS. Browsers cannot all embed the C Python implementation, the C Ruby implementation, etc. etc. — code footprint and cyclic leaks among heaps, or further code bloat trying to super-GC those cycles, plus all the security work entailed by the standard libraries, are deal killers.

The multi-language, one-runtime approach is better, but not perfect: IronPython is not Python, and invariably there is a first-among-equals language (Java on JVMs, C# on the CLR). We are investing in IronMonkey to support IronPython and IronRuby, and in the long run, if everyone makes the right moves, I’m hopeful that this work will pay off in widespread Python and Ruby support alongside JS2. But it will take a long while to come true in a cross-browser way.

Silverlight is not able to provide browser scripting languages in all browsers. Even if IE8 embeds the DLR and CLR, other browsers will not. Note the asymmetry with ScreamingMonkey: it is likely to be needed only by IE, and only IE has a well-known API for adding scripting engines.

Why JS2

  • JS1 is too small => complexity tax on library and app authors
  • JS1 has too few primitives => hard idiom optimization problem
  • JS1 lacks integrity features => better security has to be bolted on
  • JS1 is not taught much => Java U. still cranking out programmers
  • JS2 aims to cover the whole user curve, “out of the box”

The “too small” and “too few primitives” points remind me of Guy Steele’s famous Growing a Language talk from OOPSLA 1998 (paper). If you haven’t seen this, take the time.

During the panel later the same day, Jeremy Keith confronted me with the conclusion that JS2 was pitched only or mainly at Java heads. I think this slide and the next gave that impression, and a more subtle point was lost.

I hold no brief for Java. JS does not need to look like Java. Classes in JS2 are an integrity device, already latent in the built-in objects of JS1, the DOM, and other browser objects. But I do not believe that most Java U. programmers will ever grok functional JS, and I cite GWT uptake as supporting evidence. This does not mean JS2 panders to Java. It does mean JS2 uses conventional syntax for those magic, built-in “classes” mentioned in the ES1-3 and DOM specs.

In other words, and whatever you call them, something like classes are necessary for integrity properties vital to security in JS2, required for bootstrapping the standard built-in objects, and appropriate to a large cohort of programmers. These independent facts combine to support classes as proposed in JS2.

Normal Distribution

bellcurve

JS1 is used by non-programmers, beginning programmers, “front end designers”. It is copied and pasted, or otherwise concatenatively programmed, with abandon (proof: ES4 has specified toleration of Unicode BOMs in the middle of .js files! How did those get there?). This was a goal, at least in the “pmarca and brendan”@netscape.com vision circa 1995, over against Java applets. We succeeded beyond our wildest nightmares.

Netscape 2 had some very clueful early adopters of JS (Bill Dortch, if you are reading this, leave a comment). Years later, Doug Crockford led more select hackers toward the right end of the distribution, but much of the middle was necessarily bypassed: you can’t reach this cohort without taking over Java U.

What’s more, I observe that the Lambda-JS Jedi order is inherently elitist (I’m not anti-elitist, mind you; natural elites happen in all meritocratic systems). For many current Knights, it must remain so to retain its appeal.

Now, there’s nothing wrong with using closures for (partial) integrity and prototypes for inheritance; I like these tools (I should, I picked them in a hurry in the early days). But really, why should everyone be required to learn the verbose, error-prone, and inherently costly functional-JS-OOP incantations (power constructors, module patterns, etc.), instead of using a few concise, easier to get right, and more efficient new declarative forms that JS2 proposes?

It’s not as if JS2 is renouncing prototypes or closures in favor of “the Java way”. That’s a misinformed or careless misreading. Rather, we aim to level the playing field up, not down. JS2 users should be able to make hardened abstractions without having to write C++ or create Active X objects. And power-constructor and module pattern fans can continue to use their favorite idioms.

Wait a Minute!

Perhaps you object (strenously):

  • “I like my JS small, it is not complex with the right kung-fu!”
  • “Most runtime is in the DOM, who cares about JS optimizations”
  • “Security through smallness, and anyway: fix security first”
  • “People are learning, Yoda is teaching them”

JS1 favors closures (behavior with attached state) over objects (state with attached behavior) with both more abstraction (because names can be hidden in closures) and greater integrity (because var bindings are DontDelete). While JS1 is multi-paradigm, going with the grain of the design (closures over objects) wins. In my talk, I acknowledged the good done by Doug and others in teaching people about functional programming in JS.

However, there are limits. JS1 closure efficiency, and outright entrainment hazards that can result in leaks, leave something to be desired (the entrainment hazards led Microsoft to advise JScript hackers to avoid closures!). You could argue that implementations should optimize harder. Arguing is not needed, though — high quality optimizing runtime, which fit on phones, are what’s needed.

Beyond efficiency, using closures for modules and class-like abstractions is verbose and clumsy compared to using new syntax. Dedicating new syntax is scary to some, but required for usability (over against __UGLY__ names), and allowed under script type versioning. In the absence of macros (syntactic abstraction), and so long as macros can be added later and syntactic sugar reimplemented via macros at that time, my view is that we should be generous with syntactic conveniences in JS2.

So JS2 has more syntax (user interface) as well as substance (complementary primitives). This could be a burden on people learning the new language, but I think not a big one. In practice over the next few years, the bulk of the focus in books and classes will be on JS1 and Ajax. A programming language for the web should be a many-storied mountain, and most users will not ascend to the summit.

The main burden of new syntax is on folks writing source-to-source translators, static analyzers, and the like. These more expert few can take the heat so that the many can enjoy better “user interface”.

OK, Granted

  • Some truth in these claims, just not enough in my view
  • The odds ratios multiply to a pretty small success likelihood
  • Meanwhile, Silverlight is charging hard with C# (DLR fan-bait aside)
  • Flash and AIR likewise rely on ActionScript 3, not JS1, to compete
  • And really, JS1 users who are hitting its limits need relief soon

To respond to the contrarian arguments in the previous slide:

  • Whoever prefers a subset is free to continue using it on JS2 implementations. If your pointy-haired boss imposes class-obsessed B&D programming on you, get a new job.
  • DOM profiles show high dynamic dispatch, argument/result conversion, and other costs imposed by untyped JS1 in current implementations. Better implementations and JS2 usage can help.
  • Security is best defined as end-to-end properties that must be engineered continuously according to economic trade-offs as the system evolves. Utopia is not an option.
  • Some people are learning, but many others are not, and vendors sell C# and AS3 against JS1 for good reason.

As Neil Mix wryly observed in a post to the es4-discuss list:

When I hear concerns that ES4 will “break the web,” I can’t help but think of how many times I’ve heard that the web is already broken! The risks of not adopting ES4 surely must factor into this calculus, too.

Why Not Evolve?

  • We’re not proto-humans from 2001: A Space Odyssey
    • Making space ships out of bones
    • Or modules out of lambdas
  • We’re highly-evolved tool users, with opposable thumbs — we can:
  • Make better use of an existing tool (JS1)
  • Improve the tool itself (JS2)
  • Why not do both?

At the very least, don’t put all eggs in the “make people better” basket.

But… But…

You may still object:

  • “JS should remain small no matter what!”
  • “Classes suck, I hate my pointy-haired Java boss”
  • “Aren’t you rejecting your own elegant (yet messy) creation?”
  • “Who replaced you with a pod-person? We don’t even know you any longer!”

I get all kinds of fan-mail :-/.

What I Seek

  • To make JS (not people)… better
  • Better for its present and near-future uses on the web
  • Especially for building Ajax libraries and applications
  • JS programs are increasing in size and complexity
  • They face increasing workload — lots of objects, runtime
  • JS users deserve improvements since ES3 (eight years ago)

The argument that stagnation on the web fostered Ajax and Web 2.0 is false. Only when XMLHttpRequest was cloned into other browsers, and especially as Firefox launched and took market share back from IE, did we see the sea-change toward web apps that rely on intensive browser JS and asynchronous communication with the server.

In Particular

  • JS users deserve an optional type system
    • Instead of tedious (often skipped) error checking
    • So APIs can prove facts about their arguments
    • Without requiring all calling code to be typed
    • (… at first, or ever!)
  • They deserve integrity guarantees such as const
  • They deserve real namespaces, packages, compilation units
  • They deserve some overdue “bug fixes” to the ES3 standard
  • They deserve generous syntactic sugar on top

Like Mal Reynolds, I really do believe these things — not that I’m ready to die for my beliefs (“‘Course, that ain’t exactly plan A”).

The Deeper Answer

Why JS2 as a major evolutionary jump matters:

  • So browsers and plugins lose their “native-code innovation” lock
  • Downloaded JS2 code can patch around old native code bugs
  • Or reimplement a whole buggy subsystem, at near-native (or better!) speed
  • No more waiting for IE (or Firefox, or Safari, or Opera)
  • Distributed extensibility, web developers win

There won’t be a single browser release after which all power shifts to web developers writing in JS2. The process is more gradual, and it’s easy to forget how far we’ve come already. We’re well along the way toward disrupting the desktop OSes. Yet JS2 on optimizing VMs will liberate developers in ways that JS1 and plugins cannot.

Believe It

  • Yes, this was the Java dream that died in 1995, or 1997
  • This time for sure (Tamarin may be the most widely deployed VM ever)
  • It’s coming true with JS — if only it can evolve enough in time

I lived through the Java dream. Netscape was building an optimizing JIT-compiling runtime in 1996 and 1997, while Sun acquired Anamorphic and built HotSpot. The layoffs at the end of 1997 brought all the Netscape Java work crashing to a halt, and caused the “Javagator” rendering engine team to reinvent their code in C++ as Gecko (originally, “Raptor”).

In spite of all the hype and folly, the dream could have come true given both enough time and better alignment (including open source development) between Sun and Netscape. A lot more time — Java was slow back then, besides being poorly integrated into browsers.

There are many ironies in all of this. Two I enjoy often:

  • Flash filled the vacuum left by the decline of Java in the browser, and now provides a vector for JS2 on Tamarin.
  • Microsoft dumped Java, depriving Outlook Web Access of an asynchronous I/O class, wherefore XMLHttpRequest.

Non-Issues

  • JS1 performance on synthetic pure-JS (no DOM) benchmarks
  • Trace-based JITing accelerates JS1 at least an order of magnitude
  • Work from Michael Franz‘s group at UC Irvine (Mozilla supported)
  • No int type annotations required
  • Preliminary results based on Tamarin-to-Java bytecode translation, with a custom tracing JIT targeting the JVM (whew!), next…

If you take only one point away from this talk (I said), it should be that type annotations are not required for much-improved pure JS performance.

Tracing JIT Benchmarks

javagrandesuite

This chart shows results, normalized using SpiderMonkey performance at unity (so taller is faster), for the JavaGrande benchmark ported to JS (untyped JS except where noted: “Tamarin with Type Annotations”). The “Trace-Tree JIT” blue bars show results for a clever translation of Tamarin bytecode into Java bytecode (with runtime type helpers) fed into a tracing JIT implemented in Java(!). Amazingly, this approach competes with Rhino and Tamarin, even Tamarin run on typed-JS versions of the benchmarks.

The Crypt benchmark could not be run using the trace-based JIT at the time this chart was generated.

Tracing JIT Benchmarks (2)

javagrandeb

More good results, especially given the preliminary nature of the research. With an x86 back-end instead of the Java back-end used for these benchmarks, and further tuning work, performance should go up significantly. Even at this early stage, Series, SOR, and SparseMatMult all show the tracing JIT working with untyped JS beating Tamarin on typed-JS versions of these benchmarks.

Non-Issues (2)

  • Making JS2 look like any other language
  • Stuart, yesterday: fans already fake JS1 to resemble Ruby, Python, …
  • But: JS2 learns from other languages
    • AS3: nominal types, namespaces, packages
    • Python: iterators and generators, catch-alls
    • Dylan, Cecil: generic methods
  • Only those who don’t learn history are doomed to repeat it
  • Problems will be shaken out in ES4 “beta”
    • No rubber-stamped standards! (cough OOXML)

In response to an informal recap of my presentation the other day, Rob Sayre mentioned Peter Norvig‘s presentation on design patterns in dynamic programming. This caused me to flash back to the bad old days of 1998, when certain “Raptor” architects would wave the Gamma book and make dubious assertions about one-true-way design patterns in C++.

Norvig’s slides show what was lost by the swerve toward static, obsessively classical OOP in C++ and Java, away from dynamic languages with first-class functions, first-class types, generic methods, and other facilities that make “patterns” invisible or unnecessary. JS2 aims to restore to practical programmers much of what was lost then.

Integrity in JS2

  • Object, Array, etc., globals can be replaced in JS1
    • JSON CSRF hazards pointed out by Joe Walker
    • ECMA spec says this matters, or not, half the time
    • JS2 makes the standard class bindings immutable
  • Objects are mutable, extensible
    • Even with privileged/private members via closures
    • Too easy to forge instance of special type
    • JS2 has class exactly to solve this problem
    • JS2 lets users make fixtures, fixed (“don’t delete”) properties
  • JS1 user-defined properties can be replaced/hijacked
    • JS2 has const and final

The Romans called wheat integrale, referring to the potent and incorruptible completness of the kernel. Integrity as a security property is not far removed from this sense of completeness and soundness. JS1 simply lacks crucial tools for integrity, and JS2 proposes to add them.

The following slides (I’ve coalesced multiple slides where possible) show the evolution of a webmail library from JS1 to JS2, via gradual typing, in order to increase integrity and simplify code, avoiding repetitious, error-prone hand-coded latent type checking. The transport code is omitted, but you can see JSON APIs being used for transfer encoding and decoding.

Evolutionary Programming

Version 1 of a webmail client, in almost pure JS1

function send(msg) {
  validateMessage(msg);
  msg.id = sendToServer(JSON.encode(msg));
  database[msg.id] = msg;
}
function fetch() {
  handleMessage(-1);                  // -1 means "get new mail"
}
function get(n) {
  if (uint(n) !== n)                  // JS1: n>>>0 === n
    throw new TypeError;
  if (n in database)
    return database[n];
  return handleMessage(n);
}
var database = [];
function handleMessage(n) {
  let msg = JSON.decode(fetchFromServer(n));
  if (typeof msg != "object")
    throw new TypeError;
  if (msg.result == "no data")
    return null;
  validateMessage(msg);
  return database[msg.id] = msg;
}
function validateMessage(msg) {
  function isAddress(a)
    typeof a == "object" && a != null &&
    typeof a.at == "object" && msg != null &&
    typeof a.at[0] == "string" && typeof a.at[1] == "string" &&
    typeof a.name == "string";
  if (!(typeof msg == "object" && msg != null &&
        typeof msg.id == "number" && uint(msg.id) === msg.id &&
        typeof msg.to == "object" && msg != null &&
        msg.to instanceof Array && msg.to.every(isAddress) &&
        isAddress(msg.from) && typeof msg.subject == "string" &&
        typeof msg.body == "string"))
    throw new TypeError;
}

It’s rare to see anything in real-world JS like the detailed checking done by validateMessage. It’s just too tedious, and the language “fails soft” enough (usually), that programmers tend to skip such chores — sometimes to their great regret.

Evolution, Second Stage

Version 2: Structural types for validation.

type Addr = { at: [string, string], name: string };
type Msg = {
  to: [Addr], from: Addr, subject: string, body: string, id: uint
};
function send(msg: like Msg) {
  msg.id = sendToServer(JSON.encode(msg));
  database[msg.id] = msg;
}
function fetch()
  handleMessage(-1);
function get(n: uint) {
  if (n in database)
    return database[n];
  return handleMessage(n);
}
function handleMessage(n) {
  let msg = JSON.decode(fetchFromServer(n));
  if (msg is like { result: string } && msg.result == "no data")
    return null;
  if (msg is like Msg)
    return database[msg.id] = msg;
  throw new TypeError;
}

Important points:

  • Structural types are like JSON, but with types instead of values
  • The like type prefix makes a “shape test” spot-check
  • Note how fetch is now an expression closure
  • No more validateMessage! Structural types ftw! 🙂

Evolution, Third Stage

Version 3a: Integrity through structural type fixtures (other functions are unchanged since Version 2)

type MsgNoId = {
  to: [Addr], from: Addr, subject: string, body: string
};
function send(msg: like MsgNoId) {
  msg.id = sendToServer(JSON.encode(msg));
  database[msg.id] = copyMessage(msg);
}
function handleMessage(n) {
  let msg = JSON.decode(fetchFromServer(n));
  if (msg is like { result: string } && msg.result == "no data")
    return null;
  if (msg is like Msg)
    return database[id] = copyMessage(msg);
  throw new TypeError;
}
function copyMessage(msg) {
  function newAddr({ at: [user, host], name })
    new Addr([user, host]: [string, string], name);
  let { to, from, subject, body, id } = msg;
  return new Msg(to.map(newAddr), newAddr(from), subject, body, id);
}

This stage copes with a confused or malicious client of the webmail API, who could mutate a reference to a message to violate the validity constraints encoded in stage 1’s validateMessage.

The MsgNoId type allows the library client to omit a dummy id, since send initializes that property for the client.

Note the righteous use of parameter and let destructuring in copyMessage.

Alternative Third Stage

Version 3b (other functions are unchanged since Version 3a)

function send(msg: like MsgNoId) {
  msg.id = sendToServer(JSON.encode(msg))
  database[msg.id] = msg wrap Msg
}
function handleMessage(n) {
  let msg = JSON.decode(fetchFromServer(n))
  if (msg is like { result: string } && msg.result == "no data")
    return null
  return database[msg.id] = msg wrap Msg
}

wrap is both an annotated type prefix and a binary operator in JS2. It makes a wrapper for an untyped object that enforces a structural type constraint on every read and write, in a deep sense. So instead of copying to provide integrity through isolation, this alternative third stage shares the underlying message object with the library client, but checks all accesses made from within the webmail library.

Observations on Evolution

  • At no point so far did clients have to use types
  • Code shrank by half from stage 1 to 3a, more to 3b
  • First stage just used a tiny bit of JS2 (uint)
  • Second stage added structural types and is like tests
    • Sanity-checking the “shape” of API arguments
    • But trusting the client not to mutate behind the library’s back!
  • Third stage copied into structural type instances with fixtures — integrity against confused/malicious client
  • Alternative third stage used wrap instead

Notice the lack of classes so far.

Observations on Evolution (2)

  • A “modularization” stage would use package or namespace
  • If copying or wrapping too costly, drop like from formal params, and either:
    • Change client to pass structural type instances
    • Or use nominal types (class, interface) throughout
    • Either way, client changes required at this point
  • Use optional strict mode for verification before deployment
  • (Many thanks to Lars Thomas Hansen for the example code)

UPDATE: A revised and extended version of this evolutionary arc is now available as a tutorial at ecmascript.org, with compelling side-by-side comparisons of successive stages.

Conclusions

  • JS2 focuses on programming in the large and code migration:
    • Evolutionary programming with structural types
    • Gradual typing from like to wrap or fixed types
  • Rapid prototypes start out untyped, just like today
  • We believe most web JS can remain untyped, with good performance
  • Library APIs and implementations can buy integrity and efficiency by the yard
  • Higher integrity with efficiency may induce “islands” of typed code (e.g., real-time games)

The “typed APIs with untyped code” pattern is particularly winning in our experience building the self-hosted built-ins in the ES4 reference implementation.

What Else Is New?

  • ScreamingMonkey lives! It runs a self-hosted ES4 compiler that generates bytecode from the compiler’s own source
  • Much optimization work remains to be done
  • But the C# chess demo from MIX07, ported to ES4, runs now
  • ScreamingMonkey chess demo is ~15x faster than the JScript version (per fresh e-mail today from Mark Hammond)
  • Demos of two other new APIs, the <video> tag and 3D <canvas>, follow…

During this slide, I shot screaming slingshot flying monkeys (complete with little black masks and capes) into the audience. I’m sorry I could bring only a handful on this trip!

Video Tag Demo

  • Implements the WHAT-WG <video> tag proposal
  • Opera and now Apple have implemented too
  • page-defined DHTML playback controls
  • Uses Ogg Theora and Vorbis for video and audio
  • Embeds <video> in SVG <foreignObject> for transforms
  • Developed by Chris Double

Chris provides source code, Firefox 3 alpha builds, and Ogg video/audio files.

Canvas3D Demo

  • Alternative OpenGL-ES rendering context for <canvas>
  • Embeds OpenGL’s shader language in <script>, read via DOM
  • Example KMZ (Google Earth) viewer
  • Developed by Vladimir Vukicevic

Now available as a Firefox 3 addon!

Finis