Paren-Free

The tl;dr version

[Image: Krusty the ventriloquist]

<Krusty>So, you kids want CoffeeScript, do you?</Krusty>

<script type="harmony">   // placeholder MIME type

if year > 2010 {
    syntax++
}

for i in iter {           // i is a fresh let binding!
    frob(i)
}

while lo <= hi {
    let mid = (lo + hi) / 2
    // binary search blah blah blah
}

... return [i * i for i in range(n)]   // array comprehension

</script>

No parentheses around control structure “heads”. If Go can do it, so can JS. And yes, I’m using automatic semicolon insertion (JSLint can suck it).

There are open issues (are braces required around bodies?) but this is the twitter-friendly section. More below, after some twitter-unfriendly motivation.

Background

We had a TC39 meeting last week, graciously hosted at Apple with Ollie representing. Amid the many productive activities, Dave presented iterators as an extension to proxies.

The good news is that the committee agreed that some kind of meta-programmable iteration should be in the language.

Enumeration

Proxies had already moved to Harmony Proposal status earlier this year, but with an open issue: how to trap for (i in o) where o is a proxy with a huge (or even an infinite — rather, a lazily created and indefinite) number of properties.

js> var handler = {
    enumerate: function () { return ["a", "b", "c"]; }
};
js> var proxy = Proxy.create(handler);
js> for (var i in proxy)
    print(i);
a
b
c

The proxy handler’s fundamental enumerate trap eagerly returns an array of all property names “in” the proxy, coerced to string type if need be. Each string is required to be unique in the returned array. But for a large or lazy object, where the trapping loop may break early, eagerness hurts. Scale up and eagerness (never mind the uniqueness requirement) is fatal. TC39 agreed that a lazy-iteration derived (optional) trap was wanted.

js> var handler = {
    iterate: function () { for (var i = 0; i < 1e9; i++) yield i; }
};
js> var proxy = Proxy.create(handler);
js> for (var i in proxy) {
    if (i == 3) break;
    print(i);
}
0
1
2
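
To see why laziness matters when the loop breaks early, here is a minimal sketch in later standard JavaScript (ES2015 generator functions and for-of, which are assumptions outside this post and not the proxy iterate trap itself). The names lazyRange and produced are hypothetical; the counter only records how many values the producer actually computed.

```javascript
// Counts how many values the producer actually computes.
let produced = 0;

// Lazy source: each value is computed only when the consumer asks for it.
function* lazyRange(limit) {
  for (let i = 0; i < limit; i++) {
    produced++;
    yield i;
  }
}

const seen = [];
for (const i of lazyRange(1e9)) {
  if (i === 3) break;   // early exit: the remaining values are never computed
  seen.push(i);
}

console.log(seen.join(" "));  // 0 1 2
console.log(produced);        // 4 (the values 0 through 3), not a billion
```

An eager producer would have had to build the whole billion-element array before the loop body ran once.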

The iterators strawman addressed this use-case by proposing that for-in would trap to iterate if present on the handler for the proxy referenced by o, in preference to trapping to enumerate.

js> var handler = {
    enumerate: function () { return ["a", "b", "c"]; },
    iterate: function () { for (var i = 0; i < 1e9; i++) yield i; }
};
js> var proxy = Proxy.create(handler);
js> for (var i in proxy) {
    if (i == 3) break;
    print(i);
}
0
1
2

To avoid switching from enumeration to iteration under a single for-in loop, once the loop has started enumerating a non-proxy, if a proxy is encountered on that object’s prototype chain, the prototype proxy’s enumerate trap will be used, not its iterate trap.

js> var handler = {
    has: function (name) { return /^[abc]$/.test(name); },
    enumerate: function () { return ["a", "b", "c"]; },
    iterate: function () { for (var i = 0; i < 1e9; i++) yield i; }
};
js> var proxy = Proxy.create(handler);
js> var obj = Object.create(proxy);
js> for (var i in obj) {
    print(i);
}
a
b
c

Enumeration walks the prototype chain, and this is why a proxy might want both enumerate and iterate.
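
The prototype-chain walk is easy to observe with plain objects, no proxies required. A small sketch (the object and property names are made up for illustration):

```javascript
// proto contributes an inherited enumerable property.
const proto = { inherited: 1 };
const obj = Object.create(proto);
obj.own = 2;

// for-in visits own enumerable names, then unshadowed ones up the chain.
const enumerated = [];
for (const name in obj) enumerated.push(name);

console.log(enumerated.join(" "));        // own inherited
console.log(Object.keys(obj).join(" "));  // own (own properties only)
```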

Iteration

What all this means: you can implement Pythonic iterators with proxies, and return a sequence of arbitrary values to a for-in loop that’s given the proxy directly (not on a prototype chain of a non-proxy object, as noted above). A large/lazy proxy would trap iterate instead of enumerate and return string keys, but other iterator-proxies could return Fibonacci numbers, integer ranges, or whatever the proxy implementor and consumer want. This was an intended part of the package deal.

js> function fib(n) {
    var i = 0;
    var a = 0, b = 1;
    return {
        next: function () {
            if (++i > n)
                throw StopIteration;
            [a, b] = [b, a + b];
            return a;
        }
    };
}
js> var handler = {iterate: function () { return fib(10); } };
js> var proxy = Proxy.create(handler);
js> for (var i in proxy)
    print(i);
1
1
2
3
5
8
13
21
34
55
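
For comparison, here is a hedged sketch of the same Fibonacci iterator written against the iteration protocol that ES2015 later standardized. That protocol is an assumption beyond this post: it signals completion with a done flag on a result object instead of throwing StopIteration, and it is consumed by for-of or spread, not by for-in over a proxy.

```javascript
function fib(n) {
  let i = 0;
  let a = 0, b = 1;
  return {
    // Makes the object usable directly in for-of and spread.
    [Symbol.iterator]() { return this; },
    next() {
      if (++i > n) return { value: undefined, done: true };
      [a, b] = [b, a + b];
      return { value: a, done: false };
    }
  };
}

const firstTen = [...fib(10)];
console.log(firstTen.join(" "));  // 1 1 2 3 5 8 13 21 34 55
```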

(JS1.7 and above, implemented in both SpiderMonkey and Rhino, prefigured this proposal by supporting an unstratified iteration protocol based on Python 2.5. This JS1.7 Iterator extension is fairly popular in spite of some design flaws, and from the exercise of implementing and shipping it we’ve recognized those flaws and fixed them via proxies combined with the iterators strawman.)

The bad news is that the committee did something committees often do: try to compromise between divergent beliefs or subjective value theories.

In this case the compromise was based on the belief that for-in should not become the wanted meta-programmable iteration syntax. The argument is that for-in must always visit string-typed keys of the object, or at least whatever strings the accepted proxy enumerate trap returns in an array. If a Harmony proxy could somehow be enumerated by pre-Harmony for-in-based code, non-string values in the iteration might break the old code.

(The counter-argument is that once you let the proxy handler trap enumerate, a lot can change behind the back of old for-in-based code; also, enumeration is an underspecified mess. But these points do not completely overcome the objection about potential breakage in old code.)

Fear of Change

To fend off such breakage, we could make for-in meta-programmable only in Harmony code — any loop loaded under a pre-Harmony script tag type would not iterate a proxy.

This opt-in protection probably does not resolve the real issue, which is whether syntax can have its semantics changed much (or at all) in a mature language such as JS, which is being evolved via mostly-compatible standard versions in multi-year cycles.

I acknowledged during the meeting that we would not make progress without trying to agree on new syntax. This was too optimistic but I wanted to discover more about the divergent beliefs that made extending for-in via proxies a showstopper.

A quick whip-round the room with an empty cup managed to net us loose change from latter-day Java and C++:

for (var i : x)   // or let i, or just i for any lvalue i
     ...

as our meta-programmable “new syntax”. Bletch!

Not to worry. For-colon is probably not going to fly for some reasons I raised on es-discuss, but it also should die a deserved death as a classic bad compromise forged in the heat of a committee meeting.

The difficulty before us is precisely this how-much-to-change question.

ES5 strict mode already changes runtime semantics for existing syntax (eval of var no longer pollutes the caller’s scope; arguments does not alias formal parameters; a few others), for the better. Unfortunately, developers porting to "use strict" must test carefully, since these are meaning shifts, not new early errors.
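
The arguments aliasing change is a good example of such a silent meaning shift. In this sketch both functions are built with the Function constructor so that each body's mode depends only on its own directive; the names sloppyFirstArg and strictFirstArg are hypothetical.

```javascript
// Sloppy mode: arguments[0] aliases the formal parameter a.
const sloppyFirstArg = new Function("a", "a = 2; return arguments[0];");

// Strict mode: arguments is a snapshot, so assigning to a does not show up.
const strictFirstArg = new Function("a", '"use strict"; a = 2; return arguments[0];');

console.log(sloppyFirstArg(1));  // 2
console.log(strictFirstArg(1));  // 1
```

Same source text apart from the directive, no error either way, different answer: exactly the kind of thing only testing will catch.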

My point is that syntactic and semantic change has happened over the last 15 years of JS, it is happening now with ES5 strict, and it will happen again.

Change is Coming

We believe that future JS, the Harmony language, must include at least one incompatible change to runtime semantics: no more global object at the top of the scope chain. Instead, programmers would have lexical scope all the way up, with the module system for populating the top scope. By default, the standard library we all know would be imported; also by default in browsers, the DOM would be there.

Can the world handle another incompatible change to the semantics of existing syntax, namely the for-in loop?

There are many trade-offs.

On the one hand, adding new syntax ensures no existing code will ever be confused, even if migrated into Harmony-type script. On the other, adding syntax hurts users and implementors in ways that combine to increase the complexity of the language non-linearly. The chances of failing to standardize, and of making mistakes during standardization, go up too.

What’s more, it will be a long time before anyone can use the new syntax on the web, whereas for-in and proxies implementing well-behaved iterators could be used much sooner, with fallback if (!window.Proxy).

Ultimately, it’s a crap shoot:

  • Play it safe, enlarge the language, freeze (and finally standardize, ahem) the semantics of the old syntax, and try to move users to the new syntax? or
  • Conserve syntax, enable developers to reform the for-in loop from its enumeration-not-iteration state?

All this is prologue. Perhaps the “play it safe” position is right. And more important, what if new syntax could be strictly more usable and have better semantics?

New Clothes and Body

Here’s my pitch: committees do not design well, period. Given a solid design, a committee with fundamental disagreements can stall or eviscerate that design out of conservatism or just nay-saying, until the proposal is hardly worth the trouble. At best, the language grows larger more quickly, with conservative add-ons instead of holistic rethinkings.

I’m to blame for some of this, since I’ve been playing the standards game with JS. Why not? It seems to be working, and the alternatives (ScreamingMonkey, another language getting into all browsers) are nowhere. But I observe that even for Harmony, and notably for ES5, much of the innovation came before the committee got together (getters, setters, let, destructuring). Other good parts of ES5 and emerging standards came from strong individual or paired designers (@awbjs, @markm, @tomvc).

And don’t get me wrong: sometimes saying “no” is the right thing. But in a committee tending a mature but still living programming language, it’s too easy to say “no” without any “but here’s a better way” follow-through. To be perfectly clear, TC39 members generally do provide such follow-through. But we are still a committee.

I want to break out of this inherently clog-prone condition.

So, given the concern about changing the meaning of for-in, and the rise of wrist-friendly “unsyntax” (Ruby, Python, CoffeeScript) over the shifted-keystroke-burdened C-family syntax represented by JS, why not make opting into Harmony enable new syntax with the desired meta-programmable semantics?

Paren-Free Heads

It would be a mistake to change syntax (and semantics) utterly. VM implementors and web developers having to straddle both syntaxes would rightly balk. There will be commerce between Harmony and pre-Harmony scripts, via the DOM and the shared JS object heap. But can we relax syntactic rules a bit, and lose two painfully-shifted, off-QWERTY-home-row characters, namely ( and ), in control structure heads?

for i in iter {
    // i is a value of any type;
}

Here’s your new syntax with new semantics!

We can simplify the iterator strawman too. If you want to iterate and not enumerate, use the new syntax. If you want to iterate keys (both “own” and any enumerable unshadowed property names on prototypes), use a helper function:

for i in keys(o) {
    // i is a string-typed key
}

The old-style for (var i in o)... loop only traps to enumerate. Large/lazy proxies? Use the new for k in keys(o) {...} form.
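
A rough sketch of the split between iterating values and iterating keys, written in later standard JavaScript since the paren-free form is only a proposal here. It assumes ES2015 for-of stands in for the new loop, and this keys is one hypothetical way to write the helper (a generator wrapping the old for-in), not a specified library function.

```javascript
// Hypothetical helper: lazily yields the string keys that for-in would visit.
function* keys(o) {
  for (const k in o) yield k;
}

const list = ["a", "b"];

const values = [];
for (const v of list) values.push(v);      // iteration: values of any type

const names = [];
for (const k of keys(list)) names.push(k); // enumeration: string-typed keys

console.log(values.join(" "));   // a b
console.log(names.join(" "));    // 0 1
console.log(typeof names[0]);    // string
```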

Are the braces required? C has parenthesized head expressions and unbraced single-statement bodies. Without parens, a C statement such as

if x
    (*pf)(y);

would be ambiguous (don’t try significant newlines on me — I’ve learned my lesson :-/). You need to mandate either parens around the head, or braces around the body (or both, but that seems like overkill).

So C requires parens around head expressions. But many style guides recommend always bracing, to ward off dangling else. Go codifies this fully, requiring braces but relieving programmers from having to parenthesize the head expression.

I swore I’d never blog at length about syntax, but here I am. Syntax matters: it’s programming language UI. Therefore it needs to be improved over time. JS is overdue for an upgrade. So my modest proposal here is: lose the head parens, require braces always.

You could argue for optional braces if there’s no particular ambiguity, e.g.

if foo
    bar();

But that will be a hard spec to write, a confusing spec to read, and educators and gurus will teach “always brace” anyway. Better to require braces.

Pythonic significant whitespace is too great a change, and bad for minified/crunched/mangled web scripts anyway. JS is a curly-brace language and it always will be.

Implicit Fresh Bindings

Another win: the position between for and in is implicitly a let binding context. You can destructure there too, but whatever names you bind, they’ll be fresh for each iteration of the loop.

This allows us to solve an old and annoying closure misfeature of JS:

js> function make() {
    var a = [];
    for (var i = 0; i < 3; i++)
        a.push(function () { return i; });
    return a;
}
js> var a = make();
js> print(a[0]());
3
js> print(a[1]());
3
js> print(a[2]());
3

Changing var to let in the C-style three-part for loop does not help.

But for-in is different, and in Harmony we (TC39) believe it should make a fresh let binding per iteration. I’m proposing that the let be implicit and obligatory. And of course the head is paren-free, so the full fix looks like this:

js> function make() {
    var a = [];
    for i in range(3) {
        a.push(function () { return i; });
    }
    return a;
}
js> var a = make();
js> print(a[0]());
0
js> print(a[1]());
1
js> print(a[2]());
2
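
The same contrast can be run in later standard JavaScript. This is a sketch under assumptions: ES2015 for-of with an explicit const plays the part of the proposed implicit binding, and range is a hypothetical generator helper, since no such built-in exists.

```javascript
// Hypothetical helper yielding 0, 1, ..., n - 1.
function* range(n) {
  for (let i = 0; i < n; i++) yield i;
}

// One var binding shared by every closure.
function makeShared() {
  var a = [];
  for (var i = 0; i < 3; i++) a.push(function () { return i; });
  return a;
}

// A fresh binding per iteration, captured separately by each closure.
function makeFresh() {
  const a = [];
  for (const i of range(3)) a.push(function () { return i; });
  return a;
}

console.log(makeShared().map(f => f()).join(" "));  // 3 3 3
console.log(makeFresh().map(f => f()).join(" "));   // 0 1 2
```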

Part of the Zen of Python: “Explicit is better than implicit.” Of course, Python has implicit block-scoped variable declarations, so this is more of a guideline, or a Zen thing, not some Western-philosophical absolute ;-). Having to declare an outer or global name in Python is therefore an exception, and painful. Like the sound of one hand slapping your face.

Of course JS shouldn’t try to bind block-scoped variables implicitly all over the place, as Python does; once again, that would be too great a change. But implicit for-in loop let-style variable declaration is winning both as sensible default, and to promulgate the closure-capture fix.

Comprehensions

When we implemented iterators and generators in JS1.7, I also threw in array comprehensions:

js> squares = [i * i for (i in range(10))];
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
js> odds = [i for (i in range(20)) if (i % 2)]
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

At first I actually implemented paren-free heads for the for-in parts in the square brackets, but when I got to the optional trailing if I balked. Too far from JS, and in practical terms, a big-enough refactoring speed-bump for anyone sugaring a for-in loop as a comprehension. But paren-free Harmony rules:

js> squares = [i * i for i in range(10)];
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
js> odds = [i for i in range(20) if i % 2]
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

The same win applies to generator expressions.
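
Comprehension syntax of this kind is not something you can count on outside SpiderMonkey and Rhino, so here is a hedged equivalent of the two comprehensions using only standard library calls (Array.from, spread and filter from ES2015 and ES5). The range generator is again a hypothetical helper.

```javascript
// Hypothetical helper yielding 0, 1, ..., n - 1.
function* range(n) {
  for (let i = 0; i < n; i++) yield i;
}

// [i * i for i in range(10)]
const squares = Array.from(range(10), i => i * i);

// [i for i in range(20) if i % 2]
const odds = [...range(20)].filter(i => i % 2);

console.log(squares.join(", "));  // 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
console.log(odds.join(", "));     // 1, 3, 5, 7, 9, 11, 13, 15, 17, 19
```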

Thanks

Thanks to TC39 colleagues for their general excellence — we’re a committee but I’ll try not to hold that against any of us.

Thanks especially to @AlexRussell and @arv, who at last week’s meeting brought some attitude about improving syntax and semantics in Harmony that I fought at first (for fear of the committee opening up all design and compatibility constraints and failing to reach Harmony). Their attitude stimulated me to think outside the box, and outside the parens.

Some of you may be thinking “this is crazy!” Others of you will no doubt say “more! more!” I have some other thoughts, inspired by TC39 conversations, that could help make Harmony a better language without it being over-compatible warm beer, but I’ll save them for another post.

My point here is not to rush another syntax strawman through TC39, but to stimulate thinking. I’m serious about paren-free FTW, but I’m more serious about making Harmony better through judicious and holistic re-imaginings, not only via stolid committee goal-tending.

/be

Proxy Inception

After marinating for a few months, my JSConf.eu slides:

(Mobile/No-Flash version)

These are based directly on the excellent work of Mark Miller and Tom Van Cutsem, who developed the harmony:proxies proposal that is now approved for the next major iteration of the JavaScript standard (ECMA-262, probably edition 6 but we’ve learned the hard way not to number prematurely — anyway, approved for “ECMAScript Harmony” [my original Harmony-coining post]).

Harmony Proxies are already prototyped in Firefox 4 betas, thanks to Andreas Gal.

When I reached the “meta-level shifting” slide:

[Slide: meta-level shifting]

someone in the audience tweeted about how my talk was like Inception (github-sourced simulator). Meta-meta dreams within dreams (warning: meta-to-the-4th-shifting leads to Limbo).

The money-shot slide in my view is:

[Slide: selective interception]

which depicts how Proxies finally level the playing field between browser implementors using burned-into-browser-binaries C++ and web developers using downloaded JS.

It’s hard to overstate how this matters. The DOM (IE’s for sure, but all of them, back to the original I hacked in Netscape 2) suffers from its “VM territory” privileges, which have been abused to make all kinds of odd-ball “host objects”. Proxies both greatly reduce the weirdness of host objects and let JS hackers emulate and even implement such objects.

Novice JS hackers and all JS programmers happy at the base level of the language need not worry about the details of Proxies. Proxies cannot break the invariants that keep the JS lucid dream unfolding on stage. Specifically, you can’t hack traps onto an existing non-proxy object — you can only create a new proxy and start using it afresh, perhaps passing it off as a preexisting kind of object that it emulates [1].

But when you need to go backstage of the dream and change the rules without breaking the dreamer’s illusion, by interceding on every get, set, call, construct, etc., then Proxies are indispensable.
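
A minimal sketch of interceding on get and set. It uses the `new Proxy(target, handler)` constructor and the Reflect helpers that were standardized later in ES2015, an assumption that goes beyond the Proxy.create prototype described in this post; the log array is just for illustration.

```javascript
const log = [];
const target = { x: 1 };

const proxy = new Proxy(target, {
  // Runs on every property read through the proxy.
  get(obj, name) {
    log.push("get " + String(name));
    return Reflect.get(obj, name);
  },
  // Runs on every property write through the proxy.
  set(obj, name, value) {
    log.push("set " + String(name));
    return Reflect.set(obj, name, value);
  }
});

const x = proxy.x;  // traps get, forwards to target
proxy.y = 2;        // traps set, forwards to target

console.log(x);               // 1
console.log(target.y);        // 2
console.log(log.join(", "));  // get x, set y
```

Note that target itself is untouched by the handler: code holding a direct reference to it sees no traps, which is the invariant described above.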

Firefox 4 is using Proxies to implement all of its security wrappers.

Long-time SpiderMonkey fans will ask “why no __noSuchMethod__” (or: why not also have a noSuchMethod or invoke trap, or a flag passed to get telling it when it is trapping a get for the entire callee part of a call expression)? The short answer is to keep the set of handler traps minimal in terms of JS semantics (modulo scalability), which do not include “invoke-only methods”. The longer answer is on es-discuss.
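
To make the trade-off concrete, here is a rough emulation of __noSuchMethod__ using only a get trap, written against the later standardized `new Proxy(target, handler)` API (an assumption beyond this post). The helper withNoSuchMethod and its fallback parameter are hypothetical names.

```javascript
// Wraps target so that missing properties read back as callable stubs.
function withNoSuchMethod(target, fallback) {
  return new Proxy(target, {
    get(obj, name) {
      if (name in obj) return obj[name];
      // The trap cannot tell a call from a plain read, so it always
      // hands back a function for a missing name.
      return function (...args) { return fallback(name, args); };
    }
  });
}

const greeter = withNoSuchMethod(
  { hello() { return "hello"; } },
  (name, args) => "no such method: " + String(name) + "(" + args.join(", ") + ")"
);

console.log(greeter.hello());      // hello
console.log(greeter.frob(1, 2));   // no such method: frob(1, 2)
console.log(typeof greeter.frob);  // function
```

The last line shows the catch: without an invoke-only notion, a plain read of a missing property also yields a function, where an ordinary object would give undefined.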

/be

[1] Inside the engine, a clever trick from Smalltalk called becomes is used to swap a newborn Proxy and an existing object that has arbitrarily many live references. Thus an object requiring no behavioral intercession can avoid the overhead of traps until it escapes from a same-origin or same-thread context, and only if it does escape through a barrier will it become a trapping Proxy whose handler accesses the original object after performing access control checks or mutual exclusion.

The local jargon for such object/Proxy swapping is “brain transplants”.