To me, a nerd from a tender age, this is something between a curse and a joke. (See if you are in my camp: isn’t the green chick hotter?)
Brendan Eich convinced his pointy-haired boss at Netscape that the Navigator browser should have its own scripting language, and that only a new language would do, a new language designed and implemented in big hurry, and that no existing language should be considered for that role.
Who knows, and it’s hard to care, but in this week of the tenth anniversary of mozilla.org, a project I co-founded, I mean to tell some history.
As I’ve often said, and as others at Netscape can confirm, I was recruited to Netscape with the promise of “doing Scheme” in the browser. At least client engineering management including Tom Paquin, Michael Toy, and Rick Schell, along with some guy named Marc Andreessen, were convinced that Netscape should embed a programming language, in source form, in HTML. So it was hardly a case of me selling a “pointy-haired boss” — more the reverse.
Whether that language should be Scheme was an open question, but Scheme was the bait I went for in joining Netscape. Previously, at SGI, Nick Thompson had turned me on to SICP.
What was needed was a convincing proof of concept, AKA a demo. That, I delivered, and in too-short order it was a fait accompli.
Of course, by the time I joined Netscape, and then transferred out of the server group where I had been hired based on short-term requisition scarcity games (and where I had the pleasure of working briefly with the McCool twins and Ari Luotonen; later in 1995, Ari and I would create PAC), the Oak language had been renamed Java, and Netscape was negotiating with Sun to include it in Navigator.
The big debate inside Netscape therefore became “why two languages? why not just Java?” The answer was that two languages were required to serve the two mostly-disjoint audiences in the programming ziggurat who most deserved dedicated programming languages: the component authors, who wrote in C++ or (we hoped) Java; and the “scripters”, amateur or pro, who would write code directly embedded in HTML.
Whether any existing language could be used, instead of inventing a new one, was also not something I decided. The diktat from upper engineering management was that the language must “look like Java”. That ruled out Perl, Python, and Tcl, along with Scheme. Later, in 1996, John Ousterhout came by to pitch Tk and lament the missed opportunity for Tcl.
I’m not proud, but I’m happy that I chose Scheme-ish first-class functions and Self-ish (albeit singular) prototypes as the main ingredients. The Java influences, especially y2k Date bugs but also the primitive vs. object distinction (e.g., String), were unfortunate.
Back to spring of 1995: I remember meeting Bill Joy during this period, and discussing fine points of garbage collection (card marking for efficient write barriers) with him. From the beginning, Bill grokked the idea of an easy-to-use “scripting language” as a companion to Java, analogous to VB‘s relationship to C++ in Microsoft’s platform of the mid-nineties. He was, as far as I can tell, our champion at Sun.
Kipp Hickman and I had been studying Java in April and May 1995, and Kipp had started writing his own JVM. Kipp and I wrote the first version of NSPR as a portability layer underlying his JVM, and I used it for the same purpose when prototyping “Mocha” in early-to-mid-May.
Bill convinced us to drop Kipp’s JVM because it would lack bug-for-bug compatibility with Sun’s JVM (a wise observation in those early days). By this point “Mocha” had proven itself via rapid prototyping and embedding in Netscape Navigator 2.0, which was in its pre-alpha development phase.
The rest is perverse, merciless history. JS beat Java on the client, rivaled only by Flash, which supports an offspring of JS, ActionScript.
So back to popularity. I can take it or leave it. Nevertheless, popular Ajax libraries, often crunched and minified and link-culled into different plaintext source forms, are schlepped around the Internet constantly. Can we not share?
One idea, mooted by many folks, most recently here by Doug, entails embedding crypto-hashes in potentially very long-lived script tag attributes. Is this a good idea?
Probably not, based both on theoretical soundness concerns about crypto-hash algorithms, and on well-known poisoning attacks.
A better idea, which I heard first from Rob Sayre: support an optional “canonical URL” for the script source, via a shared attribute on HTML5 <script>:
<script src="https://my.edge.cached.startup.com/dojo-1.0.0.js" shared="https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js"></script>
If the browser has already downloaded the shared URL, and it still is valid according to HTTP caching rules, then it can use the cached (and pre-compiled!) script instead of downloading the src URL.
This avoids hash poisoning concerns. It requires only that the content author ensure that the src attribute name a file identical to the canonical (“popular”) version of the library named by the shared attribute. And of course, it requires that we trust the DNS. (Ulp.)
This scheme also avoids embedding inscrutable hashcodes in script tag attribute values.
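The lookup rule described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the cache is modelled as a Map from URL to an entry with a single expiry time (real HTTP freshness rules are more involved), and the names isFresh and resolveScript are hypothetical, not part of any browser API.

```javascript
// Hypothetical model of a browser cache: URL -> { body, expiresAt }.
// A single expiry timestamp stands in for real HTTP caching rules (assumption).
function isFresh(entry, now) {
  return entry !== undefined && entry.expiresAt > now;
}

// Decide where the code for <script src=... shared=...> comes from.
function resolveScript(cache, tag, now) {
  const entry = tag.shared ? cache.get(tag.shared) : undefined;
  if (isFresh(entry, now)) {
    // Fresh copy of the canonical URL: no need to fetch src at all.
    return { from: "cache", url: tag.shared, body: entry.body };
  }
  // No entry, or a stale one: fall back to the page's own copy.
  return { from: "network", url: tag.src };
}

const cache = new Map();
const tag = {
  src: "https://my.edge.cached.startup.com/dojo-1.0.0.js",
  shared: "https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js",
};

const miss = resolveScript(cache, tag, 1000);   // cold cache
cache.set(tag.shared, { body: "/* dojo */", expiresAt: 5000 });
const hit = resolveScript(cache, tag, 1000);    // canonical copy is fresh
const stale = resolveScript(cache, tag, 6000);  // canonical copy has expired
```

In this sketch only the shared URL is ever used as a cache key for sharing; the src URL is just the fallback when that key misses.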
Your comments are welcome.
Yet here we are. The web must evolve, or die. So too with JS, wherefore ES4. About which, more anon.
Firefox 3 looks like it will be popular too, based on space and time performance metrics. More on that soon, too.
27 Replies to “Popularity”
I like the hash idea.
It ties in very well with the ideas put forward by Van Jacobson in “A New Way to look at Networking” – https://video.google.com/videoplay?docid=-6972678839686672840
In it he claims the internet, which he helped develop, is no longer good enough. TCP is stream oriented, but most applications are data oriented. By using data labels (like SHA1) instead of hostnames and ports you open up possibilities.
The video is pretty interesting by itself.
It also reminded me of git’s use of SHA1 as a data label. And also of BitTorrent.
If you are worried about cache poisoning, then (optional) extra security could be provided via HMAC-SHA256 or similar, for those cases where you really want to be sure it’s the right data.
If the browser supported this for all content, then Van Jacobson’s idea would be realised at the top level, and then could be later baked into the infrastructure.
The “canonical URL” idea also depends on trusting the source not to modify the library at that URL. Imagine the dependency hell if people weren’t careful about that; it’d be like a world where applications are locally installed but shared libraries are NFS mounted over the Internet.
David: HMAC is great for message-passing, but since such schemes are vulnerable to birthday, collision, extension, and other attacks, and the research is not yet done, burning hashes into script tags that live for years or decades still seems like a risk to avoid, if we can.
John: NFS (shudder, I know it well, having done the first SGI port back in ’85) has its issues, but at a high level it’s the right model. The CDN solution of highly-available standard JS libraries still seems attractive to me, and AOL has done it for Dojo (presuming AOL prospers in the years ahead…). This is what all the major search/web-app sites do (Y! has date-named complete versions of scripts and other resources, Google can serve flat files without breaking a sweat, etc.). The web should be JS’s hard drive, yet because of differential minification and lack of common, well-known CDN locations, too many apps still pay for their own hosting and edge caching.
Of course, if @shared wins, its reason for existing goes away. If you could count on connectivity and nine nines of availability, and if those CDN-hosted libraries were already in place, script authors could simply use @src and be “done”. But the world is not that ideal; @shared is meant to lower costs on the evolutionary pathway to that highly-available CDN-as-NFS world.
The problem of securing the CDN machines seems more easily solved, but I don’t mean to trivialize it. Securing the distributed hash-equated scripts in all the pages that might use @hash, on the other hand….
Wait, how does your proposal prevent poisoning? It actually seems to make it considerably easier: evil site has script src=”evil.js” shared=”https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js”. If you visit that site first, and then visit edge.cached.startup, it will go straight for the cache and run the evil script. Am I missing something?
Bertrand: only the @shared value would be shared among script tags. The @src would be loaded only if there was no cache entry for @shared.
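A small simulation may make this answer concrete. It assumes a cache keyed strictly by the URL that was actually fetched, ignores freshness, and uses a toy Map in place of the network; the host evil.example and the function loadScript are hypothetical names for illustration only.

```javascript
// Toy "network": URL -> response body. A stand-in for real HTTP (assumption).
const network = new Map([
  ["https://evil.example/evil.js", "EVIL"],
  ["https://my.edge.cached.startup.com/dojo-1.0.0.js", "DOJO"],
  ["https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js", "DOJO"],
]);

// Load a script tag against a URL-keyed cache. Freshness checks are omitted.
function loadScript(urlCache, tag) {
  if (tag.shared && urlCache.has(tag.shared)) {
    // These bytes were fetched from the @shared URL itself.
    return urlCache.get(tag.shared);
  }
  const body = network.get(tag.src);
  // Filed under the URL actually fetched, never under @shared.
  urlCache.set(tag.src, body);
  return body;
}

const dojoCdn = "https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js";
const urlCache = new Map();

// 1. The evil page claims the Dojo URL as its @shared value.
const ranOnEvilPage = loadScript(urlCache, {
  src: "https://evil.example/evil.js",
  shared: dojoCdn,
});
// 2. The legitimate page is visited afterwards.
const ranOnGoodPage = loadScript(urlCache, {
  src: "https://my.edge.cached.startup.com/dojo-1.0.0.js",
  shared: dojoCdn,
});
const poisoned = urlCache.has(dojoCdn);

// 3. Only a fetch of the canonical URL itself fills the @shared entry.
urlCache.set(dojoCdn, network.get(dojoCdn));
const ranAfterCdnFetch = loadScript(urlCache, {
  src: "https://evil.example/evil.js",
  shared: dojoCdn,
});
```

The evil page runs its own script on itself, as it always could, but nothing it serves is ever stored under the Dojo URL, so the later visit to the legitimate page is unaffected.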
Thanks for functional features in JS, anyways.
The second worst-case scenario for the shared attribute:
No-one uses it.
The worst-case scenario:
Everyone uses it. i.e. @src and @shared as per the example code. Then the canonical-url in @shared would never get downloaded into the browser-cache.
Unlikely scenarios, but too funny not to mention.
A more relevant issue is that @shared reduces the library-author’s control over how his library is used. The page-author is basically saying “Use the canonical source but don’t let my web-page appear in the Referer field in the logs.”
A minor issue: when the page-author changes the version of the library he has to ensure two URLs are updated.
A couple of questions related to dynamic script insertion:
– What happens when a script element has @shared but no @src? Does onerror get fired if the shared url isn’t in the cache?
– What if the script element has @shared, no @src, and text? Does it act as an inline-script unless the shared url is in the cache?
Out of interest, are scripts currently stored in the cache pre-compiled? If not, is it on the cards?
Sean: @shared would be background-downloaded at low priority if not already cached.
Versioning within the URL is already standard, see the Dojo and YUI examples. It’s the only way to go.
I agree @shared without @src is a bug, given the legacy installed base.
Scripts are not yet stored precompiled. I meant to write a ccache(1) analogue for JS in Firefox 3 but ran out of time due to other work, which is the subject of the next post. Should happen soon enough, but it will be interesting to measure the hit rate. Too much custom minification and even my never-cleared cache may see more misses than hits.
Now that you mention it, that @shared would always be used for cache maintenance is obvious.
I wasn’t calling @shared without @src a bug. I was wondering if it would be a way to check whether a script isn’t in the cache.
another chapter in brendan’s voyage then 😉
anyways good luck with the launch. scheme is pretty cool and deserves increased attention. hopefully hope and history will rhyme on the web soon.
The idea of the shared attribute is good, but it lacks two keystones: versioning and automatic dependency management.
Versioning is needed to tell the browser: “My code is designed to work with libfoo-1.4.2”. And to declare on the shared site: “libfoo-1.5.0 is not compatible with code designed for libfoo=1.0; libfoo 1.5 needs bar>=1.3, baz>=1.0”.
It is also a good idea to sign shared code. Otherwise the domain of the shared code can expire, and somebody can replace the old site with their own and crack any selected site on the entire internet in two clicks.
Check the documentation of existing distributed package management systems for more information. You could even integrate one of them (apt/smart/yum) into the browser instead of reinventing the wheel again and again.
Sean wrote: “I wasn’t calling @shared without @src a bug. I was wondering if it would be a way to check whether a script isn’t in the cache.”
The legacy installed base expects a script tag without @src to contain code in-line, which will be evaluated by the browser. So we can’t impute @src from @shared if @src is missing.
By “bug” I meant “probably not what the author intended, but the browser must carry on as before.” The background download of the @shared attribute could be started even without @src — that might be handy. You wouldn’t be able to check the cache this way, though.
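The rules in this reply can be written down as a tiny decision function. This is a sketch of one reading of the comment, not a specification; planScriptTag and its result fields are hypothetical names.

```javascript
// Hypothetical helper: decide how a script tag is handled, given the
// attributes discussed in this thread.
function planScriptTag(tag) {
  const hasSrc = typeof tag.src === "string" && tag.src !== "";
  const hasShared = typeof tag.shared === "string" && tag.shared !== "";
  return {
    // Without @src the tag stays an inline script, as legacy pages expect.
    mode: hasSrc ? "external" : "inline",
    // @src is never imputed from @shared, so the shared cache entry can
    // only stand in for an external script.
    mayUseSharedCache: hasSrc && hasShared,
    // The background download of @shared could start with or without @src.
    prefetchShared: hasShared,
  };
}

const both = planScriptTag({
  src: "https://my.edge.cached.startup.com/dojo-1.0.0.js",
  shared: "https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js",
});
const sharedOnly = planScriptTag({
  shared: "https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js",
});
const plainInline = planScriptTag({});
```

Under this reading, a tag with @shared but no @src still evaluates its inline text, which is why it cannot serve as a cache probe.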
Volodymyr: I don’t propose to climb more than one mountain at once. Signed scripts were tried, they are a general pain, and so for now they have “failed”. They could make a come-back, certainly.
Likewise, versioning is already handled by CDNs such as AOL’s and Yahoo’s using URL conventions in the path part. We should keep talking about how to abstract and improve versioning, but it is not easy enough to “solve” that we should block progress on sharing of popular libraries waiting for versioning innovation.
Thanks – I even found this post interesting enough to be worthwhile scrolling left and right to read every line, but I would have much preferred not having to do that.
(In reality I just disabled the stylesheet, but that doesn’t sound as good)
Bertrand Le Roy: I don’t moderate comments, so if you had trouble posting another one, please retry (and let me know what error you saw, so I can try to get the MT installation fixed).
Your blog misuses “diatribe” (look it up, and lighten up — my memoir here even had a fine bit of Broadway for fun) and completely misses the point about the @hash proposal being unsound. Doug wrote:
“Second, browsers can cache by hash code. If the cache contains a file that matches the requested hash=, then there is no need to go to the network regardless of the url.”
This is vulnerable to the poisoned message attack. The browser cannot safely index a cache by the value of @hash in order to avoid fetching the given URL.
“Poisoning” a URL can happen via the DNS as I noted, but that’s a different and much lesser threat. The DNS can be auto-updated and administered. Stale @hash values copied into thousands to millions of web pages cannot.
Hey Brendan. I think you’re right, everyone here needs to lighten up. English not being my first language, I did look up diatribe and even after doing that I pretty much stand behind my use of the word. Doug and you have a long history of heated debates (which is fine, I actually love a good argument) and Doug apparently started this one to which you answered with some venom of your own… But we’re not here to discuss the extent of my vocabulary (or lack thereof).
Ironically (and I swear that’s the truth), I can’t find a trace of your comment on my own post, even in the spam folder. I have no idea what happened there.
So on to being on-topic 🙂 I personally don’t buy the argument of poisoning by creating an evil script that has the same signature as the legit one: this attack remains mainly impractical despite the existing proofs of concept, and actually quite a lot of software has been relying on crypto hashes with success.
Even if you buy into that argument, I’ve shown on my own post, which you’ve read, a third approach which I think avoids that problem and also solves the problem of having a weird crypto string in the markup (which I admit is a legitimate concern that you raised). I’m not a security expert so I admit I may be missing something here. If I’m wrong I’d like to understand how.
Finally, as I’ve said before, I must be missing something about your approach because I still don’t see how it’s more secure than Doug’s. What prevents an attacker from having a page that poisons the cache of any visitor who goes there first, using a script that has the same shared attribute as a legitimate script?
Bertrand and I have been emailing, and I believe he no longer backs “diatribe”. I’m also explaining the @shared idea in more detail. It was sketched here in a way that required some thought on the reader’s part, and the willingness to assume I’m not dumb ;-).
I’ve long recommended hash matching at the HTTP header level. It requires some server support, but that would be easily done for needed cases, and it would work across all content types and avoids the copy-and-paste problem.
Tom: hash matching for integrity could be helpful, but how do you avoid re-downloading different Dojo or jQuery copies, in order to have a content-addressed cache? If the hash is in the response header, then you’ve already started (if not finished) the download.
Use a variation on “If-None-Match”. But rather than the server-dependent etag, use an externally defined algorithm like SHA-1. The browser would calculate the hash for resources like large JS files and store a mapping from base file name to SHA-1 hash. If there was another request for a file with the same base file name (or perhaps heuristics based on a subset of the path — and I think there could be some simple pattern here), the browser would send up a header like “If-Not-Match-Hash” to the server along with the original file request.
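A rough sketch of that exchange follows, using Node's built-in crypto module for SHA-1. “If-Not-Match-Hash” is the header name proposed in this comment, not an existing HTTP header; the 304 reply, the class HashedScriptCache, the serve function, and the a.example, b.example and c.example hosts are assumptions made for illustration.

```javascript
const { createHash } = require("node:crypto");

// SHA-1 of a script body, hex-encoded.
function sha1Hex(body) {
  return createHash("sha1").update(body).digest("hex");
}

// Last path segment of a URL, e.g. "dojo.js".
function baseFileName(url) {
  return new URL(url).pathname.split("/").pop();
}

// Browser side: remember bodies and their hashes by base file name.
class HashedScriptCache {
  constructor() {
    this.byBaseName = new Map();
  }
  store(url, body) {
    this.byBaseName.set(baseFileName(url), { hash: sha1Hex(body), body });
  }
  // Extra request header for a URL whose base name was seen before.
  requestHeaders(url) {
    const known = this.byBaseName.get(baseFileName(url));
    return known ? { "If-Not-Match-Hash": known.hash } : {};
  }
  // Turn a response into the script body to run.
  bodyFor(url, response) {
    if (response.status === 304) {
      return this.byBaseName.get(baseFileName(url)).body;
    }
    this.store(url, response.body);
    return response.body;
  }
}

// Server side: skip the body when the client already holds the same bytes.
// Replying with 304 is an assumption; the comment does not name a status.
function serve(body, headers) {
  if (headers["If-Not-Match-Hash"] === sha1Hex(body)) {
    return { status: 304 };
  }
  return { status: 200, body };
}

const browser = new HashedScriptCache();
const first = serve("DOJO", browser.requestHeaders("https://a.example/js/dojo.js"));
browser.bodyFor("https://a.example/js/dojo.js", first);

// Same base name on another host, identical bytes: no second download.
const second = serve("DOJO", browser.requestHeaders("https://b.example/lib/dojo.js"));
const reused = browser.bodyFor("https://b.example/lib/dojo.js", second);

// Same base name, different bytes: the server sends the full body.
const third = serve("DOJO (patched)", browser.requestHeaders("https://c.example/dojo.js"));
```

Because the server compares against its own copy, a wrong guess by the base-name heuristic costs only a header, not a wrong script.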
Interesting discussion here.
I’m not sure the poisoned message attack applies in the case of Doug’s suggestion. I thought that it required you to be able to generate both documents in order to find a colliding hash. But, given some arbitrary hash, it’s not necessarily easy to generate another document with the same hash.
Patrick: the problem with long-lived hashes copied all over the web is exactly the longevity combined with what we don’t know (“it’s not necessarily easy”, you wrote) about the strength of the hash algorithm. 13 years ago we had greater faith in the top two hash algorithms.
Did anyone watch the video I linked to? It really is relevant.
Should we not address data direct and the network finds it?
I’d be interested in any comments.
Van Jacobson in “A New Way to look at Networking” – https://video.google.com/videoplay?docid=-6972678839686672840
Excuse my ignorance, but if you’re going to reference a canonical source for JS code via a @shared tag and another supposedly identical copy via a @src tag, why not just reference the canonical source via the @src tag? That way you don’t have to extend the JS language. If you’re concerned that the canonical source may take time to download, use a caching proxy server like everyone else does. If you’re concerned that your caching proxy server may not have a copy when it’s needed, you’re solving a non-problem. If it’s not there, it’s not (often) needed.
Trevor: “you” the average consumer does not know about caching proxies. CDNs have bad days too, and startups want to provision dedicated edge presence, or risk relying on ec2. @shared helps both caching and failover.
The good/original quote is funny, but according to one source it is misattributed to Samuel Johnson. See: