Since TraceMonkey landed on mozilla-central, from which Firefox 3.1 alpha-stage nightly builds are built, we have been busy, mostly fixing bugs for stability but also winning a bit more performance. Tonight’s builds include a fix for the bug that sent a SunSpider test into an infinite loop (my apologies to those of you who suffered that bug’s bite).
But what I’m sure everyone wants to know is: how do we compare to V8?
Here are the results from head-to-head SunSpider on Windows XP on a Mac Mini and Windows Vista on a MacBook Pro, testing against last night’s Firefox automated build and yesterday’s Chrome beta:
We win by 1.28x and 1.19x, respectively. Maybe we should rename TraceMonkey “V10” ;-).
Ok, it’s only SunSpider, one popular yet arguably non-representative benchmark suite. We are not about to be braggy. (“Don’t be braggy” is our motto here at Mozilla ;-).)
But it’s worth digging deeper into the results. Let’s look at the ratios by test:
We win on the bit-banging, string, and regular expression benchmarks. We are around 4x faster at the SunSpider micro-benchmarks than V8.
This graph does show V8 cleaning our clock on a couple of recursion-heavy tests. We have a plan, to trace recursion (not just tail recursion). We simply haven’t had enough hours in the day to get to it, but it’s “next”.
This reminds me: TraceMonkey is only a few months old, excluding the Tamarin Tracing Nanojit contributed by Adobe (thanks again, Ed and co.!), which we’ve built on and enhanced with x86-64 support and other fixes. We’ve developed TraceMonkey in the open the whole way. And we’re as fast as V8 on SunSpider!
This is not a trivial feat. As we continue to trace unrecorded bytecode and operand combinations, we will only get faster. As we add recursion, trace-wise register allocation, and other optimizations, we will eliminate the losses shown above and improve our ratios linearly across the board, probably by a factor of 2 or greater.
I’ll keep updating the blog every week, as we do this work. Your comments are welcome as always.
V8 is great work, very well-engineered, with room to speed up too. (And Chrome looks good to great — the multi-process architecture is righteous, but you expected no less praise from an old Unix hacker like me.)
Anyway, we’re very much in the game and moving fast — “reports of our death are greatly exaggerated.” Stay tuned!
30 Replies to “TraceMonkey Update”
No doubt about it, Chrome is pretty damn fast, but they aren’t quite up to par on things like Acid 3, SVG and a couple other DOM issues and rendering artifacts. I hope speed isn’t their primary focus… (Interesting to note that the memory consumption of the browser also has a correlation to the size of application on the monitor(s))
Thank you very much for running these tests Brendan. When I read about V8 yesterday one of my very first questions was how it compares to the latest Tracemonkey builds.
I am currently working on a project using the SpiderMonkey JS API, and it is good news to hear that V8 does not blow TraceMonkey out of the water and that in fact the race is quite close. You guys are doing great work.
One more thing: Is it possible with the current nightly builds to embed Tracemonkey (using the classic Spidermonkey JS API)?
@Christian: yes, you can embed TraceMonkey. We are not breaking the JS API intentionally in any way you should notice.
One change we did make: the standard global properties for Object, Math, etc. used to be monitored by the global object class’s getProperty and setProperty hooks, but now use JS_PropertyStub. Again I do not expect embedders to notice this, and it’s better for performance and integrity of the standard bindings. The default get/set monitoring is an ancient JS API “feature”, going back to 1995, but not really a wanted feature AFAICT.
Any other API changes should be documented on MDC via dev-doc-needed keywords on bugs.
Maybe it’s because I’m running a trunk build checked out a couple of hours ago and not a special branch, but on my Vista machine, Tracemonkey (JITting on) takes 2156.4ms +/- 4.4% and Chrome 1788.8ms +/- 2.9%.
Siddharth: PGO does a lot for our RegExp code, and it is on in the automated builds. If you built your own Minefield, you didn’t get it by default.
Yes, you’re right 🙂 An automated build gives 1757.4ms +/- 4.7%.
This is great stuff! It’ll be interesting to see where Mozilla and Google take their respective implementations.
Could you please explain a bit more what the following means?
“We have a plan, to trace recursion (not just tail recursion).”
Does this mean tail recursion which is also just-in-time compiled?
Peter: recursion in general. Tail recursion is just a loop, which Trace Trees eat for breakfast. Non-tail is more challenging but we’re doing it.
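To make the distinction concrete, here is a minimal sketch in plain JavaScript. The function names are hypothetical and chosen only for illustration; nothing here shows how TraceMonkey itself handles either case. The first two functions compute the same sum, one with a tail call and one with the loop it is equivalent to. The third does work after each recursive call returns, so it cannot be rewritten as a simple loop in the same way.

```javascript
// Tail recursion: the recursive call is the last thing the function does,
// so no work is left pending in the caller's frame.
function sumTail(n, acc = 0) {
  if (n === 0) return acc;
  return sumTail(n - 1, acc + n);
}

// The same computation written as the loop it is equivalent to.
function sumLoop(n) {
  let acc = 0;
  while (n !== 0) {
    acc += n;
    n -= 1;
  }
  return acc;
}

// Non-tail recursion: the addition happens after both calls return,
// so each call needs its own frame and there is no single loop to fall back on.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

console.log(sumTail(100), sumLoop(100), fib(10));
```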
To increase regular expression performance, has Mozilla considered implementing the “Thompson’s NFA” algorithm described in the following page?
Breton: yes, I mentioned Ken Thompson’s original grep to Andreas the other day. We had an intern study regular expression tracing in the Tamarin Tracing context, too. We need to get recursion and a few other things done, but I think regexp tracing (with ES3-compatibility to complicate things, to be sure) is inevitable.
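For readers unfamiliar with the idea, the core of the Thompson-style approach is to track every NFA state the matcher could be in at once, instead of backtracking through one choice at a time. The sketch below is only an illustration under stated assumptions: the transition table is hand-built for the single pattern (a|b)*abb, the names `transitions`, `ACCEPT`, and `nfaMatches` are hypothetical, and it skips the regexp-to-NFA construction and the ES3 semantics (captures, backreferences) that a real engine has to honor.

```javascript
// Hand-built NFA for the pattern (a|b)*abb, matching the whole input.
// This table is an illustrative assumption, not generated from a regexp.
const transitions = [
  { a: [0, 1], b: [0] }, // state 0: consume any a or b, or guess that "abb" starts here
  { b: [2] },            // state 1: have seen "a"
  { b: [3] },            // state 2: have seen "ab"
  {},                    // state 3: have seen "abb" (accepting)
];
const ACCEPT = 3;

// Advance the whole set of possible states one character at a time.
// Each character is examined once per live state, so there is no backtracking.
function nfaMatches(input) {
  let current = new Set([0]);
  for (const ch of input) {
    const next = new Set();
    for (const state of current) {
      for (const target of transitions[state][ch] || []) {
        next.add(target);
      }
    }
    current = next;
  }
  return current.has(ACCEPT);
}

console.log(nfaMatches("babb"), nfaMatches("abba"));
```

The running time is bounded by the input length times the number of NFA states, which is the property that makes this approach attractive for pathological patterns.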
If V8 was created with big web apps in mind, why are they doing whole-file JIT? With really large apps this will inevitably be a perf hit. As you are saying V8 is well-engineered, I must have misunderstood their «Intro» on the Google Code site.
@Γριφεγ: V8 can do lazy function-by-function compilation (without any interpreter) based on what is called. In Chrome all source is parsed into ASTs but AIUI not necessarily compiled in a whole-file or whole-script-tag-content sense.
“Well-engineered” is evident from a read through the code. Check it out yourself.
A JIT engine in Firefox 3.1 will be great, and it puts Firefox at the top of my list again. However, I guess Mozilla’s JIT compilation still has a lot of room for optimization when we compare it with things like JIT compilation in the .NET Framework. The performance of .NET code is excellent for JIT-compiled code, and in some cases it performs better than its native counterparts.
Why in the world is it faster on a Mac Mini than on a MacBook Pro?! Is it because of XP/Vista?
I’m glad something is being done and also that you’re confident to hold up with V8.
Though I’m curious how results from the V8 Benchmark Suite fit into the picture. Surely they’ve chosen to include benchmarks where V8 is superior. But still, I get an overall score of 181 for today’s 3.1 nightly of Firefox when JIT is off, and only 158 when JIT is on, but a blazing 1646 for Chrome.
What type of optimization will get Tracemonkey on par?
Second point: more JS speed is great, and everything else is not your business. But I’m still bothered by Firefox’s DOM speed, which comes into play in nearly every use of the browser. Here Opera, Safari, and Chrome are ten times faster.
So to really get a performance boost for the whole app tuning JS just isn’t enough. And sadly I can’t see anything to be done in Gecko performance-wise.
In the above test, Firefox (all versions including 3.1a2) is far slower than any other modern browser, including IE.
@RoBoMe: the V8 benchmarks are heavier on recursion and function calling than others, and we fall off trace on them, mainly due to recursion but also due to untraced native methods. We’re looking at them now in detail and working through the bugs.
@_ck_: yes, you are describing layout (rendering) and DOM, not core JS, performance bottle-necking.
You might be describing an O(n^2) layout bug we should fix, which would be a spot-fix, not a general performance shift on the order of TraceMonkey.
The Gecko DOM is a huge, ripe target for improvement via tracing, and otherwise (Jason Orendorff’s quickstubs just landed, I’m not sure you tested with a build that includes them). If you made a test and can share it, please file a bug and attach it (cc: me — brendan@moz will autocomplete when you submit the bug). We’ll study it and see what should be optimized.
Chrome does seem to cold start up faster.
I really like the multi-process choice. Out of all the chrome ideas, this is the best. I think Firefox should follow suit.
TBH, it’s Flash sites that really kill the browser and slow it down. I hope Adobe works harder to make it more compatible.
I think you should compare the latest nightly build of Firefox to the latest nightly build of Chromium, that’d be a more valid comparison.
CVertex: agreed on MP, we’re looking into it.
waleofsuous: check the date on this blog post: Chrome was just out, and it hadn’t changed that day. When I give updates, I’ll blog latest tip to tip perf, but until then you’ll have to benchmark for yourself.
Again, and for the last time: this is not some world series or playoff, so don’t get too worked up. The main point is that we’re competitive and working on even bigger wins in the near term. And I see that webkit.org is too (SquirrelFish Extreme!), which is good. Microsoft must be feeling the heat and seeing the light.
The competitive trends should become clearer over time, and if V8 ends up fastest when the different competing engines approach their designs’ asymptotes, great — we’ll probably be right behind, so developers can count on excellent performance across most browsers. But we may well be ahead, because we can inline and speculate better.
Hi Brendan, I’m considering reworking Thomas Thurman’s Gnusto engine, which has some form of JIT. I looked at Andreas Gal’s site, but I fear it all looks far above my level (and probably not too applicable for a JIT compiler running in JS?)
Do you know of any good resources explaining the basic theory and methods behind JIT compilation?
hmm… what’s this “SquirrelFish Extreme!” you are talking about here???
Just a reminder that TraceMonkey fans are looking forward to your next weekly update
Latest SunSpider comparisons:
Safari, WebKit nightly r36309 : 1763.8ms +/- 2.3%
Minefield/3.1b1pre Gecko/20080912031847, TraceMonkey jit.chrome jit.content set to true : 2055.6ms +/- 3.4%
@waleofsuous, @muonis: Those last two comments look exactly the same — I will delete the later one if you can confirm, or in any event (since it is redundant). The results also look like Mac numbers — please confirm.
These numbers show that SquirrelFish Extreme has landed in the WebKit nightlies — good job SFX team! JITting regexp evaluation really pays off on SunSpider.
TraceMonkey recursion-handling and regexp compilation are under way. More after I return from traveling.
Performance in the nightlies is slipping. I ran a test between 3.1 10-12-2008 trunk & Chrome 0.2.149.30. Results are not so promising now…
** TOTAL **: *1.82x as slow* 2380.8ms +/- 1.3% 4330.4ms +/- 0.7% significant
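As a quick sanity check on how a “1.82x as slow” figure follows from two totals like these, the ratio is simply the second total divided by the first. The helper name `slowdownRatio` below is hypothetical, and this sketch makes no claim about which browser produced which column.

```javascript
// Ratio of two total run times in milliseconds: how many times slower
// the second run is than the first.
function slowdownRatio(firstMs, secondMs) {
  return secondMs / firstMs;
}

// The two totals from the line above.
const ratio = slowdownRatio(2380.8, 4330.4);
console.log(ratio.toFixed(2) + "x as slow"); // prints "1.82x as slow"
```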
Any update here? About the tracemonkey and trace into DOM issues.
Seems no newsworthy progress on Tracemonkey. That’s sad.
Plenty newsworthy, just me being too busy and lame to blog :-P. I will have new posts this week.
What’s the current status of the project introducing an info-flow monitor in the next JS engine? Do you have a project page for that now? The latest info I have is https://www.dagstuhl.de/Materials/Files/09/09141/09141.EichBrendan.Slides.pdf