12 February 2007

Threads suck

This is not an original thought, but I write with some authority here.

I hacked Unix kernel code out of grad school at SGI, in SGI’s “good old days” (1985-1992). Among other things, we took single-threaded (ST) kernel code and multi-threaded (MT) it on SGI’s SMP boxes. I won a free trip to New Zealand and Australia in 1990 along with Bent Hagemark, with kernel source on magtape in our hot little hands, on account of others’ bugs in adapting some single-threaded (ignoring interrupts) AT&T “Streams” code to SGI’s SMP kernel, and made fixes in the field (thanks to the SGI sales guys in Brisbane, we even got two nights on the Gold Coast in compensation — not long enough!).

You must be this tall to hack on threaded systems, and that means most programmers should run away crying. But they don’t. Instead, as with most other sharp tools, the temptation is to show how big one is by picking up the nearest ST code and jamming it into a MT embedding, or tempting race-condition fate otherwise. Occasionally the results are infamous, but too often, with only virtual fingers and limbs lost, no one learns.

Threads violate abstractions six ways to Sunday. Mainly by creating race conditions, deadlock hazards, and pessimistic locking overhead. And still they don’t scale up to handle the megacore teraflop future.

We can hope for better static analyses to find all races. In the real world, the code is C or C++ and there’s no hope for static salvation. Sure, some languages try to put deadlocks in a syntactic cage, but that walks right into the overhead problem, in spite of heroic VM-based optimizations. Unexpected costs, even if constant or linear, can sink any abstraction. For example (still piquant to Mozilla hackers busy deCOMtaminating), virtual method calls cost; they should be avoided where you’re coding close to the hardware. The same goes for locks: not all abstractions must be MT-safe; some must be ST and fast.

So my default answer to questions such as the one I got at last May’s Ajax Experience, “When will you add threads to JavaScript?” is: “over your dead body!”

There are better ways. Clueful hackers keep rediscovering Erlang. Then there is STM. One retro stylist I know points to an old language-based solution, Hermes.

A requirement for JS3 (along with hygienic macros) is to do something along these more implicit lines of concurrency support. In all the fast yet maintainable MT systems I’ve built or worked on, the key idea (which Will Clinger stated clearly to me over lunch last fall) is to separate the mutable unshared data from the immutable shared data. Do that well, with language and VM support, and threads become what they should be: not an abstraction violator from hell, but a scaling device that can be composed with existing abstractions.
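
To make that concrete with a deliberately tiny sketch (plain old JS, nothing I’m proposing for JS3, and all the names are invented for illustration): each task closes over its own mutable state, the shared table is never written, and the only coupling between tasks is an explicit message queue.

    // Illustrative sketch only: "immutable shared, mutable unshared" with
    // explicit message passing. No task ever writes to sharedRates.
    var sharedRates = { usd: 1.0, nzd: 1.45, aud: 1.28 };   // treated as immutable

    function makeTask(name) {
      var inbox = [];       // mutable, but owned by this task alone
      var total = 0;        // ditto
      return {
        post: function (msg) { inbox.push(msg); },           // the only way in
        run: function () {
          while (inbox.length) {
            var msg = inbox.shift();
            total += msg.amount * sharedRates[msg.currency]; // reads shared data, never writes it
          }
          return name + " total: " + total;
        }
      };
    }

    var billing = makeTask("billing");
    billing.post({ amount: 10, currency: "nzd" });
    billing.run();   // "billing total: 14.5"

Real language and VM support would enforce the “never written” part instead of leaving it to convention; that is the point of doing it in JS3 rather than by hand.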

So here’s a promise about threads, one that I will keep or else buy someone a trip to New Zealand and Australia (or should I say, a trip here for those over there): JS3 will be ready for the multicore desktop workload.

Does this mean we won’t be hand-coding MT Gecko? I think Graydon said it best: “I’d rather eat glass.” Most high-level concurrent Mozilla programming will be in JS3 on the back of an evolved Tamarin.

26 Responses to “Threads suck”

  1. Jesse says:

    I would have never believed this if not for current events, but you guys might be just the right people to actually make this happen.
    Thanks for sharing, it’s inspiring.

  2. Neil Mix says:

    Right on, Brendan. Thanks for taking a brazen stance on what I’m sure many consider a controversial issue.

  3. DannoHung says:

    Hell yes! I’ve got a personal love of message passing concurrency, so I hope you go with that over STM or some of the other concurrency methods people are talking about.
    Death to threads.

  4. Mike says:

    This may not be strictly on-topic, but what about cooperative multitasking inside today’s single thread?
    I’m a developer of ajax-style applications and I often run into situations where just a couple of “cooperative multitasking”-style features would make things a lot easier:
    1) Allowing the programmer to enter a “blocking” state inside a function (like calling alert() but without the dialog window, and with a programmatic way of dismissing the blocking state), to be able to wait for a condition before continuing with the rest of the method. This would keep the state in the call stack even if the implemented flow is of an asynchronous nature.
    It would also, for example, make it possible to create floating div dialogs that block in the same way as alerts/prompts. Today the latter are your only option when popping up your own dialog at onbeforeunload/onunload.
    2) Letting our single thread do other things, like UI updates and handling user events, while in a blocking operation such as the above blocking state, and also when doing a synchronous XMLHttpRequest (this would finally make the latter practical to use, as today it virtually kills the window).
    3) Having the Stop button abort a blocking operation, maybe after an extra “do you wish to abort blocking operation?” confirmation. Maybe it should make the blocking operation throw some kind of AbortException. (Again, this would make synchronous XMLHttpRequests possible to use.)
    Some of this is biased more towards the browser than towards core JavaScript, but from the examples maybe you can see what improvements to JS could be made? Or are there other things in the pipeline that will solve the cases I mention?
    (I suppose #1 without #2 is sort of similar to Continuations, but doesn’t solve the same problems.)

  5. Kris Zyp says:

    Aren’t some of these things addressed in FF 3.0? From what I understand, FF 3 no longer blocks everything during a synchronous remote call, and I would presume it does that with event stacking (putting the current event on top of the current stack). If that is true, it really seems like FF 3 has opened up this concurrency issue (I think some browsers do this with alert calls). While event stacking seems nice for preventing locking, it can cause some of the same issues that true concurrency causes. Programmers must be aware that a call that might lead to a synchronous XMLHttpRequest may lead to events being handled during the call, and this leads to memory being altered during a call that the programmer might have believed to be safe. Is this true, or have I misunderstood this mechanism in FF 3?

  6. Mike says:

    From my limited testing it seems FF 2.0 processes events (for example timeout events) while blocking in an alert call, while IE7 does not.

  7. Kris Zyp says:

    Brendan, so is the intention to have concurrency be completely transparent? You had mentioned composable memory transactions, and this is not transparent, albeit considerably less error prone than traditional semaphores. Also, Erlang-style message passing is not simply a transparent VM implementation optimization, but seems at best to require a notable paradigm shift for starting new threads (actually processes, no memory sharing, right?), and at worst a completely different language.
    One can certainly use completely transparent concurrency techniques to utilize multi-cores for improved performance (probably with limits in efficiency), although there are certainly those of us who desire more than just having all of our CPUs utilized: when a program is conceptually doing multiple things at once, it is often actually easier to code it with multiple execution flows. I would love to see at least some form of fibering support. I know that there are discussions about coroutines and continuations that seem to have been rejected (which would have facilitated fibers), but is it possible to have a simpler, more stripped-down fiber support (the ability to start a new fiber and suspend and resume a fiber)? Fibers are very non-hazardous, because they must be explicitly suspended.
    Of course, fibers don’t actually directly get us anywhere with multi-core concurrency, although if transparent concurrency were in the VM to utilize multi-cores, presumably the concurrency algorithm would find more opportunities for concurrency when multiple fibers were running (just because they can often run without very much shared data interchange).

  8. Mike says:

    Yes, I agree with Kris that there should be more use cases considered for JavaScript “multi-tasking” than just maximizing CPU utilization. Personally I advocate features that enhance clarity and make it easier for the programmer to express stateful workflows in the client logic, without having to resort to lots of asynchronous handlers and all their boilerplate code.
    Being able to keep the logic (and state) for a workflow inside one main function that can “block” and wait for some condition to be fulfilled (user input, fetching data from the server, etc), while still processing other user events, would be a great improvement when programming UIs in JavaScript.

  9. Brendan Eich says:

    Kris: there’s no free lunch. You can’t have concurrency be “completely transparent” if memory is shared, and if you allow processes to communicate in a blocking fashion, you can still have deadlocks. If you instead use non-blocking operations, you can have races and livelocks.
    Shared nothing can be scaled to highly parallel hardware more easily than shared everything. I was struck years ago by the yaws vs. apache demo (http://www.sics.se/~joe/apachevsyaws.html).
    See Neil’s fine work implementing “threads” (really coroutines) using trampolined generators: http://www.neilmix.com/2007/02/07/threading-in-javascript-17/ if you want to avoid a twisty maze of callbacks and their attendant state machine control blocks in Firefox 2.
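    Roughly, the trampolining idea looks like this (an illustrative sketch only, not Neil’s actual Thread.js code): the generator yields a request, and a small driver resumes it when the result is ready, so the logic reads top to bottom instead of as a pile of nested callbacks.

        // Rough sketch of the trampoline, in the Firefox 2 / JS 1.7 generator style.
        function spawn(generatorFn) {
          var gen = generatorFn();
          function step(first, value) {
            var request;
            try {
              request = first ? gen.next() : gen.send(value);
            } catch (e) {
              if (e instanceof StopIteration) return;   // coroutine finished
              throw e;
            }
            // Each yielded value is a function that schedules step() for later,
            // e.g. from setTimeout or an XHR onreadystatechange handler.
            request(function (result) { step(false, result); });
          }
          step(true);
        }

        // A yieldable "sleep" request built on setTimeout.
        function sleep(ms) {
          return function (resume) { setTimeout(function () { resume(); }, ms); };
        }

        spawn(function () {
          yield sleep(1000);
          // ...continues here a second later; no nested callbacks, no blocked UI.
        });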
    Brad: I’m not sure what the regexp comment has to do with any of this, but Brian Crowder is working on jsregexp.c, including better performance. Find him on IRC in #jslang if you want to chat.
    /be

  10. Neil Mix says:

    Thanks for the compliment, Brendan. Yes, of course you’re right, it’s coroutines. I apologize for confusing the nomenclature; sadly “threading” gets a lot more eyeballs and mindshare than “coroutining”. I’ll try to be clearer about the distinction in the future.
    So tell us, do you have any well-formed ideas about the concurrency mechanisms you’re eyeing for JS3? Is the mention of STM a hint or an attempt to elicit feedback? Do you think it will be a well-known paradigm or something new? Curious minds want to know! ;)

  11. Mike says:

    Yes, coroutines are something like what I am talking about. I’ve seen Neil’s work and it is very impressive. Though, as we are all aware, there are obvious drawbacks to emulating a feature like this by generating code, and it would be better to address it at the language or library level.
    Actually, the reason I suggested the less specific term “cooperative multitasking”, and not coroutines, was because of your comment on Neil’s article that it couldn’t be done, because of script-native-script call stacks. I therefore attempted to step back and describe the minimal atomic requirements and not get stuck on having it behave exactly like coroutines. Specifically, my point is that the current “coroutine” should not just yield to execute another coroutine but also yield to the main event loop to allow user events to be handled. Waiting for a user event implies that the current coroutine’s call stack will block until some other code triggers the condition to wake it up.
    I think there is great demand for something along these lines to support this programming paradigm, be it a new language feature or a new native class in the BOM.

  12. Kris Zyp says:

    Just trying to see if there is a size limit to comments that is blocking me from commenting…

  13. Kris Zyp says:

    Brendan,
    Neil’s work with coroutines is actually part of the basis of what I was suggesting for threading/fibering. He is effectively creating fibering (cooperative threading) with his usage of coroutines. As you can see from his “threading” library, it is indeed possible to have a form of non-hazardous threads/fibers. I believe that fibering is the correct word; coroutines are the foundation he used to create the fibers. I know that coroutines appear to have been rejected from ES because of the script-native-script problem, but it seems that fibers (a somewhat more limited concept) could be implemented without this problem, and still achieve what UI developers would love to have: conceptually separated execution flows with inline blocking for asynchronous events.

  14. Kris Zyp says:

    I can’t post a long comment here, but here is a discussion on transparent concurrency for ES.

  15. Brendan Eich says:

    Mike @3:56am: I was not referring to Neil’s “Narrative JavaScript” work, rather to his Thread.js creation (see http://www.neilmix.com/2007/02/07/threading-in-javascript-17/ — also the next newer item on his blog).
    Kris @7:27am: AFAIK “fiber” is a term originating from Microsoft in Windows NT (please correct me if I am wrong). It’s cute, a little too cute. The long-standing (Simula67 was what, 1967? IIRC coroutine goes back even further) provenance of “coroutine” makes me want to avoid new words for old. But I agree that “cooperative threads” is also used (NSPR even calls these things “user-level threads”).
    Kris @10:17am: thanks for the Speculative Lock Elision paper. It’s really about spinlocks, however. It doesn’t handle the case where you need to synchronize with another thread by waiting until you are notified. That is *not* done by camping on a short-term lock that may in fact be implemented in a multi-core or SMP system via competitive spinning.
    The way to synchronize for an arbitrarily slow, or just too slow at the limit, resource is to use a condition variable (wrapped in a lock of course). Logging and rolling back memory operations don’t help here, unless the writes are so rare that you might as well use message-passing.
    With message passing and shared-nothing, a writing process publishes an append-only memory to its reading process. The reader may block for want of more data, instead of waiting on a condvar. There is no need for explicit locks or condvars; there is no need for memory logging and rollback.
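    In sketch form (makeChannel, put, and take are names invented here, not a real or proposed API), such a one-way channel needs no lock or condvar: the writer only appends, and a reader waiting for more data is just a parked callback.

        // Illustrative only: a one-way, append-only channel. Writer and reader
        // share nothing but the channel itself; there are no locks or condvars.
        function makeChannel() {
          var queue = [];          // the writer appends, the reader consumes
          var parkedReader = null; // a reader waiting for more data
          return {
            put: function (value) {
              if (parkedReader) {
                var resume = parkedReader;
                parkedReader = null;
                resume(value);               // wake the reader directly
              } else {
                queue.push(value);
              }
            },
            take: function (onValue) {
              if (queue.length) onValue(queue.shift());
              else parkedReader = onValue;   // "block" until the writer publishes
            }
          };
        }

        var ch = makeChannel();
        ch.take(function (v) { /* runs once the writer puts something */ });
        ch.put("more data");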
    The counter-argument in favor of shared memory and threads is that message passing overhead can be fatally high. I’ve argued that threads involve fatal risks that can’t be mitigated, whereas we can work on message passing overhead — and message passing style makes code easier to map onto multiple CPUs and even the GPU.
    Adding memory logging/rollback does not address the general problem because it does not include the case where a thread has to wait indefinitely, or for too many cycles during which to save and undo writes. As an optimization, speculative lock elision is fine, though.
    Your mention of lazy computation and futures and promises reminded me of Alice and ConcurrentML. However, adding laziness to a language with mutation is problematic (I guess this was your point in mentioning promises and futures). As the Haskell people argue, it’s easier to start non-strict and pure and then add special effects as an exceptional “monster in a box” using monads. With JS, the monster is the box.
    More in a bit, I’m still researching.
    /be

  16. Kris Zyp says:

    Brendan,
    I thought that cooperative threads/fibers were subtly different from coroutines. Coroutines provide the language expressiveness to achieve cooperative threads (CT), but CTs could be achieved in other ways as well. The difference may not be significant for our discussion. However, I think that the paradigm might be helpful for possible implementation. That is, if CTs were actually implemented as real (OS) threads such that they were synchronized so that only one thread could ever be running at once (pausing and resuming cooperatively), then I believe the script-native-script stack problem could be solved. Native C coroutine support would be unnecessary if the threads were OS threads that were paused and resumed.

  17. Kris Zyp says:

    It still seems that one could utilize multiple cores in a language with zero concurrency constructs (even no CTs) through speculative execution. Memory logging/rollback is simply the mechanism to make the speculative execution run safely (in another thread). If this could be done efficiently, with good multi-core utilization, it would certainly provide low-risk programming (effectively no shared-data/thread hazards).
    Sorry for all the posts; don’t feel that you need to respond to my comments, we are here to hear your thoughts on concurrency.

  18. Mike says:

    @Brendan
    Oh, so you mean THAT “threading-in-javascript-17″ article ;-) [Got those mixed up when commenting from memory instead of actually re-reading the article, sorry...]
    Anyway, it seems we are several UI programmers shouting questions here. Do you reckon whatever concurrency solution you come up with will cover the needs we describe here, or will you mainly focus on the performance on multi-core issue?

  19. Brendan Eich says:

    Kris @8:17am: I tried to respond to your claim about speculative lock elision — read above about condvars and the need to coordinate without spinning (waiting for an indefinite interval, or at least too long to keep a transaction open). If you read the STM considered harmful link I put in the first comment on this blog item, you’ll see that shared memory with transactionality can backfire if you nest transactions and end up contending more than you had hoped. And again, speculation is no way to synchronize, and neither are spinlocks.
    If you say “ok, let’s use speculative lock elision or STM instead of fine-grained spinlocks, but condvars or monitors elsewhere”, then you haven’t really solved the problems with shared memory threads that I listed in this blog item. You may be right about transactions trumping bare locks, though.
    Mike @1:49pm: UI folks should use generators in Firefox 2, perhaps with Neil’s small Thread.js on top if they like it. Feedback based on that experience would be very helpful. You’re right that the multicore future is looming, but it does not mean UI hackers will be stuck on one core thread, using cooperative generators only.
    As noted, I’m still researching, but it should be clear from what I’ve written so far that I think message passing shared nothing beats shared memory preemptively scheduled threads. And while STM may be the way to go if you have to use threads, it does not solve all of the concurrency problems we face, or even most of the problems we will face in the future due to ridiculously parallel hardware.
    /be

  20. Brendan Eich says:

    Kris @12:57: the script-native-script state saving and restoring problem would be solved if every OS did the work for us, but they don’t, so that’s out. The dark-room-debugging on Symbian tales I’ve heard from Opera colleagues on ECMA TG1 have turned my hair gray(er).
    TG1 made the right decision in not imposing native stack unwinding and resuming, or the worse alternative of failing if native frames above the target frame are active.
    But don’t worry, generators are not the last word. There will be a JS3 after ES4/JS2. How it will manage to abstract OS and hardware differences and standardize a better-than-threads concurrency system that maps onto parallel hardware remains to be seen, but it’s a goal that I intend to achieve.
    /be

  21. Kris Zyp says:

    I think I was unclear in what I was suggesting with concurrency. I was not suggesting threading with an alternative synchronization/locking approach (like SLE or STM), rather that an engine running a pure single-threaded language (like JS1) could in fact utilize multiple cores (without any concurrency mechanisms in the language) as an engine optimization. I thought this could be done with speculative execution, where the speculative branches could safely execute on another core by using logging/rollback techniques similar to those used by STM & SLE (hence the reference).
    I am in total agreement that preemptive threading with shared data is bad, and am glad you will not let it into JS. If multicore utilization cannot be sufficiently achieved automatically with engine optimizations (and SE surely would have its limits), I am excited to see the potential for an alternative (message-passing) architecture.

  22. drudru says:

    Brendan: I’m so glad you have this stance on threads. IMHO, it is the correct one, and the rest of the CS community is finally coming around to it. I too have seen too many programmers misuse and abuse threads. It is just too difficult to do the right thing. In my career, whenever threads were involved, there were always bugs related to threads. This was especially true with multi-threaded UI code in user land on Windows (I never did multi-threaded kernel work).
    Anyways, here is a great paper that just seals it for me: http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.html
    Anyways, I too think that JavaScript is a great language. A lot of not-so-gold-plated CS types can work with it and actually get things done. In my book, that gets high marks. Therefore, I definitely look forward to the balance and innovation you will bring to the problem of concurrency.
    cheers, dru
