Friday, 21 October 2011

Mike and I debut our new Disruptor presentation

Last Tuesday Mike and I unveiled our brand shiny new presentation: Understanding the Disruptor, a Beginner's Guide to Hardcore Concurrency.  This was a preview of the talk we'll be doing at JAX London on the 2nd November.

A video of the session is available, as are the slides.  I promise not to say "so" anywhere near as many times when I repeat my performance at JAX (is there anything more painful than watching yourself on video?).

I thought the session went really really well.  We got some great questions at the end, we had an audience that was engaged, and I was dead pleased we didn't lose anyone with the assembly language.  We had some very valuable feedback afterwards too.

As well as our presentation, there were three great lightning talks:
    Somay Nakhal on Java Thread States - Somay gave a nice overview of thread lifecycles with code and some great diagrams.  I liked how he made this more applicable to the real world than the sort of book examples you get.

    Ged Byrne on the shiny new LJC Book Club - Ged reminded us how great it is to read an actual, paper book, and how committing to reading page by page forces you to learn in a different way to jumping around internet references that might not give you the context you need.  I thought this was a great presentation with humour, and I liked the way he challenged us to "expand our minds".  Although the actual book he was reviewing was Oracle Coherence 3.5, I've decided I need to read Beautiful Software, which Ged quoted at the end of the talk.

    Peter Lawrey on Common Java Misconceptions - A session which plays well with what we're trying to preach when we talk about Tackling Folklore.  He covered a few topics that are assumed to be "truth".  For example, dealing with garbage collection is not a mandatory part of writing Java - you could write GC-friendly code for a start.  It's also naive to assume the JDK is written in an efficient way: anyone who's actually dug around in it for a while will realise that newer, more efficient methods of programming have not been applied to all areas of the (massive) existing code base.  I think it's great to have people out there talking about this stuff, because it's too easy to make assumptions and take things for granted.  The most important thing he said: "If you're told something, don't just believe it - test it yourself first".

All of us (me, Mike and the lightning talk presenters) got such a great response it has encouraged us at the LJC to try and push for more real developers presenting their experiences.  We have a lot of great presentations from vendors, but what's more applicable to Java guys and girls across the board is other developers sharing the problems they're trying to solve and how they go about that process.

I'm very much looking forward to presenting this again at JAX.

4 comments:

  1. Hi,

    I watched the video and really enjoyed it. I am new to this, so pardon my ignorance. Am I correct in assuming that the lazySet() changes might not be visible to other threads as soon as they are done?

    How do the other threads that might be doing a get() finally see the new items being pushed into the ring buffers if the sequence number of the ring is not volatile?

    Also, how do you know that the ring buffer is not full (i.e. consumers have read the data the producer has produced)? Do you read all the consumers' seqnums in some sort of an intelligent manner? Would have loved to see the answer in the video.

    Great work.

  2. The main difference between lazySet and a volatile write is that the lazySet does not guarantee that the value is made immediately visible, i.e. store buffers are not immediately flushed out to memory. The value will still become visible, eventually. The guarantee that the lazySet provides is that the data will be made visible in the correct order, i.e. the write to the ring buffer will occur before the update of the associated sequence.
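
The ordering point in this reply can be illustrated with a small sketch. This is not the Disruptor's own code: the class `LazySetSketch`, its fields and the `demo` method are hypothetical names made up for illustration, and it assumes exactly one producer thread, one consumer thread and no wrap-around. The only library call it relies on is `AtomicLong.lazySet`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative single-producer, single-consumer sketch; not the Disruptor's own code.
final class LazySetSketch {
    private final long[] slots;   // stand-in for the ring buffer entries
    private final int mask;       // size is a power of two, so index = seq & mask
    private final AtomicLong published = new AtomicLong(-1); // last published sequence

    LazySetSketch(int sizePowerOfTwo) {
        slots = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Producer: plain write to the slot, then an ordered (lazy) store of the sequence.
    void publish(long seq, long value) {
        slots[(int) (seq & mask)] = value;
        published.lazySet(seq); // may become visible late, but never before the slot write
    }

    // Consumer: spin until the sequence has advanced far enough, then read the slot.
    long read(long seq) {
        while (published.get() < seq) {
            Thread.yield();
        }
        return slots[(int) (seq & mask)];
    }

    // Publishes sequences 0..count-1 on one thread and sums the values on the caller's thread.
    // count must not exceed the buffer size because this sketch has no wrap protection.
    static long demo(int count) {
        LazySetSketch ring = new LazySetSketch(1024);
        Thread producer = new Thread(() -> {
            for (long seq = 0; seq < count; seq++) {
                ring.publish(seq, seq * 2);
            }
        });
        producer.start();
        long sum = 0;
        for (long seq = 0; seq < count; seq++) {
            sum += ring.read(seq);
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }
}
```

The consumer never reads a slot until it has observed the matching sequence, so it relies only on the ordering guarantee, not on the sequence becoming visible immediately.
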

    You're correct regarding the protection against overflowing the ring buffer. The disruptor reads the sequences of the consumers, but only once per cycle of the ring buffer. We cache the lowest value, until the sequence number that is about to be written is greater than the cached value plus the size of the ring buffer.
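
A rough sketch of that caching idea, under stated assumptions: `WrapGuardSketch` and its methods are hypothetical names, the consumer sequences are a plain array touched from a single thread (the real thing has to read sequences that other threads are updating), and the check returns a boolean instead of waiting.

```java
import java.util.Arrays;

// Illustrative only: shows when the producer re-reads the consumers' sequences.
final class WrapGuardSketch {
    private final int bufferSize;
    private final long[] consumerSequences; // last sequence each consumer has finished with
    private long cachedMin = -1;            // lowest consumer sequence seen at the last re-read
    private int rescans = 0;                // how many times the consumers were re-read

    WrapGuardSketch(int bufferSize, int consumers) {
        this.bufferSize = bufferSize;
        this.consumerSequences = new long[consumers];
        Arrays.fill(consumerSequences, -1L); // nothing consumed yet
    }

    void consumerReached(int consumer, long seq) {
        consumerSequences[consumer] = seq;
    }

    int rescans() {
        return rescans;
    }

    // True if writing sequence 'next' would not overwrite an entry some consumer still needs.
    boolean canClaim(long next) {
        if (next > cachedMin + bufferSize) {
            // The cached value is too old to prove it is safe, so read every consumer again.
            rescans++;
            long min = Long.MAX_VALUE;
            for (long seq : consumerSequences) {
                min = Math.min(min, seq);
            }
            cachedMin = min;
        }
        return next <= cachedMin + bufferSize;
    }
}
```

With a buffer of 4 and no progress from the consumers, sequences 0 to 3 can be claimed without looking at the consumers at all; only the attempt to claim sequence 4 triggers a re-read, and it is refused until the slowest consumer has moved on.
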

  3. Hi Trish/Mike. Great presentation - very clear and insightful.

    I'm trying to use the disruptor in a slightly different way from your use case, and I'm wondering if I'm barking up the wrong tree. In my use case I have one producer, multiple 'intermediate' consumers, and one 'final' consumer. The intermediate consumers are really just there to filter out 'bad' events, so the final consumer only sees 'good' events. Every event must be validated by *every* intermediate consumer before it is seen by the final consumer (the validation process is eminently parallelisable in a divide-and-conquer fashion). If any of the intermediate consumers reject an event, then that event shouldn't be seen by the final consumer.

    Initially I thought I could do something like disruptor.handleEventsWith(interConsumer1, interConsumer2, ...).then(finalConsumer), but this doesn't allow me to filter out events from the pipeline before they are observed by the final consumer. Is it possible for a consumer to mutate an event in-place, and 're-publish' it so that subsequent consumers would be guaranteed to see the changes? If this is possible then I could include a flag field in the event and have intermediate consumers set it to indicate that the event has been filtered. The final consumer would then read the flag field and ignore any events for which it had been set. The producer would of course reset the flag whenever it published a new event.

    However if this is NOT possible then I'm thinking the only way I can do this is to use multiple ring buffers, rather than one, and chain them together sequentially. Each event would be processed by exactly one consumer, which would then either drop the event or pass it onto the next consumer (by publishing it to the next ring buffer in the chain). However this clearly means each event must go through several serial 'hops' before being observed by the final consumer, so although adding more intermediate consumers would increase throughput, it wouldn't reduce latency (in fact it may increase latency due to the overhead of the extra hops). Moreover, while the first intermediate consumer would be running at full capacity, subsequent intermediate consumers would potentially be idle for some proportion of the time, depending on how likely events were to make it through each of the preceding filter steps.

    Any advice you can give me would be most appreciated. Thanks!

  4. You have correctly hit upon the various alternative solutions. Personally I would have the validation Consumers set a flag on the Event to state whether it passed that specific validation. Remember that it's important that only one thing ever writes to a single variable, so you'd probably need a flag on the Event for each consumer to set - e.g. passedTypeValidation, passedSizeValidation etc. Then your final consumer will gate on all the previous consumers (using the then(finalConsumer) syntax you suggested) and basically have to AND all your flags to determine whether to process it or not (reading variables is totally fine, and boolean operations are cheap).

    I think this is probably your simplest solution.
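
The one-flag-per-consumer idea might look something like the sketch below. It deliberately leaves the Disruptor out and calls the steps in order on a single reused event, so it only shows the data layout and the final AND. The flag names come from the reply above; the class `FlagFilterSketch`, the validation rules (non-negative, below 1000) and the `run` helper are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: each flag has exactly one validator that sets it after publication.
final class FlagFilterSketch {
    static final class Event {
        long payload;
        boolean passedTypeValidation; // set only by typeValidator
        boolean passedSizeValidation; // set only by sizeValidator
    }

    // Producer: fills the reused event and resets the flags before publishing it.
    static void produce(Event event, long payload) {
        event.payload = payload;
        event.passedTypeValidation = false;
        event.passedSizeValidation = false;
    }

    // Hypothetical rule: only non-negative payloads pass.
    static void typeValidator(Event event) {
        event.passedTypeValidation = event.payload >= 0;
    }

    // Hypothetical rule: only payloads below 1000 pass.
    static void sizeValidator(Event event) {
        event.passedSizeValidation = event.payload < 1000;
    }

    // Final consumer: reads every flag and ANDs them; it writes none of them.
    static boolean finalConsumerAccepts(Event event) {
        return event.passedTypeValidation && event.passedSizeValidation;
    }

    // Runs each payload through producer, both validators, then the final consumer.
    static List<Long> run(long... payloads) {
        Event slot = new Event(); // stands in for one reused ring buffer entry
        List<Long> accepted = new ArrayList<>();
        for (long payload : payloads) {
            produce(slot, payload);
            typeValidator(slot);
            sizeValidator(slot);
            if (finalConsumerAccepts(slot)) {
                accepted.add(slot.payload);
            }
        }
        return accepted;
    }
}
```

In a real pipeline the two validators would run in parallel and the final consumer would be gated behind both, as in the handleEventsWith(...).then(finalConsumer) wiring discussed above; the sequential calls here just stand in for that gating.
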
