Wednesday, 11 February 2009

Seaside 3: Partial Continuations

This is the second post in a series looking at the upcoming new release of Seaside. Check out the first post on exception handling.

Continuations in Seaside

Seaside is often referred to as a "continuation-based" web framework and certainly in its early days continuations were used throughout to work its magic. Seaside 2.8 still uses first-class continuations (more on what that means in a minute) in three different places:

  1. to abort normal request handling and immediately return a response;
  2. to interrupt a piece of code and resume it when the user clicks on a link or follows a redirect (to send cookies to the user, for example); and
  3. to implement Component call/answer.

The upcoming release of Seaside, however, will completely eliminate the use of continuations within the core framework itself. Case 1 has been reimplemented using exceptions, and the code for cases 2 and 3 has moved to an optionally-loadable package. This means that you can now choose to install Seaside without using any continuations at all, which is good news for portability to a few Smalltalk dialects that don't currently support continuations.

At the same time, we are also replacing our use of full continuations with partial continuations and this article will look at what that means and why we are making this change. This stuff can get confusing (particularly while debugging it!) so don't worry if you have to let the information mellow and then come back and read it again. I've simplified a few things, sacrificing detail in the hopes of making the subject more comprehensible for people who are just curious about how it works. I'd appreciate any feedback on how well I've struck this balance.

What is a Continuation?

First of all, when I talk about continuations here, I'm talking about first-class continuations. Seaside also uses a continuation-passing style to implement its application render loop (this is the _k you see in Seaside URLs). This is a somewhat-related concept but is not what we're talking about today.

Continuations are often defined as "the remaining computation" but I think this can seem a bit obscure if you don't already understand them. To me, the simplest explanation is that a continuation saves a snapshot of a running process that you can resume later. You call a method, which calls another method, which calls another method, and so on and then at some point you create a snapshot of this chain of methods and save that snapshot object somewhere. At any point in the future you can restore it, abandoning the code you are currently running, and your program will be back in exactly the same place in exactly the same method as when you took the snapshot. That's a first-class continuation.

Smalltalk users should not find this too hard to come to terms with. When you save your Smalltalk image, you can open it later and be back exactly where you left off. You can open that saved image as many times as you like and return each time to the same state. If you save the image into a new file, you can still go back and load the old image. A continuation does basically the same thing but captures, instead of the whole image, only a single process.
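
If you want to see a first-class continuation in action, here is a tiny example you can evaluate in a Squeak workspace. It uses the Continuation class that Seaside 2.8 relies on (other dialects name and package this differently) and simply jumps back once to the point where the snapshot was taken:

| saved again result |
saved := nil.
result := Continuation currentDo: [ :cc |
	saved := cc.	"stash a snapshot of the current chain of contexts"
	'first pass' ].
Transcript show: result; cr.
saved notNil ifTrue: [
	again := saved.
	saved := nil.	"make sure we only jump back once"
	again value: 'second pass, resumed from the snapshot' ]

Both strings end up on the Transcript because the assignment to result runs twice: once normally, and once more when the snapshot is resumed.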

Implementing Call and Answer

One of Seaside's most-demonstrated features is the ability to write multi-step tasks that query the user for information:

answer := self confirm: 'Do it?'.
answer ifTrue: [ self doItAlready ]

This is exactly the kind of thing we facilitate with continuations: we need to pause in the middle of this method to ask the user for feedback. If they ever answer the question, we want to resume the method where we left off. So let's look at how first-class continuations might be used to make this work.
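
For readers who haven't seen this stretched over several steps, here is roughly what a small multi-step task looks like, using the #request:, #confirm: and #inform: convenience messages that Seaside components provide (assume this #go method lives in a task component registered as an application):

go
	| name |
	name := self request: 'What is your name?'.
	(self confirm: 'Really greet ', name, '?')
		ifTrue: [ self inform: 'Hello, ', name ]

Each of those sends suspends the method, returns a page to the browser, and resumes when the user responds, so this one method spans several HTTP requests.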

Understanding the Diagrams

But first a word of explanation. The following diagrams depict context chains (though they are abstract enough that they could just as easily be a stack of frames). Every time you call a method or evaluate a block, a new context is created at the "top" of the chain. Every time a method returns or a block finishes, the topmost context is removed. The method context knows what method is being called, what object it was called on, and the value of any variables defined in the method. It also knows the context below it in the chain. If you need help understanding this process, take a look at this example illustration which shows the process step by step.

The diagrams below each represent a chain of contexts handling a single HTTP request. Each request is the result of clicking on a link and each causes the execution of a callback. Each callback eventually sends either #call: or #answer:.

The diagrams show the context chain at the point in time when #call: or #answer: is sent and then try to illustrate what happens next. The upward-pointing arrows show the progress as methods are called and the downward-pointing arrows show the progress as methods return. I show exceptions with a dashed arrow, the tail coming from the location where the exception is raised and the head pointing to the location where it is handled. In cases where a continuation has been saved, the diagrams show both the currently executing context chain and the saved one and the arrows behave as normal. Obviously these are very simplified illustrations; I'm more interested in getting the general idea across here than in the exact details.

To help make things clearer, each diagram is marked with a gray line. Everything above the gray line is user code: part of the callback that is being executed. Everything below the gray line is part of the internal framework code: reading from sockets, looking up sessions, and so on.

A Naïve Implementation

Ok, so let's look at one possible implementation using continuations. Let's assume a user is staring at a web page with a link that says "do it". Clicking on that link will execute a callback with the example code shown above, which should prompt the user with the question "Do it?". While processing this request, the following things happen:

Full Continuation - Request 1

  1. The framework looks up the correct callback and executes it.
  2. During the callback (inside the #confirm: method in the above example), the #call: message is sent.
  3. This results in every context being saved into a continuation for later use.
  4. An exception is signaled, which stops processing of the callback and returns control to the framework code.
  5. The framework continues its work and returns a response to the browser (in Seaside, a render phase would happen to allow the Components to generate the response, but I'm simplifying here).

The response to the browser should show the prompt "Do it?" and a link or button to confirm the action. When the user confirms the action, they trigger another callback, which will execute self answer: true. When this second request is received, the following happens:

Full Continuation - Request 2

  1. The framework looks up the correct callback and executes it.
  2. The callback sends #answer:.
  3. The current chain of contexts is thrown away and the exact contexts we previously saved in the continuation are retrieved and restored. Note that these methods will now return a second time. This is the weird part about continuations but remember it's no more weird than saving your Smalltalk image in the middle of a calculation. Each time you open the image you will get a result for the same calculation.
  4. Now that we have restored the saved context chain, execution resumes in the first callback as if the #call: method (remember, this is where we saved the continuation) had just returned.
  5. The restored callback finishes executing (in our example, it checks the value of answer and sends #doItAlready).
  6. The framework returns a response to the browser.
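
To make the mechanics concrete, here is a deliberately naive sketch of how #call: and #answer: could be written with full continuations. This is not Seaside's actual code: ResponseNotification and the session helpers are invented for the example; only Continuation currentDo: and #value: are the real Squeak continuation API.

call: aComponent
	^ Continuation currentDo: [ :cc |
		"Snapshot every context from here down and stash it for a later request."
		self session saveContinuation: cc.	"hypothetical helper"
		"Abandon the rest of the callback; a handler further down the chain
		catches this and carries on building the response for aComponent."
		ResponseNotification signal ]	"hypothetical exception class"

answer: anObject
	"Throw away the current contexts and resume the chain saved by #call:.
	anObject becomes the return value of that suspended #call: send."
	self session savedContinuation value: anObject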

The problem here, and why I called this a naïve implementation, is that, as you can see, the response is incorrectly returned to Request 1. The socket associated with Request 1 is, unfortunately, long gone and the browser is no longer waiting for a response over there. The browser is, in fact, waiting for a response that never arrives on the socket associated with Request 2. Oops!

A (mostly) Working Call and Answer

So the first implementation doesn't work but hopefully you can see what was going on with the continuations. The problem is that, when we restore the continuation, we really don't want to abandon everything the framework is doing. At the very least, we need to keep the contexts that will return the response to the correct socket.

A simple way to limit the contexts captured by a continuation is to create a new process. A new process starts with a new, empty context chain, so when we create a continuation only the contexts in that chain will be captured. We can use a semaphore to cause the first process to wait while the new process handles the request. When the second process is finished, it signals the semaphore and the original process returns the response to the correct place.

This diagram shows exactly this (the contexts of the two processes are shown with different symbols):

New Process - Request 1

  1. At some point in the framework code, a new process is created and the original process waits on a semaphore.
  2. The new process finds and executes the correct callback.
  3. The callback sends #call:.
  4. A continuation is saved (note this time that the continuation extends only to the beginning of the new process).
  5. An exception is signaled, stopping callback processing and returning control to the framework.
  6. The framework creates a response and signals the semaphore.
  7. The original process resumes and returns the response to the browser.

So far, the only benefit here is that the continuation is smaller. But when the second request comes in, you'll see how this starts to solve our problem:

New Process - Request 2

  1. At some point in the framework code, a new process is created and the original process waits on a semaphore.
  2. The new process finds and executes the correct callback.
  3. The callback sends #answer:.
  4. The current chain of contexts is thrown away and the exact contexts we previously saved in the continuation are retrieved and restored (but note: this time only the contexts in the new process are abandoned; the suspended bottom process is unaffected).
  5. Now that we have restored the saved context chain, execution resumes as if the #call: method had just returned.
  6. The callback finishes executing.
  7. The framework creates a response and signals the semaphore to tell the bottom process it is finished.
  8. The original process resumes and returns the response (correctly!) to the browser.

So not only is our continuation smaller, but the second response actually makes it back to the right place. This, by the way, is the implementation used by Seaside 2.8 and earlier versions.
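
In Squeak, the skeleton of that fork-and-wait arrangement looks roughly like this (the method name and the #processCallbacksFor: helper are invented for the example; Semaphore and #fork are the standard Squeak facilities):

respondTo: aRequest
	| done response |
	done := Semaphore new.
	"Handle the callbacks in a fresh process, so a continuation captured in
	there only reaches back as far as this fork."
	[ response := self processCallbacksFor: aRequest.
	done signal ] fork.
	"The original process parks here; it is the one still holding the socket."
	done wait.
	^ response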

There are a few significant problems with this approach though:

  1. Doing multi-process synchronization adds complexity.
  2. Exceptions do not cross the boundary where the new process is created. That is, if you signal an exception in the second process, the first process will never see it (technically, this could be simulated to some degree, but that adds even more complexity). This means, for example, that error handling has to be done inside the new process. It also adds challenges when working with databases that use exceptions to mark objects as dirty or that key transaction information off the current process. (A tiny workspace example of this follows the list.)
  3. Exceptions signaled after restoring a continuation will traverse the restored context chain. Also, when the exception is handled, the restored context chain will be unwound, not the abandoned one. Take a look at the framework code contexts highlighted in red in the last diagram: they never have a chance to finish executing and any ensure blocks they defined will never be executed. Trust me when I say that this can be the cause of some pretty subtle bugs.
  4. There is a trade-off between size/accuracy and convenience because of #2 and #3. If you start the new process right before the callback is executed, you get a smaller continuation and more accurate exception behaviour. Unfortunately, your exceptions don't propagate very far and your callbacks end up running in a different process from, say, your rendering code.
  5. Debugging sucks (at least in Squeak) when code depends on running in a certain process. I'm not sure if the debugger ever steps through the code with the actual process where the error occurred but it certainly doesn't always do so.
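
Problem 2 above is easy to see in a workspace: the handler below never runs, because the Error is signalled in the forked process rather than in the process that installed the handler (Squeak simply opens a debugger for the forked process instead).

[ [ Error signal: 'raised in a forked process' ] fork ]
	on: Error
	do: [ :e | Transcript show: 'never reached'; cr ]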

Partial Continuations

Enter partial continuations. A partial continuation simply means that, instead of saving the entire context chain, we save only the part we are interested in. And when we restore the partial continuation, we don't replace the existing context chain entirely; we replace only the part we are no longer interested in (the contexts of the current callback) and leave the rest of the chain alone. Let's see how they work.

Partial Continuation - Request 1

When the first request comes in, things work much the same as in our very first example so I won't number the steps. In fact, things work exactly the same except for one thing: using partial continuations, we can now specify the exact range of contexts to save in the continuation. In this case, we choose to save only the contexts that are part of the user (or callback) code. Remember the problems from the first implementation? The framework code is handling one particular HTTP request; these framework contexts would be completely useless to us when responding to any future request (and another request for the same URL is still a new request). Since a callback can span multiple HTTP requests, it is only those contexts that make up the callback that need to be saved and resumed later.

Remember also that the context chain is, in reality, much longer than shown in these diagrams: we might be storing five contexts now instead of, say, 40! A nice space savings.

Now let's look at the second request coming in. This illustration is a bit different and a little more complex because the context chain actually changes during execution, so I will explain it step by step:

Partial Continuation - Request 2

  1. A request comes in.
  2. The framework looks up the correct callback and executes it.
  3. The callback sends #answer:.
  4. We look up the saved partial continuation and, in place of the existing callback code, literally graft the saved contexts onto our current context chain by rewriting the senders. I'm waving my hands over the details but you'll have to trust me. The right side of the diagram shows the state after the contexts have been grafted in place. Note that all the framework contexts remain and we are still running in the same process. As far as Squeak is concerned, those methods were called in that order.
  5. We resume processing the saved callback as if the #call: method had just returned.
  6. Once the restored callback contexts have finished executing, they will return (because of the re-written sender) right back into the framework code that is handling the current request.
  7. A response is generated and returned, via the correct socket, to the browser.

Magic! It sure feels like it and it works beautifully. We have tiny saved continuations, we don't need a new process, and all our framework contexts get a chance to complete.
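
Simplified even further than the diagrams, the resume step looks something like the sketch below. #swapSender: is Squeak's low-level way of rewiring a context's sender; the other selectors here are invented, and the real code also has to rebuild the saved contexts from their serialized form and unwind the abandoned callback contexts properly.

value: anObject
	| top bottom |
	top := self restoredTopContext.	"the saved context that sent #call:"
	bottom := self restoredBottomContext.	"the oldest of the saved callback contexts"
	"Graft: the restored chain will return into the framework context that is
	handling the current request."
	bottom swapSender: self currentFrameworkContext.
	"Jump: make the grafted chain the active one; anObject becomes the value
	the suspended #call: send finally returns."
	thisContext swapSender: top.
	^ anObject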

Conclusion

The partial continuation solution is currently implemented in the development version of Seaside and will be in the next release. Squeak and VisualWorks already support the implementation of partial continuations in Smalltalk code. GemStone is nearly finished adding support to their VM. Smalltalk implementations that cannot easily implement partial continuations have three choices:

  • they can simulate partial continuations to varying degrees of completeness using full continuations;
  • they can continue using a system similar to the one in Seaside 2.8; or
  • they can leave them out. As I mentioned earlier, we have removed all use of continuations from the core parts of Seaside: platforms can choose to simply omit support for #call: and this is now as simple as not providing the Seaside-Flow package.

By the way, at the same time as I was working my way through those subtle bugs I mentioned earlier, Eliot posted about stacks and contexts in the new Cog VM he's working on. As well as being very interesting reading, it shed some timely light on the meaning of primitive 198, which pointed me in the right direction and probably saved me an hour or two. Thanks Eliot! Check it out if you have the time.

I hope this was useful or interesting reading and would love your feedback on anything you found challenging or helpful to your understanding. Happy Seasiding.

13 comments:

Anonymous said...

This is very interesting. It really explains the internal workings of Seaside.
This week I began to study all the code of Seaside and this post helped me to understand the code inside it.
The explanation was very clear also. Very good post.
BTW, I found a typo in the text of the first image (continuations-example.png): in the last column the label should be "#foo returns" instead of "#bar returns".

Cheers,
Miguel Cobá

Julian Fitzell said...

Thanks, I've fixed that image. Good catch!

Anonymous said...

Great post, very insightful! Keep it up!

Anonymous said...

It looks like the naive scheme does not work because the request data (including the socket) is stored using some kind of dynamic binding. Would the naive scheme work if you were using per-process variables for that data?

Harold Fowler said...

Wow, fascinating stuff. Always wondered how that worked!

RT

Anonymous said...

What you are calling "Partial continuations" are known elsewhere (e.g. Scheme) as "Delimited Continuations". But thanks for giving a great example of their benefits (I didn't have any use cases in mind before this entry).

Mariano Montone said...

Very interesting! I wish you had explained briefly how partial continuations are implemented in Smalltalk (or in Squeak if you want) and what their API looks like.

Anonymous said...

I'd also like to know if the guys who implemented the "partial continuations" looked into how delimited continuations are implemented elsewhere. Perhaps the implementation could benefit from such a comparison. The technique you described of "grafting senders" doesn't sound very nice. :)

Julian Fitzell said...

@Paolo Process variables do help and the first two releases of Seaside 2.9 were using a process variable to hold the current RequestContext for exactly this reason. But it's not that simple: the saved method contexts could have all kinds of local variables that might not have the same values when handling the two requests. You'd have to go through and make them all process variables I guess and even then you're assuming that the exact same path was followed through the code in the two cases. Not very practical.

@Anonymous They can be called partial continuations, delimited continuations, or composable continuations, yes. I don't know which implementations Lukas looked at previously but the one we are using in Squeak is quite straightforward and fairly specific, so far, to the way we use it in Seaside.

@Mariano Simplified, the implementation we use on Squeak and VW is to write the contexts and their local variable values and stacks to a stream.
When evaluating the continuation, we simply read the stream, restoring the contexts with the saved values. Then we take the context we want to eventually return into and set it as the sender of the bottom context in the continuation. Finally, we change "thisContext sender" to point to the top context in the continuation and then we return.
The implementation is quite straightforward and you can find it in the Seaside-Squeak-Continuations package in the Seaside29 repository on SqueakSource. There is also a unit test package.
GemStone is implementing their partial continuation support at the VM level.

Johannes said...

Very interesting and insightful blog post, it helped me understand Seaside a lot better!

Thank you for the blog post and the link you gave me, it's much appreciated.

KanjiRecog said...

I am unable to get a 2.9 Seaside to load into VW nc7.6.

Do you have some advice? Thanks

Julian Fitzell said...

Afraid not - the 2.9 VW port is pretty recent, though my understanding is that it should be working. Posting to the Seaside mailing list should get the attention of the VW developers.