Home | Libraries | People | FAQ | More |
It is possible to implement a very restricted set of coroutine functionality entirely within the language rules, using a variation of the duff device. See for exampleCoroutines in C. While this is a cunning hack, it has some severe limitations that makes it unsuitable for a general purpose library like Boost.Coroutine:
Boost.Coroutine is implemented around a stack switching model; that is, every time a coroutine is entered, the following actions are carried:
This process is inherently non portable, requires intimate knowledge of the target CPU and environment, and some assembly code. Also it assumes the existence of both a stack and a real CPU, neither of witch are required by the standard.
In practice the library should be portable to most platforms that have
a C++
implementation, with only minimal changes. CPUs with registers
windows shouldn't be a problem, nor should be systems with multiple
stack. For example HP provides assembler
source for coroutine stack switching for the Itanium architecture,
that has both registers windows and multiple stacks [Saboff03].
Implementing Boost.Coroutine on an C++
interpreter would require
support form the interpreter.
If all else fail, there is still the possibility of implementing Boost.Coroutine in top of threads, albeit at a very high performance penalty.
Writing a portable coroutine library has been the subject of numerous
studies. A part from the limited Duff device-based
trick, the most promising has been the stack copying model. The
lower and highest addresses of the current stack are calculated by
taking the address of appropriately placed local objects. Then, when
the stack switching is required, the current stack is copied to some
backup location and from another location a target stack is copied in
place of the old stack. The actual context switch is performed using
setjmp
and longjmp
. As an optimization, the address of
the stack pointer is found, in a non portable manner, in the jmp_buf
and the stack is switched by modifying this pointer instead of
performing an expensive copy.
This trick has meet a moderate success in the C
world, even if in
practice libraries using this trick need to take special per-platform
actions (like working around standard libraries bugs, eager
optimizers or simply identifying the position of the stack pointer in
jmp_buf
). Notice that nowhere the C
standard guarantees that the
stack can be switched with impunity. In fact, it doesn't even
guarantees that a stack exists at all (so does the C++
standard).
In the C++
world things are complicated by the fact that the
standard permits an implementation of longjmp
to unwind the stack
and call destructors. Usually the compiler determines how much to
unwind the stack by comparing stack pointers. Modifying this pointer
will wreak havoc. While most implementations do not take advantage of
this possibilities, at least the OpenMP (that by no mean is
restricted to this platform: for example is supported by the GCC
compiler on most platforms it runs on) requires that. Also
the _LLVM_ compiler provides a __builtin_longjmp
builtin that is
documented to unwind the stack (even if a standard library is not
required to implement longjmp
in term of this builtin, it is likely
to do so).
Also some compilers and some operating systems requires some work to
switch exceptions context. This is true at least on Win32
and on GCC
setjmp/longjmp
based exception handling. This is very system
specific and a library that doesn't take this into account is likely
to be defective.
Finally the interaction between coroutines and threads must be taken into account and any incompatibilities must be documented or fixed.
In conclusion, the backend of a C++
coroutine library cannot be
oblivious of the system it runs on, thus it might as well be system
specific. Similar platforms common code may still be shared.
Currently the library uses fibers
on Win32
systems, custom
assembler code on the very specific "GCC-linux-x86 using
frame-unwind-tables based exception handling and a generic
makecontext/setcontext
based implementation on POSIX 2001
compliant systems. Note that, as the POSIX
standard
knows nothing about C++
, this is not guaranteed
to work on all platforms. Also the interaction between threads and
user contexts is not specified by the standard.
The library has been designed to be very easily ported to other environments. All classes that need access to the non portable context switching code have a (currently undocumented) extra template parameter that permit the selection of the context switching implementation at compile time. Thus is technically possible to mix different context implementations in the same library.
The context switching support is very simple and tightly contained. It
only requires a context structure and two function for context setup
and context swap (the interface is modeled around the POSIX
makecontext API
). At this time the actual extension interface is not
documented and a private detail, because it is likely to be further
simplified (the current requirements of the Context
concept have
been complicated to explore potential performance optimizations).
The stack switching model, while good enough for generalized cooperative multitasking, prevents some useful compiler optimizations. For example a compiler cannot optimize across a coroutine call, nor can inline the coroutine in the caller. This is expected in when the target is not statically known, for example in an event dispatching loop, but in usages where the control flow can be statically determined is a loss. For example generators function objects can usually be inlined while coroutine based function objects cannot.
A compiler that knows about coroutines could apply the same optimizations to coroutine based code, as long as a context switch target is known. This should not require much more power than the ability to inline function pointers and convert a coroutine body to callback based code. Even dynamic coroutine code could be rewritten it to callback based code, but in this case an indirect jump is required anyways and is not necessarily a win.
Compilers capable of inlining coroutines already exist. For example, compilers for languages with support for continuations often transform the code to be complied in the so called continuation passing style. Coroutines in these languages can be trivially implemented as continuations. These compilers can then optimize the code in CPS form, potentially inlining some continuation calls. Thus potentially coroutines are optimized too.
Unfortunately the CPS form is not suitable for C
and C++
because
it does not match well the execution model of these languages and even if possible it could
impose some overhead.
The C# 2.0
language requires compilers to be capable of transforming
semi-coroutine based code to callback based code.
These compilers are not much different from C++
compilers. While the
limitation to optimize semi-coroutine can seem major, in practice a
coroutine can be converted to a semi-coroutine if all nested
functions that call yield
can be inlined in the coroutine itself. A
compiler could fall back on stack switching if it cannot inline a
yield
(or it doesn't know at all if a function can yield).
Copyright © 2006 Giovanni P. Deretta |