### Finite state machines

#### Introduction

Finite state machines are a model of computation that consist in a set of states, a set of input symbols, a function that maps every tuple `(input, state)` to new state and the actions performed at each state. A complete exposition of the concept is beyond the scope of this documentation. Here we will show how coroutines are a straightforward implementation of this model.

#### Sequence recognizer

Consider a state machine that implements this behavior:

For every input, output '1' if the last three inputs where `110`, `0` otherwise .

In practice it is a sequence recognizer.

The formal description of this state machine is the following:

States:

A:
if input == `1` then { output `0`, state = B} else { output `0`, state = A}
B:
if input == `1` then { output `0`, state = C} else { output `0`, state = A}
C:
if input == `1` then { output `0`, state = C} else { output `1`, state = A}

The initial state is `A`. This FSM can be represented by the following Meley model:

For example, given the input:

``0110100010010001101001000111110010011001``

The output will be:

``0001000000000000010000000000001000000100``

This state machine can be implemented in `C++` directly from it definition, using a `switch` statement inside a stateful function object:

``````struct fsm {
void operator() (char input_) {
bool input = input_ != '0';
switch(m_state) {
case A:
std::cout <<"A";
m_state = input? B : C;
break;
case B:
std::cout <<"B";
m_state = input? C : D;
break;
case C:
std::cout <<"C";
m_state = input? B : A;
break;
case D:
std::cout <<"D";
m_state = input? A : C;
break;
};
}

fsm() :
m_state(A) {}

enum state { A, B, C, D};
state m_state;
};``````

Each `case` block directly implements its corresponding state. While the transformation is straightforward, it doesn't make the code very readable. The simplicity of the informal description is lost. While this code might be acceptable for machine generated and maintained FSM, it is does not scale to large finite state machines that must be maintained by humans.

With coroutines the straight forward transformation is:

``````typedef coro::coroutine<void(char)> coroutine_type;

enum state { A, B, C};
void fsm(coroutine_type::self& self, char input_) {
state m_state = A;
while(true) {
bool input = input_ != '0';
switch(m_state) {
case A:
std::cout << (input? '0' : '0');
m_state = input? B : A;
break;
case B:
std::cout << (input? '0' : '0');
m_state = input? C : A;
break;
case C:
std::cout << (input? '0' : '1');
m_state = input? C : A;
break;
}
input_ = self.yield();
}
}``````

We haven't gained much with this transformation. We have simply done moved the (implicit( external loop inside the fsm and applied a control inversion: from its internal point of view, `fsm` is not longer called for each input, but calls the outside for inputs (using `yield()`)

Let's examine the code more carefully and see if we can do better. The `m_state` variable and its assignments are just carefully concealed `goto` statements:

``````void fsm_goto(coroutine_type::self& self, char input) {
while(true) {
A:
if(input != '0') {
std::cout << '0';
input = self.yield();
goto B;
} else {
std::cout << '0';
input = self.yield();
goto A;
}
B:
if(input != '0') {
std::cout << '0';
input = self.yield();
goto C;
} else {
std::cout << '0';
input = self.yield();
goto A;
}
C:
if(input != '0') {
std::cout << '0';
input = self.yield();
goto C;
} else {
std::cout << '1';
input = self.yield();
goto A;
}
}
}``````

`fsm_goto` has lost the state variable `m_state`. The current state of the FSM is stored in the instruction pointer and need not to be managed explicitly. On the other hand the control flow of the code has become explicit and arguably more readable. Then again, calling readable a code full of `goto`s might be a stretch. Let's complete the opera and do the final transformation:

``````void fsm_structured(coroutine_type::self& self, char) {
while(true) {
if(self.result() != '0') {
std::cout << '0';
self.yield();
if(self.result() != '0') {
std::cout << '0';
self.yield();
if(self.result() == '0') {
std::cout << '1';
self.yield();
} else {
std::cout << '0';
self.yield();
}
} else {
std::cout << '0';
self.yield();
}
} else {
std::cout << '0';
self.yield();
}
}
}``````

We used `result()`. The sequence of `if`s represent exactly the informal requirement of matching the sequence `110`.

Whether the above FSM implementation is more readable of the first implementation is arguable. But it is easier to generalize it. The repetition of `if` statements is a strong hint that the code should be refactored:

``````typedef boost::function<void(coroutine_type2::self&)> action_type;

typedef coroutine<char(char)> coroutine_type2;

void terminator(coroutine_type2::self&) {}

void match(coroutine_type2::self& self, char match, char out1, char out2 , action_type act, action_type act2) {
if(self.result() == match) {
self.yield(out1);
act(self);
} else {
self.yield(out2);
act2(self);
}
}

char fsm_match(coroutine_type2::self& self, char) {
action_type s3 (boost::bind(match, _1, '0', '1', '0', terminator, terminator));
action_type s2 (boost::bind(match, _1, '1', '0', '0', s3, terminator));
action_type s1 (boost::bind(match, _1, '1', '0', '0', s2, terminator));
while(true) {
s1(self);
}
}``````
 We use `boost::function` because if `match` where templated on the action type, the compiler wouldn't know which `match` to pass to `boost::bind`. There are more efficient solutions, but this is the most compact and simple.

`match` chooses what symbol to yield and what action to follow by checking if the last parameter to the coroutine is equal to the symbol to be matched.

The structured FSM can be extended to match more complex patterns. For example a matcher for the regular expression `(01+010)` can be written as:

``````void fsm_regexp(coroutine_type::self& self, char) {
while(true) {
if(self.result() == '0') {
std::cout << '0';
self.yield();
if(self.result() == '1') {
std::cout << '0';
self.yield();
while(self.result() == '1') {
std::cout << '0';
self.yield();
}
std::cout <<'0';
self.yield();
if(self.result() == '1') {
std::cout << '0';
self.yield();
if(self.result() == '0') {
std::cout << '1';
self.yield();
} else {
std::cout <<'0';
self.yield();
}
} else {
std::cout <<'0';
self.yield();
}
} else {
std::cout << '0';
self.yield();
}
} else {
std::cout <<'0';
self.yield();
}
}
}``````

Generalizing this example is left as an exercise for the reader.

Notice that the above FSM will fail to match the last six digits of this pattern:

``011011010``

This because when it sees the fourth `1`, while it was expecting a `0`, the state machine does not backtrack to match the `1+` pattern, but returns to the beginning of the pattern and tries to find a `0`.

It is possible to implement backtracking elegantly with coroutines using a goal driven design, but it is beyond the scope of this document. See the example `complex_matcher.cpp` for more details.