Jan 21, 2016

Yet another Akka introduction for dummies

It has been more than five years since the initial launch of the Akka toolkit. Many articles already cover what Akka is and why we use it, so this article does not discuss those areas. Instead, I am going to introduce what I wanted to know, and what I should have known, when I was getting started with Akka actors.


Akka has well-maintained, comprehensive documentation for both its Java and Scala implementations. The only flaw I can think of is that it is so large that you can easily get lost in it. Or you have little time to spend on it, so you just skip reading. Trust me, I was one of those people. Then you google and find articles that cover your areas of interest. It is nice that many developers work with the same OSS and share their knowledge. The difficult part is that, as with any other product, it takes time to capture the whole picture, and the how-to fragments you find cannot fully support you until you have it.

So what I am going to do here is summarize the basics, explain each of them with references to the official documentation, and then share some practices that I learned the hard way. Unless otherwise stated, I used the Java implementation of version 2.4.0.

Summary

  • An Akka actor can only be created by another actor
    • Hence the parent-child relationship
    • You cannot instantiate it outside of the actor system
      • The actor instance is hidden behind an ActorRef, so its methods cannot be called directly
      • Tests may require some workarounds
    • The parent supervises its children
  • When an actor throws an exception, the actor may be restarted
    • The supervisor (the parent actor) defines the recovery policy (Supervisor Strategy)
      • One-for-one vs. all-for-one strategy
      • Options: resume, stop, restart, or escalate
      • Maximum retry count and time window are customizable
    • Behind the ActorRef, the old actor instance is replaced with a new one
  • To implement a stateful actor, use UntypedPersistentActor
    • Otherwise, state is lost on actor restart
    • A variety of storage plugins are provided
  • Keep each actor simple
    • Do not let one actor do too much
    • Consider adding a supervisor layer to manage different kinds of actors with different strategies

Terms and concepts

Parental Supervision

The first rule you MUST remember about the actor life cycle is that an actor can only be created by another actor. The created actor is called the "child" and is "supervised" by the creating actor, the "parent." Then who creates the top-level actor? The documentation says the "top-level actor is provided by the library," and some say the top-level actor is also supervised by an imaginary actor. This family tree is described as a hierarchy that resembles file-system paths.



The root guardian is the one I described as the "top-level" actor. It creates and supervises two special actors as its children: the user guardian and the system guardian. Since this tree is described as a file-path-like hierarchy, the root guardian has the path "/", while the user guardian and the system guardian have "/user" and "/system" respectively. All user-defined actors belong to the user guardian, so your actors have, and are accessible through, paths such as "/user/dad/kid/grand_kid."
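
To make the path hierarchy concrete, here is a minimal sketch in plain Java. It is a toy model, not Akka's ActorPath: the class ToyActorNode and its methods are hypothetical names invented for this illustration. It only shows how a path such as "/user/dad/kid/grand_kid" falls out of the parent-child tree.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Akka's ActorPath, just a tree node
// that builds file-path-like names from its ancestors.
final class ToyActorNode {
    private final ToyActorNode parent; // null only for the root guardian
    private final String name;
    private final List<ToyActorNode> children = new ArrayList<>();

    private ToyActorNode(ToyActorNode parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    // The root guardian is the only node without a parent.
    static ToyActorNode root() {
        return new ToyActorNode(null, "");
    }

    // A child can only be created through an existing node,
    // mirroring the rule that actors are created by other actors.
    ToyActorNode createChild(String childName) {
        ToyActorNode child = new ToyActorNode(this, childName);
        children.add(child);
        return child;
    }

    // Builds the path by walking up to the root.
    String path() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.path();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }

    public static void main(String[] args) {
        ToyActorNode root = ToyActorNode.root();
        ToyActorNode user = root.createChild("user");
        root.createChild("system");
        ToyActorNode grandKid = user.createChild("dad").createChild("kid").createChild("grand_kid");
        System.out.println(grandKid.path()); // prints /user/dad/kid/grand_kid
    }
}
```

Because the constructor is private, the only way to obtain a node is through root() or createChild(), which is roughly the constraint the actor system enforces.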

As described above, all actors belong to their parents. In other words, you cannot instantiate your actor outside of the actor system, which makes your tests a bit troublesome. If you try to create your actor directly, you will most likely get an error saying "You cannot create an instance of [XXX] explicitly using the constructor (new)." Without a supervising parent, a child actor cannot exist, so spy(new MyActor()) with Mockito will not work as you expect. For a detailed example, see the code fragments below.

Here is one more thing to know about testing. Usually your actor is hidden behind an ActorRef instance and you cannot call the actor's methods from outside, which makes unit testing difficult. In that case you can use TestActorRef and get the underlying actor with TestActorRef#underlyingActor.
import akka.actor.Props;
import akka.testkit.TestActorRef;
import static org.mockito.Mockito.*;

Props props = Props.create(MyActor.class, () -> new MyActor());
TestActorRef<MyActor> testActorRef = TestActorRef.create(actorSystem, props, "my_actor");
// This is the actual actor. You can call its methods directly.
MyActor myActor = testActorRef.underlyingActor();

// If you must do spy(new MyActor()) or equivalent, do it inside the creator
Props spyProps = Props.create(MyActor.class, () -> {
    MyActor spiedActor = spy(new MyActor());

    // BEWARE: preStart is called on actor creation,
    // so doing spy(testActorRef.underlyingActor()) after TestActorRef#create
    // is too late to mock preStart().
    doThrow(new Exception()).when(spiedActor).preStart();

    return spiedActor;
});

Supervisor Strategy

In the previous section we covered how actors are created and who is responsible for supervision. This section introduces how you can specify the supervising behaviour. Akka employs a "let-it-crash" philosophy: an actor throws an exception when it can no longer proceed with its task, and its supervisor takes responsibility for the recovery. When the parent actor cannot handle the recovery, it may escalate the failure to its own parent. This way your actors can stay small and concentrate on their tasks.

Defining Strategy

Akka provides two different strategies out of the box: the one-for-one and the all-for-one strategy. With the one-for-one strategy, the failing actor is the only subject of recovery; the all-for-one strategy applies the recovery to all children, including the failing one. If no strategy is set, one-for-one is used.
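
The difference in scope can be sketched in a few lines of plain Java. This is a toy illustration and not Akka code: ToyStrategyScope, Kind, and affectedChildren are hypothetical names, and the children are represented only by their names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model: which children does a supervisor's directive apply to?
final class ToyStrategyScope {
    enum Kind { ONE_FOR_ONE, ALL_FOR_ONE }

    static List<String> affectedChildren(Kind kind, List<String> children, String failingChild) {
        if (kind == Kind.ONE_FOR_ONE) {
            // Only the child that failed is handled.
            return Collections.singletonList(failingChild);
        }
        // Every sibling is handled, including the one that failed.
        return children;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("a", "b", "c");
        System.out.println(affectedChildren(Kind.ONE_FOR_ONE, children, "b")); // prints [b]
        System.out.println(affectedChildren(Kind.ALL_FOR_ONE, children, "b")); // prints [a, b, c]
    }
}
```

All-for-one tends to fit children that depend tightly on each other, where restarting only one of them would leave the group inconsistent.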

Defining a strategy is straightforward, and the code fragment below covers pretty much everything.
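import akka.actor.OneForOneStrategy;
import akka.actor.SupervisorStrategy;
import akka.actor.UntypedActor;
import scala.concurrent.duration.Duration;
import static akka.actor.SupervisorStrategy.*;

public class MyActor extends UntypedActor {
    private static final SupervisorStrategy strategy = new OneForOneStrategy(10, Duration.create("1 minute"), t -> {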
        // http://doc.akka.io/docs/akka/snapshot/java/fault-tolerance.html#Default_Supervisor_Strategy
        if (t instanceof ActorInitializationException) {
            return stop();
        } else if (t instanceof ActorKilledException) {
            return stop();
        } else if (t instanceof Exception) {
            return restart();
        }

        return escalate();
    });

    @Override
    public SupervisorStrategy supervisorStrategy() {
        return strategy;
    }

    @Override
    public void onReceive(Object o) throws Exception {
    }
}
The code above defines the following strategy:
  • The failing actor is the only subject of recovery. (One-for-one strategy)
  • Up to 10 restarts are allowed within a 1-minute window (the Duration instance); beyond that, the failing actor is stopped.
  • The failing actor stops when ActorInitializationException or ActorKilledException is thrown.
  • The failing actor restarts when any other Exception is thrown.
  • The supervisor escalates the failure when any other Throwable is thrown.
The decider here mirrors the default strategy that is applied when you do not specify any, although the default additionally stops the child on DeathPactException and sets no retry limit. There is one thing you really need to know about supervision. As you can see, you can only have one strategy per supervising actor. It is possible to define how a given supervisor reacts to each exception type, but you still get only one Duration and one retry setting. So again, you will want to divide your actors into small pieces, for example by adding one additional supervisor layer in the middle.
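
To illustrate how a retry count and a time window interact, here is a toy restart budget in plain Java. This is an assumption-laden sketch: it uses a simple sliding window, which is not necessarily how Akka counts restarts internally, and ToyRestartBudget and tryRestart are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy sliding-window restart budget. NOT Akka's internal algorithm;
// it only illustrates "at most N restarts within a time window".
final class ToyRestartBudget {
    private final int maxRetries;
    private final long windowMillis;
    private final Deque<Long> restartTimes = new ArrayDeque<>();

    ToyRestartBudget(int maxRetries, long windowMillis) {
        this.maxRetries = maxRetries;
        this.windowMillis = windowMillis;
    }

    // Returns true and records the restart if it is still within budget.
    // Returns false when the budget is exhausted, which is the point
    // where a supervisor would stop the child instead of restarting it.
    boolean tryRestart(long nowMillis) {
        // Forget restarts that have fallen out of the window.
        while (!restartTimes.isEmpty() && nowMillis - restartTimes.peekFirst() >= windowMillis) {
            restartTimes.pollFirst();
        }
        if (restartTimes.size() >= maxRetries) {
            return false;
        }
        restartTimes.addLast(nowMillis);
        return true;
    }

    public static void main(String[] args) {
        ToyRestartBudget budget = new ToyRestartBudget(3, 60_000);
        System.out.println(budget.tryRestart(0));      // prints true
        System.out.println(budget.tryRestart(1_000));  // prints true
        System.out.println(budget.tryRestart(2_000));  // prints true
        System.out.println(budget.tryRestart(3_000));  // prints false
        System.out.println(budget.tryRestart(61_000)); // prints true
    }
}
```

Since one supervisor holds exactly one such budget configuration for all of its children, children that need different limits have to live under different supervisors.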

Supervisor's Directive Options

Now let us take a closer look at the directive options that each supervisor can choose from: restart, resume, stop, and escalate.

Restart

When a supervisor decides to restart a failing actor, the actor system follows the steps below, as described in "What Restarting Means."
  1. suspend the actor (which means that it will not process normal messages until resumed), and recursively suspend all children
  2. call the old instance’s preRestart hook (defaults to sending termination requests to all children and calling postStop)
  3. wait for all children which were requested to terminate (using context.stop()) during preRestart to actually terminate; this—like all actor operations—is non-blocking, the termination notice from the last killed child will effect the progression to the next step
  4. create new actor instance by invoking the originally provided factory again
  5. invoke postRestart on the new instance (which by default also calls preStart)
  6. send restart request to all children which were not killed in step 3; restarted children will follow the same process recursively, from step 2
  7. resume the actor
Note that you can choose which child actors to stop during preRestart; step 3 then waits for those children to terminate. In step 6, the children that were not explicitly terminated are restarted.
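
The hook order in steps 2, 4, and 5 can be traced with a toy model in plain Java. This is a sketch under assumptions, not Akka code: ToyRestartDemo and ToyActor are hypothetical, the hook names are borrowed from Akka, and suspension, children, and resuming (steps 1, 3, 6, 7) are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy trace of the restart hook order. NOT Akka code.
final class ToyRestartDemo {
    // Hypothetical stand-in for an actor; the default hook bodies
    // mimic the defaults described in the steps above.
    static class ToyActor {
        final List<String> log;

        ToyActor(List<String> log) {
            this.log = log;
            log.add("constructor");
        }

        void preStart() { log.add("preStart"); }
        void postStop() { log.add("postStop"); }
        // Default preRestart ends by calling postStop on the old instance.
        void preRestart() { log.add("preRestart"); postStop(); }
        // Default postRestart calls preStart on the new instance.
        void postRestart() { log.add("postRestart"); preStart(); }
    }

    // Steps 2, 4 and 5: old instance hooks, new instance from the
    // same factory, then new instance hooks.
    static ToyActor restart(ToyActor old, Supplier<ToyActor> factory) {
        old.preRestart();
        ToyActor fresh = factory.get();
        fresh.postRestart();
        return fresh;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Supplier<ToyActor> factory = () -> new ToyActor(log);
        ToyActor first = factory.get();
        first.preStart();
        restart(first, factory);
        System.out.println(log);
        // prints [constructor, preStart, preRestart, postStop, constructor, postRestart, preStart]
    }
}
```

The trace shows why cleanup placed in postStop and setup placed in preStart normally run on every restart, as long as the default preRestart and postRestart are kept.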

One more thing I noticed is that when you return restart() on a preStart failure (ActorInitializationException), postStop is not called, even though step 2 says it is. Take a look at the very bottom of the code below.
public static class DummySupervisor extends UntypedActor {
    private SupervisorStrategy supervisorStrategy;

    public DummySupervisor(SupervisorStrategy supervisorStrategy) {
        this.supervisorStrategy = supervisorStrategy;
    }

    @Override
    public SupervisorStrategy supervisorStrategy() {
        return supervisorStrategy;
    }

    @Override
    public void onReceive(Object o) throws Exception {
        // Do nothing
    }
}

public TestActorRef<DummySupervisor> generateDummySupervisor(SupervisorStrategy supervisorStrategy) {
    Props props = Props.create(DummySupervisor.class, () -> new DummySupervisor(supervisorStrategy));
    return TestActorRef.create(actorSystem, props, "dummy_supervisor-" + randomGenerator.nextInt(1000));
}

@Test
public void shouldPostStopNotBeCalledOnPreStartException() throws Exception {
    List<WorthlessActor> actors = new ArrayList<>();
    // Prep a supervisor that always tries to restart
    SupervisorStrategy myStrategy = new OneForOneStrategy(3, Duration.create("1 minute"), t -> {
        return restart();
    });
    DummySupervisor dummySupervisor = generateDummySupervisor(myStrategy).underlyingActor();

    // Create child actor
    Props worthlessActorProps = Props.create(WorthlessActor.class, () -> {
        WorthlessActor actor = spy(new WorthlessActor());

        // Throw exception on preStart
        doThrow(new Exception()).when(actor).preStart();

        actors.add(actor);

        return actor;
    });
    dummySupervisor.getContext().actorOf(worthlessActorProps);

    Thread.sleep(50);
    assertThat(actors).hasSize(4);

    // They NEVER call postStop so we have to do some clean up when it fails in the middle of preStart().
    verify(actors.get(0), never()).postStop();
    verify(actors.get(1), never()).postStop();
    verify(actors.get(2), never()).postStop();
    verify(actors.get(3), never()).postStop();
}
Note, however, that postStop() is called when stop() is returned on a preStart failure.

Resume

Resume is pretty straightforward. It just lets the failing actor continue with its task, keeping its internal state. You might want to log the failure here.

Stop (Terminate)

Along with restart, the most important option to note is stop, which stops the failing actor. Keep in mind that actor termination also occurs in regular operation, such as when an actor finishes its task and is no longer needed. When stop is selected, the actor goes through the same steps as a regular termination. The details are described in the Termination section later.

Escalate

When a supervisor cannot handle its child's failure, the supervising actor may fail itself and let its own parent, the grandparent of the failing actor, take care of it. When the exception is escalated all the way up, the last strategy to be applied is the root guardian's Stopping Strategy, which stops the failing child.
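
Escalation can be pictured as walking up the family tree until some supervisor makes a decision. The sketch below is a toy model in plain Java, not Akka code: ToyEscalation, Directive, and resolve are hypothetical names. It assumes that actors.get(i + 1) supervises actors.get(i), that deciders.get(i) belongs to that supervisor, and that the topmost actor in the list is stopped when everyone escalates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model of escalation. NOT Akka code.
final class ToyEscalation {
    enum Directive { RESUME, RESTART, STOP, ESCALATE }

    // actors.get(0) is the actor that originally failed and
    // actors.get(i + 1) supervises actors.get(i).
    // deciders.get(i) is the decider of the supervisor of actors.get(i),
    // so actors has exactly one more element than deciders.
    // Returns "name:DIRECTIVE" for the actor the final directive applies to.
    static String resolve(List<String> actors,
                          List<Function<Throwable, Directive>> deciders,
                          Throwable failure) {
        for (int i = 0; i < deciders.size(); i++) {
            Directive directive = deciders.get(i).apply(failure);
            if (directive != Directive.ESCALATE) {
                // The supervisor handled the failure of actors.get(i).
                return actors.get(i) + ":" + directive;
            }
            // Otherwise the supervisor fails itself and its own parent decides.
        }
        // Everyone escalated: the topmost actor is stopped.
        return actors.get(actors.size() - 1) + ":" + Directive.STOP;
    }

    public static void main(String[] args) {
        List<String> actors = Arrays.asList("grand_kid", "kid", "dad", "user");
        Function<Throwable, Directive> alwaysEscalate = t -> Directive.ESCALATE;
        Function<Throwable, Directive> alwaysRestart = t -> Directive.RESTART;

        List<Function<Throwable, Directive>> deciders =
                Arrays.asList(alwaysEscalate, alwaysRestart, alwaysRestart);
        System.out.println(resolve(actors, deciders, new RuntimeException())); // prints kid:RESTART

        List<Function<Throwable, Directive>> allEscalate =
                Arrays.asList(alwaysEscalate, alwaysEscalate, alwaysEscalate);
        System.out.println(resolve(actors, allEscalate, new RuntimeException())); // prints user:STOP
    }
}
```

Note that in the first case it is "kid", not "grand_kid", that gets restarted: an escalating supervisor becomes the failing actor from its parent's point of view.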

Termination

Actor termination basically occurs on three occasions:
  • As a part of an actor restart (the old actor instance is discarded)
  • When the supervisor decides to stop a failing actor
  • When an actor finishes its task and getContext().stop(targetActorRef) is called
When an actor is actually stopped (the latter two cases), the steps below are followed:
  1. The stopping actor's postStop is called
  2. Watching actors get a Terminated message
During a restart only step 1 applies: the default preRestart calls postStop on the old instance, but the ActorRef stays alive, so no Terminated message is sent.
So what does "watching actor" in step 2 mean? Apart from its supervising (parent) actor, other actors may "watch" an actor. When you call getContext().watch(anotherActorRef), the calling actor subscribes to the termination of anotherActorRef. When that actor stops, its parent is notified and the watching actors receive a Terminated message. This is called Death Watch.
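
The two termination steps and the watch subscription can be sketched with a toy model in plain Java. This is not Akka code: ToyDeathWatch and ToyActor are hypothetical names, and the "Terminated(...)" entries are plain strings standing in for Akka's Terminated message.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of Death Watch. NOT Akka code.
final class ToyDeathWatch {
    static class ToyActor {
        final String name;
        final List<String> events = new ArrayList<>();
        private final Set<ToyActor> watchers = new LinkedHashSet<>();

        ToyActor(String name) {
            this.name = name;
        }

        // The calling actor subscribes to the target's termination.
        void watch(ToyActor target) {
            target.watchers.add(this);
        }

        void postStop() {
            events.add("postStop");
        }

        void stop() {
            // Step 1: the stopping actor's own cleanup hook runs.
            postStop();
            // Step 2: every watcher is told that this actor is gone.
            for (ToyActor watcher : watchers) {
                watcher.events.add("Terminated(" + name + ")");
            }
        }
    }

    public static void main(String[] args) {
        ToyActor worker = new ToyActor("worker");
        ToyActor monitor = new ToyActor("monitor");
        monitor.watch(worker);
        worker.stop();
        System.out.println(worker.events);  // prints [postStop]
        System.out.println(monitor.events); // prints [Terminated(worker)]
    }
}
```

In this toy the watcher only learns the name of the stopped actor, which is similar in spirit to receiving just an ActorRef: the notification identifies who stopped, not what kind of actor it was.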

Remember that when you receive a Terminated instance, you can access the stopped actor via Terminated#actor. BUT this is just an ActorRef instance, so you cannot know what type of actor is hiding under the hood.

Another important thing is that supervision-related messages are sent and stored in a mailbox separate from the one for ordinary messages, so their ordering relative to ordinary messages is not guaranteed to match the order in which the events occurred.

Stateful Actor

As you already saw, an actor's state is lost on restart because the actor system replaces the failing actor with a new instance. When you need a stateful actor, you can extend UntypedPersistentActor instead of UntypedActor. You can configure which storage plugin is used to store the state.

However, to store data casually and locally, I prefer to create a class that caches data. Remember that the same arguments are passed on restart, so with the code below the same FooCache instance is passed to the new MyActor instance. Before employing UntypedPersistentActor, I would reconsider whether it is really required. You will want to keep your actors simple, so creating a simple cache class, or adding another layer to transfer and store data, should be considered first.
// The creator lambda captures this instance, so every restart reuses it
FooCache cache = new FooCache();
Props props = Props.create(MyActor.class, () -> new MyActor(cache));
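
Why the cache survives can be shown without Akka at all. The sketch below is a toy model in plain Java: ToyCacheDemo and ToyActor are hypothetical, this FooCache is an invented minimal implementation, and calling the factory a second time stands in for a restart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model of state that survives a "restart". NOT Akka code.
final class ToyCacheDemo {
    // Hypothetical minimal cache; the article's FooCache is not shown.
    static class FooCache {
        private final Map<String, Integer> counts = new HashMap<>();

        void increment(String key) {
            counts.merge(key, 1, Integer::sum);
        }

        int get(String key) {
            return counts.getOrDefault(key, 0);
        }
    }

    static class ToyActor {
        private final FooCache cache;
        private int localCounter = 0; // plain field: gone after a restart

        ToyActor(FooCache cache) {
            this.cache = cache;
        }

        void onReceive(String message) {
            localCounter++;
            cache.increment(message);
        }

        int localCounter() {
            return localCounter;
        }

        int cachedCount(String message) {
            return cache.get(message);
        }
    }

    public static void main(String[] args) {
        FooCache cache = new FooCache();
        // The factory captures the cache, like the Props creator above.
        Supplier<ToyActor> factory = () -> new ToyActor(cache);

        ToyActor first = factory.get();
        first.onReceive("ping");
        first.onReceive("ping");

        // A "restart": the same factory is invoked again.
        ToyActor second = factory.get();
        System.out.println(second.localCounter());      // prints 0
        System.out.println(second.cachedCount("ping")); // prints 2
    }
}
```

One caveat for real use: this toy cache is backed by a plain HashMap, so if the same instance were shared by several actors it would need to be made thread-safe.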

What I have learned

The single most important thing I have learned is that we should keep actors small and simple. As your actor becomes complex, its supervisor strategy and tests become much more complex. This is clearly stated in the documentation, and I think it is the most important basic. So let us keep this in mind.
The quintessential feature of actor systems is that tasks are split up and delegated until they become small enough to be handled in one piece. In doing so, not only is the task itself clearly structured, but the resulting actors can be reasoned about in terms of which messages they should process, how they should react normally and how failure should be handled. If one actor does not have the means for dealing with a certain situation, it sends a corresponding failure message to its supervisor, asking for help. The recursive structure then allows to handle failure at the right level.