Actor Model and its Significance

Brief introduction to Actor Model, and the problems it solves?

What is Actor Model

The actor model is a model of concurrent computation that treats actors as the universal primitive of concurrent computation. In response to a message it receives, an actor can: make local decisions, create more actors, send more messages, and determine how to respond to the next message received.

What are Actors?

An actor is the primitive unit of computation.
It receives a message and does some kind of computation based on it.

Actors embodies following things

  • A Mailbox (the queue where messages end up).
  • A Behavior (the state of the actor, internal variables etc.).
  • Messages (pieces of data representing a signal, similar to method calls and their parameters).
  • An Execution Context
  • An Address

What Happens when an Actor Receives a Message?

When an actor receives a message, it can do one of these 3 things:

  • Create more actors
  • Send messages to other actors
  • Designate what to do with the next message

What Problems does the Actor Model Solve?

Actor model is used to overcome the limitations of traditional object-oriented programming models and meet the unique challenges of highly distributed systems.

Traditional OOP languages weren’t designed with concurrency. It was very easy to introduce race conditions since it was using a shared state.
The programmers had to identify and fix all possible problem areas by using locking mechanisms.
Locking is easy to implement for simple programs. But as the programs got complex, the implementation of locking has also become complex.

Actor model overcomes the problem by using share nothing model so that concurrency is not affected and locking mechanism is not needed.

The Illusion of Encapsulation

Object-oriented programming (OOP) is a widely-accepted, familiar programming model. One of its core pillars is encapsulation. Encapsulation dictates that the internal data of an object is not accessible directly from the outside; it can only be modified by invoking a set of curated methods. The object is responsible for exposing safe operations that protect the invariant nature of its encapsulated data.

Unfortunately, the above diagram does not accurately represent the lifelines of the instances during execution. In reality, a thread executes all these calls, and the enforcement of invariants occurs on the same thread from which the method was called. Updating the diagram with the thread of execution, it looks like this:

The significance of this clarification becomes clear when you try to model what happens with multiple threads. Suddenly, our neatly drawn diagram becomes inadequate. We can try to illustrate multiple threads accessing the same instance:

There is a section of execution where two threads enter the same method. Unfortunately, the encapsulation model of objects does not guarantee anything about what happens in that section. Instructions of the two invocations can be interleaved in arbitrary ways which eliminate any hope for keeping the invariants intact without some type of coordination between two threads. Now, imagine this issue compounded by the existence of many threads.

The common approach to solving this problem is to add a lock around these methods. While this ensures that at most one thread will enter the method at any given time, this is a very costly strategy:

  • Locks seriously limit concurrency, they are very costly on modern CPU architectures, requiring heavy-lifting from the operating system to suspend the thread and restore it later.
  • The caller thread is now blocked, so it cannot do any other meaningful work. Even in desktop applications this is unacceptable, we want to keep user-facing parts of applications (its UI) to be responsive even when a long background job is running. In the backend, blocking is outright wasteful. One might think that this can be compensated by launching new threads, but threads are also a costly abstraction.
  • Locks introduce a new menace: deadlocks.

These realities result in a no-win situation:

  • Without sufficient locks, the state gets corrupted.
  • With many locks in place, performance suffers and very easily leads to deadlocks.

Additionally, locks only really work well locally. When it comes to coordinating across multiple machines, the only alternative is distributed locks. Unfortunately, distributed locks are several magnitudes less efficient than local locks and usually impose a hard limit on scaling out. Distributed lock protocols require several communication round-trips over the network across multiple machines, so latency goes through the roof.

In Object Oriented languages we rarely think about threads or linear execution paths in general.

However, in a multithreaded distributed environment, what actually happens is that threads "traverse" this network of object instances by following method calls. As a result, threads are what really drive execution:

In summary:

  • Objects can only guarantee encapsulation (protection of invariants) in the face of single-threaded access, multi-thread execution almost always leads to corrupted internal state. Every invariant can be violated by having two contending threads in the same code segment.
  • While locks seem to be the natural remedy to uphold encapsulation with multiple threads, in practice they are inefficient and easily lead to deadlocks in any application of real-world scale.
  • Locks work locally, attempts to make them distributed exist, but offer limited potential for scaling out.

The Illusion of a Call Stack

Today, we often take call stacks for granted. But, they were invented in an era where concurrent programming was not as important because multi-CPU systems were not common. Call stacks do not cross threads and hence, do not model asynchronous call chains.

The problem arises when a thread intends to delegate a task to the "background". In practice, this really means delegating to another thread. This cannot be a simple method/function call because calls are strictly local to the thread. What usually happens, is that the "caller" puts an object into a memory location shared by a worker thread ("callee"), which in turn, picks it up in some event loop. This allows the "caller" thread to move on and do other tasks.

The first issue is, how can the "caller" be notified of the completion of the task? But a more serious issue arises when a task fails with an exception. Where does the exception propagate to? It will propagate to the exception handler of the worker thread completely ignoring who the actual "caller" was:

This is a serious problem. How does the worker thread deal with the situation? It likely cannot fix the issue as it is usually oblivious of the purpose of the failed task. The "caller" thread needs to be notified somehow, but there is no call stack to unwind with an exception. Failure notification can only be done via a side-channel, for example putting an error code where the "caller" thread otherwise expects the result once ready. If this notification is not in place, the "caller" never gets notified of a failure and the task is lost!

How the Actor Model Meets the Needs of Concurrent, Distributed Systems?

In particular, we would like to:

  • Concurrency
  • Enforce encapsulation without resorting to locks.
  • Reacting to signals, changing state and sending signals to each other to drive the whole application forward. Keeping the invariants intact.

Usage of Message Passing Avoids Locking and Blocking

Instead of calling methods, actors send messages to each other. Sending a message does not transfer the thread of execution from the sender to the destination. An actor can send a message and continue without blocking. It can, therefore, do more work, send and receive messages.

With objects, when a method returns, it releases control of its executing thread. In this respect, actors behave much like objects, they react to messages and return execution when they finish processing the current message. In this way, actors actually achieve the execution we imagined for objects:

An important difference of passing messages instead of calling methods is that messages have no return value. By sending a message, an actor delegates work to another actor. As we saw in The illusion of a call stack, if it expected a return value, the sending actor would either need to block or to execute the other actor's work on the same thread. Instead, the receiving actor delivers the results in a reply message.

Encapsulation

The second key change we need in our model is to reinstate encapsulation. Actors react to messages just like objects "react" to methods invoked on them. The difference is that instead of multiple threads "protruding" into our actor and wreaking havoc to internal state and invariants, actors execute independently from the senders of a message, and they react to incoming messages sequentially, one at a time. While each actor processes messages sent to it sequentially, different actors work concurrently with each other so an actor system can process as many messages simultaneously as many processor cores are available on the machine. Since there is always at most one message being processed per actor the invariants of an actor can be kept without synchronization. This happens automatically without using locks:

Supervision and Monitoring

If one actor carries very important data (i.e. its state shall not be lost if avoidable), this actor should source out any possibly dangerous sub-tasks to children and handle failures of these children as appropriate. Depending on the nature of the requests, it may be best to create a new child for each request, which simplifies state management for collecting the replies. If one actor depends on another actor for carrying out its duty, it should watch that other actor’s liveness and act upon receiving a termination notice.

If one actor has multiple responsibilities each responsibility can often be pushed into a separate child to make the logic and state more simple.

Actors should be like nice co-workers: do their job efficiently without bothering everyone else needlessly and avoid hogging resources. Translated to programming this means to process events and generate responses (or more requests) in an event-driven manner. Actors do not block (i.e. passively wait while occupying a Thread) on some external entity which might be a lock, a network socket, etc. unless it is unavoidable.

Summary

this is what happens when an actor receives a message:

  1. The actor adds the message to the end of a queue.
  2. If the actor was not scheduled for execution, it is marked as ready to execute.
  3. A scheduler entity takes the actor and starts executing it.
  4. Actor picks the message from the front of the queue.
  5. Actor modifies the internal state, and sends messages to other actors.
  6. The actor is unscheduled.

To accomplish this behavior, actors have:

  • A Mailbox (the queue where messages end up).
  • A Behavior (the state of the actor, internal variables etc.).
  • Messages (pieces of data representing a signal, similar to method calls and their parameters).
  • An Execution Environment
  • An Address.

Messages are put into so-called Mailboxes of Actors. The Behavior of the actor describes how the actor responds to messages (like sending more messages and/or changing state). An Execution Environment orchestrates a pool of threads to drive all these actions completely transparently.

This is a very simple model and it solves the issues enumerated previously:

  • Encapsulation is preserved by decoupling execution from signaling (method calls transfer execution, message passing does not).
  • There is no need for locks. Modifying the internal state of an actor is only possible via messages, which are processed one at a time eliminating races when trying to keep invariants.
  • There are no locks used anywhere, and senders are not blocked. Millions of actors can be efficiently scheduled on a dozen threads reaching the full potential of modern CPUs. Task delegation is the natural mode of operation for actors.
  • State of actors is local and not shared, changes and data is propagated via messages, which maps to how modern memory hierarchy actually works. In many cases, this means transferring over only the cache lines that contain the data in the message while keeping local state and data cached at the original core.

Actor model pros:

  • Works better in a distributed system
  • Simpler to understand architecturally in many cases
  • Fault tolerance
  • Models real world phenomena / complex systems with multiple "actors"

Actor model cons:

  • Harder to reason about an algorithm because it doesn't exist in just one place. It gets split across different actors / files and you have to chase it down and follow code.
  • Harder to get "return values". The Actor model is "fire and forget". You have to figure out how to get "return values" back to the original requester. This adds lots of overhead because now you have to set up a way to receive it and pass the context (uniqueId / taskId) through the entire system. You also need to manage the state to hold that information around until the "response" comes back. Without the actor model they would just be local variables in a block scope.