Asynchronous Runtime Verification
Runtime verification techniques have addressed the increasing need for system correctness
as a relatively lightweight approach for system verification, which scales up to large systemswhile
still guaranteeing the detection of abnormal behaviour.
Although monitoring of properties is usually computationally
cheap when compared to the actual computation taking place in
the system, the monitors induce an additional overhead which
is not always desirable in real-time, reactive systems. In
transaction processing systems, the additional overhead
induced by each transaction can limit throughput and cripple
user-experience at peak times of execution. This is
particularly true in applications where load tends to converge
at particular times. For example, in an online betting
setting, one would expect a pattern of usage which surges
during particular intervals when response time and performance
are at a premium, but with relatively low load for the rest of
the time. One approach, sometimes adopted in such
circumstances, is that of evaluating the monitors
asynchronously with the system, possibly on a separate address
space. The overhead is reduced to the cost of logging events
of the system, which will be processed by the monitors.
However, the downside of this approach is that by the time the
monitor has identified a problem, the system may have
proceeded further.
cLarva
The problem of asynchronous monitoring is closely
related to one found in long-lived transactions —
transactions which may last for too long a period to allow for
locking of resources, but which could lead to an inconsistent
internal state if the resources are released too early. To
solve the problem, typically one defines compensations, to
undo partially executed transactions if discovered to be
infeasible halfway through. In the case of asynchronous
monitoring, allowing the system to proceed before the monitor
has completed its checks may lead to situations where the
system should have been terminated earlier. As with long-lived
transactions, we allow this run-ahead computation and adopt
the use of compensations in our setting to enable the undoing
of system behaviour when an asynchronous monitor discovers a
problem late, thus enabling the system to rollback to a sane
state. However, in many real-life cases, it is not realistic
to assume that transactions can be undone at any time after
their completion. Therefore, we enrich our compensation model
with scopes marking boundaries beyond which compensation is no
longer possible. Furthermore, in a setting such as
transaction-processing systems, one can afford most of the
time to run the monitors in synchrony with the system, falling
back to asynchrony only when required due to high system load.
In more stringent settings, monitors can be run
asynchronously, only synchronising when there is a high risk
of violation.
cLarva is a monitoring architecture which monitors systems asynchronously but allows a degree of synchrony. In cLarva, control is continually under the jurisdiction of the system - never of the monitor. Thus the system does not wait for the monitor verdict on each step and by the time a problem is detected it is usually to late to take corrective actions. However, the system exposes two
interfaces to the monitor: (i) an interface for the monitor to communicate the fact that
a problem has been detected and the system should stop; and (ii) an interface for the
monitor to indicate which actions should be compensated, i.e. undone. Using compensations, the system can be reversed to the point when the monitor-detected problem occurred.
The following figure depicts the overall architecture of cLarva.
|