Existing solutions found in literature
Coordinated NFV state management for scale-out and scale-in events [2, 3, 4]
A control application runs in the SDN controller and is in charge of coordinating the transfer of network and NFV state between NFs. In this specific case, the authors use the OpenFlow protocol and the OpenNF protocol.
In OpenNF, the state is sent to the controller before being transferred to the new NF. Moreover, the controller buffers the received packets and redirects them to the new NF once the state transfer is completed.
The work extends OpenNF with a peer-to-peer Distributed State Transfer, so that state and data no longer pass through the controller; this also removes the need for packet buffering on the controller.
This approach solves the problem of exporting the state when scaling an NF by using a direct state transfer between the source and destination NFs, which scales better than the one proposed in .
It is not clear to me why we need to buffer packets for the new NF. We can let the source NF handle the current packets and, when the state transfer is completed, update the network to redirect packets to the newly created NF.
A set of data plane APIs is defined that allows exporting the NF state, with the possibility of specifying filters to select the flows to be exported.
An operation is provided by the controller to export the state from one NF to another, with the possibility of specifying different requirements (e.g. preserving the order of packets, loss-free transfer).
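The filtered export and the loss-free transfer described above can be illustrated with a minimal sketch. It is not the OpenNF API: the names `NF`, `export_state`, `import_state` and `move` are hypothetical, the NF state is reduced to a per-flow packet counter, and the packets arriving during the transfer are passed in as a plain list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

FlowKey = Tuple[str, str]  # (src_ip, dst_ip): a simplified, assumed flow identifier


@dataclass
class NF:
    """Toy network function whose state is a per-flow packet counter."""
    name: str
    state: Dict[FlowKey, int] = field(default_factory=dict)

    def process(self, flow: FlowKey) -> None:
        self.state[flow] = self.state.get(flow, 0) + 1

    def export_state(self, flt: Callable[[FlowKey], bool]) -> Dict[FlowKey, int]:
        # Remove and return only the state of the flows matching the filter.
        moved = {k: v for k, v in self.state.items() if flt(k)}
        for k in moved:
            del self.state[k]
        return moved

    def import_state(self, chunk: Dict[FlowKey, int]) -> None:
        self.state.update(chunk)


def move(src: NF, dst: NF, flt: Callable[[FlowKey], bool],
         in_flight: List[FlowKey]) -> None:
    """Loss-free, order-preserving move of the filtered flows from src to dst.

    `in_flight` stands for the packets that arrive while the state is in transit.
    """
    chunk = src.export_state(flt)
    buffered = []
    for pkt in in_flight:
        if flt(pkt):
            buffered.append(pkt)   # held back: its state is no longer on src
        else:
            src.process(pkt)       # unaffected flows keep going to src
    dst.import_state(chunk)
    for pkt in buffered:           # replayed on dst in arrival order
        dst.process(pkt)
```

In this sketch the buffered packets are the ones that arrive after the state has left the source NF but before it is installed on the destination; dropping the buffer would make the move lossy for exactly those packets.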
Rollback-Recovery for Middleboxes 
The idea is to log all inputs received by the middlebox so that they can be replayed on a newly created instance if the system fails. However, in a multithreaded environment, logging all inputs is not enough, since there are non-deterministic variables that affect the overall system state.
Given that, the proposed approach also stores the additional information needed for correct replay in the face of non-determinism.
Following the output commit property, a packet cannot be released until the information needed to recreate internal state consistent with that output has been committed to stable storage. The work therefore proposes an optimization of how the logged information is retrieved, saved and committed, in order to reduce the final per-packet latency, mainly for log-free operations.
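The interplay between logging, non-determinism and output commit can be sketched as follows. This is an illustration under stated assumptions, not the mechanism of the cited work: the class `LoggedBalancer` and the function `replay` are hypothetical, the non-determinism is a random backend choice, and stable storage is a Python list.

```python
import random


class LoggedBalancer:
    """Toy load balancer that logs each input with its non-deterministic outcome."""

    def __init__(self, backends):
        self.backends = backends
        self.assignments = {}   # flow -> backend (the internal state)
        self.pending = []       # log records not yet committed
        self.stable_log = []    # stand-in for stable storage
        self.held = []          # outputs waiting for the log commit
        self.released = []      # outputs that have left the middlebox

    def handle(self, flow):
        choice = self.assignments.get(flow)
        if choice is None:
            choice = random.choice(self.backends)  # non-deterministic decision
            self.assignments[flow] = choice
        # Log the input together with the outcome needed to replay it.
        self.pending.append((flow, choice))
        # Output commit: hold the output until the log record is stable.
        self.held.append((flow, choice))

    def commit(self):
        self.stable_log.extend(self.pending)
        self.pending.clear()
        self.released.extend(self.held)
        self.held.clear()


def replay(backends, log):
    """Rebuild the state on a fresh instance from the committed log."""
    fresh = LoggedBalancer(backends)
    for flow, choice in log:
        fresh.assignments[flow] = choice  # reuse the logged choice, never re-draw
    return fresh
```

Replaying only the inputs would re-draw the random choices and could yield a state that contradicts packets already released; logging the outcome alongside the input is what keeps the recovered state consistent with them.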
No-Replay Design (alternative approach)
No-replay approaches are based on the use of system checkpoints. A snapshot of the current system state is taken and, upon a failure, a replica loads the most recent snapshot. All packets between two consecutive snapshots are buffered.
This approach reduces the latency for failure-free operations. Packets leaving the middlebox are not released until a checkpoint of the system has been logged to stable storage.
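A checkpoint-based design of this kind can be sketched as below. The class `CheckpointedMiddlebox` and its methods are hypothetical names, the state is a per-flow counter, and the snapshot is kept in memory as a stand-in for stable storage.

```python
import copy


class CheckpointedMiddlebox:
    """Toy middlebox that releases outputs only after a full-state checkpoint."""

    def __init__(self):
        self.state = {}      # per-flow packet counters
        self.snapshot = {}   # last checkpoint (stand-in for stable storage)
        self.held = []       # outputs produced since the last checkpoint
        self.released = []   # outputs that have left the middlebox

    def handle(self, flow):
        self.state[flow] = self.state.get(flow, 0) + 1
        self.held.append((flow, self.state[flow]))

    def checkpoint(self):
        self.snapshot = copy.deepcopy(self.state)
        self.released.extend(self.held)   # now safe to release
        self.held.clear()

    def fail_over(self):
        # The replica starts from the most recent snapshot; nothing is replayed.
        replica = CheckpointedMiddlebox()
        replica.state = copy.deepcopy(self.snapshot)
        replica.snapshot = copy.deepcopy(self.snapshot)
        return replica
```

Updates made after the last checkpoint are lost on failure, but since the corresponding outputs were never released, the replica's state never contradicts what the rest of the network has observed.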
The work presents the design of a fault-tolerance system for middleboxes. It is able to recover quickly (e.g. in less than typical transport timeout values) and with little overhead on failure-free operations (e.g. additional per-packet latency of 10s to 100s of microseconds).
Decoupling the state and processing from Network Functions 
Each NF state (e.g. address mapping in a NAT, server mapping in a load balancer) is stored in a distributed DB. Each NF reads from and writes to this DB before making a decision.
Upon failure, a new NF can be spawned and it will have access to all of the state needed, without disrupting the network.
When scaling out, a new NF can be launched and traffic immediately directed to it.
All NF instances share the entire state, so there is no affinity of traffic to a particular instance. Therefore, packets traversing different paths do not cause a problem.
Keeping the entire state in the DB may require multiple reads and writes for each packet received. Although different optimizations can be made for the connection with the DB, this approach produces a considerable per-packet (or per-flow) overhead (in the range of 100s to 500s of microseconds).
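The decoupling of state from processing, and the extra DB accesses it implies, can be illustrated with a minimal sketch. The classes `StateStore` and `StatelessNAT`, the key layout and the starting port are all assumptions made for illustration; a dictionary stands in for the distributed DB, and a real store would need an atomic port allocation.

```python
class StateStore:
    """Stand-in for the distributed DB; counts accesses to expose the overhead."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def put(self, key, value):
        self.writes += 1
        self.data[key] = value


class StatelessNAT:
    """NAT instance that keeps no local state: every mapping lives in the store."""

    def __init__(self, store, public_ip="203.0.113.1"):
        self.store = store
        self.public_ip = public_ip

    def translate(self, src_ip, src_port):
        key = ("nat", src_ip, src_port)
        mapping = self.store.get(key)
        if mapping is None:
            # Not atomic in this sketch; a real DB would need an atomic allocation.
            next_port = self.store.get("next_port") or 10000
            mapping = (self.public_ip, next_port)
            self.store.put(key, mapping)
            self.store.put("next_port", next_port + 1)
        return mapping
```

Two instances sharing the same store return the same mapping for the same flow, which is why a replacement or additional instance can take traffic immediately; the read and write counters show the price paid on each packet.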
Pico Replication: A High Availability Framework for Middleboxes 
The idea is to use a framework that is able to replicate the state of a middlebox across different replicas. In this way, upon a failure, the SDN network can be updated to forward traffic to the replica middlebox.
The work is based on frequent checkpoints of the middlebox state (with flow-level granularity) among replicas. However, because of the output commit property, output packets belonging to a modified flow cannot be released until a checkpoint of that flow has been made, which causes additional latency and packet bursts.
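The effect of flow-level checkpoints on packet release can be sketched as follows. This is not the framework's actual interface: `FlowReplicatedMiddlebox`, `handle` and `checkpoint_flow` are hypothetical names, the per-flow state is a counter, and the replica is a local dictionary.

```python
class FlowReplicatedMiddlebox:
    """Toy middlebox that checkpoints state to a replica one flow at a time."""

    def __init__(self):
        self.flows = {}      # primary per-flow state
        self.replica = {}    # state currently held by the replica
        self.dirty = set()   # flows modified since their last checkpoint
        self.held = {}       # flow -> outputs waiting for that flow's checkpoint
        self.released = []   # outputs that have left the middlebox

    def handle(self, flow, packet, modifies_state):
        if modifies_state:
            self.flows[flow] = self.flows.get(flow, 0) + 1
            self.dirty.add(flow)
        if flow in self.dirty:
            # Output commit: hold packets of flows with un-checkpointed changes.
            self.held.setdefault(flow, []).append(packet)
        else:
            self.released.append(packet)

    def checkpoint_flow(self, flow):
        if flow in self.flows:
            self.replica[flow] = self.flows[flow]
        self.dirty.discard(flow)
        # All packets held for this flow leave at once, hence the bursts.
        self.released.extend(self.held.pop(flow, []))
```

Only the flows whose state changed are delayed; packets of untouched flows are released immediately, which is the benefit of the flow-level granularity over a whole-system checkpoint.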