5G enables super-fast data transmission, but it relies on the underlying networking architecture to deliver that performance. Software-Defined Networking (SDN) is viewed as an emerging networking architecture that significantly improves network performance through programmable network management, easy reconfiguration, and on-demand resource allocation. To distribute messages efficiently in SDN, the SDN controllers must synchronize with each other so that they maintain a consistent global view of the network. Given a limited synchronization budget, a crucial question is how these controllers should synchronize with each other to maximize network performance.
In our recent paper DQ Scheduler: Deep Reinforcement Learning Based Controller Synchronization in Distributed SDN, which won the Best Paper Award at the 53rd IEEE International Conference on Communications (IEEE ICC 2019), we propose a novel deep reinforcement learning approach to maximize the overall performance of Software-Defined Networking (SDN), the modern programmable networking architecture that supports 5G. Our method synchronizes the distributed SDN controllers and creates a dynamic control policy that tells the controllers when and how to exchange information, so that they can efficiently deliver messages and utilize network resources for applications such as content delivery, large-scale mobile computing, and distributed machine learning.
Distributed SDN and SDN Controller Synchronizations
Software-Defined Networking (SDN) is an emerging networking architecture that significantly improves network performance through programmable network management, easy reconfiguration, and on-demand resource allocation. One key attribute that differentiates SDN from classic networks is the separation of its data and control planes. Specifically, in SDN, all control functionality is implemented and abstracted in the SDN controller, which sits in the control plane and makes operational decisions, while the data plane, consisting of SDN switches, only passively executes the instructions received from the control plane.
Since the logically centralized SDN controller has full knowledge of the network status, it is able to make globally optimal decisions. Yet such centralized control suffers from major scalability and reliability issues. In this regard, distributed SDN has been proposed to balance centralized and distributed control. A distributed SDN network is composed of a set of subnetworks, referred to as domains, each managed by a physically independent SDN controller. The controllers synchronize with each other to maintain a logically centralized network view. However, complete synchronization among controllers, i.e., all controllers always maintaining the same global view, incurs high costs, especially in large networks; practical distributed SDN networks can therefore only afford partial inter-controller synchronization, in which a controller exchanges a limited amount of information with a limited number of other controllers.
Markov Decision Process, Reinforcement Learning, and Deep Learning
A Markov Decision Process (MDP) offers a mathematical framework for modelling sequential decision-making problems, under the assumption that the status of the problem at the next time step (a state in the MDP) is fully determined by the current status and the decision made (an action in the MDP). As for Reinforcement Learning (RL), imagine an agent that jumps from state to state in the MDP by taking actions associated with certain rewards. The agent's goal is to discover a sequence of state-action pairs, i.e., a policy, which maximizes the accumulated time-discounted rewards. By interacting with the environment modelled by the MDP, the agent builds up experience through trial and error, where good decisions are reinforced by positive rewards and bad ones are penalized. During training, the most important aspect is how the agent generalizes and memorizes what it has learned.
Traditionally, the agent's estimates of the future reward following each state-action pair are kept in a table. However, this approach quickly becomes impractical in most RL tasks because of the large size of the state-action space. In light of this, function approximators have been proposed to approximate the value function, which encodes the agent's past experience as an estimate of the potential value of a given state-action pair. Among these approximators, the Deep Neural Network (DNN) stands out due to its exceptional ability to capture latent and complicated relationships in input data.
- MDP is used to model the SDN controller synchronization policy
- We use RL techniques to find the synchronization policy based on the MDP formulation
- DNN is used to approximate the value function in the RL task, which estimates the goodness of a state-action pair.
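The three bullets above can be made concrete with a minimal sketch of the value-function idea on a toy version of the synchronization MDP. The two-domain state, capped counters, reward, and transition rules below are illustrative assumptions, not the paper's exact model; at realistic scale, the tabular Q dictionary is precisely what the DNN replaces.

```python
# Toy synchronization MDP: the state records how many time slots ago each of
# two domains was last synchronized (capped at 5), and an action spends the
# budget of 1 on synchronizing exactly one domain. Rewards and dynamics are
# illustrative assumptions, not the paper's model.
CAP, GAMMA, N_DOMAINS = 5, 0.9, 2
STATES = [(i, j) for i in range(CAP + 1) for j in range(CAP + 1)]

def step(state, action):
    # The synchronized domain's counter resets; the other ages by one slot.
    nxt = tuple(0 if i == action else min(s + 1, CAP)
                for i, s in enumerate(state))
    return nxt, state[action]  # reward: the staleness cleared by this sync

# Tabular value function, filled in by Q-value iteration (the Bellman
# optimality update applied until convergence). A DNN replaces this table
# when the state-action space is too large to enumerate.
Q = {(s, a): 0.0 for s in STATES for a in range(N_DOMAINS)}
for _ in range(200):
    new_q = {}
    for s in STATES:
        for a in range(N_DOMAINS):
            nxt, r = step(s, a)
            new_q[(s, a)] = r + GAMMA * max(Q[(nxt, b)]
                                            for b in range(N_DOMAINS))
    Q = new_q

def greedy(state):
    # The policy is read off the value function: pick the best-valued action.
    return max(range(N_DOMAINS), key=lambda a: Q[(state, a)])

print(greedy((5, 1)))  # the much staler domain is synchronized first
```

Even in this toy setting, the learned values encode the intuition behind the synchronization policy: when one domain's view is much staler, the budget is spent on it first.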
Why a DRL approach is a good fit for the SDN controller synchronization problem
RL-based approaches are especially appealing for developing the controller synchronization policy under distributed SDN for the following reasons:
- The abundance of network data made available by SDN switches through the OpenFlow protocol builds up a pool of past experiences which are the ideal trial-and-error inputs for RL algorithms.
- Different SDN domains can be highly heterogeneous; as such, SDN networks are complex systems. Therefore, accurately modelling such systems becomes extremely difficult and mathematically intractable.
In light of this, model-free RL-based approaches are especially attractive, as they impose no constraints on the network's structure or its dynamics, making them adaptable to real-world SDN networks.
Inter-domain routing as the controller synchronization application, and a motivating example
To materialize the performance gains controller synchronization can bring under distributed SDN, we chose inter-domain routing as the application of interest. We asked: in dynamic networks whose topologies evolve over time, given a controller synchronization budget and a set of source and destination nodes located in different domains that send/receive data packets, how should the source controller synchronize with other controllers on the domain-wise path at each time slot, so as to maximize the benefit of controller synchronization (better routing quality, i.e., reduced path costs) over a period of time?
In Fig.1a and Fig.1b, suppose the source node v1 in A1 sends packets to the destination node v2 in A4. The topology in Fig.1a represents A1's controller's view of the network, obtained during past synchronizations between A1 and A2–A4. Due to network dynamics, however, the actual topology has evolved into the one in Fig.1b, which has not yet been synchronized to the source controller. As a result, the source controller, holding an outdated view of the network, still uses the old flow table entries that direct packets destined for v2 to gateways a, b, and c. In comparison, had it obtained the most up-to-date topology through synchronization, the source controller would select a shorter route (green lines) using b' and c' as egress gateways in domains A2 and A3. This example highlights the important role of controller synchronization in dynamic networks.
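The gap in this example can be reproduced with a small routing sketch. The graphs below are hypothetical stand-ins for the Fig.1 topologies (the figure's actual edge weights are not given in the text): shortest-path routing on the up-to-date view finds a cheaper route through the new gateways b' and c' than routing on the stale view.

```python
import heapq

def dijkstra(graph, src, dst):
    # Standard Dijkstra shortest-path search; returns the total path cost.
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

# Hypothetical weights: the stale view only knows the old gateways a, b, c.
stale = {"v1": {"a": 1}, "a": {"b": 4}, "b": {"c": 4}, "c": {"v2": 1}}
# After the topology evolves, cheaper gateways b' and c' become available.
fresh = {"v1": {"a": 1}, "a": {"b'": 1}, "b'": {"c'": 1}, "c'": {"v2": 1}}

print(dijkstra(stale, "v1", "v2"))  # cost of routing on the outdated view
print(dijkstra(fresh, "v1", "v2"))  # cheaper route after synchronization
```

The difference between the two costs is exactly the "reduction in path cost" that the synchronization policy tries to maximize over time.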
Formulating the controller synchronization problem as an MDP
In order to apply DRL approaches to assist the synchronization policy designs, we first formulate the controller synchronization problem as an MDP. Specifically, we use 3-tuple (S, A, R) to characterize the formulated MDP.
- S is the finite state space. In our problem, a state corresponds to the numbers of time slots since the source controller last synchronized with each of the other controllers on the domain-wise path.
- A is the finite action space. An action with respect to a state is defined as the decision to synchronize with the selected domain(s), subject to the given synchronization budget.
- R represents the immediate reward associated with state action pairs, denoted by R(s, a), where s ∈ S and a ∈ A. R(s, a) is calculated as the average reductions in path cost associated with an (s, a) tuple.
The MDP formulation is demonstrated by the example in Fig. 2, where there are six domains on the domain-wise path between the source and destination nodes. The first entry in the state vector indicates that the last synchronization between the source controller and the controller of A2 took place five time slots ago. The action vector consists of binary entries, where 1 indicates that the source controller will synchronize with the corresponding domain at the current time slot and 0 that it will not. The action vector in this example indicates that, under a synchronization budget of 1, the source controller will synchronize with A5 only.
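A Fig.2-style state and action can be sketched directly as vectors. Only two details below come from the text (A2 was last synchronized five slots ago, and the action synchronizes A5 under a budget of 1); the remaining counter values, and the choice of one counter per downstream domain A2–A6, are illustrative assumptions.

```python
BUDGET = 1

# One staleness counter per downstream domain on the path (A2..A6 here).
# Only the first entry (A2: five slots since last sync) is from the text;
# the other values are made up for illustration.
state = [5, 2, 7, 1, 3]
# Binary action vector: synchronize A5 only, respecting the budget.
action = [0, 0, 0, 1, 0]
assert sum(action) <= BUDGET  # a feasible action never exceeds the budget

def apply_action(state, action):
    # Synchronized domains reset their counters; all others age by one slot.
    return [0 if a else s + 1 for s, a in zip(state, action)]

print(apply_action(state, action))
```

This transition rule is what links consecutive MDP states: every unsynchronized counter grows by one each time slot, so the state itself records how stale the source controller's view of each domain has become.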
For the evaluation, we conducted three sets of experiments with different numbers of domains (m in the plots, below) on the domain-wise path. We compare the synchronization strategy developed by the DQ Scheduler against four other synchronization scenarios: (1) the "Optimal" scenario, in which all controllers are always mutually synchronized, and (2) the "No sync" scenario, in which no controller ever synchronizes with any other; together these bound the achievable performance from above and below, respectively; (3) the anti-entropy algorithm (as implemented in the ONOS controller); and (4) fixed-frequency synchronization algorithms. We show the results in two groups of plots: the first group plots the optimization objective (the time-discounted, accumulated reduction in path costs) over time slots, and the second group plots the real-time path cost over time slots.
The Average Path Cost (APC) in Fig.3, below, measures the quality of the constructed paths; a smaller APC is better. These evaluation results demonstrate the superiority of DQ Scheduler: over the testing period of 300 time slots, it outperforms the anti-entropy algorithm by 31.2%, 58.3%, and 95.2%, and the constant-synchronization-rate algorithm by 90.9%, 90%, and 173.3%, in the three scenarios, respectively.
This work is our initial attempt to tackle control problems in SDN with DRL approaches, which have seen high-profile successes in other domains, such as AlphaGo. Our experience so far demonstrates that DRL techniques are indeed suitable candidates for solving control problems in complex SDNs.
However, a couple of outstanding issues still require further investigation. One such issue is the mapping from value functions to policies. The trained DNN estimates the goodness of a state-action pair, which lets us choose an action by enumerating all possible actions and selecting the one with the highest estimated value; however, such an indirect approach does not scale when the action space is large. In another recently completed work, we address this issue through a careful design of the DNN employed.
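The scaling problem with this indirect approach can be seen in a short sketch. The scoring function below is a toy stand-in for the trained DNN's state-action estimate (the real network is learned), and the domain counts are hypothetical.

```python
from itertools import combinations
from math import comb

# Toy stand-in for the DNN's Q-estimate: score an action by the total
# staleness it clears. Illustrative only; the real estimator is learned.
def q_value(state, action):
    return sum(state[i] for i in action)

def best_action(state, m, k):
    # Indirect policy extraction: enumerate every feasible action (choose k
    # of m domains to synchronize) and keep the highest-valued one.
    return max(combinations(range(m), k), key=lambda a: q_value(state, a))

state = [5, 2, 7, 1, 3, 4]
print(best_action(state, 6, 2))  # picks the two stalest domains

# The enumeration itself is the bottleneck: with 100 on-path domains and a
# budget of 5, every single decision must score comb(100, 5) candidates.
print(comb(100, 5))
```

With six domains and a budget of 2 the enumeration is trivial, but the number of candidate actions grows combinatorially with the path length and budget, which is why a DNN design that outputs a policy more directly becomes necessary.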