Cloud technology has brought undeniable convenience, letting anyone access their photos, email and other applications anywhere, at any time. But not all companies have moved everything to the cloud just yet. Those looking to migrate typically have to break their applications into smaller chunks in a process dubbed refactoring: restructuring the code without changing its intended behavior.
These chunks, called clusters or microservices, can then be deployed as cloud-native applications, and developers are free to manage each one in whatever programming language they choose.
But there’s a problem: refactoring an application is not easy. It involves splitting the code along its functional boundaries while keeping the overall functionality intact.
To address this, we turned to AI to break down an application automatically by representing its code as a graph. Our approach relies on graph representation learning, a popular family of deep learning methods. Graphs are a natural representation for software and applications: we translate an application into a graph in which programs become nodes, their relationships with other programs become edges, and those edges determine where to draw the boundaries separating groups of nodes with common business functionality.
We applied static program analysis to translate the application’s implementation structure into a graph, and propose a novel graph neural network (GNN) that both learns the graph representation and performs the clustering task. To parse the code, we used static program analysis tools such as Soot. Our results surpass approaches based on traditional software engineering, as well as other graph-based methods.
In our paper “Graph Neural Network to Dilute Outliers for Refactoring Monolith Application,” available here and presented at AAAI 2021, we detail how programs in an application become nodes in the graph and their relationships with other programs become edges. The application refactoring task thereby turns into a clustering task: grouping the nodes that tend to contribute to a common business functionality. In general, clustering means grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups.
Figure 1: Comparison of the proposed CO-GCN method with the baselines across the four public applications on the (a) Structural Modularity, (b) Modularity, (c) 1-NED and (d) IFN metrics. The CO-GCN method clearly outperforms the considered baselines.
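To make the clustering framing concrete, here is a minimal Python sketch. The program names are hypothetical, and plain connected components stand in for the far richer grouping the paper's GNN learns; the point is simply that "clusters" are groups of programs tied together by their relationships.

```python
from collections import defaultdict, deque

# Toy call graph for a stock-trading monolith (hypothetical program names).
# Each pair means "the first program calls the second".
edges = [
    ("LoginController", "UserValidator"),
    ("PortfolioView", "PortfolioService"),
    ("TradeController", "OrderService"),
    ("OrderService", "StockQuoteClient"),
]

# Build an undirected adjacency list.
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def clusters(adj):
    """Group programs into connected components via breadth-first search."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in group:
                continue
            group.add(node)
            queue.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups

print(sorted(len(g) for g in clusters(adj)))  # [2, 2, 3]
```

Here the login, portfolio, and trading programs fall into three separate groups, each hinting at one business functionality.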
How does it work?
Take an online stock-trading system. The application allows users to log in, view their portfolio, look up stock quotes, and buy or sell stock shares. In the traditional world, all these functionalities would be bundled into a single application, a so-called monolith application.
Many companies have, until now, followed this principle because a monolith is easy to deploy. But when multiple developers contribute to the functionalities, programs intended for a single functionality can get overloaded with different business functions. For example, logic that validates users at login and logic that sells stock might end up coded in the same program files, so a single program now carries two business functionalities. And even good developer practices, such as maintaining a dedicated package of program files per functionality, tend to erode over time.
With the GNN, it’s different.
For instance, if a program A calls, or invokes, a program B, we form a graph with nodes A and B and add an edge between them with the relationship type call. Similarly, a program C might extend a program D; we then add nodes C and D to the graph and connect them with an edge whose relationship type is extends, the inheritance keyword in languages such as Java.
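A minimal sketch of how such typed edges could be recorded is below. The data structure and program names are illustrative only; in practice the call and inheritance facts come from a static-analysis tool such as Soot.

```python
# Hypothetical sketch: a graph of program nodes with typed edges.
graph = {"nodes": set(), "edges": []}

def add_edge(src, dst, relation):
    """Register both programs as nodes and record the typed edge."""
    graph["nodes"].update([src, dst])
    graph["edges"].append((src, dst, relation))

add_edge("A", "B", "calls")    # program A invokes program B
add_edge("C", "D", "extends")  # program C inherits from program D

print(sorted(graph["nodes"]))  # ['A', 'B', 'C', 'D']
print([e for e in graph["edges"] if e[2] == "extends"])  # [('C', 'D', 'extends')]
```

Keeping the relationship type on each edge lets later stages weigh a call edge differently from an inheritance edge when drawing cluster boundaries.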
All programs in the application then become nodes. GNNs learn graph representations that preserve both attribute and structural information, and they have shown promising results across many domains, on tasks ranging from question answering to recommendation.
Social networks are a familiar example: users are nodes and their connections are edges, and GNNs are applied to tasks such as detecting communities or recommending new connections based on node similarity.
We used GNNs over this graph representation to unify three tasks: node representation learning, which preserves the semantic structure of the graph; outlier node detection and dilution, which spots anomalous nodes and suppresses their influence on clustering; and node clustering itself. Our network was able to capture meaningful information about the roles of the application’s different components.
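As an illustration of the representation-learning step, here is a single generic graph-convolution layer in NumPy, in the style of Kipf and Welling's GCN. It is not the paper's CO-GCN architecture, and the adjacency matrix and feature sizes are made up; it only shows how a node's representation mixes in its neighbors' features.

```python
import numpy as np

# One generic GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
rng = np.random.default_rng(0)

A = np.array([[0, 1, 0, 0],   # adjacency matrix of a 4-program graph (made up)
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 8))   # initial node features, e.g. code embeddings
W = rng.normal(size=(8, 3))   # learnable layer weights

A_hat = A + np.eye(4)                       # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))    # inverse sqrt of node degrees
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric norm

H_next = np.maximum(A_norm @ H @ W, 0.0)    # propagate and apply ReLU
print(H_next.shape)  # (4, 3)
```

Stacking such layers lets information flow along call and extends edges, so programs serving the same business functionality end up with similar representations.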
We picked four public monolith applications, written in Java and .NET, and broke each down into multiple business clusters, with each cluster containing mostly the programs pertaining to one business functionality. We validated the quality of the clusters using the metrics shown in Figure 1 and found that the AI broke down the applications more effectively than the baselines.
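As a rough illustration of one such metric, here is a small pure-Python implementation of Newman's modularity on a toy graph. The graph is made up, and this is only one of the measures reported in Figure 1 alongside Structural Modularity, 1-NED and IFN.

```python
def modularity(edges, communities):
    """Newman modularity Q for an undirected graph.

    edges: list of (u, v) pairs; communities: node -> community label.
    Q = sum over communities of (intra_edges/m - (degree_sum/(2m))^2).
    """
    m = len(edges)
    degree, intra, deg_sum = {}, {}, {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        if communities[a] == communities[b]:
            intra[communities[a]] = intra.get(communities[a], 0) + 1
    for node, d in degree.items():
        c = communities[node]
        deg_sum[c] = deg_sum.get(c, 0) + d
    return sum(intra.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2
               for c in deg_sum)

# Two tightly knit triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, communities), 3))  # 0.357
```

A higher Q means most edges stay inside clusters, which is exactly what a good decomposition of a monolith should achieve.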
In the paper, we also recommend the top programs that need developers’ immediate attention to decouple, in order to isolate business functionality. We believe that our recommendations could save developers days of effort.
IBM Research AI is proudly sponsoring AAAI 2021 as a Platinum Sponsor. We will present 40 main track papers, in addition to at least seven workshop papers, 10 demos, four IAAI papers, and one tutorial. IBM Research AI is also co-organizing three workshops. We hope you can join us from February 2-9 to learn more about our research. To view our full presence at AAAI 2021, visit here.