A data flow diagram (DFD) is a visual representation of the flow of data through an information system or business process. DFDs make complex systems easier to understand and are a popular resource for software engineering, systems analysis, process improvement, business management and agile software development.
A data flow diagram uses graphical symbols to illustrate the paths, processes and storage repositories for data from the point it enters a system until it exits. This visual model helps professionals identify ways to improve the efficiency and effectiveness of existing systems and processes, and create new ones.
For example, a DFD of an insurance claim process would visualize how a claim is:
Analysts can examine the DFD to reveal bottlenecks in the process, detect areas where fraud is likely to occur, help stakeholders understand the process and make design improvements.
In the 1970s, software engineers Larry Constantine and Ed Yourdon introduced data flow diagrams in their book, "Structured Design." Instead of focusing on software procedures, they based DFDs on how data moves within a software system.
Computer scientists Tom DeMarco, Chris Gane and Trish Sarson helped popularize DFDs by developing standardized data flow symbols and notations that are still used today.
Initially, data flow diagrams were used mainly in software engineering. After discovering their value for understanding and improving business processes and workflows, business professionals began using them.
Following the introduction of the Unified Modeling Language (UML) in the 1990s, software developers no longer relied exclusively on data flow diagrams for software engineering. UML diagrams provide an intricate, detailed view of structures and behaviors in complex object-oriented systems.
Today, DFDs are used primarily as complementary tools to UML diagrams and flowcharts, providing high-level system overviews during software development.
Data flow diagrams are important because they make it easier to understand the flow of information through complex systems or processes. By visualizing the components of an entire system, DFDs can help users:
There are 4 main components of a DFD: external entities, processes, data stores and data flows.
External entities are the starting and ending points for the data flow in a DFD. They are placed on the edges of a DFD to represent the input and output of information to the entire system or process.
An external entity could be a person, organization or system. For example, a customer could be an external entity in a DFD that models the process of making a purchase and receiving a sales receipt. External entities are also known as terminators, actors, sources and sinks.
Processes are activities that change or transform data. These activities could include computation, sorting, validation, redirection or any other transformation required to advance that segment of the data flow. For example, a credit card payment verification would be a process that occurs within a customer's purchase DFD.
Data stores are the locations in a DFD where data is stored for later use. Data stores could represent databases, documents, files or any repository for data storage. For example, data stores in a product fulfillment DFD might include a customer address database, product inventory database and a delivery schedule spreadsheet.
Data flows are the routes that information takes as it travels between external entities, processes and data stores. For example, in an e-commerce DFD, the route that connects a user entering login credentials with an authentication gateway would be a data flow.
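The four components can also be expressed as a small data structure. The following is a minimal sketch in Python, not a standard DFD library: the names `NodeKind`, `Node`, `DataFlow` and `Diagram` are hypothetical, and the example diagram loosely follows the customer purchase scenario described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    """The three node types; data flows are the edges between them."""
    EXTERNAL_ENTITY = "external entity"
    PROCESS = "process"
    DATA_STORE = "data store"


@dataclass(frozen=True)
class Node:
    name: str
    kind: NodeKind


@dataclass(frozen=True)
class DataFlow:
    """A labeled, directed route that data takes between two nodes."""
    source: Node
    target: Node
    label: str


@dataclass
class Diagram:
    flows: list[DataFlow] = field(default_factory=list)

    def connect(self, source: Node, target: Node, label: str) -> DataFlow:
        flow = DataFlow(source, target, label)
        self.flows.append(flow)
        return flow

    def nodes(self, kind: NodeKind) -> list[str]:
        """Return the distinct node names of one kind, in first-seen order."""
        seen: list[str] = []
        for flow in self.flows:
            for node in (flow.source, flow.target):
                if node.kind is kind and node.name not in seen:
                    seen.append(node.name)
        return seen


# Hypothetical purchase example: names and labels are illustrative only.
customer = Node("Customer", NodeKind.EXTERNAL_ENTITY)
verify = Node("Verify card payment", NodeKind.PROCESS)
receipts = Node("Sales receipts", NodeKind.DATA_STORE)

purchase = Diagram()
purchase.connect(customer, verify, "card details")
purchase.connect(verify, receipts, "approved payment")
purchase.connect(verify, customer, "sales receipt")
```

Storing only the flows and deriving the nodes from them keeps the sketch short. A fuller model would probably track nodes separately so that unconnected components can be detected.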
Standardized symbols and notations such as circles, ovals, arrows and rectangles are used to visually represent DFD components. There are 2 common sets of notations used in data flow diagram templates today: the Yourdon and Coad methodology and the Gane and Sarson methodology. Both systems are named after the computer scientists who created them.
The methodologies differ in the symbols that they use to represent processes and data stores but are otherwise the same.
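As a rough illustration, the two notations can be compared with a lookup table. The symbol descriptions below are the ones commonly given for each notation and are not taken from the text above, so treat them as assumptions. The `SYMBOLS` table and the `differing_components` helper are hypothetical names.

```python
# Commonly cited symbol shapes per notation (assumed, not from the article).
SYMBOLS = {
    "Yourdon and Coad": {
        "process": "circle",
        "data store": "two parallel lines",
        "external entity": "rectangle",
        "data flow": "arrow",
    },
    "Gane and Sarson": {
        "process": "rounded rectangle",
        "data store": "open-ended rectangle",
        "external entity": "rectangle",
        "data flow": "arrow",
    },
}


def differing_components(first: str, second: str) -> list[str]:
    """List the components that the two notations draw differently."""
    return sorted(
        component
        for component in SYMBOLS[first]
        if SYMBOLS[first][component] != SYMBOLS[second][component]
    )


different = differing_components("Yourdon and Coad", "Gane and Sarson")
```

Under these assumptions, `different` contains only the data store and process entries, matching the statement that the two methodologies differ only in those symbols.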
There are 2 types of DFDs that offer different perspectives on a system or process: logical DFDs and physical DFDs.
A logical DFD provides a high-level view of the data flows that are required to perform business or system processes, without going into technical or implementation details. The focus is on the data that is needed and how it moves through the process to complete the business objective.
Logical DFDs can represent business activities such as order fulfillment at a warehouse, a customer making an online purchase or the intake of a patient at a healthcare facility.
A physical DFD visualizes the implementation of a system or process, including the required software, hardware and files. Physical DFDs focus on the underlying technologies, procedures and operations of a system or process.
Physical DFDs are often used to represent complex systems and workflows, such as how supply chain software maintains inventory at a warehouse or how electronic health records securely move through a hospital system.
Data flow diagrams are sometimes created with multiple DFD levels to show progressively more details about a system or process. This layered approach begins with a simple, high-level view and becomes more complex as lower-level DFDs dive deeper into processes and subprocesses.
Also called a “context diagram,” a level 0 DFD is a high-level view that visualizes the entire system as a single process. It is the simplest and most basic of the levels. It should be easily understandable to anyone who views it, regardless of technical skill or job role.
A level 1 DFD explores the component parts of the high-level process in more detail. What was a single process in the context-level DFD is broken into subprocesses that provide more information on the function and data flow pathways.
A level 2 DFD provides even more granular details by adding new subprocesses and their interactions and relationships with data flows and data stores. This level offers a highly intricate view of the inner operations of a system or process.
Because DFDs are intended to be accessible and easy to understand, it is unusual to go beyond the intricacy of level 2. However, highly complex systems might require the elaborate detail of a level 3 DFD, which maps every single aspect of a data process or system.
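The layered approach can be sketched as a tree in which each process may be broken into subprocesses, and each DFD level shows the tree expanded one step further. This is a minimal sketch: the `Process` class, the `processes_at_level` helper and the online purchase example are hypothetical. It assumes that a process with no further breakdown simply carries over unchanged to the next level.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    subprocesses: list["Process"] = field(default_factory=list)


def processes_at_level(root: Process, level: int) -> list[str]:
    """Names of the processes drawn at a given level (0 = context diagram)."""
    current = [root]
    for _ in range(level):
        expanded: list[Process] = []
        for process in current:
            # A process without subprocesses is carried down unchanged.
            expanded.extend(process.subprocesses or [process])
        current = expanded
    return [process.name for process in current]


# Hypothetical system: an online purchase decomposed over two levels.
system = Process(
    "Online purchase",
    [
        Process("Take order"),
        Process(
            "Verify payment",
            [Process("Validate card"), Process("Authorize charge")],
        ),
        Process("Ship order"),
    ],
)

level_0 = processes_at_level(system, 0)
level_1 = processes_at_level(system, 1)
level_2 = processes_at_level(system, 2)
```

Here `level_0` holds the single context-level process, `level_1` holds its three subprocesses, and `level_2` replaces "Verify payment" with its two subprocesses while leaving the others as they are.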
Most data flow diagrams follow the same basic rules: