What is a data flow diagram (DFD)?

22 November 2024

Authors

Gregg Lindemulder

Matthew Kosinski

Enterprise Technology Writer

A data flow diagram (DFD) is a visual representation of the flow of data through an information system or business process. DFDs make complex systems easier to understand and are a popular resource for software engineering, systems analysis, process improvement, business management and agile software development.

A data flow diagram uses graphical symbols to illustrate the paths, processes and storage repositories for data from the point it enters a system until it exits. This visual model helps professionals identify ways to improve the efficiency and effectiveness of existing systems and processes, and create new ones.

For example, a DFD of an insurance claim process would visualize how a claim is:

  1. Submitted by a customer.
  2. Processed and evaluated by the insurance company.
  3. Reviewed or investigated by an adjuster.
  4. Denied or paid out to the policyholder.

Analysts can examine the DFD to reveal bottlenecks in the process, detect areas where fraud is likely to occur, help stakeholders understand the process and make design improvements.

History of data flow diagrams

In the 1970s, software engineers Larry Constantine and Ed Yourdon introduced data flow diagrams in their book, "Structured Design." Instead of focusing on software procedures, they based DFDs on how data moves within a software system. 

Computer scientists Tom DeMarco, Chris Gane and Trish Sarson helped popularize DFDs by developing standardized data flow symbols and notations that are still used today.

Initially, data flow diagrams were used mainly in software engineering. After discovering their value for understanding and improving business processes and workflows, business professionals began using them.

Following the introduction of the Unified Modeling Language (UML) in the 1990s, programmers no longer relied exclusively on data flow diagrams for software engineering. UML diagrams provide an intricate, detailed view of structures and behaviors in complex object-oriented systems.

Today, DFDs are used primarily as complementary tools to UML diagrams and flowcharts, providing high-level system overviews during software development.

Why are data flow diagrams important?

Data flow diagrams are important because they make it easier to understand the flow of information through complex systems or processes. By visualizing the components of an entire system, DFDs can help users:

  • Gain clarity: A visual representation with simple symbols and labels provides a clearer understanding of complex systems than paragraphs of descriptive text.
  • Analyze systems: DFDs show the relationships and interactions between the components of a system or process for easier analysis.

  • Identify problems: DFDs can make it easier to isolate system design problems such as bottlenecks, inconsistencies and redundancies.

  • Improve processes: DFDs help analysts visualize new ways to optimize data flows to accelerate and improve business processes.

  • Drive collaboration: DFDs promote effective communication and collaboration by providing a shared point of reference for stakeholders across an organization.

  • Create documentation: DFDs capture essential information such as the sequence, requirements and processes of a data flow so it can be easily documented.

  • Protect data: DFDs indicate where sensitive information enters and exits a system to help address potential data security risks. 

Components of data flow diagrams

There are 4 main components of a DFD:

  • External entities
  • Processes
  • Data stores
  • Data flows

External entities

These are the starting and ending points for the data flow in a DFD. External entities are placed on the edges of a DFD to represent the input and output of information to the entire system or process. 

An external entity could be a person, organization or system. For example, a customer could be an external entity in a DFD that models the process of making a purchase and receiving a sales receipt. External entities are also known as terminators, actors, sources and sinks.

Processes

Processes are activities that change or transform data. These activities could include computation, sorting, validation, redirection or any other transformation required to advance that segment of the data flow. For example, a credit card payment verification would be a process that occurs within a customer's purchase DFD.

Data stores

These are the locations in a DFD where data is stored for later use. Data stores could represent databases, documents, files or any repository for data storage. For example, data stores in a product fulfillment DFD might include a customer address database, product inventory database and a delivery schedule spreadsheet.

Data flows

Data flows are the routes that information takes as it travels between external entities, processes and data stores. For example, in an e-commerce DFD, the route that connects a user entering login credentials with an authentication gateway would be a data flow. 
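
The 4 components can be captured in a small data model. The following Python sketch is illustrative only and is not a standard DFD library: the class names (`Kind`, `Node`, `Flow`, `Diagram`) and the node and flow labels are assumptions made for this example, loosely based on the customer purchase scenario described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """The three kinds of node a data flow can connect."""
    EXTERNAL_ENTITY = "external entity"
    PROCESS = "process"
    DATA_STORE = "data store"


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind


@dataclass(frozen=True)
class Flow:
    """A data flow: a labeled, directed route between two nodes."""
    source: Node
    target: Node
    label: str  # the data being moved


@dataclass
class Diagram:
    flows: list[Flow] = field(default_factory=list)

    def connect(self, source: Node, target: Node, label: str) -> None:
        self.flows.append(Flow(source, target, label))

    def nodes(self) -> set[Node]:
        # Every node that appears at either end of a flow.
        return {n for f in self.flows for n in (f.source, f.target)}


# Hypothetical purchase DFD: all names below are invented for illustration.
customer = Node("Customer", Kind.EXTERNAL_ENTITY)
verify = Node("Verify payment", Kind.PROCESS)
issue = Node("Issue receipt", Kind.PROCESS)
orders = Node("Orders", Kind.DATA_STORE)

purchase = Diagram()
purchase.connect(customer, verify, "card details")
purchase.connect(verify, orders, "verified order")
purchase.connect(orders, issue, "order record")
purchase.connect(issue, customer, "sales receipt")

for flow in purchase.flows:
    print(f"{flow.source.name} --[{flow.label}]--> {flow.target.name}")
```

Storing only the flows and deriving the nodes from them keeps the sketch short. A fuller model might also track nodes that have no flows yet.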

Symbols used in data flow diagrams

Standardized symbols and notations such as circles, ovals, arrows and rectangles are used to visually represent DFD components. There are 2 common sets of notations used in data flow diagram templates today: the Yourdon and Coad methodology and the Gane and Sarson methodology. Both systems are named after the computer scientists who created them.

The methodologies differ in the symbols that they use to represent processes and data stores but are otherwise the same.

  • External entities: rectangles
  • Processes: circles (Yourdon and Coad) or rectangles with rounded corners (Gane and Sarson)
  • Data stores: parallel lines (Yourdon and Coad) or open-ended rectangles (Gane and Sarson)
  • Data flows: arrows

Types of data flow diagrams

There are 2 types of DFDs that offer different perspectives on a system or process: logical DFDs and physical DFDs.

Logical DFDs

A logical DFD provides a high-level view of the data flows that are required to perform business or system processes, without going into technical or implementation details. The focus is on the data that is needed and how it moves through the process to complete the business objective. 

Logical DFDs can represent business activities such as order fulfillment at a warehouse, a customer making an online purchase or the intake of a patient at a healthcare facility.

Physical DFDs

A physical DFD visualizes the implementation of a system or process, including the required software, hardware and files. Physical DFDs focus on the underlying technologies, procedures and operations of a system or process.

Physical DFDs are often used to represent complex systems and workflows, such as how supply chain software maintains inventory at a warehouse or how electronic health records securely move through a hospital system. 

Levels of data flow diagrams

Data flow diagrams are sometimes created with multiple DFD levels to show progressively more detail about a system or process. This layered approach begins with a simple, high-level view and becomes more complex as lower-level DFDs dive deeper into processes and subprocesses.

Level 0

Also called a “context diagram,” a level 0 DFD is a high-level view that visualizes the entire system as a single process. It is the simplest and most basic of the levels. It should be easily understandable to anyone who views it, regardless of technical skill or job role.

Level 1

A level 1 DFD explores the component parts of the high-level process in more detail. What was a single process in the context-level DFD is broken into subprocesses that provide more information on the function and data flow pathways.

Level 2

Level 2 provides even more granular details by adding new subprocesses and their interactions and relationships with data flows and data stores. This level offers a highly intricate view of the inner operations of a system or process.

Level 3

Because DFDs are intended to be accessible and easy to understand, it is unusual to go beyond the intricacy of level 2. However, highly complex systems might require the elaborate detail of a level 3 DFD, which maps every single aspect of a data process or system.
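
One way to picture the leveling described above is as a tree of processes, where each deeper level replaces a process with its subprocesses. The Python sketch below is a minimal illustration under that assumption. The `Process` class, the `processes_at_level` function and the insurance claim process names are hypothetical, and the sketch tracks only processes, not data flows or data stores.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    subprocesses: list["Process"] = field(default_factory=list)


def processes_at_level(root: Process, level: int) -> list[str]:
    """Return the process names shown when the diagram is drawn at `level`.

    Level 0 shows only the root process. Each deeper level replaces a
    process with its subprocesses; a process that has no subprocesses
    is carried over unchanged.
    """
    current = [root]
    for _ in range(level):
        expanded = []
        for proc in current:
            expanded.extend(proc.subprocesses or [proc])
        current = expanded
    return [p.name for p in current]


# Hypothetical decomposition of an insurance claim system.
claims = Process("Handle insurance claim", [
    Process("Receive claim"),
    Process("Evaluate claim", [
        Process("Check policy coverage"),
        Process("Investigate claim"),
    ]),
    Process("Settle claim"),
])

for level in range(3):
    print(level, processes_at_level(claims, level))
```

In this example, level 0 lists the single context-level process, level 1 lists its 3 subprocesses and level 2 expands only "Evaluate claim" because the other processes have no further breakdown.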

Rules for data flow diagrams

Most data flow diagrams follow the same basic rules:

  • Each data flow is labeled with brief, descriptive text that identifies the type of data being moved.

  • Each process is labeled with a brief verb phrase that describes the data transformation being performed.

  • Each data store is labeled with a noun or noun phrase that describes the data and storage type.

  • Every process and data store has a minimum of one input and one output.

  • Data stores cannot be connected directly to external entities. An external entity can exchange data with a data store only through a process.

  • For clarity, data flows do not cross one another.
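
Several of these rules are mechanical enough to check automatically. The following Python sketch assumes a diagram is given as a dictionary of node names to node kinds plus a list of `(source, target, label)` tuples. That representation, the `check_rules` function and the example names are assumptions made for illustration. The sketch covers only the labeling of data flows, the input and output requirement, and the ban on direct links between external entities and data stores. It does not check how processes and data stores are worded or whether flows cross on the page.

```python
from collections import defaultdict

EXTERNAL, PROCESS, STORE = "external entity", "process", "data store"


def check_rules(nodes: dict[str, str], flows: list[tuple[str, str, str]]) -> list[str]:
    """Return a list of rule violations; an empty list means none were found.

    nodes maps a node name to its kind; flows are (source, target, label).
    """
    problems = []
    inputs, outputs = defaultdict(int), defaultdict(int)

    for source, target, label in flows:
        outputs[source] += 1
        inputs[target] += 1
        # Each data flow needs descriptive text.
        if not label.strip():
            problems.append(f"flow {source} -> {target} has no label")
        # External entities and data stores may only meet through a process.
        if {nodes[source], nodes[target]} == {EXTERNAL, STORE}:
            problems.append(
                f"{source} -> {target} links an external entity directly to a data store"
            )

    # Every process and data store needs at least one input and one output.
    for name, kind in nodes.items():
        if kind in (PROCESS, STORE):
            if inputs[name] == 0:
                problems.append(f"{kind} '{name}' has no input")
            if outputs[name] == 0:
                problems.append(f"{kind} '{name}' has no output")
    return problems


# Hypothetical diagram with deliberate mistakes.
nodes = {"Customer": EXTERNAL, "Verify payment": PROCESS, "Orders": STORE}
flows = [
    ("Customer", "Verify payment", "card details"),
    ("Verify payment", "Orders", "verified order"),
    ("Customer", "Orders", ""),  # unlabeled and bypasses any process
]

for problem in check_rules(nodes, flows):
    print(problem)
```

Run on this example, the check reports the unlabeled flow, the direct link from "Customer" to "Orders" and the missing output on the "Orders" data store.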