Why big data overwhelms enterprise networks

Finding a better big data transport solution

Today, businesses across industries are facing greater challenges moving large files and massive sets of data quickly and reliably between global sites and teams.

Failing to meet these challenges can limit an organization’s ability to meet critical business imperatives that yield increased revenues, reduced costs, improved customer service, and new or improved business models.

Big data movement can include virtually any number of use cases, such as quickly sending a patient’s genomic sequencing data to a medical expert across the world for critical analysis or securely uploading massive volumes of new video content to online media providers so subscribers have access to the latest movies, music and TV shows.

As the size and volume of data continue to explode and permeate more business processes and decisions, the speed at which data moves over the WAN becomes more crucial. However, most enterprise tools in use today cannot reliably and securely move large files and data volumes at high speed over global distances. This is due to the inherent limitations of the Internet’s underlying transfer technology, the Transmission Control Protocol (TCP).

The great TCP bottleneck

The Internet Protocol is the ubiquitous communications protocol for the Internet, serving as the primary means of relaying data across networks that essentially form the Internet.

Because it is more reliable than other transport protocols, TCP is the most commonly used transport-layer protocol over IP, creating connections between specific network systems and applications.

Originally developed in the early 1970s when networks were local and data files were small, TCP served as the underlying protocol that enabled data to move efficiently and reliably over LANs with very little bandwidth. To provide this reliability, TCP guarantees that application data transmitted within a single TCP connection is delivered to the application at the receiver in the same order in which it was sent. To do so, TCP uses an algorithm that establishes a connection between sender and receiver, serially creates transfer requests, and then splits the data into small packets. These packets are sent individually over the network without waiting for acknowledgment until the sender reaches a defined limit of unacknowledged packets (referred to as the TCP window).

Figure 1 below is a simplified representation of the TCP algorithm sending small packets over the network, showing how the TCP window controls how much data can be transmitted without acknowledgement. As packets are delayed, TCP slows the rate of transmission by using a smaller window size, which further limits the number of unacknowledged packets allowed.


Figure 1. Diagram showing how TCP waits to send each data packet based on receiving acknowledgement from the receiving side.

If an acknowledgement for a packet is not received in time, the protocol retransmits the lost or delayed packet. At the same time, it slows the packet send rate to ensure reliable, in-order delivery and to minimize network congestion.

The TCP transfer algorithm proves efficient when data movement occurs over shorter distances and when networks are not congested. But when round-trip time (RTT) of a packet transfer inevitably increases with distance, performance and efficiency suffer.
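The effect of RTT on a window-limited sender can be sketched with simple arithmetic: if at most one window of data can be unacknowledged at a time, throughput cannot exceed the window size divided by the RTT. The short Python sketch below illustrates this under stated assumptions. The function name, the 64 KiB window and the sample RTT values are hypothetical and chosen only for illustration; real TCP stacks negotiate and adjust window sizes dynamically.

```python
def window_limited_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput when at most one window is in flight per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    # One window per RTT, converted from bits per second to megabits per second.
    return bits_per_round_trip / (rtt_ms / 1000.0) / 1e6


if __name__ == "__main__":
    window = 64 * 1024  # assumed 64 KiB window, for illustration only
    # Sample RTT values are assumptions, not measurements.
    for label, rtt in [("LAN", 1), ("cross-country", 50), ("intercontinental", 200)]:
        rate = window_limited_throughput_mbps(window, rtt)
        print(f"{label:>16} RTT {rtt:>3} ms: {rate:8.1f} Mbps")
```

With these assumed values the bound falls from roughly 524 Mbps at 1 ms to about 10.5 Mbps at 50 ms and 2.6 Mbps at 200 ms, even though nothing about the link capacity has changed.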

Have you ever noticed that your average upload and download speeds over the Internet often fail to closely match your available bandwidth? There is a reason for this. Under ideal conditions, a network doesn’t lose data in transit. In real-world conditions, however, it’s common for packets traversing a network to occasionally drop, especially when moving large volumes of data at high speed. In the era of big data and cloud-based applications and storage, wide area networks (WANs) have become increasingly burdened with huge files and massive volumes of data, which can increase packet loss. This can have several causes, including oversubscription, in which network nodes such as routers along the transfer path drop packets because they arrive faster than the nodes can process them.

Most network resources along a transfer path today are typically shared among multiple applications and systems, so it’s not possible to provide all of the available bandwidth all the time to a single transfer. This is especially true when very large, high-speed transfers overload network resources for long spans of time. When the buffers on a network device reach capacity and cause packets to drop, TCP aggressively throttles back in an attempt to reduce congestion and gradually ramps back up at a much slower rate. In these circumstances, TCP intrinsically couples reliability (retransmission) with congestion control, which creates a severe artificial throughput penalty for file transport. This extreme throttling down and gradual transfer rate increase severely impacts transfers of large files and data sets.
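The sharp throttling and slow recovery described above resemble the additive-increase, multiplicative-decrease (AIMD) behavior of classic TCP congestion avoidance. The following Python sketch is a deliberately simplified model, not a faithful TCP implementation: it ignores slow start, timeouts and fast recovery, and the function name and parameters are hypothetical.

```python
def simulate_aimd(rounds, loss_rounds, start_window=1.0, increase=1.0, decrease=0.5):
    """Return the congestion window (in packets) at the start of each round trip.

    Additive increase: the window grows by `increase` packets per loss-free round.
    Multiplicative decrease: the window is multiplied by `decrease` after a lossy round.
    """
    window = start_window
    history = []
    for rnd in range(rounds):
        history.append(window)
        if rnd in loss_rounds:
            # A single loss cuts the window sharply (never below one packet).
            window = max(1.0, window * decrease)
        else:
            # Recovery is slow: one extra packet per round trip.
            window += increase
    return history


if __name__ == "__main__":
    # Assumed scenario: 10 round trips with a single loss in round 5.
    print(simulate_aimd(10, {5}))
```

In this toy scenario a single loss halves the window from 6 to 3 packets, and it takes three further round trips to climb back. Because each step of the recovery costs one full RTT, the same loss is far more expensive on a long-distance path.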

What’s behind the penalty of distance

Whether over copper wires, optical fiber or wireless radio signals, there is a baseline amount of time it takes for data to physically travel between any two endpoints in a network.

RTT is the time it takes to send data from the origination point to the destination point plus the time it takes for the delivery acknowledgment to return. RTT increases as distance and congestion grow along the transfer path, as more intermediary nodes are introduced and as queueing delays increase.
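The baseline component of RTT can be estimated from distance alone. The Python sketch below assumes signals travel through optical fiber at roughly two-thirds the speed of light in a vacuum (about 200,000 km per second) and that the path length equals the stated distance. Both are simplifying assumptions, the path lengths are hypothetical, and real routes add queueing and processing delay on top of this floor.

```python
SPEED_IN_FIBER_KM_PER_S = 200_000.0  # assumption: about 2/3 the speed of light in a vacuum


def min_rtt_ms(path_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    if path_km < 0:
        raise ValueError("path_km must be non-negative")
    # Out and back, converted from seconds to milliseconds.
    return 2.0 * path_km / SPEED_IN_FIBER_KM_PER_S * 1000.0


if __name__ == "__main__":
    # Path lengths are illustrative assumptions, not measured routes.
    for label, km in [("metro", 50), ("continental", 4_000), ("intercontinental", 10_000)]:
        print(f"{label:>16} ({km:>6} km): at least {min_rtt_ms(km):6.1f} ms")
```

Under these assumptions a 4,000 km path has an RTT floor of about 40 ms and a 10,000 km path about 100 ms, before any congestion or intermediary nodes are taken into account.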

RTT plays a significant role in how TCP determines both the amount of unacknowledged data that can stay in flight between endpoints (the TCP window) and the rate at which new packets are sent over the network. When data is transferred over longer distances on large-capacity networks, more data is in flight because RTT is higher. These delays, combined with greater amounts of data in flight, trigger TCP to severely decrease the transfer rate, after which it takes a substantial amount of time to recover to the speed it had reached before throttling occurred. Figure 2 displays the impact of RTT and packet loss on TCP throughput. It shows the maximum throughput achievable under various packet loss and network latency conditions for conventional TCP-based transfer technologies. The throughput has a hard theoretical limit that depends only on the network RTT and the packet loss.


Figure 2. Graph showing how TCP-based transfer throughput is severely degraded by distance and network conditions.
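One widely cited approximation of this kind of limit, often attributed to Mathis et al., bounds steady-state TCP throughput by (MSS / RTT) × (C / √p), where MSS is the maximum segment size, p is the packet loss rate and C is a constant close to 1.22. Whether Figure 2 was produced from this exact formula is an assumption. The Python sketch below simply evaluates the approximation; the function name and the default MSS of 1,460 bytes are illustrative choices.

```python
import math


def mathis_throughput_mbps(rtt_ms, loss_rate, mss_bytes=1460, c=math.sqrt(1.5)):
    """Approximate upper bound on steady-state TCP throughput, in Mbps.

    Evaluates (MSS / RTT) * (C / sqrt(p)); the default C = sqrt(3/2) is about 1.22.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    rate_bps = (mss_bytes * 8 / (rtt_ms / 1000.0)) * (c / math.sqrt(loss_rate))
    return rate_bps / 1e6


if __name__ == "__main__":
    # Sample RTT and loss values are assumptions chosen for illustration.
    for rtt in (10, 100, 300):
        for loss in (0.001, 0.01):
            rate = mathis_throughput_mbps(rtt, loss)
            print(f"RTT {rtt:>3} ms, loss {loss:.1%}: about {rate:7.2f} Mbps")
```

For example, at 100 ms RTT and 1% packet loss the approximation gives roughly 1.43 Mbps. Link capacity does not appear in the formula at all, which is consistent with the point that adding bandwidth does not help once RTT and loss dominate.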

The inherent design limitation of TCP over WANs creates a paradox—as bandwidth capacity is added to the network, utilization efficiency decreases dramatically. If a business increases FTP traffic in proportion to added bandwidth capacity, it will see a decline in ROI on additional network bandwidth. With ever-growing file sizes and file volumes, the impact of TCP performance limitations on global business operations and collaboration between dispersed teams becomes more substantial.

Considering open-source technology alternatives

Alternative transfer technologies have gained in popularity given enterprises’ now urgent demand to transfer, send, share and sync large unstructured files and data sets.

Most of these high-speed transfer tools attempt to recreate the reliability that TCP provides at an application level, while using the User Datagram Protocol (UDP) for network transport.
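To make the idea of application-level reliability concrete, the Python sketch below shows one common building block: a receiver that numbers incoming datagrams and works out which sequence numbers are missing so it can ask the sender to resend only those. This is a generic illustration with hypothetical names. It does not send any network traffic and is not how UDT, Tsunami or Aspera FASP are actually implemented.

```python
class GapTracker:
    """Tracks which sequence numbers have arrived and which are still missing."""

    def __init__(self):
        self.received = set()
        self.highest_seen = -1

    def on_packet(self, seq: int) -> None:
        """Record the arrival of a datagram; UDP may deliver out of order or not at all."""
        self.received.add(seq)
        self.highest_seen = max(self.highest_seen, seq)

    def missing(self) -> list:
        """Sequence numbers below the highest seen that have not arrived yet."""
        return [s for s in range(self.highest_seen + 1) if s not in self.received]


if __name__ == "__main__":
    tracker = GapTracker()
    # Assumed arrival order: packets 2 and 5 are lost, packet 4 arrives late.
    for seq in [0, 1, 3, 6, 4]:
        tracker.on_packet(seq)
    print("request retransmission of:", tracker.missing())
```

Decoupling loss detection from rate control in this way is what distinguishes these tools from TCP. How aggressively the sender then transmits is a separate design decision.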

While open-source alternatives such as UDT and Tsunami may work in very specific and controlled network situations, there is a high cost to network efficiency. The oversimplified design of these protocols means they inefficiently flood the network with data, severely impacting other applications running on the shared network and ultimately providing minimal performance gains.

There is another solution that eliminates the inherent bottlenecks of TCP and existing open-source protocols through an entirely different approach.

A better transfer technology—IBM Aspera FASP

Transporting bulk data with maximum speed calls for an end-to-end approach that fully utilizes available bandwidth from data source to data destination.

Accomplishing high performance along the entire transfer path requires a new and fundamentally different approach to bulk data movement. This approach would need to address the great range of network RTTs, packet loss rates and network bandwidth capacities that are typically found on commodity Internet WAN environments today.

IBM® Aspera® FASP® (Fast, Adaptive and Secure Protocol) is a bulk data transport technology implemented at the application layer that provides secure high-speed transfer while remaining compatible with existing networks and infrastructure. The protocol retransmits only needed data that is not still in flight, for 100% “good data” throughput. Its rate control for universal deployment on shared Internet networks upholds the principles of bandwidth fairness and congestion avoidance in the presence of other network traffic, while providing the option to dedicate bandwidth for high-priority transfers when needed. Figure 3 shows the throughput achieved under various packet loss and network latency conditions using the Aspera FASP protocol. Using Aspera, bandwidth efficiency does not degrade with latency and is highly resilient to packet loss.


Figure 3. Graph showing how Aspera FASP transfers remain unaffected by transfer distance and highly resilient to network conditions.

Aspera FASP technology is a unique, patented technology for transferring large distinct files, streaming data and high-bit-rate video over long-distance networks where packet loss and RTT typically cripple TCP transfers. The adaptive rate control of Aspera FASP allows transfers to quickly ramp up and fully utilize a shared network’s available bandwidth by dynamically detecting network conditions and adjusting the transfer rate as necessary, while still allowing other TCP-based applications to function properly.

With Aspera FASP, users can deliver live video and growing files as well as exchange files and data sets of any size—from many gigabytes to multiple terabytes and larger—quickly and reliably around the world. In addition, the protocol integrates the latest security technologies, practices and auditing technologies to keep your data safe.

Shorten your transfer times from hours to minutes

In the real-time digital world of business, organizations need to access and move large files and data between globally dispersed teams and systems in seconds and minutes, not hours or days.

Figure 4 shows that moving a 10 GB file or data set across the US will take 10-20 hours on a typical 100 Mbps line using standard TCP-based file transfer tools. As the transfer distance extends to Europe and Asia, TCP-based transfer times quickly become impractical and unreliable.

Chart comparing Aspera FASP performance with traditional TCP-based transfer tools over long distances

Figure 4. Examples of 10 GB file transfer times over varying distances and network bandwidths.
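The gap between line rate and achieved rate in the cross-US example can be checked with basic arithmetic. The Python sketch below assumes decimal units (1 GB = 8,000 megabits) and ignores protocol overhead; the function names are hypothetical.

```python
def ideal_transfer_seconds(size_gb: float, link_mbps: float) -> float:
    """Transfer time if the link were fully utilized (decimal units, no overhead)."""
    return size_gb * 8_000 / link_mbps


def effective_mbps(size_gb: float, hours: float) -> float:
    """Average throughput implied by moving `size_gb` in `hours`."""
    return size_gb * 8_000 / (hours * 3600)


if __name__ == "__main__":
    ideal = ideal_transfer_seconds(10, 100)
    print(f"10 GB at a full 100 Mbps: {ideal:.0f} s (about {ideal / 60:.1f} minutes)")
    for hours in (10, 20):
        print(f"10 GB in {hours} hours implies about {effective_mbps(10, hours):.1f} Mbps")
```

At full utilization, 10 GB would cross a 100 Mbps line in about 800 seconds, or roughly 13 minutes. A transfer time of 10 to 20 hours implies an average of only about 1.1 to 2.2 Mbps, a small fraction of the line's capacity.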

As a result, productivity suffers as wait times extend into hours and days. Decisions and actions slow down, and the limited amount of data that can feasibly be moved in time reduces the relevance of the insights gleaned from data analysis.

“With the speed that Aspera FASP offers, you don't need a local ingest site. You can actually transfer data from anywhere in the world.”

— Suresh Bahugudumbi, Senior Manager, NetApp

Hollywood studios, major broadcasters, telecommunications operators, sports leagues, oil and gas companies, life sciences organizations, government agencies and Fortune 500 corporations all face a common challenge: securely and cost-effectively transferring large amounts of data at high speeds for real and near-real time applications. How well they can achieve this goes beyond meeting a given challenge or enabling a single application. It can mean the difference between high ROI and diminishing profits, business success and failure.

Many companies across these and other industries rely on Aspera software for mission-critical transport of their most valuable digital assets—even when they are moving in excess of one terabyte per day over WAN infrastructures.