During cloud migrations, we often need to migrate or transfer files (typically unstructured data) from on-premises storage (SAN/NAS) to a specific storage service in AWS (e.g., EBS/EFS/S3/FSx). These can be files generated by the application, user uploads, integration files that are created by one application and consumed by others (B2B), etc. In most cases, this unstructured data ranges in total size from a few MB to 1 TB and, most importantly, the underlying application is not expected to undergo much remediation to use the target AWS service.
In this blog post, we share our experience with three specific use cases around unstructured data migration to AWS:
Application A picks up incoming files from Application X, processes them and generates data files of 50–300 GB. These files then become the input for Application Y. The data is shared through NFS storage accessible to all three applications.
Application A is being migrated to AWS, while Applications X and Y remain on-premises. We used Amazon Elastic File System (EFS) to replace NFS on AWS. However, that makes it difficult for the applications to read/write from a common storage solution, and network latency slows down Application X and Application Y.
In this case, we used the AWS DataSync service to perform the initial migration of nearly 1 TB of data from the on-premises NFS storage to EFS.
AWS DataSync can transfer data between a range of network storage and object storage systems. These include Network File System (NFS) shares, Server Message Block (SMB) file servers, Hadoop Distributed File System (HDFS) clusters, self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems and Amazon FSx for OpenZFS file systems.
To give the applications a common storage view and avoid the network latency of read/write operations across Direct Connect, we scheduled a regular synchronization of the specific input and output folders between the NFS and EFS using the AWS DataSync service. This means that all three applications see the same set of files after each sync completes.
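As a rough illustration of such a scheduled folder sync, the sketch below builds the arguments for the boto3 DataSync `create_task` call. It is not the configuration we ran: the task name, the folder paths, the hourly cron expression and the helper names are all assumptions, and the location ARNs must come from DataSync locations you have already created. One task covers one direction, so syncing both input and output folders would need a task per direction.

```python
from typing import Any


def build_sync_task_kwargs(
    source_location_arn: str,
    destination_location_arn: str,
    folders: list[str],
    schedule_expression: str = "cron(0 * * * ? *)",  # assumed: once an hour
) -> dict[str, Any]:
    """Build keyword arguments for the boto3 DataSync create_task call."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": "nfs-efs-folder-sync",  # hypothetical task name
        "Schedule": {"ScheduleExpression": schedule_expression},
        # Include filters take a single pattern string, with folders joined by "|".
        "Includes": [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(folders)}
        ],
        # Copy only data that differs between source and destination.
        "Options": {"TransferMode": "CHANGED"},
    }


def create_sync_task(kwargs: dict[str, Any]) -> str:
    """Create the task in AWS; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    response = boto3.client("datasync").create_task(**kwargs)
    return response["TaskArn"]
```

Keeping the argument builder separate from the AWS call makes the filter and schedule easy to review before anything is created in the account.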
Another application, Application B, had to process a lot of unstructured data from an FTP location. These files were transferred to the application server through SFTP by dependent applications. Since this application was moved to AWS, the dependent applications also had to transfer these files to a storage location in AWS.
AWS Transfer Family provides options to transfer your unstructured data to an S3 bucket or EFS storage using the SFTP, FTPS or FTP protocols. It integrates with any standard FTP client (GUI- or CLI-based) and thus allows you to transfer your data from on-premises to AWS. As a managed service with built-in autoscaling, it can be deployed across up to three Availability Zones for high availability and resiliency.
Private VPC endpoints are available to securely transfer data within the internal network.
AWS Transfer Family can also be used for a one-time data migration for B2B Managed File Transfer.
We used an EFS mount on the application server and directed the other dependent applications to use the AWS Transfer Family SFTP private endpoint to send the files securely. The authentication was handled via SSH Key Pair so that there is no hardcoded username/password in either location. This way, we do not expose the application server over SSH port 22, which was a client-mandated security control.
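From a dependent application's side, a key-based upload to the SFTP endpoint could look roughly like the sketch below. It only assembles an OpenSSH `sftp` batch file and command line. The key path, user name, endpoint host name, file paths and helper names are hypothetical, and paths are assumed to contain no spaces.

```python
def build_sftp_batch(local_files: list[str], remote_dir: str) -> str:
    """Return the text of an sftp batch file that uploads the given files."""
    lines = [f"cd {remote_dir}"]
    lines += [f"put {path}" for path in local_files]
    lines.append("bye")
    return "\n".join(lines) + "\n"


def build_sftp_command(
    key_path: str, user: str, endpoint_host: str, batch_file: str
) -> list[str]:
    """Return the sftp invocation; -i selects the private key, -b the batch file."""
    return ["sftp", "-i", key_path, "-b", batch_file, f"{user}@{endpoint_host}"]


# Hypothetical values, for illustration only.
batch_text = build_sftp_batch(["/data/out/report.csv"], "/inbound")
command = build_sftp_command(
    "/home/appuser/.ssh/transfer_key",
    "appuser",
    "sftp-endpoint.example.internal",
    "/tmp/upload.batch",
)
```

Writing `batch_text` to the batch file and running `command` (for example with `subprocess.run(command, check=True)`) would perform the upload without any stored password.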
It was very easy to set up and get going because our application was running on Linux.
However, FSx is not a supported target storage option, so AWS Transfer Family best suits use cases where the target application is hosted on Linux. If a Windows-based application must consume these managed services, some additional programming is needed to access an S3 bucket.
The service costs a fixed $0.30 per hour while it is enabled, plus $0.04 per gigabyte (GB) of data uploaded or downloaded.
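To make the pricing concrete, here is a small estimate using only the two rates quoted above. The 30-day month and the 500 GB volume are made-up inputs, and the function name is ours; actual AWS pricing may vary by region and over time.

```python
HOURLY_RATE_USD = 0.30  # fixed charge per hour while the service is enabled
PER_GB_RATE_USD = 0.04  # charge per GB uploaded or downloaded


def estimate_transfer_family_cost(hours_enabled: float, gb_transferred: float) -> float:
    """Estimate the charge in USD from the two rates quoted in this post."""
    cost = hours_enabled * HOURLY_RATE_USD + gb_transferred * PER_GB_RATE_USD
    return round(cost, 2)


# Assumed inputs: endpoint enabled for a 30-day month (720 hours), 500 GB moved.
monthly_cost = estimate_transfer_family_cost(hours_enabled=720, gb_transferred=500)
print(f"Estimated monthly cost: ${monthly_cost:.2f}")
```

With these inputs, the fixed hourly charge (720 × $0.30 = $216) far outweighs the transfer charge (500 × $0.04 = $20), so for small data volumes the cost depends mostly on how long the endpoint stays enabled.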
Application C read and wrote a lot of data on a native file system, and this data was needed in AWS once the application was migrated. The data could not be migrated as-is to EBS volumes or EFS storage, because the AWS native file/data transfer solutions require both source and target to be network file storage.
While we could have presented the native file system as an NFS share and used AWS DataSync as in the first scenario, this would have required additional installation and configuration on the source servers, which is usually not desired during migrations.
We used traditional tools like rsync/robocopy to copy data to AWS storage such as EFS (mounted on EC2) or EBS volumes.
We used a shell script based on rsync to pull data from an on-premises server to the EC2 instance. Because the EC2 instance initiates the connection, this respects the security mandate not to expose EC2 instances on SSH port 22. Thanks to rsync's incremental transfers and the good bandwidth available over Direct Connect, the data migration was seamless.
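The pull pattern can be sketched as follows. Our script was a shell script; this Python version is a stand-in that only assembles the rsync command, and the host name, user, key path, directories and chosen flags are assumptions rather than what we ran.

```python
import subprocess


def build_rsync_pull_command(
    ssh_key: str,
    source_user: str,
    source_host: str,
    source_dir: str,
    target_dir: str,
) -> list[str]:
    """Build an rsync command that pulls source_dir from a remote host over SSH."""
    return [
        "rsync",
        "-az",        # archive mode (permissions, times, recursion) plus compression
        "--partial",  # keep partially transferred files so a rerun can continue
        "-e", f"ssh -i {ssh_key}",  # outbound SSH from EC2 to the source server
        # Trailing slash: copy the directory's contents, not the directory itself.
        f"{source_user}@{source_host}:{source_dir.rstrip('/')}/",
        target_dir,
    ]


def run_pull(command: list[str]) -> None:
    """Run the pull on the EC2 instance; raises if rsync exits non-zero."""
    subprocess.run(command, check=True)


# Hypothetical values, for illustration only.
command = build_rsync_pull_command(
    "/home/ec2-user/.ssh/migration_key",
    "migrate",
    "onprem-app01.example.internal",
    "/data/app",
    "/mnt/efs/app",
)
```

Because the command runs on the EC2 instance and connects outbound to the on-premises server, no inbound SSH rule is needed on the AWS side.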
While rsync/robocopy is a good fit for the above problem, it may not be suitable if the following characteristics are exhibited by the application and the environment:
There are no ingress charges; egress to the internet or on-premises costs $0.08 to $0.12 per gigabyte (GB).
In this post, we discussed very common use cases in data migrations to AWS Cloud and how native and traditional tools are used to tackle some unique situations. To summarize our experience, a quick comparison of these tools is depicted below:
We did not discuss the option of using AWS Snow Family because it was not feasible in these scenarios. It requires physical access to a data center and is only appropriate for transferring very large data sets (many TBs); our data was not that large in any of the above use cases.
Similarly, AWS Storage Gateway was not considered, as it is ideal for on-premises backup/archival/DR scenarios and none of the use cases had that requirement.
There are managed services available on AWS for data migrations, and each of them caters to a very specific set of use cases.
We will continue to share our experience as we encounter new scenarios for transferring or storing unstructured data in AWS.