IBM Support

How to split large files into multiple smaller files to overcome size restrictions on file transfer

How To


Summary

Log files can be large, ranging from hundreds of megabytes to several gigabytes.
If uploads are subject to a size limit, we can split a large file into multiple smaller chunks. Users can then transfer the data by uploading several smaller files instead of one large file. On the receiving end, the user downloads the chunk files and re-assembles them into the original file.

Objective

To enable the transfer (upload or download) of large files by splitting them into multiple files of smaller size.

Environment

Linux Platform

Steps

Follow these steps to split a file into smaller chunks, transfer the chunks to the IBM Support case, and then re-assemble the file on the evidence system. Sometimes the Support team might need to share a large file with the customer. The same procedure applies in that direction: split the file and upload the parts to send them to the customer.

Step 1: Check the size of the log file.

$ du -sh collect-20250620-083723.tar.gz
640M collect-20250620-083723.tar.gz

$ ls -l collect-20250620-083723.tar.gz
-rw-r--r--. 1 root root 670202803 Jun 20 05:02 collect-20250620-083723.tar.gz
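To decide whether a file needs to be split at all, compare its exact size in bytes (the value that ls -l reports) with the upload limit. The sketch below assumes bash and GNU coreutils; the helper name needs_split, the 100M limit, and the stand-in file are hypothetical and only illustrate the comparison.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: exits 0 if FILE is larger than LIMIT bytes.
needs_split() {
  local file=$1 limit=$2 size
  # wc -c prints the byte count; $(( )) drops any padding spaces.
  size=$(( $(wc -c < "$file") ))
  echo "$file: $size bytes (limit: $limit bytes)"
  [ "$size" -gt "$limit" ]
}

# Assumed upload limit of 100M (104857600 bytes); replace with the real limit.
limit=$(( 100 * 1024 * 1024 ))

# Demo on a 5000-byte stand-in file instead of a real log bundle.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
head -c 5000 /dev/zero > "$tmpdir/sample.bin"

if needs_split "$tmpdir/sample.bin" "$limit"; then with_100m_limit=split; else with_100m_limit=ok; fi
if needs_split "$tmpdir/sample.bin" 4096; then with_4k_limit=split; else with_4k_limit=ok; fi
echo "100M limit: $with_100m_limit, 4096-byte limit: $with_4k_limit"
```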

Step 2: Calculate the checksum of the file. Record this value; it is needed later to validate the re-assembled file.

$ sha256sum collect-20250620-083723.tar.gz 
9a3a717084a83efe0d8d61fa814dd09da0c6e1ed13be9ee3bd8e1b49d701b9e0  collect-20250620-083723.tar.gz
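One way to share the checksum without copy-and-paste errors is to save the sha256sum output line to a small text file and attach it to the case. The sketch below assumes GNU sha256sum; the file names are hypothetical stand-ins created in a temporary directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# Stand-in for the real log bundle (hypothetical name and content).
printf 'sample log data\n' > sample-bundle.bin

# Save the checksum line ("<hash>  <file name>") to a file that can be
# attached to the Support case together with the data.
sha256sum sample-bundle.bin > sample-bundle.bin.sha256
cat sample-bundle.bin.sha256

# The saved line can be re-checked later; prints "sample-bundle.bin: OK".
sha256sum -c sample-bundle.bin.sha256
```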

Step 3: Split the file based on size

We can use the Linux 'split' command to split files by a specific size. 

The '-b' option splits the file into chunks of the specified number of bytes.

Syntax:

$ split -b <SIZE> <FILENAME> <PREFIX>
  • The <SIZE> argument is an integer with an optional unit (for example, 10K is 10*1024 bytes). Units are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (powers of 1000).
  • The <FILENAME> argument is the path to the large file to be split.
  • The <PREFIX> argument is the prefix for the names of the output pieces, which are written to PREFIXaa, PREFIXab, and so on. If no prefix is provided, the default prefix is 'x'.
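The difference between binary units (K = 1024) and decimal units (KB = 1000) changes the chunk size. The sketch below assumes GNU split, which accepts both suffix styles. It splits a hypothetical 2000-byte stand-in file both ways so the piece sizes can be compared.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# 2000-byte stand-in file.
head -c 2000 /dev/zero > sample.bin

# 1K = 1024 bytes (binary unit): pieces of 1024 and 976 bytes.
split -b 1K sample.bin k-part
# 1KB = 1000 bytes (decimal unit): two pieces of 1000 bytes each.
split -b 1KB sample.bin kb-part

# Print the size of every piece (plus a total line).
wc -c k-part* kb-part*
```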

Example: split the file collect-20250620-083723.tar.gz into 100M chunks, with the output files prefixed with 'collect-20250620-083723.tar.gz-part'.

$ split -b 100M collect-20250620-083723.tar.gz collect-20250620-083723.tar.gz-part
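The same command form can be tried safely on a scaled-down stand-in before touching a real bundle. This sketch assumes bash and GNU coreutils. It uses a hypothetical 670203-byte random file and 100K chunks, and confirms that the piece sizes add up to the original size.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# Scaled-down stand-in: 670203 random bytes instead of 670202803.
head -c 670203 /dev/urandom > sample.tar.gz

# Same command form as the example, with 100K (102400-byte) chunks.
split -b 100K sample.tar.gz sample.tar.gz-part

# List the pieces and count them.
ls -l sample.tar.gz-part*
part_files=(sample.tar.gz-part*)
parts=${#part_files[@]}

# The piece sizes must add up to the original size.
total=0
for f in "${part_files[@]}"; do
  total=$(( total + $(wc -c < "$f") ))
done
echo "parts=$parts total=$total original=$(( $(wc -c < sample.tar.gz) ))"
```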

Step 4: List the output files after the split

$ ls -l collect-20250620-083723.tar.gz-part*
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partaa
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partab
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partac
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partad
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partae
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partaf
-rw-r--r--. 1 root root  41057203 Jun 20 08:59 collect-20250620-083723.tar.gz-partag

Notice that the original file size is 640M. The split command splits the file into 6 parts of 100M (104857600 bytes each), and the seventh and last part holds the remaining data, about 40M (41057203 bytes). The split files are named in sequence: the <PREFIX> followed by aa, ab, ac, and so forth.
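The part count and the size of the last part follow from integer division of the original size (670202803 bytes, from Step 1) by the chunk size (100M = 104857600 bytes). This is a small arithmetic sketch in bash using only the numbers from this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

size=670202803                    # original size from "ls -l" in Step 1
chunk=$(( 100 * 1024 * 1024 ))    # 100M = 104857600 bytes

full=$(( size / chunk ))                 # number of full 100M parts
last=$(( size % chunk ))                 # bytes left over for the final part
parts=$(( (size + chunk - 1) / chunk ))  # total parts, rounded up

# Prints: full parts: 6, last part: 41057203 bytes, total parts: 7
echo "full parts: $full, last part: $last bytes, total parts: $parts"
```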

Step 5: Calculate the checksums of the split file parts

Calculate these checksums on the source system where the file is split. Share the checksum output with the Support team on the Support case; it is required to validate the files after download and to confirm that nothing was damaged during the transfer.

$ sha256sum collect-20250620-083723.tar.gz-part*
9b0dc4a83738a7fc5941281822521bc7952f2db9097e66aa7d0dd85097d35a01  collect-20250620-083723.tar.gz-partaa
e2aa6e79092bb479524fd957f7682cceb98dc31a100ab71f014e430bfa7ffa09  collect-20250620-083723.tar.gz-partab
002c9cadeecdd86351dd720bff92bfa8f485548dfea5bc5de56e82b3cad1cb21  collect-20250620-083723.tar.gz-partac
779dba2cbe9fefe5eb7590f74698c5e3f8f2860330fccf1a8149e0d2b451b71c  collect-20250620-083723.tar.gz-partad
5d20c4b4cf16bd0f3d67d08fa88fda0c145a5b01be34f33b1c57aa1cef07202d  collect-20250620-083723.tar.gz-partae
66dd1a9a9aa4b2732b14e2e649eacf519e8b7aeae3b70460038be2a83e187a0a  collect-20250620-083723.tar.gz-partaf
a9ba56810cf462cd73f7f935e5e8ff972820908bf8751e58deb1ed9d90d5e241  collect-20250620-083723.tar.gz-partag
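Saving the part checksums (and the checksum of the original file) to text files makes them easy to attach to the case and to verify automatically on the other side. The sketch below assumes bash and GNU coreutils; all file names are hypothetical. The manifest name is chosen so that it does not start with the part prefix, because otherwise the wildcard used in Step 8 would also match it.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# Stand-in data: 300000 bytes split into 100K pieces gives 3 parts.
head -c 300000 /dev/urandom > sample.bin
split -b 100K sample.bin sample.bin-part

# One "<hash>  <name>" line per part, in the shell's sorted glob order.
# The manifest name deliberately does NOT start with "sample.bin-part",
# so a later "cat sample.bin-part*" cannot pick it up by mistake.
sha256sum sample.bin-part* > sample.bin.parts.sha256

# Checksum of the original file, to validate the re-assembled file later.
sha256sum sample.bin > sample.bin.sha256

cat sample.bin.parts.sha256 sample.bin.sha256
```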

Step 6: Upload or transfer the files via SFTP, HTTPS, or any other supported method (choose the preferred method based on the type of case handling).

See: Enhanced Customer Data Repository (ECuRep) - Send data

Step 7: Download and verify the checksums of the file parts

The checksums must match the values shared in Step 5. If any checksum does not match, ask the customer to re-upload the file parts whose checksums do not match.
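If the part checksums were shared as a sha256sum output file, the receiver can check all parts in one command and get just the names of the damaged parts. The sketch below assumes bash and GNU coreutils (sha256sum -c with --quiet); the file names are hypothetical, and the damaged part is simulated by overwriting one byte.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# Sender side (simulated): split a 300000-byte stand-in file and record part checksums.
head -c 300000 /dev/zero > sample.bin
split -b 100K sample.bin sample.bin-part
sha256sum sample.bin-part* > sample.bin.parts.sha256

# Simulate transfer damage: overwrite one byte of the second part with 'X'.
printf 'X' | dd of=sample.bin-partab bs=1 seek=10 conv=notrunc 2>/dev/null

# Receiver side: with --quiet, sha256sum -c prints only "<name>: FAILED" lines
# (and exits non-zero), so the names of the damaged parts can be extracted.
failed=$(sha256sum -c --quiet sample.bin.parts.sha256 2>/dev/null | sed -n 's/: FAILED$//p' || true)
echo "parts to request again: ${failed:-none}"
```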
Step 8: Re-assemble the split files

You need all parts of the split file to re-assemble the original file by concatenating the parts in their original order. Because split names the parts in sequential order, you do not have to list every file explicitly.

You can concatenate the files by listing each file in the order:

$ cat collect-20250620-083723.tar.gz-partaa collect-20250620-083723.tar.gz-partab collect-20250620-083723.tar.gz-partac collect-20250620-083723.tar.gz-partad collect-20250620-083723.tar.gz-partae collect-20250620-083723.tar.gz-partaf collect-20250620-083723.tar.gz-partag > reassemble-collect-20250620-083723.tar.gz

OR

Specify the file prefix followed by the wildcard character '*', and let the shell supply the file names to 'cat' in order.

$ cat collect-20250620-083723.tar.gz-part* > reassemble-collect-20250620-083723.tar.gz

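In this command, the shell expands "collect-20250620-083723.tar.gz-part*" into an alphabetically sorted list of all files and directories in the current directory whose names start with "collect-20250620-083723.tar.gz-part". Make sure that no other files with this prefix are in the directory, or they will be concatenated too.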
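The ordering claim can be checked directly: print the glob expansion, concatenate with the same wildcard, and compare the result with the original byte for byte. The sketch below assumes bash and GNU coreutils and uses a hypothetical stand-in file that splits into seven parts (aa to ag), as in the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# Stand-in file split into 7 parts (aa .. ag), as in the example.
head -c 670203 /dev/urandom > sample.bin
split -b 100K sample.bin sample.bin-part

# The shell expands the glob in sorted order: sample.bin-partaa, ...ab, ...
parts=(sample.bin-part*)
printf '%s\n' "${parts[@]}"

# Concatenate the parts with the same wildcard form as in Step 8.
cat sample.bin-part* > reassemble-sample.bin

# cmp -s exits 0 only if both files are byte-for-byte identical.
if cmp -s sample.bin reassemble-sample.bin; then echo identical; else echo different; fi
```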

Step 9: Validate that the size and checksum of the re-assembled file match the original file

If all the split parts have matching checksums, the checksum of the re-assembled file is expected to match as well, unless a step in combining them was not done correctly. Make sure the re-assembly step was performed correctly.

$ du -sh reassemble-collect-20250620-083723.tar.gz
640M reassemble-collect-20250620-083723.tar.gz

$ ls -l reassemble-collect-20250620-083723.tar.gz
-rw-r--r--. 1 root root 670202803 Jun 20 09:18 reassemble-collect-20250620-083723.tar.gz

$ sha256sum reassemble-collect-20250620-083723.tar.gz
9a3a717084a83efe0d8d61fa814dd09da0c6e1ed13be9ee3bd8e1b49d701b9e0  reassemble-collect-20250620-083723.tar.gz

If the checksums of all split files are correct, the checksum of the re-assembled file must match the checksum of the original file. If it does not match the checksum provided by the customer, one of the steps was probably not performed correctly. Validate each step again.
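The receiver usually has only the size and checksum values reported by the sender (Steps 1 and 2), not the original file. The sketch below compares a re-assembled file against those two values. It assumes bash and GNU coreutils; the helper name check_reassembled and all file names are hypothetical, and the "incomplete" file simulates a missing part.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compare FILE with the size and SHA-256 shared by the sender.
check_reassembled() {
  local expected_size=$1 expected_sum=$2 file=$3 size sum
  size=$(( $(wc -c < "$file") ))
  # Reading stdin makes sha256sum print "<hash>  -"; keep only the hash.
  sum=$(sha256sum < "$file" | cut -d' ' -f1)
  if [ "$size" -eq "$expected_size" ] && [ "$sum" = "$expected_sum" ]; then
    echo "$file: OK"
  else
    echo "$file: MISMATCH (size $size, sha256 $sum)"
    return 1
  fi
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# Simulated sender values (in practice: the output of Steps 1 and 2).
head -c 300000 /dev/urandom > sample.bin
orig_size=$(( $(wc -c < sample.bin) ))
orig_sum=$(sha256sum < sample.bin | cut -d' ' -f1)

split -b 100K sample.bin sample.bin-part
cat sample.bin-part* > reassemble-sample.bin
# Simulate a re-assembly mistake: the last part (sample.bin-partac) is left out.
cat sample.bin-partaa sample.bin-partab > incomplete-sample.bin

if check_reassembled "$orig_size" "$orig_sum" reassemble-sample.bin; then good=pass; else good=fail; fi
if check_reassembled "$orig_size" "$orig_sum" incomplete-sample.bin; then bad=pass; else bad=fail; fi
```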

Step 10: The file is now available to be reviewed

Review and extract the file contents. If there is a problem, you might have to start again from Step 1 to confirm that the source file itself is not the problem.
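For a .tar.gz bundle like the one in this example, the archive can be tested before extraction: gzip -t checks the compressed stream, and tar -t lists the members without writing anything. This sketch assumes GNU tar and gzip; other archive formats need their own test. The archive and its contents are hypothetical stand-ins, and the truncated copy simulates a bundle re-assembled with a part missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# Build a small stand-in archive.
mkdir logs
printf 'line 1\n' > logs/app.log
tar -czf bundle.tar.gz logs

# gzip -t checks the compressed stream; tar -t lists members without extracting.
if gzip -t bundle.tar.gz && tar -tzf bundle.tar.gz > /dev/null; then
  result=readable
else
  result=damaged   # start again from Step 1
fi
echo "bundle: $result"

# A truncated copy (as if a part were missing) fails the same test.
head -c 20 bundle.tar.gz > truncated.tar.gz
if gzip -t truncated.tar.gz 2>/dev/null; then truncated=readable; else truncated=damaged; fi
echo "truncated copy: $truncated"
```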

Additional Information

Important Considerations:
  • It is important to share the checksum of the original file.
  • All the split parts of the file must be available before re-assembly.
  • If there is a problem with the upload or download of the split parts, then the re-assembled file might be corrupted and the corrupted parts might need to be re-collected. 
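Before re-assembly, the part checksum file from Step 5 can double as a list of the parts that must be present. The sketch below reads the file names from that list and reports any that are missing. It assumes bash, GNU coreutils, and the default sha256sum output layout (64 hex characters, two separator characters, then the file name, so the name starts at column 67); all file names are hypothetical and the missing part is simulated.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# Sender side (simulated): split a stand-in file and record the part list.
head -c 300000 /dev/zero > sample.bin
split -b 100K sample.bin sample.bin-part
sha256sum sample.bin-part* > sample.bin.parts.sha256

# Simulate a part that was never transferred.
rm sample.bin-partac

# Receiver side: each line is "<64-char hash>  <file name>",
# so the file name starts at column 67.
missing=""
while IFS= read -r name; do
  [ -e "$name" ] || missing="$missing $name"
done < <(cut -c 67- sample.bin.parts.sha256)
missing=${missing# }

echo "missing parts: ${missing:-none}"
```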

Document Location

Worldwide

Product: IBM SevOne Network Performance Management (Line of Business: Automation Platform; Business Unit: IBM Software); Category: Collection; Platform: Platform Independent; Version: All Versions

Document Information

Modified date: 24 June 2025

UID: ibm17231959