How To
Summary
Log files can be large, ranging from hundreds of megabytes to multiple gigabytes.
If a file cannot be uploaded because of a size restriction, you can split the large file into multiple smaller chunks and upload those instead of one large file. On the receiving end, the user downloads the chunks and reassembles them into the original file.
Objective
Environment
Steps
Step 1: Check the size of the log file.
$ du -sh collect-20250620-083723.tar.gz
640M collect-20250620-083723.tar.gz
$ ls -l collect-20250620-083723.tar.gz
-rw-r--r--. 1 root root 670202803 Jun 20 05:02 collect-20250620-083723.tar.gz
Step 2: Identify the file checksum.
$ sha256sum collect-20250620-083723.tar.gz
9a3a717084a83efe0d8d61fa814dd09da0c6e1ed13be9ee3bd8e1b49d701b9e0 collect-20250620-083723.tar.gz
Step 3: Split the file based on size.
You can use the Linux 'split' command to split a file into pieces of a specific size.
The '-b' option splits the file into chunks of the given size in bytes.
Syntax:
$ split -b <SIZE> <FILENAME> <PREFIX>
- The <SIZE> argument is an integer with an optional unit (for example, 10K is 10*1024 bytes). Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).
- The <FILENAME> argument is the path to the large file to be split.
- The <PREFIX> argument is the prefix for the output file names. The pieces are written to PREFIXaa, PREFIXab, and so on. If no prefix is provided, the default PREFIX is 'x'.
Example: split the file collect-20250620-083723.tar.gz into 100M chunks, with the output files prefixed with 'collect-20250620-083723.tar.gz-part'
$ split -b 100M collect-20250620-083723.tar.gz collect-20250620-083723.tar.gz-part
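Before splitting a multi-gigabyte log, it can help to preview the naming scheme on a small scratch file. The following is a minimal sketch under stated assumptions: the file name sample.bin, the 1000-byte chunk size, and the temporary directory are illustrative and not part of the original procedure.

```shell
#!/bin/sh
# Sketch: split a small scratch file to preview how 'split -b' names its output.
# sample.bin and the 1000-byte chunk size are illustrative assumptions.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Create a 2500-byte test file (5 blocks of 500 bytes).
dd if=/dev/zero of="$workdir/sample.bin" bs=500 count=5 2>/dev/null

# Split into 1000-byte chunks; output names are PREFIX + aa, ab, ac, ...
split -b 1000 "$workdir/sample.bin" "$workdir/sample.bin-part"

# Print each part name (without the directory) and its size in bytes.
for part in "$workdir"/sample.bin-part*; do
    printf '%s %s\n' "${part##*/}" "$(wc -c < "$part" | tr -d ' ')"
done
```

The loop should list three parts: two of 1000 bytes and a final one holding the remaining 500 bytes.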
Step 4: List the output files after the split.
$ ls -l collect-20250620-083723.tar.gz-part*
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partaa
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partab
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partac
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partad
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partae
-rw-r--r--. 1 root root 104857600 Jun 20 08:59 collect-20250620-083723.tar.gz-partaf
-rw-r--r--. 1 root root 41057203 Jun 20 08:59 collect-20250620-083723.tar.gz-partag
Notice that the original file is 640M (670202803 bytes). The split command produces 6 parts of 100M (104857600 bytes) each, and a last part that holds the remaining 41057203 bytes (about 40M). The parts are named in sequence: <PREFIX> followed by aa, ab, ac, and so forth.
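The part count and the size of the last part can be predicted with integer arithmetic before running the split. The helper below is a hypothetical sketch (the function name part_layout is an assumption), fed with the byte counts from the listing above.

```shell
#!/bin/sh
# part_layout <file_size_bytes> <chunk_size_bytes>
# Hypothetical helper: predicts how many parts 'split -b' produces
# and how many bytes end up in the last part.
part_layout() {
    size=$1
    chunk=$2
    parts=$((size / chunk))      # number of full-size parts
    last=$((size % chunk))       # leftover bytes for the final part
    if [ "$last" -gt 0 ]; then
        parts=$((parts + 1))     # the leftover bytes need one extra part
    elif [ "$parts" -gt 0 ]; then
        last=$chunk              # exact multiple: the last part is full-size
    fi
    echo "$parts parts, last part $last bytes"
}

# Values from the example: 670202803 bytes split into 100M (100*1024*1024) chunks.
part_layout 670202803 $((100 * 1024 * 1024))
# prints: 7 parts, last part 41057203 bytes
```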
Step 5: Identify the checksum of each split file.
$ sha256sum collect-20250620-083723.tar.gz-part*
9b0dc4a83738a7fc5941281822521bc7952f2db9097e66aa7d0dd85097d35a01 collect-20250620-083723.tar.gz-partaa
e2aa6e79092bb479524fd957f7682cceb98dc31a100ab71f014e430bfa7ffa09 collect-20250620-083723.tar.gz-partab
002c9cadeecdd86351dd720bff92bfa8f485548dfea5bc5de56e82b3cad1cb21 collect-20250620-083723.tar.gz-partac
779dba2cbe9fefe5eb7590f74698c5e3f8f2860330fccf1a8149e0d2b451b71c collect-20250620-083723.tar.gz-partad
5d20c4b4cf16bd0f3d67d08fa88fda0c145a5b01be34f33b1c57aa1cef07202d collect-20250620-083723.tar.gz-partae
66dd1a9a9aa4b2732b14e2e649eacf519e8b7aeae3b70460038be2a83e187a0a collect-20250620-083723.tar.gz-partaf
a9ba56810cf462cd73f7f935e5e8ff972820908bf8751e58deb1ed9d90d5e241 collect-20250620-083723.tar.gz-partag
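Rather than comparing each checksum by eye, the sender could save the list to a manifest file and the receiver could check every part in one step with the '-c' option of sha256sum. This is a sketch assuming GNU coreutils; the names sample.bin and parts.sha256 are hypothetical, and a small scratch file stands in for the real log archive.

```shell
#!/bin/sh
# Sketch: record part checksums in a manifest and verify them in one step.
# sample.bin and parts.sha256 are hypothetical names; assumes GNU sha256sum.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Stand-in for the real archive: a 2500-byte scratch file split into 3 parts.
dd if=/dev/zero of=sample.bin bs=500 count=5 2>/dev/null
split -b 1000 sample.bin sample.bin-part

# Sender: write one "checksum  filename" line per part.
sha256sum sample.bin-part* > parts.sha256

# Receiver: recompute each checksum and compare it with the manifest.
# sha256sum -c exits non-zero if any part differs or is missing.
if sha256sum -c parts.sha256; then
    echo "all parts verified"
else
    echo "at least one part is corrupted or missing" >&2
fi
```

The manifest stores file names relative to the current directory, so the receiver would run the check from the directory that contains the downloaded parts.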
Step 6: Upload or transfer the files via sftp, https, or any supported method (choose the preferred method based on the type of case handling).
Enhanced Customer Data Repository (ECuRep) - Send data
You need all the parts of the split file to reassemble the original file by concatenating them in the original order. Because the split command names the parts sequentially, you do not have to list every file explicitly when concatenating.
You can concatenate the files by listing each file in order:
$ cat collect-20250620-083723.tar.gz-partaa collect-20250620-083723.tar.gz-partab collect-20250620-083723.tar.gz-partac collect-20250620-083723.tar.gz-partad collect-20250620-083723.tar.gz-partae collect-20250620-083723.tar.gz-partaf collect-20250620-083723.tar.gz-partag > reassemble-collect-20250620-083723.tar.gz
OR
Specify the file prefix followed by the wildcard character '*' and let the shell expand it to the part files in file-name order for the 'cat' command.
$ cat collect-20250620-083723.tar.gz-part* > reassemble-collect-20250620-083723.tar.gz
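In this command, the shell sees "collect-20250620-083723.tar.gz-part*" and expands it into a sorted list of all files and directories in the current directory whose names start with "collect-20250620-083723.tar.gz-part". The 'cat' command then concatenates them in that order. Make sure that no unrelated files in the directory match the pattern, and that the output file name does not start with the same prefix.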
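The effect of the sorted wildcard expansion can be seen on a tiny text file, where the content of each part is easy to inspect. This is an illustrative sketch; demo.txt and rebuilt.txt are assumed names, not files from the original procedure.

```shell
#!/bin/sh
# Sketch: show that the wildcard expands in file-name order,
# so 'cat' rebuilds the original content. File names are illustrative.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'abcdefghij' > demo.txt
split -b 4 demo.txt demo.txt-part    # parts contain: abcd, efgh, ij

# The glob expands to demo.txt-partaa demo.txt-partab demo.txt-partac.
# The output name must not match the glob, or it would be read as input.
cat demo.txt-part* > rebuilt.txt

# cmp -s exits 0 only when both files are byte-for-byte identical.
if cmp -s demo.txt rebuilt.txt; then
    echo "rebuilt file matches the original"
else
    echo "rebuilt file differs from the original" >&2
fi
```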
Step 9: Validate that the size and checksum of the re-assembled file match the original file.
If every split part has a matching checksum, the checksum of the re-assembled file should match too, unless the parts were combined incorrectly. Ensure that the re-assembly step is done correctly.
$ du -sh reassemble-collect-20250620-083723.tar.gz
640M reassemble-collect-20250620-083723.tar.gz
$ ls -l reassemble-collect-20250620-083723.tar.gz
-rw-r--r--. 1 root root 670202803 Jun 20 09:18 reassemble-collect-20250620-083723.tar.gz
$ sha256sum reassemble-collect-20250620-083723.tar.gz
9a3a717084a83efe0d8d61fa814dd09da0c6e1ed13be9ee3bd8e1b49d701b9e0 reassemble-collect-20250620-083723.tar.gz
If the checksums of all split parts are correct, the checksum of the re-assembled file must match the checksum of the original file. If the final checksum does not match the one provided for the original file, something went wrong in one of the steps. Validate each step again.
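The final comparison can also be scripted so that a mismatch is reported through the exit status. The helper below is a hypothetical sketch (verify_sha256 is an assumed name) and relies on sha256sum printing the checksum as the first field of its output. The demonstration uses an empty scratch file, whose SHA-256 value is well known.

```shell
#!/bin/sh
# verify_sha256 <file> <expected_sha256>
# Hypothetical helper: returns 0 when the file's SHA-256 matches the expected value.
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{print $1}')   # first field is the checksum
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (expected $2, got $actual)" >&2
        return 1
    fi
}

# Demonstration on an empty scratch file (SHA-256 of zero bytes of input).
scratch=$(mktemp)
trap 'rm -f "$scratch"' EXIT
verify_sha256 "$scratch" e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

For the example in this document, the call would take the re-assembled file name and the checksum of the original file from Step 2 as its two arguments.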
Step 10: The file is now available to be reviewed
Additional Information
- It is important to share the checksum of the original file so that the receiver can validate the re-assembled file.
- All the split parts of the file must be available before re-assembly.
- If there is a problem with the upload or download of the split parts, then the re-assembled file might be corrupted and the corrupted parts might need to be re-collected.
Related Information
Document Location
Worldwide
Document Information
Modified date:
24 June 2025
UID
ibm17231959