IBM Support

Using SUBSTRING and LOCATE to Split Fields on Various Locations

Troubleshooting


Problem

This document provides information about how to use SUBSTRING and LOCATE to split a field on a character when the character can be in various locations.

Resolving The Problem

Sometimes more than one piece of information can be in a single field. If the data is uniform, a simple SUBSTRING can be used to break out the various parts. However, if the lengths vary, this becomes more difficult.

As an example, assume we have a file called MyLib/MyFile that has one field called FullName. This contains a person's first name, a space, and their last name. Because each name varies, there is no easy way to get just their first or last name out of this field. However, you can use a SELECT statement with SUBSTRING, LOCATE, and LENGTH to break apart each field.

To use SUBSTRING, we must pass in a starting position and a length. Hence, for this example, we must find four pieces of information: the starting position of the first name, the starting position of the last name, the length of the first name, and the length of the last name. The LOCATE function finds the position of a string in a file.

Knowing that our separator is a space, we will be searching for ' ' to break the fields apart. Knowing where the space is, we know that the first name ends one position before the space and the last name starts one position after the space. We also know that the first name starts in position one. Because the first name starts in position one and ends at the location of the space minus one, we know the length of the first name is simply the location of the space minus one.

At this point, we have three of the four pieces of information needed; all that is missing is the length of the last name. We can find the length of the last name by taking the total length of the field and subtracting the size of the first name and the space. Because we are starting at position 1, the length of the first name and space is the same as the position of the space. We know now all four pieces of information needed and simply need to plug them into our SELECT statement.

Starting position of first name: 1
Length of first name: (locate(' ', FullName) - 1)
First name: substring(FullName,1, (locate(' ',FullName) - 1))

Starting position of last name: (locate(' ',FullName) + 1)
Length of last name: (length(FullName) - length(substring(FullName,1,(locate(' ', FullName) - 1))))
Last name: substring(FullName,(locate(' ',FullName) + 1),(length(FullName) - length(substring(FullName,1,(locate(' ', FullName) - 1)))))

The SELECT statement would appear as follows:

SELECT substring(FullName,1, (locate(' ',FullName) - 1)) as FirstName,
substring(FullName,(locate(' ',FullName) + 1),(length(FullName) - length(substring(FullName,1,(locate(' ', FullName) - 1))))) as LastName
from MyLib/MyFile

Notes:
1. The above is based on the search string being a single character. Splitting on more complex strings is possible, but LOCATE only returns the starting position of the search string. This means that the formulas must be adjusted based on the length of the search string. Refer to the SQL Reference for full details on LOCATE.
2. If you have more than two parts in your field, you can still use the method in a modified manner. The LOCATE can take a starting position. Hence, to find the second space, you would use LOCATE(' ',FullName,(LOCATE(' ',FullName) + 1). This finds the position of the next space AFTER the first space. Keep in mind this locate is relative to the starting position. This means that you must add in the original position of the first space to get the true location from the start of the string.

The theory becomes: given "A B C", break off "A " and treat "B C" as a new pair like "A B" above. However, you then must compensate for taking "A " off the front. This is extremely complex, but possible. Refer to the nearest discrete mathematician for full details on discrete set theory.

[{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SWG60","label":"IBM i"},"ARM Category":[{"code":"a8m0z0000001i4eAAA","label":"IBM i Db2-\u003ESQL Examples \/ DB Examples \/ Misc how to"}],"ARM Case Number":"","Platform":[{"code":"PF012","label":"IBM i"}],"Version":"All Versions"}]

Historical Number

402735828

Document Information

Modified date:
26 November 2024

UID

nas8N1015175