IBM Support

Fileplace and 'dd' data collection procedures for data corruption analysis

Question & Answer


Question

What information should be collected to help diagnose data corruption within a filesystem or a raw logical volume?

Cause

This document shows how to gather fileplace output and 'dd' samples of
corruption for later analysis. It contains two procedures: one for use
when the corruption is in a "raw" LV, and the other for use when the
corruption is in a file.

Answer

If corruption resides in a "raw" LV

This is the procedure to use if the corruption resides in a "raw" LV.

This example assumes that the database vendor has determined that the
corruption begins at byte offset 2MB into LV "snaplv" and spans for a
length of 32KB.


First collect a fileplace output for the entire LV. Save it in a file
for later analysis.

# fileplace -m snaplv > fileplace.out


Then calculate the starting page number of the corruption and the
length of the corruption in pages (pages are always 4K):

Starting page = byte offset / page size

# bc # Starting page: ( 2 * MB) / 4096 = 512
( 2 * 2^20) / 4096
512

Num pages = length in bytes / page size # round up

# bc # Num pages: (32 * KB) / 4096 = 8
scale=3
(32 * 2^10) / 4096
8.000
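The same arithmetic can be done directly in the shell. This sketch hard-codes the example's assumed values (2 MB offset, 32 KB length, 4 K pages) and rounds the page count up, as noted above:

```shell
# Page arithmetic for the example above (assumed values: 2 MB offset,
# 32 KB length, 4096-byte pages).
OFFSET=$((2 * 1024 * 1024))   # byte offset of the corruption
LENGTH=$((32 * 1024))         # length of the corruption in bytes
PAGESZ=4096

START_PAGE=$((OFFSET / PAGESZ))
# Round up so a partial trailing page is still captured:
NUM_PAGES=$(( (LENGTH + PAGESZ - 1) / PAGESZ ))

echo "Starting page: $START_PAGE"   # 512
echo "Num pages:     $NUM_PAGES"    # 8
```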


With the starting page and length in pages, run fileplace to find the
physical location of the corruption. (I omitted the Logical Fragment
info from the fileplace output so it will fit better.) Save this
fileplace output for later analysis.

# fileplace -o 512 -n 8 -m snaplv > fileplace.out2

# cat fileplace.out2

Device: /dev/snaplv Partition Size: 32 MB Block Size = 4096
Number of Partitions: 160 Number of Copies: 1

Physical Addresses (mirror copy 1)
----------------------------------
2155552-2155559 hdisk0 8 blocks 32768 Bytes, 0.0%
^^^^^^^ ^^^^^^ ^



In this example, the corruption resides in a single, contiguous area on
hdisk0. We will only need a single 'dd' sample to get the entire area.
If the LV is mirrored, there will be multiple sections of fileplace
output, one for each mirror. Be sure to sample from all the locations
listed. For the 'dd' samples from disk:

if - disk name comes from second fileplace (mirror copy X)
bs - page size, 4096
count - comes from second fileplace (xx blocks)
skip - comes from second fileplace (Physical Addresses)

# dd if=/dev/rhdisk0 of=hdisk.sample1 bs=4096 count=8 skip=2155552
8+0 records in.
8+0 records out.
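Before running 'dd' against a real disk, the bs/count/skip mechanics can be tried safely on a scratch file (hypothetical /tmp paths; no real disks are touched):

```shell
# Scratch file standing in for the disk (hypothetical path): 1024 pages
# of zeros.
dd if=/dev/zero of=/tmp/scratch.img bs=4096 count=1024 2>/dev/null

# Same shape as the real command above: 8 pages starting at page 512.
dd if=/tmp/scratch.img of=/tmp/sample.img bs=4096 count=8 skip=512 2>/dev/null

# The sample should be exactly 8 * 4096 = 32768 bytes.
wc -c < /tmp/sample.img
```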


We also want to sample the data from the LV itself. The VG must be
online at the time. For the 'dd' samples from the LV:

if - LV name provided by the database vendor
bs - page size, 4096
count - Num pages (from calculations above)
skip - Starting page number (from calculations above)

# dd if=/dev/snaplv of=lv.sample bs=4096 count=8 skip=512
8+0 records in.
8+0 records out.
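As an optional cross-check, the disk sample and the LV sample cover the same physical blocks, so they should normally be byte-identical; a mismatch is itself a useful data point for support. A sketch with scratch stand-ins (hypothetical /tmp paths; the real inputs are hdisk.sample1 and lv.sample from the commands above):

```shell
# Scratch stand-ins for the two samples taken above (hypothetical paths).
printf 'sampled-blocks' > /tmp/hdisk.sample1
printf 'sampled-blocks' > /tmp/lv.sample

# Samples of the same physical blocks should be byte-for-byte identical.
if cmp -s /tmp/hdisk.sample1 /tmp/lv.sample; then
    echo "disk and LV samples match"
else
    echo "disk and LV samples differ"
fi
```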


Collect all of the fileplace output and dd samples for IBM support.
Be sure to note exactly which 'dd' commands were used to collect all
the different samples. You may want to run 'script' to capture all
the commands and output.



If corruption resides in a file

This is the procedure to use if the corruption resides in a file.

This example assumes that the database vendor has determined that the
corruption begins at byte offset 2MB into file /var/adm/wtmp and spans
for a length of 32KB.


First run fileplace against the file to determine the filesystem
fragment size. Save the fileplace output for later analysis.

# fileplace -p /var/adm/wtmp > fileplace.out

# grep 'Frag Size' fileplace.out
Blk Size: 4096 Frag Size: 512 Nfrags: 10248 Compress: no


With the filesystem Fragment Size, calculate the starting fragment of
the corruption and the length of the corruption in fragments:

Starting frag = byte offset / Frag Size

# bc # Starting frag: ( 2 * MB) / 512 = 4096
( 2 * 2^20) / 512
4096

Num frags = length in bytes / Frag Size # round up

# bc # Num frags: (32 * KB) / 512 = 64
scale=3
(32 * 2^10) / 512
64.000
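As in the raw-LV case, the fragment arithmetic can be done directly in the shell; this sketch uses the 512-byte Frag Size read from the fileplace output above, rounding the fragment count up:

```shell
# Fragment arithmetic for the example above (assumed values: 2 MB offset,
# 32 KB length, 512-byte fragments from fileplace).
OFFSET=$((2 * 1024 * 1024))   # byte offset of the corruption
LENGTH=$((32 * 1024))         # length of the corruption in bytes
FRAGSZ=512                    # Frag Size from the first fileplace

START_FRAG=$((OFFSET / FRAGSZ))
# Round up so a partial trailing fragment is still captured:
NUM_FRAGS=$(( (LENGTH + FRAGSZ - 1) / FRAGSZ ))

echo "Starting frag: $START_FRAG"   # 4096
echo "Num frags:     $NUM_FRAGS"    # 64
```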


With the starting fragment and length in fragments, run fileplace again
to find the physical location of the corruption. (I omitted the
Logical Fragment info from the fileplace output so it will fit better.)
Save this fileplace output for later analysis.

# fileplace -p -o 4096 -n 64 /var/adm/wtmp > fileplace.out2

# cat fileplace.out2

File: /var/adm/wtmp Size: 5245560 bytes Vol: /dev/hd9var
Blk Size: 4096 Frag Size: 512 Nfrags: 10248 Compress: no
^^^

Physical Addresses (mirror copy 1)
----------------------------------
16499464-16499511 hdisk0 48 frags 24576 Bytes, 0.5%
16499520-16499535 hdisk0 16 frags 8192 Bytes, 0.2%
^^^^^^^^ ^^^^^^ ^^


In this example, the corruption spans across two non-contiguous areas
on hdisk0. We will need to perform two 'dd' samples to get the entire
area. If the file lives in a mirrored LV, there will be multiple
sections of fileplace output, one for each mirror. Be sure to sample
from all the locations listed. For the 'dd' samples from disk:

if - disk name comes from second fileplace (mirror copy X)
bs - Frag Size from first fileplace
count - comes from second fileplace (xx frags)
skip - comes from second fileplace (Physical Addresses)

# dd if=/dev/rhdisk0 of=hdisk.sample1 bs=512 count=48 skip=16499464
48+0 records in.
48+0 records out.

# dd if=/dev/rhdisk0 of=hdisk.sample2 bs=512 count=16 skip=16499520
16+0 records in.
16+0 records out.


We also want to sample the data from the file itself. Do this while
the filesystem is still mounted. For the 'dd' samples from the file:

if - file name provided by the database vendor
bs - Frag Size from first fileplace
count - Num frags (from calculations above)
skip - Starting frag number (from calculations above)

# dd if=/var/adm/wtmp of=file.sample bs=512 count=64 skip=4096
64+0 records in.
64+0 records out.


If possible, unmount and remount the filesystem and sample from the
file again:

# umount /filesystem
# mount /filesystem
# dd if=/var/adm/wtmp of=file.sample.2 bs=512 count=64 skip=4096
64+0 records in.
64+0 records out.
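Comparing the pre- and post-remount samples shows whether the corrupt data lived only in cached pages: a difference means the on-disk data and the cached view disagreed. A sketch with scratch stand-ins for file.sample and file.sample.2 (hypothetical /tmp paths; the real files come from the dd commands above):

```shell
# Scratch stand-ins for the before/after-remount samples (hypothetical
# paths).
printf 'sample-bytes' > /tmp/file.sample
printf 'sample-bytes' > /tmp/file.sample.2

if cmp -s /tmp/file.sample /tmp/file.sample.2; then
    echo "samples match: cached and on-disk data agree"
else
    echo "samples differ: corruption may have lived in cached pages"
fi
```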


Collect all of the fileplace output and dd samples for IBM support.

Be sure to note exactly which 'dd' commands were used to collect all
the different samples. You may want to run 'script' to capture all
the commands and output.


Document Information

Modified date:
17 June 2018

UID

isg3T1011158