Recover inaccessible instances using QEMU

Use QEMU or IBM SmartCloud Enterprise features to recover images after system disasters

Claudiu Popescu, Subject Matter Expert, IBM
Claudiu Popescu is a Subject Matter Expert specializing in L3 customer support and image development on IBM SmartCloud Enterprise and Enterprise+. His interests are cloud computing, virtualization, programming, and network architecture.
Mihai Criveti, IT Architect, IBM
Mihai Criveti is an IT Architect mainly focused on cloud computing and virtualization. His interests are cloud computing, virtualization, enterprise architecture, SOA, middleware, digital forensics, and UNIX systems.

Summary:  Suppose something went wrong and you discover you have an inaccessible Linux® instance on IBM® SmartCloud Enterprise. What can you do? The authors of this article walk you through the steps to recover an inaccessible Linux instance. They show how to capture a private image, copy it to persistent storage, use QEMU to boot the captured image or mount it using kpartx, fix the problem, and then import the image back into the cloud.

Date:  18 Oct 2012
Level:  Advanced

Don't panic

It's good to have a contingency plan. This article shows you how to use the import and copy features of IBM SmartCloud Enterprise to perform the equivalent of bare metal recovery on the cloud.


When you might need bare metal recovery on the cloud

The process described is more complex than using a lights-out management tool, but the use cases are somewhat similar. Here are some possible reasons why you might need to use these recovery measures:

  • For a number of possible reasons, you can't access your instance. For example:
    • You misconfigured your firewall and are no longer permitting SSH connections.
    • You misconfigured your SSH service or deleted your private key or authorized_keys file, and you can't log on to your instance.
    • You deleted the idcuser account and didn't configure SSH login for other accounts.
    • You misconfigured GRUB, runlevels, or startup scripts, and your system is no longer booting or starting networking or SSH services.
    • You need to reset your account passwords or access the environment using recovery media.
    • You need to recover from file system corruption in single user mode.
    • For unknown reasons, you can no longer access your image using SSH, and you want to investigate and troubleshoot the issue.
    • You want to perform system maintenance such as full backups or repartitioning in single user mode or using recovery media.
  • You might want to upgrade your instance OS to a new major release, which requires booting from the installation media.
  • You might want to install and import your own Linux operating system and want to build the image directly on the cloud.
  • Your instance might have been compromised and you want to copy it to attached storage so you can perform forensic activities. For this, you have the option of mounting the image file as read-only using tools such as kpartx or using QEMU to boot a forensic live CD. QEMU is a generic, open source machine emulator and virtualizer. When used as a machine emulator, QEMU can run operating systems and programs made for one machine on a different machine. Attaching a host-only network device to QEMU also allows you to investigate outgoing network connections.

What was it like before we had the Image Import / Copy feature in SmartCloud? In most cases, if your instance was no longer accessible, you had to re-provision your instance. Now, however, thanks to this new SmartCloud feature, you can recover inaccessible instances using a series of steps that go something like this:

  • Capture a private image of your instance.
  • Copy this image to a persistent storage device.
  • Attach the persistent storage to a Linux instance.
  • Boot the OS image directly on the cloud using QEMU.
  • Connect to your exported image using a VNC client (you can interact with your image as early as the GRUB bootloader sequence).
  • Investigate and fix any problems that prevent you from accessing the image using SSH.
  • Test login using an SSH client.
  • Import the image back into the cloud.
  • Provision a new instance based on the recovered image. You need a reserved IP if you want to preserve the system hostname and network settings.

Recovery tools

To follow the steps in this article, use these tools to set up an environment:

  • The SmartCloud Image Import/Copy API
  • Kernel-based Virtual Machine (KVM)
  • libvirt
  • QEMU
  • IBM Rational Asset Manager (REST Client/curl)

Let's look briefly at each of these.

SmartCloud Linux Import/Copy API and tools are used to import your own virtual machines to SmartCloud and copy images between datacenters. To find a detailed description of this feature, including usage instructions, sample schema files, and validation tools, look in the Creating and Customizing Images and Software Bundles guide, part of the SmartCloud documentation library. There's a link to the library in the Support tab of the SmartCloud web user interface. You can also find these assets using the asset catalog search feature. Three features described in this article are:

  • Image Import. Imports external (or internal) images placed on a persistent storage.
  • Image Copy. Copies VirtIO enabled images marked with the Copy Allowed = Y attribute in the Rational Asset Catalog to one of your ext3 formatted persistent storage devices.
  • Clone Persistent Storage. Clones a persistent storage volume to another datacenter. This can be used to move your copied image to another datacenter where you can import it.

Kernel-based Virtual Machine (KVM) in SmartCloud is a virtualization infrastructure for the Linux kernel. While KVM supports native virtualization on x86 hardware containing virtualization extensions (Intel VT or AMD-V), it does not perform any emulation by itself. One of KVM's strong points is paravirtualization for the network card, disk I/O controller, and VGA graphics interface; this offers a performance gain compared to full virtualization.
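Whether KVM acceleration is possible at all on a given host can be checked from the CPU flags. The following is a minimal sketch and not part of the original procedure: has_hw_virt is a hypothetical helper name, and it assumes a Linux host that lists CPU flags in /proc/cpuinfo (vmx for Intel VT, svm for AMD-V).

```shell
#!/bin/sh
# has_hw_virt (hypothetical helper): read /proc/cpuinfo-style text on stdin and
# succeed if any "flags" line lists vmx (Intel VT) or svm (AMD-V).
has_hw_virt() {
    grep '^flags' | grep -Eqw 'vmx|svm'
}

# Example use on a Linux host (commented out so the sketch has no side effects):
#   if has_hw_virt < /proc/cpuinfo; then
#       echo "CPU advertises hardware virtualization"
#   else
#       echo "no vmx/svm flag found; expect software emulation only"
#   fi
```

On a cloud instance that is itself a virtual machine, these flags may be absent, in which case QEMU can still run the image but only through software emulation.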

Libvirt is a set of tools used for interacting with hypervisors like KVM, Xen and others. Managing the hypervisors is done through libvirt's versatile API or from the command line using virsh.

QEMU is a fast generic machine emulator supporting virtualization in combination with KVM.

IBM Rational Asset Manager is used by enterprises to catalog, govern, manage, and maintain reusable software packages as part of a software library.


Getting set up

Let's stop for a moment and talk about the notations and conventions used in this article and some considerations for setting up the environment.

  • Commands executed on the system as root are prefixed with root@host#.
  • Commands executed on the system as root using sudo are prefixed with user@host$ sudo.
  • Commands executed on the system as a regular user are prefixed with user@host$.
  • Command output is delimited by a new line from the commands and indented by 1 tab to the right (as in the following code block):

Note: We purposely set up the code blocks in this article to make copy/pasting entire code sections impossible because the code is meant to serve as a guideline and you need to adapt this code to your environment, IDs, scope, etc.

root@host# 1st command - to be run as root
root@host# 2nd command - to be run as root (previous command has no output)

    output from 2nd command

user@host$ 3rd command - to be run as user

    output from 3rd command
	

Use sudo to run the commands that require escalated privileges (or sudo -s). Do not use sudo bash since this preserves the idcuser environment settings. Example: sudo yum install rpmdevtools rpm-devel rpm-build.

Requirements

You must have:

  • A SmartCloud persistent storage volume large enough to accommodate your copied image. You will attach this storage to the Linux instance running QEMU.
  • A SmartCloud 64 bit Red Hat Enterprise Linux (RHEL) or SUSE Linux Enterprise Server (SLES) instance where you will install QEMU, attach the persistent storage, and perform the recovery steps. Call this "R1."
  • The SmartCloud Linux instance or image you are trying to recover. Call this "U1."
  • These documents from the SmartCloud documentation library: Creating and Customizing Images and Software Bundles, Command Line Tool Reference, and REST API Reference. They contain the documentation and tools required to perform image import/copy.

You also need:

  • Some experience working with the SmartCloud command line tools or API.
  • Experience booting Linux instances in single user mode or performing recovery from the installation media. Experience tunneling VNC over Secure Shell (SSH) and working with Linux firewall. These steps are similar to those performed by system administrators on physical or virtual hardware.
  • Experience with QEMU or KVM and experience with building and installing software on Linux.

It's also nice if you have a steady supply of patience and determination. Booting virtual images in QEMU to run on a virtual machine is a slow and error-prone process. Depending on your instance and image size, the import/copy process can also take a while. If you get it wrong the first time, you'll have to do it all over again. You have been warned!


Summary of the recovery steps

You're ready to get started. Follow these steps:

  • Capture to image and allow export in Rational Asset Catalog.
  • Edit the captured image and change CopyAllowed = N to CopyAllowed = Y.
  • Copy the image to a storage device. Attach and mount the storage on another Linux instance (the R1 instance).
  • Install QEMU 1.1.1 (./configure; make; make install).
  • Tunnel port 5900 through SSH.
  • Boot the image using QEMU. For example: /usr/local/bin/qemu-system-i386 -cdrom /root/IF-YOU-NEED-A-RECOVERY-DISK.iso -boot d -m 1024 -drive file=BWVoPePtRACvZVO3Srl1lQ.img,if=virtio -daemonize
  • Connect to localhost:0 using a VNC client.
  • Fix the problem.
  • Unmount and detach the volume.
  • Import the image back into the cloud.
  • Provision and test the new image.

You should perform all these steps in the same datacenter.


Recovery steps

Now that you're set up, begin.

Create a private image of your inaccessible instance

You can capture a private image using the WebUI:

  1. Log in to the SmartCloud Enterprise web interface.
  2. Browse to Control Panel > Instances, select the instance you need to recover, and click Create private image.
  3. Click Submit and wait for the image to be created.

Edit the image asset

You can use the web user interface or the Rational Asset Manager REST API. To use the web user interface:

  1. Once your image is captured, go to Control panel > View asset catalog > Home > My Assets.
  2. Search for the image you saved and select it.
  3. Click the pencil icon located on the top right of the page.

    Figure 1. Modify the image asset in Rational Asset Manager

  4. Under Operating System, click More.

    Figure 2. Expand the More... Image Asset tab in RAM

  5. Change Copy Allowed: from N to Y and save the changes by clicking Update at the bottom right of the page. Enter a comment describing your changes, then click Update again.

    Figure 3. Update the Image Asset tab in Rational Asset Manager

Export the image to a storage and mount it

  1. Export the image to a storage device using the command line tool or REST API. The storage device must be formatted as ext3. Using the SmartCloud command line tools:
    user@host$ ic-copy-to.sh -u|--username username -w|--passphrase passphrase 
    -g|--file-path filepath -v|--volume-id volumeID -I|--image-id imageID

  2. Verify that your image was copied to storage. Using the SmartCloud command line tools:
    user@host$ ic-describe-volume.sh -u $OWNER -g cmihai.key -v $STORAGEID -w pass \
    | grep State
    
        State: UNMOUNTED

  3. Attach the storage to R1 instance. This example uses the Dynamic Disk Attach feature to attach a disk to an already provisioned instance. Using the SmartCloud command line tools:
    user@host$ ic-attach-volume.sh -u|--username username -w|--passphrase passphrase
    -g|--file-path filepath -l|--instance-id instanceId -v|--volume-id volumeId

    This looks like:

    user@host$ ic-attach-volume.sh -g <your.key> -u <your.email@ibm.com> -w \
    <your.passphrase> -v <your.volume.id> -l <your.instance.id>
    
        Executing action: AttachVolume  ...
        The request has been submitted successfully.
        Executing AttachVolume  finished
        
    user@host$ ic-describe-volume.sh -g <your.key> -u <your.email@ibm.com> \
    -w <your.passphrase> -v <your.volume.id> \
    | grep State
    
        State: MOUNTED

    Note: If your instances don't support dynamic disk attach, you can also provision a new Linux instance and attach the storage during instance creation, for example, using the WebUI wizard.

  4. Mount the attached storage on R1 instance. First, identify the newly attached disk on instance R1. You can do this by checking the dmesg log. A new disk should be attached using VirtIO, in this example, vde1.
    user@host$ sudo dmesg | tail
    
        virtio-pci 0000:00:07.0: using default PCI settings
        virtio-pci 0000:00:08.0: no hotplug settings from platform
        virtio-pci 0000:00:08.0: using default PCI settings
        pci 0000:00:0a.0: no hotplug settings from platform
        pci 0000:00:0a.0: using default PCI settings
        virtio-pci 0000:00:0a.0: enabling device (0000 -> 0003)
        virtio-pci 0000:00:0a.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high)
        virtio-pci 0000:00:0a.0: irq 35 for MSI/MSI-X
        virtio-pci 0000:00:0a.0: irq 36 for MSI/MSI-X
        vde: vde1

    You can also list the partition table using fdisk:

    user@host$ sudo /sbin/fdisk -l

    Now that you've identified it, go ahead and use the mount command:

    user@host$ sudo mkdir /mnt/storage                            
    user@host$ sudo mount /dev/vde1 /mnt/storage

    Verify the content of /mnt/storage:

    user@host$  sudo find /mnt/storage/ -type f -exec \
    sh -c 'du -h {}; printf "\t"; file -b {};' \;
    
        3.6G    /mnt/storage/image/gqlhq2tfTsmBUqFcld4r1Q.img
            x86 boot sector; GRand Unified Bootloader, stage1 version 0x3, 
            boot drive 0x80, 1st sector stage2 0xe0800, GRUB version 0.94; 
            partition 1: ID=0x83, active, starthead 32, startsector 2048, 
            2048000 sectors; partition 2: ID=0x83, starthead 155,
            startsector 2050048, 7426866 sectors, code offset 0x48
        8.0K    /mnt/storage/image/gqlhq2tfTsmBUqFcld4r1Q.ovf
            XML  document text
        28K     /mnt/storage/image/Terms.zip
            Zip archive data, at least v2.0 to extract
        4.0K    /mnt/storage/image/leZ8NeYxSti8pXxnzd3_Tg/vhost0133_DataDisk1.img
            data
        8.0K    /mnt/storage/image/BSS.zip
            Zip archive data, at least v2.0 to extract
        4.0K    /mnt/storage/image/S40E4IiKR923NAsiZTyf0A/vhost0133_DataDisk2.img
            data
        104K    /mnt/storage/image/RAM.zip
            Zip archive data, at least v2.0 to extract
    

    Your disk names and number will be different, depending on the kind of instance you've captured. Also, your disk size will differ, as the disk was shrunk to include only data on capture.
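Steps 2 and 3 above check the volume state by reading the output of ic-describe-volume.sh by eye. A small polling loop can do the same thing while you wait. This is only a sketch: volume_state and wait_for_state are hypothetical helper names, and it assumes the tool prints a single "State: VALUE" line as shown in the sample output above.

```shell
#!/bin/sh
# volume_state (hypothetical): print the value of the first "State:" line on stdin.
volume_state() {
    sed -n 's/^[[:space:]]*State:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# wait_for_state TARGET TRIES DELAY CMD...: rerun CMD until its output reports
# TARGET, at most TRIES times, sleeping DELAY seconds between attempts.
wait_for_state() {
    target=$1; tries=$2; delay=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        [ "$("$@" | volume_state)" = "$target" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Example (OWNER, KEYFILE, PASS, and STORAGEID are placeholders you must set):
#   wait_for_state MOUNTED 30 10 \
#       ic-describe-volume.sh -u "$OWNER" -g "$KEYFILE" -w "$PASS" -v "$STORAGEID"
```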

Table 1 shows the files found under storage/image.


Table 1. Files found under storage/image:
File path                         Description
$UID.ovf                          OVF container
$UID.img                          RAW disk image
$UID/$HOSTNAME_DataDisk$ID.img    Additional disks
BSS.zip                           Zipped BSS.xml
RAM.zip                           Zipped Rational Asset Manager artifacts
Terms.zip                         Zipped terms and conditions document
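Before working on the copied image, it can help to confirm that the folder contains the files listed in Table 1. The check below is a sketch under the assumption that every copied image folder holds exactly one top-level .ovf and one top-level .img file plus the three zip archives; check_image_dir is a hypothetical helper name, and -maxdepth requires GNU find.

```shell
#!/bin/sh
# check_image_dir DIR (hypothetical): succeed if DIR contains one top-level .ovf,
# one top-level .img, and the BSS.zip, RAM.zip, and Terms.zip archives.
check_image_dir() {
    dir=$1
    ovf=$(find "$dir" -maxdepth 1 -type f -name '*.ovf' | wc -l)
    img=$(find "$dir" -maxdepth 1 -type f -name '*.img' | wc -l)
    if [ "$ovf" -ne 1 ] || [ "$img" -ne 1 ]; then
        echo "expected one .ovf and one .img, found $ovf and $img" >&2
        return 1
    fi
    for f in BSS.zip RAM.zip Terms.zip; do
        [ -f "$dir/$f" ] || { echo "missing: $f" >&2; return 1; }
    done
}

# Example: check_image_dir /mnt/storage/image
```

Additional data disks live in subdirectories, so they do not affect the top-level .img count.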

Resize the root partition

You need to resize the root partition only if you plan to install new packages or copy files onto your image, either after booting with QEMU or inside a chroot. After you export your image, the free space on its root partition is only around 10MB.

Resizing the root partition on RHEL

Using the guestfs tools:

  1. Install libguestfs-tools.
    root@host# yum install libguestfs-tools

  2. Duplicate the old image file to a new one and increase its size.
    root@host# virt-filesystems -l -h --all -a <your.image.img>
    
        Name       Type        VFS   Label  MBR  Size  Parent
        /dev/sda1  filesystem  ext3  -      -    94M   -
        /dev/sda2  filesystem  ext3  /      -    4.2G  -
        /dev/sda1  partition   -     -      83   94M   /dev/sda
        /dev/sda2  partition   -     -      83   4.2G  /dev/sda
        /dev/sda   device      -     -      -    4.3G  -
                            
    root@host# truncate -r <your.image.img> <your.new.image.img>
    root@host# truncate -s +5G <your.new.image.img>

    The new image file reflects the new size. In this example, it is 9.3G.

  3. Now that the new image file is 5G bigger, expand the root partition into it.
    root@host# virt-resize --expand /dev/sda2 <your.image.img> <your.new.image.img>
                        
        Examining <your.image.img> ...
        100% [##################################################] 00:00
        **********
                        
        Summary of changes:
                        
        /dev/sda1: This partition will be left alone.
                        
        /dev/sda2: This partition will be resized from 4.2G to 9.2G.  The 
        filesystem ext3 on /dev/sda2 will be expanded using the 'resize2fs' 
        method.
                        
        **********
        Setting up initial partition table on <your.new.image.img> ...
        Copying /dev/sda1 ...
        100% [##################################################] 00:00
        Copying /dev/sda2 ...
        100% [##################################################] 00:00
        100% [##################################################] 00:00
    
    

    From now on, use the new image file, whether you boot it with QEMU or mount it with kpartx.
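The two truncate calls in step 2 do not copy any data. The first creates an empty sparse file with the same apparent size as the original image, and the second extends it; virt-resize then copies the partitions across. The sketch below wraps the two calls so the effect can be tried on a small scratch file first. grow_copy_size is a hypothetical helper name, and it relies on GNU truncate.

```shell
#!/bin/sh
# grow_copy_size OLD NEW EXTRA (hypothetical): create NEW as an empty sparse
# file whose size is the size of OLD plus EXTRA (for example 5G).
grow_copy_size() {
    # Give NEW the same apparent size as OLD without copying its contents.
    truncate -r "$1" "$2" || return 1
    # Extend NEW by EXTRA; the added space is a hole, not written data.
    truncate -s "+$3" "$2"
}

# Example on scratch files:
#   truncate -s 4M old.img
#   grow_copy_size old.img new.img 5M
#   ls -l new.img        # apparent size is now 9M
```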

Resizing the root partition on SLES

Using the standard Linux tools:

  1. Make sure everything is unmounted and the loop devices are deleted.
  2. Export RAWIMAGE for ease of use.
    root@host# export RAWIMAGE=<your.image.img>

    Note that this is a sparse image.

  3. Get the apparent image size.
    root@host# du -m --apparent-size $RAWIMAGE
                            
        4628    <your.image.img>
                        

  4. Compare this with the disk space actually used. The apparent size is usually larger than the used disk space.
    root@host# du -m $RAWIMAGE
                                
        3628    <your.image.img>
                            

  5. Resize the disk. Make sure the desired size is greater than your apparent image size. Extend the image as a sparse file using dd.
    root@host# export DESIREDSIZE="9000"
    root@host# dd if=/dev/zero of=$RAWIMAGE bs=1M count=0 seek=$DESIREDSIZE

  6. Make sure your image file was expanded to the new size.
    root@host# du -m --apparent-size $RAWIMAGE
                        
        9000    <your.image.img>
    
    

  7. Now you can create the loop devices.
    root@host# losetup /dev/loop0 $RAWIMAGE
    root@host# fdisk -l /dev/loop0
    root@host# kpartx -av /dev/loop0

  8. Run a filesystem check on the root partition that you are about to resize.
    root@host# e2fsck -fy /dev/mapper/loop0p2

  9. Delete your old partition and create a larger partition in its place to resize your partition. Mistakes here can result in data loss. Please be aware of your partitioning layout and adapt these commands to match your image.

    Note: In the session below, user input is the text typed after each prompt: p, d, 2, n, p, 2, Enter twice to accept the defaults, p, and w.

    root@host# fdisk /dev/loop0
        Command (m for help): p
                        
        Disk /dev/loop0: 9437 MB, 9437184000 bytes
        255 heads, 63 sectors/track, 1147 cylinders
        Units = cylinders of 16065 * 512 = 8225280 bytes
        Sector size (logical/physical): 512 bytes / 512 bytes
        I/O size (minimum/optimal): 512 bytes / 512 bytes
        Disk identifier: 0x0003ee5f
                        
        Device Boot      Start         End      Blocks   Id  System
        /dev/loop0p1   *           1         128     1024000   83  Linux
        Partition 1 does not end on cylinder boundary.
        /dev/loop0p2             128         590     3713433   83  Linux
        Partition 2 does not end on cylinder boundary.
                        
        Command (m for help): d
                     
        Partition number (1-4): 2
                        
        Command (m for help): n
        Command action
        e   extended
        p   primary partition (1-4)
        p
        Partition number (1-4): 2
        First cylinder (128-1147, default 128):
        Using default value 128
        Last cylinder, +cylinders or +size{K,M,G} (128-1147, default 1147):
        Using default value 1147
                        
        Command (m for help): p
                        
        Disk /dev/loop0: 9437 MB, 9437184000 bytes
        255 heads, 63 sectors/track, 1147 cylinders
        Units = cylinders of 16065 * 512 = 8225280 bytes
        Sector size (logical/physical): 512 bytes / 512 bytes
        I/O size (minimum/optimal): 512 bytes / 512 bytes
        Disk identifier: 0x0003ee5f
                        
        Device Boot      Start         End      Blocks   Id  System
        /dev/loop0p1   *           1         128     1024000   83  Linux
        Partition 1 does not end on cylinder boundary.
        /dev/loop0p2             128        1147     8188253+  83  Linux
        
        Command (m for help): w
        The partition table has been altered!
                        
        Calling ioctl() to re-read partition table.
                        
        WARNING: Re-reading the partition table failed with error 22: Invalid argument.
        The kernel still uses the old table. The new table will be used at
        the next reboot or after you run partprobe(8) or kpartx(8)
        Syncing disks.

  10. Re-read the partition table.
    root@host# kpartx -dv /dev/loop0
    root@host# losetup -d /dev/loop0
    root@host# losetup /dev/loop0 $RAWIMAGE
    root@host# fdisk -l /dev/loop0
    root@host# kpartx -av /dev/loop0

  11. Resize the file system to fill the enlarged partition.
    root@host# resize2fs /dev/mapper/loop0p2
        resize2fs 1.41.12 (17-May-2010)
        Resizing the filesystem on /dev/mapper/loop0p2 to 2047063 (4k) blocks.
        The filesystem on /dev/mapper/loop0p2 is now 2047063 blocks long.

  12. Run another filesystem check.
    root@host# e2fsck -fy /dev/mapper/loop0p2
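Two details of the procedure above are easy to check on a scratch file before touching a real image. First, dd with count=0 and seek=N sets the file length to N MiB without writing data, which shrinks the file if N is too small. Second, the 8188253 one-kilobyte blocks that fdisk reports for the new partition correspond to the 2047063 four-kilobyte blocks that resize2fs reports. The sketch below assumes GNU dd and stat; extend_sparse and kb_to_4k_blocks are hypothetical helper names.

```shell
#!/bin/sh
# extend_sparse FILE SIZE_MB (hypothetical): grow FILE to SIZE_MB MiB as a
# sparse file, refusing to run if that would not make the file larger.
extend_sparse() {
    cur=$(stat -c %s "$1") || return 1
    want=$(( $2 * 1048576 ))
    if [ "$want" -le "$cur" ]; then
        echo "refusing: $2 MiB is not larger than the current size" >&2
        return 1
    fi
    # count=0 writes nothing; seek moves the end of file to SIZE_MB MiB.
    dd if=/dev/zero of="$1" bs=1M count=0 seek="$2" 2>/dev/null
}

# kb_to_4k_blocks N (hypothetical): convert 1 KiB blocks (as printed by fdisk)
# to whole 4 KiB file system blocks.
kb_to_4k_blocks() {
    echo $(( $1 / 4 ))
}

# Example: kb_to_4k_blocks 8188253
```

The size guard mirrors the warning in step 5: the desired size must be greater than the apparent size of the image.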

Install QEMU

Log in to the instance where you have attached and mounted the storage and install QEMU.

Installing QEMU on RHEL

  1. Build QEMU RPM and SRC RPM. First, you need the sources.
    user@host$ wget http://wiki.qemu.org/download/qemu-1.1.1.tar.bz2

  2. To build the RPM, you need a spec file.
    user@host$ cat > qemu-1.1.1.spec <<'EOF'
    Name            : qemu
    Version         : 1.1.1
    Release         : 1
    License         : LGPL, GPL
    Summary         : A generic and open source processor emulator.
    Group           : Applications/Emulators
    URL             : http://fabrice.bellard.free.fr/qemu/
    Vendor          : Fabrice Bellard <fabrice@bellard.org>
    Packager        : Your Name <recipient@domain.tld>
    BuildRoot       : %{_tmppath}/%{name}-buildroot
    Source0         : %{name}-%{version}.tar.bz2
    Requires        : gcc
    BuildRequires   : gcc
    AutoReq         : no
    AutoProv        : no
            
    %description
    QEMU emulator.
            
    %prep
    rm -rf %{buildroot}
            
    %setup -q
            
    %build
    ./configure --prefix=/usr
    make
            
    %install
    mkdir -p %{buildroot}/usr/lib/debug
    make install prefix=%{buildroot}/usr bindir=%{buildroot}/usr/bin \
    datadir=%{buildroot}/usr/share/qemu docdir=%{buildroot}/usr/share/doc/qemu \
    mandir=%{buildroot}/usr/share/man
            
    %clean
    rm -rf %{buildroot}
            
    %files
    %defattr(-,root,root)
    %doc Changelog COPYING COPYING.LIB LICENSE README README.distrib TODO VERSION
    %{_bindir}/qemu*
    %{_docdir}/qemu
    %{_mandir}/man1/qemu*
    %{_datadir}/qemu

    %changelog
    * Thu Aug 02 2012 Your Name <recipient@domain.tld> 1.1.1-1
    - qemu 1.1.1
            
    EOF

  3. Install the RPM build tools.

    On RHEL, use yum to install the RPM build tools.

    root@host# yum install rpmdevtools rpm-devel rpm-build
    		    

    On SLES, use zypper to install the RPM build tools.

    root@host# zypper install rpmdevtools rpm-devel rpm-build

  4. Build the RPMs. You can do so as a regular user (recommended) or as root.
    root@host# cp qemu-1.1.1.spec /usr/src/packages/SPECS
    root@host# cp qemu-1.1.1.tar.bz2 /usr/src/packages/SOURCES/qemu-1.1.1.tar.bz2
    root@host# rpmbuild -ba /usr/src/packages/SPECS/qemu-1.1.1.spec
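Step 4 recommends building as a regular user but shows the commands as root. One way to build without root is to point rpmbuild at a private build tree through the _topdir macro. The sketch below only creates the conventional directory layout (BUILD, RPMS, SOURCES, SPECS, SRPMS); make_rpm_tree is a hypothetical helper name and $HOME/rpmbuild is an assumed location, not something the article prescribes.

```shell
#!/bin/sh
# make_rpm_tree TOPDIR (hypothetical): create the directory layout that
# rpmbuild expects under TOPDIR.
make_rpm_tree() {
    for d in BUILD RPMS SOURCES SPECS SRPMS; do
        mkdir -p "$1/$d" || return 1
    done
}

# Example, run as a regular user (commented out; adapt paths to your system):
#   make_rpm_tree "$HOME/rpmbuild"
#   cp qemu-1.1.1.spec "$HOME/rpmbuild/SPECS/"
#   cp qemu-1.1.1.tar.bz2 "$HOME/rpmbuild/SOURCES/"
#   rpmbuild --define "_topdir $HOME/rpmbuild" -ba "$HOME/rpmbuild/SPECS/qemu-1.1.1.spec"
```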

Installing QEMU on SLES

  1. Find out what version of glib2 is installed.
    user@host$ sudo zypper info glib2 | grep Version
    
        Version: 2.22.5-0.4.3
    
    

  2. Now that you know the stock glib2 version, you need to get and build the same version. Download and unpack.
    user@host$ wget http://wiki.qemu.org/download/qemu-1.1.1.tar.bz2
    user@host$ wget http://zlib.net/zlib-1.2.7.tar.gz  
    user@host$ wget http://ftp.acc.umu.se/pub/GNOME/sources/glib/2.22/glib-2.22.5.tar.bz2
    user@host$ wget ftp://sourceware.org/pub/libffi/libffi-3.0.10.tar.gz
    user@host$ tar jxf qemu-1.1.1.tar.bz2
    user@host$ tar zxf zlib-1.2.7.tar.gz
    user@host$ tar jxf glib-2.22.5.tar.bz2
    user@host$ tar zxf libffi-3.0.10.tar.gz

  3. Build libffi.
    user@host$ cd libffi-3.0.10/
    user@host$ ./configure && make && sudo make install && cd ..

  4. Build zlib.
    user@host$ cd zlib-1.2.7/
    user@host$ ./configure && make && sudo make install && cd ..

  5. Build glib2.
    user@host$ cd glib-2.22.5
    user@host$ ./configure && make && sudo make install && cd ..

  6. Build QEMU.
    user@host$ cd qemu-1.1.1/
    user@host$ ./configure && make && sudo make install && cd ..
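The downloads in step 2 mix gzip (.tar.gz) and bzip2 (.tar.bz2) archives, and tar needs a different decompression option for each (z or j). A small wrapper that picks the option from the file extension avoids mixing them up. This is a sketch; tar_flag and unpack are hypothetical helper names.

```shell
#!/bin/sh
# tar_flag FILE (hypothetical): print the tar decompression option that
# matches FILE's extension, or fail for an unknown extension.
tar_flag() {
    case $1 in
        *.tar.gz|*.tgz)   echo z ;;
        *.tar.bz2|*.tbz2) echo j ;;
        *) echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

# unpack FILE (hypothetical): extract FILE into the current directory.
unpack() {
    flag=$(tar_flag "$1") || return 1
    tar "-x${flag}f" "$1"
}

# Example:
#   for a in qemu-1.1.1.tar.bz2 zlib-1.2.7.tar.gz glib-2.22.5.tar.bz2 libffi-3.0.10.tar.gz; do
#       unpack "$a" || break
#   done
```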

Boot the image

Before you execute QEMU to boot the image, start vncviewer for a new session. You also need to edit GRUB configuration lines to boot the image in single user mode. You have only 5 seconds to stop the image from booting.

  1. First, set up an SSH tunnel for VNC. You can use the ssh command or PuTTY. PuTTY is an open source telnet and SSH client.

    To forward the VNC port using SSH:

    user@host$ ssh -i <your.private.key> -L 5910:127.0.0.1:5901 idcuser@instance-R1

    To forward the VNC port using PuTTY:

    1. Start PuTTY. For hostname, enter the public IP address of your Linux instance.

      Figure 4. New PuTTY Session

    2. In the Category box, click SSH and select the Compression check box.

      Figure 5. Enable SSH Compression in PuTTY

    3. Expand SSH and click Auth. Load your encrypted .ppk key by clicking Browse and navigating to the location where your .ppk key is stored:

      Figure 6. PuTTY Auth

    4. Click Tunnels. Enter the Source port for local use, 5910, and the Destination to be forwarded, 127.0.0.1:5901.

      Figure 7. PuTTY port forwarding

    5. In the Category box, click Session and click Open.
  2. Add an iptables rule on R1 instance to allow VNC through your firewall.
    	user@host$ sudo /sbin/iptables -A INPUT -p tcp --dport 5900:5920 -j ACCEPT
    	user@host$ sudo /sbin/service iptables save

  3. Boot the image.

    For 64 bit:

    root@host# /usr/local/bin/qemu-system-x86_64 -vnc 127.0.0.1:1 -m 1024 \
    -drive file=<your.image.img>,if=virtio -daemonize

    For 32 bit:

    root@host# /usr/local/bin/qemu-system-i386 -vnc 127.0.0.1:1 -m 1024 \
    -drive file=<your.image.img>,if=virtio -daemonize
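The port numbers in the steps above fit together through the VNC convention that display N listens on TCP port 5900 + N. The QEMU option -vnc 127.0.0.1:1 therefore listens on port 5901 on the R1 loopback interface, which is the destination of the SSH tunnel from local port 5910. The sketch below spells out that arithmetic; vnc_port and tunnel_spec are hypothetical helper names.

```shell
#!/bin/sh
# vnc_port DISPLAY (hypothetical): TCP port used by a VNC display number.
vnc_port() {
    echo $(( 5900 + $1 ))
}

# tunnel_spec LOCAL_PORT DISPLAY (hypothetical): argument for "ssh -L" that
# forwards LOCAL_PORT to the given VNC display on the remote loopback address.
tunnel_spec() {
    echo "$1:127.0.0.1:$(vnc_port "$2")"
}

# Example:
#   ssh -L "$(tunnel_spec 5910 1)" idcuser@instance-R1
```

If you choose a different display number for QEMU, the tunnel destination has to change with it.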

Connect using VNC from your local workstation

Now you're ready to connect.

On UNIX, run vncviewer.

user@host$ vncviewer 127.0.0.1:5910

On Windows, you can use any of the free VNC Viewer programs, such as RealVNC.

On RHEL

You should now be able to see the image booting.

  1. Click inside the VNC window, press the space bar to stop the booting process, and then type e to edit.

    Figure 8. RHEL 5 - GRUB boot menu

  2. Select the kernel line and type e.

    Figure 9. RHEL 5 - GRUB - kernel line

  3. Remove the serial console parameters and add single to the end of the kernel line.

    Figure 10. RHEL 5 - GRUB - kernel edit

  4. Press the Enter key and then type b to boot. For example, edit the kernel line to boot the image in single user mode.
    kernel /vmlinuz-2.6.32-131.0.15.el6.x86_64 ro \
    root=UUID=252e009f-8fc9-455e-9e65-400e5baa413d \
    rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM \
    LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc \
    KEYTABLE=us crashkernel=auto rhgb quiet \
    console=ttyS1,9600 console=ttyUSB0,115200n8 elevator=deadline single

    Note: Remove the console= parameters (console=ttyS1,9600 and console=ttyUSB0,115200n8 in this example) before you boot.
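The edit made interactively in GRUB amounts to two text changes on the kernel line: drop every console= parameter and append the lowercase keyword single. The sketch below applies the same transformation to a line of text, which can be useful for previewing the result or for editing grub.conf on a mounted image. single_user_args is a hypothetical helper name, and the sketch assumes the kernel line is on a single line without backslash continuations.

```shell
#!/bin/sh
# single_user_args (hypothetical): read a GRUB kernel line on stdin, remove all
# console=... parameters, and append "single".
single_user_args() {
    sed -e 's/[[:space:]]\{1,\}console=[^[:space:]]*//g' \
        -e 's/[[:space:]]*$//' \
        -e 's/$/ single/'
}

# Example:
#   echo 'kernel /vmlinuz ro root=/dev/vda2 quiet console=ttyS1,9600' | single_user_args
```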

See Resources for a link to the Red Hat Enterprise Linux documentation for details about how to recover a RHEL instance.

Using the SLES GRUB boot menu

  1. Boot the image.

    Figure 11. SLES - GRUB edit - init=/bin/bash

    For more details on booting a SLES instance in recovery mode, see the official documentation.

  2. After booting the image using init=/bin/bash, set the password for idcuser. You are running as root at this point.
    root@host# passwd idcuser

  3. Start the machine in runlevel 3.
    root@host# init 3

  4. Log in as idcuser and enable the sshd service.
    user@host$ sudo /sbin/service sshd start
    user@host$ sudo /sbin/chkconfig sshd on

  5. Shut down the image booted with QEMU.

Import the image from storage/volume

You have almost completed the recovery process.

  1. After the image booted in QEMU is shut down, copy the FileValidation.sh script (found in the Rational Asset Catalog) to the image folder.
  2. Validate all the files inside the image folder. If validation fails, create a new manifest.mf file with FileValidation.sh script.
  3. After the .mf file is created, detach the storage from R1 instance.
  4. Import U1 image from the storage:
    user@host$ ic-import-image.sh -u [username] -g [auth file full path] -w [passphrase] 
        -v [volume id] -n [name of image]

  5. If the import is successful, a new private image is created in SCE > Control panel > Images tab.
  6. Create a child instance from this private image and try to access it. If the child instance is successfully provisioned, then you should be able to access it through ssh.

Mounting the image file using kpartx instead of using QEMU

In some cases, such as performing forensics or copying a public key to your authorized_keys file, mounting the exported image through a loopback device is sufficient. This section describes how.

The initial copy steps

The initial image copy steps are similar to those for the QEMU scenario:

  1. Capture a private image of the instance you are trying to recover.
  2. Create an ext3-formatted persistent storage volume, large enough to accommodate your image, in the same data center.
  3. Set the Copy Allowed attribute to Y in the Rational Asset Catalog entry that corresponds to your captured image.
  4. Obtain the Image ID of the captured image and the Storage Volume ID of the storage you've created from the WebUI, API, or command line tools.
  5. Use the ic-copy-to.sh command or the API to copy your image to storage.
  6. Attach this storage to an existing Linux instance, or create a new Linux instance and attach this storage to it.

Mount the exported image

  1. Connect through SSH to the Linux instance where your storage is attached, then mount the image.
    root@host# losetup /dev/loop0 /mnt/storage/image/<your.image.img>
    root@host# fdisk -l /dev/loop0
    
        Disk /dev/loop0: 4852 MB, 4852179968 bytes
        255 heads, 63 sectors/track, 589 cylinders
        Units = cylinders of 16065 * 512 = 8225280 bytes
        Sector size (logical/physical): 512 bytes / 512 bytes
        I/O size (minimum/optimal): 512 bytes / 512 bytes
        Disk identifier: 0x0003ee5f
        
        Device Boot      Start         End      Blocks   Id  System
        /dev/loop0p1   *           1         128     1024000   83  Linux
        Partition 1 does not end on cylinder boundary.
        /dev/loop0p2             128         590     3713433   83  Linux
        Partition 2 does not end on cylinder boundary.
    
    root@host# kpartx -av /dev/loop0
    
        add map loop0p1 (253:0): 0 2048000 linear /dev/loop0 2048
        add map loop0p2 (253:1): 0 7426866 linear /dev/loop0 2050048
    
    root@host# mkdir /mnt/myimage
    root@host# mount /dev/mapper/loop0p2 /mnt/myimage
    root@host# mount /dev/mapper/loop0p1 /mnt/myimage/boot
              

    You can list the partition table of <your.image.img> with fdisk -l /dev/loop0

  2. You have mounted your image.

    You may need to mount additional partitions (such as the ones you have on other disks). Adapt the steps above to your partitioning scheme to do so. Remember that partitions that are part of LVM require extra steps (pvs, lvdisplay, vgscan, etc.) before they can be mounted. If your root partition is already mounted, you can re-use the image's own mount points, prefixed with the mount point on the host (for example, /mnt/myimage/boot).

    You can use the chroot command to use your image's tools and binaries:

    1. Mount /proc /sys /dev and /dev/pts in your image.
      root@host# mount -o bind /proc /mnt/myimage/proc
      root@host# mount -o bind /sys /mnt/myimage/sys 
      root@host# mount -o bind /dev /mnt/myimage/dev
      root@host# mount -o bind /dev/pts /mnt/myimage/dev/pts

    2. Now you are ready to use the chroot command.
      root@host# chroot /mnt/myimage

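    3. Check the OS version. Note that uname -a reports the kernel of the host that is currently running, not the kernel installed in the image.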
      user@host$ cat /etc/issue
      user@host$ uname -a

    4. Fix whatever issues you may have, then exit your shell.
      exit

      You have access to all your OS tools, including yum, zypper, etc. Remember to resize your partition prior to mounting if required.

    As a shortcut, you can name the shell explicitly when you enter the image: chroot /mnt/myimage/ /bin/bash.

  3. When you are done, unmount the image. Remember to exit chroot if required. If you've mounted special devices for your chroot, unmount those first.
    root@host# umount /mnt/myimage/proc/
    root@host# umount /mnt/myimage/dev/pts
    root@host# umount /mnt/myimage/dev
    root@host# umount /mnt/myimage/sys

  4. Unmount other partitions using your mount points first. For example:
    root@host# umount /mnt/myimage/boot/

  5. Unmount your root image.
    root@host# umount /mnt/myimage/

  6. Delete partition mappings and the loop device.
    root@host# kpartx -dv /dev/loop0
        del devmap : loop0p2
        del devmap : loop0p1
    
    root@host# losetup -d /dev/loop0

  7. Re-create the .mf file containing the SHA1SUM codes, by running the FileValidationTool.sh. See the Creating and customizing images guide in the directory /mnt/myimage/image.
  8. Detach the persistent storage volume and import your fixed image back in the cloud.
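
The order of steps 3 to 6 matters: nested mounts must be released before their parents, and the partition mappings only after the last unmount. The following is a minimal sketch, assuming the /mnt/myimage mount point used above. The hypothetical helper unmount_order reads mount points on standard input and prints the ones at or below a given root in child-before-parent order. It only prints paths; it never calls umount itself.

```shell
#!/bin/bash
# unmount_order ROOT: read mount points (one per line) on stdin and print
# those at or below ROOT so that nested mounts come before their parents.
# Hypothetical helper; it only prints paths.
unmount_order() {
  local root="${1%/}"
  # Keep ROOT itself and anything that starts with "ROOT/". A reverse
  # byte-wise sort then lists /a/b/c before /a/b before /a.
  awk -v root="$root" '$0 == root || index($0, root "/") == 1' | LC_ALL=C sort -r
}

# Example against the live mount table (second field of /proc/mounts):
#   awk '{print $2}' /proc/mounts | unmount_order /mnt/myimage
```

Reviewing the printed list before unmounting is a cheap way to catch a bind mount that was added for the chroot and then forgotten.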

In conclusion

In this article, we showed you how to recover an inaccessible Linux instance by capturing a private image, copying it to persistent storage, using QEMU to boot a Linux image attached to your persistent storage device, fixing the problem, and then importing the image back into the cloud. Plus, we showed an alternative -- mounting the image file using kpartx. These processes can help you get back on track, but they do require time, patience, and basic Linux knowledge.


Appendix A: Storage state code and meaning

This table lists storage state codes and meanings.

Storage state code    Meaning
0                     New
1                     Creating
2                     Deleting
3                     Deleted
4                     Detached
5                     Attached
6                     Failed
7                     Deletion pending
8                     Being cloned
9                     Cloning
10                    Attaching
11                    Detaching
12                    Copying
13                    Importing
14                    Transfer Retrying
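
When you script against the API, a small lookup makes the numeric states readable in logs. The following is a minimal sketch that encodes the table above; the function name storage_state_name is hypothetical and is not part of the SmartCloud command line tools.

```shell
#!/bin/bash
# storage_state_name CODE: print the meaning of a storage state code,
# using the values listed in Appendix A. Hypothetical helper name.
storage_state_name() {
  case "$1" in
    0)  echo "New" ;;
    1)  echo "Creating" ;;
    2)  echo "Deleting" ;;
    3)  echo "Deleted" ;;
    4)  echo "Detached" ;;
    5)  echo "Attached" ;;
    6)  echo "Failed" ;;
    7)  echo "Deletion pending" ;;
    8)  echo "Being cloned" ;;
    9)  echo "Cloning" ;;
    10) echo "Attaching" ;;
    11) echo "Detaching" ;;
    12) echo "Copying" ;;
    13) echo "Importing" ;;
    14) echo "Transfer Retrying" ;;
    *)  echo "Unknown state: $1" >&2; return 1 ;;
  esac
}
```

For example, storage_state_name 12 prints Copying, and an unlisted code returns a non-zero status so that a calling script can stop.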


Appendix B: Rational Asset Manager REST API

Rational Asset Manager provides REST APIs that enable you to perform transactional operations on assets and artifacts.


Figure 12. Rational Asset Manager Rest Model - Search
Rational Asset Manager Search API

Appendix C: Using the Rational Asset Manager REST API

Here be dragons! Feel free to turn back and use the WebUI instead. It's easy to make mistakes, so please take heed.

  1. Set the OWNER, PASSWORD, LOGIN, and BASE_URL variables for ease of use.

    Skip this step if you already defined these variables as described in the section "Export the image to a storage and mount it." Steps 4 through 7 also expect the RAM_BASE_URL variable to be set.

    user@host$ export OWNER=<your.ibm.id@domain.tld>
    user@host$ read PASSWORD
    user@host$ export LOGIN="$OWNER:$PASSWORD"
    user@host$ export BASE_URL=\
    https://www-147.ibm.com/computecloud/enterprise/api/rest/20100331/
                        

  2. Find the ID of the saved image. Note: We had to install a newer version of libxml2, as shown in the following commands, to use the --xpath option in xmllint. As a best practice, you might want to build packages as a regular user instead of as root.
    root@host# yum install tidy rpm-build rpmdevtools xz-devel python-devel
    root@host# rpm -ivh ftp://xmlsoft.org/libxml2/libxml2-2.8.0-1.src.rpm
    root@host# cd ~/rpmbuild/SPECS/
    root@host# rpmbuild -ba libxml2.spec
    root@host# rpm -Uvh /root/rpmbuild/RPMS/x86_64/libxml*
                        

    We used an XPath expression to print only our own PRIVATE images.

    user@host$ curl -s -k -H 'Accept: application/xml' \
    -u $LOGIN $BASE_URL/offerings/image \
    |xmllint --xpath \
    "//Image[Visibility=\"PRIVATE\" and Owner=\"$OWNER\"]/ID \
    | //Image[Visibility=\"PRIVATE\" and Owner=\"$OWNER\"]/Name" - \
    | tidy -xml -q
                            
    <ID>20059014</ID>
    <Name>RHEL-58-x32-COPPER-IMG1</Name>
    <ID>20059016</ID>
    <Name>RHEL-62-BRONZE</Name>
    <ID>20059017</ID>
    <Name>SLES-11SP1-X64 7/23/12 2:38 AM</Name>
                        

  3. Set the IMAGEID variable.
    user@host$ export IMAGEID=<your.image.id>

  4. Get the GUID of the image asset and percent-encode its { and } characters as %7B and %7D.
    user@host$ export GUID=$(curl -s -u $LOGIN \
    $RAM_BASE_URL'/search?q=({owner:("'$OWNER'")id\:("'$IMAGEID'")})' \
    | xml_grep -t //GUID)
                            
    user@host$ export RAMGUID=$(echo $GUID \
    | sed -e 's/{/%7B/g' | sed -e 's/}/%7D/g')

    Note: You can use standard Linux tools like grep, awk, sed, etc. if you do not have xml_grep or any XPath tools available. For example:

    RAMGUID=%7B`curl -s -u $LOGIN \
    $RAM_BASE_URL'/search?q=({owner:("'$OWNER'")id\:("'$IMAGEID'")})' \
    | grep '/GUID' | awk -F{ '{print $2}' \
    | awk -F} '{print $1}'`%7D

  5. Save the XML for editing.
    curl -s -u $LOGIN "$RAM_BASE_URL/assets/$RAMGUID/1.0" > $IMAGEID.xml

  6. Edit the XML using your favorite text editor: vim $IMAGEID.xml. Look for the //ram:attributeValue//dce:title element with the text "Copy Allowed". Change the following value element to Y.
    <ram:attributeValue>
    <attribute rel="related" href="internal/attributes/classif/assetTypesSchema.xmi
    %23copy_allowed.xml">
    <dce:title>Copy Allowed</dce:title>
    </attribute>
    <value>Y</value>
    </ram:attributeValue>

  7. Now you are ready to update the asset.
    curl --header "Content-Type:application/xml" -X PUT -u $LOGIN -d @$IMAGEID.xml \
    "$RAM_BASE_URL/assets/$RAMGUID/1.0"

    Note: If you want to update the asset again, you need to download a new version of the asset XML as described in Step 5.
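
The edit in step 6 can also be done without an interactive editor. The following is a minimal sketch, not part of the Rational Asset Manager tooling: the hypothetical helper set_copy_allowed prints the asset XML with the value that follows the "Copy Allowed" title set to Y. It assumes that the file is pretty-printed with one element per line, as in the excerpt in step 6. If the downloaded XML arrives on a single line, reformat it first or edit it by hand.

```shell
#!/bin/bash
# set_copy_allowed FILE: print FILE with the <value> that follows the
# "Copy Allowed" title changed to Y. Hypothetical helper.
# Assumption: the title and the value are on separate lines, as in the
# excerpt in step 6; other <value> elements are left untouched.
set_copy_allowed() {
  awk '
    # Remember that the Copy Allowed attribute has just been seen.
    /<dce:title>Copy Allowed<\/dce:title>/ { pending = 1 }
    # Rewrite only the next <value> element, then stop looking.
    pending && /<value>[^<]*<\/value>/ {
      sub(/<value>[^<]*<\/value>/, "<value>Y</value>")
      pending = 0
    }
    { print }
  ' "$1"
}

# Example: write the result to a new file and compare before uploading.
#   set_copy_allowed "$IMAGEID.xml" > "$IMAGEID.new.xml"
#   diff "$IMAGEID.xml" "$IMAGEID.new.xml"
```

Writing to a separate file keeps the downloaded original intact, so the diff shows exactly one changed line before you send anything back with PUT.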


Appendix D: Using the SmartCloud REST API

You can use the SmartCloud REST API with the Linux curl tool to perform the same actions as you would perform using the WebUI or command line tools.

SmartCloud REST API: Copy an image to storage (ic-copy-to)

You can perform the same operation as the ic-copy-to.sh command line tool using the REST API. On Linux, you can use the curl command. An example is detailed below. Note that you need to replace the LOGIN, IMAGE, and STORAGE values with your own.

These exported variables are used multiple times throughout this document, so run all API calls in the terminal where you exported them.

user@host$ export LOGIN=<your.ibm.id@domain.tld:your.password>
user@host$ export BASE_URL=https://www-147.ibm.com/computecloud/enterprise/api/rest\
/20100331
user@host$ export IMAGE=<your.image.id>
user@host$ export STORAGE=<your.storage.id>
user@host$ curl -s -u $LOGIN -X PUT -d "imageId=$IMAGE" --location-trusted \
$BASE_URL/storage/$STORAGE

If you did not set the Copy Allowed flag in RAM, you receive the following error: Error 501: CopyTo feature is not allowed for this image

SmartCloud REST API: Verify that your image was copied to storage (ic-describe-volume)

You can perform the same query as the ic-describe-volume.sh command line tool using the REST API. On Linux, you can use the curl command. An example follows; it reuses the LOGIN, BASE_URL, and STORAGE variables exported in the previous section.

user@host$ curl -s -k -H 'Accept: application/xml' \
-u $LOGIN $BASE_URL/storage/$STORAGE \
| tidy -xml -q | grep "State"

    <State>12</State>

Refer to the "Storage state code and meaning" section of the API User's Guide. 12 means: Copying. Wait for <State>4</State> which means "Detached".
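
The tidy and grep pipeline above prints the whole State line. When a script needs only the number, a short sed filter is enough. The following is a minimal sketch; the function name extract_state is hypothetical, and it assumes the response contains a single State element, as in the output above.

```shell
#!/bin/bash
# extract_state: read a storage description (XML) on stdin and print the
# number from the first line that contains a <State> element.
# Hypothetical helper; prints nothing if no <State> element is found.
extract_state() {
  sed -n 's:.*<State>\([0-9][0-9]*\)</State>.*:\1:p' | head -n 1
}

# Example, reusing the variables exported earlier in this appendix:
#   curl -s -k -H 'Accept: application/xml' \
#     -u "$LOGIN" "$BASE_URL/storage/$STORAGE" | extract_state
```

Because the filter prints nothing when the element is missing, a calling script can treat an empty result as an error, for example an authentication failure that returned an HTML page instead of XML.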

SmartCloud REST API: Dynamic Disk Attach (ic-attach-volume and ic-describe-volume)

First, you need to define the following variables for ease of use:

user@host$ export INSTANCEID=<your.instance.id>
user@host$ export STORAGEID=<your.storage.id>
user@host$ export DATA="type=attach&storageID=${STORAGEID}"

Attach your disk:

user@host$ curl -s -u $LOGIN -X PUT -d "$DATA" --location-trusted \
$BASE_URL/instances/${INSTANCEID}

Check the status:

user@host$ curl -s -k -H 'Accept: application/xml' \
-u $LOGIN $BASE_URL/storage/$STORAGEID \
| tidy -xml -i -q | grep State

    <State>10</State>

In this example, the volume is in state 10 (Attaching). When the state becomes 5, the volume is attached. If it becomes 4, the volume is detached and the attach operation failed.
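
Checking the status by hand gets tedious, so the wait can be wrapped in a loop. The following is a minimal sketch under stated assumptions: wait_for_attach is a hypothetical helper, its first argument is any command that prints the current numeric state, and the default interval and retry count are arbitrary values, not recommendations from the API User's Guide.

```shell
#!/bin/bash
# wait_for_attach CMD [INTERVAL] [MAX_TRIES]
# CMD is any command that prints the current numeric storage state.
# Returns 0 on state 5 (Attached), 1 on state 4 (Detached, attach failed),
# and 2 if neither state is reached after MAX_TRIES polls.
# Hypothetical helper; the defaults (30 seconds, 20 tries) are arbitrary.
wait_for_attach() {
  local cmd="$1" interval="${2:-30}" tries="${3:-20}" state="" i
  for ((i = 0; i < tries; i++)); do
    state=$($cmd)
    case "$state" in
      5) echo "attached"; return 0 ;;
      4) echo "detached: attach failed" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "gave up in state ${state:-unknown}" >&2
  return 2
}

# Example: define a function that prints the State number for $STORAGEID
# (for instance with curl and a sed filter), then pass its name:
#   wait_for_attach get_storage_state 30 20
```

Passing the state command as an argument keeps the loop independent of how the state is fetched, so the same function works with the REST call or with ic-describe-volume.sh output.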


Resources

Learn

Get products and technologies

  • You can use QEMU, an open source processor emulator, to boot SmartCloud images directly on cloud instances.

  • Download PuTTY, an open source SSH and telnet client.

  • You can use RealVNC to connect to and control a computer remotely.

  • Libvirt is a set of tools for interacting with hypervisors like KVM, Xen, and others.

  • TightVNC, a free remote control software package, can be used to connect to the QEMU VNC display.


Zone=Cloud computing, Linux
ArticleID=841268
ArticleTitle=Recover inaccessible instances using QEMU
publish-date=10182012