Virtualization best practices


This Wiki page is to aid in the sharing of good practices, hints and tips. This Best Practice guide is a constant "work-in-progress" and a community-based self-help document. If you need some guidance, take a look; if you have a really good idea that can help others, please add it below:

Virtualization is a large subject, so this section assumes you know the basics and have at least done your homework by reading the two Advanced POWER Virtualization Redbooks.

This guide has four sections

  1. Pre-virtualization
  2. Planning and design
  3. Implementation
  4. Management/Administration

Pre-virtualization

  • Skill-up on Virtualization
    • The AIX Wiki Virtualisation page (above this page) refers you to the best manuals, Redbooks and white papers
    • You need to invest time and practice before starting a Virtualisation implementation because misunderstandings can cost time and effort to sort out - the old saying "do it right the first time" applies here.
    • There is no quick path
  • Certification
    • Also on the above Wiki page are details of Virtualization Certification - I have not gone through this, but it sounds like a good idea to me.
  • Assess the current IT environment
    • Evaluate the current hardware, applications and operating systems to understand the opportunity - this is outside the scope of this Wiki page, but an up-to-date inventory is very useful
  • Assess the virtualization skills on hand
    • It is not recommended to start virtualization with only one trained person due to the obvious risk of that person becoming "unavailable"
    • Some computer sites run with comparatively low-skilled operations staff and bring in consultants or technical specialists for implementation work - in which case, you may need to check the skill levels of those people
  • Determine appropriateness of a virtualization solution
    • If the applications are running under AIX 4.3 (or older) or AIX 5.1, will they run unchanged on AIX 5.3, which is required for true virtualisation? What are the pros and cons of upgrading the applications and the operating system? In some situations, you may feel these legacy systems are not worth the effort to upgrade and they will be run to their natural end of life. The counter argument is that removing the support and maintenance, power and cooling, space and manpower costs of older systems is an overriding consideration. Also note that in a virtualized environment, the CPU, memory, network and disk resources required by old applications can be harvested for other uses.
  • Practice makes perfect
    • In the pSeries range, the same virtualization features are available from top to bottom - this makes having a small machine on which to practice, learn and test a realistic proposition. The current bottom of the line p505 is available at relatively low cost, so if you are preparing to run virtualisation on larger machines, you can get experience for a tiny cost.
    • Also, at many sites, machines in the production computer room have to undergo strict "lock down" and process management - a small test machine does not have to be run this way, and I have seen system administrators and operations run a "crash and burn" machine under their desk to allow more flexibility.
  • Decide which Virtualisation style you are going to use before starting
    • Small box: OP 710, OP720, or p5-510, 520, 550 = limited adapter slots. One set of disks, or split SCSI disks in two 4-packs = 2 LPARs. VIO Server (0.5 CPU) + 4 to 6 clients (0.1 to 1 CPU each). Typically, clients have a 4 GB vSCSI disk and 1 vEthernet.
    • Mid-range with extras: saves buying an extra box. 2 to 3 larger dedicated-CPU and dedicated-I/O production LPARs. VIO used for "bits and bobs" LPARs like test, dev, training and upgrades. Typically, clients have a few 4 GB vSCSI disks and 1 or 2 vEthernets.
    • Ranch: lots of small server consolidation work from little old machines. 10 to 20 clients - small apps but not high demand (0.2, 0.5, up to 2 CPUs). VIO has 1 or 2 CPUs, probably RAID 5 disks, maybe even SAN disks. Typically, clients have a couple of 4 GB vSCSI disks and 1 vEthernet.
    • Serious production: I/O is set up only once, which reduces setup and management. The VIO Server has SAN disks (4+ adapters/paths) and 2 EtherChannels. The VIO Server handles load balancing and fail-over, so the clients have a much simpler setup. VIO has 1 to 3 CPUs and the clients are large (1 to 8 CPUs) running large applications. Typically, clients have 100s of GB of vSCSI disk and many vEthernets.
    • Dual Server: as above, with a 2nd VIO Server for availability/throughput/updates.
    • Key to the (missing) example diagrams:
      • Dedicated I/O - a Logical Partition with its own network and disk adapters
      • Virtual I/O Server - has the network and disk adapters it shares to its clients
      • Virtual I/O Client - has no network or disk adapters of its own but uses the VIO Servers

When to go Dual Virtual IO Server and when not to?

This is impossible to answer, but here are a few thoughts:

  • The Virtual IO Server is running its own AIX internal code like the LVM and device drivers (virtual device drivers for the clients and real device drivers for the adapters). Some may argue that there is little to go wrong here. Adding a second Virtual IO Server complicates things, so only add a second one if you really need it.
  • Only add a second Virtual IO Server for resilience if you would normally insist on setting up a high-availability environment. Typically, this would be on production machines or partitions. But if you are going to have an HACMP setup (to protect from machine outage, power supply, computer room or even site outage), then why would you need two Virtual IO Servers? If the VIO Server fails, you can do an HACMP fail-over to the other machine.
  • If this is a less-critical server, say one used for developers, system test and training, then you might decide the simplicity of a single VIO Server is OK, particularly if these partitions have scheduled downtime for updates to the Virtual IO Server. Plan on scheduled maintenance. Also note that the VIO Server and VIO clients start quickly, so the downtime is far less than with older standalone systems.
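If you do decide to go dual, it is worth confirming from each client that its virtual disks really do have a path through both Virtual IO Servers. A minimal check on an AIX client using MPIO over two vSCSI adapters (hdisk0, vscsi0 and vscsi1 are illustrative device names for your own setup):

    # list the MPIO paths for the client boot disk
    lspath -l hdisk0
    # expect one Enabled path per vSCSI adapter, for example:
    #   Enabled hdisk0 vscsi0
    #   Enabled hdisk0 vscsi1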

Planning and design

  • Size virtualization solutions.
    • See the VIOS Sizing web page for advice
    • With virtualization you have added degrees of freedom
      • The minimum size is smaller than before Micro-partitions:
        Resource | Minimum        | Comment
        CPU      | 10% of one CPU | up to the maximum in the machine
        Memory   | 256 MB         | you can go a little smaller but it is not recommended / practical
        Ethernet | none           | use the shared Ethernet in the Virtual IO Server
        Disk     | none           | part of the Virtual IO Server's disk - 4 GB is a sensible minimum
      • You can change the size of an LPAR quickly
      • CPU, in particular, can be borrowed temporarily for peak processing loads which makes sizing less risky
    • In practice you rarely have the information you need
      • If this application is currently running elsewhere, evaluate the current configuration, then understand the current utilisation (it might have 4 CPUs but only be using 1 CPU during the peaks) and from that estimate the Micro-partition size (see the example measurement commands after this list)
      • If this application is new, you might be able to work from application sizing information from the vendor, or compare and contrast with other implementations
      • If not, you are going to have to make a guesstimate - document your assumptions
      • If you do have lots of precise and useful information, then the APV Redbooks include calculations on working out the sizing accurately
    • "Get out of Jail free" card
      • One of the great advantages of virtualization is that you have effective and fast means to make changes to configurations that are painful with stand alone machines.
      • If you underestimate one LPAR and overestimate another, the p5 virtualisation will sort this out and rebalance the CPU use in 10 milliseconds :-)
  • Optimize LPAR configuration
    • Consider isolation, bandwidth, availability/redundancy, cost, physical space, flexibility and DLPAR.
    • You need to build an LPAR planning spreadsheet, for example:
      Resource      | LPAR1  | LPAR2 | LPAR3 | LPAR4 | Total | VIO Server
      CPU           | 3      | 0.5   | 2.5   | 0.1   | 6.6   | 1
      Dedicated CPU | yes    | no    | no    | no    | n/a   | no
      Memory GB     | 8      | 2     | 4.5   | 0.5   | 15    | 1
      Ethernet      | 2 D    | 0.1   | 0.1   | 0.1   | 2.3   | 1
      Disk GB       | 5000 D | 80    | 125   | 16    | 5221  | 300
    • The "D" above means Dedicated, i.e. the Virtual IO Server is not supplying this resource.
    • Don't worry about things like Entitled Capacity, Capped or Uncapped, Weight, Min and Max Virtual CPUs at this stage
  • Re-configure existing systems.
    • If you are migrating existing systems, you should upgrade them before the move to Micro-partitions on POWER5 - be aware that you may also need to upgrade your firmware before upgrading AIX - check with your AIX Support channel first. The same applies to Linux, of course. Once they are running software levels that can be migrated to the new machine, it will be easier to transfer them to the new machine and LPAR. AIX, of course, can use the excellent mksysb tool. Linux will have to move via installing a base system and then overwriting it with a backup.
  • Ensure application compatibility.
    • As you may be running a new or the latest AIX or Linux release, a careful check of the support status of every application is vital.
    • Produce a clear, documented inventory of all software and unusual hardware (i.e. other than SCSI or SAN disks and Ethernet) - get this signed off by all parties involved.
    • Your prime applications, web servers, web applications and databases are obvious, but it can be easy to forget items like backup utilities, data migration and load tools, and security software.
  • Plan/model for growth
    • Once you have the entire list of Logical Partitions - you can consider growth factors and buying a machine that has the Capacity Upgrade on Demand to allow painless upgrading when you need extra resources in the future.
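As noted in the sizing bullets above, when an application already runs elsewhere it pays to measure its real peak utilisation rather than trusting the installed capacity. A minimal way to capture this on an existing AIX system over an hour (the interval and sample counts are only examples):

    # CPU utilisation, one sample per minute for an hour
    sar -u 60 60
    # on an existing POWER5 LPAR, lparstat also shows the entitlement actually consumed
    lparstat 60 60
    # memory and paging activity over the same period
    vmstat 60 60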

Do not forget the standard migration and consolidation items below:

  • Develop an implementation plan.
  • Develop a solutions assurance plan.
  • Use utilization management tools (like PLM, uncapped processors, eWLM).
  • Design and document the disk sub systems and connectivity.
  • Design and document the network and connectivity.

Common mistakes

  1. Heavy Clients

    If you have a logical partition (LPAR) that will saturate a network adapter then it needs careful handling in a virtualized environment, because it will affect the other LPARs sharing that network adapter in the Virtual IO Server. Fortunately, we have lots of options:
    • Give the LPAR its own network adapter so other LPARs are not affected.
    • Use a dedicated network on the Virtual IO Server just for this LPAR
    • Make sure you have EtherChannel on the Virtual IO Server to make the bandwidth so large that it can cope with the high demand (see the Shared Ethernet Adapter sketch after this list)
  2. Priorities in Emergencies

    Network I/O is very high priority (dropped packets due to neglect require retransmission and are thus painfully slow) compared to disk I/O, because the disk adapters will just finish their current work and then sit and wait if neglected due to high CPU loads. This means that if a Virtual IO Server is starved of CPU power - something that should be avoided - it will deal with the network as a priority. For this reason some people consider splitting the Virtual IO Server into two: one for networks and one for disks, so that disks do not get neglected. This is only a worst-case scenario, and we should plan to guarantee this starvation does not happen.
  3. Bigger is better - but that was always true

    Virtualization moves the device drivers for real hardware adapters into the Virtual IO Server (in the clients they are replaced by drivers for virtual devices that are much simpler). This means the CPU required to drive the devices is now much more visible. Many people then run tests to see how much CPU power the Virtual IO Server takes, and they find that large network packets and large disk blocks are much more efficient. While this is true - and if at all possible you should use large packets and disk transfers - the same is true for regular, non-virtual networks and disks. Virtualization may allow more scope for implementing these features - for example, large network packets between LPARs over purely internal virtual networks.
  4. Do not Skimp on memory

    Many sizing people use a Rule of thumb for POWER5 CPU and memory for a balanced system
    • 4 GB per CPU for cacheable workloads (like RDBMS)
    • 2 GB per CPU for compute intensive

      Now, for example, a single image tight on money = 2 GB/CPU, so a four CPU sizing needs 8 GB. Let us take an extreme example for Micro-partitions to make the point, with the theoretical maximum of 40 LPARs on 4 CPUs. A sensible sizing might be a minimum of 512 MB per LPAR (or 1 GB). This results in a total of 20 GB of memory. A more realistic machine would be somewhere between 8 GB and 20 GB, but it is easy with lots of LPARs to size too little memory, particularly with 1 or 2 CPU machines - my suggested minimum is 4 GB on any machine.
  5. Unnecessarily large maximum memory setting on LPARs

    The Page Table used for managing virtual memory is one 64th of the LPAR maximum memory. If you specify large maximums then you waste RAM (currently Linux does not support dynamic RAM changes, so that RAM is a total waste). For example, if the LPAR settings are
    • RAM min = 0.5GB - OK
    • RAM desired = 1 GB - OK
    • RAM max = 16 GB - BAD, as the Page Table = 256 MB of RAM (= 16 GB / 64) - do this on 8 LPARs and 2 GB of memory appears to be missing!!
  6. Virtual IO Server below a whole CPU

    For excellent Virtual IO Server responsiveness, giving the VIO Server a whole CPU is a good idea, as it results in no latency waiting to get scheduled onto a CPU. But on small machines, say 4 CPUs, this is a lot of compute power compared to the VIO client LPARs (i.e. 25%). If you decide to give the VIO Server, say, half a CPU (Entitled Capacity = 0.5) then be generous: never make the VIO Server capped, and give it a very large weight factor (see the example command after this list).
  7. "You don't get a quart out of a pint pot"

    If you share a single Ethernet adapter or disk drive among lots of VIO Server client partitions and they are all busy, then you risk all the partitions apparently running slowly. Booting one or two LPARs from a single disk will work fine, but if you do the same with 40 partitions, each will think the disk is 40 times slower than normal and the performance hit will be clearly noticeable. The same is true if LPARs are busy on the network, for example more than one transferring data with FTP. If LPARs do not have enough memory, they are also likely to page a lot, which generates further disk load. You need to factor these in, but what can you do? It is strongly recommended that rather than plain SCSI disks you purchase a caching RAID5 SCSI controller to boost your disk I/O capabilities. And go for good Ethernet connections: a 1 Gbit adapter is far better than, say, two or three 100 Mbit ones.
  8. Bad candidates for Virtualization (i.e. give them dedicated adapters)
    1. 40 micro web servers where 1 larger web server will do nicely - i.e. don't go Virtual if there is no real need
    2. Systems with real-time response criteria or strong quality of service requirements
    3. Applications relying on polling (will not yield CPU time)
    4. Solutions with high CPU usage and relatively constant demands, for example HPC and DSS - these are famous for soaking up all resources, and other LPARs would suffer
    5. Applications with high I/O demands (use real adapters)
  9. You are allowed to mirror the rootvg disk of the Virtual IO Server, but you are not allowed to mirror the logical volumes on the VIO Server that are client virtual disks. This limitation is often overlooked - it will initially work, but it is not supported, and when there are problems it may not work as you might expect. Best to avoid this one.
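For the heavy network clients in mistake 1, one common approach is to build the Shared Ethernet Adapter on the Virtual IO Server on top of an EtherChannel / Link Aggregation device so the bridge has plenty of bandwidth. A minimal sketch of the SEA step on the VIOS command line (ent4 is assumed to be a Link Aggregation device already created over two physical Gigabit adapters with mkvdev -lnagg, ent2 is the virtual Ethernet adapter, and PVID 1 is illustrative - check your own device names with lsdev):

    # bridge the aggregated physical adapter (ent4) to the virtual Ethernet adapter (ent2)
    mkvdev -sea ent4 -vadapter ent2 -default ent2 -defaultid 1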
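And for mistake 6, if the Virtual IO Server gets less than a whole CPU it should be uncapped with a high weight, so it can grab spare cycles whenever the clients drive I/O. A hedged example of changing an existing profile from the HMC command line (SERVERNAME, the profile name "normal" and the partition name vios1 are placeholders; the change takes effect the next time the partition is activated with that profile):

    chsyscfg -r prof -m SERVERNAME -i "name=normal,lpar_name=vios1,sharing_mode=uncap,uncap_weight=255"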

Implementation

  • Plan, plan and then plan again - do not start creating logical partitions until you have a fully documented setup, probably in a spreadsheet, so you can check it all adds up.
  • In most large installations the configuration is an iterative process that will not quite match the initial design, so some modifications may have to be made.
  • Also, opportunities may appear to add flexibility with a pool of resources that can be assigned later, once real-life performance has been monitored for a few weeks.
  • This Wiki page is not meant to cover the whole of server consolidation or large server setup, HMC configuration, etc., but just virtualisation
  • People tend to fall into one of three categories in the setup
    1. Those that want to name or rename every resource for clarification and explanation, to make life simpler later (for example the names of Logical Partitions, profiles, Logical Volumes, etc.). These tend to get into long, drawn-out discussions of naming conventions and processes over weeks.
    2. Those that use default names and keep careful records of which resource belongs to which - for example the virtual disks that map to each LPAR.
    3. Those that use the defaults and then examine the HMC as the definitive definition of the resources and assignments.
    • There is no right answer, and as the machines I deal with are more "crash and burn" demonstration machines than production, my experience is not a good starting point for production best practice (I use method 3 :-).
  • Some planning is required to ensure installation occurs in a functional order - this may be driven by project priority, by easy wins first, or by when other resources like SAN disks or EtherChannel connections become workable.
  • Many sites have pre-defined platform releases - these should be tested as VIO Server clients before the main installation to ensure it all works OK.
  • Many people prefer to create lots of VIO Server resources with plenty of spares that can be used later on - this avoids having to dynamically create them later and then dynamically set them up; it includes spare, pre-"plumbed-in" virtual SCSI and Ethernet adapters.
  • Some low-level and simple tests are worth documenting to prove each VIO Server client, for example (scripted as a rough sketch after this list):
    1. Check and set if needed the time and date
    2. Clear the error logs
    3. Ping and telnet on each network to remote machines to prove the connection
    4. Create a large file with dd if=/dev/zero of=/tmp/xyz bs=128k count=1024 ; sync ; rm /tmp/xyz
    5. and then check the error log
  • I have seen production machines shutdown by simply halting the VIO Server - this is highly NOT recommended
    • The VIO Server clients just see this as a complete and utter failure of all their disks and networks.
    • AIX and Linux on POWER are robust and will clean up and recover from these failures when rebooted.
    • They will not continue operating once the VIO Server is restarted - they will need to be stopped and restarted.
    • So you might as well shut the clients down before the VIO Server.
    • Note: to get Linux on POWER to shutdown from the HMC option you need to install the RAS RPM's found at http://techsupport.services.ibm.com/server/lopdiags
  • Assuming you are using Shared Processor Partitions (what IBM marketing insisted be called Micro-Partitions), you will be creating partitions from a shared processor pool. There are different ways to manage the assignment of CPU cycles. For example:
    • Assigning large CPU Entitlements based on expected use
    • Assigning low CPU Entitlements and assigning weighting factors based on expected use and priorities
    • This is covered in topics (near the end) of the IVM Advanced Topics Wiki page at this Link
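The simple client checks listed above can be wrapped into a small script and run on each new VIO client. A rough sketch for an AIX client (remotehost is a placeholder; adjust the networks and file size to your own setup):

    # check, and if needed set, the date and time
    date
    # clear the error log so new problems stand out
    errclear 0
    # prove each network connection to a remote machine
    ping -c 3 remotehost
    # drive some I/O through the vSCSI disk, then tidy up
    dd if=/dev/zero of=/tmp/xyz bs=128k count=1024 ; sync ; rm /tmp/xyz
    # finally, check that nothing new has appeared in the error log
    errpt | head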

Management/Administration

  • Maintain VIO server software
    • New and very useful functions appear in the latest VIO Server software, which makes updating it worthwhile (see the example commands after this list)
    • Take careful note that this may require firmware updates too; it is worth scheduling these in advance of the VIO Server software updates
    • There are also fixes for the VIO Server to overcome particular problems
  • It is worth making a "read only" HMC or IVM user account for people to take a look at the configuration and know they can't "mess it up".
  • I often get people claiming that their Virtual I/O resource is not available when they create a new LPAR, and 90% of the time it is due to mistakes on the HMC. The IVM automates these setup tasks and is much easier. Also, recent versions of the HMC software cross-check that the VIO Server and VIO client virtual resources all match up.
  • It is strongly recommended that the HMC, system firmware and VIO Server software are all kept up to date, to make the latest VIO features and user interface advances available and to remove known and fixed problems in early releases.
  • In particular, VIO Server 1.1 should be upgraded to VIO Server 1.2 - after contacting AIX Support for advice and to check a few known problems.
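As a starting point for the software maintenance items above, these are the usual VIO Server update commands, run as the padmin user (the /mnt directory holding the fix pack is a placeholder; check the fix pack's own instructions before applying it):

    # show the current VIO Server software level (before and after the update)
    ioslevel
    # apply a VIO Server update from a mounted CD or NFS directory
    updateios -dev /mnt -install -accept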

Yet to be added

  • Monitor and tune the virtualized system
  • Develop and implement backup and recovery
  • Isolate problems in the virtualized environment

Creating LPAR using SSH

Get the configuration data from existing LPAR

  • If you already have LPARs created, you can use this command to get their configuration, which can be reused as a template:
 lssyscfg -r prof -m SERVERNAME --filter "lpar_ids=X,profile_names=normal"
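  • To find the lpar_ids to filter on, first list the partitions; as the section title suggests, the same HMC commands can be run remotely over SSH. A minimal example (hscroot and hmc01 are placeholders for your own HMC user and address, and the -F field list is illustrative):
 ssh hscroot@hmc01 'lssyscfg -r lpar -m SERVERNAME -F name,lpar_id,state'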

Create new LPAR using command line

  • Here is an example; for more information see man mksyscfg
     mksyscfg -r lpar -m MACHINE -i name=LPARNAME, profile_name=normal, lpar_env=aixlinux, shared_proc_pool_util_auth=1,
     min_mem=512, desired_mem=2048, max_mem=4096, proc_mode=shared, min_proc_units=0.2, desired_proc_units=0.5,
     max_proc_units=2.0, min_procs=1, desired_procs=2, max_procs=2, sharing_mode=uncap, uncap_weight=128,
     boot_mode=norm, conn_monitoring=1
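  • After running mksyscfg, it is worth confirming that the partition and its profile were created as intended. A hedged example (LPARNAME is a placeholder and the field list is illustrative):
     lssyscfg -r lpar -m MACHINE --filter "lpar_names=LPARNAME" -F name,state
     lssyscfg -r prof -m MACHINE --filter "lpar_names=LPARNAME"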

Create more LPARs using a configuration file

  • If you want to create more LPARs at once, you can use a configuration file and provide it as input for mksyscfg.

    Here is an example for 3 LPARs, each definition starting on a new line:
    name=LPAR1,profile_name=normal,lpar_env=aixlinux,all_resources=0,
    min_mem=1024,desired_mem=9216,max_mem=9216,
    proc_mode=shared,min_proc_units=0.3,desired_proc_units=1.0,
    max_proc_units=3.0,min_procs=1,desired_procs=3,max_procs=3,
    sharing_mode=uncap,uncap_weight=128,lpar_io_pool_ids=none,
    max_virtual_slots=10,
    "virtual_scsi_adapters=6/client/4/vio1a/11/1,7/client/9/vio2a/11/1",
    "virtual_eth_adapters=4/0/3//0/1,5/0/4//0/1",
    boot_mode=norm,conn_monitoring=1,auto_start=0,
    power_ctrl_lpar_ids=none,work_group_id=none,shared_proc_pool_util_auth=1
    
    name=LPAR2,profile_name=normal,lpar_env=aixlinux,all_resources=0,
    min_mem=1024,desired_mem=9216,max_mem=9216,
    proc_mode=shared,min_proc_units=0.3,desired_proc_units=1.0,
    max_proc_units=3.0,min_procs=1,desired_procs=3,max_procs=3,
    sharing_mode=uncap,uncap_weight=128,lpar_io_pool_ids=none,
    max_virtual_slots=10,
    "virtual_scsi_adapters=6/client/4/vio1a/12/1,7/client/9/vio2a/12/1",
    "virtual_eth_adapters=4/0/3//0/1,5/0/4//0/1",
    boot_mode=norm,conn_monitoring=1,auto_start=0,
    power_ctrl_lpar_ids=none,work_group_id=none,shared_proc_pool_util_auth=1
    
    name=LPAR3,profile_name=normal,lpar_env=aixlinux,all_resources=0,
    min_mem=1024,desired_mem=15360,max_mem=15360,
    proc_mode=shared,min_proc_units=0.4,desired_proc_units=1.0,
    max_proc_units=4.0,min_procs=1,desired_procs=4,max_procs=4,
    sharing_mode=uncap,uncap_weight=128,lpar_io_pool_ids=none,
    max_virtual_slots=10,
    "virtual_scsi_adapters=6/client/4/vio1a/13/1,7/client/9/vio2a/13/1",
    "virtual_eth_adapters=4/0/3//0/1,5/0/4//0/1",
    boot_mode=norm,conn_monitoring=1,auto_start=0,
    power_ctrl_lpar_ids=none,work_group_id=none,shared_proc_pool_util_auth=1
  • Copy this file to HMC and run:
      mksyscfg -r lpar -m SERVERNAME -f /tmp/profiles.txt
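  • If you built the file on a workstation, it first has to be copied onto the HMC. A sketch assuming your HMC allows scp file transfer (user, host and path are placeholders); otherwise create the file directly on the HMC:
      scp /tmp/profiles.txt hscroot@hmc01:/tmp/profiles.txt
      ssh hscroot@hmc01 'mksyscfg -r lpar -m SERVERNAME -f /tmp/profiles.txt'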

References

This site has a good overview and practical information too: http://www-128.ibm.com/developerworks/eserver/library/es-aix-vioserver-v2/
