Preparing Linux for the installation of Oracle RAC

Linux® preparation before the installation of Oracle RAC consists of preparing users, specifying various parameters and settings, setting up the network, setting up the file system, and performing some disk tasks.

To prepare Linux for Oracle RAC installation, complete these tasks in the order that they are listed.

Creating users and authentication parameters

To set up users and authentication for Oracle RAC, complete the following steps:
  1. Log in with root authority.
  2. Create the user named oracle on all of the nodes.
  3. Create the user group named oinstall on all of the nodes.
  4. Use an editor such as vi to add these lines to the body of the /etc/security/limits.conf file, to increase the limits for open files and processes:
    oracle          hard    nofile         65536
    oracle          soft    nproc           2047
    oracle          hard    nproc          16384
    # End of file
  5. Check these four files for the following line, and add it if the line is not already present:
    	session   required   pam_limits.so
    • /etc/pam.d/sshd
    • /etc/pam.d/login
    • /etc/pam.d/su
    • /etc/pam.d/xdm
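On SLES, the users and groups from steps 2 and 3 can be created with commands like this sketch (run as root on every node; the dba group, which the ASM udev rules later assume, is created here as well, and the options should be adjusted to your site's standards):
groupadd oinstall
groupadd dba                          # used later for the ASM data disks
useradd -g oinstall -G dba -m oracle  # oinstall is the primary group
passwd oracle                         # set a password for the new user
The groups are created first so that they can be assigned when the oracle user is created.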

Using ulimit for shell settings

Set shell limits by editing the .profile file of the user oracle. Insert the following lines in the file:
ulimit -n 65536
ulimit -u 16384 
export OPATCH_PLATFORM_ID=211  

The environment variable OPATCH_PLATFORM_ID indicates Linux on IBM® System z®.
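To confirm that a fresh login shell picks up the new limits, run a quick check such as:
su - oracle -c 'ulimit -n; ulimit -u'
The output should report 65536 and 16384.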

Setting Linux kernel parameters

Check the Linux kernel parameters in the /etc/sysctl.conf file. Increase any parameter whose current value is not equal to or greater than these values:
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 262144
net.core.wmem_max = 262144
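After editing /etc/sysctl.conf, the new values can be loaded and spot-checked without a reboot, for example:
sysctl -p                              # reload the settings from /etc/sysctl.conf
sysctl net.ipv4.ip_local_port_range    # verify a single parameter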

Setting up ssh user equivalence

Oracle uses the ssh protocol for issuing remote commands, and uses scp to perform the installation. Therefore, it is necessary for the users oracle and root to have user equivalence: the ability to use ssh to move from one node to another without authenticating with passwords or passphrases. Set up ssh user equivalence between each of the interfaces on all of the nodes in the cluster, with public and private authentication key pairs that are generated with the Linux command ssh-keygen. Instructions on how to use ssh-keygen to create public and private security keys are available at:

https://www.novell.com/documentation/sles10/sles_admin/?page=/documentation/sles10/sles_admin/data/sec_ssh_authentic.html

When the ssh-keygen command prompts for a passphrase, just press Enter so that no passphrase will be required.
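As a sketch, generating and distributing one key pair (here for the user oracle toward an example node named rac-node2) might look like this; repeat for each user, interface, and direction:
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # -N "" gives an empty passphrase
ssh-copy-id oracle@rac-node2               # append the public key to the remote authorized_keys file
ssh oracle@rac-node2 hostname              # first use; answers the one-time host-key prompt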

For example, an installation with two Oracle RAC servers will have a total of twenty-four key pairs (three interfaces: public, vip, and interconnect; two users: root and oracle; and four directions: node1 to node1, node1 to node2, and the reverse of each).

Note: After the user equivalence has been set up, try to use each of the new authorizations one time before proceeding with the installation. This step is necessary because the first time that the new authorizations are used, they generate an interactive message that the Oracle Universal Installer cannot properly handle.

Setting up the network for Oracle RAC

Each server node in the cluster needs two physical connections and three IP interfaces, which must be set up before beginning the installation. The section "Network device type for the Oracle interconnect" explained the need to use the device that is fastest and can handle the most throughput and traffic for the private interconnect between server nodes in the cluster. For this reason, the study used HiperSockets on IBM System z.

For the public access, a 1 Gb Ethernet was used on each server, which was configured to have two interfaces, the second one being an alias.

For example, on the first server (node1) the interface for a particular OSA device port was set up as an Ethernet connection with a hardware definition in /etc/sysconfig/network. There, a file named ifcfg-qeth-bus-ccw-0.0.07c0 (where 07c0 is the device address of an OSA port) contains the following lines:
	NAME='IBM OSA Express Network card (0.0.07c0)'
	IPADDR='10.10.10.200'
	NETMASK='255.255.255.0'
Using the IP address 10.10.10.200 with a netmask of 255.255.255.0 leaves the hardware device available to any other interface with an address of the form 10.10.10.xxx, and the clusterware startup scripts create an alias for the public interface when the node starts CRS.
This environment does not use a DNS server, so the alias interface name must be included in the file /etc/hosts, where the two host addresses and names look like this:
ff02::3         ipv6-allhosts
10.10.10.200    rac-node1 rac-node1.pdl.pok.ibm.com
10.10.10.202    rac-node1vip rac-node1vip.pdl.pok.ibm.com 

In an Oracle RAC system, the string vip in an interface name stands for virtual IP and identifies its role. Having two interfaces for the same Ethernet connection supports immediate failover within the cluster: if a node stops responding, the vip address is attached to another node in the cluster, faster than the time it would take for a hardware timeout to be recognized and processed.

A larger excerpt from the file /etc/hosts shows how the aliases were used for the two nodes; all of the entries are present in the file, which is the same on the other node:
10.10.10.200    db-node1 db-node1.pdl.pok.ibm.com
10.10.10.201    db-node2 db-node2.pdl.pok.ibm.com
10.10.10.202    db-node1vip db-node1vip.pdl.pok.ibm.com
10.10.10.203    db-node2vip db-node2vip.pdl.pok.ibm.com
10.10.50.200    db-node1priv db-node1priv.pdl.pok.ibm.com
10.10.50.201    db-node2priv db-node2priv.pdl.pok.ibm.com
The Linux command ifconfig displays the interface net0 and its alias, net0:1:
net0      Link encap:Ethernet  HWaddr 00:14:5E:78:1D:14
          inet addr:10.10.10.200  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::14:5e00:578:1d14/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1492  Metric:1
          RX packets:12723074 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13030111 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2164181117 (2063.9 Mb)  TX bytes:5136519940 (4898.5 Mb)

net0:1    Link encap:Ethernet  HWaddr 00:14:5E:78:1D:14
          inet addr:10.10.10.203  Bcast:10.10.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1492  Metric:1
When Oracle RAC uses the public interface, it selects the interface whose name ends in the number zero; by default this interface would be eth0. If the public interface must be forced to use number 0, use udev rules to name the interfaces. This can be done by modifying or creating the /etc/udev/rules.d/30-net_persistent_names.rules file so that it contains a line like this example, where 07c0 is the device and net is a prefix that was chosen to be different from eth:
SUBSYSTEM=="net", ACTION=="add", ENV{PHYSDEVPATH}=="*0.0.07c0", IMPORT="/lib/udev/rename_netiface %k net0"

A separate physical connection becomes the private interconnect used for Oracle cache fusion, where the nodes in the cluster exchange cache memory and messages in order to maintain a unified cache for the cluster. This study used IBM System z HiperSockets in the first installation, and named the interface db-node1priv, because the Oracle convention is to call the interconnect the private connection.

The type of connectivity to use for the private interconnect is an important decision for a new installation; the objectives are high speed and the ability to transfer large amounts of data.

The interconnect entries for both nodes are shown in the /etc/hosts example above.

The first node (node1) uses IP address 10.10.50.200 and host name db-node1priv. The second node (node2) uses IP address 10.10.50.201 and host name db-node2priv.

Using a 10.x.x.x network for the external (public) interfaces

The setup for the study required circumventing an Oracle RAC requirement that the external RAC IP address and its alias be an IP address from the range of public IP addresses. The setup needed to use an IP address of the form 10.10.x.x, which is classified as an internal range.

Oracle RAC would not work until a change was made in $CRS_HOME/bin/racgvip. To do the same thing on your system, either set the variable DEFAULTGW to an IP address that can always be pinged successfully (it will never be used; Oracle only checks whether it is reachable), or else search for FAIL_WHEN_DEFAULTGW_NOT_FOUND and change its value to 0.
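As a sketch, the relevant lines in racgvip would look something like this after the change (the gateway address here is an assumption; any address that always answers ping will do):
DEFAULTGW=10.10.10.1                 # assumed: an address that is always pingable
FAIL_WHEN_DEFAULTGW_NOT_FOUND=0      # or: disable the default-gateway check entirely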

Using udev when preparing to install Automatic Storage Management

To set up shared storage for Oracle RAC on IBM System z, some of the Linux attributes of the DASD disk storage devices must be modified. In SLES, the configuration files located in /etc/udev/rules.d are read shortly after the kernel is loaded, before Linux creates the file structures for disk storage and assigns file names and attributes to the known disk devices and partitions.

udev rules can be used to change the ownership of the block devices that are used for the OCR and voting disks. It is also necessary to alter the attributes of the block devices that will be given to ASM to manage as shared storage for data. Shared DASD that ASM manages for data must be assigned the owner oracle and the group named dba.

For this study, a new file was created and given a high number (98) in its file name, so that it would be read last and the setup changes would not be overwritten by other startup processes. The udev rule syntax differs even between SLES 10 SP1 and SLES 10 SP2, so check the man pages or documentation for udev on your system to ensure that the rules work as expected.

This is a sample for the udev file 98-oracle.permissions.rules:
# for partitions import parent information
KERNEL=="*[0-9]", IMPORT{parent}=="ID_*"
# OCR disks
KERNEL=="dasdf1", OWNER="oracle", GROUP="oinstall" MODE="0660"
KERNEL=="dasdp1", OWNER="oracle", GROUP="oinstall" MODE="0660"
# VOTING DISKS
KERNEL=="dasdg1", OWNER="oracle", GROUP="oinstall" MODE="0660"
KERNEL=="dasdq1", OWNER="oracle", GROUP="oinstall" MODE="0660"
#ASM
KERNEL=="dasdh1", OWNER="oracle", GROUP="dba" MODE="0660"
KERNEL=="dasdi1", OWNER="oracle", GROUP="dba" MODE="0660"
KERNEL=="dasdj1", OWNER="oracle", GROUP="dba" MODE="0660"
KERNEL=="dasdk1", OWNER="oracle", GROUP="dba" MODE="0660"
KERNEL=="dasdm1", OWNER="oracle", GROUP="dba" MODE="0660"
KERNEL=="dasdn1", OWNER="oracle", GROUP="dba" MODE="0660"
To make the changes take effect immediately, run this command:
/etc/init.d/boot.udev restart
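To confirm that the rules took effect, check the ownership and mode of the devices, for example:
ls -l /dev/dasdf1 /dev/dasdh1   # expect oracle:oinstall and oracle:dba, mode brw-rw----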

Setting up persistent names for disk devices

Linux assigns names to all devices that it discovers at startup, in the order in which it discovers them, assigning the device names starting with the name dasda (or sda for SCSI) and continuing using that pattern. Even with a small number of disks used from a SAN, the order can change from one Linux startup to the next. For example, if one disk in the sequence becomes unavailable, then all the disks that follow it will shift to a different name in the series. The naming order might change in a way that affects the individual nodes differently, which makes the management of the disks complicated and error-prone.

Producing device names that are the same on different Linux systems, and that persist across reboots, requires unambiguous Linux device names such as /dev/disk/by-path, /dev/disk/by-id, or /dev/disk/by-uuid. The problem is that those types of names did not fit into the spaces provided for them in the ASM GUI installer. It is possible to use these names with the silent install method, which runs a script and uses a response file to complete the installation. The problem with the silent install approach for a first installation is that there is no interactive error checking, so if any of the input is unacceptable, there is no way to remove a failed installation.

This study employed a workaround for this issue: using the dasd= parameter in the file /etc/zipl.conf to fix the order in which the disks are discovered at startup. With the order controlled this way, it is possible to use the names of the partitioned devices with confidence that the naming is consistent among the nodes and will not change with a reboot.
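A minimal sketch, with illustrative device numbers: listing the devices on the dasd= kernel parameter fixes the discovery order, so /dev/dasda, /dev/dasdb, and so on map to the same devices on every boot.
# excerpt from /etc/zipl.conf; the device numbers are illustrative
parameters = "dasd=0.0.0201,0.0.0202,0.0.0203 root=/dev/dasda1"
After editing /etc/zipl.conf, run the zipl command so that the new parameters are written to the boot record and take effect at the next IPL.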

When Linux runs as a guest on z/VM®, the order of the disks is controlled by the order in which they are defined to the guest, for example in the user directory.

Disks managed by ASM are not visible in /etc/fstab and are therefore easy to overlook, so careful disk management is needed to avoid errors.