Question & Answer
Question
Configuring the RAID Hard Disk Drive (HDD) Array
Answer
Contents
This solution describes the choices and trade-offs for configuring the HDD layout for a typical single-server (All-In-One) TeaLeaf system.
During the initial phase of the installation, TeaLeaf supplies a HW sizing recommendation based on information provided by the customer about the site's traffic. Among other HW recommendations, the sizing recommendation provides three HDD-related numbers: Canister HDD needs, Indexer HDD needs, and 'Other' HDD needs.
The Indexer data (which resides in the Indexer HDD space) is the largest data block. It can be regenerated from the Canister data; therefore, it does not require redundancy or backup. In the event of a disk failure in the Indexer HDD area, the TeaLeaf SW will be unable to search the indexes, making it difficult to retrieve long-term sessions for replay. New data will still flow into the TeaLeaf Canister. The corrective action is to bring the TeaLeaf server down, replace the disk, bring the server back up, and regenerate the indexes.
TeaLeaf recommends that the Indexer HDD be assigned to a logical disk array having no redundancy or simple parity redundancy.
The Canister HDD space is smaller than the Indexer HDD space but still quite large. This data is critical, and some form of recovery planning is necessary. While this space is down, no new data can go into the Canister; however, raw hits will still be stored in the Spool HDD area (usually part of 'other'; see below). There are three typical choices for this area:
- Use no drive redundancy and depend on external backups as the source for recovering the data.
- Use parity redundancy.
- Use mirroring redundancy.
Because of the critical nature of this data, the Canister HDD area should be available at all times. The first two choices above would mean the Canister HDD area is unavailable should a drive go down. TeaLeaf recommends using mirroring to achieve redundancy of the Canister HDD area.
The 'other' HDD area includes the following: the OS and the customer's standard server programs; the TeaLeaf programs; the TeaLeaf log files; the TeaLeaf Report DB; and the TeaLeaf spooling directory. The spooling directory has its own characteristics. The rest of the 'other' HDD space can be characterized as disk space that is either written to infrequently (most log files are updated daily) or non-critical (the TeaLeaf governor logs are written to every minute but are not critical data). However, the OS is a very critical piece; without it, the whole server is down. Likewise, the Report DB is special: it is written to every 5 minutes and is read on demand when users access the TeaLeaf portal.
The characteristics of the spooling directory are very load-dependent. In a properly sized RealiTea server, the spooling directory is seldom written to. During seasonal peak traffic periods (for example, Christmas for retail sites, or end of quarter or end of year for financial sites), excess traffic can be spooled to disk while the RealiTea server handles traffic as fast as it can. Also, during times of maintenance or recovery of the RealiTea Canister HDD area, incoming traffic can be spooled to disk.
The Report DB HDD area can be occupied by either MSSQL database programs and files or CTree database programs and files. In either case, this area should be backed up every night. There is no way to recover this data from other sources in the event of a complete failure of this HDD area.
Because the spooling HDD area should be seldom used but could be considered critical data, and the Report DB HDD area is critical, TeaLeaf usually recommends that the spooling and Report DB HDD areas be combined with the 'other' HDD area.
Due to the critical nature of some of the data in the 'other' HDD area, TeaLeaf recommends using mirroring to achieve redundancy of the 'other' HDD area.
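The per-area recommendations in this article can be collected into a small lookup, which may be convenient when reviewing a proposed layout. This is only a sketch: the dictionary keys, the scheme names, and the function name are illustrative and are not part of any TeaLeaf tooling.

```python
# Redundancy schemes recommended in this article for each HDD area.
# All names here are illustrative assumptions, not TeaLeaf identifiers.
RECOMMENDED_REDUNDANCY = {
    "indexer": ("none", "parity"),  # regenerable from Canister data
    "canister": ("mirror",),        # critical; should stay available
    "other": ("mirror",),           # OS, Report DB, spool, log files
}


def is_recommended(area: str, redundancy: str) -> bool:
    """Return True if the scheme matches the article's recommendation."""
    return redundancy in RECOMMENDED_REDUNDANCY[area]


print(is_recommended("canister", "parity"))  # False
print(is_recommended("indexer", "none"))     # True
```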
Here are three specific examples for configuring the HDD RAID arrays. The first example is an 'ideal' example for organizations with the resources to purchase the necessary HW. All examples use the following sizing assumptions. When purchasing HW, remember to use the actual HDD needs recommended by TeaLeaf as part of the HW sizing effort. Drive letters in the examples are just for convenience; the customer's organization can use any drive letters desired.
'Other' HDD area: 65 GB total
- 10 GB for the OS and the customer's standard server programs
- 1 GB for TeaLeaf programs
- 2 GB for TeaLeaf log files
- 2 GB for the MSSQL Report DB in MSDE
- 50 GB for the spooling area
Canister HDD area: 50 GB
Indexer HDD area: 150 GB
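As a quick check of the arithmetic behind these example figures, the sketch below adds up the 'other' components and the combined 'other' plus Canister size used when the two areas share a drive. The variable names are illustrative; the numbers are the example values from this article, not a sizing recommendation.

```python
# Example sizing figures from this article, in GB (not a real recommendation).
other_gb = {
    "os_and_server_programs": 10,
    "tealeaf_programs": 1,
    "tealeaf_logs": 2,
    "report_db": 2,
    "spool": 50,
}
canister_gb = 50
indexer_gb = 150

other_total_gb = sum(other_gb.values())
combined_other_canister_gb = other_total_gb + canister_gb

print(other_total_gb)              # 65
print(combined_other_canister_gb)  # 115
```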
Good HDD configuration (requires 7 drive bays):
Drive C (physical disks 1 and 2: Mirrored): 'Other' Data: If the sizing recommendation is (for example) 65 GB, select two drives of at least that capacity for drives 1 and 2. Configure them as a single drive C in a mirrored RAID arrangement.
Drive D (physical disks 3 and 4: Mirrored): Canister Data: If the sizing recommendation is (for example) 50 GB, select two drives of at least that capacity for drives 3 and 4. Configure them as a single drive D in a mirrored RAID arrangement.
Drive E (physical disks 5, 6 and 7: Striped with Parity): Indexer Data: Select a minimum of three drives of the same capacity. One drive's worth of capacity is used for parity, so the relationship between the needed HDD space and the drive size is: Index HDD Needed <= (HDD Size) * ((# of drives) - 1). For 3 drives, RAID striped with parity supplies 2/3 of the total raw capacity as usable space. If the sizing recommendation is (for example) 150 GB, use 3 drives each of at least 75 GB.
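The parity sizing for the Indexer array can be worked through in a few lines. The sketch below assumes single-parity striping (RAID 5 style), where one drive's worth of capacity holds parity, so usable space is the drive size times the number of drives minus one. The function names are illustrative.

```python
def parity_usable_gb(drive_gb: float, drives: int) -> float:
    """Usable space of a single-parity striped array of equal-size drives."""
    if drives < 3:
        raise ValueError("striping with parity needs at least 3 drives")
    return drive_gb * (drives - 1)


def parity_min_drive_gb(needed_gb: float, drives: int) -> float:
    """Smallest per-drive capacity that satisfies the needed usable space."""
    if drives < 3:
        raise ValueError("striping with parity needs at least 3 drives")
    return needed_gb / (drives - 1)


print(parity_usable_gb(75, 3))      # 150
print(parity_min_drive_gb(150, 3))  # 75.0
print(parity_min_drive_gb(150, 4))  # 50.0
```

With the example 150 GB Indexer requirement, three 75 GB drives are just sufficient; adding a fourth drive would lower the minimum drive size to 50 GB.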
Also Good HDD configuration (requires 6 drive bays):
Drive C (physical disks 1 and 2: Mirrored): 'Other' Data: If the sizing recommendation is (for example) 65 GB, select two drives of at least that capacity for drives 1 and 2. Configure them as a single drive C in a mirrored RAID arrangement.
Drive D (physical disks 3 and 4: Mirrored): Canister Data: If the sizing recommendation is (for example) 50 GB, select two drives of at least that capacity for drives 3 and 4. Configure them as a single drive D in a mirrored RAID arrangement.
Drive E (physical disks 5 and 6: Striped, No Parity): Indexer Data: Select a minimum of two drives of the same capacity. If the sizing recommendation is (for example) 150 GB, use 2 drives each of at least 75 GB.
Acceptable HDD configuration (requires 4 drive bays):
Drive C (physical disks 1 and 2: Mirrored): Combined 'Other' and Canister Data: If the sizing recommendation is (for example) a combined 115 GB, select two drives of at least that capacity for drives 1 and 2. Configure them as a single drive C in a mirrored RAID arrangement.
Drive D (physical disks 3 and 4: Striped, No Parity): Indexer Data: Select a minimum of two drives of the same capacity. If the sizing recommendation is (for example) 150 GB, use 2 drives each of at least 75 GB. Striping with parity is not possible here because it requires at least three drives.
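The three example layouts can be compared side by side with a short script that counts drive bays and computes the usable space of each volume. This is a sketch under stated assumptions: the layout table and function name are illustrative, drive sizes are the example minimums, and the two-drive Indexer volume in the 4-bay layout is modelled as striping without parity, since two 75 GB drives can only yield 150 GB that way.

```python
def usable_gb(raid: str, drive_gb: float, drives: int) -> float:
    """Usable space for equal-size drives under a simple RAID scheme."""
    if raid == "mirror":
        return drive_gb                 # two-drive mirror: one drive's worth
    if raid == "stripe":
        return drive_gb * drives        # no redundancy
    if raid == "parity":
        return drive_gb * (drives - 1)  # one drive's worth holds parity
    raise ValueError(f"unknown RAID scheme: {raid}")


# (drive letter, scheme, per-drive GB, number of drives); example values only.
LAYOUTS = {
    "good": [("C", "mirror", 65, 2), ("D", "mirror", 50, 2), ("E", "parity", 75, 3)],
    "also_good": [("C", "mirror", 65, 2), ("D", "mirror", 50, 2), ("E", "stripe", 75, 2)],
    "acceptable": [("C", "mirror", 115, 2), ("D", "stripe", 75, 2)],
}

for name, volumes in LAYOUTS.items():
    bays = sum(drives for _, _, _, drives in volumes)
    capacity = {
        letter: usable_gb(raid, size, drives)
        for letter, raid, size, drives in volumes
    }
    print(name, bays, capacity)
```

Each layout provides the same 150 GB for the Indexer; the difference is in the number of bays used and in whether the Indexer array survives a single drive failure.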
Further considerations:
TeaLeaf recommends that the Canister and the Indexer data NOT be put onto the same drives. Doing so would create disk I/O bottlenecks.
To reduce the drive count, Canister (D) and 'Other' (C) data can live on the same HDD, although some disk contention (resulting in dropped hits) may occur during times of spooling.
A good explanation of RAID arrays can be found here:
http://www.pcguide.com/ref/hdd/perf/raid/index.htm
For a discussion specific to RAID 5, see also:
http://www.pcguide.com/ref/hdd/perf/raid/levels/singleLevel5-c.html
Article Reference
00000065
Version this article applies to
4.0;4.5;4.6;5.0;5.1;6.0;6.1
Document Information
Modified date:
08 December 2018
UID
ibm10776583