IBM brings a solution to this space that has been finely tuned to deliver a seamless and branded shopping experience across all channels, including digital and physical touch points within each channel. WebSphere Commerce V7 drives improved customer loyalty and increased shopping cart sizes by delivering rich, personalized, and contextually relevant content at each stage of the shopping experience. The software infrastructure is based on service-oriented architecture (SOA) with WebSphere Application Server (hereafter called Application Server) and DB2® as its underlying solution components.
The architectural base of Application Server and DB2 delivers optimized performance and scalability on IBM’s POWER7 systems. The solution is based on software and hardware architectures that deliver fast data access for near-real-time, interactive use of information. Shoppers benefit directly: the new hardware solution has reduced query times by up to about 50%, owing to the high performance of the Power servers. POWER7 brings unique virtualization, advanced memory management, breakthrough workload support, and world-class availability to this solution. These characteristics are explained below.
Understanding POWER7 architecture
This section uses P750 and P780 systems as examples when discussing POWER7 chip architecture and capabilities.
POWER7 chip and cache differentiation
The POWER7 chip architecture continues IBM’s differentiation by building upon the performance leadership of POWER6®. POWER7 provides increased processor core density per chip or socket, improved multithreading support, and improved core memory bandwidth (discussed below). This chip design results in increased performance when compared to POWER6.
The availability of multicore and dynamic threading allows POWER7 to support a large number of Java Virtual Machines (JVMs) running the WebSphere Commerce Server application. The POWER7 chip features up to 8 processor cores with a 4-way SMT in each core. This is equivalent to 32 logical processors on a single chip or socket. The Power 750 Express is a one- to four-socket server, with up to 32 cores that can deliver up to 128 simultaneous compute threads.
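The thread counts quoted above follow from simple multiplication. The sketch below reproduces that arithmetic; the function name is illustrative only and is not part of any IBM tool.

```python
def logical_processors(sockets: int, cores_per_socket: int, smt_threads: int) -> int:
    """Return the number of hardware threads (logical processors) in a system."""
    return sockets * cores_per_socket * smt_threads


# One POWER7 chip: 8 cores, each running 4-way SMT.
per_chip = logical_processors(sockets=1, cores_per_socket=8, smt_threads=4)

# A fully configured Power 750 Express: 4 sockets of 8 cores, 4-way SMT.
p750_max = logical_processors(sockets=4, cores_per_socket=8, smt_threads=4)

print(per_chip, p750_max)  # 32 128
```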
Due to the latency difference between main memory and on-chip cache, POWER7 was designed with three levels of on-chip cache (see Figure 1). The chip includes 32 MB of on-chip L3 cache implemented in embedded Dynamic Random Access Memory (eDRAM), instead of the off-chip L3 cache that was used with all the prior dual-core Power chips. The POWER7 chip also has two dual-channel DDR3 memory controllers implemented on the chip, which together deliver 100 GB/sec of sustained bandwidth per chip. This provides a significant advantage for workloads with heavy cache usage.
Figure 1. Architecture of the POWER7 chip
POWER7 virtualization and simultaneous thread support
PowerVM is built into the hardware and provides higher performance, more scalability, and higher resource utilization than the leading Intel® virtualization platforms. PowerVM offers the capability to dynamically adjust system resources based on workload demands so that each partition gets the resources it needs. Logical partitions (LPARs) allow you to run multiple operating system instances on the same system without interference. Micro-partitioning allows an LPAR to share processors with other partitions, which delivers tremendous flexibility when planning web site deployments. PowerVM can adjust the use of CPU and memory among different applications without interruption, giving the retailer flexibility in scaling to variable business requirements.
POWER7 provides a technology breakthrough and can run a larger application workload on the same sized machine (with the same number of sockets or cores). This drives better value from the retailer’s hardware investment by allowing higher Central Processing Unit (CPU) utilization, based on new Simultaneous Multi-Thread (SMT) support. SMT is a processor technology that allows multiple threads to issue instructions each cycle. SMT permits all thread instances to simultaneously compete for and share processor resources. SMT4, newly supported in POWER7, provides more hardware-supported threads than other solutions, and therefore more concurrent instances of the application. AIX schedules work across these additional threads based on its understanding of the application. Application Server V7 is also optimized to take advantage of the SMT4 capability, so applications built on Application Server V7 (for example, WebSphere Commerce V7) receive the SMT4 advantage without modification.
Memory advancements bring value to this POWER7 solution by providing more in-memory data for the software. Active Memory Expansion is a new POWER7 technology that enables the effective maximum memory capacity to be larger than the true physical memory. Innovative compression and decompression of memory content enables memory expansion of up to 100 percent. This enables an application partition to do significantly more work, or enables a server to run more partitions with the same physical amount of memory. Utilizing Active Memory Expansion can improve system utilization and increase a system’s throughput.
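As a rough illustration of what "expansion up to 100 percent" means for capacity planning, the sketch below converts an expansion percentage into effective memory. The function name and the 64 GB partition size are hypothetical; the expansion actually achieved depends on how compressible the workload's data is.

```python
def effective_memory_gb(physical_gb: float, expansion_pct: float) -> float:
    """Effective memory seen by a partition for a given expansion percentage.

    expansion_pct is limited to the 0-100 range described in the article.
    """
    if not 0 <= expansion_pct <= 100:
        raise ValueError("expansion_pct must be between 0 and 100")
    return physical_gb * (1 + expansion_pct / 100)


# Hypothetical partition with 64 GB of physical memory.
print(effective_memory_gb(64, 50))   # 96.0
print(effective_memory_gb(64, 100))  # 128.0
```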
POWER7 memory architecture uses high reliability, availability, and serviceability (RAS), high performance, and low power consumption memory. The P750 and P780 use 1066 Dynamic Random Access Memory (DRAM) bus rate technology. The DRAM interface is double-ported to provide double the bandwidth of POWER6. Spare DRAM and selective mirroring provide increased memory RAS.
When a similarly powerful configuration is built on Intel platforms, the scenario creates more images for customers to manage. Power solutions have the advantage of simplified system management.
Availability and serviceability of the platform
Downtime can cause a serious impact to business continuity, and for Smarter Commerce, an unavailable commerce platform quickly impacts the retailer’s revenue. Sales will go down if the web site is unavailable. POWER7 brings a highly reliable platform to the solution. The key to the solution is the option of high availability configurations based on Power virtualization, which allows applications to automatically be moved off of a failing machine to a backup.
The retailer achieves reduced operational cost related to maintenance, space allocation, and power consumption. With midrange and high end POWER7 systems, concurrent maintenance support enables continuous application availability. Concurrent maintenance support allows fixes to be applied without taking the systems down. AIX supports hot kernel patches. This capability helps the retailer keep their systems up all the time.
It is important to know that the new capabilities of the POWER7 processor are utilized by Application Server V7.x, so moving to this application environment provides advantages over preserving binary versions of earlier existing applications. This is the preferred environment for WebSphere Commerce.
However, POWER7 does support compatibility modes, which allow applications that run on POWER5® or POWER6 processors to run unmodified on POWER7. This means that the code for these applications does not have to be changed or recompiled to run in compatibility mode. This provides a smooth transition path from older systems to the latest platform and minimizes the costs of migration to a new system. Compatibility mode is significant in that it allows legacy applications to be preserved where necessary, which might be the case where thousands of applications are supported by a deployment.
Also, although logical partitions that use the earlier processor compatibility modes can run on POWER7 servers, a POWER7 processor does not emulate all features of a POWER6 or a POWER5 processor. For example, certain types of performance monitoring might not be available for a logical partition if the current processor compatibility mode of a logical partition is set to the POWER5 mode.
Despite challenging economic times, web retail has been experiencing significant growth. Many high volume WebSphere Commerce customers are expecting significant (up to 20%) compound growth in order volume over two years (2011 and 2012). Web retail is also becoming more sophisticated and targeted, extensively leveraging promotions and marketing campaigns to entice online shoppers. To accomplish this, retailers are leveraging a rich set of features available in WebSphere Commerce. They are also integrating their WebSphere Commerce deployments with a variety of external systems, typically those providing up-to-the-minute pricing and inventory information.
The following numbers are indicative of overall direction for web retail (numbers taken from actual customer deployments):
- For a web-intensive retailer, the highest order rate achieved by a WebSphere Commerce site was about 850 orders per minute in 2009. In 2012, it is anticipated to be 1,100 orders per minute.
- In 2009, the highest order rate was achieved when about 9,000 shoppers were concurrently shopping on the site.
- In 2009, the largest WebSphere Commerce deployment involved about 100 JVMs. In 2010, this number was up to 150. In 2011, it is anticipated to involve 196 JVMs.
- In 2009, the largest WebSphere Commerce catalog in use in the field was about 7,000,000 SKUs. In 2010, this was over 10,000,000 SKUs. In 2011, this number is anticipated to grow to over 30,000,000 SKUs.
- In 2010, the largest WebSphere Commerce database observed in the field was about 1 TB.
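To put the figures in the list above on a common footing, they can be converted to compound annual growth rates. The sketch below uses only the numbers quoted in the list; the later-year values are the anticipated figures, not measured results, and the function name is illustrative.

```python
def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values, as a fraction."""
    return (end / start) ** (1 / years) - 1


# Peak order rate: 850 orders/min (2009) to an anticipated 1,100 (2012).
orders_growth = compound_annual_growth(850, 1100, years=3)

# Largest deployment: about 100 JVMs (2009) to an anticipated 196 (2011).
jvm_growth = compound_annual_growth(100, 196, years=2)

# Largest catalog: about 7,000,000 SKUs (2009) to over 30,000,000 (2011).
sku_growth = compound_annual_growth(7_000_000, 30_000_000, years=2)

for label, rate in [("orders", orders_growth), ("JVMs", jvm_growth), ("SKUs", sku_growth)]:
    print(f"{label}: {rate:.1%} per year")
```

Viewed this way, deployment size and catalog size grow considerably faster than the peak order rate.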
This growth not only increases the overall footprint of the WebSphere Commerce deployment, as expressed by the number of required cores and the number of required WebSphere Commerce JVMs, but also poses new types of performance challenges:
- Increasing operational costs due to increasing numbers of JVMs.
- Spikes in workload triggered by promotions and sales events.
- Rapid changes in CPU, network, and disk I/O triggered by external systems.
Below, we will examine the typical WebSphere Commerce workload and then discuss how POWER7 features can help resolve many of these challenges.
Understanding the WebSphere Commerce workload
WebSphere Commerce is a J2EE application that is deployed on and runs in Application Server. WebSphere Commerce employs the standard three logical tier application architecture:
- The HTTP tier is typically implemented using the IBM HTTP Server (IHS). IHS hosts the Application Server HTTP plug-in. The HTTP plug-in performs the second level of load balancing for the Application Server and WebSphere Commerce server cluster. It also performs an important caching function: caching static content, such as images, closer to the edge of the network.
- The application tier is the Application Server and WebSphere Commerce server cluster. For WebSphere Commerce sites, it is here that most of the CPU-intensive computing happens.
- The database tier also plays a key role in the overall WebSphere Commerce performance equation. For WebSphere Commerce sites, computation on the database tier is typically not CPU intensive, but involves high rates of disk I/O.
Another aspect to consider is that the conventional view of a workload focuses on the steady-state characteristics of the system, whereas real-world WebSphere Commerce workloads tend to be dynamic. Consider the case of a major Internet retailer preparing for a major sales event. In preparation for the sales event, the retailer clears the content of the cache, recycles the WebSphere Commerce JVMs, loads a sales catalog containing items offered only for the duration of the event, and activates promotions offering discounts and gifts.
Promotions include those commonly called “door-crasher specials”, which are promotions providing special discounts to the first online shoppers to enter the site after the start of the event. Large numbers of shoppers attempt to log on to the retailer’s site to take advantage of the door-crasher specials. As a result, at the start of the sales event, the retailer’s systems experience a tremendous spike in volume, placing tremendous stress on a relatively cold system.
We will look at WebSphere Commerce workloads and the characteristics that the underlying hardware infrastructure needs in order to deliver top performance for the site. We will focus on the application and database tiers. The simpler HTTP tier is out of scope for the purposes of this article.
The application tier
Figure 2 shows a logical view of a typical WebSphere Commerce cluster. A cluster of WebSphere Commerce JVMs retrieves data from and stores data in a single database instance. Although WebSphere Commerce is an Online Transaction Processing (OLTP) application, typically over 90% of database access operations are "reads" performed as shoppers browse the catalog on the web site. Caching on the application tier plays a pivotal role in WebSphere Commerce deployments. It reduces roundtrips to the database and alleviates database input/output (I/O) bottlenecks. Extensive use of caching strongly influences the demands that WebSphere Commerce makes on the hardware infrastructure of the application tier.
Figure 2. Logical view of a typical WebSphere Commerce deployment
When examining the contents of each WebSphere Commerce JVM heap, you find that 60% to 80% of the heap is occupied by long-lived (tenured) objects, while 20% to 40% of the heap is occupied by relatively short-lived objects, typically created in the JVM nursery. The longer-lived objects are primarily cacheable objects stored in the Application Server dynacache: JSPs, JSP page fragments, commands extending the WebSphere Command Framework classes, and distributed maps.
For a typical WebSphere Commerce customer, the size of the cache is about 6 GB. For some customers, the size of the cache can reach up to 20 GB. Today, most WebSphere Commerce deployments use the 32-bit version of the JVM. This JVM is a bit more performant than the 64-bit version, but has a limitation on the maximum size of the JVM heap. This limit (the -Xmx parameter) varies a bit for each platform, but is generally about 2 GB. This means that most of the cache has to be stored outside of the JVM.
To help with this problem, dynacache provides a feature called "disk offload". When the cache gets too large to fit into the JVM heap, dynacache disk offload writes the contents of the cache out to a file on disk. In a well-tuned site, this disk offload file fits entirely into the file system cache, and is served out of RAM. In those cases where the disk offload file does not fit into RAM, WebSphere Commerce needs to perform a large number of disk I/O operations. WebSphere Commerce workloads on the application tier generally benefit from fast access to RAM and disk.
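The relationship between cache size, heap size, and disk offload can be estimated with back-of-the-envelope arithmetic. The sketch below is a simplification under stated assumptions: it treats the whole tenured portion of the heap as cache, it looks at a single JVM, and both function names and the `fs_cache_ram_gb` parameter are hypothetical. It does not model dynacache itself.

```python
def offload_size_gb(cache_gb: float, heap_gb: float = 2.0,
                    tenured_fraction: float = 0.7) -> float:
    """Estimate how much of the cache must live in the disk offload file.

    Assumes the tenured share of the heap (60% to 80% in the article,
    70% by default here) is entirely available for cached objects.
    """
    in_heap_gb = min(cache_gb, heap_gb * tenured_fraction)
    return cache_gb - in_heap_gb


def offload_fits_in_ram(offload_gb: float, fs_cache_ram_gb: float) -> bool:
    """True if the offload file can be held in the file system cache."""
    return offload_gb <= fs_cache_ram_gb


typical = offload_size_gb(6.0)   # typical 6 GB cache, roughly 4.6 GB offloaded
large = offload_size_gb(20.0)    # large 20 GB cache, roughly 18.6 GB offloaded

# Hypothetical LPAR with 8 GB of RAM left over for the file system cache.
print(offload_fits_in_ram(typical, 8.0))
print(offload_fits_in_ram(large, 8.0))
```

With these assumptions, the typical cache is served from RAM while the large cache forces disk I/O unless more memory is made available.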
Another important aspect common to all WebSphere Commerce workloads is a high degree of parallelism. During key sales events, high-volume WebSphere Commerce sites can experience up to 10,000 concurrent shoppers browsing and placing orders. The Application Server web container uses a thread pool to service the large number of concurrent requests. Typically, under high-volume conditions, each WebSphere Commerce JVM executes between 30 and 50 web container threads simultaneously.
In addition to the web container threads, WebSphere Commerce has a scheduler feature and a number of utilities (such as dataload and stagingprop), which are executing in parallel. These features typically use up to 30 additional threads per JVM. WebSphere Commerce benefits significantly from hardware platforms that support large numbers of concurrently executing threads.
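To see why hardware thread count matters, compare the number of software threads a busy JVM runs with the hardware threads available to it. The sketch below uses the thread counts from the text and a ratio of 4 cores per JVM; the function names are illustrative only.

```python
def software_threads_per_jvm(web_container: int, background: int) -> int:
    """Threads a JVM may run concurrently under high-volume conditions."""
    return web_container + background


def hardware_threads(cores: int, smt_mode: int) -> int:
    """Hardware threads offered by a number of cores in a given SMT mode."""
    if smt_mode not in (1, 2, 4):
        raise ValueError("smt_mode must be 1, 2, or 4")
    return cores * smt_mode


# Upper end of the ranges in the text: 50 web container threads plus
# up to 30 scheduler and utility threads.
peak_software = software_threads_per_jvm(web_container=50, background=30)

smt4 = hardware_threads(cores=4, smt_mode=4)
smt2 = hardware_threads(cores=4, smt_mode=2)

print(peak_software, smt4, smt2)  # 80 16 8
print(peak_software / smt4, peak_software / smt2)  # 5.0 10.0
```

Even with SMT4, software threads outnumber hardware threads several times over, so doubling the hardware threads per core halves the contention for each one.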
The database tier
On the database tier, the demands that WebSphere Commerce workloads place on the hardware are a bit different than on the application tier. WebSphere Commerce supports DB2 and Oracle® databases. On a properly sized and tuned site, database CPU utilization does not exceed 40%. However, the rate of I/O operations to disk is quite high.
Figure 3. NMON output showing I/O characteristics of a typical WebSphere Commerce site
Figure 3 shows a sample NMON output taken during a performance test. In this case, an LPAR was running an instance of DB2. In this test, 25% of the WebSphere Commerce cache (on the application tier) was invalidated every 20 minutes. This simulates a real world situation where an inventory feed from an external Enterprise Information System (EIS) updates the inventory levels in a WebSphere Commerce database.
You can see that the I/O rate (yellow line in Figure 3) remains consistently high throughout the test. Another important aspect to note is that despite the fact that WebSphere Commerce is an OLTP application, I/O activity is dominated by read operations (shown in blue in Figure 3). Both DB2 and Oracle databases provide buffers that allow table data to be cached in and served from RAM.
WebSphere Commerce performance on POWER7
This section takes a look at some of the performance results achieved on POWER7 hardware at the WebSphere Commerce performance lab. We take a look at how highly dynamic workloads perform on POWER7, and then discuss the scaling characteristics of the WebSphere Commerce JVM on this platform.
Handling highly dynamic workloads
The "Black Friday Cold Start" benchmark is modeled after the real world scenario, where a retailer needs to fully restart their site immediately prior to a major sales event to perform maintenance and load event-specific catalog and promotions. We used three p750s to host the web and application tiers, and one p780 to host the database tier. The load was ramped up from 0 to 9,000 concurrent shoppers in 1 second and was sustained without functional errors for one hour.
Results of our runs are summarized in Figure 4. Despite cold caches and extremely rapid load ramp-up, we achieved a maximum throughput of 1,026 orders/min at 9,000 concurrent shoppers. Response times were good and CPU utilizations quickly reached steady state, settling at or below 65%.
Figure 4. Executing the Black Friday Cold Start benchmark – maximum throughput vs. total number of virtual concurrent shoppers
Figure 5 shows the results of a reliability test simulating a full day Black Friday sales event. In this case, we executed the Black Friday Cold Start benchmark described above and sustained 1,000 orders/minute order rate for a full hour, simulating the Black Friday opening hour spike. Then we reduced the order rate to about 500 orders/minute, and sustained the load for another 5 hours, simulating a full day of Black Friday shopping activity.
Figure 5. NMON output showing CPU utilization for one of the p750 WebSphere Commerce application tier LPARs during full day Black Friday reliability test
Despite the extreme ramp up of workload at the start, and the high order rate sustained over 6 hours of the tests, you can see that the CPU utilization on the application tier regains steady state quickly, and remains stable at a reasonable 30-35% range for the duration of the test. This attests to the strong fit of the POWER7 platform for WebSphere Commerce workloads and strong throughput and reliability characteristics that this combination of products provides.
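The load profile of the full-day reliability test described above translates into a total order volume as follows. This is a sketch based on the approximate rates quoted in the text (the second phase is "about 500 orders/minute"), so the result is an estimate rather than a measured figure.

```python
# Each phase is (duration in minutes, orders per minute).
phases = [
    (60, 1000),      # opening-hour spike
    (5 * 60, 500),   # remaining five hours of the simulated day
]


def total_orders(load_phases):
    """Total orders placed across all phases of a load profile."""
    return sum(minutes * rate for minutes, rate in load_phases)


def average_rate(load_phases):
    """Average orders per minute across the whole profile."""
    total_minutes = sum(minutes for minutes, _ in load_phases)
    return total_orders(load_phases) / total_minutes


print(total_orders(phases))            # 210000
print(round(average_rate(phases), 1))  # 583.3
```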
Figure 6 shows the results of a WebSphere Commerce scalability investigation. In this case, a POWER7 LPAR was created and a single WebSphere Commerce JVM was deployed on it. We varied the number of cores available to the LPAR. For each number of available cores, a step-up test was performed to establish the maximum possible throughput. We assigned whole-core values as well as micro-partitioned half-core values to the LPAR. We plotted whole-core and micro-partitioned data points in different colors to analyze possible overhead associated with micro-partitioning.
Figure 6. Scaling characteristics of a single WebSphere Commerce JVM on a POWER7 platform
Results show near-perfect scalability characteristics up to 4.5 cores per JVM. Results also show that micro-partitioning overhead is small when 2.5 cores or more are assigned to a single JVM. Micro-partitioning provides for significant flexibility when planning WebSphere Commerce deployments.
You can consolidate all of the additional environments required for a WebSphere Commerce deployment (staging, performance, integration, and general quality assurance) as LPARs on a single POWER7 server or several servers. These additional environments do not always require entire cores to be assigned to them. Micro-partitioning is a powerful feature that allows you to leverage your processor resources with maximum efficiency.
Figure 7 demonstrates another important aspect of WebSphere Commerce performance on POWER7. The number of cores per socket (a commonly used name for a chip) for POWER7 models can be 4, 6, or 8. The model of p750 that we used in our test had a 6-core socket. As you can see, up to 6 cores per JVM the maximum throughput increases at a near-linear trend. However, as the number of cores per JVM goes from 6 to 7, we observe a drop in throughput. This is because process execution now needs to be coordinated across the bus, which operates at half the clock frequency of the chip.
Figure 7. Effect of crossing the chip socket boundary
Caveats and best practices
The following caveats will help you get the best experience from your POWER7-based WebSphere Commerce deployment:
- Run the latest version of AIX. At minimum, V220.127.116.11 is required for full P7 mode. On earlier versions of AIX, you only get P6 compatibility mode performance, which carries a performance penalty of up to 30%. We recommend that you use the latest version of the AIX operating system (version 7.1 at the time of writing).
- Size LPARs to avoid spanning chip socket boundaries. When planning your deployments, determine your socket size first. Consult the technical documentation for your model to do so. It will be 4, 6, or 8 cores. Once you understand your socket size, plan to size your LPARs in such a way that they are not likely to span across socket boundaries. If you are using dynamic LPARs, take extra care to ensure the LPARs do not grow across socket boundaries.
- Use optimal core/JVM ratio. A good starting point is to use 4 cores/JVM. If you are micro-partitioning, it is a good idea to use at least 2.5 cores/JVM.
- Use enough web container and scheduler threads to fully leverage SMT4. If you do not use enough threads to fully load the POWER7 cores, WebSphere Commerce does not fully leverage the available computing power. The WebSphere Commerce workload benefits from parallel processing. For optimal performance, you likely need to use more threads than on an SMT2 system.
- Make sure VIOS has sufficient CPU, memory, and network bandwidth. Insufficient resources in the VIO server can bottleneck the whole system. Take care when micro-partitioning the VIOS.
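The sizing guidance in the list above can be captured as a simple planning check. The sketch below is a hypothetical helper, not an IBM tool: it flags LPARs that are too large to fit within one socket and micro-partitioned LPARs with fewer than 2.5 cores per JVM. It cannot tell where the hypervisor actually places an LPAR, so a clean result does not guarantee that the LPAR stays within a single socket.

```python
def review_lpar_plan(lpar_cores: float, jvm_count: int, cores_per_socket: int,
                     micro_partitioned: bool = False) -> list:
    """Return a list of warnings for a planned WebSphere Commerce LPAR."""
    if cores_per_socket not in (4, 6, 8):
        raise ValueError("POWER7 sockets have 4, 6, or 8 cores")
    if jvm_count < 1:
        raise ValueError("jvm_count must be at least 1")

    warnings = []
    if lpar_cores > cores_per_socket:
        warnings.append("LPAR is larger than one socket and must span a socket boundary")

    cores_per_jvm = lpar_cores / jvm_count
    if micro_partitioned and cores_per_jvm < 2.5:
        warnings.append("fewer than 2.5 cores per JVM on a micro-partitioned LPAR")
    return warnings


# The recommended starting point: 4 cores for one JVM on a 6-core socket.
print(review_lpar_plan(4, jvm_count=1, cores_per_socket=6))

# 7 cores on a 6-core socket, the case where throughput dropped in Figure 7.
print(review_lpar_plan(7, jvm_count=1, cores_per_socket=6))

# A small micro-partitioned LPAR, for example a QA environment.
print(review_lpar_plan(1.5, jvm_count=1, cores_per_socket=6, micro_partitioned=True))
```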
POWER7 provides the best hardware platform for high-volume WebSphere Commerce deployments. Understanding these caveats will help you get the most value from POWER7.
WebSphere Commerce workloads benefit from processor architectures that provide for a high degree of concurrency and fast access to large amounts of RAM. Key POWER7 features, such as SMT4, 32 MB L3 on-chip cache, and two dual-channel DDR3 memory controllers that deliver 100 GB/sec throughput, are a perfect fit. For comparison, Intel chips based on Nehalem architecture provide hardware support for half the number of concurrently executing threads (equivalent to SMT2), generally a smaller on-chip L3 cache (depending on processor model), and significantly lower RAM throughput. In terms of raw performance, POWER7 is the best choice for high-volume WebSphere Commerce customers.
While it is possible to launch a high-performance WebSphere Commerce web site using Intel-based hardware, POWER7 features provide an important performance edge. Additionally, the ability of POWER7 PowerVM to dynamically allocate resources to LPARs allows you to co-locate your production, performance, and other QA environments on a single POWER7 server. PowerVM can rapidly shift resources from QA environments to production to help contain unforeseen spikes.