Transaction Execution (aka Transactional Memory) and the zEC12
The zEnterprise EC12 (zEC12) was announced in October last year and incorporated innovative technologies that contributed to its overall 50% increase in performance compared with its predecessor, the z196.
The following list shows the areas improved:
- Derived from and built on the solid foundation of the z196, with the same power consumption (1800 W)
- Uses IBM 32nm SOI technology with eDRAM
- More PUs in the same MCM package as the z196
- Can decode up to 3 instructions per cycle and initiate the execution of up to 7 instructions (1)
- Improved 3rd OoO (Out of Order) processor generation
- Instruction pipeline streamlined for smoother flow
- Faster engines for fixed-point division
- Millicode improvements
- Improved cache sizes and new 2nd level cache design
- New 2nd level branch prediction array (2nd level BTB expands branch prediction coverage)
- Crypto and Hw Compression Co-processor per core
Support for new architectural features, such as:
- Transaction Execution (TX)
- Run-Time Instrumentation (RI)
- EDAT-2 (2GB page support)
(1) Up to 3 instructions decoded and 5 executed per cycle on the z196
Transaction Execution (TX) is known externally, and in academia, as Hardware Transactional Memory.
In technical presentations or speeches about the zEnterprise EC12, there is always great interest in the Transaction Execution (TX), or Hardware Transactional Memory, implementation done by IBM on the zEC12.
Up until now, most Transactional Memory research has focused on software-based implementations, and most processors don't actually support a Transaction Execution architecture. In 2011, IBM became the first company to ship a commercial microprocessor equipped with Transaction Execution (TX), a feature that universities and multicore chip researchers have studied for years. Transaction Execution is an approach to parallel programming that has the potential to make parallelism more efficient. The main idea behind the architecture is to avoid the locking mechanisms that are otherwise required to maintain data integrity and coherence in parallel processing when data is shared between tasks.
In general, parallel programming can be easily done when a task can be broken up into many independent threads (subtasks) that don't share data; if that is the case, each part can run on a different processor core, and no coordination between the cores is needed.
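The easy case described above can be sketched in a few lines of Python. This is only an illustration: the function names and the choice of summing a list are assumptions made for the example, not anything specific to the zEC12. Each worker gets its own slice of the data, so no coordination is needed.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker reads only its own slice; nothing is shared or updated in common.
    return sum(chunk)

def parallel_sum(values, workers=4):
    # Split the input into disjoint slices, roughly one per worker.
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The only "coordination" is collecting the independent results at the end.
        return sum(pool.map(partial_sum, chunks))

total = parallel_sum(list(range(1000)))
```

Because the slices never overlap, the workers could run on different cores in any order and the result would be the same.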
Things get much more complicated when the different parts (threads) of the task aren't completely independent of each other. One example of this would be the case where the different threads need to update a single value (data) that they share. And this is where the value of Transaction Execution resides.
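For contrast, here is a minimal sketch of the conventional, lock-based way of handling that shared value. The class and function names are hypothetical and chosen only for this example; the point is that every update has to pass through the same lock.

```python
import threading

class SharedCounter:
    """A single value updated by several threads, protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write must not interleave with another thread's,
        # so every update is serialized through the lock.
        with self._lock:
            self.value += 1

def worker(counter, times):
    for _ in range(times):
        counter.increment()

counter = SharedCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The result is correct, but the threads take turns at the lock even when they would not actually have collided, and that is the cost a transactional approach tries to avoid.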
How Transaction Execution was first delivered by IBM
The first implementation of transactional memory was complex, and it was introduced with the IBM BlueGene Q/Sequoia supercomputer. Ruud Haring, who presented IBM's work at Hot Chips a couple of years ago, claimed that "a lot of neat trickery" was required to make it work, and that it was a work of "sheer genius." After careful design work, the system was first built using FPGAs (chips that can be reconfigured in software) and, remarkably, it worked correctly the first time. Complex as it was, that first implementation still had its restrictions: notably, it didn’t offer any kind of multiprocessor transactional support. For the specialized Sequoia supercomputers this was not an issue, but it would be a problem for conventional multiprocessor machines: threads running on different CPUs could make concurrent modifications to shared data, and the transactional memory system wouldn’t detect that.
Other vendors, such as Sun Microsystems (now part of Oracle), intended to implement Transactional Memory in their ROCK microprocessor, which was aimed at large database servers. The ROCK microprocessor project was later canceled by Oracle.
Intel has announced that its Haswell chip architecture, due to ship some time this year, will include hardware support for Transactional Memory.
In the zEC12, Transaction Execution was made available through collaboration among the IBM architecture, software and hardware research teams, working in interlock with POWER systems. The zEC12 TX implementation represents a long-term opportunity for z/OS, DB2 and other software to exploit the architecture in the future, and it can be exploited immediately by Java.
In Transaction Execution, all storage accesses made by the CPU appear to be block-concurrent to other CPUs and channel subsystems.
- If the storage footprint that the transaction uses or updates remains consistent throughout its execution, the transaction is successful; otherwise, the transaction is aborted and its updates to storage (and to designated GPRs) are rolled back (see attached presentation drawings).
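The commit-or-abort rule can be modeled in software. The sketch below is a simplified simulation, not how the zEC12 hardware works internally: the `Store` and `Transaction` classes, the per-key version counters, and the internal guard lock that stands in for hardware atomicity are all assumptions made for illustration.

```python
import threading

class TransactionAborted(Exception):
    """Raised when part of the transaction's footprint changed before commit."""

class Store:
    def __init__(self, **initial):
        self._data = dict(initial)
        self._version = {key: 0 for key in initial}
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def get(self, key):
        with self._guard:
            return self._data[key]

class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # footprint: key -> version seen at first read
        self.pending = {}        # buffered updates, invisible until commit

    def read(self, key):
        if key in self.pending:
            return self.pending[key]
        with self.store._guard:
            self.read_versions.setdefault(key, self.store._version[key])
            return self.store._data[key]

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        s = self.store
        with s._guard:
            # Validate: every location read must be unchanged since it was read.
            for key, seen in self.read_versions.items():
                if s._version[key] != seen:
                    self.pending.clear()  # roll back: discard buffered updates
                    raise TransactionAborted(key)
            # Footprint is consistent: make all updates visible at once.
            for key, value in self.pending.items():
                s._data[key] = value
                s._version[key] += 1

store = Store(balance=100)
t1, t2 = Transaction(store), Transaction(store)
t1.write("balance", t1.read("balance") + 10)
t2.write("balance", t2.read("balance") - 30)
t1.commit()                       # succeeds: nothing changed underneath it
try:
    t2.commit()                   # aborts: t1 changed "balance" in the meantime
    t2_aborted = False
except TransactionAborted:
    t2_aborted = True
```

After the abort, the store holds only the update from `t1`; the aborted transaction left no trace and would simply be retried against the new value.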
There are new z/Architecture instructions (2) to support the Transaction Execution of software-defined “lockless” sequences treated as “Atomic Transactions”.
TBEGIN – TEND: these instructions mark the beginning and the end of a transaction (see drawings)
- Transactions can be nested and the last (outermost) TEND causes transactional updates to commit
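The nesting rule can be pictured with a small software model. This is a sketch under stated assumptions: the `NestedTransaction` class, its method names and the dictionary used as "memory" are hypothetical, and raising an error on an unmatched `tend` is a modeling choice rather than the architected behavior.

```python
class NestedTransaction:
    """Toy model: updates are buffered until the outermost TEND."""

    def __init__(self, memory):
        self.memory = memory   # dictionary standing in for storage
        self.depth = 0         # current nesting depth
        self.buffer = {}       # transactional updates not yet committed

    def tbegin(self):
        self.depth += 1

    def store(self, key, value):
        if self.depth == 0:
            self.memory[key] = value   # outside a transaction: store directly
        else:
            self.buffer[key] = value   # inside: hold until commit

    def tend(self):
        if self.depth == 0:
            raise RuntimeError("tend without matching tbegin")
        self.depth -= 1
        if self.depth == 0:
            # Only the outermost TEND makes the updates visible.
            self.memory.update(self.buffer)
            self.buffer.clear()

memory = {"x": 0, "y": 0}
tx = NestedTransaction(memory)
tx.tbegin()                 # outer transaction
tx.store("x", 1)
tx.tbegin()                 # inner transaction
tx.store("y", 2)
tx.tend()                   # inner TEND: nothing is committed yet
after_inner_tend = dict(memory)
tx.tend()                   # outermost TEND: both updates commit together
```

In this model `after_inner_tend` still shows the original values, and both `x` and `y` change only when the outermost `tend` runs.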
TBEGINC: Indicates the beginning of a constrained transaction
- A constrained transaction must meet a certain set of requirements
- The hardware uses various algorithms under the covers to handle conflicts and to improve the transaction's chance of success
ETND: retrieves the current transaction nesting depth
TABORT: deliberately causes a transaction to abort (from the outermost transaction)
NTSTG: performs a store that will be committed regardless of whether the transaction aborts
- Mainly for software debugging
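To make the roles of ETND, TABORT and NTSTG concrete, here is a toy software model. Everything in it is an assumption made for illustration: the `TxModel` class, the use of a Python exception to represent the abort, and the abort code value, which is arbitrary. It only mirrors the behavior described above: an abort discards transactional stores at every nesting level, while a nontransactional store survives.

```python
class TransactionAbort(Exception):
    """Signals a deliberate abort in this toy model."""

class TxModel:
    def __init__(self, memory):
        self.memory = memory
        self.depth = 0
        self.buffer = {}

    def tbegin(self):
        self.depth += 1

    def etnd(self):
        # Report the current nesting depth (0 means not in a transaction).
        return self.depth

    def store(self, key, value):
        if self.depth:
            self.buffer[key] = value   # transactional: held until commit
        else:
            self.memory[key] = value

    def ntstg(self, key, value):
        # Nontransactional store: written straight through, survives an abort.
        self.memory[key] = value

    def tabort(self, code):
        # Abort the whole nest: discard buffered stores and leave the transaction.
        self.buffer.clear()
        self.depth = 0
        raise TransactionAbort(code)

memory = {"total": 5, "debug": None}
tx = TxModel(memory)
tx.tbegin()
tx.tbegin()
depth_seen = tx.etnd()                  # two levels deep at this point
tx.store("total", 99)                   # will be rolled back
tx.ntstg("debug", "reached step 2")     # will be kept
try:
    tx.tabort(256)                      # arbitrary illustrative abort code
except TransactionAbort as abort:
    abort_code = abort.args[0]
```

After the abort, `total` still has its original value while the `debug` entry written with `ntstg` remains, which is why such a store is useful for leaving breadcrumbs when debugging transactions that keep aborting.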
(2) Ref: SA22-7832-09 – z/Architecture Principles of Operation (PoP)
The slides that follow were created to facilitate the explanation of how, in theory, the Transaction Execution “works”.
Transaction Execution (TX) lecture charts (1 to 3)
5- IBM z/Architecture Principles of Operation – SA22-7832-09
(available in IBM Resource Link website under “Library”)