Be still my beating mainframe: In the day-long developerWorks Live briefing "The modern mainframe ... at the heart of your business" you'll find out about innovations that enable those with no mainframe experience to tap into applications and data already on this platform; discover why the mainframe is the best home for data of any stripe (operational, relational, XML, or warehoused); uncover today's mainframe capabilities like clustering and virtualization; and learn how mainframes can be the best tool around to help you manage your business. So if you happen to be in Moscow on October 18 or Kuwait City on October 25, don't miss this opportunity. (There will be more dates and locations added.)
Cell Broadband Engine/Power Architecture notebook
The Microtransat Challenge ...: ... is a "transatlantic race of fully autonomous sailing boats ... race aims to stimulate the development of autonomous sailing boats through friendly competition." The race is planned for late 2008. The rules are
According to officials, there will be several smaller races before the big one (like the Aberystwyth Race last month).
From September 8 to October 4, 2007: Also, learn to craft parallel scalar code in which the PPU fills in a two-dimensional array. Discover whether endianness makes a difference to ALF and whether you can run two data-separate tasks simultaneously in ALF. And, learn how to cast one type of vector into another, where to find QS20 blade firmware upgrades, and how to determine on which SPE your threads are executing.
This new blog-based column looks at some of the more interesting problems and challenges posed recently in the Cell Broadband Engine Architecture forum.
Problem: Even though it is not necessary at this moment since ALF is only currently supported on the Cell/B.E. platform, does anyone have any ideas on dealing with endianness of a struct? I noticed in one of the examples that they did a
Am I correct in assuming that in the theoretical case that ALF was using a future back-end and happened to be going between a little-endian and a big-endian machine, that our
This all seems pretty theoretical to me though since I can't think of a good reason to run SPMD code across architectures requiring endian conversion if it could be avoided.
Discussion: While the API has been defined since SDK 2.0, endianness handling is supported in ALF only as of SDK 3.0. The example you mentioned is not proper use of ALF. On Cell/B.E. platforms it can run without issue, but it is not portable to platforms where the endianness of the host and the accelerator differ.
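To make the portability concern concrete, here is a minimal sketch in generic C of converting a struct between byte orders. It does not use ALF's own data-type support; the struct `params_t`, its fields, and the helper names are hypothetical and chosen only for illustration. The point it shows is that each field must be swapped at its own width, because reversing the struct as one block of bytes would also reverse the field order.

```c
#include <stdint.h>

/* Hypothetical parameter block; the field names are illustrative only. */
typedef struct {
    uint32_t count;
    uint16_t stride;
    uint16_t flags;
} params_t;

/* Reverse the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)(((v & 0x00FFu) << 8) | ((v & 0xFF00u) >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Swap every field at its own width; the field order stays unchanged. */
static params_t params_swap(params_t p)
{
    p.count  = swap32(p.count);
    p.stride = swap16(p.stride);
    p.flags  = swap16(p.flags);
    return p;
}
```

Applying `params_swap` twice returns the original values, so the same routine can serve in both directions.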
Problem: I'm having trouble installing the SDK 3.0 on my PS3 with Fedora Core 7. What's a good configuration of SDK 3.0 + PS3 + FC7?
Neil Costigan says: I've just upgraded my PS3/FC7 installation to SDK 3.0 and had no problems. The new yum-based install is very streamlined. I ran a few test programs and besides some v1.1 to v3.0 path/header problems, it seems fine. My kernel is 2.6.21-1.3194.fc7. I used the latest add-on CD (dated 2007-09-04) from http://www.kernel.org/pub/linux/kernel/people/geoff/cell/. I followed the v2.1 uninstall steps documented in the v3.0 install guide. Perhaps you could describe the problem encountered in more detail.
PaulDJ says: Hey, I was able to install SDK 3.0 on a PS3 with Fedora 7 (thanks Neil). Unfortunately, even the simplest examples and the tutorial fail. Like this:
Did I miss something during the SDK3.0 installation?
gcst says: This means spufs is not mounted. Type
PaulDJ says: Indeed, it's missing. Should I add such a line to /etc/fstab?
SDK Service Administrator says: This should have been done by the install (unless it's a known issue that this doesn't happen automatically on PS3). Run
Problem: I am trying to develop a very basic parallel scalar code in which the PPU fills in a two-dimensional array (first of integers, then of doubles) and then passes it directly to 1-8 SPUs, each manipulating a different block of the array. To start, I created a one-dimensional array and passed it to only one SPU (first directly and then through a control structure). In both cases, the elements printed before passing the array to the SPU matched those the SPU received and printed. But when I changed the array to a multi-dimensional one (2x2), the SPUs did not print the contents; instead, all the elements of the received array are printed as 0, and I don't know why. I'm sure I am not making some mistake in manipulating the pointers.
I am sending the tar file of my code to pinpoint where I'm making the mistake. I would really appreciate help printing multi-dimensional arrays (max 256x256) (integer, double) through parallel scalar code.
gshi responds: You have some misunderstanding about the address. In your PPU code, "@data" is not the starting address of the actual data but a pointer to the heap memory where pointers to the real data memory are stored. The simplest way to fix your code is to declare "@data" as a two-dimensional array in both the SPU and PPU code:
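The distinction gshi describes can be sketched in plain C. This is an illustration under stated assumptions, not the poster's code: `raw_copy` is a hypothetical stand-in for a DMA transfer (implemented with `memcpy`), and the 4x4 size and function names are invented for the example. A true two-dimensional array is one contiguous block, so a single transfer starting at its address moves all the data. With a pointer-to-pointer layout, the starting address holds only a table of row pointers, so each row must be transferred from its own address.

```c
#include <stddef.h>
#include <string.h>

#define ROWS 4
#define COLS 4

/* Stand-in for a DMA transfer (an assumption for illustration): copies
 * `bytes` raw bytes starting at `src`, with no knowledge of their meaning. */
static void raw_copy(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Fill a true two-dimensional array; all ROWS*COLS ints are contiguous. */
static void fill_block(int block[ROWS][COLS])
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            block[r][c] = r * COLS + c;
}

/* Contiguous layout: one raw copy moves the whole array, because the
 * address of the array is the address of its first element. */
static void transfer_block(int dst[ROWS][COLS], int src[ROWS][COLS])
{
    raw_copy(dst, src, sizeof(int) * ROWS * COLS);
}

/* Pointer-to-pointer layout: `rows` points at a table of pointers, so a
 * raw copy starting at `rows` would move addresses, not data. Each row
 * is copied from its own address instead. */
static void transfer_rows(int dst[ROWS][COLS], int **rows)
{
    for (int r = 0; r < ROWS; r++)
        raw_copy(dst[r], rows[r], sizeof(int) * COLS);
}
```

On real hardware the same reasoning applies to the effective address handed to the SPU: it should be the address of the first data element, not the address of a pointer table.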
Problem: Will the following code work for casting from a vector unsigned long long to vector float and vector double?
That is: vector float vf; vf = (vector float)spu_shuffle(vull, vull, vpat);
Kinuko says: To cast from vector unsigned long long to vector float, you can use the same method (to convert from vector unsigned long long to vector unsigned int) and then use
To convert from vector unsigned long long to vector double... in that case probably first you'll need to extract element values using
Editor says: Anyone else with an answer?
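One point behind this question is that a cast between vector types keeps the bit pattern and changes only the type, while a value conversion changes the bits to preserve the number. The sketch below shows the same distinction for a single 32-bit element in portable scalar C rather than SPU intrinsics. The helper names are hypothetical, and it assumes `float` is a 32-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret: keep the 32 bits and change only the type. This mirrors
 * what a cast between vector types does to each element. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Convert: keep the numeric value and let the bits change. */
static float value_to_float(uint32_t value)
{
    return (float)value;
}
```

For example, the integer 1 converts to 1.0f, but reinterpreting its bits gives a tiny denormal number; the bit pattern 0x3F800000 is the one that reinterprets as 1.0f.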
Problem: Is there any way to run 2 different tasks that depend on different data simultaneously in ALF or by attempting this, have I diverged too far from the SPMD mindset? After reading about ALF, I am under the impression that only a single task can be run at a time: "ALF uses an SPMD model, so it is possible to create multiple tasks in a batch. However, the tasks are run sequentially in the order they are created."
Now I noticed that in using
Has any work been done on something similar to ALF but in a more general sense? Keeping useful constructs such as task and workblock while hiding some of the annoying background communication details, but without the constraints of forcing us into this one-task-at-a-time parallelized serial architecture. Perhaps something that would allow us to link together a number of different tasks in more of a stream-processing sense?
donm decides: I can suggest one possible workaround to the constraint you point out, but it is dependent on how many different tasks you want to run (a few or a hundred); more specifically, it is dependent on whether or not you can pack all the code into a single executable. If you want only a few, then all your code can be identical and you can pass a variable or data structure specifying a different operation for each SPU (or subset), as you desire.
ALF is fairly flexible in that sense, but it is cheating in a way because each SPU is running the same image, just not necessarily the same code within that image.
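A rough sketch of that workaround in plain C follows. It does not use the ALF API; the operation codes, the `task_params_t` struct, and `run_kernel` are all hypothetical names invented for the example. Every processing element runs the same image, and a per-task parameter selects which code path inside that image is executed.

```c
#include <stddef.h>

/* Hypothetical operation codes selected per task. */
typedef enum { OP_SCALE = 0, OP_OFFSET = 1 } op_t;

/* Hypothetical per-task parameter block passed along with the data. */
typedef struct {
    op_t  op;
    float arg;
} task_params_t;

/* Single entry point shared by all tasks; the parameter block decides
 * which operation is applied to the buffer. */
static void run_kernel(const task_params_t *p, float *buf, size_t n)
{
    switch (p->op) {
    case OP_SCALE:
        for (size_t i = 0; i < n; i++)
            buf[i] *= p->arg;
        break;
    case OP_OFFSET:
        for (size_t i = 0; i < n; i++)
            buf[i] += p->arg;
        break;
    }
}
```

The cost of this approach is that every operation has to fit in the one image, which is the limit donm mentions for larger numbers of tasks.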
Original questioner says: I wish I could be more specific but I don't really have a concrete problem that requires it at the moment. Here are some possibly interesting examples though:
This is a little bit outside of what I think I initially asked, but handling SPE-SPE communication seems like it would be necessary (or at least extremely useful) if you were to be running multiple tasks on different SPEs. I think it's pretty clear that ALF can't handle these situations, but I guess I'm trying to figure out if there is something similar that does.
pcvideo says: Thanks for the good feedback. ALF in SDK 3.0 added support for the MPMD model; however, direct SPE-to-SPE communication is not yet available. All data exchanges still go through global memory. For a simple pipeline based on ALF, the SDK 3.0 user guide has some explanations of how it works.
Problem: I have ported the Scalable Pseudo-Random Number Generators library (SPRNG) onto the Cell/B.E. platform and now want to compare the performance with the Cell/B.E. SDK random number generator. I want to know the type of random number generator that is used on the SDK.
SDK Service Administrator says: Which SDK random number function are you using? The ones documented in the Libraries Overview and Users Guide (libraries_SDK.pdf) are inline functions so the code is available in the headers:
# cd /opt/cell/sysroot/opt/cell/sdk/usr/spu/include
# ls rand_*
rand_0_to_1.h    rand_minus1_to_1.h    rand_util.h
rand_0_to_1_v.h  rand_minus1_to_1_v.h  rand_v.h
Editor says: Follow this discussion in the forum for lots more information on installing and using random number generators with the Cell/B.E. platform.
Problem: Help! I can't figure out which hardware SPEs my code's threads were executed on.
SDK Service Administrator says: You can do this by reading the phys-id files; try the "Which physical SPU is the SPE thread running on?" thread.
Cool may just be a state of mind: Maybe modifying the software programs to avoid starting a "thermal emergency" in hardware is the key to efficient cooling. Or so thinks the VATech Parallel Emerging Architectures Research Lab. And the National Science Foundation is intrigued enough to award the researchers $350,000 to develop thermal reduction techniques based on program phase analysis. The target is large systems with lots of components close together (think data center). The process is to study how applications produce heat and to ID where heat can be reduced. Kirk Cameron, director of the VATech Scalable Performance Laboratory, has already produced Tempest (Temperature Estimator), a portable freeware tool that lets a user directly measure temperature and graphically correlate the results to source code. The end product of the research will be open source and publicly available.
Honey, get out the Qentanglement photo album: First it was a knock on neutrons' neutrality, now the poor things are responsible for producing real, live images of quantum entanglement, the state where a binary 1 and 0 can be simultaneously maintained. Led by the London Centre for Nanotechnology, researchers from 11 research centers produced images of the magnetic spins inherent in the electrons in the copper atoms of an organic antiferromagnetic material. They situated four of these atomic-level bar magnets in the corners of a square lattice Heisenberg system, then were able to use neutron magnetic scattering to image the four tiny bar magnets: The "snaps" they got were both of classical behavior and of quantum entanglement.
Pressure key to inkjetting living structures: Aerodynamically assisted biojetting -- say its discoverers from the Royal Free and University College London Medical School -- is the "first example of a non-electric field-driven jetting technique" that may someday allow users to "print" living, 3D organs. To produce 3D biological structures via jetting, the techniques so far have been layered jet printing, laser-directed cell writing, bio-electrospraying, and cell electrospinning. All these techniques use electric fields to control the nozzles; this can be damaging to the cells. With this new technique, pressure differentials are used to build cellular "scaffolds" from a few micrometers to a few nanometers in diameter.
Yale and NIST build first Qchips: Researchers at Yale and NIST have succeeded in linking qubits (quantum data bits that can exist in both the "0" and "1" state at the same time) on chips like the ones in conventional processors. This could lead to the ability to achieve a more "mass manufacture" of these processing oddities, something no research has led to so far.
TJ Watson strikes again!: Researchers at IBM's T.J. Watson Research Center have successfully fabricated graphene field-effect transistors (FETs) using a single layer of carbon atoms atop a silicon wafer. Although graphene is electronically a zero-bandgap semi-metal because its valence and conduction bands overlap, the researchers were able to open up a small bandgap between them by fabricating the transistor's channel from a nanoribbon of graphene just 20nm wide (having the gap allows you to turn it on and off like conventional transistors). In fact, even though the technique isn't ready for prime time, they are currently working on radio frequency (RF) applications of the technology. According to Phaedon Avouris, an IBM Fellow: "Their performance is not quite as good as carbon nanotubes, but graphene's electron mobility is at least an order of magnitude [10X] greater than silicon."
Blue Gene Safety Considerations: This IBM Redpaper contains important information about electromagnetic compatibility and safety concerning the IBM Blue Gene/L computing system; it is critical information for anyone working directly with the hardware. It covers electromagnetic compatibility, safety concerns, and recycling and disposal. The data is presented in English, French, Japanese, Korean, German, Portuguese, and French Canadian.
Blue Gene/P Application Development: This Redbook is one in a series for Blue Gene/P that provides such information as a hardware, software, and kernel overview; execution process modes and memory; system calls; an exhaustive section on the applications environment; and lists of header files and libraries, hardware naming conventions, architectural features, porting applications, and mapping.
Too many, too few, standards: At the Power.org conference, a panel of embedded software experts noted that embedded developers are moving slowly to multicore, but not with the help of parallel programming languages. They mostly use C/C++; according to Green Hills CTO David Kleidermacher, "Any other language is a non-starter." According to another panel member, the biggest challenge will be in figuring out how to partition applications. Other things may also be slowing the transition: too few (and in some cases, too many) standards.
Nanoblades slice and dice: RPI researchers have built razor-like magnesium nanoblades using oblique angle deposition, a technique formerly thought only to create cylindrical structures like nanorods or nanosprings. The materials could be useful for hydrogen atom fuel storage.
Opportunity gets another opportunity: According to scientist Jim Bell, the power reserves that kept the internals at about 50 degrees C had just about run out on the Mars rover Opportunity -- thankfully, its batteries have recharged with the recent clearing of the dust storms. One of the interesting phenomena researchers have encountered is dust. At first, they projected that Mars would cover the rover surfaces with dust within about three months, but it seems the ridges and hilltops of the red planet are a bit more windy than expected.
Should be called the "Leafba": iRobot (creator of the Roomba vacuum cleaner) now has one to help you with gutter chores. The 2.25-inch-tall Looj (why didn't they call it "Leafba" or "Muckba"?) is propelled by a three-stage auger that dislodges and sweeps out dirt, leaves, and other debris; it cleans gutters by driving under gutter straps.
POWER6/System i technical presentations and hands-on labs: This technote provides a Web link to more than 40 presentations and hands-on labs related to POWER6 on System i. (And if you don't want to look at the note, just click this link to access the presentations and labs.)