Level: Intermediate
Lewin Edwards (sysadm@zws.com), Design Engineer, Freelance
01 Aug 2006
Lewin Edwards presents five engineering tips that are crucially important to successful product engineering, but which are rarely brought up in discussions of engineering practices.
The Internet has exactly one and a half zillion articles (I
counted them) containing lists of things to do and not to do in embedded
systems design. Many of these articles focus on well-understood topics like
switch debouncing and how to estimate maximum stack depth. Dozens of these
articles quote our old favorites in terms of embedded disasters: Therac,
Ariane, and the Mars Polar Lander. This article gives you a
few choice tidbits of advice you won't find mentioned quite so frequently
in other places. It also includes some anecdotes that will show you just how easily your work life can turn into the inspiration
for a Dilbert cartoon if you're not careful. (Having just finished writing
a book on related topics, I'm brimming with anecdotes I didn't have room to
print earlier.)
Tip 1: Production versus development
Engineering development procedures have different needs from production procedures; never confuse the two.
Practically nothing can be done effectively in a large company unless a defined formal process for achieving the goal is in place. You might not like this
fact, but it's true. As an engineer, you're working with a head full of
specialized knowledge while you design your widget, but at the other end of
the equation there's more than likely a factory employee or even a
subcontractor following "put flap A in slot B" instructions. The huge,
bureaucratic, and often exasperating machine that turns your schematic into
a finished product consists largely of formal processes that turn
engineering documents into flap A, slot B manufacturing and quality control
documents.
While a product is in the development stage, engineering needs
to be able to make quick and sometimes speculative changes. Accordingly,
processes that support engineering development need to be simple and
streamlined. It needs to be easy for engineering to request a build
exception -- for example, "build 50 of these boards; 25 matching the
schematic, and the other 25 with different resistor values at certain
locations."
On the other hand, when something is in actual large-scale production, you
want to be very leery indeed of making changes. Any change to the bill of
materials -- say, removing a resistor from your board -- that appears minor
from the engineering standpoint affects a huge number of production
processes. The schematic has to be changed, and the bill of materials needs
to be updated so manufacturing knows how many of that resistor it needs to
be requesting to meet each month's demand. Purchasing might need to
renegotiate contracts with the resistor vendor, or return excess
inventory. The pick-and-place programming needs to be altered so the head
won't try to place that part. The in-circuit tester needs to be told not
to check for that component. In some cases, optical inspection equipment
might need retraining to recognize good boards. Depending on where that
resistor is, and what the product is, you might also need to resubmit the
device for type approval -- this can be amazingly expensive, and certain
regulatory approvals, even for simple consumer equipment, can take a
flabbergastingly long time (considerably more than a year in some cases).
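To make the ripple effect concrete, here is a minimal sketch of a change checklist built only from the downstream items listed above. The names (`BomChange`, `change_checklist`, and the flag fields) are hypothetical and not taken from any real change-control system; a real process would be far more detailed.

```python
from dataclasses import dataclass

# Downstream items the text says are touched by removing a part.
# The wording of each entry is illustrative only.
ALWAYS_REVIEW = [
    "schematic",
    "bill of materials",
    "purchasing contracts and inventory",
    "pick-and-place program",
    "in-circuit test program",
]


@dataclass
class BomChange:
    """A proposed change to a product that is already in production."""
    description: str
    uses_optical_inspection: bool = False   # assumption: known per assembly line
    may_affect_type_approval: bool = False  # assumption: judged by engineering


def change_checklist(change: BomChange) -> list[str]:
    """Return every item that needs review before the change is released."""
    items = list(ALWAYS_REVIEW)
    if change.uses_optical_inspection:
        items.append("optical inspection retraining")
    if change.may_affect_type_approval:
        items.append("type approval resubmission")
    return items


if __name__ == "__main__":
    change = BomChange("remove one resistor", may_affect_type_approval=True)
    for item in change_checklist(change):
        print("review:", item)
```

Even in this toy form, a "minor" one-resistor change produces at least five review items before regulatory questions are considered.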
Here's a (true) story of what happens when the wrong process rules
are applied to a situation. A certain product was required to carry a type
approval number on its injection-molded plastic housing.
This approval number was on the outside of the housing, visible to the
end user. On the interior (invisible) face of the part, standard procedure
was to mold the part number and revision, among other information. Quality
control would look at that internal part number to determine how to stock
these parts as they were delivered; in other words, that internal part number was the
deciding factor as to whether these parts would be received and appear in
the inventory control system as version A or version B. For regulatory
reasons, the approval number had to change every time the circuit changed. The
device was in full-scale production and all was well. Then, somebody
started to develop an enhanced version of the product, with a new circuit
board and hence a new approval number. Somehow, the engineering prototype
changes were applied to the live tool for the production plastic parts,
bypassing all the normal safeguards on this sort of thing -- a classic
example of engineering exceptions being injected into the production
process by mistake.
I'll call the old plastic version "A" and the new version "B." Once the
mold had been changed, it was no longer possible to make version A.
Unfortunately, there was sufficient inventory of version A in the system
that by the time purchasing noticed a short supply, the mold had already
been converted to version B. Customers were screaming for product, and
marketing people were scurrying to and fro in panic mode.
The engineering lead on the product was called in to solve the problem. A
self-adhesive label with the correct information was suggested, but there
were several complications:
- The existing lettering on the product was raised, not engraved. This
made it hard to ensure that a label would stay on properly.
- There was a screw hole inside the text window on this product.
- The screw hole had to remain accessible because the end user would need
to remove the screw in order to change the battery.
- There wasn't time to get a custom die-cut label specified and
produced -- it was necessary to find a solution that would use a stock Avery
label. This raised an intractable problem with trying to place the label
accurately enough to ensure that whatever was printed on it would not be
obliterated by the screw.
- The manufacturing facility was in another country with a different
native language, introducing delays and training problems.
- The plastics vendor was in a third country and a radically different
time zone.
Eventually -- after numerous video conferences and several full person-weeks
of work -- it was decided to stick a small label over just the approval
number. Labels were printed and sent to the factory. The factory guys ran a
test batch and responded that the process seemed to work. They then asked
respectfully why engineering required them to stick on a label when the
label said the same thing as the plastic. (Cue surprise and amazement in
engineering.)
It turned out -- after the factory sent photos of the parts they had on the
shelf -- that the vendor of the plastic part had updated the inside of the
mold to show the new "B" version, but had forgotten to update the actual
visible text on the exterior, so the parts were externally correct,
despite being labeled and stocked as the wrong thing. This was clearly a
case of two wrongs making a right, but the waste of engineering, marketing,
and manufacturing resources was horrific. Had correct procedures been
followed, this engineering development change would not have been permitted
to impact production tooling.
Tip 2: Testing and verification
Never assume your product will pass regulatory testing in an external laboratory, even if a previous version did pass and you made only minor changes.
This one is somewhat self-explanatory. It really boils down to "measure
twice, cut once." Many sorts of products -- in fact, the vast majority of
embedded systems -- need to have regulatory type-approval testing: FCC, CE,
UL, and so forth. Many very talented people have spent entire lifetimes
trying to develop standardized tests that can be described completely and
reproduced exactly at different sites. Unfortunately, nobody has unarguably
reached that goal yet for a lot of tests (particularly tests that measure
signal propagation: sound pressure, radio, magnetic field strength, and so
on). Worse, many of these tests require expensive and complicated
equipment, and they yield results that can be terribly site-specific and
highly counter-intuitive.
What this means to you is that it can be very hard to establish a high
confidence level in test results generated internally, especially for the
types of tests I highlighted above. This is an annoyance for new project
development because it means you have to wait for official test results before you
can release the product. However, it can be a very serious and costly
problem if you have to revise something that's currently in production.
Say you have a component that suddenly goes obsolete without much
warning or opportunity for an end-of-life buy. (This happens more
frequently than you might think.) For some sorts of type approval, there's
a process where you can submit revised samples and paperwork and start
shipping the modified product immediately as long as you have in-house test
data to support its continued compliance. Great, if your new design really
does pass testing after your paperwork gets through the queue -- but in the
case of some of these "fuzzy" tests, there's a significant chance that it
won't. You're then in a bad position because you've shipped some number of
units that carry the applicable approval logo, but are known not to be
compliant. This often means product recalls, enormous costs, and pain.
It's usually desirable to perform your type-approval tests in-house if
possible, because you won't be subjected to queue time at the regulatory
body (so, if your device fails the first time around, you can investigate
why and modify the design quickly). Sometimes, regulators will be happy
with this approach if you simply let them send around an invigilator to
watch the tests in progress. Usually, however, you need to undergo some
very complicated site certification before your test results can be
considered valid, in which case you need to weigh the cost and benefit of such
certification the same way you might consider the purchase of a large and
expensive piece of test equipment.
A reasonable second-best to doing the testing in-house is to contract with a
certified third-party laboratory to do the testing for you. Although this
is not typically cheap, the third-party labs will usually give you a lot of
detail on any problems they encounter during the process. In some cases,
they'll even work with you to tweak your design, providing intelligent
recommendations on how to pass. This can be more than worth the money if
you don't have a lot of domain knowledge about the particular type approval
being sought.
I don't have a particularly funny anecdote about the problems you can get
into here (at least, not one I can mention in public), but I have seen them
crop up more than once. For example, at a now-defunct company with which
I'm familiar, a product needed to have UL and CE logo testing. Units were
sent off to a third-party lab, and a separate group, not part of the normal
product development engineering crew, was working on the approval project.
After about eighteen months of futzing, the approval crew proudly delivered
an approvable prototype covered in copper tape and bristling with ferrite
beads on every wire. Unfortunately, the device in question was based around
a PC motherboard. PC peripherals have a much shorter life cycle than
eighteen months, so there was literally not a single internal component of
the approvable version that could still be purchased.
Tip 3: Engineering versus marketing
Marketing may be cute, but if you try to give them what they want, you're liable to get bitten.
Remember the movie Gremlins, featuring cute fluffy creatures that turn
into evil demonic beasts if you feed them after midnight? One of the rules
for owning a gremlin (in its fuzzy, lovable form) was "But the most
important thing, the thing you must never forget... no matter how much they
cry, no matter how much they beg, never, never feed them after midnight!"
Marketing and Sales personnel can be very similar to this. Volumes have
been written about the relationship between marketing and engineering, but
I'd like to focus on the line between keeping marketing happy and keeping
your own sanity.
Marketing -- and even more so, Sales -- has a lot of direct customer contact.
As such, they hear a lot of feedback and get a lot of requests for special
products. The problem is that the design and development process in a large
company is like one of those dinosaurs with such a slow nervous system that
it needed a brain in its tail so the head wouldn't eat the other end by
accident. The information that your marketing people are receiving from
customers is mostly only useful in a tactical context -- Fred wants a
product now that has twice as much battery life; Sue needs a version of
your product made of baby-blue plastic to score a specific contract.
Tactical information is of little or no value to engineering, because product
development is so slow (in most high-tech corporations, typically between one
and two years) that it is inherently strategic. Fred or Sue cannot realistically get what they want in time for it to be useful to
them. Marketing and sales personnel who really know their stuff realize
this and realize that their job is to aggregate opinions like Fred's and
Sue's and provide strategic steering that says, "People need more battery
life" and "There's a better market for this product if we can customize the
colors."
The place where this can bite engineering the hardest is that critical
window just after a specification is defined, but before development has
started. (After development has started, any specification change should be
forced to go through review and approval. Just Say No to feature creep. You
have a perfectly legitimate comeback to Marketing: development has started
already!) Inside the aforementioned window, there's a constant danger that
Marketing is going to mention the product at a focus group and get
"constructive" input, which they will then attempt to back-door into the
specification without proper review. All of a sudden you'll find yourself
at the end of the project with unresolvable quality assurance (QA) problems due to conflicting
goals. Perhaps the real answer here is to start development as soon as you
can after the specification is finalized. That way, you close the window
quickly!
Tip 4: Component engineering
There is no such thing as a true jellybean component, and it is not
possible to develop a specification that will define one.
What's a jellybean?
The term "jellybean" is a piece of component engineering jargon.
Specifically, it refers to a part that has such well-understood properties that
any vendor's version will be a drop-in replacement for any other vendor's
version. The implication is that the part has a few self-evident important
parameters that are so well known that anybody who is making such a part
will conform to those parameters. A second implication is that the part is
available from multiple sources and the only criteria to use in choosing a
vendor are logistic parameters like price and lead time; technical
evaluation is not necessary because the parts are inherently all the same.
For a slightly more formal definition: If you switch from vendor A's
jellybean to vendor B's version of the same thing, your process outputs
will remain within control limits.
A subsidiary meaning of "jellybean" is a part that is cheap and readily
available -- so you can use large numbers of them without worrying about the
cost or availability.
All companies that exceed a certain critical point in production volume
have an internal part numbering system. This facilitates tasks such as
approving second sources for parts. If a second source appears, the
specification and approved vendor list for the house part number are
updated; the schematics that reference the part number do not need to be
updated.
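The indirection described above can be sketched in a few lines. This is only an illustration under assumed names: the house part number `HPN-1001`, the vendors, and the helper functions are all hypothetical, and a real system would live in a database rather than two dictionaries.

```python
# House part number -> approved vendor parts (all names are made up).
approved_vendors = {
    "HPN-1001": [("VendorA", "RLY-5V-L")],
}

# Schematics refer only to the house part number, never to a vendor part.
schematic_refs = {"K1": "HPN-1001"}


def approve_second_source(house_pn, vendor, vendor_pn):
    """Add a vendor part to the approved vendor list for a house part number."""
    approved_vendors[house_pn].append((vendor, vendor_pn))


def sources_for(refdes):
    """Resolve a schematic reference designator to its approved vendor parts."""
    return approved_vendors[schematic_refs[refdes]]


if __name__ == "__main__":
    approve_second_source("HPN-1001", "VendorB", "LR-05")
    print(schematic_refs)      # unchanged by the approval
    print(sources_for("K1"))   # now lists both vendors
```

The convenience is also the hazard: approving a second source changes what can be fitted on every board that references the house part number, without anyone touching those schematics.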
Unfortunately, there is almost no such thing as a guaranteed 100% drop-in
substitute for a given component, even in those cases where a manufacturer
explicitly claims drop-in replacement for some other company's product. You
really need to examine every application where the part is used before you
can declare that a new version is compatible. Some of this examination will
be cursory (for example, a 10K pullup resistor on a micro input line
probably isn't going to need much analysis), but you need to invest the
effort to think about every application of the part because occasionally
you will run into some very special situations.
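One way to make "examine every application" practical is a where-used query over your bills of materials. The sketch below assumes a flat list of BOM lines; the products, reference designators, part numbers, and role descriptions are invented for illustration.

```python
from collections import defaultdict

# (product, reference designator, house part number, role in the circuit)
# All entries are hypothetical.
bom_lines = [
    ("ProductX", "R7", "HPN-2001", "pull-up on microcontroller input"),
    ("ProductX", "R15", "HPN-2001", "current-sense divider"),
    ("ProductY", "R3", "HPN-2001", "LED current limit"),
    ("ProductY", "C1", "HPN-3001", "supply decoupling"),
]


def where_used(house_pn):
    """Return every placement of a house part number, grouped by product."""
    uses = defaultdict(list)
    for product, refdes, pn, role in bom_lines:
        if pn == house_pn:
            uses[product].append((refdes, role))
    return dict(uses)


if __name__ == "__main__":
    for product, placements in where_used("HPN-2001").items():
        for refdes, role in placements:
            print(f"{product} {refdes}: {role}")
```

A list like this only tells you where to look. As the relay story that follows shows, the property that matters may not appear in any drawing or database, so each placement still needs a person to think about it.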
Many years ago, I was involved in designing a very compact product that
used a slightly magical (hard-to-find) latching relay in its circuit. This
relay was the largest component in the design, and its size was the driving
factor for the spacing between two circuit boards and hence the entire size
of the product's housing. The relay was specified and purchased without
incident, and over the course of a couple of years, other products were
designed around the same relay.
One day, a rep wandered into the building and offered Component Engineering
a cheaper, pin-compatible version of the same relay. All the specified
dimensions and ratings were the same or better, and the price was much
cheaper; there was much rejoicing. This jubilation lasted right up to the
moment at which the field failure rate on my product went through the roof.
Units were coming back with cracked joints under a ball-grid array (BGA)
chip on one of the boards.
I'll skip over about four months spent looking at boards under an X-ray
machine and analyzing purported manufacturing issues and cut right to the
punch line. It transpired that there was a via (a hole in the board
connecting one layer to another) on the PCB immediately above
the relay, which caused a solder blob to sit slightly proud of the plane of
the board in that spot. On the relay we originally specified, there was a
dimple in the plastic shroud, exactly fitting the position of that via.
That dimple wasn't specified in the drawing for the relay, and it didn't
appear in our 3D model of the part -- it was just one of those serendipitous
things. Unfortunately, the replacement relay had a piece of mold sprue
instead of the dimple, and it protruded in exactly the right place to hit
the via on the board above it. When the unit was assembled, this stump of
sprue hit the via and flexed the upper board slightly. The BGA part in
question lay right on the flexure line and had mechanical stress
transmitted directly to its balls -- leading to premature failure.
Tip 5: Exploding batteries
Even technical people can say, think, and do the darnedest things. Design and plan accordingly, particularly your power supplies.
I'm sure you've heard the old axiom that it is impossible to design an
idiot-proof device, because nature continues to develop better idiots.
Unfortunately, the general idea stated there is applicable even to highly
technical people. With power electronics, my two pieces of advice are:
- Design connectors and switches so that it is impossible to miswire things.
- Recognize that the impossible is inevitable and design the circuit so
that it will survive miswiring.
I have about a million stories on this topic, more's the pity, but here's
one of my particular favorites. I was working with a product (not my own)
that had a lead-acid backup battery. The board had two sets of spade lugs
on it to support two batteries in parallel if necessary. The positive lugs
were mounted vertically; the negative lugs ran horizontally. Investigating
an unrelated problem, I watched a QA technician set up the board. She
plugged the negative lead into one of the negative lugs, then brought the
positive lead from the battery up to the other negative lug on the board!
As it approached, it spat sparks and the technician jumped. Then, gathering
her resolve, she pushed the lead firmly onto its connector. The battery
cables were instantly called upon to deliver several dozen amps; they fried
off their insulation, burned a pattern on the desk, and sat there glowing
sullenly until someone managed to tug them off the battery with a pair of
pliers. The coolest part was the way a perfectly defined smoke shape went
up to the ceiling, preserving the exact shape of the cables as they were
lying on the desk (highly theatrical).
The failures here were:
- Connectors for positive and negative battery straps were not keyed.
- There was no diode on the inputs to protect against this sort of thing.
(There was a reverse battery protection diode, but it was downstream from
the input terminals; the terminals were just hardwired in parallel. Adding
a single diode and altering the wiring a little would have prevented the
problem I described.)
- Almost an incidental issue, but the technician's training was obviously
somewhat lacking.
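For a rough feel of why the cables glowed, here is a back-of-the-envelope estimate using Ohm's law. Every number in it is an assumption chosen for illustration (a 12 V battery, 0.02 ohm of internal resistance, 0.2 ohm of cable and contact resistance); none of them were measured on the product in the story.

```python
def short_circuit_current(voltage, internal_ohms, cable_ohms):
    """Ohm's law for a battery shorted through its own cables."""
    return voltage / (internal_ohms + cable_ohms)


def cable_dissipation(current, cable_ohms):
    """Power turned into heat in the cables and contacts (I squared times R)."""
    return current ** 2 * cable_ohms


if __name__ == "__main__":
    # Assumed values, not measurements.
    battery_volts = 12.0
    internal_ohms = 0.02
    cable_ohms = 0.2

    amps = short_circuit_current(battery_volts, internal_ohms, cable_ohms)
    watts = cable_dissipation(amps, cable_ohms)
    print(f"fault current: {amps:.1f} A, cable heating: {watts:.0f} W")
```

With these assumed values the fault current comes out at about 55 A, and the cables have to shed roughly 600 W as heat, which is in line with the "several dozen amps" and the scorched desk.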
Don't let this happen to you.
In this article, I've covered the absurd, the annoying, and the physically
dangerous, and have given you a few hints for avoiding all of those
sorts of situations. Most of what I've said here is aimed at engineers
working in large corporations. In the next article in this series, I'll give
the same sort of treatment to a few issues that more directly affect
freelancers and engineers working at small corporations.
About the author
Lewin A.R.W. Edwards works for a Fortune 50 company as a wireless security/fire safety device design engineer. Prior to that, he spent five years
developing x86, ARM and PA-RISC-based networked multimedia appliances at
Digi-Frame Inc. He has extensive experience in encryption and security
software and is the author of two books on embedded systems development.
He can be reached at sysadm@zws.com.