At last month's Zend Conference I was fortunate enough to attend Christian Wenz's tutorial on XML and Web services. This covered the use of technologies such as DOM and SimpleXML for working with XML data. As the title of this blog entry suggests, SDO provides a simple way to construct or extend XML documents.
The example below shows an XML schema used by an application which records information about quotations people have made. (Thanks go to Christian Wenz for the scenario, which is from his book "PHP Phrasebook", SAMS Publishing.) An XML document following this schema will contain a quotes element containing a number of quote elements, each of which consists of a phrase element, an author element, and a year attribute.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="quotes">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="quote"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="quote">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="phrase" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="year" use="required" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
The code extract below shows how SDO can be used to load the XML schema and a quotes document, and then add a new quote entry to that document.
1 $xmldas = SDO_DAS_XML::create('./quotes.xsd');
2 $xdoc = $xmldas->loadFromFile('./quotes.xml');
3 $quotes = $xdoc->getRootDataObject();
4 $quote = $quotes->createDataObject('quote');
5 $quote->phrase = $_POST['quote'];
6 $quote->author = $_POST['author'];
7 $quote->year = $_POST['year'];
8 $xmldas->saveDocumentToFile($xdoc, './quotes.xml');
The eight lines of code above do the following:
1. create and configure a new XML Data Access Service (DAS) with an XML schema.
2. use the XML DAS to load and validate an instance document.
3. retrieve the quotes currently contained in the document.
4-7. create a new quote and populate it with information posted from a form.
8. save the quotes (including the new quote) back to the XML file.
The same task takes 14 lines of code in DOM and, in my opinion, the DOM code is far less intuitive. I found it more difficult to understand the data structure from the code because of the generic XML vocabulary of the DOM APIs. However, it is worth noting that the XML Data Access Service requires the existence of an XML schema, whereas DOM does not.
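For comparison, here is a sketch of the same task in DOM. This is illustrative only: the 14-line version the comparison was based on may differ in detail, and the sketch works on an in-memory document so it is self-contained (the real script would use load()/save() on quotes.xml and take its values from $_POST).

```php
<?php
// Illustrative DOM equivalent of the SDO example above. An in-memory
// document is used here so the sketch is self-contained.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML('<quotes><quote year="1912"><phrase>Hello</phrase>' .
              '<author>Anon</author></quote></quotes>');

// Build the new quote element by hand, node by node.
$quote = $doc->createElement('quote');
$quote->setAttribute('year', '2005');

$phrase = $doc->createElement('phrase');
$phrase->appendChild($doc->createTextNode('Trust nothing.'));
$quote->appendChild($phrase);

$author = $doc->createElement('author');
$author->appendChild($doc->createTextNode('Anonymous'));
$quote->appendChild($author);

// Attach it to the root and emit the result.
$doc->documentElement->appendChild($quote);
echo $doc->saveXML();
```

Note how the element and text-node plumbing obscures the shape of the data, which is exactly the point of the comparison.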
Because SDO knows about the XML schema, it lets you access the elements and attributes directly by name. The same is true of SimpleXML; however, SDO can also be used to create new documents from scratch, or, in this case, new fragments of XML. SimpleXML is limited in this regard because it only has knowledge of an instance document, rather than the schema. Again, it is worth noting that SimpleXML does not require the existence of an XML schema.
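On the read-only side, a quick SimpleXML sketch of the same by-name access (document inlined here so the example is self-contained):

```php
<?php
// SimpleXML gives the same by-name access for reading, but it only
// knows about the instance document, not the schema.
$quotes = simplexml_load_string(
    '<quotes><quote year="1905"><phrase>Everything is relative.</phrase>' .
    '<author>A. Einstein</author></quote></quotes>');
$first = $quotes->quote[0];
echo $first->phrase, ' -- ', $first->author, ' (', $first['year'], ")\n";
```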
The SDO API does not reveal which names correspond to XML elements and which to XML attributes, but it still outputs XML which follows the provided XML schema. Personally, I think this is quite neat, because the choice between an element and an attribute in schema design is often arbitrary and down to personal taste. This article nicely sums up the feelings I had when I created my first ever XML schema.

Graham Charters
I've had a few requests for the SDO slides used at the Zend/PHP Conference. A PDF version of the presentation can be found here.
The slides give an overview of the main concepts behind SDOs, followed by a few scenarios showing how to use them to work with relational and XML data. As I mentioned in a previous blog, the presentation could have done with a few more scenarios to address SDO's strengths and the questions which arose, but unfortunately time was short. I therefore intend to use this blog to cover these over time.
If you have any comments or questions on the SDO project, please let us know.

Graham Charters
I presented an introduction to SDO for PHP at the Zend/PHP Conference and Expo a couple of weeks ago. The session was very well attended and there were some good questions at the end. Overall the presentation went well, but I could have done with more scenarios to address SDO's real strengths and the questions which arose. To continue the discussion and help clarify a few things, I thought I'd blog a little about SDO. So here goes...
One question which came up during the presentation was to do with the Relational Database Access Service and whether it optimizes the SQL queries it creates in order to remove redundancies. The short answer is, "no", but that's simply because it doesn't have to; SDO does the optimization for it.
The first thing to note is that SDOs keep a change summary; however, this summary only holds the information required to re-create the data object's original state, not any intermediate states. This is important when it comes to understanding how the updates are optimized.
Consider an example of a contact database (as described in this article). The contact database table contains a "shortname" column (e.g. "Fred") and a "fullname" column (e.g. "Frederick Flintstone"). Let's assume we've retrieved a set of contact SDOs from the database into the variable $result. We then perform the following four modifications:
// Create and set a new contact.
// The change summary records the fact that the new contact was created.
$new_contact = $result->createDataObject('contact');
$new_contact->shortname = 'Bertie';

// Delete the new contact.
// The change summary entry for the new contact is cleared.
// It's as if 'Bertie' never existed.
unset($result->contact[count($result->contact) - 1]);

// Change the name of the first contact to "Sally Smith".
// Sally was previously called "Sally Barker", so the old
// name is stored in the change summary.
$result->contact[0]->fullname = 'Sally Smith';

// Change the name of the first contact to "Sally Jones".
// The intermediate value of "Sally Smith" is not recorded
// in the change summary.
$result->contact[0]->fullname = 'Sally Jones';
The create and delete cancel each other out and the intermediate value of "Sally Smith" is not recorded, so the resulting change summary would only show that "Sally Barker" had changed her name to "Sally Jones". Consequently, when the Relational Data Access Service is asked to apply the changes back to the database, the only update it would see and perform is the name change.
The resulting SQL UPDATE statement would look like this:
UPDATE contact SET fullname=? WHERE id=? AND fullname=?
with a parameter list of ("Sally Jones", primary key value, "Sally Barker").
Useful links: SDO for PHP Project, SDO for PHP documentation, Relational Data Access Service documentation.

Graham Charters
Next week sees the Zend/PHP Conference and Expo in San Francisco. It promises to be a great experience, with a goodly number of big industry keynote speakers, lots of hands-on tutorials, and three parallel session streams.
I'm presenting an introduction to Service Data Objects (SDO) on the first day which should give people the basic 101-level understanding of SDO and how to use it with XML and databases. I've included a few scenarios to hopefully keep it *real*. I'm really pleased to have been scheduled first on the parallel sessions, so I can then relax and enjoy the rest of the conference :-) .
I'm really looking forward to finally getting to meet members of the PHP community who've helped us in the creation of the SDO project. The support and guidance we've received throughout has been fantastic.

Graham Charters
Finally PHP 5.1 Beta 2 is live. I'm very excited about PHP 5.1 which is another big step for PHP.
Some of the key improvements of PHP 5.1 include:
* PDO (PHP Data Objects) - A new native database abstraction layer providing performance, ease-of-use, and flexibility.
* Significantly improved language performance mainly due to the new Zend Engine II execution architecture.
* The PCRE extension has been updated to PCRE 5.0.
* Many more improvements including lots of new functionality & many bug fixes, especially in regards to SOAP, streams and SPL.
* See the bundled NEWS file for a more complete list of changes.
Everyone is encouraged to start playing with this beta, although it is not yet recommended for mission-critical production use.
I feel like singing something Doctor Seussish:
- "Oh, the fun you will have
When you code up a style
And your friends' screens explode;
Boy, then try to smile!"
Or perhaps a limerick:
- "A Web-whacker hight Diplodonicus
Promised styles to amaze and astonish us.
But when the pages he drew
Were an unrelieved blue,
His smile became risus sardonicus!"
And that just about blew my creativity diode for the day.
What it's about is fighting with CSS to come up with arrangements and placements of large elements (images, sidebars, et cetera) that work in a) wildly different window sizes, and b) wildly different browsers. I've had to descend to making the CSS files dynamic, so they could adjust parameters according to things like the browser information. And, of course, I'm doing this by making them PHP scripts.
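A minimal sketch of what such a dynamic stylesheet looks like: a PHP script served where a static .css file would normally be. The browser test and the widths below are illustrative, not the actual styles from this post.

```php
<?php
// A stylesheet generated by PHP. The server must be configured to
// route the stylesheet URL through PHP for this to work.
header('Content-Type: text/css');
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Hypothetical adjustment: older IE gets a narrower sidebar.
$sidebar = (strpos($ua, 'MSIE') !== false) ? '160px' : '220px';

echo "#sidebar { width: $sidebar; float: right; }\n";
```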
Perfectionist though I may be (although I doubt you'd ever be able to tell from my office at home), this is one of those all-too-numerous cases in which "right" is going to have to wait upon "good enough." I'll twiddle the styles until it looks the way I want it to do, and maybe some year I'll come back and do the styles right.
And, as usual, my perfectionism means that the scripts are too prototypical to show anybody; I'm too ashamed of their hacky kludgery. But perhaps one day...
The week before last I was in Montreal, Quebec, Canada for PHP Quebec Conference 2005. One of the presentations I gave was about using PHP scripts instead of normally static files like robots.txt, and it gratifyingly raised a couple of eyebrows.
Making robots.txt a dynamic file can have a definite impact on performance, but it lets you answer queries a little more specifically. For example, three nasty types of robots are:
- Those which scan without even asking for robots.txt;
- Those which request robots.txt and either ignore it or use its information to crawl through areas which it explicitly forbids; and
- Those which look at the various clients defined in robots.txt and then come back disguised as one with more access.
I'm not sure if there actually are any Type III robots out there, but if I can think of it I'm sure some perp somewhere already has as well.
If robots.txt is a dynamic file, it can handle Type III malbots by responding only with a single client's permissions: those of the client to whose request it's responding. Type I malbots might be caught through the use of spider traps, and Type II can be identified by putting spider traps into a forbidden area and noting that the malbot requested robots.txt. The fact that it asked for the file and then fell into a trap in a forbidden area is a dead giveaway. :-)
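The per-client response can be sketched in a few lines (this assumes the server is configured to route requests for robots.txt through PHP; the rules themselves are illustrative):

```php
<?php
// A dynamic robots.txt: each client is answered only with its own
// rules, so a Type III malbot can't read another agent's permissions.
header('Content-Type: text/plain');
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (stripos($ua, 'Googlebot') !== false) {
    $rules = "User-agent: Googlebot\nDisallow: /drafts/\n";
} else {
    // Everyone else gets only the generic rules.
    $rules = "User-agent: *\nDisallow: /private/\n";
}
echo $rules;
```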
The nasty part of the whole process is reliably identifying the perp. In these days of cable and DSL providers handing out DHCP addresses, labelling a particular IPA as being a perp is only valid until the address is assigned to someone else. And identifying by client identification (the User-Agent request header field) is unreliable because it's easily spoofed, and often good software is used to do bad things.
Of course, sometimes the client ID is a dead giveaway, such as User-Agent: EmailSiphon, or the IPA might be in a fixed range known to be assigned to people of debatable virtue, but for the most part a lot of eyeballing is going to be necessary to settle on rules. The area is fertile ground for enhancing the response scripts to understand client/IPA combinations, requests per time t, and other heuristics. And I'm even working on some of those. :-)
One of the most difficult concepts for new Web developers to fully grasp is just how dangerous it is to trust user input. Just in the last week, there have been around a dozen different reports of vulnerabilities found in Web applications - almost all of them revolving around unchecked user input. Because of PHP's dominance in the Web application development world, many of the vulnerable applications were written in PHP, which hurts PHP's security track record, even though it's not the language that is at fault (the same applications, written in any other language, would have suffered from the same vulnerabilities).
The challenge of validating user input is not a simple one. The key to meeting it is attention to detail combined with knowledge.
At the end of the day, nothing but the developers themselves can ensure that no unsanitized piece of data ends up as part of a filename, or even a database query, which is why paying close attention to what goes where is important.
But that's actually not enough. Few people fully understand just how little of the Web environment can be trusted. Nowadays, most developers know that you cannot rely on GET or POST variables to have the values you expect (even if they're inside hidden form fields) - but how many of them know that you cannot trust any $_SERVER variable that begins with HTTP (e.g., $_SERVER["HTTP_REFERER"])? These can be fully (and easily) spoofed by remote users, and must not be trusted. The same goes for cookies - they may not be easily visible to or editable by the average user, but as they're saved on the client side, a 'malicious hacker', or even a garden-variety script kiddie, can easily set them to their heart's content. And how about $SERVER_NAME ($_SERVER["SERVER_NAME"]), which actually depends on the Host: header sent by the remote user, and can therefore be spoofed under certain circumstances?
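One concrete discipline that follows from this is whitelist validation: check every piece of request data against the exact form you expect before it goes anywhere near a filename or a query. A small sketch (the field shapes below are illustrative):

```php
<?php
// Whitelist validators: accept only data matching the expected form,
// and return null for anything else.
function clean_year($raw) {
    // Accept only a four-digit year.
    return preg_match('/^\d{4}$/', (string)$raw) ? (int)$raw : null;
}

function clean_filename($raw) {
    // Letters, digits, dashes and underscores only -- no dots or
    // slashes, so "../../etc/passwd" can never get through.
    return preg_match('/^[A-Za-z0-9_-]+$/', (string)$raw) ? $raw : null;
}
```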
Paraphrasing agent Mulder's immortal words, 'Trust Nothing'.
Interesting links: PHP Security Consortium, More on trusting (or not trusting) user input.

Zeev Suraski
At the suggestion of a friend, I added yet another twist to my watermarking script: rotating the watermark a random amount each time.
While cool in theory, it turns out that the practice is a little more difficult. For one thing, PHP's ImageRotate() function creates a new image resource sized large enough to hold the entire rotated image. Even though I'm using a circular watermark, the image it's in is rectangular. So if you rotate a rectangle and put a bounding box around it, you get a lozenge wider and taller than the original one.
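The size of that bounding box can be computed directly: rotating a w by h rectangle through an angle needs a box of |w cos a| + |h sin a| wide by |w sin a| + |h cos a| tall. A small sketch (the sizes here are illustrative):

```php
<?php
// Compute the bounding box of a w x h rectangle rotated by $degrees.
function rotatedBox($w, $h, $degrees) {
    $a = deg2rad($degrees);
    return array(
        (int)ceil(abs($w * cos($a)) + abs($h * sin($a))),
        (int)ceil(abs($w * sin($a)) + abs($h * cos($a))),
    );
}

list($bw, $bh) = rotatedBox(200, 100, 30);
echo "$bw x $bh\n"; // a 200 x 100 image grows to 224 x 187 at 30 degrees
```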
For another thing, getting ImageRotate() to do the Right Thing with the transparency is turning out to be a hassle. I'm working with one of the developers of the PHP image library on figuring that one out; it's unclear whether the issue is my boneheadedness or an actual bug in PHP.
When it's finished, or at least far enough along, I'll post more technical details.
A few months ago, I talked to one of the leading CMS industry analysts and he mentioned how surprised he was to find the dominance of PHP in this market.
The available CMSs range from open-source freeware to proprietary products to supported open source.
Some familiar names (in no particular order) are Drupal and, of course, the popular PHP-Nuke. If I went through the whole list of great PHP CMS systems I know, this post would start getting boring.
As I am asked very often what CMS I recommend, and with all the excellent packages out there it's hard to do so, I decided to post a link to The CMS Matrix, a nice site that provides some initial matrix comparisons between the various CMSs. Of course, at the end of the day you'll have to install some of them and actually try them to see if they suit your needs.

Andi Gutmans
One of the most popular articles I've ever written has been about Preventing Image 'Theft'. I wrote it several years ago, but people are still reading it (evidently) and contacting me about it.
I've recently had call to use this sort of thing myself, and what I've got now is rather more advanced than described in the original article. For instance, now I transparently intercept images 'going offsite' and replace them with a correctly-sized blank box containing text about the copyright. And for images I want to be basically previewable but not really usable (if people want a usable version they need to contact me) I watermark 'em.
Watermarking a digital image means adding information to the existing pixels. It can involve adding invisible information for identification purposes, such as the Digimarc technique, or it can visibly degrade the quality of the image, perhaps with a message. I've used both, but it's the latter mechanism I needed most recently.
Watermarking for degradation is an interesting challenge. There's nothing you can do, short of actually destroying data, to keep a really determined perp from getting past your defences, but you can make it pretty difficult.
For instance, the degradation watermarking I set up recently uses a watermark with built-in noise, so there isn't a single same-colour region that can be undone. A perp would have to figure out what the watermark pixels are, pixel by pixel, in order to create a mask to remove it. And since I'm using it on dense JPEG images, that's a little difficult.
In addition, the watermark is repeated across the entire image, and not at regular intervals. Each one is jigged a bit at random, so no two previews of the same image should be identical. (Well, modulo repeats of the random sequence.) This keeps a perp from figuring out that the watermark is repeated at fixed intervals and using that to help remove it. Of course, if it accesses multiple previews of the same image, it can probably figure out the pixel settings eventually by comparing them. But I suspect that would be a major chore, too.
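The jittered tiling can be sketched as follows: watermark copies are laid out on a nominal grid, each nudged by a small random offset, so no two renderings tile identically. All the numbers here are illustrative, not my actual settings.

```php
<?php
// Generate jittered positions for tiling a watermark across an image.
function watermarkPositions($imgW, $imgH, $markW, $markH, $step, $jitter) {
    $positions = array();
    for ($y = 0; $y + $markH <= $imgH; $y += $step) {
        for ($x = 0; $x + $markW <= $imgW; $x += $step) {
            // Nudge each grid position by a random amount.
            $positions[] = array($x + mt_rand(-$jitter, $jitter),
                                 $y + mt_rand(-$jitter, $jitter));
        }
    }
    return $positions;
}

$spots = watermarkPositions(400, 300, 50, 50, 100, 10);
// each copy would then be stamped onto the image at $spots[$i],
// e.g. with GD's imagecopymerge()
```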
Both the replacement-with-notice and the watermarking are done in realtime as part of Apache's response to a Web request. The replacement is very low impact indeed, and doesn't cause performance to deteriorate noticeably, but the watermarking involves actual image manipulation of megapixel images, and so can slow things down. So you can use the former on almost any server, but the latter really needs a machine with a lot of oomph to keep visitors happy.
You can, of course, get rid of the performance impact by watermarking the images ahead of time and serving the results normally. That decreases the random factor, though, and possibly makes the watermark more easily removed. TANSTAAFL. Security always costs something, even if it's only (!) convenience.

Ken Coar
I use PHP extensively on almost all of my Web pages. (Just about the only ones that don't use it are on servers that don't have it installed. Mine, of course, all have PHP installed.) In a lot of cases I store the real content in a database, and PHP pulls it out and formats it; sometimes the content is built directly from scanning files and directories.
This has advantages and disadvantages, of course. On the pro side, the content is visible immediately. On the con side, the content is visible immediately. :-) Having every page be interpreted has definite performance implications, and not just on the origin server. Unless care is taken, the dynamic nature of the page can result in it rarely or never being cached, which hits you in the system and the network. If you pay for your bandwidth by the byte, that can be a huge deal.
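One way to take that care is to send cache validators even for generated pages. A sketch, using a Last-Modified header derived from whenever the underlying content last changed (the timestamp and field names here are illustrative):

```php
<?php
// Return true if the client's If-Modified-Since covers our content.
function notModified($ifModifiedSince, $contentMtime) {
    return $ifModifiedSince !== null
        && strtotime($ifModifiedSince) >= $contentMtime;
}

$contentMtime = 1130000000; // e.g. the database row's last-changed time
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $contentMtime) . ' GMT');

$ims = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
     ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : null;
if (notModified($ims, $contentMtime)) {
    // The cached copy is still good; skip regenerating the page.
    header('HTTP/1.1 304 Not Modified');
    exit;
}
// ...otherwise generate and send the page as usual...
```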
Like Sam, I use a mix of languages. For Web pages, I use PHP almost exclusively; for standalone apps I use C, PHP, bash, or Perl, as seems appropriate. In general I use Perl, but if I'm frobbing a database, chances are I'll use PHP unless there's something in CPAN that argues for Perl.
When I first got involved with blogging, in December 2002, I decided to write my own software from the ground up. In PHP. Of course, I'm a bit-twiddler, and did it that way so I'd understand what this 'blogging' thing was all about and how it worked. (Almost as soon as I brought it online, Sam challenged me to take the next step. :-) The result can be seen at my blog.
I hope to go into this subject in more detail in the future, and I definitely intend to say some things about how I'm using PHP to automatically guard my Web servers, but this is just an introductory note after all.
Different programming languages excel at different things. I employ a number of different programming languages, and make my choice based on the task at hand. My presentations tend to be powered by Perl. My weblog is powered by Python. But my private applications tend to be written in PHP.
The developerWorks PHP Resources site will cover PHP from a perspective that may be new to some PHP users, covering such topics as "Access an enterprise application from a PHP script". The developerWorks PHP Blog, however, will often touch on topics that may be new to enterprise (typically Java) programmers.
My first post on this site will cover an application I wrote in about an hour to cover a specific need. It breaks a number of "rules" that guide the development of scalable enterprise applications - in particular it does not separate presentation from content. Consequently, this application does not contain any reusable components that will ultimately find their way into a Customer Relationship Management system. I'm entirely OK with that.
Still, the application is centrally managed, requires zero deployment, is accessible everywhere and portable across a wide range of client operating systems and browsers. All good traits to have.
The application is a vocabulary test. Every week my daughter gets a list of words and their definitions to study. At the end of the week there is a test where she needs to match the words with the definitions. At first I helped her study using flash cards, but this seemed like an obvious candidate for automation.
The UI for this application was obvious... I precisely mimicked the layout of the test. The application itself consists of a single source file and a single data file.
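A minimal sketch of the kind of quiz described above (the word list is inlined here for self-containment; the real application reads it from its separate data file, and the words below are invented, not from the actual test):

```php
<?php
// Word list; the real app loads this from a data file.
$words = array(
    'ubiquitous' => 'present, appearing, or found everywhere',
    'ephemeral'  => 'lasting for a very short time',
    'gregarious' => 'fond of company; sociable',
);

// Shuffle the definitions so each word gets a dropdown of all of
// them in random order, mimicking the match-the-columns test.
$definitions = array_values($words);
shuffle($definitions);

foreach (array_keys($words) as $word) {
    echo "<label>$word <select name=\"" . htmlspecialchars($word) . "\">\n";
    foreach ($definitions as $i => $def) {
        echo "  <option value=\"$i\">" . htmlspecialchars($def) . "</option>\n";
    }
    echo "</select></label>\n";
}
```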
Nothing in this application couldn't have been done with JSP. However, to do so would have required additional effort, effort that does not result in a more functional end result.
And it wouldn't have been as much fun.
Not everybody may have the diverse set of skills required to pull together such an application in one sitting. However, a much larger set of people can copy such an application and successfully make meaningful changes to it.
In my experience, that's how people tend to learn languages such as PHP. Some refer to this as Progressive Disclosure.
Foremost, we would like to thank the developerWorks editors for giving us the opportunity to christen this blog.
We'd like to begin by saying how excited we are about IBM's announcement of embracing PHP. In a way, this brings closure to our romance with IBM, which has its roots in the very birth of PHP as we know it, about eight years ago. Those who read the fine print at the bottom of the PHP 3.0 CREDITS file may remember our joint thanks to Michael Rodeh, who taught our Compilation Techniques course at the Technion, Israel Institute of Technology, and also supervised our university project that was later to become PHP. During this period he played a decisive role in the way PHP history unfolded, and today he happens to be the Director of the IBM Research Labs in Haifa, Israel. As such, some of the first PHP brainstorming sessions happened within the corridors of IBM, and therefore this turn of events seems very natural.
Needless to say, the announcement doesn't only close a loop on a story that began eight years ago, but opens a brand new and exciting landscape. A company the magnitude of IBM putting its know-how and experience behind PHP is something PHP enthusiasts have been awaiting for a long time. We all followed what happened when IBM embraced Linux. Similar to Linux a few years ago, PHP today is a great technology that millions of people already use, and that is growing rapidly. However, it has been lacking the necessary endorsement from a serious industry player such as IBM in order to penetrate the mind share of many corporations and the software industry as a whole. We are confident that, just like what happened with Linux, IBM's endorsement will lead to a whole new ballgame for this powerful technology.
To fully realize the significance of IBM's involvement in PHP, it's important to note that it will not amount to just a marketing stamp of approval. IBM's support will be backed by contributions of technology, and PHP will benefit from IBM's position at the forefront of the software industry. If previously PHP had to adapt itself to standards written with other languages in mind, the day when standards will evolve around PHP is right around the corner.
At this point you may wonder what exactly this blog will contain. Wonder no further. This shared blog will host thoughts and ideas regarding PHP and Web development at large, coming from people with first-hand experience in development and deployment of open source technologies, primarily PHP and Apache. In addition to us, you can expect to read thoughts from evangelists including Sam Ruby, Ken Coar and Mark Pilgrim. In addition, this developerWorks section will host a variety of white papers and articles dealing with a wide range of PHP-related topics, from technical documents all the way to roadmap discussions. It should be quite interesting around here!
To conclude our maiden post, we'd like to extend our warm gratitude to the PHP community, including (but not limited to) the developers, documenters, bug fixers, quality testers, and anybody else who has contributed to the PHP project throughout the years. Each and every one of you has a share in IBM's announcement. PHP would never have been what it is today without you!

Andi Gutmans & Zeev Suraski