Add XML structure to the resume

Put HR-XML, stylesheets, formatting objects, and namespaces to work

In this article, explore how XML lends structure to the storage of data relating to a résumé. Using elements from the HR-XML and Open Applications Group Integration Specification (OAGIS) projects, build data and stylesheet files to generate an example résumé as a PDF file using Apache Formatting Objects Processor (FOP). Particular points of interest include handling multiple namespaces and hints about how to add decoration to the basic PDF through the stylesheet.

Colin Beckingham, Writer and Researcher, Freelance

Colin Beckingham is a freelance researcher, writer, and programmer who lives in eastern Ontario, Canada. Holding degrees from Queen's University, Kingston, and the University of Windsor, he has worked in a rich variety of fields including banking, horticulture, horse racing, teaching, civil service, retail, and travel and tourism. The author of database applications and numerous newspaper, magazine, and online articles, his research interests include open source programming, VoIP, and voice-control applications on Linux. You can reach Colin at colbec@start.ca.



01 February 2011

You can quickly and easily compose a résumé in a What You See Is What You Get (WYSIWYG) editor and with a couple of mouse clicks translate it to a PDF file for transmission to a prospective employer. So why put the extra effort into storing the data in an XML file first? Complicating the process with extra steps can introduce errors, so you need a good reason for the additional effort.

Frequently used acronyms

  • PDF: Portable Document Format
  • URL: Uniform Resource Locator
  • W3C: World Wide Web Consortium
  • XML: Extensible Markup Language

The justification lies in separating data from presentation and in the structure that a backend such as XML brings. As the data becomes more complex and the output requirements more varied, XML offers accuracy, portability, and adaptability. Data enthusiasts try to store all data in a database of some kind. Whether a complex data structure is overkill for a plain résumé depends on your needs and how often the data changes.

Many employers react negatively to an incomplete résumé. Structure is good—elements act as reminders of what must appear in the document. You can use XML on a wide variety of platforms, and one single XML data backend can provide a résumé (short version) or curriculum vitae (long version) according to the employer's requirements simply by using a different stylesheet.

The process

The process described here uses Apache FOP (see Resources) to generate a PDF file from an XML data file using an Extensible Stylesheet Language (XSL) stylesheet. The stylesheet controls the presentation of the data and follows the format defined in the W3C XSL recommendations (see Resources).

You can store the résumé data in plain XML format using your own unique schema. But a standard format such as HR-XML has advantages. If you have special requirements not covered by the standard, it is a simple matter to take what you need from the standard and extend it by creating a personal namespace for the additional material.
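
As a minimal sketch of such an extension, the fragment below places one element from a personal namespace next to standard content, using the hr:Candidate root element that appears in Listing 1. The my prefix, the http://example.org/resume-ext URI, and the my:Hobby element are hypothetical names chosen for illustration. They are not part of HR-XML or OAGIS, and a validating parser might accept such an element only where the schema allows extensions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "my" and its URI are made-up names for a personal namespace -->
<hr:Candidate
  xmlns:hr="http://www.hr-xml.org/3"
  xmlns:my="http://example.org/resume-ext">
  <hr:DocumentID>000000001</hr:DocumentID>
  <!-- Hypothetical extension element, not defined by HR-XML or OAGIS -->
  <my:Hobby>Beekeeping</my:Hobby>
</hr:Candidate>
```

A stylesheet that reads the extra element needs the same xmlns:my declaration on its xsl:stylesheet element and refers to the data as my:Hobby.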

HR-XML and OAGIS

HR-XML and OAGIS (see Resources) are two open source projects that combine to offer the kind of structure that many large organizations consider important in human resources and business contexts.

HR-XML is the result of much thinking by specialists in the field of human resources. These specialists view the issue from an employer's point of view, so the schema contains the scaffolding for a lot more information than is required at the interview stage. Managing people is a complex business. From determining staffing requirements through recruiting, background checks, competency assessment, and hiring to ongoing time reporting and compensation, benefits management, performance goals, and assessment, HR-XML offers schemas to cover them all.

While HR-XML is dedicated to the human resources industry, OAGIS looks at cross-industry data exchange standards. It deals with ideas and concepts common to industries in general but leaves the industry-specific elements to specialist groups from within the industry that have the expertise.

HR-XML is careful not to reinvent ideas already created by the broader OAGIS set of elements—it simply adds new material in its own namespace. The result is a schema based on what to store given the human resources context (elements) and how to store it (attributes, hierarchy), so why not benefit from their work? To get more detail about the schema that HR-XML uses, download it or view it online at the web site (registration required). In the case of the version 3.1 download, here is a path to the documentation related to the Candidate element:

.../HR-XML-3_1/org_hr-xml/3_1/Documentation/Guidelines/ch21.html#id564065

Online, a good starting point is at the following URL:

http://ns.hr-xml.org/schemas/org_hr-xml/3_1/Documentation/ComponentDoc/Candidate-noun.php

The data file

Listing 1 is an example of a basic data file—a fragment from a larger file—that employs the Candidate element and some of its children.

Listing 1. Example data file
<?xml version="1.0" encoding="UTF-8"?>
<hr:Candidate 
  xmlns:hr="http://www.hr-xml.org/3" 
  xmlns:ccts="urn:un:unece:uncefact:documentation:1.1" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:oa="http://www.openapplications.org/oagis/9">
  <hr:DocumentID>000000001</hr:DocumentID>
  <hr:CandidatePerson>
    <hr:PersonName>
      <hr:FormattedName>Blimpo Togwer</hr:FormattedName>
      <oa:GivenName>Blimpo</oa:GivenName>
      <hr:FamilyName>Togwer</hr:FamilyName>
    </hr:PersonName>
    <hr:Communication>
      <hr:ChannelCode>Mail</hr:ChannelCode>
      <hr:Address>
        <oa:AddressLine sequence="1">5555 Yellow Brick Road</oa:AddressLine>
        <oa:AddressLine sequence="2">RR #1</oa:AddressLine>
        <oa:CityName>Lesser Village</oa:CityName>
        <oa:CountrySubDivisionCode>KKK</oa:CountrySubDivisionCode>
        <hr:CountryCode>XX</hr:CountryCode>
        <oa:PostalCode>AAA BBB</oa:PostalCode>
      </hr:Address>
    </hr:Communication>
  </hr:CandidatePerson>
</hr:Candidate>

This code fragment, which stands as a complete but rather simple example, shows a number of details:

  • The XML declaration is followed by the root element Candidate.
  • Candidate here has the meaning defined in the namespace that the hr prefix stands for.
  • The hr prefix is bound to the namespace name http://www.hr-xml.org/3.
  • Each element name carries a namespace prefix that removes all ambiguity as to what the element represents.
  • Some of the elements are defined in the hr namespace (HR-XML) and some in the oa namespace (OAGIS). They are mixed and matched as required.
  • CountryCode requires a two-character code such as US or FR.
  • CountrySubDivisionCode represents a state, province, department, or other major administrative region within a country.
  • Hierarchy is important. For example, to get to the city name, the path involves: Candidate > CandidatePerson > Communication > Address.

Use the online schema resource from HR-XML to get the names of additional elements such as CandidateProfile that allow you to add more information such as CandidateObjective, EducationHistory, PublicationHistory, Certifications, and so on.

Namespaces

Namespaces address possible ambiguities when giving names to XML elements. See Resources for more information about getting started with namespaces. They impose good discipline, but they require careful use to ensure that the correct data is retrieved; otherwise errors can occur, many of them silently. For example, if the stylesheet refers to your education section without the namespace prefix, the path matches nothing, and the processor most likely prints nothing at all in that section, without warning.
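
The silent failure can be sketched with two XSLT 1.0 instructions written against the data in Listing 1. The surrounding template and the prefix declaration are assumptions stated in the comments.

```xml
<!-- Sketch: both instructions sit inside a template that matches "/", in a
     stylesheet that declares xmlns:hr="http://www.hr-xml.org/3" -->

<!-- Unprefixed names refer to elements in no namespace. Listing 1 has
     none, so this selects nothing and writes an empty string, silently -->
<xsl:value-of select="Candidate/DocumentID" />

<!-- Prefixed names match the HR-XML elements and write 000000001 -->
<xsl:value-of select="hr:Candidate/hr:DocumentID" />
```

The prefix itself is only a local shorthand. What must agree between the data file and the stylesheet is the namespace URI that the prefix is bound to.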

Editing

Because both the data file and the stylesheet are pure XML, you can edit them in your favorite XML or text editor. For example, get Eclipse (see Resources), open a new project, copy and paste the code from Listing 1 into a new document, edit, and you are well on your way to a structured résumé data file.

The stylesheet

For a selection of tutorials about how to build and use stylesheets, see the W3C XSL web page (see Resources).

Listing 2 is an example of a basic stylesheet in the résumé context.

Listing 2. Example stylesheet
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:fo="http://www.w3.org/1999/XSL/Format" 
  xmlns:oa="http://www.openapplications.org/oagis/9"
  xmlns:hr="http://www.hr-xml.org/3">
<xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page1">
          <fo:region-body margin="1in" />
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page1">
        <fo:flow flow-name="xsl-region-body">
          <fo:block text-align="right" font-size="12pt" font-family="serif">
            DocumentID: <xsl:value-of select="hr:Candidate/hr:DocumentID" />
          </fo:block>
          <fo:block>
            <fo:leader leader-pattern="dots" leader-length="100%" />
          </fo:block>
          <fo:block font-size="12pt" font-family="serif">
            Curriculum Vitae - Résumé
          </fo:block>
          <fo:block font-size="20pt" font-family="Arial" font-weight="bold">
            <xsl:value-of 
              select="hr:Candidate/hr:CandidatePerson/hr:PersonName/hr:FormattedName" />
          </fo:block>
          <fo:block font-size="12pt" font-family="serif">
            Contact
          </fo:block>
          <xsl:for-each 
        select="hr:Candidate/hr:CandidatePerson/hr:Communication[hr:ChannelCode='Mail']">
            <fo:block font-size="10pt" font-family="Arial" font-weight="normal">
              <xsl:value-of select="hr:Address/oa:AddressLine[@sequence=1]" />, 
                <xsl:value-of select="hr:Address/oa:AddressLine[@sequence=2]" />
            </fo:block>
            <fo:block font-size="10pt" font-family="Arial" font-weight="normal">
              <xsl:value-of select="hr:Address/oa:CityName" />, 
                <xsl:value-of select="hr:Address/oa:CountrySubDivisionCode" />
            </fo:block>
            <fo:block font-size="10pt" font-family="Arial" font-weight="normal">
              <xsl:value-of select="hr:Address/oa:PostalCode" />, 
                <xsl:value-of select="hr:Address/hr:CountryCode" />
            </fo:block>
          </xsl:for-each>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>

The instructions in Listing 2 control how the data from Listing 1 is displayed on the page:

  • The document needs four different namespaces. All references to data explicitly state the namespace at each node, avoiding confusion that can arise when allowing the default namespace, where no prefix is used.
  • The template match is a forward slash (/), which selects the document root, so the select paths in the template start from the top of the data document with the Candidate root element.
  • The stylesheet specifies a layout master set that defines the page layouts available to the document, and then a page sequence element that holds the content to flow onto those pages.
  • Each page requires a series of block elements that instruct the processor where to place an item on the page and how to display it, including font and font size.
  • The stylesheet uses for-each statements to iterate over groups of elements. For example, there might be multiple communication channels: mail, email, phone, and so on. Using square bracket ([]) notation, you can specify a filter—in this case, the stylesheet filters for Mail items only.
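
The same for-each technique can also walk the address lines instead of naming each sequence number. The fragment below is a sketch that assumes it is placed inside the for-each over hr:Communication in Listing 2, in place of the block that prints the two address lines. It is not part of the original stylesheet.

```xml
<!-- Sketch: print every address line in its own block,
     ordered by the numeric value of the sequence attribute -->
<xsl:for-each select="hr:Address/oa:AddressLine">
  <xsl:sort select="@sequence" data-type="number" order="ascending" />
  <fo:block font-size="10pt" font-family="Arial" font-weight="normal">
    <xsl:value-of select="." />
  </fo:block>
</xsl:for-each>
```

With this form, an address that has one line or three lines needs no change to the stylesheet.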

Output using Apache FOP

Apache FOP uses the data file together with the stylesheet to produce the PDF. FOP is not limited to PDF output—you can also generate Rich Text Format (RTF), Printer Command Language (PCL), PostScript (PS), Advanced Function Presentation (AFP), Tagged Image File Format (TIFF), and Portable Network Graphics (PNG), as well as plain text files.

Getting and installing FOP is as simple as downloading and unpacking the binary version (see Resources). FOP is then ready to run from the downloaded location.

Here is an example command-line instruction to fop. In this case, the data, style, and configuration files are located in one directory. With that directory as the working directory, you call fop from its own location:

/path/to/fop/fop -c fop.xconf -xml exx.xml -xsl exx.xsl -pdf exx.pdf

This instruction tells the fop executable file to do the following:

  • Look for configuration information in the fop.xconf file
  • Look for data in the exx.xml file
  • Use the exx.xsl stylesheet to produce the exx.pdf output

The configuration file is important and appears as shown in Listing 3.

Listing 3. FOP configuration file
<?xml version="1.0"?>
<fop version="1.0">
  <base>.</base>
  <source-resolution>72</source-resolution>
  <target-resolution>72</target-resolution>
  <default-page-settings height="11in" width="8.26in"/>
  <renderers>
    <renderer mime="application/pdf">
      <filterList>
        <value>flate</value>
      </filterList>
      <fonts>
        <auto-detect />
      </fonts>
    </renderer>
  </renderers>
</fop>

In this configuration, the filterList element controls how objects are compressed in the PDF output, and the auto-detect entry in the fonts element instructs the processor to use fonts that are already known to the operating system.
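
If a font named in the stylesheet is not installed on the system, the fonts element can also list a folder for FOP to scan. The fragment below is a sketch of that variation. The /path/to/fonts location is a hypothetical placeholder, so check the FOP font documentation for the version you run before relying on it.

```xml
<fonts>
  <!-- Hypothetical folder: FOP registers the fonts it finds there -->
  <directory>/path/to/fonts</directory>
  <!-- Still pick up fonts known to the operating system -->
  <auto-detect />
</fonts>
```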

Figure 1, which is a screen capture from a PDF reader of the output from the earlier listings, shows the result of running the transformation.

Figure 1. The PDF output
Screen capture of the PDF output with a document ID, name, and contact information

PDF decoration

The stylesheet can contain simple decoration items:

  • Rows of dots appear in the example, and the following code generates them:
    <fo:block>
      <fo:leader leader-pattern="dots" leader-length="100%" />
    </fo:block>
  • You can make blank lines appear using the techniques included in Nicholas Chase's developerWorks Tip (see Resources) or with the following code:
    <fo:block>&#160;</fo:block>

See the FOP documentation (see Resources) for further possibilities including borders, margins, padding, color, images, and tables.
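
As one possible sketch of such decoration, the block below draws a thin rule under a section heading and leaves space after it. The property values are illustrative choices, not settings taken from the example stylesheet.

```xml
<!-- Sketch: a section heading with a rule beneath it and space after it -->
<fo:block font-size="12pt" font-family="serif" color="#333333"
    border-bottom-style="solid" border-bottom-width="0.5pt"
    border-bottom-color="#333333"
    padding-bottom="2pt" space-after="6pt">
  Contact
</fo:block>
```

The space-after property is another way to separate sections without inserting an empty block.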

Conclusion

Generating a résumé or curriculum vitae from an XML file involves a little more work but imposes a disciplined structure that helps ensure that the document is as complete as is necessary.

Creating documents using a text editor is still a valid possibility in the simple situation. Alternatively, using an XML file as a common source of information for different versions of a résumé suits the more intricate data source. The choice becomes one of "Is it more efficient to maintain multiple copies of a document together with markup in an editor or to maintain multiple stylesheets that operate on the same data?" Both approaches reach the same result by different paths.

Resources

Learn

  • Principles of XML design: Use XML namespaces with care (Uche Ogbuji, developerWorks, Apr 2004): Read about some of the difficulties of working with namespaces and minimize problems as you incorporate namespaces into XML design.
  • Tip: Control white space in an XSLT style sheet (Nicholas Chase, developerWorks, Nov 2002): Understand whitespace and space stripping in transformation and create the document you want.
  • Improve your XSLT coding five ways (Benoît Marchal, developerWorks, Jan 2001): Add five techniques useful in transformations: using CSS with XSL stylesheets (including HTML entities), incorporating client-side JavaScript, working with multiple input documents, and using XSLT to generate stylesheets automatically.
  • The Open Applications Group Integration Specification (Michael Rowell, developerWorks, Jun 2003): Learn how OAGIS works as a standard.
  • Apache FOP: Learn more about this print formatter driven by XSL formatting objects (XSL-FO) and an output independent formatter.
  • Apache FOP Compliance Page: Visit this page to explore the formatting possibilities in a FOP document.
  • HR-XML: Check out an HR-XML implementation tool.
  • Open Applications Group: Go to the website for this standards development organization that builds process-based business standards for eCommerce, Cloud Computing, Service Oriented Architecture (SOA), Web Services, and Enterprise Integration.
  • OASIS: Learn more about the Organization for the Advancement of Structured Information Standards.
  • XSL: Delve into this family of recommendations for defining XML document transformation and presentation.
  • More articles by this author (Colin Beckingham, developerWorks, March 2009-current): Read articles about XML, voice recognition, XHTML, PHP, SMIL, and other technologies.
  • XML area on developerWorks: Get the resources you need to advance your skills in the XML arena.
  • My developerWorks: Personalize your developerWorks experience.
  • IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks. Also, read more XML tips.
  • developerWorks technical events and webcasts: Stay current with technology in these sessions.
  • developerWorks on Twitter: Join today to follow developerWorks tweets.
  • developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
  • developerWorks on-demand demos: Watch demos ranging from product installation and setup for beginners to advanced functionality for experienced developers.
