Harness the power of XML to Open Financial Exchange files

Enabling XML parsing for non-XML OFX files

The ongoing task of bookkeeping is made somewhat easier when financial institutions allow customers to download files for import into their chosen accounting package. These files can pose a problem for financial programmers, however, because they are frequently only available in Open Financial Exchange (OFX) format, which is not XML compatible. Discover how to use PHP with string substitution to make OFX files XML compliant. Thus, you harness the power of XML parsing and deconstruction to OFX files and make financial programming more precise.

Share:

Colin Beckingham, Researcher, Freelance

Colin Beckingham is a freelance researcher, writer, and programmer who lives in eastern Ontario, Canada. Holding degrees from Queen's University, Kingston, and the University of Windsor, he has worked in a rich variety of fields, including banking, horticulture, horse racing, teaching, civil service, retail, and travel/tourism. The author of database applications and numerous newspaper, magazine, and online articles, his research interests include open source programming and voice-control applications on Linux. You can reach Colin at colbec@start.ca.



17 March 2009

Also available in Chinese Russian Japanese Portuguese

My bank provides me as a financial programmer and bookkeeper with a very helpful service: I can download a small file that lists the transactions in one of my accounts during a defined time period. The file contains the account name and number; whether it is a chequing, savings, or other account type; various bits of financial institution information; my balance information; the date and time of my request; and a complete listing of transactions in the account, showing whether they are a deposit or withdrawal, the amount, and the date and time of the transaction. The bank has done much of my data entry for me: All I have to do is programatically transfer it into my local record keeping, boosting the accuracy and easing the reconciliation process.

Frequently used acronyms

  • CSV - Comma-separated values
  • GNU: GNU's Not UNIX®
  • HTML: Hypertext Markup Language
  • W3C: World Wide Web Consortium
  • XML: Extensible Markup Language

When I log in to my bank's Web site and go to download this file containing the transactions relating to my account, I am offered a choice between a CSV file or a file for Quicken, Intuit QuickBooks, Microsoft® Money, or Simply Accounting. For various reasons, I have chosen not to use one of the mainstream accounting programs but my own cloud computing application, so I either have to use the plain CSV option or deconstruct one of the other downloads.

Although the CSV option is fine for a fast download into a database or a spreadsheet, deconstructing the other option has some definite advantages. All the other options are in fact the same file but with a different file name extension to suit the package. The file is a plain-text document in OFX format (see Resources for links to more information), a structure designed to be informative and accurate when dealing with banking and other financial transactions. On the whole, it is more professional to use the additional information OFX provides to ensure that the transactions are correctly interpreted—something you do not get with CSV.

The problem is that OFX version 1, although it appears at first glance to be in XML format, in fact only approximates XML. Attempting to read the file directly into an XML parser results in an error. If the file were in true XML format (as is OFX version 2), you can use the power of built-in functions in a programming language like PHP to parse the information quickly and easily. However, my bank—and probably many others, as well—only offers OFX version 1.xx files.

An example

In Listing 1 is an as-downloaded example of a version 1 file.

Listing 1. OFX file as downloaded
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:TYPE1
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE


<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
<MESSAGE>OK
</STATUS>
<DTSERVER>20090211000000[-5:EST]
<USERKEY>--NoUserKey--
<LANGUAGE>ENG
<INTU.BID>00002
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>XXXX - 20090211000000
<STATUS>
<CODE>0
<SEVERITY>INFO
<MESSAGE>OK
</STATUS>
<STMTRS>
<CURDEF>CAD
<BANKACCTFROM>
<BANKID>000000000
<ACCTID>000000
<ACCTTYPE>CHECKING
</BANKACCTFROM>
<BANKTRANLIST>
<DTSTART>20090209
<DTEND>20090209000000[-5:EST]
<STMTTRN>
<TRNTYPE>DEBIT
<DTPOSTED>20090209000000[-5:EST]
<TRNAMT>-98.91
<FITID>00000000000000000000000000
<NAME>GROCER A & Z
</STMTTRN>
<STMTTRN>
<TRNTYPE>CREDIT
<DTPOSTED>20090209000000[-5:EST]
<TRNAMT>308.86
<FITID>00000000000000000000000000
<NAME>DEPOSIT    000000
</STMTTRN>
</BANKTRANLIST>
<LEDGERBAL>
<BALAMT>256.94
<DTASOF>20090209000000[-5:EST]
</LEDGERBAL>
<AVAILBAL>
<BALAMT>256.94
<DTASOF>20090211000000[-5:EST]
</AVAILBAL>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>

Listing 2 shows an improved version with some helpful additions and changes. (I marked some additions and changes in bold highlighting along with some indentation to make it more human-readable.) This file contains only one account with two transactions: a withdrawal (debit) and a deposit (credit).

Listing 2. XML-ized OFX version

Click to see code listing

Listing 2. XML-ized OFX version

<START><OFXHEADER>100</OFXHEADER><DATA>OFXSGML</DATA><VERSION>102</VERSION><SECURITY>TYPE1</SECURITY><ENCODING>USASCII</ENCODING><CHARSET>1252</CHARSET><COMPRESSION>NONE</COMPRESSION><OLDFILEUID>NONE</OLDFILEUID><NEWFILEUID>NONE</NEWFILEUID></START>
 and remove blank line
<OFX>
 <SIGNONMSGSRSV1>
  <SONRS>
   <STATUS>
     <CODE>0</CODE>
     <SEVERITY>INFO</SEVERITY>
     <MESSAGE>OK</MESSAGE>
   </STATUS>
   <DTSERVER>2009021100000000[-5:EST]</DTSERVER>
   <USERKEY>--NoUserKey--</USERKEY>
   <LANGUAGE>ENG</LANGUAGE>
   <INTU.BID>00002</INTU.BID>
  </SONRS>
 </SIGNONMSGSRSV1>
 <BANKMSGSRSV1>
  <STMTTRNRS>
   <TRNUID>XXXX - 20090211000000000</TRNUID>
   <STATUS>
    <CODE>0</CODE>
    <SEVERITY>INFO</SEVERITY>
    <MESSAGE>OK</MESSAGE>
   </STATUS>
   <STMTRS>
    <CURDEF>CAD</CURDEF>
    <BANKACCTFROM>
      <BANKID>000000000</BANKID>
      <ACCTID>0000000</ACCTID>
      <ACCTTYPE>CHECKING</ACCTTYPE>
    </BANKACCTFROM>
    <BANKTRANLIST>
      <DTSTART>20090209</DTSTART>
      <DTEND>20090209000000[-5:EST]</DTEND>
      <STMTTRN>
        <TRNTYPE>DEBIT</TRNTYPE>
        <DTPOSTED>20090209000000[-5:EST]</DTPOSTED>
        <TRNAMT>-98.91</TRNAMT>
        <FITID>00000000000000000000000000</FITID>
        <NAME>GROCER A &amp; Z</NAME>
      </STMTTRN>
      <STMTTRN>
        <TRNTYPE>CREDIT</TRNTYPE>
        <DTPOSTED>20090209000000[-5:EST]</DTPOSTED>
        <TRNAMT>308.86</TRNAMT>
        <FITID>00000000000000000000000000</FITID>
        <NAME>DEPOSIT    00000000</NAME>
      </STMTTRN>
    </BANKTRANLIST>
    <LEDGERBAL>
      <BALAMT>256.94</BALAMT>
      <DTASOF>20090209020000[-5:EST]</DTASOF>
    </LEDGERBAL>
    <AVAILBAL>
      <BALAMT>256.94</BALAMT>
      <DTASOF>20090211000000[-5:EST]</DTASOF>
    </AVAILBAL>
   </STMTRS>
  </STMTTRNRS>
 </BANKMSGSRSV1>
</OFX>

Suppose you need the current balance. You can simply search the code in Listing 1 for the string <BALAMT> and report that associated number. But, in fact, two balances are reported, and they might well be different. The first is the ledger balance; the second is the available balance, which is the ledger balance with any holds taken into account. In practical terms, this means that you might have less money to draw on than the ledger reports. Finding the right balance amount through a simple text search process is likely to be complicated.

This is where XML has advantages, as it removes the ambiguity and simplifies the search process. Using the code in Listing 2 code, suppose you remove the <START>...</START> section and save the rest in a local file called sample.xml. This leaves the root element as <OFX>. Looking for the current ledger balance, you can rigorously specify which balance you want. Listing 3 shows a few lines of PHP code that simply ask for the balance.

Listing 3. Simple code to get the ledger balance
<?php
// test ofx
$xmlstr = file_get_contents('sample.xml');
$xml = new SimpleXMLElement($xmlstr);
echo $xml->BANKMSGSRSV1->STMTTRNRS->STMTRS->LEDGERBAL->BALAMT."\n";
?>

In this code, you get the text contents of the file and load it into an XML object with a call to the SimpleXMLElement() function. For those unfamiliar with the syntax involving the -> operator, it allows you to point to a specific branch of the whole tree. In PHP, pointing the variable $xml at the entire character string effectively points you at the root of the XML tree. Ignoring the root element (in this case, <OFX>), you then follow a path up the tree from branch to smaller branch until you can go no farther. The data you need lies at the end of the smallest twig.

In this case, because it specifically points to the element structure ...->LEDGERBAL->BALAMT, running this script produces the following answer:

256.94

The programming is concise and unambiguous. With time saved, you can move onto other, more important things like the meaning of life and the Grand Unified Theory.

Conversion criteria and goals

Using PHP

Please note that in my use of PHP in this article, I use PHP at the command line, and output goes to a terminal window. If you use your PHP in a browser context, you might prefer to use HTML breaks (<br />) where I use newlines (\n), if this is appropriate.

So, if you are to enjoy the rigor and ease of processing with XML, your raw downloaded file has to be converted. PHP (other programming languages provide equivalent processing capability) has a set of functions that can help with this. You first have to examine the file (using the code in Listing 1 and Listing 2) for patterns so that the processing is efficient and accurate.

Here are some of the adjustments required in this case:

  • Header section. The first nine lines and the following blank line, which are important in an OFX context, rarely change. It is helpful to check that you are dealing with a version 102 file, but apart from that, no useful information is provided. It is possible, as indicated, to put these items into their own element, in which case a new root element will be required to encompass both the <START> and <OFX> elements.
  • Missing terminators. The major issue is that some of the innermost elements have a start tag but no ending tag, which is required for XML.
  • Dates. As provided by OFX, the dates do not conform to the GNU guidelines for dates (see Resources for links to more information) and are therefore not immediately readable by functions such as strtotime(). In particular, the date is one continuous string with no separating space between the date and the time parts. You can deal with this during the initial processing or later, when reporting data, because the OFX date format doesn't affect the validity of the XML.
  • Special characters. The OFX output can—and in this case does—contain characters such as the ampersand (&) that cause an XML reader to report an error. Before attempting to validate your file, it is important to reduce such characters to a format compatible with XML (for example, & = &amp;).

The script

Listing 4 provides a suggested script to accomplish the transformation. It takes as its input the file sample.ofx, which will be the file exactly as downloaded from the bank.

Listing 4. Script to XML-ize the OFX file
<?php
  // 1. Read in the file
  $cont = file_get_contents('sample.ofx');
  // 2. Separate out and remove the header
  $bline = strpos($cont,"<OFX>");
  $head = substr($cont,0,$bline-2);
  $ofx = substr($cont,$bline-1);
  // 3. Examine tags that might be improperly terminated
  $ofxx = $ofx;
  $tot=0;
  while ($pos = strpos($ofxx,'<')) {
    $tot++;
    $pos2 = strpos($ofxx,'>');
    $ele = substr($ofxx,$pos+1,$pos2-$pos-1);
    if (substr($ele,0,1) =='/') $sla[] = substr($ele,1);
    else $als[] = $ele;
    $ofxx = substr($ofxx,$pos2+1);
  }
  $adif = array_diff($als,$sla);
  $adif = array_unique($adif);
  $ofxy = $ofx;
  // 4. Terminate those that need terminating
  foreach ($adif as $dif) {
    $dpos = 0;
    while ($dpos = strpos($ofxy,$dif,$dpos+1)) {
      $npos = strpos($ofxy,'<',$dpos+1);
      $ofxy = substr_replace($ofxy,"</$dif>\n<",$npos,1);
      $dpos = $npos+strlen($ele)+3;
    }
  }
  // 5. Deal with special characters
  $ofxy = str_replace('&','&amp;',$ofxy);
  // 6. write the resulting string to the screen
  echo $ofxy;
?>

This script reads the file contents in step 1, and, in step 2 scans the text for the root element and chops off the beginning so that the first thing seen is the start of the root element. In step 3, it loops through the rest of the text looking for the < and > characters that introduce or close an element. It stores opening tags in the $als array and closing tags in the $sla array. The array_diff() function compares the two arrays and notes which elements are not terminated, putting the contents in the array $adif. In step 4, it iterates through the array of problem tags, inserting the missing terminators. In step 5, the special ampersand is replaced by &amp;, if necessary, and finally, the new string is written to the screen. You can, of course, write it directly to a new file with the file_put_contents() function.

This script is only one way of accomplishing the task. Other languages and algorithms might be better, but this one has worked for me for some time. The only special character that has given me a problem is the ampersand, so I have restricted my translation to that one character rather than use a shotgun htmlentities() function approach.

The result is a file that should be a conforming XML file. If you decide to apply it to a download from your bank or credit card company, the result should be viewable in your browser. Even if the browser complains it cannot find a stylesheet, it should be able to display the tree. And it is this tree that you can now manipulate with XML functions.

Profiting from the new structure

Now, suppose you save the output from Listing 4 as a new XML file called proc.xml. Listing 5 shows how you can process this file, with an example of reviewing the transactions one at a time.

Listing 5. Sample code to extract information from the OFX file with XML
<?php
  // test ofx
  $xmlstr = file_get_contents('proc.xml');
  $xml = new SimpleXMLElement($xmlstr);
  // Let's get the balance first
  $bal = $xml->BANKMSGSRSV1->STMTTRNRS->STMTRS->LEDGERBAL->BALAMT;
  $dat = $xml->BANKMSGSRSV1->STMTTRNRS->STMTRS->LEDGERBAL->DTASOF;
  $data = strtotime(substr($dat,0,8));
  $datb = date('Y-m-d',$data);
  echo "My balance is $bal as at $datb\n";
  // Now point at the array of transactions and show the detail for each
  $trans = $xml->BANKMSGSRSV1->STMTTRNRS->STMTRS->BANKTRANLIST->STMTTRN;
  foreach ($trans as $tran) {
    $trandate = trim($tran->DTPOSTED);
    $tdate = date("Y-m-d",strtotime(substr($trandate,0,8)));
    $tranamt = $tran->TRNAMT;
    $trancrdr = $tran->TRNTYPE;
    echo "$tdate $tranamt $trancrdr\n";
  }
?>

Applied to the XML-ized OFX file, the code in Listing 5 now has no problem recognizing the text as organized in XML format, and you can apply the powerful PHP libraries of functions, including the -> operator that gets the needed data. One of the most convenient programming shortcuts is to take a deeply embedded branch such as the array of STMTTRN elements which capture the transactions, and store the reference in a variable as in $trans = $xml->BANKMSGSRSV1->... above. Further references to the array, as in the foreach() iteration, are then very straightforward and reduce coding typo errors.

This script produces output similar to that in Listing 6. You might need to remove white space from beginning and end or "trim" the variables to get them to display cleanly.

Listing 6. Output from code in Listing 5
  My balance is 256.94 as at 2009-02-09
  2009-02-09 -98.91 DEBIT
  2009-02-09 308.86 CREDIT

Here, you use only the date part of the complete date. The time is also available, if necessary, with extra coding. Although you simply report the data back to the screen in this example, you can use the information in your application to update a database, recalculate balances, and so on—all of which the script can do conditionally, depending on user feedback.

Conclusion

The same principles can be applied to any text file, which—with a little work—can be restated in XML format.

Although the emphasis in this article has been on importing data from an OFX file, remember that because the OFX standard is widely recognized, you can use it to export information from your application to others that recognize the format. The only issue is that you will have to decide to either export it directly into the standard OFX version 1 format or export it to an XML-conforming standard, then morph it into version 1 format.

No doubt in the future, financial institutions will work mostly in concert with the major software publishers and provide downloadable OFX files that are compatible with commercial mass-market products. When the move is finally made to downloadable files in OFX version 2, you will be able to immediately enjoy direct XML parsing. Until that time, financial programmers have to keep an eye open for changes in banks' download offerings and adjust their programming to suit.

Resources

Learn

Get products and technologies

  • IBM trial software for product evaluation: Build your next project with trial software available for download directly from developerWorks, including application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into XML on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML, Open source
ArticleID=375906
ArticleTitle=Harness the power of XML to Open Financial Exchange files
publish-date=03172009