Is there a 'best practices' for XML document storage? We have a debate going on at my company regarding the best way to store XML documents so users can access them. The developers want to store in a BLOB in a SQL database, the dba's want it stored as a data structure, and another group
suggests putting in a file system and index it.
We're looking for some outside input or research sites where we can see what other companies have done.
Pinned topic XML Document Storage - Best Way?
Answered question This question has been answered.
Unanswered question This question has not been answered yet.
Updated on 2003-03-07T16:05:30Z at 2003-03-07T16:05:30Z by SystemAdmin
SystemAdmin 110000D4XK76 Posts
Re: XML Document Storage - Best Way?2003-03-07T16:05:30ZThis is the accepted answer. This is the accepted answer.I touch on this a little bit in some of my earlier columns, but basically the three options you've outlined are the three possibilities. It depends on how you need to use the XML documents once they enter your system. If your documents are full of data that you want to be able to process as a secondary step (say they are invoices, and you want to return a count of how many widgets were ordered in July), then it makes sense to break that information out into a data store where it can be accessed directly by SQL queries or other data-oriented mechanisms. If all of the data needs to be accessible this way, and you have a strongly-defined structure for the document (i.e. no ad-hoc text markup blocks), then you can decompose the entire thing - you'll take a performance hit putting documents in and getting them out, but you'll be able to process all of the data using traditional RDBMS-style querying. If the amount of information coming in that needs to be processed is more limited, you can go either the BLOB route or the file system route, either of which would be augmented by additional indexing information in the database. Which is better is really a matter of preference - keeping the documents as files and pointing to them from the RDBMS reduces portability and increases the complexity of your backup and archival strategy, while storing them as BLOBs increases the burden on the database (forcing the database to retrieve more pages per query, as the BLOBs will eat up data pages with no real relational benefit).
You might want to take a look at my book Professional XML Databases (http://www.amazon.com/exec/obidos/tg/detail/-/1861003587) - I actually spend about twenty pages delving into this very topic.