I'm not a DB developer, I know basic select statements but that's about it. We have a high volume xml web service that takes roughly 200K transactions per day. Each transaction request has a corresponding response so that's about 400k transactional xmls per day to save.
Each request xml is anywhere from 20k to 80k bytes and each response is anywhere from 20k to 400k bytes. At an average of roughly 100k bytes per xml packet, that's about 40 GB per day.
Our Business team wants to temporarily store all requests and responses to a DB with a purge of old records at about 1 month. They want to be able to retrieve a specific xml packet from the database based on a specific selection of data fields within the xml packet.
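For the one-month purge, a scheduled job that deletes by a timestamp column is the usual approach. A minimal sketch, assuming a table named TransactionLog with a CreatedAt datetime column (both names are my placeholders, not a real schema):

```sql
-- Purge rows older than one month, in batches, so the delete doesn't
-- hold long locks or bloat the transaction log on a busy table.
-- TransactionLog and CreatedAt are assumed names.
WHILE 1 = 1
BEGIN
    DELETE TOP (5000)
    FROM TransactionLog
    WHERE CreatedAt < DATEADD(MONTH, -1, GETDATE());

    IF @@ROWCOUNT = 0 BREAK;  -- stop once nothing is left to purge
END
```

This could run from a SQL Server Agent job off-peak so it doesn't compete with the web service's inserts.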
I don't want this saving of data to harm the performance of my web service so I think I can get around this by spawning threads to save the data to a DB.
I really don't care about the performance of the business tool that will query the data since it won't affect any business critical applications (unless somehow it affects the saving of the data if they are querying the database while the web service is writing to it).
Questions:
1) Is the volume and size of the data too much for a database? Should we consider just dumping the xmls to a folder on a server instead?
2) If a database is OK to use then: I see that SQL Server has an XML data type and some functions that can query an xml record using xpath. For querying the DB should I use the built in XML type and functions like:
CREATE TABLE TransactionLog
(
    Clob XML
);
Then use the built-in XML methods, something like:
SELECT Clob
FROM TransactionLog
WHERE Clob.exist('/root/subnode[vendorId="1234"]') = 1;
(Syntax may not be exact; exist() returns 1 when the XPath predicate matches.)
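If the XML-type route is taken, SQL Server 2008 can also index the XML column itself, which helps XPath predicates considerably. A sketch with assumed names (an XML index requires the table to have a clustered primary key):

```sql
CREATE TABLE TransactionLog
(
    Id   INT IDENTITY(1,1) PRIMARY KEY,  -- clustered PK required for XML indexes
    Clob XML
);

-- Primary XML index: shreds the XML into an internal node table
CREATE PRIMARY XML INDEX IX_TransactionLog_Clob
    ON TransactionLog (Clob);

-- Optional secondary index tuned for value predicates like vendorId = "1234"
CREATE XML INDEX IX_TransactionLog_Clob_Value
    ON TransactionLog (Clob)
    USING XML INDEX IX_TransactionLog_Clob
    FOR VALUE;
```

Note the trade-off: XML indexes can take several times the storage of the base XML, which matters at roughly 40 GB of new data per day.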
or would it be better to extract the specific data items from the xml as it's being saved and save the data items to separate columns like:
CREATE TABLE TransactionLog
(
    name        varchar(100),
    [timestamp] datetime,
    vendorID    varchar(20),
    ... (more searchable data items)
    Clob        XML
);
Then use a query like
SELECT Clob
FROM TransactionLog
WHERE vendorID = '1234';
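The second option's query only stays fast at this volume if the promoted columns are indexed. A sketch, again with assumed table and column names:

```sql
-- Nonclustered index to support lookups such as WHERE vendorID = '1234',
-- with the timestamp as a second key column for date-range filtering.
-- TransactionLog, vendorID, and [timestamp] are assumed names.
CREATE NONCLUSTERED INDEX IX_TransactionLog_VendorID
    ON TransactionLog (vendorID, [timestamp]);
```

With this index, SQL Server seeks directly to the matching rows instead of scanning every stored packet; the XML payload is only read for the rows that qualify.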
As I said before, I don't know much about databases, but the first option above seems like it would be extremely slow to execute an XPath query against 400K x 30 days' worth of xml transactions.
Thanks to anyone willing to read through my diatribe, and even more thanks to anyone willing to provide advice!
Software/Hardware used:
SQL Server 2008. Hardware unknown, I don't have access but I'm confident it's high performance.