php - Update MySQL database from XML file


I have 2 tables in a MySQL database:

Persons:

id    name     address .....

and Items:

id   person_id  param1  param2 ....

Items are referenced to Persons by person_id. There are about 5,000 persons and 40,000 items, and growing. This data needs to be updated every few days from a big XML file that looks like this:

.....
<person id='100016' lastname='....' firstname='.....' ....  >
<item param1='...' startdate='2006-01-07' enddate='2006-12-09' ... />
<item param1='...' startdate='2007-01-04' enddate='2007-12-08' ... />
<item param1='...' startdate='2008-01-04' enddate='2009-01-03' ... />
<item param1='...' startdate='2009-01-06' enddate='2009-12-31' ... />
</person>
......

Some attributes of persons and items can change, and new persons and new items can be added.

What is the best way to keep it easy to update? I do not think that truncating the tables and loading them again would be the most effective way. Should I consider some staging? The size of the XML file is starting to matter (60 MB and growing), so it is not effective to parse it into SimpleXML and compare it against SQL.

What would be best practice in a situation like this?

I have MySQL and PHP with Zend Framework as my environment.


Answer

Solution:

You could use diffxml on the old and new XML file and then parse the changes in the XML to reflect them into MySQL.

But I think truncating and loading the full file would also work in a reasonable time. And sometimes such a simple solution is the best - even if it is not super fast.
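For the truncate-and-reload variant, the SQL side is just a couple of statements (table names taken from the question). Note that if person_id is a declared FOREIGN KEY, MySQL refuses to TRUNCATE the parent table while the constraint exists, so foreign-key checks have to be disabled for the reload (this assumes no concurrent writers during the import):

```sql
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE TABLE items;
TRUNCATE TABLE persons;
SET FOREIGN_KEY_CHECKS = 1;
-- ...then re-insert every person and item parsed from the XML.
```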


Answer

Solution:

I think the most effective solution would be to simply reprocess the entire file and perform an INSERT ... ON DUPLICATE KEY UPDATE for each record. You would need to find a value that identifies the person and will never change, and create a UNIQUE index on it (or use the primary key, which looks like the most effective approach in your case, as long as the id field is the one that never changes), and put any fields that may change over time into the update section.

INSERT INTO persons SET id='1234', lastname='a', firstname = 'b' ON DUPLICATE KEY UPDATE lastname='a', firstname='b';

As long as the fields are not variable in length (no varchar or text columns in the table), this should allow you to keep records updated quickly without needing to truncate the tables. Truncating would just increase the disk writes for each record.

This approach would also allow you to create statistics - for example you might add a column that records the first time you see a new person, so you know how many people got added in each update process.
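A minimal sketch of this approach, assuming PHP's XMLReader extension and a persons table keyed on id (the attribute names below come from the question's sample XML; the file name and upsert loop are hypothetical, so adjust them to the real schema). XMLReader streams the document, so the 60 MB file is never held in memory at once:

```php
<?php
// Stream the XML and collect one row per <person>; for a truly
// constant-memory import you would upsert inside the loop instead.
function readPersons(string $xmlPath): array
{
    $reader = new XMLReader();
    $reader->open($xmlPath);

    $persons = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'person') {
            $persons[] = [
                'id'        => $reader->getAttribute('id'),
                'lastname'  => $reader->getAttribute('lastname'),
                'firstname' => $reader->getAttribute('firstname'),
            ];
        }
    }
    $reader->close();

    return $persons;
}

// Hypothetical upsert loop (requires a live PDO connection):
// $stmt = $pdo->prepare(
//     "INSERT INTO persons (id, lastname, firstname)
//      VALUES (:id, :lastname, :firstname)
//      ON DUPLICATE KEY UPDATE
//          lastname = VALUES(lastname), firstname = VALUES(firstname)"
// );
// foreach (readPersons('people.xml') as $person) {
//     $stmt->execute($person);
// }
```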


Answer

Solution:

Question 1 - What is the best way to keep it easy to update?

Is there a unique reference you can use within the XML? If so, import it and use it to determine whether the XML data has already been imported. Failing that, you will have to generate the reference yourself.

Having this reference within the XML would be ideal, as you could load the XML document using DOMDocument and then use XPath to jump straight to the last imported reference and import from there. This only works if you know the data on previous rows isn't going to change.

If it does change, then you would need to iterate over it or use the diff tool suggested by Alex.
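For illustration, a hedged sketch of the DOMDocument/XPath approach. The sample document below stands in for the real file, and the saved last-imported id is a hypothetical value; with the real file you would use $dom->load('people.xml') instead of loadXML():

```php
<?php
// Sample document standing in for the real XML file:
$xml = <<<XML
<people>
  <person id='100015' lastname='A' firstname='B'>
    <item param1='x'/>
  </person>
  <person id='100016' lastname='C' firstname='D'>
    <item param1='y'/>
    <item param1='z'/>
  </person>
  <person id='100017' lastname='E' firstname='F'>
    <item param1='w'/>
  </person>
</people>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$lastImportedId = '100016';   // hypothetical: the id saved from the previous run

// Items of the person where the previous import stopped:
$items = $xpath->query("//person[@id='$lastImportedId']/item");

// Persons after that reference in document order, i.e. still to import:
$toImport = $xpath->query("//person[@id='$lastImportedId']/following-sibling::person");
```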
