
Best way to update DB table

  • 19-03-2013 3:10pm
    #1
    Registered Users Posts: 3,532 ✭✭✭


    I'm looking to get your opinion on this.

    I have a script which runs twice a day. It gets some information and updates the database with the new information. It runs twice daily because the DB needs to be kept up to date with new information and because some of the information in the DB expires and no longer needs to be there.

    Would you (should I):

    1) Delete the table and rebuild it.

    2) Delete only the expired rows (i.e. information which was not found by the script) and insert the new information only.


    Assume around 200 rows of data max.


Comments

  • Registered Users Posts: 27,033 ✭✭✭✭GreeBo


    Does anything else write to the DB or just this operation?

    Is it 200 rows being updated or 200 rows max in the DB ever?


  • Closed Accounts Posts: 8,016 ✭✭✭CreepingDeath


    Unregistered. wrote: »
    ...some of the information in the DB expires and no longer needs to be there.

    Would you (should I):

    1) Delete the table and rebuild it.

    2) Delete only the expired rows (i.e. information which was not found by the script) and insert the new information only.

    If only some of the data expires, that means the rest is valid.
    So I don't see why you'd delete the table, that seems like overkill/bad practice.

    Definitely option 2.


  • Registered Users Posts: 3,532 ✭✭✭Unregistered.



    CreepingDeath wrote: »
    If only some of the data expires, that means the rest is valid.
    So I don't see why you'd delete the table, that seems like overkill/bad practice.

    Definitely option 2.

    Wouldn't the DB need to do a lookup each time before it inserts a new entry - to see if it is already there?
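
    For what it's worth, a unique key lets the DB handle that lookup itself. A minimal sketch, assuming MySQL syntax and a hypothetical ITEMS table with a unique index on ITEM_KEY (all names here are placeholders):
    -- The unique index performs the "is it already there?" check; on a
    -- match the existing row is updated instead of a duplicate inserted.
    INSERT INTO ITEMS (ITEM_KEY, PAYLOAD, LAST_SEEN)
    VALUES ('example-key', 'example data', CURRENT_DATE)
    ON DUPLICATE KEY UPDATE
        PAYLOAD   = VALUES(PAYLOAD),
        LAST_SEEN = VALUES(LAST_SEEN);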


  • Registered Users Posts: 3,532 ✭✭✭Unregistered.


    GreeBo wrote: »
    Does anything else write to the DB or just this operation?

    Is it 200 rows being updated or 200 rows max in the DB ever?

    The DB is just used to store information collected by the script. Some of the information collected could already be in the database, and should not be in there twice.


  • Registered Users Posts: 27,033 ✭✭✭✭GreeBo


    Unregistered. wrote: »
    The DB is just used to store information collected by the script. Some of the information collected could already be in the database, and should not be in there twice.

    So typically the script will be doing *something* to ~200 rows?
    Just run updates so, no need to move the entire table.

    What sort of indexes are in the table and how many columns does it have?

    Also, will you need to lock the table, or are dirty reads OK?
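
    A minimal sketch of that update-in-place idea, assuming a hypothetical ITEMS table where the script stamps a LAST_SEEN column on every row it finds (placeholder names):
    -- Refresh a row the script found again this run
    UPDATE ITEMS
    SET PAYLOAD = 'fresh data', LAST_SEEN = CURRENT_DATE
    WHERE ITEM_KEY = 'example-key';

    -- After the run, expire the rows the script did not touch today
    DELETE FROM ITEMS WHERE LAST_SEEN < CURRENT_DATE;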


  • Closed Accounts Posts: 8,016 ✭✭✭CreepingDeath


    Unregistered. wrote: »
    Wouldn't the DB need to do a lookup each time before it inserts a new entry - to see if it is already there?

    There's a fundamental data model design problem here.
    It might be easier to have a master table and a transaction table.

    The master table contains all the data.
    The transaction table contains the new data from the script.
    It's then a question of updating contents of the master table with the transaction table.

    Then you can perform the whole operation in a transaction, and rollback if there's a problem.

    Depending on your database, vague SQL might be...
    BEGIN TRANSACTION;
    
    -- Delete yesterday's entries from master
    DELETE FROM MASTER WHERE YOURTIMESTAMP < CURRENT_DATE - 1;
    
    -- Merge the current transaction into the master table; correlate the
    -- NOT EXISTS on a key column (ID is a placeholder) so that only rows
    -- not already in master are copied
    INSERT INTO MASTER
    SELECT * FROM TRANSACTION T
    WHERE NOT EXISTS ( SELECT 1 FROM MASTER M WHERE M.ID = T.ID );
    
    COMMIT;
    


  • Registered Users Posts: 3,532 ✭✭✭Unregistered.


    A little bit more background as to why I'm asking: I'm fixing a bug in an application which is currently showing data from entries which are out of date (and should not be present in the database).
    The reason for this is that the developer who wrote the code used REPLACE when updating the table. As a result, if the script found 50 entries and there were 100 in the DB, then 50 out-of-date entries would be left behind.

    I *think* they assumed this would replace the *entire* contents of the table with the new values.
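
    For illustration - MySQL's REPLACE only deletes and re-inserts rows whose unique key matches an incoming row, so anything the script doesn't report that run is left untouched (placeholder names):
    -- REPLACE is delete-then-insert per matching unique key, nothing more
    REPLACE INTO ITEMS (ITEM_KEY, PAYLOAD)
    VALUES ('found-today', 'fresh data');
    -- A row with ITEM_KEY = 'expired-yesterday' is never matched,
    -- which is exactly how the stale entries survive each run.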
    GreeBo wrote: »
    So typically the script will be doing *something* to ~200 rows?
    Just run updates so, no need to move the entire table.

    What sort of indexes are in the table and how many columns does it have?

    Also, will you need to lock the table, or are dirty reads OK?
    The table is only used by the application to read information. The only thing which alters the tables in any way is the update script. And all this does is insert/replace.

    The number of rows isn't constant, I was using 200 as an example.
    CreepingDeath wrote: »
    There's a fundamental data model design problem here.
    It might be easier to have a master table and a transaction table.

    The master table contains all the data.
    The transaction table contains the new data from the script.
    It's then a question of updating the contents of the master table with the transaction table.

    Then you can perform the whole operation in a transaction, and rollback if there's a problem.

    I think merging the two tables is quite sensible. Currently duplicates aren't checked for either - that's the only reason I suggested deleting the entire table and rebuilding. I hadn't thought of using a NOT EXISTS clause.
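
    So the full twice-daily refresh might look something like this (ID is a placeholder for whatever the real key column is, and TRANSACTION is the staging table the script fills first):
    BEGIN TRANSACTION;

    -- 1) Expire: drop master rows the script no longer reports
    DELETE FROM MASTER
    WHERE NOT EXISTS ( SELECT 1 FROM TRANSACTION T WHERE T.ID = MASTER.ID );

    -- 2) Merge: insert only rows not already in master, so nothing appears twice
    INSERT INTO MASTER
    SELECT * FROM TRANSACTION T
    WHERE NOT EXISTS ( SELECT 1 FROM MASTER M WHERE M.ID = T.ID );

    COMMIT;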

