
Best way to update DB table

  • 19-03-2013 3:10pm
    #1
    Registered Users Posts: 3,532 ✭✭✭


    I'm looking to get your opinion on this.

    I have a script which runs twice a day. It gets some information and updates the database with the new information. It runs twice daily because the DB needs to be kept up to date with new information and because some of the information in the DB expires and no longer needs to be there.

    Would you (should I):

    1) Delete the table and rebuild it.

    2) Delete only the expired rows (i.e. information which was not found by the script) and insert the new information only.


    Assume around 200 rows of data max.


Comments

  • Registered Users Posts: 27,033 ✭✭✭✭GreeBo


    Does anything else write to the DB or just this operation?

    Is it 200 rows being updated or 200 rows max in the DB ever?


  • Closed Accounts Posts: 8,016 ✭✭✭CreepingDeath


    Unregistered. wrote: »
    ...some of the information in the DB expires and no longer needs to be there.

    Would you (should I):

    1) Delete the table and rebuild it.

    2) Delete only the expired rows (i.e. information which was not found by the script) and insert the new information only.

    If only some of the data expires, that means the rest is valid.
    So I don't see why you'd delete the table, that seems like overkill/bad practice.

    Definitely option 2.


  • Registered Users Posts: 3,532 ✭✭✭Unregistered.



    CreepingDeath wrote: »
    If only some of the data expires, that means the rest is valid.
    So I don't see why you'd delete the table, that seems like overkill/bad practice.

    Definitely option 2.

    Wouldn't the DB need to do a lookup each time before it inserts a new entry - to see if it is already there?
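
    For what it's worth, a unique key lets the DB handle that lookup itself. A minimal sketch, assuming MySQL syntax and a hypothetical ITEMS table with a unique index on ITEM_KEY (all names here are placeholders):
    -- The unique index performs the "is it already there?" check; on a
    -- match the existing row is updated instead of a duplicate inserted.
    INSERT INTO ITEMS (ITEM_KEY, PAYLOAD, LAST_SEEN)
    VALUES ('example-key', 'example data', CURRENT_DATE)
    ON DUPLICATE KEY UPDATE
        PAYLOAD   = VALUES(PAYLOAD),
        LAST_SEEN = VALUES(LAST_SEEN);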


  • Registered Users Posts: 3,532 ✭✭✭Unregistered.


    GreeBo wrote: »
    Does anything else write to the DB or just this operation?

    Is it 200 rows being updated or 200 rows max in the DB ever?

    The DB is just used to store information collected by the script. Some of the information collected could already be in the database, and should not be in there twice.


  • Registered Users Posts: 27,033 ✭✭✭✭GreeBo


    Unregistered. wrote: »
    The DB is just used to store information collected by the script. Some of the information collected could already be in the database, and should not be in there twice.

    So typically the script will be doing *something* to ~200 rows?
    Just run updates so, no need to move the entire table.

    What sort of indexes are in the table and how many columns does it have?

    Also, will you need to lock the table, or are dirty reads OK?
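
    A minimal sketch of that update-in-place idea, assuming a hypothetical ITEMS table where the script stamps a LAST_SEEN column on every row it finds (placeholder names):
    -- Refresh a row the script found again this run
    UPDATE ITEMS
    SET PAYLOAD = 'fresh data', LAST_SEEN = CURRENT_DATE
    WHERE ITEM_KEY = 'example-key';

    -- After the run, expire the rows the script did not touch today
    DELETE FROM ITEMS WHERE LAST_SEEN < CURRENT_DATE;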


  • Closed Accounts Posts: 8,016 ✭✭✭CreepingDeath


    Unregistered. wrote: »
    Wouldn't the DB need to do a lookup each time before it inserts a new entry - to see if it is already there?

    There's a fundamental data model design problem here.
    It might be easier to have a master table and a transaction table.

    The master table contains all the data.
    The transaction table contains the new data from the script.
    It's then a question of updating contents of the master table with the transaction table.

    Then you can perform the whole operation in a transaction, and rollback if there's a problem.

    Depending on your database, vague SQL might be...
    BEGIN TRANSACTION;
    
    -- Delete yesterday's entries from master
    DELETE FROM MASTER WHERE YOURTIMESTAMP < CURRENT_DATE - 1;
    
    -- Merge the current transaction into the master table; correlate the
    -- NOT EXISTS on a key column (ID is a placeholder) so that only rows
    -- not already in master are copied
    INSERT INTO MASTER
    SELECT * FROM TRANSACTION T
    WHERE NOT EXISTS ( SELECT 1 FROM MASTER M WHERE M.ID = T.ID );
    
    COMMIT;
    


  • Registered Users Posts: 3,532 ✭✭✭Unregistered.


    A little bit more background as to why I'm asking: I'm fixing a bug in an application which is currently showing data from entries which are out of date (and should not be present in the database).
    The reason for this is that the developer who wrote the code used REPLACE when updating the table. As a result, if the script found 50 entries and there were 100 in the DB, then 50 out-of-date entries would be left behind.

    I *think* they assumed this would replace the *entire* contents of the table with the new values.
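
    For illustration - MySQL's REPLACE only deletes and re-inserts rows whose unique key matches an incoming row, so anything the script doesn't report that run is left untouched (placeholder names):
    -- REPLACE is delete-then-insert per matching unique key, nothing more
    REPLACE INTO ITEMS (ITEM_KEY, PAYLOAD)
    VALUES ('found-today', 'fresh data');
    -- A row with ITEM_KEY = 'expired-yesterday' is never matched,
    -- which is exactly how the stale entries survive each run.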
    GreeBo wrote: »
    So typically the script will be doing *something* to ~200 rows?
    Just run updates so, no need to move the entire table.

    What sort of indexes are in the table and how many columns does it have?

    Also, will you need to lock the table, or are dirty reads OK?
    The table is only used by the application to read information. The only thing which alters the tables in any way is the update script. And all this does is insert/replace.

    The number of rows isn't constant, I was using 200 as an example.
    CreepingDeath wrote: »
    There's a fundamental data model design problem here.
    It might be easier to have a master table and a transaction table.

    The master table contains all the data.
    The transaction table contains the new data from the script.
    It's then a question of updating the contents of the master table with the transaction table.

    Then you can perform the whole operation in a transaction, and rollback if there's a problem.

    I think merging the two tables is quite sensible. Currently duplicates aren't checked for either - that's the only reason I suggested deleting the entire table and rebuilding. I hadn't thought of using a NOT EXISTS clause.
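
    So the full twice-daily refresh might look something like this (ID is a placeholder for whatever the real key column is, and TRANSACTION is the staging table the script fills first):
    BEGIN TRANSACTION;

    -- 1) Expire: drop master rows the script no longer reports
    DELETE FROM MASTER
    WHERE NOT EXISTS ( SELECT 1 FROM TRANSACTION T WHERE T.ID = MASTER.ID );

    -- 2) Merge: insert only rows not already in master, so nothing appears twice
    INSERT INTO MASTER
    SELECT * FROM TRANSACTION T
    WHERE NOT EXISTS ( SELECT 1 FROM MASTER M WHERE M.ID = T.ID );

    COMMIT;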

