Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

SQL Primary Key vs Index

Options
  • 21-02-2013 11:46am
    #1
    Registered Users Posts: 1,657 ✭✭✭


    It's generally held that it's more efficient use a key or index for any fields that your table is going to be selected on, wherever possible. But is it as efficient to use the first part of a primary key as an index, or should a separate index be set up?

    Let's say I have a table called stock_quantities.
    Columns:
    product_id (primary key - part)
    shop_id (primary key - part)
    stock_level

    And I want to select all records for a given product_id. should I set up a separate index just for product_id or is it as efficient because product_id is the first part of the primary key (and therefore still sorted in wherever keys / indices are stored)?


Comments

  • Registered Users Posts: 7,468 ✭✭✭Evil Phil


    I don't split primary keys across columns. For your example I would use something like:

    stock_quantity
    Columns:
    stock_quantity_id (primary key)
    product_id
    shop_id
    stock_level


  • Moderators, Technology & Internet Moderators Posts: 1,334 Mod ✭✭✭✭croo


    But you might still create an index for the product_id & store_id!

    Like most things there is a balance to be found. More indices make searching, via that index, quicker. But more indices make creating and deleting records slower as now the indices must also be changed!


  • Registered Users Posts: 249 ✭✭gargargar


    It depends on database server. I would leave it as you have it and then you can tune it later. I would not create a new spurious pk.


  • Registered Users Posts: 1,037 ✭✭✭kuro_man


    primary key <> index, though the RDBMS may create an index by default for the primary key.
    having a primary key based on 2 columns is fine - I would prefer a natural key over an unnatural one any day

    to answer your question, if you have an index based on 2 columns, and you are searching on the left-most column, then the optimiser should use the index. You should check the query plan to make sure.


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    kuro_man wrote: »
    primary key <> index, though the RDBMS may create an index by default for the primary key.
    having a primary key based on 2 columns is fine - I would prefer a natural key over an unnatural one any day

    to answer your question, if you have an index based on 2 columns, and you are searching on the left-most column, then the optimiser should use the index. You should check the query plan to make sure.

    I'm the complete opposite, every single db table I've created the first column is an auto increment int called id as the primary key. If there are natural keys thats lucky, but I think its best to play defensively in this area.


  • Advertisement
  • Registered Users Posts: 249 ✭✭gargargar


    ChRoMe wrote: »
    I'm the complete opposite, every single db table I've created the first column is an auto increment int called id as the primary key. If there are natural keys thats lucky, but I think its best to play defensively in this area.

    Why? In this case (supporting a many to many relationship) it is a pointless piece of information. You will never retrieve data based on it. It will never be an FK in another table.


  • Registered Users Posts: 1,037 ✭✭✭kuro_man


    you may not always have a natural key so you must add an _id column, but if there is one then why not use it? othwise you may inadvertantly introduce duplicates, unless you're very careful defining your non-key contraints


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    I know relatively little about databases. But why would you hold stock totals/quantities in a table and not a view? Maybe because there's massive amount of stock and the view kills performance? As for auto ID fields, I think the number of times I come across a situation where I wished I had one far out numbers the number of time I found it to be redundant. I thought the only way to know if an index was efficient was by testing it.


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    kuro_man wrote: »
    you may not always have a natural key so you must add an _id column, but if there is one then why not use it? othwise you may inadvertantly introduce duplicates, unless you're very careful defining your non-key contraints

    Disk space is cheap :)


  • Registered Users Posts: 851 ✭✭✭TonyStark


    ChRoMe wrote: »

    Disk space is cheap :)


    ....and computers are getting faster! :-D


  • Advertisement
  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    ... and people still find ways to slow them down...


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    Its a very small price to guarantee that you will always be able to identify a single row.


  • Registered Users Posts: 2,019 ✭✭✭Colonel Panic


    Being able to identify a single row is all well and good but if you use an artificial key, you can put two identical items in the same table. Identical apart from the key that is. Then what? A constraint on what would've been a natural key to enforce uniqueness?

    I guess it just depends... Is that desired or not?

    For what it's worth, in the past I've taken the Unique ID + additional constraint route. But the latter only got added after I decided to write some unit tests.


  • Registered Users Posts: 763 ✭✭✭Dar


    For composite key vs auto-generate:
    Strictly speaking you should use composite ids where appropriate instead of adding unneeded columns. However, using single auto-increment fields makes things a lot less complicated on the coding side and also lets you avoid composite foreign keys entirely. Adding an extra integer column here and there isn't going to have any real affect on performance, and as long as you apply appropriate constraints you won't have any data integrity problems.

    For the indexing:
    Order matters when defining indexes. If you already have an index for (product_id, shop_id), then you don't need another index for product_id alone. However, if you have a lot of queries selecting by shop_id, you could see a performance increase by creating an index for shop_id.

    If you're worried about query performance, do some reading on explain plans.


  • Registered Users Posts: 249 ✭✭gargargar


    BostonB wrote: »
    But why would you hold stock totals/quantities in a table and not a view? Maybe because there's massive amount of stock and the view kills performance?
    View of what?


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    stock_quantities


  • Registered Users Posts: 249 ✭✭gargargar


    Why add a layer in to view the data in the table. Why not go to the table?


  • Registered Users Posts: 2,021 ✭✭✭ChRoMe


    Identical apart from the key that is.

    So not identical at all then ;)


  • Registered Users Posts: 249 ✭✭gargargar


    ChRoMe wrote: »
    So not identical at all then ;)

    The poster was making the point that artificial key doesn't give you anything in this case. As he said you could insert dups using it.

    KEY_ID STORE_ID PRODUCT_ID STOCK
    1 | 1 | 1| 100
    2 | 1 | 1 |100

    Record one and two are the same data but appear to be unique by fake id.


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    gargargar wrote: »
    Why add a layer in to view the data in the table. Why not go to the table?

    I was thinking the table with stock totals is essentially a calculated column of existing data in other tables. Why create yet another table to hold what is a copy of data in those other tables? Isn't that what views are for?

    If there was a performance issue, you have to test a number of scenarios of design to see which produced the best results.

    I was curious myself to the answer. Since tables with "totals" seem to have concurrency issues quite often when working on other peoples systems.


  • Advertisement
  • Registered Users Posts: 249 ✭✭gargargar


    BostonB wrote: »
    I was thinking the table with stock totals is essentially a calculated column of existing data in other tables. Why create yet another table to hold what is a copy of data in those other tables? Isn't that what views are for?

    If there was a performance issue, you have to test a number of scenarios of design to see which produced the best results.

    I was curious myself to the answer. Since tables with "totals" seem to have concurrency issues quite often when working on other peoples systems.
    I think I can see where you are coming from - quantity on hand = stock - sold, something like that? My feeling is that it would be kept up to date by the application. Your deliverys in would add to quantity field and sales would decrement it.


  • Registered Users Posts: 2,019 ✭✭✭Colonel Panic


    ChRoMe wrote: »
    So not identical at all then ;)

    :D Artificially identical!

    For multiple invoices, no problem, but for a stock item? Maybe...


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    gargargar wrote: »
    I think I can see where you are coming from - quantity on hand = stock - sold, something like that? My feeling is that it would be kept up to date by the application. Your deliverys in would add to quantity field and sales would decrement it.

    Why bother. Why code it when you can let the db do it.
    It can never go out of sync either.

    count of products = stock

    count of sales = sold

    The only reason not to do that would be performance. But only way to know that is to test it both ways with the expected load. (I'm guessing - maybe I'm wrong)


  • Registered Users Posts: 249 ✭✭gargargar


    BostonB wrote: »
    Why bother. Why code it when you can let the db do it.
    It can never go out of sync either.

    count of products = stock

    count of sales = sold

    The only reason not to do that would be performance. But only way to know that is to test it both ways with the expected load. (I'm guessing - maybe I'm wrong)

    No it doesn't. Stock is the number of units of a product. Products table will contain widget a, widget b... Will tell you nothing about how many of widget a and b you have in stock, at least in the case where you have multi locations.


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    See I was thinking Product ID is a unique ID for a product like a phone and the barcode and/or IMEI. You're thinking its like a piece of fruit like an Orange. I wouldn't use ID like that, I'd use Product_Type or Product_Code . You're probably right. it would make more sense in the example given.


Advertisement