SQL Primary Key vs Index

komodosp · 21-02-2013 11:46AM #1

It's generally held that it's more efficient use a key or index for any fields that your table is going to be selected on, wherever possible. But is it as efficient to use the first part of a primary key as an index, or should a separate index be set up?

Let's say I have a table called stock_quantities.
Columns:
product_id (primary key - part)
shop_id (primary key - part)
stock_level

And I want to select all records for a given product_id. should I set up a separate index just for product_id or is it as efficient because product_id is the first part of the primary key (and therefore still sorted in wherever keys / indices are stored)?

Evil Phil · 21-02-2013 03:04PM

I don't split primary keys across columns. For your example I would use something like:

stock_quantity
Columns:
stock_quantity_id (primary key)
product_id
shop_id
stock_level

croo · 21-02-2013 03:13PM

But you might still create an index for the product_id & store_id!

Like most things there is a balance to be found. More indices make searching, via that index, quicker. But more indices make creating and deleting records slower as now the indices must also be changed!

gargargar · 21-02-2013 04:14PM

It depends on database server. I would leave it as you have it and then you can tune it later. I would not create a new spurious pk.

kuro_man · 22-02-2013 03:14PM

primary key <> index, though the RDBMS may create an index by default for the primary key.
having a primary key based on 2 columns is fine - I would prefer a natural key over an unnatural one any day

to answer your question, if you have an index based on 2 columns, and you are searching on the left-most column, then the optimiser should use the index. You should check the query plan to make sure.

ChRoMe · 22-02-2013 03:50PM

kuro_man wrote: »

primary key <> index, though the RDBMS may create an index by default for the primary key.
having a primary key based on 2 columns is fine - I would prefer a natural key over an unnatural one any day

to answer your question, if you have an index based on 2 columns, and you are searching on the left-most column, then the optimiser should use the index. You should check the query plan to make sure.

I'm the complete opposite, every single db table I've created the first column is an auto increment int called id as the primary key. If there are natural keys thats lucky, but I think its best to play defensively in this area.

gargargar · 22-02-2013 03:59PM

ChRoMe wrote: »

I'm the complete opposite, every single db table I've created the first column is an auto increment int called id as the primary key. If there are natural keys thats lucky, but I think its best to play defensively in this area.

Why? In this case (supporting a many to many relationship) it is a pointless piece of information. You will never retrieve data based on it. It will never be an FK in another table.

kuro_man · 23-02-2013 12:15AM

you may not always have a natural key so you must add an _id column, but if there is one then why not use it? othwise you may inadvertantly introduce duplicates, unless you're very careful defining your non-key contraints

BostonB · 23-02-2013 01:18AM

I know relatively little about databases. But why would you hold stock totals/quantities in a table and not a view? Maybe because there's massive amount of stock and the view kills performance? As for auto ID fields, I think the number of times I come across a situation where I wished I had one far out numbers the number of time I found it to be redundant. I thought the only way to know if an index was efficient was by testing it.

ChRoMe · 23-02-2013 12:17PM

kuro_man wrote: »

you may not always have a natural key so you must add an _id column, but if there is one then why not use it? othwise you may inadvertantly introduce duplicates, unless you're very careful defining your non-key contraints

Disk space is cheap

TonyStark · 23-02-2013 12:47PM

ChRoMe wrote: »

Disk space is cheap

....and computers are getting faster! :-D

BostonB · 23-02-2013 01:51PM

... and people still find ways to slow them down...

ChRoMe · 23-02-2013 03:05PM

Its a very small price to guarantee that you will always be able to identify a single row.

Colonel Panic · 23-02-2013 03:46PM

Being able to identify a single row is all well and good but if you use an artificial key, you can put two identical items in the same table. Identical apart from the key that is. Then what? A constraint on what would've been a natural key to enforce uniqueness?

I guess it just depends... Is that desired or not?

For what it's worth, in the past I've taken the Unique ID + additional constraint route. But the latter only got added after I decided to write some unit tests.

Dar · 24-02-2013 01:18AM

For composite key vs auto-generate:
Strictly speaking you should use composite ids where appropriate instead of adding unneeded columns. However, using single auto-increment fields makes things a lot less complicated on the coding side and also lets you avoid composite foreign keys entirely. Adding an extra integer column here and there isn't going to have any real affect on performance, and as long as you apply appropriate constraints you won't have any data integrity problems.

For the indexing:
Order matters when defining indexes. If you already have an index for (product_id, shop_id), then you don't need another index for product_id alone. However, if you have a lot of queries selecting by shop_id, you could see a performance increase by creating an index for shop_id.

If you're worried about query performance, do some reading on explain plans.

gargargar · 25-02-2013 11:03AM

BostonB wrote: »

But why would you hold stock totals/quantities in a table and not a view? Maybe because there's massive amount of stock and the view kills performance?

View of what?

BostonB · 25-02-2013 11:10AM

stock_quantities

gargargar · 25-02-2013 11:15AM

Why add a layer in to view the data in the table. Why not go to the table?

ChRoMe · 25-02-2013 11:18AM

Colonel Panic wrote: »

Identical apart from the key that is.

So not identical at all then

gargargar · 25-02-2013 11:38AM

ChRoMe wrote: »

So not identical at all then

The poster was making the point that artificial key doesn't give you anything in this case. As he said you could insert dups using it.

KEY_ID STORE_ID PRODUCT_ID STOCK
1 | 1 | 1| 100
2 | 1 | 1 |100

Record one and two are the same data but appear to be unique by fake id.

BostonB · 25-02-2013 11:51AM

gargargar wrote: »

Why add a layer in to view the data in the table. Why not go to the table?

I was thinking the table with stock totals is essentially a calculated column of existing data in other tables. Why create yet another table to hold what is a copy of data in those other tables? Isn't that what views are for?

If there was a performance issue, you have to test a number of scenarios of design to see which produced the best results.

I was curious myself to the answer. Since tables with "totals" seem to have concurrency issues quite often when working on other peoples systems.

gargargar · 25-02-2013 04:14PM

BostonB wrote: »

I was thinking the table with stock totals is essentially a calculated column of existing data in other tables. Why create yet another table to hold what is a copy of data in those other tables? Isn't that what views are for?

If there was a performance issue, you have to test a number of scenarios of design to see which produced the best results.

I was curious myself to the answer. Since tables with "totals" seem to have concurrency issues quite often when working on other peoples systems.

I think I can see where you are coming from - quantity on hand = stock - sold, something like that? My feeling is that it would be kept up to date by the application. Your deliverys in would add to quantity field and sales would decrement it.

Colonel Panic · 25-02-2013 04:37PM

ChRoMe wrote: »

So not identical at all then

Artificially identical!

For multiple invoices, no problem, but for a stock item? Maybe...

BostonB · 25-02-2013 05:02PM

gargargar wrote: »

I think I can see where you are coming from - quantity on hand = stock - sold, something like that? My feeling is that it would be kept up to date by the application. Your deliverys in would add to quantity field and sales would decrement it.

Why bother. Why code it when you can let the db do it.
It can never go out of sync either.

count of products = stock

count of sales = sold

The only reason not to do that would be performance. But only way to know that is to test it both ways with the expected load. (I'm guessing - maybe I'm wrong)

gargargar · 25-02-2013 06:08PM

BostonB wrote: »

Why bother. Why code it when you can let the db do it.
It can never go out of sync either.

count of products = stock

count of sales = sold

The only reason not to do that would be performance. But only way to know that is to test it both ways with the expected load. (I'm guessing - maybe I'm wrong)

No it doesn't. Stock is the number of units of a product. Products table will contain widget a, widget b... Will tell you nothing about how many of widget a and b you have in stock, at least in the case where you have multi locations.

BostonB · 25-02-2013 06:36PM

See I was thinking Product ID is a unique ID for a product like a phone and the barcode and/or IMEI. You're thinking its like a piece of fruit like an Orange. I wouldn't use ID like that, I'd use Product_Type or Product_Code . You're probably right. it would make more sense in the example given.

SQL Primary Key vs Index

Comments