If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)

Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

Hi there,
There is an issue with role permissions that is being worked on at the moment.
If you are having trouble with access or permissions on regional forums please post here to get access: https://www.boards.ie/discussion/2058365403/you-do-not-have-permission-for-that#latest

Structuring an SQL database.

Mutant_Fruit · 2005-07-16 15:41:32

Are there any sites out there which show you how best to structure a new SQL database? I'm working on my mini-project and i've got a lot of seperate parts up and running. Now i just have to decide how best to structure my database before i get everything hooked in together. I will have to be storing information similar to…

16-07-2005 3:41pm

#1

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

Are there any sites out there which show you how best to structure a new SQL database?

I'm working on my mini-project and i've got a lot of seperate parts up and running. Now i just have to decide how best to structure my database before i get everything hooked in together.

I will have to be storing information similar to this:
team_name
team_status
team_location
salary.

Each of those 4 have different values. There are several dozen team_names, several several dozen positions, several dozen locations and whatever salaries.

Now this is going to confuse me trying to explain it... but i'm going to try something as follows.

Make a "team_name" table. In here i'll put each unique team_name, along with a unique integer called "listvalue". I'll make a "team_status" table, containing each unique team_status and give them their own unique listvalue. etc etc.

Then, i'll make a 5th table, which will contain the columns "team_name", "team_status" etc etc and the contents of these columns will be the associated integer values.

That should reduce the size of the database, as the "main" table which stores all the data will only be containing a 5-6 digit integer as opposed to a 20 character string for each of the different values.

Ok, so thats the database organised. Is this efficient? Is there a better way of doing it? Any advice appreciated, and if you can give reallife examples to help explain, that'd be great.

0

Comments

#2 16-07-2005 3:54pm

mtracey

Closed Accounts Posts: 264 ✭✭

Join Date: November 2003

Posts: 250

3 lookup tabkes, ie for
team_name
team_status
team_location

say

tbl_team
- team_id used autogenerated number eg identity seed field in SQL
- description

tbl_status
- status_id used autogenerated number eg identity seed field in SQL
- description

tbl_location
- location_id used autogenerated number eg identity seed field in SQL
- description

Then a 4th table called whatever you want depending on what your representing but containing say

tbl_whatever
- team_id
- status_id
- location_id
- salary

of course if you can fixed salaries you'll want another table for salary.
depends on the purpose of the database really.

Mark

0
#3 16-07-2005 6:14pm

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

Well, the database will primarily be a searchable database (if you know what i mean). There will be a lot of searches done against it, so it needs to be fast for searching, updating speed isn't as important.

I'd also like to mention that its SQLite i'm using, not full-blown SQL, so i'm not 100% sure if every feature is supported. I assume it is. It has most of the features the standard calls for,but any extras in SQL2000 probably aren't available, so bear that in mind when you recommend something.

- team_id used autogenerated number eg identity seed field in SQL

What exactly do you mean by that, in stupid people terms. Do you mean that i read in each unique team id into the "team_id" table, and assign each team_id to an integer value? Which auto-increments when i add in new "team_id's". (team id's are typically strings up to 15 chars in length).

What do you mean by lookup tables (i probably know what you mean, i just don't know the technical name for em). Are the "lookup" tables the ones that i mentioned above? The tables with the "team_id" and their related uniqe integer identifier?

indexing: What should i be indexing? In a general sense, what should be indexed? Only the most frequently searched term(s)? Or should i be indexing everything that will be searched? Would an index greatly speed up the searching of my data if i structure it the way you say?

0
#4 16-07-2005 6:56pm

mtracey

Closed Accounts Posts: 264 ✭✭

Join Date: November 2003

Posts: 250

If you want to be able to type in the ids then ignore the bit about the autogen ids, but yes the system will automatically create the number for you, depending on the SQL system of course.

The tables (eg tbl_location) you are creating are commonly known as lookup tables.

Each of the tables should have at least a primary/unique key (index) so that you can't input duplicates and of course for searching, so on tbl_location, put one on location_id. Plus when you're creating a join between say the tbl_whatever to tbl_location, it'll make it faster. You won't notice anything with a few records on tbl_whatever, but with alot it'll make a difference.

If there are a couple of fields which you'll always be using in search criteria on tbl_whatever then you should index them also.

Do some searches on indexing, SQLlite might have some recommendations.
Its a whole science in itself.

Feel free to ask any questions.
Mark

0
#5 16-07-2005 9:49pm

bonkey

Registered Users, Registered Users 2 Posts: 15,443 ✭✭✭✭

Join Date: April 2001

Posts: 14635

Mutant_Fruit wrote:

Well, the database will primarily be a searchable database (if you know what i mean). There will be a lot of searches done against it, so it needs to be fast for searching, updating speed isn't as important.

I'd also like to mention that its SQLite i'm using, not full-blown SQL, so i'm not 100% sure if every feature is supported. I assume it is. It has most of the features the standard calls for,but any extras in SQL2000 probably aren't available, so bear that in mind when you recommend something.

OK - you need to read up on JOIN statements. They should be supported by any SQL implementation. Read up on what they are in the help file / online and then ask any questions you need to clear up the misunderstandings.

Also, read up on Views. If you can't support JOIN satements, you should be able to create a VIEW in the database, and then query simply against that, so either way, its not a problem...again, reading up then asking for direction is your best bet here.

(I know I'm not handing you answers on a plate, but I'm assuming you want to know how to do this rather than having the answer handed to you.)

What exactly do you mean by that, in stupid people terms. Do you mean that i read in each unique team id into the "team_id" table, and assign each team_id to an integer value? Which auto-increments when i add in new "team_id's". (team id's are typically strings up to 15 chars in length).

In simple terms, IDENTITY fields are numeric (normally integer) fields which you can have values automatically generated for. But they're database-specific. I don't know if they're supported by SQLLite.

What do you mean by lookup tables (i probably know what you mean, i just don't know the technical name for em). Are the "lookup" tables the ones that i mentioned above? The tables with the "team_id" and their related uniqe integer identifier?

Yup. Its the basic concept behind relational databases. You store the lookup information once, with a Unique Identifier (again, typically just a number) which is stored with the information associated with that record.

Again, your best bet is to google / read up on 3rd Normal Form and normalisation in general. These tell you the basic concepts behind how to structure things. It sounds like you know what this is, just not what it formally is..so read up and then ask, or if you can't find a good source, ask and I'll dig you up one when I'm not so drunk if someone else hasn't.

indexing: What should i be indexing? In a general sense, what should be indexed? Only the most frequently searched term(s)? Or should i be indexing everything that will be searched? Would an index greatly speed up the searching of my data if i structure it the way you say?

Index all Primary Keys / Unique Values. Index every field which is a lookup to a Primary Key / Uniuqe Value. After that...it gets tough to give simple rules of thumb. If you search string fields (CHAR or VARCHAR, but nor TEXT), then index those, as long as the search isn't always of the form <anything>MySearchValue.... In other words, if you're not looking for a match at the start of the string, indexes won't help.

Indexing, to be honest, is almost as complex a topic as database design, if not more-so. However, your database doesn't look too complex, so the above basic rules should be enough. If not, and you have performance problems at some point (on the assumption this isn't a homework exercise)......ask.

Hope some of that gives you a pointer in the right direction.

jc

0
#6 17-07-2005 1:21am

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

homework excercises in the middle of summer?! :eek: Are you mad!

Nah, this is something i'm writing for my own benefit. Mostly to help me learn about database structuring, and also to help me get better at coding. Its a nice project, i've learned a good bit about file input-output, optimising for ram usage (i.e. NOT storing all my temp data in a dataset/datatable, as i'll end up with 150megs of ram usage). String parsing, getting substrings and dividing strings have also come into play a lot.

I still have to get 2 commandline programs hooked into my main program, so thats a good opportunity to learn about calling external programs. All in all, i'm touching on quite a few places

I'll read up on the SQL stuff. I have a fair idea how it works (from working with them) i just don't know the technical names for the different tables and methods of storing. Thanks for the help.

EDIT: I can do join statements alright, tis a basic feature of SQL MY basic idea is if my main table is called "main" and i have my other table called "team_id" i'll do something like...

select team_id.fullname from main
inner join
team_id
on
main.teamid = team_id.listvalue

Basically the "listvalue" in the team_id table is a unique int, and my main table uses the unique int in place of the full team name. So while my main table only contains the integer values, by using inner join i end up matching those integer values to their corresponding character strings... if ya get what i mean

0
Advertisement
#7 17-07-2005 12:49pm

mtracey

Closed Accounts Posts: 264 ✭✭

Join Date: November 2003

Posts: 250

You should do a search for SQL standards, eg naming standards, best practises etc... eg tbl_xxxx for a table, vw_xxxx for a view, usp_xxxx for a stored procedure (fyi in SQL Server, sp_ as the prefix for a stored procedure is slower than usp_ as SQL will look to the master db first for it before going to the correct db, anyhow not relevent to you).

you might as well start off using correct standards, it'll before second nature for you going forward.

0
#8 17-07-2005 12:57pm

is_that_so

Registered Users, Registered Users 2 Posts: 32,136 ✭✭✭✭

Join Date: December 2004

Posts: 31882

Did you normalise all of this before you started? This usually helps with structure and saves you time in creating tables. I'd suggesting reading up on normalising 1st form, 2nd form, 3rd form etc. Have fun.

0
#9 17-07-2005 4:21pm

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

I'm currently in the process of normalising the data. I know what it means... its just doing it is a bit tricky, considaring i've never done it before. I'll read up a little more on it and see if i get it done.

I think the basic principle behind normalising is to have as little data in your "main" table as possible, and to supply all the extra info using as little joins as possible. If you know what i mean So you end up splitting your data into sections, where all the needed info can be picked out easily (if needed).

My next major problem is actually coding a way to insert the relevant data into the right sections of the database. The way its currently coded, i can write a text file with SQL statements in it (such as: insert into TABLE... values(lah blah blah) and then SQLite reads this textfile and executes each of the SQL statements. So all i have to do is work out the right statements to code into the textfile. to make dropping old data and importing new data as fast and safe as possible.

When i get the normalising done, i'll get back to ye with more questions i'm sure

0
#10 17-07-2005 4:37pm

mtracey

Closed Accounts Posts: 264 ✭✭

Join Date: November 2003

Posts: 250

If you're storing in a text file, maybe consider XML. If this project is all about getting knowledge of SQL and general coding, then knowledge of XML etc is also useful

0
#11 17-07-2005 6:09pm

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

Nah, has to be a text file for importing into the SQL server. Theres no way around it. I've just normalised my data (i hope ). I ran it past a friend of mine who knows a bit about databases, and he said it was grand. Told me i had split it down more than was necessary (one of my lookup tables has only 3 rows) so he told me to not bother with that as a lookup table, just write the values straight in. So i will

0
Advertisement
#12 18-07-2005 9:02am

amen

Registered Users, Registered Users 2 Posts: 2,781 ✭✭✭

Join Date: October 1999

Posts: 2598

fyi in SQL Server, sp_ as the prefix for a stored procedure is slower than usp_ as SQL will look to the master db first for it before going to the correct db, anyhow not relevent to you

Not stricly true. this only happens if you don't specify the database so
I have SP usp_Save in BoardsDB
If access usp_boards w/o specifying the database SQL server will first search master and then the current database
if I specifiy BoardsDB.dbo.usp_boards then the current database will be searched.

0
#13 18-07-2005 10:00am

mtracey

Closed Accounts Posts: 264 ✭✭

Join Date: November 2003

Posts: 250

fair enough, however in application development you would never hardcode the database name into your query/sp call unless given zero choice.

0
#14 18-07-2005 12:07pm

mewso

Moderators, Science, Health & Environment Moderators Posts: 9,035 Mod ✭✭✭✭

Join Date: February 1998

Posts: 8421

Just one question looking at this. Naming conventions. I know people get into habits but I see the tbl_xxx used so often. Tell me when you look at your list of tables what else could they be? Unecessary if you ask me. I've given up this method. A table for me will just be the object name pure and simple e.g. Customers. No spaces and no underscores for spaces e.g. TeamLocation being alot easier to type without those irritating underscores. Just my opinion.

0
#15 18-07-2005 5:55pm

mtracey

Closed Accounts Posts: 264 ✭✭

Join Date: November 2003

Posts: 250

Whether you use naming conventions or not might depend on the type of operation you're running. Here its a few tables, so sure why have a naming convention ?. True it doesn't matter, but if our friend here is looking at this database as a way of learning how best to design one, bearing in mind performance and indices etc.. he or she may as well learn to do so in a standard manner.
This is industry best practice, something in my opinion should always be aimed for. A DBA at a company where data is important will insist on such standards.
It the same with regular code

At my end, part of what we do is run a large data centre with 10s of millions of transactions a day going through SQL Server 2000 Ent. As you can imagine there are a huge number of db objects being stored eg tables, views, user functions, stored procedures, dts packages etc..

Here it clearly makes sense to have objects named correctly, in fact we go one further and add solution specific information to an object. Developers don't need to have to go figuring out what db object they're using.

In the case of a stored procedure where it is just getting some simple information, it would be prefixed usp_Get_xxxx or in the case of a stored procedure serving up data to a report, sp_Rep_xxxx.

Phew.....

BTW, I'm not a DBA

Mark

0
#16 18-07-2005 7:03pm

mewso

Moderators, Science, Health & Environment Moderators Posts: 9,035 Mod ✭✭✭✭

Join Date: February 1998

Posts: 8421

I'm not talking about not using naming conventions though. I'm talking about using a different one namely the latest Microsoft conventions which consider it reduntant to use prefixes on objects that are self defining. i.e. you know they are tables so why use tbl.

0
#17 18-07-2005 7:17pm

is_that_so

Registered Users, Registered Users 2 Posts: 32,136 ✭✭✭✭

Join Date: December 2004

Posts: 31882

musician wrote:

I'm not talking about not using naming conventions though. I'm talking about using a different one namely the latest Microsoft conventions which consider it reduntant to use prefixes on objects that are self defining. i.e. you know they are tables so why use tbl.

Not everyone follows Microsoft. :rolleyes:
Naming conventions are user-defined standards taking into account "industry standards". If my current DB programming environment dictates that I use tbl then so be it. The important thing is to actually use a convention and be consistent with it.

0
#18 18-07-2005 7:20pm

bonkey

Registered Users, Registered Users 2 Posts: 15,443 ✭✭✭✭

Join Date: April 2001

Posts: 14635

As an interesting side-note on Hungarian Notation (which is the type-prefix naming strategy), I thought the following article was excellent. I've taken it from a longer article here for anyone who is interested...

So now we get back to the infamous Hungarian notation.

Hungarian notation was invented by Microsoft programmer Charles Simonyi. One of the major projects Simonyi worked on at Microsoft was Word; in fact he led the project to create the world’s first WYSIWYG word processor, something called Bravo at Xerox Parc.

In WYSIWYG word processing, you have scrollable windows, so every coordinate has to be interpreted as either relative to the window or relative to the page, and that makes a big difference, and keeping them straight is pretty important.

Which, I surmise, is one of the many good reasons Simonyi started using something that came to be called Hungarian notation. It looked like Hungarian, and Simonyi was from Hungary, thus the name. In Simonyi’s version of Hungarian notation, every variable was prefixed with a lower case tag that indicated the kind of thing that the variable contained.

For example, if the variable name is rwCol, rw is the prefix.

I’m using the word kind on purpose, there, because Simonyi mistakenly used the word type in his paper, and generations of programmers misunderstood what he meant.

If you read Simonyi’s paper closely, what he was getting at was the same kind of naming convention as I used in my example above where we decided that us meant “unsafe string” and s meant “safe string.” They’re both of type string. The compiler won’t help you if you assign one to the other and Intellisense won’t tell you bupkis. But they are semantically different; they need to be interpreted differently and treated differently and some kind of conversion function will need to be called if you assign one to the other or you will have a runtime bug. If you’re lucky.

Simonyi’s original concept for Hungarian notation was called, inside Microsoft, Apps Hungarian, because it was used in the Applications Division, to wit, Word and Excel. In Excel’s source code you see a lot of rw and col and when you see those you know that they refer to rows and columns. Yep, they’re both integers, but it never makes sense to assign between them. In Word, I'm told, you see a lot of xl and xw, where xl means “horizontal coordinates relative to the layout” and xw means “horizontal coordinates relative to the window.” Both ints. Not interchangeable. In both apps you see a lot of cb meaning “count of bytes.” Yep, it’s an int again, but you know so much more about it just by looking at the variable name. It’s a count of bytes: a buffer size. And if you see xl = cb, well, blow the Bad Code Whistle, that is obviously wrong code, because even though xl and cb are both integers, it’s completely crazy to set a horizontal offset in pixels to a count of bytes.

In Apps Hungarian prefixes are used for functions, as well as variables. So, to tell you the truth, I’ve never seen the Word source code, but I’ll bet you dollars to donuts there’s a function called YlFromYw which converts from vertical window coordinates to vertical layout coordinates. Apps Hungarian requires the notation TypeFromType instead of the more traditional TypeToType so that every function name could begin with the type of thing that it was returning, just like I did earlier in the example when I renamed Encode SFromUs. In fact in proper Apps Hungarian the Encode function would have to be named SFromUs. Apps Hungarian wouldn’t really give you a choice in how to name this function. That’s a good thing, because it’s one less thing you need to remember, and you don’t have to wonder what kind of encoding is being referred to by the word Encode: you have something much more precise.

Apps Hungarian was extremely valuable, especially in the days of C programming where the compiler didn’t provide a very useful type system.

But then something kind of wrong happened.

The dark side took over Hungarian Notation.

Nobody seems to know why or how, but it appears that the documentation writers on the Windows team inadvertently invented what came to be known as Systems Hungarian.

Somebody, somewhere, read Simonyi’s paper, where he used the word “type,” and thought he meant type, like class, like in a type system, like the type checking that the compiler does. He did not. He explained very carefully exactly what he meant by the word “type,” but it didn’t help. The damage was done.

Apps Hungarian had very useful, meaningful prefixes like “ix” to mean an index into an array, “c” to mean a count, “d” to mean the difference between two numbers (for example “dx” meant “width”), and so forth.

Systems Hungarian had far less useful prefixes like “l” for long and “ul” for “unsigned long” and “dw” for double word, which is, actually, uh, an unsigned long. In Systems Hungarian, the only thing that the prefix told you was the actual data type of the variable.

This was a subtle but complete misunderstanding of Simonyi’s intention and practice, and it just goes to show you that if you write convoluted, dense academic prose nobody will understand it and your ideas will be misinterpreted and then the misinterpreted ideas will be ridiculed even when they weren’t your ideas. So in Systems Hungarian you got a lot of dwFoo meaning “double word foo,” and doggone it, the fact that a variable is a double word tells you darn near nothing useful at all. So it’s no wonder people rebelled against Systems Hungarian.

Systems Hungarian was promulgated far and wide; it is the standard throughout the Windows programming documentation; it was spread extensively by books like Charles Petzold’s Programming Windows, the bible for learning Windows programming, and it rapidly became the dominant form of Hungarian, even inside Microsoft, where very few programmers outside the Word and Excel teams understood just what a mistake they had made.

And then came The Great Rebellion. Eventually, programmers who never understood Hungarian in the first place noticed that the misunderstood subset they were using was Pretty Dang Annoying and Well-Nigh Useless, and they revolted against it. Now, there are still some nice qualities in Systems Hungarian, which help you see bugs. At the very least, if you use Systems Hungarian, you’ll know the type of a variable at the spot where you’re using it. But it’s not nearly as valuable as Apps Hungarian.

The Great Rebellion hit its peak with the first release of .NET. Microsoft finally started telling people, “Hungarian Notation Is Not Recommended.” There was much rejoicing. I don’t even think they bothered saying why. They just went through the naming guidelines section of the document and wrote, “Do Not Use Hungarian Notation” in every entry. Hungarian Notation was so doggone unpopular by this point that nobody really complained, and everybody in the world outside of Excel and Word were relieved at no longer having to use an awkward naming convention that, they thought, was unnecessary in the days of strong type checking and Intellisense.

But there’s still a tremendous amount of value to Apps Hungarian, in that it increases collocation in code, which makes the code easier to read, write, debug, and maintain, and, most importantly, it makes wrong code look wrong.

0
#19 18-07-2005 7:21pm

is_that_so

Registered Users, Registered Users 2 Posts: 32,136 ✭✭✭✭

Join Date: December 2004

Posts: 31882

bonkey wrote:

As an interesting side-note on Hungarian Notation (which is the type-prefix naming strategy), I thought the following article was excellent. I've taken it from a longer article here for anyone who is interested...

Amen to that

0
#20 18-07-2005 8:32pm

mtracey

Closed Accounts Posts: 264 ✭✭

Join Date: November 2003

Posts: 250

:d

0
#21 18-07-2005 11:16pm

rsynnott

Registered Users, Registered Users 2 Posts: 4,003 ✭✭✭

Join Date: July 2003

Posts: 3930

musician wrote:

I'm not talking about not using naming conventions though. I'm talking about using a different one namely the latest Microsoft conventions which consider it reduntant to use prefixes on objects that are self defining. i.e. you know they are tables so why use tbl.

What, you mean they've abandoned Hungarian Notation? Yay!

0
Advertisement
#22 19-07-2005 7:08am

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

Slightly more back on topic (interesting read though) I have another question (which i think i know the answer too).

Supposing i've processed an XML file, and am just about to write all my insert statements to a text file, which is better:

1) Writing a statement like:
INSERT INTO tbl_main (col1, col2, col3) VALUES (tbl_team.integervalue, tbl_status.integervalue, tbl_location.integer value)

INNER JOIN
table_team
ON
table_team.name = CurrentRowImInserting.teamname

etc etc.

2)
Or would i be better off reading my lookup tables into memory, and then preprocessing my .txt file, and changing all the values there, so my text file actually ends up looking like:
INSERT INTO tbl_main(col1, col2, col3) VALUES(3, 5, 1)

I'm assuming the preprocessing method will be much much faster when importing, and is the recommended way to go. Pre processing the text file probably won't take long either (20 seconds at most). So unless someone can tell me why it might be a bad idea, i'll probably do that.

0
#23 19-07-2005 6:48pm

mewso

Moderators, Science, Health & Environment Moderators Posts: 9,035 Mod ✭✭✭✭

Join Date: February 1998

Posts: 8421

is_that_so wrote:

Not everyone follows Microsoft. :rolleyes:

Theres some irony in this statement considering bonkey's post i.e. using prefixes you may already have been following microsoft . The point being that I personally think Microsoft (who have quite a large effect on industry standards like it or not) saw the light recently and I'm glad to agree with them. Of course you are correct about using and sticking to a convention but this was not my point.

0
#24 19-07-2005 7:28pm

rsynnott

Registered Users, Registered Users 2 Posts: 4,003 ✭✭✭

Join Date: July 2003

Posts: 3930

There's little or no reason to use prefixes with modern development tools. Some of the pitfalls of doing so are here: http://mindprod.com/jgloss/unmainnaming.html (see item 28 on)

0
#25 19-07-2005 9:30pm

bonkey

Registered Users, Registered Users 2 Posts: 15,443 ✭✭✭✭

Join Date: April 2001

Posts: 14635

Mutant_Fruit wrote:

Slightly more back on topic (interesting read though) I have another question (which i think i know the answer too).

Supposing i've processed an XML file, and am just about to write all my insert statements to a text file, which is better:

1) Writing a statement like:
INSERT INTO tbl_main (col1, col2, col3) VALUES (tbl_team.integervalue, tbl_status.integervalue, tbl_location.integer value)

INNER JOIN
table_team
ON
table_team.name = CurrentRowImInserting.teamname

etc etc.

2)
Or would i be better off reading my lookup tables into memory, and then preprocessing my .txt file, and changing all the values there, so my text file actually ends up looking like:
INSERT INTO tbl_main(col1, col2, col3) VALUES(3, 5, 1)

If your SQL syntax supports option 1, I don't see much wrong with either approach.

I'm assuming the preprocessing method will be much much faster when importing,

I would have said that would be only if you're feeding the data over a network connection, which you aren't (i.e. shrink the amount of data you send == faster). A Query object will use the same compiled query multiple times, and - in effect - do the lookups for you (presumably as efficiently as you can code them yourself, if not moreso).

and is the recommended way to go.

Oh. Well go for it then

Pre processing the text file probably won't take long either (20 seconds at most).

If you read, process, insert, repeat....I don't see that it would add any significant time, but in either case, it doesn't sound like 20 seconds is a problem so I wouldn't worry about it.

So unless someone can tell me why it might be a bad idea, i'll probably do that.

Can't think of a one.

jc

0
#26 19-07-2005 10:17pm

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

God damnit! I can't get the database to do my dirty work!

I want to do something that replaces the varchar teamname with the integer value.

UPDATE tbl_main
SET team_name = team_integervalue
where tbl_main.team_name = tbl_team.team_name
INNER JOIN
tbl_team
ON tbl_team.team_name = tbl_main.team_name

Only thing is, this statement throws an error, and i cant seem to fix it. So maybe one of you SQL heads could tell me if what i'm doing is impossible (one site on the net seemed to think so) or if i've just structured my query wrong. Or even suggest a better way to do it.

0
#27 20-07-2005 6:18am

bonkey

Registered Users, Registered Users 2 Posts: 15,443 ✭✭✭✭

Join Date: April 2001

Posts: 14635

1) Does your SQL implementation support INNER JOIN statements in UPDATEs. Many implementations only support them in SELECT statements.

2) If it does, its most likely of the form

UPDATE table1
INNER JOIN table2
ON ...
SET ...

3) If you don't mind me saying so, what your'e doing seems entirely counter-sensical. Why are you updating a varchar field with an integer? More importantly - why are you storing a number in a varchar field....or - if its the number you want - why are you writing a text value initially to a field you want to store numerics in?

4) Even if its right, you should be explicitly converting your datatypes

0
#28 20-07-2005 6:53am

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

i know it sounds mad, but SQLite doesn't actually enforce the datatypes on columns, so it doesn't actually care about me doing that. In a proper database i'd have to first read the database info from the text file into a temp table, and then from the temp table i modify it as i'm trying to do and then store the resulting table into the main table.

I have to do it this way, because my program currently has no way to know which int is going to be assigned to each team. tbl_team has an auto_increment field and a column where teamname is unique (each team only exists once in a lookup table).

Ideally instead of doing this, i'd first read in the team info into the database, and then i'd read it back out along with their new integer ID's and then preprocess my text file and replace the teamname with the integer ID. But it'd be handier to do this from within the database i think.

EDIT: Looks like i'm going to have to preprocess the text file

0
#29 20-07-2005 12:39pm

amen

Registered Users, Registered Users 2 Posts: 2,781 ✭✭✭

Join Date: October 1999

Posts: 2598

why not just modify your insert state in include the join to the teamname table ?

0
#30 20-07-2005 5:01pm

mtracey

Closed Accounts Posts: 264 ✭✭

Join Date: November 2003

Posts: 250

Whats the error message ???

0
Advertisement
#31 20-07-2005 5:31pm

Mutant_Fruit

Closed Accounts Posts: 4,943 ✭✭✭

Join Date: May 2003

Posts: 4697

amen wrote:

why not just modify your insert state in include the join to the teamname table ?

This guy thinks smart! I'll give that a whirl now.

Whats the error message ???

Parse error before '.'

A very informative error message :P I'm assuming joining on an update just isn't supported, which sounds kinda stupid, but the interweb seems to think its resonable. Can't for the life of me think why they can't support it, its a damn useful feature, and can't be that much harder than supporting them on selects.

0
#32 20-07-2005 5:43pm

blorg

Registered Users, Registered Users 2 Posts: 15,995 ✭✭✭✭

Join Date: April 2003

Posts: 15731

I'd recommend getting your hands on a copy of Hernandez's Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design if at all possible, it is an extremely good book for someone starting out learning database design.

0