I recently got into a good debate with a college friend about relational database design, i said its good, he said it makes everything more confusing...what do you people think? example (relationship): -------------- post_table -------------- id title body author_id -------------- author_table -------------- id first_name last_name SELECT s.title, body, a.first_name FROM story_table as s, author_table as a WHERE a.first_name = 'john' AND s.author_id = a.id he said the above was "too confusing" his solution is below: ------------ post_table ------------ id title body first_name last_name SELECT title, body, first_name FROM post_table WHERE first_name = 'john' so, what do you guys think?
I agree with you that relational database really works. Here is my argument for it. Relational database increase the complexity at the time of retrieval query only. But one feature is there named Normalization which actually mean to normalize whole data. From complexity point of view: There won't be such anomalies like update problem in two different table as foreign key exists. OnDeleteCascade... Managing small entity has always been easier than handling whole big table. One more thing. In the example given in the debate topic, it makes no big difference whether or not to use relational database system. The benefits of RDBMS really see in large database environment. It is suggested to refer CJ Date and Korth& Sudarshan's book on DBMS for making this thing much clearer. Nice to have such question on DP. Have a nice time. Thanks
The first way is generally the best, it takes up less space in the database, if its indexed properly the query on the first one will be faster. Interesting you didn't use an inner join in the first query though as what you have suggested is a pretty old syntax and would normally be expressed as SELECT s.title, body, a.first_name FROM story_table as s inner join author_table as a on s.author_id = a.author_id WHERE a.first_name = 'john' However there are reasons for using the second form, normalisation has already been mentioned, a good quick reference for the subject is here, there are times where you just want one table to hit, however that's certainly the exception rather than the norm. While relational databases do look more confusing, as you have to read multiple tables to get the same data you could have in one table once you get your head around the benefits you will see its the way to go. I've seen databases where relational principals are not used, and trying to manage a table where it has over 200 columns isn't fun. A good argument for throwing back at your friend, is what would happen if a post had multiple authors? In a non-relational design you would tack on say a second set of first and last name fields but then that makes all your queries a lot harder as you have to search through the first set and second of name fields, and for 1 extra author that may work, but 50? And yes I've seen DB design done like that. In a relational design you would just put a link table between Posts and Authors such as remove the authorid from the posts table. PostsLinks --------- postid authorid Then you link Authors to the link table then the link table to Posts, and then you have a many to many relationship, so whether they are the 1st or 50th co-author of a post you wont have to do major changes to the database schema or your queries.
This is like saying we should all use bicycles because motorcycles are more complicated. In your example, the second method is obviously more simple. However, when you get into massive tables and situations where it makes sense to separate different elements, then a relational database is the only feasible option. Just for example, what if one of the posts in your table has 5 authors? In a relational setup, it's easy to associate a single to many relationship. In a flat table, it's impossible without either setting an upper limit by using distinct additional author columns, or using a single cell with all of the authors in it and later breaking them out. Either way is restricting and inefficient and defeats the reason for using a database.
Relational databases is a pro however you see it. It does a lot of thing you normally need to do into PHP. Like when you delete a user... without it you need to make a script that deletes all the users records. With it you don't since innodb does it for you.
great analogy! I'm not that familar with inner/outter/left/right join syntax, and thats what lead to my asking my friend for help (seeing he is getting his MS in IT) I thought for sure he would be able to explain JOINs for me, seeing as how I am mainly self taught. Thanks for the great feedback!
Well what you put and what I put are effectively the same thing, and depending on what SQL Engine you use will actually generate pretty much the same query internally. But as I said your syntax is the pre-join syntax, your more commonly likely to see joins. Jen
There is no right or wrong answer. They are both right depending on the situation. The first one is correct if there is a one to many relationionship between authors and posts. Odds are since authors are going to write multiple instances it makes sense to have to tables. If an author is going to write only one post then the second one is the correct way of doing it. Rule of thumb with relational dabases when you are going to have many to many relationship then you have to create another table.
you can always make view with the relations and tables you want, and then just do simple selects on the views ;-)
What i do, is build a star model OLAP database *aka a relational database on crack* for the long term storage of data. For any front end application, I then run aggregrates which build a new semi-flat database or OLTP database. The reason you would do this, is flat=faster.
Your design is the one you should be using 99.99% of the time in real life. But your query is a bit redundant. You shouldn't need to filter on both author ID and author's first name, when the author ID is a primary key (i.e.: unique). You'll already have identified the one exact item you want when you specify the primary key, so adding another, more general definition is not necessary. Filtering on either of them makes sense, but both together, it'd be an unnecessary redundancy.
In his example he would have to do that, AND s.author_id = a.id Isnt filtering on any Author ID, but linking the posts and authors tables because he's not using a join where you are defining tables, but a join in the WHERE clause. Its a perfectly valid syntax for joining although does often cause the confusion you had. Jen