
Database design question

Discussion in 'Databases' started by Chetan Tudor, Jan 24, 2016.

  1. #1
    I have an app where users can log in (I created a users table, set up the columns, and it works well). The problem is that every user can add other users to a favorites list. What would be the best way to store the favorite_users list for all users? I know that I could set up a table like (id_parent_user, id_favorite_user), but if there are 500,000 users and each has 50 favorites, there would be 25,000,000 rows. Is there a faster storage design than this?
     
    Chetan Tudor, Jan 24, 2016 IP
  2. sarahk

    sarahk iTamer Staff

    Messages:
    25,548
    Likes Received:
    3,550
    Best Answers:
    103
    Trophy Points:
    665
    #2
    It depends on how you plan to use the list, but on the face of it you do need that huge table.

    If you only ever needed to display their favourites, you could serialize the list and store it in a text field, but you'd never be able to show a list of those following a user if you took that approach. I'd keep it simple and flexible.
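    A minimal sketch of that tradeoff (the schema and IDs here are hypothetical, with SQLite standing in for MySQL): the serialized list makes showing a user's own favourites one cheap fetch, but answering "who follows user X?" forces a scan and decode of every row.

```python
import json
import sqlite3

# Hypothetical schema: favourites serialized into a text column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, favourites TEXT)")
con.execute("INSERT INTO users VALUES (1, ?)", (json.dumps([2, 3]),))
con.execute("INSERT INTO users VALUES (2, ?)", (json.dumps([3]),))

# Showing user 1's favourites is one cheap row fetch:
favs = json.loads(con.execute(
    "SELECT favourites FROM users WHERE id = 1").fetchone()[0])
print(favs)  # [2, 3]

# But "who has user 3 in their favourites?" forces a scan of EVERY row:
followers = [uid for uid, blob in con.execute("SELECT id, favourites FROM users")
             if 3 in json.loads(blob)]
print(followers)  # [1, 2]
```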
     
    sarahk, Jan 24, 2016 IP
  3. Chetan Tudor

    Chetan Tudor Greenhorn

    Messages:
    35
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    6
    #3
    Unfortunately I have to show that list...
    So, I have to get used to that huge table.

    Other suggestions are welcome.
     
    Chetan Tudor, Jan 24, 2016 IP
  4. sarahk

    sarahk iTamer Staff

    Messages:
    25,548
    Likes Received:
    3,550
    Best Answers:
    103
    Trophy Points:
    665
    #4
    Make the table InnoDB.
    I'd expect MySQL to be able to handle that table.
     
    sarahk, Jan 24, 2016 IP
  5. Chetan Tudor

    Chetan Tudor Greenhorn

    Messages:
    35
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    6
    #5
    I have read a few articles saying that MySQL is supposed to handle even a 1-billion-record table pretty well. If the table is designed well enough, of course.
     
    Chetan Tudor, Jan 24, 2016 IP
  6. #6
    As long as you keep those numbers as NUMBERS (integer), and have the "parent user" ID indexed and keyed, the overhead of the table size should NOT be that big a deal -- particularly if you keep it SEPARATE from other data for the user.

    Most indexes these days are some form of binary tree or other additive/shifting sort; as such, pulling a value by ID shouldn't take more than a few dozen reads to find the first one, and nothing but sequential reads to find the next ones.

    Basically, so long as you created the table as something like:
    
    
    CREATE TABLE favorite_users (
      id_parent_user INT UNSIGNED NOT NULL,
      id_favorite_user INT UNSIGNED NOT NULL,
      PRIMARY KEY (id_parent_user, id_favorite_user),
      INDEX (id_favorite_user)
    ) ENGINE=InnoDB;
    
    Code (markup):
    without ANY other information in it, I wouldn't worry about the table size as much... Note I would index BOTH fields, as it would likely be useful to let users look up who has them as a favorite.
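    Both lookup directions can be sketched like this (SQLite standing in for MySQL, sample IDs invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE favorite_users (
    id_parent_user INTEGER NOT NULL,
    id_favorite_user INTEGER NOT NULL,
    PRIMARY KEY (id_parent_user, id_favorite_user))""")
con.execute("CREATE INDEX idx_favorite ON favorite_users (id_favorite_user)")
con.executemany("INSERT INTO favorite_users VALUES (?, ?)",
                [(1, 2), (1, 3), (2, 3), (3, 1)])

# "Who has user 1 favorited?" -- served by the primary key's leading column.
favs = sorted(r[0] for r in con.execute(
    "SELECT id_favorite_user FROM favorite_users WHERE id_parent_user = 1"))
print(favs)  # [2, 3]

# "Who has user 3 as a favorite?" -- served by the second index.
fans = sorted(r[0] for r in con.execute(
    "SELECT id_parent_user FROM favorite_users WHERE id_favorite_user = 3"))
print(fans)  # [1, 2]
```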

    Though in the 64-bit era, I'd be tempted to make those BIGINT for good measure. Any "real" difference in disk storage would get eaten up by the cluster sizes anyway... just in case you get more than 2.1 billion users.

    Even if you had over 2 billion users, if the index is an optimized adjusting btree or any other competent indexing system, the most 'looks' at the table it should take to find that first match is 31; at 4 billion that only goes to 32. MIND YOU, from time to time you may have to spend the time to re-optimize your indexes to make sure the "tree" depth is low, but really that's what good indexes are for. With numeric keys it can be BRUTALLY efficient -- which is why you are usually better off keying on integers.

    Splitting data into smaller tables by when/how it is used can also alleviate overhead woes. For example, having a table of just ID, username and hashed password can speed up login; you don't want to waste time reading the rest of that information from disk if you aren't using it right then and there! For login purposes, using the username as a unique primary key speeds things up a lot -- a concept that, laughably, a LOT of existing systems flat out ignore.
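    A rough sketch of that split (table and column names are invented here; SQLite standing in for MySQL): the login path touches only a narrow row keyed by username, with the wide profile data living in a separate table.

```python
import hashlib
import sqlite3

# Hypothetical narrow credentials table; wide profile data lives elsewhere.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE user_login (
    username TEXT PRIMARY KEY,  -- unique key: login is one index probe
    id INTEGER NOT NULL,
    pass_hash TEXT NOT NULL)""")

pw = hashlib.sha256(b"hunter2").hexdigest()  # use bcrypt/argon2 for real
con.execute("INSERT INTO user_login VALUES ('chetan', 1, ?)", (pw,))

# Login reads only this tiny row, never the rest of the profile:
uid, stored = con.execute(
    "SELECT id, pass_hash FROM user_login WHERE username = 'chetan'").fetchone()
print(uid, stored == pw)  # 1 True
```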

    With a lot of things like this, it just helps to understand what indexes are and, more importantly, how they work. BTrees are a very common and effective indexing method, more so if they are "self flattening" to keep the "depth" of the branches shallow.

    Let's say you had 7 records numbered 1..7. A "flattened" binary tree of that data would have a list of "pointers" indicating the "previous" and "next" items in the tree. A normal linear search would start at #1 and go through until it got to the one you're looking for -- let's say you were looking for #5... that would take five "reads" to get to the data. On a binary tree:

    
    
                   4
                  / \
                 /   \
                /     \
               2       6
              / \     / \
             /   |   |   \
            1    3   5    7
    
    Code (markup):
    a search for 5 would start at #4 (tree.firstNode), go "it's greater than this" and move to tree.firstNode.nextNode, which is 6. It sees that 5 is less than 6 and goes to tree.firstNode.nextNode.previousNode, coming to our 5. That means it took three operations instead of five to get to it.

    While this may make some searches -- like for 1 -- slower, with an optimized tree our 7-record search would never take more than three operations to get to any value.

    IF you do the math of a top-down search comparing the item index to the number of searches:

    Item 1 = 1 Search
    Item 2 = 2 Searches
    Item 3 = 3 Searches
    Item 4 = 4 Searches
    Item 5 = 5 Searches
    Item 6 = 6 Searches
    Item 7 = 7 Searches

    That would add up to 28 searches looking for all of those elements by index, for an average of 4. Compare that to our BTREE:

    Item 1 = 3 Searches
    Item 2 = 2 Searches
    Item 3 = 3 Searches
    Item 4 = 1 Search
    Item 5 = 3 Searches
    Item 6 = 2 Searches
    Item 7 = 3 Searches

    That's 17 searches, with an average of about 2.43 -- roughly 5/8ths the time to do the same thing.
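    The probe counts above can be reproduced with a plain binary search over the sorted keys 1..7 (a sketch, standing in for the index walk):

```python
def probes(values, target):
    # count the comparisons a balanced binary search makes to reach target
    lo, hi, count = 0, len(values) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if values[mid] == target:
            return count
        if values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return count  # not found

keys = [1, 2, 3, 4, 5, 6, 7]
counts = [probes(keys, k) for k in keys]
print(counts, sum(counts))  # [3, 2, 3, 1, 3, 2, 3] 17
```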

    BTrees gain in efficiency as the data set grows in size. Every time you add one level of "depth" to the tree you double the number of elements it can hold... if you had 255 elements, that would mean a maximum search depth of 8, where a linear top-down search would take up to 255 reads. BIG difference. It's all related to binary, since every time you add one bit of "depth" you double the possible value range; powers of 2.

    2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536, etc, etc, etc... By the time you get to 65K records, if the tree is self-flattening there's no reason for a lookup of ANY value to take more than 16 reads of the index! THAT'S powerful. (and freaking AWESOME!)
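    That worst-case depth is just ceil(log2(n + 1)) for a balanced tree of n keys -- a quick sanity check of the numbers quoted above:

```python
import math

def max_search_depth(n):
    # worst-case probes to find any key in a balanced binary tree of n keys
    return max(1, math.ceil(math.log2(n + 1)))

print(max_search_depth(7))          # 3
print(max_search_depth(255))        # 8
print(max_search_depth(65535))      # 16
print(max_search_depth(2**31 - 1))  # 31
print(max_search_depth(2**32 - 1))  # 32
```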

    There are other sorts and indexing methods, but they all work on the same principle of reducing the number of searches in the index needed to find the corresponding data -- which is why, on a well optimized and built indexing system, there's no reason for a ~2.147 billion record table to have a search depth by an indexed integer field of more than 32. Sequential reads for the same index are just more flat reads with no extra search depth needed, as the "nextNode" from the current index entry is the next item, assuming the ID matches.
     
    deathshadow, Jan 25, 2016 IP
    Nigel Lew, HCFGrizzly and sarahk like this.
  7. deathshadow

    deathshadow Acclaimed Member

    Messages:
    9,136
    Likes Received:
    1,697
    Best Answers:
    239
    Trophy Points:
    515
    #7
    Oh, some of you may notice I say "firstNode", "nextNode", and "previousNode" -- these may sound familiar, kind of like "firstChild", "nextSibling" and "previousSibling" in JavaScript. This is because pointers to other objects -- a "pointered list" -- are one of the easiest ways to create a binary tree. The DOM created by browsers in relation to HTML elements is very similar in structure and methodology, just for a different purpose... THOUGH, the tree-like structure of the DOM is one of the reasons CSS selectors (of all types) can be applied as quickly as they are.

    ... another reason CSSLint and Google PageSpeed's claims of tag selectors and complex selectors being "slow to render" is a bunch of ignorant bullshit!

    Objects -- more so object lists and trees -- often incur a good deal of memory and setup overhead, more so as you add new objects to the tree. Where they shine is when it's go time to apply values based on inheritance, or to search through the tree for specific items. When I learned OOP in Object Pascal, Modula and Smalltalk back in the '80s, binary trees were one of the first things they used as an example to teach the concepts.

    That's one of the big trade-offs of indexes -- they introduce more overhead on writes, as the index has to be updated when there's new data -- but they speed up reads greatly... and since in the majority of cases things like databases are "write once, read mostly" (WORM) on data, the extra overhead on write is well worth it.

    Same thing happened with disk compression at one point; the drives (this is back in the RLL/MFM/286/IBM AT days) were so slow to read and write to (the fastest of the time wrote at like 48k/sec and read at 250k/sec) that compressing data in realtime on the CPU meant less bus activity, resulting in faster disk access on any data that could be quickly and easily compressed. Sadly -- or thankfully depending on your point of view -- disks are now fast enough we don't see that anymore.

    I still remember a LOT of Clipper and Paradox databases that saw 50-70% speed improvement back then just from using SuperStor or Stacker... more so if you had the dedicated stacker hardware card. Of course that's where some of my disgust at data handling today comes from; dual processor quad core Xeons with HT and gigs of RAM being choked to death processing data sets I was handling in a fraction the time twenty years ago on a 486 with 32 megs of RAM and a 250 meg SCSI II drive, serving MORE users over Novell Netware.

    People, hardware and software seem to be choking on data sizes that were considered tiny in the mid 1990's. Of course, this stems from a complete lack of programming efficiency, ignorance of good practices, ENDLESS layers of abstraction, the ignorant nonsense of "more code will fix it" and the equally flawed attitude of "we can always throw more hardware at it".
     
    Last edited: Jan 25, 2016
    deathshadow, Jan 25, 2016 IP
  8. Chetan Tudor

    Chetan Tudor Greenhorn

    Messages:
    35
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    6
    #8
    Thank you very much, your explanations and details helped me a lot. I am lucky to have studied binary trees at school, so I understood your explanations.

    One last question left. Every user has a "profile settings" tab with 14 "options" about him that he can modify (4 of these are textareas, with a maximum of 255 characters). What would be the best way to store all of this, given that I have to retrieve all of it at once to be displayed on the "settings" tab when the profile is loading?
     
    Chetan Tudor, Jan 26, 2016 IP
  9. PoPSiCLe

    PoPSiCLe Illustrious Member

    Messages:
    4,630
    Likes Received:
    725
    Best Answers:
    152
    Trophy Points:
    470
    #9
    Just make a table with the user_id as primary key, and have the rest of the settings in columns -- since a user can never have more than one ID (I presume), fetching that information based on the ID of that user should be next to instantaneous.
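    A minimal sketch of that shape (abbreviated to a few of the 14 columns, names invented, SQLite standing in for MySQL): one row per user, fetched in a single primary-key lookup.

```python
import sqlite3

# Hypothetical settings table: one row per user, user_id as primary key.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE user_settings (
    user_id INTEGER PRIMARY KEY,
    display_name TEXT,
    bio VARCHAR(255),       -- one of the textarea fields, capped at 255 chars
    location TEXT)""")
con.execute("INSERT INTO user_settings VALUES (1, 'Chetan', 'about me', 'earth')")

# Loading the settings tab is a single primary-key fetch of one row:
row = con.execute("SELECT * FROM user_settings WHERE user_id = 1").fetchone()
print(row)  # (1, 'Chetan', 'about me', 'earth')
```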
     
    PoPSiCLe, Jan 26, 2016 IP
  10. Chetan Tudor

    Chetan Tudor Greenhorn

    Messages:
    35
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    6
    #10
    Every user has only one ID. I was concerned about having too many columns. Thank you.
     
    Chetan Tudor, Jan 26, 2016 IP
  11. PoPSiCLe

    PoPSiCLe Illustrious Member

    Messages:
    4,630
    Likes Received:
    725
    Best Answers:
    152
    Trophy Points:
    470
    #11
    While the number of columns can play in, in this case it's irrelevant, since you're gonna fetch the whole row anyway (all content from that row, all columns) -- hence, no problem. Anywho, "too many columns" is when you end up having to scroll sideways for a few minutes on a full-HD screen to see all of them.
     
    PoPSiCLe, Jan 26, 2016 IP
  12. Chetan Tudor

    Chetan Tudor Greenhorn

    Messages:
    35
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    6
    #12
    I got the point. Thanks again.
     
    Chetan Tudor, Jan 28, 2016 IP