
Free : asynchronous walker for deep recursive objects, with callbacks.

Discussion in 'JavaScript' started by seductiveapps.com, Nov 21, 2014.

  1. deathshadow

    deathshadow Acclaimed Member

    Messages:
    9,732
    Likes Received:
    1,998
    Best Answers:
    253
    Trophy Points:
    515
    #21
    Please remember that ALL the things I'm saying that upset you and that you dislike are still me trying to help you make better websites. Most everything you are saying and your methodology runs 100% contrary to my 30+ years of experience -- almost like you've been taking Kevin Yank's books to heart or something. You are likely finding a lot of people having the same reaction but failing to articulate it well. See that thread which blew up into little more than personal attacks -- attack the work, not the person; that's my motto. I was almost going to contact you to say "sorry", since many of the posts in that (now locked) thread were insults WITHOUT explanation; surprisingly from people who are usually far more moderate, forgiving and apologetic than I am (and that surprised me a lot).

    Remember, I might be insulting the work and questioning your reasoning, but at least I'm taking the time to try and explain WHY.

    Thing is, if using things like indexes and word searches means less disk access and better caching models, it would be LESS overhead than reading the entire thing; if it means transmitting less data client-side, it would mean LESS overhead than reading the whole thing. It's why client-side processing of entire data sets didn't catch on when people tried it with XML ten years ago and fell flat on their faces, and there's NO sign of that changing any time soon! In fact the impending bandwidth crunch shows we may be in store for the exact opposite... :( That's why certain search engines have started penalizing slow-loading, scripting-heavy pages built from dozens if not hundreds of files.

    ... and really, given the number of records in your example, you should be able to host dozens of such sites on as little as an Atom 330 or a two-generation-old single core P4D COMFORTABLY, so long as disk throughput is up to snuff. Hell, you'd probably be able to host a dozen or two such sites off as little as a $10/mo or less VPS in this day and age.

    Which is a very small amount of data in terms of filenames and descriptions from a database perspective. Again, indexes and proper use of them should make that a non-issue. If searching descriptions were to become unwieldy, then you leverage a search-specific engine like Sphinx or build your own word search index table.

    Given what I saw of your "menu" where the hover effects ALONE (even on v8 browsers) took ~20 seconds to even show up, and trying to select anything sent the browser off to "Not responding" land, I'm wondering how that's even possible. Particularly since I've got equal or better hardware to that.

    Much less that my own attempts at using a simple array of ~1000 objects for a Canvas-based project did the same, tripping the 30+ seconds it takes for the browser to bitch about the script not releasing in time. Five times that many? How would that even run?

    Of course if you are calling from a menu, why would it need a "search" in the first place... Or was that one of the things that you said was in that menu that I was unable to even find? You actually said a lot of things about your "menu" in that other thread that didn't even seem to exist here...

    Case sensitivity? More complex searches? Whitespace neutrality? indexOf is a cute toy when you KNOW exactly what is being searched for (like a file extension you know is fixed case) but has some... issues when trying to process user input. Unless you're going to store a fixed-case copy of the description and/or filename, and/or force the case of every search term, regex would actually be FASTER.

    Also that pesky "has to load the regex" penalty? Can be mitigated by creating the regex OUTSIDE the loop if you're going to go through multiple results.
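    The point above can be sketched in a few lines. This is a hypothetical example (the function name and file list are my own, not from the thread): the RegExp is built ONCE, outside the loop, so the pattern isn't re-parsed on every iteration, and the `i` flag handles case without forcing copies of the data.

    ```javascript
    // Case-insensitive search over a list of filenames, with the regex
    // compiled once, before the loop, instead of on every iteration.
    function searchFiles(files, term) {
      // Escape regex metacharacters in the user's input first.
      var escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      var re = new RegExp(escaped, 'i'); // built ONCE, reused below
      var matches = [];
      for (var i = 0; i < files.length; i++) {
        if (re.test(files[i])) matches.push(files[i]);
      }
      return matches;
    }

    // e.g. searchFiles(["/bg/River Sunset.jpg", "/bg/city.jpg"], "river")
    // matches the first file despite the case difference.
    ```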

    Usually

    I would think that would, as I said, depend on whether you are using verbose vs. short and cryptic property names. Again though, a few sample records would tell the real story.

    That seems REALLY unlikely unless you haven't got proper caching models set up... and even then. Hell, when it comes to bandwidth-restricted throughput, gzipped transmissions are ALWAYS faster since they mean less IOWAIT at the bandwidth level...

    Well, unless you don't have enough traffic; don't get deluded by a complete and utter lack of traffic as the numbers can be WAY different once your pipe gets saturated. One person accessing it from the LAN vs. a thousand people accessing it via a WAN are two whole different worlds -- and even 100mbit doesn't get you very far once you have multiple people accessing it. That's why I keep saying "You're going to kill whatever you are hosting it on".


    I didn't think so, it was just one of the few things I could think of that would explain why you are making things so needlessly convoluted and/or putting so much time and effort into what should be a non-issue.

    For things like descriptions, how are you handling that? As PoPSiCLe mentioned, it also seems odd to search by filename since you'd have to be VERY strict about proper file naming conventions -- which is why typically you don't search by name, you search by its description field in a database or by adding tags. Particularly since on the web, spaces in filenames are bad practice that tends to bite you sooner than later. (Besides, all those %20s look fugly as hell.)

    Tags and other metadata just rock from an efficiency standpoint, even if it is a lot of work generating them -- particularly from an existing data set that may lack them; but that's why, if you have actual text descriptions, it often helps to build a proper word search index. Explode the description into its words stripping punctuation, total up the instances for "weight", strip out any "block" words like conjunctions and pronouns, and store it indexed. You pull an initial result set by matching against that indexed word table; then if you want a perfect match you run that against the smaller sub-result set.
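    A minimal sketch of the word-index idea just described -- the names and the stop-word list are my own, and a real version would store this in an indexed table rather than an in-memory object. Each description is exploded into lowercase words, punctuation is stripped, "block" words are dropped, and occurrences are totalled as a crude weight for ranking.

    ```javascript
    // Hypothetical stop-word list ("block" words: conjunctions, pronouns, etc.)
    var STOP_WORDS = { the: 1, and: 1, a: 1, of: 1, with: 1, on: 1, it: 1 };

    // Builds word -> { recordId: weight } from records of shape
    // { id: ..., description: "..." }.
    function buildWordIndex(records) {
      var index = {};
      records.forEach(function (rec) {
        var words = rec.description
          .toLowerCase()
          .replace(/[^\w\s]/g, '') // strip punctuation
          .split(/\s+/);
        words.forEach(function (w) {
          if (!w || STOP_WORDS[w]) return;
          index[w] = index[w] || {};
          index[w][rec.id] = (index[w][rec.id] || 0) + 1; // total up for "weight"
        });
      });
      return index;
    }
    ```

    The first-pass lookup then hits this index to get a small candidate set, and any exact-phrase matching only runs against that sub-result set.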

    ... and of course if you have result caching configured correctly, it won't even need to go to disk. That's a big thing though about hosting -- RAM is king. I'm a firm believer that in most cases if you program efficiently you don't need a multi-core hyper-ghz (faster than 2ghz) processor for a web server if you can throw RAM at the problem.

    Which from my experience means a million+ post forum averaging nearly a thousand new posts a day from ~800 active posting users (out of 4K+ registered) and serving out nearly 20K+ reads a day only needs a 2ghz P4D or Atom 330, if you can give it just TWO gigs of RAM and properly configure mySQL and set up a PHP bytecode cache like APC.

    Which is a laugh when some people's forums jump through all sorts of goofy hoops: HTTP caching servers like Varnish, content delivery networks when they barely have enough files to bother (usually because they just blindly believe Google Pagespeed without thinking about it), and a host of other nonsense, all basically to cover up developer ineptitude -- ineptitude like using multiple megabytes in a hundred or more files to deliver 15k of plaintext and a half dozen content images that shouldn't even break 300k.

    But either way, those numbers make your record counts seem like piss in a can by comparison to what a ten-year-old middle-of-the-road server could handle. A modern VPS on a (not oversold) Xeon? Overkill.

    I think you are either severely underestimating the server hardware, or aren't configuring it right. Or doing something really stupid like trying to use Winblows as a server... but honestly given how knee deep in code you seem to want this to be, that seems unlikely.

    Though if you are testing on home based hosting, you simply don't have enough "distance" from your hosting to have any idea what the rest of the world is seeing. You might want to try picking up a cheap VPS account ($8 to $10 for a blank one MORE than capable of handling the data sets you describe) for a month or two just so you have some idea what it's like to the rest of the world... In the thread that blew up into a bunch of "*** you pal" (which was SO helpful -- not) you seemed really shocked by responses that again, to be brutally frank are ENTIRELY expected given how you are trying to go about things; and I'd be surprised if on any other forums people responded any different -- well, excepting maybe the limp, soft, brown-nosing apologetic suckups over at another site that has the word point in it; but that's because they are fine with telling people how to fail so long as it dupes them into buying one of their books.

    That it shocks you and the noodle-doodle fantasy-land numbers and performance claims you have means that there must be SOMETHING different on your end... Maybe sitting on top of the server has something to do with that?

    Just trying to figure out why the things you are saying seem like fantasy-land delusions on this end. There has to be something awry on your end for that... statements like "gzipping takes longer" support that conclusion. Hell, on information that compresses more than 30%, running realtime disk compression used to result in FASTER disk access back in the 286/ISA days; I can't imagine it's an issue with today's multi-ghz machines and SATA unless your caching is misconfigured.

    Out of curiosity, what are you running right now for a server OS and HTTP Server? I'm just trying to figure out how/why you are making claims that seem to have NOTHING to do with how any of this should/would/could work. The reason has to be there somewhere.
     
    deathshadow, Dec 3, 2014 IP
  2. seductiveapps.com

    seductiveapps.com Active Member

    Messages:
    200
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    60
    #22
    :)

    1) If you want a website to scale to many users using it at the same time, you should decrease the workload of the server wherever realistically possible imo :)
    1.1) If you want to keep your business costs down (aka not invest in fancy top hardware for the servers), you should again make sure the server doesn't do work the client can do (with ease btw, see (3) in https://forums.digitalpoint.com/thr...objects-with-callbacks.2738977/#post-19084611)
    2) Takes about 0.01 seconds if you let the client do it :) And no trip to the server, which is always good imo
    3) The only overhead in a non-LAN scenario is the download of the background image itself (until I get fiber of course).
    4) I'm not a fan of SQL in web browsers, certainly not while developing apps. The overhead of writing for, and administering, SQL servers is just too much imo.
    5) Meta-tags are in the folder names of the photo album I suppose, but you can't get as much detail into a folder/metatag structure as you can into filenames.
    6) ok
    7) The JSON is as follows:
    
    backgrounds : {
      files : [
       "/path/to/image1.jpg",
       "/path/to/image2.jpg",
       etc
      ]
    }
    
    Code (markup):
     
    seductiveapps.com, Dec 4, 2014 IP
  3. seductiveapps.com

    seductiveapps.com Active Member

    Messages:
    200
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    60
    #23
    Look, I'm not going to use any tech stacks I don't need. You say my way is 'convoluted'. I'd say it's actually the shortest amount of code to get the job done. By FAR.
    And I'm not even talking about the development and administration overhead (and thus fragility) of your proposed SQL solution for my photo album, which is also used as the background picture collection for my site.

    As for whitespace issues and such: indexOf is fine if you strip out the whitespace of the search query and treat each word in the search query as something separate to .indexOf on.
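    For what it's worth, the approach described above can be sketched in a few lines -- a hypothetical example with names of my own choosing: the query is split on whitespace, and every word must .indexOf()-match the filename, with both sides lowercased to sidestep the case-sensitivity issue.

    ```javascript
    // True if every whitespace-separated word of the query occurs
    // somewhere in the filename (case-insensitive).
    function matchesAllWords(filename, query) {
      var haystack = filename.toLowerCase();
      var words = query.trim().split(/\s+/);
      for (var i = 0; i < words.length; i++) {
        if (haystack.indexOf(words[i].toLowerCase()) === -1) return false;
      }
      return true;
    }

    // e.g. matchesAllWords("River with seagulls.jpg", "river seagulls")
    // matches, while matchesAllWords("city.jpg", "river") does not.
    ```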
     
    seductiveapps.com, Dec 4, 2014 IP
  4. PoPSiCLe

    PoPSiCLe Illustrious Member

    Messages:
    4,623
    Likes Received:
    725
    Best Answers:
    152
    Trophy Points:
    470
    #24
    Hm. But... okay. I can see that the json-return of the filenames is not too bad, but if I understand you correctly, that "image1.jpg" isn't a real scenario, it's just an example - and again, if I understand you correctly, you'll have something like: "beautiful_river_with_seagulls_flying_overhead_and_people_sitting_on_the_grassy_knoll_next_to_the_ferry_landing.jpg" or something like that? Which... doesn't seem at all practical to me. Or good. Or fast.
    Again, how is it in any way more overhead to use a database backend? Perhaps code-wise you'll have to code a db-manager of some sort, but for updating, adding, fetching and sorting under 10k files? It's a 30-40 line PHP script.
    But ok, you don't like SQL/databases, for some reason. Not sure I understand why, but sure. That's your prerogative - I just don't think what you're suggesting is a better way, or more practical. Not to mention more or less unscalable - what if you do have a photo set with a million pictures? It will never, ever be able to utilize your system. It will grind any browser or server to a halt within a minute.
     
    PoPSiCLe, Dec 4, 2014 IP
  5. seductiveapps.com

    seductiveapps.com Active Member

    Messages:
    200
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    60
    #25
    One of the previous incarnations of my website framework used SQL, and I got extremely frustrated with the development overhead of putting everything in tables, which are 2 dimensional by nature. In javascript land, it's soooo much easier and nicer to be coding in "folder datastructures" like JSON. So SQL is something I only use for specific tasks where it is actually superior to JSON.

    And the proper naming of photo album images is definitely a plus for people looking for something specific (like "river -sunset" as a search query) in a fairly big photo collection. I use spaces instead of underscores btw :)
     
    seductiveapps.com, Dec 4, 2014 IP
  6. PoPSiCLe

    PoPSiCLe Illustrious Member

    Messages:
    4,623
    Likes Received:
    725
    Best Answers:
    152
    Trophy Points:
    470
    #26
    As previously stated, spaces in filenames should really be avoided.
    Besides, I'm unsure as to why storing this in a database would be any different than storing it in a json-list.
    True, a database-table is by definition two-dimensional, but that's why you have relations - for instance, to cater for a folder-structure, you could have a table for folders, or just a column for folder-path in whatever table you store the filename in.
    I still fail to see the benefit a specific filename would have over just tagging the image with proper tags.
    Say the names of the files are simply background_1 to background_200000 - each file could have an indefinite number of tags, if you want, all of course pertaining to the actual motif. If you use a proper word-search lookup scheme, this could be next to instantaneous, and just return whatever matches the search (or, if you want, you could match the search and also, if there are no exact matches, do a weighting on parts or each of the search criteria, and pull up "maybe this" or "similar, but not quite"). All doable via JSON also, probably, but I can't help but think that the filenames will become unnaturally long if you want to cater for everything - given that a filename generally also has a maximum of 255 characters, this could become a problem, or limit the possibilities quite quickly.
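    A rough sketch of the tag-weighting idea above, in client-side terms for comparison -- the data shape and names are assumed, not from the thread. Each image carries a tag list; results are scored by how many query tags they match, so exact matches sort first and partial matches surface as "maybe this".

    ```javascript
    // images: [{ file: "background_1", tags: ["river", "sunset"] }, ...]
    // queryTags: ["river", "sunset"]
    function searchByTags(images, queryTags) {
      return images
        .map(function (img) {
          var score = 0;
          queryTags.forEach(function (t) {
            if (img.tags.indexOf(t) !== -1) score++; // one point per matching tag
          });
          return { file: img.file, score: score };
        })
        .filter(function (r) { return r.score > 0; }) // drop non-matches
        .sort(function (a, b) { return b.score - a.score; }); // best first
    }
    ```

    With tags stored in a database and a word-search index behind them, the same scoring would happen server-side over a candidate set rather than the whole collection.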
     
    PoPSiCLe, Dec 4, 2014 IP
  7. seductiveapps.com

    seductiveapps.com Active Member

    Messages:
    200
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    60
    #27
    And that SQL data modelling development overhead is exactly what I'm so happy to have left behind me.
    I also don't like data spread out between a filesystem and a SQL database; keeping it on just the filesystem is a lot better imo.

    The filenames hardly get longer than 100 chars btw, and spaces in filenames these days are zero problem.
     
    seductiveapps.com, Dec 4, 2014 IP