Hey guys, I'm doing an on-page filter type of thing and trying to get the fastest function possible. Is anyone able to speed this up any further? I have zero interest in your opinion on whether or not this is something JavaScript should be used for, so don't waste your breath; I'm just looking to improve performance.

https://jsbin.com/wotisuyalu/1/edit?html,js,console

How it should work: data is structured like this:

    var articles = {
        1: {id: 1, title: 'poop', category: 2},
        23: {id: 23, title: 'poop 2', category: 4}
    };

Output should be a new object structured the same way. I'm more concerned with execution time than JavaScript page weight, so feel free to use any reasonably sized library to help. I'm currently using lodash.
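In case the bin ever dies, the gist of what I've got is something like this -- paraphrasing from memory, the real code is in the bin:

    // Rough paraphrase of the function in the bin, NOT a copy-paste;
    // details may differ. _.filter over the object returns an array,
    // _.mapKeys rebuilds an object keyed by id.
    function filterArticles(search, status) {
        var query = _.lowerCase(search),
            filtered = _.filter(articles, function (article) {
                var matches = query.length
                    ? _.lowerCase(article.title).indexOf(query) > -1
                    : true;
                if (status === 1) return matches && article.published;
                if (status === 2) return matches && !article.published;
                return matches;
            });
        return _.mapKeys(filtered, function (article) {
            return article.id;
        });
    }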
First, there's nothing wrong with doing this as an ENHANCEMENT of existing server-side delivery, or if this is server-side JS, OR if this is an in-house non-public-facing applet scenario. It's only on websites that it's a bad idea, and even then there ARE exceptions (such as scripted-only games). That client-side load WILL suck if put on a normal website, so I wouldn't do it there, but there ARE places the approach is valid. Even I'm not that big a jackass about this.

It looks like you've got some sort of library replicating EXISTING FUNCTIONALITY... oh hey, it's loadass. Excess function calls increase execution time and overhead, so ditch those. Such as _.lowerCase, which I assume is just a wrapper for String.toLowerCase? Same goes for _.filter... I'm assuming these are arrays, so why not the NATIVE Array.filter? (It's not like you can't polyfill Array directly if you NEED legacy support.) This is part of why I dislike lodash: it replicates shit that didn't need to be replicated! You've got a perfectly good array there, use normal everyday Array.filter! That they derpingly call a callback a 'predicate' is just more of that obtuse language I'm always bitching about. See the sketch at the end of this post for what I mean.

Next, in MOST browsers (in Quantum it's a wash) anonymous functions are slower than static functions, so unless you need the scope binding you might be able to squeeze more speed out of it by not using anonymous functions as your callbacks.

A hair of speed could also be gained if you mutated parameters instead of creating new variables. Such as 'query', where you never use it after you override its case, so why not just case-convert it (toLowerCase) directly instead? Also, skipping semi-colons makes the parser work harder... and strange as it sounds, storing the length in its own variable is NOT actually gaining you a performance boost here. (When and where that gives you a boost is often counterintuitive; lifting is weird.)

Oh, and your test loop is only looping your FIRST action; you probably meant to loop ALL of them. Also, the amount of time the test takes to run is so low it's below the 'jitter factor' for the timer granularity of performance.now(). It's why I prefer to loop until timer rollover, then loop to see how many runs I can fit inside a specified period of time. So, something more like:

    var start = performance.now(),
        stop, // declare it here instead of leaking a global
        iterations = 0;

    // spin until the timer actually ticks so we start on a fresh edge
    do {
        stop = performance.now();
    } while (start == stop);

    stop += 1000; // time to run, in milliseconds

    do {
        filterArticles('a', 1);
        filterArticles('b', 2);
        filterArticles('th', 2);
        filterArticles('dog', 1);
        filterArticles('a');
        filterArticles('b');
        filterArticles('th');
        filterArticles('dog');
        iterations++;
    } while (performance.now() < stop);

    console.log(iterations);

Trying to use a fixed single run, or even a fixed number of runs and taking the time elapsed, can be quite flawed. You're MUCH more likely to get proper results waiting for the timer to update, then running for a fixed period of time. It also lets you set up large tests without significant worry of browser timeout. (The #1 mistake I've seen in JS benchmarking tests!)

Lemme see... bah, JSBin's malfing autocomplete and uselessly tiny little editor boxes are fighting me, so copy to local and adjust there... Hmm. You know, if anything the mapKeys is probably what's ACTUALLY taking the longest here! I'd be tempted to bench that separately somehow. Aha...
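To put the "use the native stuff" bit in code -- a quick untested scribble, names made up on the spot, not anything from your bin -- swapping the wrappers for built-ins and using a static callback instead of an inline anonymous one looks like:

    // Illustrative only: native String.toLowerCase and Array.filter,
    // no lodash involved. The callback is declared ONCE at outer scope
    // instead of being recreated on every call.
    var query = ''; // set by search() before filtering

    function titleMatches(article) {
        // indexOf('') is 0, so an empty query matches everything
        return article.title.toLowerCase().indexOf(query) > -1;
    }

    function search(list, text) {
        query = text.toLowerCase();       // case-convert once, up front
        return list.filter(titleMatches); // static callback, native filter
    }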
OK, what's REALLY your bottleneck here is the constant swapping back and forth between array and object, array and object, using loadass's "oh, we'll treat all objects as arrays in as inefficient a manner as possible" approach.

FIRST thing I'd suggest is that if you want an object indexed by ID, STORE THEM AS IDs! But for now let's work with what you have and kick fat loadass to the curb. So first we'll make a quick little routine to convert your array to an object:

    var articles = {};
    for (var i = 0, e; e = articlesArray[i]; i++) articles[e.id] = e;

Then let's rewrite your routine to ALWAYS work on it as an object, instead of loadass's dumbass convert-to-array-back-to-object-to-array rubbish:

    function filterArticles(search, status) {
        search = search.toLowerCase();
        var statusCallbacks = [
                // 0 == any, 1 == published only, 2 == unpublished only
                function(article) {
                    return search.length ? article.title.toLowerCase().indexOf(search) > -1 : true;
                },
                function(article) {
                    return (
                        search.length ? article.title.toLowerCase().indexOf(search) > -1 : true
                    ) && article.published;
                },
                function(article) {
                    return (
                        search.length ? article.title.toLowerCase().indexOf(search) > -1 : true
                    ) && !article.published;
                }
            ],
            // fall back to callback 0 if status is missing or out of range
            useCallback = statusCallbacks[statusCallbacks[status] ? status : 0],
            results = {};
        for (var i in articles) if (useCallback(articles[i])) results[i] = articles[i];
        return results;
    }

Yeah, there we go. Just work on a new object directly during creation. Your original gave me 320 passes on average; this rewrite does 2210, so basically seven times faster. There may be other optimizations to consider, like removing the use of callbacks altogether -- Quantum's newest version of SpiderMonkey probably wouldn't care, but it can have an impact in Chakra and V8. F*** knows what the turds that are Trident and Nitro would do.

Some quick tests, averages over ten runs with 5 second cooldowns, higher is better:

                                Original   Rewrite   Improvement
    Celeron J1900, 8 gigs RAM
      FF Quantum                      61       585         9.59x
      Vivaldi (Chromium)             118      1047         8.87x
      Edge                            46       523        11.36x
      IE11                            48       476         9.91x
    i7 4770k, 24 gigs RAM
      FF Quantum                     148      1145         7.73x
      Vivaldi (Chromium)             310      2322         7.49x
      Edge                           121       990         8.18x
      IE11                           123      1149         9.31x

The IE11 numbers surprised me there, but the JavaScript engine in IE isn't the real bottleneck -- the reason IE JavaScript always ended up so slow is how painfully inefficient the DOM, parser, and renderer were (ESPECIALLY the damned parser, which is why innerHTML was such trash), particularly given there was NO multithreading even of the render process. Since this doesn't even TOUCH the DOM or live document, we can see the JS performance without that interfering.

Further improvements such as removing the use of callbacks by unrolling the loop MAY be able to provide speedups in some engines... Nah, I just tried it; it provides no real measurable improvement, well within the expected 'jitter'. Hmm... objects? Nope, drags performance back down to loadass's way of doing things... well, not THAT bad, still about twice as fast as your original, but why settle for 2x when you can possibly reach 11x?

THIS is why I say so many JavaScript frameworks don't make you 'think' right about solving problems. They don't even CONSIDER the performance impact of constantly derping back and forth between array and object! The flawed methodology of just using lodash for this AGAIN shows something I'm always saying about these frameworks -- it's slower, more complex, and MORE code than if you did it without the framework!
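Just so it's obvious how the two pieces hook together, here's a quick throwaway demo -- the article data is made up, obviously:

    // Illustrative usage of the conversion routine plus the rewrite above.
    var articlesArray = [
        { id: 1, title: 'poop', category: 2, published: true },
        { id: 23, title: 'poop 2', category: 4, published: false }
    ];

    // build the id-indexed object ONCE, up front
    var articles = {};
    for (var i = 0, e; e = articlesArray[i]; i++) articles[e.id] = e;

    console.log(filterArticles('poop', 1)); // { 1: {...} }  -- published title matches only
    console.log(filterArticles('', 2));     // { 23: {...} } -- unpublished, any title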
See how the rewritten function is only 741 bytes whilst the original was 1052? When I say "smaller, faster, simpler, and easier WITHOUT the dumbass frameworks" I usually know what I'm talking about. I tossed live examples up here: http://www.cutcodedown.com/for_others/KewL/filter/ Hope this helps.
Oh, if you were to work off the original array instead of the converted object, you can speed it up even MORE!

    function filterArticles(search, status) {
        search = search.toLowerCase();
        var statusCallbacks = [
                function(article) {
                    return search.length ? article.title.toLowerCase().indexOf(search) > -1 : true;
                },
                function(article) {
                    return (
                        search.length ? article.title.toLowerCase().indexOf(search) > -1 : true
                    ) && article.published;
                },
                function(article) {
                    return (
                        search.length ? article.title.toLowerCase().indexOf(search) > -1 : true
                    ) && !article.published;
                }
            ],
            useCallback = statusCallbacks[status || 0] || statusCallbacks[0],
            results = {};
        for (var i = 0, article; article = articlesArray[i]; i++)
            if (useCallback(article)) results[article.id] = article;
        return results;
    }

I added that to the directory as "new2". Updated bench chart:

                                Original   Rewrite   Improvement   New2   Improvement
    Celeron J1900, 8 gigs RAM
      FF Quantum                      61       585         9.59x    892        14.62x
      Vivaldi (Chromium)             118      1047         8.87x   1476        12.51x
      Edge                            46       523        11.36x    683        14.85x
      IE11                            48       476         9.91x    927        19.31x (WTF?!?)
    i7 4770k, 24 gigs RAM
      FF Quantum                     148      1145         7.73x   1963        13.26x
      Vivaldi (Chromium)             310      2322         7.49x   3249        10.48x
      Edge                           121       990         8.18x   1414        11.68x
      IE11                           123      1149         9.31x   1804        14.66x

Array indexing is faster than objects, more so if you use the "evaluate as assignment" looping method, demoed below. (Best used with arrays of objects or other collections where no element ever evaluates loosely false, since the loop stops at the first falsy entry!) The filter will still give you your key-mapped object as the result even working off the original array... I'm assuming you had a reason for wanting that. So maybe having that raw array as the starting point isn't such a bad idea.
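Since people ALWAYS trip over that looping method, a quick throwaway demo of the gotcha -- toy data, nothing from the benchmark:

    // "Evaluate as assignment" loop: the condition IS the assignment, and
    // the loop ends when arr[i] comes back undefined (or any falsy value).
    var arr = [{ id: 1 }, { id: 2 }, { id: 3 }];
    for (var i = 0, e; e = arr[i]; i++) console.log(e.id); // 1, 2, 3

    // The gotcha: a falsy entry (0, '', null, false) ends the loop early.
    var nums = [1, 2, 0, 3];
    for (var j = 0, n; n = nums[j]; j++) console.log(n); // 1, 2 -- stops at the 0!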
Got an even faster method that's a fraction of the code, but requires a slight change in thinking. I keep playing with this on and off through my insomnia. Klytus, I'm bored... what plaything can you offer me today?

    function filterArticles(search, published) {
        search = search.toLowerCase();
        // undefined means "don't care", same as an explicit null
        if ('undefined' == typeof published) published = null;
        for (var i = 0, results = {}, article; article = articlesArray[i]; i++)
            if (
                (
                    (search.length && (article.title.toLowerCase().indexOf(search) > -1)) ||
                    !search.length
                ) && (
                    null == published ||
                    article.published === published
                )
            ) results[article.id] = article;
        return results;
    }

It accepts true, false, null, or undefined (aka optional) as the 'published' parameter; I renamed it to make a bit more sense in terms of what it is actually checking. 456 bytes of code, and it runs about two to five percent faster than the version using callbacks.

Leveraging === and typeof can often make for a faster and more consistent way of checking for true, false, or either; just use null/undefined as the 'either'. Short-circuit evaluation is a friend too, particularly since JS is very strict about how that works. In theory the if statement should make this slower by doing the checks inside the loop, but in practice removing the overhead of the function call in there gains you more than you lose. The check for a literal boolean vs. null is also faster than the numeric 'status' you were using.

This is 'new4' in the directory. Naturally the test cases had to be changed thusly:

    var start = performance.now(),
        stop,
        iterations = 0;

    do {
        stop = performance.now();
    } while (start == stop);

    stop += 1000; // time to run

    do {
        filterArticles('a', true);
        filterArticles('b', false);
        filterArticles('th', false);
        filterArticles('dog', true);
        filterArticles('a');
        filterArticles('b');
        filterArticles('th');
        filterArticles('dog');
        iterations++;
    } while (performance.now() < stop);

    document.body.appendChild(document.createTextNode(iterations));
    console.log(filterArticles('dog'));
    console.log(filterArticles('dog', true));
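To show the ===/typeof tri-state trick in isolation -- another throwaway demo of mine, not part of the bench:

    // true/false filter on a flag, with null/undefined meaning "don't care".
    // Strict === means truthy junk like 1 or 'yes' won't false-match a boolean.
    function matchesFlag(value, want) {
        if ('undefined' == typeof want) want = null;
        return null == want || value === want;
    }

    console.log(matchesFlag(true, true));  // true  -- explicit match
    console.log(matchesFlag(true, false)); // false -- explicit mismatch
    console.log(matchesFlag(false));       // true  -- "either" via undefined
    console.log(matchesFlag(1, true));     // false -- 1 !== true under ===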