Okay, I have stock quotes of the NYSE for the last 8 years for every single stock.. so daily quotes (high, low, open, close etc).. I have a few other columns, dollar change, percentage change etc.. now what i wanna do is filter through this data. So lets say i wanna find out give me all instances where a stock went up 10% over 5 consecutive days.. Right now I'm querying the database and checking to see if the condition holds by iterating through the data, is there any better way of doing this, because i'm going through about 6 million numbers and so its taking a long long time in PHP (about 20 minutes actually, with all the additional calculation, indexing etc)
there are few ways ... 1. Normalization of database structure and optimization of queries, if there are relatively large amount of tables involved try and use specific kinda joins like left join, right join .. etc 2. If you are having data in MySQL 5 database it will be good idea to use stored procedure as there are lot of advantages of using stored procedures over inline queries due to pre-compiled statements. 3. If you know the search types which are very frequently getting used try and cache the results in diffrent tables or use XML. 4. It will be a bit difficult and time consuming in early phase for tagging the results and storing them in diffrent tables ... as in case of stockes result from march 2001 - 2005 its not gonna change ..you can add one more column to table and tag them with the month year and also can cache them in XML's You also can come up with few more ideas for optimization, but it will depend on your system requirements and fisibility.
Here is my code: I know I call the single query function a bunch of times, which adds delay.. I know I should change the order of the if statement in the compute single function too.. this script takes about 20 minutes to run.. function computeQuery($ex, $volume, $change, $period){ $query = "select DISTINCT symbol from $ex order by symbol desc"; $res = mysql_query($query); $numberOfRows = mysql_num_rows($res); for ($foo=0, $k=0; $foo<$numberOfRows; $foo++, $k++%4){ $r = mysql_fetch_row($res); $symbol = $r[0]; $criteria = computeSingle($symbol,$ex, $volume, $change, $period); if ($foo%2==0){ $return_this .="<tr><td class=seperate><a href=results.php?symbol=$symbol&exchange=$ex&change=$change&volume=$volume&period=$period>$symbol</a></td><td class=seperate>$criteria</td>"; } else{ $return_this .="<td class=seperate><a href=results.php?symbol=$symbol&exchange=$ex&change=$change&volume=$volume&period=$period>$symbol</a></td><td class=seperate>$criteria</td></tr>"; } } return $return_this; } function computeSingle($symbol, $exchange, $volume, $change, $period){ $query = "Select time,volume,percentage_change,avg_volume,williams,rsi from $exchange where symbol = '$symbol' order by time desc"; $result = mysql_query($query); $numrows = mysql_num_rows($result); $dec = 0; $criteria = 0; $calc_percentageChange = 0; $calc_volume = 0; $calc_avgVolume = 0; for ($i=0, $j = $numrows - 90, $k=0; $i <$j && $i < $period; $i++, $k = ($k+1)%$period){ $row = mysql_fetch_row($result); $res_time[$k] = $row[0]; $res_volume[$k] = $row[1]; $calc_volume += $row[1]; $res_percentageChange[$k] = $row[2]; $calc_percentageChange += $row[2]; $res_avgVolume[$k] = $row[3]; $calc_avgVolume += $row[3]; } $calc_volume; $calc_avgVolume; if ($calc_volume >= $calc_avgVolume * $volume && $calc_percentageChange <= $change){ $criteria++; } for ($i=0, $j = $numrows - 90, $k=0; $i <$j; $i++, $k = ($k+1)%$period){ $row = mysql_fetch_row($result); $calc_volume += $row[1] - $res_volume[$k]; $calc_percentageChange += $row[2] - $res_percentageChange[$k]; $calc_avgVolume += $row[3] - $res_avgVolume[$k]; $calc_volumeP = $calc_volume; $calc_avgVolumeP = $calc_avgVolume; $res_time[$k] = $row[0]; $res_volume[$k] = $row[1]; $res_percentageChange[$k] = $row[2]; $res_avgVolume[$k] = $row[3]; if ($calc_volume >= $calc_avgVolume * $volume && $calc_percentageChange <= $change){ $criteria++; } } return $criteria; } Code (markup):
1. No joins or anything like that.. so that cant be the problem 2. Not using MySQL 5, but i might be able to upgrade, would this be a huge speed advantage? 3. Caching results in XML probably isnt very advantagous.. the same query wont be ran twice, so i doubt it is needed 4. Not really sure what u mean
sadhakim can you please share mysql database structure ... so that we can understand exact relation ship within the tables and also data types you are using ...
very basic data structure... one table for each exchange, symbol varchar(4) time int unsigned open float close float high float low float dollar change float percentage change float volume int avg volume int unique(symbol, time) i'm looking at about 4 million entries into the table, still not very big, i doubt that the database structure is the problem..