1EightT, thanks for the hard work and the URL. As I suspected, the search was very slow. A search on your site for the simple keyword 'home' took over 10 minutes (I closed the browser at the 10 minute mark). It makes one really appreciate how Google performs searches so quickly. I've got three of the 10 data files in my db (~10,000,000 records), and a simple 'Select count(*)...' takes 17 seconds with just one user performing searches. I've got a clustered index set up in SQL Server 2000 on a primary key ID field with non-clustered indexes on the query and AnonID fields. I tried to build a full-text index of the table, but stopped it after 4 hours since it was still building. Anyone having any better luck? I knew this was going to be a tough nut to crack.
All that means is that you didn't know a whole lot before this, which doesn't speak highly for your marketing skills. But I'm sure now that you have this super magic list, you'll be a millionaire in no time. LMAO.
123gotoandplay, extract the .txt files from all 10 zip files into a directory. When doing a search query with wingrep, select the *.txt option. Thanks, Dr N
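For anyone without wingrep, the same idea (scan every extracted .txt file in one directory for a keyword) can be roughly sketched in Python. This is only a sketch under assumptions: the function names `search_lines` and `search_directory` are made up for illustration, the match is a plain case-insensitive substring test rather than a real grep pattern, and the files are assumed to be readable as UTF-8 text. It reads one line at a time, so it should not need to load a whole file into memory.

```python
from pathlib import Path


def search_lines(lines, keyword):
    """Yield (line_number, line) for every line containing keyword.

    Case-insensitive substring match; `lines` can be any iterable of
    strings, including an open file handle.
    """
    needle = keyword.lower()
    for number, line in enumerate(lines, start=1):
        if needle in line.lower():
            yield number, line.rstrip("\n")


def search_directory(directory, keyword, pattern="*.txt"):
    """Yield (file_name, line_number, line) for matches in every file
    in `directory` whose name matches `pattern` (hypothetical helper)."""
    for path in sorted(Path(directory).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in search_lines(handle, keyword):
                yield path.name, number, line
```

Usage would be something like `for name, number, line in search_directory("aol-data", "home"): print(name, number, line)`, where `aol-data` stands in for whatever directory the files were extracted to.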
Brother 1eightt, you have done an amazing job. If your site appears on Digg you may have some serious problems!!! I hope that you are ready, and because you had the vision to do this, your income is SERIOUSLY going to increase! If you want to sell or license the solution then PM me and we will talk!! Serious respect, DR N
@natterbu, thanks. I already extracted one .zip and did a search with .txt. But in the help I read Did you ever try this??
I checked some of my keywords and I see that this is bad. People search for something and click on results where there is really no information of the kind and quality found on my site... this is frustrating, very frustrating... I was happier before, living without that AOL thing.
I couldn't resist... FUAOL.com. There you go guys, an easy, fast list of the top 5000 keywords. Enjoy. I will finish the site later and add some more stuff to it... Emil
I'm puzzled by this data. I searched for what I thought were popular terms (according to Overture) and hardly found any results. Then I searched on less popular ones and found more results. Either this data is weird, or it is an insignificant sample, or it has been processed in some way, or Google results are very different from Overture. What do you think? As far as people getting into trouble for the searches they did like "How to kill my wife", these really don't mean or prove anything. For a start, one AOL account can be used by many people, e.g. everyone who lives in one house. Also, I commonly piggy-back on my neighbour's wireless connection (don't tell him), and so all my searches would appear on his user ID, if his search data was ever made public.
Dynamite - whatever your take is on this release of the search information ... privacy, search, marketing. Someone f***** up bigtime!
Once again, download location: http://www.gregsadetsky.com/aol-data/
Tool to grep data on Linux etc.: http://pegasus.rutgers.edu/~elflord/unix/grep.html
Tool to grep data on Windows: http://www.wingrep.com/download.htm
Tool to open massive text files instantly: http://www.swiftgear.com/ltfviewer/features.html
Thanks, Dr N
How did you do that? I tried to create a ranked frequency table of the complete data set with SPSS on my home PC and it got stuck after a couple of hours. Did you use a server?
I repeat my assertion that something is screwy with this list of keywords. I honestly can't believe that "pogo" is the seventh most popular search term. If it is, we should all be making pogo websites. Hands up anyone with a Pogo site?
There is nothing screwy with the data, just the way it has been interpreted. If a user searches for "pogo" and clicks 100 of the results for that one search, then that keyword is gonna be registered 100 times as separate results, which is not representative of the number of times it was searched for. A better sample would be to use each unique search per unique user.
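To make the difference concrete, here is a rough Python sketch that builds both tallies in one pass: the raw count (every row, so 100 clicks count 100 times) and the count of each unique search per unique user. It is a sketch under assumptions, not how any of the sites above did it: the function name `rank_queries` is made up, and it assumes tab-separated rows with AnonID in the first column and the query in the second, plus a header row starting with "AnonID". It streams line by line, but the set of (user, query) pairs it keeps can still get large on the full data set.

```python
from collections import Counter


def rank_queries(lines):
    """Return (raw_counts, per_user_counts) for tab-separated log lines.

    raw_counts: one tally per row, so repeated clicks inflate a query.
    per_user_counts: each (AnonID, query) pair is counted only once.
    Assumed layout: AnonID <TAB> Query <TAB> ...other columns.
    """
    raw_counts = Counter()
    per_user_counts = Counter()
    seen_pairs = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and the (assumed) header row.
        if len(fields) < 2 or fields[0] == "AnonID":
            continue
        anon_id = fields[0]
        query = fields[1].strip().lower()
        if not query:
            continue
        raw_counts[query] += 1
        pair = (anon_id, query)
        if pair not in seen_pairs:
            seen_pairs.add(pair)
            per_user_counts[query] += 1
    return raw_counts, per_user_counts
```

Calling `per_user_counts.most_common(20)` on the second result would give a top-20 list where a single user clicking 100 results for "pogo" only counts once.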
I downloaded the file, opened it in WORD, erm, it's working all great. Why should I download grepper and stuff? Also Emil, that was a bit immature to choose that domain name... but that's just IMO