What would you guys like to see us do with the data to make it easier for you to use and gather information from?
Breaking that out into major categories would be even more beneficial, for example if the aggregate data from fuaol.com were broken out into business, porn, gov, etc. This is at least what I would like to see.
I'm contradicting what I said earlier, but there are duplicates in the data. If we look at a section of the file like this:
    2722 microsoft adaware 2006-05-24 14:23:18 2 http://reviews.cnet.com
    2722 microsoft adaware 2006-05-24 14:23:18 1 http://www.microsoft.com
    2722 microsoft adaware 2006-05-24 14:23:18 1 http://www.microsoft.com
    2722 microsoft adaware 2006-05-24 14:23:18 1 http://www.microsoft.com
we see that this wasn't four separate queries; it was only one query with 4 clicks. It should be counted only once if you are doing a raw count of queries. I'm not sure if the people building the interfaces to this data are doing that or not. If you're setting this up in a database, it might be best to separate it into two tables, one for queries (the first three columns: user ID, query text, timestamp) and one for clicks (the last two columns: rank and URL). There is a one-to-many relationship between queries and clicks.
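For anyone loading this into a database, here is a rough sketch of the two-table split using Python's built-in sqlite3 module. The table and column names (queries, clicks, click_rank, and so on) are my own assumptions, not anything defined by the data file, and the rows are just the four sample lines from the post above.

```python
import sqlite3

# The four sample rows quoted in the post:
# (user_id, query, query_time, click_rank, click_url).
ROWS = [
    (2722, "microsoft adaware", "2006-05-24 14:23:18", 2, "http://reviews.cnet.com"),
    (2722, "microsoft adaware", "2006-05-24 14:23:18", 1, "http://www.microsoft.com"),
    (2722, "microsoft adaware", "2006-05-24 14:23:18", 1, "http://www.microsoft.com"),
    (2722, "microsoft adaware", "2006-05-24 14:23:18", 1, "http://www.microsoft.com"),
]


def load(rows):
    """Split flat rows into a queries table and a clicks table (one-to-many)."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: names are illustrative assumptions.
    db.executescript("""
        CREATE TABLE queries (
            id INTEGER PRIMARY KEY,
            user_id INTEGER,
            query TEXT,
            query_time TEXT,
            UNIQUE (user_id, query, query_time)
        );
        CREATE TABLE clicks (
            query_id INTEGER REFERENCES queries(id),
            click_rank INTEGER,
            click_url TEXT
        );
    """)
    for user_id, query, query_time, rank, url in rows:
        key = (user_id, query, query_time)
        # Insert the query only the first time this (user, text, time) is seen.
        db.execute(
            "INSERT OR IGNORE INTO queries (user_id, query, query_time) "
            "VALUES (?, ?, ?)",
            key,
        )
        (query_id,) = db.execute(
            "SELECT id FROM queries "
            "WHERE user_id = ? AND query = ? AND query_time = ?",
            key,
        ).fetchone()
        # Every row contributes one click attached to that query.
        db.execute("INSERT INTO clicks VALUES (?, ?, ?)", (query_id, rank, url))
    return db


db = load(ROWS)
query_count = db.execute("SELECT COUNT(*) FROM queries").fetchone()[0]
click_count = db.execute("SELECT COUNT(*) FROM clicks").fetchone()[0]
print(query_count, click_count)  # prints: 1 4
```

With this layout a raw query count comes from the queries table alone, so the repeated rows no longer inflate it.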
Well, it's only 0.3% of searches or something like that. I can see the value in this information, but especially for the more obscure phrases, they could be one-offs, and it's hard to know which phrases are worth anything. A lot of them could have been searched for once and never again.
Exactly. This is a very small sample. It's a subset (1% of AOL users) of a subset (US AOL users) of a subset (Internet users). Don't get me wrong, this list has value, but not as much as people think it does. The other thing that lessens its value is that it's a snapshot of a timeframe already 3 months old, and as time passes it's going to be less relevant. A lot can change in 3 months, at least in the niches I follow.
I hear ya, but other people have pointed out the obvious. One: small sample. So the top searches here are hardly relevant. It's the micro behavior available here that's interesting. Two: the top searches are already available elsewhere. Guess what the top searches are for? Top 100 site, top 100 site, top 100 site, porn, porn, porn. Every screw head under the sun is chasing those keywords. In any case, the list contained 9.3 million distinct searches. We'll list the top 1000 on the site tomorrow in order. I guarantee you'll be unimpressed.
Thanks for the good conversation here, guys. I'm removing the direct URL requests right now, and doing some counts on the data to make it more useful.
As has been mentioned here before, this data is useful because it's deep, not wide. I think it would be cool to have tools to dig into individual users' queries. For example, let's say I have a web site about dog-friendly hikes. I would want to do a query that showed me all the users who had "dog" or "hike" in their search history. I could then click through to see their overall search history, what else they were searching for, what keywords they were using, etc. Or it might be useful to find any user who clicked through to a certain web site (like my competitor's) and find out what the top keywords were for just that subset of users. In other words, what else are visitors to www.mycompetitor.com into that might entice them to visit my site?
Very unique idea. I'm doing some overall counts and such right now, but when those queries finish I'll see what I can do to allow this type of searching.
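To make the idea a bit more concrete, here is a small Python sketch of both lookups: users with a given word in their history, and the top queries among users who clicked through to a given domain. The sample rows and all the function names are made up for illustration; real rows would come from the released file.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical sample rows: (user_id, query, clicked_url or None).
ROWS = [
    (1, "dog friendly hikes", "http://www.mycompetitor.com"),
    (1, "trail maps", None),
    (2, "cheap flights", "http://www.example.com"),
    (3, "hike boots", "http://www.mycompetitor.com"),
    (3, "trail maps", None),
]


def build_histories(rows):
    """Group every query under the user who issued it."""
    histories = defaultdict(list)
    for user_id, query, _url in rows:
        histories[user_id].append(query)
    return histories


def users_matching(histories, words):
    """Users with at least one query containing any of the words (whole-word match)."""
    wanted = {w.lower() for w in words}
    return sorted(
        uid
        for uid, queries in histories.items()
        if any(wanted & set(q.lower().split()) for q in queries)
    )


def users_who_clicked(rows, domain):
    """Users with at least one click whose URL host equals the domain."""
    return {uid for uid, _q, url in rows if url and urlparse(url).netloc == domain}


def top_queries(histories, user_ids, n=10):
    """Most frequent queries across the given users' full histories."""
    counts = Counter(q for uid in user_ids for q in histories[uid])
    return counts.most_common(n)


histories = build_histories(ROWS)
matching = users_matching(histories, ["dog", "hike"])
visitors = users_who_clicked(ROWS, "www.mycompetitor.com")
top = top_queries(histories, visitors, n=1)
print(matching)  # prints: [1, 3]
print(top)       # prints: [('trail maps', 2)]
```

Note the whole-word match means "hikes" does not count as "hike"; a substring or stemmed match would be a reasonable alternative depending on how loose you want the search to be.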
Interesting short NY Times article about this list.... a reporter tracked down a particular user. Smell the lawsuits cookin? http://www.nytimes.com/2006/08/09/t...&ex=1155700800&partner=MYWAY&pagewanted=print
Nice resource, but it doesn't seem fair that this data was given away. It's probably old data by now, and results keep changing, but it's still going to be helpful for webmasters.
That's not really the main point though, at least for me. What I like about this database is that I can get 50,000 search strings that contain "garden" at the press of a button and rank them in order of search frequency. No other tool gives that much volume.
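For anyone working from the raw file rather than a web interface, that kind of lookup could be sketched in a few lines of Python. The query list and the rank_matching name are invented for the example, and folding case before counting is my own assumption about how duplicates should be merged.

```python
from collections import Counter

# Hypothetical query log, one entry per search event.
QUERIES = [
    "garden tools",
    "rose garden",
    "garden tools",
    "car insurance",
    "Garden Tools",
    "rose garden",
    "garden tools",
]


def rank_matching(queries, term, limit=None):
    """Count queries containing the term (case-insensitive), most frequent first."""
    term = term.lower()
    counts = Counter(q.lower() for q in queries if term in q.lower())
    return counts.most_common(limit)


ranked = rank_matching(QUERIES, "garden")
print(ranked)  # prints: [('garden tools', 4), ('rose garden', 2)]
```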
Here's another news link regarding AOL's remorse for releasing the list: http://www.wral.com/technology/9646239/detail.html
Hey all, update at www.dontdelete.com. A couple of new features: 1) Search by domain: shows the keywords used to get to a domain, unique users, and overall query searches. 2) Random user search: HOURS of entertainment. It displays a user's search terms. This is just for fun, but it's cool to see one person searching for teddy bears and the next for escorts in Miami, haha. Have fun. We'll be adding a few other features throughout the day.