20M search queries released by AOL

Discussion in 'All Other Search Engines' started by LemonTree, Aug 7, 2006.

  1. RedCardinal

    RedCardinal Peon

    #41
    Not bad - better than the first attempt I've seen. Do you have the complete dataset loaded?

    Just wondering how long it will take before the cease and desist letters start arriving at any site that is making the data public?
     
    RedCardinal, Aug 8, 2006 IP
  2. cffoodie

    cffoodie Guest

    #42
    to rgh..

    haha.. yeah. We're waiting for the cease and desist letters too. Although, we don't show data by user.. just the keywords and the site (the important part anyway). And, yes.. this is the entire data set: 36 million rows (we also fixed the one file that was corrupt). So this should be the most complete list..

    Enhancements?
     
    cffoodie, Aug 8, 2006 IP
  3. RedCardinal

    RedCardinal Peon

    #43
    Maybe allow advanced search (AND/OR) by field, plus exact matches?

    TBH I saw one or two other attempts at what you're doing and this has been the best so far.

    For me, I'm just waiting for MySQL to index the last 2 parts locally so that I can do what I want :D

    BTW, I see that some people complained of a corrupt file? Funny, I seem to have got a 100% working version. Did you get it on BT or direct DL?
     
    RedCardinal, Aug 8, 2006 IP
  4. mad4

    mad4 Peon

    #44
    That's great. Thanks. :D
     
    mad4, Aug 8, 2006 IP
  5. przemek

    przemek Guest

    #45
    przemek, Aug 9, 2006 IP
  6. przemek

    przemek Guest

    #46
    to see the most searched terms just leave the field blank and press submit
     
    przemek, Aug 9, 2006 IP
  7. cffoodie

    cffoodie Guest

    #47
    Thanks for the feedback. Been told that we've got the best one running by a lot of people. Glad it works so well... be sure to tell other people.

    We corrected the corrupt file (it was #8 of 10 zipped text files). Basically that one file had a couple of extra carriage returns where it wasn't supposed to. We cleaned those out and were able to get it loaded. It was a direct DL from one of the mirrors posted here the day it was released.
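
    A rough sketch of that kind of cleanup, for anyone hitting the same problem. It is not the script used for the site; it assumes a tab-delimited file in which every record has the same number of fields (5 here, which is an assumption), and the name repair_rows is made up for illustration. Lines that come up short are glued to the following line until the field count is reached.

```python
EXPECTED_FIELDS = 5  # assumed column count; adjust to the real file layout


def repair_rows(lines, expected_fields=EXPECTED_FIELDS):
    """Yield rows, rejoining records that a stray line break split in two."""
    pending = ""
    for raw in lines:
        # Drop line terminators, including any stray carriage returns.
        piece = raw.replace("\r", "").rstrip("\n")
        if not piece and not pending:
            continue  # skip blank lines left behind by extra returns
        pending += piece
        # A complete record has (expected_fields - 1) tab separators.
        if pending.count("\t") >= expected_fields - 1:
            yield pending.split("\t")
            pending = ""
    if pending:
        # Leftover fragment that never reached the expected width.
        yield pending.split("\t")
```

    This only catches breaks that land before the last field. A break inside the final column still produces a row with the right number of tabs, so it would need a separate check (for example, validating the first column as a numeric ID).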
     
    cffoodie, Aug 9, 2006 IP
  8. nikao

    nikao Peon

    #48
    Have people already done any analyses?

    There are online interfaces popping up everywhere, but all the conclusions I've found so far are just voyeuristic profiling of users...

    I did some first analyses here:
    http://www.seo-portal.com/category/aol-data-analysis/
     
    nikao, Aug 10, 2006 IP
  9. brandnewx

    brandnewx Peon

    #49
    I can sense how stupid the management at AOL is. If I were a shareholder, I would immediately pull my investment. They clearly don't handle their work professionally. What an organizational mess!

    Lucky me, I'm not an investor, and even luckier that I don't use AOL.
     
    brandnewx, Aug 10, 2006 IP
  10. infonote

    infonote Well-Known Member

    #50
    I cannot understand the fuss. Each user is identified by a unique ID, as in all websites/databases.

    Can you identify a person from a unique ID number?

    No.

    It's not like you can tell who user 1345672 is.
     
    infonote, Aug 10, 2006 IP
  11. Jester

    Jester Well-Known Member

    #51
    Not unless that user searched for their own name....or SSN.... or something specific and unique to themselves. I've seen several examples online where these results have led people to the actual searcher just by what they searched for.

    Even I've been guilty of searching for my own real name..... ack.

    J
     
    Jester, Aug 10, 2006 IP
  12. brandnewx

    brandnewx Peon

    #52
    If you know regex, you will understand how easy it is to extract all SSNs, emails, addresses, phone numbers, etc. Search for any clue about yourself and you'll be spotted. If you're lucky, you'll only get spam. If you're unlucky, you'll end up with a few identity theft cases.
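
    To make the point concrete, here is a minimal sketch of the kind of pattern matching meant here, written as a check that a site republishing the data could use to find queries that need redacting. The patterns are deliberately loose illustrations, and flag_query and PATTERNS are made-up names, not part of any tool in this thread.

```python
import re

# Deliberately loose, illustrative patterns; real data needs stricter rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def flag_query(query):
    """Return the sorted names of the patterns that match a query string."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(query))
```

    A query that comes back with a non-empty list is a candidate for removal before the row is shown to anyone. Street addresses are much harder to catch with a single pattern and are left out of this sketch.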
     
    brandnewx, Aug 10, 2006 IP
  13. mvandemar

    mvandemar Notable Member

    #53
    Ok, just threw it up there as a fast keyword suggestion tool, hotlinking the phrases to Google Trends and including estimated searches based on market shares. Let me know what you think.

    -Michael
     
    mvandemar, Aug 10, 2006 IP
  14. stojan

    stojan Guest

    #54
    OK, I just checked it out and here is what I found. Why do you think it shows the same number of searches for these terms?


    sex and the city quotes 2710 1370 667
    bible quotes 2710 1370 667
    fire hot quotes 2710 1370 667
    best friend quotes 2710 1370 667
    life quotes 2710 1370 667


    stojan
     
    stojan, Aug 10, 2006 IP
  15. stojan

    stojan Guest

    #55
    Forgot to mention this is for the search term "quotes".

    stojan
     
    stojan, Aug 10, 2006 IP
  16. mvandemar

    mvandemar Notable Member

    #56
    It was a very small sample of data that AOL released to the public. It represents one third of 1% of the searches done in that time period by their users (0.0033). AOL itself, based on market shares from last November, accounts for 6.9% of the total. I took the searches over the 3-month period and averaged them down to one month. The minimum for inclusion is that a phrase had to be searched for at least 4 times in the data sample.

    So if you find something that was only searched for 4 times over 3 months, the numbers start to look like what you got. :D
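
    The scaling described above can be sketched roughly like this. The three constants are the figures quoted in this post; the function name and the engine_share parameter are hypothetical, added only to show how a per-engine number could be derived, and this is not the tool's actual code.

```python
SAMPLE_FRACTION = 0.0033   # share of AOL searches in the release (from the post)
AOL_MARKET_SHARE = 0.069   # AOL's share of all searches (from the post)
MONTHS_COVERED = 3         # length of the sample period (from the post)


def estimate_monthly_searches(sample_count, engine_share=1.0):
    """Scale a raw count in the sample to an estimated monthly search volume.

    engine_share is a hypothetical parameter: the market share of the engine
    to estimate for (1.0 means the whole market).
    """
    aol_total = sample_count / SAMPLE_FRACTION       # all AOL searches, 3 months
    aol_monthly = aol_total / MONTHS_COVERED         # AOL searches per month
    market_monthly = aol_monthly / AOL_MARKET_SHARE  # all engines per month
    return market_monthly * engine_share
```

    Every phrase sitting at the minimum count of 4 goes through exactly the same arithmetic, so they all come out with identical estimates (roughly 5,856 searches a month across the whole market under these figures).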

    -Michael
     
    mvandemar, Aug 11, 2006 IP
  17. shaggz

    shaggz Peon

    #57
    I made a tool and put it up here: http://www.datablunder.com

    Mine is a *little* different from the others in that it supports advanced AND/OR queries on the keywords. What it doesn't have yet is aggregate keyword totals. I am working on adding that, and on loading the rest of the data. Right now I have 50% of it loaded.

    Keep in mind that adding functionality with this huge dataset is more than just constructing a new SQL query. You really have to be creative in order to avoid performance disasters.
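
    One common way to keep AND/OR keyword searches fast is to precompute an inverted index, so that a query becomes a few set operations instead of a scan over every row. The sketch below is a toy in-memory illustration of that idea, not how datablunder.com is built; all function names are made up.

```python
from collections import defaultdict


def build_index(rows):
    """Map each lowercase word to the set of row ids whose query contains it."""
    index = defaultdict(set)
    for row_id, query in rows:
        for word in query.lower().split():
            index[word].add(row_id)
    return index


def search_and(index, words):
    """Row ids whose query contains every word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


def search_or(index, words):
    """Row ids whose query contains at least one word."""
    return set().union(*(index.get(w.lower(), set()) for w in words))
```

    With tens of millions of rows the index would live in the database (for example as a full-text index) rather than in a Python dict, but the trade-off is the same: pay the cost once at load time so that each search stays cheap.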

    If anyone has a feature request that they want to see done, post it here. I'll give any suggestions some good thought.
     
    shaggz, Aug 11, 2006 IP
  18. mad4

    mad4 Peon

    #58
    I just posted over here about how somebody has analysed the results for the number of people that click on the first result and the second etc.
     
    mad4, Aug 11, 2006 IP
  19. MilesB

    MilesB Well-Known Member

    #59
    Rofl, this data is gonna be funny to look at
    AOL sucks
    Says me with an AOL account :O

    BTW, is this aol.com or .co.uk?
     
    MilesB, Aug 11, 2006 IP
  20. 1EightT

    1EightT Guest

    #60
    aol.com I believe
     
    1EightT, Aug 11, 2006 IP