Need help implementing Google search

Discussion in 'Programming' started by zealus, Jan 25, 2008.

  1. #1
    Hi, all!

    Here's the idea: I need to write a small desktop app that pulls Google search results for a query and then analyzes the organic links (i.e. links that are not advertisements). Getting PageRank would be nice as well.

    My question is: how do I do it? I'm not asking you to write the code for me, just for a direction in which to look. I tried the Google API, but it seems to be geared towards online processing, while I am more interested in a standalone app.

    Thanks in advance for any input.
     
    zealus, Jan 25, 2008
  2. shallowink

    shallowink Well-Known Member

    #2
    It would help if you said what you are planning to use. Of course, if you are pulling from Google, it has to have an online element to it. Caching the results, maybe? A simple method would be to grab the results page and parse it, pulling out all the links to display. Doing it with Perl would be simple, but for some reason I don't think you want command-line results.
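    A minimal sketch of the "grab the results page, maybe cache it" idea, written in Python rather than VB.NET purely for brevity. It assumes the plain https://www.google.com/search?q=... results URL; `cached_search`, `fetch_url`, `build_search_url` and the on-disk cache layout are hypothetical names. The fetch function is a parameter so the caching logic can be exercised offline:

```python
import hashlib
import os
import tempfile
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical default cache location; any writable directory will do.
DEFAULT_CACHE_DIR = os.path.join(tempfile.gettempdir(), "serp_cache")


def build_search_url(query: str) -> str:
    # The results page takes the search terms in the "q" parameter.
    return "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})


def fetch_url(url: str) -> str:
    # Plain HTTP GET. Sending a browser-like User-Agent is an assumption;
    # the server may still refuse or change its response for scripted clients.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def cached_search(query: str, cache_dir: str = DEFAULT_CACHE_DIR,
                  fetch: Callable[[str], str] = fetch_url) -> str:
    """Return the results-page HTML for `query`, hitting the network only
    when no cached copy exists on disk."""
    os.makedirs(cache_dir, exist_ok=True)
    # One file per query, named by a hash so any query text is a safe filename.
    key = hashlib.sha1(query.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".html")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = fetch(build_search_url(query))
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

    A second call with the same query is served from the cache file instead of going back to the network.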
     
    shallowink, Jan 25, 2008
  3. zealus

    zealus Active Member

    #3
    I am planning to use VB.NET and Visual Studio 2008.

    I was hoping there was a way to call the Google API or some web service that returns only the search results, preferably in XML format. I know, I'm dreaming, but hope dies last :)
     
    zealus, Jan 25, 2008
  4. shallowink

    shallowink Well-Known Member

    #4
    Someone else can tell you more about .NET-based stuff than I can. But I don't remember Google offering an XML format, so you would have to parse the HTML on your own. The first part, fetching, should be easy enough. I'd search (with Google) for a .NET HTML parser and see what turns up. There should be a class or library for this, one that can return all the <a> links or any other element from the page.
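    As a rough illustration of what such a parser does, here is a sketch built on Python's standard `html.parser` module (Python instead of VB.NET only for illustration). `TagCollector` and `collect` are hypothetical names; the sketch returns the attributes and text of every element with a given tag name, e.g. every <a> link, and assumes that tag has a closing tag (void elements like <img> are not handled):

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects every element with a given tag name, with its attributes and text."""

    def __init__(self, tag: str):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.elements = []   # list of (attrs dict, text) tuples
        self._depth = 0      # >0 while inside a matching element
        self._attrs = {}
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self._depth == 0:
                # Start of a new outermost match: reset the buffers.
                self._attrs = dict(attrs)
                self._text = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.elements.append((self._attrs, "".join(self._text).strip()))

    def handle_data(self, data):
        # Text of nested child elements is included in the parent's text.
        if self._depth > 0:
            self._text.append(data)


def collect(html: str, tag: str):
    parser = TagCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.elements
```

    `HTMLParser` tolerates a fair amount of malformed markup, but it does not build a full DOM, so anything more structural has to be tracked by hand as above.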
     
    shallowink, Jan 25, 2008
  5. zealus

    zealus Active Member

    #5
    Fetching works fine; the real problem is determining where the organic search results start. In the HTML source view there are hints, but they are missing from the downloaded text.
     
    zealus, Jan 25, 2008
  6. shallowink

    shallowink Well-Known Member

    #6
    Are you getting text with the HTML stripped? If you get the HTML, it looks like the results are keyed by class attributes, which might help. Not sure, though. I'd have to see what you get from VB.NET to spot a usable pattern.
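    If the HTML does come through, the class-attribute idea could be sketched like this: keep only the links nested inside elements carrying a particular class. The class name `result` used in the example is a placeholder, not Google's actual markup, which you would have to read off the downloaded page (and which can change at any time). `ClassScopedLinks` and `scoped_links` are hypothetical helpers built on Python's `html.parser`; they track nesting depth, so they assume end tags inside the container are present and balanced:

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth count.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class ClassScopedLinks(HTMLParser):
    """Collects the href of every <a> nested inside an element whose class
    attribute includes `css_class` (a placeholder name, not Google's markup)."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._depth = 0  # >0 while inside a matching container

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._depth == 0:
            # Outside any container: only look for the start of a new one.
            classes = (attrs.get("class") or "").split()
            if self.css_class in classes and tag not in VOID_ELEMENTS:
                self._depth = 1
            return
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag not in VOID_ELEMENTS:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth > 0 and tag not in VOID_ELEMENTS:
            self._depth -= 1


def scoped_links(html: str, css_class: str) -> list:
    parser = ClassScopedLinks(css_class)
    parser.feed(html)
    parser.close()
    return parser.links
```

    Links in ad blocks or navigation outside the chosen containers are simply never recorded.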
     
    shallowink, Jan 25, 2008
  7. zealus

    zealus Active Member

    #7
    Thanks for the suggestion, I'll try that too.
     
    zealus, Jan 25, 2008
  8. zealus

    zealus Active Member

    #8
    The HTML comments are stripped, but I found a pattern and was actually able to extract the natural outlinks (i.e. all the result links that don't lead to Google in any way). Now I need to see if I can pull PageRank for them.
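    The filtering step described here could look roughly like the following Python sketch; it is not zealus's actual code. It assumes the result links have already been extracted as href strings, treats relative URLs (which point back into Google) and any host with a `google` label (google.com, www.google.co.uk, maps.google.com and so on) as Google links, and drops duplicates. `organic_outlinks` and `is_google_host` are hypothetical names:

```python
from urllib.parse import urlparse


def is_google_host(host: str) -> bool:
    # Heuristic: any host with a "google" label (google.com, www.google.co.uk,
    # maps.google.com) counts as Google.
    return "google" in host.lower().split(".")


def organic_outlinks(hrefs):
    """Keep absolute http(s) links to non-Google hosts, in order, without duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        parts = urlparse(href)
        # Relative links ("/search?...") stay on Google; this also skips javascript:, mailto:.
        if parts.scheme not in ("http", "https"):
            continue
        host = parts.hostname or ""
        if not host or is_google_host(host):
            continue
        if href not in seen:
            seen.add(href)
            result.append(href)
    return result
```

    The label-based check is deliberately crude: it would also drop an unrelated host such as google.example.org, so it may need tightening against real data.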
     
    zealus, Jan 25, 2008