[VB or PHP] Google scraper

Discussion in 'Programming' started by jazzcho, Jul 15, 2009.

  1. #1
    Here's what I need.

    Program A.

    a) Search Google for a given keyword, then fetch N results (the user can set the starting and finishing page). For example, the user wants to search for the keyword "test page" and get results from page 2 through page 5. Only URLs are necessary; no descriptions are needed. You must let the user choose which local Google domain to use (for example google.co.uk).
    b) Export these to a text (.csv) file.
    c) Option to use multiple proxies. That is, fetch one page's results from proxy A, the next page's results from proxy B, and so on. Also, it must have a setting to wait X seconds before each query. If a proxy does not function properly, remove it from the list and add it to a failed_proxies.txt file. This means you must have some check to see if you are getting the correct response from Google.
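
    A minimal sketch of items a) and b), written in Python for illustration only (the deliverable itself must be VB.NET or PHP 5). It assumes Google's default of 10 results per page and the `q` and `start` query parameters; the function names are hypothetical, and fetching and parsing the result pages is left out.

```python
import csv
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # assumption: Google's default page size


def build_search_urls(keyword, first_page, last_page, domain="www.google.co.uk"):
    """Return one search URL per results page, first_page..last_page inclusive."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("need 1 <= first_page <= last_page")
    urls = []
    for page in range(first_page, last_page + 1):
        # Page 1 starts at offset 0, page 2 at offset 10, and so on.
        query = urlencode({"q": keyword, "start": (page - 1) * RESULTS_PER_PAGE})
        urls.append(f"https://{domain}/search?{query}")
    return urls


def export_urls_csv(urls, out_file):
    """Write one URL per row to an already opened text file object."""
    writer = csv.writer(out_file)
    for url in urls:
        writer.writerow([url])
```

    For the "test page" example, `build_search_urls("test page", 2, 5)` yields four URLs, one each for pages 2 through 5. The result URLs extracted from those pages would then be passed to `export_urls_csv`.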

    I am paying $10 for a+b and another $10 for c, for a total of $20. I am paying through PayPal.

    Program B.

    a) Load a list of URLs from a text file (.csv). Visit each page and check if it has the "nofollow" tag. (I can provide specific rules for that.)
    b) Check the page rank for each page (I will provide you with a way to do it, if you do not know how).
    c) Option to use multiple proxies. That is, one query from proxy A, the next from proxy B etc. Also, it must have a setting to wait X seconds before each query. Just like in program A, it must check if the proxies are functioning properly.
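
    The proxy handling in item c) (the same module as in Program A) could be sketched roughly as follows, in Python for illustration only. The `fetch` and `is_valid` callables are hypothetical placeholders supplied by the caller: one performs the request through a given proxy, the other decides whether the response looks correct. Dropping a proxy after a single bad response is an assumption; a real tool might retry first.

```python
import time
from collections import deque


def fetch_with_rotation(urls, proxies, fetch, is_valid, delay_seconds=0.0,
                        failed_path="failed_proxies.txt", sleep=time.sleep):
    """Fetch each URL through the next working proxy in round-robin order.

    fetch(url, proxy) returns the response body and may raise OSError;
    is_valid(body) returns True when the response looks correct.
    Returns (results, failed), where results maps url -> body.
    """
    pool = deque(proxies)
    results, failed = {}, []
    for url in urls:
        while pool:
            proxy = pool[0]
            sleep(delay_seconds)  # wait X seconds before every query
            try:
                body = fetch(url, proxy)
            except OSError:
                body = None
            if body is not None and is_valid(body):
                results[url] = body
                pool.rotate(-1)  # the next query uses the next proxy
                break
            # Bad or missing response: drop the proxy and record it.
            failed.append(pool.popleft())
            with open(failed_path, "a", encoding="utf-8") as fh:
                fh.write(proxy + "\n")
        else:
            break  # no working proxies left, stop early
    return results, failed
```

    A URL that fails on one proxy is retried on the next one, so a dead proxy costs a delay but not a result.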

    I am paying $10 for a+b and another $10 for c, for a total of $20. I am paying through PayPal.

    Note that if you make both A and B for me, I will pay you for the c module twice, even though it is the same thing. ;) This is a bonus.

    Program C.

    a) The user gives a URL. Your program spiders the website for all internal URLs and exports them to a text (.csv) file. Internal means that if you are spidering "www.mysite.com" and it links to Facebook, that link is discarded.
    b) It must delete duplicate entries and all the trailing parts that start with a #. For example, if you find a URL like "www.mysite.com/foo.html#silly" it must drop the #silly and keep "www.mysite.com/foo.html".
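
    The filtering rules in a) and b) can be sketched like this, in Python for illustration only. The function name is hypothetical, and treating "internal" as "exactly the same host as the start URL" is an assumption (subdomains would count as external here).

```python
from urllib.parse import urldefrag, urljoin, urlparse


def internal_links(base_url, hrefs):
    """Return unique same-host URLs from hrefs, fragments removed, in first-seen order."""
    base_host = urlparse(base_url).netloc.lower()
    seen = set()
    kept = []
    for href in hrefs:
        absolute = urljoin(base_url, href)      # resolve relative links
        clean, _fragment = urldefrag(absolute)  # drop "#silly" and the like
        parts = urlparse(clean)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc.lower() != base_host:
            continue  # external link, discard
        if clean not in seen:
            seen.add(clean)
            kept.append(clean)
    return kept
```

    With this sketch, "foo.html#silly" and "/foo.html" found on "http://www.mysite.com/" both collapse to a single "http://www.mysite.com/foo.html" entry.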

    I am paying $10 for this.

    If you can make all these as one program I will pay you an extra $10.

    All code must be written in either Visual Basic .NET 2008 or PHP 5. Obviously, the price includes the source code (not obfuscated in any way; comments are NOT necessary), so I can modify it if I need to. I would prefer that you make any modifications, but in case you are not available, I must be able to make them myself.

    I will pay as you progress. You complete a step, you get the money. If you want a different arrangement, we can talk about it.

    Contact me via PM and I will reply within 12 hours.

    Thank you.
     
    jazzcho, Jul 15, 2009
  2. samyak

    #2
    Hi jazzcho,

    I just sent you a PM. Hope to hear from you soon.

    Thanks
    Amit
     
    samyak, Jul 15, 2009
  3. jazzcho

    #3
    Thank you all for your messages.

    cosminx2003 has been assigned to the project.

    Again, many thanks for your time.
     
    jazzcho, Jul 15, 2009