Help in Coding Robot / bot

Discussion in 'Programming' started by maverick, May 3, 2005.

  1. #1
    Hi,
    I am trying to figure this out just for the sake of knowledge, and I hope you all can help me out.

    Suppose I have a website where publishers are registered, and I give them JavaScript code to put on their sites. I want to verify that the JavaScript is actually present on the publishers' websites, and I want to do it regularly, once a day, to confirm that the code is still there. My robot or bot (if those are the right terms) should be able to check for my JavaScript code on the different websites every day.

    What would I need to know to do this?

    Can you offer suggestions or links to resources?
     
    maverick, May 3, 2005 IP
  2. J.D.

    J.D. Peon

    Messages:
    1,198
    Likes Received:
    65
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Your bot would read each publisher's URL from a database, request the URL and scan the returned HTTP response for <script> tags. Then it would extract the content of these tags and compare it with what you have for this publisher (you will need to normalize it first). If the database contains only website entry URLs and you want to check the entire website, you will need to parse all links on the returned pages and follow the internal links, extracting the contents of all <script> tags you find.

    You can do this in any language that supports TCP/IP or HTTP - Java, C++, C#, VB, Perl, etc.
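
A rough sketch of the scan-and-compare step described above, in Python purely for illustration (the thread does not settle on a language). The names ScriptCollector, normalize and page_has_script are made up for this example, and treating "equal after collapsing whitespace" as a match is an assumption, not a rule from the thread.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element on a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


def normalize(script):
    """Collapse runs of whitespace (including line endings) to single spaces."""
    return re.sub(r"\s+", " ", script).strip()


def page_has_script(html, expected):
    """True if any <script> on the page equals `expected` after normalization."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    wanted = normalize(expected)
    return any(normalize(found) == wanted for found in collector.scripts)
```

The page text itself could come from something like urllib.request.urlopen(url).read().decode(), with the publisher URLs read from whatever database holds them. Following internal links to cover a whole site is left out of this sketch.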

    J.D.
     
    J.D., May 3, 2005 IP
  3. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Or spend $150 on something like Unit Miner, which is software that does it all for you.
     
    T0PS3O, May 3, 2005 IP
  4. noppid

    noppid gunnin' for the quota

    Messages:
    4,246
    Likes Received:
    232
    Best Answers:
    0
    Trophy Points:
    135
    #4
    The thing with JavaScript, if I understand correctly, is that you can have it on the page and not have it execute. You may want to think about this.
     
    noppid, May 3, 2005 IP
  5. davedx

    davedx Peon

    Messages:
    429
    Likes Received:
    21
    Best Answers:
    0
    Trophy Points:
    0
    #5
    It won't get executed, yeah, but I'd think that's what he wants, i.e. to check that the source code he gave them to put on their pages is there verbatim.

    Btw, I'd recommend PHP/cURL. It'd take you like 8 lines of code, maybe.
     
    davedx, May 4, 2005 IP
  6. king_cobra

    king_cobra Peon

    Messages:
    373
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #6
    No need for all this mess, maverick.

    Just include a tracker. For example, create a tracker page in PHP or Perl and put it on your server. Then, in the JavaScript code you provide to your publishers, add a line that calls that page. I just woke up and my brain is slow, so I can't explain more. You can PM me and I will help you out.
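
One way to picture the tracker idea, sketched in Python rather than PHP or Perl. It assumes the JavaScript given to each publisher requests a URL such as /track?pub=42 on your server. The query parameter name pub and the functions record_hit and silent_publishers are hypothetical, and only the server-side bookkeeping is shown.

```python
from datetime import timedelta
from urllib.parse import parse_qs, urlparse


def record_hit(last_seen, request_path, now):
    """Remember when a publisher's copy of the script last called the tracker.

    `last_seen` maps publisher id -> time of the most recent hit.
    Returns the publisher id, or None if the request carried no id.
    """
    query = parse_qs(urlparse(request_path).query)
    pub_ids = query.get("pub")
    if not pub_ids:
        return None
    last_seen[pub_ids[0]] = now
    return pub_ids[0]


def silent_publishers(last_seen, publisher_ids, now, max_age=timedelta(days=1)):
    """List publishers whose script has not called the tracker within `max_age`."""
    return [
        pub_id
        for pub_id in publisher_ids
        if pub_id not in last_seen or now - last_seen[pub_id] > max_age
    ]
```

Unlike fetching the page, this only hears from scripts that actually run in a visitor's browser, so a publisher with no traffic that day would also show up as silent.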
     
    king_cobra, May 4, 2005 IP
  7. J.D.

    J.D. Peon

    #7
    8 lines to get info from the database, parse HTML and validate JS? I really doubt it :)

    J.D.
     
    J.D., May 4, 2005 IP
  8. noppid

    noppid gunnin' for the quota

    #8
    I have been biting my tongue on that one. :p
     
    noppid, May 4, 2005 IP
  9. king_cobra

    king_cobra Peon

    #9
    Why would you need to parse HTML there?
     
    king_cobra, May 4, 2005 IP
  10. zak

    zak Peon

    Messages:
    175
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    0
    #10
    If you're gonna do this and have a lot of people using your code, you're gonna have to do it quickly. Use some kind of data structure, maybe trees.
     
    zak, May 4, 2005 IP
  11. nullbit

    nullbit Peon

    Messages:
    489
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    0
    #11
    Is that a challenge?

    
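    PHP:
    // DB_* and JAVASCRIPT are constants assumed to be defined elsewhere.
    // mysqli_* stands in for the old mysql_* functions, which PHP 7 removed.
    $db = mysqli_connect(DB_HOST, DB_USER, DB_PASS, DB_NAME);
    $result = mysqli_query($db, 'SELECT publisher_id, url FROM publishers');
    // Report whether the expected snippet appears verbatim in each fetched page.
    while ($publisher = mysqli_fetch_assoc($result))
      echo $publisher['publisher_id'] . ':  ' . (strpos(file_get_contents($publisher['url']), JAVASCRIPT) !== false ? 'Validates' : 'Failed') . '<br />';
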
    Yeah, it's messy, but still lines to spare :D
     
    nullbit, May 4, 2005 IP
  12. nevetS

    nevetS Evolving Dragon

    Messages:
    2,544
    Likes Received:
    211
    Best Answers:
    0
    Trophy Points:
    135
    #12
    I recommend Perl & LWP or Spidering Hacks from O'Reilly. You can check them out by signing up for a free trial at safari.oreilly.com (it's about 15 days).

    They'll both provide you with enough sample code to get started. I have both books, but I recently subscribed to safari anyways. Makes my life easier to be able to search through the books :)
     
    nevetS, May 4, 2005 IP
  13. J.D.

    J.D. Peon

    #13
    This doesn't do the job, though. The idea was to validate the script, not just to make sure that there's a script on the page. That is, to make sure that publishers use the provided script and to catch those who just pretend to. You would have to scan the script, ignore differences in line endings and whitespace, and compare it to the one that is correct for this publisher or publisher group. You will also need to handle errors properly, or some publishers may get away with not using the script or get penalized for no reason. You also have to implement a retry policy in case a publisher is temporarily offline for some technical reason. And you will have to store the results somewhere to make them usable for pinpointing those that fail.

    Can you do this with a dozen *readable* lines?

    J.D.

    P.S. Don't waste your time - it's not a challenge. It's simply not a 10-line project.
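
To give a feel for the retry and result-storage requirements listed above, here is a hedged Python sketch. The function names, the three status strings, the checks table layout and the default of three attempts are all assumptions made for illustration. The fetch function is passed in so that any HTTP client can be used.

```python
import sqlite3
import time


def check_with_retry(fetch, url, is_valid, attempts=3, delay_seconds=60):
    """Return 'ok', 'failed' or 'unreachable' for one publisher URL.

    `fetch(url)` returns the page text or raises OSError on network trouble.
    `is_valid(page)` decides whether the expected script is present.
    """
    for attempt in range(attempts):
        try:
            page = fetch(url)
        except OSError:
            # The site may only be down briefly: wait, then try again.
            if attempt + 1 < attempts:
                time.sleep(delay_seconds)
            continue
        return "ok" if is_valid(page) else "failed"
    # Never got a response, so do not penalize the publisher as 'failed'.
    return "unreachable"


def store_result(db, publisher_id, status, checked_at):
    """Append one check result so failing publishers can be listed later."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS checks "
        "(publisher_id TEXT, status TEXT, checked_at TEXT)"
    )
    db.execute(
        "INSERT INTO checks VALUES (?, ?, ?)",
        (publisher_id, status, checked_at),
    )
    db.commit()
```

Keeping 'unreachable' separate from 'failed' is one way to avoid penalizing a publisher whose site was merely offline during the check.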
     
    J.D., May 4, 2005 IP
  14. nullbit

    nullbit Peon

    #14
    Heh, I was just kidding. It wasn't supposed to be usable code.
     
    nullbit, May 4, 2005 IP
  15. davedx

    davedx Peon

    #15
    20 then maybe... :p

    I wasn't bragging; I was trying to point out that PHP is ideally suited to this kind of thing. In C++ it would be significantly more bloated, I imagine, unless you used .NET's regular expression functionality.

    Actually Perl would be just as good (duh).

    Hehe :)
     
    davedx, May 4, 2005 IP
  16. maverick

    maverick Peon

    Messages:
    1,191
    Likes Received:
    24
    Best Answers:
    0
    Trophy Points:
    0
    #16
    Thanks, all, for your input. I will check out each post tomorrow because of time constraints. I may come back with some questions, I think. :)
     
    maverick, May 5, 2005 IP