<embed> video code scraper

Discussion in 'Programming' started by devgank, Aug 13, 2008.

  1. #1
    Hey guys, I need some kind of script (Perl, PHP) that I can point at a website, that will spider the site's internal links and return whatever strings I need.

    For example, there is a site with a ton of videos that I want to embed, so instead of manually navigating to each page, I want to scrape each page for code like <embed src="example"></embed>.

    I also want the script to use the page title or tags to label each entry when it writes the URLs to a text file.

    Can someone point me towards a script I can customize (I can code Perl and PHP a tiny bit), or give me a quote on something?

    I have looked into BeautifulSoup with Python, which is supposed to work well, but I need something more tailored to this, as I don't want to spend a ton of time developing it.
     
    devgank, Aug 13, 2008
  2. foreal

    #2
    Well, in my opinion BeautifulSoup is the best way to do it. It's really simple to get info out of HTML source with it. And Python itself is a very simple language, so if you know PHP/Perl, it shouldn't be a problem for you to code in Python.
    I don't think PHP is good for making such things, and Perl is (IMHO) much more complicated than Python :)
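
    To give an idea of how little code the extraction step takes in Python, here is a minimal sketch. It uses the standard library's html.parser instead of BeautifulSoup so it runs without installing anything; the names EmbedScraper and scrape_embeds are made up for illustration, and it assumes the videos are in plain <embed src="..."> tags as described in the first post.

```python
from html.parser import HTMLParser


class EmbedScraper(HTMLParser):
    """Collects the page <title> text and the src of every <embed> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.embed_srcs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "embed":
            # attrs is a list of (name, value) pairs; value may be None
            src = dict(attrs).get("src")
            if src:
                self.embed_srcs.append(src.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def scrape_embeds(html):
    """Return (page title, list of embed src URLs) for one HTML document."""
    parser = EmbedScraper()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.embed_srcs
```

    Writing the results out is then one line per video, for example f.write(f"{title}\t{src}\n") for each src returned.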
     
    foreal, Aug 14, 2008
  3. plaggypig

    #3
    But isn't it the convention to use SWFObject (or another JavaScript library) to do the Flash embedding dynamically?

    You may need to mechanize a browser in order to work with the generated DOM. There's JSBridge for Python (http://code.google.com/p/jsbridge/), which sets up a communications bridge using a Mozilla plugin over TCP, or you could use bindings for gtkmozembed (available in Perl, Ruby, Python). However, gtkmozembed doesn't have methods for communicating with the JavaScript engine; I found it easy to hack my way around this by sending the browser to URIs using the javascript: scheme and then receiving the results back (in JSON format) by setting document.title (and catching them with the title signal).

    Hope this helps.

    - Andy.
     
    plaggypig, Aug 14, 2008
  4. devgank

    #4
    Now while it may be best practice to use the SWF JavaScript code, the sites I'm looking to scrape links from are using standard <embed> tags for the media.

    That's why I was thinking I could just fetch the HTML and strip out those tags dynamically.
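
    The "get the HTML for every internal page" part could look roughly like the sketch below. This is only an illustration under a few assumptions: the pages are static HTML, "internal" means "same host as the start URL", and the names LinkCollector, fetch and crawl_internal are invented here. It uses only the Python standard library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch(url):
    """Plain HTTP GET; assumes the page decodes as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_internal(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first walk over same-host links; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:  # urllib's URLError is a subclass of OSError
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            link = urljoin(url, href).split("#")[0]  # drop #fragments
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

    Each page's HTML in the returned dict can then be searched for <embed> tags. The max_pages cap is there so a typo in the start URL doesn't end up crawling a huge site.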
     
    devgank, Aug 14, 2008
  5. ngcoders

    #5
    If you want to scrape videos with embed codes, try videoswiper.com; it does more than what you want :)
     
    ngcoders, Aug 15, 2008