I need to extract URLs from a piece of HTML code, but how?

Discussion in 'Programming' started by xthoms, Nov 21, 2010.

  1. #1
    Let's say I have some code like this

    Code:
    <a target="_blank" href="http://link.com">
    abc</a> </p>

    But of course I have a lot more. How can I automatically extract the URLs?
    I know the task is simply to return every value between href=" and the closing ", but I have no idea what code I could use for it.
     
    xthoms, Nov 21, 2010 IP
  2. Deacalion

    #2
    You could use regular expressions. You will almost certainly come across some HTML that won't fit the regex, though :).

    For fast development (but it might break easily): regular expressions
    For bulletproof code (but it's harder to learn and takes longer to write): XPath or a selector library like phpQuery
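
    To make the trade-off concrete, here is a minimal sketch of both routes. It is written in Python rather than PHP purely for illustration, and it uses only the standard library's HTMLParser instead of XPath or phpQuery, so treat it as a stand-in for those tools. SAMPLE_HTML, hrefs_by_regex, HrefCollector and hrefs_by_parser are made-up names, not part of any library.

```python
import re
from html.parser import HTMLParser

# Sample input modelled on the snippet from post #1.
SAMPLE_HTML = '<a target="_blank" href="http://link.com">\nabc</a> </p>'


def hrefs_by_regex(html):
    """Quick route: grab whatever sits between href=" and the next double quote."""
    return re.findall(r'href="([^"]*)"', html)


class HrefCollector(HTMLParser):
    """Sturdier route: let an HTML parser find the <a> tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hrefs_by_parser(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


print(hrefs_by_regex(SAMPLE_HTML))
print(hrefs_by_parser(SAMPLE_HTML))
```

    The regex version only matches double-quoted href attributes, which is one example of HTML that "won't fit": a link written as href='...' is missed by the pattern but still picked up by the parser.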


    Edit: sorry, I just came out of the PHP forum and assumed you were talking about PHP.
     
    Deacalion, Nov 21, 2010 IP
  3. xthoms

    #3
    Thanks for your reply.
    Well, I just need what I wrote above, because there's only one site I need to be able to do it from.

    It's a site that checks backlinks, and I need the URLs by themselves, like

    www.link1.com
    www.link2.com
    www.link3.com

    I don't think it's a hard task, as I basically just need somewhere I can paste the code, and have it find all the values that are in between href=" and "> and then echo them. But I'm no programmer, so it's not that easy.
     
    xthoms, Nov 21, 2010 IP
  4. SterlingS

    #4
    SterlingS, Nov 21, 2010 IP
  5. xthoms

    #5
    Your tool doesn't work :/ but some other guys made a script that works perfectly fine, so I don't need it anymore :)
     
    xthoms, Nov 21, 2010 IP
  6. matessim

    #6
    You can make a scraper in a multitude of languages. You could use Java + regex (the easiest way to search, IMO) to do it easily.
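
    As a rough illustration of the regex route, here is a small sketch that prints the bare hosts one per line, the way post #3 asks for. It uses Python instead of Java only to keep it short; the same pattern should carry over to Java's regex classes. extract_hosts and pasted_html are hypothetical names, and skipping duplicates and relative links is my own assumption about what is wanted.

```python
import re
from urllib.parse import urlsplit


def extract_hosts(html):
    """Return the host part of every double-quoted href, in order of first appearance."""
    hosts = []
    for url in re.findall(r'href="([^"]*)"', html):
        host = urlsplit(url).netloc
        # Assumption: relative links (no host) and repeated hosts are not wanted.
        if host and host not in hosts:
            hosts.append(host)
    return hosts


# Stand-in for the HTML copied from the backlink checker page.
pasted_html = """
<a target="_blank" href="http://www.link1.com">abc</a>
<a target="_blank" href="http://www.link2.com/page?x=1">def</a>
<a href="/about">relative link, skipped</a>
<a target="_blank" href="http://www.link1.com/other">duplicate host, skipped</a>
"""

for host in extract_hosts(pasted_html):
    print(host)
```

    Paste the real page source in place of pasted_html. If the hrefs on that site use single quotes or no quotes at all, the pattern would need adjusting.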
     
    matessim, Nov 22, 2010 IP