Extracting Source Code From Website

Discussion in 'Programming' started by Black_Ryan, Mar 4, 2010.

  1. #1
    I am building a program that extracts the meta tags from an entire website. How can I access a page's source code programmatically?
     
    Black_Ryan, Mar 4, 2010
  2. NeoCambell

    NeoCambell Peon

    #2
    NeoCambell, Mar 5, 2010
  3. Garkoni

    Garkoni Active Member

    #3
    So what is the problem? Open the remote file you need (with fopen() or a similar function), and if it's an HTML page you'll get its HTML source. Then you only need to parse the <head> section and pull out the meta tags. PHP even has a built-in function for this, get_meta_tags() (check php.net for details). Of course, that only helps if your application is written in PHP.
     
    Garkoni, Mar 5, 2010
  4. Black_Ryan

    Black_Ryan Peon

    #4
    Thank you, NeoCambell, for the link, and thank you, Garkoni, for your insight. I'm going to look into the built-in function.
     
    Black_Ryan, Mar 5, 2010
  5. QuackWare

    QuackWare Member

    #5
    QuackWare, Mar 5, 2010
  6. nvidura

    nvidura Well-Known Member

    #6
    If you are writing the program in PHP, you can use cURL.
    If you are using Java, you can read the page through a BufferedReader.
     
    nvidura, Mar 8, 2010
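A minimal Java sketch of the BufferedReader approach nvidura mentions: it opens a stream to a URL, wraps it in a BufferedReader, and collects the lines into one string of HTML source. The class name PageSource, the helpers readAll and fetch, and the placeholder URL are illustrative and not from the thread. The sketch also assumes the page is UTF-8 encoded instead of reading the charset from the server's Content-Type header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PageSource {

    // Reads all lines from the given reader and joins them with '\n'.
    // The BufferedReader (and the reader it wraps) is closed when done.
    static String readAll(Reader reader) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Downloads the raw source of a page.
    // Assumption: the page is UTF-8; a fuller version would honor the
    // charset from the HTTP Content-Type header.
    static String fetch(String address) throws IOException {
        return readAll(new InputStreamReader(
                URI.create(address).toURL().openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL, used only when no argument is given.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        System.out.print(fetch(address));
    }
}
```

After compiling, run it as `java PageSource <url>`. Keeping readAll separate from the network call means the reading logic works on any Reader (a StringReader, for example), so it can be checked without going online.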
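Garkoni describes a second step: parse the <head> section and pull out the meta tags. Below is a rough Java sketch of that step. It returns a name-to-content map, similar in spirit to what PHP's get_meta_tags() returns. It is regex-based and assumes reasonably well-formed markup with quoted attribute values. It will miss unquoted values and can be confused by a '>' inside an attribute, so a real HTML parser (such as the jsoup library) is the safer choice for arbitrary pages. The class name MetaTags and the method extract are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaTags {

    // Finds the end of the <head> section, case-insensitively.
    private static final Pattern HEAD_END =
            Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    // Matches a whole <meta ...> tag; group 1 holds its attributes.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" or attr='value' (unquoted values are not handled).
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Returns lower-cased name -> content for each <meta name=... content=...>
    // tag in the <head> section (or the whole input if </head> is missing).
    static Map<String, String> extract(String html) {
        Matcher headEnd = HEAD_END.matcher(html);
        String head = headEnd.find() ? html.substring(0, headEnd.start()) : html;

        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(head);
        while (tag.find()) {
            // Collect this tag's attributes, keyed by lower-cased attribute name.
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), value);
            }
            String name = attrs.get("name");
            String content = attrs.get("content");
            if (name != null && content != null) {
                result.put(name.toLowerCase(Locale.ROOT), content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"Description\" content=\"A sample page\">"
                + "<meta name='keywords' content='meta, tags, java'>"
                + "<meta charset=\"utf-8\">"
                + "</head><body></body></html>";
        // prints {description=A sample page, keywords=meta, tags, java}
        System.out.println(extract(html));
    }
}
```

The HTML string can come from any fetching method (cURL, fopen(), a BufferedReader). To cover an entire website, you would run this on each page you download.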