Ello, I need a custom script made :)

Discussion in 'Programming' started by ly2, Nov 21, 2007.

  1. #1
    It's basically a tool that strips an article out of a given site. It removes links, banners, ads, popups, the need to click through multiple pages to read a story, etc.

    The idea is to make it so you can read an article or news story without having to deal with all the bullshit that comes with it.

    I'm sure you have all seen the URL shortener scripts. They take a long URL and make it short.

    I'm kind of going for the same thing here, only with news stories.

    Basically I want a user to be able to put the URL of a news story in the box and hit the submit button. After they hit submit, I need the script to fetch the news story / article, strip out the text and any related article images (if there are any), and re-post it on my domain under a new URL like "mydomain.com/story12345" or whatever.

    This may not make much sense; if you have any questions, please ask.
    Thanks a lot!
     
    ly2, Nov 21, 2007 IP
  2. commandos

    commandos Notable Member

    Messages:
    3,648
    Likes Received:
    329
    Best Answers:
    0
    Trophy Points:
    280
    As Seller:
    100% - 0
    As Buyer:
    100% - 0
    #2
    nice idea

    but there will be a lot of text that makes no sense (menu, footer, header, ...)

    same for images, if you grab all the images from a specific site and just post them ...
     
    commandos, Nov 21, 2007 IP
  3. bytestor

    bytestor Peon

    Messages:
    409
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    As Seller:
    100% - 0
    As Buyer:
    100% - 0
    #3
    RSS does what you describe, though in a slightly different way: it pulls the details and keeps feeding them into your site. I believe it's almost the same as what you're asking for.
     
    bytestor, Nov 21, 2007 IP
  4. commandos

    #4
    test this : HERE
     
    commandos, Nov 21, 2007 IP
  5. thenetninja

    thenetninja Peon

    Messages:
    314
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    As Seller:
    100% - 0
    As Buyer:
    100% - 0
    #5
    try this... or something along these lines...

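    // Opening a URL with fopen() requires allow_url_fopen to be enabled in php.ini.
    $f = fopen("http://www.news.com", "r");
    if ($f === false) {
        die("Could not fetch the page.");
    }
    $contents = stream_get_contents($f);
    fclose($f);
    echo strip_tags($contents);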

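    This strips all the tags from the content, but you will still get JavaScript, because strip_tags() removes the <script> tags themselves and leaves the code between them. To cut that down, run substr() on $contents (right after downloading it from the remote source) and keep only the part between <body> and </body>. Scripts placed inside the body will still come through.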
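
    A rough sketch of how the substr() step above might look. The helper names extract_body() and html_to_text() are made up for illustration, and the regex that drops <script> and <style> blocks is an extra step not mentioned in the post, so treat this as one possible approach rather than the definitive one.

    ```php
    <?php
    // Hypothetical helper: keep only what sits between <body ...> and </body>.
    function extract_body(string $html): string
    {
        $open = stripos($html, '<body');
        if ($open === false) {
            return $html; // no <body> tag found, fall back to the whole page
        }
        $start = strpos($html, '>', $open);
        if ($start === false) {
            return $html;
        }
        $start++; // move past the '>' that ends the opening tag
        $end = stripos($html, '</body>', $start);
        if ($end === false) {
            return substr($html, $start);
        }
        return substr($html, $start, $end - $start);
    }

    // Hypothetical helper: drop <script>/<style> blocks, then strip the remaining tags.
    function html_to_text(string $html): string
    {
        $body = extract_body($html);
        $clean = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $body);
        return trim(strip_tags($clean ?? $body));
    }

    $sample = '<html><head><script>var a = 1;</script></head>'
            . '<body class="x"><p>Story text.</p><script>track();</script></body></html>';
    echo html_to_text($sample), "\n";
    ```

    Even with this, menus and footers inside the body are still part of the output, so it only solves the JavaScript half of the problem.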
     
    thenetninja, Nov 21, 2007 IP
  6. ds316

    ds316 Peon

    Messages:
    154
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    As Seller:
    100% - 0
    As Buyer:
    100% - 0
    #6
    This is quite possible, although it will need site-specific processing code for each news site. Simply stripping the tags won't do a whole lot, as you are still going to be left with a bunch of links, navigation bars, etc. I've sent you a PM outlining what could be done.

    RSS feeds would also work to some extent, but they usually only show the first paragraph or so of a story.
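
    To give an idea of what site-specific processing could mean in practice, here is a minimal sketch using PHP's DOM extension. The host names, class names and the extract_article() function are all invented for illustration; each real site would need its own rule, found by looking at that site's markup.

    ```php
    <?php
    // Hypothetical per-site rules: host name => XPath of the element holding the story.
    // The hosts and selectors below are made up for illustration.
    $siteRules = [
        'news.example.com'  => '//div[@class="article-body"]',
        'daily.example.org' => '//div[@id="story"]',
    ];

    function extract_article(string $url, string $html, array $rules): ?string
    {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host) || !isset($rules[$host])) {
            return null; // no rule written for this site yet
        }

        $doc = new DOMDocument();
        libxml_use_internal_errors(true); // real-world HTML is rarely valid
        $doc->loadHTML($html);
        libxml_clear_errors();

        $xpath = new DOMXPath($doc);
        $nodes = $xpath->query($rules[$host]);
        if ($nodes === false || $nodes->length === 0) {
            return null; // the rule no longer matches the site's markup
        }
        return trim($nodes->item(0)->textContent);
    }

    $html = '<html><body><div id="nav"><a href="/">Home</a></div>'
          . '<div class="article-body"><p>First paragraph.</p></div></body></html>';
    echo extract_article('http://news.example.com/story/1', $html, $siteRules), "\n";
    ```

    The navigation links never reach the output because only the matched element is read, which is the main advantage over stripping tags from the whole page.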
     
    ds316, Nov 21, 2007 IP
  7. ly2

    ly2 Notable Member

    Messages:
    4,093
    Likes Received:
    222
    Best Answers:
    0
    Trophy Points:
    205
    As Seller:
    100% - 0
    As Buyer:
    100% - 0
    #7
    Thanks guys, I have found someone.
     
    ly2, Nov 22, 2007 IP
  8. ly2

    #8
    Not bad actually, considering you did that without even being hired. I mean, it doesn't work perfectly, but it's certainly on the right track.
     
    ly2, Nov 22, 2007 IP
  9. commandos

    #9
    Yeah, it's only about 4 lines of code :)

    There is still some stuff to remove, and the image paths need fixing: links in the source are usually relative, like "../images/img-url", and should become "domain/images/img-url".
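
    The image fix could look roughly like this. absolute_url() is a hypothetical helper, and it is deliberately simplified (it ignores ports, query strings and other corner cases), so take it as a sketch of the idea only.

    ```php
    <?php
    // Hypothetical helper: turn a relative src such as "../images/a.jpg" into an absolute URL.
    function absolute_url(string $pageUrl, string $src): string
    {
        if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $src)) {
            return $src; // already absolute
        }
        $base = parse_url($pageUrl);
        $root = $base['scheme'] . '://' . $base['host'];
        if (strpos($src, '//') === 0) {
            return $base['scheme'] . ':' . $src; // protocol-relative
        }
        if (strpos($src, '/') === 0) {
            $path = $src; // relative to the site root
        } else {
            // Relative to the directory the page lives in.
            $pagePath = $base['path'] ?? '/';
            $path = substr($pagePath, 0, strrpos($pagePath, '/') + 1) . $src;
        }
        // Collapse "." and ".." segments.
        $out = [];
        foreach (explode('/', $path) as $segment) {
            if ($segment === '..') {
                array_pop($out);
            } elseif ($segment !== '.' && $segment !== '') {
                $out[] = $segment;
            }
        }
        return $root . '/' . implode('/', $out);
    }

    // Rewrite every double-quoted <img src="..."> in a fetched page.
    $pageUrl = 'http://www.news.com/world/story.html';
    $html = '<img src="../images/photo.jpg">';
    $fixed = preg_replace_callback(
        '#(<img\b[^>]*\bsrc=")([^"]+)"#i',
        function (array $m) use ($pageUrl): string {
            return $m[1] . absolute_url($pageUrl, $m[2]) . '"';
        },
        $html
    );
    echo $fixed, "\n";
    ```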

    The last part is easy: just insert the article into the db and add a mod_rewrite rule so it is served at domain/article/id.
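
    A minimal sketch of that last step, under a few assumptions: the table layout, the article.php file name, and the save_article() / load_article() helpers are all hypothetical, and an in-memory SQLite database stands in for whatever database the real site would use.

    ```php
    <?php
    // Assumed .htaccess rule (Apache mod_rewrite) mapping /article/123 to article.php?id=123:
    //   RewriteEngine On
    //   RewriteRule ^article/([0-9]+)$ article.php?id=$1 [L,QSA]

    // In-memory SQLite keeps the sketch self-contained (needs the pdo_sqlite extension).
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE articles (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        source_url TEXT NOT NULL,
        body TEXT NOT NULL
    )');

    // Hypothetical helper: store a cleaned article and return its new id.
    function save_article(PDO $db, string $sourceUrl, string $body): int
    {
        $stmt = $db->prepare('INSERT INTO articles (source_url, body) VALUES (?, ?)');
        $stmt->execute([$sourceUrl, $body]);
        return (int) $db->lastInsertId();
    }

    // Hypothetical helper: what article.php would do with the id from the rewritten URL.
    function load_article(PDO $db, int $id): ?string
    {
        $stmt = $db->prepare('SELECT body FROM articles WHERE id = ?');
        $stmt->execute([$id]);
        $body = $stmt->fetchColumn();
        return $body === false ? null : $body;
    }

    $id = save_article($db, 'http://www.news.com/world/story.html', 'Story text.');
    echo "/article/$id\n";
    echo load_article($db, $id), "\n";
    ```

    Prepared statements are used because the stored text comes from arbitrary third-party pages and should never be concatenated into SQL.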
     
    commandos, Nov 22, 2007 IP
  10. chandubhai

    chandubhai Banned

    Messages:
    556
    Likes Received:
    27
    Best Answers:
    0
    Trophy Points:
    0
    As Seller:
    100% - 0
    As Buyer:
    100% - 0
    #10
    I am interested. Please PM me
     
    chandubhai, Nov 23, 2007 IP