Stop harvest

Discussion in 'PHP' started by TheSyndicate, Dec 21, 2008.

  1. #1
    I have a lot of HTML pages, and I do not want people to just download them all. Is there any way I can stop it with some code in the HTML or something?

    Can I add something to robots.txt, or will that stop Google as well?

    TheSyndicate, Dec 21, 2008
  2. rene7705

    #2
    If you mean "I don't want people to spider my whole site and steal everything", that can be done by having JavaScript fill in the page after it loads. I haven't found any way yet to run JavaScript from the command line or from a script.

    But if you want to display HTML to people, they can still rip it out of their browser page by page.

    rene7705, Dec 21, 2008
  3. TheSyndicate

    #3
    You mean with a download program? They can do it anyway, right? I am not worried about someone copying it one page at a time; it is loads of pages.

    TheSyndicate, Dec 21, 2008
  4. Danltn

    #4
    @Rene
    You can with PHP, assuming you have some vital components installed.

    @Yellowberry.org - I wouldn't bother. Nothing is unrippable when it comes to HTML, and anything you try will be doomed to failure.

    Danltn, Dec 22, 2008
  5. chopsticks

    #5
    Most likely, whatever you try to do to stop people from downloading would just be a pest. It would only affect the people who are less web-savvy, whilst the people who actually want to go in and download the site would be able to bypass it straight away.

    As an example, think of right-click blockers. They are easy to bypass, and people who want to save the images will still do so with ease.

    chopsticks, Dec 22, 2008
  6. wmtips

    #6
    You can try to disallow crawling for certain downloader programs/bots that respect robots.txt. If a bot ignores robots.txt, this is useless (in that case you can try to block downloading with .htaccess by user-agent).

    For example, you can write this to robots.txt to prevent crawling by the most popular downloaders (and to save some bandwidth). Each downloader needs its own User-agent line, because robots.txt does not accept a comma-separated list of names:

    User-agent: DISCo Pump
    User-agent: Wget
    User-agent: WebZIP
    User-agent: Teleport
    User-agent: TeleportPro
    User-agent: Teleport Pro
    User-agent: WebSnake
    User-agent: Offline Explorer
    User-agent: Web-By-Mail
    Disallow: /
    For longer versions of robots.txt see this thread.
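
    The user-agent blocking mentioned above can also be done in application code instead of .htaccess. As a rough sketch only, assuming a Python WSGI app (the names BLOCKED_AGENTS, is_blocked and block_downloaders are made up for illustration), it might look something like this:

```python
# Hypothetical list of downloader names to refuse; matched case-insensitively
# as substrings of the User-Agent header.
BLOCKED_AGENTS = ("wget", "webzip", "teleport", "websnake", "offline explorer")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a blocked downloader name."""
    ua = user_agent.lower()
    return any(name in ua for name in BLOCKED_AGENTS)


def block_downloaders(app):
    """Wrap a WSGI app so that blocked user agents get a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)
    return middleware
```

    Keep in mind that the User-Agent header is trivially faked, so this only stops downloaders that identify themselves honestly.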
     
    wmtips, Dec 23, 2008
  7. TheSyndicate

    #7
    Ahh, that's good, but Google will still be OK?

    TheSyndicate, Dec 24, 2008
  8. chopsticks

    #8
    Yes.

    Whatever agents aren't in that list will be fine.
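
    One way to check this yourself is a small sketch using Python's standard urllib.robotparser module. The robots.txt text and the can_fetch_page helper below are assumptions for illustration, not the exact file from this thread:

```python
from urllib.robotparser import RobotFileParser

# Example rules (assumed): block a few downloaders, say nothing about anyone else.
ROBOTS_TXT = """\
User-agent: Wget
User-agent: WebZIP
User-agent: WebSnake
Disallow: /
"""


def can_fetch_page(user_agent: str, url: str = "/index.html") -> bool:
    """Parse ROBOTS_TXT and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Wget/1.11.4", "Googlebot"):
        print(agent, can_fetch_page(agent))
```

    With these rules, Wget should come back as not allowed and Googlebot as allowed, since no group in the file names it. This only tells you what a well-behaved bot would do; a bot that ignores robots.txt is not affected.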
     
    chopsticks, Dec 24, 2008
  9. TheSyndicate

    #9
    Oki, pretty cool. What do all these things crawl for anyway?

    TheSyndicate, Dec 29, 2008