50 GB of bandwidth used by 400 MB site!!! Please help me detect where the problem lies

Discussion in 'Traffic Analysis' started by wowla_123, Apr 11, 2008.

  1. #1
    Here are the current stats of my site right now:

    Size: 500 MB
    Bandwidth used this month: 50 GB
    Daily unique visitors: 100-200 visitors


    I have been allowed just 100 GB of bandwidth per month, and since other sites are hosted there too, I'm getting worried about where all this bandwidth is going. There is no Flash, video, audio, or downloadable software on the site. The site runs Joomla, with articles, a forum, a picture gallery, etc.

    One thing I want to mention is that the site was recently covered by the media, and the average visitors per day have shot up from 20-30 to 100-200. I never had this problem before.

    Can anyone recommend how I can track down this issue? I'm using Google Analytics and I have access to the raw Apache logs too. There are also 20 email users (each has 5 MB of space), but I don't see any option in Plesk to check how much bandwidth each of them is using. I have a feeling something is wrong: either some email user is misusing the account, or hackers are doing something to the site.
     
    wowla_123, Apr 11, 2008 IP
  2. wowla_123

    wowla_123 Peon

    Messages:
    147
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Here are the statistics from December 2007 to April 2008:

    Pageviews:
    [pageviews chart]


    Bandwidth:
    [bandwidth chart]


    The site has not been changed since December 2007 and its size is the same. Notice the month of April in these statistics: only a third of April has passed and the bandwidth consumed is already over 49 GB. I'm so worried. Please help me!!!
     
    wowla_123, Apr 11, 2008 IP
  3. godsofchaos

    godsofchaos Peon

    Messages:
    2,595
    Likes Received:
    124
    Best Answers:
    0
    Trophy Points:
    0
    #3
    My site is about 2.3 GB and uses roughly 400 GB/month, so don't be surprised by your site's bandwidth usage... it's pretty normal.

    However, it did jump last month... maybe you now have lots of downloads/images and such? Or people are consuming your downloads by linking straight to your files without you knowing it...

    To prevent it: log in to cPanel, open the menu called Hotlink Protection, and add yourdomain.com and www.yourdomain.com as the allowed referrers... and hopefully usage will drop back down.

    Cheers!
     
    godsofchaos, Apr 11, 2008 IP
  4. manish.chauhan

    manish.chauhan Well-Known Member

    Messages:
    1,682
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    110
    #4
    You can also reduce your bandwidth usage by simply blocking some spammy bots using robots.txt
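    For reference, a minimal robots.txt sketch for this: the file sits at the web root, and "ExampleBot" below is just a placeholder user-agent name, not a real crawler.

    ```
    # robots.txt — placed at the site root
    # "ExampleBot" is a placeholder; substitute the user-agent
    # string you see in your logs
    User-agent: ExampleBot
    Disallow: /
    ```

    Keep in mind this is only a request: well-behaved crawlers honor it, but spammy bots frequently ignore it.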
     
    manish.chauhan, Apr 14, 2008 IP
  5. wowla_123

    wowla_123 Peon

    Messages:
    147
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Thanks a lot for pointing that out; I hadn't thought of it before. From the traffic I can see that robots are consuming most of the bandwidth. For instance, today 5,490 pageviews were by robots and only 482 pageviews by normal visitors. Moreover, I installed a new tracking code and noticed something strange: apart from the Google, Yahoo, MSN and Alexa robots, the top two entries are other robots:

    1. Twiceler (+http://www.cuill.com/twiceler/robot.html)

    2. psbot/0.1 (+http://www.picsearch.com/bot.html)

    This Twiceler accounts for about 70% of the robot traffic. Does anyone know about it? It seems to be an upcoming search engine, but why is it indexing my site so aggressively compared to big players like Yahoo and Google?
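    Since you have the raw Apache logs, you can rank user agents by request count with a short pipeline. This sketch assumes the default "combined" log format, where the user agent is the third quoted field; the log path is illustrative and will differ per host.

    ```shell
    # Count requests per user agent in an Apache combined-format log,
    # most active first. Adjust the path to your own access log.
    awk -F'"' '{print $6}' /var/log/apache2/access.log \
      | sort | uniq -c | sort -rn | head -20
    ```

    To estimate bandwidth per agent instead of raw hits, the response size (the second number after the request line) can be summed the same way.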
     
    wowla_123, Apr 14, 2008 IP
  6. wilderness

    wilderness Member

    Messages:
    43
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    43
    #6
    This list is very outdated.

    The majority of the harvesters listed there are not robots.txt-compliant anyway.
    These bots are best denied access via .htaccess, which enforces the block rather than merely requesting it.
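    As a sketch, a user-agent deny in .htaccess with mod_rewrite might look like this; the matched substrings (Twiceler, psbot) are the bots named in this thread, so check your own logs for the exact strings to match.

    ```apache
    # .htaccess — return 403 Forbidden to matching user agents.
    # Requires mod_rewrite; the names below are examples from this thread.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} Twiceler [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} psbot [NC]
    RewriteRule .* - [F,L]
    ```

    Unlike robots.txt, this is enforced server-side, so it works against non-compliant bots too (as long as they don't fake their user agent).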
     
    wilderness, Apr 14, 2008 IP
  7. wilderness

    wilderness Member

    Messages:
    43
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    43
    #7
    I've had twiceler "denied" since it first appeared.

    psbot has been denied for more than nine years.
     
    wilderness, Apr 14, 2008 IP
  8. manish.chauhan

    manish.chauhan Well-Known Member

    Messages:
    1,682
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    110
    #8
    You can find some related info in this thread:
    http://forums.digitalpoint.com/showthread.php?t=377143
     
    manish.chauhan, Apr 14, 2008 IP
  9. chandan123

    chandan123 Prominent Member

    Messages:
    11,586
    Likes Received:
    578
    Best Answers:
    0
    Trophy Points:
    360
    #9
    I denied all the bots in robots.txt when I faced this type of problem.
     
    chandan123, Apr 15, 2008 IP
  10. manish.chauhan

    manish.chauhan Well-Known Member

    Messages:
    1,682
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    110
    #10
    All bots means... do you also block Google, Yahoo and MSN...??
     
    manish.chauhan, Apr 15, 2008 IP
  11. chandan123

    chandan123 Prominent Member

    Messages:
    11,586
    Likes Received:
    578
    Best Answers:
    0
    Trophy Points:
    360
    #11
    You gave that suggestion, buddy ;)

    It's like:

    User-agent: *
    Disallow: /

    Why, is anything wrong?
     
    chandan123, Apr 15, 2008 IP
  12. wilderness

    wilderness Member

    Messages:
    43
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    43
    #12
    You've denied nothing, rather, you've suggested to compliant bots to honor your requests.
     
    wilderness, Apr 15, 2008 IP
  13. chandan123

    chandan123 Prominent Member

    Messages:
    11,586
    Likes Received:
    578
    Best Answers:
    0
    Trophy Points:
    360
    #13
    :D Ok, thanks.

    That's what I did with my robots.txt :D
     
    chandan123, Apr 15, 2008 IP
  14. Ladadadada

    Ladadadada Peon

    Messages:
    382
    Likes Received:
    36
    Best Answers:
    0
    Trophy Points:
    0
    #14
    Twiceler seems to have been an "upcoming" search engine for a very long time now. They also ask that you tell them if their bot is getting out of control, as they are still working the kinks out of it. Hopefully they will have a way of letting the bot know to slow down a bit on your site.

    On a related note, I got my first ever hit from baidu.com! After the Baidu spider had been using twice the bandwidth of any other bot for months and months, I finally got someone who used their search engine and visited my site. (I'm still not sure whether the bandwidth was worth it...)
     
    Ladadadada, Apr 15, 2008 IP
  15. t2000q

    t2000q Prominent Member

    Messages:
    4,636
    Likes Received:
    192
    Best Answers:
    0
    Trophy Points:
    300
    Digital Goods:
    1
    #15
    Maybe people are using your images, pages or downloads for their own sites.
     
    t2000q, Apr 15, 2008 IP
  16. manish.chauhan

    manish.chauhan Well-Known Member

    Messages:
    1,682
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    110
    #16
    You can also save bandwidth by preventing other websites from hotlinking your images. To do this, add a .htaccess file to your image folder that blocks requests whose referer is another site:

    RewriteEngine On
    # Allow requests with an empty referer (direct visits, some proxies)
    RewriteCond %{HTTP_REFERER} !^$
    # Allow your own domain, with or without www
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/ [NC]
    # Forbid image requests from anywhere else
    RewriteRule \.(gif|jpe?g|bmp)$ - [NC,F]
     
    manish.chauhan, Apr 15, 2008 IP
  17. sergim

    sergim Peon

    Messages:
    143
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    #17
    Sorry if this thread is outdated... I've run into the same problem, with the same bot, Twiceler. Did you deny it in robots.txt? Is anyone else still dealing with this?
     
    sergim, Aug 9, 2008 IP
  18. webmasterr1

    webmasterr1 Peon

    Messages:
    18
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #18
    It is better to block it via .htaccess.
     
    webmasterr1, Aug 9, 2008 IP
  19. kewlchat

    kewlchat Well-Known Member

    Messages:
    1,779
    Likes Received:
    45
    Best Answers:
    0
    Trophy Points:
    110
    #19
    That's interesting, because I still haven't seen Cuil's bots on my server and I've submitted to them several times...

    Can I ask: did you submit any sites manually, or did they just come?
     
    kewlchat, Aug 10, 2008 IP
  20. sergim

    sergim Peon

    Messages:
    143
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    #20
    I added the site to directories (I used a service). Hard to tell how it found the site now; apparently through one of the directories...
     
    sergim, Aug 10, 2008 IP