How Much Space Will I Need?

Discussion in 'Site & Server Administration' started by AHA7, Aug 19, 2008.

  1. #1
    Hello,

    How much space will I need to make a copy of the internet's HTML documents? That is, how much space would it take to store one copy of every HTML document (including XML and XHTML) on the web, assuming I can crawl all (well, almost all) of it?

    I am talking seriously :)
     
    AHA7, Aug 19, 2008 IP
  2. ryandanielt

    ryandanielt Well-Known Member

    #2
    It depends on the average document size and how many documents you copy. A sensible approach is to start with a medium-sized allocation, say 10 GB, crawl for a while, and then use what you have measured to make a better guess, if not a firm answer.
     
    ryandanielt, Aug 19, 2008 IP
  3. LH-Danny

    LH-Danny Guest

    #3
    I'm not too good with math, but I'll get you started:
    There are currently over 20 billion pages on the web (Yahoo had indexed 19 billion by August 2005), and you'll have to allow for the sites added since then.
    Apparently the average size of an HTML file is 25 KB, and you'll also have to allow for files bigger than that.

    Multiply those two numbers together and allow for some excess; that should get you somewhere near. I think :p
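
    For anyone who wants to run the numbers, here is a rough sketch of that multiplication. The 20 billion pages and 25 KB average are just the ballpark figures quoted above; the 1.5x headroom factor is an arbitrary assumption, not a measurement, and decimal units (1 KB = 1000 bytes) are assumed throughout.

```python
# Back-of-the-envelope estimate using the rough figures quoted in this post.
PAGES = 20_000_000_000       # ~20 billion pages (ballpark figure from the post)
AVG_PAGE_BYTES = 25 * 1000   # ~25 KB average HTML file, decimal kilobytes
HEADROOM = 1.5               # assumed safety factor, not a measured value


def estimate_bytes(pages, avg_page_bytes, headroom=1.0):
    """Total bytes = page count x average page size x headroom factor."""
    return pages * avg_page_bytes * headroom


def to_terabytes(n_bytes):
    """Convert bytes to decimal terabytes (1 TB = 1000**4 bytes)."""
    return n_bytes / 1000**4


raw_tb = to_terabytes(estimate_bytes(PAGES, AVG_PAGE_BYTES))
padded_tb = to_terabytes(estimate_bytes(PAGES, AVG_PAGE_BYTES, HEADROOM))
print(f"raw: {raw_tb:.0f} TB, with headroom: {padded_tb:.0f} TB")
```

    With those inputs the raw figure comes out at about 500 TB, or 750 TB with the assumed headroom. Swap in your own page count and average size once you have crawled a sample.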
     
    LH-Danny, Aug 19, 2008 IP
  4. AHA7

    AHA7 Peon

    #4
    Let's talk more precisely: Does 10 petabytes of space seem like enough to store all the HTML documents on the web?
    If so, how much do you think 10 petabytes of storage hardware would cost?
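
    A quick way to sanity-check the 10 PB figure against the rough estimate from earlier in the thread (about 20 billion pages at about 25 KB each). This is only a sketch: the function names are made up for illustration, and `price_per_tb` is a placeholder you would fill in from a real vendor quote, since no price is given in this thread.

```python
TB = 1000**4   # decimal terabyte in bytes
PB = 1000**5   # decimal petabyte in bytes


def headroom_factor(capacity_bytes, needed_bytes):
    """How many times over the capacity covers the estimated need."""
    return capacity_bytes / needed_bytes


def hardware_cost(capacity_bytes, price_per_tb):
    """Cost of the capacity at a given price per TB (price is hypothetical)."""
    return capacity_bytes / TB * price_per_tb


needed = 20_000_000_000 * 25_000   # thread's ballpark: 20 billion pages x 25 KB
capacity = 10 * PB

factor = headroom_factor(capacity, needed)
print(f"10 PB covers the estimate {factor:.0f}x over")

# Placeholder price of 1 currency unit per TB, only to show the scaling:
# 10 PB is 10,000 TB, so the cost is 10,000 times whatever one TB costs.
unit_cost = hardware_cost(capacity, price_per_tb=1)
```

    On those assumptions 10 PB is roughly 20 times the raw estimate, so it leaves plenty of room for growth and for pages larger than average.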
     
    AHA7, Aug 19, 2008 IP
  5. Mystique

    Mystique Well-Known Member

    #5
    I have my site hosted on Servage, so I have never had to worry about hosting large amounts of data.

    That is why it's important to keep your future requirements in mind when choosing a hosting provider :rolleyes:
     
    Mystique, Aug 19, 2008 IP