Millions of docs/excel pages, how to store/view online?

Discussion in 'PHP' started by gspabla, Apr 22, 2009.

  1. #1
    Hello all,
    I have a question. A friend of mine has millions of documents in .doc format.
    Each .doc file is about 20 MB and contains 5 tables,
    and he has over 2,000,000 MB of files in total (more than 2 terabytes).
    He now wants a website to store all of this data. His files consist mainly of tables.

    One solution I know of is to convert all the files to Excel format, then to HTML, and then index them all systematically. (I know this is very time-consuming.)

    But can you please suggest an easier way of doing this?
    Also, what kind of server should I use? Any other suggestions are most welcome.
     
    gspabla, Apr 22, 2009
  2. fourfingers

    #2
    You've got problems

    I would probably install Apache locally, then get PHP to zip -> upload (cURL and fsockopen allow FTP connections) -> delete (or move, if he wants to keep them) the files one at a time. All of this can be done programmatically, so you don't have to sweat any of it.
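
    The zip -> upload -> delete loop could be sketched roughly as follows. This is Python rather than PHP, and it is only a sketch under stated assumptions: `zip_upload_delete` and `make_ftp_uploader` are hypothetical helper names, and the FTP host and credentials are placeholders that do not come from this thread.

```python
import shutil
import zipfile
from ftplib import FTP
from pathlib import Path
from typing import Callable, Optional


def make_ftp_uploader(host: str, user: str, password: str) -> Callable[[Path], None]:
    """Return a function that uploads one file over FTP (placeholder credentials)."""
    def upload(path: Path) -> None:
        with FTP(host) as ftp:
            ftp.login(user, password)
            with path.open("rb") as fh:
                ftp.storbinary(f"STOR {path.name}", fh)
    return upload


def zip_upload_delete(src: Path,
                      upload: Callable[[Path], None],
                      keep_dir: Optional[Path] = None) -> None:
    """Zip one file, upload the archive, then delete or move the original."""
    archive = src.with_name(src.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname=src.name)

    # If the upload raises, the original file is left untouched.
    upload(archive)

    if keep_dir is None:
        src.unlink()                      # delete the original
    else:
        keep_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(keep_dir / src.name))  # or keep it elsewhere
    archive.unlink()                      # the local archive is no longer needed
```

    Handling one file at a time means a failed upload costs only that file, and the job can be restarted on whatever is left in the source folder.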

    It's a bigger problem if he wants all of these to be static pages that Google can index.

    I've never heard of a 2TB server so he's potentially looking at having multiple dedicated ones ... I hope he's got some cash.
     
    fourfingers, Apr 22, 2009
  3. gspabla

    #3
    But how is it possible to view a 20 MB .doc file on a website/online?
     
    gspabla, Apr 22, 2009
  4. hasanrosidi

    #4
    You can look at how scribd.com displays its documents online.
     
    hasanrosidi, Apr 22, 2009
  5. fourfingers

    #5
    Click a link and wait for the download. A 20 MB file loads in my browser in about 1 minute ... if it doesn't crash Firefox. It's probably best to offer a download link so that IE won't try to open the file locally.
     
    fourfingers, Apr 22, 2009
  6. SmallPotatoes

    #6
    Are the documents larger than they need to be? Can you reduce them to data and eliminate the Microsoft cruft that inflates the file size? It depends on what the content is like and what sort of presentation you require.
     
    SmallPotatoes, Apr 23, 2009
  7. PoPSiCLe

    #7
    More importantly, do they have to be documents? If it's possible to just store the content of the documents in one or more databases, that might be more feasible. Even so, with this amount of data, multiple dedicated servers are needed, and it won't be cheap. There is no way you can make this work on crappy hardware. I wouldn't even try to run a project like this on a home server, even if all the files were stored locally.
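
    Storing just the table content could look something like this. It is a minimal sketch in Python using the standard-library sqlite3 module, assuming each table has already been extracted from its .doc file as a list of rows. The `cells` schema and the `open_db`, `store_table` and `load_table` helpers are hypothetical, and SQLite only stands in for whatever database server a data set of this size would really need.

```python
import sqlite3

# One row per table cell, keyed by document, table number and position.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cells (
    doc_id   TEXT    NOT NULL,
    table_no INTEGER NOT NULL,
    row_no   INTEGER NOT NULL,
    col_no   INTEGER NOT NULL,
    value    TEXT,
    PRIMARY KEY (doc_id, table_no, row_no, col_no)
)
"""


def open_db(path):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_table(conn, doc_id, table_no, rows):
    """Save one table; re-running it for the same table overwrites the cells."""
    conn.executemany(
        "INSERT OR REPLACE INTO cells VALUES (?, ?, ?, ?, ?)",
        [(doc_id, table_no, r, c, value)
         for r, row in enumerate(rows)
         for c, value in enumerate(row)],
    )
    conn.commit()


def load_table(conn, doc_id, table_no):
    """Read one table back as a list of rows, in the original order."""
    cur = conn.execute(
        "SELECT row_no, value FROM cells "
        "WHERE doc_id = ? AND table_no = ? ORDER BY row_no, col_no",
        (doc_id, table_no),
    )
    rows = {}
    for row_no, value in cur:
        rows.setdefault(row_no, []).append(value)
    return [rows[k] for k in sorted(rows)]
```

    Once the cells are in a database, the pages can be generated on request instead of being stored as 20 MB files.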
     
    PoPSiCLe, Apr 23, 2009
  8. joxtechnology

    #8
    Get a web server that can handle all your files, and use the Scribd API to view them online.
     
    joxtechnology, Apr 24, 2009
  9. gspabla

    #9
    No, they don't have to be documents.
    They are just simple tables in .doc format, and we can convert them to XML or HTML.
    Is there any solution other than Scribd?
    The pages must be Google-search friendly.

    sample page: http://ecoport.org/ep?SearchType=interactiveTableView&itableId=80010
     
    gspabla, Apr 27, 2009
  10. SmallPotatoes

    #10
    Scribd is indeed an awful solution; don't waste any further time even thinking about it.

    At the very least, gzip all your HTML files. You'll save an incredible amount of space and make the web pages load about 50 times faster, and it doesn't require any advanced knowledge. The page you linked to is 420 KB of raw HTML and gzips down to 9 KB. With one gzip batch job, you can compress your 2 terabytes down to (roughly extrapolated) 43 gigabytes, which is much easier to handle.
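
    A batch job along those lines might look like the sketch below. It is written in Python rather than PHP, it assumes the documents have already been converted to .html files under one folder, and `gzip_html_tree` and `extrapolate` are hypothetical helper names. The originals are left in place so the result can be checked before anything is deleted.

```python
import gzip
import shutil
from pathlib import Path


def gzip_html_tree(root: Path) -> tuple:
    """Write a .html.gz copy next to every .html file under root.

    Returns (total_original_bytes, total_compressed_bytes).
    """
    original = compressed = 0
    for src in sorted(root.rglob("*.html")):
        dst = src.with_name(src.name + ".gz")
        with src.open("rb") as fin, gzip.open(dst, "wb", compresslevel=9) as fout:
            shutil.copyfileobj(fin, fout)
        original += src.stat().st_size
        compressed += dst.stat().st_size
    return original, compressed


def extrapolate(total_bytes: float, sample_raw: float, sample_gz: float) -> float:
    """Scale a total size by the compression ratio measured on one sample."""
    return total_bytes * sample_gz / sample_raw
```

    With the figures above, `extrapolate(2e12, 420e3, 9e3)` comes to about 4.3e10 bytes, i.e. roughly 43 GB. The web server then has to send the .gz files with a `Content-Encoding: gzip` header so that browsers decompress them.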
     
    SmallPotatoes, Apr 28, 2009