Please help: Stripping spaces in URLs in HTML files, ready to pay

Discussion in 'Site & Server Administration' started by SumitBahl, Jun 7, 2007.

  1. #1
    Hello everyone

    I have a website. The problem is that the URLs of that website have spaces in them. For example:

    abc.com/ab cd/abc def.html
    abc.com/ab cd/abc def.jpg

    The website has a lot of wallpapers on it and more than 100,000 files in total.

    Directories, subdirectories and images all have spaces in their filenames, and therefore in their URL paths as well.

    Is there any solution to replace the spaces with either '-' or '_'?
    Can we do this in .htaccess, without changing and re-uploading the files? The website is around 12GB, so it's pretty difficult to change the filenames and the URLs in the HTML files and upload everything again.

    If you can build me a custom solution for this, I am ready to pay as well.

    Any help is appreciated.

    Thanks
    Sumit
     
    SumitBahl, Jun 7, 2007 IP
  2. deques

    deques Peon

    #2
    With PHP you can use the string replacement functions (no regular expression is needed for this):

    <?php

    $string = "This has spaces";
    echo str_replace(" ", "_", $string);

    ?>

    Of the first two arguments, the first is what you are replacing and the second is what you are replacing it with. In this case a space " " is being replaced by an underscore "_".
     
    deques, Jun 7, 2007 IP
  3. SumitBahl

    SumitBahl Reign of Chaos

    #3
    Will it work in plain HTML files?
    I think I will have to change all the file extensions to .php for this code to work, right?
     
    SumitBahl, Jun 7, 2007 IP
  4. deques

    deques Peon

    #4
    Yes, you need to rename the files.

    But from the looks of your file structure, this code won't rename the files and folders to names with underscores.

    I suggest you create a new topic and ask how to rename files and folders to names with underscores. I don't know how to do that.
     
    deques, Jun 7, 2007 IP
  5. SumitBahl

    SumitBahl Reign of Chaos

    #5
    I know how to do a mass rename of files and folders, but that's not the solution I am looking for. If I rename the files and folders, then I will have to do a search and replace in almost 35,000 files and upload a total of 125,000 files amounting to 12GB.

    I am looking for a solution around that.
     
    SumitBahl, Jun 7, 2007 IP
  6. Nintendo

    Nintendo ♬ King of da Wackos ♬

    #6
    Options +FollowSymLinks +Indexes
    RewriteEngine on
    RewriteBase /
    RewriteRule ^([^.]+)-([^.]+)/([^.]+)-([^.]+)\.html$ $1 $2 $3 $4.html [L]

    RewriteRule ^([^.]+)-([^.]+)/([^.]+)-([^.]+)\.html$ $1%20$2%20$3%20$4.html [L]

    Though I think the spaces in the first rule will keep it from working, since Apache splits a directive's arguments on unquoted spaces.
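To see what the pattern in these rules would actually capture, here is a rough Python sketch that applies the same regular expression outside Apache. The function names are made up for illustration, and it assumes the intended target keeps the slash between directory and file name (the rule as posted drops it). It is not a substitute for testing the rule in Apache itself.

```python
import re

# Same pattern as the RewriteRule above: one dash in the directory part,
# one dash in the file name part.
RULE = re.compile(r"^([^.]+)-([^.]+)/([^.]+)-([^.]+)\.html$")


def rule_target(url_path):
    """Return the path the pattern would map url_path to, or None if no match."""
    m = RULE.match(url_path)
    if m is None:
        return None
    # Assumption: the slash between directory and file is kept.
    return "{} {}/{} {}.html".format(*m.groups())


def all_dashes_target(url_path):
    """Hypothetical alternative: turn every dash into a space."""
    return url_path.replace("-", " ")


if __name__ == "__main__":
    for p in ["ab-cd/abc-def.html", "Sports-Stars/Juan-Pablo-Montoya.html"]:
        print(p, "->", rule_target(p), "|", all_dashes_target(p))
```

The pattern converts exactly one dash per path segment, so a name with three words is only partly converted. Converting every dash, on the other hand, would break any file whose real name contains a dash.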
     
    Nintendo, Jun 7, 2007 IP
  7. SumitBahl

    SumitBahl Reign of Chaos

    #7
    Thank you for replying.
    It makes sense. Let me try it out.
     
    SumitBahl, Jun 7, 2007 IP
  8. SumitBahl

    SumitBahl Reign of Chaos

    #8
    I copied the rules into a .htaccess file and uploaded it.
    It gave an Internal Server Error. :(

    This is the site in question >> 76.163.104.14

    I will change the DNS when the site is fully uploaded and functional.
     
    SumitBahl, Jun 7, 2007 IP
  9. inworx

    inworx Peon

    #9
    It may be an Apache error on your server.

    Not sure, though.
     
    inworx, Jun 7, 2007 IP
  10. SumitBahl

    SumitBahl Reign of Chaos

    #10
    I don't think so.

    Nintendo, can you give me a solution to this problem?
    Thanks in advance.

    I have changed the name servers; they should have propagated by tonight.
     
    SumitBahl, Jun 7, 2007 IP
  11. agnivo007

    agnivo007 Peon

    #11
    For a permanent solution, a Perl script can be written that reads every single .htm (etc.) file in the mentioned folder and replaces spaces with '-' wherever they occur in a URL. It's a bit complex to program, though.
     
    agnivo007, Jun 7, 2007 IP
  12. SumitBahl

    SumitBahl Reign of Chaos

    #12
    I didn't understand this point. You mean, I will have to manually do it?
     
    SumitBahl, Jun 7, 2007 IP
  13. rodney88

    rodney88 Guest

    #13
    There are two parts to changing URLs: renaming the files and updating your links with the new names.

    The mod_rewrite solution can transparently rewrite the dashes in a requested URL back to spaces, giving the effect of having renamed the files, but it won't affect the output of your pages, so the links in them will still contain spaces.

    If you already have everything matched up, i.e. the links point to the correct files and you simply want to change spaces to dashes in both the filenames and the existing links, it's a bit of a waste of time. The only possibility I can think of is forcing every request through a script that replaces all the dashes in the requested URI with spaces, looks up the corresponding file, parses its contents to replace any spaces in internal links with dashes, and outputs it.

    But that's a whole lot of unnecessary overhead - and if you were to do that, rather than processing on the fly, you'd be better off if you used a script that just systematically went through every file once, saved changes to HTML files and removed the spaces in all the filenames. It'd probably take a while to run through 12GB but at least it'd be a permanent solution as agnivo007 mentioned (although if there was a mistake in the code you could quite easily screw up your entire site).
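The link-rewriting step described here could look roughly like the Python sketch below. It is a regex-based approximation, not a real HTML parser. The name `dash_links` is made up, and it assumes links sit in quoted `href`/`src` attributes and that anything with a scheme (`http:`, `mailto:`) or a leading `//` is external and should be left alone.

```python
import re

# Matches href="..." or src="..." with either double or single quotes.
LINK_ATTR = re.compile(r"""\b(href|src)\s*=\s*(["'])(.*?)\2""",
                       re.IGNORECASE | re.DOTALL)

# A URL that starts with a scheme such as http: or mailto:
HAS_SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def dash_links(html):
    """Replace spaces (literal or %20) with dashes in internal href/src values."""
    def fix(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        # Leave external links alone; only files on this site get renamed.
        if HAS_SCHEME.match(url) or url.startswith("//"):
            return match.group(0)
        new_url = url.replace("%20", "-").replace(" ", "-")
        return "{}={}{}{}".format(attr, quote, new_url, quote)

    return LINK_ATTR.sub(fix, html)


if __name__ == "__main__":
    print(dash_links('<a href="ab cd/abc def.html">My page</a>'))
```

Only attribute values are touched, so visible text such as "My page" keeps its spaces. Unquoted attribute values and URLs built by JavaScript would be missed, which is one reason to check a sample of pages by hand before running anything like this over 35,000 files.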
     
    rodney88, Jun 7, 2007 IP
  14. plumsauce

    plumsauce Peon

    #14
    A permanent solution upfront is better than mod_rewrite all the time.

    The site will be faster because it does not have to do the mod_rewrite for each request.

    The solution is to walk the directory tree of the source HTML files, and

    for each html file,
        for each url
            find image file
            rename image file

    then,

    for each html file,
        for each url
            rewrite url.

    The reason for doing it in two passes is that if something bombs, then
    you can do a restart without problems.

    Doing the actual operation can be in your choice of programming languages.

    It may even be possible to do this using shell scripting (sed, awk, etc.).
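As a rough illustration of the rename pass, here is a Python sketch. It simplifies the outline above by renaming every file and directory whose name contains a space, rather than looking up each URL found in the HTML. The names `dashed` and `rename_tree` are made up, and using '-' as the replacement is an assumption ('_' would work the same way).

```python
import os


def dashed(name):
    """New name for a file or directory: spaces become dashes."""
    return name.replace(" ", "-")


def rename_tree(root, dry_run=True):
    """Rename every entry under root whose name contains a space.

    Walks bottom-up so a directory is renamed only after its contents.
    Returns the list of (old_path, new_path) pairs it handled.
    """
    done = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            if " " not in name:
                continue  # already clean, e.g. on a restart after a failure
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, dashed(name))
            if os.path.exists(new):
                # Refuse to clobber an existing file or directory.
                raise FileExistsError("would overwrite " + new)
            if not dry_run:
                os.rename(old, new)
            done.append((old, new))
    return done
```

Calling `rename_tree("/path/to/site")` with the default `dry_run=True` only lists what would change. Because entries without spaces are skipped, running it again after an interruption simply continues where it stopped, which is the restart property mentioned above.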
     
    plumsauce, Jun 7, 2007 IP
  15. Nintendo

    Nintendo ♬ King of da Wackos ♬

    #15
    Make a post at *gags* webmasterworld.com/apache/ Da REAL Apache king is there.
     
    Nintendo, Jun 7, 2007 IP
  16. SumitBahl

    SumitBahl Reign of Chaos

    #16
    I thought you were the Apache king. :)

    Thanks for your suggestions. If there is no way around it, I will do it manually; I have no other choice. But the problem is that the site will have been indexed by Google by then.
    I will use a 301 for the old URLs, I guess.
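The mapping behind such a 301 is simple enough to sketch in Python. This only computes where an old URL should point; wiring it into the web server is left out. The function name `redirect_for` is made up, and it assumes the new names use '-' in place of every space.

```python
from urllib.parse import quote, unquote


def redirect_for(request_path):
    """Return (301, new_path) if the old URL contains spaces, else None."""
    decoded = unquote(request_path)  # "%20" becomes " "
    if " " not in decoded:
        return None  # already a new-style URL, serve it normally
    new_path = decoded.replace(" ", "-")
    # Re-encode in case the path contains other characters that need escaping.
    return 301, quote(new_path)


if __name__ == "__main__":
    print(redirect_for("/ab%20cd/abc%20def.html"))
```

A request for an old URL gets a permanent redirect to the dashed one, so search engines that indexed the spaced URLs can follow it to the new location.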
     
    SumitBahl, Jun 7, 2007 IP
  17. inworx

    inworx Peon

    #17
    No, he is the wacko king, not the Apache king :eek:
     
    inworx, Jun 8, 2007 IP
  18. SumitBahl

    SumitBahl Reign of Chaos

    #18
    He is pretty good with .htaccess stuff.

    Nintendo, the site is live here:
    http://www.fanimages.com
    Is anything possible here, or should I ask someone else? If it's not possible, I will do it manually in a few days.
     
    SumitBahl, Jun 8, 2007 IP
  19. Nintendo

    Nintendo ♬ King of da Wackos ♬

    #19
    Ugggg....Sports%20Stars/Juan%20Pablo%20Montoya

    Is that all static, or are you using a script and mod_rewrite right now?
     
    Nintendo, Jun 8, 2007 IP
  20. SumitBahl

    SumitBahl Reign of Chaos

    #20
    That's all static. No mod_rewrite so far.
     
    SumitBahl, Jun 8, 2007 IP