RewriteRule matching filename not URL in substitution

Discussion in 'Apache' started by brain004, Oct 27, 2012.

  1. #1
    I wrote a CGI script (in Python) that transforms XML files into XHTML for display in the client.

    It is invoked as follows: http://hostname/path/scriptname.py?p=filename.xml

    However, I much prefer a cleaner URL to be seen by the client: http://hostname/path/filename

    The solution seems to be a RewriteRule directive. Since I am using shared hosting, I do not have access to httpd.conf. However, I am able to create an .htaccess file.

    Creating such a file in the /path directory, it would seem that the correct rule is:
    RewriteRule ^(.+)$ index.py?p=$1.xml

    However, when this rule is executed for the above clean version of the path, $1 is actually set to "index.py", not "filename". This is useless, because I already know the name of the script. What I don't know is the final component of the request URL. How can I fix the rule to set $1 to "filename"?

    Thanks.
     
    brain004, Oct 27, 2012
  2. pr0t0n

    pr0t0n Well-Known Member

    Messages:
    243
    Likes Received:
    10
    Best Answers:
    10
    Trophy Points:
    128
    #2
    If I understood the problem correctly, you could add a RewriteCond one line before your RewriteRule, for example:

    RewriteCond %{REQUEST_URI} !scriptname.py

    In plain English, it tells Apache: "If the request is not for scriptname.py, apply the rule below."

    Cheers.
     
    pr0t0n, Oct 28, 2012
  3. brain004

    brain004 Peon

    Messages:
    6
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    1
    #3
    Thanks for the reply, but this wouldn't solve the problem.

    Let me clarify.

    The issue is that when the browser shows this URL

    http://hostname/path/filename

    Apache rewrites it to

    http://hostname/path/scriptname.py?p=scriptname.py.xml

    but I need it to be rewritten to

    http://hostname/path/scriptname.py?p=filename.xml

    I should also note that in my original message, there are a few appearances of "index.py", which should be "scriptname.py".
     
    brain004, Oct 28, 2012
  4. pr0t0n

    #4
    Have you tested my code?
    I believe it's because the URL is being rewritten in two steps:
    1. It rewrites /path/filename to /path/scriptname.py?p=filename.xml
    2. It then tries to rewrite the result again, since your rule matches scriptname.py as well, so /path/scriptname.py?p=filename.xml becomes /path/scriptname.py?p=scriptname.py.xml
    Theoretically it should keep repeating step 2, and perhaps stop at some point if Apache has a limit on rewrite loops.
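
The two steps above can be sketched in Python. This is only a rough model, not how Apache is implemented: the function names are made up for illustration, Python's re module stands in for Apache's regex engine, and the condition is checked against the same relative path the rule sees rather than the full REQUEST_URI.

```python
import re

SCRIPT = "scriptname.py"


def apply_rule(path, with_cond):
    """Model one evaluation of the rule against a directory-relative path.

    Returns (new_path, query_string), or None if the rule does not apply.
    """
    # RewriteCond %{REQUEST_URI} !scriptname.py  (only when with_cond is True)
    if with_cond and re.search(r"scriptname.py", path):
        return None
    # RewriteRule ^(.+)$ scriptname.py?p=$1.xml
    match = re.match(r"^(.+)$", path)
    if match is None:
        return None
    return SCRIPT, "p=" + match.group(1) + ".xml"


def simulate(path, with_cond, max_rounds=10):
    """Re-run the rule on its own output until nothing changes."""
    query = ""
    for _ in range(max_rounds):
        result = apply_rule(path, with_cond)
        if result is None or result == (path, query):
            break
        path, query = result
    return path + "?" + query if query else path


print(simulate("filename", with_cond=False))
print(simulate("filename", with_cond=True))
```

In this model, the run without the condition ends with p=scriptname.py.xml, and the run with the condition stops after the first rewrite with p=filename.xml.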

    This all assumes that your files and scriptname.py are actually in the same folder. If not, just disregard my suggestion.

    Cheers.
     
    pr0t0n, Oct 28, 2012
  5. ultimateinfotech

    ultimateinfotech Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #5
    I am using Joomla and have updated the Apache mod_rewrite module, but it is still not working on my local system (I am working on localhost).
    It always shows "URL not found". Please suggest what I should do; it is very urgent.
     
    ultimateinfotech, Oct 29, 2012
  6. brain004

    #6
    Great pr0t0n! To my astonishment your suggestion works. I never would have guessed that.

    Since no good deed goes unpunished, I have two more related problems:

    1) The statement below is meant to add a trailing slash to a URL, if it is absent, without affecting the query string.

    RedirectMatch ^([^?]*[^/])((\?.*)?)$ $1/$3

    Strangely, whenever the browser requests the root of the directory, with or without a trailing slash, it receives an HTTP 302 response pointing to "index.html/", which does not exist. There are no other redirect directives in my .htaccess files at this level or higher. When I comment out this line, the described behavior stops, but of course I no longer get the desired redirect from "additional/path/info" to "additional/path/info/".

    2) The rule below is intended to detect a virtual file in a virtual subdirectory called "pages". This virtual view is handled by a script called index.py, which determines the virtual filename using the PAGE environment variable. The rewrite is meant to leave the query string intact.

    RewriteRule ^pages/([^?/]*)/((\?.*)?)$ index.py$3 [E=PAGE:$1,L]

    Again, oddly, at least to me, no environment variable is set. I have tried searching for all variables with "PAGE" in their names in the target script, but none are found. It is as though no E option were ever given. I tried changing the name of the variable, and using a static value, but still no effect.
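
A check like the one described above could be sketched in Python as follows. find_page_vars is a hypothetical helper name, not taken from the actual script. Matching on a substring rather than the exact name also catches prefixed variants such as REDIRECT_PAGE, which Apache can produce when a variable passes through an internal redirect.

```python
import os


def find_page_vars(environ=None):
    """Return every environment variable whose name contains 'PAGE'.

    Pass a dict to inspect something other than the real process environment.
    """
    if environ is None:
        environ = os.environ
    return {
        name: value
        for name, value in environ.items()
        if "PAGE" in name.upper()
    }


if __name__ == "__main__":
    # In a CGI script this would go to the response body or a log file.
    for name, value in sorted(find_page_vars().items()):
        print(name + "=" + value)
```

An empty result here would match what is described above: no variable with "PAGE" in its name reaches the script.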
     
    brain004, Oct 29, 2012
  7. brain004

    #7
    OK, I came to realize that Apache handles query strings separately from the main part of the URL, so I don't have to worry about them. I reduced my .htaccess to a few lines, but it still does not do what it looks like it should.

    RewriteRule ^pages/([^/]+)/$ index.py [E=PAGE:$1,L]
    RewriteRule ^pages/$ pages/index/
    RewriteRule ^$ pages/

    RedirectMatch ^(.*[^/])$ $1/


    Commands 2 and 3 work properly. However, command 1 still fails to set the environment variable (as confirmed by dumping a list of set variables in the script), and command 4 does not simply add a trailing slash as it should.
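
What the pattern in command 4 matches on its own can be checked quickly in Python. This is only a sketch: Python's re module stands in for Apache's regex engine, the sample paths are made up, and it says nothing about which paths Apache actually tests the pattern against.

```python
import re

# Pattern and target taken from: RedirectMatch ^(.*[^/])$ $1/
PATTERN = re.compile(r"^(.*[^/])$")


def redirect_target(url_path):
    """Return the redirect target for url_path, or None if the pattern does not match."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    return match.group(1) + "/"


for sample in ["/path/", "/path/additional/info", "/path/index.html"]:
    print(repr(sample), "->", repr(redirect_target(sample)))
```

Paths that already end in a slash are left alone, but every other path matches, including one that ends in a real file name such as index.html. If the pattern were ever tested against such a path, the target would be "index.html/", which would be consistent with the redirect described earlier. That is a guess, not something confirmed here.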

    I am at my wit's end, so please help.
     
    brain004, Oct 30, 2012
  8. pr0t0n

    #8
    I'm not sure about that [E=PAGE:$1] part. At least, I have never used it that way. Can't you just pass it as a simple query parameter? Like:

    RewriteRule ^pages/([^/]+)/$ index.py?page=$1 [L]

    Also, your second line may conflict with the first one: pages/ gets rewritten to /pages/index/, and then the first rule applies and rewrites /pages/index/ to index.py?page=index. Perhaps you want to change something there.

    Cheers.
     
    pr0t0n, Oct 30, 2012
  9. brain004

    #9
    With the exception of the environment variable not being set, the rewrite rules work correctly. It's a cascading scheme. Suppose the browser navigates to /. Then the third rule rewrites to the second which rewrites to the first. This is all fine.
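
The cascade just described can be traced with a small Python model of the three rules. It is a sketch under assumptions: the rule set is re-run on its own output until the path stops changing, Python's re module stands in for Apache's regex engine, and the function names are invented for illustration.

```python
import re

# (pattern, substitution, has_L_flag), in the same order as in the .htaccess
RULES = [
    (r"^pages/([^/]+)/$", "index.py", True),   # also carries E=PAGE:$1
    (r"^pages/$", "pages/index/", False),
    (r"^$", "pages/", False),
]


def run_ruleset(path):
    """One top-to-bottom pass over the rules; returns (new_path, page_or_None)."""
    page = None
    for pattern, target, is_last in RULES:
        match = re.match(pattern, path)
        if match:
            if match.groups():
                page = match.group(1)  # value that E=PAGE:$1 would carry
            path = target
            if is_last:
                break
    return path, page


def cascade(path, max_rounds=10):
    """Repeat passes until the path is stable; returns (list_of_paths, page)."""
    steps = [path]
    page = None
    for _ in range(max_rounds):
        new_path, new_page = run_ruleset(path)
        if new_page is not None:
            page = new_page
        if new_path == path:
            break
        path = new_path
        steps.append(path)
    return steps, page


print(cascade(""))
```

In this model a request for the directory root passes through pages/ and pages/index/ before ending at index.py, with "index" as the captured page name.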

    As far as using a query string variable, I'm not sure how well this would work because if there is already a query string I think there would be a conflict.

    But the big problem is that when I add the redirect rule, very strange things begin to happen, like redirection to a nonexistent index.html.

    This is where I really need help.
     
    brain004, Oct 30, 2012