Query on Non-Alphameric Characters in Mapped URLs

Discussion in 'Google Sitemaps' started by Owlcroft, Feb 1, 2006.

    I am unclear on how Google wants non-alphabetic/non-numeric characters in a URL treated in an XML sitemap. Reading what they say in their FAQs and such does not, for me, clarify anything.

    It's hard to give an example, because I don't know how any odd character I might enter in this post would appear to someone else. Well, let's try; suppose the URL is something like--

    http://mywonderfulsite.com/franCais/purE/good&bad.html

    --where the upper-case C I show is really a c-cedilla character, and the upper-case E is really an e-acute (and don't overlook the ampersand in the filename).

    Now I myself would never create such a nightmare, but if I am trying to make, say, a PHP script that handles mapping others' sites, how in heaven should it treat such stuff?

    I at least gather that the ampersand wants to become the usual &amp; substitute, and the same for quotation marks (single or double) and angle brackets.
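That reading matches the entity-escaping table in the sitemaps.org protocol, which lists five characters to escape inside a sitemap: & as &amp;, ' as &apos;, " as &quot;, < as &lt; and > as &gt;. Below is a minimal Python sketch using the standard library's xml.sax.saxutils.escape(). That function always handles &, < and >, and it accepts a dict for the two quote characters. The helper name xml_escape_url is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() always replaces &, < and >; this dict adds the two quote
# characters so all five entities in the sitemaps.org table are covered.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def xml_escape_url(url: str) -> str:
    """Entity-escape a URL for use as the text of a sitemap <loc> element."""
    return escape(url, QUOTE_ENTITIES)


print(xml_escape_url("http://mywonderfulsite.com/good&bad.html"))
# http://mywonderfulsite.com/good&amp;bad.html
```

Call it exactly once, on the unescaped URL. Because & is replaced first, running it a second time would turn &amp; into &amp;amp;.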

    Much seems to turn on their phrase "for hosting on a server that uses that encoding"--but how does one determine what encoding a server uses? The PHP environment variable "HTTP_ACCEPT_CHARSET" can--on my own server does--include both UTF-8 and ISO-8859-1 as acceptable. If that variable contains both, can one use either?
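The phrase refers to percent-encoding. A non-ASCII character is first converted to bytes in some character encoding, and each byte is then written as %XX. As a result, the same ç produces different escapes under UTF-8 and ISO-8859-1. Note that $_SERVER['HTTP_ACCEPT_CHARSET'] is filled from the Accept-Charset header sent by the visitor's browser. It therefore describes the client, not how the server names its files. The minimal Python sketch below shows the difference for the example path. It uses a hypothetical helper, percent_encode_path(). Which encoding is right depends on the target server. UTF-8 is the common default, and it is also what RFC 3987 uses when mapping IRIs to URIs.

```python
from urllib.parse import quote

# The example path with the real characters: c-cedilla (U+00E7) in the
# first directory and e-acute (U+00E9) in the second.
EXAMPLE_PATH = "/fran\u00e7ais/pur\u00e9/good&bad.html"


def percent_encode_path(path: str, charset: str) -> str:
    """Percent-encode a URL path after converting it to bytes with `charset`.

    '/' stays literal so directory separators survive; '&' is not in
    the safe set here, so it becomes %26.
    """
    return quote(path, safe="/", encoding=charset)


print(percent_encode_path(EXAMPLE_PATH, "utf-8"))
# /fran%C3%A7ais/pur%C3%A9/good%26bad.html
print(percent_encode_path(EXAMPLE_PATH, "iso-8859-1"))
# /fran%E7ais/pur%E9/good%26bad.html
```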

    Or does one, then, simply URL-escape the filename as one reads it off the local file system? Suppose the filename contains a character, such as an ampersand, that is supposed to be escaped. Should--

    some good & some bad.html

    --be rendered as
    some%20good%20&amp;%20some%20bad.html

    --or should the ampersand in the escaped entity also be URL-encoded?
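One way to settle the ordering is to percent-encode the URL first and then entity-escape the finished string for XML. Never percent-encode after entity escaping, because that would turn &amp; into %26amp%3B, which is a different URL. The Python sketch below assumes the file name is read off disk as raw bytes, for example via os.listdir(b"."). That way, quote() encodes exactly the bytes on disk without guessing a charset. The function sitemap_loc() and its safe parameter are hypothetical. By default quote() turns the ampersand into %26, which leaves nothing for the XML step to do. If the ampersand is instead left literal (it is legal in a URL path), the XML step turns it into &amp;.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Quote entities from the sitemaps.org escaping table; escape() already
# handles &, < and > on its own.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def sitemap_loc(base_url: str, filename: bytes, safe: str = "/") -> str:
    """Build a <loc> element for a file name read off disk as raw bytes.

    Assumes base_url is already a valid, percent-encoded URL ending in '/'.
    Step 1: percent-encode the raw bytes. quote() accepts bytes and encodes
            them unchanged, so the filesystem's own byte encoding is kept.
            Characters listed in `safe` are left literal.
    Step 2: entity-escape the finished URL so it is valid XML text.
    """
    url = base_url + quote(filename, safe=safe)
    return "<loc>" + escape(url, QUOTE_ENTITIES) + "</loc>"


name = b"some good & some bad.html"
print(sitemap_loc("http://mywonderfulsite.com/", name))
# <loc>http://mywonderfulsite.com/some%20good%20%26%20some%20bad.html</loc>
print(sitemap_loc("http://mywonderfulsite.com/", name, safe="/&"))
# <loc>http://mywonderfulsite.com/some%20good%20&amp;%20some%20bad.html</loc>
```

Both <loc> values typically lead to the same file, since servers generally decode %26 back to & in a path before looking the file up. In PHP, rawurlencode() (applied per path segment, since it also encodes '/') and htmlspecialchars() fill the same two roles.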

    Yechhh....
     
    Owlcroft, Feb 1, 2006