I need to get Google Sitemaps working on all my sites but don't have the time or patience to sit down and sort it all out. I also know bugger all about XML. If anyone is willing to implement it for me across 4/5 websites, please PM me with details, price, etc. I don't really want to do business with anyone carrying RED. Thanks.
How to Create a Dynamic Google SiteMap XML File

Scheduling batch jobs to generate RSS feeds and similar files such as sitemap.xml is far too complex a procedure for such a simple task, and the approach is fault-prone. It is better to implement your sitemap generator as a dynamic XML file, that is, a script that reflects the current state of your web site on each request.

After submitting a sitemap to Google, you don't know when Googlebot will find the time to crawl your web site. Most probably you'll release a lot of content changes between the resubmit and Googlebot's visit. Crawlers of other search engines may also become interested in your XML sitemap in the future. There are other advantages too, so you really should ensure that your sitemap reflects the current state of your web site every time a web robot fetches it.

You can use any file name for your sitemap. Google accepts what you submit; 'sitemap.xml' is just a default. So you can go for 'sitemap.php', 'sitemap.asp', 'mysitemap.xhtml' or whatever suits the scripting language you prefer, as long as the content is valid XML. However, there are good reasons to stick with the default 'sitemap.xml'. Here is an example for Apache/PHP:

Configure your webserver to parse .xml files for PHP, e.g. by adding this statement to your root's .htaccess file:

    AddType application/x-httpd-php .htm .xml .rss

Now you can use PHP in all .php, .htm, .xml and .rss files. http://www.yourdomain.com/sitemap.xml behaves like any other PHP script. Note: static XML files will produce a PHP error caused by the XML version header, because PHP reads the leading '<?' as an opening PHP tag when short open tags are enabled.

You don't need XML software to produce the pretty simple XML of Google's sitemap protocol. The PHP example below should be easy to understand, even if you prefer another programming language. Error handling and elegant programming were omitted to keep the hierarchical XML structure transparent and understandable.
    $isoLastModifiedSite = "";
    $newLine = "\n";
    $indent = " ";
    if (empty($rootUrl)) $rootUrl = "http://www.yourdomain.com";

    $xmlHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";
    $urlsetOpen = "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\" "
        . "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
        . "xsi:schemaLocation=\"http://www.google.com/schemas/sitemap/0.84 "
        . "http://www.google.com/schemas/sitemap/0.84/sitemap.xsd\">$newLine";
    $urlsetValue = "";
    $urlsetClose = "</urlset>$newLine";

    // Escape the characters that are special in XML (&, <, >, quotes).
    // htmlentities() would also emit HTML-only named entities such as
    // &eacute;, which are not defined in XML.
    function makeUrlString ($urlString) {
        return htmlspecialchars($urlString, ENT_QUOTES, 'UTF-8');
    }

    // Accepts 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM:SS'; values are assumed to be UTC.
    function makeIso8601TimeStamp ($dateTime) {
        if (!$dateTime) {
            $dateTime = gmdate('Y-m-d H:i:s');
        }
        if (is_numeric(substr($dateTime, 11, 1))) {
            // date and time given
            $isoTS = substr($dateTime, 0, 10) ."T" .substr($dateTime, 11, 8) ."+00:00";
        }
        else {
            // date only
            $isoTS = substr($dateTime, 0, 10);
        }
        return $isoTS;
    }

    function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) {
        global $newLine;
        global $indent;
        global $isoLastModifiedSite;

        $urlOpen = "$indent<url>$newLine";
        $urlClose = "$indent</url>$newLine";
        $locOpen = "$indent$indent<loc>";
        $locClose = "</loc>$newLine";
        $lastmodOpen = "$indent$indent<lastmod>";
        $lastmodClose = "</lastmod>$newLine";
        $changefreqOpen = "$indent$indent<changefreq>";
        $changefreqClose = "</changefreq>$newLine";
        $priorityOpen = "$indent$indent<priority>";
        $priorityClose = "</priority>$newLine";

        $urlTag = $urlOpen;
        $urlValue = $locOpen .makeUrlString($url) .$locClose;
        if ($modifiedDateTime) {
            $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose;
            if (!$isoLastModifiedSite) { // last modification of web site
                $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
            }
        }
        if ($changeFrequency) {
            $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose;
        }
        if ($priority) {
            $urlValue .= $priorityOpen .$priority .$priorityClose;
        }
        $urlTag .= $urlValue;
        $urlTag .= $urlClose;
        return $urlTag;
    }

Now fetch the URLs from your database. It's a good idea to have a boolean attribute to exclude particular pages from the sitemap. Also, you should have an indexed date-time attribute storing the last modification. Your content management system should expose the attributes ChangeFrequency, Priority, PageInSitemap and perhaps even LastModified in the user interface.

Example query:

    SELECT pageUrl, pageLastModified, pagePriority, pageChangeFrequency
    FROM pages
    WHERE pages.pageSiteMap = 1
      AND pages.pageActive = 1
      AND pages.pageOffsite <> 1
    ORDER BY pages.pageLastModified DESC

Because the rows are ordered by last modification, newest first, the first row processed sets $isoLastModifiedSite to the most recent change on the site.

Loop:

    $urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);

After the loop you can add a few templated pages/scripts that are not stored as content pages and that may or may not change whenever a page is modified:

    if (!$isoLastModifiedSite) { // last modification of web site
        $isoLastModifiedSite = makeIso8601TimeStamp(gmdate('Y-m-d H:i:s'));
    }
    $urlsetValue .= makeUrlTag ("$rootUrl/what-is-new.htm", $isoLastModifiedSite, "daily", "1.0");

Now write the complete XML. When dealing with a large number of pages, you should print the <url> tag on each iteration, followed by a flush(). If you publish tens of thousands of pages, you should provide multiple sitemaps and a sitemap index. Each sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 10MB.

    header('Content-type: application/xml; charset="utf-8"', true);
    print $xmlHeader . $urlsetOpen . $urlsetValue . $urlsetClose;

Google will process all <url> entries where the URL begins with the URL of the sitemap file. If your website is distributed over many domains, provide sitemaps per domain. Subdomains and the 'www prefix' are treated as separate domains. URLs like 'http://www.domain.us/page' are not valid in a sitemap located on 'http://domain.us/'.
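The prefix rule can be checked before a URL is written to the sitemap. What follows is only a minimal sketch and not part of the generator above: the helper name urlBelongsToSitemap and the example URLs are assumptions made for illustration, and the test is a plain string-prefix comparison against the directory that holds the sitemap file.

```php
<?php
// Hypothetical helper (not in the original generator): returns true when
// $url lies at or below the directory that contains the sitemap file.
function urlBelongsToSitemap($url, $sitemapUrl) {
    // Directory of the sitemap, e.g. "http://domain.us/" for
    // "http://domain.us/sitemap.xml".
    $base = substr($sitemapUrl, 0, strrpos($sitemapUrl, '/') + 1);
    // Plain prefix comparison: scheme, host and path must all match.
    return strncmp($url, $base, strlen($base)) === 0;
}

$sitemapUrl = 'http://domain.us/sitemap.xml';
$candidates = array(
    'http://domain.us/page',
    'http://www.domain.us/page', // 'www prefix' counts as another domain
    'http://sub.domain.us/page', // subdomain counts as another domain
);
foreach ($candidates as $url) {
    $verdict = urlBelongsToSitemap($url, $sitemapUrl) ? 'ok' : 'skip';
    echo $url, ' => ', $verdict, "\n";
}
```

A check like this could sit in the loop, just before the call to makeUrlTag, so that offsite URLs never reach the output. The comparison is case-sensitive, which is stricter than necessary for host names.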
The sitemap script's output should look something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.google.com/schemas/sitemap/0.84"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
     <url>
      <loc>http://www.smart-it-consulting.com/</loc>
      <lastmod>2005-06-04T00:00:00+00:00</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.6</priority>
     </url>
     <url>
      <loc>http://www.smart-it-consulting.com/database/progress-database-design-guide/</loc>
      <lastmod>2005-06-04T00:00:00+00:00</lastmod>
      <changefreq>monthly</changefreq>
      <priority>1.0</priority>
     </url>
     <url>
      <loc>http://www.smart-it-consulting.com/catindex.htm?node=2</loc>
      <lastmod>2005-05-31T00:00:00+00:00</lastmod>
      <priority>0.5</priority>
     </url>
     <url>
      <loc>http://www.smart-it-consulting.com/what-is-new.htm</loc>
      <lastmod>2005-06-04T08:31:12+00:00</lastmod>
      <changefreq>daily</changefreq>
      <priority>1.0</priority>
     </url>
    </urlset>
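For sites that exceed the 50,000-URL limit mentioned earlier, the split into several sitemaps plus an index might be sketched roughly as follows. This is an illustration under assumptions, not part of the original script: the function names, the 'sitemap-N.xml' naming scheme and the sample numbers are invented here, and the <sitemapindex>/<sitemap> element names follow the sitemap index format as I understand it, so check them against the protocol documentation before relying on them.

```php
<?php
// Sketch only: working out how many sitemap files are needed and writing
// an index that points at them. The file names (sitemap-1.xml, ...) are an
// assumed naming scheme, not something the protocol prescribes.

$maxUrlsPerSitemap = 50000; // per-file URL limit quoted in the text

// Number of sitemap files needed to hold $urlCount URLs.
function countSitemapFiles($urlCount, $maxUrlsPerSitemap) {
    return (int) ceil($urlCount / $maxUrlsPerSitemap);
}

// Builds the index XML listing $fileCount sitemap files under $rootUrl.
function makeSitemapIndex($rootUrl, $fileCount, $isoLastModified) {
    $xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    $xml .= "<sitemapindex xmlns=\"http://www.google.com/schemas/sitemap/0.84\">\n";
    for ($i = 1; $i <= $fileCount; $i++) {
        // Escape the URL the same way as any other text placed in XML.
        $loc = htmlspecialchars("$rootUrl/sitemap-$i.xml", ENT_QUOTES, 'UTF-8');
        $xml .= " <sitemap>\n";
        $xml .= "  <loc>$loc</loc>\n";
        $xml .= "  <lastmod>$isoLastModified</lastmod>\n";
        $xml .= " </sitemap>\n";
    }
    $xml .= "</sitemapindex>\n";
    return $xml;
}

// Example: 120,000 URLs need three files at 50,000 URLs each.
$fileCount = countSitemapFiles(120000, $maxUrlsPerSitemap);
print makeSitemapIndex('http://www.yourdomain.com', $fileCount, '2005-06-04T08:31:12+00:00');
```

Each numbered sitemap would then be produced by the same generator, restricted to its own slice of the query result (for example with LIMIT and OFFSET). Note that the sketch only counts URLs; the 10MB size limit would need a separate check.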