GoogleSitemaps-Talk

Summary: Talk page for GoogleSitemaps.
Maintainer:

This space is for User-contributed commentary and notes. Please include your name and a date along with your comment.

Comments

Here is a new version that includes changefreq and a relatively good priority rater/grader based on four aspects of a page. The grades can be changed (you will have to edit the functions themselves, so be careful), and there is also a priority bypass where you can manually set the priority for the pages you want.

googlesitemap-20180523.php


I'm so used to thinking I'll have lots of work to create something I need that I started to think the same way while using PmWiki, BUT that is not the truth. I did something like this for my Google sitemap INDEX and sitemap. The sitemap maps pages per group, while the sitemap index maps groups through one page per group (RecentChanges). With the sitemap index you get a list of groups, each with 'action=sitemap' attached, so you can fetch all the pages inside each group as a new feed.

To fetch the results for the sitemap index without using a trail, do this:

http://your-site/?group=*&name=RecentChanges&action=sitemapindex

Oh joy!

Here is the snippet:

---8x---

## Examples taken from blogger sitemap structure
# you can configure it further with pmwiki feed features
# like : group, name, list, count ...

## Sitemapindex 0.9 settings for ?action=sitemapindex
SDVA($FeedFmt['sitemapindex']['feed'], array(
  '_header' => 'Content-type: text/xml; charset="$Charset"',
  '_start' => "<?xml version='1.0' encoding='UTF-8'?>\n".
     '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'."\n",
  '_end' => "\n</sitemapindex>\n",
));

SDVA($FeedFmt['sitemapindex']['item'], array(
  '_start' => "<sitemap>\n",
  '_end' => "</sitemap>\n",
  'loc' => ($EnablePathInfo == 1) 
         ? '{$ScriptUrl}/?group={$Group}&amp;action=sitemap'
         : '{$ScriptUrl}?group={$Group}&amp;action=sitemap'
));

## Sitemap 0.9 settings for ?action=sitemap
SDVA($FeedFmt['sitemap']['feed'], array(
  '_header' => 'Content-type: text/xml; charset="$Charset"',
  '_start' => "<?xml version='1.0' encoding='UTF-8'?>\n".
     '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'."\n",
  '_end' => "</urlset>\n",
));

SDVA($FeedFmt['sitemap']['item'], array(
  '_start' => "<url>\n",
  '_end' => "</url>\n",
  'loc' => '{$PageUrl}',
  'lastmod' => '$ItemISOTime',
));

---8x---

Simple...

CarlosAB April 30, 2018, at 07:16 PM
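To see what the two feed formats above are meant to produce, here is a minimal standalone PHP sketch (not part of PmWiki; the function name `build_sitemapindex()`, the example.com URL and the group names are hypothetical) that emits a sitemap index for a list of per-group sitemap URLs. It also shows why the `loc` templates in the snippet write `&amp;` rather than a bare `&`: inside XML, the ampersand in `?group=Main&action=sitemap` has to be escaped.

```php
<?php
// Hypothetical helper: build a sitemaps.org 0.9 sitemap index from a list of
// sitemap URLs. Each URL is escaped for XML, so "&" is written as "&amp;".
function build_sitemapindex(array $sitemapUrls): string
{
    $xml = "<?xml version='1.0' encoding='UTF-8'?>\n"
         . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($sitemapUrls as $url) {
        $loc = htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= "<sitemap>\n<loc>$loc</loc>\n</sitemap>\n";
    }
    return $xml . "</sitemapindex>\n";
}

// One entry per group, like the ?group=...&action=sitemap links above
// (example.com and the group names are placeholders)
$urls = [];
foreach (['Main', 'PmWiki'] as $group) {
    $urls[] = 'http://example.com/pmwiki.php?group=' . rawurlencode($group) . '&action=sitemap';
}
echo build_sitemapindex($urls);
```

In the PmWiki snippet the same escaping is done by hand in the `loc` template, since PmWiki substitutes `{$ScriptUrl}` and `{$Group}` into the literal string.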


  • Other site maps exist, like the extension module for Firefox, which offers a navigation bar... Shouldn't your module be called GoogleSiteMap.php, with action=googlesitemap?
  • The encoding is UTF-8, while pmwiki.php uses ISO-8859-1. For "é" I get %e9, which I think is ISO-8859-1 and not UTF-8. You should probably transcode the characters.
  • The modification time is not reported yet. Have you tested with the full time format (in my case, with +02:00)?
  • For the priority, maybe we could increase it for the home page and reduce it for the RecentChanges pages.
  • Note that priority and changefreq are not mandatory. If the priority is always the same, I suggest not writing it to the file.
  • Could the priority be set by a page text variable? That is, pages without the variable would have a default priority, but page authors could mark pages higher or lower by setting the PTV.
    Ben Stallings December 11, 2007, at 03:02 PM
  • I'm using the clean URLs recipe and have run into a problem. ?action=sitemap works as it is supposed to (pages are listed as http://wiki.spounison.org/Main/Homepage), but in sitemap.xml.gz the page URLs appear as ?n=*** (e.g. http://wiki.spounison.org/pmwiki.php?n=Main.Homepage). That's my problem...
    ArSoron March 05, 2008, at 03:14 AM
It might help to set $ScriptUrl = 'http://wiki.spounison.org/'; in your configuration. Umang
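The encoding and time-format points raised in the list above can be illustrated with a small standalone PHP sketch. It assumes page names arrive as ISO-8859-1 bytes (the pmwiki.php default mentioned above); the helper names `latin1_to_utf8()` and `w3c_lastmod()` are hypothetical and not part of the recipe. Converting to UTF-8 before percent-encoding turns `é` into `%C3%A9` instead of the ISO-8859-1 `%E9`, and `gmdate('c')` produces the full W3C Datetime form with an explicit offset, which the sitemap `lastmod` field accepts.

```php
<?php
// Hypothetical helper: convert an ISO-8859-1 (Latin-1) byte string to UTF-8.
// Each Latin-1 byte equals the Unicode code point of the same value, so
// bytes >= 0x80 become a two-byte UTF-8 sequence.
function latin1_to_utf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];
        } else {
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Hypothetical helper: format a Unix timestamp as a W3C Datetime in UTC,
// with an explicit "+00:00" offset, suitable for <lastmod>.
function w3c_lastmod(int $timestamp): string
{
    return gmdate('c', $timestamp);
}

$name = "Caf\xE9";                              // "Café" as ISO-8859-1 bytes
echo rawurlencode($name), "\n";                 // Caf%E9
echo rawurlencode(latin1_to_utf8($name)), "\n"; // Caf%C3%A9
echo w3c_lastmod(0), "\n";                      // 1970-01-01T00:00:00+00:00
```

Working in UTC sidesteps the local-offset question (+02:00 or otherwise); a local time with its own offset is equally valid W3C Datetime.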
  • I had a problem: a 403 Forbidden error caused by changes made in 2.1.beta8 (see ControllingWebRobots). $RobotActions has to be extended with the sitemap action so that pmwiki.php?n=Site.AllRecentChanges&action=sitemap is accepted by Google. Ref. in robots.php: SDVA($RobotActions, array('browse' => 1, 'rss' => 1, 'dc' => 1));
    Damien July 08, 2008, at 05:03 AM
To prevent the 403 error, you have to change line 53 in pmwiki/scripts/robots.php from
 SDVA($RobotActions, array('browse' => 1, 'rss' => 1, 'dc' => 1)); 

to

 SDVA($RobotActions, array('browse' => 1, 'rss' => 1, 'dc' => 1, 'sitemap'=>1));

Umang
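Editing scripts/robots.php directly means the change is lost when PmWiki is upgraded. A possible alternative, assuming that local/config.php is loaded before scripts/robots.php and that SDVA() only fills in keys that are not already set, is to add `$RobotActions['sitemap'] = 1;` to config.php instead. The sketch below uses a stand-in `sdva()` written here to mimic that set-if-unset behaviour (it is not PmWiki's own code) to show why a key set first survives the later defaults.

```php
<?php
// Stand-in for PmWiki's SDVA(): set each default only if the key is not already set.
function sdva(array &$var, array $defaults): void
{
    foreach ($defaults as $k => $v) {
        if (!isset($var[$k])) {
            $var[$k] = $v;
        }
    }
}

// 1. What you would put in local/config.php (loaded first)
$RobotActions = [];
$RobotActions['sitemap'] = 1;

// 2. What scripts/robots.php does later: fill in its own defaults
sdva($RobotActions, ['browse' => 1, 'rss' => 1, 'dc' => 1]);

print_r($RobotActions); // sitemap, browse, rss and dc are all allowed
```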

Thank you so much for this. Trying to get an accurate sitemap for the PmWiki part of my site using Google's sitemap generator has been driving me up the freaking wall. Bing doesn't have a problem correctly indexing PmWiki with clean URLs, but no matter what I do, Google refuses to recognize the last modified date and is seemingly random about indexing the PmWiki pages or the old html pages that now redirect to the wiki.

Sitemaps are not required to be in the webroot directory. You can tell search engines where a sitemap is located either through a sitemap index (if you have more than one sitemap) or when you submit the sitemap.

Don't forget to put a Sitemap entry in your robots.txt file. The Sitemap directive is independent of the User-agent line, so it can go anywhere in the file; it looks like this:

 User-agent: *
 Sitemap: [=http://www.yoursite.com/sitemap.xml=]
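To check that a robots.txt actually advertises the sitemap, a minimal PHP sketch (the function name `robots_sitemaps()` is hypothetical) can collect every `Sitemap:` line. Because the directive is not tied to any User-agent group, the sketch scans the whole file, strips `#` comments, and matches the field name case-insensitively.

```php
<?php
// Hypothetical helper: return every URL given in a "Sitemap:" line of robots.txt.
function robots_sitemaps(string $robotsTxt): array
{
    $urls = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if (preg_match('/^sitemap\s*:\s*(\S+)/i', $line, $m)) {
            $urls[] = $m[1];
        }
    }
    return $urls;
}

$robots = "User-agent: *\nDisallow: /*action=\nSitemap: http://www.yoursite.com/sitemap.xml\n";
print_r(robots_sitemaps($robots));
```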

Google only recently explained that the higher the number, the greater the priority (the protocol allows values from 0.0 to 1.0, with 0.5 as the default). If you have a few key pages you'd like to prioritize, you can tweak those entries by hand, or even let Google's software work out the priorities (written to a separate file) and then adjust a few entries manually. I have few enough entries to create the file as plain .xml instead of .xml.gz, so it's not that big of a deal for me.
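The priority ideas raised earlier on this page (a higher value for the home page, a lower one for RecentChanges, and a per-page override from a page text variable) could be combined along these lines. This is only a sketch: the function name `sitemap_priority()`, the `$ptv` argument standing in for a page text variable, and the chosen numbers are all assumptions, not the recipe's actual grader. Values are clamped to the 0.0 to 1.0 range defined by the sitemaps.org protocol, with 0.5 as the default.

```php
<?php
// Hypothetical priority rule for one page, e.g. 'Main.HomePage'.
// $ptv is an optional per-page override (e.g. read from a page text variable);
// null means "not set".
function sitemap_priority(string $pagename, ?string $ptv = null): string
{
    if ($ptv !== null && is_numeric($ptv)) {
        $p = (float)$ptv;                  // the page author's explicit choice wins
    } elseif ($pagename === 'Main.HomePage') {
        $p = 1.0;                          // home page: highest
    } elseif (preg_match('/\.(All)?RecentChanges$/', $pagename)) {
        $p = 0.1;                          // change logs: lowest
    } else {
        $p = 0.5;                          // protocol default
    }
    $p = max(0.0, min(1.0, $p));           // valid range is 0.0 to 1.0
    return number_format($p, 1, '.', '');
}

echo sitemap_priority('Main.HomePage'), "\n";         // 1.0
echo sitemap_priority('Site.AllRecentChanges'), "\n"; // 0.1
echo sitemap_priority('Cookbook.Foo', '0.8'), "\n";   // 0.8
echo sitemap_priority('Cookbook.Foo', '7'), "\n";     // 1.0
```

If every page ends up with 0.5, the earlier suggestion applies: leave `<priority>` out entirely.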

Again - thank you so much.

Jerod Poore Crazy Meds 25 September 2011

Older Comments

  • Can the script be made to exclude password protected groups?
solved in version 1.7
  • For the frequency, I think you should write at least "hourly" for Recent Changes (Group or Main).
Actually, pages like RecentChanges are not included in the sitemap. Since the sitemap already includes change times, having RecentChanges in it is not necessary.
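Both exclusions discussed here, password-protected groups and RecentChanges-type pages, boil down to filtering the page list before it is written out. A sketch, assuming a plain array of `Group.Name` page names and a hand-maintained list of protected groups (the function name `sitemap_pages()` and both inputs are hypothetical; the recipe makes these decisions in its own code):

```php
<?php
// Hypothetical filter: drop change-log pages and pages in protected groups.
function sitemap_pages(array $pagenames, array $protectedGroups): array
{
    $keep = [];
    foreach ($pagenames as $pn) {
        // Split "Group.Name"; a name without a dot gets an empty page part
        [$group, $name] = explode('.', $pn, 2) + [1 => ''];
        if (preg_match('/^(All)?RecentChanges$/', $name)) {
            continue;  // change times are already in <lastmod>
        }
        if (in_array($group, $protectedGroups, true)) {
            continue;  // not readable by anonymous crawlers
        }
        $keep[] = $pn;
    }
    return $keep;
}

print_r(sitemap_pages(
    ['Main.HomePage', 'Main.RecentChanges', 'Site.AllRecentChanges', 'Private.Notes'],
    ['Private']
));
```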

1] It's not clear how to generate the .gz sitemap. I have set $SitemapDelay=0 and made a wiki edit, and I still don't see the file. The XML is shown correctly in the browser. I temporarily made the pmwiki directory world-writable, with no success (ref http://www.myurl.com/?action=sitemap). DaveG
2] Same issues as DaveG here: ?action=sitemap returns working XML, but I'm struggling to find out how to generate the .xml.gz file. Gilrim


Here's my hack: add a script on a Linux or OS X system as a (daily? hourly?) cron job. Say I make a bash script called "makesitemap" for each wiki on my system and put it in the webroot for the site.

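 #!/bin/bash
 # Work in the directory this script lives in (the webroot); cron starts elsewhere
 cd "$(dirname "$0")" || exit 1
 # -f: fail on HTTP errors instead of saving an error page as the sitemap
 curl -fsS -o sitemap.xml 'http://www.myurl.org/index.php?action=sitemap' || exit 1
 # -f: overwrite an existing sitemap.xml.gz without asking
 gzip -f sitemap.xml
 chmod 644 sitemap.xml.gz

gzip -f replaces the old sitemap.xml.gz; without it, gzip asks for overwrite confirmation (or refuses when run from cron), so the old file would otherwise have to be removed first.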

Now I just need a cron job to run it. Most advanced cPanel-type web hosts give you a user crontab. No, this won't work for everyone, but people worried about Google sitemaps are already getting a bit advanced :) XES


Okay... I can't run bash on my server, so I figured there had to be a way of doing the same thing with PHP. So I gleaned the net and came up with the following by hacking other people's code, because I am not a programmer by any means... ARNOLD

 <?php
 // Fetch the live sitemap from PmWiki and save it, plus a gzipped copy, next to this script
 $url = "[=http://www.myurl.org/index.php?action=sitemap=]";
 $file = "sitemap.xml";
 // Open the output file and download straight into it
 $ch = curl_init($url);
 $fp = fopen($file, "w") or
   die("Unable to open $file for writing.\n");
 curl_setopt($ch, CURLOPT_FILE, $fp);            // write the response body to $fp
 curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as a failure
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
 $ok = curl_exec($ch);
 curl_close($ch);
 fclose($fp);
 // Stop here rather than compress an empty or partial download
 if (!$ok) {
   die("Unable to fetch $url.\n");
 }
 // Gzip a file at maximum compression ("w9")
 function compress($srcName, $dstName)
 {
   $data = file_get_contents($srcName);
   $zp = gzopen($dstName, "w9");
   gzwrite($zp, $data);
   gzclose($zp);
 }
 // Compress the file
 compress($file, "$file.gz");
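If PHP's zlib functions are available, the compression step above can also be written more compactly with `gzencode()`. In this sketch the function name `write_gz_sitemap()` is hypothetical, and writing to a temporary `.tmp` file before renaming it into place is a choice made here so that a failed run never leaves a half-written sitemap behind. Fetching with `file_get_contents()` on an http:// URL additionally assumes `allow_url_fopen` is enabled; the URL is the same placeholder as above.

```php
<?php
// Write $xml to $dest as gzip (level 9). Writes a temporary file first and then
// renames it, so a crawler never sees a half-written sitemap.xml.gz.
function write_gz_sitemap(string $xml, string $dest): bool
{
    $gz = gzencode($xml, 9);
    $tmp = $dest . '.tmp';
    if ($gz === false || file_put_contents($tmp, $gz) === false) {
        return false;
    }
    return rename($tmp, $dest);
}

// Example use (needs allow_url_fopen; the URL is a placeholder):
// $xml = file_get_contents('http://www.myurl.org/index.php?action=sitemap');
// if ($xml !== false) { write_gz_sitemap($xml, __DIR__ . '/sitemap.xml.gz'); }
```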

I simply added this to my .htaccess because I've disallowed *action in robots.txt:

 RewriteRule ^sitemap\.xml$ ?action=sitemap [L]

Umang
