search engine beware! Was: Re: [pmwiki-users] notice of current edit
Joachim Durchholz
jo at durchholz.org
Fri Apr 15 09:03:43 CDT 2005
Radu wrote:
> Me, I find that file more of a pain than a solution. Mischievous search
> engines or slurp machines (a la webzip) totally ignore these files, and
> some may even use them to get at content that's deemed a bit 'private'
robots.txt isn't a privacy mechanism; it's a mechanism for directing web
crawlers.
Say, if you have the same page in different forms (one with all bells
and whistles, another one as text-only search-engine-friendly), then
robots.txt is helpful. If you have private content, robots.txt won't
help you (only passwords will).
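To make that concrete, a robots.txt for the two-copies case might steer well-behaved crawlers away from the full-featured copy (the paths here are made up for illustration):

```
User-agent: *
Disallow: /fancy/
```

Anything under /fancy/ is skipped by compliant crawlers, while the text-only copy stays indexable - but nothing stops a misbehaving crawler from fetching /fancy/ anyway, which is the point above.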
> Since no sane individual can see two different pages in the same second,
Then count me among the insane. (Well, that might be accurate actually
*ggg*)
For example, I sometimes right-click a large bundle of links to "open
in new tab"; the click rate is about 2 or 3 per second.
> not to mention edit them, there is a way to differentiate between search
> engines and actual wiki authors: log the timestamp of the previous
> access from each IP. If it's smaller than a settable interval (default
> 2s), then do not honor requests for edit.
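For illustration, the per-IP timestamp check Radu describes might be sketched like this (a hypothetical sketch in Python, not actual PmWiki code; the 2-second default is from his proposal):

```python
import time

# Minimum interval between requests from one IP before an edit
# request is refused (Radu's suggested default: 2 seconds).
MIN_INTERVAL = 2.0

last_access = {}  # IP address -> timestamp of that IP's previous request

def allow_edit(ip, now=None):
    """Return True if this IP may be given an edit form."""
    now = time.time() if now is None else now
    previous = last_access.get(ip)
    last_access[ip] = now  # record this access for the next check
    if previous is None:
        return True
    return (now - previous) >= MIN_INTERVAL

# A crawler hitting twice within the interval is refused the second time:
print(allow_edit("10.0.0.1", now=100.0))  # True
print(allow_edit("10.0.0.1", now=100.5))  # False
print(allow_edit("10.0.0.1", now=103.0))  # True
```

As the reply below notes, this only deters crawlers that don't throttle themselves.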
Web crawlers have already counteracted that measure. wget, for example,
has options (such as --wait) to set an arbitrary interval between
requests when in "suck-the-site" mode.
> For an even stronger
> deterrent, to save processor time when the wiki is supposed to be
> hidden, we could also add an $Enable switch to keep from honoring ANY
> request to fast-moving IPs.
I don't think the problem is serious enough to warrant exclusion of
legitimate requests.
If you really want private areas, use passwords. If you want to exclude
robots from saving pages, password-protect the edit link.
To prevent robots from accidentally saving pages, make sure that the
edit pages all have noindex,nofollow set in the appropriate meta tag.
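That is, the skin template for edit pages should emit something like:

```
<meta name="robots" content="noindex,nofollow" />
```

noindex keeps the edit form out of the index, and nofollow keeps crawlers from following links out of it.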
That's enough to prevent the good guys from doing anything harmful, and
for the bad guys - well, that's wikispam, and can be fought using passwords.
Here's a feature request: if a user turns out to be a spammer, have a
function that undoes all of his edits and elides them from the page
history, too. Also, give the history pages noindex,nofollow
(otherwise, a wiki spammer would not mind having his spam removed - as
long as it's available via the page history function, it's still a link
farm and useful to him).
Regards,
Jo