Scraper
Questions answered by this recipe
How can I have PmWiki check the content of an external web page?
Description
The goal of this recipe is to create a markup -- maybe (:Scrape :), but I haven't decided yet -- that would capture the content of an external web page. The captured result could then be examined and used in conjunction with conditional markups to determine what is displayed on a Wiki page.
As an example, the organization where I work has an automatic mechanism for blocking rogue workstations from accessing the network. If a machine makes too many network requests or shows other rogue activity (usually due to malware), it is automatically blocked. Unfortunately, there is no automatic notification when a machine is blocked. Instead, a web page lists the IP and MAC addresses of the misbehaving systems. Using this recipe, I should be able to create a Wiki page that checks for my group's IP range within the content of the master block web page. If none of our machines are found, a box is displayed with a green background and a message that all is okay. If, however, one or more of my group's machines has been blocked, a box with a red background is displayed containing only the IDs and other information of my group's systems, as captured from the master block web page.
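Very roughly, I imagine registering the markup along these lines (nothing here is final: the directive name, the ScrapeMarkup() handler, and its arguments are only placeholders, and the example assumes a PmWiki version whose Markup() accepts a callback as the replacement argument):

    ## Hypothetical sketch only -- neither the directive name nor the
    ## handler below is final.
    Markup('scrape', 'directives',
      '/\\(:Scrape\\s+(.*?)\\s*:\\)/i',
      'ScrapeMarkup');

    function ScrapeMarkup($m) {
      $args = ParseArgs($m[1]);   # PmWiki helper: parses url="..." find="..." etc.
      ## ...fetch the page, test for the predefined string,
      ## and return the markup to display...
      return Keep("(results for {$args['url']} would go here)");
    }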
Notes
Your web host will likely need to allow outgoing HTTP requests.
I've only just begun creating and testing this recipe. Nothing is ready for posting yet.
My primary goal is to produce a true/false result indicating whether a predefined string is found in the content of the scraped web page. A secondary goal, if the result is true, is to capture all the content between a second and a third predefined string. This subset of the full content would then be available for display on the Wiki page.
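In plain PHP, that matching logic might look something like the following (the marker strings and $content are only placeholders; in the recipe, $content would hold the text of the scraped page):

    ## Hypothetical sketch: $content holds the full text of the scraped page,
    ## and the three strings below stand in for the "predefined" strings.
    $find  = '10.20.30.';            # primary goal: is this string present?
    $start = '<!-- begin list -->';  # secondary goal: capture everything
    $end   = '<!-- end list -->';    # between these two markers

    $found  = (strpos($content, $find) !== false);
    $subset = '';
    if ($found) {
      $a = strpos($content, $start);
      if ($a !== false) {
        $a += strlen($start);
        $b  = strpos($content, $end, $a);
        if ($b !== false) $subset = substr($content, $a, $b - $a);
      }
    }
    ## $found would drive a conditional markup (green box vs. red box);
    ## $subset would supply the captured text for display.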
The functionality of this markup is provided by class_http.php, an excellent PHP "screen-scraping" utility written by Troy Wolf.
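Roughly, I expect the recipe to use it along these lines (the fetch() method and body property shown here are assumptions based on Troy Wolf's published examples and may not match the actual class_http interface; the URL is a placeholder):

    ## Rough idea only -- method and property names are assumptions.
    require_once('class_http.php');

    $h = new http();
    if ($h->fetch('http://netadmin.example.org/blocked.html')) {
      $content = $h->body;   # page text, handed to the matching logic above
    } else {
      $content = '';         # fetch failed; treat as "nothing found"
    }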
Installation
- Nothing available yet!
Usage
Release Notes