WikiFilter (2.3)

Download WikiFilter 2.3 from SourceForge

Updates

2006/5/9: The new version (2.3) of WikiFilter and WikiIndex is released.

Major improvement

Download Wiki database files

Currently, the English version dump files can be downloaded at the following links.

Wikipedia | Wikibooks | Wikiquote | Wikisource | WikiDictionary | Wiki commons | Meta | Species | Wikinews

Contents

  1. About WikiFilter
  2. A quick guide to browsing local wiki dump files with Apache
  3. Install WikiFilter
  4. Install database files
  5. Update to a newer data-base file
  6. Delete a data-base file or language
  7. Using WikiFilter
  8. Uninstall
  9. How-to
  10. Filter and the server

About WikiFilter

WikiFilter is both a wiki text parser and a web filter. It converts wiki text stored in a wiki data-base (dump) file into html text, and sends the html text to the web browser. It requires:
  1. a local XML data-base file (downloaded from the wiki download site).
  2. a web server, either Microsoft IIS (available on Windows XP Professional, among others) or the free web server Apache.
  3. a web browser such as Internet Explorer or Mozilla Firefox.
  4. Microsoft Windows.
WikiFilter works in a way broadly similar to MediaWiki.

Major differences include:

Main features include:

Screen shots of (partial) wiki pages WikiFilter delivered to FireFox

Wikipedia article Attack on pearl harbor Wikipedia article Saving private ryan
Wikipedia article Golden eagle

A page from the Wikisource
A random page from the Chinese Wiktionary

A page from the Japanese Wikiquote

A quick guide to browsing local wiki XML dump files with Apache

The executable for Apache is the module file WikiFilter.so.
  1. Download the XML dump files from the wiki download site.
  2. Run WikiIndex.exe to make index files for all of the dump files. For English, just drop the dump file into the WikiIndex window, press the "Start" button, and wait a few minutes. (Yes, even the 3GB English Wikipedia takes only about 10 minutes on a Pentium4 machine.) Repeat for each dump file.
  3. Download and install the Apache web server if you have not done so. Otherwise, stop the Apache service if it is running.
  4. Tell Apache where to find WikiFilter. For example, if you put WikiFilter.so here
    C:\Program Files\WikiFilter\WikiFilter.so
    just add the following line to Apache's main config file "httpd.conf"
    LoadModule WikiFilter_module "C:/Program Files/WikiFilter/WikiFilter.so"
    Note that Apache uses the web-style directory delimiter "/", not "\".

    For a default install, the main config file "httpd.conf" is here

    C:\Program Files\Apache Group\Apache2\conf\httpd.conf
  5. Start Apache
  6. Click http://localhost/wiki/
  7. The Main page of the first installed Wikipedia will show up in your browser
If you will be using Apache mainly for viewing wiki pages through WikiFilter, you may comment out some of the "LoadModule" lines (by typing a "#" at the beginning of the line), because WikiFilter does not depend on any of these external modules.

Also, change "ThreadsPerChild" to a smaller number, such as 10. By default, Apache runs 250 threads for serving web requests. If your machine is not a "real" server handling external requests, that is simply a waste of system resources.
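
The path conversion mentioned in step 4 is easy to get wrong by hand. The following is a minimal Python sketch, not part of WikiFilter, that builds the "LoadModule" line from a Windows path; the function name load_module_line is made up for this illustration.

```python
from pathlib import PureWindowsPath


def load_module_line(module_path: str, module_name: str = "WikiFilter_module") -> str:
    """Build an httpd.conf LoadModule line from a Windows-style path.

    load_module_line is a hypothetical helper for illustration only.
    """
    # as_posix() replaces "\" with "/", the delimiter Apache expects.
    web_style_path = PureWindowsPath(module_path).as_posix()
    return f'LoadModule {module_name} "{web_style_path}"'


print(load_module_line(r"C:\Program Files\WikiFilter\WikiFilter.so"))
```

The printed line can be pasted into "httpd.conf" as it is.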

Install WikiFilter for Microsoft IIS

If you have Windows XP Professional and prefer Microsoft IIS to Apache, you can set up WikiFilter as a filter on your local web. The executable for Microsoft IIS is WikiFilter.dll.

  1. IIS is not installed when you install Windows XP, so it has to be installed manually. If you have not done so, open Control Panel, click "Add or Remove Programs", and click the "Add/Remove Windows Components" icon to the left of the list window. Then, in the Wizard window, check "Internet Information Services (IIS)" and click "Next". IIS will be installed to "C:\Inetpub", and a shortcut "Internet Information Services" will be placed on the Desktop.

  2. Run "Internet Information Services" from the Desktop. Browse to "Default Web Site", right click it to bring up a quick menu, and click "Properties".

    Click the "ISAPI Filters" tab, then click "Add" to bring up a dialog box.

  3. In the "Filter Properties" dialog box, type any name for "Filter Name", and type the path and file name of WikiFilter.dll or "Browse" to it. For example,
    C:\Program Files\WikiFilter\WikiFilter.dll
    if WikiFilter.dll is there. Clicking "OK" will add WikiFilter.dll to be a filter on the "Default Web Site".
  4. Go back to "Default Web Site Properties" and click the "Home Directory" tab. Click the "Configuration" button. In the "Application Configuration" dialog box, make sure that the check-box "Cache ISAPI applications" is checked (it is checked by default). This prevents the filter from being loaded and unloaded for each web request, which is extremely inefficient.

Install database files

WikiFilter uses an index file compiled by WikiIndex to access the data-base articles. The index file is placed in, and must stay in, the same directory as the data-base file. The two files are found by looking up the system registry.

The index file is named by appending "_index.art" to the original data-base file name. For example, a data-base file
en-20050713-pages_current.xml
will result in an index file
en-20050713-pages_current_index.art
The index file will be stored in the same directory as the original data-base file, and they must be in the same place for WikiFilter to find them.
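
As a quick illustration of the naming rule, here is a minimal Python sketch (not WikiFilter's own code) that derives the index file path from a data-base file path. The helper name index_path_for is an assumption made for this example.

```python
from pathlib import PureWindowsPath


def index_path_for(dump_path: str) -> str:
    """Return the index file path that sits next to a data-base file.

    index_path_for is a hypothetical helper; it only applies the
    "_index.art" naming rule described above.
    """
    dump = PureWindowsPath(dump_path)
    # Drop the ".xml" extension, append "_index.art", keep the directory.
    return str(dump.with_name(dump.stem + "_index.art"))


print(index_path_for("en-20050713-pages_current.xml"))
print(index_path_for(r"C:\LocalWiki\English\en-wikipedia_20050713.xml"))
```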

After the index file is made, WikiIndex will write the original data-base file name with full path to the system registry, under the key

HKEY_LOCAL_MACHINE\SOFTWARE\Wikifilter
WikiFilter locates the data-base file and its index file by looking up that key in the registry. So do not move the data-base file to another directory after indexing is done. Otherwise, you will have to manually edit the system registry (though it's not as difficult as you might think).

How to make an index

Download the data-base file from the wiki download site. Note that the data-base file must be a file with the extension of ".xml". The old "cur_table.sql" dump files are not supported. Also note that you should download the file containing a date (the newest one is usually the same as the one that does not contain a date). If that file is not available, you should rename the downloaded file by adding a date (see Note below).

If you want to install several languages for use with WikiFilter, each having wikipedia, wikibooks, wikiquote, etc., you may find it convenient to make a root directory like
C:\LocalWiki
and further make sub-directories for each language like
C:\LocalWiki\English
C:\LocalWiki\French
Rename and put all data-base files for a language under that sub-directory like
C:\LocalWiki\English\en-wikipedia_20050713.xml
C:\LocalWiki\English\en-wikibooks_20050713.xml

C:\LocalWiki\French\fr-wikipedia_20050713.xml
C:\LocalWiki\French\fr-wikibooks_20050713.xml
After all indexing is done, it will be like
C:\LocalWiki\English\en-wikipedia_20050713.xml
C:\LocalWiki\English\en-wikipedia_20050713_index.art
C:\LocalWiki\English\en-wikibooks_20050713.xml
C:\LocalWiki\English\en-wikibooks_20050713_index.art

C:\LocalWiki\French\fr-wikipedia_20050713.xml
C:\LocalWiki\French\fr-wikipedia_20050713_index.art
C:\LocalWiki\French\fr-wikibooks_20050713.xml
C:\LocalWiki\French\fr-wikibooks_20050713_index.art

Note
When renaming the data-base file, keep the date of the file as it is (4-digit year + 2-digit month + 2-digit day), or if there is none, add it. This is because some articles require the correct date (like "Today's featured article" on the main page), and WikiFilter translates the date in the data-base file name into the actual title name.
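
To check that a renamed file still carries a usable date, something like the following Python sketch can help. It is only an illustration of the 8-digit date convention; how WikiFilter itself parses the name is not documented here, and the function name dump_date is hypothetical.

```python
import re
from datetime import date
from typing import Optional


def dump_date(file_name: str) -> Optional[date]:
    """Extract a YYYYMMDD date from a data-base file name, if present.

    dump_date is a hypothetical helper, not WikiFilter's parser.
    """
    # Look for exactly 8 digits that are not part of a longer digit run.
    match = re.search(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)", file_name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


print(dump_date("en-wikipedia_20050713.xml"))
print(dump_date("pages_current.xml"))
```

A result of None means the file should be renamed to include a date before indexing.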

Run WikiIndex.

  1. Type the file name with full path of the downloaded data-base file in the text-box "Data-base file". Or simply drag and drop it onto the main window.

    If you type the name, press "Load" to display the basic information in the text-box "Data-base file info". If you drag & drop, it will be automatically displayed.

    The info contains:

    • 2-character-long language name ("en" for English, "ja" for Japanese, etc.).
    • Site (the wiki project name for this data-base file, like "wikipedia", "wikibooks", "wikiquote", etc.).
    • Main page name (for English Wikipedia, Wikibooks, Wikiquote, etc., it is normally "main_page". For other languages, it is normally in the local language. The page is displayed when you visit this site without a specific title).
    • Generator (data-base file exporter like "mediawiki 1.5beta3").

    The info is used internally by WikiFilter to identify the data-base file (this is the major reason why the old ".sql" dumps are not supported).

  2. Set "Index-word length".

    For English and other languages that use white space characters to separate words, check "Auto". For others, like most Eastern languages, you can also use "Auto", but then the index word can be as long as the whole title. It still works fine if the data-base file is not as huge as the English wikipedia.

    The better way for these languages is to specify a fixed length (in bytes) for index words. For this, you need to know how your chosen language is encoded in UTF-8.

    For example, in Chinese and Japanese, a character is normally 3 bytes long. So if you want the index word to be 1 character long, type 3 in the text-box; for 2 characters, type 6, and so on. This is equivalent to indexing English titles by their first 1 or 2 letters. A small number results in a small index and low memory use, but slightly slower article lookup.

    To type a number, you must first uncheck "Auto" to enable the text-box to the left.

    Index word is simply the first word of the article title. For example, "Adam", "Adam Smith", "Adam's book", "Adam, the author of the book" all result in the same index word "Adam". For the current English wikipedia (en-20050713-pages_current.xml), the titles of the article name space alone are about 20MB.

    The simplest and fastest way to find an article is to pre-load all sorted titles into memory and do a binary search (about 20 string comparisons for over 1 million titles including redirects). But the memory use is obviously prohibitive because, along with the 20MB of titles, you also need to load additional information about each title, such as its position and text length in the 3GB data-base file. With this additional information, the titles of the article name space take around 35MB in the WikiFilter index file.

    By using index words, however, the 35MB is reduced to less than 3MB. The cost is a little slower speed in finding the title. Now, all the titles starting with the same index word must be read from the index file (roughly 10-100 on average) before a binary search can be performed.

  3. Select "Site". For English and other languages that use English site names, the site will be automatically selected for you once you drop the file in. For those that use non-English names, you will be prompted to select the correct site; failing to do so will cause WikiFilter to use the wrong file.
  4. Press the "Start" button, and confirm the message box.

    The speed at which WikiIndex builds an index data-base may surprise you if you have previously tried others like MediaWiki that use MySQL.

    For a huge data-base file, it mainly depends on your hard-drive's speed, because most of the time will be spent reading that file. As an example, on a 2.4GHz Pentium4 with 256MB memory, the 3GB English wikipedia takes around 10 minutes to complete, while a 500MB Japanese wikipedia takes about 1 minute.
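
The index-word idea from step 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not WikiFilter's implementation: the function names are made up, "Auto" is approximated as the leading run of word characters, and an in-memory dictionary stands in for the part of the index file that would really be read from disk.

```python
import bisect
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def index_word(title: str, byte_length: Optional[int] = None) -> str:
    """Return the index word for a title (hypothetical helper).

    byte_length=None stands in for "Auto": the leading run of word
    characters. Otherwise the first byte_length bytes of the UTF-8
    form of the title are used.
    """
    if byte_length is None:
        match = re.match(r"\w+", title)
        return match.group(0) if match else title
    # errors="ignore" drops a character that the cut splits in half.
    return title.encode("utf-8")[:byte_length].decode("utf-8", errors="ignore")


def build_buckets(titles: Iterable[str],
                  byte_length: Optional[int] = None) -> Dict[str, List[str]]:
    """Group titles by index word and sort each group."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        buckets[index_word(title, byte_length)].append(title)
    for group in buckets.values():
        group.sort()
    return buckets


def find_title(buckets: Dict[str, List[str]], title: str,
               byte_length: Optional[int] = None) -> bool:
    """Pick the bucket by index word, then binary-search inside it."""
    group = buckets.get(index_word(title, byte_length), [])
    pos = bisect.bisect_left(group, title)
    return pos < len(group) and group[pos] == title


titles = ["Adam", "Adam Smith", "Adam's book", "Golden eagle"]
buckets = build_buckets(titles)
print(sorted(buckets))                     # the index words kept in memory
print(find_title(buckets, "Adam Smith"))   # searched only within "Adam"
print(index_word("日本語", 3))              # 1 character = 3 bytes in UTF-8
```

Only the index words need to stay in memory; each group of titles is loaded when its index word is hit, which is the trade-off between memory and speed described above.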

In short, if you are doing an English data-base dump, just drop the file in, press the "Start" button, wait several minutes, and you are done.

Different languages are added in the order you install them. For example, if you first install the English Wikipedia and then the French one, the first and default language is English.

You can install up to 20 languages, and for each language, you can install all available project dump files (9 for English, 5 for others; see Wiki sites and projects).

Update to a newer data-base file

All wiki data-base dump files are updated frequently. To update for WikiFilter, simply download the new file, and run WikiIndex to compile a new index for the new data-base file.

Note: If you have installed WikiFilter with a web server, you should unload WikiFilter by stopping the web server before compiling an index for two reasons:

  1. Save some memory for indexing.
  2. WikiFilter will reload registry information for the new index when you restart the server.
Also, you should stop the web server before deleting the old data-base file and its index file (or overwriting them if the old and new files have the same name); otherwise, you may not be able to delete or overwrite them because WikiFilter may be using them.

Because the index file records the position of each article in the dump file, you should never use an old index file with a new dump file.

After indexing, the new data-base file along with the new index will be automatically loaded when you re-start the server. That is, nothing special needs to be done for updating an existing data-base, except that you stop the web server.

Delete a data-base file or a language

To delete a data-base file, simply delete the file and its index file. When all data-base files of a language are gone, WikiFilter will automatically ignore it.

You can also delete the registry entry for the file or language.

Using WikiFilter

View a wiki page

After you have installed WikiFilter with a web server, and made one or more index files, re-start your web server.

For either Apache or Microsoft IIS, the default address for the local web is http://localhost.

Clicking http://localhost/wiki or http://localhost/wiki/ will bring up the main page of the first installed wikipedia with several forms above the text.

Various input forms for Language and Site navigation and page Search, and a link to the unparsed wikitext of the current page.

If the main page does not appear, most probably you did not install Wikipedia for that language. In that case, set "Site" to the installed project name, like "Wikibooks" (if that was what you installed for the first language), and press "Go". The main page of Wikibooks will then show.

The first time you visit a "Language" or a "Site", its "Main page" will be displayed. Later, however, the last visited page of that "Language" or "Site" will be displayed.

If you do not see these forms on the page, then WikiFilter has not been properly installed with the web server, or the server has not started, or you have not made any index. This is because WikiFilter is coded to respond to any request containing "/wiki", provided there is at least one indexed data-base file.

View unparsed wiki text

This section is meant for users who write Wikipedia articles.

"Show wiki text" is a convenient link that, when clicked, opens a new browser window to show the unparsed wikitext of the current page.

A particularly helpful use is to see how templates are used in various pages. In this version (2.3), all templates, if viewed as a page, are shown verbatim (same as "Show wiki text"), so that they are an exact copy of the original text. There are tons of templates in each wiki project, many of them tables with or without values to be replaced. You cannot see any template in the displayed html page because they have all been replaced. But by comparing the page with the original text, you can work out what a template does and how.

Search

You can search for an article in any of the 16 name spaces (18 for English Wikipedia), in the current "Site" and the current "Language".

The default name space is "Articles" (the first one in the "In" drop-down list). The drop-down list "Max" specifies the maximum number of matched titles to show as the search result. The check-box "Show match", if checked, shows the best-matched article along with a list of matched or partly matched titles.

WikiFilter uses a somewhat complex search algorithm to guess what you really want. For example, the first search word, if matched, is given a higher value than the other words. Also, a title that has the same word count as your search sentence, if matched or partly matched, is given a higher value.

But above all, if every word is matched (or at least one is matched while the others are partly matched), and if the word count of the title is the same as that of the search sentence, the title is regarded as the best match.

For example, to search for "Cold Mountain" (the 2003 movie), your search sentence can be "cold mountain", "cold moun", or in the wrong order "mountain cold", "moun cold" (note that search is case-insensitive). These all bring up the page "Cold Mountain". The key is that at least one word is correctly matched and the word counts are the same. Another key is that the first word of the title (here "cold") must be exactly matched, while the others can be partly matched.
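
The "Cold Mountain" example can be expressed as a small Python sketch. It covers only the rule stated in that example (same word count, first title word typed exactly, other words matched by a prefix, any order); WikiFilter's actual scoring is more involved and is not reproduced here. The function name is_best_match is an assumption made for this illustration.

```python
from itertools import permutations


def is_best_match(query: str, title: str) -> bool:
    """Check the best-match rule from the "Cold Mountain" example.

    Hypothetical helper; it ignores the weighting WikiFilter applies
    to other candidates.
    """
    # Search is case-insensitive, so compare lower-cased words.
    query_words = query.lower().split()
    title_words = title.lower().split()
    if not title_words or len(query_words) != len(title_words):
        return False
    first, rest = title_words[0], title_words[1:]
    # The first word of the title must be typed in full.
    if first not in query_words:
        return False
    remaining = list(query_words)
    remaining.remove(first)
    # Every other title word must start with a distinct query word,
    # in whatever order the query words were typed.
    return any(
        all(word.startswith(typed) for word, typed in zip(rest, order))
        for order in permutations(remaining)
    )


for text in ["cold mountain", "cold moun", "mountain cold", "moun cold", "col mountain"]:
    print(text, is_best_match(text, "Cold Mountain"))
```

Trying every ordering is fine for a sketch because titles have few words; a real implementation would want something cheaper.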

Partial match can bring up an unwanted page. For example, to find "algorithm", you must type at least "algorit". If you type "algor", you will see "Algor mortis" (check what it is later); if you type "algoris", you get "algorism".

One important point to note: if the title normally starts with "a", "of", or "the", you will usually have to include it, or you will miss the title completely. Such words after the first word are, however, of little importance.

Random page

Leaving the "Search" text-box empty and pressing "Go" brings up a random page in the current name space, the current "Site", and the current "Language". (Literally, this means searching for any title.) In version 2.3, the same is done by clicking the link "Random page".

Pass a command to WikiFilter

There is a predefined set of commands that you can pass to WikiFilter. To see a list of these commands and their functions, type

<
or
<?
in the "Search" text-box and press "Go".

Customizing output html page

WikiFilter borrows the "main.css" style file from the MediaWiki package to format various elements in the output html page. The file is slightly modified and renamed "WikiFilter_main23.css". Together with "WikiFilter_header23.txt" (which mainly contains the various input forms you see above the text), it determines the look of the final page in the browser.

These two files are embedded in the program files (WikiFilter.dll and WikiFilter.so), and will be written (upon running) to the same directory as WikiFilter if they do not already exist there. There are also two image files (direct copies from MediaWiki). You may change the content of all these files, but never change their names, or they will be ignored.

Because every page needs them, once loaded, they are cached.

You can edit these style files to suit your taste. For example, to change the page font or its size, open the "WikiFilter_main23.css" file, and locate the following rule:

body {
	font: x-small sans-serif;
	font-size:100%;  /* added by wikifilter, or font-size too small in FireFox */
	background: #f9f9f9 url(headbg.jpg) 0 0 no-repeat;
	color: black;
	margin: 0;
	padding: 0;
}
Change the "font" or "font-size" to whatever you like. Alternatively, you can add/edit them in the <body> tag in the header file ("WikiFilter_header23.txt").

To aid you here, there is a command

<form
that you can type in the "Search" text box and pass it to WikiFilter (by pressing the "Go" button). It tells WikiFilter to reload these format files. So once you have edited the files, just pass the command to see the effect.

For "WikiFilter_header23.txt", note that you can add or change the styles of any "form" field, but you should not edit any "name", "action", or "value" attributes; otherwise, the browser will pass wrong strings to WikiFilter for, say, a search request.

Using local files

WikiFilter can send a local file to the browser.

For example, suppose you want to put a logo image "my.jpg" on every page. First, specify a local directory where all local files are stored in the config file "WikiFilter23.ini". Then edit "WikiFilter_header23.txt", adding a line like this to the <body> section:

<img src="/wiki/local/my.jpg" align="right"> 
Note you must use the prefix
/wiki/local/
for any local file. Otherwise, WikiFilter will try to find it in the data-base.
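
The way a request address is sorted into "not for WikiFilter", "local file", or "wiki page" can be pictured with the following Python sketch. It only restates the two rules given in this document (the "/wiki" substring test and the "/wiki/local/" prefix); the function name and the returned labels are invented for this illustration, and the real filter is not written in Python.

```python
from typing import Tuple

LOCAL_PREFIX = "/wiki/local/"


def classify_request(path: str) -> Tuple[str, str]:
    """Return (kind, name) for a request path (hypothetical helper)."""
    # Requests without "/wiki" are left to the web server.
    if "/wiki" not in path:
        return ("ignore", "")
    # The local prefix selects a file from the configured local directory.
    if path.startswith(LOCAL_PREFIX):
        return ("local-file", path[len(LOCAL_PREFIX):])
    # Anything else is looked up in the data-base.
    return ("wiki-page", path)


print(classify_request("/wiki/local/my.jpg"))
print(classify_request("/wiki/"))
print(classify_request("/index.html"))
```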

Using image files

As a local dump file browser, WikiFilter is not good at handling images, though you can download wiki images, and put them into proper directories to view them. See "WikiFilter23.ini" for details.

You can show or hide images at any time by passing the commands "<imageOn" or "<imageOff".
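
The "Search" text-box therefore serves three purposes: an empty box asks for a random page, text starting with "<" is a command, and anything else is a search. A minimal Python sketch of that dispatch follows. It is an assumption about the logic, not WikiFilter's code; the function name and the returned labels are hypothetical, and whether commands are case-sensitive is not stated in this document.

```python
from typing import Tuple


def interpret_search_box(text: str) -> Tuple[str, str]:
    """Decide what the text typed into "Search" asks for (illustrative)."""
    stripped = text.strip()
    if not stripped:
        # An empty box means "any title", that is, a random page.
        return ("random", "")
    if stripped.startswith("<"):
        name = stripped[1:]
        # A bare "<" or "<?" lists the available commands.
        if name in ("", "?"):
            return ("help", "")
        return ("command", name)
    return ("search", stripped)


for typed in ["", "<", "<?", "<form", "<imageOff", "cold mountain"]:
    print(repr(typed), interpret_search_box(typed))
```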

Uninstall

Run WikiIndex, and press the "Uninstall" button. This will delete the key
HKEY_LOCAL_MACHINE\SOFTWARE\Wikifilter
from the system registry. Or you can do it manually by editing the registry.

For Apache, open its main config file, and delete or comment out the line that loads WikiFilter, such as

LoadModule WikiFilter_module modules/WikiFilter.so
For Microsoft IIS, repeat the steps in Install for Microsoft IIS, but click the "Remove" button in the "ISAPI Filters" tab to remove WikiFilter from the filter list.

Delete all relevant files if you like.

How-to

Start and stop the server

For Apache, simply use its own "Monitor" (the one that has a red clock icon).

For Microsoft IIS, run "Internet Information Services" from the Desktop, browse to "Default Web Site", right-click it to bring up the quick menu, and select the corresponding menu items.

Edit system registry

Mostly, you can find keys defined by programs in
HKEY_LOCAL_MACHINE\SOFTWARE\
For WikiFilter, it's
HKEY_LOCAL_MACHINE\SOFTWARE\WikiFilter
There is one entry at the root of this key:
LangName is a string of names of all installed languages (each 2-character long and separated by ";", for example "en;fr;ja;zh", meaning "English, French, Japanese, Chinese").

The languages are loaded in that order and filled into the "Language" drop-down list in the browser. The first page you get from WikiFilter, if you do not specify language, site, and title, is always the main page of the Wikipedia of the first language. If that does not exist, you will need to change the "Language" or "Site" in the browser window.

You can change this order to change the default "Language". For example, "en;fr" makes English the default language, while "fr;en" makes French the default.
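
Before editing the registry by hand, it can help to work out the new "LangName" string first. The following Python sketch only manipulates the string; it does not touch the registry, and the function names are made up for this example.

```python
from typing import List


def parse_lang_names(value: str) -> List[str]:
    """Split a LangName string such as "en;fr;ja;zh" into language codes."""
    return [code for code in value.split(";") if code]


def make_default(value: str, lang: str) -> str:
    """Return a LangName string with lang moved to the front (hypothetical helper)."""
    codes = parse_lang_names(value)
    if lang not in codes:
        raise ValueError(f"language {lang!r} is not installed")
    codes.remove(lang)
    # The first code in the string is the default language.
    return ";".join([lang] + codes)


print(parse_lang_names("en;fr;ja;zh"))
print(make_default("en;fr;ja;zh", "fr"))
```

The resulting string is what would be typed into the "LangName" entry.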

Each sub-key (sub-directory) stores the data-base file names for each installed project (site).

Filter and the server

As a Microsoft IIS web filter, WikiFilter.dll is a somewhat standard filter application. But to keep things simple and fast, it processes only one notification from the server, gets the address of the request, and either disconnects if the address does not contain the string "/wiki", or sends the html content immediately by calling the function "WriteClient".

As an Apache module, WikiFilter.so is a bit hackish. It's neither an Apache web filter, nor a standard module.

In fact, WikiFilter.so registers itself with Apache as the "quick_handler", meaning any web request is first passed on to WikiFilter.so, and if WikiFilter.so returns OK (the content has been served), the whole request is terminated at once, with all non-kernel modules and input-output filter chains by-passed. This is what happens when a wiki link is passed on to WikiFilter.so. The result is the quickest possible processing for a wiki page, just like WikiFilter.dll does with IIS. And because WikiFilter.so does not depend on any of the modules loaded by "LoadModule" in the config file, you may keep them to a minimum if you use Apache just for viewing wiki pages.
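
The short-circuit behaviour described above can be modelled with a toy dispatcher. The Python sketch below is a simulation for illustration only: it is not Apache's API and not WikiFilter's source. The names serve, wiki_quick_handler, and normal_pipeline are invented; only the OK and DECLINED values mirror Apache's constants.

```python
from typing import Callable, List, Optional, Tuple

OK = 0          # same numeric values as Apache's OK and DECLINED
DECLINED = -1

QuickHandler = Callable[[str], Tuple[int, Optional[str]]]


def wiki_quick_handler(uri: str) -> Tuple[int, Optional[str]]:
    """Serve anything containing "/wiki"; decline everything else."""
    if "/wiki" in uri:
        return OK, "<html>wiki page for " + uri + "</html>"
    return DECLINED, None


def normal_pipeline(uri: str) -> str:
    """Stand-in for the remaining modules and filter chains."""
    return "served by the normal pipeline: " + uri


def serve(uri: str, quick_handlers: List[QuickHandler],
          fallback: Callable[[str], str]) -> str:
    """Ask the quick handlers first; fall back only if all decline."""
    for handler in quick_handlers:
        status, body = handler(uri)
        if status == OK and body is not None:
            # The request ends here; the fallback is never invoked.
            return body
    return fallback(uri)


print(serve("/wiki/Main_Page", [wiki_quick_handler], normal_pipeline))
print(serve("/index.html", [wiki_quick_handler], normal_pipeline))
```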

To-do

Math markup is not supported.