Download WikiFilter 2.3 from SourceForge
2006/5/9 The new version (2.3) of WikiFilter and WikiIndex is released.
Major improvement
Wikipedia | Wikibooks | Wikiquote | Wikisource | WikiDictionary | Wiki commons | Meta | Species | Wikinews
Major differences include:
Main features include:
For example, if you have downloaded all data-base files for both English and French, then you can navigate from any of the English wiki projects (Wikipedia, Wiktionary, Wikibooks, Wikiquote, Wikinews) to any of the French ones with just one click in the browser window.
Screenshots: Wikipedia article Attack on pearl harbor | Wikipedia article Saving private ryan | Wikipedia article Golden eagle | A page from the Wikisource | A random page from the Chinese Wiktionary | A page from the Japanese Wikiquote
The executable for Apache is the module file WikiFilter.so.
C:\Program Files\WikiFilter\WikiFilter.so
just add the following line to Apache's main config file "httpd.conf"
LoadModule WikiFilter_module "C:/Program Files/WikiFilter/WikiFilter.so"
Note that Apache uses the web-style directory delimiter "/", not "\".
For a default install, the main config file "httpd.conf" is here
C:\Program Files\Apache Group\Apache2\conf\httpd.conf
Also, change "ThreadsPerChild" to a smaller number, like 10. By default, Apache runs 250 threads for serving web requests. If your machine is not a "real" server serving external requests, it's simply a waste of your system resources.
If you have Windows XP Professional, and prefer Microsoft IIS to Apache, you can set up WikiFilter as a filter on your local web. The executable for Microsoft IIS is WikiFilter.dll.
Click the tab "ISAPI Filters", then click "Add" to bring up a dialog box.
C:\Program Files\WikiFilter\WikiFilter.dll
if WikiFilter.dll is there. Clicking "OK" will add WikiFilter.dll as a filter on the "Default Web Site".
WikiFilter uses an index file compiled by WikiIndex to access the data-base articles. The index file will be placed and must be in the same directory as the data-base file. The two files are found by looking up the system registry.
en-20050713-pages_current.xml
will result in an index file
en-20050713-pages_current_index.art
The index file will be stored in the same directory as the original data-base file, and they must be in the same place for WikiFilter to find them.
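The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention (same directory, "_index.art" appended to the base name); the function name `index_path_for` is hypothetical and is not part of WikiIndex.

```python
from pathlib import PureWindowsPath


def index_path_for(db_path: str) -> str:
    """Return the index file path expected next to a data-base file.

    Hypothetical helper: it only mirrors the naming convention described
    in the text, it does not call WikiIndex.
    """
    p = PureWindowsPath(db_path)
    if p.suffix.lower() != ".xml":
        # Only ".xml" dump files are supported according to the text.
        raise ValueError("data-base file must have the .xml extension")
    # Same directory, base name + "_index.art"
    return str(p.with_name(p.stem + "_index.art"))


print(index_path_for(r"C:\LocalWiki\English\en-wikipedia_20050713.xml"))
```

PureWindowsPath is used so that the sketch handles "\" separators even when it is run on a non-Windows machine.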
After the index file is made, WikiIndex will write the original data-base file name with full path to the system registry, under the key
HKEY_LOCAL_MACHINE\SOFTWARE\Wikifilter
WikiFilter locates the data-base file and its index file by looking up that key in the registry. So do not move the data-base file to another directory after indexing is done. Otherwise, you will have to manually edit the system registry (though it's not as difficult as you might think).
Download the data-base file from the wiki download site. Note that the data-base file must be a file with the extension of ".xml". The old "cur_table.sql" dump files are not supported. Also note that you should download the file containing a date (the newest one is usually the same as the one that does not contain a date). If that file is not available, you should rename the downloaded file by adding a date (see Note below).
C:\LocalWiki
and further make sub-directories for each language like
C:\LocalWiki\English
C:\LocalWiki\French
Rename and put all data-base files for a language under that sub-directory like
C:\LocalWiki\English\en-wikipedia_20050713.xml
C:\LocalWiki\English\en-wikibooks_20050713.xml
C:\LocalWiki\French\fr-wikipedia_20050713.xml
C:\LocalWiki\French\fr-wikibooks_20050713.xml
After all indexing is done, it will be like
C:\LocalWiki\English\en-wikipedia_20050713.xml
C:\LocalWiki\English\en-wikipedia_20050713_index.art
C:\LocalWiki\English\en-wikibooks_20050713.xml
C:\LocalWiki\English\en-wikibooks_20050713_index.art
C:\LocalWiki\French\fr-wikipedia_20050713.xml
C:\LocalWiki\French\fr-wikipedia_20050713_index.art
C:\LocalWiki\French\fr-wikibooks_20050713.xml
C:\LocalWiki\French\fr-wikibooks_20050713_index.art
Note
When renaming the data-base file, keep the date in the file name as it is (4-digit year + 2-digit month + 2-digit day), or if there's none, add it. This is because some articles require the correct date (like "Today's featured article" on the main page), and WikiFilter translates the date in the data-base file name into the actual title name.
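To check that a renamed file still carries a usable date, a small Python sketch like the following may help. It assumes the date appears as a stand-alone run of exactly 8 digits (YYYYMMDD) in the file name; the function name `dump_date` is hypothetical, and how WikiFilter itself parses the name is not documented here.

```python
import re
from datetime import date
from typing import Optional


def dump_date(file_name: str) -> Optional[date]:
    """Extract a YYYYMMDD date from a data-base file name, if present.

    Assumption: the date is a run of exactly 8 digits that is not part
    of a longer number.
    """
    m = re.search(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)", file_name)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # 8 digits that do not form a valid calendar date
        return None


print(dump_date("en-wikipedia_20050713.xml"))
print(dump_date("pages_current.xml"))
```

A result of None means the file should be renamed to include a date before indexing.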
Run WikiIndex.
The info contains:
For English and other languages that use white space characters to separate words, check "Auto". For others, like most Eastern languages, you can also use "Auto", but then the index word can be as long as the whole title. It still works fine if the data-base file is not as huge as the English wikipedia.
The better way for these languages is to specify a fixed length (in bytes) for index words. For this, you need to check out UTF-8 character encoding for your chosen language.
For example, in Chinese and Japanese, a character is normally 3 bytes long. So if you want the index word to be 1 character long, you should type 3 in the text-box; for 2 characters, type 6, and so on. It's equivalent to indexing English titles with 1 or 2 starting letters. A small number results in a small index and small memory use, but slightly slower speed in finding the article.
To type a number, you must first uncheck "Auto" to enable the text-box to the left.
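If you are unsure how many bytes a character takes in UTF-8 for your language, a quick check in Python can give the number to type. This is just a convenience sketch; the function name `index_word_bytes` and the sample strings are illustrative.

```python
def index_word_bytes(sample_title: str, n_chars: int) -> int:
    """Number of UTF-8 bytes taken by the first n_chars characters."""
    return len(sample_title[:n_chars].encode("utf-8"))


# Typical CJK characters take 3 bytes each in UTF-8.
print(index_word_bytes("中文", 1))
print(index_word_bytes("日本語", 2))
# Plain ASCII letters take 1 byte each.
print(index_word_bytes("English", 2))
```

Try it with a few real titles from your data-base file, since some characters (ASCII letters or digits inside a title, for example) take fewer than 3 bytes.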
The simplest and fastest way to find an article is to pre-load all sorted titles into memory and do a binary search (about 20 string comparisons for over 1 million titles including redirects, since 2^20 is just over 1 million). But the memory use is obviously prohibitive because, along with the 20MB of titles, you also need to load additional information about each title, such as its position and text length in the 3GB data-base file. With this additional information, the titles of the article name space take around 35MB in the WikiFilter index file.
By using index words, however, the 35MB is reduced to less than 3MB. The cost is a little slower speed in finding the title. Now, all the titles starting with the same index word must be read from the index file (roughly 10-100 on average) before a binary search can be performed.
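The two-step lookup described above can be sketched as follows. This is not WikiFilter's actual code or file format: the buckets live in a Python dict rather than in an index file, the index word is assumed to be the first whitespace-separated word of the title (as in "Auto" mode), and all names (`index_word`, `build_index`, `find`) are hypothetical.

```python
from bisect import bisect_left


def index_word(title: str) -> str:
    """Assumed 'Auto' rule: the first whitespace-separated word, lower-cased."""
    return title.split(" ", 1)[0].lower()


def build_index(entries):
    """Group (title, offset, length) records by index word, sorted by title."""
    buckets = {}
    for title, offset, length in entries:
        buckets.setdefault(index_word(title), []).append((title, offset, length))
    for bucket in buckets.values():
        bucket.sort()
    return buckets


def find(buckets, title):
    """Step 1: pick the bucket for the index word. Step 2: binary search in it."""
    bucket = buckets.get(index_word(title), [])
    keys = [t for t, _, _ in bucket]
    i = bisect_left(keys, title)
    if i < len(keys) and keys[i] == title:
        _, offset, length = bucket[i]
        return offset, length
    return None


entries = [("Cold Mountain", 100, 50), ("Cold War", 200, 70), ("Golden eagle", 300, 20)]
idx = build_index(entries)
print(find(idx, "Cold Mountain"))
print(find(idx, "Cold Fusion"))
```

Only the small table of index words needs to stay in memory; in the real program each bucket would be read from the index file on demand, which is where the extra lookup time comes from.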
For a huge data-base file, the indexing time mainly depends on your hard-drive's speed, because most of the time will be spent reading that file. As an example, on a 2.4GHz Pentium 4 with 256MB memory, the 3GB English Wikipedia takes around 10 minutes to complete, while a 500MB Japanese Wikipedia takes about 1 minute.
In short, if you are doing an English data-base dump, just drop the file in, press the "Start" button, wait several minutes, and you are done.
Different languages are added in the order you install them. For example, if you first install the English Wikipedia and then the French one, the first and default language version is English.
You can install up to 20 languages, and for each language, you can install all available project dump files (9 for English, 5 for others. See Wiki sites and projects)
All wiki data-base dump files are updated frequently. To update for WikiFilter, simply download the new file, and run WikiIndex to compile a new index for the new data-base file.
Note: If you have installed WikiFilter with a web server, you should unload WikiFilter by stopping the web server before compiling an index for two reasons:
Because the index file records the position of each article in the dump file, you should never use an old index file with a new dump file.
After indexing, the new data-base file along with the new index will be automatically loaded when you re-start the server. That is, nothing special needs to be done for updating an existing data-base, except that you stop the web server.
To delete a data-base file, simply delete the file and its index file. When all data-base files of a language are gone, WikiFilter will automatically ignore it.
You can also delete the registry entry for the file or language.
After you have installed WikiFilter with a web server, and made one or more index files, re-start your web server.
For either Apache or Microsoft IIS, the default address for the local web is http://localhost.
Clicking http://localhost/wiki or http://localhost/wiki/ will bring up the main page of the first installed wikipedia with several forms above the text.
Various input forms for Language and Site navigation and page Search, and a link to the unparsed wikitext of the current page.
If not, most probably you did not install Wikipedia for that language. Then you will need to select the "Site" to the installed project name, like "Wikibooks" (if that was what you installed for the first language) and press "Go". Then the main page of Wikibooks will show.
The first time you visit a "Language" or a "Site", its "Main page" will be displayed. Later, however, the last visited page of that "Language" or "Site" will be displayed.
If you do not see these forms on the page, then WikiFilter has not been properly installed with the web server, or the server has not started, or you have not made any index. This is because WikiFilter is coded to respond to any request containing "/wiki", provided there is at least one indexed data-base file.
This section is meant for users that write Wikipedia articles.
The "Show wiki text" is a convenient link that, when clicked, will open a new browser window to show the unparsed wikitext of the current page.
A particularly helpful way is to see how templates are used in various pages. For this version (2.3), all templates, if viewed as a page, are shown verbatim (same as "Show wiki text"), so that they are the exact copy of the original text. There are tons of templates in each wiki project, many of them being tables with or without values to be replaced. You cannot see any template in the displayed html page because they have all been replaced. But through comparing the page with the original text, you can somehow figure out what a template does and how.
You can search an article in any of the 16 name spaces (18 for English Wikipedia), in the current "Site" and the current "Language".
The default name space is "Articles" (the first one in the "In" drop-down list). The drop-down list "Max" specifies the maximum number of matched titles to show as the search result. The check-box "Show match", if checked, means to show the best-matched article along with a list of matched or partly matched titles.
WikiFilter uses a somewhat complex search algorithm to guess what you really want. For example, the first search word, if matched, is given a higher value than the other words. Also, a title that has the same count of words as your search sentence, if matched or partly matched, is given a higher value.
But above all, if every word is matched (or at least one is matched while the others are partly matched), and if the word count of the title is the same as that of the search sentence, the title is regarded as the best match.
For example, to search for "Cold Mountain" (the 2003 movie), your search sentence can be "cold mountain", "cold moun", or in the wrong order "mountain cold", "moun cold" (note that the search is case-insensitive). These all bring up the page "Cold Mountain". The key is that at least one word is correctly matched and the count of words is the same. Another key is that the first word of the title (here "cold") must be exactly matched, while the others can be partly matched.
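One possible reading of the best-match rule is sketched below. It is not WikiFilter's actual algorithm (which also scores and ranks titles); it only checks the conditions stated above, and it assumes that "partly matched" means the search word is a prefix of a title word. The function name `is_best_match` is hypothetical.

```python
def is_best_match(query: str, title: str) -> bool:
    """Sketch of the best-match conditions; an interpretation, not the real code."""
    q_words = query.lower().split()
    t_words = title.lower().split()
    # The word counts must be the same.
    if not q_words or len(q_words) != len(t_words):
        return False
    # The first word of the title must be matched exactly by some search word.
    if t_words[0] not in q_words:
        return False
    remaining = list(t_words)
    for q in q_words:
        # Prefer an exact match, then fall back to a prefix (partial) match.
        hit = next((t for t in remaining if t == q), None)
        if hit is None:
            hit = next((t for t in remaining if t.startswith(q)), None)
        if hit is None:
            return False
        remaining.remove(hit)  # each title word may be used only once
    return True


for q in ("cold mountain", "cold moun", "mountain cold", "moun cold", "col mountain"):
    print(q, is_best_match(q, "Cold Mountain"))
```

Under these assumptions the first four searches qualify and "col mountain" does not, because the first title word "cold" is only partly matched.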
Partial match can bring up an unwanted page. For example, to find "algorithm", you must at least type "algorit". If you type "algor", you will see "Algor mortis" (check what it is later); If you type "algoris", you get "algorism".
One important point to note: if the title normally starts with "a", "of", or "the", mostly you will have to use it, or you will miss the title completely. Those after the first word are however of little importance.
Leaving the "Search" text-box empty and pressing "Go" brings up a random page in the current name space, current "Site" and current "Language". (Literally, this means to search for any title.) For version 2.3, the same is done by clicking the link "Random page".
There is a pre-defined set of commands that you can pass to WikiFilter. To see a list of these commands and their functions, type
<
or
<?
in the "Search" text-box and press "Go".
WikiFilter borrows the "main.css" style file from the MediaWiki package to format various elements in the output html page. The file is slightly modified and renamed "WikiFilter_main23.css". Together with "WikiFilter_header23.txt" (which mainly contains the various input forms you see above the text), these two files determine the look of the final page in the browser.
These two files are self-contained in the program files (WikiFilter.dll and WikiFilter.so), and will be written (upon running) to the same directory as WikiFilter if they do not already exist there. There are also two image files (direct copies from MediaWiki). For all these files, change the content, but never change their names, or they will be ignored.
Because every page needs them, once loaded, they are cached.
You can edit these style files to suit your taste. For example, to change the page font or its size, open the "WikiFilter_main23.css" file, and locate the following rule:
body {
    font: x-small sans-serif;
    font-size:100%; /* added by wikifilter, or font-size too small in FireFox */
    background: #f9f9f9 url(headbg.jpg) 0 0 no-repeat;
    color: black;
    margin: 0;
    padding: 0;
}
Change the "font" or "font-size" to whatever you like. Alternatively, you can add/edit them in the <body> tag in the header file ("WikiFilter_header23.txt").
To aid you here, there is a command
<form
that you can type in the "Search" text box and pass to WikiFilter (by pressing the "Go" button). It tells WikiFilter to reload these format files. So once you have edited the files, just pass the command to see the effect.
For "WikiFilter_header23.txt", note that you can add or change the styles of any "form" field, but you should not edit any "name", "action" or "value" attributes; otherwise the browser will pass wrong strings to WikiFilter for, say, a search request.
WikiFilter can send a local file to the browser.
For example, suppose you want to put a logo image "my.jpg" on every page. First, you need to specify a local directory where all local files are stored in the config file "WikiFilter23.ini". Then edit "WikiFilter_header23.txt", adding a line like this to the <body> section:
<img src="/wiki/local/my.jpg" align="right">
Note that you must use the prefix
/wiki/local/
for any local file. Otherwise, WikiFilter will try to find it in the data-base.
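The way a request address decides what WikiFilter serves can be pictured with a small Python sketch. It only restates the two rules given in this document (requests containing "/wiki" are handled, and the "/wiki/local/" prefix selects a local file); the function name `route` and the returned labels are hypothetical, and the real program's matching details may differ.

```python
def route(path: str) -> str:
    """Classify a request address the way the text describes."""
    if "/wiki" not in path:
        return "ignore"        # left to the web server
    if path.startswith("/wiki/local/"):
        return "local-file"    # served from the directory set in WikiFilter23.ini
    return "database"          # looked up as a title in the data-base


print(route("/wiki/local/my.jpg"))
print(route("/wiki/Golden_eagle"))
print(route("/index.html"))
```

This also shows why an image referenced without the prefix is not found: it falls through to the data-base lookup.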
As a local dump file browser, WikiFilter is not good at handling images, though you can download wiki images, and put them into proper directories to view them. See "WikiFilter23.ini" for details.
You can show or hide images any time by passing the commands "<imageOn" or "<imageOff".
Run WikiIndex, and press the "Uninstall" button. This will delete the key
HKEY_LOCAL_MACHINE\SOFTWARE\Wikifilter
from the system registry. Or you can do it manually by editing the registry.
For Apache, open its main config file, and delete or comment out the line that loads WikiFilter, such as
LoadModule WikiFilter_module modules/WikiFilter.so
For Microsoft IIS, repeat the steps in Install for Microsoft IIS but click the "Remove" button in the "ISAPI Filters" tab to remove WikiFilter from the filter list.
Delete all relevant files if you like.
For Apache, simply use its own "Monitor" (the one that has a red clock icon).
For Microsoft IIS, run "Internet Information Services" from Desktop, browse to "Default Web Site", right-click it to bring up the quick menu, and select the corresponding menu items.
HKEY_LOCAL_MACHINE\SOFTWARE\
For WikiFilter, it's
HKEY_LOCAL_MACHINE\SOFTWARE\WikiFilter
There is 1 entry at the root directory:
The languages are loaded in that order and filled into the "Language" drop-down list in the browser. The first page you get from WikiFilter, if you do not specify language, site, and title, is always the main page of the Wikipedia of the first language. If that does not exist, you will need to change the "Language" or "Site" in the browser window.
You can change this order to change the default "Language". For example, "en;fr" makes English the default language, while "fr;en" makes French the default.
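How such an order string maps to a default language can be illustrated in Python. The sketch only parses a ";"-separated value like the examples above; the function names are hypothetical, and it does not read the registry.

```python
from typing import List, Optional


def language_order(value: str) -> List[str]:
    """Split a ';'-separated language list, dropping empty items."""
    return [code.strip() for code in value.split(";") if code.strip()]


def default_language(value: str) -> Optional[str]:
    """The first language in the list is the default one."""
    langs = language_order(value)
    return langs[0] if langs else None


print(default_language("en;fr"))
print(default_language("fr;en"))
```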
Each sub-directory stores data-base file names for each project (site) installed.
As a Microsoft IIS web filter, WikiFilter.dll is a somewhat standard filter application. But to keep things simple and fast, it processes only one notification from the server, gets the address of the request, and either disconnects if the address does not contain the string "/wiki", or sends the html content immediately by calling the function "WriteClient".
As an Apache module, WikiFilter.so is a bit hackish. It's neither an Apache web filter, nor a standard module.
In fact, WikiFilter.so registers itself with Apache as the "quick_handler", meaning any web request is first passed on to WikiFilter.so, and if WikiFilter.so returns OK (the content has been served), the whole request is terminated at once, with all non-kernel modules and input-output filter chains bypassed. This is what happens when a wiki link is passed on to WikiFilter.so. The result is the quickest possible processing for a wiki page, just like WikiFilter.dll does with IIS. And because WikiFilter.so does not depend on any of the modules loaded by "LoadModule" in the config file, you may keep them to a minimum if you use Apache just for viewing wiki pages.