Semi-offline Wikipedia reader

October 27, 2010

ttsiod built a (fast) offline Wikipedia reader two or three years
ago, and it still works with the latest Wikipedia .xml.bz2 database dump.

wikipediaDumpReader also lets you browse Wikipedia offline.

I tweaked ttsiod’s software a little, so you now get a better
experience when browsing offline: you can see images! Well, actually,
viewing these images requires a network connection, which is why I call it semi-offline 🙂

You may wonder why I bothered updating a tool that already works.
Well, it has some issues with newer MediaWiki templates, and pages
are sometimes rendered poorly. The MediaWiki parser ttsiod used is
version 1.7.1, and the current version is 1.16, so that may be why
things are breaking.

As for viewing images, well, who doesn’t love images, right?

You can look at the screencast at the end of this article.

To use my version of the software:

  1. Make sure you have the required software installed on your system:

    apt-get install python-django libxapian-dev xapian-tools

    Then check out my code:

    git clone ~/windows-config

    and put the two ~/windows-config/bin/linux/wiki-* scripts into your PATH.

  2. Get the Wikipedia dump file:

    mkdir ~/wikipedia
    cd ~/wikipedia

    This takes time, since the file is currently about 6.2 GB.

  3. Build the index using wikipediaDumpReader:

    cd ~/windows-config/gcode/wikipediaDumpReader-0.2.10/
    python ./ ~/wikipedia/enwiki-latest-pages-articles.xml.bz2

    This will generate a file at ~/wikipedia/enwiki-latest-pages-articles.idx

  4. Build the index database:

    cd ~/windows-config/gcode/offline.wikipedia

  5. Start the server:

    cd ~/windows-config/gcode/offline.wikipedia/mywiki
    python runserver

  6. Start browsing offline Wikipedia:

    firefox http://localhost:8000/article/Mathematics/
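
Step 3 builds an index over the compressed dump. To give a rough feel for what scanning such a file involves, here is a minimal Python 3 sketch that streams a .xml.bz2 dump with the standard bz2 module and yields each article title it finds. This is not wikipediaDumpReader's actual code; the function name iter_titles is made up for this example, and it relies on the dump putting each <title> element on a single line. A real index presumably also has to remember where each article sits inside the compressed file, which this sketch does not attempt.

```python
import bz2
import html
import re

# Matches a complete <title>...</title> element on one line of the dump.
TITLE_RE = re.compile(r"<title>(.*?)</title>")

def iter_titles(dump_path):
    """Yield article titles from a MediaWiki .xml.bz2 dump (hypothetical helper)."""
    # bz2.open decompresses lazily, so the whole dump never sits in memory.
    with bz2.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:
            match = TITLE_RE.search(line)
            if match:
                # Titles are XML-escaped in the dump (e.g. &amp;), so unescape them.
                yield html.unescape(match.group(1))
```

Looping over iter_titles("enwiki-latest-pages-articles.xml.bz2") and printing each title would walk the whole dump, so expect it to take a while on a file of this size.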

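The URL in step 6 follows the pattern http://localhost:8000/article/<Title>/. If you want to open other articles from a script, a small helper like the one below can build such URLs. It is only a sketch: the name article_url is invented here, and turning spaces into underscores follows the convention used on wikipedia.org, which is assumed (not verified) to work on this local server too.

```python
from urllib.parse import quote

def article_url(title, host="localhost", port=8000):
    """Build the local reader URL for an article title (hypothetical helper)."""
    # Assumption: spaces become underscores, as in wikipedia.org URLs.
    slug = title.strip().replace(" ", "_")
    # Percent-encode characters that are not safe inside a URL path.
    return "http://%s:%d/article/%s/" % (host, port, quote(slug))

print(article_url("Mathematics"))
# prints http://localhost:8000/article/Mathematics/
```

The host and port defaults match the address used in step 6; change them if your server listens somewhere else.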


