Semi-offline Wikipedia reader


ttsiod already built a fast offline Wikipedia reader two or three years
ago, and it still works with the latest Wikipedia .xml.bz2 database dump
file!

wikipediaDumpReader also lets you browse Wikipedia offline.

I tweaked ttsiod’s software a little, so you now get a better
experience when browsing offline: you can see images! Well, actually,
the images need a network connection to display, which is why I call it semi-offline 🙂
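The post doesn't say how the tweaked reader locates images, but the need for a network connection is easy to see. A minimal sketch, assuming MediaWiki's default hashed upload layout: a file is stored under directories taken from the MD5 hex digest of its name (spaces turned into underscores), so a reader that keeps only article text can still turn a [[File:...]] reference into an upload.wikimedia.org URL and let the browser fetch it. The function name `upload_url` and its `repo` parameter are hypothetical; many images live on Wikimedia Commons, while others are uploaded locally to English Wikipedia under `wikipedia/en/`.

```python
import hashlib
from urllib.parse import quote


def upload_url(filename: str, repo: str = "commons") -> str:
    """Map a wiki file name (e.g. from [[File:Foo bar.jpg]]) to a Wikimedia
    upload URL, assuming MediaWiki's default hashed upload directory layout.

    `repo` is a hypothetical parameter: "commons" for Wikimedia Commons files,
    "en" for files uploaded locally to English Wikipedia.
    """
    # MediaWiki stores names with underscores instead of spaces and,
    # with default settings, an upper-cased first letter (approximated here).
    name = filename.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    # The two directory levels come from the MD5 hex digest of the stored name.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "https://upload.wikimedia.org/wikipedia/{}/{}/{}/{}".format(
        repo, digest[0], digest[:2], quote(name)
    )
```

For example, `upload_url("foo bar.jpg")` yields a Commons URL ending in `/Foo_bar.jpg`; only the article text has to be on disk, and the image itself is downloaded when the page is viewed.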

You may wonder why I bother updating this tool: it works, so why fix
it? Well, it has trouble with newer MediaWiki templates, and pages are
sometimes rendered unsatisfactorily. The MediaWiki parser ttsiod used is
version 1.7.1, while the current version is 1.16, so that may be why
things are breaking down.

As for viewing images, well, who doesn’t love images, right?

You can look at the screencast at the end of this article.

To use my version of the software:

  1. Make sure you have the required software installed on your system:

    apt-get install python-django libxapian-dev xapian-tools

    Then check out my code:

    git clone http://github.com/baohaojun/windows-config.git ~/windows-config

    and put the two wiki-* scripts in ~/windows-config/bin/linux/ on your PATH.

  2. Get the Wikipedia dump file:

    mkdir ~/wikipedia
    cd ~/wikipedia
    wget http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

    This takes a while, since the file is currently about 6.2 GB.

  3. Build the index using wikipediaDumpReader:

    cd ~/windows-config/gcode/wikipediaDumpReader-0.2.10/
    python ./mparser.py ~/wikipedia/enwiki-latest-pages-articles.xml.bz2

    This will generate a file at ~/wikipedia/enwiki-latest-pages-articles.idx

  4. Build the index database:

    cd ~/windows-config/gcode/offline.wikipedia
    make

  5. Start the server:

    cd ~/windows-config/gcode/offline.wikipedia/mywiki
    python manage.py runserver

  6. Start browsing Wikipedia offline:

    firefox http://localhost:8000/article/Mathematics/
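Building the index in step 3 means reading the whole compressed dump once. The post doesn't describe how mparser.py does this, so the following is only a minimal Python sketch of such a pass, not its actual code: it uses the standard-library bz2 and xml.etree.ElementTree modules to stream pages out of the .xml.bz2 file without decompressing it to disk. It assumes the usual MediaWiki export layout (a <page> holding a <title> and a <revision><text>); the names `iter_pages` and `_local` are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix the MediaWiki export puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(dump_path: str) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (title, wikitext) for each <page> in a .xml.bz2 dump, streaming."""
    with bz2.open(dump_path, "rb") as f:
        context = ET.iterparse(f, events=("start", "end"))
        _, root = next(context)  # first event is the start of <mediawiki>
        title, text = None, None
        for event, elem in context:
            if event != "end":
                continue
            name = _local(elem.tag)
            if name == "title":
                title = elem.text or ""
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = None, None
                # Drop finished pages from the tree so memory stays small
                # even on a multi-gigabyte dump.
                root.clear()
```

Counting the pages of the dump from step 2 is then a matter of iterating over `iter_pages` on ~/wikipedia/enwiki-latest-pages-articles.xml.bz2 (after expanding the `~`); on a file that size, expect a single pass to take a long time.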
