“Brilliant people simplify things, and mediocre people complicate things.” - Unknown

Mapping Wikipedia

I was reading reddit the other day and stumbled upon the fact that Wikipedia has an API. Neat, I thought, and having some free time now that my exams are over, I figured I would play around with it a bit. When I browse Wikipedia I often click from link to link, and I have come to the conclusion that all articles are within six degrees of separation of each other. For example, I can go from Ubuntu to Tesco to Japan to G8 to United States to George W. Bush. Bet you never knew Ubuntu was connected to George W. Bush.

I once wrote a map-of-the-internet program that people seemed to like to download and modify, so in that spirit I wrote wikimap.py to map Wikipedia. wikimap.py requires pygraphviz (install it using sudo aptitude install python-pygraphviz). To use the program, simply run it and give it the wiki page name, for example “wikimap.py Norman_Graham”. Note that Ubuntu is Ubuntu_(Linux_distribution); you can find the full name in the address bar of your browser. For more complicated maps, use wikimap.py --help for full usage info.
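
For anyone curious how the link-to-link walk can work, here is a minimal sketch. It is not the actual wikimap.py code. The function names (fetch_links, crawl, to_dot), the max_links cut-off and the User-Agent string are all made up for illustration, and fetch_links only reads the first batch of results from the API's prop=links query. The output is plain DOT text, so the sketch itself does not need pygraphviz.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_links(title):
    """Return titles linked from `title` (first batch only, no continuation)."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "links",
        "titles": title,
        "plnamespace": 0,  # only links that point at articles
        "pllimit": "max",
        "format": "json",
    })
    request = urllib.request.Request(
        API_URL + "?" + params,
        # Hypothetical identifier; use something that names your own tool.
        headers={"User-Agent": "wikimap-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    links = []
    for page in data.get("query", {}).get("pages", {}).values():
        links.extend(link["title"] for link in page.get("links", []))
    return links


def crawl(start, depth, get_links=fetch_links, max_links=5):
    """Breadth-first walk from `start`; returns a list of (source, target) edges."""
    edges = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        title, level = queue.popleft()
        if level >= depth:
            continue  # deep enough, do not expand this page
        for target in get_links(title)[:max_links]:
            edges.append((title, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return edges


def to_dot(edges):
    """Render the edges as DOT text that graphviz can draw."""
    lines = ["digraph wikimap {"]
    for source, target in edges:
        source = source.replace('"', '\\"')
        target = target.replace('"', '\\"')
        lines.append('    "%s" -> "%s";' % (source, target))
    lines.append("}")
    return "\n".join(lines)
```

Writing the result of to_dot(crawl("Norman_Graham", 2)) to a file and running it through dot -Tsvg should give a small map. Passing your own get_links function makes it easy to try the walk on canned data without touching the network.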

Here are a few examples that I created with wikimap.py


And a really large one being viewed in inkscape.

More examples, including the dot files, can be found at http://www.files.earobinson.org/wiki/. There are still a few features I would like to add:

  1. Generating the graph using python-pygraphviz currently takes a really long time, and I would like to be able to do it faster.
  2. Page links currently include templates; I would like to use only links in the body of the text.
  3. It would be nice if the software cached previously seen connections (but then I need to detect updates to that page).
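
One possible shape for the caching idea in item 3 is sketched below. This is only an illustration, not something wikimap.py does today. The LinkCache class and the two callables it takes are hypothetical names. The assumption is that get_revision returns something that changes whenever the page is edited; the API's prop=info query reports a lastrevid field that could serve that purpose.

```python
import json


class LinkCache:
    """Remember each page's links until its revision marker changes."""

    def __init__(self, get_revision, get_links):
        self._get_revision = get_revision  # title -> current revision marker
        self._get_links = get_links        # title -> list of linked titles
        self._entries = {}                 # title -> {"rev": ..., "links": [...]}

    def links(self, title):
        current = self._get_revision(title)
        entry = self._entries.get(title)
        if entry is not None and entry["rev"] == current:
            return entry["links"]  # page unchanged, reuse the stored links
        links = list(self._get_links(title))
        self._entries[title] = {"rev": current, "links": links}
        return links

    def save(self, path):
        """Write the cache to a JSON file so it survives between runs."""
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(self._entries, handle)

    def load(self, path):
        """Replace the in-memory cache with one saved earlier."""
        with open(path, encoding="utf-8") as handle:
            self._entries = json.load(handle)
```

The revision check still costs one request per page, so this only pays off if that check is much cheaper than fetching the full link list.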

If anyone has a nice simple solution to any of those problems, leave a comment. Also, if you are able to generate any neat images, I would love to see them.


11 Comments



There is a nice freeware app for Mac called pathway which nicely does what you created.

Here is the site http://pathway.screenager.be/

Here is the screenshot

Comment by jonico on April 28, 2008 2:19 am


So what would it take to create a map of all of Wikipedia?

To speed up the bit where you download the wiki entries you could use threads. So, within the thread(s) you would download the html page.

The third todo can be done in two ways, where you say either ignore the cache (re-download urls that have been seen) or don't ignore it.

Comment by Bulkan Evcimen on April 28, 2008 3:19 am
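
The threading suggestion above could look something like the following. It is a rough sketch only. fetch_many and its parameters are invented names, and get_links stands for whatever function downloads one page's links.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(titles, get_links, max_workers=4):
    """Fetch links for several pages concurrently; returns {title: links}."""
    titles = list(titles)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `titles`.
        results = pool.map(get_links, titles)
        return dict(zip(titles, results))
```

Keeping max_workers small is probably wise so the downloads do not hammer the Wikipedia servers.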


Great post! it’s cool and creative.

Comment by jiu on April 28, 2008 4:17 am


http://chrisharrison.net/projects/wikiviz/index.html

is a pretty cool visualization in the same vein. Graphviz is really cool software and the results from your method are probably much more usable than Wikiviz. Good stuff.

Comment by Ben on April 28, 2008 7:12 am


Rather than use the API, you can get the download version of Wikipedia; it would be a lot faster to generate large maps from and would save Wikimedia bandwidth ☺

http://en.wikipedia.org/wiki/Wikipedia:Database_download

Comment by H3g3m0n on April 28, 2008 7:29 am


Check out Matt Biddulph’s post at HackDiary:

“Using Wikipedia and the Yahoo API to give structure to flat lists”

http://www.hackdiary.com/archives/000070.html

Comment by Pete Skomoroch on April 28, 2008 9:20 am


Wikipedia asks that you not use a webspider to access the database. You may download the entire database of any wikimedia project from the following link:

http://en.wikipedia.org/wiki/Wikipedia:Database_download

Comment by Marc on April 28, 2008 10:16 am


Thanks for all the comments. I was really just playing with it on my own, not really worried about whether there was already a tool out there that did it, because I just wanted to play with the code. Having the code is always a good thing, and now you can play with the code too.

I love the work of Chris Harrison, and think he is the one that gave me the idea to map the internet in the first place.

@Bulkan Evcimen Ya, thanks for the feedback. Graphviz is currently the real bottleneck, so I need to figure out how to replace it or make it a lot faster :)

Comment by earobinson on April 28, 2008 11:12 am


Sorry to Marc and H3g3m0n, I am not sure why your comments got marked as spam. I will look into doing that in the next version.

If anyone else had comments that are missing please contact me.

Comment by earobinson on April 28, 2008 4:37 pm


Mapping of Wikipedia is being undertaken by Katy Borner’s network visualization group at Indiana University. The graphics are very interesting. See

http://scimaps.org/maps/wikipedia/

Comment by joe on April 28, 2008 8:10 pm


[...] Mapping Wikipedia (tags: python wikipedia) [...]

Pingback by links for 2008-04-29 at they made me do it on April 29, 2008 12:34 am
