Exploring Google's PageRank

I recently started looking at Google's PageRank, and as part of trying to understand it a little better I made a simple web app to see how PageRank depends on the damping factor, links, etc.  The actual implementation includes the breaking out of the "dangling nodes matrix" as discussed in  Dave Austin's nice article on PageRank.

The web app is on googledrive here.

With this you can

  • create custom web system systems, and see how the PageRank of pages is affected by adding/removing links or other pages
  • see exactly what the underlying matrices are used in the calculation of PageRank, and how they are related to the web system
  • step through the iterations of the power method in the estimation of PageRank

The work-in-progress app is built using:
simple tool to play with Google's PageRank - Add/Delete Pages or Links,  or Explore Convergence of the Power Method

One of the main goals of this project is to show the actual intermediate matrices used in the calculations  so as to try to maintain the connection(s) with the starting web structure as long as possible.  These matrices are updated dynamically as you modify the graph or the damping factor (in addition to the PageRank calculations themselves, of course). 

Per usual, power iteration is used to estimate the PageRank vector. For the most part, it seems to converge in a handful of iterations, and sufficiently fast enough to allow updating for any change of the slider that controls the damping factor.  I have seen a few cases where, with the damping factor set at 1.0, it did not converge within 20,000 iterations.  As pointed out in Austin's note (with some examples), the power method can fail to to converge in this case if the resulting Google matrix is not "regular" (some power of the matrix has all positive entries).

I knew things were big in real life, but one thing that caught my attention was the magnitude of the size of the actual Google matrix for the web.  For example, if its full contents were to be written on the screen, where each column gets about a quarter of an inch, then since there are about 42 billion web pages considered (assuming that the worldwidewebsize.com value for Google is approximately correct), your screen would need to be about 166,000 miles wide and tall... it would extend nearly 70% of the way to the moon.  Even if each column was reduced to the size of a pixel on an iPad retina display (264 ppi), the screen would still need to be about 2500 miles wide and tall.  Of course, the excessive sparsity of the underlying matrices is exploited when performing calculations.

Popular Posts