"When you start on your journey to Ithaca, then pray that the road is long, full of adventure, full of knowledge... Always keep Ithaca fixed in your mind. To arrive there is your ultimate goal. But do not hurry the voyage at all." (from "Ithaca", by C. P. Cavafy)

Saturday, May 18, 2013

Hitting the Powerball - An Interactive Simulation

The powerball lottery is big right now, and it got me to thinking about how unlikely it is to win (that's nothing new).  The odds of winning are about one in 175 million.  Assuming about 100 drawings a year, and if you played every time, this means that you might expect to win once about every 1.75 million years.  That's a long time between winnings.

Anyway, I put together a simple statistical simulation (still a work in progress) that lets you run experiments with how long it might take your numbers to hit the jackpot.  It's on googledrive here.

Feel Lucky?
Try it yourself on google drive here 

The simulation uses a web worker to perform the simulation - it just draws the numbers over and over until the number hits or you give up.  It uses "Robert Floyd's algorithm" to pick the 5 distinct numbers in the range 1-59 (see here for a starting place for this tiny but clever algorithm) .  One could also more quickly determine whether you win for a given drawing by simply sampling from a Bernoulli distribution independent of actually modeling each of the balls being picked, but it seemed more interesting to go ahead and implement the individual components of the actual process that occurs in reality.

While of course the javascript is exactly the same on all browsers, I am noticing differences in how often it seems to "hit" across the browsers.  Chrome seems to take longer for some reason.  This might be an artifact of something off in this (simple!) implementation, or due to low-level differences in the browsers themselves, but it seems to be a real difference I need to look into more.  Safari seems to behave more reasonably, but that is a qualitative assessment at the moment. Update: I am now using version 2.1 of the random number generator by David Bau  (seedrandom.js).

The dates get big.  This hit some weirdness in the date formatting in Chrome, which doesn't seem to like it if the year gets above about 276,000 A.D.  I guess that's a reasonable edge case for Chrome to consider low priority.  Anyway, I had to make use of the fact that the Gregorian calendar repeats every 400 years to get around this.

It would be nice to add some additional visualization components to this - a dynamic graph showing you move out in time, occasionally almost hitting it big, and all the while displaying the amount spent to date.  This would help drive the simple point home, but I think seeing those crazy looking years in this initial version does, too.

Friday, May 17, 2013

More Curiosities with Google PageRank and "Distant" Links

This is a continuation of my experiments with the PageRank algorithm for small network systems.  The figures below are from the Google PageRank explorer web app, which is here:


The figures show the surprising impact of adding a single link to the system.  Page H - and even pages C, D, and E - have their PageRank cut by about 30%, with this being taken up by pages I and J.  And the only change is that there is a link added from page I to page J.  And, even though we add a hyperlink out of page I, its PageRank goes up.
Adding Just One Hyperlink to the System - from I to J - Has an Impact on the "Distant" H, and increases the PageRank for I 
Why would the PageRank for page I go up when we add a hyperlink out of it?  This is because prior to adding the link, page I has no outlinks - it is a "dangling node" - and so it is forced to send an equal amount to every page in the system (the webapp above shows the details on all of the matrices involved).  When it has the link to page in the bottom network, it sends much less to the rest of the cells in the system, as it sends stuff mostly to page J, which in turn is passing it right back to I.  In conjunction with this is the fact that there is a only constant amount of PageRank to go around, and so if one page is getting more, then this is reducing PageRank for other pages.  Results can certainly be counterintuitive at first glance.  And, in thinking about it, the very smallness of these networks may lead to results not seen when there are billions of pages.  Only Google knows for sure.

Please feel free to let me know if you find otherwise with the examples above.

Sunday, May 12, 2013

Exploring Convergence of the PageRank Algorithm

As part of learning more about the PageRank algorithm, I created a web app that lets you add/remove nodes and links, and see how this might impact the calculated PageRank values for small web systems. I have now also added the ability to watch how the power method converges (or not) to the PageRank vector for a web system.  You can step forward or backward in the iteration process, as "stuff" moves amongst the pages each step.  This can be particularly interesting in cases where the method does not converge, as you can (usually) see the stuff cycling through the system.

You can play with it here: https://googledrive.com/host/0B2GQktu-wcTiaWw5OFVqT1k3bDA/

You click the "Explore Convergence" button to explore the convergence, and click the "Hide Convergence" button to return back to "normal mode".

Click the "Explore Convergence" button over there on the right to watch how the power method converges (or doesn't converge, if that be the case).

Once the "Explore Convergence" button is clicked, a few extra things related to the power method are shown on the screen near the top.  Clicking the "Hide Convergence" button will hide this extra stuff, and show the final result of the iteration method once again.

By using the left/right arrows anywhere on the page (or by using the slider), you can step forward or backward through the power iteration method.

One thing that might stand out to you is how stuff will get sent to apparently "disconnected" pages as the method progresses.  This is because - unless the damping factor is one - the PageRank algorithm forces every page to be connected to every other page in the web, although the "pipes" between the pages are very small.

Popular Posts

Powered by Blogger.