Nowhere Near Ithaca: Some Visualizations to Help Understand Quicksort

In the last few weeks, I've put together a few visualizations related to Quicksort, and this post collects the links into one place (mainly for me).

Visualization of Divide-and-Conquer

Link to visualization: https://googledrive.com/host/0B2GQktu-wcTidGMxWmpEV3pySEE/

A Visualization to Help Understand
the Divide-and-Conquer Approach of Quicksort
(available online at this link)

This visualization simply shows how the basic divide-and-conquer approach works with Quicksort. Some notes on its implementation are in this blog post.

The subsequent visualizations are all focused on the partitioning step of Quicksort. The partitioning step is really the core factor in the success of any divide-and-conquer technique.

Each of the visualizations allows the user to proceed through the partitioning process one step at a time, with messages along the way.

The number of values used in the partitioning visualizations is between about 5 and 50. In a production sort routine, it would have probably switched to insertion sort or merge sort for sizes about this small. As for large numbers, the Sort Benchmark competition used to include a terabyte sort, which corresponded to sorting about ten billion 100-byte records. If you were to write out ten billion values on the screen at the same size as that in the visualizations below (where each bar is about an eighth of an inch wide when there are the most values), you'd need a screen about 20,000 miles wide, which is almost completely around the earth. Note that the terabyte sort has been deprecated because it is now basically equivalent to the minute sort competition(!).

Dual Pivot Partitioning per Yaroslavskiy

Link to visualization: https://googledrive.com/host/0B2GQktu-wcTiNEtsejVjRWlmaWs/

A Visualization to Help Understand
the Dual Pivot Partitioning Approach of Yarovslavskiy
(available online at this link)

As Sebastian Wild notes in his (refreshing) masters thesis from 2013, in 2009 Vladimir Yarovslovskiy "appeared out of thin air and turned the world of Java sorting methods upside down." Yarovslavskiy's approach is based on using two different pivots, an avenue of attack that had been deemed unlikely to provide any benefit over existing single pivot methods. The impact of Yarovslavskiy's breakthrough is still being felt on Quicksort, as it has revived interest in multiple-pivot techniques in general, with the result that it is likely that additional improvements are to come.

This visualization is intended to show how Yarovslavskiy's dual pivot partitioning method works. It includes pseudo-source code, and as the partitioning occurs the corresponding line in the source code is highlighted, and the relevant pointers are updated to help convey the classic "crossing pointers" technique developed by Hoare in the original 1960 Quicksort implementation. The visualization also has an experimental option to play notes as the values are placed in their final position in the partitioning.

Some notes on its implementation are in this blog post.

3-Way Pivot Partitioning per Bentley/McIlroy

Link to visualization: https://googledrive.com/host/0B2GQktu-wcTiMkNBbDFIQnpFcTQ/

A Visualization to Help Understand
the Bentley/McIlroy 3-Way Partitioning
(available online at this link)

This visualization illustrates how the 3-way partitioning technique developed by Bentley and McIlroy in 1993 works (as implemented in this Java code by Sedgewick and Wayne). This approach was the core of the Java sort mechanism until Yarovslavskiy appeared on the scene, at which time Bentley and Joshua Bloch worked with Yarovslavskiy to incorporate the newer method into the Java 7 runtime. As for the Yarovslavskiy visualization, you can have it play sounds while in "auto" mode. There is not accompanying pseudosource code yet for this visualization.

Basic Partitioning

Link to visualization: https://googledrive.com/host/0B2GQktu-wcTicjRaRjV1NmRFN1U/

A Visualization to Help Understand
Basic Partitioning in Quicksort
(available online at this link)

This was the first visualization I did for Quicksort partitioning, and was when I first began to experiment with the use of sound to see if it could augment understanding. The partitioning method used is from p.291 of the 4th edition of Sedgewick and Wayne's Algorithms

Some notes on its implementation (and details on the partitioning code) are in this blog post.

Is the Inclusion of Sound Useful?

I was really excited when I embarked on the journey to include sound to facilitate understanding of Quicksort partitioning.

Alas.

While it is a curious and interesting feature to implement, at this point farther down the road, I am not convinced that it is particularly useful for learning in this context. As I get time, I will revisit the oldest visualization to make sure sound is optional (and by default off). Others may find a different tack that does a better job than how I included it here.

Some Things I Learned

In no particular order, some things I learned as part of these projects:

Quicksort, with its handful of lines of source code, can still be surprisingly involved. For example, it was not until I reworked Yaroslavskiy's approach to be able to step through it in javascript that I realized that there were five exit points to increment the main counter. These are hard to see by just skimming the code
The web audio api itself works pretty well. Some folks are doing fancy stuff with it, too.
There are differences in the eventing in the svg dom that require some special attention - a naive jquery approach will result in head-scratching non-events.
Descending below callback purgatory: this sequence of projects was probably the first time I began to deal with the "callback hell" of javascript. Synchronizing the various animations resulted in a gently exploding ivy of callbacks that became startlingly tangled.
I am closer to using something like CoffeeScript to reduce the amount of boilerplate I have to type.
While there are no guarantees in life, Yaroslavskiy's "out of nowhere" improvement to Quicksort in 2009 should be an inspiration to anyone toiling away in relative obscurity. Another example of this is Yitang Zhang, an unknown lecturer at the University of New Hampshire who in 2013 similarly appeared out of nowhere at the age of 57 to prove a famous problem in mathematics regarding primes. As with Yarovslavskiy, he did this with existing techniques abandoned as possible approaches to solving the problem at hand. With the current environment of open access to fantastic courses (such as the one on Algorithms by Sedgewick and Wayne at Princeton), and the web in general, this could be a more and more common event across the knowledge spectrum from those (across the age spectrum!) who take advantage of these resources.

Nowhere Near Ithaca

Some Visualizations to Help Understand Quicksort

Visualization of Divide-and-Conquer

Dual Pivot Partitioning per Yaroslavskiy

3-Way Pivot Partitioning per Bentley/McIlroy

Basic Partitioning

Is the Inclusion of Sound Useful?

Some Things I Learned

No comments:

Post a Comment

Popular Posts