Notes on a Recent D3 Visualization: Creating Tables with Text and Charts

The purpose of this note is to recollect on a few things I learned as part of putting together a visualization/data analysis of 100,000 Amazon searches; the visualization is located here.

This was the first time I had used D3.js to create a table with both text and charts created from the same dataset. The somewhat long-ish details of this are in the next-to-last section below.

The first few rows in a big table of search volumes for Amazon searches.
The full table is available here.

Basically, you can see search volumes and monthly search patterns for many terms that have the word "amazon" in them. I do not think that you can get this type of information from Amazon; however, coupled with the information from Google's Keyword Planner tool, it might serve to provide some interesting signals.

Creating the Data File

I first created a list of 100,000 terms of the form "amazon [word]", where the words were from this 1991 list from SIL. I'm sure I can find a better longer and more current word list (e.g, one from Peter Norvig from around 2008), but I haven't returned to this. Plus, if you're going to post the results somewhere, there can be a lot of distracting adult words that must be culled.

I then prepared things to run through Google's Keyword Planning tool, which can generate the monthly search volumes for the terms. While you can upload a list of words to the Keyword Planning tool, there is a limit of a total of 3000 words in the file. I could not find a way to call the tool automatically. Since I had two words per line, this means that I had to break up the original list into separate files with 1500 lines each (via the command "split -l 1500 filename words_"), and then generate the output results separately. While a bit tedious, this only took 45 minutes or so of relaxed but mindless clicking, resulting in about 80 separate csv files with the results. These were then joined together via a command like the following:


rm summary.csv
rm allData.csv
for f in d*.csv; do
  echo $f
  sed -n '6,5000p' < $f >> allData_tmp.csv
  sed -n '2,5p' < $f >> summary.csv
done
sed -n '1,1p' < $f > allData.csv
cat allData_tmp.csv >> allData.csv
rm allData_tmp.csv
Joining files downloaded from
the Keyword Planner tool into a master data file

After taking a look at this file in Excel, sorting a bit, and deleting some of the columns, I saved it back as csv for the file that is actually used for the visualization, and at that point also shortened some of the column names.

The raw data in the csv file looks like this:

words,monthly,competition,bid,Impr. Share,Apr2013,May2013,...,Mar2014,clicks,impressions,cost
amazon com," 5,000,000 ",0.13,0.05,0,4090000,4090000,...,5000000,3668.07,255015.2,15865.83
...

I ultimately pruned this file to just those terms for which there were at least 50 searches per month - this resulted in a file with about 2300 lines, and weighs in at about 220k.

The data file is loaded into an array of custom json objects when the page loads, using D3's "d3.csv(...)" function.

Fixed Table Headers

Using a fixed table header is critical for something like this. I tried a few of them - several insisted on changing the html that really borked up the table. The one I ended up using was StickyTableHeaders by Jonas Mosbech. It just worked.

Fixed table headers, via
StickyTableHeaders by Jonas Mosbech
Sorting

Currently, you can sort by the search terms or the average monthly searches. I'm using TinySort for this - straightforward, and it has some nice options for sorting columns that display numbers with commas, which leads to...

Javascript, Numbers, Commas

I always forget about this. It is surprising to me, but javascript's parseFloat acts silly when you send it a string containing commas, so you have to work around that. In this case, the actual number to be sorted is sorted as an attribute of the table cell, and TinySort uses that when sorting. ugh that this is even necessary.

Using D3 to Generate Rows with Both Text and Charts

There were two primary things of interest to me:

  • Average monthly search volume
  • Monthly pattern across the year for each term

Given the huge range of values, it didn't seem useful to graph the average monthly volume. Instead, just presenting the number (with commas) seemed sufficient.

However, some kind of graph for the monthly patterns did seem useful. While it was a little more work, I decided to use an svg line chart for these. Raw numbers were right out - your eye can't detect the patterns very well that way. Bar charts, created either with D3 or raw divs, would result in the bars themselves consuming a lot of visual space that could distract. Thin line charts, however, given the eye's evolutionary powers of edge detection, coupled with our own innate sensitivity to outlier patterns (e.g., peaks), seemed appropriate, especially as a way to allow quickly scanning through rows and rows of the table.

It was a little trickier than I expected to use D3 to make this work. Of course, I could have taken a more hands-on, D3-less approach to creating the table and the charts, but I have always found that it has been more than worthwhile to spend a little time digging into the power of D3.

The relevant D3 javascript code, with judicious comments, is included below. Note that the index.html page includes an empty table with ID="theDataTable".


   var theTable = d3.select("#theDataTable");
   var tHead = theTable.append("thead");
   var tBody = theTable.append("tbody");

   //add the header row
    var columnTitles = ["Search",
                        "Avg Monthly Searches",
                        'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept',
                        'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar'];

    //note the use of ".html(..)" rather than ".text(...")
    //This allows using html stuff; if you used ".text"(...)" then
    //any html will be escaped
    tHead.append("tr")
            .selectAll("th")
            .data(columnTitles)
            .enter()
            .append("th")
            .html(function(column) { return column + " "; })
            .attr("id",function(column,i) {return "header_" + i;})
            .attr("month",function(column, i) {return i-2;})
            .on("mouseover", function() {
                var month = d3.select(this).attr("month");
                if (month>=0) {
                    var monthLines = d3.selectAll('.month_m_' + month);
                    monthLines.classed('selected',true);
                }
            })
            .on("mouseout", function() {
                var month = d3.select(this).attr("month");
                if (month>=0) {
                    var monthLines = d3.selectAll('.month_m_' + month);
                    monthLines.classed('selected',false);
                }
            });

     //set up the rows and the different columns that will be
     //    dealt with separately
     //"amazonData" is the array of custom json objects 
     //    created from the input data file;
     // there is one such object for each search term "amazon foo"

     var rows = tBody.selectAll("tr")
                      .data(amazonData)
                      .enter()
                      .append("tr");

     //Prepare to create the different columns.
     //Note that these will have access to the data 
     // "amazonData" attached to the parent object "rows".  
     //This is yet more D3.js magic!

     var theFirstColumn = rows.append("td");
     var theSecondColumn = rows.append("td");
     var theGraphColumn = rows.append("td")
                               .attr("colspan",12)
                               .append("div")
                               .style("height","100%")
                               .style("width","100%");

     //populate the first column
       //the "d" referenced below will be a row in the 
       // "amazonData array", since we attached the array 
       //to the parent object "rows"

       //the word "amazon" in italics
            theFirstColumn
            .attr("class","searchWordColumn")
                .attr("searchTerms",function(d) {
                    return d.words;
                })
            .attr("dataSort",function(d) {
                return d.words;
            })
            .attr("abbr",1)
                .append("span")
                .attr("class","amazonWord")
                 .html(function(d) {return  d.words.split(' ')[0]});
        //the second search term
           theFirstColumn
            .append("span")
            .attr("class","amazonSearchWord")
            .text(function(d) {return ' ' + d.words.split(' ')[1]});

     //populate the second column
       //the "d" referenced below will be a row in the amazonData array
           theSecondColumn
            .attr("abbr",1)
            .attr("dataSort",function(d) {return d.monthlySearches})
            .append("span")
            .attr("class","monthlySearches")
            .text(function(d) {return ' ' + addCommas(d.monthlySearches)});

     //make the chart
        //get width and height we want to use 
        // (had to hard-code height for now)
        var theWidth = (theGraphColumn.style("width"));
        var theHeight = 40; // doesn't work:theGraphColumn.style("height");

        //add the svg element with the desired width and height
        var svgLine = theGraphColumn
                    .append("svg")
                    .attr("width",theWidth)
                    .attr("height",theHeight)
                    .attr("class","svgCell");

        //helper function - I should change this to use D3's built-in
        //   scaling functions.  Did lots of silly putzing with this.
        var scaleY = function(fraction) {
                //  0 --> height - 4
                //  1 --> 4
                return (8-theHeight)*fraction + (theHeight-4);
        };

        //add the line for the searches each month
        //try to set the x values so that they are 
        // roughly in the middle of the header cells
        svgLine.append("path")
               .attr("d", 
                 function(d) {
                  return (d3.svg.line()
                                .x(function(dValue,i) {
                                    return theWidth/24 + i*(theWidth/12); 
                                 })
                                 .y(function(dValue) {
                                     if (d.searchesRange===0) {
                                      //just return the middle
                                      return scaleY(0.5); 
                                     }
                                     else {
                                      var f = (dValue - d.minSearches) /
                                                      (d.searchesRange);
                                      return scaleY(f);
                                     }
                                 })
                                 //"d.values" is the array of 
                                 //  searches for each month
                                 .interpolate("linear"))(d.values);
                            })
                    .attr("stroke","blue")
                    .attr("fill","none")
                    .attr("stroke-width",1);
         
        //add light vertical divider lines... somewhat crudely done here;
        //  had some fighting with getting the selectors to work
          for (i=0;i<12;i++) {
            svgLine.append("line").attr("class","monthLine month_m_" + i)
                .attr("x1",theWidth/24 + i*(theWidth/12))
                .attr("x2",theWidth/24 + i*(theWidth/12))
                .attr("y1",0)
                .attr("y2",theHeight)
                .attr("monthIndex","m_" + i)
                .on("mouseover", function() {
                   var ii = d3.select(this).attr("monthIndex");
                   var monthLines = d3.selectAll('.month_' + ii);
                   monthLines.classed('selected',true);
                })
                .on("mouseout", function() {
                    var ii = d3.select(this).attr("monthIndex") + "";
                    var monthLines = d3.selectAll('.month_' + ii);
                    monthLines.classed('selected',false);
                });
           }
Still to Do
Smarter Load

I am currently rendering the entire table when the page loads. It would be smarter to just do this on demand.

Add Trend Notes

I hope to poke at adding some notes to the data regarding any trends that look interesting. For example, I imagine that "drones" peaked in December 2013 not because it was high on everyone's Christmas list, but because that was when Amazon's drone program was announced.

Improve Visibility/Clarity

I did a number of things to try to result in a table that can be effectively browsed visually - various horizontal/vertical lines (thickened on mouse hover), alternating colors, etc. However, I think there are still some challenges and improvements to be addressed with this.

No comments:

Post a Comment

Popular Posts