Saturday, June 30, 2012

Making Box and Whisker Plot in Excel

There's a simple method to construct a box and whisker diagram in Excel, by taking advantage the fact that box and whisker diagram is just a scatter plot with values fixed on one of the axis to give height to the box. Given a sample of data in column A of an Excel worksheet, here are the parameters necessary, along with the formula needed to obtain those values:
  • Minimum: =MIN(A:A)
  • 1st quartile:  =QUARTILE(A:A,1) or =PERCENTILE(A:A,0.25)
  • Median: =MEDIAN(A:A)
  • 3rd quartile:  =QUARTILE(A:A,3) or =PERCENTILE(A:A,0.75)
  • Maximum: =MAX(A:A)
Once those values are obtained, here is the most crucial component of the diagram: pasting the values into this table for a horizontal box and whisker diagram. Simply reverse the columns for vertical diagram:
Min 1
Min 3
Min 2
Q1 2
Q1 3
Q1 1
Median 1
Median 3
Q1 3
Q3 3
Q3 1
Median 1
Q3 1
Q3 2
Max 2
Max 1
Max 3
Once the numerical values are filled, this is simply a set of coordinate points. Select the table, and choose Scatter Plot > Scatter with Straight Lines and Markers. As imagined, the values 1, 2, 3 are arbitrarily chosen; since we are dealing with one-dimensional data, and just want the other axis to demonstrate some height to the boxes, the values 1, 2, 3 are the easiest to work with. In fact, as soon as the scatter plot is made, it's best to delete the values of that axis. Here's a box-whisker-plot of 50 randomly chosen numbers between 0 and 1:

To see that this is indeed the result of the table from above, trace the points on this plot, by going down the rows of the table. The labels on the plot should help to trace. Start from (min, 1) at the bottom left of the plot. The next row of the table is (min, 3) at the top left of the plot. In order to move onto the Q1 value on the plot, we must go down to (min, 2) and then to (Q1, 2) in order to preserve the straight edges of the box and whisker plot. And so on, the table lists the points that must be traced over to ultimately make the plot.