Conference on Quantitative
Social Science Research Using R
Keith A. Markus (presenter) and Wen Gu
Bubble Plots as a Model-Free Graphical Tool for Three Continuous Variables
and a Flexible R Function to Plot Them
Researchers often wish to understand the relationship between two continuous predictors and a common continuous outcome. Existing literature and software provide a number of alternatives for plotting such relationships. Some of these, such as conditional regression lines (Aiken and West, 1991) or 3D regression surfaces, depend upon an underlying model of the data. The veridicality of the graph then depends upon the veridicality of the model, and a bad model for the data could result in a misleading graph. Also, without further modification, such approaches do not display the degree of variability around the conditional lines or surfaces. Non-model-based alternatives include 3D scatterplots, co-plots (Cleveland, 1993), trellis plots with continuous variables broken into categories (Cleveland, 1994; Robbins, 2005), and scatterplot matrices. Of these options, the first two are the most effective, because breaking continuous variables into categories results in information loss and scatterplot matrices do not capture 3-way relationships.
An alternative is to use an enhanced 2D scatterplot. Bubble plots represent values of a variable using the size of the plotted circles. The two can be combined by using bubble size to represent the third variable on a 2D scatterplot. One disadvantage is that people are generally not good at judging relative sizes for objects not aligned along a common edge (Cleveland, 1994; Robbins, 2005; Wainer, 1997). This can be addressed by including a grid (preferably light) on the scatterplot to make it easier for users to judge bubble sizes (Tufte, 2001). Another disadvantage is that people do not distinguish small differences in area well (Cleveland, 1994; Cleveland, Harris and McGill, 1982; Robbins, 2005; Tufte, 2001; Wainer, 1997). This tradeoff, however, is still preferable to plots (e.g., 3D scatterplots) that rely on viewers to infer depth from an illusion in order to read the third variable. The R function bp3way() implements these graphs with a variety of user-specifiable parameters. In addition to specifying the data, options include selecting the proportion of cases to be plotted, how they are selected, and a range of graphical parameters. When parameters are not specified, bp3way() makes an effort to choose sensible values that are sensitive to the data provided. An empirical study demonstrates the comparability of these plots to both 3D scatterplots and co-plots as a means of exploring 3-way continuous data without presupposing a model.
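The general idea of a bubble plot with a light grid can be sketched in base R. This is only an illustration and not the bp3way() function, whose arguments are not described here. The simulated data, the helper name area_radius(), the choice to map the outcome to bubble size, and the 50% random subset are all assumptions made for the example.

```r
# Illustrative sketch only: base R graphics, not the bp3way() function itself.
set.seed(42)                      # simulated data, purely for illustration
n  <- 200
x1 <- rnorm(n)                    # first continuous predictor
x2 <- rnorm(n)                    # second continuous predictor
y  <- 0.5 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n, sd = 0.5)

# Hypothetical helper: rescale a variable to [0, 1] and take the square root,
# so that circle AREA (not radius) is proportional to the rescaled value.
area_radius <- function(z) {
  z01 <- (z - min(z)) / (max(z) - min(z))
  sqrt(z01)
}

# Plot a random subset of cases to reduce overplotting (proportion is arbitrary).
keep <- sample(n, size = round(0.5 * n))
r    <- area_radius(y)[keep]

plot(x1[keep], x2[keep], type = "n", xlab = "x1", ylab = "x2")
grid(col = "lightgray", lty = "dotted")   # light grid as a size reference
symbols(x1[keep], x2[keep], circles = r, inches = 0.15,
        add = TRUE, fg = "steelblue")
```

Scaling the radius by the square root keeps bubble area, rather than radius, proportional to the plotted value, which is one common way to reduce the misjudgment of area differences noted above. Other scalings are equally defensible.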