ggplot2 meets W. E. B. Du Bois

Visualizing 1900s Black American life in R

Data visualization: the rendering of information in a visual format to help communicate data while also generating new patterns and knowledge through the act of visualization itself.

Battle-Baptiste and Rusert (2018)

In my life as a statistician, I’ve been meaning to write something about W. E. B. Du Bois for the last few years, and I’m just now getting around to it. Well, I haven’t completely gotten around to it and will write more about this in the future. But here’s the exhaustive list of responses I’ve received after bringing this up:

Many people: “Who?”

Many statisticians: “Who?”

Many people who indeed know who W. E. B. Du Bois is: “Why?”

Quite frankly, there’s not enough time to properly address the first response, so for now, it suffices to say that W. E. B. Du Bois was a professor of sociology at Atlanta University and is the father of modern sociology (Morris 2017).

For the second response, here’s why: it’s impossible to talk about data visualization without somebody mentioning Charles Minard’s graph of Napoleon’s invasion of Russia. You also won’t have to wait long before Florence Nightingale’s coxcombs come up, either. You’d probably get through an entire lecture, or book, or day, or course, or semester, or year, or decade before a mention of W. E. B. Du Bois, though.

And why is that a problem?

W. E. B. Du Bois contributed heavily to the Exhibit of American Negroes for the 1900 Paris Exposition. The exhibit’s purpose, in his own words, was “to give, in as systematic and compact a form as possible, the history and present condition” of Black Americans, “picturing their life and development without apology or gloss” (Du Bois 1900). To do so, Du Bois and his “best students … put series of facts into charts” (Du Bois 2007), resulting in “[32] charts, 500 photographs, and numerous maps [as] the basis of the exhibit” (Du Bois 1900). The extant materials from the exhibit comprise the collection African American Photographs Assembled for 1900 Paris Exposition at the Library of Congress, where most of these charts and graphs (but not all!) have been digitized and are available here.

So why do I think W. E. B. Du Bois deserves a mention in data visualization? Because he did a lot of data visualization, obviously. Are you even paying attention? Click that previous link and look at them.

And he wasn’t too bad at it, either; altogether, the Exhibit of American Negroes won 13 or 14 medals at the Paris Exposition (I forgot the exact number that I read; the point is that it’s substantially more than zero), but… was pretty much forgotten after that. Fortunately, Du Bois’s work has recently received more acknowledgement from both a historical perspective (Battle-Baptiste and Rusert 2018; Bridgers 2014; Lewis and Willis 2003; Provenzo Jr 2013) and a data visualization perspective (Forrest 2018a; Meeks 2017). My fellow ASA1 members may recall the Andrews and Wainer (2017) article in Significance, where the authors made a Du Bois X Minard mash-up to visualize the Great Migration, so I’ll do my best to avoid repeating those stories here.

A few years ago, I first learned about Du Bois’s data visualizations from a CityLab article (Mock 2016), and I was reminded of it again when I saw (and immediately purchased) the aforementioned Battle-Baptiste and Rusert (2018) book. And you know what I said to myself? I said, “self, wouldn’t it be cool if W. E. B. Du Bois had ggplot?”

Then I said, “self, what would these actually look like in ggplot?”

I’ve been using ggplot2 for a while (at this point, who hasn’t?), but I’d never really needed to dig deep into all of the options for customization, so I thought that re-creating some of these plots would be a fun way to do that (because I can only take but so much of the mtcars and diamonds data sets). Every plot here uses a basic Du Bois theme that I made, theme_du_bois(), which is mostly just centering the titles and setting fill = "antiquewhite2" as the default plot/panel background colors, and then I tinkered with the plots individually as necessary.
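For a rough idea of what a theme like that involves, here is a minimal sketch. It is a guess at the general shape, built only from the description above (centered titles, "antiquewhite2" backgrounds); the real theme_du_bois() in the linked code may well differ. The base theme (theme_gray()), the base_size argument, and the toy data frame are all assumptions.

```r
library(ggplot2)

# Hypothetical reconstruction of a Du Bois-style theme.
# Assumptions: starts from theme_gray(); only titles and backgrounds change.
theme_du_bois <- function(base_size = 11) {
  theme_gray(base_size = base_size) +
    theme(
      # Center the title and subtitle
      plot.title = element_text(hjust = 0.5),
      plot.subtitle = element_text(hjust = 0.5),
      # Parchment-like background for both the plot and the panel
      plot.background = element_rect(fill = "antiquewhite2", colour = NA),
      panel.background = element_rect(fill = "antiquewhite2", colour = NA)
    )
}

# Usage on a throwaway data frame (placeholder values, illustration only)
toy <- data.frame(x = 1:3, y = c(2, 4, 3))
p <- ggplot(toy, aes(x, y)) +
  geom_line() +
  ggtitle("PLACEHOLDER TITLE") +
  theme_du_bois()
```

Because the function returns an ordinary theme object, any further theme() call added after it overrides individual elements plot by plot.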

You might argue that my time would be better spent elsewhere. I might agree. And then I might just do this anyway. And that I did, so let’s get to it.

(Disclaimer: This isn’t intended to be a ggplot tutorial at all, so if that’s what you need, believe me when I say that you don’t want one from me.)

Original image at the Library of Congress

The first plot I decided to tackle is an area plot showing the percentage of free Black people by decade. geom_area() does all the heavy lifting here, and the only real tinkering I did was replacing the last label so that geom_text() aligns it with the others rather than at the bottom.
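One way to read that label tweak, sketched under heavy assumptions: keep a separate column for the label's vertical position and overwrite only its last entry. The numbers below are placeholders and are not the values on Du Bois's chart; the column names (pct, label_y) and the choice to line the last label up with the tallest of the earlier ones are hypothetical.

```r
library(ggplot2)

# Placeholder values for illustration only; NOT the figures from Du Bois's chart.
free <- data.frame(
  year = seq(1830, 1870, by = 10),
  pct  = c(10, 12, 14, 12, 100)
)

# By default each label sits at its own data value ...
free$label_y <- free$pct
# ... but the last label is moved to the height of the earlier ones (assumption).
n <- nrow(free)
free$label_y[n] <- max(free$pct[-n])

p <- ggplot(free, aes(x = year, y = pct)) +
  geom_area(fill = "firebrick") +
  geom_text(aes(y = label_y, label = paste0(pct, "%")), vjust = -0.5)
```

The area itself still uses the real pct column; only the text layer reads the adjusted positions.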

Even if you fell asleep in your United States history class like I did,2 you surely noticed the drop coinciding with the end of the American Civil War in 1865.

Original image at the Library of Congress

This second plot, in my opinion, is one of Du Bois’s most visually appealing and one of my favorites. I’d never seen anything like this before—a pie chart with the symmetry of a pyramid plot—but the information given here is immediately obvious despite it being somewhat unconventional. Overall, it’s just aesthetically satisfying to look at, I think.

It was also the most challenging to make. And it took the longest to make. And as much as I like this particular one, it only emboldened my desire to banish pie charts from existence. But it actually turned out much better than I’d anticipated.

My eyeball estimate of the actual plotting area is \([\pi/4, 3\pi/4]\) in each half, so to represent the blank spaces on the plot, I set these values as missing. (This isn’t the first time trigonometry has come through for me while plotting in R—I once created a hockey rink following the NHL’s exact specifications.) Padding the data with missing values goes a long way for spacing and such, which is the biggest piece of insight I can offer about creating this plot since I never make pie charts (and neither should you). Beyond that and knowing to use coord_polar(), I really just hacked my way through different options until I got something close to what I wanted. Truthfully, it wasn’t all that bad, and having to adjust the legend so much—moving it onto the plot, splitting it into two columns, adding spacing between the columns, etc.—will be beneficial in the future. (@hadleywickham: what’s the easiest way to change the legend key background from a square to a circle?)
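A minimal sketch of the padding idea, with the caveat that it swaps in a slightly different mechanism: instead of missing values, it uses an explicit "spacer" slice drawn in a transparent fill and hidden from the legend. The shares are placeholders (not Du Bois's occupation figures), and every name here (shares, slices, quarter) is hypothetical. Each fan takes a quarter of the circle, matching the \([\pi/4, 3\pi/4]\) estimate, and start = -pi / 4 rotates the first fan so it is centered at 12 o'clock.

```r
library(ggplot2)

# Placeholder shares for two groups; NOT Du Bois's occupation data.
shares <- data.frame(
  group    = rep(c("A", "B"), each = 3),
  category = rep(c("x", "y", "z"), times = 2),
  share    = c(50, 30, 20, 60, 25, 15)  # each group sums to 100
)

quarter <- 25  # each fan and each gap covers a quarter of the circle

# Rescale one group's shares so that they fill one quarter of the circle
fan <- function(g) {
  rows <- shares[shares$group == g, ]
  data.frame(fill = rows$category, width = rows$share * quarter / 100)
}
spacer <- data.frame(fill = "spacer", width = quarter)

# Order around the circle: fan A, gap, fan B, gap
slices <- rbind(fan("A"), spacer, fan("B"), spacer)
slices$ymax <- cumsum(slices$width)
slices$ymin <- slices$ymax - slices$width

p <- ggplot(slices) +
  geom_rect(aes(xmin = 0, xmax = 1, ymin = ymin, ymax = ymax, fill = fill)) +
  coord_polar(theta = "y", start = -pi / 4) +
  scale_fill_manual(
    values = c(x = "firebrick", y = "goldenrod", z = "steelblue",
               spacer = "transparent"),
    breaks = c("x", "y", "z")  # leave the spacer out of the legend
  ) +
  theme_void()
```

Computing ymin and ymax by hand keeps the slice order explicit, which avoids having to reason about the stacking order of a bar layer. None of the legend adjustments mentioned above are attempted here.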

Looking at the United States as a whole, Du Bois flipped the aesthetics to display the relative proportion of each race within occupation. I didn’t try to make this one because you already know I hate pie charts, but I still think it’s a cool inversion of the graph for Georgia alone. Actually, one of the more fascinating things to me about the exhibit is Du Bois’s comparisons and presentations of global, national, and local data; Forrest (2018b) argues that the whole thing was basically designed to mimic a giant interactive display with the various levels acting as the viewers’ “click to zoom in/out” options. Indeed, since Du Bois thought it a “good idea to supplement these very general figures with a minute social study in a typical Southern State. … Georgia, having the largest Negro population, is an excellent field of study” (Du Bois 1900), that was probably the intention.

Can you imagine the Shiny apps Du Bois would’ve made?!

Original image at the Library of Congress

Next up is a bar graph displaying the growth of Black city vs. rural residents in former slave states. I’m not sure why I chose to replicate this particular chart; maybe after fighting with the pie chart, I just wanted to go back to something simple.

But simple doesn’t mean ineffective, as this chart fits the exhibit’s overall theme of growth and progress. Through the presentation of Black Americans’ “history and present condition”, Du Bois was telling a story of social progress intended “to forcefully refute the widespread belief that Black Americans were innately inferior and incapable of social advancement” (Battle-Baptiste and Rusert 2018). From a sociological perspective, he treated Black Americans as a distinct nation of people, and much of his data includes comparisons to other countries, in addition to the United States overall, as context for Black American progress. The full sentence of a quote I used previously (emphasis mine): “We have thus, it may be seen, an honest, straightforward exhibit of a small nation of people, picturing their life and development without apology or gloss, and above all made by themselves.” (Du Bois 1900)

Original image at the Library of Congress

The final chart displays marital status by gender and age. As far as creating this plot, there’s nothing crazy here other than having to duplicate the axis, for which ggplot2 conveniently has a dup_axis() function. A couple of authors (Forrest 2018b; Wainer 2017) note that this plot is similar to one in Henry Gannett’s Statistical atlas of the United States, based upon the results of the eleventh census (1898). Between this one and the pie chart above, Du Bois was fond of the symmetric/pyramid structure, which appears multiple times in his own 1899 The Philadelphia Negro (Du Bois 1996). His familiarity with the contemporary data visualization field shows through another exhibit plot also stylized from one of Gannett’s and population dot maps reminiscent of John Snow’s3 famous cholera outbreak maps.
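For anyone who hasn't run into dup_axis() before, a minimal sketch of the axis duplication on a pyramid-style layout follows. The percentages are placeholders (not values from Du Bois's chart), the column names are hypothetical, and treating age as a continuous axis with one side's values negated is an assumption about how such a plot could be laid out.

```r
library(ggplot2)

# Placeholder percentages; NOT values from Du Bois's chart.
married <- data.frame(
  age    = rep(c(20, 30, 40), times = 2),
  gender = rep(c("Male", "Female"), each = 3),
  pct    = c(30, 60, 80, 40, 70, 75)
)

# Negate one side so the bars grow outward from a shared center line
married$signed_pct <- ifelse(married$gender == "Male",
                             -married$pct, married$pct)

p <- ggplot(married, aes(x = signed_pct, y = age, fill = gender)) +
  geom_col(orientation = "y") +
  # Show magnitudes rather than signed values on the percentage axis
  scale_x_continuous(labels = abs) +
  # Repeat the age axis on the right-hand side
  scale_y_continuous(breaks = c(20, 30, 40), sec.axis = dup_axis())
```

dup_axis() is shorthand for a secondary axis that mirrors the primary one, so the age labels appear on both sides without any extra data.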

My R code for these four plots is available on GitLab and GitHub. It is likely (definitely?) not optimal, and I probably made things more difficult than they should’ve been, (a) because maybe that was the whole point—to force myself to play around with all the different options by making these from scratch, and (b) because maybe I don’t know any better. Maybe both. I don’t have to answer to you.

This will not be the last time I write about W. E. B. Du Bois, as I said earlier. It will be the last time I make a pie chart, though.


Andrews, R. J., and Wainer, H. (2017), “The Great Migration: A Graphics Novel,” Significance, 14, 14–19.

Battle-Baptiste, W., and Rusert, B. (eds.) (2018), W. E. B. Du Bois’s Data Portraits: Visualizing Black America, Princeton Architectural Press.

Bridgers, J. (2014), “Du Bois’s American Negro Exhibit for the 1900 Paris Exposition,” Picture This: Library of Congress Prints & Photos, Available at

Du Bois, W. E. B. (1900), “The American Negro at Paris,” American Review of Reviews, XXII, 575–577.

Du Bois, W. E. B. (1996), The Philadelphia Negro: A Social Study, Philadelphia: Univ. of Pennsylvania Press.

Du Bois, W. E. B. (2007), The Autobiography of W. E. B. Du Bois: A Soliloquy on Viewing My Life from the Last Decade of Its First Century, New York: International Publishers.

Forrest, J. (2018a), “W. E. B. Du Bois’ staggering Data Visualizations are as powerful today as they were in 1900 (Part 1),” Towards Data Science, Available at

Forrest, J. (2018b), “Style and Rich Detail: On Viewing an Original W.E.B. Du Bois Data Visualization (Part 4),” Towards Data Science, Available at

Lewis, D. L., and Willis, D. (2003), A Small Nation of People: W. E. B. Du Bois and African American Portraits of Progress, New York: Amistad.

Meeks, E. (2017), “How to Remake Historical Data Visualization and Why You Should,” Towards Data Science, Available at

Mock, B. (2016), “What Black Independence Looked Like in 1900,” CityLab, Available at

Morris, A. (2017), The Scholar Denied: W. E. B. Du Bois and the Birth of Modern Sociology, University of California Press.

Provenzo Jr, E. F. (2013), W. E. B. DuBois’s Exhibit of American Negroes: African Americans at the Beginning of the Twentieth Century, Rowman & Littlefield.

Wainer, H. (2017), “Visual Revelations: The Birth of Statistical Graphics and Their European Childhood: On the historical development of W.E.B. Du Bois’s graphical narrative of a people,” CHANCE, 30, 61–67.

  1. American Statistical Association

  2. Not a lie: this is the only class that was ever boring enough to make me fall asleep. Anyone who’s talked to me for more than 30 seconds will be shocked to hear me say that about a history class, but my U.S. history teacher in high school was truly that bad.

  3. No, not that one.

Matthew A.

Statistician. Ohio State alumnus. I like jerk chicken and Prince. I don’t like anything else. Just because I’m kidding doesn’t mean I’m not dead serious.