Context is important: this is targeted at journalists. They are usually trying to make a point to casual readers.
For readers with more interest or who are numerate in their day jobs (engineers, finance, or economists), dual axis charts can often be a great choice.
Since we are engineers or founders trying to deal with very complex systems, adding detail and clarity like the Economist or Edward Tufte does is the better way to go.
Author here. Thanks for setting the context: Datawrapper – the data vis tool I write articles like this for – is indeed for people who want to make a point with their charts and maps, often to a broad audience. I agree that people who have learned to read dual axis charts can benefit greatly from them (the same is true for rainbow color maps).
Over the last six years, Financial Times journalist John Burn-Murdoch has also changed my mind a bit on dual axis charts, even for casual readers! Here's a dual axis chart he created for the FT: https://x.com/AlexSelbyB/status/1529039107732774913
The next article I write on dual axis charts will probably be a "What to consider when you do use them" one.
At first glance, sure, but without further context or supporting data I'm suspicious:
1. Why just the Daily Mail? Is that the only paper that matters in Britain, or just the one that happens to correlate?
2. I would expect public opinion to lag coverage in the paper if there were a causal relationship. This graph is over too great a period to really see that, but if the creator wants to convince me, they'd show that.
3. I might expect the lag to differ when coverage is increasing vs. decreasing. Again, if I'm to believe this graph, more context would help.
4. No consideration of other factors that might lead to changes in public concern?
5. No consideration of factors that might lead to *both* an increase in coverage *and* an increase in concern?
I'm sure I could come up with 5 more reasons to doubt this graph if I thought for another 60 seconds...
The Economist is a fantastic benchmark when it comes to data visualisation. One thing to note is that they publish a lot of the underlying data and models behind their visualisations on their GitHub. If you know R, it's a tremendous resource.
I generally find that a second Y axis creeping in is an indicator to stop and think hard about what you are trying to achieve. You might try a 3D graph, for example, where x, y1, y2 becomes x, y, z, then spin it and explore. However, you have to remember that y1 and y2 both depend on x (by definition), so when you move y2 to a separate dimension, it is not independent of y1 (or is it?)
There are no hard and fast rules when it comes to spin doctoring via graphs, and as the old adage doesn't go: There are liars, damned liars and politicians.
The only one that's improved is the one from Brazil, to be honest. The rest is taste.
Besides, it's OK if the graph takes a bit to digest; otherwise you can just keep printing the same three graphs over and over, merely renaming the axes.
This is a pretty good article and for the most part, should be heeded. It's quite rare for the audience of a chart to exclusively be highly-numerate people (and these people, who are often inundated with data, are not immune from being misled by poorly-conceived charts). It's kind of strange that the top-voted comment points to "better" advice while also directly contradicting the article's main point ("dual axis charts can often be a great choice").
I mean, certainly you have the right to add some color but it comes off like you are saying to ignore the article entirely in favor of your alternatives.
Found this while trying to create an Observable Plot chart with multiple scales[1][2].
I'd argue multiple scales are OK if the multiple axes have different units that can't be easily compared/confused and are used for greater information density (instead of relative comparison purposes).
For example: I'd like to plot weather stats like hourly temperature, precipitation, and AQI throughout the day, so several different days can be compared with each other. (And fit all this information on a mobile screen.)
The article only shows examples of dual axis charts where line series are used for both axes. This will clearly cause confusion (especially when tooltips are not available).
I've generally found that when displaying a percentage, it is helpful to show the individual counts for the numerator and denominator. I believe that showing the percentage as a line series on one axis, and the raw counts as columns on the other axis, can be a helpful visual.
Have you looked at how yr.no plots its forecasts on the phone? I find the graph view very helpful in the winter when planning ski days, or days to skip. They must have updated the charts recently, since they are different from what I remember this winter, but it looks similar.
1. I'm not sure why having two charts side by side helps?
2. Indexed charts are also not a panacea: depending on which point on the x axis you choose as your starting point, it is easy to make it seem like one series is rapidly outpacing the other (i.e., choosing to start at the peak).
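To make point 2 concrete, here is a minimal sketch of indexing (rebasing a series to 100 at a chosen point). The function name and the two series are hypothetical illustrations, not data from the article. Moving the base from the first point to series A's peak flips which series appears to be "winning".

```python
def index_series(values, base_index):
    """Rebase a series so that values[base_index] becomes 100."""
    base = values[base_index]
    if base == 0:
        raise ValueError("cannot index to a zero base value")
    return [100.0 * v / base for v in values]


# Hypothetical illustrative numbers, not real data.
a = [50, 80, 60, 70]  # series A peaks at index 1
b = [20, 22, 24, 26]  # series B grows steadily

# Base at index 0: A ends at 140, B at 130, so A looks ahead.
print(index_series(a, 0)[-1], index_series(b, 0)[-1])

# Base at A's peak (index 1): A ends at 87.5, B at about 118, so B looks ahead.
print(index_series(a, 1)[-1], index_series(b, 1)[-1])
```

Nothing about the underlying data changes between the two prints; only the choice of base point does.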
Ultimately I think charts are best thought of as a way to communicate a conclusion, not be the primary source for drawing a conclusion. Figure out what point you are trying to convey and choose the chart that communicates that the best.
Growth is actually one of the few things where it's permissible to remove 0 from a scale. For instance with asset prices, the dollar value doesn't matter, only the magnitude of the change.
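One common way to act on "only the magnitude of the change matters" (my addition, not something spelled out above) is a logarithmic axis, which has no zero at all and draws equal percentage changes as equal vertical distances. A minimal sketch with hypothetical prices:

```python
import math


def log_distance(old, new):
    """Vertical distance between two prices on a log10 axis."""
    return math.log10(new) - math.log10(old)


# A 10% gain at two very different price levels.
small = log_distance(100, 110)
large = log_distance(1000, 1100)

print(1100 - 1000 == 10 * (110 - 100))  # linear axis: the second gap is 10x taller
print(math.isclose(small, large))       # log axis: both gaps are the same height
```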
So, it's great that they try to actually get data on what kinds of charts convey what information. However, you need to know who your audience is. I, for example, found all of their suggested alternatives to be harder to interpret than the dual axis chart. If you're trying to see whether or not the ups and downs of two different variables are similar, suggesting a connection between the two, none of the suggested alternatives do as good a job (although two charts could, if instead of having them side by side you had them one above the other, with the same x-axis scale, but that is really just a stealth dual axis chart).
Most of these "don't use this kind of chart" articles seem to be trying to make it impossible to confuse or mislead your audience, and that is just not plausible. You do, and probably usually should, have some point in mind when you show someone else a chart, and the format needs to make that point easy to see. Almost any chart, even a pie chart, has some particular use case where it is the best chart for that purpose. No chart is going to always be the best way to present data. Like choosing what kind of language to use in explaining something, you need to know something about who your audience is and what they are accustomed to.
Wasn't there an article the other day about a concept that's similar to incompleteness theorem? That any ambiguity-free language is incapable of completely describing sufficiently complex situations? Am I just imagining that? [0]
I feel like making a tool harder to use, just to prevent bad actors, only punishes good actors, while the bad actors find some other way to act badly. Like, I don't want to participate in your arms race against disinformation purveyors, I just want to illustrate that it tends to rain on days that are cloudy and have high humidity.
As an engineer with an oscilloscope, not being able to plot two probes against each other on the same chart would be severely limiting.
For instance, imagine a 10x attenuator / amplifier. Maybe the input has a DC offset. Being able to plot the two against each other to look for (e.g.) distortion is invaluable. This commits the two cardinal sins (judging from some comments here) of not starting at zero and using different scales, and yet it's not misleading at all.
I can believe dual axis charts allow misleading results, but that doesn't mean they don't have completely legitimate uses.
Agreed. Dual-scale plots are a great visualization to emphasize correlation between time series.
I think of it as depicting an intermediate step in computing a Pearson r, when the data have been z-scored but before you've collapsed across data points.
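A minimal sketch of that intermediate step, assuming the population standard deviation (dividing by n), which is what makes the mean of the z-score products equal Pearson's r exactly. The function names are illustrative, not from any particular library.

```python
import math


def zscore(xs):
    """Standardise a series to mean 0 and (population) standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if sd == 0:
        raise ValueError("constant series has no z-scores")
    return [(x - mean) / sd for x in xs]


def pearson_r(xs, ys):
    zx, zy = zscore(xs), zscore(ys)
    # Intermediate step: zx and zy are on one shared, unitless scale.
    # Overlaying them on a single axis is roughly what a dual-axis chart
    # tuned to each series' mean and spread shows.
    products = [a * b for a, b in zip(zx, zy)]
    # Collapsing across data points: the mean product is Pearson's r.
    return sum(products) / len(products)


print(pearson_r([1, 2, 3], [1, 3, 2]))  # approximately 0.5
```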
And the possibility of fiddling with scale to mislead still exists with side-by-side charts, their #1 alternative. In fact, they use the same misleading scale start and stop points as they criticize in the dual-axis version, so that the "one went up 80%, the other went up only 40%, but it looks like they went up equally" still applies to their replacement.
I once worked with someone who was benchmarking the performance of two systems and made a dual axis chart with the lines right on top of each other, when in fact one system was about 5x faster than the other. It drove me nuts, because at first I didn't even notice the dual axis and thought the two systems literally had identical performance.
Yah, the moment you see/make a dual-axis line plot, you know you're comparing relative change. The whole point is to effectively normalize for absolute value.
So yah, anyone arguing primarily on the basis of absolute value on these plots is likely pulling a fast one.
Something that occurs to me: there's [not much|nothing] special about "dual" axis charts, meaning: in general the graph matches color with the axis, but there's nothing "right" vs. "left" about the graphs. Therefore, there's nothing special about "dual" axis charts being right- and left-axis, and you could just as easily put both axes on the left (at a little loss in clarity). And finally, that means you could just as easily have three, four, or more axes.
Not really -- obviously a chart with six axes on the left and six different color-coordinated graphs would be absurd, but that's my point: there's nothing that makes a two-axis graph less absurd other than scale.
All of which to say: if there's no real correlation that you're trying to illustrate between the data series, then separate charts are the way to go. If there is a correlation, there's (probably) a better way to illustrate it.
I'd argue that the zero value should always be shown. Otherwise you get different impressions of the rate depending how you scale and subset the Y axis.
This is not a good practice at all. Do you think atmospheric CO2 charts should show 0? How about daily temperature reading for human body temperature? Should daily stock tickers all start at 0?
Why is 0 magical?
Adding 0 to the vast majority of plots shows that data at an unnatural scale that can obscure genuinely important information. Human body temperature readings on a scale from 0 to 107F would make all the important information hard to see.
A much better rule is that charts should have reasonable bounds based on knowledge of the system. For human temperature in F, anything less than 95 or greater than 107 basically means you're dead. For processes in nature, good bounds run from the lowest recorded value minus some delta to the highest recorded value plus that delta. For things like daily stock prices, a few standard deviations each way, based on historic volatility, works.
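The two bound-setting rules above can be sketched as small helpers. This is a minimal illustration under stated assumptions: the function names, the margin, and the choice of centring the price range on the previous close with k standard deviations of day-to-day changes are all my own reading of "a few standard deviations", and every number is hypothetical.

```python
import statistics


def bounds_from_records(record_low, record_high, delta):
    """Axis range for a natural process: record low minus delta to
    record high plus delta."""
    return record_low - delta, record_high + delta


def bounds_from_volatility(prices, k=3.0):
    """Axis range for a daily price chart: the previous close plus or minus
    k standard deviations of historical day-to-day price changes.
    (One possible reading of 'a few standard deviations'.)"""
    if len(prices) < 3:
        raise ValueError("need at least three prices to estimate volatility")
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    sd = statistics.stdev(changes)
    previous_close = prices[-1]
    return previous_close - k * sd, previous_close + k * sd


# Hypothetical record low/high of -10 and 40 degrees C, with a 5-degree margin.
print(bounds_from_records(-10, 40, 5))  # (-15, 45)

# Hypothetical closing prices, not real data.
print(bounds_from_volatility([100.0, 101.5, 100.8, 102.0, 101.2]))
```

Neither helper forces zero onto the axis; both derive the range from what the system can plausibly do.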
The dogma that charts should all start at 0 is complete nonsense and tries to sidestep reasoning about your data. Yes, scales can be used to misrepresent data, but forcing 0 onto the axis does not solve this.
Yes. Charts are communication devices. Any "rules" for charts should be seen like similar "rules" for essays or emails: good advice that almost always gives a satisfactory result when followed. Reliable paths for infrequent authors.
But what matters most in charts is the same thing that matters most with writing: pick one major point and stick to it (if you're really good or can't avoid it, maybe a couple points). This also explains why a lot of dual-axis charts don't work: the author explains two sets of data that aren't even measured on the same scale and then leaves the reader to connect them and understand the meaning of that connection. You can't be sure the reader will end up at the point you wanted to make.
That's not to say a dual-axis chart is always the wrong choice. Just that, if you start making one, stop and ask if there isn't a better way to show the data. Same with pie charts.
Edward Tufte is a great source for learning which types of charts and visual techniques do their job best. I enjoyed his book "The Visual Display of Quantitative Information."
Fahrenheit is not an absolute scale, so there is nothing special about 0F; you're right about that. As for your other two examples (atmospheric CO2 and stock tickers)... Yes, the scale should start at 0. Why shouldn't they?
So if someone showed body temperature measured in Kelvin you would argue that it should start at 0? That seems even more ridiculous.
> Why shouldn't they?
Because for the vast majority of stocks it would appear to be a straight line every single day. Can you find me an example of a stock trading app that shows intraday price activity on a scale starting at zero for a company whose price is > $100/share?
Likewise, most CO2 charts start around 300 ppm, since that has been roughly the lower bound of atmospheric CO2 levels for all of human history.
The last time CO2 was 0 on Earth, the planet was just molten rock, so what's the meaning of showing this value? It's not even theoretically possible for CO2 to be that low, barring alien life sucking the atmosphere off the planet.
Can you clarify why the scale should start at 0 for these things? How is that anywhere close to an honest representation?
Because starting at zero can cause scaling issues that mask meaningful trends and variation. That can also be abused to mislead, but a simple rule like “always include zero” ain’t the solution to that.
All fair points about zero. Sorry, I acknowledge now I was overly influenced by the metrics dashboards I use for alerting. I've seen people panic at a seemingly steep rise in error rate or increase in latency because the chart was not showing the full range (0 to 1 for rates, or 0 to 2x SLA for latency). I was only thinking of operational alerting dashboards.
In that case, we should report body temperature in Kelvin. However, now the dead-alive range (95°F to 107°F) becomes roughly 308 K to 315 K.
Starting at zero, that range (about 7 K) is now only about 2% of the graph. In other words, if your chart is 10 cm tall, the entirety of the useful range is compressed into a space about 2 mm tall.
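A quick way to check this arithmetic, assuming a linear axis running from 0 K up to the top of the useful range (no extra headroom). The variable names are illustrative only.

```python
def fahrenheit_to_kelvin(f):
    return (f - 32) * 5 / 9 + 273.15


low = fahrenheit_to_kelvin(95)    # about 308.15 K
high = fahrenheit_to_kelvin(107)  # about 314.82 K
span = high - low                 # about 6.67 K

# Share of a linear axis running from 0 K up to `high`.
fraction = span / high
chart_height_mm = 100             # a 10 cm tall chart

print(f"useful range: {low:.1f} K to {high:.1f} K ({span:.2f} K wide)")
print(f"{fraction:.1%} of the axis, about {fraction * chart_height_mm:.1f} mm")
```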
That’s too weird for most audiences. Removing the X axis altogether seems more appropriate (while keeping the labels). Then the plot area is still “bottomless” in a sense, but the labels are where people expect to see them.
Having the axis on top implies that the values are negative. Like an ocean depth chart.
Great article. I used to be into charting a lot and ran a charting product at a famous firm. Would love to see the thoughts of the author on other charts like radar and treemap. :) Great read.
I find plotting two correlated but unrelated data series (like temperature and humidity) can be fine.
But the article chooses the worst possible dual axis charts, where both data series not only measure the same thing (GDP) but share the same units (dollars). What you actually have is a multi-series chart that needs only one axis, split across two axes in a way that just confuses.
I get it, and sympathize, but at many companies the decision maker is someone who wants to see dual axis charts. If Datawrapper can't do that, then that would be a point against using it widely.
It would be a point against using it in companies where the decision maker thinks they know better than the people making the charts, certainly (though if you've managed to already adopt the tech, you can often present decision makers with an alternative (indexed charts for most cases, I suspect) and they'll turn out to be more flexible than you'd originally guessed).
I think "trying not to add things they consider footguns so the people making the charts are happy and so more likely to recommend them at their next job" is a trade-off that may or may not turn out to be more profitable in the long run.
In the case of German GDP vs. global GDP, I'd argue the correct thing to do is to draw a graph of a new variable, "German GDP as a percentage of global GDP", and a separate graph of global GDP.
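A minimal sketch of deriving that new variable, assuming both series are already in the same currency unit so the ratio is unitless and needs only one axis. The function name and figures are hypothetical, not real GDP data.

```python
def share_of_total(part, total):
    """Express one series as a percentage of another, point by point."""
    if len(part) != len(total):
        raise ValueError("series must have the same length")
    return [100.0 * p / t for p, t in zip(part, total)]


# Hypothetical figures in the same currency unit, not real GDP data.
german_gdp = [4.0, 4.2, 4.1]
global_gdp = [80.0, 84.0, 90.0]

print(share_of_total(german_gdp, global_gdp))
```

In this made-up example the share holds at 5% and then falls in the last period even though the German figure is still above where it started, which is exactly the kind of point two overlaid dollar axes can hide.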
What a patronizing company. Your customers keep asking for a feature that is widely supported and you refuse to add it because it violates your sensibilities. Instead, you write this diatribe lecturing us that the way we want to display data is wrong. Just reading the opening paragraph, whatever interest I may have had in your plotting capabilities evaporated.
I would be the one in the sample who did not find the charts confusing.
The two separate graphs are much more difficult to compare - you can't see which elements compare to the same year so lose a lot of information.
The information in the chart is whether a change in one time series coincides with a change in the other. That is probably all you can infer, since without error bars you can't see whether the differences are material. (I.e., I know they are different scales, so when the lines cross, the values obviously aren't the same.) If they do move together, there might be a correlation worth looking into, remembering that correlation does not equal causation (so the examples in the link are just laughable).
The prioritisation just shows nothing.
The scatterplot shows nothing.
The indexed chart does make sense, and in this case I agree it would be better.
Also, the scaling: in this case the original had reasonable scaling, but it can be manipulated. The changes could be small enough to be random fluctuations in one series, and so not a real match.
However, the graph does show that a slightly deeper look would be worthwhile, even if it is only a very quick one to check whether the data is manipulated, e.g. climate deniers' temperature graphs all starting in the same year. If you change the starting year, you get rather different results.
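The start-year effect can be illustrated with an ordinary least-squares trend fitted over different windows. This is a sketch on a synthetic series (a steady rise plus one unusually high year), not any real temperature record; the function names are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def trend_since(years, values, start_year):
    """Fit a straight-line trend using only the points from start_year on."""
    kept = [(x, y) for x, y in zip(years, values) if x >= start_year]
    xs, ys = zip(*kept)
    return ols_slope(xs, ys)


# Synthetic series (not real temperature data): a steady rise of 0.1 per
# year, plus one unusually high year in 2002.
years = list(range(2000, 2010))
values = [0.1 * i for i in range(10)]
values[2] += 1.0

for start in (2000, 2002, 2003):
    print(start, round(trend_since(years, values, start), 3))
```

Starting the window on the spike year flattens the fitted trend to roughly a sixth of the underlying 0.1 per year, while starting one year later recovers it.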
Solution 4 is so hilariously bad I am shocked it was suggested. Building a 2D landscape where the time dimension seems to take a random walk made me laugh a lot. Ignoring the standard convention of "independent variable on x-axis" and instead embedding it as datapoints is a particularly clever way to obfuscate the data and confuse the reader.
I don't agree. It's a great way to visualise data when you want to focus on a trend. It makes it very obvious which "direction" the data is heading. But of course it is not used very often, is not a great fit for every use case (in particular, it's bad for the data in OP) and may be confusing when seen for the first time.
Plotting the average or top percentile latency of an API on the left axis and the number of calls to that API on the right is pretty much standard practice where I work. I would argue it makes things more clear. You get to see exactly how the latency changed as the traffic does, or where more noise is visible because the traffic was low.
Because both scales are using completely different units it's more difficult to confuse the two.
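A minimal sketch of preparing that kind of chart from raw request records: group requests into fixed time windows and compute, per window, the call count (right axis) and a high-percentile latency (left axis). The record format, window size, and nearest-rank percentile definition are assumptions; monitoring systems differ in how they compute percentiles.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank percentile (0 < p <= 100) of an already sorted list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def per_window_stats(requests, window_s=60, p=99):
    """Group (timestamp_s, latency_ms) pairs into fixed time windows.

    Returns {window_start_s: (call_count, pth_percentile_latency_ms)}:
    the count would go on the right axis, the latency on the left.
    """
    windows = defaultdict(list)
    for ts, latency_ms in requests:
        windows[int(ts // window_s) * window_s].append(latency_ms)
    stats = {}
    for start in sorted(windows):
        latencies = sorted(windows[start])
        stats[start] = (len(latencies), nearest_rank_percentile(latencies, p))
    return stats


# Hypothetical request log: (seconds since start, latency in ms).
log = [(0, 12), (5, 20), (30, 250), (61, 15)]
print(per_window_stats(log))  # {0: (3, 250), 60: (1, 15)}
```

With only a handful of calls in a window, the p99 is effectively the single slowest request, which is why low-traffic periods look noisier on the latency line.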
I have nothing against them. Please note, I edited my comment to change "dual axis charts" (common spreadsheet terminology) to "plots with two different scales in the same direction," which more accurately describes the plots with which the OP -- and I -- disagree.
This is better graph style advice from the Economist, which includes good dual axis examples and one bad one and how to correct it. https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8...