For five years as a data analyst, I forecasted and analyzed Google's revenue. For six years as a data visualization specialist, I've helped clients and colleagues discover new features of the data they know best. Time and time again, I've found that by being more specific about what's important to us and embracing the complexity in our data, we can discover new features in that data. These features can lead us to ask better data-driven questions that change how we analyze our data, the parameters we choose for our models, our scientific processes, or our business strategies.

My colleagues Ian Johnson, Mike Freeman, and I recently collaborated on a series of data-driven stories about electricity usage in Texas and California to illustrate best practices of Analyzing Time Series Data. We found ourselves repeatedly changing how we visualized the data to reveal the underlying signals, rather than treating those signals as noise by following the standard practice of aggregating the hourly data to days, weeks, or months. Behind many of the best practices we recommended for time series analysis was a deeper theme: actually embracing the complexity of the data.

Aggregation is the standard best practice for analyzing time series data, but it can create problems by stripping away crucial context, so that you're not even aware of how much potential insight you've lost. In this article, I'll start by discussing how aggregation can be problematic, before walking through three specific alternatives to aggregation, with before/after examples that illustrate:

- Rearranging the data to compare "like to like."
- Augmenting the data with concepts that matter, like "summer" vs. "winter," or data-defined categories like "high" or "normal" energy usage.
- Using the data itself as context, by splitting the data into "foreground" and "background" so that the full dataset provides the context necessary to make sense of the specific subset of the data we're interested in.

(All three ideas are sketched in minimal code at the end of this section.)

*Visualization by Shan Carter, Zan Armstrong, Mike Freeman, and Ian Johnson.*

We praise the importance of large, rich datasets when we talk about algorithms and teaching machines to learn from data. However, too often when we visualize data so that we as humans can make sense of it, especially time series data, we make the data smaller and smaller.

Aggregation is the default for a reason. It can feel overwhelming to handle the quantities of data we now have at our fingertips: it doesn't have to be very "big" data to have more than 1M data points, more than the number of pixels on a basic laptop screen. And there are many robust statistical approaches to effective aggregation, as well as aggregations that provide valuable context (comparing to the median, for example).

But every time you aggregate, you make a decision about which features of your data matter and which ones you are willing to drop: which are the signal and which are the noise. When you smooth out a line chart, are you doing it because you've decided that the daily average is most important and that you don't care about the distribution or the seasonal variation in your peak usage hours? Or are you doing it because it's the only way you know how to make the jagged lines in your chart go away? In other cases, we need to see more detail while trying to find the key insight; but once we've finished the analysis and know which features of the data matter most, aggregation can be a useful tool for focusing attention to communicate that insight.
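To make that trade-off concrete, here is a minimal sketch in Python/pandas (the hourly series is synthetic and every name in it is hypothetical; the articles themselves used real utility data) of what a daily mean keeps and drops, and of a median-plus-quantile-band aggregation that preserves more context:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly electricity usage for two years: a daily cycle
# plus noise. Purely illustrative stand-in data.
hours = pd.date_range("2019-01-01", "2020-12-31 23:00", freq="h")
usage = pd.Series(
    10 + 5 * np.sin(2 * np.pi * hours.hour / 24)
    + np.random.default_rng(0).normal(0, 1, len(hours)),
    index=hours,
    name="kwh",
)

# The default aggregation: 24 hourly readings collapse to one number
# per day, and the shape of the daily profile disappears with them.
daily_mean = usage.resample("D").mean()

# A more context-preserving aggregation: a robust center plus a band,
# so day-to-day spread and outlier hours remain visible when plotted.
daily = pd.DataFrame({
    "median": usage.resample("D").median(),
    "p10": usage.resample("D").quantile(0.10),
    "p90": usage.resample("D").quantile(0.90),
})
```

Either way an aggregation decision is being made; the second version simply makes the dropped information (the spread) part of the chart instead of discarding it silently.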
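The first two alternatives can be sketched together, continuing from the hypothetical `usage` series above. Rearranging means indexing the data so that we compare like to like, for example hour of day against hour of day; augmenting means attaching a concept that matters, here a crude month-based "summer"/"winter" label (my own illustrative definition, not the one used in the articles):

```python
# Continue from the `usage` series in the previous sketch.
frame = usage.to_frame()

# Rearranging to compare "like to like": tag each reading with its hour
# of day, so 5pm readings are compared with other 5pm readings rather
# than being averaged together with 3am.
frame["hour"] = frame.index.hour

# Augmenting with a concept that matters: a season label. The
# June-September cutoff is an arbitrary illustrative choice.
frame["season"] = np.where(
    frame.index.month.isin([6, 7, 8, 9]), "summer", "winter"
)

# One median daily profile per season: two 24-point curves that can be
# compared directly, instead of one all-year average that blends them.
profiles = (
    frame.groupby(["season", "hour"])["kwh"]
    .median()
    .unstack("season")
)
```

A data-defined category like "high" vs. "normal" usage works the same way: compute a threshold (say, a day's total above the 90th percentile) and attach it as another column to group and color by.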
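The third alternative, foreground and background, is mostly a drawing decision rather than a data transformation: render the full dataset faintly, then draw the subset of interest on top so that the rest of the data becomes its context. A minimal matplotlib sketch, reusing the hypothetical `frame` from above (the articles built their visualizations with other tools; matplotlib here is my substitution):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Background: every day's hourly profile, drawn faintly. The full
# dataset stays on screen as context instead of being aggregated away.
for _, day in frame.groupby(frame.index.date):
    ax.plot(day["hour"], day["kwh"], color="lightgray", linewidth=0.5)

# Foreground: the one day we want the reader to focus on, drawn on top.
focus = frame.loc["2020-07-15"]
ax.plot(focus["hour"], focus["kwh"], color="crimson", linewidth=2,
        label="2020-07-15")

ax.set_xlabel("hour of day")
ax.set_ylabel("kWh")
ax.legend()
plt.show()
```

The reader immediately sees whether the highlighted day is typical or extreme, which is exactly the context a single aggregated line would have stripped away.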