class: center, middle, inverse, title-slide

.title[
# 36-613: Data Visualization
]
.subtitle[
## Trends and Animations
]
.author[
### Professor Ron Yurko
]
.date[
### 10/10/2022
]

---

# Please fill out the FCE!

#### Faculty course evaluations (FCEs) are out now

#### I take feedback very seriously and I want the course to be useful!

#### If you enjoyed this class, please fill out the FCE.

#### If you didn’t enjoy this class, please fill out the FCE.

---

## Longitudinal data and time series structure

- For now, consider a _single observation_ measured across time

- Time series data usually has the following structure:

| Variable | `\(T_1\)` | `\(T_2\)` | `\(\dots\)` | `\(T_J\)` |
| ---------- | -------- | -------- | -------- | -------- |
| `\(X_1\)` | `\(x_{11}\)` | `\(x_{12}\)` | `\(\dots\)` | `\(x_{1J}\)` |
| `\(X_2\)` | `\(x_{21}\)` | `\(x_{22}\)` | `\(\dots\)` | `\(x_{2J}\)` |
| `\(\vdots\)` | `\(\vdots\)` | `\(\vdots\)` | `\(\dots\)` | `\(\vdots\)` |
| `\(X_P\)` | `\(x_{P1}\)` | `\(x_{P2}\)` | `\(\dots\)` | `\(x_{PJ}\)` |

--

- With `\(N\)` observations we have `\(N\)` of these matrices

--

- Time may consist of regularly spaced intervals

  - For example, `\(T_1 = t\)`, `\(T_2 = t + h\)`, `\(T_3 = t + 2h\)`, etc.

- It could also consist of irregularly spaced intervals.
  - Then we have to work with the raw `\(T_1, T_2, \dots\)`

---

## Demo example: Statistics PhDs by year

.pull-left[

```r
stat_phd_year_summary %>%
  ggplot(aes(x = year, y = n_phds)) +
  geom_point() +
  scale_x_continuous(
*   breaks = unique(stat_phd_year_summary$year),
*   labels = unique(stat_phd_year_summary$year)) +
  theme_bw() +
  labs(x = "Year", y = "Number of PhDs",
       title = "Number of Statistics-related PhDs awarded over time")
```

- Typical scatterplot display with `n_phds` on the y-axis and `year` on the x-axis

- Manually set x-axis breaks to show every year

]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Add lines to emphasize order

.pull-left[

```r
stat_phd_year_summary %>%
  ggplot(aes(x = year, y = n_phds)) +
  geom_point() +
* geom_line() +
  scale_x_continuous(
    breaks = unique(stat_phd_year_summary$year),
    labels = unique(stat_phd_year_summary$year)) +
  theme_bw() +
  labs(x = "Year", y = "Number of PhDs",
       title = "Number of Statistics-related PhDs awarded over time")
```

]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-3-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Drop points to emphasize trends

<img src="figs/Lec12/only-line-1.png" width="100%" style="display: block; margin: auto;" />

---

## Can fill the area under the line

.pull-left[

```r
stat_phd_year_summary %>%
  ggplot(aes(x = year, y = n_phds)) +
* geom_area(fill = "darkblue", alpha = 0.5) +
  geom_line() +
  scale_x_continuous(
    breaks = unique(stat_phd_year_summary$year),
    labels = unique(stat_phd_year_summary$year)) +
  theme_bw() +
  labs(x = "Year", y = "Number of PhDs",
       title = "Number of Statistics-related PhDs awarded over time")
```

- __Only appropriate when the y-axis starts at 0!__

- `geom_area()` makes the y-axis start at 0 by default

- Also a redundant use of ink...
]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-4-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Plotting several time series - do NOT only use points

.pull-left[

```r
stats_phds %>%
  ggplot(aes(x = year, y = n_phds,
*            color = field)) +
  geom_point() +
  scale_x_continuous(breaks = unique(stat_phd_year_summary$year),
                     labels = unique(stat_phd_year_summary$year)) +
  theme_bw() +
  theme(legend.position = "bottom",
        legend.text = element_text(size = 7)) +
  labs(x = "Year", y = "Number of PhDs",
       title = "Number of Statistics-related PhDs awarded over time",
       color = "Field")
```

- __We should NOT display multiple time series with just points!__

]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-5-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Plotting several time series - use lines!

.pull-left[

```r
stats_phds %>%
  ggplot(aes(x = year, y = n_phds,
*            color = field)) +
* geom_line() +
  scale_x_continuous(breaks = unique(stat_phd_year_summary$year),
                     labels = unique(stat_phd_year_summary$year)) +
  theme_bw() +
  theme(legend.position = "bottom",
        legend.text = element_text(size = 7)) +
  labs(x = "Year", y = "Number of PhDs",
       title = "Number of Statistics-related PhDs awarded over time",
       color = "Field")
```

- Lines alleviate the problem of time series running into each other, which is difficult to read with points

]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Use [`ggrepel`](https://ggrepel.slowkow.com/articles/examples.html) to directly label lines

<img src="figs/Lec12/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />

---

## How do we plot many lines? NOT LIKE THIS!

<img src="figs/Lec12/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" />

---

## Instead we highlight specific lines

<img src="figs/Lec12/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" />

---

## What about Nightingale's rose diagram?
<img src="https://daily.jstor.org/wp-content/uploads/2020/08/florence_nightingagle_data_visualization_visionary_1050x700.jpg" width="90%" style="display: block; margin: auto;" />

---

## What about Nightingale's rose diagram?

<img src="figs/Lec12/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />

---

## What about displaying lines instead?

<img src="figs/Lec12/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />

---

## Storytelling with animation...

.pull-left[

```r
f1_data_ex %>%
  ggplot(aes(x = round, y = points,
             group = name, color = name)) +
  geom_line(size = 2) +
  scale_x_continuous(breaks = seq(1, 17, 1)) +
  labs(title = "The race for third place in the 2020 F1 season",
       y = "Accumulated points", x = NULL) +
  theme_bw()
```

- Can see the accumulated points increasing over time for each team

- But we could _incrementally_ reveal the results at each stage __to emphasize the story of progression__

]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Use [`gganimate`](https://gganimate.com/) to add animations

.pull-left[

```r
library(gganimate)
f1_data_ex %>%
  ggplot(aes(x = round, y = points,
             group = name, color = name)) +
  geom_line(size = 2) +
  scale_x_continuous(breaks = seq(1, 17, 1)) +
  labs(title = "The race for third place in the 2020 F1 season",
       y = "Accumulated points", x = NULL) +
  theme_bw() +
  # Reveal the results by round
* transition_reveal(round)
```

- Emphasize the intermediate results through animation with the `transition_reveal()` function

]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-15-1.gif" width="100%" style="display: block; margin: auto;" />
]

---

## Using animation to add a dimension

.pull-left[

```r
txhousing %>%
  group_by(city, year) %>%
  summarize(median = mean(median, na.rm = TRUE),
            listings = mean(listings, na.rm = TRUE)) %>%
  ggplot(aes(x = median, y = listings,
             color = (city == "Houston"),
             size = (city == "Houston"))) +
  geom_point(alpha = 0.5, show.legend = FALSE) +
  scale_color_manual(values = c("black", "darkred")) +
  scale_size_manual(values = c(2, 4)) +
  scale_x_continuous(labels = scales::dollar,
                     name = "Median Price") +
  scale_y_continuous(labels = scales::label_number_si()) +
  theme_bw() +
  labs(x = "Median Price", y = "Avg. of Monthly Listings",
       subtitle = "Houston in red")
```

]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Using animation to add a dimension

.pull-left[

```r
txhousing %>%
  group_by(city, year) %>%
  summarize(median = mean(median, na.rm = TRUE),
            listings = mean(listings, na.rm = TRUE)) %>%
  ggplot(aes(x = median, y = listings,
             color = (city == "Houston"),
             size = (city == "Houston"))) +
  geom_point(alpha = 0.5, show.legend = FALSE) +
  scale_color_manual(values = c("black", "darkred")) +
  scale_size_manual(values = c(2, 4)) +
  scale_x_continuous(labels = scales::dollar,
                     name = "Median Price") +
  scale_y_continuous(labels = scales::label_number_si()) +
  theme_bw() +
  labs(x = "Median Price", y = "Avg. of Monthly Listings",
       subtitle = "Houston in red",
*      title = "Year: {frame_time}") +
* transition_time(year)
```

]

.pull-right[
<img src="figs/Lec12/unnamed-chunk-17-1.gif" width="100%" style="display: block; margin: auto;" />
]

---

# Reminders about animation

Some key points to think about before adding animation to a visualization:

1. Always make and describe the original / base graphic first that does NOT include animation.

2. Before adding animation to the graph, ask yourself: How would animation give you additional insights about the data **that you would otherwise not be able to get**?

3. Never add animation just because it's cool!

4. When presenting, make sure you explain exactly what is being displayed with animation and what within the animation you want to emphasize. This will help you determine if animation is actually worth including.
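---

## Sketch: base graphic first, animation second

Reminder 1 can be sketched with the `txhousing` example from earlier: build and inspect the static plot, then add the transition as a separate step. This is a minimal sketch rather than the only workflow; the object names `tx_city_year`, `base_plot`, and `animated_plot` are illustrative assumptions and do not appear elsewhere in these slides.

```r
library(dplyr)
library(ggplot2)
library(gganimate)

# Step 1: one row per city and year (same summary as the txhousing slides)
tx_city_year <- txhousing %>%
  group_by(city, year) %>%
  summarize(median = mean(median, na.rm = TRUE),
            listings = mean(listings, na.rm = TRUE),
            .groups = "drop")

# Step 2: the static base graphic, with no animation involved
base_plot <- tx_city_year %>%
  ggplot(aes(x = median, y = listings)) +
  geom_point(alpha = 0.5) +
  scale_x_continuous(labels = scales::dollar) +
  theme_bw() +
  labs(x = "Median Price", y = "Avg. of Monthly Listings")

# Step 3: only after describing base_plot, layer the animation on top
animated_plot <- base_plot +
  labs(title = "Year: {frame_time}") +
  transition_time(year)
```

Printing `base_plot` shows the static graphic to describe first. Printing `animated_plot` renders the animation, which typically relies on a renderer package such as `gifski` being installed.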
---
class: center, middle

# Next time: [`htmlwidgets`](http://www.htmlwidgets.org/index.html) and dashboards

__Report due Friday Oct 14th by 5 PM EDT via email!__

Recommended reading:

[CW CH 13 Visualizing time series and other functions of an independent variable](https://clauswilke.com/dataviz/time-series.html)

[CW CH 14 Visualizing trends](https://clauswilke.com/dataviz/visualizing-trends.html)

[`gganimate` package](https://gganimate.com/)