36-613: Data Visualization

More High Dimensional Data and Shiny

Professor Ron Yurko

9/28/2022

Consider the following spiral structure...

PCA simply rotates the data...
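A minimal base-R sketch of why PCA alone cannot unroll the spiral. When all components are kept, `prcomp()` only centers and rotates the data, so every pairwise distance is preserved. The spiral below is a hypothetical stand-in for the one in the figure, not the slide's actual data.

```r
# Hypothetical 2-D spiral (not the data shown on the slide)
theta <- seq(0, 4 * pi, length.out = 200)
spiral <- cbind(x = theta * cos(theta), y = theta * sin(theta))

# PCA keeping all components: the scores are the centered data, rotated
pca_fit <- prcomp(spiral, center = TRUE, scale. = FALSE)
scores <- pca_fit$x

# A rotation preserves every pairwise distance, so the spiral is unchanged
max_dist_change <- max(abs(as.vector(dist(spiral)) - as.vector(dist(scores))))
max_dist_change  # close to 0, up to floating-point error
```

Keeping only PC1 instead would project the whole spiral onto a single line, so points that are far apart along the spiral can land next to each other.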

Nonlinear dimension reduction with t-SNE and UMAP

Both t-SNE and UMAP look at the local distances between points in the original p-dimensional space and try to reproduce them in a lower k-dimensional space

t-SNE: t-distributed stochastic neighbor embedding

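  • Construct conditional probabilities that measure the similarity between observations in the original space

    • i.e., the probability that $x_i$ would pick $x_j$ as its neighbor

$$p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2n}$$

  • $\sigma_i^2$ is the variance of the Gaussian centered at $x_i$, with $\sigma_i$ chosen to match a user-specified perplexity: $\log_2(\text{perplexity}) = -\sum_j p_{j \mid i} \log_2 p_{j \mid i}$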

    • loosely interpreted as the number of close neighbors to consider for each point
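  • Find points $y_i$ in the lower-dimensional space whose similarities follow a symmetrized Student t-distribution (with one degree of freedom)

$$q_{j \mid i} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq i} \left(1 + \lVert y_i - y_k \rVert^2\right)^{-1}}, \qquad q_{ij} = \frac{q_{i \mid j} + q_{j \mid i}}{2n}$$

  • Match the two sets of probabilities by minimizing the KL divergence $C = \sum_{i} \sum_{j} p_{ij} \log\left(\frac{p_{ij}}{q_{ij}}\right)$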
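The formulas above can be sketched in base R for a small data set. This is a minimal illustration, not how Rtsne works internally: it computes $p_{j \mid i}$ with each $\sigma_i$ found by bisection to hit the target perplexity, symmetrizes to $p_{ij}$, computes the slide's $q_{ij}$ for a random starting embedding, and evaluates the cost $C$. It does not run the gradient-descent optimization of the $y_i$. The function names (`cond_probs()`, `calibrate_sigma()`, `tsne_P()`, `tsne_Q()`, `kl_cost()`) and the simulated data are hypothetical. (The original t-SNE paper normalizes $q_{ij}$ over all pairs at once; the code follows the symmetrized form on the slide.)

```r
# Squared Euclidean distances between all rows of a matrix
sq_dists <- function(X) unname(as.matrix(dist(X)))^2

# p_{j|i} for a single point i, given the Gaussian bandwidth sigma_i
cond_probs <- function(d2_row, i, sigma) {
  d2_row[i] <- Inf                 # exclude k = i: exp(-Inf) = 0
  d2_row <- d2_row - min(d2_row)   # shift for numerical stability (cancels in the ratio)
  w <- exp(-d2_row / (2 * sigma^2))
  w / sum(w)
}

# Perplexity 2^H with H = -sum_j p_{j|i} log2 p_{j|i}
perplexity_of <- function(p) {
  p <- p[p > 0]
  2^(-sum(p * log2(p)))
}

# Bisection (on the log scale) for the sigma_i that matches the target perplexity;
# perplexity increases with sigma, so a value above target means sigma is too big
calibrate_sigma <- function(d2_row, i, target, lo = 1e-3, hi = 1e3, iters = 100) {
  for (it in seq_len(iters)) {
    mid <- sqrt(lo * hi)
    if (perplexity_of(cond_probs(d2_row, i, mid)) > target) hi <- mid else lo <- mid
  }
  sqrt(lo * hi)
}

# Symmetrized p_ij = (p_{j|i} + p_{i|j}) / (2n)
tsne_P <- function(X, perplexity) {
  n <- nrow(X)
  D2 <- sq_dists(X)
  P_cond <- t(sapply(seq_len(n), function(i) {
    cond_probs(D2[i, ], i, calibrate_sigma(D2[i, ], i, perplexity))
  }))                              # row i holds p_{j|i}
  (P_cond + t(P_cond)) / (2 * n)
}

# q_{j|i} from the Student t kernel (1 degree of freedom), symmetrized as on the slide
tsne_Q <- function(Y) {
  n <- nrow(Y)
  W <- 1 / (1 + sq_dists(Y))
  diag(W) <- 0                     # exclude k = i
  Q_cond <- W / rowSums(W)         # row i holds q_{j|i}
  (Q_cond + t(Q_cond)) / (2 * n)
}

# Cost C = sum_ij p_ij log(p_ij / q_ij), skipping p_ij = 0 terms (e.g. the diagonal)
kl_cost <- function(P, Q) {
  keep <- P > 0
  sum(P[keep] * log(P[keep] / Q[keep]))
}

# Usage on hypothetical data: 40 standardized points in 5 dimensions
set.seed(2013)
X <- scale(matrix(rnorm(40 * 5), ncol = 5))
P <- tsne_P(X, perplexity = 5)
Y <- matrix(rnorm(40 * 2, sd = 1e-4), ncol = 2)   # random starting embedding
kl_cost(P, tsne_Q(Y))
```

An optimizer would then move the $y_i$ to lower this cost; Rtsne does that with a Barnes-Hut approximation rather than the exact $O(n^2)$ computations used here.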

Starbucks t-SNE plot

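Use the Rtsne package

library(Rtsne)
library(dplyr)
library(ggplot2)

# assumes `starbucks` has already been loaded (the import is not shown on this slide)
set.seed(2013)  # t-SNE uses a random initialization
tsne_fit <- starbucks %>%
  dplyr::select(serv_size_m_l:caffeine_mg) %>%  # numeric columns from serving size through caffeine
  scale() %>%                                    # standardize each column
  Rtsne(check_duplicates = FALSE)                # skip Rtsne's error on duplicate rows
# add the 2-D embedding (tsne_fit$Y) back to the data and plot it
starbucks %>%
  mutate(tsne1 = tsne_fit$Y[, 1],
         tsne2 = tsne_fit$Y[, 2]) %>%
  ggplot(aes(x = tsne1, y = tsne2,
             color = size)) +
  geom_point(alpha = 0.5) +
  labs(x = "t-SNE 1", y = "t-SNE 2")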

Starbucks t-SNE plot - involves randomness!

Depends on the random starting point!

set.seed(2014)  # the only change from the previous slide
tsne_fit <- starbucks %>%
  dplyr::select(serv_size_m_l:caffeine_mg) %>%
  scale() %>%
  Rtsne(check_duplicates = FALSE)
starbucks %>%
  mutate(tsne1 = tsne_fit$Y[, 1],
         tsne2 = tsne_fit$Y[, 2]) %>%
  ggplot(aes(x = tsne1, y = tsne2,
             color = size)) +
  geom_point(alpha = 0.5) +
  labs(x = "t-SNE 1", y = "t-SNE 2")

Starbucks t-SNE plot - watch the perplexity!

set.seed(2013)
tsne_fit <- starbucks %>%
  dplyr::select(serv_size_m_l:caffeine_mg) %>%
  scale() %>%
  Rtsne(perplexity = 100,  # Rtsne's default is perplexity = 30
        check_duplicates = FALSE)
starbucks %>%
  mutate(tsne1 = tsne_fit$Y[, 1],
         tsne2 = tsne_fit$Y[, 2]) %>%
  ggplot(aes(x = tsne1, y = tsne2,
             color = size)) +
  geom_point(alpha = 0.5) +
  labs(x = "t-SNE 1", y = "t-SNE 2")

Choosing the perplexity:

  • Typical values increase with the number of observations

  • Should not be bigger than $(n - 1)/3$
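Two details behind these guidelines, sketched in base R. First, perplexity $= 2^H$ equals $k$ when a point spreads its probability uniformly over $k$ neighbors, which is the "effective number of neighbors" reading. Second, the $(n - 1)/3$ ceiling, which to our understanding matches the check Rtsne applies to its `perplexity` argument (it stops with an error when $3 \cdot \text{perplexity} > n - 1$). The helpers `perplexity_of()` and `max_perplexity()` and the candidate grid are hypothetical.

```r
# Perplexity = 2^H, with H the entropy in bits of a neighbor distribution
perplexity_of <- function(p) {
  p <- p[p > 0]
  2^(-sum(p * log2(p)))
}
perplexity_of(rep(1 / 10, 10))     # uniform over 10 neighbors: 10
perplexity_of(c(0.5, 0.5, 0, 0))   # uniform over 2 neighbors: 2

# Hypothetical helper: largest perplexity allowed for n rows, per the (n - 1)/3 rule
max_perplexity <- function(n) (n - 1) / 3

# Hypothetical grid of candidate values, filtered for a data set with n = 100 rows
candidates <- c(5, 30, 50, 100)
candidates[candidates <= max_perplexity(100)]   # 5 30
```

For the Starbucks data, `max_perplexity(nrow(starbucks))` would give the corresponding ceiling.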

Back to the spirals: results depend on perplexity!

Criticisms of t-SNE plots

  • Poor scalability: does not scale well to large data, and can practically only embed into 2 or 3 dimensions

  • Meaningless global structure: distances between clusters might not have a clear interpretation, and cluster sizes carry no meaning

  • Poor performance with very high-dimensional data: typically needs PCA as a preliminary dimension-reduction step

  • Random noise can sometimes lead to false-positive structure in the t-SNE projection

  • Can NOT interpret like PCA!
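A minimal sketch of the PCA pre-reduction step in base R. Rtsne already performs this step by default (its `pca = TRUE` and `initial_dims = 50` arguments), so this mainly makes the idea explicit; the simulated data and the 50-dimension cutoff are arbitrary assumptions.

```r
# Hypothetical high-dimensional data: 100 observations, 500 variables
set.seed(2013)
X <- matrix(rnorm(100 * 500), nrow = 100)

# Step 1: PCA down to a moderate number of dimensions (50 here, an arbitrary choice)
pca_fit <- prcomp(X, center = TRUE, scale. = TRUE)
X_reduced <- pca_fit$x[, 1:50]

# Step 2 (not run here, requires the Rtsne package): embed the reduced matrix,
# turning off Rtsne's own PCA step since it has already been done
# Rtsne::Rtsne(X_reduced, pca = FALSE, perplexity = 30)
dim(X_reduced)   # 100 50
```

The PC scores are uncorrelated, and t-SNE's distance computations now run on 50 columns instead of 500.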

Interactive web apps with Shiny

Shiny is a framework for building interactive web applications and dynamic dashboards in R

You do NOT need to be a web developer to create Shiny apps; you just need to learn some additional syntax to augment your R code

Every Shiny app consists of two scripts (they could also be saved into one file, app.R, but that's annoying)

  1. ui.R: controls the user interface; sets up the display and the widgets for user input

    • contains more code specific to Shiny
  2. server.R: code to generate / display the results! Communicates with ui.R through reactive objects: processes user input to return output

    • will contain more traditional R code: load packages, wrangle data, create plots

Can be run locally or deployed on a Shiny app server for public viewing
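A minimal single-file sketch of the ui / server split described above, using R's built-in `faithful` data rather than the app built live in class. The names `ui`, `server`, `bins`, and `wait_hist` are illustrative. In the two-script layout, the `fluidPage(...)` expression would be the last expression in ui.R and the server function the last expression in server.R, and `shiny::runApp()` on the app's folder would run it locally.

```r
library(shiny)

# ui: what the user sees (in a two-script app, this expression would live in ui.R)
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20)
    ),
    mainPanel(
      plotOutput("wait_hist")
    )
  )
)

# server: reads input$bins and (re)draws output$wait_hist
# (in a two-script app, this function would live in server.R)
server <- function(input, output) {
  output$wait_hist <- renderPlot({
    breaks <- seq(min(faithful$waiting), max(faithful$waiting),
                  length.out = input$bins + 1)
    hist(faithful$waiting, breaks = breaks,
         main = NULL, xlab = "Waiting time (minutes)")
  })
}

# Launch only in an interactive session so sourcing the file does not block
if (interactive()) shinyApp(ui = ui, server = server)
```

Moving the slider changes `input$bins`, which makes `renderPlot()` re-run and redraw `output$wait_hist`; this input-to-output link is the reactive communication between the two scripts.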

DO IT LIVE

Next time: Maps

HW4 due today! HW5 due next Wednesday and Graphics Critique / Replication #2 due Friday Oct 7th!

Recommended reading:

How to Use t-SNE Effectively

Understanding UMAP

Shiny tutorials

Shiny Gallery
