Feature Request: "Zoom in" on height axis for distribution tail visualizations #60

Open
hwaight opened this issue Aug 1, 2020 · 0 comments

hwaight commented Aug 1, 2020

I'm trying to create a ridgeline plot that displays the full distribution and the tail of the distribution side by side. I'm requesting a feature that lets you "zoom in" on the tail of each distribution without altering the estimated density function. You can currently do this for the x-axis using coord_cartesian(), but there is no equivalent for the height of each distribution. This would be really helpful for people who work with large datasets whose tails follow power-law-like distributions and who want to visualize the extremes. This is a common situation when working with text data, for example. I've created two examples below:

  • An example of the feature I would like ggridges to have, but with a single density plot
  • An example of what ggridges can currently do (as far as I've been able to tell)

Single Density Example

library("ggplot2")
library("dplyr")
library("gridExtra")

## iris data set, full distribution of virginica Sepal.Width
plot1 <- ggplot(iris %>% filter(Species == "virginica"),
                aes(x = Sepal.Width)) +
  geom_density()

## same density, zoomed in on the upper tail along both axes
plot2 <- ggplot(iris %>% filter(Species == "virginica"),
                aes(x = Sepal.Width)) +
  geom_density() +
  coord_cartesian(xlim = c(3.5, 3.8),
                  ylim = c(0, .5))

grid.arrange(plot1, plot2, ncol = 2)

image

The figure shows that the tail of the density has not been re-normalized. It keeps the shape and area from the original figure; we've just zoomed in on both the x and y axes.
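The claim that the tail keeps its original area can be checked numerically. The following is a minimal sketch, assuming that stats::density() with its defaults is a fair stand-in for the curve geom_density() draws (my understanding is that both use a Gaussian kernel with the "nrd0" bandwidth rule by default). The variable names are illustrative only.

```r
# Sepal.Width for virginica, the same subset used in the plots above
x <- iris$Sepal.Width[iris$Species == "virginica"]

# Kernel density estimate on the full data (base R defaults; assumed to
# approximate what geom_density() computes)
est <- density(x)

# Grid spacing of the estimate, used for a simple Riemann sum
dx <- diff(est$x)[1]

# Area under the whole curve and under the zoomed window only
in_tail    <- est$x >= 3.5 & est$x <= 3.8
total_area <- sum(est$y) * dx
tail_area  <- sum(est$y[in_tail]) * dx

# The whole curve integrates to roughly 1; the tail is a fraction of that
cat("total area:", total_area, "\n")
cat("tail area: ", tail_area, "\n")
```

If the zoomed panel had been re-estimated on the tail observations alone, its area would again be close to 1, which is exactly what this request wants to avoid.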

ggridges Example

In this example I've added an additional aesthetic mapping, as that helps underscore why this would be helpful. I've created a binary variable to map to the fill aesthetic.

library("ggridges")

## binary variable (seeded so the example is reproducible)
set.seed(1)
iris$norm <- rnorm(150)
iris$norm_bin <- ifelse(iris$norm < 0,
                        "Less than 0",
                        "Greater than 0")


plot1 <- ggplot(iris,  aes(y = Species)) +
  geom_density_ridges(aes(x = Sepal.Width,
                          fill = norm_bin),
                      alpha = .5) +
  theme_ridges(grid = FALSE,
               center_axis_labels = TRUE) +
  theme(legend.position = "left",
        axis.title.y = element_blank()) + 
  ggtitle("Center of Distribution")


plot2 <- ggplot(iris,  aes(y = Species)) +
  geom_density_ridges(aes(x = Sepal.Width,
                          fill = norm_bin),
                      alpha = .5) +
  theme_ridges(grid = FALSE,
               center_axis_labels = TRUE) +
  theme(legend.position = "none",
        axis.title.y = element_blank()) + 
  coord_cartesian(xlim = c(4, 5)) +
  ggtitle("Tail of Distribution")

grid.arrange(plot1, plot2, ncol = 2)

image

With ggridges you can't zoom in on the height of each ridgeline, because coord_cartesian() only accepts x and y limits, and in a ridgeline plot the y-axis holds the groups (here Species) rather than the density height. Here it doesn't matter much because there are so few observations that you can still see and make sense of the tails, but as the number of observations and ridgelines grows, the tails become difficult to read.

If there could be a feature built that would allow you to zoom in on the "height" of each ridgeline it would be really helpful.
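Until something like this exists, one possible workaround is to estimate the densities yourself and pass them to geom_ridgeline(), which takes a height aesthetic as given. The sketch below is only an approximation of the requested feature, under several assumptions: tail_density() is a hypothetical helper (not part of ggridges), zoom is an arbitrary magnification factor, and the bandwidth is chosen per group by stats::density(), which may differ from what geom_density_ridges() would pick. It also leaves out the fill mapping for brevity.

```r
library(ggplot2)
library(ggridges)

# Hypothetical helper (not part of ggridges): estimate each group's density on
# the FULL data, then keep only the grid points inside `xlim`, so the tail is
# cropped rather than re-estimated.
tail_density <- function(data, value, group, xlim, n = 512) {
  pieces <- lapply(split(data, data[[group]], drop = TRUE), function(d) {
    est  <- density(d[[value]], n = n)
    keep <- est$x >= xlim[1] & est$x <= xlim[2]
    data.frame(
      group   = rep(as.character(d[[group]][1]), sum(keep)),
      x       = est$x[keep],
      density = est$y[keep]
    )
  })
  do.call(rbind, pieces)
}

tails <- tail_density(iris, "Sepal.Width", "Species", xlim = c(3.5, 4.5))

# Assumed magnification of the ridgeline heights; tune by eye
zoom <- 4

p_tail <- ggplot(tails, aes(x = x, y = group,
                            height = density * zoom, group = group)) +
  geom_ridgeline(alpha = .5) +
  theme_ridges(grid = FALSE, center_axis_labels = TRUE) +
  labs(x = "Sepal.Width", y = NULL,
       title = paste0("Tail of Distribution (heights x", zoom, ")"))
```

p_tail can then be placed next to the full ridgeline plot with grid.arrange(). Because the heights are multiplied by hand, the two panels no longer share a height scale, so the magnification should be stated in the title or caption.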

Thanks! And thanks for building such a fantastic package.
