Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISCUSSION: .progress argument for map-reduce wrappers #85

Open
DavisVaughan opened this issue Aug 11, 2020 · 1 comment
Open

DISCUSSION: .progress argument for map-reduce wrappers #85

DavisVaughan opened this issue Aug 11, 2020 · 1 comment

Comments

@DavisVaughan
Copy link
Contributor

DavisVaughan commented Aug 11, 2020

Related discussions in #83, #3, #78

After discussing #83 and seeing that it might be a little more work than expected for users to update to progressr from my hacky .progress = TRUE argument in furrr, I am reconsidering keeping the .progress argument, but having it use progressr internally. It would be focused on signaling the simplest progress update possible, and would not wrap with_progress() as #78 does. That would still be left to the user.

I am curious about your thoughts on this.

It would work like this:

library(dplyr)
library(tidyr)
library(furrr)
library(progressr)

df <- nest(mtcars, data = -gear)

# Without dplyr
with_progress({
  future_map(df$data, nrow, .progress = TRUE)
})

# In a pipeline
df %>%
  mutate(x = with_progress(future_map(data, nrow, .progress = TRUE)))

# The old way, for comparison
df %>%
  mutate(x = future_map(data, nrow, .progress = TRUE))

I am also considering letting .progress take FALSE/TRUE and some other type of argument that would control how the internal progressor is created, just to give it a little more flexibility, but I'm not quite sure yet. I might start simpler and do that if it is requested.

Pros of this:

  • I think it is a little bit easier to use vs creating the p <- progressor() object and ticking it manually with p() inside the .f function. As we've seen in Adding a progress bar by default in a package #78, it seems to be a little tricky to get it exactly right, especially with dplyr.

  • The default of .progress = FALSE will have zero overhead

  • This approach still allows users to create a progressor object themselves and tick it manually, as displayed in Adding a progress bar by default in a package #78.

  • Perhaps this is a longer horizon "pro", but with a global calling handler we could completely remove the need for with_progress() and it would look exactly like the original furrr implementation.

Cons of this:

  • It requires managing a .progress option in furrr, but I think that is okay.

  • It is a little strange to have .progress = TRUE which signals progress updates, but then you also have to wrap it in with_progress() to see the updates. It also requires a little documentation to explain this, but it could go right in the docs for the .progress argument.

  • It would call p() on every element of .x, which might be overkill and result in performance issues. This is related to Limit the number of progression conditions signaled #81, but if it was a real issue then users could use a custom progressor object and control the progress updates themselves.

  • Nested progress updates are still an issue, but I think that is a separate issue from this one, and whatever improvements come there would flow naturally into furrr.

  • Technically furrr's .progress = TRUE argument "worked" on multicore, and this currently won't, but I can live with that.

@HenrikBengtsson
Copy link
Owner

I and @DavisVaughan talked offline, but for those who might follow/see this, lots of these comments/ideas bring up the issues that relate to trying to identify exactly what progressr should and should not do and how it should be done. These are design problems that are not straightforward to decide so it'll take some time.

Regarding:

  • Technically furrr's .progress = TRUE argument "worked" on multicore, and this currently won't, but I can live with that.

This one is "simple" - support near-live progress updates using the multicore future backend can and will be implemented in the future package. The first iteration will probably relay such progress information via the local file system shared by the workers and the main R session. Either way, it's independent of the progressr package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants