
[QUESTION] Spawned processes and memory usage #25

Closed
freitasskeeled opened this issue Jan 11, 2019 · 15 comments

@freitasskeeled

Hi there,

I'm using your library to get predictions based on a trained model.

Here is the code sequence when I start the program:

  1. Load all the necessary libraries, .RData files and other binary files
  2. Declare my endpoint where the handler function sources a file where the appropriate code is

After this, I have one process using about 350MB of memory (which is normal for this application).

Then, I do a request and a second process spawns using the same amount of memory (350MB).

After this, if I put some load on the API (more than 2 requests at the same time) a new process is spawned using the same amount of memory per process.

I understand that's the way RestRserve or Rserve handle concurrent requests (by forking) but I can't understand why each process has that memory usage. Since it's all shared read-only data, shouldn't all processes use the same memory space instead of copying the data?

I also don't understand why all the spawned processes are kept running even when there are no requests left to handle.

And my third question is: what are the advantages of the recommended way of deploying the API (the one mentioned on the documentation) versus just running Rscript api.R?

Sorry for the long text; some of these questions are probably basic ones, but my knowledge of R is not very extensive.

Thank you!

@dselivanov
Collaborator

Hi!

  1. Load all the necessary libraries, .RData files and other binary files
  2. Declare my endpoint where the handler function sources a file where the appropriate code is

I would suggest not to source() anything on every request; instead, source all the code once during application start.

After this, if I put some load on the API (more than 2 requests at the same time) a new process is spawned using the same amount of memory per process.

I understand that's the way RestRserve or Rserve handle concurrent requests (by forking) but I can't understand why each process has that memory usage. Since it's all shared read-only data, shouldn't all processes use the same memory space instead of copying the data?

How do you measure memory usage? It's true that each request is handled in a separate fork. All processes share memory following copy-on-write semantics, but there will be some amount of memory that is not shared, because each process will likely allocate some memory of its own while handling a request.
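One concrete way to check this on Linux (a sketch; assumes kernel 4.14+ so that /proc/&lt;pid&gt;/smaps_rollup exists): ps and htop report RSS, which counts shared pages fully in every fork, while PSS ("proportional set size") divides each shared page among the processes sharing it, so summing PSS over the forks gives the true total.

```shell
# Linux-only sketch. RSS counts shared pages fully in every process;
# PSS splits each shared page among the processes that share it.
pid=$$   # stand-in: substitute the PID of an Rserve fork taken from ps/htop
grep -E '^(Rss|Pss):' "/proc/${pid}/smaps_rollup"
```

If the forks really share the model data, their combined PSS will be far below the sum of their RSS values.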

I also don't understand why all the spawned processes are kept running even if there aren't any processes to handle.

This is related to s-u/Rserve#111. I would suggest putting a proxy in front of RestRserve/Rserve and configuring it to close the connection after each request. I use HAProxy for that (see option http-server-close in the HAProxy configuration).
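For illustration, a minimal HAProxy fragment of this kind might look as follows (the ports and backend address are assumptions for the sketch, not taken from the thread):

```
defaults
    mode http
    option http-server-close   # close the server-side connection after each response
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend api
    bind *:8080                # assumed public port
    default_backend restrserve

backend restrserve
    server app1 127.0.0.1:8060 check   # assumed RestRserve address
```

With http-server-close, each Rserve fork sees its connection closed after the response and can exit instead of lingering.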

And my third question is: what are the advantages of the recommended way of deploying the API (the one mentioned on the documentation) versus just running Rscript api.R?

I usually deploy it with Rscript api.R inside a Docker container.

@freitasskeeled
Author

freitasskeeled commented Jan 12, 2019

Hi,

Thanks for your quick reply and the help provided!

So, just a quick feedback:

  1. You are right; not sure why I was doing it that way.

  2. I'm using tools like ps/htop. Obviously I expect each process to have a small amount of memory allocated for its own processing, but as far as I can tell, each process gets a full copy of the initial state. If you know of a different tool that lets me check the shared memory usage, please let me know.

  3. I see. But will the process that was created to handle a request, and stays there after the response is sent, be able to handle a future request? Or will it just remain there doing nothing?

  4. Sounds good. Without any argument?

Thank you again.

Cheers

@freitasskeeled
Author

(sorry, I closed the issue by mistake)

@dselivanov
Collaborator

dselivanov commented Jan 14, 2019

The problem with memory is likely related to your code. Consider the following example, where on each request we compute a dot product between an 800 MB matrix and a vector.

library(RestRserve)

n = 1e5
m = 1e3
mat = matrix(runif(m * n), nrow = n)
# object.size(mat) / 1e6
# around 800mb

app = RestRserveApplication$new()
app$add_get(
  "/tst",
  function(req, res) {
    v = runif(m)
    dummy = mat %*% v  # note: 'mat' (the 800 MB matrix), not the scalar 'm'
    res$body = as.character(Sys.getpid())
    forward()
  }
)
app$run(8080)

Now create a container. Save the script above as app.R and use this Dockerfile:

FROM dselivanov/restrserve:0.1.5
COPY app.R /
CMD ["Rscript", "/app.R"]

Build the image:

docker build -t tst .

Run it with memory limited to 1g:

docker run -p 8080:8080 -m='1g' -it tst

Now stress test with 16 concurrent connections using the apib tool:

apib -c 16 -d 3 http://127.0.0.1:8080/tst

You will see that it successfully serves several concurrent requests under the 1 GB container memory constraint. If the processes did not share memory, this would not be possible.

@dselivanov
Collaborator

I see. But will the process that was created to handle a request, and stays there after the response is sent, be able to handle a future request? Or will it just remain there doing nothing?

From what I've seen - yes, but only from the same client.

@freitasskeeled
Author

Thank you for the explanation and the example!

Can you clarify what you mean by "same client"?

@dselivanov
Collaborator

dselivanov commented Jan 14, 2019 via email

@freitasskeeled
Author

Thank you for the support!

I'll have a look into what you have sent.

Just a quick non related question: do you know what may be causing this error?
Error in (function (..., config.file = "/etc/Rserve.conf") : ignoring SIGPIPE signal Calls: <Anonymous> -> do.call -> <Anonymous> Execution halted

It seems to be related to Rserve rather than to my code.

Thanks.

@dselivanov
Collaborator

Yes, this is an Rserve-related issue; see s-u/Rserve#121.
From my experience it can be ignored; it happens after the request has been served.

@freitasskeeled
Author

Thanks again for the help and support!

Keep up the good work.

@long-do

long-do commented Aug 20, 2020

Hello @freitasskeeled,
I have the same error message, but I could not find any solution for RestRserve on Google.

@long-do

long-do commented Aug 20, 2020

Hello @dselivanov,
I could not find an appropriate solution even after reading the issue raised by @freitasskeeled .

Here is my R script, run under systemd as a micro-service:

#!/usr/bin/env Rscript

# define external arguments when calling this R script by bash commands
args <- commandArgs(trailingOnly = TRUE)


## ---- load packages ----
tryCatch({
  library(RestRserve)
  library(pool)
},
error = function(error_detail) {
  # install missing packages, then load them
  install.packages(c("RestRserve", "pool"))
  library(RestRserve)
  library(pool)
})

## ---- create pool connection to database ----
create_pool_conn <- function() {
  ...
  ...
}
pool_conn <- create_pool_conn()
pool::dbListTables(pool_conn)  # important: exercise the pool once at startup


## ---- create application -----
app <- RestRserve::Application$new()


## ---- create handler for the HTTP requests ----
my_function <- function(request, response) {  # hyphens are not valid in R names
  # foo is an R package of my algorithm
  response$body <- foo::bar(input = request$body,
                            pool = pool_conn)
  response$content_type <- "application/json"
}


## ---- register endpoints and corresponding R handlers ----
app$add_post(path = "/my-api-endpoint",
             FUN = my_function)

app$add_openapi(
  path = "/openapi.yaml",
  file_path = "openapi.yaml"
)

# see details on https://swagger.io/tools/swagger-ui/
app$add_swagger_ui(
  path = "/swagger",
  path_openapi = "/openapi.yaml",
  path_swagger_assets = "/swagger/assets/",
  file_path = tempfile(fileext = ".html"),
  use_cdn = FALSE
)


## ---- start application ----
backend <- RestRserve::BackendRserve$new()
backend$start(app, http_port = 8060)

What should I improve, please tell me? Thank you very much.

@long-do

long-do commented Aug 26, 2020

Hi @dselivanov ,
Thank you for your advice mentioned in #23 .
I think I did not explain my issue well, so let me try to be clearer.
The connection pool I used was declared in the parent process and works very well in the forked child processes.
The problem appears when I benchmark my API with JMeter, simulating 100 (or more) concurrent queries.
The API does reply in parallel (12 to 15 concurrent processes, depending on server power).
My real issue is that the sub-processes are never killed, so all the server memory is quickly consumed and the server hangs.
In that case I must manually kill all these forked processes, which is not a workable solution for pre-production or production deployments.
Would you have a solution for this issue, please?
Thank you very much.
Best regards,

@dselivanov
Collaborator

dselivanov commented Aug 27, 2020

I honestly don't see how pool can work here (maybe it does by chance) and suggest you not rely on it. The answer to the issue of connections not being closed is the following:

  • either explicitly close TCP connection on client side
  • or use proxy behind RestRserve (such as HAproxy or nginx) and let proxy
    • either forcefully close connections after each request
    • or keep a persistent pool of connections from proxy to RestRserve
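As a sketch of the second option (all addresses and ports here are assumptions for illustration), an nginx fragment that keeps a small persistent pool of upstream connections could look like:

```
upstream restrserve {
    server 127.0.0.1:8060;   # assumed RestRserve address
    keepalive 8;             # reuse up to 8 idle connections to the backend
}

server {
    listen 8080;
    location / {
        proxy_pass http://restrserve;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # required to enable upstream keep-alive
    }
}
```

With a bounded keep-alive pool, the number of Rserve forks stays capped at the pool size instead of growing with the number of clients.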

@mrchypark

mrchypark commented Aug 25, 2021

How can I turn off the Error in (function (..., config.file = "/etc/Rserve.conf") : ignoring SIGPIPE signal Calls: <Anonymous> -> do.call -> <Anonymous> Execution halted message if it's not a problem?
