
[QUESTION] Spawned processes and memory usage #25

Closed
freitasskeeled opened this issue Jan 11, 2019 · 15 comments

@freitasskeeled

Hi there,

I'm using your library to get predictions based on a trained model.

Here is the code sequence when I start the program:

  1. Load all the necessary libraries, .RData files and other binary files
  2. Declare my endpoint where the handler function sources a file where the appropriate code is

After this, I have one process using about 350MB of memory (which is normal for this application).

Then, I do a request and a second process spawns using the same amount of memory (350MB).

After this, if I put some load on the API (more than 2 requests at the same time) a new process is spawned using the same amount of memory per process.

I understand that's the way RestRserve or Rserve handle concurrent requests (by forking) but I can't understand why each process has that memory usage. Since it's all shared read-only data, shouldn't all processes use the same memory space instead of copying the data?

I also don't understand why all the spawned processes are kept running even when there are no requests left to handle.

And my third question is: what are the advantages of the recommended way of deploying the API (the one mentioned on the documentation) versus just running Rscript api.R?

Sorry for the long text; some of these questions are probably basic ones, but my knowledge of R is not very extensive.

Thank you!

@dselivanov
Collaborator

Hi!

  1. Load all the necessary libraries, .RData files and other binary files
  2. Declare my endpoint where the handler function sources a file where the appropriate code is

I would suggest not to source() anything on every request; instead, source all the code once during application start.

After this, if I put some load on the API (more than 2 requests at the same time) a new process is spawned using the same amount of memory per process.

I understand that's the way RestRserve or Rserve handle concurrent requests (by forking) but I can't understand why each process has that memory usage. Since it's all shared read-only data, shouldn't all processes use the same memory space instead of copying the data?

How do you measure memory usage? It's true that each request is handled in a separate fork. All processes share memory following copy-on-write semantics, but there will be some amount of memory that is not shared, because each process will likely allocate some memory of its own while handling a request.
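One concrete way to check this on Linux (a sketch; assumes kernel 4.14+ so that /proc/&lt;pid&gt;/smaps_rollup exists): ps and htop report RSS, which counts shared pages fully in every fork, while PSS ("proportional set size") divides each shared page among the processes sharing it, so summing PSS over the forks gives the true total.

```shell
# Linux-only sketch. RSS counts shared pages fully in every process;
# PSS splits each shared page among the processes that share it.
pid=$$   # stand-in: substitute the PID of an Rserve fork taken from ps/htop
grep -E '^(Rss|Pss):' "/proc/${pid}/smaps_rollup"
```

If the forks really share the model data, their combined PSS will be far below the sum of their RSS values.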

I also don't understand why all the spawned processes are kept running even if there aren't any processes to handle.

This is related to s-u/Rserve#111. I would suggest putting a proxy in front of RestRserve/Rserve and configuring it to close the connection after each request. I use HAProxy for that (see option http-server-close in the HAProxy configuration).
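For illustration, a minimal HAProxy fragment of this kind might look as follows (the ports and backend address are assumptions for the sketch, not taken from the thread):

```
defaults
    mode http
    option http-server-close   # close the server-side connection after each response
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend api
    bind *:8080                # assumed public port
    default_backend restrserve

backend restrserve
    server app1 127.0.0.1:8060 check   # assumed RestRserve address
```

With http-server-close, each Rserve fork sees its connection closed after the response and can exit instead of lingering.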

And my third question is: what are the advantages of the recommended way of deploying the API (the one mentioned on the documentation) versus just running Rscript api.R?

I usually deploy it with Rscript api.R inside a Docker container.

@freitasskeeled
Author

freitasskeeled commented Jan 12, 2019

Hi,

Thanks for your quick reply and the help provided!

So, just a quick feedback:

  1. You are right; not sure why I was doing it that way.

  2. I'm using tools like ps/htop. Obviously I expect each process to have a small amount of memory allocated for its own processing, but as far as I can tell, each process gets a full copy of the initial state. If you know of a different tool that lets me check the shared memory usage, please let me know.

  3. I see. But will the process that was created to handle a request, and stays there after the response is sent, be able to handle a future request? Or will it just remain there doing nothing?

  4. Sounds good. Without any argument?

Thank you again.

Cheers

@freitasskeeled
Author

(sorry, I closed the issue by mistake)

@dselivanov
Collaborator

dselivanov commented Jan 14, 2019

The problem with memory is likely related to your code. Consider the following example, where on each request we compute a dot product between an 800 MB matrix and a vector.

library(RestRserve)

n = 1e5
m = 1e3
mat = matrix(runif(m * n), nrow = n)
# object.size(mat) / 1e6
# around 800mb

app = RestRserveApplication$new()
app$add_get(
  "/tst",
  function(req, res) {
    v = runif(m)
    dummy = mat %*% v  # note: 'mat' (the 800 MB matrix), not the scalar 'm'
    res$body = as.character(Sys.getpid())
    forward()
  }
)
app$run(8080)

Now create a container. Save the script above as app.R and use this Dockerfile:

FROM dselivanov/restrserve:0.1.5
COPY app.R /
CMD ["Rscript", "/app.R"]

Build the image:

docker build -t tst .

Run it with memory limited to 1g:

docker run -p 8080:8080 -m='1g' -it tst

Now stress test with 16 concurrent connections using the apib tool:

apib -c 16 -d 3 http://127.0.0.1:8080/tst

You will see that it successfully serves several concurrent requests under the 1 GB container memory constraint. If the processes did not share memory, this would not be possible.

@dselivanov
Collaborator

I see. But will the process that was created to handle a request, and stays there after the response is sent, be able to handle a future request? Or will it just remain there doing nothing?

From what I've seen - yes, but only from the same client.

@freitasskeeled
Author

Thank you for the explanation and the example!

Can you clarify what you mean by "same client"?

@dselivanov
Collaborator

dselivanov commented Jan 14, 2019 via email

@freitasskeeled
Author

Thank you for the support!

I'll have a look into what you have sent.

Just a quick non related question: do you know what may be causing this error?
Error in (function (..., config.file = "/etc/Rserve.conf") : ignoring SIGPIPE signal Calls: <Anonymous> -> do.call -> <Anonymous> Execution halted

It seems to be related to Rserve rather than to my code.

Thanks.

@dselivanov
Collaborator

Yes, this is an Rserve-related issue; see s-u/Rserve#121.
From my experience it can be ignored; it happens after the request has been served.

@freitasskeeled
Author

Thanks again for the help and support!

Keep up the good work.

@long-do

long-do commented Aug 20, 2020

Hello @freitasskeeled,
I have the same error message, but I could not find any solution for RestRserve on Google.

@long-do

long-do commented Aug 20, 2020

Hello @dselivanov,
I could not find an appropriate solution even after reading the issue raised by @freitasskeeled .

Here is my R script, run under systemd as a micro-service:

#!/usr/bin/env Rscript

# define external arguments when calling this R script by bash commands
args <- commandArgs(trailingOnly = TRUE)


## ---- load packages ----
tryCatch({
  library(RestRserve)
  library(pool)
},
error = function(error_detail) {
  # install missing packages, then load them
  install.packages(c("RestRserve", "pool"))
  library(RestRserve)
  library(pool)
})

## ---- create pool connection to database ----
create_pool_conn <- function() {
  ...
  ...
}
pool_conn <- create_pool_conn()
pool::dbListTables(pool_conn)  # important: exercise the pool once at startup


## ---- create application -----
app <- RestRserve::Application$new()


## ---- create handler for the HTTP requests ----
my_function <- function(request, response) {  # hyphens are not valid in R names
  # foo is an R package of my algorithm
  response$body <- foo::bar(input = request$body,
                            pool = pool_conn)
  response$content_type <- "application/json"
}


## ---- register endpoints and corresponding R handlers ----
app$add_post(path = "/my-api-endpoint",
             FUN = my_function)

app$add_openapi(
  path = "/openapi.yaml",
  file_path = "openapi.yaml"
)

# see details on https://swagger.io/tools/swagger-ui/
app$add_swagger_ui(
  path = "/swagger",
  path_openapi = "/openapi.yaml",
  path_swagger_assets = "/swagger/assets/",
  file_path = tempfile(fileext = ".html"),
  use_cdn = FALSE
)


## ---- start application ----
backend <- RestRserve::BackendRserve$new()
backend$start(app, http_port = 8060)

What should I improve, please tell me? Thank you very much.

@long-do

long-do commented Aug 26, 2020

Hi @dselivanov ,
Thank you for your advice mentioned in #23 .
I think I did not explain my issue well, so let me try to be clearer.
The connection pool I used was declared in the parent process and works very well in the forked child processes.
The problem appears when I benchmark my API with JMeter, simulating 100 (or more) concurrent queries.
The API does reply in parallel (12 to 15 concurrent processes, depending on server power).
My real issue is that the sub-processes are never killed, so all the server memory is quickly consumed and the server hangs.
In that case I must manually kill all these forked processes, which is not a workable solution for pre-production or production deployments.
Would you have a solution for this issue, please?
Thank you very much.
Best regards,

@dselivanov
Collaborator

dselivanov commented Aug 27, 2020

I honestly don't see how pool can work here (maybe it does by chance) and suggest you not rely on it. The answer to the issue of connections not being closed is the following:

  • either explicitly close TCP connection on client side
  • or use proxy behind RestRserve (such as HAproxy or nginx) and let proxy
    • either forcefully close connections after each request
    • or keep a persistent pool of connections from proxy to RestRserve
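As a sketch of the second option (all addresses and ports here are assumptions for illustration), an nginx fragment that keeps a small persistent pool of upstream connections could look like:

```
upstream restrserve {
    server 127.0.0.1:8060;   # assumed RestRserve address
    keepalive 8;             # reuse up to 8 idle connections to the backend
}

server {
    listen 8080;
    location / {
        proxy_pass http://restrserve;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # required to enable upstream keep-alive
    }
}
```

With a bounded keep-alive pool, the number of Rserve forks stays capped at the pool size instead of growing with the number of clients.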

@mrchypark

mrchypark commented Aug 25, 2021

How can I turn off the Error in (function (..., config.file = "/etc/Rserve.conf") : ignoring SIGPIPE signal Calls: <Anonymous> -> do.call -> <Anonymous> Execution halted message if it's not a problem?
