Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a hook or callback for cleaning up on disk file uploads #264

Open
LeSpocky opened this issue Mar 23, 2022 · 10 comments
Open

Add a hook or callback for cleaning up on disk file uploads #264

LeSpocky opened this issue Mar 23, 2022 · 10 comments
Assignees
Labels
feature-request Feature requests or enhancements

Comments

@LeSpocky
Copy link
Contributor

Is your feature request related to a problem? Please describe.

Since 3fd99c8 it's possible to let libhttpserver store files uploaded with forms (mime-type multipart/form-data) to disk, see the pull request #257 for reference. This whole new feature request is about cases where file upload target is set to FILE_UPLOAD_DISK_ONLY or FILE_UPLOAD_MEMORY_AND_DISK.

In certain circumstances it is necessary for an application to not store those files permanently, but remove them from disk again, after processing. Doing that in a render method associated with a registered resource is possible. However the file is always stored to disk, regardless if that render method is called at all. Multiple code paths in webserver::finalize_answer(), mostly error cases, are possible leading to mr->dhrs assigned by something else than the render callback supplied by the application. Simplest case is sending a (possibly forged) request to a not registered resource (HTTP status 404), but others are possible, too. That would mean the cleanup in your render method is not executed.

The actual target we use libhttpserver on is an embedded device with usable RAM smaller than files uploaded, so we have to store those files on disk. Those files can and should be removed again on every request, otherwise the filesystem would fill up quickly. This is especially true for manipulated requests with bad intent. A full filesystem can be considered as a denial of service situation.

Describe why the feature or enhancement you are proposing fits the library.

I propose some extension to libhttpserver for cleaning up all uploaded files of a request at the end of request processing. Thinking of the possibility of optionally hooking up a callback function provided by the application, where user can decide how to cleanup, and call that in an appropriate place.

Describe the solution you'd like

Extend the create_webserver options by a function to set a custom cleanup function. That function would not be called if not set by the application. However if set it could be executed next to MHD_destroy_response(raw_response); at the end of webserver::finalize_answer()?

Describe alternatives you've considered

With the current API of libhttpserver always cleaning up would only be possible by providing custom methods for all of these:

  • internal_error_page
  • method_not_allowed_page
  • not_found_page

And you had to put that same cleanup code into every render method in your application, no matter if it is supposed to handle file uploads at all.

This would lead to lots of duplicated code with high potential to miss some path.

Additional context

Same problem applies to the draft #246 not followed (for good reasons) in favor of #257.

@LeSpocky LeSpocky added the feature-request Feature requests or enhancements label Mar 23, 2022
@ctomahogh
Copy link
Contributor

I totally agree and have to admit, that I completely forgot about this behavior in my merge request.
This could be indeed be a problem if there are lots of files stored onto a disk.

But why should a custom cleanup function from the calling application take care about the cleanup.
As the class file_info and everything else resides inside the library (and is created inside), I would see the cleanup task as task of the library.

In my opinion, the library should always cleanup all uploaded files. Why not put it into the destructor of the file_info class?

Of course there is a use case (i.e. the use case, I implemented this feature for), where we want to have these uploaded files be preserved longer than the actual request.
For this case you would simply need a possibility for the application, to mark some files as to not be deleted. Of course, than the application would be responsible for deleting all the marked files eventually.

@LeSpocky
Copy link
Contributor Author

Of course there is a use case (i.e. the use case, I implemented this feature for), where we want to have these uploaded files be preserved longer than the actual request. For this case you would simply need a possibility for the application, to mark some files as to not be deleted. Of course, than the application would be responsible for deleting all the marked files eventually.

This is what I suspected. Some users want to keep those files, others don't, others maybe only want to keep some files.

What about a default cleanup method built into libhttpserver, which would remove all files, and a possibility to override that cleanup method by the application? Could look somewhat like the already present custom defaulted error messages or custom logging callbacks?

@ctomahogh
Copy link
Contributor

What would a custom cleanup mean?
Would this have to be registered on webserver creation and then take effect on every request? Or could this be set on each request individually?

Problem I see is, that one would have to keep track of the files, which should not be deleted in the application. And the handler can be called completely out of the render functions.

Personally, I would rather prefer a solution, where you could command the library to delete them or not.
But this should be possible on every request.

So, if the library sends an error message (i.e. 404) or if the render function (i.e. for render put) does not take care about the files, the library should automatically delete the files in the end.
But if the application does something with the files, then it should tell the library to not delete the files at the end of the request.
Ideally down to single files.

@LeSpocky
Copy link
Contributor Author

What would a custom cleanup mean? Would this have to be registered on webserver creation and then take effect on every request? Or could this be set on each request individually?

What I had in mind was a rather global function registered on webserver creation.

Problem I see is, that one would have to keep track of the files, which should not be deleted in the application. And the handler can be called completely out of the render functions.

Yes. But I see this as a feature. The render functions are bound to a registered resource. File uploads (think of malicious requests from client with bad intent) can happen to any resource, especially resources where the webserver would answer with 404.

Personally, I would rather prefer a solution, where you could command the library to delete them or not. But this should be possible on every request.

So, if the library sends an error message (i.e. 404) or if the render function (i.e. for render put) does not take care about the files, the library should automatically delete the files in the end. But if the application does something with the files, then it should tell the library to not delete the files at the end of the request. Ideally down to single files.

Mkay. Do you have an idea how this could be realized from API point of view?

@ctomahogh
Copy link
Contributor

Actually it might really get difficult to find a good solution for setting the cleanup on every request individually (at least I have no quick idea on that).

So yes, a callback or something like that is probably a better solution.
How would a callback in your opinion look like? Or what will be handed over to this callback?
Would this callback be called on every uploaded file individually and only provide the path to a single file?
cleanup_callback(std::string &filepath);
Or would this callback get the whole map of files?
cleanup_callback(std::map<std::string, std::map<std::string, http::file_info>> &files)

Maybe a little bit more radical approach:
Don't offer a cleanup callback at all and always delete all uploaded files (regardless whether a 404 is returned or whether a render function was called or not).
If an application needs one or more files to be stored for longer than the request, it can simply copy the file.
Or just rename the file in the application. Then the unlink-call in the libhttpserver would fail but this would be considered as acceptable and not trigger an error.

@LeSpocky
Copy link
Contributor Author

Actually it might really get difficult to find a good solution for setting the cleanup on every request individually (at least I have no quick idea on that).

So yes, a callback or something like that is probably a better solution. How would a callback in your opinion look like? Or what will be handed over to this callback? Would this callback be called on every uploaded file individually and only provide the path to a single file? cleanup_callback(std::string &filepath); Or would this callback get the whole map of files? cleanup_callback(std::map<std::string, std::map<std::string, http::file_info>> &files)

What I had in mind was the whole map.

Maybe a little bit more radical approach: Don't offer a cleanup callback at all and always delete all uploaded files (regardless whether a 404 is returned or whether a render function was called or not). If an application needs one or more files to be stored for longer than the request, it can simply copy the file. Or just rename the file in the application. Then the unlink-call in the libhttpserver would fail but this would be considered as acceptable and not trigger an error.

Yes, I mean I could live with that, because it would not interfere at all with my usecase. For the case where those files are needed later, that copy might be expensive, depending on how it is done, but you could argue that's business of the user.

Just moving (renaming) a file would make sense for the case where the lib generates the filenames anyways, but I did not thought of this in the first place, because I considered the lib responsible for those files.

Either way, it would need an update of documentation, to clearly state what the library does and what not. It's a little hard to guess what user expectations are in this case. 😕

@ctomahogh
Copy link
Contributor

I am fine if the cleanup function would get the whole map. This was merely a question.

Actually the "radical" approach would just be the same if the user of the lib would not provide a custom cleanup function. Then the lib would again always delete all uploaded files and a user could simply preserve files by copying or renaming.

As a default cleanup is needed anyway I offering the possibility for a custom cleanup function is a good approach.

Anyway I think, this functionality (a default cleanup function at least) is really necessary for a released version. As using the library as it is at the moment could be used for some kind of DoS attack if a malicious client could simply fill up the whole file system of a device.

@etr
Copy link
Owner

etr commented Mar 29, 2022

Love the thread.

The general approach of the library with managing uploaded content is that the content is "owned" by the library until it leaves and it is accessed by the structures of the library itself. Along those lines, if you use an "in memory" approach, the content uploaded goes away as soon as the http_request object is deleted.

The fact that we store the file on disk is just an extension of memory to me; the library still "owns" that content until the user doesn't take explicit ownership of it.

As such, having a deletion by default is a good strategy (akin to freeing a dangling pointer and preventing a leakage).

To keep the library usable this is what I'd do:

  1. Have the library delete by default the file(s) on disk contextually with the deletion of the http_request they are related to.
  2. Provide a server-level override for the delete function - the users can basically void this function if they want to keep the files in the same location (had they provided a custom one) or move them if they need to manage them elsewhere.

On 2. I am working on a more general system of overrides at resource level; so keeping it at server level for now is fine - it will all go down one layer when I am finally close on this.

@LeSpocky
Copy link
Contributor Author

Okay, what I take from this for now: we can start with the cleanup (1.) already, because it's more or less independent from the actual solution we come up with for hook or override (2.) or whatever it will be called.

LeSpocky added a commit to LeSpocky/libhttpserver that referenced this issue Apr 7, 2022
If the user needs those files, she should copy/move theme in the request
handler.

References: etr#264
Signed-off-by: Alexander Dahl <[email protected]>
@LeSpocky LeSpocky mentioned this issue Apr 7, 2022
4 tasks
@LeSpocky
Copy link
Contributor Author

Okay, what I take from this for now: we can start with the cleanup (1.) already, because it's more or less independent from the actual solution we come up with for hook or override (2.) or whatever it will be called.

For that minimal "radical" approach, see #266.

etr pushed a commit that referenced this issue Apr 12, 2022
* http_request: Remove uploaded files in destructor

If the user needs those files, she should copy/move theme in the request
handler.

References: #264
Signed-off-by: Alexander Dahl <[email protected]>

* test: integ: file_upload: Remove redundant file delete

Not needed anymore. http_request destructor does cleanup now.

* test: integ: file_upload: Check uploaded files are deleted

* test: integ: file_upload: Stop webserver earlier

The webserver runs in a different thread than the test, therefor
deleting the uploaded files and checking if they are deleted might lead
into a race condition, and thus into tests failing sometimes, but not
always.

Moving the webserver stop and destructor behind the curl action should
be safe, every information needed for the test is copied to or in the
resource handler.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request Feature requests or enhancements
Projects
None yet
Development

No branches or pull requests

3 participants