
Add support for downloading input images from S3 bucket. #181

Open
theoway opened this issue May 8, 2022 · 11 comments

@theoway
Contributor

theoway commented May 8, 2022

What is the problem?

There should be a way for NodeODM to directly fetch input images from an S3 bucket/folder and use them for processing. Currently, one has to upload them from their machine to a remote instance.

What should be the expected behavior?

There should be a field where one can simply enter the S3 path to the images folder and have it processed, with the results saved to a folder specified by the user.

How can we reproduce this? (What steps did you do to trigger the problem? Be detailed)

It's a feature request.

@theoway
Contributor Author

theoway commented May 8, 2022

How I'm going to implement this:

Here, I'll add a download_images() function, similar to this uploadPaths() one, that fetches the images from the S3 path:

uploadPaths: function(srcFolder, bucket, dstFolder, paths, cb, onOutput){

Modifications will have to be made in places like these:

NodeODM/index.js

Lines 201 to 284 in 6acd372

/** @swagger
* /task/new:
* post:
* description: Creates a new task and places it at the end of the processing queue. For uploading really large tasks, see /task/new/init instead.
* tags: [task]
* consumes:
* - multipart/form-data
* parameters:
* -
* name: images
* in: formData
* description: Images to process, plus optional files such as a GEO file (geo.txt), image groups file (image_groups.txt), GCP file (*.txt) or seed file (seed.zip). If included, the GCP file should have .txt extension. If included, the seed archive pre-populates the task directory with its contents.
* required: false
* type: file
* -
* name: zipurl
* in: formData
* description: URL of the zip file containing the images to process, plus an optional GEO file and/or an optional GCP file. If included, the GCP file should have .txt extension
* required: false
* type: string
* -
* name: name
* in: formData
* description: An optional name to be associated with the task
* required: false
* type: string
* -
* name: options
* in: formData
* description: 'Serialized JSON string of the options to use for processing, as an array of the format: [{name: option1, value: value1}, {name: option2, value: value2}, ...]. For example, [{"name":"cmvs-maxImages","value":"500"},{"name":"time","value":true}]. For a list of all options, call /options'
* required: false
* type: string
* -
* name: skipPostProcessing
* in: formData
* description: 'When set, skips generation of point cloud tiles.'
* required: false
* type: boolean
* -
* name: webhook
* in: formData
* description: Optional URL to call when processing has ended (either successfully or unsuccessfully).
* required: false
* type: string
* -
* name: outputs
* in: formData
* description: 'An optional serialized JSON string of paths relative to the project directory that should be included in the all.zip result file, overriding the default behavior.'
* required: false
* type: string
* -
* name: dateCreated
* in: formData
* description: 'An optional timestamp overriding the default creation date of the task.'
* required: false
* type: integer
* -
* name: token
* in: query
* description: 'Token required for authentication (when authentication is required).'
* required: false
* type: string
* -
* name: set-uuid
* in: header
* description: 'An optional UUID string that will be used as UUID for this task instead of generating a random one.'
* required: false
* type: string
* responses:
* 200:
* description: Success
* schema:
* type: object
* required: [uuid]
* properties:
* uuid:
* type: string
* description: UUID of the newly created task
* default:
* description: Error
* schema:
* $ref: '#/definitions/Error'
*/
app.post('/task/new', authCheck, taskNew.assignUUID, taskNew.uploadImages, (req, res, next) => {
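For illustration, a new formData parameter for the S3 path could be documented alongside zipurl in the swagger block above. The parameter name s3url is hypothetical, not part of the actual API:

```
* -
* name: s3url
* in: formData
* description: 'Optional S3 path (e.g. s3://bucket/prefix/) to download input images from.'
* required: false
* type: string
```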

I'd like to know your thoughts on this, and any directions on implementing this are much appreciated :)

@fosteman

Also interested in doing something similar. My thought was to mount a GCP bucket onto the local filesystem with the gcloud API.

@FrankvVeelen

Did you make any progress on this? I'm trying to do the same thing and would love to know if you got it working.

@theoway
Contributor Author

theoway commented Sep 9, 2022

Did you make any progress on this? I'm trying to do the same thing and would love to know if you got it working.

No, I'm working on a different feature in the ODM engine. You're welcome to add this enhancement :)

@utya

utya commented Nov 17, 2023

Hello everyone. I would like to update this issue. I have done some research and realized that there are two ways to transfer images from S3 to ODM. The first is to add an S3 photo download feature to the ODM Python Docker image itself. The second is to add an S3 photo download feature to NodeODM and then pass these images to Python. I have some draft code. May I open a pull request?

@pierotofy
Member

Always feel free to open pull requests.

@Saijin-Naib

This sounds fascinating! I look forward to seeing what you have, but Piero and Stephen likely need to weigh in first.

@utya

utya commented Dec 11, 2023

Hi. My ODM container is configured to download images from an S3 bucket upon each startup. I'm planning to modify NodeODM to pass S3 bucket parameters for each task. However, I need to confirm how NodeODM handles ODM Docker containers for each task. Does NodeODM initiate a new ODM Docker container for each task, or does it reuse an existing container? Understanding this will help me ensure that my modifications work as intended.

Thanks

@kauly
Contributor

kauly commented May 1, 2024

Hello, I think that similar results can be achieved using the zipurl param of this route:

https://github.com/OpenDroneMap/NodeODM/blob/master/docs/index.adoc#post-tasknew

Just zip the images before uploading them to S3.
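As a sketch of that approach, a client could create a task by POSTing the zipurl form field to /task/new. The server URL below is a placeholder; the global fetch and FormData require Node 18+:

```javascript
// Illustrative sketch: create a NodeODM task from a zip hosted on S3,
// using the documented zipurl form field of POST /task/new.
// Requires Node 18+ (global fetch and FormData).

// Pure helper: assemble the multipart form for /task/new.
function buildTaskNewForm(zipUrl, name){
    const form = new FormData();
    form.append('zipurl', zipUrl); // NodeODM downloads and unzips this URL
    form.append('name', name);     // optional task name
    return form;
}

// POST the form; on success NodeODM responds with { uuid: "..." }.
async function createTask(server, zipUrl, name){
    const res = await fetch(`${server}/task/new`, {
        method: 'POST',
        body: buildTaskNewForm(zipUrl, name)
    });
    return res.json();
}
```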

@spwoodcock

spwoodcock commented Oct 14, 2024

This issue can be bypassed with Minio by downloading a zipped directory directly from the API (see below).

I'm not sure if AWS S3 supports this, or whether it would require a Lambda function to achieve the same.

Then, as mentioned above, the zipurl param on the /task/new endpoint can be used.

Further Context

Hopefully the info below is useful to someone.

Our Existing Workflow

  1. Image files are uploaded to S3 as they are flown / taken.
  2. Once all imagery is collected, we continue to the next step.
  3. In a Python script we first download all the files from S3.
  4. We create a new project via NodeODM endpoint /task/new/init (via PyODM).
  5. Then we re-upload all the files to NodeODM via endpoint /task/new/upload (PyODM).
  6. Processing can be started by NodeODM endpoint /task/new/commit (PyODM).
  7. Polling for completion is done by PyODM wait_for_completion function, which calls the /info endpoint for the Node at an interval until completion.

Obviously this is extremely inefficient. It uses unnecessary resources on the Python server, including possibly locking a thread for hours while NodeODM is polled for completion.

It would be much nicer to have NodeODM download the files it needs for processing instead.

Proposed Solution

  1. Use the NodeODM endpoint /task/new?zipurl=xxx as mentioned above.
    a. In our case we are using Minio.
    b. Minio (and AWS S3 I believe) provides the capability of zipping data on-the-fly.
    c. For example Minio has an endpoint: https://YOUR_SERVER/api/v1/buckets/BUCKET_NAME/objects/download?prefix=BASE64ENCODEDPATH
    d. In the url shown above BASE64ENCODEDPATH is a base64 encoded path, for example: /project1/flight_images/.
    e. This URL provides a .zip file, so can be used as the zipurl param to NodeODM.
    f. A task UUID is returned from the API call for future endpoint calls.
  2. Get the status of the task via /task/{uuid}/info with scheduled polling
    a. We are looking for a better solution here. Perhaps NodeODM can trigger a webhook on job completion, or something similar. Need to dig into the docs / code further to see if this is already possible somehow! Otherwise we could contribute a solution / PR.
  3. Download the final orthomosaic via /task/{uuid}/download/{asset}.
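The URL construction in steps 1c-1e can be sketched as follows. Server names are placeholders; the Minio console download endpoint is as quoted above:

```javascript
// Illustrative sketch: build the Minio "zip on the fly" download URL for a
// bucket prefix, then the NodeODM /task/new call that consumes it as zipurl.
// Server names are placeholders.

// The Minio console endpoint expects the object path base64-encoded.
function minioZipUrl(minioServer, bucketName, prefix){
    const encodedPath = Buffer.from(prefix).toString('base64');
    return `${minioServer}/api/v1/buckets/${bucketName}/objects/download` +
           `?prefix=${encodeURIComponent(encodedPath)}`;
}

// Pass the zip URL to NodeODM as the zipurl query/form parameter.
function nodeOdmTaskUrl(nodeOdmServer, zipUrl){
    return `${nodeOdmServer}/task/new?zipurl=${encodeURIComponent(zipUrl)}`;
}
```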

Efficiency gain 1: downloading the files from S3 once, on the NodeODM service, instead of download/re-upload combo on an intermediary service (e.g. Python PyODM).

Efficiency gain 2: TBC, a way to avoid polling NodeODM for job completion? I feel this problem is probably already solved by ClusterODM, so will look into how it's solved there.

@pierotofy
Member

Check the --webhook option.
