
Add support for downloading input images from S3 bucket. #181

Open
theoway opened this issue May 8, 2022 · 11 comments

@theoway
Contributor

theoway commented May 8, 2022

What is the problem?

There should be a way for NodeODM to directly fetch input images from an S3 bucket/folder and use them for processing. Currently, one has to upload them from their machine to a remote instance.

What should be the expected behavior?

There should be a field where one can simply enter the S3 path to the images folder and have it processed, with the results saved to a folder specified by the user.

How can we reproduce this? (What steps did you do to trigger the problem? Be detailed)

It's a feature request.

@theoway
Contributor Author

theoway commented May 8, 2022

How I'm going to implement this:

Here, I'll add a download_images() function, similar to this uploadPaths() one, that fetches the images from the S3 path:

uploadPaths: function(srcFolder, bucket, dstFolder, paths, cb, onOutput){

Modifications will have to be made in places like these:

NodeODM/index.js

Lines 201 to 284 in 6acd372

/** @swagger
* /task/new:
* post:
* description: Creates a new task and places it at the end of the processing queue. For uploading really large tasks, see /task/new/init instead.
* tags: [task]
* consumes:
* - multipart/form-data
* parameters:
* -
* name: images
* in: formData
* description: Images to process, plus optional files such as a GEO file (geo.txt), image groups file (image_groups.txt), GCP file (*.txt) or seed file (seed.zip). If included, the GCP file should have .txt extension. If included, the seed archive pre-populates the task directory with its contents.
* required: false
* type: file
* -
* name: zipurl
* in: formData
* description: URL of the zip file containing the images to process, plus an optional GEO file and/or an optional GCP file. If included, the GCP file should have .txt extension
* required: false
* type: string
* -
* name: name
* in: formData
* description: An optional name to be associated with the task
* required: false
* type: string
* -
* name: options
* in: formData
* description: 'Serialized JSON string of the options to use for processing, as an array of the format: [{name: option1, value: value1}, {name: option2, value: value2}, ...]. For example, [{"name":"cmvs-maxImages","value":"500"},{"name":"time","value":true}]. For a list of all options, call /options'
* required: false
* type: string
* -
* name: skipPostProcessing
* in: formData
* description: 'When set, skips generation of point cloud tiles.'
* required: false
* type: boolean
* -
* name: webhook
* in: formData
* description: Optional URL to call when processing has ended (either successfully or unsuccessfully).
* required: false
* type: string
* -
* name: outputs
* in: formData
* description: 'An optional serialized JSON string of paths relative to the project directory that should be included in the all.zip result file, overriding the default behavior.'
* required: false
* type: string
* -
* name: dateCreated
* in: formData
* description: 'An optional timestamp overriding the default creation date of the task.'
* required: false
* type: integer
* -
* name: token
* in: query
* description: 'Token required for authentication (when authentication is required).'
* required: false
* type: string
* -
* name: set-uuid
* in: header
* description: 'An optional UUID string that will be used as UUID for this task instead of generating a random one.'
* required: false
* type: string
* responses:
* 200:
* description: Success
* schema:
* type: object
* required: [uuid]
* properties:
* uuid:
* type: string
* description: UUID of the newly created task
* default:
* description: Error
* schema:
* $ref: '#/definitions/Error'
*/
app.post('/task/new', authCheck, taskNew.assignUUID, taskNew.uploadImages, (req, res, next) => {
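For illustration, a new formData parameter for the S3 path could be documented alongside zipurl in the swagger block above. The parameter name s3url is hypothetical, not part of the actual API:

```
* -
* name: s3url
* in: formData
* description: 'Optional S3 path (e.g. s3://bucket/prefix/) to download input images from.'
* required: false
* type: string
```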

I'd like to know your thoughts on this, and any directions on implementing this are much appreciated :)

@fosteman

Also interested in doing something similar. My thought was to mount a GCP bucket onto the local filesystem with the gcloud API.

@FrankvVeelen

Did you make any progress on this? I'm trying to do the same thing and would love to know if you got it working.

@theoway
Contributor Author

theoway commented Sep 9, 2022

Did you make any progress on this? I'm trying to do the same thing and would love to know if you got it working.

No, I'm working on a different feature in the ODM engine. You're welcome to add this enhancement :)

@utya

utya commented Nov 17, 2023

Hello everyone. I would like to update this issue. I have done some research and realized that there are two ways to transfer images from S3 to ODM. The first is to add an S3 photo download feature to the ODM Python Docker image itself. The second is to add an S3 photo download feature to NodeODM and then pass these images to Python. I have some draft code. May I open a pull request?

@pierotofy
Member

Always feel free to open pull requests.

@Saijin-Naib

This sounds fascinating! I look forward to seeing what you have, but Piero and Stephen likely need to weigh in first.

@utya

utya commented Dec 11, 2023

Hi. My ODM container is configured to download images from an S3 bucket upon each startup. I'm planning to modify NodeODM to pass S3 bucket parameters for each task. However, I need to confirm how NodeODM handles ODM Docker containers for each task. Does NodeODM initiate a new ODM Docker container for each task, or does it reuse an existing container? Understanding this will help me ensure that my modifications work as intended.

Thanks

@kauly
Contributor

kauly commented May 1, 2024

Hello, I think that similar results can be achieved using the zipurl param of this route:

https://github.com/OpenDroneMap/NodeODM/blob/master/docs/index.adoc#post-tasknew

Just zip the images before uploading them to S3.
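As a sketch of that approach, a client could create a task by POSTing the zipurl form field to /task/new. The server URL below is a placeholder; the global fetch and FormData require Node 18+:

```javascript
// Illustrative sketch: create a NodeODM task from a zip hosted on S3,
// using the documented zipurl form field of POST /task/new.
// Requires Node 18+ (global fetch and FormData).

// Pure helper: assemble the multipart form for /task/new.
function buildTaskNewForm(zipUrl, name){
    const form = new FormData();
    form.append('zipurl', zipUrl); // NodeODM downloads and unzips this URL
    form.append('name', name);     // optional task name
    return form;
}

// POST the form; on success NodeODM responds with { uuid: "..." }.
async function createTask(server, zipUrl, name){
    const res = await fetch(`${server}/task/new`, {
        method: 'POST',
        body: buildTaskNewForm(zipUrl, name)
    });
    return res.json();
}
```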

@spwoodcock

spwoodcock commented Oct 14, 2024

This issue can be bypassed with Minio by downloading a zipped directory directly from the API (see below).

I'm not sure if AWS S3 supports this, or whether it would require a Lambda function to achieve the same.

Then, as mentioned above, the zipurl param on the /task/new endpoint can be used.

Further Context

Hopefully the info below is useful to someone.

Our Existing Workflow

  1. Image files are uploaded to S3 as they are flown / taken.
  2. Once all imagery is collected, we continue to the next step.
  3. In a Python script we first download all the files from S3.
  4. We create a new project via NodeODM endpoint /task/new/init (via PyODM).
  5. Then we re-upload all the files to NodeODM via endpoint /task/new/upload (PyODM).
  6. Processing can be started by NodeODM endpoint /task/new/commit (PyODM).
  7. Polling for completion is done by PyODM wait_for_completion function, which calls the /info endpoint for the Node at an interval until completion.

Obviously this is extremely inefficient. It uses unnecessary resources on the Python server, including possibly locking a thread for hours while NodeODM is polled for completion.

It would be much nicer to have NodeODM download the files it needs for processing instead.

Proposed Solution

  1. Use the NodeODM endpoint /task/new?zipurl=xxx as mentioned above.
    a. In our case we are using Minio.
    b. Minio (and AWS S3 I believe) provides the capability of zipping data on-the-fly.
    c. For example Minio has an endpoint: https://YOUR_SERVER/api/v1/buckets/BUCKET_NAME/objects/download?prefix=BASE64ENCODEDPATH
    d. In the url shown above BASE64ENCODEDPATH is a base64 encoded path, for example: /project1/flight_images/.
    e. This URL provides a .zip file, so can be used as the zipurl param to NodeODM.
    f. A task UUID is returned from the API call for future endpoint calls.
  2. Get the status of the task via /task/{uuid}/info with scheduled polling
    a. We are looking for a better solution here. Perhaps NodeODM can trigger a webhook on job completion, or something similar. Need to dig into the docs / code further to see if this is already possible somehow! Otherwise we could contribute a solution / PR.
  3. Download the final orthomosaic via /task/{uuid}/download/{asset}.
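The URL construction in steps 1c-1e can be sketched as follows. Server names are placeholders; the Minio console download endpoint is as quoted above:

```javascript
// Illustrative sketch: build the Minio "zip on the fly" download URL for a
// bucket prefix, then the NodeODM /task/new call that consumes it as zipurl.
// Server names are placeholders.

// The Minio console endpoint expects the object path base64-encoded.
function minioZipUrl(minioServer, bucketName, prefix){
    const encodedPath = Buffer.from(prefix).toString('base64');
    return `${minioServer}/api/v1/buckets/${bucketName}/objects/download` +
           `?prefix=${encodeURIComponent(encodedPath)}`;
}

// Pass the zip URL to NodeODM as the zipurl query/form parameter.
function nodeOdmTaskUrl(nodeOdmServer, zipUrl){
    return `${nodeOdmServer}/task/new?zipurl=${encodeURIComponent(zipUrl)}`;
}
```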

Efficiency gain 1: downloading the files from S3 once, on the NodeODM service, instead of download/re-upload combo on an intermediary service (e.g. Python PyODM).

Efficiency gain 2: TBC, a way to avoid polling NodeODM for job completion? I feel this problem is probably already solved by ClusterODM, so will look into how it's solved there.

@pierotofy
Member

Check the --webhook option.
