
Georeferenced datasets processed on Metashape are loaded incorrectly/can't be loaded using splatfacto. #3255

Open
gaigc opened this issue Jun 25, 2024 · 2 comments


gaigc commented Jun 25, 2024

Describe the bug
I've noticed that when I try to use a dataset that I've aligned with GPS reference in Metashape, it either will not load, or it loads but produces no results. This has been an issue since nerfstudio added loading of point clouds (.ply) for splat seeding.

I thought that @simonbethke might have reported this problem when it was first being tested in pull #3122. When @jb-ye asked for sample data, I assumed that they had shared it somewhere, but I couldn't find any issue related to this, so I'm making one here.

On older (about a month old) nerfstudio and gsplat versions, I was getting this error:
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

On the ODM_mygla dataset with current versions, I'm getting this error after ~1000 iterations:
CONSOLE.log(f"Splitting {split_mask.sum().item()/self.num_points} gaussians: {n_splits}/{self.num_points}") ZeroDivisionError: division by zero

If I process the data without the GPS reference, it loads correctly.

I'm able to use this data, exported with the Gaussian splatting Metashape script in colmap format, in both Inria's Gaussian splatting implementation and Postshot.

Why use GPS reference

I've found that using GPS reference in Metashape helps with alignment (speed and accuracy), and usually results in a consistent scale, orientation, and ground plane.

I'm not asking for GPS data to somehow be implemented in splats/nerfstudio-data, but simply for data created with it to be accepted.

To Reproduce
My workflow usually consists of:

  1. Import dataset(s) into Metashape in separate chunks. GPS data is auto-selected in the reference panel from EXIF.
  2. Align images, usually at high/highest quality, with a high number of key points and unlimited tie points.
  3. Export camera positions and the sparse/tie point cloud using a batch job (also tried manually).
  4. Process the data into nerfstudio with the following command: ns-process-data metashape --data "e:/3D Datasets/[Dataset Name]" --xml "e:/3D Datasets/[Dataset Name]/db.xml" --ply "e:/3D Datasets/[Dataset Name]/PointCloud.ply" --output-dir e:/NerfStudio/data/[Dataset Name]/out
  5. Run Splatfacto with ns-train splatfacto --output-dir ./outputs/ODMlogs nerfstudio-data --data ./data/ODMlogs/out
  6. After undistorting, it either loads incorrectly or gives an error.

I've tried exporting the camera positions/point cloud with both local coordinates and WGS84 as the coordinate system; both give the same error.
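
As a quick sanity check (not part of the workflow itself), something like the sketch below can print the raw camera centers that ns-process-data writes to transforms.json. The path is just the example from step 4; very large translation values would point at the georeferencing:

```python
import json
import numpy as np

# Example path: the output of step 4 above (adjust to your dataset).
with open("e:/NerfStudio/data/ODMlogs/out/transforms.json") as f:
    meta = json.load(f)

# Each frame stores a 4x4 camera-to-world matrix; the last column is the camera center.
centers = np.array([np.asarray(fr["transform_matrix"])[:3, 3] for fr in meta["frames"]])
print("min:", centers.min(axis=0))
print("max:", centers.max(axis=0))
```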

Expected behavior
To be able to load the data that has been georeferenced.

Alternative solution
Using the colmap export script to create the nerfstudio data, but this is only an option for people who have Metashape Pro, and in my opinion it is a workaround (though maybe still useful for data used in other programs).

I tried to simply copy and paste this data into the colmap structure in nerfstudio, but I failed. I'm sure there is a command/script to convert it into something nerfstudio can run, but I wasn't able to get it working.

Here is how the folder is formatted in case it helps:

[Dataset]/
├─ images/
│ ├─ Image1.jpg
│ ├─ Image2.jpg
│ ├─ Image3.jpg
├─ sparse/
│ ├─ 0/
│ │ ├─ cameras.bin
│ │ ├─ images.bin
│ │ ├─ points3D.bin

Screenshots
ODM_mygla (georeferenced) processed using splatfacto, before it crashes:
image
The only notable detail is a small white dot at the bottom of the scene, at possibly infinite distance. ODM_helenenschacht also displays similar results, but doesn't crash. Cameras are sometimes obscured by the scene, so I need to disable composite depth to use the camera positions as reference.

ODM_mygla processed using nerfacto on same dataset:
image

Additional context
Machine specs and info:
PC 1: R9 5900X, 128 GB RAM, RTX 3060 12 GB, data on SSD. Windows 10, Anaconda, nerfstudio 1.1.2, gsplat 1.0.0
PC 2: i7 7700HQ, 16 GB RAM, GTX 1060 6 GB, data on SSD. Windows 10, Anaconda, nerfstudio 1.1.0, gsplat 0.1.12
Latest NVIDIA drivers on both; also tested with drivers from 4 months ago.
(I'm aware that PC 2 won't be able to run future gsplat versions; I still included this info since the older version gave different errors that might help narrow down the problem.)

Data processed using Metashape 2.x

I capture my own data using a Mavic Air 2S; it often aligns a few meters below ground, but even when adjusting for that, there are errors. I'm not sure how to share my own dataset, so here are some datasets that I've tested which display the same errors:

Datasets for reference:
ODM_mygla: 41 images, ~5 MB each. Captured on a DJI Phantom 3

Here are the export files as .txt, you'll need to change the extension:
db.xml.txt
PointCloud.ply.txt

ODM_helenenschacht: 176 images, ~12 MB each. Captured on an Autel Evo II Pro RTK
I've also tested this one, but the point cloud is too big to attach. Here are the camera positions:
db.xml.txt

This is my first submitted issue, so apologies for any missing info or bad etiquette.

Dump of Logs

Console error from trying to run splatfacto with ODM_mygla dataset on nerfstudio 1.1.2 with PC 1

890 (2.97%)         8.359 ms             4 m, 3 s             78.09 M
----------------------------------------------------------------------------------------------------   splatfacto.py:
Viewer running locally at: http://localhost:7007 (listening on 0.0.0.0)
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.0155
VanillaPipeline.get_train_loss_dict: 0.0118
Traceback (most recent call last):
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\[User]\.conda\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 265, in train
    callback.run_callback_at_location(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\callbacks.py", line 115, in run_callback_at_location
    self.run_callback(step=step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\callbacks.py", line 100, in run_callback
    self.func(*self.args, **self.kwargs, step=step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\models\splatfacto.py", line 456, in refinement_after
    split_params = self.split_gaussians(splits, nsamps)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\models\splatfacto.py", line 543, in split_gaussians
    CONSOLE.log(f"Splitting {split_mask.sum().item()/self.num_points} gaussians: {n_splits}/{self.num_points}")
ZeroDivisionError: division by zero 

Console error from trying to run splatfacto with nerfstudio 1.1.0 with PC 2

 [17:24:14] Caching / undistorting train images                                            full_images_datamanager.py:183
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 8.2725
VanillaPipeline.get_train_loss_dict: 8.2715
Traceback (most recent call last):
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\[User]\.conda\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 261, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\utils\profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 498, in train_iteration
    self.grad_scaler.scale(loss).backward()  # type: ignore
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\torch\_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\torch\autograd\__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

jb-ye (Collaborator) commented Jul 23, 2024

Just want to confirm a few things @gaigc:

Q1: If you unselect the GPS data in the Metashape alignment, do you experience the same issue when converting the xml/ply to nerfstudio format?

Q2: Since you have a colmap project exported, you can directly train splatfacto with the colmap parser, without converting to nerfstudio format at all: ns-train splatfacto colmap --data <path_to_colmap_data>


gaigc (Author) commented Jul 23, 2024

> Just want to confirm a few things @gaigc:
>
> Q1: If you unselect the GPS data in the Metashape alignment, do you experience the same issue when converting the xml/ply to nerfstudio format?
>
> Q2: Since you have a colmap project exported, you can directly train splatfacto with the colmap parser, without converting to nerfstudio format at all: ns-train splatfacto colmap --data <path_to_colmap_data>

R1:
If I align it with GPS and then disable the reference, it stays in place on the global map in Metashape and still gives the same error. I assume the ply uses the real-world coordinates, and the values might be too large to be usable.
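
To sanity-check that assumption, a small sketch like the following (using the plyfile package, which is just something I'd reach for and not a nerfstudio dependency) could print the coordinate range of the exported point cloud:

```python
import numpy as np
from plyfile import PlyData  # assumed helper package, not a nerfstudio dependency

ply = PlyData.read("PointCloud.ply")  # the point cloud exported from Metashape
vertex = ply["vertex"]
xyz = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=1)

# A georeferenced export (e.g. a projected CRS) typically shows values in the
# hundreds of thousands, while a localframe export is centered near the origin.
print("min:", xyz.min(axis=0))
print("max:", xyz.max(axis=0))
```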

R2:
Thanks for the info! I found that out recently while trying to figure out how to use masks with splatfacto. I've been playing with it since, and it seems to be a good way to work around this problem. It even creates the downscaled versions of the images and masks.

For anyone who faces a similar issue: use the script to create the colmap files, then move the "sparse" folder it creates into a new folder called "colmap".

So it goes like this:

[Dataset]/
├─ images/
│ ├─ Image1.jpg
│ ├─ Image2.jpg
│ ├─ Image3.jpg
├─ masks/ (optional)
│ ├─ Image1.jpg
│ ├─ Image2.jpg
│ ├─ Image3.jpg
├─ colmap/
│ ├─ sparse/
│ │ ├─ 0/
│ │ │ ├─ cameras.bin
│ │ │ ├─ images.bin
│ │ │ ├─ points3D.bin

And then run it like this:

ns-train splatfacto --output-dir ./outputs/[Dataset Name] --data ./data/[Dataset Name] colmap

and for masks simply add --masks-path:
ns-train splatfacto --output-dir ./outputs/[Dataset Name] --data ./data/[Dataset Name] colmap --masks-path masks
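
If it helps, the folder shuffle described above can also be done with a tiny script; the paths are placeholders for your own dataset:

```python
from pathlib import Path
import shutil

dataset = Path("./data/[Dataset Name]")  # placeholder, same layout as above

# Create the "colmap" folder and move the exported "sparse" folder under it.
(dataset / "colmap").mkdir(exist_ok=True)
shutil.move(str(dataset / "sparse"), str(dataset / "colmap" / "sparse"))
```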

Speculating on the possible problem

I've realized that the Metashape script has a default option called "Use localframe", and I believe this is why the colmap route works but nerfstudio doesn't with the ply.

Description of the option and screenshot of the window:

Shifts coordinates origin to the center of the bounding box
Uses localframe rotation at this point
This is useful to fix large coordinates

image

Since nerfstudio worked with the camera coordinates, I assumed this was a problem with the way splatfacto works, but I now believe it might be related to the 3D points being so far away from the origin.

Here is a table I made to test this out:

| Name | Geo located | Reference | LocalFrame | Works | Comment |
| --- | --- | --- | --- | --- | --- |
| Nerfstudio convert, default | Y | Y | N/A | N | Usual way I use for nerfs |
| Nerfstudio, no reference | Y | N | N/A | N | Is geo-located, but not fixed in place |
| Nerfstudio, no reference before align | N | N | N/A | Y | Aligned without GPS info, wrong orientation |
| Colmap, normal | Y | Y | Y | Y | |
| Colmap, no localframe | Y | Y | N | N | Almost works, but gives a lot of artifacts |
| Colmap, no reference | Y | N | Y | Y | |
| Colmap, no localframe, no reference | Y | N | N | N | Almost works, but gives a lot of artifacts |

As you can see, localframe seems to be the key to the problem. While it got further than nerfstudio, colmap with no localframe produced a lot of artifacts and the results seem pretty much useless.

Here are some screenshots:

No localframe
image

No reference
image

No reference and no localframe
image

Conclusion

There might be a way to average out the coordinates of the cameras and point cloud and transform them into a range that nerfstudio plays well with, but I can only guess that it would be quite complicated to deal with.
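
For what it's worth, here is a rough sketch of what I imagine such a shift could look like, mimicking my understanding of the localframe option (shift everything to the bounding-box center of the points). This is pure speculation on my part, not nerfstudio code; it assumes the plyfile package and only handles the translation, not the localframe rotation Metashape mentions:

```python
import json
import numpy as np
from plyfile import PlyData  # assumed helper package, not something nerfstudio requires

# Compute the bounding-box center of the exported (georeferenced) point cloud.
ply = PlyData.read("PointCloud.ply")
vertex = ply["vertex"]
xyz = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=1).astype(np.float64)
center = (xyz.min(axis=0) + xyz.max(axis=0)) / 2.0

# Shift the points so the scene sits near the origin, then save a new ply.
for i, name in enumerate(("x", "y", "z")):
    vertex.data[name] = (xyz[:, i] - center[i]).astype(vertex.data[name].dtype)
ply.write("PointCloud_centered.ply")

# Apply the same shift to the camera centers in the transforms.json from ns-process-data.
with open("transforms.json") as f:
    meta = json.load(f)
for frame in meta["frames"]:
    m = np.asarray(frame["transform_matrix"], dtype=np.float64)
    m[:3, 3] -= center
    frame["transform_matrix"] = m.tolist()
with open("transforms_centered.json", "w") as f:
    json.dump(meta, f, indent=2)
```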

For those who have Metashape Pro, the colmap export seems to be the best solution and only requires a little fiddling; for those on Standard, I believe the best option is to not use GPS in the reference tab when aligning and see if rotating by hand works.

If you have any questions, please let me know.
