Gunpowder errors #18

Open
Secondus2 opened this issue Dec 13, 2023 · 1 comment

Secondus2 commented Dec 13, 2023

I have been trying to train a model from scratch on data produced by one of our research groups at the University of Warwick. The voxel size is 70 x 9 x 9 nm, and the volume is 48 x 4096 x 4096 voxels.

mnd_train_mito.json
mnd_validation_mito.json
config_training_test_post_meeting_patrick .yaml.txt

I have tried to run train.py using the attached files, and the following command:

python3 train.py --name example_training with config_training_test_post_meeting_patrick.yaml training.data=data_configs/mnd_train_mito.json validation.data=data_configs/mnd_validation_mito.json torch.device=0

I get the following output:

INFO:__main__:Attach Mongo observer
INFO:example_training:Running command 'train'
INFO:example_training:Started run with ID "11"
Added application/json as content-type of artifact /mnt/e/Camdu/incasem-main/scripts/02_train/data_configs/mnd_train_mito.json.
Added application/json as content-type of artifact /mnt/e/Camdu/incasem-main/scripts/02_train/data_configs/mnd_validation_mito.json.
INFO:__main__:Starting new training run 11
INFO:__main__:total_params=5837730
INFO:__main__:trainable_params=5837730
INFO:incasem.pipeline.sources.data_sources_base:Setting up my_new_data_train_mito
INFO:incasem.pipeline.sources.data_sources_semantic:No mask given, add dummy mask of all 1s.
INFO:incasem.pipeline.training_baseline_with_context:Sampling probabilities for the provided datasets:
{'my_new_data_train_mito': 1.0}
/home/camdu/.local/lib/python3.8/site-packages/gunpowder/batch_request.py:118: UserWarning: merge is deprecated! please use update_with as it accounts for spec metadata
  warn(
DEBUG:incasem.pipeline.training_baseline_with_context:ZarrSource[/mnt/e/Camdu/incasem-main/data/my_new_data.zarr] -> Crop -> Crop -> Crop -> BinarizeLabels -> MergeLabels -> AddMask -> BinarizeLabels -> MergeMasks -> Normalize -> DeepCopyArrays -> BinarizeLabels -> SaveBlockPosition -> RandomLocationBounded -> PadDownstreamOfRandomLocation -> PadDownstreamOfRandomLocation -> PadDownstreamOfRandomLocation -> PadDownstreamOfRandomLocation -> CentralizeRequests -> RandomProvider -> Reject -> Downsample -> SimpleAugment -> ElasticAugment -> SimpleAugment -> IntensityAugment -> ToDtype -> BalanceLabels -> IntensityScaleShift -> DeepCopy -> Unsqueeze -> Unsqueeze -> PreCache -> Train -> Squeeze -> Squeeze -> IntensityScaleShift -> FloatToUint8 -> ToDtype -> Softmax -> FloatToUint8 -> DeepCopyArrays -> Snapshot -> Uint8ToFloat -> PrintProfilingStats
INFO:incasem.pipeline.sources.data_sources_base:Setting up my_new_data_validation
INFO:incasem.pipeline.sources.data_sources_semantic:No mask given, add dummy mask of all 1s.
WARNING:incasem.gunpowder.torch.predict:Model is in training mode during prediction. Consider using model.eval()
INFO:__main__:debug_logdir='/mnt/e/Camdu/incasem-main/training_runs/tensorboard/0011/debug'
INFO:incasem.gunpowder.random_location_bounded:requesting complete mask...
INFO:incasem.gunpowder.random_location_bounded:allocating mask integral array...
INFO:incasem.gunpowder.torch.train:Training on gpu 0.
INFO:incasem.gunpowder.torch.train:Starting training from scratch
INFO:incasem.gunpowder.torch.train:Using device cuda:0
INFO:__main__:Training iteration is 0, copying into validation pipeline
INFO:gunpowder.nodes.precache:starting new set of workers (8, cache size 20)...
ERROR:gunpowder.producer_pool:Exception in Unsqueeze while processing request
        RAW: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        LABELS: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        MASK: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        BACKGROUND_MASK: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        METRIC_MASK: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        LOSS_SCALINGS: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        RAW_POS: ROI: None, voxel size: None, interpolatable: None, non-spatial: True, dtype: None, placeholder: False

Batch returned so far:
None
Traceback (most recent call last):
  File "/home/camdu/.local/lib/python3.8/site-packages/gunpowder/nodes/batch_provider.py", line 182, in request_batch
    self.check_request_consistency(request)
  File "/home/camdu/.local/lib/python3.8/site-packages/gunpowder/nodes/batch_provider.py", line 244, in check_request_consistency
    assert request_roi.get_shape()[d]%provided_spec.voxel_size[d] == 0, \
AssertionError: in request
        RAW: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        LABELS: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        MASK: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        BACKGROUND_MASK: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        METRIC_MASK: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        RAW_POS: ROI: None, voxel size: None, interpolatable: None, non-spatial: True, dtype: None, placeholder: False
, dimension 0 of request RAW is not a multiple of voxel_size 70

This is followed by many further errors.
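For reference, the failing assertion can be reproduced with plain arithmetic. The sketch below is not gunpowder code; it only mirrors the modulo check shown in the traceback, using the voxel size of (70, 9, 9) nm stated above and the two ROI shapes printed in the log. The function name `misaligned_dims` is hypothetical, and the axis order is assumed to match the order in which the log prints the ROI.

```python
def misaligned_dims(roi_shape, voxel_size):
    """Return the dimensions whose extent is not a multiple of the voxel size."""
    return [
        d
        for d, (extent, voxel) in enumerate(zip(roi_shape, voxel_size))
        if extent % voxel != 0
    ]


# Voxel size in nm, as stated in the issue (assumed axis order: as printed in the log).
voxel_size = (70, 9, 9)

# ROI shape of the request that reaches Unsqueeze (first block of the log).
request_at_unsqueeze = (3360, 1836, 1836)

# ROI shape of the request that trips the assertion (second block of the log).
request_at_failure = (1836, 3360, 1836)

print(misaligned_dims(request_at_unsqueeze, voxel_size))  # []
print(misaligned_dims(request_at_failure, voxel_size))    # [0, 1]
```

The two shapes differ only in that the first two axes are swapped: 3360 nm is 48 voxels of 70 nm, but 1836 nm is not a multiple of 70. That would be consistent with an axis permutation somewhere between the two points in the pipeline, although the log alone does not show which node is responsible.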

Does anyone have any idea what might be going wrong here?

Thanks a lot,
Tim

@patrickstock
Collaborator

Hi Tim, sorry to see you are still having difficulty. Two additional pieces of info would be helpful to sort this out:

  1. Can you please provide the .zarray and .zattrs files from my_new_data.zarr?
  2. This is probably unlikely given the timing of your request, but I should ask anyway: when did you clone the repository? We fixed an issue in August (PR 14) to enable regions smaller than 204,204,204.
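One way to gather the metadata requested in the first point is a small standard-library script. This is a minimal sketch that assumes the container is a Zarr directory store in which every array or group keeps its metadata in `.zarray` and `.zattrs` JSON files; the helper names `collect_zarr_metadata` and `print_zarr_metadata` are hypothetical and not part of incasem or zarr.

```python
import json
from pathlib import Path


def collect_zarr_metadata(zarr_root):
    """Map each .zarray/.zattrs file under zarr_root to its parsed JSON content.

    Keys are paths relative to zarr_root, written with forward slashes.
    """
    root = Path(zarr_root)
    metadata = {}
    for name in (".zarray", ".zattrs"):
        # rglob walks the whole tree, so nested datasets are included too.
        for path in sorted(root.rglob(name)):
            key = path.relative_to(root).as_posix()
            metadata[key] = json.loads(path.read_text())
    return metadata


def print_zarr_metadata(zarr_root):
    """Print every metadata file found, ready to paste into a comment."""
    for rel_path, content in collect_zarr_metadata(zarr_root).items():
        print(f"--- {rel_path}")
        print(json.dumps(content, indent=2))
```

Calling `print_zarr_metadata("/mnt/e/Camdu/incasem-main/data/my_new_data.zarr")`, with the path taken from the log above, would list each file together with its contents.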
