pod5 merge hangs indefinitely at 99-100% (the last 20 pod5 have not been merged) #131

Open
kir1to455 opened this issue Jun 9, 2024 · 5 comments

Comments

@kir1to455

Issue Description

I am using pod5 merge to merge 3320 pod5 files. It seems to have stopped before processing the last 20 files. However, nohup reported that the job had finished, and there were no errors.

Logs

This is the input group:
[image]
This is the ip group:
[image]
[image]
Here is my pod5 merge command:
[image]
Here are the sizes of merge_pod5 and multi_pod5:
[image]
[image]
It seems that the last 20 pod5 files have not been merged.

Specifications

  • Pod5 Version: 0.3.10
  • Python Version: Python 3.8.17
  • Platform: Centos7
@HalfPhoton
Collaborator

Interesting.
Is this running in a conda environment or a Python virtual environment? We occasionally see issues when running in conda.

Are you able to merge the remaining 20 files into the ip_merge.pod5 file?

@kir1to455
Author

Hi, @HalfPhoton

> We occasionally see issues when running in conda.

I ran this code in a conda environment.
image

> Are you able to merge the remaining 20 files into the ip_merge.pod5 file?

I don't know how pod5 merge orders the input files.
Does it go test_0.pod5, test_1.pod5, ..., test_20.pod5?
If so, I will try merging them.

Best wishes,
Kirito

@HalfPhoton
Collaborator

Ah, I see.

In this case, please use pod5 view to list the read ids in all inputs and in the first merged output, then find the ids that are missing from the merged file:

# get read ids
pod5 view -IH input_data/ -o input.ids
pod5 view -IH merged.pod5 -o merged.ids

# Sort the files (comm requires sorted files)
sort input.ids > input.ids.sorted
sort merged.ids > merged.ids.sorted

# Find ids in input that are not in merged file
comm -23 input.ids.sorted merged.ids.sorted > missing.ids

# Get a pod5 file of only missing ids
pod5 filter input_data/ --ids missing.ids -o missing.pod5

# Merge in missing ids
pod5 merge merged.pod5 missing.pod5 -o merged.final.pod5
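
The comm step above can be tried on toy data before running it on real id lists. This is a minimal sketch: the read ids and the temporary directory are made up for illustration, and `LC_ALL=C` is an added assumption to make sure sort and comm agree on ordering.

```shell
# Toy illustration of the comm step; the ids below are made up.
workdir=$(mktemp -d)
printf 'read-c\nread-a\nread-b\nread-d\n' > "$workdir/input.ids"
printf 'read-b\nread-a\n' > "$workdir/merged.ids"

# comm expects both inputs sorted in the same collation order
LC_ALL=C sort "$workdir/input.ids" > "$workdir/input.ids.sorted"
LC_ALL=C sort "$workdir/merged.ids" > "$workdir/merged.ids.sorted"

# -2 drops lines unique to the merged list, -3 drops lines common to both,
# leaving only the ids that are in the input but not in the merged file
LC_ALL=C comm -23 "$workdir/input.ids.sorted" "$workdir/merged.ids.sorted" > "$workdir/missing.ids"

cat "$workdir/missing.ids"
```

With these toy files, missing.ids should contain read-c and read-d, the two ids absent from the merged list.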

@HalfPhoton
Collaborator

I recommend using a python virtual environment instead of a conda environment:

python3.10 -m venv venv --prompt=pod5
source venv/bin/activate
pip install -U pip pod5
pod5 --version

@arturotorreso

Just for the record, the same thing happens to me, but all the files are actually processed and there are no missing reads. So it's probably an issue with the progress bar.
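
A quick way to check whether any reads are missing is to compare the number of unique read ids in the inputs and in the merged file. This is only a sketch: `compare_id_counts` is a hypothetical helper, not part of pod5, and it assumes two plain-text id lists with one read id per line, such as those written by the `pod5 view -IH ... -o` commands earlier in this thread. Matching counts are a sanity check, not proof that the two id sets are identical.

```shell
# Hypothetical helper (not part of pod5): compare unique read id counts.
# $1 = id list for the inputs, $2 = id list for the merged file.
compare_id_counts() {
    # sort -u removes duplicate ids; tr strips padding some wc versions add
    n_input=$(sort -u "$1" | wc -l | tr -d ' ')
    n_merged=$(sort -u "$2" | wc -l | tr -d ' ')
    echo "input: $n_input merged: $n_merged"
    # Return success only when the two counts match
    [ "$n_input" -eq "$n_merged" ]
}
```

For example, `compare_id_counts input.ids merged.ids && echo "counts match"` prints both counts and reports a match only when they are equal.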
