Error: object extended beyond tape #16

Open
dvillaveces-tq opened this issue Feb 7, 2023 · 4 comments

Comments

@dvillaveces-tq

docker run -it -v "$(pwd)/tmp/mrf-parse:/tmp/mrf-parse:rw" dancarbone/danielchalef-mrfparse pipeline -i https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz -o /tmp/mrf-parse/outputs -p -1 -s /tmp/mrf-parse/data/filters/tic_500_shoppable_svcs.csv

INFO[2023-02-04T18:31:29Z] Running step: Download
INFO[2023-02-04T18:31:45Z] Downloaded 47691958 bytes from https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz to /tmp/mrfparse3638373359/src/CO_CBPLMED0000.json.gz
INFO[2023-02-04T18:31:45Z] Step Download completed in 16 seconds
INFO[2023-02-04T18:31:45Z] Running step: Split
Reading /tmp/mrfparse3638373359/src/CO_CBPLMED0000.json.gz
Closing /tmp/mrfparse3638373359/split/provider_references_00.jsonl after 4.898750 seconds
Closing /tmp/mrfparse3638373359/split/in_network_00.jsonl after 0.561741 seconds
/tmp/mrfparse3638373359/split/root.json written successfully
Completed in 5.465689 secondsINFO[2023-02-04T18:31:51Z] Step Split completed in 6 seconds
INFO[2023-02-04T18:31:51Z] Running step: Parse
INFO[2023-02-04T18:31:51Z] Loaded 493 services.
INFO[2023-02-04T18:31:51Z] Found 3 files.
INFO[2023-02-04T18:31:51Z] MrfRoot file parsed: /tmp/mrfparse3638373359/split/root.json
INFO[2023-02-04T18:31:51Z] Found in_network_rate file/tmp/mrfparse3638373359/split/root.json
INFO[2023-02-04T18:31:51Z] Parsing in_network_rates: /tmp/mrfparse3638373359/split/in_network_00.jsonl
ERRO[2023-02-04T18:31:51Z] Fatal error in /app/pkg/mrfparse/mrf/in_network_rates.go#390: corrupt input: object extended beyond tape

@danielchalef
Owner

I'm unable to reproduce this with the same input. See below. Can you confirm you're using the latest pull from main and Go 1.19.x?

daniel@server1 ➜  mrfparse git:(main) ✗ ./out/bin/mrfparse pipeline -i https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz  -o /tmp/out -s data/tic_500_shoppable_services.csv -p 0
INFO[2023-02-07T17:50:57Z] Running step: Download
INFO[2023-02-07T17:51:01Z] Downloaded 47691958 bytes from https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz to /tmp/mrfparse3704144958/src/CO_CBPLMED0000.json.gz
INFO[2023-02-07T17:51:01Z] Step Download completed in 4 seconds
INFO[2023-02-07T17:51:01Z] Running step: Split
Reading /tmp/mrfparse3704144958/src/CO_CBPLMED0000.json.gz
Closing /tmp/mrfparse3704144958/split/provider_references_00.jsonl after 3.308092 seconds
Closing /tmp/mrfparse3704144958/split/in_network_00.jsonl after 0.515044 seconds
/tmp/mrfparse3704144958/split/root.json written successfully
Completed in 3.824363 secondsINFO[2023-02-07T17:51:04Z] Step Split completed in 3 seconds
INFO[2023-02-07T17:51:04Z] Running step: Parse
INFO[2023-02-07T17:51:04Z] Loaded 493 services.
INFO[2023-02-07T17:51:04Z] Found 3 files.
INFO[2023-02-07T17:51:04Z] MrfRoot file parsed: /tmp/mrfparse3704144958/split/root.json
INFO[2023-02-07T17:51:04Z] Found in_network_rate file/tmp/mrfparse3704144958/split/root.json
INFO[2023-02-07T17:51:04Z] Parsing in_network_rates: /tmp/mrfparse3704144958/split/in_network_00.jsonl
INFO[2023-02-07T17:51:04Z] Completed reading negotiated_rates: /tmp/mrfparse3704144958/split/in_network_00.jsonl
INFO[2023-02-07T17:51:05Z] Found 4980 providers in in_network_rates.
INFO[2023-02-07T17:51:05Z] Found provider_references fileprovider_references_00.jsonl
INFO[2023-02-07T17:51:05Z] Parsing provider references: /tmp/mrfparse3704144958/split/provider_references_00.jsonl
INFO[2023-02-07T17:51:06Z] Completed reading provider references: /tmp/mrfparse3704144958/split/provider_references_00.jsonl
INFO[2023-02-07T17:51:06Z] Found 275675 providers. Matched on 4980 providers.
INFO[2023-02-07T17:51:06Z] Step Parse completed in 2 seconds
INFO[2023-02-07T17:51:06Z] Running step: Clean
INFO[2023-02-07T17:51:06Z] Step Clean completed in 0 seconds

@frishrash

frishrash commented Feb 20, 2023

docker run -it -v "$(pwd)/tmp/mrf-parse:/tmp/mrf-parse:rw" dancarbone/danielchalef-mrfparse pipeline -i https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz -o /tmp/mrf-parse/outputs -p -1 -s /tmp/mrf-parse/data/filters/tic_500_shoppable_svcs.csv

INFO[2023-02-04T18:31:29Z] Running step: Download
INFO[2023-02-04T18:31:45Z] Downloaded 47691958 bytes from https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz to /tmp/mrfparse3638373359/src/CO_CBPLMED0000.json.gz
INFO[2023-02-04T18:31:45Z] Step Download completed in 16 seconds
INFO[2023-02-04T18:31:45Z] Running step: Split
Reading /tmp/mrfparse3638373359/src/CO_CBPLMED0000.json.gz
Closing /tmp/mrfparse3638373359/split/provider_references_00.jsonl after 4.898750 seconds
Closing /tmp/mrfparse3638373359/split/in_network_00.jsonl after 0.561741 seconds
/tmp/mrfparse3638373359/split/root.json written successfully
Completed in 5.465689 secondsINFO[2023-02-04T18:31:51Z] Step Split completed in 6 seconds
INFO[2023-02-04T18:31:51Z] Running step: Parse
INFO[2023-02-04T18:31:51Z] Loaded 493 services.
INFO[2023-02-04T18:31:51Z] Found 3 files.
INFO[2023-02-04T18:31:51Z] MrfRoot file parsed: /tmp/mrfparse3638373359/split/root.json
INFO[2023-02-04T18:31:51Z] Found in_network_rate file/tmp/mrfparse3638373359/split/root.json
INFO[2023-02-04T18:31:51Z] Parsing in_network_rates: /tmp/mrfparse3638373359/split/in_network_00.jsonl
ERRO[2023-02-04T18:31:51Z] Fatal error in /app/pkg/mrfparse/mrf/in_network_rates.go#390: corrupt input: object extended beyond tape

I had a similar issue when I built on Windows. I tried to trace it back, and it looks like a bug in fakesimdjson. When I forced it to use simdjson directly, the error went away.

@danielchalef
Owner

Thanks. That's helpful context. @dcarbone may be interested in taking a look ^

@dcarbone
Contributor

Yeah, I saw this when dealing with particularly large files. I probably won't have much time to look into it for a little while, unfortunately :\
