High memory usage with too many invalid fields (#50)
Hi @syxolk! Thanks for the report, and sorry that vladiate was eating up your memory.
This is currently not possible, but it could be a valuable optional feature. I'm a little more interested in figuring out why so much memory is getting consumed and whether the footprint can be reduced (I'm guessing it can, as I haven't really tested the upper bounds of this tool).
Hey, I found two issues that lead to high memory usage in my particular case. First, … Second, the … If we can fix at least one of the two issues, the memory problem should be gone.
PS: I'm happy to contribute!
I think there are two things we could do here: …
I'm leaning towards the first one, since it seems like less work, and still preserves the entire exception for debugging, but I could be convinced otherwise.
I think this is a great idea.
PRs are welcome!
A new function `stringify_set` only stringifies n elements of a given set.
* Fix issue with SetValidator and large valid_set (#50): A new function `stringify_set` only stringifies n elements of a given set.
* Use `itertools.islice` for better python2 performance
* Add SetValidator test for coverage
* Fix `stringify_set`: Return `{...}` instead of `[...]`
* Add test cases for `stringify_set`
* Fix `stringify_set`: Sort the elements before displaying for small sets
* Remove previously introduced test case for set validator
* Add parameter checks for `stringify_set`
* Rename `stringify_set` to `_stringify_set`
* Revert d355def
I have a CSV with roughly 140k lines that should be validated with vladiate. The validation code looks like this:

…

The number of items in `lots_of_ids` is also around 140k. For some reason I had a case where the ids in `lots_of_ids` had no intersection with the values from the `parent_id` column in `largefile.csv`. Vladiate will in this case (correctly) collect all invalid rows. However, this resulted in my PC going to a RAM usage of at least 8 GB.

Is there any way I can stop the validation early? Can vladiate detect if there were too many wrong fields and stop validating?
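At the time of this issue, vladiate had no early-termination option, so one workaround is to cap the failures yourself in a plain pre-check before handing the file to vladiate. The function and threshold below are hypothetical illustrations of the fail-fast idea, not vladiate API:

```python
import csv
import io


def validate_with_cap(csv_text, column, valid_ids, max_failures=100):
    """Scan one CSV column and stop as soon as `max_failures` invalid
    values have been seen, instead of collecting every invalid row.

    Returns (failures, completed): the recorded failures and whether
    the whole file was scanned. Hypothetical helper, not vladiate API.
    """
    failures = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_num, row in enumerate(reader, start=1):
        if row[column] not in valid_ids:
            failures.append((row_num, row[column]))
            if len(failures) >= max_failures:
                # Bail out early: memory holds at most max_failures
                # records, not one per invalid row
                return failures, False
    return failures, True


# Example: no parent_id intersects the valid set, so we stop after 2 failures
data = "parent_id\n10\n11\n12\n13\n"
fails, completed = validate_with_cap(data, "parent_id", {1, 2, 3}, max_failures=2)
# fails == [(1, "10"), (2, "11")], completed is False
```

This keeps memory bounded in the worst case described above, where every single row fails validation.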