Clarify batch processing docs and behavior in context of multiple files, cross message rules (E047) #100
Comments
Comment by barbeau @evansiroky Thanks for pointing this out. I agree that the time dependency of some rules could be called out more explicitly. IIRC, the basic assumption we made was that all files in a directory being validated come from the same feed stream, so mixing feeds (e.g., .pb files from different sources) in the same directory could cause issues. Also note that there is a -sort parameter that controls whether the file name or the file date is used as the "current" time for these rules. FYI, the validator should also support mixed feeds, where a single PB file contains multiple entity types (e.g., VehiclePosition and TripUpdates).
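To illustrate why the ordering matters for time-dependent rules, here is a minimal sketch of the two orderings described above. It is not the validator's code: `order_feed_files` and its `sort_by` values are hypothetical names, and the sketch assumes archived feeds are stored as `.pb` files in one directory.

```python
from pathlib import Path


def order_feed_files(directory, sort_by="name"):
    """Return the .pb files in `directory` in replay order.

    Hypothetical helper: sort_by="name" orders by file name,
    sort_by="date" orders by last-modified time (name breaks ties).
    """
    files = [p for p in Path(directory).iterdir() if p.suffix == ".pb"]
    if sort_by == "name":
        return sorted(files, key=lambda p: p.name)
    if sort_by == "date":
        return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))
    raise ValueError(f"unknown sort_by: {sort_by!r}")
```

If files from different feed streams share a directory, either ordering interleaves them, so a rule that compares each message with the previous one would end up comparing messages from unrelated streams.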
Looking at this further, I am wondering how this plays into the calculation of E047 errors. I'm confused about how this is all supposed to work in a batch context now.
@evansiroky I agree that the logic of the cross-message validation needs to be reviewed in the context of the batch processor. From what I recall, when you're using the webapp and you enter multiple URLs, all the entity objects (TUs, VPs) from both URLs get aggregated into a single list that's validated against all the rules, including cross-message validation like E047. In batch processing, only one RT file at a time is read into that same entity list, so unless you have mixed entity types in the same file (i.e., VPs and TUs at the same endpoint URL) you'll never see E047. We should verify what the current behavior is and document it appropriately, and possibly allow dynamic merging of multiple files (e.g., those with the same timestamp in the file name? That probably isn't realistic...) so that files from multiple feed endpoints (one URL for TUs, one URL for VPs) could be verified together for rules like E047 in batch mode. I updated the issue title to reflect this. @evansiroky @e-lo It would be useful to better understand what your file system looks like for archiving files from multiple feed endpoints (TUs, VPs) for a single provider, and how you think the batch processor command-line options or other configuration could be set up in your use case to run E047 and similar cross-entity rules.
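As a rough sketch of the "same timestamp in file name" idea, the snippet below groups archived files so that TU and VP files captured together could be loaded into one entity list. Everything here is an assumption for discussion: the `<feed>_<epoch seconds>.pb` naming scheme, the `tu`/`vp` prefixes, and the function names are hypothetical and not part of the validator.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: "<feed>_<epoch seconds>.pb", e.g. "tu_1644500000.pb".
FILE_PATTERN = re.compile(r"^(?P<feed>[A-Za-z]+)_(?P<ts>\d+)\.pb$")


def group_by_timestamp(filenames):
    """Map each timestamp to {feed prefix: file name}, ordered by timestamp."""
    groups = defaultdict(dict)
    for name in filenames:
        match = FILE_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed scheme
        groups[int(match.group("ts"))][match.group("feed")] = name
    return dict(sorted(groups.items()))


def complete_groups(groups, required=("tu", "vp")):
    """Keep only the timestamps for which every required feed has a file."""
    return {
        ts: feeds
        for ts, feeds in groups.items()
        if all(feed in feeds for feed in required)
    }
```

Only the groups returned by `complete_groups` would be candidates for cross-entity rules such as E047. Exact timestamp matches are a strong requirement, since endpoints are rarely fetched at the same second, so a real implementation would probably need a tolerance window instead.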
Issue by evansiroky
Feb 10, 2022
Originally opened as CUTR-at-USF#411
Summary:
I looked in the code and realized that some validation rules look back at previous messages, which implies that the files being validated should all come from the same stream of a single RT file type.
Steps to reproduce:
If multiple file types are validated at the same time, each has its own header timestamps, which could cause certain timestamp validation rules to be triggered, or not triggered, incorrectly.
Expected behavior:
The batch file documentation should recommend validating only one RT file type at a time.
Observed behavior:
The batch file documentation does not recommend validating only one RT file type at a time.
Platform:
https://github.com/cal-itp/gtfs-rt-validator-api/