
Common string matching patterns to avoid. #14

Open
dblodgett-usgs opened this issue Aug 24, 2023 · 3 comments

Comments

@dblodgett-usgs
Collaborator

@lekoenig has a nice list of string matching omissions that could get added to the disambiguate names function.

@lkoenig-usgs

I made and then refined that list by inspecting a random sample of ~300 site names from our "high-priority" WQP sites: 250 random site names plus ~50 site names that contained the string "trib".

I think this helps to disambiguate some flowlines, especially when the site names contain extraneous info...but I wonder about hard-coding these decisions in disambiguate_indexes (and then needing to maintain a list for other users! 🙈). I'm game, just adding some thoughts.

@dblodgett-usgs
Collaborator Author

Maybe we could just put that list, as a good starting point, in an example or vignette?

@lkoenig-usgs

I like the idea of showing a proof-of-concept for cleaning ASCII strings as a precursor step to disambiguate_indexes.
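For anyone landing here, a rough sketch of what such a pre-cleaning step might look like. It is written in Python only to keep it self-contained; a vignette for the package would presumably do the same thing in R. The abbreviation map and the qualifier words below are hypothetical illustrations, not the list discussed above, and `clean_site_name` is a made-up helper name.

```python
import re

# Hypothetical abbreviation map for illustration only;
# this is NOT the list discussed in this issue.
ABBREVIATIONS = {
    "trib": "tributary",
    "cr": "creek",
    "ck": "creek",
    "br": "branch",
}

# Hypothetical qualifier words that tend to introduce location detail
# (towns, roads, gages) rather than the name of the stream itself.
QUALIFIER = re.compile(r"\b(at|near|nr|above|below|abv|blw)\b.*$")


def clean_site_name(name: str) -> str:
    """Normalize a raw site name before any name-based matching."""
    # Drop non-ASCII characters and lowercase everything.
    text = name.encode("ascii", "ignore").decode("ascii").lower()
    # Replace punctuation with spaces so "cr." and "cr" look the same.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Cut the name at the first qualifier word and discard the rest.
    text = QUALIFIER.sub("", text)
    # Expand known abbreviations word by word and collapse whitespace.
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


if __name__ == "__main__":
    for raw in ["Mill Cr. nr Springfield, IL", "Trib to Bear Ck at Hwy 9"]:
        print(raw, "->", clean_site_name(raw))
```

The cleaned names would then stand in for the raw site names when comparing against flowline names ahead of disambiguate_indexes. One caveat with this sketch: a site name that begins with a qualifier word (e.g. "Near ...") is reduced to an empty string, so a real version would need to handle that case.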


2 participants