escape regex characters that may appear in identifiers, fixes #680 #683

Open
wants to merge 1 commit into base: master

Conversation

hellkite500
Member

Changes

  • When searching for forcing files by pattern, escape regex characters that may be in the ID string before making the pattern
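To illustrate why the escaping matters, here is a minimal sketch of what can go wrong when an ID is spliced into a pattern unescaped. The helper name `matches_forcing_file` and the `.*<id>.*\.csv` pattern shape are assumptions made for illustration; they are not taken from ngen's source.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not ngen's code): builds a pattern around an ID
// fragment and checks whether a file name matches it. The pattern shape
// ".*<id>.*\.csv" is an assumption for illustration only.
bool matches_forcing_file(const std::string& id_fragment, const std::string& file_name)
{
    // The fragment is inserted verbatim, so any regex metacharacter in it
    // is interpreted as regex syntax rather than as a literal character.
    const std::regex pattern(".*" + id_fragment + ".*\\.csv");
    return std::regex_match(file_name, pattern);
}
```

With the fragment `cat-7e+05`, the `+` is read as "one or more `e`", so the file `cat-7e+05.csv` is not matched while an unrelated name such as `cat-7ee05.csv` is. Passing the fragment as `cat-7e\+05` matches the intended file.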

TODO

  • Add test case for scientific notation ID

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Target Environment support

  • Linux
  • MacOS

@PhilMiller
Contributor

Please limit IDs to a set of characters that excludes anything that regex would need escaped, rather than going down this path.

If that means other scripts or data sets need to be fixed up to not emit/contain those characters, we should push that through.

@hellkite500
Member Author

Hydrofabric identifiers are strings, and something like cat-7e+05 is a valid divide identifier: it uniquely identifies the divide in a given hydrofabric. While it may not be consistent, it is valid (NOAA-OWP/hydrofabric#28). ngen could enforce more stringent requirements, but it isn't clear why that should be required. We can definitely expand the list of explicitly escaped characters to the full set of regex special characters:
., +, *, ?, ^, $, (, ), [, ], {, }, |, \
which would ensure that any string could be pattern matched by the internal regex built in ngen.
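A minimal sketch of escaping the full set listed above, assuming the default ECMAScript grammar of `std::regex`. The names `escape_regex` and `matches_literally` are hypothetical and do not come from the ngen code base; the actual change in this PR may be structured differently.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not ngen's actual implementation: prefixes every
// regex special character in `id` with a backslash so that the character
// is matched literally by an ECMAScript-grammar std::regex.
std::string escape_regex(const std::string& id)
{
    // The full set of special characters from the comment above.
    static const std::string special = ".+*?^$()[]{}|\\";
    std::string escaped;
    escaped.reserve(id.size() * 2);
    for (const char c : id) {
        if (special.find(c) != std::string::npos) {
            escaped.push_back('\\');
        }
        escaped.push_back(c);
    }
    return escaped;
}

// Hypothetical check: true when `text` is exactly `id`, using the escaped
// ID as the whole pattern.
bool matches_literally(const std::string& id, const std::string& text)
{
    return std::regex_match(text, std::regex(escape_regex(id)));
}
```

For example, `escape_regex("cat-7e+05")` yields `cat-7e\+05`, which matches the identifier itself but no longer matches a string such as `cat-7ee05`.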

Contributor

@PhilMiller PhilMiller left a comment

Mike agreed to limit the catchment IDs to a safer set of characters. Once we have the updated hydrofabric, we can validate against the narrowed spec, and this escaping will be moot.
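Validation against a narrowed spec could look roughly like the sketch below. The allowed set used here (ASCII letters, digits, `-` and `_`) is purely an assumption, since the conversation does not state which characters the narrowed specification permits, and `is_safe_id` is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// ASSUMPTION: the allowed set (letters, digits, '-' and '_') is a
// placeholder; the narrowed specification is not given in this thread.
// Note that std::isalnum depends on the current C locale; in the default
// "C" locale it accepts ASCII letters and digits only.
bool is_safe_id(const std::string& id)
{
    return !id.empty() &&
           std::all_of(id.begin(), id.end(), [](unsigned char c) {
               return std::isalnum(c) || c == '-' || c == '_';
           });
}
```

Under this assumed set, an ID such as `cat-705` passes, while `cat-7e+05` is rejected because of the `+`.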
