Cucumber Expressions with optionals interpreted as Regular Expressions #191

Open
kieran-ryan opened this issue Mar 18, 2024 · 3 comments
Labels
🐛 bug Defect / Bug python

Comments

@kieran-ryan
Member

kieran-ryan commented Mar 18, 2024

👓 What did you see?

  • Cucumber Expressions written in Python containing optionals are considered 'undefined' by the Language Service
  • However, the step successfully matches and executes when using the Python implementation of Cucumber Expressions directly
[Screenshot: No matches with optionals]

✅ What did you expect to see?

  • The step text is considered 'defined'

📦 Which tool/library version are you using?

  • Node - v18.16.0
  • Cucumber Language Service - v1.4.1

🔬 How could we reproduce it?

Steps to reproduce the behavior:

  1. Open Visual Studio Code

  2. Install the official Cucumber extension

  3. Create a feature file inside the features directory containing

    Feature: Colour selection
    
      Scenario:
        Given I select the theme colour "red"
  4. Create a step definition inside the features/steps directory containing

    from behave import given
    
    @given('I select the theme colo(u)r "{color}"')
    def step_when(context, color):
        ...
  5. Observe the step in the feature file is highlighted as 'undefined'

📚 Any additional context?

The Language Service Python implementation prioritises Regular Expressions (checks first) over Cucumber Expressions.

One criterion for determining whether a pattern is a Regular Expression is whether it contains brackets (), which is checked through specialCharsMatch.

export function isRegExp(cleanWord: string): boolean {
  const startsWithSlash = cleanWord.startsWith('/')
  const namedGroupMatch = /\?P/
  const specialCharsMatch = /\(|\)|\.|\*|\\|\|/
  const containsNamedGroups = namedGroupMatch.test(cleanWord)
  const containsSpecialChars = specialCharsMatch.test(cleanWord)
  return startsWithSlash || containsNamedGroups || containsSpecialChars
}

As a result, any Cucumber Expression containing an optional will be treated as a Regular Expression and the optional will instead be considered a capture group.
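
To make the misclassification concrete, here is a minimal Python port of the check above. The name is_regexp and the sample patterns are illustrative only and are not part of the Language Service; the two regular expressions are copied from the TypeScript function.

```python
import re

# Mirrors the TypeScript patterns above (illustrative port, not Language Service code).
NAMED_GROUP = re.compile(r"\?P")
SPECIAL_CHARS = re.compile(r"\(|\)|\.|\*|\\|\|")


def is_regexp(clean_word: str) -> bool:
    """Return True when the heuristic would treat the pattern as a regular expression."""
    return (
        clean_word.startswith("/")
        or NAMED_GROUP.search(clean_word) is not None
        or SPECIAL_CHARS.search(clean_word) is not None
    )


# A Cucumber Expression with an optional is flagged because of its brackets.
with_optional = is_regexp('I select the theme colo(u)r "{color}"')

# The same expression without the optional is not flagged.
without_optional = is_regexp('I select the theme colour "{color}"')

print(with_optional, without_optional)
```

The only difference between the two patterns is the pair of brackets around the optional, which is enough to flip the result.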

0: r {expression: 'I am on the profile customisation/settings page', parameterTypeRegistry: Aa, parameterTypes: Array(0), ast: ti, treeRegexp: r}
1: Us {regexp: /I select the theme colo(u)r "{color}"/, parameterTypeRegistry: Aa, treeRegexp: r}

A challenge is that in some languages a Regular Expression can be denoted by special prefix and suffix characters, whereas in Python both kinds of expression are written as ordinary strings. See the Java implementation:

toStepDefinitionExpression(node) {
  const text = stringLiteral(node)
  const hasRegExpAnchors = text[0] == '^' || text[text.length - 1] == '$'
  return hasRegExpAnchors ? new RegExp(text) : text
},

Brackets usage in Regular Expressions with Python

The official Python documentation on regular expressions outlines the use of brackets as follows:

(...)

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].

(?...)

This is an extension notation (a '?' following a '(' is not meaningful otherwise). The first character after the '?' determines what the meaning and further syntax of the construct is. Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule. Following are the currently supported extensions.
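
As a rough illustration of why the step ends up 'undefined', the sketch below feeds the step definition pattern to Python's re module as if it were a regular expression. This is a stand-in: the Language Service builds a JavaScript RegExp, which is assumed here to behave the same way for this particular pattern.

```python
import re

# The step definition pattern, treated as a regular expression.
pattern = re.compile(r'I select the theme colo(u)r "{color}"')

# The step text from the feature file does not match: "{color}" is not a valid
# repetition quantifier, so it is matched as literal text rather than a parameter.
step_match = pattern.search('I select the theme colour "red"')

# Only text containing the literal placeholder matches, and "(u)" captures "u".
literal_match = pattern.search('I select the theme colour "{color}"')
captured = literal_match.group(1) if literal_match else None

# Without the "u" there is no match: as a group, "(u)" is required, not optional.
american_match = pattern.search('I select the theme color "{color}"')

print(step_match, captured, american_match)
```

Read as a Cucumber Expression, the same brackets would make the "u" optional and {color} would be a parameter, which is the behaviour the step author intended.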

Further References

@kieran-ryan kieran-ryan added the 🐛 bug Defect / Bug label Mar 18, 2024
@kieran-ryan kieran-ryan self-assigned this Mar 18, 2024
@kieran-ryan kieran-ryan changed the title from "Cucumber Expressions using optionals interpreted as Regular Expressions" to "Cucumber Expressions with optionals interpreted as Regular Expressions" Mar 18, 2024
@kieran-ryan
Member Author

@mpkorstanje, wondering if you by any chance have any guidance on this one, either as an intermediate or as a long-term solution?

In essence - at least in the Python implementation - due to an invalid regular expression check, Cucumber Expressions containing optionals are being incorrectly treated as Regular Expressions. Thus, they are being considered 'undefined'.

@mpkorstanje
Contributor

mpkorstanje commented Mar 18, 2024

With cucumber/vscode#125 in mind, this looks like a pretty tricky problem. Cucumber and regular expressions have considerable overlap. Consider:

Hello(.+)\?
Hello( world)?
Hello world?

For Java, cucumber-expressions uses a simple heuristic to determine what we're dealing with. This is implemented in the ExpressionFactory. (Note: The ^ and $ characters aren't anything special, they're the regex start of input and end of input markers).

I do not see a Python equivalent of the ExpressionFactory so unfortunately users of the cucumber-expressions library have to decide what is a regex and what is a cucumber expression. And the language service would have to duplicate that logic. So I do think it would be a good idea to implement an ExpressionFactory for Python to get rid of at least some ambiguity by providing a canonical solution.
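
A minimal sketch of what such an anchor-based decision could look like in Python follows. ExpressionKind and classify_expression are hypothetical names invented for illustration, not part of the cucumber-expressions library, and the only rule assumed is the start/end anchor check described above.

```python
from enum import Enum


class ExpressionKind(Enum):
    """Hypothetical result type; not part of any existing library."""

    REGULAR = "regular"
    CUCUMBER = "cucumber"


def classify_expression(expression: str) -> ExpressionKind:
    """Hypothetical helper: decide the expression kind from regex anchors only."""
    # "^" at the start or "$" at the end marks a regular expression.
    if expression.startswith("^") or expression.endswith("$"):
        return ExpressionKind.REGULAR
    # Everything else is assumed to be a Cucumber Expression.
    return ExpressionKind.CUCUMBER


examples = [
    'I select the theme colo(u)r "{color}"',
    "Hello( world)?",
    "^Hello( world)?$",
]
for example in examples:
    print(f"{example!r}: {classify_expression(example).value}")
```

Under this rule brackets alone no longer make a pattern a regular expression, so the optional in the step definition from this issue would be left to the Cucumber Expression parser.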

But that won't solve the problem. Where Cucumber JVM only supports regular and cucumber expressions, Behave currently only supports regular and parse expressions, while PyTestBDD-NG supports regular-, parse-, and cucumber-expressions, in addition to a heuristic.

So what kind of expression an expression is, depends entirely on the context. Is that something we can access from within vscode or the language service? If not, it might have to become a configuration flag.

@kieran-ryan
Member Author

This is great!

So what kind of expression an expression is, depends entirely on the context. Is that something we can access from within vscode or the language service? If not, it might have to become a configuration flag.

We could extract this information, though how it's configured varies quite a bit based on the framework and may be a challenge to maintain. The configuration option sounds like a great shout: minimal implementation and easily replicable across languages and frameworks.
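
One way to picture the configuration option is sketched below. The setting name expression_mode and its values are assumptions made up for this example; no such option exists in the Language Service or the VS Code extension today, and the "auto" branch simply falls back to an anchor check.

```python
def treat_as_regexp(pattern: str, expression_mode: str = "auto") -> bool:
    """Hypothetical: decide how to interpret a pattern from a user setting."""
    if expression_mode == "regular":
        # The user states that the framework only uses regular expressions.
        return True
    if expression_mode == "cucumber":
        # The user states that the framework only uses Cucumber Expressions.
        return False
    if expression_mode == "auto":
        # No explicit choice: fall back to a heuristic (here, regex anchors).
        return pattern.startswith("^") or pattern.endswith("$")
    raise ValueError(f"Unknown expression mode: {expression_mode!r}")


step_pattern = 'I select the theme colo(u)r "{color}"'
print(treat_as_regexp(step_pattern, "cucumber"))
print(treat_as_regexp(step_pattern, "regular"))
print(treat_as_regexp(step_pattern))
```

An explicit mode sidesteps the ambiguity entirely for frameworks that support only one kind of expression, while "auto" keeps a heuristic for everyone else.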

Will look into the ExpressionFactory to gain an understanding and think about this further. Thanks a million!
