Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow identifiers that start with a number #154

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

biochimia
Copy link
Contributor

First of all, this is a breaking change. JSLT code that uses identifiers
starting with a number will fail to parse. The only redeeming feature of
this change is that this is a common restriction in many programming
languages, which should somewhat limit the risk of mayhem.

The motivation for doing this change, and one which is reflected in the
tests, is how array slicing is picky about whitespace: .data[1:2]
would result in a cryptic error message, as the subscript was being
interpreted as an imported module's identifier:

com.schibsted.spt.data.jslt.JsltException: Parse error: Encountered " "]" "] "" at line 1, column 10.
Was expecting:
"(" ...
at :1:7
(...stacktrace follows...)

With this change, all identifiers (and references to them) are required
to start with a letter or underscore. Dashes are similarly disallowed,
to avoid conflicts with the use of a minus operator now or in the
future.

First of all, this is a breaking change. JSLT code that uses identifiers
starting with a number will fail to parse. The only redeeming feature of
this change is that this is a common restriction in many programming
languages, which should somewhat limit the risk of mayhem.

The motivation for doing this change, and one which is reflected in the
tests, is how array slicing is picky about whitespace: `.data[1:2]`
would result in a cryptic error message, as the subscript was being
interpreted as an imported module's identifier:

  com.schibsted.spt.data.jslt.JsltException: Parse error: Encountered " "]" "] "" at line 1, column 10.
  Was expecting:
    "(" ...
     at <inline>:1:7
       (...stacktrace follows...)

With this change, all identifiers (and references to them) are required
to start with a letter or underscore. Dashes are similarly disallowed,
to avoid conflicts with the use of a minus operator now or in the
future.
@larsga larsga added the bug Something isn't working label Nov 4, 2020
@larsga
Copy link
Collaborator

larsga commented Nov 4, 2020

That's well spotted! Thank you. And thank you for doing the implementation as well.

Your proposed fix may be the right one, but a possible alternative that's less invasive is to change the syntax of prefixes, so that they cannot start with digits. We'd need to change the definition of import and PIDENT, but otherwise everything could stay the same. Thoughts?

Also, we need tests showing that the now-illegal identifers really are flagged as errors in the parser.

@larsga
Copy link
Collaborator

larsga commented Nov 4, 2020

A third possibility is to make two breaking changes to the syntax at the same time. The one proposed here, and the rather annoying syntactic ambiguity where the end of a function or let can be misinterpreted (issue #133).

As an example, this is OK:

def remove_html(text)
  (replace($text, "</?h3\\s?[^>]*>", ""))

[for (.) {
  "title" : remove_html(.html)
}]

But this is not:

def remove_html(text)
  replace($text, "</?h3\\s?[^>]*>", "")

[for (.) {
  "title" : remove_html(.html)
}]

@larsga larsga added this to the JSLT 0.2.0 milestone May 7, 2021
@biochimia
Copy link
Contributor Author

Picking up the conversation here again, I happened upon a couple more interesting inputs where allowing identifiers to start with numbers leads to unexpected results, namely due to the way JavaCC disambiguates tokens first by length, and then by order/priority.

The following will currently all be interpreted as identifiers in JSLT:

  • 1-1, without whitespace this parses as an identifier because it's longer than 1 for the number
  • 1e, could be a number missing the exponent, it parses as an identifier, it would be preferable to raise an error as it's inconspicuous
  • 0123, is an identifier, as the grammar precludes numbers starting with a leading zero.

I can look at adding more tests covering cases that should fail.

Earlier I was trying to get out a YAML-based JSLT test library we have built internally but it won't necessarily help here, as it is scala-based and might not be appropriate for testing the core language because of the added dependencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants