Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contextual string tags to prevent SQL injection #29

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mikesamuel
Copy link

https://nodesecroadmap.fyi/chapter-7/query-langs.html describes
this approach as part of a larger discussion about library support
for safe coding practices.

This is one step in a larger effort to enable

connection.querySELECT * FROM T WHERE x = ${x}, y = ${y}, z = ${z}(callback)

and similar idioms.

This was broken out of mysqljs/mysql#1926

@dougwilson
Copy link
Member

Thanks so much! I would live to play around this to review, but not 100% sure about all the different cases around in here. When you get a chance, can you write up documentation for the README so myself (and everyone else) knows this exists and how to make use of it? I'm really confused why there is a SQL parser in here and concerned that there are going to be lots of edge cases with issues and I'm not super interested in maintaining an actual SQL parser as part of this. Why is the parser necessary? I would almost expect this to act just like ? does today, or what is happening here?

@dougwilson dougwilson self-assigned this Jan 26, 2018
lib/Template.js Outdated
return false;
};

function stringWrapper() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do, exactly? How is it different from the toSqlString already implemented here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explained when discussing Fragment and Identifier below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the difference is that you want to use instanceof I see. Well, that won't work, so there doesn't sound like a difference then?

Copy link
Author

@mikesamuel mikesamuel Jan 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in favor of SqlString.raw and SqlString.identifier. The latter is newly added and round passes through escapeId.

I apologize for not looking carefully enough for existing APIs. I read the mysql module APIs but did not read the sqlstring APIs. I now see that toSqlString is mentioned both places but it didn't register.

I reached for TypedString without looking too carefully because I'm arguing in "A common style guide for tag implementors" for using TypedString subclasses to encourage interoperability among tag implementations, but I should put in more effort to see that my PRs fit into the existing code well.

lib/Template.js Outdated
* @param {string} content
* @constructor
*/
module.exports.Fragment = stringWrapper();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identifier and Fragment look like they are identical. Was this an accident or can they just be combined into one thing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important fact is that instanceof Fragment is separate from instanceof Identifier.

A fragment is semantically a series of SQL tokens.
An identifier is a run of characters that can appear between `...`.

When the runtime support ES6, these will be represented as subclasses of require('template-tag-common').TypedString which allows associating a runtime type with a sub-language so that RTTI can be used to avoid over-escaping.

Without ES6, these are meant to just be valid right-hand-sides to the instanceof operator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but instanceof is super broken when using modules in Node.js as O explained. We don't accept using instanceof any longer because of these issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And Fragment sounds like it should be using our existing Fragment API: the toSqlString contract. If that is not possible, I need to understand why and then you should remove the toSqlString API completely in favor of the API because there is no reason to have two different Fragment APIs in the same module, that is silly like the module cannot coordinate with itself :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to use SqlString.raw and SqlString.identifier.

if (require('process').env.npm_lifecycle_event === 'test') {
// Expose for testing.
// Harmless if this leaks
sql.makeLexer = makeLexer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just put this into a different file and require it directly in the tests. And here, rather than a condition export. When users run their tests, this will be exported in everyone's suite, and if they use it accidentally, their own test suite won't fail, instead they'll fail when it gets into production, probably the worst place to fail :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(msamuel): separate out into TemplateLexer.js and require it directly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Actually in lib/es6/Lexer.js

* Template tag function that contextually autoescapes values
* producing a SqlFragment.
*/
const sql = memoizedTagFunction(computeStatic, interpolateSqlIntoFragment);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an in memory cache, let's make sure to note in the README the maximum amount of RAM this is set to so folks will understand how much memory this will consume over the lifetime of their application.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underlying the cache is a WeakMap, so it should adapt to memory pressure.

The maximum memory usage should scale with the number of uses of the tag that appear in the source code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand, sorry. Doesn't memoizing something cache some aspect in memory for later use? What is the maximum number of objects that will end up in that cache?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maximum number of objects is 3 per occurrence of a relevant tag usage:

  • Two created arrays
  • One record that bundles the two together.

They're not pinned in memory since they key of the template object which is itself weakly referenced by the containing realm.

If that answers your question, I can put that in the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm still confused on how this is working and what it's doing under the hood. Is there another way you can explain what this is doing? For example, if you remove the memoization what does that do? What does this memoization provide? Is it calculating some things and storing in memory for later use? If so, for how long is that thing stored in memory? I'm trying to figure this out, because Node.js apps will typically stay running for months at a time if possible, and I'm concerned if this is going to introduce some kind of small long-term memory growth from memoization of many queries over the lifetime of the app. So maybe that would help provide context for the information I'm trying to understand is :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When sql`foo${bar}` is parsed by the EcmaScript engine, it creates a frozen array ['foo', ''].

When control reaches that expression, the memoizer uses that array as a key in a WeakMap lookup to decide whether it has to run the lexer or whether it can reuse.

The frozen array is itself weakly held by the module scope. This happens in "12.2.9.3Runtime Semantics: GetTemplateObject ( templateLiteral )". Although the spec doesn't make this explicit, it has to be weakly held because otherwise

function foo() {}
while (1) {
  eval('  foo`...`  ')
}

would leak memory.

So after a full garbage collection, any memoized state for scopes that are not reachable by any re-enterable scope will be flushed.

So the max size of the memo-table post full garbage collection should be the count of sql`...` calls in loaded modules.

By "re-enterable scope" above, I mean that those used only in (function () { ... }()) module initializers and in eval-ed code should not contribute in the steady state.

The min size of the memo-table will be the count of those uses that have been evaluated.

The memory cost per entry in the memo table will be the 3 objects detailed earlier.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a bit of digging around and found tc39/ecma262#840 so some of my assumptions were misfounded. I'm going to replace the cache with an LRU cache which should put a constant limit on the memory overhead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the WeakMap with an LRU cache.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the LRU cache size in bytes set to / is it adjustable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache caps at 100 entries.

I'm going to assume that in the 3σ case, all uses fits on a 120 column line and have 12 ${...}s.
In this 3σ case, a full cache consumes 11.7kB.

I'm going to assume that the average case has 3 or fewer ${...}.
In the typical case, a full cache consumes 4.7kB.

If storing a single ASCII character in an array costs less than an 8B pointer, then it will be correspondingly cheaper.

I could probably save some space by joining the contexts into a string instead of storing characters in an array, but I don't have the tooling to do memory micro-benchmarks so can't say for sure.

The LRU cache size is not adjustable. I could make that adjustable if you have an idea for an API.

@dougwilson
Copy link
Member

Running npm test seems to fail on my machine for some reason:

$ npm test

> [email protected] test C:\Users\dougw\GitHub\nodejs-sqlstring
> node test/run.js

1..2
ok 1 test\unit\test-SqlString.js
  0 fail | 64 pass | 16 ms

not ok 2 test\unit\test-Template.js
  Failed: template tag date

    AssertionError: 'SELECT \'1999-12-31 19:00:00.000\'' == 'SELECT \'2000-01-01 00:00:00.000\''
        at runTagTest (C:\Users\dougw\GitHub\nodejs-sqlstring\test\unit\es6-Template.js:72:12)
        at Object.date (C:\Users\dougw\GitHub\nodejs-sqlstring\test\unit\es6-Template.js:83:5)
        at TestCase.run (C:\Users\dougw\GitHub\nodejs-sqlstring\node_modules\utest\lib\TestCase.js:30:10)
        at Collection._runTestCase (C:\Users\dougw\GitHub\nodejs-sqlstring\node_modules\utest\lib\Collection.js:44:6)
        at Collection.run (C:\Users\dougw\GitHub\nodejs-sqlstring\node_modules\utest\lib\Collection.js:23:10)
        at _combinedTickCallback (internal/process/next_tick.js:73:7)
        at process._tickCallback (internal/process/next_tick.js:104:9)
        at Module.runMain (module.js:606:11)
        at run (bootstrap_node.js:389:7)
        at startup (bootstrap_node.js:149:9)

  1 fail | 24 pass | 46 ms

npm ERR! Test failed.  See above for more details.

@dougwilson
Copy link
Member

Ok, so I poked around a bit there. Let me know when you have some docs, because I can't (at least in the small time I have right now) figure out how to specify the timezone setting when using the template strings, for example (or stringifyObject).

@dougwilson
Copy link
Member

In the end, would be great to see template tags land because they are very convenient.


const one = valueArray[i];
let valueStr = null;
if (one instanceof SqlFragment) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If users can construct the instance of these variables themselves, you need to be really careful with instanceof. For example, if a user uses an ORM that has mysql under the hood, which depends on sqlstring and then they are importing sqlstring in order to construct these instances, instanceof will end up being false unless the user is sure they depend on the exact version of sqlstring as the underlying mysql and npm was able to dedup the two installs into a single install. In practice, this has been a long term issue users encounter all the time, so if they can construct the instance, don't use instanceof check at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(mikesamuel): I will rework TypedString to carry a language identifier and a static method TypedString.same which can be used in lieu of instanceof.

Out of curiosity, does this occur when two dependencies loading two different versions of sqlstring, or is this a result of { "bundleDependencies": true" } in package.json?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both. For example, if a user uses an ORM that has mysql under the hood, which depends on sqlstring and then they are importing sqlstring in order to construct these instances, instanceof will end up being false unless the user is sure they depend on the exact version of sqlstring as the underlying mysql and npm was able to dedup the two installs into a single install.

When there are two copies in play, all reference checks will fail between them because they are completely separate module instances. That is why instanceof checks fail. You can see all this discussion from when the "toSqlString" was implemented as that was oroginally going to use instanceof.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, removed SqlFragment and Identifier in favor of SqlString.raw and SqlString.identifier.

}
valueStr = one.toString();
needsSpace = i + 1 === nValues;
} else if (one instanceof Identifier) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same instanceof comment here as above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer relevant to this PR, but I rewrote TypedString in npmjs.com/package/template-tag-common to make it easy to avoid instanceof.

@dougwilson
Copy link
Member

Is the following the expected output?

> SqlString.sql`SELECT * FROM foo WHERE bin = "${Buffer.from("1f3870be274f6c49b3e31a0c6728957f","hex")}"`.toString()
SELECT * FROM foo WHERE bin = "8p�\'OlI��\Z
                                           g(�"

@dougwilson
Copy link
Member

dougwilson commented Jan 26, 2018

Also not sure if I'm using this right: was just trying to put in a NULL:

> SqlString.sql`SELECT * FROM foo WHERE bin = ${null}`.toString()
SELECT * FROM foo WHERE bin = ?

@dougwilson
Copy link
Member

The following seems to throw an error, even though it's valid SQL syntax:

> SqlString.sql`UPDATE scores SET score=score--1`.toString()
Error: Expected delimiter at "UPDATE scores SET score=score--1"

Copy link
Author

@mikesamuel mikesamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for giving it a look.
I responded to comments with TODOs and will respond again as they're done.

Thanks so much! I would live to play around this to
review, but not 100% sure about all the different
cases around in here.

When you get a chance, can
you write up documentation for the README so
myself (and everyone else) knows this exists and
how to make use of it?

TODO(mikesamuel): Document API changes

I'm really confused why there
is a SQL parser in here and concerned that there are
going to be lots of edge cases with issues and I'm
not super interested in maintaining an actual SQL
parser as part of this. Why is the parser necessary?
I would almost expect this to act just like ? does
today, or what is happening here?

There seem to be two separable questions:

  1. Why is a sql parser needed?
  2. Why is SqlString.sql`FOO ${bar} BAZ` not just
    syntactic sugar for SqlString.format('FOO ? BAZ', [bar])?

I expect (1) is more controversial so I'll deal with that first.


.format might benefit from taking into account context.

.../sqlstring > node
> console.log(sqlstring.format(` SELECT '?' `, ['; DROP TABLE T -- ']))
 SELECT ''; DROP TABLE T -- ''
undefined

This is a footgun. The problem occurs because the author might have been thinking that the '?' specifies a string, and adds quotes. Those quotes interact badly with the quotes inserted by sqlstring.format.

In most cases, a developer will catch this in testing when ? gets filled with something that causes a parse error. It may not for holes that are often filled with empty strings or null values.

Lexing to find delimited string boundaries allows us to identify when a substitution point ( ? in prepared statement syntax ) appears inside quotes which also allows us to switch automagically between escape and escapeId semantics.

I've had success preventing injection attacks with similar systems. SQL is a simpler language, so the approach may be overkill, but the case above worries me and the parser is correspondingly simpler.

and concerned that there are going to be lots of edge cases

Language corner cases are a valid concern, but there is a separation of concerns here that we can take advantage of.

An attacker attempts to use obscure language corner cases to escape containment, but the template author has no such incentives.

This lexer only applies to the portion of the SQL string written by the trusted author though, and the escapers that contain untrusted strings are unchanged by this PR.


(2) is an excellent question and the answer is because I did not thoroughly investigate extending the existing API before implementing. I think I can rework this CL to adapt and extend format in a way that should be backwards compatible except w.r.t. odd cases like the SELECT '?' case above.

The goal could be to provide two idioms:

SqlString.sql({ stringifyObjects, zone}) returns SqlString.sql curried with options.

Used as a tag, SqlString.sql`...`callsSqlString.format` under the hood.

Combined, those would enable SqlString.sql({ stringifyObjects, zone })`...`.

lib/Template.js Outdated
* @param {string} content
* @constructor
*/
module.exports.Fragment = stringWrapper();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important fact is that instanceof Fragment is separate from instanceof Identifier.

A fragment is semantically a series of SQL tokens.
An identifier is a run of characters that can appear between `...`.

When the runtime support ES6, these will be represented as subclasses of require('template-tag-common').TypedString which allows associating a runtime type with a sub-language so that RTTI can be used to avoid over-escaping.

Without ES6, these are meant to just be valid right-hand-sides to the instanceof operator.

lib/Template.js Outdated
return false;
};

function stringWrapper() {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explained when discussing Fragment and Identifier below.


const one = valueArray[i];
let valueStr = null;
if (one instanceof SqlFragment) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(mikesamuel): I will rework TypedString to carry a language identifier and a static method TypedString.same which can be used in lieu of instanceof.

Out of curiosity, does this occur when two dependencies loading two different versions of sqlstring, or is this a result of { "bundleDependencies": true" } in package.json?

}
valueStr = one.toString();
needsSpace = i + 1 === nValues;
} else if (one instanceof Identifier) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

* Template tag function that contextually autoescapes values
* producing a SqlFragment.
*/
const sql = memoizedTagFunction(computeStatic, interpolateSqlIntoFragment);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underlying the cache is a WeakMap, so it should adapt to memory pressure.

The maximum memory usage should scale with the number of uses of the tag that appear in the source code.

if (require('process').env.npm_lifecycle_event === 'test') {
// Expose for testing.
// Harmless if this leaks
sql.makeLexer = makeLexer;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(msamuel): separate out into TemplateLexer.js and require it directly.

@mikesamuel
Copy link
Author

Ok, so I poked around a bit there. Let me know when you have some docs, because I can't (at least in the small time I have right now) figure out how to specify the timezone setting when using the template strings, for example (or stringifyObject).

I neglected to provide a way to thread those through.

TODO(mikesamuel): allow sql(optionsObject) to specify a tag handler that closes over options including timezone and stringifyObject.
Fix the test that fails in non GMT contexts.
Run tests in two or more TZ=... contexts before submitting PRs.

@dougwilson
Copy link
Member

Thanks for your comments! I don't see any replies for the bugs(?) I think I found and would love to hear back on those. Also, not sure how you made that second set of line comments but there is no reply button for them so I cannot reply to them.

@mikesamuel
Copy link
Author

I don't see any replies for the bugs(?) I think I found and would love to hear back on those.

Looking at this again. Will address those and others shortly.

Also, not sure how you made that second set of line comments but there is no reply button for them so I cannot reply to them.

Hmm. That's odd. I'll try to leave things as file comments.

'|' +
(
// Run of non-comment non-string starts
'(?:[^\'"`\\-/#]|-(?!-' + WS + ')|/(?![*]))'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dougwilson, The addition of WS fixes the failure to lex X--1 example you found.

@@ -0,0 +1,107 @@
// A simple lexer for SQL.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dougwilson This separate file is now imported by the unittest instead getting rid of the conditional export.

if (delimiter) {
result += escapeDelimitedValue(value, delimiter, timeZone);
} else {
result += SqlString.escape(value, stringifyObjects, timeZone);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The responsibility for escaping is almost entirely delegated to ../SqlString.js now.

return SqlString.escapeId(String(value)).replace(/^`|`$/g, '');
}
if (Buffer.isBuffer(value)) {
value = value.toString('binary');
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found no good way to use an X"..." style syntax the way format/escape do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this insert the data into the database without silent data loss for all possible values? If it's not possible to do the right escaping, we probably don't want to end up formatting silently for data loss, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. https://dev.mysql.com/doc/refman/5.7/en/string-literals.html seems relevant

[_charset_name]'string' [COLLATE collation_name]

so it seems to depend on whether collations can be lossy even if charset decoding doesn't substitute U+FFFD or the like.

https://dev.mysql.com/doc/refman/5.7/en/adding-collation-simple-8bit.html seems to suggest they can be. The custom <collation name="latin1_test_ci"> at that link seems to downgrade to LATIN. Default collations can be associated with columns, so there may be no cues in the text.

package.json Outdated
@@ -18,6 +18,9 @@
"sql escape"
],
"repository": "mysqljs/sqlstring",
"dependencies": {
"template-tag-common": "2.0.1"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumped the major version because I reworked it to allow efficiently threading options objects through to address my failure to thread timeZone and stringifyObjects properly.

assert.equal(tokens('SELECT ```', '`'), '`,_');
assert.equal(tokens('SELECT `\\`', '`'), '`,_');
},
'comment': function () {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cruft. Will remove.

'date': function () {
runTagTest(
`SELECT '2000-01-01 00:00:00.000'`,
() => sql({ timeZone: 'GMT' })`SELECT ${new Date(Date.UTC(2000, 0, 1, 0, 0, 0))}`);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how timezones can thread through.

runTagTest(
'SELECT "\x1f8p\xbe\\\'OlI\xb3\xe3\\Z\x0cg(\x95\x7f"',
() =>
sql`SELECT "${Buffer.from("1f3870be274f6c49b3e31a0c6728957f","hex")}"`
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case incremental diffs aren't working, I added this and the following two testcases to address other bugs that you found.


// If we're on a Node runtime that should support ES6, run the ES6 tests.
var nodeVersion = process.env.npm_config_node_version;
if (/^[0-5][.]/.test(nodeVersion || '')) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(mikesamuel): Is there a common place I can put this old-ES-engine-test-skipping machinery?

'double quest marks passes pre-escaped id': function () {
var sql = SqlString.format(
'SELECT * FROM ?? WHERE id = ?',
[SqlString.identifier('table'), 42]);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't quite put my finger on why, but it seems like if SqlString.identifier is an API, this is important for some kind of symmetry.

@mikesamuel
Copy link
Author

The test failures seem to be related to npm run-script of eslint and test-ci.

I'll tackle that in the next commit but probably won't push anything until tomorrow.

@mikesamuel
Copy link
Author

The remaining Travis CI failures seem to be in test-{Lexer,Template} because the following test is not working on Node runtimes with version 0.x.x:

var nodeVersion = process.env.npm_config_node_version;
if (/^[0-5][.]/.test(nodeVersion || '')) {

I don't know enough about historical oddities of Node runtimes, so I'll have to figure out how to do a robust Node version test.

I'll see if https://www.npmjs.com/package/check-node-version works on older node tomorrow.

@mikesamuel
Copy link
Author

I discovered npx which lets me test with various versions.

$ npm install --no-save npx
$ ./node_modules/.bin/npx [email protected] test/run.js
1..3
ok 1 test/unit/test-Lexer.js
  Skipping ES6 tests for node_version v0.12.18

ok 2 test/unit/test-SqlString.js
  0 fail | 72 pass | 11 ms

ok 3 test/unit/test-Template.js
  Skipping ES6 tests for node_version v0.12.18
  0 fail | 3 pass | 1 ms

@mikesamuel
Copy link
Author

Tests run green now.

I've looked over the coverage report. The main sticky point there is

// lib/Template.js
try {
  module.exports = require('./es6/Template');
} catch (ignored) {
  // ES6 code failed to load.
  ...
}

I added tests for the missing branch but those won't be reflected in the coverage report since it does not, IIUC, union coverage from runs on multiple node versions.

@dougwilson
Copy link
Member

Sorry I've been busy these past 2 days. The coverage is definitely a union as reported to the PR status -- I maintain many modules that have separate code paths based on versions. Looking at the missing 2.2% it's lines not covered in any Node.js version run.

@mikesamuel
Copy link
Author

The coverage is definitely a union as reported to the PR status

Ah. I see https://coveralls.io/builds/15290075/source?filename=lib%2FTemplate.js
Nice!

@mikesamuel
Copy link
Author

it's lines not covered in any Node.js version run.

It was unreachable since the lexer patterns will all fall back to matching the empty string. Fixed.

@dougwilson
Copy link
Member

It was unreachable since the lexer patterns will all fall back to matching the empty string. Fixed.

No, it was as I said at the time I made comments. I'm not sure why you are assuming I'm making comments on changes you made after I made comments. Or am I missing something here? You're saying all 2.2% of those uncovered lines where from the lexer pattern fallback stuff?

lib/SqlString.js Outdated
if (QUALIFIED_ID_REGEXP.test(sqlString)) {
return sqlString;
} else {
throw new TypeError();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this provide some kind of message / description that will assist the dev who encounters this thrown error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

      throw new TypeError(
        'raw sql reached ?? or escapeId but is not an identifier: ' +
        sqlString);

lib/Template.js Outdated
@@ -0,0 +1,32 @@
/* eslint no-unused-vars: "off" */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 100% necessary? If it's possible to not override eslint, please try not to. Otherwise, add comments somewhere in the code explaining why it's necessary so we know for the future :) And ideally if it's possible to scope the override to only the specific line, that is better for the future because line scope probably doesn't mean comment (since it is likely to be more obvious) and it will continue to protect this mistake in the rest of the file 👍

Copy link
Author

@mikesamuel mikesamuel Jan 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can definitely do line overrides, or just get rid of the formal parameters if you're ok with calledAsTemplateTagQuick.length differing depending on whether the fallback is what's used.

What do you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure. I feel like they aren't necessary, but maybe I'm not seeing something here. What would the users encounter if the property differed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. I just bias towards minimizing programmatically visible differences.

Some code might treat zero-arity functions as a thunk, but I don't know why these functions would reach code that does.

@mikesamuel
Copy link
Author

No, it was as I said at the time I made comments. I'm not sure why you are assuming I'm making comments on changes you made after I made comments. Or am I missing something here? You're saying all 2.2% of those uncovered lines where from the lexer pattern fallback stuff?

You're not missing anything, and I'm not saying that.

@@ -0,0 +1,7 @@
// If we're on a Node runtime that should support ES6, run the ES6 tests.
if (/^v?[0-5][.]/.test(process.version)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than version detect, can the tests simply skip if loading the Lexer fails? Or at least use some kind of feature detection instead of version sniffing? I remember in the io.js days and post io.js Node.js core constantly said never try to version sniff in the code, only feature detect. There are also things like Chakra Node there days too. Version skipping in the main mysql package is all done using feature detection.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one of the source files fails to load because of an error in the source when read by an ES>=6 interpreter, then npm test would not alert us. npm run-script lint does give us some confidence here, but would not alert us if there was a failure like

const A_REGEXP = new RegExp('[invalid input to RegExp constructor');

Putting a stake in the ground seemed the safest strategy for test code, and degrading gracefully seemed the best for production code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense not to just abort if the require fails. You can just add es6 feature detect here instead, then 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Good point. Were you thinking of something like

var es6Compatible = true;
try {
  eval('const [ x ] = [ 1234 ];');
} catch (ignored) {
  es6Compatible = false;
}

But I I suppose that would interact badly if tests were run under FLAG_disallow_code_generation_from_strings.

TODO(mikesamuel): Add a simple es6 canary file that can be required inside a try block.

@dougwilson
Copy link
Member

You're not missing anything, and I'm not saying that.

Very sorry, I guess I just misunderstood what you're trying to say 😂 So what are you trying to say?

@mikesamuel
Copy link
Author

Very sorry, I guess I just misunderstood what you're trying to say 😂 So what are you trying to say?

I saw your comment just after I'd convinced myself that the last missed branch was unreachable, and assumed I hadn't seen it earlier since it hadn't been there.

I was focused on particular lines in files, and did not intend to make a claim about percentages though that is a reasonable interpretation of what I wrote.

@dougwilson
Copy link
Member

P.S. thanks so much for all this work ! I think I'm finally understanding why you're adding the lexer and it seems like the old .format would also benefit from the added context information, but the lexer is written in es6. There doesn't seem to be anything in the lexer itself that requires es6 to function, though. One day it would be sweet to post that back to es5 and add to .format so the es6 template can be sugar and everyone can benefit from the additional context / protection that the lexer is enabling instead of letting those users continue to footgun 😂

I need to get going and I'll be back in a few days to continue to review and learn from the code. There is a lot of code, especially in the lexer that is new to me, at least :) Also something to think about would be if you'd be open to committing to helping maintain this stuff for the next 1 year or not. No big deal if not, but just asking because I don't really need to fully understand it prior to merging, for example, if I know you'll be around to help if issues come up 👍

@mikesamuel
Copy link
Author

Also something to think about would be if you'd be open to committing to helping maintain this stuff for the next 1 year or not.

I'm planning on doing a variety of open-source hardening things, so I can be available to walk early adopters through and fix bugs that shake out. I'm unlikely to have reliable availability outside of GMT+5 working hours though.

One day it would be sweet to post that back to es5

It should be fairly straightforward.

Trying to recast format in terms of machinery currently provided by Template though should happen after we have early adopters' experiences to inform us.

I need to get going and ...

Ttyt.

@dougwilson
Copy link
Member

Can you make changes to the new dependency template-tag-common so it does not have a fragile installation? It has dependencies with ">=" which means when new major versions of those are published they'll get installed, likely breaking installs of this module that used to work.

@mikesamuel
Copy link
Author

* Template tag function that contextually autoescapes values
* producing a SqlFragment.
*/
const sql = memoizedTagFunction(computeStatic, interpolateSqlIntoFragment);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache caps at 100 entries.

I'm going to assume that in the 3σ case, all uses fits on a 120 column line and have 12 ${...}s.
In this 3σ case, a full cache consumes 11.7kB.

I'm going to assume that the average case has 3 or fewer ${...}.
In the typical case, a full cache consumes 4.7kB.

If storing a single ASCII character in an array costs less than an 8B pointer, then it will be correspondingly cheaper.

I could probably save some space by joining the contexts into a string instead of storing characters in an array, but I don't have the tooling to do memory micro-benchmarks so can't say for sure.

The LRU cache size is not adjustable. I could make that adjustable if you have an idea for an API.

* adding xmlns="http://www.w3.org/2000/svg" to the <svg> tag
* adding <defs><style>...</style></defs> as the first child
of the <svg> element with the contents of
https://raw.githubusercontent.com/tabatkins/railroad-diagrams/gh-pages/railroad-diagrams.css
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code review tools doesn't show image sources by default so see this if you want the diagram's derivation.

@mikesamuel
Copy link
Author

I answered your questions on the cache, and did all the remaining TODOs.

If the cache is a concern, I could try benchmarking with and without. If that sounds useful, do you have any preferred way of doing micro-benchmarks in JS?

https://nodesecroadmap.fyi/chapter-7/query-langs.html describes
this approach as part of a larger discussion about library support
for safe coding practices.

This is one step in a larger effort to enable

connection.query`SELECT * FROM T WHERE x = ${x}, y = ${y}, z = ${z}`(callback)

and similar idioms.

This was broken out of mysqljs/mysql#1926
@mikesamuel
Copy link
Author

I rebased around releases 2.3.{0,1} which required a git push -f. That required one manual merge around the new 'triple question marks are ignored' and a test I added in test-SqlString.js

@mikesamuel
Copy link
Author

Ping?

1 similar comment
@mikesamuel
Copy link
Author

Ping?

@mikesamuel
Copy link
Author

@dougwilson If this is effectively dead, please let me know and I'll close out the PR.

@dougwilson
Copy link
Member

I am sorry I have been bad about providing updates. I have been working on rewriting the syntax to be es5 instead of es6 so it can be exposed to everyone and not restricted to only use in template strings (though that would still be supported). There are also some edge cases I fixed as well. I'm going to push them up (as separate commits -- not altering your existing commits) to the branch in this PR so we can review them to make sure they are correct. I know you didn't want to translate it to es5 from our earlier conversations, so I figured I'd help with that effort since I'm the one of us two who is interested in that support 👍

@mikesamuel
Copy link
Author

Thanks for explaining. I wasn't averse to translating to es5. I was just worried that doing that in a single PR might be too much, but it sounds like you've got it sorted.

@mikesamuel
Copy link
Author

Fyi, I needed something like this so I wrote safesql to provide template tags for MySQL and Postgres based in part on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants