feat: contract log query #1601

zone117x · 2023-03-31T18:30:26Z

This PR adds two new query params to the /extended/v1/contract/<principal>/events endpoint.

?filter_path=<jsonpath expression>:
Optional jsonpath expression to select only results that contain items matching the expression.
?contains=<json object>:
Optional stringified JSON to select only results that contain the given JSON.

Rationale

Clarity smart contracts are able to generate structured log outputs during the execution of a contract-call or contract-deploy transaction. For more context see the Clarity print docs. These contract logs can then be ingested by an event-observer (such as the API) to store and serve contract-specific data. Today we have several examples of this:

The bns contract emits logs used by the API to process, track, and serve clients BNS information like ownership, zonefiles, etc.
The pox contract emit logs used by the API to store and track stacking related data, which is used in account balance endpoints (i.e. locked balance), Rosetta queries, endpoints for Stacking pool operators, etc.
The send-many-stx contract, used by essentially all exchanges, performs batch stx-transfers and uses contract logs to emit a corresponding memo, which the API stores and returns in an endpoint also used by exchanges to track when their wallets have received a batch stx-transfer and the associated memo.
SIP-019 defines contract logs that FT/NFT tokens can implement in order to issue metadata update notifications.

These are some of the contracts and logs that this API and others currently have special handling for. Typically, for each of these contracts, the log events have unique code implemented to denormalize the log payload, store in a new table, and new endpoints + sql queries.

There are many more deployed contracts which have their own application-specific usage of contract logs, with their own unique payloads. The API doesn't and cannot reasonably implement custom handling for all of the current and future contracts in the same way it has for the contracts listed above. Currently, clients must use the /extended/v1/contract/{contract_id}/events endpoint. This simply returns the latest N logs for a given contract. Clients must paginate through these requests, and manually parse out and filter the logs they need. Even for the contracts that already receive special treatment (e.g. pox, bns), there are requests for new endpoints with different query criteria which would require us to implement new endpoints and sql queries.

In the Ethereum world, the ability to emit and query structured logs is a critical element of their app ecosystem. Token standards that you may be familiar with (e.g. ERC20) specify the logs that contracts must emit when token operations happen. The eth_getLogs RPC endpoint provides a mechanism for efficiently querying those logs. For example "give me all transfers for a given FT associated with a given recipient address".

In Stacks, we have have the building blocks for offering similar capabilities.

Contact log query implementation

Prior to this PR, contract logs (i.e. structured Clarity print outputs) were stored in a table in the contract_logs in their original consensus-serialized binary encoding. This encoding made it essentially impossible to perform queries against the contents of the log object. Postgres supports a storing arbitrary structured documents in a jsonb column type. This column type also supports a few types of advanced indexes (e.g. GIN(jsonb_ops) and GIN(jsonb_path_ops)) that make it possible perform efficient querys against the arbitrary JSON.

So what we can do is 1) decode the binary Clarity buffers into an object, 2) encode that into a JSON object, 3) store that in a new jsonb column in the contracts_log table, then 4) implement API endpoint(s) that allow more precise queries against these logs.

This PR implements the above. One of the design decisions was to implement yet-another way to encode Clarity values into JSON. The existing schemas we have are lossless (you can reverse the JSON back into the original Clarity value), however, that comes with overhead from having to store Clarity type information (because there is not a 1-1 mapping between JSON primitives and Clarity primitives), so that JSON is less readability and takes up more space. In our use-case, we aren't necessarily concerned with preserving exact Clarity typing, rather we want the JSON to be easy-to-query, readable, space-efficient, index-efficient, and have good interoperability with JSON-based workflows (e.g. jsonpath expressions). Here is the spec for how the Clarity values are encoded into a JSON object in this PR:

OptionalSome and ResponseOk are unwrapped (i.e. the value is encoded directly without any nesting).
OptionalNone is encoded as json null.
ResponseError is encoded as an object with a single key _error which is set to the unwrapped error value.
Buffers are encoded as an object containing the key hex with the hex-encoded string as the value, and the key utf8 with the utf8-encoded string as the value. When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD � will be used to represent those errors. Both of these are included because Clarity Buffers are used for storing both binary data and string data (e.g. BNS names).
Ints and UInts that are in the range of a safe js integers are encoded as numbers, otherwise they are encoded as string-quoted base-10 integers.
Booleans are encoded as booleans.
Principals are encoded as strings, e.g. <address> or <address>.<contract_name>.
StringAscii and StringUtf8 are both encoded as regular json strings.
Lists are encoded as json arrays.
Tuples are encoded as json objects.

Usage

Here's an example of how the subnets-nft-demo app can be refactored. The app fetches a list of all events, then on the client decodes of the Clarity values and filters for specific events. This can now be done in a single query leveraging the new filter_path query param. This accepts a jsonpath expression.

Previous code:
https://github.com/hirosystems/subnets-nft-demo/blob/34e433a2d2893f36e1767ce635ee280baf1acbf6/src/stacks/apiCalls.ts#L96-L111

Example of doing the same thing with a single fetch using a jsonpath expression filter:

const url = new URL(`${L2_URL}/extended/v1/contract/${L2_SUBNET_CONTRACT_ID}/events`);
url.searchParams.set('filter_path', `&?(
  @.event == "withdraw" && 
  @.type == "stx" && 
  @.sender == "${address}"
)`);
const req = await fetch(url);

Example of using the new contains param to perform a similar query:

const url = new URL(`${L2_URL}/extended/v1/contract/${L2_SUBNET_CONTRACT_ID}/events`);
url.searchParams.set('contains', JSON.stringify({
  event: 'withdraw',
  type: 'stx',
  sender: address,
}));
const req = await fetch(url);

Here's another example of using jsonpath expression filter to fetch recent BNS name-revoke and name-renewal events:

const bnsContract = 'SP000000000000000000002Q6VF78.bns';
let url = new URL(`${API_HOST}/extended/v1/contract/${bnsContract}/events`);
url.searchParams.set('filter_path', `$.attachment.metadata?(
  @.op == "name-revoke" ||
  @.op == "name-transfer" ||
  @.op == "name-renewal"
)`);
const req = await fetch(url);

Here's an example of using the new contains param. This is also a filter, but it takes a JSON object and returns only the contract log events that contain the given object. Here's an example of fetching recent BNS name-revoke events:

const bnsContract = 'SP000000000000000000002Q6VF78.bns';
let url = new URL(`${API_HOST}/extended/v1/contract/${bnsContract}/events`);
let contains = { attachment: { metadata: { op: 'name-renewal' } } };
url.searchParams.set('contains', JSON.stringify(contains));
const req = await fetch(url);

Considerations

The jsonpath expressions supported in Postgres can include relatively complex operations. For example, recursive key-value lookups, regex, math operations, type coercions, and more. Many of these cannot take advantage of the postgres JSON-specific indexes, which can result in unreasonably expensive queries. The vast majority of expected use-cases only need simple jsonpath expressions which my testing so far have shown to be reasonably efficient. So we need to determine when a given jsonpath expression is "too complex" and reject the request. This PR has a very simple regex-based version of that implemented, however, these regexes are easy to trick and bypass. We need to validate against the actual AST of a jsonpath expression. Existing jsonpath parsing libraries in js do not support the postgres flavor of expressions, so I'm in the process of switching the regex-based validation to jsonpath-pg. This is a library I've wrote which compiles the grammar and lexer C code from the postgres codebase into WASM and exposes the AST object. This can be used to reliable determine the complexity of an expression.

Alternatives

The capabilities described above could potentially be implemented with other tools. For example, no-sql solutions like GraphQL or MongoDB. However, I'd like to explore the possibility of getting this approach to work before considering adding new systems/layers into the project's tech stack.

Upgrading + database migration

This PR changes the sql schema and adds a new non-nullable column to an existing table. Typically, we'd modify the existing migration tables (which breaks existing dbs), and release this as a major version that requires an event-replay. This is easy for API contributors, however, these kinds of breaking changes are difficult for deployment, both internal and external, because event-replays increasingly take more time and resources.

So this PR is experimenting with an more advanced sql schema migration which is compatible with existing deployments and does not require a major version bump or event-replay. Migration libraries (including the one we use) tend not to support advanced table manipulation like what this PR requires, so a new function is called directly after the regular migrations in order to perform the "advanced" migrations. In this case, it takes around half an hour to run this migration because it involves querying and updating every row in the contract_logs table (which internally postgres treats as a DELETE and INSERT, without the ability to batch).

In a future release when a breaking change is strictly required, the "advanced migration" function for this case can be dropped.

TODO

Endpoint integration tests
Document the migration approach that avoids needing an event-replay
Document the new Clarity value JSON-compact encoding format implemented that was implemented power these filters

…ct/<principal>/events` endpoint

github-actions · 2023-03-31T18:34:19Z

🚀 Deployed on https://deploy-preview-1601--stacks-blockchain-api.netlify.app

codecov · 2023-03-31T18:35:38Z

Codecov Report

Merging #1601 (874ad8b) into develop (5d15bc8) will decrease coverage by 55.96%.
The diff coverage is 46.25%.

❗ Current head 874ad8b differs from pull request most recent head 6d94c0e. Consider uploading reports for the commit 6d94c0e to get more accurate results

@@             Coverage Diff              @@
##           develop    #1601       +/-   ##
============================================
- Coverage    77.41%   21.45%   -55.96%     
============================================
  Files           78       78               
  Lines        11161    11302      +141     
  Branches      2487     2521       +34     
============================================
- Hits          8640     2425     -6215     
- Misses        2403     8198     +5795     
- Partials       118      679      +561

Impacted Files	Coverage Δ
src/datastore/common.ts	`100.00% <ø> (ø)`
src/datastore/pg-store.ts	`0.90% <0.00%> (-91.70%)`	⬇️
src/api/routes/contract.ts	`20.75% <6.25%> (-68.99%)`	⬇️
src/datastore/migrations.ts	`40.40% <26.00%> (-47.84%)`	⬇️
src/api/query-helpers.ts	`20.74% <60.00%> (-58.31%)`	⬇️
src/helpers.ts	`37.30% <80.55%> (-33.35%)`	⬇️
src/datastore/pg-write-store.ts	`1.26% <100.00%> (-85.98%)`	⬇️

... and 52 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

rafaelcr

This looks awesome. Just left a couple comments and some discussion items.

rafaelcr · 2023-04-10T16:31:19Z

migrations/1680181889941_contract_log_json.js

+
+/** @param { import("node-pg-migrate").MigrationBuilder } pgm */
+exports.down = pgm => {
+  pgm.dropIndex('contract_logs', 'value_json_path_ops_idx');


This index is never created, is this an old line?

rafaelcr · 2023-04-10T19:37:13Z

src/datastore/common.ts

@@ -1379,6 +1379,7 @@ export interface SmartContractEventInsertValues {
  contract_identifier: string;
  topic: string;
  value: PgBytea;
+  value_json: string;


It would be best if we set this as string | null IMO to avoid storing empty strings

rafaelcr · 2023-04-10T19:40:01Z

src/tests/cv-compact-json-encoding-tests.ts

+  /*
+  test.skip('generate', () => {
+    const result = clarityCompactJsonVectors.map(([, hexEncodedCV, compactJson]) => {
+      const encodedJson = clarityValueToCompactJson(hexEncodedCV);
+      const reprResult = decodeClarityValueToRepr(hexEncodedCV);
+      return [reprResult, hexEncodedCV, JSON.stringify(encodedJson)];
+    });
+    console.log(result);
+  });
+  */


rafaelcr · 2023-04-10T19:40:08Z

src/tests/jsonpath-tests.ts

+  /*
+  test('generate vector data', () => {
+    const result = postgresExampleVectors.map((input) => {
+      const complexity = containsDisallowedJsonPathOperation(input);
+      return [input, complexity ? complexity.operation : null];
+    });
+    console.log(result);
+  });
+  */


rafaelcr · 2023-04-10T19:45:04Z

src/datastore/migrations.ts

+  }
+}
+
+async function complete_1680181889941_contract_log_json(


Is there a pressing issue that would require us to ship this feature with this kind of migration? This code looks good to me (and I like this experimentation as you specified in the PR desc), but I worry that it could create some problems if we want to add newer migrations that rely on the new columns or something. IMO it would be easier and more maintainable to just ship this as a regular new migration and rely on replay, or to explicitly mark this code as a "patch" somehow which we could drop once we switch to a new v8 release.

What do you think?

…ty and ban some operations

zone117x · 2024-01-17T12:07:09Z

I'd like to get more feedback on this feature to prioritize work on it. My sense is that if this were available, then A) contract authors would take more advantage of structured logs (i.e. print statements) and B) apps would leverage this API feature to make more straightforward and efficient queries.

zone117x added 6 commits March 30, 2023 15:10

feat: initial pass at jsonpath expression constraints

0330a51

feat: add clarity value jsonb column to contract_logs table

81e8490

chore: simplify pg cursor

838fa4e

feat: encode contract_log clarity values to json and insert in pg

ab31da6

feat: support filter_path and contains json operators in `/contra…

565a26f

…ct/<principal>/events` endpoint

docs: update contract events endpoint docs

65cba05

github-actions bot temporarily deployed to pull request March 31, 2023 18:34 Inactive

chore: cleanup

3829a1e

github-actions bot temporarily deployed to pull request March 31, 2023 18:38 Inactive

github-actions bot temporarily deployed to commit March 31, 2023 18:38 Inactive

chore: cleanup 2

1b6ae00

github-actions bot temporarily deployed to commit March 31, 2023 18:45 Inactive

github-actions bot temporarily deployed to pull request March 31, 2023 18:45 Inactive

feat: encode clarity ints as json numbers where possible

eba8472

github-actions bot temporarily deployed to pull request April 3, 2023 13:07 Inactive

github-actions bot temporarily deployed to commit April 3, 2023 13:08 Inactive

fix: jsonpath disallowed operations

874ad8b

github-actions bot temporarily deployed to pull request April 3, 2023 14:23 Inactive

github-actions bot temporarily deployed to commit April 3, 2023 14:23 Inactive

zone117x requested a review from rafaelcr April 7, 2023 17:49

rafaelcr reviewed Apr 10, 2023

View reviewed changes

zone117x self-assigned this Apr 13, 2023

feat: use jsonpath-pg lib to accurately determine expression complexi…

6d94c0e

…ty and ban some operations

github-actions bot temporarily deployed to commit April 14, 2023 13:03 Inactive

zone117x added the P3 Priority 3 There are workarounds, or the functionality is secondary label Jun 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: contract log query #1601

feat: contract log query #1601

Uh oh!

zone117x commented Mar 31, 2023 •

edited

Loading

Uh oh!

github-actions bot commented Mar 31, 2023 •

edited

Loading

Uh oh!

codecov bot commented Mar 31, 2023 •

edited

Loading

Uh oh!

rafaelcr left a comment

Uh oh!

rafaelcr Apr 10, 2023

Uh oh!

rafaelcr Apr 10, 2023

Uh oh!

rafaelcr Apr 10, 2023

Uh oh!

rafaelcr Apr 10, 2023

Uh oh!

rafaelcr Apr 10, 2023

Uh oh!

zone117x commented Jan 17, 2024

Uh oh!

Uh oh!

feat: contract log query #1601

Are you sure you want to change the base?

feat: contract log query #1601

Uh oh!

Conversation

zone117x commented Mar 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rationale

Contact log query implementation

Usage

Considerations

Alternatives

Upgrading + database migration

TODO

Uh oh!

github-actions bot commented Mar 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Mar 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

rafaelcr left a comment

Choose a reason for hiding this comment

Uh oh!

rafaelcr Apr 10, 2023

Choose a reason for hiding this comment

Uh oh!

rafaelcr Apr 10, 2023

Choose a reason for hiding this comment

Uh oh!

rafaelcr Apr 10, 2023

Choose a reason for hiding this comment

Uh oh!

rafaelcr Apr 10, 2023

Choose a reason for hiding this comment

Uh oh!

rafaelcr Apr 10, 2023

Choose a reason for hiding this comment

Uh oh!

zone117x commented Jan 17, 2024

Uh oh!

Uh oh!

zone117x commented Mar 31, 2023 •

edited

Loading

github-actions bot commented Mar 31, 2023 •

edited

Loading

codecov bot commented Mar 31, 2023 •

edited

Loading