Commit aefa048

Arbitrary WebAPI JS instrumentation (#642)
* Add mdn-browser-compat-data
* js_instrument_modules as list
* Add mdn-browser-compat
* Pass a list of instrumentingFunctions
* Script to generate api data
* Working give or take
  Getting errors like
  OpenWPM: Error name: TypeError post_request_ajax.html:237:17
  OpenWPM: Error message: can't redefine non-configurable property "UNSENT" post_request_ajax.html:238:17
* Small naming cleanup
* Handle non-configurable properties
* Lint
* Add aspirational API
* Begin migration to new JSInstrumentationRequest interface.
* We build and mandate LogSettings.
* We have a new JSInstrumentatinRequest that everything runs through
* Preset, fingerprinting, will be specified in JSON
* Continue making progress
  Enum for Operation
* Begin implementing jsModuleRequest validation.
  Changing my mind - all validation and construction to be done python side. This will reduce JS overhead at runtime.
* Big cleanout- js-instrumentation work moving to python.
* Continue update to python js-instrumentation
* Lint
  Can't do all the things I want to with typing due to scope when content is loaded into page.
* noqa on wip jsinstrumentation file
* Begin updating existing js instrument tests.
* Small cleanups
* Fix naming in calling instrumentJS
* No display mode native for testing
* Restore py test file to orig.
* Support null propertiesToInstrument
* Re-work instrumentObject tests
* Clean-up text in test page.
* Add default to getLogSettings function
* Don't re-assign logSettings.propertiesToInstrument
* Revert "Don't re-assign logSettings.propertiesToInstrument"
  This reverts commit 87ccdab.
* Better assign propertiesToInstrument
* Small cleanup
* Make new logSettings object
* Prettify
* Small clean
* -- BREAK -- JS Rework complete
  With this commit, the JS side of this PR is complete. Tests are still failing as fingerprinting implementation has not been completed on the python side, but all test_js_* tests are passing due to the core JS API rework being in place.
* Write-out mdn compat data to js_instrumentation .py
* Dry out js test code
* Consolidate JS tests
* Finish missing renames, and add test js via browser_params
* pep8
* New files and failing tests.
* Add a json schema for js_instrument_modules
* Latest py tests
* Flake8
* Ongoing progress.
* More code, more tests.
* flake
* Rename mdn file
* Add latest tests - just implement fingerprinting.json
* flake8
* Add fingerprinting.json (incomplete)
  Mimetypes and plugins
* Correct logSettings property name
* Restore create_xpi as function
  Needed by manual_test
* Make explicit option for logging to console
* Process browser_params in task manager
* Start being able to pass browser_params to selenium
  Also update manual_test to use click
* Revert "Make explicit option for logging to console"
  This reverts commit c840fbc.
* Get manual_test working with browser_params
  From toplevel directory run: `python -m test.manual_test --selenium --browser-params --browser-params-file=debug_params.json`
* More robust test for simple fingerprinting output
  Can't guarantee order of string output
* Add timing information when testing
* Make recheck really fast.
  You'll never hit this recheck as it all happens before page load.
* Handle all inputs properly
* Debug with all window params instrumented
* Load xpi we just built
* Check for ff version support
* Save a bunch of properties
* Relax constraints on what we can instrument.
  Let failing happen during instrumentation by using subscript notation. Don't restrict to MDN list.
* Correct stringifying
* Better name example params, fix some bugs, sample a_f
  Some example browser_params - a_f is just working - but crashes on a page like google.com. g_l and m_z haven't been vetted yet.
* flake8
* Move example browser_params file out of harm's way
* Add failing test for regression I introduced.
* Fix for regression.
* Add simple mimeTypes and plugins to fingerprinting.
* Lint JS
* Rm mdn_browser_comat stuff no longer needed
* Remove example_browser_params
  They're not used in tests, were just for my testing.
* Load JS_INSTRUMENT_MODULES from JSON string
* Rename JS_INSTRUMENT_MODULES to JS_INSTRUMENT_SETTINGS
* Fixes #28 - Instrument all window.navigator properties.
* Finish removing unused mdn-compat pieces.
* EventID as a shadow variable
* Flake8
* Remove $ prefix and rename $instrumentionRequests -> jsInstrumentationSettings
* Rename jsInstrumentationRequests->jsInstrumentationSettings
* TS Lint
* Remove use of "request".
  Rename python side as per discussion with @englehardt.
  Privatize most methods
  Numpy docstrings for public methods
* Convert assertions to ValueErrors
* Rename file/folder and fingerprinting -> collection_fingerprinting
  file JSInstrumentation.py -> js_instrumentation/__init__.py
  collections have their own folder
* Clean-up naming in schema
* Add processing of json schema to documentation
* Rename js_instrumentation again and ref schema location
* Pass JSON not a js string
* Do copying to xpi in npm postbuild step
* Fix import in manual_test
* Revert "Pass JSON not a js string"
  This reverts commit 8eb4edb.
* Add titles to schema pieces
* Add docs for js_instrument_settings
* Bit more README cleanup
* Update README.md
  Co-authored-by: Steven Englehardt <[email protected]>
* Move updating schema docs section
* Add title
* Fix typo in mac-osx hyperlink
* Make the single-key dictionary clearer
* Remove versions from npm package files
* Clean up instrument_existing_window_property.html and js
  We're not using the js in two htmls now, so unify like other test files
* Fix pyside instrumentation test, add more clarification to README
* pyside test must instrument browser apis
* add more to readme to clarify instrumenting
* Use example.com and example.org as localDomains
* context-manage open, and flake8

Co-authored-by: Steven Englehardt <[email protected]>
1 parent 627f440 commit aefa048

62 files changed: +6862 additions, -4077 deletions

Some content is hidden (large commits have some content hidden by default).

README.md

Lines changed: 66 additions & 8 deletions
@@ -34,7 +34,8 @@ Table of Contents <!-- omit in toc -->
 * [Debugging the platform](#debugging-the-platform)
 * [Managing requirements](#managing-requirements)
 * [Running tests](#running-tests)
-* [Mac OSX](#mac-osx-limited-support-for-developers)
+* [Mac OSX](#mac-osx)
+* [Updating schema docs](#updating-schema-docs)
 * [Troubleshooting](#troubleshooting)
 * [Docker Deployment for OpenWPM](#docker-deployment-for-openwpm)
 * [Building the Docker Container](#building-the-docker-container)
@@ -80,9 +81,9 @@ After running the install script, activate your conda environment by running:
 ### Developer instructions

 Dev dependencies are installed by using the main `environment.yaml` (which
-is used by `./install.sh` script.
+is used by `./install.sh` script).

-You can install pre-commit hooks install the hooks by running `pre-commit install` to
+You can install pre-commit hooks install the hooks by running `pre-commit install` to
 lint all the changes before you make a commit.

 ### Troubleshooting
@@ -173,8 +174,22 @@ available [below](#output-format).
 with the exception of images.
 See: [Bug 634073](https://bugzilla.mozilla.org/show_bug.cgi?id=634073).
 * Javascript Calls
-* Records all method calls (with arguments) and property accesses for APIs
-of potential fingerprinting interest:
+* Records all method calls (with arguments) and property accesses for configured APIs
+* Set `browser_params['js_instrument'] = True`
+* Configure `browser_params['js_instrument_settings']` to desired settings.
+* Data is saved to the `javascript` table.
+* The full specification for `js_instrument_settings` is defined by a JSON schema.
+Details of that schema are available in [docs/schemas/README.md](docs/schemas/README.md).
+In summary, a list is passed with JS objects to be instrumented and details about how
+that object should be instrumented. The js_instrument_settings you pass to browser_params
+will be validated python side against the JSON schema before the crawl starts running.
+* A number of shortcuts are available to make writing `js_instrument_settings` less
+cumbersome than spelling out the full schema. These shortcuts are converted to a full
+specification by the `clean_js_instrumentation_settings` method in
+[automation/js_instrumentation.py](automation/js_instrumentation.py).
+* The first shortcut is the fingerprinting collection, specified by
+`collection_fingerprinting`. This was the default prior to v0.11.0. It contains a collection
+of APIs of potential fingerprinting interest:
 * HTML5 Canvas
 * HTML5 WebRTC
 * HTML5 Audio
@@ -184,8 +199,43 @@ available [below](#output-format).
 and `window.name` access.
 * Navigator properties (e.g. `appCodeName`, `oscpu`, `userAgent`, ...)
 * Window properties (via `window.screen`)
-* Set `browser_params['js_instrument'] = True`
-* Data is saved to the `javascript` table.
+* `collection_fingerprinting` is the default if `js_instrument` is `True`.
+* The fingerprinting collection is specified by the json file
+[fingerprinting.json](automation/js_instrumentation_collections/fingeprinting.json).
+This file is also a nice reference example for specifying your own APIs using the other
+shortcuts.
+* Shortcuts:
+* Specifying just a string will instrument
+the whole API with the [default log settings](docs/schemas/js_instrument_settings-settings-objects-properties-log-settings.md)
+* For just strings you can specify a [Web API](https://developer.mozilla.org/en-US/docs/Web/API)
+such as `XMLHttpRequest`. Or you can specify instances on window e.g. `window.document`.
+* Alternatively, you can specify a single-key dictionary that maps an API name to the properties / settings you'd
+like to use. The key of this dictionary can be an instance on `window` or a Web API.
+The value of this dictionary can be:
+* A list - this is a shortcut for `propertiesToInstrument` (see [log settings](docs/schemas/js_instrument_settings-settings-objects-properties-log-settings.md))
+* A dictionary - with non default log settings. Items missing from this dictionary
+will be filled in with the default log settings.
+* Here are some examples:
+```
+// Collections
+"collection_fingerprinting",
+// APIs, with or without settings details
+"Storage",
+"XMLHttpRequest",
+{"XMLHttpRequest": {"excludedProperties": ["send"]}},
+// APIs with shortcut to includedProperties
+{"Prop1": ["hi"], "Prop2": ["hi2"]},
+{"XMLHttpRequest": ["send"]},
+// Specific instances on window
+{"window.document": ["cookie", "referrer"]},
+{"window": ["name", "localStorage", "sessionStorage"]}
+```
+* Note, the key / string will only have it's properties instrumented. That is, if you want to instrument
+`window.fetch` function you must specify `{"window": ["fetch",]}`. If you specify just `window.fetch` the
+instrumentation will try to instrument sub properties of `window.fetch` (which won't work as fetch is a
+function). As another example, to instrument window.document.cookie, you must use `{"window.document": ["cookie"]}`.
+In instances, such as `fetch`, where you do not need to specify `window.fetch`, but can use the alias `fetch`,
+in JavaScript code. The instrumentation `{"window": ["fetch",]}` will pick up calls to both `fetch()` and `window.fetch()`.
 * Response body content
 * Saves all files encountered during the crawl to a `LevelDB`
 database de-duplicated by the md5 hash of the content.
@@ -537,7 +587,7 @@ in the test directory to run all tests:
 $ cd test
 $ py.test -vv

-See the [pytest docs](https://docs.pytest.org/en/latest/) for more information on selecting
+See the [pytest docs](https://docs.pytest.org/en/latest/) for more information on selecting
 specific tests and various pytest options.

 ### Mac OSX
@@ -552,6 +602,14 @@ Running Firefox with xvfb on OSX is untested and will require the user to instal
 an X11 server. We suggest [XQuartz](https://www.xquartz.org/). This setup has not
 been tested, we welcome feedback as to whether this is working.

+### Updating schema docs
+
+In the rare instance that you need to create schema docs
+(after updating or adding files to `schemas` folder), run `npm install`
+from OpenWPM top level. Then run `npm run render_schema_docs`. This will update the
+`docs/schemas` folder. You may want to clean out the `docs/schemas` folder before doing this
+incase files have been renamed.
+


 Troubleshooting
 ---------------
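The `js_instrument_settings` shortcuts described in the README diff above can be sketched in Python. This is a minimal sketch, assuming a plain dict stands in for one browser's `browser_params` (a real crawl would start from OpenWPM's defaults). `property_shortcut` is a hypothetical helper, not part of OpenWPM; it turns a dotted path such as `window.document.cookie` into the single-key form the README requires, because only the properties of the key object get instrumented.

```python
from typing import Dict, List, Union

# One js_instrument_settings entry, following the README: a plain string, or
# a dict mapping an API name to a property list / dict of log settings.
Setting = Union[str, Dict[str, Union[List[str], Dict[str, object]]]]


def property_shortcut(path: str) -> Dict[str, List[str]]:
    """Hypothetical helper (not part of OpenWPM).

    Only the properties of the key object are instrumented, so a dotted path
    such as 'window.document.cookie' is split into its parent object and the
    property name: {'window.document': ['cookie']}.
    """
    parent, sep, prop = path.rpartition(".")
    if not sep or not parent or not prop:
        raise ValueError(f"expected a dotted path like 'window.fetch', got {path!r}")
    return {parent: [prop]}


# Stand-in for one browser's params (assumption: a plain dict, not the
# object OpenWPM's task manager actually hands you).
browser_params: Dict[str, object] = {"js_instrument": True}

js_settings: List[Setting] = [
    "collection_fingerprinting",                           # bundled collection
    "Storage",                                             # whole Web API, default log settings
    {"XMLHttpRequest": {"excludedProperties": ["send"]}},  # non-default log settings
    property_shortcut("window.document.cookie"),           # {"window.document": ["cookie"]}
    property_shortcut("window.fetch"),                     # {"window": ["fetch"]}, per README also catches bare fetch()
]
browser_params["js_instrument_settings"] = js_settings
```

Whether a given list is accepted is decided by the JSON schema validation the README describes; the sketch only shows the shapes that the shortcuts take.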

automation/Extension/firefox/feature.js/index.js

Lines changed: 20 additions & 2 deletions
@@ -20,7 +20,25 @@ async function main() {
 navigation_instrument:true,
 cookie_instrument:true,
 js_instrument:true,
-js_instrument_modules:"fingerprinting",
+js_instrument_settings: `
+[
+{
+object: window.CanvasRenderingContext2D.prototype,
+instrumentedName: "CanvasRenderingContext2D",
+logSettings: {
+propertiesToInstrument: [],
+nonExistingPropertiesToInstrument: [],
+excludedProperties: [],
+excludedProperties: [],
+logCallStack: false,
+logFunctionsAsStrings: false,
+logFunctionGets: false,
+preventSets: false,
+recursive: false,
+depth: 5,
+}
+},
+]`,
 http_instrument:true,
 callstack_instrument:true,
 save_content:false,
@@ -51,7 +69,7 @@ async function main() {
 loggingDB.logDebug("Javascript instrumentation enabled");
 let jsInstrument = new JavascriptInstrument(loggingDB);
 jsInstrument.run(config['crawl_id']);
-await jsInstrument.registerContentScript(config['testing'], config['js_instrument_modules']);
+await jsInstrument.registerContentScript(config['testing'], config['js_instrument_settings']);
 }

 if (config['http_instrument']) {
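The test configuration in `feature.js/index.js` above spells out one full-form entry (`object`, `instrumentedName`, `logSettings`). According to the README, shortcuts are expanded into this form by `clean_js_instrumentation_settings`; the sketch below is not that function. It is a hypothetical `expand_setting` that illustrates the idea under three assumptions: the default log settings equal the values in the feature.js config, `object` can be carried as a plain string on the Python side, and `collection_*` entries are resolved elsewhere and are not modeled. The real code may also map a Web API name such as `XMLHttpRequest` to its prototype, as the `window.CanvasRenderingContext2D.prototype` entry suggests; this sketch does not.

```python
import copy
from typing import Any, Dict, List, Union

# Assumed defaults: copied from the logSettings in the feature.js test
# config, not read from OpenWPM's JSON schema.
DEFAULT_LOG_SETTINGS: Dict[str, Any] = {
    "propertiesToInstrument": [],
    "nonExistingPropertiesToInstrument": [],
    "excludedProperties": [],
    "logCallStack": False,
    "logFunctionsAsStrings": False,
    "logFunctionGets": False,
    "preventSets": False,
    "recursive": False,
    "depth": 5,
}


def _full_entry(name: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    # Deep-copy so that entries never share (and mutate) the default lists.
    log_settings = copy.deepcopy(DEFAULT_LOG_SETTINGS)
    unknown = set(overrides) - set(log_settings)
    if unknown:
        raise ValueError(f"unknown log settings for {name}: {sorted(unknown)}")
    log_settings.update(copy.deepcopy(overrides))
    return {"object": name, "instrumentedName": name, "logSettings": log_settings}


def expand_setting(setting: Union[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical expansion of one README-style shortcut into full entries."""
    if isinstance(setting, str):
        # Bare string: instrument the whole object with default log settings.
        return [_full_entry(setting, {})]
    if isinstance(setting, dict):
        entries = []
        for name, value in setting.items():
            if isinstance(value, list):
                # List shortcut: fills propertiesToInstrument.
                entries.append(_full_entry(name, {"propertiesToInstrument": value}))
            elif isinstance(value, dict):
                # Dict: non-default log settings; missing keys keep defaults.
                entries.append(_full_entry(name, value))
            else:
                raise ValueError(f"unsupported value for {name}: {value!r}")
        return entries
    raise ValueError(f"unsupported setting: {setting!r}")
```

Rejecting unknown log-setting keys is a design choice of this sketch, meant to catch typos early; per the README, the actual settings are validated against the JSON schema before the crawl starts.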

automation/Extension/firefox/package-lock.json

Lines changed: 17 additions & 3 deletions
Some generated files are not rendered by default.

automation/Extension/firefox/package.json

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,6 @@
 {
 "name": "OpenWPM",
 "description": "OpenWPM Client extension",
-"version": "1.0.0",
 "author": "Mozilla",
 "dependencies": {
 "openwpm-webext-instrumentation": "../webext-instrumentation"
@@ -35,11 +34,12 @@
 "private": true,
 "repository": {
 "type": "git",
-"url": "https://github.com/mozilla/openwpm-firefox-webext"
+"url": "git+https://github.com/mozilla/OpenWPM.git"
 },
 "scripts": {
 "prebuild": "cd ../webext-instrumentation && npm run build && cd - && webpack",
 "postinstall": "cd ../webext-instrumentation && npm install",
+"postbuild": "cp dist/openwpm-1.0.zip openwpm.xpi",
 "build": "web-ext build",
 "eslint": "eslint . --ext jsm,js,json",
 "lint": "npm-run-all lint:*",
