Add initial sections on security and privacy considerations #99

fsteeg · 2022-09-06T14:42:03Z

Based on https://www.w3.org/TR/security-privacy-questionnaire

See https://etherpad.lobid.org/p/reconciliation-september-2022

Based on https://www.w3.org/TR/security-privacy-questionnaire See https://etherpad.lobid.org/p/reconciliation-september-2022

wetneb · 2022-09-06T15:21:48Z

latest/index.html

+        <h3>Other considerations</h3>
+        <ul>
+          <li>Reconciliation services and clients MAY support user authentication (see <a href="#authentication">authentication</a>).</li>
+          <li>Reconciliation features don't expose information about the underlying platform to origins, they don't allow origins to send data to the underlying platform, access other devices, device sensors, a user agent's native UI,


By "origin" do you mean "client"? That would read more natural to me.

Yeah, it's confusing. In this case, I think it's both client and server for the first two instances and server for the third instance (see here & here for some background). I'll turn this into a draft for now and revisit it later. Maybe we could add an explanation of origin, or I'll reword to use client and server only. Thanks for the review!

I removed the confusing 'origin' terminology, added another security consideration that came to mind while rewording (which already shows the benefit of clearer language here!), and linked to some existing definitions for consistency. Would be great if you could take another look!

wetneb · 2022-10-12T07:57:52Z

latest/index.html

+          <li>Reconciliation services SHOULD guard against malicious code when querying their underlying data store with data from <a>reconciliation queries</a> to avoid code [[injection]].</li>
+          <li>Reconciliation services SHOULD NOT parse the JSON requests sent by clients using a code execution mechanism like JavaScript's <code>eval()</code>, since requests could contain malicious code.</li>
+          <li>Reconciliation clients that display data from <a href="#preview-service">preview services</a> SHOULD ensure that the data is encoded, sanitized, or isolated to prevent [[XSS]] and [[CSRF]] attacks.</li>
+          <li>Reconciliation clients SHOULD provide a configuration option to disallow calls to localhost, with the default set to disallow, as a malicious service could cause the client to make arbitrary POST requests to itself.</li>


I do not understand this attack fully and I feel like I should, because it might apply to OpenRefine. Is it worth elaborating in the specs or do you want to explain that separately?

Yeah, I also stopped there yesterday and had to think about it for a bit. So maybe it'd be good to elaborate in the specs, if we agree it makes sense and applies to us. I've got this from the ActivityPub and Webmention specs, which both mention it. Maybe we could adapt and combine the explanations from these specs like this to make it clearer:

"If a reconciliation client has any services that listen on localhost and don't require authentication, it's possible for a malicious reconciliation service to provide URLs in its manifest that could cause the client to make arbitrary POST requests to itself. Therefore, if your reconciliation client does permit making requests to localhost URLs for development purposes, it SHOULD be a configuration option which defaults to off."

Actually, which of the URLs provided in the manifest are used for POST requests? The suggest services use GET, so does the preview service, property proposal and so on… The only POST requests are made to the reconciliation endpoint itself (but that is not a URL that the manifest is able to define).

I am not saying this is by design (it is probably rather by luck), but as far as I can tell we are safe here. And it should only get better with #91 since manifests can no longer point to arbitrary URLs.

Yes, you are right, both the actual query and the data extension POSTs are made to the service endpoint, and the URLs in the manifest are only used for GET requests. Removed the localhost consideration in a062854.

wetneb · 2022-10-12T08:11:45Z

latest/index.html

+        <ul>
+          <li>Reconciliation clients MAY introduce new state that persists across browsing sessions for persisting <a>reconciliation queries</a>,
+            the service's <a>reconciliation results</a>, and/or the user's selection of matches.</li>
+          <li>Reconciliation clients running in a browser’s <em>Private Browsing</em> or <em>Incognito</em> mode SHOULD NOT store <a>reconciliation queries</a>, the service's <a>reconciliation results</a>, or the user's selection of matches.


Interesting, I never thought about that.
Is Private browsing/Incognito mode actually meant to be detected and acted upon by websites? My understanding was that it was just implemented client side, with the client discarding cookies, localStorage and other things like that.

This is different from the Do Not Track header, which is specifically there to be announced by the client and acted upon by the server, but this feature seems to be deprecated.

Given that detecting Private browsing/Incognito mode seems to be quite fiddly, I am not sure we should assume that clients do that (for instance, to prevent their own backend from saving information).

Could it be that you got this idea from W3C recommendations which are primarily designed for new DOM/Javascript APIs? For those, it definitely makes sense to consider which state the browser is supposed to keep. But because I do not think we expect people to implement a reconciliation client natively in a browser (meaning, in a web rendering engine), it feels less fitting here.

detecting Private browsing/Incognito mode seems to be quite fiddly [...] Could it be that you got this idea from W3C recommendations which are primarily designed for new DOM/Javascript APIs?

Ah, interesting. Yes I got this from the W3C questionnaire, and I would have expected private mode to be a normal thing to check via browser API, but that's actually considered leaking of private mode (though it still seems to happen). And I guess it makes sense to consider private mode a purely first-party thing, which is a matter of the browser only, and no business of any site visited.

Somewhere in between I guess are browser plugins, where checking for private mode seems to be possible. And I guess there it can still provide privacy too: if only the browser part of the plugin handles that information and never sends it to the service. So maybe we could clarify that section by adding something like "For reconciliation clients running as web applications, this is typically handled by the browser, and data will be stored on the web application's server side only."

It's a bit tricky with OpenRefine being a reconciliation client, running as a client-server application, with the client part running in the browser, and the server part often also running on the client machine. But I think with the added wording above we cover considering the issue in general (which for browser plugin clients might be important), and also that in clients like OpenRefine, the client's server part does actually store the data, even if the client part is running in private mode.

For reconciliation clients running as web applications, this is typically handled by the browser, and data will be stored on the web application's server side only.

I am happy with this formulation, but what about reconciliation clients not running as web applications? For desktop software or mobile apps, there is no notion of private browsing, right?

So it seems like:

for clients which are web applications (and can therefore be run in private browsing), they do not need to worry about this since the private browsing is handled by the browser

for clients which are not web applications, the notion of private browsing does not even apply in the first place

-> so perhaps we don't need that paragraph at all? Or did I miss something?

As we discussed in today's meeting, the remaining thing are browser extensions, which seem to be disabled in private mode by default, but can be enabled both in Firefox and Chrome. I've made the private mode consideration specific to extensions in 9b27f21.

URLs defined in the manifest are only used for GET requests

For regular web clients, the browser will handle this

URLs defined in the manifest are only used for GET requests

For regular web clients, the browser will handle this

Add initial sections on security and privacy considerations

b677273

Based on https://www.w3.org/TR/security-privacy-questionnaire See https://etherpad.lobid.org/p/reconciliation-september-2022

fsteeg requested a review from wetneb September 6, 2022 14:42

wetneb reviewed Sep 6, 2022

View reviewed changes

fsteeg marked this pull request as draft September 13, 2022 07:45

fsteeg self-assigned this Sep 13, 2022

Avoid confusing terminology, add content, link to definitions (#99)

ab442f1

fsteeg marked this pull request as ready for review October 11, 2022 14:47

fsteeg assigned wetneb and unassigned fsteeg Oct 11, 2022

fsteeg requested a review from wetneb October 11, 2022 14:47

wetneb reviewed Oct 12, 2022

View reviewed changes

fsteeg added 2 commits October 13, 2022 18:50

Remove localhost security consideration (#99)

a062854

URLs defined in the manifest are only used for GET requests

Make private browsing consideration specific to extensions (#99)

9b27f21

For regular web clients, the browser will handle this

wetneb approved these changes Oct 13, 2022

View reviewed changes

fsteeg merged commit 90b5148 into master Oct 14, 2022

fsteeg deleted the security-privacy branch October 14, 2022 09:02

fsteeg added a commit that referenced this pull request Jul 13, 2023

Avoid confusing terminology, add content, link to definitions (#99)

5c2f9e5

fsteeg added a commit that referenced this pull request Jul 13, 2023

Remove localhost security consideration (#99)

b4bb60e

URLs defined in the manifest are only used for GET requests

fsteeg added a commit that referenced this pull request Jul 13, 2023

Make private browsing consideration specific to extensions (#99)

d077a04

For regular web clients, the browser will handle this

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add initial sections on security and privacy considerations #99

Add initial sections on security and privacy considerations #99

fsteeg commented Sep 6, 2022

wetneb Sep 6, 2022

fsteeg Sep 13, 2022

fsteeg Oct 11, 2022

wetneb Oct 12, 2022

fsteeg Oct 13, 2022

wetneb Oct 13, 2022

fsteeg Oct 13, 2022

wetneb Oct 12, 2022

fsteeg Oct 13, 2022

wetneb Oct 13, 2022

fsteeg Oct 13, 2022

Add initial sections on security and privacy considerations #99

Add initial sections on security and privacy considerations #99

Conversation

fsteeg commented Sep 6, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment