
Allow device score calculation based on context and without user exposure #16

ricardogarim opened this issue Sep 28, 2022 · 6 comments

@ricardogarim

Hey folks! I'm from CAF, a Brazilian company founded in 2019 that aims to prevent identity fraud with a variety of products, such as digital onboarding, background checking, and facial authentication.

Context and threat

Many attacks originate from automation run by malicious agents against legitimate targets, particularly targets in the financial segment (web and mobile applications).

The most relevant attacks monitored and contained by our team in LatAm involve abusive account creation in digital banks and fintech apps.

In this context, creating fake accounts for money laundering or for receiving stolen money has become an important part of cybercrime, especially in Brazil.

Therefore, preventing account creation by malicious actors matters not only because it stops fraudulent accounts, but also because it prevents other scams, especially phishing against legitimate users.

Although real people's data is used to create them, these accounts are referred to as "orange accounts" (straw-man or stooge accounts) because they are controlled not by their nominal owners but by third parties, usually cybercriminals.

One attack that ends up using these accounts is what we call the WhatsApp Scam, where a person impersonates someone else on WhatsApp and tricks a legitimate user into making a fraudulent transfer to an "orange" account. The targets most exposed to this type of attack are older people who are unable to perform any kind of validation during the conversation with the criminal.

In most observed cases, the abuse is not automated, since digital banks already have several defenses in their KYC processes, but it is usually associated with high-value financial transactions.

In all transactions of this nature, we observe several similar characteristics (clear signs) that a fraud is being carried out.

Proposal

Our proposal is to create a device-score mechanism that respects the regionalization/context in which a transaction takes place. Initially, this score would be just another signal in the decision to authorize a transaction or not; in a future version of this API, that decision could be made on-device.

For example, by observing signals such as installed apps, cloned apps, battery level, the number of times the user used the device in the last 24 hours, device usage patterns (e.g. if only one app was used many times in the same day, this could be an important signal), and various others, it is possible to assign a score to that device.

To protect user privacy, our proposal, similar to FLEDGE, is that a worklet configured to load data (K/V) and endpoints from a legitimate backend (an anti-fraud company, for example) could be used to compute the device score on-device (much like an on-device auction). In a first version, only pattern-matching rules would be used for the calculation; in the future, ML models could be loaded for a more complex calculation.

In this proposal, instead of extracting data from the device, we load logic from reliable backends so that the score decision is kept on-device.
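
To make this concrete, below is a minimal, purely illustrative sketch of the first, rule-based version. Nothing here is a proposed API surface: the `DeviceSignals` and `ScoringRule` shapes, `computeDeviceScore`, and all of the weights are hypothetical placeholders for the kind of logic an anti-fraud backend might serve to the worklet.

```ts
// Illustrative sketch only: none of these interfaces exist today.

// Signals the user agent could expose to the worklet (never leaving the device).
interface DeviceSignals {
  installedAppCount: number;
  clonedAppDetected: boolean;
  batteryLevel: number;           // 0.0 - 1.0
  unlocksLast24h: number;
  distinctAppsUsedLast24h: number;
}

// A pattern-matching rule fetched from the anti-fraud backend's K/V store.
interface ScoringRule {
  applies(signals: DeviceSignals): boolean;
  weight: number;                 // contribution to the risk score when the rule matches
}

// First-version scoring: sum the weights of matching rules, clamp to [0, 100].
function computeDeviceScore(signals: DeviceSignals, rules: ScoringRule[]): number {
  const raw = rules
    .filter((rule) => rule.applies(signals))
    .reduce((sum, rule) => sum + rule.weight, 0);
  return Math.max(0, Math.min(100, raw));
}

// Example rules a backend might ship (weights are made up for illustration).
const exampleRules: ScoringRule[] = [
  { applies: (s) => s.clonedAppDetected, weight: 40 },
  { applies: (s) => s.distinctAppsUsedLast24h === 1 && s.unlocksLast24h > 50, weight: 30 },
  { applies: (s) => s.installedAppCount < 5, weight: 15 },
];

const score = computeDeviceScore(
  { installedAppCount: 3, clonedAppDetected: true, batteryLevel: 0.8, unlocksLast24h: 60, distinctAppsUsedLast24h: 1 },
  exampleRules,
);
console.log(score); // 85 in this made-up example
```

The important property is that the raw signals never leave the device; only the resulting score (or a decision derived from it) would ever be exposed.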

Ideally, user consent should be requested for high-risk transactions that make use of this API, since a variety of data could be used in this calculation (all pre-existing client-side data).

Relevant signs

Signals such as installed apps, cloned apps, battery level, the number of times the user used the device, and device usage patterns (e.g. if only one app was used many times in the same day).

Privacy implications and safeguards

If the implementation is carried out completely on-device, there is no exposure of end-user data for possible identification.


This post was made together with @menegais1.

@bmayd

bmayd commented Oct 11, 2022

I've had similar thoughts about using a FLEDGE-like model, also inspired by the Shared Storage API (and this issue related to Shared Storage and the many proposals that depend on it).

As I've been thinking about it, a user agent (UA) could provide a restricted environment offering:

  • Controlled access to on-device data assets, both native and custom.
  • Access to an off-device, trusted, read-only k/v store for data and executable code.
  • A means of executing worklets with read access to both data sources, which can either ask the UA to add outputs (potentially high-entropy) to a data store available only to worklets, or produce low-entropy, carefully gated outputs that can be returned to the caller and communicated off-device.

To give an example of what I'm suggesting: in a sensitive app, a worklet could be invoked every time a user logged in which:

  • Asked the UA to check a trusted k/v store for updates.
  • Queried the UA for characteristics of the active context: time of day, location, network, etc., as well as for previously generated outputs.
  • Generated a new set of scores and had them added to the history.
  • Returned a score, 1 to 10, indicating how likely it was the login was initiated by the person who usually logged in.

The output of the worklet could then be used by the app to decide whether or not to prompt the user for additional input, like a pin.
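
As a rough, hypothetical sketch of that flow (the `LoginContext` and `ScoreHistory` shapes, `loginWorklet`, and the specific adjustments are all made up for illustration):

```ts
// Hypothetical shapes only; none of this corresponds to a shipped API.

interface LoginContext {
  hourOfDay: number;        // 0-23, queried from the UA
  knownLocation: boolean;   // is the device somewhere the user visits regularly?
  knownNetwork: boolean;    // previously seen Wi-Fi / carrier network?
}

interface ScoreHistory {
  previousScores: number[]; // worklet-only store maintained by the UA
}

// Returns a 1-10 score indicating how likely the login was initiated by the usual user.
function loginWorklet(context: LoginContext, history: ScoreHistory): number {
  let score = 5;
  if (context.knownLocation) score += 2;
  if (context.knownNetwork) score += 2;
  if (context.hourOfDay >= 1 && context.hourOfDay <= 5) score -= 3; // unusual hours
  // Smooth against recent history so one odd login doesn't swing the result.
  const recent = history.previousScores.slice(-5);
  if (recent.length > 0) {
    const avg = recent.reduce((a, b) => a + b, 0) / recent.length;
    score = Math.round((score + avg) / 2);
  }
  return Math.max(1, Math.min(10, score));
}

// The app only sees the low-entropy result, e.g. to decide whether to require a PIN:
const needsPin =
  loginWorklet({ hourOfDay: 3, knownLocation: false, knownNetwork: true }, { previousScores: [8, 9, 7] }) < 6;
```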

This also implies to me the possibility of converging on a standard set of APIs, based on what information worklets commonly request, that UAs could provide to worklets to answer questions like (a rough sketch follows the list):

  • Is this a time of day when the user is normally active?
  • Is the user in a location they visit regularly?
  • Was this app installed recently?
  • Has this resource (e.g. web page) been interacted with before?
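
As a purely hypothetical illustration of what such a surface might look like (none of these methods exist in any UA today):

```ts
// Hypothetical UA-provided surface; names are placeholders.
interface ContextualSignals {
  isTypicalActiveTime(): Promise<boolean>;       // time-of-day pattern match
  isFrequentLocation(): Promise<boolean>;        // coarse, on-device location history
  wasAppInstalledRecently(): Promise<boolean>;
  hasInteractedWithResource(url: string): Promise<boolean>;
}

// Hypothetical: how a worklet might combine those answers into a single coarse check.
async function looksLikeUsualContext(signals: ContextualSignals, appUrl: string): Promise<boolean> {
  const answers = await Promise.all([
    signals.isTypicalActiveTime(),
    signals.isFrequentLocation(),
    signals.hasInteractedWithResource(appUrl),
  ]);
  return answers.filter(Boolean).length >= 2; // require at least two familiar signals
}
```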

@bmayd

bmayd commented Oct 11, 2022

Another thought I had about FLEDGE that might apply here as well: there was some discussion of allowing user-defined functions to run in a restricted environment on the k/v store, and I think that potentially allows data from the browser to be supplied as an input. The basic idea is to extend the trusted UA environment to a trusted server-based enclave where greater persistence and compute resources are available, and where data from many UAs may also be made available in privacy-preserving aggregates or carefully constructed output sets.

@p-j-l

p-j-l commented Oct 11, 2022

@bmayd - there're some details about user-defined functions in the k/v server here: https://github.com/privacysandbox/fledge-docs/blob/main/key_value_service_trust_model.md#support-for-user-defined-functions-udfs

I'm not sure the discussion about what data from the browser will be sent to the server has happened quite yet.

@bmayd

bmayd commented Oct 11, 2022

I'm not sure the discussion about what data from the browser will be sent to the server has happened quite yet.

The possibility of sending data from the browser was part of an informal discussion I had offline at TPAC which included potentially leveraging MPC and/or TEEs. Anything like that would, of course, require that there was sufficient trust in the data protections provided by the k/v execution environment.

@dvorak42
Member

We'll be briefly (3-5 minutes) going through open proposals at the Anti-Fraud CG meeting this week. If you have a short 2-4 sentence summary/slide you'd like the chairs to use when presenting the proposal, please attach it to this issue; otherwise, the chairs will give a brief overview based on the initial post.

@dvorak42
Member

From the CG meeting, there was discussion about some of the privacy issues with consuming a ton of signals to effectively do on-device fingerprinting, and how much of that fingerprinting would be exposed by the output of the device score calculation. It may be okay if there's a small (1-2 bit) output from the on-device logic, but if there's a full high-entropy score, it effectively acts as a full fingerprinting vector. There was some interest in seeing if some of this logic could be combined with the trusted server/trusted execution environment proposal (#3).
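
For illustration only, reducing the on-device score to a couple of bits before it leaves the worklet might look like this (the thresholds are made up):

```ts
// Hypothetical: map a 0-100 on-device score to a 2-bit risk band so the caller
// learns a coarse category rather than a fingerprint-capable value.
function toLowEntropyOutput(score: number): 0 | 1 | 2 | 3 {
  if (score < 25) return 0;   // low risk
  if (score < 50) return 1;
  if (score < 75) return 2;
  return 3;                   // high risk
}
```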
