Add eTLD+1 Request Header to Expose Domain Spoofing #12

Open
bmayd opened this issue Jun 10, 2022 · 9 comments

@bmayd

bmayd commented Jun 10, 2022

Domain Spoofing (aka Domain Laundering) is a form of Sophisticated Invalid Traffic (SIVT) in which a false representation is made about the domain associated with an ad impression. Two examples: the domain in the ad request differs from the domain of the inventory actually being supplied, or the ad is rendered on a different website or application than the one identified in the ad request. (See “False Representation” on page 8 of the Trustworthy Accountability Group (TAG) Invalid Traffic Taxonomy v2.0.)

Four commonly identified expressions of Domain Spoofing are:

  • URL Substitution
    • Simple replacement of the URL in an ad/bid request.
  • Cross-domain Embedding
    • Using frames to embed a high-quality domain within a page on a low-quality domain or simply to hide the top-level domain in the ad or bid request.
  • Custom Browsers
    • Using a custom browser which provides falsified URL and host information.
  • Malware
    • Hijacks pages and injects ad tags into otherwise legitimate pages.

Of these, the latter two seem to be out of scope: custom browsers obviously aren’t subject to W3C standards, and malware would be unreasonably difficult for browsers to defend against and is better dealt with via mechanisms like ads.txt and ads.cert validation on the part of SSPs, DSPs and/or ad-verification vendors.

For the first two, a simple, straightforward and effective countermeasure would be for browsers to include a header value containing the eTLD+1 of the page in every request. Inclusion of the value could be made optional, with site owners having the ability to tell the browser not to provide it in cases where it is considered to be sensitive or otherwise inappropriate.

If each browser request included the eTLD+1 in a header, all the key constituencies in the ad supply chain which interact with the browser would have an opportunity to validate that the impressions they’re buying, or have purchased, are from the top-level page the user sees in their address bar.
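
As a rough illustration of that validation step, here is a minimal Python sketch of what a receiving server might do. The header name `Top-Level-ETLD-Plus-One` and the function names are hypothetical (the proposal does not define them), and the sketch assumes both values are already eTLD+1 strings, since deriving an eTLD+1 from a hostname requires the Public Suffix List.

```python
from typing import Mapping, Optional

# Hypothetical header name; the proposal does not specify one.
ETLD1_HEADER = "top-level-etld-plus-one"


def normalize_domain(domain: str) -> str:
    """Lowercase the domain and strip whitespace and any trailing dot."""
    return domain.strip().lower().rstrip(".")


def check_declared_domain(
    headers: Mapping[str, str], declared_etld1: str
) -> Optional[bool]:
    """Compare the browser-supplied eTLD+1 with the one declared in the ad request.

    Returns True on a match, False on a mismatch, and None when the header
    is absent (for example, because the site owner opted out).
    """
    # HTTP header names are case-insensitive, so compare on lowercased keys.
    lowered = {name.lower(): value for name, value in headers.items()}
    observed = lowered.get(ETLD1_HEADER)
    if observed is None:
        return None
    return normalize_domain(observed) == normalize_domain(declared_etld1)
```

Treating a missing header as "unknown" rather than "mismatch" is a design choice made here so that the opt-out described above is not reported as spoofing.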

There is some server-to-server communication between entities, like SSPs and DSPs, which allows for manipulation of bid requests, so the method isn’t entirely preemptive. However, because buyers subsequently interact with the browser directly via creatives, measurement pixels and other interactions, an intermediary that misrepresented the source of an impression could easily be caught, the source blacklisted and payment withheld.

Providing the eTLD+1 of the page would be particularly valuable in cases where impressions are isolated from the host page by nested iFrames, as is common when publishers work with resellers.

Although the focus here is on ad-tech, given the central role eTLD+1 plays in the web, it seems likely that making eTLD+1 generally available will support other antifraud use-cases as well.

@philippp
Contributor

Is the proposal that embedded (e.g. iFrame'd) domains would get a header on each request that discloses the domain of the embedding (top-level) page? If so, this seems in tension with the objective of reducing cross-site tracking.

I wonder if a narrower mechanism that would allow the embedded domain to assert and validate its assumed embedded context would be helpful while reducing the privacy impact. For example, an embedded ad unit could assert the expectation that it is embedded in abc.com, and that this is the top-level frame.
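
To make the difference in information flow concrete, here is a minimal Python sketch of the check a browser might run for such an assertion. The function name and the representation of the frame tree as a list of sites are assumptions for illustration only; no such browser API is defined in this thread.

```python
from typing import Sequence


def confirm_top_level(frame_sites: Sequence[str], asserted_site: str) -> bool:
    """Answer only yes/no: is the asserted site the top-level frame?

    frame_sites is assumed to be ordered from the top-level document down
    to the embedded frame making the assertion. The embedded party never
    sees the list itself, only the boolean result.
    """
    if not frame_sites:
        return False

    def normalize(site: str) -> str:
        return site.strip().lower().rstrip(".")

    return normalize(frame_sites[0]) == normalize(asserted_site)
```

With this shape, an ad unit that asserts abc.com learns nothing beyond whether its assumption holds, and an abc.com frame embedded inside another site's page does not satisfy the check.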

@bmayd
Author

bmayd commented Jul 1, 2022

Thanks for the feedback.

> Is the proposal that embedded (e.g. iFrame'd) domains would get a header on each request that discloses the domain of the embedding (top-level) page?

Yes, a specific issue this is intended to address is the use of nested iFrames to hide the actual domain of the impression and claim it is instead a more desirable one.

I reread my original post and see that I didn't make clear that I was thinking of this as a post-3p-cookie feature. To be clear, I don't think it should be employed in contexts where there are means of linking cross-site information to a browser instance.

I was hoping a proactive means of identifying misrepresentations could be found to eliminate the overhead of reactive approaches; however, where that isn't possible, I think what you suggest might be an interesting alternative. Correct me if I'm misunderstanding you, but I assume you're saying something along the lines of: the ad buyer identifying the domain they believe they're buying the impression from and the browser validating it is in fact that domain?

@philippp
Contributor

philippp commented Jul 7, 2022

> the ad buyer identifying the domain they believe they're buying the impression from and the browser validating it is in fact that domain?

Correct. As with user agent client hints, I think we want to avoid broadcasting side-channel information to all attending parties, and prefer mechanisms that can be invoked as-needed. I wonder whether there are learnings from ancestorOrigins (e.g. is it being spoofed? Not verifiable for networked DSPs? For browsers that did not implement it, what were their reasons?) that we should consider.

@supanate7

Might there be an approach in which both the ad buyer and browser encrypt the domain and then the ad buyer only gets a "match" result if the encrypted domains match (e.g. the domain could be the unique part of the encryption key, so the ad buyer would only be able to decrypt the message if the domains matched)?

@bmayd
Author

bmayd commented Jul 11, 2022

> Might there be an approach in which both the ad buyer and browser encrypt the domain and then the ad buyer only gets a "match" result if the encrypted domains match

Very interesting idea. I think it covers the use-case and the check could happen as part of the bid decision, so seems like a relatively minor impact.

@bmayd
Author

bmayd commented Jul 19, 2022

Seems like this could be accomplished by having user-agents provide a hash of the eTLD+1 in a header when the host-page requests it (e.g., during any ad transactions). The parties to the ad-transactions could then compare the hash in the header with a hash they generate from eTLD+1 provided in the ad-request to determine if the declared and actual eTLD+1 are the same. This would provide no net-new information to any participant already being provided an eTLD+1 other than whether it matched the browser and practically no usable information when the eTLD+1 isn't known.
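
A minimal Python sketch of that comparison, assuming SHA-256 as the hash and simple lowercasing as the normalization; neither is specified above, and the function names are hypothetical.

```python
import hashlib
import hmac


def etld1_digest(etld1: str) -> str:
    """Return the SHA-256 hex digest of a normalized eTLD+1 (assumed scheme)."""
    normalized = etld1.strip().lower().rstrip(".")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def declared_matches_header(header_digest: str, declared_etld1: str) -> bool:
    """Check the digest from the browser header against the declared eTLD+1."""
    expected = etld1_digest(declared_etld1)
    received = header_digest.strip().lower()
    # Constant-time comparison; bytes are used so non-ASCII input cannot raise.
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```

A participant holding only the declared eTLD+1 learns whether it matches. Note that a bare digest is stable across requests and can be checked against a list of candidate domains.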

> As with user agent client hints, I think we want to avoid broadcasting side-channel information to all attending parties, and prefer mechanisms that can be invoked as-needed.

If there is concern about the hash provided by the browser being consistent over time and thereby potentially leaking information, a nonce could be added by the browser and returned to the host page to be included by participants when hashing the declared eTLD+1.
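
The nonce variant might look like the following Python sketch. The nonce length, the `nonce:domain` concatenation and SHA-256 are all assumptions made for illustration, as are the function names.

```python
import hashlib
import hmac
import secrets


def new_nonce() -> str:
    """Browser side: a fresh random nonce per transaction (16 bytes, hex)."""
    return secrets.token_hex(16)


def nonce_digest(nonce: str, etld1: str) -> str:
    """Hash the nonce together with the normalized eTLD+1.

    The nonce is fixed-format hex, so a ':' separator is unambiguous.
    """
    normalized = etld1.strip().lower().rstrip(".")
    return hashlib.sha256(f"{nonce}:{normalized}".encode("utf-8")).hexdigest()


def verify_declared(header_digest: str, nonce: str, declared_etld1: str) -> bool:
    """Participant side: recompute with the shared nonce and compare."""
    expected = nonce_digest(nonce, declared_etld1)
    return hmac.compare_digest(header_digest.encode("utf-8"), expected.encode("utf-8"))
```

Because the nonce changes per transaction, the header value for the same page differs from one request to the next, so the digest alone cannot serve as a stable identifier.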

@supanate7

Yes, that sounds like a good approach!

@dvorak42
Member

We'll be briefly (3-5 minutes) going through open proposals at the Anti-Fraud CG meeting this week. If you have a short 2-4 sentence summary/slide you'd like the chairs to use when representing the proposal, please attach it to this issue; otherwise the chairs will give a brief overview based on the initial post.

@dvorak42
Member

From the CG meeting, it seems like there is interest in exploring this while keeping the custom browser/malware cases out of scope to avoid needing attestation requirements.
