This repository contains the code for the paper "Targeted and Troublesome: Tracking and Advertising on Children's Websites" (to be presented at the 45th IEEE Symposium on Security and Privacy). The paper investigates online tracking and targeted and inappropriate advertising on websites directed at children.
We are working on preparing and documenting the dataset for release.
We extended Tracker Radar Collector from DuckDuckGo to scrape ads, detect fingerprinting attempts and capture a video recording of the screen.
Our main modifications can be found in the following files:
- AdCollector.js: detects ads and scrapes ad disclosures. AdCollector's ad detection and scraping code is partly based on adscraper (UW CSE Security Lab).
- FingerprintCollector.js and fingerprintDetection.js: detect fingerprinting-related function calls and property accesses.
- LinkCollector.js: extracts inner page links.
- VideoCollector.js: captures the crawl video.
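As a rough illustration of what detecting fingerprinting-related function calls and property accesses can look like, the sketch below wraps an object in a `Proxy` and records every property read and method call. It is a generic technique shown under stated assumptions, not the approach taken in fingerprintDetection.js; `instrument`, `fakeNavigator`, and `accessLog` are hypothetical names, and the object is a stand-in for a real browser API.

```javascript
// Hypothetical sketch: log property reads and method calls on an object.
// This is NOT the repository's implementation; all names are illustrative.
function instrument(target, label, log) {
  return new Proxy(target, {
    get(obj, prop) {
      const value = Reflect.get(obj, prop, obj);
      // Symbol keys (used internally by the runtime) are passed through unlogged.
      if (typeof prop === 'symbol') return value;
      if (typeof value === 'function') {
        // Wrap functions so that each call is logged together with its arguments.
        return (...args) => {
          log.push({ type: 'call', api: `${label}.${prop}`, args });
          return value.apply(obj, args);
        };
      }
      log.push({ type: 'get', api: `${label}.${prop}` });
      return value;
    },
  });
}

// Stand-in for a browser object that fingerprinting scripts commonly query.
const fakeNavigator = {
  userAgent: 'ExampleAgent/1.0',
  hardwareConcurrency: 8,
  getBattery() { return { level: 1 }; },
};

const accessLog = [];
const watched = instrument(fakeNavigator, 'Navigator', accessLog);

// Simulate a script probing the object.
watched.userAgent;
watched.hardwareConcurrency;
watched.getBattery();

console.log(accessLog.map((entry) => `${entry.type}:${entry.api}`));
```

A log like this can then be filtered for combinations of APIs that are characteristic of fingerprinting; which combinations count is a policy decision outside the scope of this sketch.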
- Clone this repo:
git clone https://github.com/targeted-and-troublesome/targeted-and-troublesome-crawler.git
cd targeted-and-troublesome-crawler
- Install the required npm packages:
npm install
For a single URL:
npm run crawl -- -u 'https://games2jolly.com' \
-o ./data/ -v -f \
-d "fingerprints,requests,cookies,ads,screenshots,cmps,videos" \
--reporters 'cli,file' \
-l ./data/ \
--autoconsent-action "optIn"
For a list of URLs:
npm run crawl -- -u urls/fra_desktop_home_inner_combined.csv \
-o ./data/ -v -f \
-d "fingerprints,requests,cookies,ads,screenshots,cmps,videos" \
--reporters 'cli,file' \
-l ./data/ \
--autoconsent-action "optIn"
Please check the upstream Tracker Radar Collector repository for explanations of the command line options.
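If you want to start crawls from a Node.js script rather than a shell, the commands above can be assembled programmatically. The following is a minimal sketch: `buildCrawlArgs` is a hypothetical helper that is not part of this repository, and it only reuses the flags and values shown in the example commands.

```javascript
// Hypothetical helper: build the argument list for `npm run crawl`.
// Only flags that appear in the example commands above are used.
function buildCrawlArgs({ urls, dataDir, collectors }) {
  return [
    'run', 'crawl', '--',
    '-u', urls,
    '-o', dataDir, '-v', '-f',
    '-d', collectors.join(','),
    '--reporters', 'cli,file',
    '-l', dataDir,
    '--autoconsent-action', 'optIn',
  ];
}

const args = buildCrawlArgs({
  urls: 'urls/fra_desktop_home_inner_combined.csv',
  dataDir: './data/',
  collectors: ['fingerprints', 'requests', 'cookies', 'ads', 'screenshots', 'cmps', 'videos'],
});

// Print the equivalent shell command for inspection.
console.log(['npm', ...args].join(' '));

// To actually launch the crawl (not executed in this sketch):
// const { spawn } = require('node:child_process');
// spawn('npm', args, { stdio: 'inherit' });
```

Passing the arguments as an array avoids shell quoting issues with the comma-separated collector list.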
The shell script we used to start the crawls can be found in the shell_scripts/ad-scraper folder.
You can find all crawled URLs, including landing and inner page URLs associated with the child-directed sites, in the URLs directory.
Refer to the inference.ipynb notebook for downloading and using the fine-tuned model that detects child-directed webpages by page titles and descriptions. More details about the classification pipeline can be found in the classifier directory.
@inproceedings {,
author = {Zahra Moti and Asuman Senol and Hamid Bostani and Frederik Zuiderveen Borgesius and Veelasha Moonsamy and Arunesh Mathur and Gunes Acar},
booktitle = {2024 IEEE Symposium on Security and Privacy (SP)},
title = {Targeted and Troublesome: Tracking and Advertising on Children’s Websites},
year = {2024},
volume = {},
issn = {2375-1207},
pages = {118-118},
doi = {10.1109/SP54263.2024.00118},
url = {https://doi.ieeecomputersociety.org/10.1109/SP54263.2024.00118},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {may}
}
For any questions, suggestions, or issues regarding this project or our paper, please contact:
You can also reach out to us by opening an issue on our GitHub repository.