
Code for the paper titled "Targeted and Troublesome: Tracking and Advertising on Children's Websites" (IEEE Security and Privacy 2024)

Targeted and Troublesome: Tracking and Advertising on Children's Websites (IEEE S&P'24)

This repository contains the code for the paper titled Targeted and Troublesome: Tracking and Advertising on Children's Websites (to be presented at the 45th IEEE Symposium on Security and Privacy). The paper investigates online tracking, as well as targeted and inappropriate advertising, on child-directed websites.

NSFW ⚠️ Figure: a sample of improper ads found on child-directed websites in our crawls.

Data release (TBD)

We are working on preparing and documenting the dataset for release.

Crawler

We extended Tracker Radar Collector from DuckDuckGo to scrape ads, detect fingerprinting attempts and capture a video recording of the screen.

Our main modifications can be found in the following files:

Crawler - Getting Started

Installation

  1. Clone this repo:
    git clone https://github.com/targeted-and-troublesome/targeted-and-troublesome-crawler.git
    cd targeted-and-troublesome-crawler
  2. Install the required npm packages:
    npm install

Running the Crawler

For a single URL:

  npm run crawl -- -u 'https://games2jolly.com' \
      -o ./data/ -v -f \
      -d "fingerprints,requests,cookies,ads,screenshots,cmps,videos" \
      --reporters 'cli,file' \
      -l ./data/ \
      --autoconsent-action "optIn"

For a list of URLs:

  npm run crawl -- -u urls/fra_desktop_home_inner_combined.csv \
      -o ./data/ -v -f \
      -d "fingerprints,requests,cookies,ads,screenshots,cmps,videos" \
      --reporters 'cli,file' \
      -l ./data/ \
      --autoconsent-action "optIn"

Please see the upstream Tracker Radar Collector repository for explanations of the command-line options.
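Each crawl writes its results as JSON files to the directory passed with `-o` (here `./data/`). The sketch below is one minimal way to get an overview of a finished crawl. It assumes the upstream Tracker Radar Collector layout of one JSON file per crawled site, with a top-level `initialUrl` and a `data` object keyed by the collector names given to `-d`. The internal structure of the extended collectors (`ads`, `fingerprints`, `videos`) is not described here, so the sketch only counts entries per collector; the helper name `summarize_crawl` is hypothetical.

```python
import json
from pathlib import Path


def summarize_crawl(output_dir):
    """Return {initialUrl: {collector_name: entry_count}} for a crawl directory.

    Assumption: the upstream Tracker Radar Collector layout, i.e. one JSON file
    per crawled site with a top-level "data" object keyed by collector name.
    Files without such a "data" object (e.g. crawl metadata) are skipped.
    """
    summary = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            result = json.load(f)
        data = result.get("data") if isinstance(result, dict) else None
        if not isinstance(data, dict):
            continue  # not a per-site result file
        counts = {}
        for collector, value in data.items():
            # Lists and objects are counted by their number of entries;
            # any other non-null value counts as 1, null as 0.
            if isinstance(value, (list, dict)):
                counts[collector] = len(value)
            else:
                counts[collector] = 0 if value is None else 1
        summary[result.get("initialUrl", path.stem)] = counts
    return summary


if __name__ == "__main__":
    for url, counts in summarize_crawl("./data/").items():
        print(url, counts)
```

Counting entries rather than parsing each collector keeps the sketch independent of the custom collectors' output formats; per-collector analysis would need those formats documented first.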

Crawl scripts and webpage lists

The shell script we used to start the crawls can be found in the shell_scripts/ad-scraper folder.

You can find all crawled URLs, including the landing and inner page URLs associated with the child-directed sites, in the urls/ directory.
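To see how landing and inner pages relate, the URLs in a list file can be grouped by site. The sketch below assumes one URL per row in the first comma-separated column (as in `urls/fra_desktop_home_inner_combined.csv`, passed to `-u` above); the actual column layout of these files may differ. It also treats a URL with an empty or `/` path and no query string as the landing page, which is a simplification. The helper name `group_by_site` is hypothetical.

```python
import csv
import sys
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_site(lines):
    """Group URLs by hostname into landing and inner pages.

    Assumption: one URL per row in the first CSV column; rows that do not
    start with http:// or https:// (e.g. a header row) are skipped.
    """
    sites = defaultdict(lambda: {"landing": [], "inner": []})
    for row in csv.reader(lines):
        url = row[0].strip() if row else ""
        if not url.startswith(("http://", "https://")):
            continue
        parts = urlsplit(url)
        host = parts.hostname or ""
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one site
        # Simplification: root path without a query is the landing page.
        is_landing = parts.path in ("", "/") and not parts.query
        sites[host]["landing" if is_landing else "inner"].append(url)
    return dict(sites)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        for host, pages in group_by_site(f).items():
            print(host, len(pages["landing"]), "landing,", len(pages["inner"]), "inner")
```

Usage: `python group_urls.py urls/fra_desktop_home_inner_combined.csv` (the script name is arbitrary).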

Webpage classifier

Refer to the inference.ipynb notebook for downloading and using the fine-tuned model that detects child-directed webpages based on their titles and descriptions. More details about the classification pipeline can be found in the classifier directory.
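The classifier takes page titles and descriptions as input. As a rough illustration of where that text can come from, the sketch below extracts the `<title>` text and the `<meta name="description">` content from raw HTML using only Python's standard-library `html.parser`. This is an assumption about the input, not the authors' preprocessing: how the notebook obtains, cleans, and combines the two fields may differ, and `extract_title_and_description` is a hypothetical helper.

```python
from html.parser import HTMLParser


class _TitleDescriptionParser(HTMLParser):
    """Collects the first <title> text and the first meta description."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and self.description is None:
            # Match name="description" regardless of the value's case.
            if (attrs.get("name") or "").lower() == "description":
                self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse runs of whitespace inside the title.
            self.title = " ".join("".join(self._title_parts).split())

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_title_and_description(html):
    """Return (title, description); either may be an empty string."""
    parser = _TitleDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.title or "", parser.description or ""


if __name__ == "__main__":
    page = ('<html><head><title> Free Games for Kids </title>'
            '<meta name="Description" content="Play fun online games."></head></html>')
    print(extract_title_and_description(page))
```

A standard-library parser keeps the sketch dependency-free; in a browser-based crawl the same fields could instead be read from the live DOM.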

Reference

@inproceedings{moti2024targeted,
    author = {Zahra Moti and Asuman Senol and Hamid Bostani and Frederik Zuiderveen Borgesius and Veelasha Moonsamy and Arunesh Mathur and Gunes Acar},
    booktitle = {2024 IEEE Symposium on Security and Privacy (SP)},
    title = {Targeted and Troublesome: Tracking and Advertising on Children’s Websites},
    year = {2024},
    volume = {},
    issn = {2375-1207},
    pages = {118-118},
    doi = {10.1109/SP54263.2024.00118},
    url = {https://doi.ieeecomputersociety.org/10.1109/SP54263.2024.00118},
    publisher = {IEEE Computer Society},
    address = {Los Alamitos, CA, USA},
    month = {may}
}

Contact

For any questions, suggestions, or issues regarding this project or our paper, please contact:

Author Email
Zahra Moti [email protected]
Asuman Senol [email protected]
Gunes Acar [email protected]
Hamid Bostani [email protected]
Frederik Zuiderveen Borgesius [email protected]
Veelasha Moonsamy [email protected]
Arunesh Mathur [email protected]

You can also reach out to us by opening an issue on our GitHub repository.