PornPics Dataset Generator

This Go application scrapes images and their associated metadata from PornPics to create a labeled dataset suitable for fine-tuning Stable Diffusion models.

Features

  • Fetches popular images from PornPics using the provided API.
  • Downloads images from each gallery and saves them in a structured directory.
  • Extracts categories, tags, models & channels from the gallery page and includes them in the dataset.
  • Creates a text file for each image containing the prompt (alt text, categories, and tags) for OneTrainer.
  • Persists the current offset to allow resuming the scraping process on subsequent runs.
  • Processes requests concurrently, capped by maxConcurrentReq, to improve throughput.
  • Includes an image review tool under /review.
    • It lets you review the dataset and save refined labels to a new location.
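
The per-image prompt file described above could be produced along these lines. This is a minimal sketch, not the code in main.go: the helper names buildPrompt, captionPath, and writeCaption are hypothetical, and both the comma-separated layout and the convention of a same-named .txt file next to each image are assumptions about what the trainer expects.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// buildPrompt joins the alt text, categories, and tags into a single
// comma-separated caption, trimming whitespace and skipping empty entries.
// (Hypothetical helper; the separator is an assumption.)
func buildPrompt(alt string, categories, tags []string) string {
	parts := []string{}
	for _, p := range append(append([]string{alt}, categories...), tags...) {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, ", ")
}

// captionPath swaps the image extension for .txt so the caption file
// sits beside the image with the same base name.
func captionPath(imagePath string) string {
	return strings.TrimSuffix(imagePath, filepath.Ext(imagePath)) + ".txt"
}

// writeCaption stores the prompt in the caption file for imagePath.
func writeCaption(imagePath, prompt string) error {
	return os.WriteFile(captionPath(imagePath), []byte(prompt), 0o644)
}

func main() {
	// Use a temporary directory so the sketch does not touch a real dataset.
	dir, err := os.MkdirTemp("", "dataset")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	image := filepath.Join(dir, "0001.jpg")
	prompt := buildPrompt("a photo on a beach", []string{"outdoor"}, []string{"beach", "sunset"})
	if err := writeCaption(image, prompt); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(captionPath(image), "contains:", prompt)
}
```

Keeping the caption next to the image with a matching base name means the review tool or a trainer can pair the two without a separate index file.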

Requirements

  • Go (version 1.16 or higher)

Usage

  1. Clone the repository:

    git clone https://github.com/CrudeCreations/pornpics-dataset-gen
    cd pornpics-dataset-gen
  2. Install dependencies

    go install
  3. Configure the parameters in main.go

    Popular image scraping is currently broken, so you need to set query to a search term until I resolve it.

    const (
        baseURL          = "https://www.pornpics.com"
        popularAPI       = "/popular/"
        searchAPI        = "/search/srch.php"
        query            = "strip tease" //Empty query searches popular images
        imageDir         = "dataset/"
        limitPerPage     = 5
        maxConcurrentReq = 10
        offsetFile       = "offset.txt"
    )
  4. Run the application

    go run main.go
  • The application will start fetching and processing images.
  • Images will be saved in the dataset directory, organized into subdirectories based on categories.
  • The current offset will be saved to offset.txt after each page.
  • You can stop the application at any time (e.g., with Ctrl+C). When you run it again, it will resume from the last saved offset.
  5. Run the reviewer

    cd review
    go run review.go

    Then open http://localhost:8080 in your browser to start reviewing.

Disclaimer

This application is provided for educational and research purposes only. The author is not responsible for any misuse or consequences arising from its use.
