This Go application scrapes images and their associated metadata from PornPics to create a labeled dataset suitable for fine-tuning Stable Diffusion models.
- Fetches popular images from PornPics using the provided API.
- Downloads images from each gallery and saves them in a structured directory.
- Extracts categories, tags, models & channels from the gallery page and includes them in the dataset.
- Creates a text file for each image containing the prompt (alt text, categories, and tags) for OneTrainer.
- Persists the current offset to allow resuming the scraping process on subsequent runs.
- Downloads concurrently to improve throughput, with the number of simultaneous requests capped by maxConcurrentReq.
- Includes an image review tool under /review, which lets you review the dataset and save refined labels to a new location.
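The per-image prompt file described above can be sketched as follows. This is a minimal illustration, not the code in main.go: the helper names (buildPrompt, captionPath, writeCaption), the comma-separated layout, and the choice of a .txt file sharing the image's base name are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// buildPrompt joins the alt text, categories, and tags into one
// comma-separated caption, skipping blank entries.
// (Hypothetical helper; the separator is an assumption.)
func buildPrompt(alt string, categories, tags []string) string {
	parts := []string{}
	for _, p := range append(append([]string{alt}, categories...), tags...) {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, ", ")
}

// captionPath returns the path of the text file that sits next to an image:
// same directory and base name, with the extension replaced by .txt.
func captionPath(imagePath string) string {
	return strings.TrimSuffix(imagePath, filepath.Ext(imagePath)) + ".txt"
}

// writeCaption stores the prompt in the image's sidecar text file.
func writeCaption(imagePath, prompt string) error {
	return os.WriteFile(captionPath(imagePath), []byte(prompt), 0o644)
}

func main() {
	// Write into a temporary directory so the demo leaves any real dataset alone.
	dir, err := os.MkdirTemp("", "dataset")
	if err != nil {
		fmt.Println("temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	img := filepath.Join(dir, "0001.jpg")
	prompt := buildPrompt("example alt text", []string{"category-a"}, []string{"tag-a", "tag-b"})
	if err := writeCaption(img, prompt); err != nil {
		fmt.Println("write caption:", err)
		return
	}
	fmt.Println(captionPath(img), "->", prompt)
}
```

Keeping the prompt building separate from the file write makes it easy to adjust the caption layout without touching the download logic.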
- Go (version 1.16 or higher)
- Clone the repository:
    git clone https://github.com/CrudeCreations/pornpics-dataset-gen
    cd pornpics-dataset-gen
- Install dependencies:
    go install
- Configure the parameters in main.go. Popular-image scraping is currently broken, so set query to a search term until this is fixed.
    const (
        baseURL          = "https://www.pornpics.com"
        popularAPI       = "/popular/"
        searchAPI        = "/search/srch.php"
        query            = "strip tease" // An empty query searches popular images
        imageDir         = "dataset/"
        limitPerPage     = 5
        maxConcurrentReq = 10
        offsetFile       = "offset.txt"
    )
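One common way to honor a setting like maxConcurrentReq is a buffered channel used as a semaphore. The sketch below assumes that approach; it is not necessarily how main.go does it. The processAll function and the stand-in work function are hypothetical, and no network requests are made.

```go
package main

import (
	"fmt"
	"sync"
)

// maxConcurrentReq mirrors the configuration value shown above.
const maxConcurrentReq = 10

// processAll runs work on every item with at most limit goroutines active
// at once and returns the results in input order.
// limit must be at least 1, otherwise the first acquire blocks forever.
func processAll(items []string, limit int, work func(string) string) []string {
	results := make([]string, len(items))
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i, item := range items {
		sem <- struct{}{} // acquire a slot; blocks while limit workers are busy
		wg.Add(1)
		go func(i int, item string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = work(item)  // each goroutine writes only its own index
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	items := []string{"0001.jpg", "0002.jpg", "0003.jpg"}
	// Stand-in for a real download; it only builds a status string.
	fakeDownload := func(name string) string { return "saved " + name }
	for _, r := range processAll(items, maxConcurrentReq, fakeDownload) {
		fmt.Println(r)
	}
}
```

Lowering the limit reduces load on the remote server at the cost of slower runs.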
- Run the application:
    go run main.go
- The application will start fetching and processing images.
- Images will be saved in the dataset directory, organized into subdirectories based on categories.
- The current offset will be saved to offset.txt after each page.
- You can stop the application at any time (e.g., with Ctrl+C); it will resume from the last saved offset when you run it again.
- Run the reviewer:
    cd review
    go run review.go
  Then visit http://localhost:8080 to start reviewing.
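The resume behavior described above (saving the offset after each page and reading it back on the next run) could look roughly like this. The sketch assumes the offset file holds a single decimal integer and that the offset advances by limitPerPage per page; the function names are hypothetical and the real main.go may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

const (
	offsetFile   = "offset.txt"
	limitPerPage = 5
)

// parseOffset converts the file contents into an offset.
// Empty contents are treated as offset 0.
func parseOffset(data string) (int, error) {
	s := strings.TrimSpace(data)
	if s == "" {
		return 0, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid offset %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("negative offset %d", n)
	}
	return n, nil
}

// loadOffset reads the saved offset; a missing file means a fresh start.
func loadOffset(path string) (int, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return 0, nil
	}
	if err != nil {
		return 0, err
	}
	return parseOffset(string(data))
}

// saveOffset overwrites the offset file with the current value.
func saveOffset(path string, offset int) error {
	return os.WriteFile(path, []byte(strconv.Itoa(offset)), 0o644)
}

func main() {
	// Use a temporary directory so the demo does not touch a real offset.txt.
	dir, err := os.MkdirTemp("", "offset-demo")
	if err != nil {
		fmt.Println("temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, offsetFile)

	offset, err := loadOffset(path)
	if err != nil {
		fmt.Println("load offset:", err)
		return
	}
	for page := 0; page < 3; page++ {
		// A real run would fetch and process one page starting at offset here.
		offset += limitPerPage
		if err := saveOffset(path, offset); err != nil {
			fmt.Println("save offset:", err)
			return
		}
	}
	resumed, err := loadOffset(path)
	fmt.Println("next run would resume at offset", resumed, "error:", err)
}
```

Because the offset is written only after a page completes, an interrupted run repeats at most the page that was in progress.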
This application is provided for educational and research purposes only. The author is not responsible for any misuse or consequences arising from its use.