Clean-Tags-Utility

Contains a series of hardcoded processes and LLM tag-pruning features that prepare data for a final manual review by the user before training. Its purpose is to process messy data that was webscraped from "any" website or taken from other data sources. This pipeline contains all the pieces needed to completely automate data curation for the user.

Use case:

- The user wants to use messy, unformatted data scraped from various websites, possibly in combination with their own carefully curated data or with data from the https://github.com/x-CK-x/Joy-Captioner-Inference or https://github.com/x-CK-x/Model-Builder-DCT tools.
- The user may want to merge the aforementioned data in a way that makes sense.
- The user may want to prune the merged data based on a set of rules specific to the model they are training.
- The user may want the data in a format that can be loaded into the data curation tool for final review: https://github.com/x-CK-x/Dataset-Curation-Tool
- The user may want the data in the exact format needed for LoRA training (except for the trigger "instance token/prompt").
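The merge, prune, and trigger-token steps in these use cases can be pictured with a minimal sketch. This is not the tool's actual code: the function names (`merge_tags`, `prune_tags`, `to_caption`), the lowercase/underscore normalization, and the comma-separated caption format are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Set


def merge_tags(*sources: Iterable[str]) -> List[str]:
    """Merge tag lists from several sources, keeping first-seen order.

    Normalization (strip, lowercase, spaces to underscores) is an
    assumption for this sketch, not the tool's documented behavior.
    """
    seen: Set[str] = set()
    merged: List[str] = []
    for source in sources:
        for tag in source:
            norm = tag.strip().lower().replace(" ", "_")
            if norm and norm not in seen:
                seen.add(norm)
                merged.append(norm)
    return merged


def prune_tags(tags: Iterable[str], blocked: Set[str],
               max_tags: Optional[int] = None) -> List[str]:
    """Drop blocked tags, then optionally cap the number of tags kept."""
    kept = [t for t in tags if t not in blocked]
    return kept if max_tags is None else kept[:max_tags]


def to_caption(tags: Iterable[str], trigger: Optional[str] = None) -> str:
    """Join tags into one caption line, with an optional trigger token first."""
    parts = ([trigger] if trigger else []) + list(tags)
    return ", ".join(parts)


# Hypothetical inputs: one scraped tag list and one hand-curated list.
scraped = ["Blue Eyes", "solo ", "watermark"]
curated = ["solo", "smile"]

merged = merge_tags(scraped, curated)
pruned = prune_tags(merged, blocked={"watermark"})
caption = to_caption(pruned, trigger="mychar")
print(caption)
```

Keeping the trigger token as a separate, optional argument mirrors the note above that the training format is complete except for the trigger, so the user can add it at the very end.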

This tool holds the implemented solutions to all of these use cases ^^

(IMPORTANT) LLM usage with HuggingFace models goes through API token/s, i.e. you need to request access to the `gated` models on HF and then get a token from the API tokens page in your HF account settings.

The only LLM that is not gated behind special access is the Phi model.
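As a rough illustration of the token requirement, the sketch below reads a token from the environment and logs in with the `huggingface_hub` library. How this tool itself expects the token to be supplied is not stated here, so the `HF_TOKEN` environment variable and the helper names `get_hf_token` and `login_to_hub` are assumptions for the example.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HuggingFace token from the environment, or fail loudly.

    Reading from HF_TOKEN is an assumption for this sketch; the tool
    may accept the token some other way (for example, through its UI).
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "No HF_TOKEN found. Create a token in your HuggingFace "
            "settings and request access to the gated model first."
        )
    return token


def login_to_hub(token: str) -> None:
    """Authenticate this machine so gated model downloads are allowed."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import login

    login(token=token)


if __name__ == "__main__":
    login_to_hub(get_hf_token())
```

A valid token alone is not enough for gated models: access also has to be granted on each model's page on HF before downloads will succeed.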

Repository files:

- LICENSE
- README.md
- app.py
- artist_llm.py
- e6_tag_utils.py
- environment.yml
- run.bat
- run_tag_cleaner.sh
