This tool converts between images and text in both directions:
- Generate text descriptions from images (image → text)
- Generate images from text descriptions (text → image)
The tool maintains directory structures throughout conversions, making it easy to process batches of images or texts while preserving their organization.
.
├── data/
│   ├── real/               # Source images directory
│   ├── text/               # Generated text descriptions directory
│   └── output/             # Generated images directory
├── utils/
│   ├── download_image.py
│   ├── text_to_image.py
│   └── ...
├── main.py                 # Main execution script
├── keys.json               # API keys configuration
└── README.md
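As a rough illustration of how directory structure can be preserved between these folders, the sketch below maps a source file to the same relative location under another root. The helper name mirror_path and the .txt suffix are assumptions made for this example; the actual logic in main.py and utils/ may differ.

```python
from pathlib import Path


def mirror_path(src: Path, src_root: Path, dst_root: Path, suffix: str) -> Path:
    """Map src (located under src_root) to the same relative path under dst_root."""
    # relative_to raises ValueError if src is not inside src_root
    relative = src.relative_to(src_root)
    # Keep the sub-folders and file stem, swap only the extension
    return (dst_root / relative).with_suffix(suffix)


example = mirror_path(
    Path("data/real/cats/tabby.jpg"), Path("data/real"), Path("data/text"), ".txt"
)
print(example.as_posix())  # data/text/cats/tabby.txt
```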
The application uses a configuration dictionary in main.py with the following parameters:
config = {
    "override_text_prompt": False,     # Whether to override existing text files
    "override_output_image": True,     # Whether to override existing image files
    "real_image_path": "./data/real",  # Path to the source images
    "text_image_path": "./data/text",  # Path to the generated text descriptions
    "output_path": "./data/output",    # Path to the generated images
    "text_prompt": "What is the main content of the image? Please generate a detailed prompt to create an image, following the format of artistic style + subject description, for example: ..."
}
You can modify these settings according to your needs:
- Set override_text_prompt to True to regenerate existing text descriptions
- Set override_output_image to True to regenerate existing images
- Customize the text_prompt to get different types of image descriptions
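One way to change these settings without editing the dictionary by hand is to apply overrides to a copy of the defaults. The following is only a sketch: DEFAULT_CONFIG and with_overrides are hypothetical names, and the text_prompt value is left as an empty placeholder because the real prompt lives in main.py.

```python
DEFAULT_CONFIG = {
    "override_text_prompt": False,
    "override_output_image": True,
    "real_image_path": "./data/real",
    "text_image_path": "./data/text",
    "output_path": "./data/output",
    "text_prompt": "",  # placeholder; the real prompt is defined in main.py
}


def with_overrides(base: dict, **overrides) -> dict:
    """Return a copy of base with overrides applied, rejecting unknown keys."""
    unknown = set(overrides) - set(base)
    if unknown:
        # Catch typos such as "overide_text_prompt" early
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**base, **overrides}


# Regenerate text descriptions even if they already exist
config = with_overrides(DEFAULT_CONFIG, override_text_prompt=True)
```

Rejecting unknown keys is a design choice for this sketch: a misspelled setting fails loudly instead of being silently ignored.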
Install required packages using pip:
# Install the volcengine SDK for ARK runtime
pip install -U 'volcengine-python-sdk[ark]'
# Install the volcengine Python SDK
pip install --user volcengine
You need three sets of credentials for this application, corresponding to Volcengine, ARK, and Aliyun OSS. Create a keys.json file with the following structure. The oss endpoint is the region endpoint, for example oss-cn-beijing.aliyuncs.com:
{
    "oss": {
        "access_key_id": "aliyun_access_key_id",
        "access_key_secret": "aliyun_access_key_secret",
        "bucket_name": "aliyun_bucket_name",
        "endpoint": "aliyun_endpoint"
    },
    "ark": {
        "api_key": "volc_ark_api_key"
    },
    "volc": {
        "ak": "volc_ak",
        "sk": "volc_sk"
    }
}
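- ark: get the API key for the doubao-1-5-vision-pro-32k model from the Volcengine ARK Console.
- volc: get the access key (ak) and secret key (sk) from the Volcengine IAM Console.
- oss: get the access key ID, access key secret, bucket name, and endpoint from the Aliyun OSS Console.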
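Before running the tool, it can help to check that keys.json contains every entry listed above. The sketch below is an assumption about how such a check might look; validate_keys and load_keys are hypothetical helpers, not functions from this repository.

```python
import json
from pathlib import Path

# Sections and fields taken from the keys.json structure shown above
REQUIRED_KEYS = {
    "oss": ("access_key_id", "access_key_secret", "bucket_name", "endpoint"),
    "ark": ("api_key",),
    "volc": ("ak", "sk"),
}


def validate_keys(keys: dict) -> list[str]:
    """Return dotted names of missing or empty entries (empty list means OK)."""
    missing = []
    for section, fields in REQUIRED_KEYS.items():
        values = keys.get(section)
        if not isinstance(values, dict):
            values = {}
        for field in fields:
            if not values.get(field):
                missing.append(f"{section}.{field}")
    return missing


def load_keys(path: str = "keys.json") -> dict:
    """Read keys.json and fail early if any required entry is absent."""
    keys = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = validate_keys(keys)
    if missing:
        raise ValueError("keys.json is missing: " + ", ".join(missing))
    return keys
```

Because json.loads rejects comments, a keys.json file that contains // annotations will fail to parse; keep notes outside the file.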
- Ensure your images are placed in the ./data/real directory with any folder structure you want to maintain
- Run the main script:
python main.py
- You'll be prompted to choose an action:
  - Option 1: Generate text descriptions from images
  - Option 2: Generate images from text descriptions
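The prompt in the last step could be implemented along the following lines. This is a sketch under assumptions: the prompt wording, the ACTIONS mapping, and the choose_action name are illustrative and may not match main.py.

```python
from typing import Callable

# Hypothetical mapping from menu choice to action name
ACTIONS = {"1": "image_to_text", "2": "text_to_image"}


def choose_action(read: Callable[[str], str] = input) -> str:
    """Ask until the user enters 1 or 2, then return the matching action name."""
    while True:
        choice = read("Choose an action (1: image -> text, 2: text -> image): ").strip()
        if choice in ACTIONS:
            return ACTIONS[choice]
        print("Please enter 1 or 2.")
```

Passing the reader as a parameter keeps the function testable without a terminal; calling choose_action() with no arguments falls back to the built-in input.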
Image to Text Conversion:
- Images from ./data/real will be processed
- Text descriptions will be saved to ./data/text with the same directory structure
- Files will be skipped if they exist and override_text_prompt is False
Text to Image Conversion:
- Text files from ./data/text will be processed
- Generated images will be saved to ./data/output with the same directory structure
- Files will be skipped if they exist and override_output_image is False
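Both conversions follow the same skip rule, which can be sketched as a generator that lists the files still needing work. The function name pending_jobs and the target suffixes are assumptions for illustration, and the sketch does not filter by file type; the repository's own code may handle these details differently.

```python
from pathlib import Path
from typing import Iterator, Tuple


def pending_jobs(
    src_root: Path, dst_root: Path, dst_suffix: str, override: bool
) -> Iterator[Tuple[Path, Path]]:
    """Yield (source, target) pairs that still need processing."""
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        # Same relative location under dst_root, with the new extension
        target = (dst_root / src.relative_to(src_root)).with_suffix(dst_suffix)
        if target.exists() and not override:
            continue  # existing result is kept when the override flag is False
        yield src, target
```

For image to text, this would be called with ./data/real, ./data/text, and override_text_prompt; for text to image, with ./data/text, ./data/output, and override_output_image.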