ControlNet adds an extra layer of control to Stable Diffusion. The official implementation supports conditioning generated images on inputs such as depth maps, edge maps, and OpenPose skeletons. In this project, we aimed to condition image generation on brightness (grayscale), enabling use cases such as colorizing old photos and recoloring existing images.
This article will document and introduce the process of training a Brightness ControlNet using HuggingFace Diffusers.
Data source: the laion2B-en-aesthetic subset of LAION-2B.
Downloading data:
```python
from img2dataset import download
import multiprocessing

def main():
    download(
        processes_count=16,
        thread_count=64,
        url_list="laion2B-en-aesthetic",  # local parquet shards of the dataset
        resize_mode="center_crop",
        image_size=512,
        output_folder="../laion-en-aesthetic",
        output_format="files",
        input_format="parquet",
        skip_reencode=True,
        save_additional_columns=["similarity", "hash", "punsafe", "pwatermark", "aesthetic"],
        url_col="URL",
        caption_col="TEXT",
        distributor="multiprocessing",
    )

if __name__ == "__main__":
    multiprocessing.freeze_support()
    main()
```
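With the `files` output format, img2dataset writes numbered shard subfolders containing matching `.jpg`/`.txt` pairs. A quick sanity check before building the dataset might count what actually arrived (a sketch; the helper name is ours, and the directory is the `output_folder` from above):

```python
from pathlib import Path

def count_pairs(data_dir: Path) -> tuple[int, int]:
    """Count downloaded images and captions across all shard subfolders."""
    images = sum(1 for _ in data_dir.rglob("*.jpg"))
    captions = sum(1 for _ in data_dir.rglob("*.txt"))
    return images, captions
```

Since downloads fail for dead or blocked URLs, expect the image count to come in below the number of URLs in the parquet shards.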
Building a HuggingFace Dataset, saving it locally, and pushing it to the Hub:
```python
import os
from pathlib import Path

from datasets import Dataset
from PIL import Image

data_dir = Path(r"H:\DataScience\laion-en-aesthetic")

def entry_for_id(image_folder, filename):
    img = Image.open(image_folder / filename)
    gray_img = img.convert('L')  # luminance channel, used as the conditioning image
    caption_filename = filename.replace('.jpg', '.txt')
    with open(image_folder / caption_filename) as f:
        caption = f.read()
    return {
        "image": img,
        "grayscale_image": gray_img,
        "caption": caption,
    }

max_images = 1000000

def generate_entries():
    index = 0
    # All shard subfolders written by img2dataset
    image_folders = [f.path for f in os.scandir(data_dir) if f.is_dir()]
    for image_folder in image_folders:
        image_folder = Path(image_folder)
        print(image_folder)
        # All image files in the shard subfolder
        for filename in os.listdir(image_folder):
            if not filename.endswith('.jpg'):
                continue
            yield entry_for_id(image_folder, filename)
            index += 1
            if index >= max_images:
                break
        if index >= max_images:
            break

ds = Dataset.from_generator(generate_entries, cache_dir="./.cache")
ds.save_to_disk("./grayscale_image_aesthetic_1M")
ds.push_to_hub('ioclab/grayscale_image_aesthetic_1M', private=True)
```
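The conditioning image here is simply the luminance channel of the target image. Pillow's `convert('L')` applies the ITU-R 601-2 luma transform; a minimal sketch of the per-pixel formula (the function name is ours):

```python
def luminance(r: int, g: int, b: int) -> int:
    # ITU-R 601-2 luma transform, as documented for PIL's Image.convert('L'):
    # L = (299*R + 587*G + 114*B) / 1000
    return (299 * r + 587 * g + 114 * b) // 1000
```

Pure white maps to 255, pure black to 0, and a saturated red to 76, which is why red regions come out darker than green ones in the conditioning images.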
Using the ControlNet training example script for training, with the following parameters:
```shell
accelerate launch train_controlnet_local.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --output_dir="./output_v1a2u" \
  --dataset_name="./grayscale_image_aesthetic_100k" \
  --resolution=512 \
  --learning_rate=1e-5 \
  --image_column=image \
  --caption_column=caption \
  --conditioning_image_column=grayscale_image \
  --train_batch_size=16 \
  --gradient_accumulation_steps=4 \
  --num_train_epochs=2 \
  --tracker_project_name="control_v1a2u_sd15_brightness" \
  --enable_xformers_memory_efficient_attention \
  --checkpointing_steps=5000 \
  --hub_model_id="ioclab/grayscale_controlnet" \
  --report_to wandb \
  --push_to_hub
```
Wandb backend data:
A6000 GPU training duration: 13 h, sample count: 100k, epochs: 1, batch size: 16, gradient accumulation steps: 1.
TPU v4-8 training duration: 25 h, sample count: 3M, epochs: 1, batch size: 2, gradient accumulation steps: 25.
Google's TPU v4-8 machine comes with a 240-core (480-thread) CPU, 400 GB RAM, 128 GB TPU memory, 2,000 Mbps bandwidth, and a 3 TB disk.
A rough calculation shows that a TPU v4-8 in bf16 delivers roughly a 15x throughput improvement over a single A6000 in fp32.
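The 15x figure follows directly from the two runs above:

```python
# Throughput implied by the two training runs logged above.
a6000_imgs_per_hour = 100_000 / 13     # fp32, single A6000: ~7,692 images/hour
tpu_imgs_per_hour = 3_000_000 / 25     # bf16, TPU v4-8: 120,000 images/hour
speedup = tpu_imgs_per_hour / a6000_imgs_per_hour
print(round(speedup, 1))  # 15.6
```

Note this compares wall-clock throughput of two differently configured runs (different batch sizes and accumulation steps), so it is an estimate, not a controlled benchmark.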
The "Sudden Convergence" phenomenon mentioned in the ControlNet paper was observed during training.
(The original post includes several images of training results and comparisons at this point.)
References:
Adding Conditional Control to Text-to-Image Diffusion Models: the ControlNet paper, containing principle explanations, training parameters, and comparison images.
Official ControlNet repository, including a training tutorial.
ControlNet 1.1 Nightly.
Train your ControlNet with diffusers 🧨: the official HuggingFace tutorial for training a ControlNet with Diffusers, very detailed.
HuggingFace ControlNet training script examples.
JAX/Diffusers community sprint 🧨: documentation for the HuggingFace × Google community sprint.