ControlNet adds an extra layer of control to Stable Diffusion. The official implementation supports conditioning generated images on inputs such as depth maps, edge maps, and OpenPose skeletons. In this project, we aimed to condition image generation on brightness (grayscale), enabling use cases such as colorizing old photos and recoloring existing images.
This article will document and introduce the process of training a Brightness ControlNet using HuggingFace Diffusers.
Data source: the laion2B-en-aesthetic subset of LAION-2B.
Downloading data:
```python
from img2dataset import download
import multiprocessing

def main():
    download(
        processes_count=16,
        thread_count=64,
        url_list="laion2B-en-aesthetic",  # local parquet shards of the dataset
        resize_mode="center_crop",
        image_size=512,
        output_folder="../laion-en-aesthetic",
        output_format="files",
        input_format="parquet",
        skip_reencode=True,
        save_additional_columns=["similarity", "hash", "punsafe", "pwatermark", "aesthetic"],
        url_col="URL",
        caption_col="TEXT",
        distributor="multiprocessing",
    )

if __name__ == "__main__":
    multiprocessing.freeze_support()
    main()
```
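With the `files` output format, img2dataset writes numbered shard subfolders containing matching `.jpg`/`.txt` pairs. A quick sanity check before building the dataset might count what actually arrived (a sketch; the helper name is ours, and the directory is the `output_folder` from above):

```python
from pathlib import Path

def count_pairs(data_dir: Path) -> tuple[int, int]:
    """Count downloaded images and captions across all shard subfolders."""
    images = sum(1 for _ in data_dir.rglob("*.jpg"))
    captions = sum(1 for _ in data_dir.rglob("*.txt"))
    return images, captions
```

Since downloads fail for dead or blocked URLs, expect the image count to come in below the number of URLs in the parquet shards.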
Building a HuggingFace Dataset, saving it locally, and pushing it to the Hub:
```python
import os
from pathlib import Path

from datasets import Dataset
from PIL import Image

data_dir = Path(r"H:\DataScience\laion-en-aesthetic")

def entry_for_id(image_folder, filename):
    img = Image.open(image_folder / filename)
    gray_img = img.convert('L')  # luminance channel, used as the conditioning image
    caption_filename = filename.replace('.jpg', '.txt')
    with open(image_folder / caption_filename) as f:
        caption = f.read()
    return {
        "image": img,
        "grayscale_image": gray_img,
        "caption": caption,
    }

max_images = 1000000

def generate_entries():
    index = 0
    # All shard subfolders written by img2dataset
    image_folders = [f.path for f in os.scandir(data_dir) if f.is_dir()]
    for image_folder in image_folders:
        image_folder = Path(image_folder)
        print(image_folder)
        # All image files in the shard subfolder
        for filename in os.listdir(image_folder):
            if not filename.endswith('.jpg'):
                continue
            yield entry_for_id(image_folder, filename)
            index += 1
            if index >= max_images:
                break
        if index >= max_images:
            break

ds = Dataset.from_generator(generate_entries, cache_dir="./.cache")
ds.save_to_disk("./grayscale_image_aesthetic_1M")
ds.push_to_hub('ioclab/grayscale_image_aesthetic_1M', private=True)
```
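The conditioning image here is simply the luminance channel of the target image. Pillow's `convert('L')` applies the ITU-R 601-2 luma transform; a minimal sketch of the per-pixel formula (the function name is ours):

```python
def luminance(r: int, g: int, b: int) -> int:
    # ITU-R 601-2 luma transform, as documented for PIL's Image.convert('L'):
    # L = (299*R + 587*G + 114*B) / 1000
    return (299 * r + 587 * g + 114 * b) // 1000
```

Pure white maps to 255, pure black to 0, and a saturated red to 76, which is why red regions come out darker than green ones in the conditioning images.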
Using the ControlNet training example script for training, with the following parameters:
```shell
accelerate launch train_controlnet_local.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --output_dir="./output_v1a2u" \
  --dataset_name="./grayscale_image_aesthetic_100k" \
  --resolution=512 \
  --learning_rate=1e-5 \
  --image_column=image \
  --caption_column=caption \
  --conditioning_image_column=grayscale_image \
  --train_batch_size=16 \
  --gradient_accumulation_steps=4 \
  --num_train_epochs=2 \
  --tracker_project_name="control_v1a2u_sd15_brightness" \
  --enable_xformers_memory_efficient_attention \
  --checkpointing_steps=5000 \
  --hub_model_id="ioclab/grayscale_controlnet" \
  --report_to wandb \
  --push_to_hub
```
Wandb backend data:
A6000 GPU training duration: 13 h, sample count: 100k, epochs: 1, batch size: 16, gradient accumulation steps: 1.
TPU v4-8 training duration: 25 h, sample count: 3M, epochs: 1, batch size: 2, gradient accumulation steps: 25.
Google's TPU v4-8 machine comes with a 240-core (480-thread) CPU, 400 GB RAM, 128 GB TPU memory, 2,000 Mbps bandwidth, and a 3 TB disk.
A rough calculation shows that a TPU v4-8 in bf16 delivers roughly a 15x throughput improvement over a single A6000 in fp32.
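The 15x figure follows directly from the two runs above:

```python
# Throughput implied by the two training runs logged above.
a6000_imgs_per_hour = 100_000 / 13     # fp32, single A6000: ~7,692 images/hour
tpu_imgs_per_hour = 3_000_000 / 25     # bf16, TPU v4-8: 120,000 images/hour
speedup = tpu_imgs_per_hour / a6000_imgs_per_hour
print(round(speedup, 1))  # 15.6
```

Note this compares wall-clock throughput of two differently configured runs (different batch sizes and accumulation steps), so it is an estimate, not a controlled benchmark.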
The "Sudden Convergence" phenomenon mentioned in the ControlNet paper was observed during training.
(The original post includes several images of training results and comparisons at this point.)
References:
Adding Conditional Control to Text-to-Image Diffusion Models: the ControlNet paper, containing principle explanations, training parameters, and comparison images.
Official ControlNet repository, including a training tutorial.
ControlNet 1.1 Nightly.
Train your ControlNet with diffusers 🧨: the official HuggingFace tutorial for training a ControlNet with Diffusers, very detailed.
HuggingFace ControlNet training script examples.
JAX/Diffusers community sprint 🧨: documentation for the HuggingFace × Google community sprint.