Augmentation with Hydra
Why We Need Hydra for Augmentation
So, how do you write augmentation code in your dataset loader? Most likely, something like this:
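```python
from torch.utils.data import Dataset
from torchvision import transforms


class MyDataset(Dataset):
    def __init__(self, data_dir: str):
        ...
        # The augmentation pipeline is hard-coded inside the dataset class
        self.augmentations = transforms.Compose([
            transforms.CenterCrop(224),
            transforms.RandomAutocontrast(p=0.45),
            transforms.ToTensor(),  # PIL image -> tensor; must come before Normalize
            # Normalize needs per-channel mean/std (ImageNet statistics as an example)
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])
        ...
```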
If you don't write it like this, you can probably skip this article. If you do, imagine that you want to add or replace one of the augmentation functions. Of course, you would just edit the code directly. So what's so hard about that?
Now imagine that you want to tune the augmentations for your dataset. You would keep going back and forth between editing the dataset code and retraining the model. That gets tiring quickly.
To summarize, we have two problems: first, we have to change our code whenever we want to add or replace an augmentation function; second, we have to go back and forth between the code and training runs whenever we want to tune the augmentations.
This is where Hydra comes to the rescue. With Hydra, you can add or replace augmentation functions without changing your code, because the augmentations are defined in a `.yaml` config file instead of your `.py` (dataset code) file. For tuning, you can run multiple experiments just by changing Hydra config values. It's very convenient because your code doesn't change at all.
How to Use Hydra for Augmentation
You can follow along with my GitHub repo ruhyadi/Augmentation-Hydra. To use Hydra in your project, first install the requirements:
```shell
pip install torch torchvision          # for augmentation
pip install hydra-core hydra-colorlog  # for Hydra configs
pip install matplotlib                 # for plotting the results
```
Next, create a `configs` directory containing the `.yaml` configs that Hydra will load. This tutorial also uses a `src` directory for the augmentation code, alongside a few supporting files. The project layout looks like this:
```text
.
├── configs
│   ├── augmentation
│   │   └── transforms.yaml
│   ├── experiment
│   │   └── experiment_01.yaml
│   └── main.yaml
├── src
│   └── main.py
└── README.md
```
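If you prefer to script the setup, the layout above can be created with a few lines of standard-library Python. This is only a convenience sketch: the helper `scaffold` is hypothetical and not part of the repo, and it creates empty placeholder files that you then fill with the contents shown in this article.

```python
import tempfile
from pathlib import Path
from typing import List

# Paths from the layout above; files are created empty and filled in later
LAYOUT = [
    "configs/augmentation/transforms.yaml",
    "configs/experiment/experiment_01.yaml",
    "configs/main.yaml",
    "src/main.py",
    "README.md",
]


def scaffold(root: Path) -> List[Path]:
    """Create the directories and empty placeholder files under `root`."""
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)  # e.g. configs/augmentation/
        path.touch(exist_ok=True)  # creates an empty file; existing content is kept
        created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; call scaffold(Path(".")) to set up a real project
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp)):
            print(path.relative_to(tmp))
```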
The `src/main.py` file contains the main augmentation code, which loads its configuration from `configs/main.yaml`. Loading configuration files is Hydra's job: we mark the entry point with the Python decorator `@hydra.main(...)`. In brief, `src/main.py` contains the following code:
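```python
from typing import List
import hydra
import torch
from omegaconf import DictConfig
from PIL import Image
from torchvision import transforms


# version_base=None requires Hydra >= 1.2; config_path is relative to this file
@hydra.main(version_base=None, config_path='../configs/', config_name='main.yaml')
def main(config: DictConfig):
    # Resolve against the launch directory in case Hydra changes the working directory
    orig_img = Image.open(hydra.utils.to_absolute_path('src/astronaut.jpg'))
    # Instantiate every node under `augmentation` that declares a `_target_`
    augmentation: List[torch.nn.Module] = []
    if "augmentation" in config:
        for _, conf in config.augmentation.items():
            if "_target_" in conf:
                augmentation.append(hydra.utils.instantiate(conf))
    # Chained pipeline, e.g. for use inside a Dataset
    augmentation_compose = transforms.Compose(augmentation)

    # Perform augmentation example: apply each augmentation to the original image
    plot([aug(orig_img) for aug in augmentation])


def plot(imgs, with_orig=True, row_title=None, **imshow_kwargs):
    # code borrowed from https://pytorch.org/vision/stable/transforms.html
    ...


if __name__ == "__main__":
    main()
```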
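Note that the script plots each augmentation applied *separately* to the original image, while `augmentation_compose` chains them so every transform receives the previous one's output (that chained pipeline is what you would plug into a dataset). The difference is easy to see with plain functions; in this sketch, `compose`, `halve`, and `shrink_by_ten` are hypothetical stand-ins, not torchvision code.

```python
from typing import Callable, List

Transform = Callable[[int], int]


def compose(transforms: List[Transform]) -> Transform:
    """Chain transforms left to right, the way a Compose pipeline does."""
    def pipeline(x: int) -> int:
        for transform in transforms:
            x = transform(x)  # each step receives the previous step's output
        return x
    return pipeline


# Hypothetical toy "augmentations" acting on an image size instead of an image
def halve(size: int) -> int:
    return size // 2


def shrink_by_ten(size: int) -> int:
    return size - 10


augmentation = [halve, shrink_by_ten]
separately = [aug(224) for aug in augmentation]  # like the list passed to plot()
chained = compose(augmentation)(224)             # like augmentation_compose

print(separately)  # [112, 214]
print(chained)     # 102
```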
Meanwhile, `main.yaml` holds a `defaults` list. Its `augmentation` entry has the value `transforms.yaml`, which refers to `configs/augmentation/transforms.yaml`, so `main.yaml` loads `transforms.yaml` as a sub-config. This is the common way to write Hydra configs. The `transforms.yaml` file itself contains:
```yaml
resize:
  _target_: torchvision.transforms.Resize
  size: 50
random_crop:
  _target_: torchvision.transforms.RandomCrop
  size: 50
random_perspective:
  _target_: torchvision.transforms.RandomPerspective
  distortion_scale: 0.75
  p: 1.0
```
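Together with the `augmentation: transforms.yaml` entry in `main.yaml`'s defaults, this file ends up under the `augmentation` key of the final config, which is why `main.py` reads it as `config.augmentation`. The sketch below models that composition step with plain dictionaries, assuming the YAML is already parsed. `compose_defaults` and `CONFIG_GROUPS` are hypothetical names, and real Hydra composition also handles `_self_`, packages, and overrides, which are left out here.

```python
import copy
from typing import Any, Dict, List

# Parsed contents of configs/augmentation/transforms.yaml, keyed by group and option
CONFIG_GROUPS: Dict[str, Dict[str, Dict[str, Any]]] = {
    "augmentation": {
        "transforms.yaml": {
            "resize": {"_target_": "torchvision.transforms.Resize", "size": 50},
            "random_crop": {"_target_": "torchvision.transforms.RandomCrop", "size": 50},
            "random_perspective": {
                "_target_": "torchvision.transforms.RandomPerspective",
                "distortion_scale": 0.75,
                "p": 1.0,
            },
        },
    },
}


def compose_defaults(defaults: List[Dict[str, str]]) -> Dict[str, Any]:
    """Copy each selected group option into the final config under its group name."""
    config: Dict[str, Any] = {}
    for entry in defaults:
        for group, option in entry.items():
            config[group] = copy.deepcopy(CONFIG_GROUPS[group][option])
    return config


# The `augmentation: transforms.yaml` entry of main.yaml's defaults list
config = compose_defaults([{"augmentation": "transforms.yaml"}])
print(list(config["augmentation"]))  # ['resize', 'random_crop', 'random_perspective']
```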
The `resize`, `random_crop`, and `random_perspective` entries are torchvision augmentation transforms. Hydra locates each one through its `_target_` key, and the other keys are passed as its parameters, such as `size` for `resize` and `random_crop`, and so on. In Python, each transform is created with Hydra's `hydra.utils.instantiate(conf)`. Since we have more than one augmentation, we loop over them with `config.augmentation.items()`. For more details, see `main.py`.
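To make `_target_` less magical, here is a simplified model of what `hydra.utils.instantiate` does with a node like `resize`: import the object named by `_target_` and call it with the remaining keys as keyword arguments. This is a sketch only; the hypothetical `simple_instantiate` ignores Hydra features such as recursive instantiation, `_args_`, and `_partial_`, and it uses standard-library targets so it runs without torchvision.

```python
import importlib
from typing import Any, Dict


def simple_instantiate(node: Dict[str, Any]) -> Any:
    """Import the callable named by `_target_` and call it with the other keys."""
    params = dict(node)  # copy, so the caller's node keeps its `_target_`
    module_path, _, attr_name = params.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(**params)  # remaining keys become keyword arguments


# With torchvision installed, {"_target_": "torchvision.transforms.Resize", "size": 50}
# would give an object equivalent to transforms.Resize(size=50).
delta = simple_instantiate({"_target_": "datetime.timedelta", "days": 1, "hours": 6})
print(delta)  # 1 day, 6:00:00
```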
Changing Augmentation Functions
This addresses the first problem from the beginning of the article: with Hydra, changing the augmentation functions only means editing the YAML configs (for example `transforms.yaml`) instead of the Python code. Easy, isn't it? It can also be done another way, from the terminal, by overriding config values on the command line:
```shell
python src/main.py \
    augmentation.resize.size=100 \
    augmentation.random_crop.size=75 \
    augmentation.random_perspective.distortion_scale=0.5

python src/main.py \
    augmentation.resize.size=150 \
    augmentation.random_crop.size=125 \
    augmentation.random_perspective.distortion_scale=0.35
```
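Each override such as `augmentation.resize.size=100` is a dot-separated path into the composed config followed by a new value. The sketch below applies overrides of that form to a plain nested dict. `apply_overrides` and `parse_value` are hypothetical helpers that only understand numbers and plain strings; Hydra's real override grammar is richer (for example `+key=value` to add a new key, `~key` to delete one, and comma-separated sweeps with `--multirun`).

```python
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Interpret a value as int, then float, else keep it as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Set `a.b.c=value` entries in place; every key on the path must already exist."""
    for override in overrides:
        dotted_key, _, raw_value = override.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node[key]  # KeyError for unknown keys
        if leaf not in node:
            # Hydra likewise rejects unknown keys unless the override starts with `+`
            raise KeyError(f"{dotted_key} is not in the config")
        node[leaf] = parse_value(raw_value)
    return config


config = {
    "augmentation": {
        "resize": {"size": 50},
        "random_crop": {"size": 50},
        "random_perspective": {"distortion_scale": 0.75, "p": 1.0},
    }
}
apply_overrides(config, [
    "augmentation.resize.size=100",
    "augmentation.random_crop.size=75",
    "augmentation.random_perspective.distortion_scale=0.5",
])
print(config["augmentation"]["random_perspective"])  # {'distortion_scale': 0.5, 'p': 1.0}
```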
Experimenting with Augmentation Functions
This addresses the second problem. With Hydra, we can override any of the augmentation settings in `transforms.yaml` without changing that file, by using an experiment config. This tutorial's example is `experiment_01.yaml`, which contains:
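```yaml
# @package _global_
# (the line above places this file's keys at the root of the config)

defaults:
  - override /augmentation: transforms.yaml

# override parameters
augmentation:
  resize:
    size: 25
  random_crop:
    size: 15
  random_perspective:
    distortion_scale: 0.85
```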
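Conceptually, the `augmentation:` block in `experiment_01.yaml` is layered on top of the values loaded from `transforms.yaml`: keys it mentions get new values, and everything else (each `_target_`, and `p` for `random_perspective`) is kept. The sketch below models that with a recursive dictionary merge. `deep_merge` is a hypothetical helper, and Hydra's actual merging (performed by OmegaConf) has further rules; for instance, lists are replaced rather than merged.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` values win, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists are replaced
    return merged


# Values from transforms.yaml and experiment_01.yaml in this article
transforms_cfg = {
    "resize": {"_target_": "torchvision.transforms.Resize", "size": 50},
    "random_crop": {"_target_": "torchvision.transforms.RandomCrop", "size": 50},
    "random_perspective": {
        "_target_": "torchvision.transforms.RandomPerspective",
        "distortion_scale": 0.75,
        "p": 1.0,
    },
}
experiment_01 = {
    "resize": {"size": 25},
    "random_crop": {"size": 15},
    "random_perspective": {"distortion_scale": 0.85},
}

result = deep_merge(transforms_cfg, experiment_01)
print(result["random_perspective"])
# {'_target_': 'torchvision.transforms.RandomPerspective', 'distortion_scale': 0.85, 'p': 1.0}
```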
We can then run the experiment with the following command, without changing the base config file (`transforms.yaml`). Since the base config stays untouched, this is safe and convenient for experimentation.