The SSD paper details its random-crop data augmentation scheme as:
Data augmentation To make the model more robust to various input object sizes and shapes, each training image is randomly sampled by one of the following options: – Use the entire original input image. – Sample a patch so that the minimum jaccard overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9. – Randomly sample a patch. The size of each sampled patch is [0.1, 1] of the original image size, and the aspect ratio is between 1 and 2. We keep the overlapped part of the ground truth box if the center of it is in the sampled patch. After the aforementioned sampling step, each sampled patch is resized to fixed size and is horizontally flipped with probability of 0.5, in addition to applying some photo-metric distortions similar to those described in . https://arxiv.org/pdf/1512.02325.pdf
My question is: what is the reasoning for resizing crops that range in aspect ratios between 0.5 and 2.0?
For instance if your input image is 300x300, reshaping a crop with AR=2.0 back to square resolution will severely stretch objects (square features become rectangular, circles become ellipses, etc.) I understand small distortions may be good to improve generalization, but training the network on objects distorted up to 2x in either dimension seems counter-productive. Am I misunderstanding how random-crop works?