Latest AI Research From Intel Explains an Alternative Approach to Train Deep Learning Models for Fast-Paced Real World Use Cases, Across a Variety of Industries

Object detection covers the techniques for locating, identifying, and classifying objects in an image. Deep learning and advances in image processing have driven rapid progress in the field, and several families of models (R-CNN, YOLO, etc.) are now widely used for the task. However, most existing methods in the literature are tuned to their training dataset and fail to generalize to images from other domains.

Although most architectures are optimized for well-known benchmarks, CNNs have also achieved significant results on domain-specific tasks. However, these domain-specific solutions are often finely tuned for a specific target dataset, starting with a carefully chosen architecture and training techniques. The drawback of this approach is that the methods become unnecessarily tailored to one dataset. To address this issue, a research team from Intel proposes a different strategy, which also serves as the foundation of the Intel® Geti™ platform: a dataset-agnostic template for object detection training, made up of carefully selected pre-trained models and a robust training pipeline for further training.

The authors experimented with architectures in three categories (lightweight, medium, and highly accurate) to define a set of models that covers object detection datasets regardless of complexity and object size. Pretrained weights are used so that models converge quickly and start from high accuracy. Training images are augmented with random crops, horizontal flips, and brightness and color distortions. Multiscale training is applied to the medium and accurate models to make them more robust. To balance accuracy and complexity, the authors empirically selected a specific input resolution for each model after several trials. Finally, an adaptive ReduceOnPlateau scheduler and early stopping are used: when several epochs pass without further improvement, the learning rate is reduced, and training is eventually ended.
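
The interplay between a reduce-on-plateau schedule and early stopping can be illustrated with a minimal, framework-free Python sketch. This is not the authors' implementation: the class name `PlateauController`, the helper `run`, and every numeric value (reduction factor, patience settings, improvement threshold, starting learning rate) are hypothetical and chosen only for illustration. The sketch assumes a validation metric where higher is better, such as mAP.

```python
from dataclasses import dataclass


@dataclass
class PlateauController:
    """Tracks a validation metric (higher is better) and reacts to plateaus.

    All default values below are illustrative assumptions, not the
    settings used in the Intel paper.
    """
    lr: float
    lr_factor: float = 0.1    # multiply lr by this factor on a plateau
    lr_patience: int = 2      # stale epochs between learning-rate reductions
    stop_patience: int = 5    # stale epochs before training is stopped
    min_delta: float = 1e-4   # smallest gain that counts as an improvement
    best: float = float("-inf")
    stale_epochs: int = 0

    def step(self, metric: float) -> bool:
        """Record one epoch's metric; return True if training should stop."""
        if metric > self.best + self.min_delta:
            # Real improvement: remember it and reset the plateau counter.
            self.best = metric
            self.stale_epochs = 0
            return False
        self.stale_epochs += 1
        # Lower the learning rate every `lr_patience` epochs without improvement.
        if self.stale_epochs % self.lr_patience == 0:
            self.lr *= self.lr_factor
        # Stop once the plateau has lasted `stop_patience` epochs.
        return self.stale_epochs >= self.stop_patience


def run(metrics, lr=0.01):
    """Feed per-epoch metrics to the controller.

    Returns (last epoch run, final learning rate, best metric seen).
    """
    ctrl = PlateauController(lr=lr)
    for epoch, metric in enumerate(metrics, start=1):
        if ctrl.step(metric):
            return epoch, ctrl.lr, ctrl.best
    return len(metrics), ctrl.lr, ctrl.best


if __name__ == "__main__":
    # A metric that improves for three epochs and then stalls.
    history = [0.30, 0.40, 0.45] + [0.45] * 7
    print(run(history))
```

With these assumed settings, the learning rate is cut twice during the plateau before training stops, so the model gets a chance to improve at a lower learning rate before the run is abandoned.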
