April learnings – Text Recognition and Multi Task Learning

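Multi Task Learning: soft parameter sharing vs hard parameter sharing. In hard sharing, all tasks use the same hidden layers and keep only task-specific output heads; in soft sharing, each task has its own model and the parameters are regularized to stay close to each other. Hard parameter sharing is the more commonly used approach and is generally preferred.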
Multi Task Learning with attention: See reference 1 below. Code is also available at: https://github.com/lorenmt/mtan
- https://shikun.io/projects/multi-task-attention-network (Project page)
Object Detection: One-stage (YOLO) versus two-stage detectors (R-CNN based)
Object detection: Labelling data
Adding constraints in a deep learning framework
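
To make the hard parameter sharing idea from the multi-task learning note above more concrete, here is a minimal pure-Python sketch: one shared trunk feeds a small head per task. The class name `HardSharedModel`, the task names `detect` and `recognize`, and the layer sizes are all made up for illustration. This is not the MTAN architecture from reference 1, and there is no training loop; it only shows where the sharing happens.

```python
import random


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) with small random weights and zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class HardSharedModel:
    """One trunk shared by every task, plus one output head per task."""

    def __init__(self, n_in, n_hidden, task_outputs, seed=0):
        rng = random.Random(seed)
        # Shared parameters: every task's prediction depends on these.
        self.trunk = init_layer(n_in, n_hidden, rng)
        # Task-specific parameters: one small head per task.
        self.heads = {name: init_layer(n_hidden, n_out, rng)
                      for name, n_out in task_outputs.items()}

    def forward(self, x):
        shared = relu(linear(*self.trunk, x))  # computed once for all tasks
        return {name: linear(*head, shared)
                for name, head in self.heads.items()}


# Hypothetical tasks: a 2-way "detect" output and a 5-way "recognize" output.
model = HardSharedModel(n_in=4, n_hidden=8,
                        task_outputs={"detect": 2, "recognize": 5})
outputs = model.forward([0.1, -0.2, 0.3, 0.4])
```

In a soft-sharing variant, each task would instead get its own trunk, and a penalty on the distance between the trunks' weights would be added to the loss.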

coco-annotator labelling tool: This week, I spent most of my time labelling data. It is possible to extract a subset of the categories from the labelled data.
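
As a rough sketch of extracting a subset of categories, the function below filters a COCO-style annotation dictionary (top-level `images`, `annotations` and `categories` lists) down to the named categories. The function name `filter_coco_categories` and the tiny example data are my own, and I am assuming the export follows the standard COCO detection layout; extra keys such as `info` or `licenses` are simply dropped here.

```python
def filter_coco_categories(coco, keep_names):
    """Keep only the given category names, plus the annotations and
    images that still refer to them."""
    keep_ids = {c["id"] for c in coco["categories"] if c["name"] in keep_names}
    annotations = [a for a in coco["annotations"]
                   if a["category_id"] in keep_ids]
    used_images = {a["image_id"] for a in annotations}
    return {
        "images": [im for im in coco["images"] if im["id"] in used_images],
        "annotations": annotations,
        "categories": [c for c in coco["categories"] if c["id"] in keep_ids],
    }


# Made-up example: two images, two categories, three annotations.
example = {
    "images": [{"id": 1, "file_name": "a.jpg"},
               {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 1, "name": "text"}, {"id": 2, "name": "logo"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [5, 5, 40, 12]},
        {"id": 11, "image_id": 2, "category_id": 2, "bbox": [0, 0, 20, 20]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [50, 8, 16, 16]},
    ],
}

subset = filter_coco_categories(example, {"text"})
```

For a real export, the dictionary would come from `json.load` on the annotation file, and the result can be written back with `json.dump`.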
Detectron2 training on a custom dataset
Training SOLOv2 AdelaiDet on a custom dataset:

Scene Text Detection: There are generally two phases to it – Text Detection and Text Recognition.
- For text detection, any available object detection model can be used. EAST is a dedicated scene text detector; general detectors such as Faster R-CNN and YOLO can also be used. Reference 2 reviews various methods.
- For text recognition, a few methods are available, such as Tesseract. Reference 2 reviews various methods.
End-to-end systems that handle both phases are also available.
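
The two-phase structure described above can be sketched as a small pipeline: a detector returns boxes, each box is cropped, and a recognizer turns each crop into a string. Everything here is a stand-in of my own: `spot_text`, `crop`, `fake_detect` and `fake_recognize` are hypothetical names, the image is a nested list rather than a real image array, and the fake stages only exist so the sketch runs. In practice the detector could be something like EAST and the recognizer something like Tesseract.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height)
Image = List[List[int]]           # grayscale image as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by `box` out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def spot_text(image: Image,
              detect: Callable[[Image], List[Box]],
              recognize: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Phase 1: find text boxes. Phase 2: read the text in each box."""
    results = []
    for box in detect(image):
        results.append((box, recognize(crop(image, box))))
    return results


def fake_detect(image: Image) -> List[Box]:
    # Stand-in detector: always reports the same two boxes.
    return [(0, 0, 2, 1), (1, 1, 2, 1)]


def fake_recognize(patch: Image) -> str:
    # Stand-in recognizer: returns the pixel sum as a string.
    return str(sum(sum(row) for row in patch))


image = [[0, 1, 2],
         [3, 4, 5]]
results = spot_text(image, fake_detect, fake_recognize)
```

Keeping the two stages as separate callables makes it easy to swap one detector or recognizer for another; an end-to-end system would replace both with a single model.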
How to label data for Scene Text detection?
- COCO Text detection dataset. See reference 1 and Figure 1 below.
COCO Annotator tool: This is a very easy-to-use tool for annotating images. I explored it this week and used it to annotate images.

https://vision.cornell.edu/se3/coco-text-2/ – COCO Text detection annotation format
Deep Learning Based OCR for Text in the Wild
OpenCV OCR and text recognition with Tesseract – tutorial with code (EAST + Tesseract)
(paper) ABCNet – end-to-end text spotting framework. ABCNet is an efficient end-to-end scene text spotting framework, reported to be over 10x faster than the previous state of the art. It was published at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020 as an oral paper.
CRAFT:
How to use COCO annotator – video walkthrough