April learnings – Text Recognition and Multi Task Learning

4/19/2021 – 4/25/2021

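  • Multi Task Learning: soft parameter sharing vs hard parameter sharing. Hard parameter sharing is generally preferred: it is the most commonly used approach in neural networks and reduces the risk of overfitting (see reference 5 below).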
  • Multi Task Learning with attention: See reference 1 below. Code is also available at: https://github.com/lorenmt/mtan
  • Object Detection: one-stage (YOLO) versus two-stage detectors (R-CNN based)
  • Object Detection: labelling data
  • Adding constraints in a deep learning framework
This figure is from reference 8 below
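
As a minimal sketch of the two sharing schemes in plain Python (no framework): hard sharing uses one trunk for all tasks with a small head per task, while soft sharing gives every task its own trunk and adds a penalty that keeps the trunks close. The layer sizes, the task names "depth" and "segmentation", the helper names, and the squared-distance penalty are all illustrative assumptions, not taken from any of the references.

```python
import random

def make_layer(n_in, n_out, rng):
    """Weight matrix with n_out rows of n_in small random values (no bias)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def linear(layer, x):
    """Matrix-vector product: one output per row of the layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in layer]

def relu(v):
    return [max(0.0, a) for a in v]

def count_params(*layers):
    return sum(len(row) for layer in layers for row in layer)

def squared_distance(a, b):
    """Sum of squared differences between two weight matrices of equal shape."""
    return sum((wa - wb) ** 2 for ra, rb in zip(a, b) for wa, wb in zip(ra, rb))

rng = random.Random(0)
x = [0.5, -1.0, 2.0, 0.1]  # a single 4-dimensional input

# One output head per (hypothetical) task: 1 value for depth, 3 for segmentation.
heads = {"depth": make_layer(8, 1, rng), "segmentation": make_layer(8, 3, rng)}

# Hard parameter sharing: a single trunk whose features feed every head.
trunk = make_layer(4, 8, rng)
features = relu(linear(trunk, x))  # computed once, reused by all tasks
hard_outputs = {task: linear(head, features) for task, head in heads.items()}
hard_params = count_params(trunk, *heads.values())

# Soft parameter sharing: one trunk per task, tied together by a penalty
# that would be added to the training loss.
soft_trunks = {task: make_layer(4, 8, rng) for task in heads}
soft_outputs = {
    task: linear(heads[task], relu(linear(soft_trunks[task], x))) for task in heads
}
penalty = squared_distance(soft_trunks["depth"], soft_trunks["segmentation"])
soft_params = count_params(*soft_trunks.values(), *heads.values())

print("hard sharing parameters:", hard_params)
print("soft sharing parameters:", soft_params)
print("soft sharing penalty:", penalty)
```

In this toy setup the trunk is duplicated per task under soft sharing, so the parameter count grows with the number of tasks, whereas hard sharing only adds a head per task.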

References

  1. (paper) End-to-End Multi-Task Learning with Attention – not yet read
  2. (paper) Multi-Task Deep Learning for Depth-based Person Perception in Mobile Robotics – not yet read
  3. (paper) QuadroNet: Multi-Task Learning for Real-Time Semantic Depth Aware Instance Segmentation – not yet read
  4. (paper) Learning to Segment Every Thing – partially read
    1. Code: https://github.com/ronghanghu/seg_every_thing
  5. An Overview of Multi-Task Learning in Deep Neural Networks (blog/paper – read)
  6. (paper) A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks (introduces proportional sampling for MTL)
  7. (video) Integrating Constraints into Deep Learning Architectures with Structured Layers
  8. (paper) A Survey on Multi-Task Learning
  9. (MUST read paper for Multi Task Learning) Multitask Learning
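
The proportional sampling mentioned for reference 6 can be sketched as follows: before each parameter update, one task is picked with probability proportional to the size of its dataset. The task names and dataset sizes are made up for illustration, and `sample_task` is a hypothetical helper, not code from the paper.

```python
import random
from collections import Counter

def sample_task(dataset_sizes, rng):
    """Pick a task with probability proportional to its dataset size."""
    tasks = list(dataset_sizes)
    weights = [dataset_sizes[task] for task in tasks]
    return rng.choices(tasks, weights=weights, k=1)[0]

# Hypothetical dataset sizes (number of training examples per task).
dataset_sizes = {"task_a": 8000, "task_b": 1500, "task_c": 500}
total = sum(dataset_sizes.values())

rng = random.Random(0)
n_updates = 10_000
counts = Counter(sample_task(dataset_sizes, rng) for _ in range(n_updates))

for task, size in dataset_sizes.items():
    expected = size / total
    observed = counts[task] / n_updates
    print(f"{task}: expected {expected:.2f}, observed {observed:.2f}")
```

In a training loop, the sampled task would decide which dataset the next batch is drawn from and which head receives the update.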

4/12/2021 – 4/18/2021

  • Ordinal labels in Machine Learning
  • Multi Task Learning
  • Transfer Learning
  • Is Faster R-CNN an example of Multi Task Learning?

References

  1. (paper) A Multi-Task Learning Model for Better Representation of Clothing Images
  2. (thesis) Multi Task Learning in Computer Vision
  3. (paper) A multitask deep learning model for real-time deployment in embedded systems
  4. (paper) Tackling ordinal regression problem for heterogeneous data: sparse and deep multi-task learning approaches

4/5/2021 – 4/11/2021

  • coco-annotator labelling tool: This week, I spent most of my time labelling data. It is possible to extract a subset of the categories from the labelled data.
  • Detectron2 training on a custom dataset
  • Training SOLOv2 (AdelaiDet) on a custom dataset
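
Extracting a subset of categories from a COCO-format annotation file could look roughly like this. It is a sketch that assumes the standard COCO layout (`images`, `annotations`, `categories`); `filter_coco_categories` is a hypothetical helper, and the tiny dataset below is made up for illustration.

```python
import json

def filter_coco_categories(coco, keep_names):
    """Return a copy of a COCO-format dict restricted to the named categories."""
    keep_ids = {c["id"] for c in coco["categories"] if c["name"] in keep_names}
    annotations = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
    used_images = {a["image_id"] for a in annotations}
    return {
        **coco,  # keep any other top-level keys (e.g. "info") unchanged
        "images": [im for im in coco["images"] if im["id"] in used_images],
        "annotations": annotations,
        "categories": [c for c in coco["categories"] if c["id"] in keep_ids],
    }

# Made-up example: two images, two categories, three annotations.
coco = {
    "images": [
        {"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "b.jpg", "width": 640, "height": 480},
    ],
    "categories": [{"id": 1, "name": "text"}, {"id": 2, "name": "logo"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 30]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [200, 50, 40, 40]},
        {"id": 3, "image_id": 2, "category_id": 2, "bbox": [15, 15, 60, 60]},
    ],
}

subset = filter_coco_categories(coco, {"text"})
print(json.dumps(subset, indent=2))
```

This version also drops images that have no remaining annotations. Keeping them as negative examples is an equally valid choice, depending on the training setup.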

References

  1. Breakdown of Detectron2 trainer code
  2. How do I scale SOLVER.STEPS with SOLVER.MAX_ITER – Detectron2
  3. Roboflow blog on using the Detectron2 framework
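
On the question in reference 2: in Detectron2, `SOLVER.STEPS` lists the iterations at which the learning rate is multiplied by `SOLVER.GAMMA`, so the steps have to move when `SOLVER.MAX_ITER` is shortened for a small custom dataset. One simple option, shown here as an assumption rather than as the answer given in the reference, is to keep the steps at the same fractions of the schedule. `scale_steps` is a hypothetical helper and the numbers are example values.

```python
def scale_steps(ref_steps, ref_max_iter, new_max_iter):
    """Place the LR-decay steps at the same fractions of the new schedule."""
    return tuple(round(step * new_max_iter / ref_max_iter) for step in ref_steps)

# Example reference schedule: decay at 60000 and 80000 out of 90000 iterations.
ref_steps, ref_max_iter = (60000, 80000), 90000

new_max_iter = 3000  # example value for a short run on a small dataset
new_steps = scale_steps(ref_steps, ref_max_iter, new_max_iter)
print(new_steps)

# With a Detectron2 config object these would be assigned as:
#   cfg.SOLVER.MAX_ITER = new_max_iter
#   cfg.SOLVER.STEPS = new_steps
```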

3/29/2021 – 4/4/2021

  • Scene Text Detection: There are generally two phases to it – text detection and text recognition.
    • For text detection, a dedicated text detector such as EAST or a general object detection model such as Faster R-CNN or YOLO can be used. Reference 2 reviews various methods.
    • For text recognition, there are a few methods available, such as Tesseract. Reference 2 reviews various methods.
  • End-to-end systems that handle both phases are also available (for example ABCNet, reference 4).
  • How to label data for Scene Text detection?
    • COCO Text detection dataset. See reference 1 and Figure 1 below.
  • coco-annotator tool: This is a very easy-to-use tool for annotating images. I explored it this week and used it to annotate images.
COCO Text dataset annotation example
COCO Text dataset example
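
The two-phase structure described above can be sketched as a small pipeline that takes the detector and the recognizer as interchangeable functions. Everything here is a toy stand-in: the "image" is a list of character rows, and `toy_detect` and `toy_recognize` are hypothetical placeholders for where a real detector (such as EAST) and a real recognizer (such as Tesseract) would be plugged in.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height

@dataclass
class TextInstance:
    box: Box
    text: str

def read_scene_text(image, detect, recognize):
    """Phase 1: find text boxes. Phase 2: recognize the text inside each box."""
    results = []
    for (x, y, w, h) in detect(image):
        crop = [row[x:x + w] for row in image[y:y + h]]
        results.append(TextInstance((x, y, w, h), recognize(crop)))
    return results

def toy_detect(image):
    """Toy detector: one box per row, spanning its non-space characters."""
    boxes = []
    for y, row in enumerate(image):
        stripped = row.strip()
        if stripped:
            boxes.append((row.index(stripped), y, len(stripped), 1))
    return boxes

def toy_recognize(crop):
    """Toy recognizer: the crop already holds the characters, so join them."""
    return "".join(crop)

# Toy "image": each string is one row of pixels, letters mark text regions.
image = [
    "          ",
    "  STOP    ",
    "          ",
    "     EXIT ",
]

results = read_scene_text(image, toy_detect, toy_recognize)
for instance in results:
    print(instance.box, instance.text)
```

An end-to-end system would replace both functions with a single model that outputs boxes and transcriptions together.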

References

  1. https://vision.cornell.edu/se3/coco-text-2/ – COCO Text detection annotation format
  2. Deep Learning Based OCR for Text in the Wild
  3. OpenCV OCR and text recognition with Tesseract – tutorial with code (EAST + Tesseract)
  4. (paper) ABCNet – end-to-end text spotting framework. ABCNet is an efficient end-to-end scene text spotting framework, reported to be over 10x faster than the previous state of the art. It was published at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020 as an oral paper.
  5. CRAFT:
  6. How to use COCO annotator – video walkthrough