Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning

Problem

Detection of text and identification of characters in scene images is a challenging visual recogniiton problem.

Related Work

Survey some related work in scene text recognition, as well as the machine learning and vision results that inform their basic approach.

Desription of the learning architecture used in their
experiments.

Experimental Results

Conclusion

Feature Learning

Feature extraction

Text detector training

Character classifier training

To solve this problem...

To solve this problem authors apply methods recently developed in machine learning - specifically, large-scale algorithms for learning the features and show that they allow them to construct highly effective classifiers for both detection and recognition to be used in a high accuracy end-to-end system.

Authors' system proceeds in several stages

Apply an unsupervised feature learning algorithm to a set of image patches harvested from the training data to learn a bank of image features.

Evaluate the features convolutionallty over the training images. Reduce the number of features using spatial pooling.

Train a linear classifier for either text detection or character recognition.

Authors present experimental results achuieved wuth the their system, demonstrating the impact of being able to train increasing numbers of features.

Specifically, for detection and character recognition, they trained their classifiers with increasing numbers of learned features and in each case evaluated the results on the test sets for text detection and character recognition.

81.7% ; 81.4% ; 85,5%.

Authors reach high accuracy, using developed system based on scalable feature learning algorithm.

Also they say that with more scalable and sophisticated feature learning algorithms currently being developed by machine learning researchers, it is possible thet the approaches pursued here might achieve performance well beyond what is possible through other methods that rely heavity on hand-coded prior knowledge.