Adam Brzeski, Kamil Grinholc, Kamil Nowodworski, Adam Przybyłek
https://link.springer.com/chapter/10.1007/978-3-030-28957-7_1
In this paper, we evaluated a set of potential improvements to the successful Attention-OCR architecture, which is designed to predict multiline text from unconstrained scenes in real-world images. We investigated the impact of several optimizations on the model's accuracy, including employing dynamic RNNs (Recurrent Neural Networks), scheduled sampling, BiLSTM (Bidirectional Long Short-Term Memory), and a modified attention model. BiLSTM was found to slightly increase accuracy, while dynamic RNNs and a simpler attention model provided a significant reduction in training time with only a slight decline in accuracy.
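For context, below is a minimal sketch of two of the evaluated ideas, written in a PyTorch style (the original Attention-OCR is TensorFlow-based, so this is not the authors' code): a BiLSTM encoder over flattened CNN features, and the per-step choice made by scheduled sampling. `BiLSTMEncoder`, `scheduled_sampling_step`, and all dimensions are illustrative assumptions.

```python
import random

import torch
import torch.nn as nn


class BiLSTMEncoder(nn.Module):
    """Illustrative BiLSTM encoder over a flattened CNN feature map.

    Feature and hidden sizes are placeholders, not the paper's values.
    """

    def __init__(self, feat_dim=288, hidden=256):
        super().__init__()
        # bidirectional=True lets each position attend to both left and
        # right context, which is the property credited with the small
        # accuracy gain reported in the abstract.
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True,
                           bidirectional=True)

    def forward(self, cnn_features):
        # cnn_features: (batch, seq_len, feat_dim)
        outputs, _ = self.rnn(cnn_features)
        return outputs  # (batch, seq_len, 2 * hidden)


def scheduled_sampling_step(prev_target, prev_prediction, sampling_prob):
    """One decoding step of scheduled sampling (Bengio et al., 2015).

    With probability sampling_prob, feed the decoder its own previous
    prediction instead of the ground-truth token, so training gradually
    resembles inference.
    """
    if random.random() < sampling_prob:
        return prev_prediction
    return prev_target


# Illustrative usage: encode a batch of 4 feature sequences of length 64.
encoder = BiLSTMEncoder()
feats = torch.randn(4, 64, 288)
context = encoder(feats)  # (4, 64, 512)
```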