STAIL / STIKO

STAIL / STIKO is a project implemented for the book publishing house Gdańskie Wydawnictwo Oświatowe. It is a proprietary cloud platform that automatically builds mobile applications that recognize book pages and play the multimedia files associated with the recognized page.

Overview

The system lets the end user automatically create a mobile application from images of book pages and the multimedia files provided with them. Once generated, the application plays the requested media when a given book page is in view of the phone camera. Pages are recognized with high accuracy even if only a small part of the page is visible to the phone. Moreover, pages do not need any markers or codes to be correctly classified. The application can be built for various book types, including children's books, textbooks and novels.

Details

The heart of the system is STIKO, a fully automatic service that trains neural network classifiers of book pages. As input, it accepts the original images of the book pages. Only one image per page is needed, and no tags, markers or codes beyond the original image are required. Various types of books are supported: picture books, children's books, textbooks and novels, from several to hundreds of pages long. Based on the original images, STIKO generates a large amount of training data that simulates the various conditions under which the pages can be seen by the phone camera (zoom, perspective, light, motion and much more).
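
The data generation step can be illustrated with a minimal sketch. It is not STIKO's actual pipeline, which is proprietary and not described here; it only shows the general idea of turning one image per page into many labelled views. The function names `augment` and `make_training_set`, the parameter defaults, and the choice to simulate only zoom (random crop) and light (brightness gain) are all assumptions made for illustration. Images are plain nested lists of grayscale values so that the example needs nothing beyond the Python standard library; a real pipeline would more likely use an image library and also cover perspective and motion blur.

```python
import random


def augment(page, rng, min_crop=0.5, brightness=(0.6, 1.4)):
    """Return one synthetic view of a grayscale page (rows of 0-255 ints).

    Hypothetical helper: simulates partial visibility / zoom with a random
    crop and lighting changes with a random brightness gain.
    """
    height, width = len(page), len(page[0])
    # Zoom: keep a random window covering at least min_crop of each dimension.
    crop_h = rng.randint(max(1, int(height * min_crop)), height)
    crop_w = rng.randint(max(1, int(width * min_crop)), width)
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    # Light: scale every pixel by one random gain, clamped to the valid range.
    gain = rng.uniform(*brightness)
    return [
        [min(255, max(0, round(pixel * gain))) for pixel in row[left:left + crop_w]]
        for row in page[top:top + crop_h]
    ]


def make_training_set(pages, samples_per_page, seed=0):
    """Expand one image per page into (view, page_index) training pairs."""
    rng = random.Random(seed)
    return [
        (augment(page, rng), label)
        for label, page in enumerate(pages)
        for _ in range(samples_per_page)
    ]
```

Each page index doubles as its class label, which matches the requirement that a single original image per page is the only input.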

STAIL/STIKO architecture

Once the data has been generated, deep neural networks are trained. Training and model selection proceed fully automatically (using local GPUs), without any human intervention. The accuracy of the resulting models ranges from 95% to above 99%. The best model is converted to the TF Lite format and transferred to the STAIL system for further processing.
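
Automatic model selection of this kind might look like the sketch below. The source does not say which criteria STIKO uses, so everything here is an assumption: the `Candidate` record, the `select_best` function, the 0.95 threshold (taken from the lower end of the range quoted above), and the tie-break in favor of the smaller model, which is a plausible preference for a mobile target. The subsequent conversion step is not shown; in TensorFlow it is typically done with `tf.lite.TFLiteConverter`.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one trained network."""
    name: str
    val_accuracy: float  # fraction of held-out synthetic views classified correctly
    size_mb: float       # size of the saved model


def select_best(candidates, min_accuracy=0.95):
    """Pick the most accurate candidate; break ties in favor of the smaller model."""
    eligible = [c for c in candidates if c.val_accuracy >= min_accuracy]
    if not eligible:
        raise ValueError("no candidate reached the accuracy threshold")
    return max(eligible, key=lambda c: (c.val_accuracy, -c.size_mb))
```

Raising an error when no candidate qualifies, rather than silently shipping a weak model, keeps an unattended pipeline from producing an application that misrecognizes pages.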

Using this model and the multimedia files provided by the user, the STAIL system builds a mobile application (Android and iOS). The application analyzes the image from the phone's camera in real time. Once a specific page of the book has been detected with sufficient confidence, the multimedia file assigned to it is played back to the user.
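
One common way to decide that a page has been detected with enough confidence is to require the same top prediction over several consecutive camera frames. The sketch below shows that idea in Python for readability; the generated applications run on Android and iOS, and the source does not describe their actual decision rule. The class name `PageTrigger`, the 0.9 threshold, and the three-frame requirement are assumptions.

```python
class PageTrigger:
    """Fire once when the same page wins several consecutive confident frames."""

    def __init__(self, threshold=0.9, frames_required=3):
        self.threshold = threshold
        self.frames_required = frames_required
        self._candidate = None  # page currently being confirmed
        self._streak = 0        # consecutive confident frames for that page
        self._playing = None    # page whose media was started most recently

    def update(self, probabilities):
        """Feed one frame's class probabilities; return a page index to play, or None."""
        page = max(range(len(probabilities)), key=probabilities.__getitem__)
        if probabilities[page] < self.threshold:
            # Uncertain frame: reset the streak so blurry frames never trigger playback.
            self._candidate, self._streak = None, 0
            return None
        if page == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = page, 1
        if self._streak >= self.frames_required and page != self._playing:
            self._playing = page
            return page
        return None
```

In use, `update` would be called with the classifier output for every camera frame, and a non-`None` return value would start the media file assigned to that page. Because the page already playing is remembered, holding the camera on the same page does not restart its media.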