Combining Image Processing Techniques, OCR, and OMR for the Digitization of Musical Books

Santamaría, Gonzalo; Domínguez, César; Heras, Jónathan; Mata, Eloy; Pascual, Vico

doi:10.1007/978-3-031-06555-2_37

Proceedings:

International Workshop on Document Analysis Systems

ISSN: 0302-9743, 1611-3349

ISBN: 9783031065545, 9783031065552

Publication year: 2022

Pages: 553-567

Conference: 15th IAPR International Workshop, DAS 2022, La Rochelle, France, May 22–25, 2022, Proceedings

Type: Conference contribution

Institutional repository: Open access

Abstract

Digitizing historical music books can be challenging since staves are usually mixed with typewritten text explaining some of their characteristics. In this work, we propose a new methodology to undertake such a digitization task. After scanning the pages of the book, the different blocks of text and staves can be detected and organized into music pieces using image processing techniques. Then, OCR and OMR methods can be applied to text and stave blocks, respectively, and the information conveniently stored using the MusicXML format. In addition, we explain how this methodology was successfully applied in the digitization of a book entitled “The Music in the Santo Domingo’s Cathedral”. In particular, we provide a new annotated database of musical symbols from the staves included in this book. This database was used to develop two new OMR deep learning models for the detection and classification of music scores. The detection model obtained an F1-score of 90% on symbol detection, and the classification model a note pitch accuracy of 98.4%. The method allows us to conduct text searches, obtain clean PDF files of music pieces, or reproduce the sound represented by the pieces. The database, models, and code of this project are available at https://github.com/joheras/MusicaCatedralStoDomingoIER
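
To make the block-separation step of the abstract more concrete, the following is a minimal sketch of one common heuristic for telling stave blocks from text blocks: counting long horizontal ink runs (candidate staff lines) in a binarized block. It is an illustration under stated assumptions, not the authors' implementation, which is available in the repository linked in the abstract. The function names, the list-of-lists image representation, the `min_fill` threshold of 0.8, and the five-line rule are all hypothetical choices made for this example.

```python
from typing import List

# Assumption: a binarized block is a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]


def count_staff_line_rows(block: Image, min_fill: float = 0.8) -> int:
    """Count groups of consecutive rows that are almost entirely ink."""
    lines = 0
    in_line = False
    for row in block:
        fill = sum(row) / len(row) if row else 0.0
        if fill >= min_fill:
            if not in_line:
                lines += 1  # a new horizontal line starts on this row
            in_line = True
        else:
            in_line = False
    return lines


def classify_block(block: Image, min_fill: float = 0.8) -> str:
    """Label a block 'stave' if it has at least five long horizontal lines."""
    return "stave" if count_staff_line_rows(block, min_fill) >= 5 else "text"


def make_stave(width: int = 20) -> Image:
    """Synthetic stave: five full-width ink rows separated by blank rows."""
    rows: Image = []
    for _ in range(5):
        rows.append([1] * width)
        rows.append([0] * width)
    return rows


def make_text(width: int = 20) -> Image:
    """Synthetic text: alternating ink pixels, never a long solid line."""
    return [[(x + y) % 2 for x in range(width)] for y in range(10)]


if __name__ == "__main__":
    print(classify_block(make_stave()))
    print(classify_block(make_text()))
```

In a pipeline of the kind the abstract describes, blocks labelled as text would then be passed to an OCR engine and blocks labelled as staves to an OMR model. A real scan would also need binarization, deskewing, and tolerance for broken or faded staff lines, none of which this sketch handles.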

References

Alfaro-Contreras, M., Calvo-Zaragoza, J., Iñesta, J.M.: Approaching end-to-end optical music recognition for homophonic scores. In: Morales, A., Fierrez, J., Sánchez, J.S., Ribeiro, B. (eds.) IbPRIA 2019. LNCS, vol. 11868, pp. 147–158. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31321-0_13
Alfaro-Contreras, M., Valero-Mas, J.J.: Exploiting the two-dimensional nature of agnostic music notation for neural optical music recognition. Appl. Sci. 11(8), 3621 (2021)
Bitteur, H.: Audiveris (2004). https://github.com/audiveris
Bochkovskiy, A.: YOLO v4, v3 and v2 for Windows and Linux (2020). https://github.com/AlexeyAB/darknet
Bochkovskiy, A., Wang, C., Liao, H.M.: YOLO v4: optimal speed and accuracy of object detection (2020). https://arxiv.org/abs/2004.10934
Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with the OpenCV Library. O’Reilly Media, Sebastopol (2008)
Byrd, D., Simonsen, J.G.: Towards a standard testbed for optical music recognition: definitions, metrics, and page images. J. New Music Res. 44(3), 169–195 (2015)
Calvo-Zaragoza, J., Hajič, J., Pacha, A.: Understanding optical music recognition. ACM Comput. Surv. 53(4), 1–35 (2020). https://doi.org/10.1145/3397499
Calvo-Zaragoza, J., Rizo, D.: Camera-PrIMuS: neural end-to-end optical music recognition on realistic monophonic scores. In: Proceedings of the 19th ISMIR Conference, pp. 248–255 (2018)
Calvo-Zaragoza, J., Rizo, D.: End-to-end neural optical music recognition of monophonic scores. Appl. Sci. 8(4) (2018). https://doi.org/10.3390/app8040606
Chandra, S., Sisodia, S., Gupta, P.: Optical character recognition-a review. Int. Res. J. Eng. Technol. 7(04), 3037–3041 (2020)
Gallego, A.J., Calvo-Zaragoza, J.: Staff-line removal with selectional auto-encoders. Expert Syst. Appl. 89, 138–148 (2017)
Good, M.: MusicXML: an internet-friendly format for sheet music. In: XML Conference and Expo, pp. 3–4 (2001). https://michaelgood.info/publications/music/musicxml-an-internet-friendly-format-for-sheet-music/
Hajič, J., Pecina, P.: In search of a dataset for handwritten optical music recognition: Introducing MUSCIMA++ (2017). http://arxiv.org/abs/1703.04824
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). https://arxiv.org/abs/1512.03385
Howard, J., Gugger, S.: FastAI: a layered API for deep learning. Information 11(2), 108 (2020)
Huang, J., et al.: A multiplexed network for end-to-end, multilingual OCR. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4547–4557 (2021)
Huang, Z., Jia, X., Guo, Y.: State-of-the-art model for music object recognition with deep learning. Appl. Sci. 9(13), 2645–2665 (2019). https://doi.org/10.3390/app9132645
Huber, D.M.: The MIDI Manual: A Practical Guide to MIDI within Modern Music Production. A Focal Press Book, Waltham (2020)
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
López-Caro, J.: La Música en la Catedral de Santo Domingo de la Calzada. Vol. I: Catálogo del Archivo de Música (1988)
Lyu, L., Koutraki, M., Krickl, M., Fetahu, B.: Neural OCR post-hoc correction of historical corpora. Trans. Assoc. Comput. Linguist. 9, 479–493 (2021)
Mursari, L.R., Wibowo, A.: The effectiveness of image preprocessing on digital handwritten scripts recognition with the implementation of OCR Tesseract. Comput. Eng. Appl. J. 10(3), 177–186 (2021)
Musitek: SmartScore 64 (2021). https://www.musitek.com/
Neuratron: PhotoScore 2020 (2020). https://www.neuratron.com/photoscore.htm
Pezoa, F., Reutter, J.L., Suarez, F., Ugarte, M., Vrgoč, D.: Foundations of JSON schema. In: Proceedings of the 25th International Conference on World Wide Web, pp. 263–273 (2016)
Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks (2015). http://arxiv.org/abs/1506.01497
Rosebrock, A., Thanki, A., Paul, S., Haase, J.: OCR with OpenCV, Tesseract and Python. PyImageSearch (2020)
Serra, J., Soille, P.: Mathematical Morphology and Its Applications to Image Processing. Springer Science & Business Media, Dordrecht (2012). https://doi.org/10.1007/978-94-011-1040-2
Shatri, E., Fazekas, G.: Optical music recognition: state of the art and major challenges (2020). https://arxiv.org/abs/2006.07885
Shatri, E., Fazekas, G.: DoReMi: first glance at a universal OMR dataset (2021). https://arxiv.org/abs/2107.07786
Singh, A., Bacchuwar, K., Bhasin, A.: A survey of OCR applications. Int. J. Mach. Learn. Comput. 2(3), 314 (2012)
Smith, R.: An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition, ICDAR 2007, vol. 2, pp. 629–633. IEEE (2007)
Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection (2019). http://arxiv.org/abs/1911.09070
Tuggener, L., Satyawan, Y.P., Pacha, A., Schmidhuber, J., Stadelmann, T.: The DeepScoresV2 dataset and benchmark for music object detection. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9188–9195. IEEE (2021)
Vazquez, L.: IceVision: an agnostic object detection framework (2020). https://github.com/airctic/icevision
Yousefi, J.: Image binarization using Otsu thresholding algorithm (2015). https://doi.org/10.13140/RG.2.1.4758.9284