Combining Image Processing Techniques, OCR, and OMR for the Digitization of Musical Books

  1. Santamaría, Gonzalo
  2. Domínguez, César
  3. Heras, Jónathan
  4. Mata, Eloy
  5. Pascual, Vico
Proceedings:
International Workshop on Document Analysis Systems

ISSN: 0302-9743 1611-3349

ISBN: 9783031065545 9783031065552

Year of publication: 2022

Pages: 553-567

Conference: 15th IAPR International Workshop, DAS 2022, La Rochelle, France, May 22–25, 2022, Proceedings

Type: Conference contribution

DOI: 10.1007/978-3-031-06555-2_37
Institutional repository: Open access (publisher)

Abstract

Digitizing historical music books can be challenging since staves are usually mixed with typewritten text explaining some of their characteristics. In this work, we propose a new methodology to undertake such a digitization task. After scanning the pages of the book, the different blocks of text and staves can be detected and organized into music pieces using image processing techniques. Then, OCR and OMR methods can be applied to text and stave blocks, respectively, and the information conveniently stored using the MusicXML format. In addition, we explain how this methodology was successfully applied in the digitization of a book entitled “The Music in the Santo Domingo’s Cathedral”. In particular, we provide a new annotated database of musical symbols from the staves included in this book. This database was used to develop two new OMR deep learning models for the detection and classification of music scores. The detection model obtained an F1-score of 90% on symbol detection, and the classification model a note pitch accuracy of 98.4%. The method allows us to conduct text searches, obtain clean PDF files of music pieces, or reproduce the sound represented by the pieces. The database, models, and code of this project are available at https://github.com/joheras/MusicaCatedralStoDomingoIER
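
The block-detection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an already binarized page (1 = ink, 0 = background), splits it into horizontal blocks wherever blank rows occur, and labels a block as a stave when it contains several rows that are almost fully inked (a rough proxy for staff lines). The function names `split_into_blocks` and `classify_block` and all thresholds are hypothetical choices made for this example.

```python
import numpy as np

def split_into_blocks(binary_page, min_gap=2):
    """Return (top, bottom) row ranges of inked blocks; bottom is exclusive.

    Blocks are separated by at least `min_gap` consecutive blank rows, so the
    single blank rows between staff lines do not split a stave.
    """
    ink_rows = binary_page.sum(axis=1) > 0
    blocks, start, gap = [], None, 0
    for y, has_ink in enumerate(ink_rows):
        if has_ink:
            if start is None:
                start = y
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                blocks.append((start, y - gap + 1))
                start, gap = None, 0
    if start is not None:
        blocks.append((start, len(ink_rows) - gap))
    return blocks

def classify_block(binary_page, top, bottom, line_coverage=0.8, min_lines=5):
    """Label a block 'stave' if it has at least `min_lines` nearly full rows.

    Assumption: staff lines span most of the page width, text rows do not.
    """
    coverage = binary_page[top:bottom].mean(axis=1)
    long_rows = int((coverage >= line_coverage).sum())
    return "stave" if long_rows >= min_lines else "text"

# Synthetic page: a short text-like block and a five-line stave.
page = np.zeros((40, 50), dtype=np.uint8)
page[2:6, 5:20] = 1                 # text-like block, 30% row coverage
for y in (10, 12, 14, 16, 18):      # five one-pixel staff lines
    page[y, :] = 1

blocks = split_into_blocks(page)
labels = [classify_block(page, top, bottom) for top, bottom in blocks]
print(list(zip(blocks, labels)))  # [((2, 6), 'text'), ((10, 19), 'stave')]
```

In a real pipeline each block labelled "text" would then be passed to an OCR engine and each "stave" block to an OMR model; scanned pages would also need binarization and deskewing first, which this sketch leaves out.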
