Giuseppe De Gregorio

Barcelona (Spain), +39 348 6412436, degregorio.gius@gmail.com

Postdoctoral researcher specializing in Computer Vision, Document Analysis, and Handwriting Recognition, with a PhD in Information Engineering. My doctoral work centered on Word Spotting in historical manuscripts, where I developed deep learning techniques for subword-level detection in challenging, low-quality documents, a problem that demands robustness under extreme data scarcity and visual noise.
My current research focuses on handwriting recognition in low-resource scenarios, working with some of the most demanding document types available: Ancient Greek papyri, ciphered manuscripts, and rare scripts. To tackle these challenges, I draw on self-supervised learning, synthetic data generation, and in-context learning with vision-language models, methods that sit at the intersection of classical computer vision and modern large-scale model capabilities. I have also worked on layout analysis, character clustering, and retrieval-based approaches, giving me a broad foundation across the document understanding pipeline. I have published at leading peer-reviewed venues (ICDAR, ICFHR) and co-organized tutorials and workshops at major conferences in document analysis and machine learning.

Skills

ML Frameworks: PyTorch, HuggingFace Transformers, scikit-learn
Key Methods: Self-supervised learning, few-shot/low-resource learning, synthetic data generation, in-context learning, contrastive learning, retrieval-based methods
Vision & OCR: Object detection (YOLO, Faster R-CNN), semantic segmentation, HTR/OCR pipelines, document layout analysis
Vision-Language Models: Hands-on experience applying VLMs (LLaVA, GPT-4V-class) to historical document tasks
Languages & Tools: Python, Java, JavaScript, Git, LaTeX, Linux

Experience

Post-Doctoral Researcher

Computer Vision Center - CVC, Barcelona (Spain)

Postdoc in the DESCRYPT project.

Research on handwriting recognition and document analysis for ciphered and rare-script manuscripts.
Working at CVC's Document Analysis Group (DAG), one of Europe's leading CV research centres.

November 2025 - Present

Post-Doctoral Researcher

Department of Ancient Civilizations of the University of Basel, Basel (Switzerland)

Postdoc in the EGRAPSA project.

Developed deep learning models for character detection and recognition in Ancient Greek papyri.
Built GlyFix, a data curation tool for correcting and validating AI-generated manuscript annotations.
Applied VLMs and language models to automated correction of transcription errors in ancient Greek texts.

October 2023 - September 2025

Research Grant

DIEM Department, University of Salerno, Fisciano (Italy)

Few-shot learning techniques for the processing of handwritten documents of historical interest.

November 2022 - February 2023

Research Grant

DIEM Department, University of Salerno, Fisciano (Italy)

Support tools for the transcription of handwritten documents of historical-cultural interest.

January 2022 - June 2022

Research Grant

DIEM Department, University of Salerno, Fisciano (Italy)

Support tools for the transcription of handwritten documents of historical-cultural interest.

June 2021 - December 2022

Research Grant

DIEM Department, University of Salerno, Fisciano (Italy)

Support tools for the transcription of handwritten documents of historical-cultural interest.

February 2020 - March 2020

Education

Ph.D. in Information Engineering

DIEM Department, University of Salerno, Fisciano (Italy)

Specific field of the degree course: Ingegneria dell’Informazione ING-INF/05

Thesis Title: N-gram Retrieval for Word Spotting in Historical Handwritten Collections

December 2019 - March 2023

Master’s Degree in Computer Engineering

DIEM Department, University of Salerno, Fisciano (Italy)

Specific field of the degree course: Ingegneria Informatica LM-32

2nd level degree in Computer Engineering

Thesis Title: Early Diagnosis for Neurodegenerative diseases from Handwriting Analysis: AI-based Approach

Final Degree Mark: 110/110

September 2016 - April 2019

Bachelor’s Degree in Computer Engineering

DIEM Department, University of Salerno, Fisciano (Italy)

Specific field of the degree course: Ingegneria Informatica L-8

1st level degree in Computer Engineering

Thesis Title: Un Linguaggio per la Descrizione di Modelli Fiscali (A Language for the Description of Tax Models)

Final Degree Mark: 97/110

September 2008 - April 2015

Studies and Experiences Abroad

Abroad Research Period

Computer Vision Center (CVC) - Universitat Autònoma de Barcelona, Spain

The Computer Vision Center (CVC) hosted me as a visiting PhD student. The CVC is a research centre with a major impact on computer vision across Europe, and the Document Analysis Group (DAG) is one of its main groups. I worked under the direct supervision of in-house experts, who acted as facilitators and as links between me and the wider international research group.

January 2022 - June 2022

Abroad Research Period

Athens University of Economics and Business, Athens (Greece)

Selected for the ATRIUM Transnational Access (TNA) Program, a competitive, fully funded research mobility initiative supporting international collaboration. Hosted by Prof. John Pavlopoulos, I worked on the project “Language Models for Automatic Correction of Manuscript Annotations on Ancient Greek Papyrus.” The project explores the use of language models to automatically correct transcription errors in ancient Greek texts, improving the accuracy and efficiency of manuscript annotation.

May 2025

Conferences and Seminars

ICDAR 2025: The 19th International Conference on Document Analysis and Recognition

Wuhan, Hubei, China

Tutorial titled “Historical Documents in Focus: Visual and Computational Analysis from Papyri to Inscriptions”. The tutorial offers a guided introduction to the computational analysis of historical documents, with a focus on methods from computer vision and pattern recognition.

16 September 2025

Source Codes of the Past: Launching an international ATR/HTR Network for Manuscript Analysis

Princeton NJ, USA

Invited Speaker
Gave a talk entitled "Comparing Alphas: Detection and Recognition of Ancient Greek Characters on Papyri and their Applications in Digital Paleography". The talk presented deep models for detecting characters in Greek manuscripts and introduced the GlyFix tool for curating the automatically generated annotations.

12-13 June 2025

IGS 2025: The 22nd Conference of the International Graphonomics Society

Montreal, Canada

Presented the paper “Hypothesis-Aware Ductus Generation for Greek Papyri: A Data-Driven Approach.” The work explores the alignment of online pen trajectories, recorded via a character tablet, with images of the letter alpha extracted from Greek papyri, aiming to model the writing ductus in a hypothesis-aware framework.

31 August 2024

Workshop on AI for Paleography of Ancient Documents

Naples, Italy

Invited Speaker
Presented recent developments in the use of AI for the automatic labelling of characters in papyrus documents. Demonstrated how, with appropriate application of transcription tools, it is theoretically possible to reduce document transcription time by up to 50%.

31 August 2024

IWCP 2024: ICDAR Workshop on Computational Paleography

Athens, Greece

At the workshop, I presented the papers 'NeuroPapyri: A Deep Attention Embedding Network for Handwritten Papyri Retrieval' and 'A New Framework for Error Analysis in Computational Paleographic Dating of Greek Papyri'. These works examine the use of machine learning and artificial intelligence techniques for the analysis and dating of historical documents written on papyrus in ancient Greek.

31 August 2024

ICFHR 2022: International Conference on Frontiers in Handwriting Recognition

Hyderabad, India

At the conference, I presented the work 'A Few Shot Multi-Representation Approach for N-Gram Spotting in Historical Manuscripts'. Word retrieval performance in historical documents remains moderate, mainly because of the scarcity of labelled data available to train the models. In the paper, we propose a few-shot learning paradigm for detecting short character sequences in images of handwritten text that requires only a small amount of labelled training data.

04-07 December 2022

AIxIA 2022: 21st International Conference of the Italian Association for Artificial Intelligence

Udine, Italy

At the conference, I presented the work 'Word Spotting in Handwritten Historical Documents by N-gram Retrieval'. The contribution deals with the problem of recovering handwritten words in images of documents belonging to small collections of historical interest. The proposed methodology focuses the search on sequences of characters instead of whole words, with the aim of making unknown words searchable by the system.

28 November 2022 – 2 December 2022

IGS 2021: The 20th Conference of the International Graphonomics Society

Las Palmas de Gran Canaria, Spain

At the conference, I presented the work 'Transcript Alignment for Historical Handwritten Documents: The MiM Algorithm'. Locating the portion of a document image that contains the handwritten text corresponding to a transcription is essential both for the study of historical documents and for the development of modern technologies that facilitate searching, indexing, and transcription. We proposed a method to automatically align the transcript to images of handwritten words.

7 June 2022 – 9 June 2022

EvoStar 2022

Madrid, Spain

At the conference, I presented the work 'Negative Selection Algorithm for Alzheimer's Diagnosis: Design and Performance Evaluation'. In the article, we present a method to discriminate between healthy subjects and Alzheimer's patients by analysing online writing. The methodology adapts a Negative Selection algorithm for the purpose and the idea is to use only information relating to the control group of healthy subjects in the learning phase.

20 April 2022 – 22 April 2022

AIHA2020 - Artificial Intelligence for Healthcare Applications

Milan, Italy

At the conference, I presented the paper 'A Multi Classifier Approach for Supporting Alzheimer's Diagnosis Based on Handwriting Analysis'. The work presents an AI-based methodology that analyses handwriting and drawing tasks to discriminate between healthy subjects and patients affected by Alzheimer's disease. The use of AI favours the development of reliable, non-invasive, easy-to-use and inexpensive diagnostic tools.

10 January 2021

Publications

Conference Proceedings

Exploring the Automatic Alphabet Identification of Images of Handwritten Ciphers (To appear)

International Conference on Historical Cryptology, HistoCrypt 2026

Reinares, A., De Gregorio, G., Fornés, A., Megyesi, B. Exploring the Automatic Alphabet Identification of Images of Handwritten Ciphers. In: *Proceedings of the International Conference on Historical Cryptology, HistoCrypt 2026*, Amiens, France (To appear).

2026

Conference Proceedings

Unsupervised Feature Learning via Convolutional Autoencoders for Cross-Manuscript Comparison in Historical Cryptanalysis (To appear)

International Conference on Historical Cryptology, HistoCrypt 2026

Reinares, A., De Gregorio, G., Fornés, A. Unsupervised Feature Learning via Convolutional Autoencoders for Cross-Manuscript Comparison in Historical Cryptanalysis. In: *Proceedings of the International Conference on Historical Cryptology, HistoCrypt 2026*, Amiens, France (To appear).

2026

Conference Proceedings

Learning Diachronic Representations of Ancient Greek Letterforms (To appear)

International Conference on Document Analysis and Recognition, ICDAR 2026

Pavlopoulos, J., Barbakos, S., Ferretti, L., Voulgarakis, D., Paparrigopoulou, A., Konstantinidou, M., De Gregorio, G., Marthot-Santaniello, I., Platanou, P., Essler, H. Learning Diachronic Representations of Ancient Greek Letterforms. In: *Proceedings of the 20th International Conference on Document Analysis and Recognition, ICDAR 2026*, Vienna, Austria (To appear).

2026

Conference Proceedings

Hypothesis-Aware Ductus Generation for Greek Papyri: A Data-Driven Approach

International Conference of the International Graphonomics Society, IGS 2025

Dung Van, D., De Gregorio, G., Pena, R., Fischer, A., Marthot-Santaniello, I. Hypothesis-Aware Ductus Generation for Greek Papyri: A Data-Driven Approach. In: IGS 2025 - 22nd IGS conference: Investigating Human Movements - Handwriting and Beyond

2024

Conference Proceedings

A New Framework for Error Analysis in Computational Paleographic Dating of Greek Papyri

IWCP 2024: ICDAR Workshop on Computational Paleography

De Gregorio, G., Ferretti, L., Pena, R.C.G., Marthot-Santaniello, I., Konstantinidou, M., Pavlopoulos, J. (2024). A New Framework for Error Analysis in Computational Paleographic Dating of Greek Papyri. In: Mouchère, H., Zhu, A. (eds) Document Analysis and Recognition – ICDAR 2024 Workshops. ICDAR 2024. Lecture Notes in Computer Science, vol 14936. Springer, Cham. https://doi.org/10.1007/978-3-031-70642-4_7

2024

Conference Proceedings

NeuroPapyri: A Deep Attention Embedding Network for Handwritten Papyri Retrieval

IWCP 2024: ICDAR Workshop on Computational Paleography

De Gregorio, G., Perrin, S., Pena, R.C.G., Marthot-Santaniello, I., Mouchère, H. (2024). NeuroPapyri: A Deep Attention Embedding Network for Handwritten Papyri Retrieval. In: Mouchère, H., Zhu, A. (eds) Document Analysis and Recognition – ICDAR 2024 Workshops. ICDAR 2024. Lecture Notes in Computer Science, vol 14936. Springer, Cham. https://doi.org/10.1007/978-3-031-70642-4_5

2024

Conference Proceedings

I Can’t Believe It’s Not Better: In-air Movement for Alzheimer Handwriting Synthetic Generation

International Conference of the International Graphonomics Society, IGS 2023

Bensalah, A., Parziale, A., De Gregorio, G., Marcelli, A., Fornés, A., Lladós, J. (2023). I Can’t Believe It’s Not Better: In-air Movement for Alzheimer Handwriting Synthetic Generation. In: Parziale, A., Diaz, M., Melo, F. (eds) Graphonomics in Human Body Movement. Bridging Research and Practice from Motor Control to Handwriting Analysis and Recognition. IGS 2023. Lecture Notes in Computer Science, vol 14285. Springer, Cham. https://doi.org/10.1007/978-3-031-45461-5_10

2023

Conference Proceedings

Estimating the Optimal Training Set Size of Keyword Spotting for Historical Handwritten Document Transcription

International Conference of the International Graphonomics Society, IGS 2023

De Gregorio, G., Marcelli, A. (2023). Estimating the Optimal Training Set Size of Keyword Spotting for Historical Handwritten Document Transcription. In: Parziale, A., Diaz, M., Melo, F. (eds) Graphonomics in Human Body Movement. Bridging Research and Practice from Motor Control to Handwriting Analysis and Recognition. IGS 2023. Lecture Notes in Computer Science, vol 14285. Springer, Cham. https://doi.org/10.1007/978-3-031-45461-5_12

2023

Conference Proceedings

The Neglected Role of GUI in Performance Evaluation of AI-Based Transcription Tools for Handwritten Documents

International Conference of the International Graphonomics Society, IGS 2023

De Gregorio, G., Marcelli, A. (2023). The Neglected Role of GUI in Performance Evaluation of AI-Based Transcription Tools for Handwritten Documents. In: Parziale, A., Diaz, M., Melo, F. (eds) Graphonomics in Human Body Movement. Bridging Research and Practice from Motor Control to Handwriting Analysis and Recognition. IGS 2023. Lecture Notes in Computer Science, vol 14285. Springer, Cham. https://doi.org/10.1007/978-3-031-45461-5_11

2023

Journal Article

End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code

Journal of Imaging

De Gregorio, G., Capriolo, G., & Marcelli, A. (2023). End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code. Journal of Imaging, 9(1), 17.

2023

Conference Proceedings

A Few Shot Multi-Representation Approach for N-Gram Spotting in Historical Manuscripts

International Conference on Frontiers in Handwriting Recognition (ICFHR)

De Gregorio, G., Biswas, S., Souibgui, M. A., Bensalah, A., Lladós, J., Fornés, A., & Marcelli, A. (2022, November). A Few Shot Multi-Representation Approach for N-Gram Spotting in Historical Manuscripts. In Frontiers in Handwriting Recognition: 18th International Conference, ICFHR 2022, Hyderabad, India, December 4–7, 2022, Proceedings (pp. 3-17). Cham: Springer International Publishing.

2022

Conference Proceedings

Word Spotting in Handwritten Historical Documents by N-gram Retrieval

International Conference of the Italian Association for Artificial Intelligence

De Gregorio, Giuseppe, and Angelo Marcelli. Word Spotting in Handwritten Historical Documents by N-gram Retrieval. In International Conference of the Italian Association for Artificial Intelligence, Udine, Italy.

2022

Conference Proceedings

Transcript Alignment for Historical Handwritten Documents: The MiM Algorithm

International Conference of the International Graphonomics Society, IGS 2021

De Gregorio, Giuseppe, Ilaria Citro, and Angelo Marcelli. Transcript Alignment for Historical Handwritten Documents: The MiM Algorithm. Intertwining Graphonomics with Human Movements: 20th International Conference of the International Graphonomics Society, IGS 2021, Las Palmas de Gran Canaria, Spain, June 7-9, 2022, Proceedings. Cham: Springer International Publishing, 2022.

2022

Journal Article

Diagnosing Alzheimer’s disease from on-line handwriting: A novel dataset and performance benchmarking

Engineering Applications of Artificial Intelligence

Nicole D. Cilia, Giuseppe De Gregorio, Claudio De Stefano, Francesco Fontanella, Angelo Marcelli, Antonio Parziale, Diagnosing Alzheimer’s disease from on-line handwriting: A novel dataset and performance benchmarking, Engineering Applications of Artificial Intelligence, Volume 111, 2022.

2022

Conference Proceedings

Negative Selection Algorithm for Alzheimer’s Diagnosis: Design and Performance Evaluation

EvoStar 2022

De Gregorio, Giuseppe, Antonio Della Cioppa, and Angelo Marcelli. Negative Selection Algorithm for Alzheimer’s Diagnosis: Design and Performance Evaluation. Applications of Evolutionary Computation: 25th European Conference, EvoApplications 2022, Held as Part of EvoStar 2022, Madrid, Spain, April 20–22, 2022, Proceedings. Cham: Springer International Publishing, 2022.

2022

Conference Proceedings

A multi classifier approach for supporting Alzheimer’s diagnosis based on handwriting analysis

ICPR International Workshop: Artificial Intelligence for Healthcare Applications – AIHA2020

De Gregorio, Giuseppe, Desiato, Domenico, Marcelli, Angelo, Polese, Giuseppe. A multi classifier approach for supporting Alzheimer’s diagnosis based on handwriting analysis. Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part I. Springer International Publishing, 2021.

2021

Journal Article

A Model for Evaluating the Performance of a Multiple Keywords Spotting System for the Transcription of Historical Handwritten Documents

Journal of Imaging

Marcelli, A., De Gregorio, G., & Santoro, A. (2020). A Model for Evaluating the Performance of a Multiple Keywords Spotting System for the Transcription of Historical Handwritten Documents. Journal of Imaging, 6(11), 117.

2020