Enriching Scientific Publications: An Article and a Poster for RAPIDO

As part of the RAPIDO project, Persée and Inist-CNRS co-authored a research article and a poster, both presented at the CORIA-TALN 2025 conference (Marseille, June 30 to July 4, 2025).

A Scientific Article…

Written by Lucas Anki, Pascal Cuxac, Justine Revol (Inist-CNRS), and Agnieszka Halczuk (Persée), the article presents a method for automatically identifying geographic entities (toponyms) in scientific publications and aligning them with IdRef authority records. The aim is to strengthen the link between publications and validated research data.

… and a Poster

These results were also shared in a poster presented at the conference. Titled “Rapido, interoperability and text mining: towards the alignment of scientific publications in archaeology”, it describes the processing pipeline developed for the project, the results obtained, and the prospects for future work.

A method and preliminary results

The method relies on:

  • An annotated corpus from the Bulletin de Correspondance Hellénique (15 volumes, 10,000 pages analyzed), available on the Persée portal
  • A named entity recognition (NER) technique combining manual annotation and machine learning (Flair, BERT)
  • A rigorous evaluation of results, achieving an F1‑score of 85% for toponym extraction
  • An automatic alignment strategy, achieving 73% precision, even in the presence of ambiguities
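
To make the two figures above more concrete, here is a minimal Python sketch of how toponym alignment and its evaluation could look in principle. It is not the RAPIDO pipeline: the authority index, the record identifiers (rec-001, etc.), the helper names (normalize, align, precision_recall_f1) and the sample spans are all hypothetical, and the rule of abstaining on ambiguous names is a simplifying assumption rather than the project's actual disambiguation strategy.

```python
import unicodedata


def normalize(name: str) -> str:
    """Strip diacritics and case so that 'Délos' and 'delos' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold().strip()


# Hypothetical authority index: normalized label -> candidate record ids.
# These ids are placeholders, not real IdRef identifiers.
AUTHORITY_INDEX = {
    "delos": ["rec-001"],
    "delphes": ["rec-002"],
    "thebes": ["rec-003", "rec-004"],  # ambiguous: two places share the name
}


def align(toponym: str):
    """Return a record id only when exactly one candidate matches.

    Assumption: ambiguous or unknown names are left unaligned (None).
    """
    candidates = AUTHORITY_INDEX.get(normalize(toponym), [])
    return candidates[0] if len(candidates) == 1 else None


def precision_recall_f1(predicted: set, gold: set):
    """Compute precision, recall and F1 over sets of predicted and gold items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up spans (document id, start, end) purely for illustration.
    gold_spans = {("d1", 0, 5), ("d1", 10, 17), ("d1", 30, 36), ("d1", 40, 45)}
    predicted_spans = {("d1", 0, 5), ("d1", 10, 17), ("d1", 20, 25)}
    print(precision_recall_f1(predicted_spans, gold_spans))
    print(align("Délos"), align("Thèbes"), align("Sparte"))
```

Precision measures how many proposed items are correct, recall how many expected items were found, and F1 is their harmonic mean; the same function can score extracted spans or proposed alignments, depending on what the two sets contain.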

The results demonstrate the robustness of the model, including on OCR-processed texts, and open up opportunities for improvement (multi‑label classification, expansion of geographic coverage, enriched navigation on the Persée platform…).

The RAPIDO project also involves the French Schools of Rome and Athens, as well as Abes.

Read the full article [PDF]
View the poster on HAL