Persée is constantly working to improve the service it provides to its users, and regularly releases new features:
- citation management in 2007;
- bibliography management tools in 2009;
- alignment of authors with IdRef from 2014 onward;
- opening of data.persee.fr in 2017,…
Today, Persée is taking a new step and making fuller use of the semantic web by transforming “author” pages into “authority” pages. An authority serves to identify people, objects or concepts unambiguously.
We have therefore developed three types of “authority” pages: people, taxa and monuments of Cairo. This note focuses on the “People” authorities. Later posts will cover the “Taxa” and “Monuments” authorities.
What is a “Person” authority?
A “Person” authority may be the author of a digitized and distributed document, but also the subject of one, as in the case of obituaries.
What information do I find on a “Person” authority page?
In practice, each page is structured around the following elements:
- The definition of the authority, which includes textual or graphic information produced and collected by Persée;
- Resources related to the authority in Persée’s collections:
  - Documents that have the authority as their author or subject;
  - Authors who have written about, or with, the authority;
  - Illustrations relating to the authority;
- Data concerning the authority from selected platforms such as Sudoc, DBpedia, theses.fr, data.bnf.fr, GBIF or the Cairo Gazetteer.
How do I access it?
It is accessible from the article records by clicking on a person’s name in the bibliographic reference.
But what does the semantic web have to do with all this?
To build these pages, we need to retrieve data from other databases. These databases are called “repositories”, and we align ourselves with them by creating links between our “Person” authorities and the corresponding “Person” authorities in those repositories.
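As a rough illustration of what such an alignment amounts to, the sketch below models an authority as a record carrying one link per repository. The class name, field names and identifiers are hypothetical placeholders and do not reflect Persée's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PersonAuthority:
    """A "Person" authority and its links to external repositories (hypothetical model)."""

    persee_id: str
    name: str
    # Maps a repository name (e.g. "idref") to the identifier or URI
    # of the same person in that repository.
    alignments: Dict[str, str] = field(default_factory=dict)

    def align(self, repository: str, external_id: str) -> None:
        """Record that this authority matches `external_id` in `repository`."""
        self.alignments[repository] = external_id

    def is_aligned_with(self, repository: str) -> bool:
        """Return True if a link to `repository` has been recorded."""
        return repository in self.alignments


# Placeholder identifiers, for illustration only.
person = PersonAuthority(persee_id="example-1", name="Example Person")
person.align("idref", "idref-placeholder")
person.align("dbpedia", "dbpedia-placeholder")
```

Once a link is recorded, the page builder can look up the matching record in each repository without having to search by name again.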
Which repositories are used?
- IdRef: we align the authors present in the Persée collections with their equivalents in this repository. Thanks to a fruitful partnership with the ABES teams, all our author management tools are now backed by IdRef. This makes it possible (1) to improve the quality of each partner’s information, (2) to use IdRef as a gateway to other international standards and (3) to provide access to the authors’ scientific production catalogued in but also in Calames (online catalogue of archives and manuscripts of higher education) and in theses.fr (catalogue of French doctoral theses, ongoing or defended since 1985);
- bnf.fr: we retrieve the author record of the authority described on this platform, as well as its disciplinary fields;
- DBpedia: we retrieve the authority record from this structured version of Wikipedia, which is published as standardized data in semantic web formats.
What about the internal machinery in all this?
Upstream of the portal, the Persée team has integrated the creation and management of links into its documentation procedures for an initial group of repositories (IdRef, GBIF, Cairo Gazetteer). A major data clean-up project is being carried out in partnership with the teams behind these repositories to improve the alignment rate, as well as the consistency and quality of each partner’s data.
These “primary” alignments are complemented by data and/or services made available by other sites (data.bnf.fr, VIAF, DBpedia, EOL, etc.). All this information is cross-checked and verified by automated procedures before being uploaded to the Persée portal.
For example, for Albert Jacquard, we have stored the following links: http://ws.persee.fr/authority/persee/29942/id
On the Persée portal, services retrieve all the data available from the external services and store it. The stored data is then used to pre-build the HTML fragments that the authority page aggregates. All the collected data is regularly compared with the original data to keep it up to date (on a cycle of roughly two weeks).
Everything the external services make available is retrieved without filtering; the elements most relevant to us are selected afterwards.
We chose to separate querying the external services from displaying their data, so that we do not depend on these services and can display data in the authority pages whatever the status of an external service.
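The separation between querying and display can be sketched as a small local store: everything returned by a service is kept as-is, refreshed only when the copy is older than about two weeks, and filtered at display time. This is a minimal sketch under stated assumptions, not the portal's actual code; the class, the field names in `RELEVANT_FIELDS` and the sample payload are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional

REFRESH_INTERVAL = timedelta(days=14)      # assumed from the "roughly two weeks" cycle
RELEVANT_FIELDS = {"label", "birth_date"}  # hypothetical field names


class ExternalDataStore:
    """Keeps a local copy of external data so that display never calls the service."""

    def __init__(self) -> None:
        self._raw: Dict[str, dict] = {}
        self._fetched_at: Dict[str, datetime] = {}

    def refresh(self, key: str, fetch: Callable[[], dict], now: datetime) -> bool:
        """Store everything `fetch` returns if the local copy is missing or stale.

        Returns True if the stored copy was updated.
        """
        last: Optional[datetime] = self._fetched_at.get(key)
        if last is not None and now - last < REFRESH_INTERVAL:
            return False  # copy is still fresh
        try:
            payload = fetch()
        except Exception:
            # Service unavailable: keep the previous copy (broad catch, sketch only).
            return False
        self._raw[key] = payload
        self._fetched_at[key] = now
        return True

    def display_data(self, key: str) -> dict:
        """Filter the stored payload down to the relevant fields; no network call."""
        raw = self._raw.get(key, {})
        return {k: v for k, v in raw.items() if k in RELEVANT_FIELDS}


def failing_fetch() -> dict:
    raise ConnectionError("external service is down")


store = ExternalDataStore()
t0 = datetime(2020, 1, 1)
first = store.refresh("dbpedia:example", lambda: {"label": "Example", "noise": 1}, t0)
# Fifteen days later the copy is stale, but the service is down.
second = store.refresh("dbpedia:example", failing_fetch, t0 + timedelta(days=15))
shown = store.display_data("dbpedia:example")
```

In this sketch a failed refresh leaves the earlier copy in place, so the page can still be built from stored data while the external service is unavailable.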
All these production and curation procedures can be generalized to other types of authorities and other standards.
A more technical post will be written in the near future to present in detail the technologies and processes used to create the mashup pages.
And finally, a few figures:
198 660 authorities are already available, distributed as follows:
– 160 747 individuals
– 37 275 taxa
– 638 monuments
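As a quick check on these figures, the short sketch below sums the three categories and computes each one's share of the total. The variable names are illustrative; the numbers are those given above.

```python
# Authority counts as stated in this post.
authorities = {"individuals": 160_747, "taxa": 37_275, "monuments": 638}

total = sum(authorities.values())  # 198660, matching the stated total

# Share of each category, as a percentage rounded to one decimal place.
shares = {kind: round(100 * count / total, 1) for kind, count in authorities.items()}

print(total)
print(shares)  # {'individuals': 80.9, 'taxa': 18.8, 'monuments': 0.3}
```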
The alignments are distributed as follows:
– 67 352 alignments to IdRef
– 42 338 alignments to data.bnf.fr
– 37 275 alignments to GBIF
– 7 337 alignments to DBpedia
– 638 alignments to the Cairo Gazetteer
Other alignments have been made but are not yet being exploited.
Hélène Begnis, Partnerships Officer