
Text and data mining

Taken together, the digital content produced by Persée, whether it feeds the Persée portal or the Perséides collections, forms a critical mass of structured data. Beyond being retrievable through our websites, these data are a corpus in their own right that lends itself to a wide range of uses: visualizing trends, tracking the evolution of concepts, terms or metaphors over time, measuring occurrences, and evaluating the influence of an author or a text in a given context.

Persée gives researchers access to these data while respecting the rights of third parties. Three complementary options are available:

● Data Persée contains the data describing all the resources produced by Persée, expressed in RDF. You can download a dataset, either a thematic subset or the entire graph, to study, process or reuse on your own computer. You can also query the triplestore directly through the SPARQL endpoint, or through Sparklis, a tool that lets you build queries in natural language.

● The OAI-PMH repository is used to harvest metadata from the Persée portal. It exposes document metadata (bibliographic metadata in DC, MODS and MARCXML) as well as all the information needed to represent a collection (periods and title changes, list of issues, parallel publications) and an issue (table of contents). Access to the full text of documents is restricted and subject to certain conditions.

● The “Authorities” web service makes it easier to reference resources linked to the authorities of the Persée portal from other interfaces and third-party sites. You can retrieve the list of identifiers for the different authorities managed by Persée (authors, taxa or monuments of Cairo), look up the roles of each person authority, or access all the documentary resources associated with an authority.
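
As an illustration of querying a triplestore through a SPARQL endpoint, here is a minimal Python sketch that follows the standard SPARQL 1.1 Protocol (a `query` parameter sent over HTTP GET, with results requested as JSON). The endpoint URL is a placeholder and the query is deliberately generic: neither is taken from Persée's documentation, so replace them with the endpoint address and vocabulary published on Data Persée. The helper names `build_request`, `rows_from_results` and `run_query` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: placeholder address, not the actual Persée endpoint.
SPARQL_ENDPOINT = "https://sparql.example.org/sparql"

# Generic query that works on any RDF graph: return five arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build an HTTP GET request as described by the SPARQL 1.1 Protocol."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(payload):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    return [
        # A variable can be unbound in a given row, hence the membership test.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(endpoint, query), timeout=30) as response:
        return rows_from_results(response.read().decode("utf-8"))
```

Calling `run_query(SPARQL_ENDPOINT, QUERY)` against a real endpoint would return one dictionary per result row, keyed by variable name. Keeping request building and result parsing in separate functions makes both easy to check without network access.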

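Harvesting metadata from an OAI-PMH repository can be sketched in Python as follows. The sketch relies only on what the OAI-PMH 2.0 protocol itself defines: the `ListRecords` verb, the mandatory `oai_dc` metadata prefix, and the `resumptionToken` used to page through large result sets. The base URL is a placeholder rather than the address of the Persée repository, and the function names (`list_records_url`, `parse_list_records`, `harvest`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder address, not the actual Persée repository.
OAI_BASE_URL = "https://oai.example.org/oai"

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request URL."""
    if token:
        # A resumption token must be sent alone, without the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_payload):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_payload)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [t.text for t in record.iterfind(".//dc:title", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    # An empty or missing token means the list is complete.
    return records, ((token or "").strip() or None)


def harvest(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Yield every record, following resumption tokens until exhaustion."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, set_spec, token)
        with urlopen(url, timeout=30) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if token is None:
            break
```

Iterating over `harvest(OAI_BASE_URL)` would stream records page by page. Other formats offered by a repository can be requested by passing a different `metadata_prefix`; the exact prefix strings are repository-specific and can be discovered with the protocol's `ListMetadataFormats` verb.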

If you are a researcher and believe the content produced by Persée is suited to a text and data mining project, contact us.