Identifiers
This page explains what types of identifiers are important in the context of paleo data, and how they can be used. A version of this content is also published on the SPNHC wiki.
As we increasingly digitize specimens and share digital data about our collections, Persistent IDentifiers (PIDs), which are unique, stable, and typically resolvable identifiers such as DOIs or ARKs, are often assigned to digital specimen records and can also be used to reference other elements of collections data, such as people or taxa. PIDs are foundational elements of data infrastructure because they enable automated and semi-automated linking between concepts (see Meadows et al. (2019) for an overview of how PIDs are used in scientific research), and also help make data FAIR (see the FAIR Guiding Principles).
What is a Persistent Identifier?
When we talk about persistent identifiers, we assume that they are:
- Unique. Unlike a catalog number, which is typically human-readable and may be only locally unique, a PID must be unique on the global scale in order to ensure that the object it identifies can be unambiguously referenced. The need for uniqueness means that PIDs must be generated programmatically rather than created by human logic.
- Persistent. Once assigned to an object, a PID should never change. PIDs also should not be deleted or reassigned, although in some circumstances a PID may refer to an object that no longer exists. “Never” is relative; the current systems we use to manage PIDs are expected to have a lifespan of anywhere from decades to centuries.
- Computer readable. PIDs are designed primarily for use by computers, not humans, although some PID schemes do have components that are meaningful to humans.
- Resolvable. Generally, we expect that a PID can be reliably resolved to meaningful information about the identified object, e.g. the PID http://n2t.net/ark:/65665/3af2b96d2-a8a1-47c5-9895-b0af03b21674 is an actionable URL (Uniform Resource Locator) that redirects to a specimen record on an institutional web portal.
Truly reliable, long-term resolvability can be difficult to achieve. Registration agencies are the social infrastructure that governs and maintains resolvability for various PID schemes. For example, DataCite is a registration agency that mints Digital Object Identifiers (DOIs), e.g. https://doi.org/10.5962/p.304567. See Hardisty et al. 2021 for a thorough discussion of what resolvability means and an example of how the European DiSSCo (Distributed System of Scientific Collections) project evaluated PID options for use by its member institutions.
Compare these two values to see what resolvable means in practice: “3af2b96d2-a8a1-47c5-9895-b0af03b21674” is a non-resolvable UUID (Universally Unique Identifier), and “http://n2t.net/ark:/65665/3af2b96d2-a8a1-47c5-9895-b0af03b21674” is a resolvable PID.
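The difference between the two values can be sketched in a few lines of Python. This is only an illustration, not part of any PID tooling: the function names `has_resolver` and `split_ark_url` are hypothetical, and the check only looks at the form of the string (whether it names a web resolver), not at whether the link actually works. The term NAAN (Name Assigning Authority Number) is ARK terminology for the institutional namespace.

```python
from urllib.parse import urlparse


def has_resolver(value: str) -> bool:
    """Return True if the value is written as an http(s) URL with a host.

    This only inspects the string; it does not contact the resolver.
    """
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_ark_url(pid: str) -> dict:
    """Split a resolver-prefixed ARK into resolver, NAAN, and object name."""
    if not has_resolver(pid):
        raise ValueError(f"not written as a resolvable URL: {pid}")
    parsed = urlparse(pid)
    # The path is expected to look like /ark:/<NAAN>/<object name>
    label, naan, name = parsed.path.lstrip("/").split("/", 2)
    if label != "ark:":
        raise ValueError(f"not an ARK: {pid}")
    return {"resolver": parsed.netloc, "naan": naan, "name": name}


bare = "3af2b96d2-a8a1-47c5-9895-b0af03b21674"
ark = "http://n2t.net/ark:/65665/3af2b96d2-a8a1-47c5-9895-b0af03b21674"

print(has_resolver(bare))   # False: nothing tells a computer where to look
print(split_ark_url(ark))   # resolver, namespace, and object name
```

The bare string and the ARK share the same object name; what makes the ARK resolvable is the resolver and namespace wrapped around it.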
See below for a list of commonly used identifier formats and their applications in our domain. Agosti et al. (2022) also provide an excellent and practical review of how PIDs are being used or should be used by the collections community.
Types of Identifiers
PIDs (and other identifiers) can be assigned to different types of objects within the realm of natural history collections. Although we often think of them in relation to digital specimen records, PIDs are also useful when assigned to people, organizations, taxonomic concepts and names, geographic places, etc. See the table below for a summary of what types of identifiers are most commonly used where, with more detailed explanations following. von Mering et al. (2021) is a good resource for additional examples.
| Identifier name | Registration/resolution agency | Format & Example value | Use |
|---|---|---|---|
| ARK (Archival Resource Key) | ARK Alliance | [URI]+[namespace]+[alphanumeric object identifier]; e.g. http://n2t.net/ark:/65665/3af2b96d2-a8a1-47c5-9895-b0af03b21674 | generic |
| GUID (Globally Unique IDentifier) | N/A | [alphanumeric object identifier]; e.g. fb51580402c14e03bfe0c69437e180ec | generic |
| UUID (Universally Unique IDentifier) | N/A | [alphanumeric object identifier]; e.g. fb515804-02c1-4e03-bfe0-c69437e180ec | generic |
| Wikidata QID (Q IDentifier) | Wikidata | [URI]+[numeric object identifier]; e.g. https://www.wikidata.org/entity/Q43649390 | generic |
| DOI (Digital Object Identifier) | e.g. DataCite or CrossRef | Handle (DOI); e.g. https://doi.org/10.5962/p.304567 | digital objects |
| IGSN (International Generic Sample Number) | DataCite | Handle (DOI); e.g. https://doi.org/10.58052/DSR0004SY or igsn:10.58052/DSR0004SY | physical objects |
| ORCID (Open Researcher and Contributor ID) | ORCID | [URI]+[numeric object identifier]; e.g. https://orcid.org/0000-0001-6514-963X | agents |
| ROR (Research Organization Registry identifier) | Research Organization Registry | [URI]+[alphanumeric object identifier]; e.g. https://ror.org/03pnyy777 | agents |
| COL identifier | Catalogue of Life | [namespace]+[alphanumeric object identifier]; e.g. col:4QHKG or col:P | taxa |
Generic identifiers
A Universally Unique IDentifier (UUID), more generically known as a Globally Unique IDentifier (GUID), is an identifier that is unique, persistent, and computer readable but not resolvable. A GUID is a 128-bit integer generated by an algorithm designed to make the risk of collision (i.e. generating the same GUID twice) negligible, e.g. fb51580402c14e03bfe0c69437e180ec. Anyone can generate GUIDs for free via online tools such as https://www.guidgenerator.com. A UUID is a type of GUID formatted as a string of 32 hexadecimal digits displayed in five groups separated by hyphens, e.g. fb515804-02c1-4e03-bfe0-c69437e180ec. Anyone can generate UUIDs for free via online tools such as https://www.uuidgenerator.net.
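Besides online generators, most programming languages can produce these identifiers directly. The following is a minimal sketch using Python's standard `uuid` module; the helper names `new_random_uuid` and `uuid_forms` are hypothetical and exist only for illustration. It shows that the hyphenated UUID, the unhyphenated 32-digit string, and the underlying integer are three views of the same value.

```python
import uuid


def new_random_uuid() -> uuid.UUID:
    """Generate a random (version 4) UUID."""
    return uuid.uuid4()


def uuid_forms(value: str) -> dict:
    """Show one identifier as a hyphenated UUID, bare hex, and integer."""
    parsed = uuid.UUID(value)  # accepts the string with or without hyphens
    return {
        "hyphenated": str(parsed),  # five groups: 8-4-4-4-12 hex digits
        "hex": parsed.hex,          # the same 32 hex digits, no hyphens
        "integer": parsed.int,      # the underlying 128-bit number
    }


forms = uuid_forms("fb51580402c14e03bfe0c69437e180ec")
print(forms["hyphenated"])  # fb515804-02c1-4e03-bfe0-c69437e180ec
print(forms["hex"])         # fb51580402c14e03bfe0c69437e180ec

# A freshly minted identifier; the value differs on every run.
print(new_random_uuid())
```

Generating the identifier is the easy part. The persistence comes from storing it with the record and never changing it afterwards.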
ARKs (Archival Resource Keys) are decentralized, persistent identifiers formatted as URLs and designed to be resolvable via a separate service, N2T. They are used in a diversity of contexts, from museum collections to genealogical records to public health documents. The ARK Alliance is not a true registration agency, as ARKs are created in a decentralized system, and in fact, many ARK users are larger libraries or museums with the capacity to mint their own ARKs. EZID is one example of a registration service provider.
Wikidata QIDs (Q IDentifiers) can be useful to reference concepts that do not have another, more authoritative source, e.g. a person who collected specimens in the early 1900s but never published anything. Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Anyone can create or edit Wikidata records, and thus the data associated with a QID should be expected to constantly evolve.
Identifiers for digital objects
Generic identifiers like those described above can be used to reference born-digital objects. A common example is the prevalence of UUIDs/GUIDs or ARKs as values of occurrenceID, a term in Darwin Core (the biodiversity data standard maintained by TDWG) that identifies an occurrence.
DOIs are a widely used digital object identifier format with multiple registration and resolution agencies, including DataCite and CrossRef. Learn more at the International DOI Foundation. Specimens could be assigned DOIs, but the most common applications for DOIs in our community are digital publications and other similarly formal documents.
Identifiers for physical objects
Generic identifiers like those described above can also be used to reference physical objects, such as fossil specimens or geologic samples. Some identifiers are scoped specifically for referring to physical objects. For example, IGSNs (International Generic Sample Numbers) are functionally DOIs that are registered in a namespace dedicated to physical samples. Users can register samples and get IGSNs via SESAR, which is a community platform dedicated to managing and sharing metadata (information describing data, e.g. who collected it, where, when, and how) about physical samples. Learn about IGSNs from DataCite, which is the parent registration agency that manages IGSNs.
The Darwin Core terms materialEntityID and materialSampleID both expect an identifier such as a UUID/GUID, ARK, or IGSN. Think of these identifiers as a digital complement to human-friendly catalog numbers.
Learn more about IGSNs and SESAR from the materials for PDWG Happy Hour on 2025-10-23.
Identifiers for agents
Identifiers for agents, such as people or organizations, can be incredibly useful as a means to link out to biographical and other metadata about agents when such data are maintained in a different system from other collections data. Identifiers for agents also help disambiguate agents with the same name, or agents who have changed their name over time. For more about disambiguating agents, check out Bionomia, an online platform that links natural history specimens to the world’s collectors by integrating data from GBIF, ORCID, and Wikidata.
For people, identifiers may be used as values in the Darwin Core terms recordedByID and identifiedByID, each of which holds a list of globally unique identifiers for the people, groups, or organizations responsible for recording the occurrence or assigning the taxon, respectively. ORCIDs are commonly used to identify living people; individuals create an ORCID for themselves rather than being assigned one by a third party. Wikidata QIDs (Q IDentifiers) are a generic type of identifier but frequently used for people, both living and dead. Learn more in Bauer et al. (2022), a framework for mobilizing information via Wikidata about people working in or associated with scientific collections.
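As a rough sketch of how agent identifiers might be prepared for a Darwin Core field, the Python below checks that an ORCID is well formed and then joins several identifiers into one value. Two details come from outside this page and should be treated as assumptions to verify: the check-digit rule (ISO 7064 MOD 11-2) is taken from ORCID's own documentation, and the " | " separator follows Darwin Core's general recommendation for list-valued terms. The function names are hypothetical, and the QID in the usage lines is simply the example value from the table above, not necessarily a person.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the final ORCID character from the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_plausible_orcid(orcid_url: str) -> bool:
    """Check URL prefix, length, and check digit; no network lookup."""
    prefix = "https://orcid.org/"
    if not orcid_url.startswith(prefix):
        return False
    compact = orcid_url[len(prefix):].replace("-", "")
    base, last = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == last


def join_agent_ids(agent_ids: list) -> str:
    """Concatenate identifiers into a single list-valued field."""
    return " | ".join(agent_ids)


ids = [
    "https://orcid.org/0000-0001-6514-963X",
    "https://www.wikidata.org/entity/Q43649390",
]
print(is_plausible_orcid(ids[0]))  # True
print(join_agent_ids(ids))
```

A passing check only means the string is shaped like an ORCID; it does not confirm that the identifier belongs to the intended person.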
For institutions, ROR (Research Organization Registry) is the next generation of a similar identifier type called GRID (Global Research Identifier Database). RORs are designed to identify the top level institution, e.g. a university, and so can be difficult to apply to collections which may be, e.g. a department within a university.
Identifiers for taxa
Similarly to identifiers for agents, identifiers for taxa are essential tools to link out to data maintained in other systems, such as taxonomic classifications or nomenclatural history. Catalogue of Life (COL) identifiers allow the same identifier to be used for the same name across all versions of COL, regardless of monthly, annual or extended releases. The “col” namespace is registered with identifiers.org, so users have the option to reference these identifiers in a resolvable way, e.g. https://identifiers.org/col:4QHKG. In Darwin Core, such identifiers may be used with the term taxonID, which may hold either a globally unique identifier or an identifier specific to the data set.
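Turning a compact, namespaced identifier into a resolvable link is a simple string operation. The sketch below is illustrative only: the function name `expand_compact_id` and the lookup table are hypothetical, and the two URL patterns are inferred from the examples on this page (the identifiers.org form for COL, and the doi.org form shown for IGSNs in the table above) rather than from any official registry listing.

```python
# URL patterns inferred from the examples on this page (assumption).
RESOLVER_TEMPLATES = {
    "col": "https://identifiers.org/col:{local_id}",
    "igsn": "https://doi.org/{local_id}",
}


def expand_compact_id(compact_id: str) -> str:
    """Expand a namespace:identifier string into a resolvable URL."""
    prefix, sep, local_id = compact_id.partition(":")
    if not sep or not local_id:
        raise ValueError(f"expected namespace:identifier, got {compact_id!r}")
    template = RESOLVER_TEMPLATES.get(prefix.lower())
    if template is None:
        raise ValueError(f"no resolver known for namespace {prefix!r}")
    return template.format(local_id=local_id)


print(expand_compact_id("col:4QHKG"))
print(expand_compact_id("igsn:10.58052/DSR0004SY"))
```

Storing the compact form and expanding it on demand keeps the data independent of any single resolver, which matters if a resolver's address changes over time.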
Assigning PIDs
How to assign PIDs is a decision that should be considered carefully. See A Beginner’s Guide to Persistent Identifiers, published by the Global Biodiversity Information Facility, for a practical discussion on assigning PIDs that continues to be relevant even a decade past its writing (in particular, Section 5. Checklist for Implementing Persistent Identifiers). For more on the mechanics of assigning different types of PIDs to different subjects, such as DOIs for taxonomic treatments or INSDC accession numbers for genetic sequences, see Agosti et al. (2022).
It is best practice for PIDs to be assigned by the authoritative source, e.g. the institution that created and will manage a digital specimen record in perpetuity, or an individual for a PID referencing themselves. It is also important to consider what the identifier is actually representing, e.g. a physical object vs. its digital surrogate, and to document this decision. For example, the European DiSSCo project has determined that in their context PIDs will be assigned to identify the digital representations of physical specimens (see Hardisty et al. 2021), and a report on PIDs from the Research Data Alliance also recommends assigning them to digital rather than physical objects (see Wittenburg et al. 2017).
Keep in mind that you can also make use of PIDs that you did not assign. A common example of this is the use of ORCIDs to identify people associated with your specimen data, such as collectors or the people who identified specimens.
Related content
Browse additional related content, including PDWG Happy Hours and links out to external resources, via topic: identifier