Manage fossil specimen data using Symbiota
Draft
The content on this page has not been finalized. Contributors can mark a page as complete and remove this warning by adding status: published to the front matter in the Markdown source file.
This guide is intended to complement documentation for getting started in the Symbiota Paleo Data Portal, as well as the official Symbiota user documentation, Symbiota Docs. Symbiota Docs provides general guidance for working in Symbiota-based data portals and should be referenced for basic functions and workflows. This manual expands on this resource to provide discipline-specific information for fossil collections.
Tip: Refer to example fossil specimen records in Symbiota here.
Introduction
There are two ways specimen records are typically entered into a Symbiota portal: 1) as a bulk data import or 2) directly using the Occurrence Editor interface. Additional methods are possible (more on that here), but these two options are commonly used by collections that actively (“live”) manage their specimen data using Symbiota.
Regardless of data entry method, it is important that all data providers become familiar with the Darwin Core data standard, which forms the basis for the majority of Symbiota’s Data Fields.
📬 Questions? Data providers are encouraged to contact paleoinformatics@gmail.com for assistance with questions related to importing and maintaining fossil specimen data using Symbiota. Include “Symbiota” in the subject of your email, e.g. “Help with preparing my data for the Symbiota Paleo Data Portal”.
Bulk data import
Formatting data for import
If you maintain fossil specimen data that needs to be imported into a Symbiota portal (e.g. in a spreadsheet), this section outlines actions you can take to prepare existing digital catalog records from your fossil collection for ingestion into Symbiota. This data preparation guide is designed to help make your data more easily managed, discovered, and used for research; data providers are thus strongly encouraged to follow the steps outlined below when possible.
Steps you can take to ready your records for ingestion
- If you maintain existing catalog records to be imported into Symbiota, perform some data cleaning to align your records to the Symbiota data fields and formatting specified in the previous step. OpenRefine is free software that can be used for this purpose. Highly recommended: Use the checklist below to prepare your data for import.
- If you’d like a template to follow, this spreadsheet is preformatted for use with Symbiota. Not all fields are required to contain data. Your spreadsheet must be converted to CSV format prior to ingestion into the portal, which can be easily accomplished in a program like Microsoft Excel or Google Sheets.
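For collections that assemble records with a script rather than a spreadsheet program, the CSV conversion step can be sketched in Python. This is a minimal illustration and not part of Symbiota: the output file name and the sample record are hypothetical, and the columns are a small subset of the fields named in this guide.

```python
import csv
import os
import tempfile

# Hypothetical sample record; column names follow fields named in this guide.
records = [
    {
        "catalogNumber": "USNM000001",
        "basisOfRecord": "FossilSpecimen",
        "sciName": "Phylledestes vorax",
    },
]

def write_import_csv(rows, path):
    """Write a list of dicts to a UTF-8 encoded CSV file with a header row."""
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings;
    # encoding="utf-8" keeps special and accented characters intact.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Written to the system temp directory here; use any location you prefer.
out_path = os.path.join(tempfile.gettempdir(), "symbiota_import.csv")
write_import_csv(records, out_path)
```

Writing the file with an explicit encoding avoids relying on the operating system default, which is not always UTF-8.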
Data formatting checklist
If you maintain cataloged specimen records in a spreadsheet, this information can be imported into Symbiota in CSV format. Before doing so, it is highly recommended that you complete this checklist to prepare your data for ingestion to maximize the interoperability between your data and that of other collections, ultimately making your records more discoverable and useful for research. Additional data cleaning can be performed once your records are in Symbiota (printable version). The checklist below has been compiled based on scenarios observed in other datasets from fossil collections prepared for import. The Symbiota Data Import Fields guide provides important information about fields available in Symbiota, as well as the types of data that can be imported into each one—for instance, which fields can only contain numbers, dates only, textual data, etc.—and how this information should be formatted. Examples of datasets that have been cleaned in preparation for ingestion into Symbiota can be found here.
Example datasets: content forthcoming
Checklist Item | Recommendation |
---|---|
Catalog Numbers | Every occurrence (=specimen record) to be imported must have a catalog number assigned. Example: USNM000001 |
Basis of Record | Every record corresponding to a fossil specimen should receive the basisOfRecord value, “FossilSpecimen”. Example: FossilSpecimen |
Secondary identifiers | Parse into a semicolon delimited list of key:value pairs (i.e., tagName: identifier). Example: otherCatalogNumbers = legacy catalog number: ASU 3541; accession number: WIS-L-001456 |
Delimiters | Use pipes (\|) or semicolons to separate values in a list, and be consistent with formatting. Doing so will facilitate parsing of data, if ever needed, in the future. Avoid using commas as delimiters. Example: Associated Collectors = Charlotte Hill \| Samuel Scudder \| Arthur Lakes |
Fields containing different kinds of information | When this is unavoidable, use key:value pairs to concatenate data that must be combined into one field. Example: Occurrence Remarks = ACQUISITION DETAILS: Gift of Arthur Lakes April 1890 \| NOTES: Original specimen label misplaced. |
Dates | Fields containing dates should follow the ISO 8601 format, YYYY-MM-DD. An exception to this rule is verbatimEventDate; use this field when dates are incomplete or not ISO formatted. |
Identifications | For specimens identified above the species level, do not include sp., indet., or similar suffixes. Qualifiers like aff. and ? should be recorded in a separate field, identificationQualifier. Verbatim label identifications (e.g. Lepidophyllum [?]) can be captured in identificationRemarks. Leave blank for specimens/specimen lots without identifications. Refer to Symbiota-specific guidance for scientificName vs. sciName. Example: scientificName = Phylledestes vorax Cockerell, 1907 or sciName = Phylledestes vorax |
Localities | If any locality data should be obscured, include a locationSecurity column in your spreadsheet and give records with sensitive locality data a value of “1”. |
Geological time | Conform values to the ICS Time Scale and use “Early” in place of “Lower” and “Late” in place of “Upper”. Provide values for both earlyInterval and lateInterval, even if both values are the same. Regionally accepted values should be recorded using localStage (e.g. “Bridgerian”). See notice below regarding verbatim values in geological context data. Example: earlyInterval = Late Jurassic and lateInterval = Early Cretaceous |
Geological units | Values for group, formation, member, and bed should be parsed into the appropriate fields. See notice below regarding verbatim values in geological context data. Example: formation = Wasatch Formation and member = Niland Tongue |
Vocabularies | If your dataset contains anatomical elements that may benefit from the use of a controlled vocabulary, refer to these examples. |
Type specimens | Include a value in typeStatus (ICZN and IAPT values preferred). See below for information about “extending” your specimens that are referenced in literature. |
File format | Save your spreadsheet in comma-separated (CSV) format. Additionally, to ensure any special or accented characters import correctly, always save your data import files using UTF-8 character encoding. |
Cataloging specimen lots | When multiple individuals of a single taxon exist in a given lot (i.e. isolated in one physical container), they can be cataloged as a single occurrence record. See below for advice when a lot contains multiple taxa. |
Cataloging part-counterpart pairs and similar relationships between records | See below for information about “extending” your specimens. |
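Several checklist items can be checked automatically before import. The sketch below is an illustration under stated assumptions, not a Symbiota tool: it treats each record as a dictionary keyed by column name, assumes the date column is named eventDate, and covers only four checks (catalog number present, basisOfRecord value, ISO date format, and sp./indet. suffixes). Both sample records are hypothetical.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
# Suffixes the checklist says to leave out of identifications.
OPEN_NOMENCLATURE = re.compile(r"\b(sp|spp|indet)\.?\s*$", re.IGNORECASE)

def check_record(record):
    """Return a list of checklist problems found in one record (a dict)."""
    problems = []
    if not (record.get("catalogNumber") or "").strip():
        problems.append("missing catalogNumber")
    if record.get("basisOfRecord") != "FossilSpecimen":
        problems.append("basisOfRecord is not FossilSpecimen")
    event_date = (record.get("eventDate") or "").strip()
    if event_date:
        if not ISO_DATE.match(event_date):
            problems.append("eventDate is not YYYY-MM-DD")
        else:
            try:
                date.fromisoformat(event_date)  # rejects e.g. 1890-13-45
            except ValueError:
                problems.append("eventDate is not a real calendar date")
    if OPEN_NOMENCLATURE.search(record.get("sciName") or ""):
        problems.append("sciName ends in sp. or indet.")
    return problems

# Hypothetical records for demonstration only.
good = {"catalogNumber": "USNM000001", "basisOfRecord": "FossilSpecimen",
        "eventDate": "1890-04-01", "sciName": "Phylledestes vorax"}
bad = {"catalogNumber": "", "basisOfRecord": "PreservedSpecimen",
       "eventDate": "April 1890", "sciName": "Carya sp."}

print(check_record(good))
print(check_record(bad))
```

Incomplete dates such as “April 1890” are flagged rather than converted, since the checklist directs them to verbatimEventDate instead.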
A note on verbatim values in geological context data: Many fossil specimens are accompanied by labels, field notes, and other primary data sources containing values that are no longer accepted (e.g. “Tertiary”), are used informally (e.g. “Precambrian”), or indicate uncertainty (e.g. “Upper Mio?”). This information is important and should be recorded; however, it should not be captured in Symbiota’s earlyInterval and lateInterval fields, which map to a portal’s standardized geological time scale values (by default, these values are based on the ICS Time Scale). In the absence of an appropriate, standards-based term for these data, capture this information in dynamicProperties as a key:value pair.
Example: VERBATIM CHRONOSTRATIGRAPHY: Permian?
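One way to apply this rule during data cleaning is to route each verbatim value either to an interval field or to dynamicProperties. The sketch below is only an illustration: the function name is hypothetical, and ACCEPTED_INTERVALS is a tiny stand-in for the full list of time scale values configured in your portal.

```python
# Assumption: an illustrative subset of accepted interval names; a real check
# would use the complete list of values configured in your portal.
ACCEPTED_INTERVALS = {"Permian", "Late Jurassic", "Early Cretaceous", "Miocene"}

def route_chronostratigraphy(verbatim):
    """Return (interval, dynamic_properties) for one verbatim label value.

    A value that maps cleanly to an accepted interval is returned as the
    interval. Anything else is preserved verbatim as a key:value pair.
    """
    value = verbatim.strip()
    # Replace stratigraphic position terms with their time equivalents.
    candidate = value.replace("Upper ", "Late ").replace("Lower ", "Early ")
    if candidate in ACCEPTED_INTERVALS:
        return candidate, ""
    return "", f"VERBATIM CHRONOSTRATIGRAPHY: {value}"

print(route_chronostratigraphy("Upper Jurassic"))
print(route_chronostratigraphy("Permian?"))
print(route_chronostratigraphy("Tertiary"))
```

Keeping the uncertain or outdated value untouched in dynamicProperties means nothing from the original label is lost, while the interval fields stay limited to standardized terms.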
How to import your data into Symbiota
There are multiple ways to import new records into a Symbiota portal. This action can only be completed by users with Administrator permissions through the Administration Control Panel.
- To import a spreadsheet of specimen occurrence data, use the “Full Text File Import” option.
- To import a spreadsheet of extended specimen data, use the “Extended Data Import” option. See below for more information about how to extend your specimens using Symbiota.
Recommendation: Import one or a very small number of representative records prior to initiating a larger import, especially if you are new to this process. Doing so will allow you to assess how your records will look in the portal. As with bulk data ingestion, only users with Administrator permissions can delete records. Deletion cannot be done in bulk: records can only be deleted one by one using the Admin tab interface on the Occurrence Editor.
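A trial file for this kind of test import can be cut from the full dataset with a few lines of Python. This is a minimal sketch, not a Symbiota feature: the function name and the four-record dataset are hypothetical, and taking the first rows is only a stand-in for choosing records that are truly representative.

```python
import csv
import io

def head_of_csv(source, count=3):
    """Return CSV text holding the header plus the first `count` data rows."""
    reader = csv.reader(source)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        if index > count:  # index 0 is the header row
            break
        writer.writerow(row)
    return out.getvalue()

# Hypothetical four-record dataset standing in for a full import file.
full = io.StringIO(
    "catalogNumber,basisOfRecord\n"
    "EX-0001,FossilSpecimen\n"
    "EX-0002,FossilSpecimen\n"
    "EX-0003,FossilSpecimen\n"
    "EX-0004,FossilSpecimen\n"
)
trial = head_of_csv(full, count=2)
print(trial)
```

With a real file, pass an open file handle (opened with newline="" and encoding="utf-8") in place of the in-memory text.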
Steps you can take immediately after your records are in Symbiota
- Moving forward, make edits to your records and complete other management tasks, like managing loans, directly in Symbiota.
- Save your import spreadsheets somewhere safe, although you will likely not need them again once the records are ingested into your Symbiota portal.
- Run your portal’s built-in data cleaning tools to ingest new taxonomy and clean geographic location details.
- Further clean your data using tips in the Symbiota Data Quality Toolkit.
- Georeference your specimen records.
💡 The last two steps can be delegated to users with Editor permissions, such as students or volunteers!
Direct data entry
This section outlines recommendations for direct data entry using Symbiota’s Occurrence Editor interface, which allows users with Administrator and Editor user permissions to add and edit specimen records in Symbiota. As a reminder, the Darwin Core data standard forms the basis for the majority of Symbiota’s Data Fields. This guide is designed to help make your data more easily managed, discovered, and used for research; data providers are thus strongly encouraged to conform with the recommendations outlined in this section.
Orientation
How do I keep my records clean once they’re available in Symbiota?
Prevent new errors
When training new staff or volunteers on data entry or management, it is highly recommended that you point them toward this Knowledge Hub and, more specifically, have them become familiar with the Symbiota Data Fields and the data formatting checklist.
Mitigate existing errors
Mistakes are likely to happen, even in carefully curated datasets. It is therefore recommended that you routinely assess your data using the Symbiota Data Quality Toolkit. This guide is designed to enable users with either Administrator or Editor permissions to your Collection Profile to “clean” your data–i.e. find and correct errors–using the portal’s built-in features wherever possible.
Crowdsource quality control
Symbiota maintains several built-in tools to facilitate collaborative data entry and data cleaning when enabled for your collection. For example, Administrators of a given collection can enable any portal user who is logged in with an account to suggest edits to your records in the portal. Suggestions must be reviewed by an Administrator before they become public. By default, this option is turned off, but it can be activated through your Administrator Control Panel. Review Symbiota Docs for more information about this feature.
Set up a data import profile
If you intend to repeatedly import data using a standard import template (for example, if you intend to catalog using a spreadsheet), you can set up a new data import profile based on your cleaned spreadsheet.
Extending your specimens
Once your occurrence records are available in Symbiota, associations can be created between your specimen data in Symbiota and external resources, including digitally available literature and other occurrence records (both in and external to your Symbiota portal). This can be accomplished using two methods. Users with Editor or Administrator permissions can create these linkages one-by-one using the Linked Resources tab; additionally, users with Administrator permissions can create these linkages in batch by uploading a CSV-formatted spreadsheet using the Extended Data Import tool. The latter option contains several fields that are not available in the Linked Resources tab, such as accordingTo.
Tip: When creating associations with external resources, provide a stable URL—like a DOI or a permalink—for the resourceURL whenever possible.
Examples of “Extended Specimens” in Symbiota are available in this dataset.
Type and referred specimens
You can create linkages between occurrence records in your Symbiota portal and digitally available publications using the fields and parameters specified below.
Examples: 1) USNMV4735 (holotype of Ceratosaurus nasicornis); 2) USNM P34765 (specimen of Carya libbeyii that has been referenced in several publications)
- Association Type = Non-occurrence Resource
- Relationship Type = isReferencedBy
subjectCatalogNumber | basisOfRecord | accordingTo | resourceURL |
---|---|---|---|
USNMP34765 | ReferenceCitation | Knowlton; 1916; Proceedings of the National Museum | https://www.biodiversitylibrary.org/page/7764079 |
USNMV4735 | ReferenceCitation | Carrano & Choiniere; 2016; Journal of Vertebrate Paleontology | https://doi.org/10.1080/02724634.2015.1054497 |
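Rows like those in the table above can be generated with a short script when many specimens need to be linked to literature. The sketch below is a hedged illustration: the helper names are hypothetical, the column names and the “author; year; publication” pattern for accordingTo are copied from the table, and Association Type and Relationship Type are assumed to be set in the import tool rather than supplied as columns.

```python
import csv
import io

# Column names as they appear in the table above.
COLUMNS = ["subjectCatalogNumber", "basisOfRecord", "accordingTo", "resourceURL"]

def citation_row(catalog_number, author, year, publication, url):
    """Build one row linking a specimen record to a publication."""
    return {
        "subjectCatalogNumber": catalog_number,
        "basisOfRecord": "ReferenceCitation",
        "accordingTo": f"{author}; {year}; {publication}",
        "resourceURL": url,
    }

def rows_to_csv(rows):
    """Serialize rows to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

rows = [
    citation_row(
        "USNMP34765",
        "Knowlton",
        1916,
        "Proceedings of the National Museum",
        "https://www.biodiversitylibrary.org/page/7764079",
    ),
]
print(rows_to_csv(rows))
```

Semicolons inside accordingTo do not conflict with the comma-separated file format, so the value is written without extra quoting.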
Part/Counterpart specimens and similar scenarios
Scenario A: One institution owns all pieces of a fossil specimen
You can create associations between one or more occurrence records cataloged in your Symbiota portal using the fields and parameters specified below.
Example: ANSP3472 (part) and ANSP3473 (counterpart) were cataloged as separate records within the same Symbiota portal and subsequently linked as associated records.
- Association Type = Occurrence - Internal (this portal)
- Relationship Type = part OR counterpart (describe the specimen being linked to)
subjectCatalogNumber | objectCatalogNumber | basisOfRecord |
---|---|---|
ANSP3472 | ANSP3473 | FossilSpecimen |
Think of the “subject” as the “part” and the “object” as the “counterpart” when creating a part-counterpart pairing in Symbiota. Both records must already exist in the portal in order to create this type of relationship.
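Because both records must exist before they can be linked, it can help to check each pair against a list of catalog numbers already in the portal before building the import file. The sketch below is an illustration only: the function name is hypothetical, the catalog numbers are made up, and the set of existing numbers is assumed to come from your own export of the collection.

```python
def build_pair_rows(pairs, existing_catalog_numbers):
    """Split (part, counterpart) pairs into importable rows and rejects.

    Returns (rows, missing), where `missing` lists each rejected pair
    together with the catalog numbers not found in the portal.
    """
    rows, missing = [], []
    for part, counterpart in pairs:
        absent = [number for number in (part, counterpart)
                  if number not in existing_catalog_numbers]
        if absent:
            missing.append((part, counterpart, absent))
            continue
        rows.append({
            "subjectCatalogNumber": part,         # the "part"
            "objectCatalogNumber": counterpart,   # the "counterpart"
            "basisOfRecord": "FossilSpecimen",
        })
    return rows, missing

# Hypothetical catalog numbers for demonstration only.
in_portal = {"EX-0001", "EX-0002", "EX-0003"}
pairs = [("EX-0001", "EX-0002"), ("EX-0003", "EX-0099")]

rows, missing = build_pair_rows(pairs, in_portal)
print(rows)
print(missing)
```

Pairs that land in the rejected list point to records that still need to be cataloged, or to catalog numbers that were mistyped.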
Alternative method: If you prefer to catalog part-counterpart specimens as a single specimen record, this is also possible, as in this example.
Scenario B: Multiple institutions own different pieces of a fossil specimen
Similarly, associations can be created between specimen occurrences in your Symbiota portal and occurrences in other data portals—for example, if your collection maintains one half of a part-counterpart pair, one or more pieces of an individual cataloged by different institutions, or a specimen-cast pairing. In all of these cases, you can create linkages between your catalog records in Symbiota and records hosted in external portals.
Example: USNM PAL 603860 (cataloged in Symbiota) is a cast of YPM VP 058990 (cataloged in an external database). An association has been created between these records in Symbiota.
- Association Type = Occurrence - External Link
- Relationship Type = value varies depending on the association to be created
subjectCatalogNumber | objectID | basisOfRecord | verbatimSciname | resourceURL |
---|---|---|---|---|
USNMPAL603860 | YPMVP058990 | FossilSpecimen | Goleroconus alfi | https://collections.peabody.yale.edu/search/Record/YPM-VP-058990 |
Think of the “subject” as the piece of the specimen retained in your collection (cataloged in Symbiota) and the “object” as the part retained in an external collection. The verbatimSciname refers to the identification of the occurrence maintained by the external collection.
Cataloging multi-taxon specimen lots
Content forthcoming
External resources
- New Symbiota Features to Support Digital and Extended Specimen Data: Abstract of a conference oral presentation given at the 2022 meeting of the Society for the Preservation of Natural History Collections.
- Symbiota Docs: Documentation for users of Symbiota software.