Hannah L. Owens

What's Going On?

New R Tools to Acquire, Manage, Visualize, and Cite Occurrence Data

9/29/2021

 
By Hannah L. Owens and Jamie M. Kass, on behalf of all co-authors*
 
There are billions of species occurrence records served by aggregator databases. The Global Biodiversity Information Facility (GBIF) serves over 1.8 billion occurrence records for species from across the tree of life (GBIF Secretariat 2021), and the Botanical Information and Ecology Network (BIEN) serves over 200 million plant observations (Botanical Information and Ecology Network 2021). The primary datasets these aggregators serve are the result of millions of hours of work by museums and community science initiatives (among others) and are constantly updated as taxonomy changes and data are accrued. Citing the primary datasets that supply data to GBIF and BIEN, together with accession dates, facilitates reproducibility and scientific transparency. These citations also support primary data providers by acknowledging their role as an essential link in the research chain.
 
However, when researchers download occurrence datasets from multiple primary providers via aggregator databases, as is common in broad-scale biogeographic and macroecological studies, managing and effectively communicating the metadata can be incredibly time-consuming. This is where our new R package, occCite, comes in. occCite is designed to facilitate searches of dataset aggregation services (currently, GBIF and BIEN) while storing and managing metadata on primary data providers, database accession dates, DOIs, and taxonomic sources in a unified framework within the R environment. Search results are organized as single objects that can be passed to functions to generate visual and statistical summaries and formatted citations.
 
occCite’s Two Main Steps
  1. Search. occQuery() provides several ways of generating and optimizing occurrence searches while storing detailed metadata. occQuery() returns an occCiteData object with information on the type of query made, date of the query, taxonomy used, species names used, database aggregators searched, and a named list of search results corresponding to each species.
  2. Cite. The occCiteData object generated by occQuery() can be passed to occCitation() to automatically generate citations with accession dates and DOIs. occCitation() returns a named list with entries corresponding to the taxonomic names used to build a query. The print() method can be used to turn these results into a formatted and alphabetized set of references, either as a single block of text for all species, or as blocks of text for each species individually.
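
The two steps above can be sketched in a few lines of R. This is a minimal illustration rather than a definitive recipe: the species name is only an example, the environment variable names used for the GBIF credentials are assumptions, and the search is skipped unless occCite is installed and credentials are set, because GBIF downloads require a free GBIF account and an internet connection.

```r
# Sketch of the search-then-cite workflow.
species <- "Protea cynaroides"  # illustrative species choice

# Assumed environment variable names holding GBIF account details.
gbif_user  <- Sys.getenv("GBIF_USER")
gbif_email <- Sys.getenv("GBIF_EMAIL")
gbif_pwd   <- Sys.getenv("GBIF_PWD")
have_credentials <- all(nzchar(c(gbif_user, gbif_email, gbif_pwd)))

if (requireNamespace("occCite", quietly = TRUE) && have_credentials) {
  # Bundle the GBIF login details into the object occQuery() expects.
  login <- occCite::GBIFLoginManager(
    user = gbif_user, email = gbif_email, pwd = gbif_pwd
  )

  # Step 1: search GBIF and BIEN; the result is an occCiteData object.
  my_search <- occCite::occQuery(
    x = species,
    datasources = c("gbif", "bien"),
    GBIFLogin = login,
    GBIFDownloadDirectory = tempdir()
  )

  # Step 2: turn the stored metadata into citations and print them.
  my_citations <- occCite::occCitation(my_search)
  print(my_citations)
} else {
  message("Skipping search: occCite or GBIF credentials not available.")
}
```

To our understanding, the print() method for citations also accepts a bySpecies argument for printing a separate block of references per species; check the package documentation for the version you have installed.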
 
Additional Features
Taxonomic Rectification. By default, occQuery() checks species’ names against the GBIF backbone taxonomy. The user may instead elect to use studyTaxonList() to prepare a data object with the species’ names to be searched that has been checked against a taxonomy of their choice from the Global Names Index (http://gni.globalnames.org/).
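
As a rough sketch of the taxonomic rectification step, the helper below wraps studyTaxonList() so that a vector of names is checked against a chosen taxonomy before any search is run. The function name check_names and the default taxonomy string are illustrative assumptions; the call needs occCite installed and an internet connection, so it is defined here but not executed.

```r
# Hypothetical helper: check species names against a chosen taxonomy.
# Returns the object produced by studyTaxonList(), which can then be
# passed to occQuery() as its `x` argument in place of raw names.
check_names <- function(species_names,
                        taxonomy = "GBIF Backbone Taxonomy") {
  occCite::studyTaxonList(x = species_names, datasources = taxonomy)
}

# Example use (requires occCite and internet access):
# checked <- check_names(c("Protea cynaroides", "Protea neriifolia"))
```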
 
Text Summaries. Using the print() method on an occCiteData object returns tables summarizing the taxonomic cleaning results, the search results (with counts of occurrences for each species from each dataset aggregator), and the GBIF DOIs associated with each species’ search.
 
Summary Plots. occCite provides three types of plots for results from occQuery() when the plot() method is used on an occCiteData object: a histogram showing occurrences by year, a waffle plot showing the proportion of results supplied by GBIF versus BIEN, and a waffle plot showing the proportion of occurrences supplied by each primary data provider. These plots can be generated either for all search results or by species.

Maps. Interactive leaflet maps can be generated from occCiteData objects via the occCiteMap() function, for all search results or by species. Users can specify occurrence point marker colors and symbologies. Hovering over a point in the interactive map provides information on the species name, coordinates, date, dataset, and dataset aggregator that supplied it.
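
The summaries, plots, and maps described above can be tried without running a new search. The sketch below assumes the example search result that, as far as we recall, ships with the package under the name myOccCiteObject. It also uses summary() for the text tables, which is what the package versions we are aware of provide; if your installed version differs, the print() method mentioned above may be the one to use.

```r
library(occCite)

# Load the example occCiteData object bundled with the package
# (dataset name assumed from the package documentation).
data("myOccCiteObject", package = "occCite")

# Text summary: taxonomic cleaning, occurrence counts, GBIF DOIs.
summary(myOccCiteObject)

# Summary plots for all search results combined.
plot(myOccCiteObject)

# The same three plot types, drawn separately for each species.
plot(myOccCiteObject,
     bySpecies = TRUE,
     plotTypes = c("yearHistogram", "source", "aggregator"))

# Interactive leaflet map with nearby points clustered.
occCiteMap(myOccCiteObject, cluster = TRUE)
```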

 
The Future of occCite
occCite has been integrated as a module in the development version of Wallace, a modular, R-based graphical user interface for modeling species’ ecological niches and geographic distributions (Kass et al. 2018). When Wallace users opt to include data source citations in occurrence data searches, occCite will be invoked to run the search and generate citations.
 
In the future, we plan to expand the number of database aggregators that occCite queries and to add various fit-for-purpose filtering actions (e.g., duplicate removal, temporal downsampling, geographic and environmental outlier removal). We also plan to add comparative summary plots, both for raw versus filtered data and for comparing different occCiteData objects. We hope you’ll keep up-to-date via our GitHub website (hannahlowens.github.io/occCite/) for these and other exciting developments!
 
 
Further Reading

Manuscript: https://onlinelibrary.wiley.com/doi/10.1111/ecog.05618
 
CRAN release: https://CRAN.R-project.org/package=occCite
 
YouTube Tutorial: https://www.youtube.com/watch?v=7qSCULN_VjY&t=17s
 

References
Botanical Information and Ecology Network. 2021. BIEN, the Botanical Information and Ecology Network. bien.nceas.ucsb.edu, accessed 6 August 2021.
 
GBIF Secretariat. 2021. GBIF: Global Biodiversity Information Facility. gbif.org, accessed 6 August 2021.
 
Kass, JM, Vilela, B, Aiello‐Lammens, ME, Muscarella, R, Merow, C, and Anderson, RP. 2018. Wallace: A flexible platform for reproducible modeling of species niches and distributions built for community expansion. Methods in Ecology and Evolution, 9: 1151-1156. DOI: 10.1111/2041-210X.12945

*Originally written for Ecography blog


Universitetsparken 15, byg 3
2100 Copenhagen Ø, Denmark

hannah.owens(at)SUND.ku.dk

Copyright © 2015