Recently, I got the very exciting news that a project I've been leading was awarded second place in the Ebbe Nielsen Challenge, an annual contest put on by the Global Biodiversity Information Facility (GBIF). The idea behind the challenge is to recognize projects that use GBIF-supplied biodiversity data and tools to innovate and promote open science.
The project my colleagues (Cory Merow of the University of Connecticut, Brian Maitner of the University of Arizona, and Vijay Barve & Rob Guralnick of the Florida Museum of Natural History) and I submitted is an R package called occCite. It helps track where species occurrence data comes from. When we are trying to understand why species are found in a particular place, we often download our data from aggregators like GBIF. GBIF is a meta-database that serves data from over a thousand other sources, from museums like the Florida Museum to community science initiatives like eBird and iNaturalist. Often, the datasets we download contain data from multiple primary sources, and it can take a long time to track down a good citation for each one. Given the raw data we've downloaded, occCite generates summaries of data sources, including formatted citations for inclusion in research papers. Citing primary data providers is important not just so that the research we do is reproducible, but also so that primary providers like museums can keep track of how the data they provide is being used. Museums can then use this information to demonstrate how relevant their collections are for ongoing research.
I came up with the idea for occCite after spending the better part of a week creating tables and collecting appropriate citations for a paper I wrote on mapping butterfly diversity. That paper used occurrence data from 37 papers, four community science websites, three natural history museums (obtained directly), four aggregator databases (like GBIF), a colleague's personal collection, and Flickr. Through occCite, you can download all known records from hundreds of museums and community scientists. Those records come not only with where and when each observation was made, but also with tables showing how many records came from each source, as well as pre-formatted citations for that data.
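To make the idea of a per-source summary concrete, here is a minimal sketch in Python. It is not occCite's code or API (occCite is an R package); it only illustrates the general technique of tallying occurrence records by their source and pairing each tally with a citation. The records, the `datasetKey` values, the citation strings, and the `summarize_sources` helper are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical occurrence records; real downloads carry many more fields.
records = [
    {"species": "Danaus plexippus", "datasetKey": "dataset-A"},
    {"species": "Danaus plexippus", "datasetKey": "dataset-B"},
    {"species": "Papilio glaucus", "datasetKey": "dataset-A"},
]

# Hypothetical citation metadata, keyed by the same source identifier.
citations = {
    "dataset-A": "Example Museum (2020). Example Collection. Accessed via GBIF.",
    "dataset-B": "Example Community Science Project (2021). Observations. Accessed via GBIF.",
}


def summarize_sources(records, citations):
    """Count records per source and pair each count with its citation.

    Sources are listed from most to fewest records; a source with no
    known citation is flagged rather than silently dropped.
    """
    counts = Counter(r["datasetKey"] for r in records)
    summary = []
    for key, n in counts.most_common():
        summary.append({
            "source": key,
            "records": n,
            "citation": citations.get(key, "Citation not found"),
        })
    return summary


if __name__ == "__main__":
    # Print a simple table: record count followed by the citation to use.
    for row in summarize_sources(records, citations):
        print(row["records"], row["citation"])
```

Flagging missing citations instead of skipping them keeps every contributing source visible in the summary, which is the point of the exercise: no data provider should go uncredited just because its metadata was hard to find.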
If you are interested in learning more about how to use occCite, I made a video tutorial (because, after attending a recent workshop on making videos, the process seemed much less daunting). Here it is:
Universitetsparken 15, byg 3
2100 Copenhagen Ø, Denmark
Copyright © 2015