I am one of a long list of co-authors on a paper that came out in Nature Ecology & Evolution recently, titled “A global phylogeny of butterflies reveals their evolutionary history, ancestral hosts and biogeographic origins”. This is the most recent result of the ButterflyNet project, which began while I was a postdoc in the Kawahara and Guralnick labs at the Florida Museum of Natural History. The project is incredibly global in its collaborative scope, not just in terms of data collection, but also in analysis and interpretation. The project’s modest goal was to generate the largest butterfly phylogeny to date, and assemble a massive database of morphological, ecological, and geographic information to pair it with.
The phylogeny presented in the recently published paper represents 92% of all butterfly genera and is based on 391 genes; it provided the backbone for several comparative phylogenetic analyses focused on reconstructing biogeographic history and host-plant interactions, which is largely where my contribution fit in. Honestly, I felt a bit spoiled being able to swan in after the hard work of data collection to consult on the analyses. Based on the available data, it appears butterflies originated in the Americas and first fed on plants in the family Fabaceae. However, the true message of the paper could just as easily be “we’ve just scratched the surface here, there is so much more to do!!”, as a lot more data have been collected and not yet analyzed. Look out for more downstream papers in the coming years!
Here is the citation: Kawahara, A.Y., Storer, C., Carvalho, A.P.S. et al. A global phylogeny of butterflies reveals their evolutionary history, ancestral hosts and biogeographic origins. Nat Ecol Evol (2023). https://doi.org/10.1038/s41559-023-02041-9
And here's reporting on the paper by National Public Radio in the US!
One of the base units of analysis for biogeography and conservation science is the species range map. Once we know where a species is, we can ask questions like "Why is it there?", "How did it get there?", or "What can we do to make this place better for it?" Especially these days, I am very interested in mapping marine fish distributions, which, it turns out, is not as simple as mapping terrestrial species.
The problem is that ecological niche modeling and species distribution modeling methods and theory were largely developed by people working in terrestrial systems. Specifically, correlative modeling approaches were developed based on environmental conditions extracted at two-dimensional coordinates across a horizontal landscape. Sure, birds and insects fly (and some of them at very high altitudes), but their interactions with the upper troposphere are generally brief and unlikely to directly affect the distributions of these species.
In comparison, ocean fishes may live their whole lives at the ocean's surface or the seafloor, but they may also swim freely somewhere between the two. The largest migration in the world occurs DAILY, when a million tons of mesopelagic fishes travel between the cold, dark safety of the deep sea and warm, food-rich surface waters. If you modeled the ecological niche of one of these pelagic species based on surface conditions where it was caught at night, you might drastically underestimate its temperature tolerances. If distribution maps based on these inaccurate models are used in downstream analyses, they may bias results and cause all kinds of other problems.
This is where voluModel comes in. voluModel is an R package I developed in collaboration with my advisor and co-author, Carsten Rahbek, during my Marie Curie Fellowship postdoc. In our new paper, out in Methods in Ecology and Evolution this week, we introduce a workflow to extract environmental conditions based on the three-dimensional coordinates where fishes are observed, as well as their background environments. These data can then be used in any typical ecological niche modeling algorithm that accepts data in a points-with-data format; the model can then be projected back into three-dimensional space using a simple loop. While three-dimensional modeling has been done before, voluModel is the first set of tools for efficiently working through a 3D workflow without resorting to case-specific custom code. Accompanying the package are vignettes that provide 1) an overview of how voluModel works, 2) raster processing tools, 3) 3D environmental data sampling, 4) visualization tools, and 5) a basic overview of how to generate a generalized linear model with 3D data.
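To make the extraction and projection steps a bit more concrete, here is a minimal Python sketch of the general idea. It is not voluModel's R API (which builds on terra); it assumes an environmental variable stored as a regular grid with one 2D layer per depth band, indexed [layer][row][col], and the parameters lon_min, lat_max, cell_size and depth_tops (upper boundary of each depth band, in metres, positive downward) are hypothetical. Each occurrence is matched to the cell containing its longitude, latitude and depth, and project_by_layer stands in for the "simple loop" that applies a fitted model layer by layer. The temperatures and the tolerance rule are invented toy values.

```python
from bisect import bisect_right


def extract_3d(grid, lon_min, lat_max, cell_size, depth_tops, points):
    """Look up the grid value at each (lon, lat, depth) point.

    grid[layer][row][col]: layers ordered shallow to deep, rows north to south,
    columns west to east. depth_tops holds the sorted upper boundary of each
    layer in metres (positive downward); the deepest layer is open-ended.
    Points outside the grid, or above the surface, get None.
    """
    values = []
    for lon, lat, depth in points:
        col = int((lon - lon_min) // cell_size)
        row = int((lat_max - lat) // cell_size)
        layer = bisect_right(depth_tops, depth) - 1  # last layer whose top <= depth
        inside = (0 <= layer < len(grid)
                  and 0 <= row < len(grid[layer])
                  and 0 <= col < len(grid[layer][row]))
        values.append(grid[layer][row][col] if inside else None)
    return values


def project_by_layer(grid, model):
    """Apply a fitted model cell by cell, one depth layer at a time."""
    return [[[model(v) for v in row] for row in layer] for layer in grid]


# Toy temperature grid (degrees C): two depth layers of 2 x 2 cells each.
temperature = [
    [[25, 24], [23, 22]],  # 0-100 m
    [[10, 9], [8, 7]],     # 100 m and deeper
]
depth_tops = [0, 100]
occurrences = [(0.5, 1.5, 10), (1.5, 0.5, 500)]  # (lon, lat, depth in m)

at_depth = extract_3d(temperature, 0, 2, 1, depth_tops, occurrences)
at_surface = extract_3d(temperature, 0, 2, 1, depth_tops,
                        [(lon, lat, 0) for lon, lat, _ in occurrences])
print(at_surface, at_depth)  # [25, 22] [25, 7]

# Hypothetical niche model: a cell is suitable if 10 <= temperature <= 26.
suitability = project_by_layer(temperature, lambda t: 10 <= t <= 26)
print(suitability)  # [[[True, True], [True, True]], [[True, False], [False, False]]]
```

In this toy case, sampling only the surface suggests the species experiences 22 to 25 °C, while the 3D lookup reveals temperatures as low as 7 °C, which is the kind of underestimated tolerance described for mesopelagic fishes.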
Development of voluModel is ongoing: there have been three major updates since we first submitted the paper. This is largely due to the rapidly developing landscape of faster, more efficient geoprocessing tools in R, such as terra and other successors to the raster and rgeos packages, which are being phased out. We have also implemented several suggestions from manuscript reviewers and early package adopters. If you have suggestions for improvements or missing features, or if you have found bugs, you may report them here, or, even better, send me a (detailed) pull request!
My aspiration is that voluModel helps niche modelers (including me) to efficiently generate more accurate estimates of pelagic species distributions for downstream biogeographic analyses and conservation assessments. This is especially useful for data-poor species that may not be subject to the same exhaustive study as target fisheries species, but which are nonetheless important pieces in the puzzle both for biogeographers and conservation scientists.
Read our new paper in Methods in Ecology and Evolution HERE.
Access the voluModel website HERE.
View voluModel on CRAN HERE.
Wallace 2.0 is out!
I’m thrilled to have been part of a paper out in Ecography today: “wallace 2: a shiny app for modeling species niches and distributions redesigned to facilitate expansion via module contributions”. It updates the original wallace R package, which is a really useful tool for teaching niche modeling in the R ecosystem without requiring students to be exceptionally proficient at coding first. All a user needs to do is run two lines of code, and a graphical user interface pops up that walks them through the steps of a niche modeling analysis, while documenting the decisions made along the way in an R code file that can be re-run to repeat the analyses.
There is a lot of potential for this approach to be used to teach students the workflows and methods involved in fairly complex niche modeling analyses. I have incorporated it into teaching materials for master's- and PhD-level courses, as well as non-academic workshops, and the new features really expand the scope of applications one can cover. wallace 2.0 also makes it easier to add custom modules to the wallace workflow (essentially analysis options, like specific statistics or data sources), and adds several such modules (including my occCite occurrence citation package).
Here's the package website, which includes links to tutorials in multiple languages, including English, Spanish, and Japanese: wallaceecomod.github.io
Here’s the citation:
Kass, J.M., Pinilla-Buitrago, G.E., Paz, A., Johnson, B.A., Grisales-Betancur, V., Meenan, S.I., Attali, D., Broennimann, O., Galante, P.J., Maitner, B.S., Owens, H.L., Varela, S., Aiello-Lammens, M.E., Merow, C., Blair, M.E. and Anderson, R.P. (2023), wallace 2: a shiny app for modeling species niches and distributions redesigned to facilitate expansion via module contributions. Ecography e06547. https://doi.org/10.1111/ecog.06547
By Hannah L. Owens and Jamie M. Kass, on behalf of all co-authors*
There are billions of species occurrence records served by aggregator databases. The Global Biodiversity Information Facility (GBIF) serves over 1.8 billion occurrence records for species from across the tree of life (GBIF Secretariat 2021), and the Botanical Information and Ecology Network (BIEN) serves over 200 million plant observations (Botanical Information and Ecology Network 2021). The primary datasets these aggregators serve are the result of millions of hours of work by museums and community science initiatives (among others) and are constantly updated as taxonomy changes and data are accrued. Citing the primary datasets that supply data to GBIF and BIEN, together with accession dates, facilitates reproducibility and scientific transparency. These citations also support primary data providers by acknowledging their role as an essential link in the research chain.
However, when researchers download occurrence datasets from multiple primary providers via aggregator databases (as is common in broad-scale biogeographic and macroecological studies), managing and effectively communicating the metadata can be incredibly time-consuming. This is where our new R package, occCite, comes in. occCite is designed to facilitate searches of dataset aggregation services (currently, GBIF and BIEN) and to store and manage metadata on primary data providers, database accession dates, DOIs, and taxonomic sources in a unified framework within the R environment. Search results are organized as single objects that can be passed to functions to generate visual and statistical summaries and formatted citations.
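As a rough illustration of why keeping provider metadata attached to each record pays off, the Python sketch below groups downloaded occurrence records by aggregator and primary dataset and writes one citation line per dataset, including the accession date. This is not occCite's R API or its citation format; the record fields ("aggregator", "dataset"), the accession-date dictionary, and the dataset names are hypothetical placeholders.

```python
from collections import Counter


def format_citations(records, accessed):
    """Build one citation line per primary dataset.

    records: dicts with hypothetical 'aggregator' and 'dataset' keys.
    accessed: maps each aggregator name to the date it was queried.
    """
    counts = Counter((rec["aggregator"], rec["dataset"]) for rec in records)
    lines = []
    # Sort by (aggregator, dataset) so the output order is reproducible.
    for (aggregator, dataset), n in sorted(counts.items()):
        lines.append(
            f"{dataset}. Occurrence dataset accessed via {aggregator} "
            f"on {accessed[aggregator]}. {n} record(s) used."
        )
    return lines


records = [
    {"aggregator": "GBIF", "dataset": "Example Herbarium B"},
    {"aggregator": "GBIF", "dataset": "Example Herbarium B"},
    {"aggregator": "BIEN", "dataset": "Example Herbarium A"},
]
citations = format_citations(records, {"GBIF": "2021-08-06", "BIEN": "2021-08-06"})
for line in citations:
    print(line)
```

Counting records per dataset rather than listing them keeps the citation list short even when a search returns thousands of occurrences from a handful of providers.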
occCite’s Two Main Steps
Taxonomic Rectification. By default, occQuery() checks species’ names against the GBIF backbone taxonomy. Alternatively, the user may first run studyTaxonList() to prepare a data object of the species’ names to be searched, checked against a taxonomy of their choice from the Global Names Index (http://gni.globalnames.org/).
Text Summaries. When the print() method is used on an occCiteData object, tables summarizing taxonomic cleaning results, search results with counts of occurrences for each species from each dataset aggregator, and the GBIF DOIs associated with each species’ search are returned.
Summary Plots. occCite provides three types of plots for results from occQuery() when the plot() method is used on an occCiteData object: a histogram showing occurrences by year, a waffle plot showing the proportion of results supplied by GBIF versus BIEN, and a waffle plot showing the proportion of occurrences supplied by each primary data provider. These plots can be generated either for all search results or by species.
Maps. Interactive leaflet maps can be generated from occCiteData objects via the occCiteMap() function, for all search results or by species. Users can specify occurrence point marker colors and symbologies. Hovering over a point in the interactive map provides information on the species name, coordinates, date, dataset, and dataset aggregator that supplied it.
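The numbers behind the summary plots are simple tallies. The Python sketch below (again, not occCite's implementation; the "species", "aggregator", "dataset" and "year" fields are assumptions) computes occurrences per year for the histogram, plus the proportions supplied by each aggregator and by each primary provider for the two waffle plots, either for all records or for a single species.

```python
from collections import Counter


def summarize(records, species=None):
    """Tallies behind a year histogram and aggregator/provider waffle plots."""
    if species is not None:
        records = [r for r in records if r["species"] == species]
    total = len(records)
    if total == 0:
        return {"by_year": {}, "by_aggregator": {}, "by_provider": {}}

    def shares(field):
        # Proportion of the selected records supplied by each value of `field`.
        counts = Counter(r[field] for r in records)
        return {k: n / total for k, n in sorted(counts.items())}

    return {
        "by_year": dict(sorted(Counter(r["year"] for r in records).items())),
        "by_aggregator": shares("aggregator"),
        "by_provider": shares("dataset"),
    }


records = [
    {"species": "Species X", "aggregator": "GBIF", "dataset": "A", "year": 2000},
    {"species": "Species X", "aggregator": "GBIF", "dataset": "A", "year": 2001},
    {"species": "Species X", "aggregator": "BIEN", "dataset": "B", "year": 2001},
    {"species": "Species Y", "aggregator": "GBIF", "dataset": "C", "year": 1999},
]
print(summarize(records))
print(summarize(records, species="Species X"))
```

Filtering before tallying is what lets the same function serve both the "all search results" and the "by species" views.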
The Future of occCite
occCite has been integrated as a module in the development version of Wallace, a modular, R-based graphical user interface for modeling species’ ecological niches and geographic distributions (Kass et al. 2018). When Wallace users opt to include data source citations in occurrence data searches, occCite will be invoked to run the search and generate citations.
In the future, we plan to expand the number of database aggregators that occCite queries, and add various fit-for-purpose filtering actions (e.g., duplicate removal, temporal downsampling, geographic and environmental outlier removal). We also plan to add comparative summary plots for raw versus filtered data and for comparing different occCiteData objects. We hope you’ll keep up to date via our GitHub website (hannahlowens.github.io/occCite/) for these and other exciting developments!
CRAN release: https://CRAN.R-project.org/package=occCite
YouTube Tutorial: https://www.youtube.com/watch?v=7qSCULN_VjY&t=17s
Botanical Information and Ecology Network. 2021. BIEN, the Botanical Information and Ecology Network. bien.nceas.ucsb.edu, accessed 6 August 2021.
GBIF Secretariat. 2021. GBIF: Global Biodiversity Information Facility. gbif.org, accessed 6 August 2021.
Kass, J.M., Vilela, B., Aiello-Lammens, M.E., Muscarella, R., Merow, C. and Anderson, R.P. 2018. Wallace: A flexible platform for reproducible modeling of species niches and distributions built for community expansion. Methods in Ecology and Evolution, 9: 1151-1156. DOI: 10.1111/2041-210X.12945
*Originally written for Ecography blog
Hannah L. Owens
What's Going On?
Universitetsparken 15, byg 3
2100 Copenhagen Ø, Denmark
Copyright © 2015