5 van Gestel

The van Gestel database was constructed to aggregate soil data from field experiments including warming, nutrient addition, and moisture manipulation. This is an ongoing, unpublished project.

5.1 June 2020 Interview

These are notes from the June 2020 interview with Dr. Natasja van Gestel on the van Gestel data model.

  1. Why did you start this study?
    • During van Gestel’s postdoc, a DOE-funded modeling project made it obvious that the work would benefit from a more robust data set for parameterization. This became an ongoing project over the past 5 years.
  2. Describe your workflow for ingesting data sets.
    • Data is extracted from the published literature, generally through transcription of points recovered from figures or published tables.
    • Studies are selected based on availability of soil carbon, and an attempt is made to capture all the data associated with these studies.
    • This data is then entered into an Excel spreadsheet and fed through a series of scripts that automate gap-filling and generate data products relevant to modeling studies.
    • van Gestel and her lab were responsible for everything from the initial design of the spreadsheets to data recovery and post-processing.
    • There are currently over 100 studies in the database.
  3. What decisions did you make to arrive at this workflow?
    • In general, the design of the spreadsheet has not changed significantly over the years. To design the spreadsheet, van Gestel tried to think mechanistically about the processes that control carbon cycling in soils and create an exhaustive list of variables to capture from the literature. This exhaustive template was then pruned to create more restricted data products as needed.
    • The variable list expanded on factors originally related mostly to carbon as represented by soil carbon models (i.e., carbon stocks, fluxes, and experimental treatments for temperature, nutrients, and moisture).
    • Simplicity was a core value in developing the data model.
    • Tried to preserve as much information from the original data source as possible.
      • Needed to record original units and then harmonize units in processing scripts.
      • It was key to determine how bulk density was treated in each study.
        • Some studies reported no bulk density values; others used only a single value or a prescribed standard bulk density.
        • Bulk density was frequently gap-filled in the generated data products using values imputed from bulk density to organic carbon relationships.
  4. How would someone get a copy of the data in this study?
    • Contact van Gestel directly; publication is pending.
  5. What would you do differently if you had to start again? What would be the same?
    • Happy with the outcome of the meta-analysis and would not change much. The current template and data product pipeline have proven effective.
    • Data is currently extracted from figures using PowerPoint; a different type of software could make the process more efficient.
    • Considering focusing more on collecting data from repositories in the future.
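
The unit harmonization and bulk density gap-filling steps described under question 3 can be illustrated with a minimal Python sketch. The interview does not describe the actual processing scripts, so everything here is an assumption made for illustration: the field names, the conversion table, the target unit (g C per kg soil), and the choice of a least-squares fit of bulk density against the log of organic carbon. The sketch keeps the original value and units alongside the harmonized value, and flags every gap-filled bulk density so that imputed values remain distinguishable from reported ones.

```python
import math

# Hypothetical conversion factors to a common unit (g C per kg soil).
# 1 mg/g equals 1 g/kg; 1 percent by mass equals 10 g/kg.
TO_G_PER_KG = {"g/kg": 1.0, "mg/g": 1.0, "%": 10.0}


def harmonize_carbon(value, units):
    """Convert a carbon concentration to g C per kg soil."""
    return value * TO_G_PER_KG[units]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gap_fill_bulk_density(rows):
    """Fill missing bulk density from a fit against log organic carbon.

    Needs at least two rows with reported bulk density and distinct
    carbon values. Returns new rows; the input rows are not modified.
    """
    observed = [r for r in rows if r["bulk_density"] is not None]
    intercept, slope = fit_line(
        [math.log(r["carbon_g_per_kg"]) for r in observed],
        [r["bulk_density"] for r in observed],
    )
    filled = []
    for r in rows:
        out = dict(r)
        out["bd_gap_filled"] = r["bulk_density"] is None
        if out["bd_gap_filled"]:
            out["bulk_density"] = intercept + slope * math.log(r["carbon_g_per_kg"])
        filled.append(out)
    return filled


def process(raw_rows):
    """Harmonize units while preserving the original entries, then gap-fill."""
    harmonized = []
    for r in raw_rows:
        harmonized.append({
            "carbon_original": r["carbon"],
            "carbon_units_original": r["carbon_units"],
            "carbon_g_per_kg": harmonize_carbon(r["carbon"], r["carbon_units"]),
            "bulk_density": r["bulk_density"],
        })
    return gap_fill_bulk_density(harmonized)
```

Calling `process` on a list of dictionaries with `carbon`, `carbon_units`, and `bulk_density` keys (the last set to `None` where a study reported no value) returns rows in a common unit with a `bd_gap_filled` flag. Fitting the relationship on the rows that do report bulk density avoids hard-coding coefficients, but a real pipeline might instead use a relationship taken from the literature.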

5.2 Data model

Figure 5.1: van Gestel data model

5.3 Acknowledgements

Special thanks to Dr. Natasja van Gestel (Texas Tech University) for making the metadata for this data product available for analysis and making herself available for interview.