Data Control & Management

Data systems are a crucial part of a fully integrated field and laboratory workflow. At all times, we must know what we have, where samples and supplies are located, and what is being worked on. Programs in which numerous stakeholders contribute samples and share common equipment and resources should agree on a common data system.

OUR RECOMMENDED TOOL KIT

ISL.app

ODK

Docker

Protocols.io

Cloud Storage

ISL.app is software that we created to integrate field data collection with laboratory sample analysis and management. The software is freely available at https://github.com/insitulabs. What it does…

  • Data are automatically or manually imported into SOURCES.
  • Source data are cleaned and linked with other tables, as desired.
  • One or more SOURCES may be combined into a VIEW, a clean and transformed subset of the data that laboratories use for analysis.
  • Further documentation is available on GitHub.
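
The SOURCES-to-VIEW idea can be sketched in a few lines of plain Python. This is only an illustration of the concept, not ISL.app's actual code or API: the function names (`clean`, `build_view`), the field names, and the sample records are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def clean(record: Record) -> Record:
    """Trim stray whitespace and lower-case the field names of one record."""
    return {key.strip().lower(): value.strip() for key, value in record.items()}


def build_view(samples: List[Record], lab_rows: List[Record],
               key: str = "sample_id") -> List[Record]:
    """Combine two sources into one view by joining on a shared key.

    Lab rows without a matching field sample are left out of the view.
    """
    samples_by_id = {row[key]: row for row in map(clean, samples)}
    view = []
    for lab_row in map(clean, lab_rows):
        sample = samples_by_id.get(lab_row[key])
        if sample is None:
            continue  # no matching field record, so skip this lab row
        view.append({**sample, **lab_row})
    return view


# Hypothetical source tables: one from the field, one from the laboratory.
field_source = [
    {"Sample_ID ": " S001", "Type": "blood ", "Site": "Site A"},
    {"Sample_ID ": "S002", "Type": "swab", "Site": "Site B"},
]
lab_source = [
    {"sample_id": "S001", "dna_ng_ul": "12.4"},
    {"sample_id": "S999", "dna_ng_ul": "3.1"},
]

view = build_view(field_source, lab_source)
print(view)
```

Only sample S001 appears in the resulting view, because it is the only record present in both sources.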

As the name implies, the Open Data Kit (ODK) software is completely free to download and use. It is maintained by a community of contributors, and the consortium also offers paid subscription options for users who do not have the time or ability to set up and run it on their own. It was designed to:

  • allow researchers, field workers, and medical and educational professionals to create effective offline data forms for use anywhere in the world
  • integrate with all the standard features of mobile devices: text entry, voice recording, GPS, and video and image capture.

The Docker application allows one to run self-contained environments, called containers, on a computer. Each container can use an operating system that is compatible with the desired software package, and containers are isolated from the rest of the computer.

For example, a MacBook that runs macOS can host a Docker container built from an Ubuntu image. (Windows-based containers generally require a Windows host.)

As long as a computer can run the Docker application, one can move their workflow to any computer that has the appropriate hardware (i.e. sufficient RAM and CPUs/GPUs), or a remote server. Here is a reasonably good Docker tutorial on YouTube.
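
As a rough sketch of what this portability looks like in practice, the Python snippet below assembles a `docker run` command that starts an Ubuntu container, mounts a project folder into it, and prints the container's operating system details. The helper name `docker_run_command`, the image tag `ubuntu:22.04`, and the folder `/home/user/project` are assumptions chosen for illustration; the snippet only builds and prints the command, it does not run it.

```python
import shlex
from typing import List


def docker_run_command(image: str, host_dir: str, command: List[str],
                       container_dir: str = "/work") -> List[str]:
    """Build the argument list for a one-off `docker run` call."""
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # share a host folder with the container
        "-w", container_dir,                  # start in the shared folder
        image,
        *command,
    ]


# Hypothetical project folder and image; adjust both to your own setup.
cmd = docker_run_command("ubuntu:22.04", "/home/user/project",
                         ["cat", "/etc/os-release"])
print(shlex.join(cmd))
```

On a machine with Docker installed, the same list could be passed to `subprocess.run(cmd, check=True)`. Because the image and the mounted folder fully describe the environment, the identical command should behave the same way on a laptop or on a remote server.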

Thanks to the Gordon and Betty Moore Foundation, the ISL has a priority account with Protocols.io, which has become a standard resource for sharing laboratory science methods. The online tool allows anyone to publish protocols following industry conventions, with an abstract and a unique DOI, just like a journal publication. The system accommodates joint authorship and versioning, similar to collaborative coding platforms such as GitHub, allowing methods to be improved over time by the community.

Our protocols will be linked to from the Protocols page.

Except for the initial moment of sample/data collection, when a hardcopy data sheet or digital form is in use, all information is safely stored and managed on cloud servers. Moreover, there is redundancy built into the use of these tools, and therefore minimal chance that data can become lost, corrupted, or edited beyond correction. For bulk storage of large file types (e.g. videos, sequence data), Amazon Web Services and Google offer affordable options.

Smaller data and file types are freely and securely maintained under the ISL data management system. For this, each partner receives separate subdomains and project folders on insitulabs.app and odk.insitulabs.org, respectively.
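
When large files are copied between a local drive and bulk storage, one simple way to confirm that a copy is intact is to compare checksums. The sketch below is a generic illustration using only the Python standard library; it is not part of the ISL tools, and the function names and the tiny example files are made up for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copy: Path) -> bool:
    """True when both files have identical contents."""
    return sha256_of(original) == sha256_of(copy)


# Demonstration with two small, made-up sequence files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "reads.fastq"
    backup = Path(tmp) / "reads_backup.fastq"
    original.write_bytes(b"@read1\nACGT\n+\nIIII\n")
    backup.write_bytes(original.read_bytes())
    intact = copies_match(original, backup)

    backup.write_bytes(b"@read1\nACGA\n+\nIIII\n")  # simulate a corrupted copy
    damaged = copies_match(original, backup)

print(intact, damaged)
```

Reading in chunks keeps memory use low, which matters for videos and raw sequence data that can be many gigabytes in size.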

OVERVIEW

As we endeavor to set up fully functional field molecular laboratories paired with long-term biosample collection efforts, we also strive to set up more seamless data pathways (from paper, to digital form, to server, to laboratory database, to analysis results). ISL program data collection, management and storage combines the ODK software (https://getodk.org/) with a MongoDB tool developed by this initiative (https://github.com/insitulabs/ISL_DataSystems); both are open access. Large media and raw sequence data storage uses Amazon S3 web services and Google Drive. Physical sample collection and laboratory methods are shared on the In Situ Laboratory Protocols.io account. Analysis scripts and software are made available through Docker and GitHub.

SCHEMATIC

In a nutshell…

  1. Not everyone knows how to use Excel as well as you might think.
  2. Files can become corrupted when shared across platforms and when used over long periods of time.
  3. Excel does not link easily or reliably to external data sources.
  4. Excel does not have auditing controls.
  5. There are more reasons, and these issues are not only related to Excel but to most spreadsheet software.

Excel, and other software tools like it, are EXCELLENT for quick or basic data manipulation and analyses.

All of the tools we use are currently maintained free of charge for ISL members. Non-ISL members can still use these tools, but they would have to set them up independently on their own server space. The cost of server space varies.

Yes, the tools are created to be functional for researchers and staff members regardless of their background. We are constantly working on making the toolkit more intuitive, and if you have good ideas, please share them with us.

Instructional videos are forthcoming.