ISL RESOURCES
TOOLS – METHODS – PROTOCOL
The initiative primarily uses Oxford Nanopore sequencers to perform DNA/RNA sequencing on-site. Devices we currently use include the MK1B, MK1C, and P2 Solo. For information on how to set up a sequencing run, as well as initial data processing (basecalling, demultiplexing, etc.), the Oxford Nanopore Community has the most up-to-date information. You must create an account with ONT to access the community resources.
As hubs standardize laboratory workflows for sequence library preparation, return to this page for links to our Protocols.io account.
For sequence data analysis, a variety of software tools will be needed (Guppy, Amplicon Sorter, NGSpeciesID, NanoFilt, NanoPlot, etc.). We suggest setting up this software in one or more Docker images to avoid compatibility issues.
The Docker application allows one to run self-contained environments (containers) on their computer, using any operating system that is compatible with a desired software package. The containers are isolated from the rest of the computer. For example, one may have a MacBook that runs macOS but a Docker image with an Ubuntu or Windows operating system. As long as a computer can run the Docker application, one can move their workflow to any computer that has the appropriate hardware (i.e., sufficient RAM and CPUs/GPUs), or to a remote server. Here is a reasonably good Docker tutorial on YouTube.
You don’t have to become a Docker pro to start analyzing data
Try these simple steps to run a Docker image that some of the ISL hubs are currently using. Note that there will be some differences between running Docker on Windows, macOS, and Linux; the following example is for macOS (a simple web search on how to run Docker on another platform will clarify the equivalent steps below).
1) Download and install Docker (create a free account if required).
2) Once installed, open the Terminal application and run the following command. The image is about 8 GB, so this may take a while.
(cmd)$ docker pull insitulab/junglegenomics:latest
3) Once complete, you can start an interactive session (the ‘-it’ parameter is what makes the session interactive).
(cmd)$ docker run -it insitulab/junglegenomics:latest
If it’s working, test that you can run the following commands (everything from the ‘#’ onwards is just a comment and NOT part of the command)
$ ls -lah # everything in the root directory of the image
$ minibar.py # shows the help information for minibar
$ blastn -h # shows help information for NCBI blastN tool
$ NGSpeciesID # shows help information for NGSpeciesID
4) A self-contained environment is now running, so whatever is done inside this session stays inside and will be lost when the session ends. To access data and save work that occurs inside the image, one must connect this environment to an actual directory/folder on the computer (or server space) that is being used. To do this, exit this session with the following command
(cmd)$ exit
Now restart the container with the -v parameter, which makes the connection as follows
(cmd)$ docker run -it -v /Users/johnD/documents/data/:/data \
insitulab/junglegenomics:latest
The -v parameter takes a single argument with two parts separated by a ‘:’. To the left of the ‘:’ is the path to the directory of interest on the computer or server being used. To the right of the ‘:’ is the location where the data will appear inside the Docker container. Anything written to the “data” directory from inside the container is saved to the hard drive, and likewise, anything deposited from the computer into the mounted directory will be accessible from inside the container. Do the following to check this
(cmd)$ cd data/
(cmd)$ ls .
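If you want to double-check which half of the -v argument is which before launching, the mount spec can be split with standard shell parameter expansion. This is just a sanity check run on the host (the path is illustrative); it does not touch Docker at all:

```shell
# Illustrative -v argument: host path to the left of ':', container path to the right
MOUNT="/Users/johnD/documents/data/:/data"
echo "host directory:      ${MOUNT%%:*}"   # /Users/johnD/documents/data/
echo "container directory: ${MOUNT##*:}"   # /data
```

Confirming the mapping this way is cheap insurance against accidentally mounting the wrong folder, since Docker will happily create an empty directory at a mistyped host path.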
5) If all prior steps have been completed, then mission accomplished; good luck analyzing the data. Helpful information about this Docker image can be found at https://hub.docker.com/r/insitulab/junglegenomics
We are currently in the process of uploading all of our wildlife and eDNA sampling procedures to Protocols.io. Return to this page in the near future for links to these resources. Protocols that are close to publication include:
Wildlife Sampling:
- Chiroptera (Bats)
- Nonhuman Primates
- Aves (Birds)
- Rodentia & Marsupialia
- Medium- to Large-Size Terrestrial Mammals
The In Situ Laboratory Data System is a combination of tools to capture, organize, and analyze field data. See our work in progress on GitHub: https://github.com/insitulabs/ISL_DataSystems
Historically, biological sample collection at remote field locations and laboratory analyses have been carried out independently of one another, by separate institutions and/or groups of stakeholders. Most data systems that support these distinct parts of the research process have been developed in isolation, and remain 1) unintegrated or 2) unadaptable to diverse environments with power and data service constraints. As we endeavor to set up more fully functional molecular laboratories in the field (in situ), our goal is to seamlessly integrate biological sample collection with laboratory sample analysis and data sharing applications in an intuitive, user-friendly, and secure way. We strive to build on prior open-source initiatives, and remain committed to making all our data system tools freely available for scientific research, public health, and conservation applications. Our system can be fully replicated from the information provided on GitHub. Additionally, all ISL hubs receive support from ISL coordinators to set up an isolated version of the ISL Data System.
We are currently preparing a demo video of the data system. Once it is ready, the link will be posted here. We apologize for the inconvenience, but please return soon.
A tool to aid PCR primer design and evaluation. A full description is available on GitHub.
https://github.com/insitulabs/assessPrimers
Docker image link: https://hub.docker.com/r/insitulab/assessprimer
The primer assessment tool requires several inputs, including:
- List of nucleotide reference sequences
- A file of primers (forward and/or reverse)
- A reference protein sequence (optional)
- Prefix for output files
The inputs are used to create a non-redundant multiple sequence alignment of all reference sequences to each other, as well as to each primer pair. From this alignment, the following statistics are printed to stdout:
Primer number: The number of unique primers calculated after converting all degenerate bases to their non-degenerate equivalents.
Entropy: Cumulative entropy score for each length of k-mer along the alignment (lower entropy scores reflect more conserved sequences)
Start coordinate: The position in the alignment where each primer begins
Number mismatches: For each primer provided, histogram of the number of mismatches for each reference sequence
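The primer-number statistic can be reproduced by hand: each degenerate IUPAC base multiplies the count of non-degenerate primers by its degeneracy (R, Y, S, W, K, M = 2; B, D, H, V = 3; N = 4). The awk sketch below is not the tool itself, just that arithmetic, applied to the PMX1 primer from the example output below:

```shell
# Multiply together the IUPAC degeneracy of every base in a primer sequence
echo "GARGGNYNNTGYCARAARNTNTGGAC" | awk '
BEGIN { d["A"]=1; d["C"]=1; d["G"]=1; d["T"]=1
        d["R"]=2; d["Y"]=2; d["S"]=2; d["W"]=2; d["K"]=2; d["M"]=2
        d["B"]=3; d["D"]=3; d["H"]=3; d["V"]=3; d["N"]=4 }
{ n = 1
  for (i = 1; i <= length($0); i++) n *= d[substr($0, i, 1)]
  print n }'
# prints 32768, matching numPrimers for PMX1
```

This is a useful back-of-the-envelope check when designing degenerate primers, since the expansion count grows multiplicatively and very degenerate primers can dilute the effective concentration of any one matching sequence.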
Example output (see Github for more detail):
PMX1 coordinate: 3106 entropy: 0.27 numPrimers:32768 GARGGNYNNTGYCARAARNTNTGGAC
PMX1 captures:
45 sequences with 0 mismatches
6 sequences with 1 mismatch(es)
4 sequences with 2 mismatch(es)
PMX2 coordinate: 3202 entropy: 0.33 numPrimers:65536 GGNGAYAAYCARNYNATWGCNRTNA
PMX2 captures:
31 sequences with 0 mismatches
19 sequences with 1 mismatch(es)
3 sequences with 2 mismatch(es)
2 sequences with 3 mismatch(es)