Help: MiFish Pipeline
Introduction
MiFish is a set of universal PCR primers for metabarcoding environmental DNA (eDNA) that fishes shed into water. The MiFish primers target a hypervariable region of the fish mitochondrial 12S rRNA gene (approximately 160-190 bp), which contains enough information to identify fishes to family, genus and species, except for some closely related congeners. After amplification with the MiFish primers, the MiFish pipeline accepts your sequence data in FASTQ (paired-end or single file) or FASTA format and returns a species list.
Workflows
The MiFish pipeline accepts sequence data files in FASTQ (paired-end or single file) or FASTA format. If the input is in FASTQ format, the pipeline performs a quality check, assembly of paired-end reads (if necessary), removal of unreliable sequences (including those with possible base-call errors and those of atypical length), and removal of primer sequences.
You can upload multiple samples in a single run, up to 40 files. Sample names are inferred from the file names, but only letters, digits and underscores are kept. Please make sure that the resulting names differ among samples.
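The naming rule can be checked locally before uploading. The following is a minimal sketch, not the pipeline's own code: the function names `sample_name` and `check_upload` are hypothetical, and the handling of file extensions (everything from the first dot onward is dropped) is an assumption, since the text only states which characters are kept.

```python
import re
from collections import Counter

MAX_FILES = 40  # upload limit stated in the help text


def sample_name(filename: str) -> str:
    """Derive a sample name, keeping only letters, digits and underscores.

    Assumption: the directory part and everything from the first dot
    onward (e.g. ".fastq.gz") are dropped before filtering characters.
    """
    base = filename.replace("\\", "/").rsplit("/", 1)[-1].split(".", 1)[0]
    return re.sub(r"[^A-Za-z0-9_]", "", base)


def check_upload(filenames):
    """Return the sorted list of sample names shared by more than one file."""
    if len(filenames) > MAX_FILES:
        raise ValueError(f"too many files: {len(filenames)} > {MAX_FILES}")
    counts = Counter(sample_name(f) for f in filenames)
    return sorted(name for name, n in counts.items() if n > 1)
```

For example, `site-A.fastq` and `siteA.fastq` both reduce to `siteA`, so `check_upload` would report that name as a clash.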
After clustering of identical sequences, BLASTN searches are performed against the reference fish sequence database. For sequences with hits above the similarity threshold (97% by default, adjustable), a list of species (or genus) names is shown with reliability information.
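The two ideas in this step, collapsing identical sequences and applying the identity threshold, can be illustrated as follows. This is a simplified sketch and not how the pipeline itself is implemented (the pipeline uses usearch and BLAST+). The function names and the `identity` key are hypothetical, and treating a hit exactly at the threshold as passing is an assumption.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences into (sequence, read_count) pairs.

    Pairs are sorted by decreasing count, then alphabetically.
    """
    counts = Counter(seq.upper() for seq in reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_by_identity(hits, threshold=97.0):
    """Split best hits into those at or above the threshold and those below.

    `hits` is a list of dicts with a percent `identity` value
    (hypothetical layout). Assumption: identity == threshold passes.
    """
    passed = [h for h in hits if h["identity"] >= threshold]
    low = [h for h in hits if h["identity"] < threshold]
    return passed, low
```

With the default threshold, a hit at 98.8% identity would be reported in the species list, while a hit at 93.1% would be set aside as a low-identity sequence.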
In addition, species diversity analysis (only if more than one group of samples is specified) and phylogenetic analysis by the maximum-likelihood method are run.
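As an illustration of what a diversity measure computes from per-species read counts, here is a minimal sketch of the Shannon index. The choice of index is an assumption made for illustration: the text does not state which indices the pipeline reports, and the pipeline relies on scikit-bio rather than code like this. The natural logarithm is used here.

```python
import math


def shannon_index(read_counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over species with reads.

    `read_counts` holds one read count per species for a single sample.
    """
    total = sum(read_counts)
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log(c / total) for c in read_counts if c > 0
    )
```

A sample dominated by a single species gives a value near 0, while reads spread evenly over many species give a higher value.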
The MiFish pipeline uses the following external programs.
- Quality check of FASTQ and tail trimming: fastp ( Bioinformatics. 2018, 34(17):i884-i890 )
- Paired-end read assembly: FLASH ( Bioinformatics. 2011, 27:2957-2963. )
- Primer removal: Cutadapt ( EMBnet. journal. 2011, 17(1), 10-12. )
- Read denoising, chimera removal and OTU detection: usearch ( Bioinformatics. 2010, 26:2460-2461. )
- Sequence similarity search: BLAST+ version 2.9.0 ( BMC Bioinformatics. 2009, 10:421. )
- Multiple alignment: MAFFT ( Mol. Biol. Evol. 2013, 30: 772-780. )
- Phylogenetic analysis: FastTree ( PLOS ONE. 2010, 5(3), e9490. )
- Bio-diversity analysis: scikit-bio ( http://scikit-bio.org. 2020 )
Explanation on the Exported Excel File
There are three sheets in the exported Excel file: "Comparison of Samples", "List of Sample Details", and "Haploids with Low Identities". Each column included in the Excel file is explained below.
- Class
- Taxonomic level "Class" for this species. Classes are listed in the following order: Myxini, Hyperoartia, Chondrichthyes, Cladistia, Actinopteri, then non-fish classes such as Amphibia, Lepidosauria, Aves, Mammalia
- Order
- Taxonomic level "Order" for this species
- Family
- Taxonomic level "Family" for this species
- Scientific Name
- Scientific name for this species
- Common Name
- Common name for this species, from FishBase. Blank if unknown.
- Ave. Confidence
- The average confidence for this species among all the samples. Usually equal to the confidence of the sample with the largest abundance.
- Water area
- Indicates the type of water this fish lives in. Possible values are: Fresh Water, Salt Water, and Brack Water. Blank if unknown.
- Habitat
- Indicates the habitat of this fish. Examples: reef-associated, demersal, benthopelagic, etc. Blank if unknown.
- DepthS
- Indicates the shallowest depth (m) at which this fish appears. Blank if unknown.
- DepthD
- Indicates the deepest depth (m) at which this fish appears. Blank if unknown.
- IUCN Red List Status
- Indicates the conservation status of this fish. Possible values are: Not Evaluated, Data Deficient, Not Available, Least Concern, Near Threatened, Vulnerable, Endangered, Critically Endangered
- Importance in Fisheries
- Indicates the importance of this fish in fisheries. Possible values are: highly commercial, commercial, of potential interest, minor commercial, of no interest, subsistence fisheries. Blank if unknown.
- Threat to Humans
- Indicates the possible threat this fish poses to humans. Possible values are: harmless, potential pest, reports of ciguatera poisoning, poisonous to eat, venomous, traumatogenic. Blank if unknown.
- Abandances
- Indicates the abundance (number of reads) of this fish in each sample.
- Sample name
- Sample name, inferred from the uploaded filenames.
- Species
- Scientific name for this species
- Total read
- Sum of the read counts of all the haploids of this species
- Representative Sequence
- The sequence of the most abundant haploid of this species
- Size
- The abundance (number of reads) of each haploid of this species
- Confidence
- The confidence level of each haploid of this species
- Identity(%)
- The BLAST Identity of each haploid of this species against the MiFish database
- Confidence Score
- The Confidence Score of each haploid of this species.
- Align Len
- The alignment length in the BLAST result of each haploid of this species against the MiFish database
- Mismatch
- The Mismatch value in the BLAST result of each haploid of this species against the MiFish database
- 2nd-sp Name
- The Scientific name of the second best-hit species of each haploid against the MiFish database
- 2nd-sp Align Len
- The alignment length in the BLAST result of the second best-hit species of each haploid against the MiFish database
- 2nd-sp Mismatch
- The Mismatch value in the BLAST result of the second best-hit species of each haploid against the MiFish database
- Sequence
- Sequence of each haploid.
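The relationship between the per-haploid "Size" column and the per-species "Total read" and "Representative Sequence" columns can be sketched as below. This is an illustration under stated assumptions, not the pipeline's code: the function name and the dictionary keys are hypothetical, and ties for the most abundant haploid are resolved here by taking the first one encountered, which the text does not specify.

```python
from collections import defaultdict


def summarize_species(haploids):
    """Compute per-species totals and representative sequences.

    `haploids` is a list of dicts with hypothetical keys
    "species", "sequence" and "size" (number of reads).
    """
    groups = defaultdict(list)
    for h in haploids:
        groups[h["species"]].append(h)

    summary = {}
    for species, group in groups.items():
        # max() returns the first haploid among equally abundant ones.
        top = max(group, key=lambda h: h["size"])
        summary[species] = {
            "total_read": sum(h["size"] for h in group),
            "representative_sequence": top["sequence"],
        }
    return summary
```

For a species with two haploids of sizes 120 and 30, the total read would be 150 and the representative sequence would be that of the haploid with 120 reads.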
The "Haploids with Low Identities" sheet lists haploids whose BLAST identities are lower than the defined threshold. Its columns are mostly the same as those of "List of Sample Details".
About Non-fish Species in the Result
MiFish primers can sometimes amplify mitochondrial sequences from non-fish vertebrates (e.g., human), and the MiFish pipeline outputs such data because information on their presence can be useful in some contexts (e.g., to assess pollution).
Each version of MiFish RefDB contains sequences from 1,410 non-fish vertebrates, including amphibians (Amphibia), reptiles (Lepidosauria and others), birds (Aves) and mammals (Mammalia). These species were manually inspected by our experts because they might be amplified by MiFish primers.
Non-fish species can be seen in the table of the result webpage, and in the first sheet of the exported Excel file "Comparison of Samples", where they are listed separately in the "Non-fish species section" below all fishes.
If you do not need non-fish vertebrate data, just ignore them.
References
If you use our tools, please cite our relevant papers.