Help: MiFish Pipeline

Introduction

MiFish is a set of universal PCR primers for metabarcoding environmental DNA (eDNA) shed into the water by fishes. The MiFish primers target a hypervariable region of the fish mitochondrial 12S rRNA gene (approximately 160-190 bp), which contains enough information to identify fishes to family, genus and species, except for some closely related congeners. After amplification with the MiFish primers, the MiFish pipeline accepts your sequence data in FASTQ (paired-end or single file) or FASTA format and returns a species list.


Workflows

The MiFish pipeline accepts sequence data files in FASTQ (paired-end or single file) or FASTA format. If the input is in FASTQ format, the pipeline performs a quality check, joins paired-end reads (if necessary), removes unreliable sequences (including those with possible base-call errors and those of atypical length), and removes primer sequences.
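The primer-removal and length-filtering steps can be illustrated with a minimal Python sketch. It assumes exact primer matches and takes the primer sequences and the acceptable insert-length range as parameters; the function names and arguments are hypothetical, and the pipeline's actual tools, mismatch tolerance, length bounds and quality-based filtering are not shown.

```python
from typing import Optional

# Sketch only: exact-match primer trimming followed by a length filter.
# Primers and length bounds are caller-supplied placeholders, not the
# pipeline's real settings; real reads may need mismatch-tolerant matching.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_and_filter(read: str, fwd_primer: str, rev_primer: str,
                    min_len: int, max_len: int) -> Optional[str]:
    """Return the region between the primers, or None if the read is rejected.

    The reverse primer is given 5'->3' as synthesized, so the read carries
    its reverse complement downstream of the target region.
    """
    rev_rc = reverse_complement(rev_primer)
    start = read.find(fwd_primer)
    end = read.rfind(rev_rc)
    if start == -1 or end == -1:
        return None  # a primer is missing: treat the read as unreliable
    insert_start = start + len(fwd_primer)
    if end < insert_start:
        return None  # primers overlap: no valid insert
    insert = read[insert_start:end]
    if not min_len <= len(insert) <= max_len:
        return None  # atypical length
    return insert
```

For example, `trim_and_filter("TTAAACGATTACAAACCG", "AAAC", "GGTT", 5, 10)` keeps only `"GATTACA"`. Filtering on base-call quality scores is a separate step not covered here.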

You can upload multiple samples in a single run, up to 40 files. Sample names are inferred from the file names, keeping only letters, digits and underscores. Make sure that each sample ends up with a unique name.
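A minimal sketch of how the naming rule could be applied, assuming one file per sample and that common sequence-file extensions are stripped before cleaning (both assumptions; the functions and the extension list are hypothetical, not part of the pipeline):

```python
import os
import re

# Extensions stripped before cleaning: an assumption for illustration.
SEQ_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq", ".fasta", ".fa")
MAX_FILES = 40  # upload limit stated in this help page


def sample_name(path: str) -> str:
    """Drop directory and extension, then keep only [A-Za-z0-9_]."""
    stem = os.path.basename(path)
    for ext in SEQ_EXTENSIONS:
        if stem.lower().endswith(ext):
            stem = stem[: -len(ext)]
            break
    return re.sub(r"[^A-Za-z0-9_]", "", stem)


def sample_names(paths: list) -> list:
    """Return one name per file, rejecting oversized or ambiguous uploads."""
    if len(paths) > MAX_FILES:
        raise ValueError(f"at most {MAX_FILES} files per run, got {len(paths)}")
    names = [sample_name(p) for p in paths]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"sample names collide after cleaning: {duplicates}")
    return names
```

Note that `Site-A_01.fastq.gz` becomes `SiteA_01`, so files such as `a-1.fa` and `a1.fa` would collide; checking uniqueness after cleaning catches this.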

After identical sequences are clustered, BLASTN searches are run against the reference fish sequence database. For sequences with hits above the similarity threshold (97% by default, adjustable), a list of species (or genus) names with reliability information is shown.
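The clustering and thresholding steps can be sketched as follows. This assumes that "clustering of identical sequences" means exact dereplication and that each unique sequence has already been paired with its best BLASTN hit as a `(species, identity)` tuple; the helper names are hypothetical. Sequences below the threshold correspond to the "Haploids with Low Identities" sheet.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences into (sequence, size) pairs, largest first."""
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_by_identity(hits, threshold=97.0):
    """Separate unique sequences into assigned and low-identity groups.

    hits maps each unique sequence to (species, identity_percent) from its
    best BLAST hit. Hits at or above the threshold are treated as assigned.
    """
    assigned, low_identity = {}, {}
    for seq, (species, identity) in hits.items():
        target = assigned if identity >= threshold else low_identity
        target[seq] = (species, identity)
    return assigned, low_identity
```

Passing a different `threshold` mirrors adjusting the similarity threshold in the pipeline settings.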

In addition, a species diversity analysis (only if more than one group of samples is specified) and a maximum-likelihood phylogenetic analysis are run.

The following external programs are used in the MiFish pipeline.


Explanation on the Exported Excel File

The exported Excel file has three sheets: "Comparison of Samples", "List of Sample Details", and "Haploids with Low Identities". Each column in these sheets is explained below.

Comparison of Samples
Class
Taxonomic level "Class" for the species. Classes are ordered as follows: Myxini, Hyperoartia, Chondrichthyes, Cladistia, Actinopteri, then non-fish classes such as Amphibia, Lepidosauria, Aves and Mammalia.
Order
Taxonomic level "Order" for this species
Family
Taxonomic level "Family" for this species
Scientific Name
Scientific name for this species
Common Name
Common name for this species, from FishBase. Blank if unknown.
Ave. Confidence
The average confidence for this species across all samples. Usually equal to the confidence in the sample with the largest abundance.
Water area
The type of water this fish lives in. Possible values are: Fresh Water, Salt Water, and Brack Water (brackish water). Blank if unknown.
Habitat
The habitat of this fish, for example reef-associated, demersal or benthopelagic. Blank if unknown.
DepthS
The shallowest depth (m) at which this fish occurs. Blank if unknown.
DepthD
The deepest depth (m) at which this fish occurs. Blank if unknown.
IUCN Red List Status
The IUCN Red List conservation status of this fish. Possible values are: Not Evaluated, Data Deficient, Not Available, Least Concern, Near Threatened, Vulnerable, Endangered, Critically Endangered.
Importance in Fisheries
The importance of this fish in fisheries. Possible values are: highly commercial, commercial, of potential interest, minor commercial, of no interest, subsistence fisheries. Blank if unknown.
Threat to Humans
Possible threats this fish poses to humans. Possible values are: harmless, potential pest, reports of ciguatera poisoning, poisonous to eat, venomous, traumatogenic. Blank if unknown.
Abundances
The abundance (read count) of this fish in each sample.
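The row ordering and the per-sample abundance columns of this sheet can be sketched in Python. The class ranking follows the "Class" description above; the secondary sort keys (order, family, scientific name), placing unlisted classes last, and the input record shape are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

# Class ranking from the "Class" column description: fish classes first,
# then the non-fish classes named there. Unlisted classes sort last (assumption).
CLASS_RANK = {name: i for i, name in enumerate([
    "Myxini", "Hyperoartia", "Chondrichthyes", "Cladistia", "Actinopteri",
    "Amphibia", "Lepidosauria", "Aves", "Mammalia",
])}


def abundance_table(records):
    """Pivot (sample, species, reads) records into one row per species.

    Returns the sample names in first-seen order and a dict mapping each
    species to its read counts in that order (0 where absent).
    """
    samples, counts = [], defaultdict(dict)
    for sample, species, reads in records:
        if sample not in samples:
            samples.append(sample)
        counts[species][sample] = counts[species].get(sample, 0) + reads
    table = {sp: [row.get(s, 0) for s in samples] for sp, row in counts.items()}
    return samples, table


def sort_species(rows):
    """Order rows by class rank, then order, family and name (assumed tie-breakers)."""
    return sorted(rows, key=lambda r: (CLASS_RANK.get(r["class"], len(CLASS_RANK)),
                                       r["order"], r["family"], r["name"]))
```

A species missing from a sample gets a count of 0 in that sample's column rather than an empty cell.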

List of Sample Details
Sample name
Sample name, inferred from the uploaded filenames.
Species
Scientific name for this species
Total read
Sum of the read counts of all haploids of this species
Representative Sequence
The sequence of the most abundant haploid of this species
Size
The abundance (read count) of each haploid of this species
Confidence
The confidence level of each haploid of this species
Identity(%)
The BLAST identity of each haploid of this species against the MiFish database
Confidence Score
The Confidence Score of each haploid of this species.
Align Len
The alignment length in the BLAST result of each haploid of this species against the MiFish database
Mismatch
The number of mismatches in the BLAST result of each haploid of this species against the MiFish database
2nd-sp Name
The scientific name of the second-best-hit species of each haploid against the MiFish database
2nd-sp Align Len
The alignment length in the BLAST result of the second best-hit species of each haploid against the MiFish database
2nd-sp Mismatch
The number of mismatches in the BLAST result of the second-best-hit species of each haploid against the MiFish database
Sequence
Sequence of each haploid.
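The "Total read", "Representative Sequence" and second-best-hit columns can be illustrated with a sketch that parses standard BLAST+ tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Ranking hits by bit score and the `species_of` mapping from subject ID to species name are assumptions; the function names are hypothetical and the pipeline's own parsing may differ.

```python
from collections import defaultdict


def summarize_species(haploids):
    """Return (total_read, representative_sequence) for one species in one sample.

    haploids is a list of (sequence, size) pairs; on a size tie, the first wins.
    """
    total = sum(size for _, size in haploids)
    representative = max(haploids, key=lambda h: h[1])[0]
    return total, representative


def best_and_second_hits(blast_lines, species_of):
    """For each query, return its best hit and best hit to a *different* species.

    species_of is a hypothetical mapping from subject ID to species name.
    The second element is None when every hit is to the same species.
    """
    hits = defaultdict(list)
    for line in blast_lines:
        f = line.rstrip("\n").split("\t")
        hits[f[0]].append({
            "species": species_of[f[1]],
            "identity": float(f[2]),   # pident -> Identity(%)
            "align_len": int(f[3]),    # length -> Align Len
            "mismatch": int(f[4]),     # mismatch -> Mismatch
            "bitscore": float(f[11]),  # used here for ranking (assumption)
        })
    result = {}
    for query, query_hits in hits.items():
        query_hits.sort(key=lambda h: h["bitscore"], reverse=True)
        best = query_hits[0]
        second = next((h for h in query_hits[1:]
                       if h["species"] != best["species"]), None)
        result[query] = (best, second)
    return result
```

Skipping further hits to the best species ensures the "2nd-sp" columns describe a genuinely different species rather than another reference sequence of the same one.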

Haploids with Low Identities

Haploids with BLAST identities lower than the defined threshold are shown here. The columns are largely the same as in "List of Sample Details".


About Non-fish Species in the Result

MiFish primers sometimes amplify mitochondrial sequences from non-fish vertebrates (e.g., humans). The MiFish pipeline outputs such data because information on their presence can be useful in some contexts (e.g., to assess pollution).

Each version of MiFish RefDB contains sequences from 1,410 non-fish vertebrates, including amphibians (Amphibia), reptiles (Lepidosauria and others), birds (Aves) and mammals (Mammalia). These species were manually inspected by our experts, since they may be amplified by MiFish primers.

Non-fish species appear in the table on the results web page and in the first sheet of the exported Excel file, "Comparison of Samples", where they are listed separately in the "Non-fish species" section below all fishes.

If you do not need non-fish vertebrate data, just ignore them.


References

If you use our tools, please cite our relevant papers.
