The MiFish pipeline accepts sequence data in FASTQ format (paired-end or single file) or FASTA format. For FASTQ input, the pipeline performs a quality check, concatenation of paired-end sequences (if necessary), removal of unreliable sequences (including those with possible base-call errors and those of atypical lengths), and removal of primer sequences.
After clustering of identical sequences, BLASTN searches are performed against the reference fish sequence database (currently containing more than 6,600 fish species).
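The clustering of identical sequences described above can be illustrated with a minimal sketch. It is not the pipeline's own implementation (the pipeline uses Uclust/usearch for this step); the function name `dereplicate` and the example reads are hypothetical, and treating sequences as case-insensitive is an assumption.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences into (sequence, count) pairs.

    Hypothetical helper for illustration only. Sequences are upper-cased
    first (an assumption), then sorted by decreasing abundance, with ties
    broken alphabetically so the output order is deterministic.
    """
    counts = Counter(seq.upper() for seq in reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up example reads: three copies of one sequence and one singleton.
reads = ["ACGT", "acgt", "ACGA", "ACGT"]
clusters = dereplicate(reads)
print(clusters)
```

Only one representative per group of identical reads then needs to be searched with BLASTN, and the stored counts can be carried through to the final species list.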
For sequences with hits of >97% similarity, a list of species (or genus) names is shown with reliability information. Sequences without hits of >97% similarity are analyzed separately and provided for additional in-depth analysis.
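The 97% split described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: it assumes BLASTN results in tab-separated tabular form where the first three columns are query ID, subject ID, and percent identity, and it uses percent identity as the "similarity" measure. The function name `split_by_identity` and the sample records are hypothetical.

```python
def split_by_identity(blast_lines, threshold=97.0):
    """Partition queries by the percent identity of their best hit.

    Hypothetical helper. Assumes tab-separated lines whose first three
    columns are query ID, subject ID, and percent identity. Returns two
    dicts mapping query ID to (subject ID, percent identity).
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        # Keep only the highest-identity hit for each query.
        if query not in best or pident > best[query][1]:
            best[query] = (subject, pident)
    confident = {q: hit for q, hit in best.items() if hit[1] > threshold}
    uncertain = {q: hit for q, hit in best.items() if hit[1] <= threshold}
    return confident, uncertain


# Made-up tabular records for two queries.
lines = [
    "read1\tSpecies_A\t99.4",
    "read1\tSpecies_C\t95.0",
    "read2\tSpecies_B\t92.0",
]
confident, uncertain = split_by_identity(lines)
print(confident)
print(uncertain)
```

Queries with no hit at all do not appear in tabular output, so in practice they would have to be recovered by comparing against the full list of query IDs.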
In addition, species diversity analysis, principal component analysis, cluster analysis, and phylogenetic analysis by the neighbor-joining method can be run for further analysis.
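As an illustration of what a species diversity analysis can compute from the per-species read counts, the sketch below calculates the Shannon index. The text does not say which diversity measures the pipeline reports, so the choice of the Shannon index is an assumption, and the function name `shannon_index` and the counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over species with reads.

    Hypothetical helper; `counts` maps species name to read count.
    Species with zero reads are ignored. Returns 0.0 for empty input.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Made-up read counts for two equally abundant species.
sample = {"Species_A": 50, "Species_B": 50}
h = shannon_index(sample)
print(round(h, 4))
```

For two equally abundant species the index equals ln 2; it grows as more species are detected and as their abundances become more even.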
The MiFish pipeline uses the following external programs:
Quality check of FASTQ file: FastQC (FastQC Website)
Tail trimming: SolexaQA (BMC Bioinformatics. 2010, 11:485.)
Paired-end read assembly: FLASH (Bioinformatics. 2011, 27:2957-63.)
Primer removal: TagCleaner (BMC Bioinformatics. 2010, 11:343.)
Read clustering: Uclust/usearch (Bioinformatics. 2010, 26:2460-2461.)
Detection of chimeric sequences: UCHIME (Bioinformatics. 2011, 27:2194-2200.)
Sequence similarity search: BLAST+ version 2.2.29 (BMC Bioinformatics. 2009, 10:421.)
Multiple alignment: MAFFT (Mol. Biol. Evol. 2013, 30: 772-780.)
Phylogenetic analysis: Morphy (J. Mol. Evol. 1995, 40: 622-628.)