Sequence Similarity Search (BLAST) - Help

Sequence Similarity Search in MitoFish lets you find fish mitochondrial DNA sequences (and their species names) similar to your query sequence. You can search either the "complete mtDNA" or the "complete + partial mtDNA" sequence database. The "complete mtDNA" database corresponds to the NCBI RefSeq database, while the "complete + partial mtDNA" database corresponds to the NCBI GenBank database. We have manually corrected or removed some taxonomically misidentified sequences; the list of changes is available on request.

Simple Search
Simple Search provides a simple way to find fish mitochondrial DNA sequences (and their species names) similar to your query sequence.
Enter a query sequence in FASTA format or as plain text in the Query box.
Click the Search button to run a BLAST (Basic Local Alignment Search Tool) search and view the results.

Advanced Search
Advanced Search provides more detailed Search Options and Results View Options. The Search Options are Filter and Expect (E value); the Results View Options are No. of Descriptions, No. of Alignments, and Alignment View.

Query
FASTA format
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format (a protein record, shown only to illustrate the layout) is:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
The nucleic acid codes supported are:
A --> adenosine           M --> A C (amino)
C --> cytidine            S --> G C (strong)
G --> guanosine           W --> A T (weak)
T --> thymidine           B --> G T C
U --> uridine             D --> G A T
R --> G A (purine)        H --> A C T
Y --> T C (pyrimidine)    V --> G C A
K --> G T (keto)          N --> A G C T (any)
-  gap of indeterminate length
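
The query rules above can be sketched in Python as follows. This is only an illustration of the stated conventions, not the code MitoFish runs: the function names parse_fasta and invalid_symbols are hypothetical, and stripping digits (rather than replacing them with N) is one of the two choices the text allows.

```python
import re

# Nucleic acid codes from the table above, plus "-" for a gap.
NUCLEOTIDE_CODES = set("ACGTURYKMSWBDHVN-")


def parse_fasta(text):
    """Return (description, sequence) from one FASTA record or plain text."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    description = ""
    if lines and lines[0].startswith(">"):
        # The description line starts with ">" in the first column.
        description = lines[0][1:]
        lines = lines[1:]
    sequence = "".join(lines)
    # Drop digits and whitespace, then map lower case to upper case.
    sequence = re.sub(r"[\d\s]", "", sequence).upper()
    return description, sequence


def invalid_symbols(sequence):
    """Return the sorted characters that are not accepted nucleotide codes."""
    return sorted(set(sequence) - NUCLEOTIDE_CODES)
```

For example, parse_fasta(">example query\n1 acgtn ryk\n11 ACGT-") yields the description "example query" and the sequence "ACGTNRYKACGT-", and invalid_symbols on that sequence returns an empty list.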

Options
Filter (Low-complexity)
Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, 1993) or, for BLASTN, by the DUST program of Tatusov and Lipman (in preparation). Filtering can eliminate statistically significant but biologically uninteresting reports from the BLAST output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences.
Filtering is only applied to the query sequence (or its translation products), not to database sequences. Default filtering is DUST for BLASTN, SEG for other programs.
It is not unusual for SEG to mask nothing at all when applied to sequences in SWISS-PROT, so filtering should not be expected to always have an effect. Furthermore, in some cases sequences are masked in their entirety, indicating that the statistical significance of any matches reported against the unfiltered query sequence should be treated as suspect.
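
To give a feel for what low-complexity masking of a nucleotide query does, here is a simplified, DUST-inspired sketch in Python. It is not the actual DUST program: the triplet-based score, the window of 64 bases, and the threshold of 2.0 are illustrative assumptions, and the function names are hypothetical.

```python
from collections import Counter


def triplet_score(window):
    """Simplified score: high when a few triplets dominate the window."""
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    if len(triplets) < 2:
        return 0.0
    counts = Counter(triplets)
    # Each triplet seen c times contributes c * (c - 1) / 2 repeated pairs.
    repeats = sum(c * (c - 1) / 2 for c in counts.values())
    return repeats / (len(triplets) - 1)


def mask_low_complexity(sequence, window=64, threshold=2.0):
    """Replace every window scoring above the threshold with N."""
    masked = list(sequence)
    last_start = max(len(sequence) - window + 1, 1)
    for start in range(last_start):
        chunk = sequence[start:start + window]
        if triplet_score(chunk) > threshold:
            for i in range(start, start + len(chunk)):
                masked[i] = "N"
    return "".join(masked)
```

A run of 70 adenines is masked completely, while a short sequence whose triplets are all different, such as ACGTTGCA, is returned unchanged.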
Expect (E Value)
The statistical significance threshold for reporting matches against database sequences; the default value is 10, such that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable.
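
As a rough illustration of how the threshold behaves, the sketch below uses the standard relation between a bit score S' and the expected number of chance matches, E = m * n * 2^(-S'), where m is the query length and n is the total database length. BLAST itself uses corrected (effective) lengths, so real E values will differ somewhat; the function names are hypothetical.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E value from a bit score and the raw search space size."""
    return query_length * database_length * 2.0 ** (-bit_score)


def is_reported(e_value, threshold=10.0):
    """A match is reported only if its E value does not exceed the threshold."""
    return e_value <= threshold
```

With a 1,000-base query and a database of 1,000,000 bases, a bit score of 40 gives an E value of about 0.0009, which passes the default threshold of 10; lowering the threshold to 0.0001 would exclude it.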
Other Options (for BLASTN)
-gapopen
Cost to open a gap [Integer]
default = 5
-gapextend
Cost to extend a gap [Integer]
default = 2
-penalty
Penalty for a mismatch in the blast portion of run [Integer <=0]
default = -3
-reward
Reward for a match in the blast portion of run [Integer]
default = 1
-evalue
Expectation value (E) [Real]
default = 10.0
-word_size
Word size, default is 11 for blastn, 3 for other programs
default = 11
-num_descriptions
Number of one-line descriptions (V) [Integer]
default = 10
-num_alignments
Number of alignments to show (B) [Integer]
default = 10
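
For readers who want to reproduce a comparable search locally, the options above map onto the command line of the NCBI BLAST+ blastn program. The sketch below only assembles the argument list with the defaults listed on this page; whether the MitoFish server invokes blastn in exactly this way is an assumption, and the query file name and database name are hypothetical placeholders.

```python
import shlex

# Defaults as listed on this help page.
PAGE_DEFAULTS = {
    "gapopen": 5,
    "gapextend": 2,
    "penalty": -3,
    "reward": 1,
    "evalue": 10.0,
    "word_size": 11,
    "num_descriptions": 10,
    "num_alignments": 10,
}


def build_blastn_command(query_path, database, options=None):
    """Return a blastn argument list, overriding page defaults with options."""
    options = options or {}
    unknown = set(options) - set(PAGE_DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    settings = {**PAGE_DEFAULTS, **options}
    command = ["blastn", "-query", query_path, "-db", database]
    for name, value in settings.items():
        command += [f"-{name}", str(value)]
    return command


if __name__ == "__main__":
    # "query.fasta" and "mitofish_complete" are placeholder names.
    print(shlex.join(build_blastn_command("query.fasta", "mitofish_complete")))
```

The resulting list can be passed to subprocess.run on a machine where BLAST+ is installed and a nucleotide database has been built with makeblastdb.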

Results View Options
No. of Descriptions
Restricts the number of short descriptions of matching sequences reported to the number specified; default limit is 100 descriptions. See also EXPECT.
No. of Alignments
Restricts the number of database sequences for which high-scoring segment pairs (HSPs) are reported to the number specified; the default limit is 100. If more database sequences than this happen to satisfy the statistical significance threshold for reporting (see EXPECT above), only the matches ascribed the greatest statistical significance are reported.
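
The interaction between the EXPECT threshold and these limits can be sketched as follows. This is a minimal illustration, assuming hits are available as (identifier, E value) pairs; the function name and the identifiers in the usage example are hypothetical.

```python
def select_hits(hits, expect=10.0, limit=100):
    """Keep hits within the EXPECT threshold, most significant first, up to limit."""
    significant = [hit for hit in hits if hit[1] <= expect]
    # A lower E value means greater statistical significance.
    significant.sort(key=lambda hit: hit[1])
    return significant[:limit]
```

For instance, select_hits([("seqA", 0.5), ("seqB", 1e-30), ("seqC", 25.0)], limit=1) drops seqC because it exceeds the threshold and then returns only seqB, the most significant of the remaining two.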
Alignment View
The choice of which to use is based on personal preference. Pairwise alignment gives a good view of the quality of an individual hit. However, a flat query-anchored alignment (with identities) is a format in which identities shared by numerous sequences can be easily spotted.
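
To show why the query-anchored view with identities makes shared positions easy to spot, the sketch below renders already-aligned sequences with a dot wherever a hit matches the query. It assumes the aligned strings have equal length and is not the BLAST formatter itself; the function name and hit names are hypothetical.

```python
def query_anchored_view(query, subjects):
    """Render aligned subjects under the query, showing identities as dots."""
    width = max(len(name) for name in ["Query", *subjects])
    lines = [f"{'Query'.ljust(width)}  {query}"]
    for name, aligned in subjects.items():
        if len(aligned) != len(query):
            raise ValueError(f"{name} is not aligned to the query length")
        row = "".join(
            "." if s == q and q != "-" else s
            for q, s in zip(query, aligned)
        )
        lines.append(f"{name.ljust(width)}  {row}")
    return "\n".join(lines)
```

Given the query ACGTACGT and two hits aligned as ACGTACGT and ACCTAC-T, the first hit is drawn as a row of dots and the second as ..C...-., so the single mismatch and the gap stand out immediately.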
