Introduction
A script for determining homozygous and heterozygous positions from an alignment
using binomial probabilities from a predicted error rate (BiSCaP), and a method
to benchmark the accuracy of this and other alignment/SNP-calling methods (CFDR).
Synopsis
GBiD.pl -x 200 -p 0.5
IRMS.pl -g ref.fasta -n 10000
BiSCaP.pl -p aln.pileup -r ref.fasta
CFDR.pl -i IntroducedMutations -f FoundMutations -s SAMfile -p Pileup
Description
A collection of scripts that assess the quality of alignment and
SNP/indel calling, and that perform SNP/indel calling. Introduce Random
Mutations into a Sequence (IRMS) simulates any number of random single
nucleotide polymorphisms or indels, which are placed randomly within the
genome or randomly within a specific feature type (specified by a feature
file). After aligning to the modified FASTA sequence generated by IRMS
and calling SNPs with any method that provides the final calls in the
Variant Call Format (VCF), Comparison of False Discovery Rates (CFDR)
can be used to assess the most suitable method of alignment and
SNP calling.
Binomial SNP Caller from Pileup (BiSCaP) is a method of calling
homozygous and heterozygous (given a diploid) SNPs and Indels using a
look up table of cumulative binomial probabilities.
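As an illustration of what a look-up table of cumulative binomial probabilities can look like, the sketch below computes the upper-tail probability P(X >= k) for X ~ Binomial(depth, p_error) and tabulates it for every depth up to a maximum. The function names, the choice of the upper tail, and the table layout are assumptions made for this example; they are not taken from BiSCaP.pl or GBiD.pl.

```python
from math import comb


def binomial_tail(depth: int, k: int, p_error: float) -> float:
    """Probability of seeing at least k erroneous bases in `depth` reads.

    Upper-tail cumulative probability P(X >= k), X ~ Binomial(depth, p_error).
    """
    return sum(
        comb(depth, i) * p_error**i * (1 - p_error) ** (depth - i)
        for i in range(k, depth + 1)
    )


def build_lookup(max_depth: int, p_error: float) -> dict:
    """Tabulate binomial_tail for every (depth, k) with 1 <= depth <= max_depth.

    Hypothetical layout: a dict keyed by (depth, k).
    """
    return {
        (depth, k): binomial_tail(depth, k, p_error)
        for depth in range(1, max_depth + 1)
        for k in range(depth + 1)
    }


if __name__ == "__main__":
    table = build_lookup(max_depth=20, p_error=0.1)
    # Chance that 4 or more of 10 reads disagree with the reference by error alone
    print(table[(10, 4)])
```

A small tail probability for the observed number of non-reference bases suggests that sequencing error alone is an unlikely explanation for them.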
Commands and Options
GBiD.pl perl GBiD.pl -x MaxDepth -p ProbOfError
If a binomial look-up table is wanted for an error probability other
than the 0.1 and 0.01 provided, this script will produce those
probabilities using R. Requires Statistics::R, and has to be run in
steps (e.g. -x 50 -> -x 100 -> -x 150 etc.).
IRMS.pl perl IRMS.pl -g ref.fasta -n NumOfMutations optional_params
Introducing Random Mutations into a sequence, which can then
be used to assess the best alignment and SNP calling params.
OPTIONS:
-t Type of mutation to introduce (SNP/DEL/INS/CNV)
INS/DEL not supported when using -c/-f [SNP]
CNV not currently testable by CFDR
-c A feature file (GFF/GTF) that specifies where the
mutations are placed in the genome. [anywhere]
-f Feature in the GFF/GTF to mutate (CDS, exon, mRNA etc.)
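The idea of placing random SNPs in a reference and recording where they went can be sketched as follows. This is a minimal illustration only: the function name, the record layout (1-based position, reference base, alternative base) and the seeding are assumptions for the example, and it does not reproduce the behaviour or output format of IRMS.pl.

```python
import random

BASES = "ACGT"


def introduce_snps(sequence: str, n_mutations: int, seed=None):
    """Return (mutated_sequence, records) with n_mutations random SNPs.

    Each record is (1-based position, reference base, alternative base).
    Only unambiguous A/C/G/T positions are mutated; the substituted base
    is written in upper case.
    """
    rng = random.Random(seed)
    candidates = [i for i, base in enumerate(sequence) if base.upper() in BASES]
    if n_mutations > len(candidates):
        raise ValueError("more mutations requested than mutable positions")

    mutated = list(sequence)
    records = []
    # Sample without replacement so that no position is mutated twice
    for i in sorted(rng.sample(candidates, n_mutations)):
        ref = mutated[i].upper()
        alt = rng.choice([b for b in BASES if b != ref])
        mutated[i] = alt
        records.append((i + 1, ref, alt))
    return "".join(mutated), records


if __name__ == "__main__":
    new_seq, introduced = introduce_snps("ACGTACGTACGTACGT", 3, seed=42)
    print(new_seq)
    for position, ref, alt in introduced:
        print(position, ref, alt)
```

Keeping the list of introduced mutations alongside the modified sequence is what makes a later comparison against the called SNPs possible.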
BiSCaP.pl perl BiSCaP.pl -p aln.pileup -r ref.fasta optional_params
Binomial SNP Caller from Pileup uses a look-up table of
binomial probabilities to calculate the consensus sequence
from a pileup. The default output of the program is
Variant Call Format. Columns describe, in order: contig, position,
reference base, consensus base, average mapping quality, average
base quality, maximum mapping, depth, aligned bases and read qualities.
A separate file lists positions in the genome that fall outside
of the binomial distribution look-up table and have therefore not
been categorised, together with a summary of the mutations found
and a tally of the different read depths through the alignment.
OPTIONS:
-m Minimum read depth to be considered a mutation [4]
-e Probability of error [0.1]
-l Ploidy (h = haploid, d = diploid) [h]
-s Stringency to call heterozygosity. h = highly stringent
n = normal. [n]
-g If depth > max depth in look-up table, analyse up to
max depth (y/n) [y]
-a Print out homozygous Agree lines (y/n) [n]
-i Print out separated pileup lines specifying SNPs and
indels in addition to the all mutations file (y/n) [n]
-q Read Quality minimum cut-off for SNPs (e.g. 10, 20...)
[0]
-o Output folder location. [folder of pileup]
-n Sample name for VCF [default name of pileup]
CFDR.pl perl CFDR.pl -i IntroducedMutations -f FoundMutations -s SAMfile
-p Pileup optional_params
Comparison of False Discovery Rates (CFDR) should be run on
an alignment against the FASTA file output by IRMS, after
SNPs have been called (optionally by BiSCaP).
OPTIONS:
-i Full list of introduced mutations (details output of
IRMS)
-f Found SNPs in Samtools format (tab-delimited file with
10 columns)
-s SAM file of alignment
-p Pileup of alignment
-c GFF/GTF
-f Feature in the GFF/GTF to CFDR (CDS, exon, mRNA etc.)
-o Output folder location. [folder of found mutations]
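The comparison between introduced and found mutations can be pictured with the sketch below, which uses the standard definition of the false discovery rate, FP / (FP + TP). Representing each mutation as a (contig, position, reference, alternative) tuple and the function name are assumptions for this example; whether CFDR.pl computes exactly these quantities, and how it uses the SAM and pileup files, is not shown here.

```python
def false_discovery_rate(introduced: set, found: set) -> dict:
    """Compare called mutations against the known introduced ones.

    Both arguments are sets of (contig, position, ref, alt) tuples.
    """
    true_pos = found & introduced    # called and really introduced
    false_pos = found - introduced   # called but never introduced
    false_neg = introduced - found   # introduced but missed by the caller
    fdr = len(false_pos) / len(found) if found else 0.0
    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "false_negatives": len(false_neg),
        "fdr": fdr,
    }


if __name__ == "__main__":
    introduced = {("chr1", 10, "A", "T"), ("chr1", 20, "C", "G"), ("chr1", 30, "G", "A")}
    found = {("chr1", 10, "A", "T"), ("chr1", 20, "C", "G"), ("chr1", 99, "T", "C")}
    print(false_discovery_rate(introduced, found))
```

Running the same comparison for several aligner and SNP-caller combinations gives one FDR per combination, which is the kind of figure that can be used to choose between them.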