BamQC
Created on 12 Oct 2016
@author: ernesto
-
class BamQC.BamQC.BamQC(bam, samtools_folder=None, java_folder=None, picard_folder=None, chk_indel_folder=None, verifybamid_folder=None)
Class to do the quality assessment on a BAM format file
Methods
aggregate_stats(cov_list): Used to calculate aggregated stats on a list of SDepth objects
get_contigs(): Get all contigs from this BAM
get_simple_stats(): Get a dict with stats on the BAM file as calculated by samtools flagstat
list_of_readgroups(): Get the Read Groups extracted from the header of the BAM file
list_of_samples(): Get the sample names from the header of the BAM file
run_CollectHsMetrics(baits_file[, outfile, …]): Run Picard’s CollectHsMetrics on an exome sequencing BAM file
run_CollectWgsMetrics(reference[, outfile, …]): Run Picard’s CollectWgsMetrics on a WGS BAM file
run_chk_indel_rg([outfile]): Run Heng Li’s chk_indel_rg on a BAM file
run_samtools_depth(chros): Calculate several coverage metrics on a whole genome sequencing BAM file using ‘samtools depth’
run_verifybamid(genotype_file, outprefix[, …]): Run VerifyBAMID to check for sample swap or contamination issues
-
aggregate_stats(cov_list)
Used to calculate aggregated stats on a list of SDepth objects
Parameters: - cov_list : list
List containing the SDepth objects for which the stats will be aggregated.
Returns: - An SDepth object
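The aggregation rule is not spelled out here. One plausible approach, sketched below, is to sum the raw per-contig counts and recompute breadth and depth from the totals rather than averaging the per-contig ratios. The `ContigCov` dataclass and the `aggregate` function are hypothetical stand-ins used only for illustration; they are not the library's SDepth class or its aggregate_stats implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContigCov:
    """Hypothetical stand-in for SDepth, holding only the fields needed here."""
    contig: str
    length: int          # contig length in bp
    mapped: int          # bases with at least one read mapped
    sum_of_depths: int   # sum of per-base depths over the mapped bases


def aggregate(cov_list: List[ContigCov]) -> dict:
    """Sum the raw counts, then recompute breadth and depth from the totals."""
    if not cov_list:
        raise ValueError("cov_list is empty")
    total_length = sum(c.length for c in cov_list)
    total_mapped = sum(c.mapped for c in cov_list)
    total_depths = sum(c.sum_of_depths for c in cov_list)
    return {
        "length": total_length,
        "mapped": total_mapped,
        "sum_of_depths": total_depths,
        "breadth": total_mapped / total_length,
        "depth": total_depths / total_length,
    }


# Illustrative values only, not real coverage data.
covs = [ContigCov("chr1", 1000, 900, 27000), ContigCov("chr2", 500, 300, 3000)]
stats = aggregate(covs)
print(stats)
```

Recomputing from totals weights each contig by its length, which a plain mean of per-contig breadth or depth values would not do.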
-
get_contigs()
Get all contigs from this BAM
Parameters: - None
Returns: - dict
A dictionary containing the following information:
{‘contig Name’: length (in bp)}
-
get_simple_stats()
Get a dict with stats on the BAM file as calculated by samtools flagstat
Parameters: - None
Returns: - dict
A dictionary containing the following information:
{
"total_no_reads": int,
"no_duplicates": int,
"total_no_mapped": int,
"no_properly_paired": int
}
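The counts in this dictionary are usually easier to compare across samples as fractions of the total. The sketch below shows one way to derive them; `flagstat_fractions` is a hypothetical helper, and the input dictionary is hand-written with made-up values rather than obtained from a real BAM file.

```python
def flagstat_fractions(stats: dict) -> dict:
    """Convert the raw read counts into fractions of the total read count."""
    total = stats["total_no_reads"]
    if total == 0:
        raise ValueError("total_no_reads is zero; fractions are undefined")
    return {
        "frac_mapped": stats["total_no_mapped"] / total,
        "frac_duplicates": stats["no_duplicates"] / total,
        "frac_properly_paired": stats["no_properly_paired"] / total,
    }


# Made-up counts using the keys documented above.
example = {
    "total_no_reads": 1000,
    "no_duplicates": 50,
    "total_no_mapped": 980,
    "no_properly_paired": 900,
}
fractions = flagstat_fractions(example)
print(fractions)
```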
-
list_of_readgroups()
Get the Read Groups extracted from the header of the BAM file
Parameters: - None
Returns: - list
List composed of the read groups
-
list_of_samples()
Get the sample names from the header of the BAM file
Parameters: - None
Returns: - list
List with the sample names
-
run_CollectHsMetrics(baits_file, outfile=None, cov_cap=None)
Run Picard’s CollectHsMetrics on an exome sequencing BAM file
Parameters: - baits_file : filename
Path to the file containing the Exome baits.
- outfile : filename, optional
If provided, then create a file with the output of this program
- cov_cap : int, optional
Picard’s Coverage Cap parameter. Treat positions with coverage exceeding this value as if they had coverage at this value. Default value: 250.
Returns: - A CMetrics object
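The effect of the coverage cap can be illustrated in isolation. The sketch below is not Picard's code; it only applies the rule described for cov_cap (positions above the cap count as the cap) to a made-up list of per-position depths. The `capped_mean` helper is hypothetical.

```python
def capped_mean(depths, cov_cap=250):
    """Mean depth after clamping every position to at most cov_cap."""
    if not depths:
        raise ValueError("depths is empty")
    capped = [min(d, cov_cap) for d in depths]
    return sum(capped) / len(capped)


# Made-up depths: one position is an extreme outlier.
depths = [10, 20, 30, 1000]
raw_mean = sum(depths) / len(depths)
mean_capped = capped_mean(depths)
print(raw_mean, mean_capped)
```

A single pile-up position dominates the uncapped mean, while the capped mean stays closer to the typical depth.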
-
run_CollectWgsMetrics(reference, outfile=None, cov_cap=None)
Run Picard’s CollectWgsMetrics on a WGS BAM file
Parameters: - reference : filename
Fasta file used as the genome reference
- outfile : filename, optional
If provided, then create a file with the output of this program
- cov_cap : int, optional
Picard’s Coverage Cap parameter. Treat positions with coverage exceeding this value as if they had coverage at this value. Default value: 250.
Returns: - A CMetrics object
-
run_chk_indel_rg(outfile=None)
Run Heng Li’s chk_indel_rg on a BAM file
Parameters: - outfile : filename, optional
If provided, then create a file with the output of this program
Returns: - list
A list of Chk_indel objects
-
run_samtools_depth(chros)
Calculate several coverage metrics on a whole genome sequencing BAM file using ‘samtools depth’
Parameters: - chros : list or string
List of contigs or just a single contig used for calculating the coverage
Returns: - List of SDepth objects
This method runs samtools depth on a BAM file and will calculate the following metrics:
- Number of bases mapped: the number of bases having at least one read mapped
- Sum of depths of coverage: the sum of the depths at each of the bases mapped
- Breadth of coverage: the result of dividing bases_mapped/length(contig) (i.e. what portion of the contig has reads mapped)
- Depth of coverage: the result of dividing sum_of_depths/length(contig)
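The four metrics above can be computed as in the following minimal sketch. It assumes the input is the tab-separated text produced by samtools depth for a single BAM file and a single contig (contig, position, depth per line) and that the contig length is known, for example from get_contigs. The `coverage_metrics` function is a hypothetical illustration, not the method's actual implementation.

```python
def coverage_metrics(depth_lines, contig_length):
    """Compute mapped bases, sum of depths, breadth and depth for one contig."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    bases_mapped = 0
    sum_of_depths = 0
    max_depth = 0
    for line in depth_lines:
        _contig, _pos, depth = line.rstrip("\n").split("\t")
        depth = int(depth)
        if depth > 0:  # zero-depth rows do not count as mapped
            bases_mapped += 1
            sum_of_depths += depth
            max_depth = max(max_depth, depth)
    return {
        "bases_mapped": bases_mapped,
        "sum_of_depths": sum_of_depths,
        "breadth": bases_mapped / contig_length,
        "depth": sum_of_depths / contig_length,
        "max": max_depth,
    }


# Made-up depth rows for a 10 bp contig; position 3 has no coverage.
lines = ["chr1\t1\t3", "chr1\t2\t5", "chr1\t4\t2"]
metrics = coverage_metrics(lines, contig_length=10)
print(metrics)
```

Note that depth is divided by the full contig length, not by the number of mapped bases, so uncovered positions pull the average down.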
-
run_verifybamid(genotype_file, outprefix, outdir=None)
Run VerifyBAMID to check for sample swap or contamination issues
Parameters: - genotype_file : filename
VCF file with the chip genotypes to use
- outprefix : str
Prefix for the output files
- outdir : str, optional
If provided, then put output files in this folder
Returns: - list
A list with the paths to the output files generated by VerifyBAMID
-
-
class BamQC.BamQC.CMetrics(metrics, cov_data)
Class to store coverage information on the metrics calculated by Picard’s CollectHsMetrics/CollectWgsMetrics on an Exome or WGS BAM file
Methods
create_cov_barplot(filename[, xlim, ylim]): Create a bar plot of the coverage value counts calculated by Picard’s CollectHsMetrics or CollectWgsMetrics
print_report([filename]): Used to print a text report of data in the object
-
create_cov_barplot(filename, xlim=None, ylim=None)
Create a bar plot of the coverage value counts calculated by Picard’s CollectHsMetrics or CollectWgsMetrics
Parameters: - filename : filename
PDF file to write the plot.
- xlim : tuple, optional
Set the X-axis limit
- ylim : tuple, optional
Set the Y-axis limit
-
print_report(filename=None)
Used to print a text report of data in the object
Parameters: - filename : filename, optional
Filename to write the report. The default is STDOUT.
-
-
class BamQC.BamQC.Chk_indel(RG, ins_in_short_homopolymer, del_in_short, ins_in_long, del_in_long, outcome=None)
Class to store information on the ratio of short insertions and deletions calculated by running Heng Li’s chk_indel_rg
Methods
calc_ratio(): Calculate the ratio ins-in-short-homopolymer/del-in-short and check whether it is > 5
-
calc_ratio()
Calculate the ratio ins-in-short-homopolymer/del-in-short and check whether it is > 5
Returns: - str
It returns PASS/FAILED depending on the outcome of the test
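The check can be sketched as follows. The description does not say which side of the threshold passes, so the sketch assumes that a ratio above 5 is reported as FAILED and anything else as PASS; that mapping, the `indel_ratio_outcome` name, and the handling of a zero denominator are all assumptions made for illustration.

```python
def indel_ratio_outcome(ins_in_short_homopolymer, del_in_short, threshold=5.0):
    """Return (ratio, outcome) for the short-homopolymer indel ratio test."""
    if del_in_short == 0:
        # Assumption: treat an undefined ratio as an error rather than guess.
        raise ValueError("del_in_short is zero; the ratio is undefined")
    ratio = ins_in_short_homopolymer / del_in_short
    # Assumption: exceeding the threshold is the failing condition.
    outcome = "FAILED" if ratio > threshold else "PASS"
    return ratio, outcome


# Made-up counts for two read groups.
ratio_ok, outcome_ok = indel_ratio_outcome(120, 100)
ratio_bad, outcome_bad = indel_ratio_outcome(600, 100)
print(ratio_ok, outcome_ok, ratio_bad, outcome_bad)
```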
-
-
class BamQC.BamQC.SDepth(contig=None, mapped=None, breadth=None, depth=None, length=None, sum_of_depths=None, max=None)
Class to store coverage metrics on a Whole Genome Sequencing BAM file calculated using SAMtools depth
Methods
print_report([filename]): Used to print a text report of data in the object
-
print_report(filename=None)
Used to print a text report of data in the object
Parameters: - filename : filename, optional
Filename used to write the report. The default is STDOUT.
-