BamQC

Created on 12 Oct 2016

@author: ernesto

class BamQC.BamQC.BamQC(bam, samtools_folder=None, java_folder=None, picard_folder=None, chk_indel_folder=None, verifybamid_folder=None)

Class for running quality assessment on a BAM file

Methods

aggregate_stats(cov_list) Used to calculate aggregated stats on a list of SDepth objects
get_contigs() Get all contigs from this BAM
get_simple_stats() Get a dict with stats on the BAM file as calculated by samtools flagstat
list_of_readgroups() Get the Read Groups extracted from the header of the BAM file
list_of_samples() Get the sample names from the header of the BAM file
run_CollectHsMetrics(baits_file[, outfile, …]) Run Picard’s CollectHsMetrics on an exome sequencing BAM file
run_CollectWgsMetrics(reference[, outfile, …]) Run Picard’s CollectWgsMetrics on a WGS BAM file
run_chk_indel_rg([outfile]) Run Heng Li’s chk_indel_rg on a BAM file
run_samtools_depth(chros) Calculate several coverage metrics on a whole genome sequencing BAM file using ‘samtools depth’
run_verifybamid(genotype_file, outprefix[, …]) Run VerifyBAMID to check for sample swap or contamination issues
aggregate_stats(cov_list)

Used to calculate aggregated stats on a list of SDepth objects

Parameters:
cov_list : list

List containing the SDepth objects for which the stats will be aggregated.

Returns:
An SDepth object
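
One plausible way to aggregate per-contig coverage is to pool the raw counts and recompute the ratios over the combined length. The sketch below assumes that scheme; the source does not state how aggregate_stats combines its inputs. `ContigCoverage` and `aggregate` are hypothetical stand-ins, not part of BamQC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContigCoverage:
    """Hypothetical stand-in for SDepth; not the BamQC class itself."""
    contig: str
    mapped: int          # bases covered by at least one read
    sum_of_depths: int   # sum of the per-base depths
    length: int          # contig length in bp


def aggregate(cov_list: List[ContigCoverage]) -> dict:
    """Pool per-contig counts, then recompute breadth and depth (assumed scheme)."""
    if not cov_list:
        raise ValueError("cov_list is empty")
    mapped = sum(c.mapped for c in cov_list)
    sum_of_depths = sum(c.sum_of_depths for c in cov_list)
    length = sum(c.length for c in cov_list)
    return {
        "mapped": mapped,
        "sum_of_depths": sum_of_depths,
        "length": length,
        "breadth": mapped / length,
        "depth": sum_of_depths / length,
    }


per_contig = [
    ContigCoverage("chr1", mapped=80, sum_of_depths=400, length=100),
    ContigCoverage("chr2", mapped=30, sum_of_depths=100, length=100),
]
totals = aggregate(per_contig)
```

Pooling counts weights each contig by its length, so a short contig with unusual coverage does not dominate the aggregate.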
get_contigs()

Get all contigs from this BAM

Parameters:
None
Returns:
dict

A dictionary containing the following information:

{'contig name': length (in bp)}
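
In a BAM header, each contig is described by an @SQ line carrying the SN (name) and LN (length) tags. The sketch below builds the same kind of dictionary from header text such as that printed by `samtools view -H`. The function name `contigs_from_header` is hypothetical, and the class may obtain this information differently.

```python
def contigs_from_header(header_text: str) -> dict:
    """Map each @SQ sequence name (SN) to its length in bp (LN)."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
contigs = contigs_from_header(header)
```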

get_simple_stats()

Get a dict with stats on the BAM file as calculated by samtools flagstat

Parameters:
None
Returns:
dict

A dictionary containing the following information:

{
  "total_no_reads": int,
  "no_duplicates": int,
  "total_no_mapped": int,
  "no_properly_paired": int
}
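
As an illustration of where these four numbers come from, the sketch below parses the text report of samtools flagstat into a dictionary with the same keys. It assumes that only the QC-passed count (the first number on each line) is kept; the source does not say how the class treats QC-failed reads. `parse_flagstat` is a hypothetical helper.

```python
import re

# Matches lines such as "95 + 0 mapped (95.00% : N/A)".
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text: str) -> dict:
    """Collect QC-passed counts for four categories of a flagstat report."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        passed, description = int(match.group(1)), match.group(3)
        if description.startswith("in total"):
            stats["total_no_reads"] = passed
        elif description == "duplicates":
            stats["no_duplicates"] = passed
        elif description.startswith("mapped ("):
            stats["total_no_mapped"] = passed
        elif description.startswith("properly paired"):
            stats["no_properly_paired"] = passed
    return stats


report = """\
100 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
5 + 0 duplicates
95 + 0 mapped (95.00% : N/A)
100 + 0 paired in sequencing
90 + 0 properly paired (90.00% : N/A)
"""
stats = parse_flagstat(report)
```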

list_of_readgroups()

Get the Read Groups extracted from the header of the BAM file

Parameters:
None
Returns:
list

List composed of the read groups

list_of_samples()

Get the sample names from the header of the BAM file

Parameters:
None
Returns:
list

List with the sample names
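
Read groups and sample names both live in the @RG lines of the BAM header (tags ID and SM respectively). The sketch below extracts them from header text; `read_groups_and_samples` is a hypothetical helper, and de-duplicating the sample names is an assumption about the desired output rather than documented behaviour.

```python
def read_groups_and_samples(header_text: str):
    """Return (read group IDs, unique sample names) from @RG header lines."""
    read_groups, samples = [], []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        read_groups.append(fields["ID"])
        sample = fields.get("SM")  # SM is optional in the SAM specification
        if sample is not None and sample not in samples:
            samples.append(sample)
    return read_groups, samples


header = (
    "@RG\tID:rg1\tSM:NA12878\tPL:ILLUMINA\n"
    "@RG\tID:rg2\tSM:NA12878\tPL:ILLUMINA\n"
)
read_groups, samples = read_groups_and_samples(header)
```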

run_CollectHsMetrics(baits_file, outfile=None, cov_cap=None)

Run Picard’s CollectHsMetrics on an exome sequencing BAM file

Parameters:
baits_file : filename

Path to the file containing the exome baits.

outfile : filename, optional

If provided, write the output of this program to this file

cov_cap : int, optional

Picard’s Coverage Cap parameter. Treat positions with coverage exceeding this value as if they had coverage at this value. Default value: 250.

Returns:
A CMetrics object
run_CollectWgsMetrics(reference, outfile=None, cov_cap=None)

Run Picard’s CollectWgsMetrics on a WGS BAM file

Parameters:
reference : filename

Fasta file used as the genome reference

outfile : filename, optional

If provided, write the output of this program to this file

cov_cap : int, optional

Picard’s Coverage Cap parameter. Treat positions with coverage exceeding this value as if they had coverage at this value. Default value: 250.

Returns:
A CMetrics object
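
The effect of the cov_cap parameter can be illustrated with a small calculation: every per-position depth is clamped to the cap before averaging, so a few extreme pile-ups do not inflate the mean. This is a simplified sketch of the clamping idea only, not Picard's implementation; `capped_mean_coverage` is a hypothetical helper.

```python
def capped_mean_coverage(depths, cov_cap=250):
    """Mean depth after clamping every position to at most cov_cap."""
    if not depths:
        raise ValueError("depths is empty")
    return sum(min(d, cov_cap) for d in depths) / len(depths)


depths = [10, 20, 1000]                  # one position with a pile-up of reads
uncapped = sum(depths) / len(depths)     # mean dominated by the pile-up
capped = capped_mean_coverage(depths, cov_cap=250)
```

Here the capped mean is computed from 10, 20 and 250 rather than 10, 20 and 1000.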
run_chk_indel_rg(outfile=None)

Run Heng Li’s chk_indel_rg on a BAM file

Parameters:
outfile : filename, optional

If provided, write the output of this program to this file

Returns:
list

A list of Chk_indel objects

run_samtools_depth(chros)

Calculate several coverage metrics on a whole genome sequencing BAM file using ‘samtools depth’

Parameters:
chros : list or string

List of contigs or just a single contig used for calculating the coverage

Returns:
list

A list of SDepth objects

This method runs samtools depth on a BAM file and calculates the following metrics:
  • Number of bases mapped: the number of bases with at least one read mapped
  • Sum of depths of coverage: the sum of the depths over all mapped bases
  • Breadth of coverage: bases_mapped/length(contig), i.e. the fraction of the contig that has reads mapped
  • Depth of coverage: sum_of_depths/length(contig)
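
The four metrics listed above can be sketched directly from the three tab-separated columns (contig, position, depth) that samtools depth prints for a single BAM file. `coverage_metrics` is a hypothetical helper written for illustration; it is not the method's actual implementation, and the contig length is assumed to be supplied by the caller.

```python
def coverage_metrics(depth_lines, contig_length):
    """Compute bases mapped, sum of depths, breadth and depth for one contig."""
    bases_mapped = 0
    sum_of_depths = 0
    for line in depth_lines:
        _contig, _pos, depth = line.rstrip("\n").split("\t")
        depth = int(depth)
        if depth > 0:            # zero-depth rows appear when -a is used
            bases_mapped += 1
            sum_of_depths += depth
    return {
        "bases_mapped": bases_mapped,
        "sum_of_depths": sum_of_depths,
        "breadth": bases_mapped / contig_length,
        "depth": sum_of_depths / contig_length,
    }


lines = ["chr1\t1\t2", "chr1\t2\t4", "chr1\t5\t0"]
metrics = coverage_metrics(lines, contig_length=10)
```

Both ratios divide by the full contig length, so positions without any reads lower the breadth and the depth alike.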
run_verifybamid(genotype_file, outprefix, outdir=None)

Run VerifyBAMID to check for sample swap or contamination issues

Parameters:
genotype_file : filename

VCF file with the chip genotypes to use

outprefix : str

Prefix for the output files

outdir : str, optional

If provided, write the output files to this folder

Returns:
list

A list with the paths to the output files generated by VerifyBAMID

class BamQC.BamQC.CMetrics(metrics, cov_data)

Class to store the coverage metrics calculated by Picard’s CollectHsMetrics/CollectWgsMetrics on an exome or WGS BAM file

Methods

create_cov_barplot(filename[, xlim, ylim]) Create a bar plot of the counts for each coverage value, as calculated by Picard’s CollectHsMetrics or CollectWgsMetrics
print_report([filename]) Used to print a text report of data in the object
create_cov_barplot(filename, xlim=None, ylim=None)

Create a bar plot of the counts for each coverage value, as calculated by Picard’s CollectHsMetrics or CollectWgsMetrics

Parameters:
filename : filename

PDF file to write the plot.

xlim : tuple, optional

Set the X-axis limit

ylim : tuple, optional

Set the Y-axis limit

print_report(filename=None)

Used to print a text report of data in the object

Parameters:
filename : filename, optional

Filename to write the report. The default is STDOUT.

class BamQC.BamQC.Chk_indel(RG, ins_in_short_homopolymer, del_in_short, ins_in_long, del_in_long, outcome=None)

Class to store information on the ratio of short insertions and deletions calculated by running Heng Li’s chk_indel_rg

Methods

calc_ratio() Calculate the ratio ins-in-short-homopolymer/del-in-short and check whether it is > 5
calc_ratio()

Calculate the ratio ins-in-short-homopolymer/del-in-short and check whether it is > 5

Returns:
str

PASS or FAILED, depending on the outcome of the test
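
The check can be sketched as a simple ratio against a threshold. The sketch assumes that a ratio above 5 maps to FAILED and anything else to PASS; the text above does not say which side of the threshold fails, so treat that mapping as an assumption. `indel_ratio_check` and its `threshold` parameter are hypothetical.

```python
def indel_ratio_check(ins_in_short_homopolymer, del_in_short, threshold=5.0):
    """Return (ratio, outcome); outcome is FAILED when ratio > threshold (assumed)."""
    if del_in_short == 0:
        # The ratio is undefined without deletions; this sketch refuses to guess.
        raise ValueError("del_in_short must be non-zero")
    ratio = ins_in_short_homopolymer / del_in_short
    outcome = "FAILED" if ratio > threshold else "PASS"
    return ratio, outcome


ratio, outcome = indel_ratio_check(120, 40)
high_ratio, high_outcome = indel_ratio_check(600, 100)
```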

class BamQC.BamQC.SDepth(contig=None, mapped=None, breadth=None, depth=None, length=None, sum_of_depths=None, max=None)

Class to store coverage metrics on a Whole Genome Sequencing BAM file calculated using SAMtools depth

Methods

print_report([filename]) Used to print a text report of data in the object
print_report(filename=None)

Used to print a text report of data in the object

Parameters:
filename : filename, optional

Filename used to write the report. The default is STDOUT.