CNV detection in capture-based sequencing

Copy number variations (CNVs) have been well studied for their association with tumorigenesis, and numerous targeted therapies have been approved that act on specific CNVs harbored in tumor cells. Here we cover some basics of CNVs and their detection.

It is believed that CNVs promote tumorigenesis by altering the expression of oncogenes and tumor suppressors, for example through the amplification of an oncogene or the deletion of a tumor suppressor. Interestingly, far more amplifications than deletions have been identified as inducing cancer.

Three methods for CNV detection

  • Split reads: Looking for individual reads that span a deletion or insertion breakpoint

  • Read pairs: Looking for read pairs that map an improbable distance apart (for deletions), map to different contigs (for structural variations), or both map to the same strand (for inversions). However, this strategy does not help detect CNVs produced by aneuploidy.

  • Read depth: Looking for regions that have too much or too little read depth compared to a reference sample.

All three methods above work best for WGS data. However, other more cost-effective sequencing techniques, such as capture-based sequencing (WES, amplicon), are often preferable in practice. Can any of these three methods work for these techniques?

  • Split reads: Not applicable because breakpoints may reside in off-target regions.

  • Read pairs: Same as above.

  • Read depth: Works best for WGS because WGS has good depth uniformity. Still, it can work for capture-based sequencing. The problem is that read depth is biased by several factors, including probe efficiency in regions with different GC content, the mappability of each region, the DNA concentration, and even the hybridization temperature.

How can we mitigate these biases and make the read depth method work on capture-based sequencing?

CNV detection in tumor sample

Processing the tumor sample and a normal control together subjects both samples to the same biases, so the biases largely cancel out when the two samples are compared.
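To make the cancellation concrete, here is a minimal sketch with made-up numbers. It assumes a simple multiplicative model (observed depth = copy number × region bias × sample-wide depth per copy), which is an illustrative assumption rather than something any particular tool guarantees. All names and values are hypothetical.

```python
import math
import statistics

# Hypothetical true copy numbers per target region (2 = diploid).
tumor_copies = [2, 2, 4, 2, 1]
normal_copies = [2, 2, 2, 2, 2]

# Hypothetical per-region capture bias (GC content, probe efficiency, ...).
# Both samples share it because they go through the same protocol.
region_bias = [0.5, 1.8, 0.7, 1.2, 1.0]

def observed_depth(copies, bias, depth_per_copy):
    """Assumed model: depth = copies x region bias x sample-wide depth per copy."""
    return [c * b * depth_per_copy for c, b in zip(copies, bias)]

tumor_depth = observed_depth(tumor_copies, region_bias, depth_per_copy=60)
normal_depth = observed_depth(normal_copies, region_bias, depth_per_copy=50)

def log2_ratios(tumor, normal):
    """Per-region log2(tumor / normal), centered on the median ratio."""
    raw = [t / n for t, n in zip(tumor, normal)]
    baseline = statistics.median(raw)  # absorbs the overall depth difference
    return [math.log2(r / baseline) for r in raw]

ratios = log2_ratios(tumor_depth, normal_depth)
for region, (depth, ratio) in enumerate(zip(tumor_depth, ratios)):
    print(f"region {region}: tumor depth {depth:6.1f}, log2 ratio {ratio:+.2f}")
```

In this toy data the raw tumor depth of the copy-neutral region 1 is higher than that of the amplified region 2, purely because of the bias term. The region bias appears in both numerator and denominator, so only the copy-number change survives in the log2 ratio.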

Example (VarScan)

  • Calculate the depth at each position in the normal and tumor samples.

  • To set initial segments, look for positions where the ratio of tumor depth to normal depth changes significantly between the previous position (-1) and the current position (0). Fisher's exact test is used to assess significance. For example:

Position | N   | T
0        | 100 | 200
-1       | 100 | 100

  • Calculate the log ratio, adjusted for GC content, and classify each segment as amp, neutral or del based on a defined log-ratio threshold.

  • Apply circular binary segmentation (CBS) to produce segmented calls delineated by significant change-points of at least three standard deviations.

  • Merge adjacent segments of similar copy number from the CBS algorithm (done with an internally developed Perl script).
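The significance test and the classification step above can be sketched as follows. This is an illustration under assumptions, not VarScan's actual code: it reads the example table as a 2×2 contingency table of depths (rows are positions, columns are normal and tumor), implements a two-sided Fisher's exact test with the standard library, and uses hypothetical log-ratio thresholds of ±0.25. The GC adjustment and CBS steps are left out.

```python
from fractions import Fraction
from math import comb, log2

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one.
    tail = sum(p for p in map(prob, range(lowest, highest + 1)) if p <= observed)
    return float(tail)

def classify(tumor_depth, normal_depth, amp_threshold=0.25, del_threshold=-0.25):
    """Label a segment from its log2 depth ratio (thresholds are hypothetical)."""
    ratio = log2(tumor_depth / normal_depth)
    if ratio >= amp_threshold:
        return "amp"
    if ratio <= del_threshold:
        return "del"
    return "neutral"

# Depths from the example table: position 0 is (N=100, T=200), position -1 is (N=100, T=100).
p_value = fisher_exact_two_sided(100, 200, 100, 100)
print(f"p-value for a change between positions -1 and 0: {p_value:.2g}")
print("position 0:", classify(200, 100))
print("position -1:", classify(100, 100))
```

Exact fractions are used so that the comparison against the observed table's probability does not depend on floating-point rounding. In practice the raw depths would first be normalized for the overall difference in sequencing depth between the two samples.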

CNV detection in normal sample

To detect CNVs in a normal sample, a control sample is not necessary. Instead, you can apply principal component analysis (PCA) to get rid of artifacts that may bias read depth.

XHMM is optimized for calling germline CNVs from a large number of samples (a minimum of 30). However, it also allows somatic calling by running an algorithm that specializes in somatic changes.

  • Calculate the mean depth of each exon in the normal and tumor samples.

  • Mean-center the depths and filter out exons with extreme depth.

  • Run PCA and remove the top principal components (PCs), i.e. the artifacts that contribute most to depth bias.

  • Use a hidden Markov model (HMM) to merge contiguous exons with similar depth and call CNVs.

Note that, because of the nature of the HMM, this approach is only good at calling long CNVs. It is therefore generally not suitable for amplicon sequencing that covers only a small number of regions.
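The mean-centering and PCA steps can be illustrated with a small pure-Python sketch. It is not XHMM's implementation: the depth matrix is invented, only one component is removed, the leading component is found by power iteration rather than a full decomposition, and the exon filtering and HMM steps are skipped. All function names are hypothetical.

```python
import math
import random

def mean_center(depth):
    """Subtract each exon's mean depth (column mean) across samples."""
    n_samples = len(depth)
    means = [sum(col) / n_samples for col in zip(*depth)]
    return [[d - m for d, m in zip(row, means)] for row in depth]

def top_component(centered, iterations=200, seed=0):
    """Unit vector of the leading principal component, by power iteration."""
    rng = random.Random(seed)
    vec = [rng.gauss(0, 1) for _ in centered[0]]
    for _ in range(iterations):
        # Multiply by X^T X without forming the covariance matrix.
        scores = [sum(x * v for x, v in zip(row, vec)) for row in centered]
        vec = [sum(s * row[j] for s, row in zip(scores, centered))
               for j in range(len(vec))]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

def remove_component(centered, vec):
    """Subtract each sample's projection onto the given component."""
    cleaned = []
    for row in centered:
        score = sum(x * v for x, v in zip(row, vec))
        cleaned.append([x - score * v for x, v in zip(row, vec)])
    return cleaned

# Rows are samples, columns are exons (made-up mean depths).
# The first four samples differ only by a shared artifact pattern;
# the last sample carries a simulated deletion at exon index 2.
depth = [
    [130, 70, 120, 80],
    [70, 130, 80, 120],
    [115, 85, 110, 90],
    [85, 115, 90, 110],
    [100, 100, 50, 100],
]

centered = mean_center(depth)
pc1 = top_component(centered)
cleaned = remove_component(centered, pc1)
for sample, row in enumerate(cleaned):
    print(sample, [round(value, 1) for value in row])
```

In this toy matrix, removing the first component shrinks the shared artifact pattern while the simulated deletion remains the most negative residual. Part of the deletion signal is absorbed by the removed component as well, which hints at why this kind of normalization benefits from many samples.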