CNVision is designed to detect and score Copy Number Variants (CNVs) from Illumina SNP genotyping data. It runs in a UNIX environment and works with all Illumina chips (from 300k to the latest Omni). CNVs are predicted using PennCNV, QuantiSNPv2.3, and GNOSIS (a built-in algorithm). The predicted CNVs are merged, joined (if appropriate), and scored based on the per-SNP variability in the raw genotyping data. CNVision can also identify de novo CNVs in family-based data using the per-SNP variability algorithm. Comparison with 1000 Genomes, the Genome Structural Variation (GSV) Consortium, and replicate Illumina data demonstrates the efficacy of the CNV scoring method for both inherited and de novo CNVs.
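The merging step described above can be illustrated with a minimal sketch. This is not CNVision's actual code: the `Call` record, the `merge_calls` function, and the rule "merge calls of the same type on the same chromosome when their coordinates overlap" are all assumptions made for illustration, and the coordinates are invented.

```python
from collections import namedtuple

# Hypothetical record for one predicted CNV (field names are assumptions).
Call = namedtuple("Call", "chrom start end cn_type caller")


def merge_calls(calls):
    """Merge overlapping calls of the same type on the same chromosome.

    Returns a list of dicts, each recording the merged interval and the
    set of callers that contributed to it.
    """
    merged = []
    # Sorting by chromosome, type, then start puts mergeable calls next to each other.
    for call in sorted(calls, key=lambda c: (c.chrom, c.cn_type, c.start)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["chrom"] == call.chrom
            and last["cn_type"] == call.cn_type
            and call.start <= last["end"]
        ):
            # Overlap with the previous interval: extend it and note the caller.
            last["end"] = max(last["end"], call.end)
            last["callers"].add(call.caller)
        else:
            merged.append(
                {
                    "chrom": call.chrom,
                    "start": call.start,
                    "end": call.end,
                    "cn_type": call.cn_type,
                    "callers": {call.caller},
                }
            )
    return merged


# Invented example calls from the three prediction algorithms.
calls = [
    Call("chr1", 100, 500, "del", "PennCNV"),
    Call("chr1", 400, 900, "del", "QuantiSNP"),
    Call("chr1", 450, 700, "dup", "GNOSIS"),
    Call("chr2", 100, 200, "del", "GNOSIS"),
]
merged = merge_calls(calls)
```

In this sketch the two overlapping chr1 deletions collapse into one interval supported by two callers, while the duplication and the chr2 deletion stay separate. How CNVision decides when to join neighbouring calls, and how it scores them, is described in the paper rather than here.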

CNVision was written to analyze the Simons Simplex Collection (SSC) autism data. A full description of the methods is given in the following paper, which can be used to reference CNVision:

CNVision can be downloaded here:


Identity check

Managing large genomic datasets requires accurate verification of sample identity. This script rapidly identifies all BAM files and Illumina SNP genotyping FinalReports on a cluster, generates a SNP barcode from each one, and uses BLAT to identify duplicates and/or matches. It runs directly on aligned, indexed BAM files and on FinalReports (hg18 or hg19 in both cases). Cross-platform (BAM to FinalReport) and cross-genome-build (hg18 to hg19) comparisons are handled automatically.
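The idea of matching samples by SNP barcode can be sketched as follows. This is an illustration only and differs from the script itself, which uses BLAT: here the barcodes are compared pairwise by simple concordance. The barcode encoding (one character per SNP, with "N" for a missing call), the function names, the 0.95 threshold, and the sample names are all assumptions.

```python
def concordance(barcode_a, barcode_b, missing="N"):
    """Fraction of positions with identical calls, skipping missing calls."""
    if len(barcode_a) != len(barcode_b):
        raise ValueError("barcodes must cover the same SNP panel")
    compared = 0
    matches = 0
    for a, b in zip(barcode_a, barcode_b):
        if a == missing or b == missing:
            continue  # no information at this SNP for at least one sample
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def find_matches(barcodes, threshold=0.95):
    """Return (name_1, name_2, score) for every pair at or above the threshold."""
    names = sorted(barcodes)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = concordance(barcodes[first], barcodes[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Invented barcodes: one sample seen on two platforms, plus an unrelated sample.
barcodes = {
    "sampleA.bam": "AABBABNBAA",
    "sampleA.FinalReport": "AABBABABAA",
    "sampleB.bam": "BBABAAABBB",
}
matches = find_matches(barcodes)
```

An all-against-all comparison like this scales quadratically with the number of samples, which is presumably one reason a sequence-search tool such as BLAT is attractive for large collections.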

Details and instructions are available here:


UNIX treasure hunt tutorial

This Perl script installs a series of directories and clues that teach basic UNIX command line skills, including cd, ls, grep, less, head, tail, and nano. Run the script from the command line on a UNIX-based machine (e.g. Mac or Linux) using the command: perl followed by the script name. Then use 'ls' to find the first clue. A PDF of command line commands is also available to download: