From Bioinformatics Core Wiki
HumMeth27QCReport is an R package that provides a quick overview of the quality of Illumina’s Infinium BeadChip methylation assays. This project was developed as a collaboration between the CRG Genotyping Unit and the CRG Bioinformatics Core.
The HumMeth27QCReport R package can be downloaded from the CRAN repository.
To make the package easier for wet-lab researchers to use, ad hoc scripts were developed to run it in the Galaxy workbench. They can be downloaded from the Galaxy Tool Shed.
DNA methylation is an epigenetic mechanism that, in vertebrates, occurs most frequently at cytosines followed by guanines (CpG). This modification regulates gene expression and can be inherited through cell division, making it essential for preserving tissue identities and guiding normal cellular development [1]. Hypermethylation of CpG islands located in the promoter regions of tumor suppressor genes has been firmly established as one of the most common mechanisms for gene regulation in cancer [2,3].
As interest in the human DNA methylome has grown, several methods have been developed to detect cytosine methylation on a genomic scale. Among these, Illumina’s Infinium Methylation Assay is a hybridization-based technique that offers quantitative methylation measurements at the single-CpG-site level, with results as accurate as those of sequencing-based methylation assays (e.g. MethylCap-seq, MeDIP-seq, RRBS) [4]. The microarray-based Illumina Infinium methylation assay has recently been used in epigenomic studies [5-7] because of its high throughput, good accuracy, small sample requirement and relatively low cost. To date, the available Illumina Infinium platforms for methylation analysis are the HumanMethylation27 BeadChip, with 27,578 CpG sites covering >14,000 genes, and the newer HumanMethylation450 BeadChip, comprising >450,000 methylation sites.
To estimate the methylation status, the Illumina Infinium assay uses a pair of probes (a methylated probe and an unmethylated probe) to measure the intensities of the methylated and unmethylated alleles at the interrogated CpG site [8]. The methylation level is then estimated from the measured intensities of this pair of probes.
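How a methylation level can be derived from the two probe intensities is sketched below. The sketch assumes the commonly used definitions compared by Du et al. (see the reference list): the Beta value is the methylated intensity divided by the total intensity plus an offset of 100, and the M-value is the log2 ratio of the two intensities with an offset of 1. The function names and the toy intensities are illustrative and are not part of HumMeth27QCReport.

```r
# Toy intensities for three CpG sites (illustrative values only)
meth   <- c(9000, 500, 3000)   # methylated probe intensity
unmeth <- c(1000, 8500, 3000)  # unmethylated probe intensity

# Beta value: fraction of methylated signal; the offset stabilizes low intensities
beta_value <- function(m, u, offset = 100) {
  pmax(m, 0) / (pmax(m, 0) + pmax(u, 0) + offset)
}

# M-value: log2 ratio of methylated to unmethylated intensity
m_value <- function(m, u, alpha = 1) {
  log2((pmax(m, 0) + alpha) / (pmax(u, 0) + alpha))
}

beta <- beta_value(meth, unmeth)
mval <- m_value(meth, unmeth)
print(round(beta, 3))
print(round(mval, 3))
```

Beta values stay between 0 and 1 and are easy to read as a methylation fraction, while M-values are unbounded and centered on 0 for a half-methylated site.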
HOW TO INSTALL
From the R command line, type:
R> install.packages("HumMeth27QCReport", repos="http://cran.r-project.org", dependencies=TRUE, type="source")
HOW TO RUN
To run the example included in the package, type at the R command line:
R> Dir <- system.file("extdata/", package="HumMeth27QCReport")
R> ImportDataR <- ImportData(Dir)
R> normMvalues <- NormCheck(ImportDataR, platform="Hum27", pval=0.05, ChrX=FALSE, ClustMethod="euclidean")
* Dir is a character string giving the path of the directory that contains the input files. All output files are also written to this directory.
* platform is the type of Illumina Infinium BeadChip methylation assay. This must be one of "Hum27" (Infinium HumanMethylation27 BeadChip) or "Hum450" (Infinium HumanMethylation450 BeadChip).
* pval is the p-value threshold used to decide which samples are kept for normalization and the subsequent analysis.
* ChrX is a logical value indicating whether the CpGs that belong to the X chromosome should be removed from normalization and the subsequent steps. The default is FALSE.
* ClustMethod is the distance measure to be used for the clustering. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman" or "kendall".
To run the QC analysis on your own data, set the Dir variable to the directory where your data are stored (e.g. Dir <- "C:/Analysis/data").
If you are interested in only one of the three available functions, type:
* to export only the internal controls, as suggested by Illumina's guidelines:
R> ControlResults <- getAssayControls(ImportDataR, platform="Hum27")
* to run the other QC analyses, such as the distribution of Beta values or the average p-value:
R> QCresults <- QCCheck(ImportDataR, pval=0.05)
* to export the normalized M-values and generate the PCA and hierarchical clustering plots:
R> normMvalues <- NormCheck(ImportDataR, platform="Hum27", pval=0.05, ChrX=FALSE, ClustMethod="euclidean")
HumMeth27QCReport takes as input three files exported from BeadStudio, plus an optional text file listing the chip control samples to discard from the normalization step:
- Sample table: the file name must contain the word "Sample" (case sensitive) and none of the other reserved words (example for platform 27k) (example for platform 450k)
- Control table: the file name must contain the word "Control" (case sensitive) and none of the other reserved words (example for platform 27k) (example for platform 450k)
- BetaAverage table: the file name must contain the word "AvgBeta" (case sensitive) and none of the other reserved words (example for platform 27k) (example for platform 450k)
- Discard.txt: the file must have exactly this name
NOTE: all data were obtained in the CRG Genotyping Unit (now CRG Genomics Unit).
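Because the input files are recognized by their names, it can help to check a data directory before starting the analysis. The helper below is a hypothetical sketch in base R, not a function of HumMeth27QCReport. It assumes that exactly one file should match each reserved word and that no file name should contain more than one of them; the demo file names are made up.

```r
# Hypothetical helper: check the file naming rules in a data directory
check_input_files <- function(dir) {
  files <- list.files(dir)
  keywords <- c(Sample = "Sample", Control = "Control", AvgBeta = "AvgBeta")
  # For each reserved word, collect the file names containing it (case sensitive)
  hits <- lapply(keywords, function(k) files[grepl(k, files, fixed = TRUE)])
  one_each <- vapply(hits, function(h) length(h) == 1, logical(1))
  # Count how many reserved words each file name contains
  n_keywords <- vapply(files, function(f) {
    sum(vapply(keywords, function(k) grepl(k, f, fixed = TRUE), logical(1)))
  }, integer(1))
  list(ok = all(one_each) && !any(n_keywords > 1),
       found = hits,
       discard_present = "Discard.txt" %in% files)
}

# Usage with a temporary directory and empty placeholder files
d <- tempfile("meth_qc_demo_")
dir.create(d)
invisible(file.create(file.path(d, c("Run1_Sample_Table.txt",
                                     "Run1_Control_Table.txt",
                                     "Run1_AvgBeta_Table.txt"))))
res <- check_input_files(d)
print(res$ok)
print(res$discard_present)
```

If the check passes, the same directory path can be assigned to Dir.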
Sample table
Required columns from BeadStudio:
| Index | Sample ID | Sample Group | Sentrix Barcode | Sample Section | Detected Genes (0.01) | Detected Genes (0.05) | Signal Average GRN | Signal Average RED | Signal P05 GRN | Signal P05 RED | Signal P25 GRN | Signal P25 RED | Signal P50 GRN | Signal P50 RED | Signal P75 GRN | Signal P75 RED | Signal P95 GRN | Signal P95 RED | Sample_Well | Sample_Plate |
Control table
Required columns from BeadStudio (<Sn> = Sample Name):
Required controls (rows):
- BISULFITE CONVERSION (4 rows)
- EXTENSION (4 rows)
- HYBRIDIZATION (3 rows)
- NEGATIVE (16 rows)
- NON-POLYMORPHIC (4 rows)
- SPECIFICITY (4 rows)
- STAINING (4 rows)
- TARGET REMOVAL
Average Beta table
Required columns from BeadStudio (<Sn> = Sample Name):
Discard.txt
Text file listing the names of the samples you want to discard from normalization, one sample per row, using the same names as in the Sample table. Typical entries are control samples, such as unmethylated samples, that were run only to check that the chip worked properly.
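As an illustration of the expected layout, the sketch below writes a small Discard.txt and compares it with a list of sample names. All sample names are made up, and the matching shown here is only an assumption about how such a list can be checked by hand; it does not reproduce the package's own code.

```r
# Write an example Discard.txt (one sample name per row; names are made up)
discard_file <- file.path(tempdir(), "Discard.txt")
writeLines(c("Unmeth_Ctrl_1", "Meth_Ctrl_1"), discard_file)

# Sample names as they would appear in the Sample table (illustrative)
sample_ids <- c("Tumor_01", "Normal_01", "Unmeth_Ctrl_1", "Meth_Ctrl_1")

# Read the list, trimming whitespace and dropping blank lines
to_discard <- trimws(readLines(discard_file))
to_discard <- to_discard[nzchar(to_discard)]

# Names that match no sample usually point to a typo in Discard.txt
unmatched <- setdiff(to_discard, sample_ids)
kept <- setdiff(sample_ids, to_discard)
print(kept)
print(unmatched)
```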
HumMeth27QCReport produces several plots (saved as PDF files) to assess the quality of the samples:
- a histogram for each internal control.
- an intensity plot for each sample, produced with the "plotSampleIntensities" function of the methylumi package.
- a histogram of the percentage of non-detected CpGs (that is, the CpGs that have a detection p-value greater than 0.05 or 0.01).
- a histogram of the average p-value for each sample.
- a PCA of normalized Beta values
- a hierarchical clustering of normalized Beta values
As further outputs, a text file with the normalized M-values and an Excel file are provided. The Excel file contains a summary of the internal controls and of the gene detection, together with lists of non-detected CpGs.
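The percentage of non-detected CpGs can be illustrated with a toy matrix of detection p-values. This is a minimal sketch in base R under the definition given in the list above (a CpG is non-detected when its detection p-value exceeds the cut-off). The matrix values and the function name are illustrative only.

```r
# Toy detection p-values: 5 CpGs (rows) x 3 samples (columns); values are made up
pvals <- matrix(c(0.001, 0.200, 0.004,
                  0.030, 0.008, 0.002,
                  0.000, 0.060, 0.020,
                  0.002, 0.001, 0.003,
                  0.700, 0.009, 0.005),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("cg", 1:5), c("S1", "S2", "S3")))

# Percentage of CpGs per sample whose detection p-value exceeds a cut-off
pct_non_detected <- function(p, cutoff) 100 * colMeans(p > cutoff)

pct_05 <- pct_non_detected(pvals, 0.05)
pct_01 <- pct_non_detected(pvals, 0.01)
print(rbind("p > 0.05" = pct_05, "p > 0.01" = pct_01))
```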
This section describes the controls used in the Illumina Infinium Methylation Assay for the 27k example data, their expected outcomes, and how to view them. Diagrams are included with descriptions for sample-independent and sample-dependent controls as well as controls that are specific to the green channel or red channel. The sample-independent controls let you evaluate the quality of specific steps in the process flow, and include:
- Staining controls
- Extension controls
- Target removal controls
- Hybridization controls
The sample-dependent controls let you evaluate performance across samples, and include:
- Bisulfite conversion controls
- Specificity controls
- Negative controls
- Non-polymorphic (NP) controls
Figure 1: Barplot of DNP staining control
This figure represents the ratio (%) between background and signal for Staining control in the red channel (DNP). Staining controls are used to examine the efficiency of the staining step in both the red and green channels. Staining controls have dinitrophenyl (DNP) or biotin attached to the beads. The ratios should result in low signal, indicating that the staining step was efficient.
Figure 2: Barplot of Biotin staining control
This figure represents the ratio (%) between background and signal for Staining control in the green channel (Biotin). These controls are independent of the hybridization and extension step. The ratios should result in low signal, indicating that the staining step was efficient.
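The background-to-signal ratio used in these barplots can be sketched as follows. The exact formula applied by the package is not stated here, so the sketch assumes the ratio is the background intensity expressed as a percentage of the signal intensity. The intensities and the 50% demonstration cut-off are made up.

```r
# Toy staining-control intensities for three samples (made-up numbers)
signal     <- c(S1 = 12000, S2 = 9500, S3 = 800)  # high-intensity control bead
background <- c(S1 = 300,   S2 = 400,  S3 = 600)  # background control bead

# Assumed definition: background expressed as a percentage of signal
ratio_pct <- 100 * background / signal
print(round(ratio_pct, 1))

# A ratio approaching 100% means the signal is barely above background
suspicious <- names(ratio_pct)[ratio_pct > 50]  # 50% is an arbitrary demo cut-off
print(suspicious)
```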
Figure 3: Barplot of hybridization control
This figure represents the ratio (%) between background and signal for Hybridization controls in the green channel for three concentrations. The hybridization controls test the overall performance of the entire assay using synthetic targets instead of amplified DNA. These synthetic targets complement the sequence on the array perfectly, allowing the probe to extend on the synthetic target as template. The synthetic targets are present in the hybridization buffer at three levels, monitoring the response from high-concentration (5 pM), medium-concentration (1 pM), and low-concentration (0.2 pM) targets. All bead type IDs should result in signal with various intensities, corresponding to the concentrations of the initial synthetic targets.
Figure 4: Barplot of target removal control
This figure represents the intensity value for Target removal controls in the green channel. Target removal controls test the efficiency of the stripping step after the extension reaction. The control oligos are extended using the probe sequence as template. This process generates labeled targets. The probe sequences are designed such that extension from the probe does not occur. All target removal controls should result in low signal, indicating that the targets were removed efficiently after extension. Values < 3400 have been observed (108 samples). Illumina does not specify a range; this value is based on previous experiments run in our facility.
Figure 5: Barplot of extension control: green channel
This figure represents the ratio (%) between background and signal for Extension control in the green channel (C,G). Extension controls test the extension efficiency of A, T, C, and G nucleotides from a hairpin probe, and are therefore sample-independent. The ratios should result in low signal, indicating that the extension was efficient.
Figure 6: Barplot of extension control: red channel
This figure represents the ratio (%) between background and signal for Extension control in the red channel (A,T). The ratios should result in low signal, indicating that the extension was efficient.
Figure 7: Barplot of bisulfite control
This figure represents the ratio (%) between background and signal for the Bisulfite conversion control. The Bisulfite conversion control assesses the efficiency of bisulfite conversion of the genomic DNA. The Infinium Methylation probes query a [C/T] polymorphism created by bisulfite conversion of two different Hind III sites [AAGCTT] in the genome. If the bisulfite conversion reaction was successful, the "C" (Converted) probes will match the converted sequence and get extended. If the sample has unconverted DNA, the "U" (Unconverted) probes will get extended. There are no underlying C bases in the primer landing sites, except for the query site itself. Performance of bisulfite conversion controls should only be monitored in the green channel. The ratios should result in low signal, indicating that the bisulfite conversion was efficient.
Figure 8: Barplot of specificity control (mismatch 1) in red channel
This figure represents the ratio (%) between background (MM) and signal (PM) for Specificity controls in the red channel. In the Infinium Methylation assay, the methylation status of a particular cytosine is determined following bisulfite treatment of DNA by using query probes for the unmethylated and methylated state of each CpG locus. In assay oligo design, the A/T match corresponds to the unmethylated status of the interrogated C, and the G/C match corresponds to the methylated status of C. G/T mismatch controls check for non-specific detection of methylation signal over unmethylated background. Specificity controls are designed against non-polymorphic T sites. PM controls correspond to A/T perfect match and should give high signal. MM controls correspond to G/T mismatch and should give low signal. The ratios should result in low signal, indicating that the performance of the assay was efficient.
Figure 9: Barplot of specificity control (mismatch 2) in green channel
This figure represents the ratio (%) between background (MM) and signal (PM) for Specificity controls in the green channel. PM controls correspond to A/T perfect match and should give high signal. MM controls correspond to G/T mismatch and should give low signal. The ratios should result in low signal, indicating that the performance of the assay was efficient.
Figure 10: Barplot of negative control
This figure represents the intensity value for the Negative control. Negative control probes are randomly permuted sequences that should not hybridize to the DNA template. Negative controls are particularly important for methylation studies because of a decrease in sequence complexity after bisulfite conversion. The mean signal of these probes defines the system background. This is a comprehensive measurement of background, including signal resulting from cross-hybridization, as well as non-specific extension and imaging system background. All negative controls should result in low signal. Values < 2500 have been observed (108 samples). Illumina does not specify a range; this value is based on previous experiments run in our facility.
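The background estimate described for the negative controls can be sketched with simulated intensities. The sketch takes the mean over 16 negative probes per sample and compares it with the value of 2500 quoted in the text. The simulated data and the decision to flag samples at or above that value are assumptions for illustration, not documented package behavior.

```r
# Toy negative-control intensities: 16 probes (rows) x 3 samples (columns)
set.seed(1)
neg <- cbind(S1 = rnorm(16, mean = 400,  sd = 50),
             S2 = rnorm(16, mean = 900,  sd = 80),
             S3 = rnorm(16, mean = 3200, sd = 150))

# The mean over the negative probes is taken as the background of each sample
background <- colMeans(neg)

# Flag samples at or above the facility-derived reference value
limit <- 2500
flagged <- names(background)[background >= limit]
print(round(background))
print(flagged)
```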
Figure 11: Barplot for green channel of non-polymorphic control
This figure represents the ratio (%) between background and signal for the Non-Polymorphic control in the green channel. Non-polymorphic controls test the overall performance of the assay, from amplification to detection, by querying a particular base in a non-polymorphic region of the bisulfite genome. They let you compare assay performance across different samples. One non-polymorphic control has been designed to query each of the four nucleotides (A, T, C and G). The target with the C base results from querying the opposite whole genome amplified strand generated from the converted strand. The ratios should result in low signal, indicating that the performance of the assay was efficient.
Figure 12: Barplot for red channel of non-polymorphic control
This figure represents the ratio (%) between background and signal for Non-Polymorphic control in the red channel. The ratios should result in low signal, indicating that the performance of the assay was efficient.
Figure 13: Intensity at high and low betas
Depending on the number of samples, there may be more than one figure. For each sample, the intensity at high and low betas is shown. The intensities as output by the GenomeStudio software often show a considerable amount of dye bias. This is a graphical example of this dye bias. In short, for each of the Cy3 and Cy5 channels, a cutoff in beta is used to calculate which Cy3 and Cy5 values should be plotted at high-methylation and low-methylation status. Any offset between Cy3 and Cy5 when plotted in this way likely represents dye bias and will lead to biases in the estimate of beta.
Figure 14: Barplot of percentages of non detected genes
This figure represents the percentage (%) of non-detected genes at p-value cut-offs of 0.05 and 0.01. Non-detected genes are the CpGs with no significant AverageBeta.
Figure 15: Barplot of average detection p-values
The barplot shows the average p-value for each sample; the red dotted line is the threshold defined by the user to select the samples for the following analysis.
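The sample selection described here can be sketched with a toy matrix of detection p-values. The sketch assumes that a sample is kept when its average detection p-value is below the user-defined threshold (the pval argument). The direction of the comparison and the data are assumptions for illustration.

```r
# Toy detection p-values: 4 CpGs (rows) x 3 samples (columns)
pvals <- cbind(S1 = c(0.001, 0.002, 0.010, 0.003),
               S2 = c(0.200, 0.150, 0.090, 0.300),
               S3 = c(0.020, 0.001, 0.004, 0.015))

pval_threshold <- 0.05     # plays the role of the pval argument
avg_p <- colMeans(pvals)   # one average detection p-value per sample

# Assumption: a sample is kept when its average p-value is below the threshold
keep <- names(avg_p)[avg_p < pval_threshold]
print(round(avg_p, 4))
print(keep)
```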
Figure 16: Principal Component Analysis
The PCA is computed on filtered and normalized data.
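A PCA of this kind can be sketched in base R with prcomp. The matrix below is simulated and stands in for filtered, normalized values with CpGs in rows and samples in columns. The group structure and the use of prcomp are illustrative choices and do not necessarily match the package internals.

```r
# Toy matrix of normalized values: 200 CpGs (rows) x 6 samples (columns)
set.seed(42)
mvals <- matrix(rnorm(200 * 6), nrow = 200,
                dimnames = list(NULL, paste0("S", 1:6)))
mvals[1:50, 4:6] <- mvals[1:50, 4:6] + 3   # shift 50 CpGs in samples S4-S6

# prcomp expects observations in rows, so the matrix is transposed
pca <- prcomp(t(mvals), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:2], 3))

# Sample coordinates on the first two components (what a PCA plot displays)
scores <- pca$x[, 1:2]
print(round(scores, 2))
```

In this toy data the first component separates S1-S3 from S4-S6, which is the kind of grouping or outlier pattern a QC plot is meant to reveal.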
Figure 17: Hierarchical Clustering
The clustering is computed on filtered and normalized data. The distance measure is defined by the user (ClustMethod argument).
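The effect of the distance measure can be sketched in base R. The example compares a Euclidean distance with a correlation-based distance (1 minus the Spearman correlation) on simulated data. Computing the correlation distance this way and using average linkage are assumptions for illustration and may differ from what the package does internally.

```r
# Toy matrix of normalized values: 100 CpGs (rows) x 6 samples (columns)
set.seed(7)
mvals <- matrix(rnorm(100 * 6), nrow = 100,
                dimnames = list(NULL, paste0("S", 1:6)))
mvals[41:80, 1:3] <- mvals[41:80, 1:3] + 3   # CpGs raised in samples S1-S3
mvals[1:40, 4:6]  <- mvals[1:40, 4:6] + 3    # CpGs raised in samples S4-S6

# Euclidean distance between samples (dist works on rows, hence the transpose)
d_euclid <- dist(t(mvals), method = "euclidean")

# Correlation-based distance: 1 - Spearman correlation between samples
d_spear <- as.dist(1 - cor(mvals, method = "spearman"))

hc_euclid <- hclust(d_euclid, method = "average")
hc_spear  <- hclust(d_spear,  method = "average")

# Cut each tree into two clusters and inspect the sample assignments
cl_euclid <- cutree(hc_euclid, k = 2)
cl_spear  <- cutree(hc_spear,  k = 2)
print(cl_euclid)
print(cl_spear)
```

Calling plot(hc_euclid) draws the corresponding dendrogram.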
 [1] Ladd-Acosta C, Pevsner J, Sabunciyan S, Yolken RH, Webster MJ, Dinkins T, Callinan PA, Fan JB, Potash JB, Feinberg AP: DNA methylation signatures within the human brain. Am J Hum Genet 2007, 81:1304-1315.
 [2] Esteller M: CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future. Oncogene 2002, 21(35):5427-5440.
 [3] Herman JG, Baylin SB: Gene silencing in cancer in association with promoter hypermethylation. N Engl J Med 2003, 349(21):2042-2054.
 [4] Bock C, Tomazou EM, Brinkman AB, Muller F, Simmer F, Gu H, Jager N, Gnirke A, Stunnenberg HG, Meissner A: Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 2010, 28:1106-1114.
 [5] Bell CG, Teschendorff AE, Rakyan VK, Maxwell AP, Beck S, Savage DA: Genome-wide DNA methylation analysis for diabetic nephropathy in type 1 diabetes mellitus. BMC Med Genomics 2010, 3:33.
 [6] Thirlwell C, Eymard M, Feber A, Teschendorff A, Pearce K, Lechner M, Widschwendter M, Beck S: Genome-wide DNA methylation analysis of archival formalin-fixed paraffin-embedded tissue using the Illumina Infinium HumanMethylation27 BeadChip. Methods 2010, 52(3):248-254.
 [7] Grafodatskaya D, Choufani S, Ferreira JC, Butcher DT, Lou Y, Zhao C, Scherer SW, Weksberg R: EBV transformation and cell culturing destabilizes DNA methylation in human lymphoblastoid cell lines. Genomics 2010, 95(2):73-83.
 [8] Weisenberger DJ, Berg DVD, Pan F, Berman BP, Laird PW: Comprehensive DNA Methylation Analysis on the Illumina Infinium Assay Platform. Illumina Application Note 2008.
 [9] Illumina: Chapter 4 System Controls. In Infinium HD Assay Methylation Protocol Guide. 2010:231-244.
 [10] Du P, Kibbe WA, Lin SM: lumi: a pipeline for processing Illumina microarray. Bioinformatics 2008, 24:1547-1548.
 [11] Ihaka R, Gentleman R: R: a language for data analysis and graphics. J Comput Graph Stat 1996, 5:299-314.
 [12] Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19:185-193.
 [13] Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, Lin SM: Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 2010, 11:587.
This software is distributed for non-commercial, academic use only. For any questions, please contact the author (mail).