Best practice guide for CHiCAGO protocol used in analysis of Capture Hi-C data

 9 August 2021  

Researchers from the Functional Gene Control group at the LMS have produced a set of best practices for the computational detection of chromosomal interactions in Capture Hi-C data using tools they have developed. The guide was published as an invited article in Nature Protocols on August 9th. Postdocs Dr Helen Ray-Jones (joint first author on the paper) and Dr Valeriya Malysheva, together with group leader Dr Mikhail Spivakov (joint senior authors), explore how their computational pipeline CHiCAGO aids the analysis of Capture Hi-C data, and why the publication of a ‘user guide’ is important.

1. Firstly, what is Hi-C and Capture Hi-C?
Hi-C is a widely used method for profiling chromosomal interactions. Chromosomal interactions, such as chromatin looping (which brings regulatory elements in the genome close together in 3D space), are particularly important for gene expression control. As such, profiling chromosomal interactions is important for understanding gene regulation in any biological setting.

The Hi-C method uses formaldehyde to “fix” the DNA in its 3D conformation within the nucleus, enabling DNA regions that lie close together in 3D space to be identified using next-generation sequencing. This method works very well for looking at the overall architecture of the genome, but it is not ideal for examining fine-scale interactions. This is because Hi-C samples are very complex, and so a great deal of sequencing would be required to detect individual interactions amongst the rest of the information.

This problem is addressed by Capture Hi-C, which “captures” specific regions of interest within the Hi-C sample. This enrichment step increases our power to detect fine-scale interactions involving those regions – which could be gene promoters, disease-associated loci or any other desired genomic element. We and others have widely used Capture Hi-C to detect the dynamics of regulatory chromosomal interactions in a variety of different cell types and conditions.

2. What are the challenges with analysing Capture Hi-C data and how does CHiCAGO combat them?
Capture Hi-C data has some challenging statistical properties that differ from standard Hi-C. For example, the interactions are “asymmetric”: the hundreds to thousands of captured viewpoints are far outnumbered by the potential interacting “other ends”, of which there can be millions. In addition, the regions of interest may have been captured with differing levels of success, which affects the background of the experiment. Detecting true signals in this kind of data therefore requires a specialised tool.

CHiCAGO (Capture Hi-C Analysis of Genomic Organisation) is a computational pipeline developed by our group that detects significant chromosomal interactions in Capture Hi-C data. To do so, it incorporates a bespoke statistical model, background correction and multiple testing procedures.
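
To give a rough feel for what “background correction” and “multiple testing” mean in this setting, the toy Python sketch below scores each bait-to-other-end contact against an expected background that falls off with genomic distance and is scaled by a per-bait capture efficiency, then adjusts the resulting p-values for the number of tests. It is not CHiCAGO's implementation: CHiCAGO's statistical model is considerably more elaborate and is described in the paper. The function names, the background formula, the plain Poisson test, the Benjamini-Hochberg adjustment and all numbers are assumptions chosen purely for illustration.

```python
import math


def poisson_sf(observed, expected):
    """Upper-tail probability P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)  # guard against tiny negative rounding error


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def expected_background(distance_bp, capture_efficiency):
    """Hypothetical background: decays with distance, scaled per bait."""
    return capture_efficiency * 2000.0 / (distance_bp / 1000.0 + 10.0)


def call_interactions(contacts, capture_efficiency, alpha=0.05):
    """Return (bait, distance, reads, adjusted_p, significant) per contact."""
    p_values = [
        poisson_sf(reads, expected_background(dist, capture_efficiency[bait]))
        for bait, dist, reads in contacts
    ]
    adjusted = benjamini_hochberg(p_values)
    return [
        (bait, dist, reads, adj, adj < alpha)
        for (bait, dist, reads), adj in zip(contacts, adjusted)
    ]


# Made-up example: two baits captured with different efficiency.
example_contacts = [
    ("baitA", 50_000, 80),
    ("baitA", 200_000, 12),
    ("baitB", 50_000, 20),
    ("baitB", 500_000, 15),
]
example_efficiency = {"baitA": 1.0, "baitB": 0.5}
example_calls = call_interactions(example_contacts, example_efficiency)
```

In this toy example the same read count can be unremarkable close to a well-captured bait yet stand out at long range from a poorly captured one, which is why the background has to be modelled per bait and per distance before any threshold is applied.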

3. How does the protocol help users?
The CHiCAGO protocol, published in Nature Protocols, provides an extensive guide for users, which we hope will make analysing Capture Hi-C data more accessible for scientists, particularly those who are only just starting with this technique or with bioinformatics data analysis. We also share our experience with tuning the parameter settings of the analysis pipeline to account for data sparsity and experimental design.

In addition, we have described two companion tools to CHiCAGO that users may wish to use to further explore their results. The first is Chicdiff, our recently published method for detecting chromosomal interactions that differ between conditions in Capture Hi-C data. In our Nature Protocols article, we provide detailed instructions for running it. The second tool is Peaky, a separate method for fine-mapping chromosomal interactions, which can be used to generate a smaller set of “causal” interacting fragments when the threshold for true interactions is exceeded. We collaborated with the authors of Peaky to develop a straightforward way to use their method as part of the CHiCAGO workflow.

4. Why is this protocol important?
Since its first use in 2015, the Capture Hi-C assay has attracted great attention and is now a popular technique for profiling chromosomal interactions across various research fields. CHiCAGO is now the de facto standard method for analysing Capture Hi-C data and, as such, it is important to provide a comprehensive and updated guide for users of all experience levels.

Recent changes in the preparation of Hi-C and Capture Hi-C libraries have increased the complexity of the data, which in turn can increase the data sparsity and affect the background component of CHiCAGO. We were therefore motivated to evaluate the performance of CHiCAGO on this data and to provide advice on how to tune the computational parameters to suit the user’s restriction enzyme of choice, experimental design and depth of sequencing.

5. Have you applied this technique to any research?
CHiCAGO analysis has been applied in an increasing number of studies using Capture Hi-C to link 3D chromosomal conformation and gene control. Most recently, our group has contributed to a major collaborative study on SARS-CoV-2, where we used Promoter Capture Hi-C and CHiCAGO to reveal the rewiring of chromosomal interactions upon infection.

6. Do you have any plans to develop CHiCAGO further?
CHiCAGO is now a mature tool that is used by many labs around the world. However, a long-standing challenge in the field has been the lack of reliable “gold standards” of functional enhancer-promoter contacts. Emerging data from high-throughput enhancer perturbation assays are starting to provide opportunities to validate and further improve the performance of CHiCAGO and related methods.

“Detecting chromosomal interactions in Capture Hi-C data with CHiCAGO and companion tools” was published in Nature Protocols on 9 August 2021. Read the full paper here.