LMS to host the most comprehensive atlas yet of genomic data on zebrafish

 7 July 2022  

Researchers at the MRC London Institute of Medical Sciences (LMS), the University of Birmingham, and the Karolinska Institute (Sweden) led a consortium of 27 laboratories to construct the most comprehensive atlas yet of genomic data on zebrafish, a major model organism for studying vertebrate development and disease. The atlas will help researchers better study conditions ranging from various types of cancer, including skin cancer, to heart disease and neurodegeneration. It may also help more researchers replace mammalian models in their studies.

The DANIO-CODE consortium, a multinational team of 27 laboratories, worked together to identify over 140,000 regulatory regions driving gene expression in zebrafish. To do this, the researchers catalogued published genomic data and generated complementary new data, resulting in over 1,800 genomic datasets.

Zebrafish are ideal laboratory animals for studying various diseases and disorders. They have remarkable regenerative properties, which have already provided important insights into human diseases. With the new catalogue, the field moves a significant step closer to enabling researchers around the world to rapidly pursue novel treatments and drugs, and to better understand human and animal diseases.

“I was astonished by how well the community embraced this resource, showing us that we created something useful and it’s inspired us to continue to maintain and develop it. I believe this is only the beginning and a stepping stone for many important discoveries,” says Damir Baranasic, the presenting author of the study and a member of the LMS Computational Regulatory Genomics group.

“This has been a major effort requiring coordination of researchers across the globe, for which we are all grateful to Professor Ferenc Mueller, whose perseverance, optimism, and ability to make the rest of us get things done have been the main driving force behind both the consortium and the paper. While the community will appreciate it as a comprehensive, integrated resource, for me the most exciting thing about it was the demonstration of how much more biological knowledge can be extracted from existing data using solely computational methods, including previously undescribed phenomena,” says Professor Boris Lenhard, one of the project’s lead scientists.

Zebrafish, the second most-used animal model in medical and life sciences research (after the mouse), still lacked a systematic functional annotation programme of the kind already available for other popular model organisms. This study draws on 1,802 genomic datasets to provide the broadest picture yet of candidate DNA regions relevant to transgenic breeding and to genetic research into vertebrate development and disease.

This large-scale analysis has resulted in several new biological discoveries, described in the paper. For example, the study provides new insights into regulatory subclasses of accessible chromatin elements relevant to healthy embryonic development. It also reveals how long-range regulation of gene expression is established in the early stages of embryonic development. Finally, it identifies functionally equivalent DNA regions between zebrafish and mammals even in the absence of sequence similarity. To achieve this, the researchers developed several novel methodological approaches to data analysis. These include a new way of visualising epigenomic profiles and an anchored projection tool that uncovers functional counterparts lacking sequence conservation across long evolutionary distances, namely between zebrafish and mice.

Overall, the DANIO-CODE study is an important milestone for the global zebrafish research community and for developmental biology in general. Its findings will help in designing novel experimental systems aimed at understanding the progression of human disease, embryonic development, and many other biologically relevant questions. To remain valuable, this vital resource must be continually updated by integrating new types of data, which are being generated daily.

The article was published in Nature Genetics on 4 July 2022.

Most of the work at LMS was performed by the Computational Regulatory Genomics group in collaboration with the Developmental Epigenomics group at LMS. The study was partly enabled by ZENCODE-ITN, funded by the Horizon 2020 programme. Other funding sources include projects and programmes financed by the MRC, BBSRC and Wellcome Trust. The presenting author, Damir Baranasic, holds a Rutherford Fund Fellowship.