Explore research by The Rohs Lab
Genome analysis views DNA as a linear string of the letters A, C, G, and T, but proteins recognize DNA as a three-dimensional object (Figure). Our main interest is to better understand how transcription factors (TFs) recognize nuances in intrinsic DNA structure, and to identify TF families for which the readout of local DNA shape contributes to binding specificity and explains the distinct functions of closely related TFs. We address these questions using a combination of experimental methods, biological domain knowledge, and modern computational model design techniques rooted in recent advances in artificial intelligence and deep learning. Until recently, our research focused mainly on the analysis of TF binding sites (TFBSs) for which structural information was available.
However, sequence information for whole genomes has become available in recent years due to advances in high-throughput sequencing technologies, whereas structural information on that scale is not available. It is still unknown why certain TFs bind to similar DNA sequences but execute different in vivo functions or, conversely, bind to diverse sequences. Our scientific contributions suggest that direct chemical contacts with base pairs cannot sufficiently explain binding specificity and that DNA shape is a crucial specificity determinant. We have published software tools, web servers, and databases for high-throughput prediction of DNA shape features and have shown the importance of these features in understanding TF-DNA binding events. Our lab's recent focus is on understanding TF-DNA binding through deep learning and molecular dynamics (MD) simulations. We are also exploring possible extensions of this work from TF-DNA binding to drug discovery.
Another aspect of our research is the development of models that directly capture protein-DNA interactions without being limited to standard Watson-Crick base pairs. We aim to leverage the power of deep learning to uncover intricate physicochemical binding patterns within large amounts of sequencing data.
In conjunction with the rise of AI-based methods such as AlphaFold, which predicts protein structures with previously unattainable accuracy, we are developing deep learning models for analyzing and learning from such structures. For example, our method GeoBind can predict whether a given structure is DNA/RNA binding or non-binding and, if it is binding, annotate the binding site on that structure. DeepPBS, another AI model being developed in our lab, predicts experimental binding specificity from a given protein-DNA complex structure. It is interpretable and promises diverse applications, including the design of DNA-targeted protein therapeutics and the analysis of simulation trajectories and predicted protein-DNA complex structures.