Research

Explore research by the Rohs Lab on (1) protein–DNA binding specificity, (2) drug design, (3) DNA structure, and (4) RNA structure

1.

Predicting protein–DNA binding specificity is a challenging yet essential task for understanding gene regulation. Structures of protein–DNA complexes usually show binding to a single selected DNA target site, whereas a protein binds, with varying degrees of binding specificity, to a wide range of DNA sequences. This information is not directly accessible in a single structure. Here, to access this information, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity from protein–DNA structure. DeepPBS can be applied to experimental or predicted structures. Interpretable importance scores for protein heavy atoms of interface residues can be extracted. When aggregated at the protein residue level, these scores are validated through mutagenesis experiments. Applied to designed proteins targeting specific DNA sequences, DeepPBS was demonstrated to predict experimentally measured binding specificity. DeepPBS offers a foundation for machine-aided studies that advance our understanding of molecular interactions and guide experimental designs and synthetic biology. For more details, see:

https://www.nature.com/articles/s41592-024-02372-w

2.

Recently, the remarkable growth of available crystal structure data and of libraries of commercially available or readily synthesizable molecules has unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a notable impact on early drug design efforts. Yet screening-based methods still face scalability limits, due to computational constraints and the sheer scale of drug-like space. Machine learning approaches are overcoming these limitations by learning the fundamental intra- and intermolecular relationships in drug-target systems from existing data. Here, we introduce DrugHIVE, a deep hierarchical variational autoencoder that outperforms state-of-the-art autoregressive and diffusion-based methods in both speed and performance on common generative benchmarks. DrugHIVE’s hierarchical design enables improved control over molecular generation. Its capabilities include dramatically increasing virtual screening efficiency and accelerating a wide range of common drug design tasks, including de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our highly scalable method can even be applied to receptors with high-confidence AlphaFold-predicted structures, extending the ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome. For more details, see:

https://pubs.acs.org/doi/full/10.1021/acs.jcim.4c01193

3.

Understanding the mechanisms of protein–DNA binding is critical to comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. With the Deep DNAshape method, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing an understanding of the effects of flanking regions on DNA structure in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein–DNA binding. Furthermore, when incorporated into machine learning models, the features generated by Deep DNAshape improve the models' prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies. For more details, see:

https://www.nature.com/articles/s41467-024-45191-5
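
To illustrate the k-mer based prediction that the abstract above contrasts with Deep DNAshape, here is a minimal sketch of a sliding-window lookup. It is not the Deep DNAshape implementation and not the lab's code: the function name `sliding_kmer_feature`, the table `toy_table`, and the value stored in it are hypothetical placeholders chosen only for illustration. The sketch shows the limitation the abstract describes: two sequences that share the same central pentamer receive the same value at the center, regardless of their more distant flanks.

```python
from typing import Dict, List, Optional


def sliding_kmer_feature(
    sequence: str, table: Dict[str, float], k: int = 5
) -> List[Optional[float]]:
    """Assign each position the table value of the k-mer centered on it.

    Positions too close to either end to have a full k-mer, and k-mers
    missing from the table, are reported as None.
    """
    half = k // 2
    seq = sequence.upper()
    values: List[Optional[float]] = [None] * len(seq)
    for i in range(half, len(seq) - half):
        kmer = seq[i - half : i + half + 1]
        values[i] = table.get(kmer)
    return values


# Hypothetical placeholder value, NOT a real DNA shape measurement.
toy_table = {"AATTC": 1.0}

# Same central pentamer (AATTC), different flanking bases.
seq_a = "GGAATTCGG"
seq_b = "CCAATTCTT"

center = len(seq_a) // 2
value_a = sliding_kmer_feature(seq_a, toy_table)[center]
value_b = sliding_kmer_feature(seq_b, toy_table)[center]

# A pure k-mer lookup cannot distinguish the two flanking contexts.
print(value_a == value_b)
```

Because the lookup only sees the k bases inside the window, any influence of bases outside it is lost by construction; accounting for that influence is what the abstract attributes to Deep DNAshape.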

4.

Analyzing and visualizing the tertiary structure and complex interactions of RNA is essential for mechanistically deciphering its molecular functions in vivo. Secondary structure visualization software can portray many aspects of RNA; however, these layouts are often unable to preserve topological correspondence since they do not consider tertiary interactions between different regions of an RNA molecule. Likewise, quaternary interactions between two or more interacting RNA molecules are not considered in secondary structure visualization tools. The RNAscape webserver produces visualizations that can preserve topological correspondence while remaining both visually intuitive and structurally insightful. RNAscape achieves this with a mathematical structural mapping algorithm that prioritizes the helical segments, reflecting their tertiary organization. Non-helical segments are mapped in a way that minimizes structural clutter. RNAscape runs a plotting script that is designed to generate publication-quality images. RNAscape natively supports non-standard nucleotides and multiple base-pairing annotation styles, and it requires no programming experience. RNAscape can also be used to analyze RNA/DNA hybrid structures and DNA topologies, including G-quadruplexes. Users can upload their own three-dimensional structures or enter the Protein Data Bank (PDB) ID of an existing structure. The RNAscape webserver allows users to customize visualizations through various settings as desired. For more details, see:

https://academic.oup.com/nar/article/52/W1/W354/7648766