Tools & Data
Computational Analysis Tools & Additional Data
This update of the DNAproDB database expands the number of pre-analyzed protein-DNA structures, which are automatically updated weekly. The analysis pipeline identifies water-mediated hydrogen bonds that are incorporated into the visualizations of protein-DNA complexes. Tertiary structure-aware nucleotide layouts are now available. New file formats and external database annotations are supported. The website has been redesigned, and interacting with graphs and data is more intuitive.
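A water-mediated hydrogen bond can be illustrated with a purely geometric sketch: one water oxygen sits close enough to a polar atom on the protein and a polar atom on the DNA to bridge them. The sketch assumes a single distance cutoff (3.5 Å, a common rule of thumb) and ignores angles and hydrogen positions. It is not DNAproDB's actual pipeline, and the `Atom` type and `water_bridges` function are hypothetical.

```python
import math
from dataclasses import dataclass

# ASSUMPTION: a single donor/acceptor-to-water distance cutoff in angstroms.
HBOND_CUTOFF = 3.5

@dataclass
class Atom:
    name: str     # atom name, e.g. "OD1" or "N7"
    element: str  # element symbol, e.g. "N", "O", "C"
    coord: tuple  # (x, y, z) in angstroms

def is_polar(atom):
    """Treat nitrogen and oxygen atoms as potential hydrogen-bond partners."""
    return atom.element in ("N", "O")

def water_bridges(protein_atoms, dna_atoms, water_oxygens, cutoff=HBOND_CUTOFF):
    """Return (protein atom, water index, DNA atom) triples in which one water
    oxygen lies within `cutoff` of both a protein and a DNA polar atom."""
    bridges = []
    for w_idx, water in enumerate(water_oxygens):
        near_protein = [a for a in protein_atoms
                        if is_polar(a) and math.dist(a.coord, water) <= cutoff]
        near_dna = [a for a in dna_atoms
                    if is_polar(a) and math.dist(a.coord, water) <= cutoff]
        for p in near_protein:
            for d in near_dna:
                bridges.append((p.name, w_idx, d.name))
    return bridges
```

For example, an aspartate `OD1` at the origin, a water at (2.8, 0, 0), and a guanine `N7` at (5.6, 0, 0) yield one bridge, `("OD1", 0, "N7")`; non-polar atoms near the same water are ignored.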
Mitra et al. DNAproDB: an updated database for the automated and interactive analysis of protein–DNA complexes.
Nucleic Acids Res. (2024)
Deep Predictor of Binding Specificity (DeepPBS) is a geometric deep-learning model designed to predict binding specificity from protein-DNA structure. DeepPBS can be applied to experimental or predicted structures and has been shown to predict experimentally measured binding specificity. It offers a foundation for machine-aided studies that advance our understanding of molecular interactions and guide experimental design and synthetic biology.
Mitra et al. Geometric deep learning for interpretable prediction of protein-DNA binding specificity.
Nat. Methods 21, 1674–1683 (2024)
Deep DNAshape predicts DNA shape features at the core of a DNA fragment while considering flanking regions of up to seven base pairs; the underlying model was trained on molecular simulation data. The Deep DNAshape webserver combines the benefits of the recently published Deep DNAshape method (Li et al. Nat. Commun. 15, 1243, 2024) and the earlier pentamer-based DNAshape webserver, while being accurate, fast, and accessible to all users. Further improvements include real-time detection of user input, interactive visualization tools, and different modes of analysis.
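The idea of "flanking regions of up to seven base pairs" can be made concrete by extracting, for each position, a window with up to seven bases on each side. The sketch below is only an illustration of that context window: the padding character `N`, the function name, and the choice to pad at sequence ends are assumptions, and Deep DNAshape's actual input encoding is not described here. With `flank=2`, each window is the pentamer used by the earlier pentamer-based approach.

```python
def extended_windows(seq, flank=7, pad="N"):
    """For each position i of `seq`, return the window centered on i with
    `flank` bases on each side, padded with `pad` (hypothetical placeholder)
    where the window extends past either end of the sequence."""
    if not 0 <= flank <= 7:
        raise ValueError("flank must be between 0 and 7 bp")
    padded = pad * flank + seq.upper() + pad * flank
    width = 2 * flank + 1
    # Position i of the original sequence sits at index i + flank in `padded`,
    # so its window starts at index i.
    return [padded[i:i + width] for i in range(len(seq))]
```

For example, `extended_windows("ACGT", flank=2)` returns `["NNACG", "NACGT", "ACGTN", "CGTNN"]`.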
Li et al. Deep DNAshape webserver: prediction and real-time visualization of DNA shape considering extended k-mers.
Nucleic Acids Res. 52, W354-W361 (2024)
The RNAscape webserver produces visualizations that preserve topological correspondence while remaining both visually intuitive and structurally insightful. RNAscape achieves this with a mathematical structural mapping algorithm that prioritizes helical segments, reflecting their tertiary organization. RNAscape natively supports non-standard nucleotides and multiple base-pairing annotation styles, and it requires no programming experience. RNAscape can also be used to analyze RNA/DNA hybrid structures and DNA topologies, including G-quadruplexes.
Mitra et al. RNAscape: geometric mapping and customizable visualization of RNA structure.
Top-Down Crawl (TDC) is an ultra-rapid tool designed for the alignment of k-mer level data in a rank-dependent and position weight matrix (PWM)-independent manner. As the framework only depends on the rank of the input, the method can accept input from many types of experiments without the need for specialized parameterization. Measuring the performance of the alignment using multiple linear regression with 5-fold cross-validation, we find TDC to perform as well as or better than computationally expensive PWM-based methods.
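To make "rank-dependent and PWM-independent" concrete, here is a deliberately simplified greedy alignment: the highest-ranked k-mer is the seed, and every other k-mer is shifted to the offset that gives the most identical bases with the seed. This is not the published TDC algorithm; `rank_align`, `best_offset`, and the `max_shift` parameter are hypothetical. As in the description above, only the order of the input list matters, not the magnitude of the binding scores.

```python
def best_offset(kmer, reference, max_shift):
    """Return (offset, matches) maximizing identical bases when kmer[i] is
    compared with reference[i + offset]; ties favor the smallest shift."""
    best = (0, -1)
    for off in sorted(range(-max_shift, max_shift + 1), key=abs):
        matches = sum(
            1 for i, base in enumerate(kmer)
            if 0 <= i + off < len(reference) and reference[i + off] == base
        )
        if matches > best[1]:
            best = (off, matches)
    return best

def rank_align(ranked_kmers, max_shift=2):
    """Greedy, rank-ordered alignment: k-mers are given best-first, the top one
    is the seed, and each other k-mer is placed at its best offset to the seed.
    Returns gap-padded ("-") rows of equal width."""
    seed = ranked_kmers[0]
    placed = [(seed, 0)]
    for kmer in ranked_kmers[1:]:
        off, _ = best_offset(kmer, seed, max_shift)
        placed.append((kmer, off))
    lo = min(off for _, off in placed)
    hi = max(off + len(k) for k, off in placed)
    return ["-" * (off - lo) + k + "-" * (hi - off - len(k)) for k, off in placed]
```

For example, `rank_align(["CACGTG", "ACGTGA", "TCACGT"])` returns `["-CACGTG-", "--ACGTGA", "TCACGT--"]`, stacking the shared `CACGT` core in the same columns.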
Cooper et al. Top-Down-Crawl: a method for the ultra-rapid and motif-free alignment of sequences with associated binding metrics.
Bioinformatics 38, 5121-5123 (2022)
TFBSshape is a motif database for analyzing structural profiles of transcription factor binding sites (TFBSs). This new release includes new entries from the JASPAR and UniPROBE databases, methylated TFBSs derived from in vitro high-throughput binding assays and in vivo methylated TFBSs. The structural profiles for each TFBS entry now include 13 shape features and minor groove electrostatic potential for DNA and four shape features for methylated DNA. We designed new tools for the shape-based alignment of TFBSs, for the comparison of methylated and unmethylated shape profiles, and for the design of shape-preserving nucleotide mutations in TFBSs.
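The idea of shape-preserving mutations can be sketched as an exhaustive search over single-nucleotide substitutions, scoring each one by how much it changes a predicted shape profile. The pentamer model `toy_mgw` below is a made-up placeholder, not TFBSshape or DNAshape data, and the scoring rule (sum of absolute per-position differences) is an assumption rather than the database's actual method.

```python
BASES = "ACGT"

def toy_mgw(pentamer):
    """HYPOTHETICAL pentamer model of minor groove width (angstroms):
    A/T near the center narrows the groove. Values are illustrative only."""
    weights = (0.1, 0.2, 0.4, 0.2, 0.1)
    narrowing = {"A": 1.0, "T": 0.8, "C": 0.2, "G": 0.0}
    return 6.0 - sum(w * narrowing[b] for w, b in zip(weights, pentamer))

def shape_profile(seq, shape_of=toy_mgw):
    """Shape value for the central base of every full pentamer in `seq`."""
    return [shape_of(seq[i - 2:i + 3]) for i in range(2, len(seq) - 2)]

def shape_preserving_mutations(seq, shape_of=toy_mgw, top=3):
    """Rank all single-nucleotide substitutions by the total absolute change
    they cause in the shape profile (smallest change first)."""
    ref = shape_profile(seq, shape_of)
    candidates = []
    for i, old in enumerate(seq):
        for new in BASES:
            if new == old:
                continue
            mutant = seq[:i] + new + seq[i + 1:]
            delta = sum(abs(a - b)
                        for a, b in zip(ref, shape_profile(mutant, shape_of)))
            candidates.append((round(delta, 6), i, old, new))
    candidates.sort()  # by delta, then position, then bases
    return candidates[:top]
```

Under this toy model, the least disruptive substitutions in `GGGGG` are G→C changes at the outer positions, which carry the smallest weights.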
Chiu et al. TFBSshape: an expanded motif database for DNA shape features of transcription factor binding sites.
Nucleic Acids Res. 48, D246-255 (2020)
Link to TFBSshape database
We developed a new method to calculate the electrostatic potential in the minor groove in a high-throughput manner for any number of sequences of any length. The method is based on data mining of results from solving the non-linear Poisson-Boltzmann equation for many DNA fragments with diverse sequences. To model the DNA binding specificities of transcription factors using electrostatic potential, we used a statistical machine learning approach, multiple linear regression (MLR), that combines minor-groove electrostatic potential with DNA sequence features.
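Combining sequence and electrostatic potential (EP) features in one regression model can be sketched as follows. Each sequence is encoded as 1-mer one-hot indicators followed by per-position EP values, and the weights are fitted by least squares with a small L2 (ridge) penalty. The penalty is an assumption made here so that the collinear one-hot columns do not make the system singular; the EP values, function names, and `lam` default are hypothetical, and this is not the DNAphi code.

```python
def one_hot(seq):
    """1-mer encoding: four binary indicators (A, C, G, T) per position."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in "ACGT"]

def features(seq, ep):
    """Sequence indicators followed by per-position EP values (hypothetical units)."""
    return one_hot(seq) + [float(v) for v in ep]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ridge(X, y, lam=1e-3):
    """L2-regularized multiple linear regression: solve (X^T X + lam*I) w = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(w, x):
    """Linear prediction for one encoded sequence."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

In practice, the regularization strength would be chosen by cross-validation rather than fixed.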
Chiu et al. Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein-DNA binding.
Nucleic Acids Res. 45, 12565-12576 (2017)
Link to DNAphi web server
DNAproDB is a database and structural analysis tool that offers data visualization, data processing and search functionality with which researchers can analyze, access and visualize structural data of DNA-protein complexes. The new release of DNAproDB supports any DNA secondary structure from typical B-form DNA to single-stranded DNA to G-quadruplexes. We have updated the structure of our data files to support complex DNA conformations. Support for chemically modified residues and nucleotides has been significantly improved along with the addition of new structural features and improved structural moiety assignment.
Sagendorf et al. DNAproDB: an expanded database and web-based tool for structural analysis of DNA-protein complexes.
Nucleic Acids Res. 48, D277-287 (2020)
Link to DNAproDB database
We developed a high-throughput method for predicting the effect of cytosine methylation on DNA shape and its subsequent influence on protein-DNA interactions. This approach overcomes the limited availability of experimental DNA structures that contain 5-methylcytosine.
Rao et al. Systematic prediction of DNA shape changes due to CpG methylation explains epigenetic effects on protein-DNA binding.
Epigenetics Chromatin 11, 6 (2018)
Link to methyl-DNAshape web server
DNAproDB is a database and web-based visualization tool intended to make structural analysis of DNA-protein complexes easy. Users can find a wealth of data on the structures of DNA-protein complexes and the interactions within them for 2,568 structures currently contained in the PDB. These data can be used to analyze individual structures or to generate large datasets by constructing queries on a set of structural and interaction features with the search form. Users can also upload their own structures through the upload form and apply the same processing and visualization tools to unpublished data.
Sagendorf et al. DNAproDB: an interactive tool for structural analysis of DNA-protein complexes.
Nucleic Acids Res. 45, W89-W97 (2017)
Link to DNAproDB web server
DNAshapeR is a software package implemented in the statistical programming language R that predicts DNA shape features in an ultra-fast, high-throughput manner from genomic sequencing data. The package takes either nucleotide sequences or genomic coordinates as input and generates various graphical representations for visualization and further analysis. DNAshapeR further encodes DNA sequence and shape features as user-defined combinations of k-mer and DNA shape features. The resulting feature matrices can be readily used as input to various machine learning software packages for further modeling studies.
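DNAshapeR itself is an R package; the Python sketch below only illustrates the idea of combining k-mer indicators with shape features in one feature vector per sequence. The function names, the feature ordering (k-mer blocks first, then shape features in alphabetical order of their names), and the optional min-max scaling are assumptions, not the DNAshapeR API.

```python
from itertools import product

def kmer_onehot(seq, k=1):
    """Binary indicators for the k-mer starting at each position (4**k per position)."""
    vocab = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = []
    for i in range(len(seq) - k + 1):
        block = [0.0] * len(vocab)
        block[vocab[seq[i:i + k]]] = 1.0
        vec.extend(block)
    return vec

def min_max(values, lo, hi):
    """Scale values to [0, 1] using the given bounds (ASSUMED normalization)."""
    return [(v - lo) / (hi - lo) for v in values]

def encode(seq, shapes, ks=(1,), bounds=None):
    """Concatenate k-mer blocks for each k in `ks` with per-position shape values.
    `shapes` maps a feature name (e.g. "MGW") to its list of values; `bounds`
    optionally maps a feature name to (min, max) for scaling."""
    row = []
    for k in ks:
        row.extend(kmer_onehot(seq, k))
    for name in sorted(shapes):
        values = shapes[name]
        if bounds and name in bounds:
            values = min_max(values, *bounds[name])
        row.extend(values)
    return row
```

Stacking one such row per sequence gives a feature matrix that generic machine learning libraries can consume directly.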
Chiu et al. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding.
Bioinformatics 32, 1211-1213 (2016)
Link to DNAshapeR software package
GBshape provides DNA shape annotations of entire genomes. The database currently contains annotations for minor groove width, roll, propeller twist, helix twist and hydroxyl radical cleavage for 98 different organisms. Additional genomes can easily be added within the provided framework. GBshape contains two major tools, a genome browser and a table browser. The genome browser provides a graphical representation of DNA shape annotations alongside standard genome browser annotations.
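Per-base shape annotations can be stored as genome browser tracks in several ways; one widely supported option is UCSC's bedGraph format (chromosome, 0-based start, exclusive end, value). The hypothetical `to_bedgraph` helper below converts a list of per-base predictions into bedGraph lines, skipping positions without a prediction (`None`) and merging runs of identical values. It is a sketch, not how GBshape builds its tracks.

```python
def to_bedgraph(chrom, start, values, track_name="MGW"):
    """Convert per-base values into bedGraph lines. `start` is the 0-based
    coordinate of values[0]; None marks positions without a prediction
    (e.g. sequence ends) and is skipped; runs of equal values are merged."""
    lines = [f'track type=bedGraph name="{track_name}"']
    run_start, run_val = None, None
    # A trailing None acts as a sentinel so the last run is always flushed.
    for offset, v in enumerate(list(values) + [None]):
        pos = start + offset
        if v != run_val:
            if run_val is not None:
                lines.append(f"{chrom}\t{run_start}\t{pos}\t{run_val}")
            run_start, run_val = pos, v
    return lines
```

For example, `to_bedgraph("chr1", 100, [None, 5.0, 5.0, 4.2, None])` produces the track line followed by `chr1 101 103 5.0` and `chr1 103 104 4.2` (tab-separated).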
Chiu et al. GBshape: a genome browser database for DNA shape annotations.
Nucleic Acids Res. 43, D103-109 (2015)
Link to GBshape database
Our new TFBSshape database disentangles the complex relationships between DNA sequence, its 3D structure, and protein-DNA binding specificity. The TFBSshape database augments nucleotide sequence motifs with heat maps and quantitative predictions of DNA shape features for 739 TF datasets from 23 different species.
Yang et al. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Nucleic Acids Res. 42, D148-155 (2014)
Link to TFBSshape database
We developed a new method for predicting DNA shape in a high-throughput manner on a genome-wide scale. This approach predicts structural features (several helical parameters and minor groove width) for the entire yeast genome in less than one minute on a regular laptop. The prediction can be visualized as genome browser tracks and compared to other properties of the genome such as sequence conservation.
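The pentamer-based prediction idea can be sketched as a sliding-window table lookup. The sketch assumes that a base-pair-level feature such as minor groove width takes the value of the pentamer centered on each base, and that a step-level feature such as roll averages the predictions of the two overlapping pentamers that cover each step. `toy_mgw` and `toy_roll` are hypothetical stand-ins for the real pentamer query tables, with made-up values.

```python
def toy_mgw(pentamer):
    """HYPOTHETICAL placeholder for a base-pair-level query table (one value per pentamer)."""
    return 5.0 - 0.2 * sum(b in "AT" for b in pentamer)

def toy_roll(pentamer):
    """HYPOTHETICAL placeholder for a step-level table: values for the two
    central steps of the pentamer (between bases 2-3 and 3-4)."""
    table = {"CA": 4.0, "TG": 4.0, "AA": -1.0, "TT": -1.0}
    return (table.get(pentamer[1:3], 0.0), table.get(pentamer[2:4], 0.0))

def predict_bp(seq, lookup=toy_mgw):
    """One value per base from the pentamer centered on it; the first and last
    two bases lack a full pentamer and are reported as None."""
    n = len(seq)
    return [lookup(seq[i - 2:i + 3]) if 2 <= i < n - 2 else None for i in range(n)]

def predict_step(seq, lookup=toy_roll):
    """One value per step j (between bases j and j+1), averaging the values from
    the (up to two) pentamers whose central steps include j."""
    n = len(seq)
    sums, counts = [0.0] * (n - 1), [0] * (n - 1)
    for c in range(2, n - 2):  # c is the central base of the pentamer
        left, right = lookup(seq[c - 2:c + 3])
        for j, v in ((c - 1, left), (c, right)):
            sums[j] += v
            counts[j] += 1
    return [s / k if k else None for s, k in zip(sums, counts)]
```

Because each prediction is a dictionary or function lookup per window, the run time grows linearly with sequence length, which is what makes genome-scale prediction feasible.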
Zhou et al. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale.
Nucleic Acids Res. 41, W56-62 (2013)
Link to DNAshape web server
Additional Data
Li et al. Predicting DNA structure using a deep learning method.
Nat. Commun. 15, 1243 (2024)
Rao et al. Systematic prediction of DNA shape changes due to CpG methylation explains epigenetic effects on protein-DNA binding.
Epigenetics Chromatin 11, 6 (2018)
J. Li et al. Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding.
Nucleic Acids Res. 45, 12877-12887 (2017)
Supplementary Information
T.P. Chiu et al. Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein-DNA binding.
Nucleic Acids Res. 45, 12565-12576 (2017)
Supplementary Information
L. Yang et al. Transcription factor family-specific DNA shape readout revealed by quantitative specificity models.
Mol. Syst. Biol. 13, 910 (2017)
Supplementary Information
T. Zhou et al. Quantitative modeling of transcription factor binding specificities using DNA shape.
Proc. Natl. Acad. Sci. USA 112, 4654-4659 (2015)
Supplementary Information
N. Abe et al. Deconvolving the recognition of DNA sequence from shape.
Cell 161, 307-318 (2015)
Supplementary Information