Biography:

In the past, Chun-Ting Zhang has collaborated on articles with Ren Zhang and Hong-Yu Ou. One of their most recent publications is "Upper limit for the variances of some helical parameters in DNA double helix", which was published in the International Journal of Biological Macromolecules.

More information about Chun-Ting Zhang's research, including citation statistics, can be found on their Copernicus Academic profile page.

Chun-Ting Zhang's Articles: (10)

Upper limit for the variances of some helical parameters in DNA double helix

Abstract: Assuming that realistic DNA chains are random sequences of purines and pyrimidines, and using the Dickerson sum functions, it is shown that there exists an upper limit for the variances of the helical twist angles, the base-plane roll angles and the other helical parameters in DNA sequences. Estimates of the variances of all the above helical parameters for the DNA sequences in the Los Alamos database have been computed and found to be in good agreement with the theoretical results obtained in this paper.

Analysis of sequences of twist angles in DNA double helix

Abstract: By assuming that realistic DNA chains are random sequences of bases and using the Tung-Harvey formula for the prediction of twist angles, it is shown that the mean value of the sequence of twist angles is almost sequence-independent. In general, the variance for A, T-rich sequences is larger than that for G, C-rich sequences. There exists an upper bound for the variance over all possible sequences, i.e. the variance is not greater than 27 deg². It is pointed out that a large conformational deviation from ideal DNA is an important factor in the recognition of DNA by proteins and enzymes.

Beat motion in DNA double helix and a mechanism of energy exchange between its two strands with microwave frequency

Abstract: A linear vibration theory of base rotation for the B-form DNA double helix is proposed in terms of a new Hamiltonian in which, in addition to the hydrogen-bond energy, the dipole-dipole interaction between two bases in a base pair is taken into account. The H-bond energy takes the general form and is expanded in terms of a Taylor series. In an important case in which the vibration amplitude is small, a set of coupled Klein-Gordon equations has been derived. The solution of the equations under some initial and boundary conditions shows that a special motion form, the beat motion, occurs between the bases and their complementary bases. According to the beat motion, the energy flows back and forth between the bases and their complementary bases. The value of the beat frequency strongly depends on the dipole moments of the bases. In the case of Poly(dG)-Poly(dC), the beat frequency is estimated to be about 2 GHz, i.e., in the range of microwaves. The precise value of the beat frequency depends on the vibration modes, the length and the shape (linear or circular) of the DNA chain. It is shown that the phenomenon of beat is mainly caused by the dipole-dipole interaction. The resonant absorption of the microwave energy and the possible biological implication of the beat motion have been discussed. However, further experimental confirmation is needed before any major advances can be made.

Multiple replication origins of the archaeon Halobacterium species NRC-1

Abstract: The genomic sequence of the halophilic archaeon Halobacterium NRC-1 has been analyzed by the Z curve method. The Z curve is a three-dimensional curve that uniquely represents a given DNA sequence. Based on the known behaviors of the Z curves for the archaea whose replication origins have been identified, the analysis of the Z curve for the genome of Halobacterium NRC-1 strongly suggests that the large genome has two replication origins, oriC1 (921,863–922,014) and oriC2 (1,806,444–1,807,229), which are located at two sharp peaks of the Z curve. These two regions are next to the cdc6 genes and contain multiple copies of stretches of G and C, i.e., ggggtgggg and ccccacccc, which may also be regarded as direct and inverted repeats. Based on the above analysis, a model of replication of Halobacterium NRC-1 with two replication origins and two termini has been proposed. The experimental confirmation of this model would constitute the first example of multiple replication origins in archaea, which would provide much insight into the replication mechanisms of eukaryotic organisms, including humans. In addition, the potential multiple replication origins of the archaeon Sulfolobus solfataricus are suggested by the analysis based on the Z curve method.
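The abstract above does not spell out how the Z curve is built. The following is a minimal Python sketch that assumes the standard definition from the wider Z curve literature: after n bases, x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n) and z_n = (A_n + T_n) - (G_n + C_n), where A_n, C_n, G_n and T_n are cumulative base counts. The names `STEPS`, `z_curve` and `decode_z_curve` are illustrative and do not come from the paper.

```python
# Assumed standard Z curve definition (not stated in the abstract).
# Each base moves the curve by a distinct step (dx, dy, dz):
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak H-bond (A, T) vs strong H-bond (G, C)
STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the list of cumulative Z curve points for a DNA string."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        if base not in STEPS:
            raise ValueError(f"unexpected base: {base!r}")
        dx, dy, dz = STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def decode_z_curve(points):
    """Recover the DNA string from its Z curve points."""
    inverse = {step: base for base, step in STEPS.items()}
    previous = (0, 0, 0)
    bases = []
    for point in points:
        step = tuple(p - q for p, q in zip(point, previous))
        bases.append(inverse[step])
        previous = point
    return "".join(bases)
```

Because the four steps are all different, the sequence can be read back from the curve, which is what "uniquely represents" means here. Locating peaks such as the proposed oriC1 and oriC2 regions would need further analysis of the curve components that this sketch does not attempt.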

Analysis of nucleotide distribution in the genome of Streptomyces coelicolor A3(2) using the Z curve method

Abstract: The nucleotide distribution of all 33 527 open reading frames (ORFs) (≥300 bp) in the genome of Streptomyces coelicolor A3(2) has been analyzed using the Z curve method. Each ORF is mapped onto a point in a 9-dimensional space. To visualize the distribution of mapping points, the points are projected onto the principal plane based on principal component analysis. Consequently, the distribution pattern of the 33 527 points in the principal plane shows a flower-like shape, in which there are seven distinct regions. In addition to the central region, there are six petal-like regions around the center, one of which corresponds to 7172 coding sequences. The central region and the remaining five petal-like regions correspond to the intergenic sequences and out-of-frame non-coding ORFs, respectively. It is shown that selective pressure produces a remarkable bias of the G+C content among three codon positions, resulting in the interesting phenomenon observed. A similar phenomenon is also observed for other bacterial genomes with high genomic G+C content, such as Pseudomonas aeruginosa PA01 (G+C=66.6%). However, for the genomes of Bacillus subtilis (G+C=43.5%) and Clostridium perfringens (G+C=28.6%), no similar phenomenon was observed. The finding presented here may be useful to improve the gene-finding algorithms for genomes with high G+C content. A set of supplementary materials including the plots displaying the base distribution patterns of ORFs in 12 prokaryotes is provided on the website http://tubic.tju.edu.cn/highGC/.
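The abstract above does not say how an ORF becomes a point in 9-dimensional space. One common construction in the Z curve literature, assumed here rather than taken from the paper, computes the three Z curve components from the base frequencies at each of the three codon positions, giving 3 × 3 = 9 coordinates. The function name `orf_to_point` is hypothetical.

```python
def orf_to_point(orf):
    """Map an ORF to 9 numbers: (x, y, z) for each codon position.

    Assumed construction: for codon position i with base frequencies
    a, c, g, t, use x = (a + g) - (c + t), y = (a + c) - (g + t),
    z = (a + t) - (g + c).
    """
    orf = orf.upper()
    if len(orf) == 0 or len(orf) % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    point = []
    for position in range(3):
        bases = orf[position::3]  # every third base, starting at this codon position
        n = len(bases)
        a = bases.count("A") / n
        c = bases.count("C") / n
        g = bases.count("G") / n
        t = bases.count("T") / n
        point.extend([(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)])
    return point
```

Applying this to every ORF gives a table of 9-dimensional points. Projecting them onto the first two principal components, as the abstract describes, would be a separate step that this sketch leaves out.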

A new quantitative criterion to distinguish between α/β and α+β proteins (domains)

Abstract: According to the statistical analysis, it is shown that the differences in the content of α-helix and β-strand between α/β and α+β proteins are statistically significant. Based on the secondary structure content and the percentage of parallel or anti-parallel strands, any mixed αβ protein can be represented by a point in a three-dimensional prism. The distribution of the mapping points for 79 mixed αβ proteins (domains), of which 26 are class α/β and 53 are class α+β, shows that the two kinds of points are situated in roughly distinct regions. A new quantitative criterion based on the Fisher discriminant algorithm is proposed to distinguish between the α/β and α+β proteins (domains). Of the 79 proteins, 77 are correctly classified (97.5%). As a stringent cross-validation test, the jackknife test shows that 77 of the 79 proteins are correctly classified, so the jackknife test accuracy is still 97.5%. These figures indicate the self-consistency and the extrapolating effectiveness of the new quantitative criterion. Applying the new criterion to reclassify the α/β and α+β proteins (domains) in SCOP is also discussed. It is hoped that the new quantitative criterion will be useful for the development of protein classification databases.

A Graphic Approach to Analyzing Codon Usage in 1562 Escherichia coli Protein Coding Sequences

Abstract: The occurrence frequencies of the four bases (adenine, cytosine, guanine and thymine) at each of the three codon positions for 1562 Escherichia coli protein coding sequences have been calculated. The 1562 × 4 × 3 = 18,744 data thus obtained have been analyzed by a graphic method in which the four base occurrence frequencies at each codon position for each coding sequence are represented by a point in a three-dimensional space. Thus, the 18,744 data, which would otherwise occupy several printed pages, can be intuitively displayed by a graph. The point distribution pattern for each of the three codon positions has been analyzed. The results of our analysis indicate that the patterns for the first two codon positions reflect the origin for producing native folding structures of proteins. We thus come to the conclusion that the distribution patterns for the first two codon positions should be basically species-independent, as confirmed by studies for a number of other species. However, the distribution pattern for the third codon position is species-dependent. Based on the point distribution of the third codon position, six collective parameters have been defined to describe the overall feature of the pattern concerned. These collective parameters can be generally used to classify different species, and hence would be a useful vehicle for studies in taxonomy. In addition to E. coli, the collective parameters for a number of other species have been calculated and analyzed.

A Joint Prediction of the Folding Types of 1490 Human Proteins from their Genetic Codons

Abstract: The codon usages for 1490 human proteins have been published by Wada et al. (1990). Based on these data, the frequencies of occurrence of 20 amino acids for each of the 1490 proteins have been calculated according to the genetic codes. Proteins are generally classified into five folding types, i.e. the α, β, α + β, α/β and ζ (irregular) types. The folding type of a protein is correlated to its amino acid composition. By means of three methods established by different investigators, the folding type for each of the 1490 human proteins has been predicted. It has been demonstrated that the accuracy of prediction for the 1490 human proteins is at least 80% by examining the predicted results of some structurally known proteins with these methods. There are only six proteins for which there is uncertainty about their folding types, as completely inconsistent results were obtained when predicted with the three different methods. For the remaining 1484 human proteins the numbers of α, β, α + β, α/β, and ζ folding type proteins were found to be 128, 235, 169, 933 and 19, respectively, suggesting that the α/β type proteins would predominate in this set of human proteins. The occurrence frequencies of bases in the first, second and third codon position for each folding type of protein have been calculated. It is shown that the folding type of a protein is strongly dependent on the ratio of the frequency of base G in the first codon position to that in the second codon position. The biological implication of the results has been discussed.

Analysis on the Distribution of Bases in 1487 Human Protein Coding Sequences

Abstract: The occurrence frequencies of bases A, C, G and T, denoted by a, c, g and t, respectively, in 1487 human protein coding sequences have been calculated and analyzed. The analysis has been performed by a diagrammatic method presented recently, in which each coding sequence is represented by a point in 3-D space. The distribution of points gives the observer an overall and intuitive picture of the base frequencies. The distance between a point and the origin of the co-ordinate, which corresponds to the case of a = c = g = t = 1/4, is called the radical distance. The radical distribution of 1487 points in 3-D space has been found to be normal, with the center basically coinciding with the origin of the co-ordinate. We have found that among 1487 coding sequences, an empirical rule a² + c² + g² + t² < 1/3 holds for 1486 sequences. The only sequence in which the above rule does not hold is the one coding for the human parathymosin protein. The composition of amino acids and the structural class of this protein have been studied in some detail.
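The empirical rule in the abstract above is easy to check for any given sequence. The following is a small Python sketch; the function names `base_frequencies` and `satisfies_empirical_rule` are illustrative, and the example strings in the usage note are toy inputs, not sequences from the paper's dataset.

```python
def base_frequencies(seq):
    """Return the occurrence frequencies (a, c, g, t) of a DNA string."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    return tuple(seq.count(base) / n for base in "ACGT")


def satisfies_empirical_rule(seq):
    """Check whether a^2 + c^2 + g^2 + t^2 < 1/3 holds for the sequence."""
    return sum(f * f for f in base_frequencies(seq)) < 1 / 3
```

For a sequence made only of A, C, G and T, the sum of squares is smallest (1/4) when all four bases are equally frequent and grows as the composition becomes more skewed. For example, `satisfies_empirical_rule("ACGT")` is true, while a single-base string such as `"AAAA"` gives a sum of 1 and fails the rule.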

A graphic representation of protein sequence and predicting the subcellular locations of prokaryotic proteins

Abstract: The Zp curve, a three-dimensional space curve representation of protein primary sequence based on the hydrophobicity and charged properties of amino acid residues along the primary sequence, is suggested. Relying on the Zp parameters extracted from the three components of the Zp curve and the Bayes discriminant algorithm, the subcellular locations of prokaryotic proteins were predicted. Consequently, an accuracy of 81.5% in the cross-validation test has been achieved using 13 parameters extracted from the curve for the database of 997 prokaryotic proteins. The result is slightly better than that of the neural network method (80.9%) based on the amino acid composition for the same database. By combining the amino acid composition and the Zp parameters, an overall predictive accuracy of 89.6% can be achieved. It is about 3% higher than that of the Bayes discriminant algorithm based merely on the amino acid composition for the same database. The prediction is also performed with a larger dataset derived from version 39 of the SWISS-PROT databank and two datasets with different sequence similarity. Even for the dataset of non-sequence similarity, the improvement can reach 4.4% in the cross-validation test. The results indicate that the Zp parameters are effective in representing the information within a protein primary sequence. The method of extracting information from the primary structure may be useful for other areas of protein studies.
