Johns Hopkins University
There is growing interest in intrinsically disordered proteins, which lack a stable, well-defined three-dimensional structure yet possess a range of important properties and functions that derive from being disordered. In this dissertation I explore the properties of intrinsically disordered proteins with both computational and experimental methods. First, I present a support vector machine (SVM) trained on naturally occurring disordered and ordered proteins, which is used to examine the contribution of various parameters to recognizing proteins that contain disordered regions. I show that an SVM that incorporates only amino acid composition has a recognition accuracy of 87 ± 2%. This result suggests that composition alone is sufficient to accurately recognize disorder. Interestingly, SVMs using reduced sets of amino acids based on chemical similarity preserve high recognition accuracy. A set as small as four retains an accuracy of 84 ± 2%; this result suggests that general physicochemical properties, rather than specific amino acids, are important factors contributing to protein disorder. Second, I build on the SVM analysis by examining the relationship of disorder propensity to sequence complexity. I graph the distributions of 40-amino-acid peptides from both ordered and disordered proteins in disorder-complexity space. An analysis of the Swiss-Prot database shows that most peptides are of high complexity and relatively low disorder. However, there is also an appreciable number of low-complexity, high-disorder peptides in the database. In contrast, there are no low-complexity, low-disorder peptides. A similar analysis for peptides in the Protein Data Bank (PDB) reveals a much narrower distribution, with few peptides of low complexity and high disorder.
I also examine disorder-complexity distributions of individual proteins and sets of proteins grouped by function. Among individual proteins, there is an enormous variety of distributions, some of which can be rationalized with regard to function. Groups of functionally related proteins are found to have distributions that are similar within each group but show notable differences between groups. In addition, I use a pattern-matching algorithm to search for proteins with particular disorder-complexity distributions. The results suggest that this approach might be used to identify relationships between otherwise dissimilar proteins. Finally, I present experimental results from the cloning, expression, and characterization of the disordered projection domain of microtubule-associated protein 2. Using analytical ultracentrifugation, I show that the hydrodynamic properties of the protein are responsive to changes in ionic strength, pH, and protein phosphorylation in a manner expected for a flexible, charged polymer. This result suggests that disordered proteins can be represented by theoretical models for polyelectrolytes. The computational and experimental methods described here contribute to a better understanding of the properties of intrinsically disordered proteins and lay the foundation for possible applications in biomedicine.
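As a rough illustration of the sequence features discussed in the abstract, the following Python sketch computes an amino acid composition vector, a composition over a reduced four-group alphabet, and a complexity score for 40-residue windows. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the four chemical groups in `REDUCED_GROUPS` are a hypothetical grouping, Shannon entropy is assumed as the complexity measure (the abstract does not name one), and all function names are illustrative.

```python
import math
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: a hypothetical four-group reduction by chemical similarity.
# The abstract does not state which grouping was actually used.
REDUCED_GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQGP",
    "charged": "DEKR",
}


def composition(seq, alphabet=AMINO_ACIDS):
    """Return the fraction of each residue in `alphabet` found in `seq`."""
    counts = Counter(seq)
    total = sum(counts[a] for a in alphabet)
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[a] / total for a in alphabet]


def reduced_composition(seq, groups=REDUCED_GROUPS):
    """Return the fraction of residues falling in each reduced group."""
    counts = Counter(seq)
    sums = [sum(counts[a] for a in members) for members in groups.values()]
    total = sum(sums)
    if total == 0:
        return [0.0] * len(sums)
    return [s / total for s in sums]


def shannon_complexity(window):
    """ASSUMPTION: complexity measured as Shannon entropy in bits."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windows(seq, size=40):
    """Return all overlapping windows of `size` residues in `seq`."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]
```

Composition vectors of this kind could serve as input features to an SVM classifier, and per-window complexity scores could be paired with a disorder score to place each peptide in disorder-complexity space.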