The Next Step: Exploring the Proteome:
Translation and Beyond

Phylogenomics of Protein Domains

Eugene V. Koonin, Senior Investigator
Computational Biology Branch, NCBI, NLM, NIH


Recent progress in three major directions of research (genomics, protein sequence analysis and protein structure determination) converges in the new field of evolutionary genomics (phylogenomics) and creates unprecedented opportunities for a deep exploration of the protein universe. Comparison of the protein sets encoded in sequenced genomes, using sensitive methods for sequence and structure analysis, allows a confident reconstruction of parts of the proteome of the Last Universal Common Ancestor (LUCA) of all extant life forms. This analysis shows that LUCA encoded a complex repertoire of functionally diverse proteins but probably did not have a modern-type system for DNA replication. Examination of ancient protein families also allows reconstruction of evolutionary events prior to LUCA, including some that might have been involved in the transition from the primordial ribozyme world to the protein world of modern-type cells. At the later stages of evolution, particularly during and after the emergence of eukaryotes, domain accretion and shuffling played a major role in the diversification of the protein world. A clear trend of domain accretion in orthologous lines is observed in parallel with the increase in complexity of eukaryotic organisms. In addition, a significant number of new domains emerged de novo at the prokaryote-eukaryote transition and during diversification of the major eukaryotic lineages. The majority of such new domains appear to have α-helical structure and probably evolved from non-specific coiled-coil and other non-globular domains.

In addition to fundamental insights into the evolution of the protein world, phylogenomics provides specific input for structural and functional genomics. An important contribution of comparative studies to functional genomics is “guilt by association”: the prediction of protein function on the basis of domain fusion or gene juxtaposition in prokaryotic genomes. In this lecture, I will present an overview of the fundamental and more practical aspects of the phylogenomics of proteins and domains.
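
As a rough illustration of the gene-juxtaposition form of "guilt by association", the sketch below counts pairs of protein families that are chromosomal neighbors in several genomes and then passes a known annotation on to an uncharacterized neighbor. It is a toy under stated assumptions, not the method used in the studies described here: the genome names, the family labels (famHisG, famU1 and so on), the annotation, and the functions conserved_neighbor_pairs and predict_function are all hypothetical, and real analyses also take account of operon structure, gene orientation and phylogenetic distance.

```python
from collections import Counter


def conserved_neighbor_pairs(genomes, min_genomes=2):
    """Return family pairs that are adjacent in at least `min_genomes` genomes.

    `genomes` maps a genome name to a list of protein-family labels,
    one per gene, in chromosomal order (hypothetical toy input).
    """
    counts = Counter()
    for gene_order in genomes.values():
        adjacent = set()  # count each pair at most once per genome
        for left, right in zip(gene_order, gene_order[1:]):
            if left != right:
                adjacent.add(frozenset((left, right)))
        counts.update(adjacent)
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


def predict_function(unknown, pairs, annotations):
    """Suggest functions for `unknown` from its annotated conserved neighbors."""
    hints = set()
    for pair in pairs:
        if unknown in pair:
            (partner,) = pair - {unknown}
            if partner in annotations:
                hints.add(annotations[partner])
    return sorted(hints)


# Hypothetical gene orders: famU1 is uncharacterized, famHisG is annotated.
genomes = {
    "genome_A": ["famX", "famHisG", "famU1", "famY"],
    "genome_B": ["famZ", "famU1", "famHisG"],
    "genome_C": ["famHisG", "famU1", "famW"],
}
annotations = {"famHisG": "histidine biosynthesis"}

pairs = conserved_neighbor_pairs(genomes, min_genomes=3)
print(predict_function("famU1", pairs, annotations))
```

In this toy data set only the famHisG/famU1 pair is adjacent in all three genomes, so the annotated partner supplies the tentative functional hint for famU1. The same counting scheme could be applied to domain fusions by treating the domains found within one protein as "neighbors".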

Biographical sketch:


PRINCIPAL RESEARCH GOALS
Comparative analysis of sequenced genomes and automatic methods for functional annotation of newly sequenced genomes. Prediction of protein functions. Genome evolution; reconstruction of ancestral life forms and large-scale evolutionary scenarios. Methods for protein motif identification and fold recognition; construction of systematic motif libraries.

1973-1978 Virology Branch, Department of Biology, Moscow State University. Biology diploma (equiv. of M.Sc.), 1978, summa cum laude. Supervisor: Professor V. I. Agol. "A comparative study on the UV resistance of single-stranded and double-stranded RNA of encephalomyocarditis virus: Evaluation of the possible contribution of host-mediated repair".
PhD, Department of Biology, Moscow State University, 1983. Supervisor: Professor V. I. Agol. "Multienzyme organization of encephalomyocarditis virus replication complexes".

1996- Senior Investigator, National Center for Biotechnology Information, National Library of Medicine, N.I.H., Bethesda, MD
11/15/1991-1996 Visiting Scientist, National Center for Biotechnology Information, National Library of Medicine, N.I.H., Bethesda, MD
02/1991-05/1991 Visiting Scientist, Biology Department, Texas A&M University, College Station, TX
1990-1991 Head, Laboratory of Gene Systematics and Bacterial Evolution, Institute of Microbiology, USSR Academy of Sciences
1989-1990 senior research scientist, Laboratory of Bacterial Genetics, Institute of Microbiology, USSR Academy of Sciences
1986-1988 senior research scientist, Laboratory of Virus Biochemistry, Institute of Poliomyelitis, USSR Academy of Medical Sciences
1983-1985 research scientist, Laboratory of Virus Biochemistry, Institute of Poliomyelitis, USSR Academy of Medical Sciences

Honors and awards
2001 Editorial Board, Archaea
2000-present Editorial Board, Nucleic Acids Research
1999 National Library of Medicine Board of Regents Award
1999 Guest Editor, Current Opinion in Genetics and Development: "Genomes and Evolution"
1999 Co-organizer, Cold Spring Harbor Laboratory Workshop on Computational Biology: Bridging the Gap Between Sequence and Function, Cold Spring Harbor, Sept 7-9
1999-present Editorial Board, Bioinformatics
1998-present Editor of the "Genome Analysis" section, Trends in Genetics
1997-present Associate Editor, In silico Biology
1997, 1999, 2001 Co-Chair of the Program Committee, Conference on In Silico Biology, Atlanta, GA.
1995 Georgia Institute of Technology Phi-Beta-Kappa Award for the best research paper of the year
1993-1999 Member of the Executive Committee, International Committee on Taxonomy of Viruses
1992 Guest Editor, Seminars in Virology: Evolution of Viral Genomes
Editorial Boards: Nucleic Acids Research, Genome Biology, Bioinformatics
Peer review of manuscripts for Nature, Nature Genetics, Science, Cell, Trends Biochem. Sci., J. Mol. Biol., Nucleic Acids Res., Proteins etc.

Invited lectures and seminars (a selection of talks given in 1998-1999):
11/09/1999 The 11th NAS Symposium Frontiers of Science, Irvine, California. Invited lecture "Comparative genomics and its effect on our understanding of evolution"
8/24/1999 8th Congress of the European Society for Evolutionary Biology, Barcelona, Spain. Invited talk "Horizontal gene transfer: evidence and role in the evolution of prokaryotes"
8/7/1999 Intelligent Systems in Molecular Biology-99, Heidelberg, Germany. Keynote lecture "Comparative genomics: Is it changing the paradigm of evolutionary biology?"
7/13/1999 International Society for the Study of the Origins of Life-99 meeting, San Diego, CA. Plenary lecture "How far back can we see through genome comparison?"
12/11/1998 Genome Informatics Workshop, Tokyo, Japan. Invited talk "Comparative Genomics: Is it changing the paradigm of evolutionary biology?"
10/07/1998 NIH Research Festival, NIH, Bethesda, MD. Plenary lecture "The minimal set of genes required to form a cell"
04/17/1998 NIH Director's Seminar Series, NIH, Bethesda, MD. "Complete genomes of cellular life forms: the first major lessons from comparative analysis"

SELECTED BIBLIOGRAPHY (2000-2001). A selection from ~300 peer-reviewed and invited articles.
Peer-reviewed papers:
International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.
Aravind L, Dixit VM, Koonin EV (2001). The programmed cell death molecular machinery: genome comparisons show vastly increased complexity in vertebrates. Science 291, 1279-1284.
Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29: 22-28.
Uren AG, O'Rourke K, Aravind L, Pisabarro TM, Seshagiri S, Koonin EV, Dixit VM. (2000) Identification of Paracaspases and Metacaspases: two ancient families of caspase-related proteins, one of which plays a central role in MALT lymphoma. Mol. Cell. 6: 961-967.
Aravind L, Watanabe H, Lipman DJ, Koonin EV. (2000) Lineage-specific loss and divergence of functionally-linked genes in eukaryotes. Proc Natl Acad Sci U S A 97: 12068-12073.
Uhlmann F, Wernic D, Poupart M-A, Koonin EV, Nasmyth K. (2000) Cleavage of cohesin by the CD-clan protease separin triggers anaphase in yeast. Cell 103: 375-386.
Wolf YI, Kondrashov FA, Koonin EV. (2000) No footprints of primordial introns in a eukaryotic genome. Trends Genet. 16: 333-334.
Béjà O, Suzuki MT, Aravind L, Koonin EV, Hadd A, Nguyen LP, Jovanovich SB, Gates C, Feldman RA, DeLong EF. (2000) Bacterial Bacteriorhodopsin: Evidence for Light-Driven Proton Pumping in the Sea. Science 289: 1902-1906.
Grishin NV, Wolf YI, Koonin EV. (2000) From complete genomes to measures of substitution rate variability within and between proteins. Genome Res. 10: 991-1000.
Wolf YI, Grishin NV, Koonin EV. (2000) Estimating the Number of Protein Folds and Families from Complete Genome Data. J. Mol. Biol. 299: 897-905.
Koonin EV, Aravind L, Hofmann K, Tschopp J, Dixit VM (1999) Apoptosis. Searching for domains in FLASH. Nature 401: 662.
Koonin EV, Aravind L, Kondrashov AS. (2000) The impact of comparative genomics on our understanding of evolution. Cell 101: 573-576.
Galperin MY, Koonin EV. (2000) Tell me who is your neighbor... New computational approaches in functional genomics (review). Nature Biotechnol. 18: 609-613.

Contact Information:

Eugene V. Koonin, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38A, 8600 Rockville Pike, Bethesda, MD 20894 Tel: (301)435-5913, Fax: (301)480-9241, email: