Proteomics is the large-scale study of proteins, particularly their structures and functions. Proteins are vital parts of living organisms, as they are the main components of the physiological pathways of cells. The term "proteomics" was coined by analogy with genomics, the study of genes. The word "proteome" is a portmanteau of "protein" and "genome". The proteome of an organism is the set of proteins produced by it during its life, and its genome is its set of genes.
Proteomics is often considered the next step in the study of biological systems, after genomics. It is much more complicated than genomics, mostly because while an organism's genome is rather constant, a proteome differs from cell to cell and constantly changes through its biochemical interactions with the genome and the environment. One organism has radically different protein expression in different parts of its body, different stages of its life cycle and different environmental conditions. Another major difficulty is the complexity of proteins relative to nucleic acids.
Scientists are very interested in proteomics because it gives a much better understanding of an organism than genomics. First, the level of transcription of a gene gives only a rough estimate of its level of expression into a protein. An mRNA produced in abundance may be degraded rapidly or translated inefficiently, resulting in a small amount of protein. Second, many proteins experience post-translational modifications that profoundly affect their activities; for example some proteins are not active until they become phosphorylated. Methods such as phosphoproteomics and glycoproteomics are used to study post-translational modifications. Third, many transcripts give rise to more than one protein, through alternative splicing or alternative post-translational modifications. Finally, many proteins form complexes with other proteins or RNA molecules, and only function in the presence of these other molecules.
Since proteins play a central role in the life of an organism, proteomics is instrumental in discovery of biomarkers, such as markers that indicate a particular disease.
With the completion of a rough draft of the human genome, many researchers are looking at how genes and proteins interact to form other proteins. A surprising finding of the Human Genome Project is that there are far fewer protein-coding genes in the human genome than proteins in the human proteome (20,000 to 25,000 genes vs. about 1,000,000 proteins). The human body may contain more than 2 million proteins, each having different functions. The protein diversity is thought to be due to alternative splicing and post-translational modification of proteins. The discrepancy implies that protein diversity cannot be fully characterized by gene expression analysis, thus proteomics is useful for characterizing cells and tissues.
To catalog all human proteins, their functions and interactions is a great challenge for scientists. An international collaboration with these goals is co-ordinated by the Human Proteome Organization (HUPO).
Most proteins function in collaboration with other proteins, and one goal of proteomics is to identify which proteins interact. This often gives important clues about the functions of newly discovered proteins. Several methods are available to probe protein-protein interactions. The traditional method is yeast two-hybrid analysis. New methods include protein microarrays, immunoaffinity chromatography followed by mass spectrometry, and combinations of experimental methods such as phage display and computational methods.
Current research in proteomics requires first that proteins be resolved, sometimes on a massive scale. Protein separation can be performed using two-dimensional gel electrophoresis, which usually separates proteins first by isoelectric point and then by molecular weight. Protein spots in a gel can be visualized using a variety of chemical stains or fluorescent markers, and proteins can often be quantified by the intensity of their stain. Once proteins are separated and quantified, they are identified. Individual spots are cut out of the gel and cleaved into peptides with proteolytic enzymes. These peptides can then be identified by mass spectrometry, for example matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. In this procedure, the peptide is mixed with a matrix compound, with which it co-crystallizes. The peptide in the matrix is then ionized with a laser beam, and an increase in voltage at the matrix accelerates the ions toward a detector. The time an ion takes to reach the detector depends on its mass (strictly, on its mass-to-charge ratio): the higher the mass, the longer the time of flight. In a MALDI-TOF mass spectrometer, the ions can also be deflected with an electrostatic reflector that also focuses the ion beam. The masses of the ions reaching the second detector can thus be determined with high precision, and these masses can reveal the chemical compositions of the peptides, and therefore their identities.
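The dependence of flight time on mass can be sketched numerically. The following is a minimal illustration, assuming an idealized linear instrument in which an ion of charge z is accelerated through a potential difference U and then drifts over a field-free length L, so that t = L * sqrt(m / (2 z e U)). The 20 kV accelerating voltage and 1 m drift length are illustrative assumptions, not values from this article, and real instruments are calibrated empirically rather than from first principles.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge=1, voltage=20_000.0, length_m=1.0):
    """Idealized drift time (seconds) of an ion in a linear TOF analyzer.

    voltage and length_m are assumed, illustrative instrument settings.
    """
    mass_kg = mass_da * DALTON
    kinetic_energy = charge * ELEMENTARY_CHARGE * voltage  # z * e * U
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)   # from E = m v^2 / 2
    return length_m / velocity


def mass_from_time(time_s, charge=1, voltage=20_000.0, length_m=1.0):
    """Invert flight_time: recover the ion mass (daltons) from its drift time."""
    kinetic_energy = charge * ELEMENTARY_CHARGE * voltage
    mass_kg = 2.0 * kinetic_energy * (time_s / length_m) ** 2
    return mass_kg / DALTON


if __name__ == "__main__":
    for mass in (1000.0, 2000.0, 4000.0):
        microseconds = flight_time(mass) * 1e6
        print(f"{mass:7.1f} Da -> {microseconds:.2f} microseconds")
```

Because the flight time scales with the square root of the mass, quadrupling the mass doubles the time of flight in this idealized model.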
Protein mixtures can also be analyzed without prior separation. These procedures begin with proteolytic digestion of the proteins in a complex mixture. The resulting peptides are often injected onto a high-performance liquid chromatography (HPLC) column that separates peptides based on hydrophobicity. HPLC can be coupled directly to a time-of-flight mass spectrometer using electrospray ionization. Peptides eluting from the column can be identified by tandem mass spectrometry (MS/MS). The first stage of MS/MS isolates individual peptide ions, and the second breaks the peptides into fragments and uses the fragmentation pattern to determine their amino acid sequences. Labeling with isotope tags can be used to quantitatively compare protein concentrations among two or more protein samples.
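How a fragmentation pattern encodes a sequence can be shown with a small sketch. It assumes the common model in which a peptide breaks at its peptide bonds into singly charged N-terminal (b) and C-terminal (y) fragment ions; the article does not name an ion series, so this choice is an assumption. The function names are hypothetical, and the residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic residue masses in daltons. "L" is listed before "I" so that
# the isobaric pair leucine/isoleucine is always reported as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def peptide_mass(sequence):
    """Monoisotopic mass of the neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def fragment_ions(sequence):
    """Return (b_ions, y_ions): singly charged m/z values for each bond break."""
    b_ions, y_ions = [], []
    for i in range(1, len(sequence)):
        prefix = sum(RESIDUE_MASS[aa] for aa in sequence[:i])
        suffix = sum(RESIDUE_MASS[aa] for aa in sequence[i:])
        b_ions.append(prefix + PROTON)
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


def residues_from_ladder(ladder, tolerance=0.01):
    """Read residues from the mass gaps between consecutive ions of one series.

    Gaps that match no residue within the tolerance are reported as "?".
    """
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        gap = heavier - lighter
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        calls.append(best if abs(RESIDUE_MASS[best] - gap) <= tolerance else "?")
    return "".join(calls)


if __name__ == "__main__":
    b_ions, y_ions = fragment_ions("PEPTIDE")
    print("b ions:", [round(mz, 4) for mz in b_ions])
    print("y ions:", [round(mz, 4) for mz in y_ions])
    print("residues read from the b ladder:", residues_from_ladder(b_ions))
```

The gaps in a single ion series only recover the internal residues; real sequencing software combines both series with the precursor mass and has to cope with missing and noisy peaks, which this sketch ignores.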
One of the most promising developments to come from the study of human genes and proteins has been the identification of potential new drugs for the treatment of disease. This relies on genome and proteome information to identify proteins associated with a disease, which computer software can then use as targets for new drugs. For example, if a certain protein is implicated in a disease, its 3D structure provides the information to design drugs to interfere with the action of the protein. A molecule that fits the active site of an enzyme, but cannot be released by the enzyme, will inactivate the enzyme. This is the basis of new drug-discovery tools, which aim to find new drugs to inactivate proteins involved in disease. As genetic differences among individuals are found, researchers expect to use these techniques to develop personalized drugs that are more effective for the individual.
A computer technique which attempts to fit millions of small molecules to the three-dimensional structure of a protein is called "virtual ligand screening". The computer rates the quality of the fit to various sites in the protein, with the goal of either enhancing or disabling the function of the protein, depending on its function in the cell. A good example of this is the identification of new drugs to target and inactivate the HIV-1 protease. The HIV-1 protease is an enzyme that cleaves a very large HIV protein into smaller, functional proteins. The virus cannot survive without this enzyme; therefore, it is one of the most effective protein targets for killing HIV.
There are many distributed computing programs, such as the World Community Grid, which allow people around the world to help scientists by performing calculations. The software supplements supercomputers by using the unused processing power of millions of home computers. The World Community Grid works on HIV, cancer, and protein folding. All three projects centre on protein modelling and protein modification models. Using the data gained from distributed computing models of proteins, scientists can develop more specific and effective therapies. In addition, most enzymes act as part of complexes and networks, which also affect the way an enzyme acts in a cell. Understanding these complex networks will assist in developing drugs that affect the function of these complexes.
Understanding the proteome, the structure and function of each protein and the complexities of protein-protein interactions will be critical for developing the most effective diagnostic techniques and disease treatments in the future.
One application of proteomics is the use of specific protein biomarkers to diagnose disease. A number of techniques can test for proteins produced during a particular disease, which helps to diagnose the disease quickly. Techniques include western blot, immunohistochemical staining, enzyme-linked immunosorbent assay (ELISA) and mass spectrometry. The following are some of the diseases that have characteristic biomarkers that physicians can use for diagnosis:
- In Alzheimer's disease, elevated beta-secretase produces amyloid/beta-protein, which causes plaque to build up in the patient's brain, which in turn causes dementia. Targeting this enzyme decreases the amyloid/beta-protein and so slows the progression of the disease. One procedure to test for the increase in amyloid/beta-protein is immunohistochemical staining, in which antibodies bind to amyloid/beta-protein antigens in biological tissue.
- Heart disease is commonly assessed using several key protein-based biomarkers. Standard protein biomarkers for cardiovascular disease (CVD) include interleukin-6, interleukin-8, serum amyloid A protein, fibrinogen, and troponins. Cardiac troponin I (cTnI) increases in concentration within 3 to 12 hours of initial cardiac injury and can remain elevated for days after an acute myocardial infarction (MI). A number of commercial antibody-based assays, as well as other methods, are used in hospitals as primary tests for acute MI.
- Proteomic analysis of kidney cells and cancerous kidney cells is producing promising leads for biomarkers for renal cell carcinoma and for developing assays to test for this disease. In kidney-related diseases, urine is a potential source of such biomarkers. Recently, it has been shown that the identification of urinary polypeptides as biomarkers of kidney-related diseases makes it possible to diagnose the severity of the disease several months before the appearance of the pathology.
- Protein separation. Proteomic technologies rely on the ability to separate a complex mixture so that individual proteins are more easily processed with other techniques.
- Protein identification. Well-known methods include low-throughput sequencing through Edman degradation. Higher-throughput proteomic techniques are based on mass spectrometry, commonly peptide mass fingerprinting on simpler instruments, or de novo sequencing on instruments capable of more than one round of mass spectrometry. PEAKS is a widely used software package for de novo sequencing that also uses peptide mass fingerprinting for protein identification. Antibody-based assays can also be used, but each is specific to one sequence motif.
- Protein quantification. Gel-based methods are used, including differential staining of gels with fluorescent dyes (difference gel electrophoresis). Gel-free methods include various tagging or chemical modification methods, such as isotope-coded affinity tags (ICATs), metal-coded affinity tags (MeCATs) or combined fractional diagonal chromatography (COFRADIC). In metabolic labeling, cells incorporate heavy stable isotopes present in their growth media (e.g. stable isotope labeling with amino acids in cell culture, or SILAC). Modern gel electrophoresis research often uses software-based image analysis tools, primarily to analyze biomarkers by quantifying individual protein "spots" on a scanned image of a 2-DE product and by showing the separation between them. Additionally, these tools match spots between gels of similar samples to show, for example, proteomic differences between early and advanced stages of an illness.
- Protein sequence analysis is a branch of bioinformatics that deals with searching databases for possible protein or peptide matches using algorithms such as PEAKS, OMSSA, SEQUEST and X!Tandem, as well as functional assignment of domains, prediction of function from sequence, and evolutionary relationships of proteins.
- Structural proteomics concerns the high-throughput determination of protein structures in three-dimensional space. Common methods are x-ray crystallography and NMR spectroscopy.
- Interaction proteomics concerns the investigation of protein interactions on the atomic, molecular and cellular levels. See also the related article on protein-protein interaction prediction.
- Protein modification studies the modified forms of proteins. Almost all proteins are modified from their pure translated amino-acid sequence by so-called post-translational modification. Specialized methods have been developed to study phosphorylation (phosphoproteomics) and glycosylation (glycoproteomics).
- Cellular proteomics is a new branch of proteomics that aims to map the location of proteins and protein-protein interactions in whole cells during key cell events. It centers on techniques such as X-ray tomography and optical fluorescence microscopy.
- Experimental bioinformatics, a term coined by Matthias Mann, is a branch of bioinformatics as applied in proteomics. It involves the mutual design of experimental and bioinformatics methods to create (extract) new types of information from proteomics experiments.
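The peptide mass fingerprinting approach mentioned under protein identification above can be illustrated with a toy sketch. It assumes digestion with trypsin (cleavage after K or R, but not before P), a choice this article does not specify, and it scores each candidate protein simply by counting observed masses that fall within a tolerance of a theoretical peptide mass. The two database sequences and all function names are hypothetical; real search tools use far more elaborate scoring.

```python
import re

# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_digest(protein):
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_candidates(observed_masses, database, tolerance=0.01):
    """Rank proteins by how many observed masses match a theoretical peptide."""
    scores = []
    for name, sequence in database.items():
        theoretical = [peptide_mass(pep) for pep in tryptic_digest(sequence)]
        hits = sum(
            any(abs(obs - theo) <= tolerance for theo in theoretical)
            for obs in observed_masses
        )
        scores.append((name, hits))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-protein "database", invented for illustration only.
    database = {
        "protA": "MKTAYIAKQRQISFVK",
        "protB": "GASPVTCLNDQKEMHFRYW",
    }
    # Simulate a measured spectrum from the peptides of protA.
    observed = [peptide_mass(p) for p in tryptic_digest(database["protA"])]
    for name, hits in rank_candidates(observed, database):
        print(name, hits)
```

Counting matches is the simplest possible score; it ignores missed cleavages, modifications and the chance of random matches in a large database, all of which practical tools must model.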
Proteomics uses various technologies:
- One- and two-dimensional gel electrophoresis is used to identify the relative mass of a protein and its isoelectric point.
- X-ray crystallography and nuclear magnetic resonance are used to characterize the three-dimensional structure of peptides and proteins. Low-resolution techniques such as circular dichroism, Fourier transform infrared spectroscopy and small-angle X-ray scattering (SAXS) can also be used to study the secondary structure of proteins.
- Tandem mass spectrometry combined with reverse-phase chromatography or 2-D electrophoresis is used to identify proteins, either with database search tools such as PEAKS, OMSSA, X!Tandem and SEQUEST or with de novo algorithms, and to quantify the levels of proteins found in cells.
- Mass spectrometry (non-tandem), often MALDI-TOF, is used to identify proteins by peptide mass fingerprinting. This technology is also used in so-called "MALDI-TOF MS protein profiling", where samples (e.g. serum) are prepared with protein chips (SELDI-TOF MS), magnetic beads (the Bruker Daltonics protein profiling platform) or other methods of sample treatment, such as liquid chromatography, size exclusion and immunoaffinity. Protein peaks of interest must then be identified by tandem mass spectrometry. Protein profiling with MALDI-TOF MS could be highly useful in clinical diagnostics, but so far there has been little success in advancing it to clinical validation because of high analytical variation.
- Affinity chromatography, yeast two-hybrid techniques, fluorescence resonance energy transfer (FRET), and surface plasmon resonance (SPR) are used to identify protein-protein and protein-DNA binding reactions.
- X-ray tomography is used to determine the location of labeled proteins or protein complexes in an intact cell. It is frequently correlated with images of cells from light-based microscopes.
- Software-based image analysis is used to automate the detection and quantification of spots within and among gel samples. While this technology is widely used, it has not yet been perfected. For example, the leading software tools in this area tend to agree on the analysis of well-defined, well-separated protein spots, but they deliver different results and tendencies with less-defined, less-separated spots, so manual verification of results is needed.