The “streetlight effect” in proteomics


An underlying problem in proteomics research

Our current understanding of human, animal and plant biology is largely derived from knowledge provided by the study of the DNA code. However, this code is only one of the constituent elements of the central dogma of biology. DNA must be read and converted into proteins, the “workhorses” of the cell, responsible for coordinating and carrying out specific functions.

The introduction of high-throughput technologies, bioinformatics tools and methods based on artificial intelligence (AI) has advanced the field of proteomics in recent years. Although not yet “in the clinic”, so to speak, the study of proteins expressed in healthy and diseased states guides the development of diagnostic biomarkers, the identification of drug targets and the production of new biopharmaceuticals. Throughout the life sciences, the applications of proteomics are many and varied.

The Human Proteome Project (HPP), which aims to generate a map of the protein-based molecular architecture of the human body, has so far detected 93.2% of the human proteome, identifying 18,407 proteins.

“Proteomics has evolved from an isolated field to a comprehensive tool for biological research that can be used to explain biological functions,” write Yahui Liu et al.

The future of proteomics is undoubtedly bright. However, a comment article published in Nature Methods by Kustatscher et al. earlier this year drew attention to an underlying problem in the field: some proteins attract far more research attention than others.

The publication indicates that approximately 5,000 proteins (approximately 25% of the human proteome) account for 95% of all life science publications. Most of these proteins were already known to the scientific community in the pre-Human Genome Project era. Tumor protein 53 (p53), sometimes referred to as the “guardian of the genome” because of its role in DNA repair and cell division, is one of the most studied proteins. “One of the many frightening statistics revealed is the fact that p53 is being published on twice a day,” says Professor Kathryn Lilley, professor of cellular dynamics at the University of Cambridge and co-author of the publication.

Why does this annotation bias exist?

This inequality in protein annotation is due to a variety of factors, Lilley explains: “First, there are practical reasons why a protein may remain unannotated. This could be because it is expressed at low levels and therefore rarely ‘measured’ in an experiment.”

Extremely small proteins, or those with certain properties (such as being hydrophobic), can prove difficult for even the most sophisticated analytical technologies. Some proteins adopt unstable states that exist for only a fraction of a second yet play key biological roles; these so-called “ephemeral proteins” are probably overlooked in most studies.

“It may be that its corresponding gene or transcript does not show up as ‘interesting/significant’ in genomic studies, or is not associated with any disease state. Additionally, the protein may be unlike any other protein in terms of likely domain structure, well-documented motifs, or clear evolutionary trajectory,” says Lilley.

She describes the non-practical reasons as “less acceptable” to her: “There is safety in numbers in scientific research. If a protein is well studied, there may be more resources available that can be shared among different groups. If a protein is perceived to be of high interest by the scientific community, there is a greater chance that research results will be published through high-impact mechanisms, resulting in high citation and, therefore, greater opportunity for continued funding.”

This cycle may not be unique to the field of proteomics, and it reflects broader issues within scientific research. But in this case, it fuels what Lilley calls a “self-perpetuating microcosm of the well-studied proteome” at the expense of risk-taking.

“When studies uncover sets of proteins that require further investigation, it is frustrating to scan the literature to find that historically these proteins have been overlooked, with many simply deemed to be of no significant interest to pursue, not fashionable enough to attract funding, or generally considered to be a bit ‘boring’,” says Lilley.

Why are understudied proteins problematic?

The bias towards well-studied proteins limits our knowledge of cellular function and dysfunction, and ultimately hinders progress in life science research. “The understudied proteome contains many examples of proteins essential for proliferation, a key cellular process whose aberrant function underlies many diseases, with cancer being the most relevant in many research avenues. This bias will extend to most cellular processes, and so without functional annotation of this subset of proteins, we will have little or no chance of fully understanding how cells function,” says Lilley.

Many drugs used to treat human disease target proteins. Data from the DrugBank database suggest that the entire collection of US Food and Drug Administration (FDA)-approved drugs targets a total of 620 proteins, including transporters, enzymes, ion channels and receptors. “The understudied proteome contains a considerable number [of proteins] that should be druggable,” says Lilley.

To create a new drug, several stages of preclinical and clinical development are necessary. Laboratory research and preclinical trials rely on models that allow scientists to interrogate drug function in vitro and in vivo. However, if our basic knowledge of cellular mechanisms is flawed, our models may be flawed as well. “Knowing the function and role in disease of this considerable subset of the proteome may lead to a step change in drug discovery in the future,” notes Lilley.

The Understudied Proteins Initiative

Kustatscher and colleagues have shed light on the magnitude of the problem – but how should it be dealt with? A change in proteomics approaches is clearly needed to break the perpetual cycle. The Understudied Proteins Initiative, a new Wellcome Trust-funded project by Kustatscher et al., offers a solution: a coordinated effort by the functional proteomics community. The initiative suggests that enough data be collected about an understudied protein – perhaps about its interactions, location or expression – so that hypotheses about its function can be made. “In an ideal world, researchers could perform systems-level functional testing, where each protein is tested for a specific function. A good example of this is testing whether or not a protein binds to RNA. There are many routine methods to achieve such a functional screen, which can also be applied in many conditions; some proteins can only bind to RNA under certain circumstances,” says Lilley.

Using these functional data, it would then be easier to clarify which area or laboratory is best suited to conduct further detailed studies of a given protein. Essentially, the task is divided into two parts: large-scale pre-characterization by omics scientists, followed by targeted molecular biology studies. “More system-wide studies will require agreement on the biological system, sets of tested conditions, sharing of resources, and a holistic set of methods to ‘push and poke’ the understudied proteome,” says Lilley. “What will be particularly essential will be data sharing, curation, database integration and the creation of dynamic cellular models. Drawing on resources such as MuSIC 1.0, a hierarchical map of the cell from the Ideker laboratory, is a very good starting point.”

She continues: “As a word of warning, however, the task at hand is almost incalculable in size. We have not yet adequately calculated the size of the proteome. If one takes into consideration the number of proteoforms that may exist, in other words, the number of chemical entities made distinct by post-transcriptional and post-translational processing, and the probable combinatorial nature of this processing, the size of the proteome increases by several orders of magnitude.”

No matter how big the challenge is, a start has to be made somewhere. The Understudied Proteins Initiative has issued an open invitation to researchers, outlining its “roadmap” for the project. As a first step, a freely accessible survey has been launched, which presents a randomly chosen human protein and asks the user to assign it an annotation level. Next, the survey asks the user to describe the tools, resources, and considerations they would suggest for this assessment.

“Based on the survey responses, we aim to define the challenge of a community effort to combat protein annotation bias. We will present and discuss the results in a workshop,” the initiative’s leaders state. The main questions to be addressed during the workshop are:

  • What new information about an uncharacterized protein would trigger detailed mechanistic studies?
  • What tool(s) would provide this information?
  • How could a consortium be structured?
  • How would information effectively reach molecular biologists to drive change?

Getting involved

Some of science’s greatest triumphs have come from taking a risk. It seems imperative – arguably now more than ever – that researchers feel confident and comfortable pursuing studies of lesser known or less understood proteins, regardless of the anticipated analytical challenge or the perception that the protein is “dull”. Who knows what we might find – perhaps solutions to some of the toughest scientific puzzles of our time?

The Understudied Proteins Initiative is leading the way and encouraging the community to get involved by participating in the survey and spreading the word.

“By providing basic molecular characterization of all proteins, the Understudied Proteins Initiative will catalyze mechanistic investigations of understudied proteins, stimulate new biomedical research, and enhance our understanding of the human proteome and its role in disease,” states the Understudied Proteins Initiative.

