Systems Biology: Leveraging Biological Complexity and Computational Power

By Vicky Auyeung

When studying how a bicycle works, we can break it down into its component parts (e.g. the gears, brakes and wheels) and study each part in isolation. But even if we became experts on the gears, the brakes or the wheels, we would still need to integrate knowledge across these parts to understand how the whole bicycle works, or how a change in one part would affect the others. The same rationale applies to the study of disease mechanisms. While early studies in molecular biology focused on individual genetic mutations and their consequences, the recent availability of large-scale ‘omics’ datasets allows us to study disease pathways holistically, at the systems biology level.

In the past, the techniques available to measure and analyse molecules were far more limited, restricting scientists to the study of a handful of molecules at a time. In recent years, however, advances in profiling technology and exponential increases in computing power, coupled with falling costs, have enabled large-scale measurement of hundreds, or even hundreds of thousands, of molecules. ‘Omics’ refers to this large-scale measurement and is added as a suffix to the type of molecule being measured. Examples of layers that are central to disease include genomic variation (‘genomics’), gene expression (‘transcriptomics’), proteins (‘proteomics’) and small biochemicals, or metabolites (‘metabolomics’). Scientists have increasingly moved from studying one type of dataset at a time (‘single omics’) to integrating several at once (‘trans-omics’). This shift reflects the complex nature of most diseases, which arise from the combined action of molecular pathways spanning several of these layers.

Figure: While traditional molecular biology approaches focus on individual genetic mutations and their consequences, omics approaches focus on the relationships between molecules within each layer. The development of technological profiling methods for each layer (‘Measurement’ column) makes a systems biology approach using integrated omics (trans-omics) possible. NGS = next-generation sequencing; NMR = nuclear magnetic resonance spectroscopy. Figure taken from Yugi et al. (2016).
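To illustrate what a first step of ‘integrating’ layers can look like in practice, the Python sketch below lines up several omics layers measured on the same people, so that each participant ends up with one combined profile. The participant IDs, feature names and values are hypothetical, and the function name integrate_layers is invented for this example; real trans-omics methods go on to model how molecules in different layers relate to one another, which this sketch does not attempt.

```python
# Minimal sketch: line up several omics layers by participant ID.
# All IDs, feature names and values below are hypothetical toy data.

# Each layer maps a participant ID to that participant's measurements.
genomics = {
    "P01": {"variant_X_allele_count": 1},
    "P02": {"variant_X_allele_count": 0},
    "P03": {"variant_X_allele_count": 2},
}
transcriptomics = {
    "P01": {"GENE_A_expr": 5.2},
    "P02": {"GENE_A_expr": 3.9},
}
metabolomics = {
    "P01": {"glucose_mM": 6.8},
    "P02": {"glucose_mM": 5.1},
    "P03": {"glucose_mM": 7.4},
}


def integrate_layers(layers):
    """Merge layers into one profile per participant.

    Only participants measured in every layer are kept (an 'inner join'),
    because many simple integration methods need complete profiles.
    """
    shared_ids = set.intersection(*(set(layer) for layer in layers.values()))
    combined = {}
    for pid in sorted(shared_ids):
        profile = {}
        for layer_name, layer in layers.items():
            for feature, value in layer[pid].items():
                # Prefix each feature with its layer so names cannot clash.
                profile[f"{layer_name}:{feature}"] = value
        combined[pid] = profile
    return combined


merged = integrate_layers({
    "genomics": genomics,
    "transcriptomics": transcriptomics,
    "metabolomics": metabolomics,
})
for pid, profile in merged.items():
    print(pid, profile)
```

Keeping only participants measured in every layer is one simple design choice: here it drops P03, who has no transcriptomics data, and larger studies often need more careful handling of missing layers.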

Though omics analysis is still young, its benefits are already apparent in the field of personalised medicine, where studies have helped identify new mechanisms of disease. One example is type 2 diabetes, a disease characterised by dysregulated blood sugar. Research has shown that people with type 2 diabetes have higher blood levels of certain biochemicals, such as long-chain fatty acids and amino acids. From this evidence alone, it might be tempting to conclude that these molecules simply reflect a diet high in fats and red meat. However, this is only part of the story: adding genetic data has shown that these biochemicals also reflect a higher genetic tendency to develop insulin resistance and to deposit excess fat around the abdomen.

Another benefit of trans-omics is that omics datasets can be used to design personalised strategies for improving overall health. In one study (Price et al., 2017), researchers collected genetic, metabolic, protein and gut bacterial data from 108 participants every three months over nine months. After each collection, they gave participants personalised lifestyle advice based on their biological data. The study was a two-way success: the researchers identified molecular pathways linked to key biomarkers of disease, and the lifestyle advice helped participants reduce levels of key inflammatory biomarkers and of markers of cardiovascular and metabolic disease, such as total cholesterol.

Data alone is not enough

Although omics provides many opportunities, it also brings new challenges. One is ensuring that measurements are robust and accurate. For example, whole genome sequencing gives highly reproducible results, but data quality depends on how much of the genome is measured (‘coverage’) and how many times the same genetic sequence is read (‘depth’). Similarly, the protein abundances detected in a sample can vary greatly depending on the technology used to measure them. Studies must therefore be carefully designed so that technical and human errors and biases can be detected and corrected.
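To make these two quantities concrete, the Python sketch below computes both from a toy list of how many reads cover each position of a short stretch of DNA. It follows the definitions used above: ‘coverage’ is taken as the fraction of positions read at least a minimum number of times (often called breadth of coverage), and ‘depth’ as the average number of reads per position. The per-position counts and the function name coverage_summary are invented for illustration; real pipelines derive such counts from aligned reads using dedicated tools.

```python
# Minimal sketch: summarise breadth of coverage and mean depth
# from per-position read counts. The counts below are hypothetical.

def coverage_summary(read_counts, min_depth=1):
    """Return (breadth, mean_depth) for a list of per-position read counts.

    breadth    -- fraction of positions covered by at least `min_depth` reads
    mean_depth -- average number of reads covering each position
    """
    if not read_counts:
        raise ValueError("read_counts must not be empty")
    covered = sum(1 for n in read_counts if n >= min_depth)
    breadth = covered / len(read_counts)
    mean_depth = sum(read_counts) / len(read_counts)
    return breadth, mean_depth


# Ten positions: two were never read, the rest were read several times.
toy_counts = [0, 3, 5, 4, 0, 6, 7, 5, 4, 6]
breadth, depth = coverage_summary(toy_counts)
print(f"breadth of coverage: {breadth:.0%}, mean depth: {depth:.1f}x")
```

For these toy counts the sketch reports 80% breadth and a mean depth of 4.0x. Raising min_depth (for example to 5) gives a stricter notion of ‘covered’ and lowers the breadth to 50%, which shows why a single average depth can hide patchy coverage.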

After data collection and processing comes data analysis. Basic statistical tests such as t-tests and ANOVAs work well for a single comparison, but when they are applied separately to thousands of variables within and between datasets, some results will appear significant purely by chance: at the conventional 5% threshold, roughly 1 in 20 tests of variables with no real effect will still come out ‘significant’. Tackling these larger datasets therefore requires more sophisticated statistical methods, such as those that correct for multiple testing. Such methods have been developed to address a wide range of analytical challenges and are currently sufficient for handling omics datasets. However, future technological advances will likely reveal even more molecules to analyse (for example, the largest metabolomics study to date detected 1,400 biochemicals in human blood, while over one million are estimated to exist in the human metabolome!).
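The problem with running one basic test per variable can be seen in a short simulation. The Python sketch below assumes 10,000 variables with no true difference between groups; under that assumption each test’s p-value is equally likely to fall anywhere between 0 and 1, so the p-values are drawn directly from a uniform distribution instead of running real t-tests. It then applies the Benjamini-Hochberg procedure, one widely used correction that controls the false discovery rate. It is shown here only as an example of the more sophisticated methods mentioned above, not as the method used in the studies cited.

```python
import random


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which p-values remain significant
    after Benjamini-Hochberg false discovery rate control."""
    m = len(p_values)
    # Rank p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every p-value ranked at or below that cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


random.seed(42)
n_tests = 10_000
# Assumption: no variable has a real effect, so each p-value is uniform on [0, 1).
null_p_values = [random.random() for _ in range(n_tests)]

naive_hits = sum(p < 0.05 for p in null_p_values)
bh_hits = sum(benjamini_hochberg(null_p_values))
print(f"uncorrected p < 0.05: {naive_hits} of {n_tests}")
print(f"after Benjamini-Hochberg: {bh_hits}")
```

Without correction, roughly 500 of the 10,000 variables pass the 0.05 threshold even though none has a real effect. After the correction, usually none survive, because each p-value must beat a threshold that shrinks with the number of tests performed.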

Though systems biology is not a new concept, recent technological advances have transformed it into a discipline that combines increasingly complex biological datasets with computational and statistical modelling. This growing complexity will eventually call for larger-scale, more automated statistical and machine learning methods to identify and validate novel disease mechanisms.

Further reading:

1. Yugi K, Kubota H, Hatano A, Kuroda S. Trans-Omics: How To Reconstruct Biochemical Networks Across Multiple ‘Omic’ Layers. Trends Biotechnol. 2016;34:276–90.

2. Price ND, Magis AT, Earls JC, Glusman G, Levy R, Lausted C, et al. A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nat Biotechnol. 2017;35(8):747–56.

About the author

Vicky Auyeung is a PhD student in Metabolic and Cardiovascular Disease at the University of Cambridge. Her work focuses on integrating genomic, metabolomic and phenotypic data to identify shared mechanisms between rare, inherited metabolic disorders and complex diseases.

If you want to learn more about using R or Python to analyse your results, the Biochemical Society is running online training courses ‘R for Biochemists 101’ and ‘Practical Python for beginners: a Biochemist’s guide’.
