Fifteen years and two months ago, humanity came to know where it stood in the macrocosm of life. An issue of the journal Nature, published in February 2001, carried a scientific paper describing the near-complete sequence of letters making up the DNA, the genetic material, of modern humans. This sequencing of the human “genome” by an international consortium of scientists was the culmination of fifteen years of political, legal, ethical and scientific intrigue, cost more than a billion US dollars, and marked the birth of big-science in biology. We had indeed taken a big stride forward in definitively knowing where we came from and predicting where we might end up.

Fast forward seven years to the end of 2007, when a meeting report from Cambridge, UK, announced that an international consortium of scientists would sequence the genomes of a whopping 1,000 humans! The idea was to catalogue the range of genetic variation found across human populations. If it took 15 years and a billion dollars to sequence the first human genome, imagine the scale of effort and money that would be needed to sequence a thousand. Were these scientists delusional? Clearly not: the effort, which began in 2008, was declared complete in 2015. The genomes of 1,000 humans from across the world were sequenced and studied in a matter of seven years, and at a cost that was about a hundredth of the pioneering human genome project’s budget.

Since the mid-1990s, the cost of sequencing DNA had been decreasing at a rate approximating Moore’s law: the cost more or less halved every other year. In the middle of the Noughties, however, sequencing costs started deviating from Moore’s law, declining a lot faster than expected. This sudden switch corresponded to a major turning point in the recent history of genetics: the development of what is now referred to as next-generation sequencing. Today, a genetic book that is a million letters long can be read for a hundredth of a dollar, compared with about $5,000 fifteen years ago. Had the cost curve not deviated from Moore’s law in the mid-2000s, we would still be spending over $50 to read a million letters of our DNA. This technological revolution not only decreased the cost of sequencing, but also brought down the time it took to sequence a human genome. Today, using a single next-generation sequencing machine, one can sequence tens to hundreds of human genomes in a matter of days or even hours. In fact, large genome sequencing projects that aim to associate genetic signatures with disease by sequencing hundreds of human genomes are becoming common.
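For readers who like to check the arithmetic, the comparison above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not the calculation behind the figures quoted in this article: the function names are made up for the purpose, and it assumes a strict halving every two years starting from $5,000 fifteen years ago. Under that assumption the projected cost comes out a little under $30 per million letters, the same order of magnitude as the "over $50" quoted above; the exact number depends on the starting point and halving rate one assumes.

```python
import math

def projected_cost(start_cost, years, halving_period_years=2.0):
    """Cost after `years` if it halves once every `halving_period_years`."""
    return start_cost * 0.5 ** (years / halving_period_years)

def effective_halving_time(start_cost, end_cost, years):
    """Halving period (in years) implied by an observed drop in cost."""
    return years / math.log2(start_cost / end_cost)

# Figures quoted in the text, in dollars per million letters of DNA.
START_COST = 5000.0   # fifteen years ago
CURRENT_COST = 0.01   # today, a hundredth of a dollar
YEARS = 15

# Assumption: strict halving every two years (the Moore's-law-like rate).
moore = projected_cost(START_COST, YEARS)
halving = effective_halving_time(START_COST, CURRENT_COST, YEARS)

print(f"Cost if halving every two years: ${moore:.2f}")
print(f"Observed fold reduction: {START_COST / CURRENT_COST:,.0f}x")
print(f"Implied halving time: {halving:.2f} years")
```

On these numbers, the observed drop is about 500,000-fold, which corresponds to the cost halving roughly every nine or ten months rather than every two years.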

The success of the 1,000 Genomes Project has prompted adventurous scientists to attempt to sequence the genomes of 10,000 vertebrates to help us understand the evolution of animal life. After all, evolution, which is central to life as we know it, is built into our DNA, and the more genomes we read, the better we understand life.

Curiously, next-generation technologies that read static genome sequences can help us study dynamic physiological processes as well. Living organisms are made of cells, and a cell is essentially a factory consuming and producing a massive array of chemicals that determine its function. The nature of chemicals present in a cell changes with time and place, though its genome sequence itself may not. And genomics can be used to measure these changes, across thousands of chemical entities at once. This marks a departure from the more traditional molecular biology, which can measure these dynamics only one molecule at a time. Scientists who pursue research into genomes are ambitious people, and they came up with the ENCODE project in the mid 2000s. The ENCODE project aimed to perform the equivalent of several thousand human genome projects to describe molecular events that help interpret our genome sequence into processes that give life to the cell. Its findings were reported in the journal Nature in September 2012. Though it is believed in many quarters that these scientists might have botched up their analysis of these data, resulting in misleading conclusions, the monumental effort and coordination put into the project is astounding.

*** 

Biological sciences have for the most part been an agglomeration of many pieces of small-science: individual laboratories pursuing independent research goals, each costing fairly small packets of money. The birth of molecular biology, genetic engineering and the modern day buzz-phrase gene editing, came about largely by small-science. But the field of genomics has brought big-science to the front line. It can be reasonably argued that nearly all modern-day small-science in molecular biology is affected at some level by data produced by large genome sequencing projects. Some argue however that the attraction of fame brought by the execution of a grand science project has resulted in the perverse extrapolation of big-science to inappropriate subfields of biology. In other words big-science today includes efforts which are little more than bombastic posturing. There are fears that small-science might fall prey to the goliathical big-science.

The prominent biochemist Gregory Petsko, in an article published in the journal EMBO Reports in 2009,  criticised the then-fashionable big-science project attempting to work out the molecular structures of thousands of proteins. He referred to this effort as one which “leaches scarce resources away from individual projects”, one whose “value for training young scientists is nil”, and one which claims to develop technologies “of limited use to the practising scientist”. Do these arguments hold in the context of genomics?

Big-science projects are expensive. Many of these projects are coordinated by large centres. For example, the Broad Institute at MIT, USA and the Sanger Institute near Cambridge, UK are genome sequencing centres of long-standing prominence. Science funding is scarce, and it becomes a problem for many small, innovative laboratories if a significant chunk of this small pot of money is diverted to a few people at centres for big-science. Small-science is important because big-science cannot, in the foreseeable future, find enough depth to enable us to understand the intricacies of life. However, big-science can provide an invaluable substrate on which small-science can thrive better than ever.

Whether big-science in biology helps train young scientists well raises the bugbear of what qualifies as good training. If good training in science is all about making a young person learn to develop hypotheses, test them rigorously, write up a report on her or his conclusions and get it through the process of rigorous peer review, big-science may not tick all the boxes. But there are places where big-science scores, and these include significant method development (on par with small-science of the right vintage) and the logistics of managing and handling large data. The latter will be seen as valuable by those who take the view that not all young trainees in science can or need to pursue a full career in academic research.

Big-science, of the right kind, is one that, in Petsko’s words, “supports and generates lots of good little science.” There is little doubt that the human genome project has been the right kind of big-science. It has, directly or indirectly, produced data that are used pretty much every day in molecular biology laboratories across the world. It has spawned technologies that brought genome sequencing out of the confines of large centres to within the ambit of small laboratories, enabling crowd-sourcing of thousands of independent small-science data points to create unintended big outcomes. It is fair to say that what qualified as big-science 15 years ago has now become a cog in the wheel of a small-science effort.

Thanks to the human genome project and the ecosystem of technological development it encouraged, the web of genomics has spread well beyond the confines of human genetic variation.  Large sequencing centres and even individual laboratories now routinely sequence hundreds of bacterial genomes during the course of a single study to track the spread of global pandemics. Laboratories across the world, including ours, often use next-generation sequencing of hundreds of bacterial genomes at once to observe evolution—including the development of antibiotic resistance—happen in real time, and to understand how the genetic variations that evolution produces generate novel cellular processes.

In 2011, an epidemic of bacterial food poisoning spread rapidly across Germany, killing 50 people. The source of this epidemic was probably some Egyptian “organic fenugreek” carrying a deadly form of E. coli. Within a few weeks of the detection of this infection, a group of scientists had obtained a near-complete genome of the causative E. coli and revealed its novel genetic makeup. This may not have quite helped contain the infection or treat it, but it was proof that genome sequencing can now be done on clinically relevant timescales.

An even more dramatic development came about a year later, when scientists in the UK sequenced the genomes of several isolates of a bacterium that was causing havoc in a hospital, and used this data to trace the origin of this bug to a hospital worker. Thanks to this finding, this person could be quarantined, thus containing the spread of the infection.

Today, we even have a USB stick-sized DNA sequencer that can be carried to the remote site of an epidemic, hooked up to a laptop, and used to sequence DNA extracted with minimal care in a matter of hours. This could aid epidemiology as well as diagnostics at an unprecedented scale. Clearly, big science has made it to the bedside.

 ***

Sequencing can also illuminate history. Genome sequencing has helped trace the origins of the causative agent of modern day plague to the bacterium responsible for the epoch-defining 14th century Black Death. Scientists now sequence genomes of fossilised humans and their relatives to trace our ancient ancestry. At the beginning of this year, lower-resolution techniques that exploit publicly available human genome sequence data were used by three scientists from Kolkata to trace the beginnings of caste-limited endogamy to the Gupta period of Indian history. This provided much needed genetic support to the most accepted version of the ideology-loaded historical discourse.

Thus, the human genome project was transformative, not only in helping debunk narcissistic theories of human uniqueness, but also in enabling many vignettes of small-science of interest not only to academic research but also to medicine and public curiosity.

On the other hand, has the more recent and much maligned ENCODE project made small-science better? The jury is still out, but the complexity of cellular physiology, unlike the largely static form of a genome sequence, is such that a coordinated, predetermined project such as ENCODE can hardly do justice to it. In the words of Michael Eisen, a leading scientist, open access advocate and in fact one of the advisors for the ENCODE project, the project makes sense only if “someone—or really a committee of someones—who has no idea about my work can predict (sic) precisely the data that I would need and generated it for me”, years ahead of time. This sounds rather far-fetched.

While the debate for and against big-science will be waged for some time to come, the situation in India seems agnostic to these developments. There are hardly any large sequencing centres of a scale comparable to those in the US, Europe, China or Japan. Funding, even if frustratingly and endearingly chaotic, is available in plenty when normalised to the size of our research community. It seems more favourable to small-science, though a few relatively large projects attempting to find genetic associations for certain diseases have been recently funded.

The right kind of big-science may enable lots of small-science, but big-science cannot but be built on the edifice of a large body of small-science. The human genome project itself might have lasted about 15 years. But that it became reality rested on over 150 years of eclectic small-science, both fundamental and that driven by a need to solve medical problems. In the middle of the 19th century, towards the end of the age of the polymaths and the philosopher scientists of Europe, Gregor Mendel, a monk, performed breeding experiments with pea plants. Thanks to luck and his astute observation, he noted the “particulate” nature of inheritance. In other words, he showed that each trait of an organism—say the colour or shape of a pea—was governed by a distinct, discrete part of its genetic material. His contemporary, Charles Darwin, went around the world in a ship, and like a true naturalist made observations on many life-forms. He was astounded by the variation in traits that he saw among related animals and birds, and postulated the theory of natural selection and evolution.

The work of Mendel, initially ignored and later rediscovered in 1900, and that of Darwin sparked an urgent thirst to identify the genetic material. A hundred years ago, a financially strained London medical man called Frederick Twort was, for reasons best known to himself, trying to discover viruses that do not cause disease. He ended up unveiling bacteriophages, or viruses that prey on bacteria. Notwithstanding a controversy over who can claim to have discovered the bacteriophage first, Twort’s labours heralded the beginnings of molecular biology. Resting on the shoulders of a series of studies on the genetics of bacteria and bacteriophages, Alfred Hershey and Martha Chase showed in the 1950s that it is the DNA of a bacteriophage that enables it to reproduce. Earlier, in the late 1920s, an epidemiologist called Frederick Griffith had observed that mixing virulent but dead pneumonia-causing bacteria with their benign relatives resulted in the latter gaining the ability to cause disease. This paved the way for Oswald Avery, Colin MacLeod and Maclyn McCarty to show, in the 1940s, that the factor that caused this transformation was DNA. Together, bacteria and bacteriophages, alongside odd fungi and fruit flies, helped establish that DNA was the genetic material holding the secrets of inheritance.

The stage was set for Rosalind Franklin, James Watson and Francis Crick to show, in 1953, what DNA looked like in molecular terms and how it might be replicated faithfully, an essential requirement for reproduction. It became clear that DNA was a book written in an alphabet of four letters. To unearth the secrets of our genes, all we had to do was to read this book. Easier said than done. Enter Frederick Sanger, the winner of two Nobel Prizes in Chemistry and a latter day home gardener, who taught the world how to read the language of our genes in the 1970s. Again, bacteriophages played a critical role in this, for the first few complete genetic materials to be read were those of bacteriophages and other viruses. The genetic book of a bacteriophage is a few thousand characters long, while that of humans is a string of a few billion letters. Not easy to scale up. Then came a man called Leroy Hood, who took Sanger’s method of sequencing DNA and automated it. That gave a few humans the confidence to go ahead and try to read our own genetic book cover to cover.

In 1986, the US Department of Energy coordinated a meeting at the Santa Fe Institute, at which the decision was taken to seek funding to sequence the human genome and so attempt the first big-science project in molecular biology. The financial authorities of the US government, possibly lured by the splash that a mega project of this nature would make, and the National Institutes of Health came on board, and soon the project was underway. Many technical challenges had to be met, including those that had little to do with pipettes and beakers and everything to do with computers, and these were handled with ingenuity. Along the way, the genomes of several bacteria, including that of the famous E. coli, were sequenced, alongside those of baker’s yeast, transparent worms and fruit flies. Finally, the new millennium dawned and it was time to make the grand announcement that we knew ourselves at last. Over the course of this effort, the right to ownership of data was subsumed by an urgent need to make data openly accessible as soon as they were generated, and as genomics started to strain the scientific world’s data storage space with more and more open data, Big Data analytics became the next great thing to look forward to in biology.

To summarise, big-science in biology is here to stay. Its reach is immense, and its fruits trickle down to influence a lot of small, innovative science. An agglomeration of assorted independent small-science can produce great, unintended outcomes, not least making a revolutionary big-science project possible. Even if the demarcation between the two approaches to science has not quite bitten the Indian science establishment, it is likely that it will in the near future. How we manage to foster the right sort of big-science, while maintaining a thriving pool of small, independent science, will eventually help determine our standing in the world of science.