Scientific activity 22 May 2020

Robert Penner on how curiosity and a little help from friends have led to his COVID initiative

Robert Penner is a mathematician whose early work in topology and geometry has found applications in high energy physics and, more recently, theoretical biology. He has held the René Thom Chair in Mathematical Biology at IHES since 2014, after having been a frequent visiting professor for decades.

In a paper entitled “Backbone Free Energy Estimator Applied to Viral Glycoproteins”[1], recently published in the Journal of Computational Biology, he proposes a method to predict promising targets for antiviral drugs or vaccines across all viruses. A sequel in the same journal, entitled “Conserved high free energy sites in human coronavirus spike glycoprotein backbones”[2], applies these methods specifically to the known human coronaviruses, thus pushing forward the current efforts to fight SARS-CoV-2, the virus causing COVID-19.

Such a timely result is a rare achievement in the life of a mathematician, and in this article Robert Penner recounts the adventurous steps and encounters that have led him here, along a sometimes tortuous, but very exciting path.

My first paper on RNA was published in 1992, co-authored with my close friend and onetime colleague Mike Waterman, sometimes called the “father of computational biology.” We would celebrate (bemoan?) the beginning of each academic year at USC with a deep-sea fishing trip, for it is late summer when the yellowtail tuna run in the warm waters off Southern California. As we waited for something to bite, he mentioned his recent work, which I immediately recognized as a kind of bastardized version of Poincaré duality. This led to our first paper on spaces of RNA secondary structures, which was well received but had no major impact until much later. But it set the ball in motion, and over the next decades he invited me to any seminar he thought might be accessible and of interest to me. Some years later, we ran a private meeting at USC on macromolecules funded by the bio-philanthropist Peter Preuss, and among the star-studded attendees was Alexei Finkelstein, a leading world authority on proteins who plays crucial roles later in this story. He and I instantly became friends. His book entitled “Protein Physics: a course of lectures,” written with his teacher Oleg Ptitsyn, is a masterpiece, and I devoured it.

Macromolecules–specifically, RNA and protein–were my gateway to biology, a separate and comprehensible piece of a dauntingly enormous puzzle. Macromolecules, after all, are essentially one-dimensional objects that interact along sites, just as the strings of high-energy physics do. And I immediately saw ways to extrapolate the basic combinatorics of my earlier work in string theory up 25 or so orders of magnitude, from the Planck scale to the Angstrom scale. Once in a seminar at Caltech, the eminent physicist John Schwarz laughed out loud at my remark, because one of his great insights decades earlier was the same but different: strings were originally a model for protons, whose combinatorics he had scaled down twenty-odd orders of magnitude, with exactly the same remark about the invariance of combinatorics under rescaling. Down and up, up and down.

After nearly 25 years at USC, in the early 2000s I undertook a move to Aarhus, Denmark. Once, while my friend and colleague there, Joergen Andersen, and I were making dinner, he asked if I had any crazy ideas for applied Teichmüller theory. I offered up two, one on color quantization and the other on the topology of proteins, the latter of which I had already hatched after the Preuss seminar. This evolved into our first paper on protein topology and later one on protein geometry, basically for us the natural transition from the Z/2-graph connections of fatgraphs to the SO(3)-graph connections we finally studied with a large team in Aarhus spanning multiple academic departments, from molecular biology to biophysics to physics to nanotechnology and mathematics. It was actually during a visit of Alexei Finkelstein to Aarhus, in his last moments there as we all sat together in the coffee lounge, that the passage to SO(3)-graph connections as protein descriptors came to light, following upon tools that Joergen and I had developed earlier. Alexei and I apparently have a habit of making progress in the last seconds of our visits together…as will happen again.

This turned into a multi-year project leading to a rather spectacular result. Proteins are basically–and over-simplistically–a concatenation of peptide groups, small units composed of six atoms that are forced to lie in a plane owing to quantum effects. Each such plane admits a canonical orientation that derives from the chemistry and contains a specified vector in the direction of the peptide bond it contains. Voilà: a peptide group gives a positively oriented orthonormal three-frame, so any ordered pair of such frames gives a well-defined rotation of three-dimensional space, or in other words an element of the Lie group SO(3). We took an unbiased and high-quality subset of the Protein Data Bank (PDB), the repository of all known three-dimensional protein structures, computed the rotations of all the hydrogen bonds between peptide groups within it, and found, quite remarkably, that Nature employs only about 33 percent of the volume of SO(3). Moreover, within that 33 percent, the data cluster into thirty well-defined regions, which reproduced, refined and extended the known classification of such hydrogen bonds. The results were sufficiently striking that the paper appeared in the prestigious journal Nature Communications, a non-trivial feat coming, as it did, from outsiders to the field.
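For readers who like to see the construction concretely, here is a minimal sketch of how a planar peptide group can yield an orthonormal frame, and how an ordered pair of frames yields a rotation. It is an illustration under stated assumptions, not the published method: the choice of atoms (the carbon and nitrogen of the peptide bond plus the carbonyl oxygen), the function names, and the convention for the relative rotation are all hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def peptide_frame(c, n, o):
    """Right-handed orthonormal frame (u, v, w) from three atoms of one plane.

    ASSUMPTION: u points along the peptide bond C -> N, v lies in the plane
    toward O, and w = u x v. The three atoms must not be collinear.
    """
    u = unit(sub(n, c))
    r = sub(o, c)
    # Gram-Schmidt: remove the component of r along u, then normalize.
    v = unit(sub(r, tuple(dot(r, u) * x for x in u)))
    w = cross(u, v)
    return (u, v, w)

def relative_rotation(f1, f2):
    """3x3 rotation matrix expressing frame f2 in the coordinates of frame f1.

    ASSUMPTION: entry R[i][j] is the dot product of the i-th vector of f1
    with the j-th vector of f2; other conventions are equally valid.
    """
    return [[dot(a, b) for b in f2] for a in f1]

def rotation_angle(R):
    """Angle of the rotation in radians, from trace(R) = 1 + 2 cos(angle)."""
    t = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, t)))

# Toy coordinates (not real protein data): the second plane is the first
# one turned by a quarter turn about the common normal.
f1 = peptide_frame((0, 0, 0), (1, 0, 0), (0, 1, 0))
f2 = peptide_frame((0, 0, 0), (0, 1, 0), (-1, 0, 0))
R = relative_rotation(f1, f2)
print(rotation_angle(R))
```

Because both frames are orthonormal and right-handed, the matrix of dot products is automatically an element of SO(3), which is the point of the construction described above.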

There things sat for about five years. I continued working in math/physics and on RNA, while this database of protein geometry sat quietly in repose. I wanted to move from Aarhus because, as it turned out, I was not so good at socialism and grew tired of paying 108 percent marginal tax on my Danish income. No kidding!

Having visited the IHES on and off for decades, I jumped at the chance to call it my part-time and now full-time academic home, not least for the chance to interact with Misha Gromov, who had been a critical sounding board for me by email for years. We have both spent decades studying biology and attending seminars, and Paris is a treasure trove of biological talent just as it is for mathematics or physics. Having discovered on arrival that I held the René Thom Chair in Mathematical Biology, and with my understanding limited to macromolecules, I spent my first several quarterly visits to the IHES reading and reading: thousands of pages of biology texts and then research papers.

Five or so years later, in mid-2019, Alexei Finkelstein enters again, since Misha and I had invited our common friend to spend a few weeks with us at the IHES. My own intentions were the selfish pursuit of trying to figure out what to do next with my protein clusters, and we spent several weeks speaking of this, among other things, without conclusion.

First thing in the mornings in France, I always start with a small regimen of exercises and calisthenics while watching the American PBS news from the previous evening. On one such morning during Alexei’s visit, there happened to be a science segment with Anthony Fauci from NIH talking about the freshly stated goal of finding a universal vaccine target for influenza, something about sexy new visualization methods and a remark about some protein or other, which a little online homework identified as hemagglutinin. I had one tool at my disposal, one stick with which to poke this protein: I could run my method from Denmark and see which clusters occurred among its hydrogen bonds between peptide groups. Here I can only say that there was a lucky accident, for one of the hydrogen bonds was incredibly rare: among the 1166165 bonds in the database, influenza hemagglutinin exhibited one bond from the cluster called B5e, the second-least populous with only 295 examples. This jumped off the page and showed how incredibly rare this hydrogen bond was in the universe of all hydrogen bonds between peptide groups in the whole PDB.
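To put the two figures quoted above side by side, the following short calculation works out what share of the database the B5e cluster represents. Only the two counts come from the text; the variable names are illustrative.

```python
# Counts quoted in the text above.
total_bonds = 1166165   # hydrogen bonds between peptide groups in the database
b5e_bonds = 295         # members of the cluster called B5e

fraction = b5e_bonds / total_bonds   # share of all bonds that fall in B5e
one_in = total_bonds / b5e_bonds     # "one bond in every ..." form

print(f"B5e share of the database: {fraction:.4%}")
print(f"Roughly one bond in every {one_in:.0f}")
```

In other words, the cluster holds roughly 0.025 percent of the database, or about one bond in four thousand.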

I showed Alexei and Misha, and we discussed other aspects of this fascinating protein hemagglutinin. But it was not until the very last seconds of Alexei’s visit, when he came to say goodbye–just like in Aarhus six or seven years before–that we at once said: the bond is so rare that if we can target it with a drug or vaccine, then such a drug or vaccine is unlikely to have side-effects! It was a shared eureka moment–less momentous than it seemed at the time, I suppose–but nevertheless a good insight that brought to the forefront the idea of using the protein database of clusters to find vaccines. The train had left the station, as Alexei left for Italy.

The first months of exploration were confused. I had only the clusters, so membership in a small one like B5e was obviously remarkable. I knew from the outset that there could be outliers in the bigger clusters which were equally so, but I had no sensible way to compare them. I nevertheless set about awkwardly studying whole collections of viral glycoproteins, with the same result: B5e and a couple of the other small clusters typically occurred. A pattern was already emerging. Also, my first impression of remarkable hydrogen bonds, or exotic ones as I came to call them, was that they pinpointed places of extreme geometry on the viral glycoprotein, places that stuck out a lot and most especially stuck in a lot. This was not unreasonable since, after all, it had been geometry that pinpointed them. It was a fun if misguided enterprise, virus after virus, finding an exotic site and feeling a rush of gotcha! each time, like when you finally swat an annoying fly.

I was compiling a list of these exotic sites and planned a paper with a detailed analysis of influenza and a supplementary table of viral targets. Alexei and I were back and forth online daily, with him now back home in Pushchino and a fellow named Sergiy Garbuzinskiy from his lab helping me with the analysis. Misha and I were in extended discussions on this every day. A joint paper by Alexei and me was envisioned and even written, entitled “Universal Influenza and Dengue Fever Targets.”

In the course of compiling the table to squash all viruses I could find on the PDB–though I was still learning which were the correct proteins and knew little–I studied, as one more example, Rift Valley Fever Virus (RVFV) and found a signal stronger than ever. It was B5e again all right, but there was another measure–something we had called “stress” in an abandoned paper with the Danish group–which measured how rare the given bond was within its cluster. There was a hydrogen bond in RVFV more exotic by this measure than any I had seen before. A quick look online uncovered that there was an expert on RVFV right there in Paris, a fellow named Pablo Guardado-Calvo at the Institut Pasteur, and I boldly wrote to him explaining my feeble understanding of things at the time and describing the exotic site I had discovered for RVFV. I was thrilled that he answered immediately even though he was at that moment on summer holiday, surprised, I suspect, that a mathematician had somehow targeted the RVFV fusion peptide with geometry. He made several excellent suggestions in response to my emails, even as I worried about pestering him, ruining his holiday and poisoning our relationship. We made plans to meet upon his return to Paris.

Pablo came and spent the day at the IHES with us. It was fantastic for Misha and me, learning so much so quickly. And for Pablo, I think there was the curiosity of seeing what this fabled place, the IHES, was like. When Pablo left, Misha and I were positively struck by how great this young man was, how much he knew and how much we could learn from him. This was the first of several visits of Pablo to the IHES and of mine to Pasteur. We have become friends, and I owe him huge gratitude for all he has taught me.

Likewise, with Alexei and Sergiy my learning curve was steep and fun. By now, I understood that the abandoned Danish notion of stress gave a measure of the free energy using the Pohl-Finkelstein formalism that Alexei and co-workers had first explained. I was still so committed to the idea of clusters, however, and still lacked any sensible way to compare across clusters. Misha and I worked hard on this question of how to sensibly combine Boltzmann distributions.

It was Sergiy who discovered that the site I had found for influenza was well known, called the fusion pocket. There was even a sticky antibody described in the literature by a fellow named Jimmy Kwang and company out of Singapore, and the antibody gave 100 percent protection against infection in mice. I wrote to Kwang and his collaborators to ask why there had never been any follow-up, but they never responded. Pablo later explained that mice are not a good model for humans, and moreover the gurus of influenza in the States probably felt that other sites were more promising. This more or less killed the first paper since my universal site was the known fusion pocket, but it also gave a proof-of-concept for whatever it was I was finding with my still primitive methods.

I understood the basics of the Boltzmann distribution but had never really computed with it. So I turned to my colleague Thibault Damour, who works on gravitational waves and who indulged me by listening and explaining things. He had me probe my clusters, only to find that the distribution of hydrogen bonds within them failed quite spectacularly to resemble a normal distribution. He taught me further details of Boltzmann distributions as I still struggled to figure out how to combine or compare them. It was a frustrating period.

With a eureka one morning, I awoke and saw that, after all these years of living with the data in clusters, having learned which were large or small, other properties too, and some of their geography in SO(3), the clusters were entirely immaterial to the current circumstance. Indeed, with the Danish group we had computed a density on SO(3) itself, one big and beautiful density: no need to combine anything, just apply the Pohl-Finkelstein quasi-Boltzmann Ansatz to the whole density! Thibault surely helped me come to this understanding, and it was revolutionary enough that it took some convincing before Misha bought into it.
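A minimal sketch of what a quasi-Boltzmann reading of a density looks like in practice: if occurrence frequency is taken to be proportional to exp(-F/kT), then a free energy relative to the most populated region follows from the logarithm of the density ratio. Everything specific here is an assumption made for illustration, not taken from the papers: the toy bins stand in for a density on SO(3), the temperature of 300 K is a placeholder, and measuring against the densest bin is just one convenient reference.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)

def free_energy_estimates(counts, temperature=300.0):
    """Relative free energy per bin from occupancy counts.

    ASSUMPTION: density ~ exp(-F / (k_B * T)), so that
    F(bin) - F(reference) = -k_B * T * ln(density / reference density),
    with the most populated bin taken as the reference. Empty bins are
    skipped, since the logarithm of zero is undefined.
    """
    total = sum(counts.values())
    densities = {k: c / total for k, c in counts.items() if c > 0}
    reference = max(densities.values())
    return {k: -K_B * temperature * math.log(d / reference)
            for k, d in densities.items()}

# Toy occupancy counts (hypothetical, not from the PDB).
toy_counts = {"common": 1000, "typical": 100, "rare": 1}
energies = free_energy_estimates(toy_counts)
for name, value in energies.items():
    print(name, round(value, 3), "kcal/mol above the reference")
```

The design point is the one made in the paragraph above: a single density gives a single energy scale, so a rare region is directly comparable with any other region, with no need to combine separate per-cluster distributions.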

So now I was in business to compute and compute. It was great! I could finally look at the distribution of free energy across the entire database from the PDB, and I plotted it. In fact, another dear friend of many decades, Greg McShane, a geometer who now enjoyed computer and statistical studies of all sorts, had come to Paris from Grenoble to visit so that we could see The Cure at the Rock en Seine concert. He dove in and wrote code for me that was crucial for the ongoing analysis.

Having read and studied many texts and papers on viral glycoproteins, including supervision from Pablo on which PDB files to study, I was off and running. By now, I understood that high free energy targeted unstable sites, not geometrically significant ones, though the unstable ones are typically hidden from the immune system in caverns or troughs. I also had several examples of the different fusion mechanisms clear in my mind, and Pablo and I had a number of great meetings that cemented my understanding.

So the chemistry and math were perfect, and the biology absolutely clear. I had come to anthropomorphize viruses and could empathize with their search for the love of their lives. It became clear through this understanding why they would capitalize on hydrogen bonds in this pursuit. But the physics was still messed up: I could not resolve the overall energy distribution with the known energies of various motifs such as alpha helices. This was terribly troubling. If this was right, then everything must be perfect, and the physics just did not make sense. As Misha said at one point: if the physics is wrong, it is like having a beautiful meal before you but useless silverware.

There was still another conceptual hurdle to overcome, and Alexei was frustrated with my inability to understand: the free energy is NOT that of the hydrogen bond itself, but rather that of the protein detail which it stabilizes. It is a subtle distinction and took me forever to comprehend.

With this final realization, all fell into place. The artificial manipulations I was trying in order to resolve the physics fell away, and all was perfect, even giving an internal consistency check on the whole theory: the extreme energies in my distribution were exactly where they should be, just below the bounds of protein stability.

This led to the first paper in the Journal of Computational Biology. The second paper, which will appear online in the next few days [3], applies these tools to the seven known coronaviruses that afflict human beings, and in particular provides several sites of interest as vaccine/drug/test targets for the SARS-CoV-2 virus that causes COVID-19. Because of the lockdown in France and the consequent lack of interruptions, it has been two months of 12-15 hour days of work that have brought me to this moment.

It is exciting to feel involved. I also am fortunate to be so passionate about a project that I am able to pursue while in lockdown and distracted from the evident fears. I obviously hope that my sites will be useful for taming COVID-19, but only experiments can measure their utility, and quite rightly, a biologist should only care if it works.

The method has presumptive further applications throughout biology, of course to other viruses, but also in principle to, for example, neurodegenerative diseases like Alzheimer’s, which involve inappropriate protein folding, and to cancer metastasis, which relies on cell motility–really in any context where proteins change their backbone geometry using hydrogen bonds.

With these many other potential applications of my methods across biology, I hope to recruit others to employ this new tool. Most good ideas don’t work, but this seems to be one that may.

[1] Robert C. Penner. Backbone Free Energy Estimator Applied to Viral Glycoproteins. Journal of Computational Biology https://doi.org/10.1089/cmb.2020.0120

[2] Robert C. Penner. Conserved High Free Energy Sites in Human Coronavirus Spike Glycoprotein Backbones. Journal of Computational Biology https://doi.org/10.1089/cmb.2020.0120

[3] The article was published on May 13, 2020: https://www.liebertpub.com/toc/cmb/0/0
