A Closer Look at the Alarming Resurgence of Covid-19 in Manaus, Brazil

What sequence analysis and computation reveal about the deadly spread of the virus

Eliot Bush
Medium Coronavirus Blog


Photo: Isaac Quesada/Unsplash

The first wave of the coronavirus hit Brazil’s Amazonas state very hard. So many people got sick in the capital Manaus that researchers estimated 70% or more of the population had immunity by fall 2020. This level of population immunity is in the ballpark of what would be needed for herd immunity, and people in Manaus began to feel that their previous ordeal meant that they would not face another surge in infection and death.

Unfortunately, they were wrong. In December 2020 a new spike in cases began, and by early January the level of daily hospital admissions was more than twice that seen in the spring 2020 peak. This overwhelmed the health care system.

This second wave included reinfections: people who had already had Covid-19 got it again. A number of factors could explain this, including the waning of immunity from the first wave. But the presence of a new variant of SARS-CoV-2 in Manaus seems to have been important. Over a period of about seven weeks in November and December, this variant, called P.1, went from being absent to being the dominant strain in Manaus.

As has been the case throughout the pandemic, researchers have been able to use sequence information and computational analysis to learn a lot about this new variant.

Mutations in the spike protein

First of all, comparisons with other strains of SARS-CoV-2 show that P.1 has a number of mutations in the spike protein. This protein resides on the surface of the viral particle and is key to the virus’s ability to enter human cells. It is also the primary target of the immune system.

Several of these mutations are in the part of the spike protein that binds to receptors on human cells, suggesting they could affect how readily the virus infects those cells. A number of the mutations are also shared with other SARS-CoV-2 variants, including the B.1.1.7 lineage first described in Britain and the B.1.351 lineage first found in South Africa.

Analysis of phylogenetic trees made from these sequences makes clear that these mutations arose independently in P.1. Below is a tree representing the relationships of a number of SARS-CoV-2 strains (redrawn and simplified from trees at nextstrain.org). The tips on the right correspond to strains, and the length of the branches corresponds to the amount of sequence change (that is, the number of mutations). During the evolution of these viruses, many mutations have arisen, only some of which are significant clinically. (For more on SARS-CoV-2 and phylogenetic trees, you can take a look at this video.)

Phylogenetic tree, redrawn and simplified.

On the tips, I’ve highlighted three variants (P.1, B.1.1.7, and B.1.351) that have mutations of concern in the spike protein. The remaining tips represent other lineages that lack those key mutations but do have a variety of other mutations.

All of the strains in the tree evolved from a common ancestor (purple dot on left) which also lacked the key mutations. Given this, the simplest explanation is that mutations shared between P.1, B.1.1.7, and B.1.351 arose independently in those strains. Such a scenario is favored by parsimony: it minimizes unlikely events, such as back mutations, where a mutant sequence subsequently mutates back to the original sequence.
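To make the parsimony argument concrete, here is a minimal sketch that counts the fewest mutation events needed to explain which tips carry a shared mutation. The tree shape, the "other" lineage names, and the function names are all made up for illustration; this is a simple dynamic-programming count in the spirit of the Sankoff algorithm with every change costing one, not the method used to build the Nextstrain trees.

```python
import math

# Hypothetical toy tree as nested tuples; leaves are strain names.
# "other1" .. "other6" are invented lineages that lack the mutation.
TREE = (
    (("P.1", "other1"), "other2"),
    ((("B.1.1.7", "other3"), "other4"), (("B.1.351", "other5"), "other6")),
)

# 1 = the tip carries the shared spike mutation, 0 = it does not.
HAS_MUTATION = {
    "P.1": 1, "B.1.1.7": 1, "B.1.351": 1,
    "other1": 0, "other2": 0, "other3": 0,
    "other4": 0, "other5": 0, "other6": 0,
}


def subtree_costs(node, tip_state):
    """Return (cost0, cost1): fewest changes below `node` if it has state 0 or 1."""
    if isinstance(node, str):  # a tip: its observed state is free, the other impossible
        return (0, math.inf) if tip_state[node] == 0 else (math.inf, 0)
    child_costs = [subtree_costs(child, tip_state) for child in node]
    totals = []
    for state in (0, 1):
        total = 0
        for costs in child_costs:
            # Each child picks its cheapest state; differing from the parent costs 1.
            total += min(costs[t] + (0 if t == state else 1) for t in (0, 1))
        totals.append(total)
    return tuple(totals)


def min_changes(tree, tip_state, root_state=0):
    """Fewest mutation events on the tree, given the state of the common ancestor."""
    return subtree_costs(tree, tip_state)[root_state]


print(min_changes(TREE, HAS_MUTATION))                # ancestor lacks the mutation
print(min_changes(TREE, HAS_MUTATION, root_state=1))  # ancestor carries it
```

On this toy tree the first call gives 3, one change for each highlighted variant, which is the independent-origin scenario. Assuming instead that the ancestor already carried the mutation requires 5 changes on the same tree, so it is the less parsimonious explanation.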

It is interesting that some of the same mutations keep cropping up in different lineages. The virus, having entered humans recently, is evolving to better fit its new host. Evidently, certain changes are advantageous enough that they arise and are selected for multiple times. That the P.1 variant in Manaus carried some of these mutations likely contributed to its spread, even though many people there had already had Covid-19.

Estimating when the P.1 strain first arose

Another thing that can be estimated from viral sequences is when variants such as P.1 first arose. This is done by estimating when the common ancestor of a set of samples existed. Here, the starting data would be a phylogenetic tree consisting of many P.1 samples.

We can illustrate the basics of how this works. Consider a set of five hypothetical strains (A-E) that have been collected at different times. We can build a phylogenetic tree based on these strains, where the length of the branches corresponds to the number of mutations. For each of the strains on the tips, we can define a “cumulative branch length” which is the sum of the branch lengths going from the root of the tree to that tip. This measure tells us how much each sequence at the tip differs from the common ancestor. In the illustration below we’re showing the branches that would be summed to give the cumulative branch length for strain D.

Illustration of cumulative branch length.
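Here is a small sketch of that sum in code. The tree, its internal node names (n1, n2, n3), and the branch lengths are invented for illustration; each entry maps a node to its parent and to the length of the branch connecting them.

```python
# Hypothetical tree for strains A-E: node -> (parent, branch length in mutations).
# The root has no entry because it has no parent.
PARENT_OF = {
    "n1": ("root", 1),
    "n2": ("root", 2),
    "A": ("n1", 1),
    "B": ("n1", 3),
    "C": ("n2", 2),
    "n3": ("n2", 1),
    "D": ("n3", 3),
    "E": ("n3", 4),
}


def cumulative_branch_length(tip, parent_of):
    """Sum the branch lengths on the path from `tip` up to the root."""
    total = 0
    node = tip
    while node in parent_of:  # stop once we reach the root
        parent, length = parent_of[node]
        total += length
        node = parent
    return total


for strain in "ABCDE":
    print(strain, cumulative_branch_length(strain, PARENT_OF))
```

For strain D in this made-up tree, the path is D to n3 (3 mutations), n3 to n2 (1), and n2 to the root (2), for a cumulative branch length of 6.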

For each strain, we can calculate the cumulative branch length. We also have the date that the strain was collected. From this, we can make a plot of cumulative branch length vs. collection date.

Plot of cumulative branch length vs. collection date.

Notice that there is an association between these two variables. This is not surprising. Because samples that were collected later have had more time to mutate, we expect them to have a larger cumulative branch length.

It is possible to use this association to estimate when the most recent common ancestor of the sequences existed. If we had a sample of the common ancestor itself, its cumulative branch length would be zero, since it would have no sequence differences from itself. We don’t usually have such a sample. However, we can extrapolate from the data we do have to estimate when a sequence with a cumulative branch length of zero would have existed. We do this by fitting a line to the data. The point at which that best-fit line crosses the x-axis corresponds to an estimate of when the cumulative branch length was zero, and thus of when the common ancestor existed.
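The extrapolation can be sketched in a few lines. This is a plain least-squares fit, assuming collection dates are expressed as days since some arbitrary reference day; the data values and function names are hypothetical, and this is the basic idea only, not the more sophisticated method used in the paper discussed next.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept) of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def estimate_ancestor_day(collection_days, branch_lengths):
    """Day on which the fitted line crosses the x-axis (cumulative branch length of zero)."""
    slope, intercept = fit_line(collection_days, branch_lengths)
    if slope <= 0:
        raise ValueError("no positive trend, so the extrapolation is not meaningful")
    return -intercept / slope


# Hypothetical data for five strains: collection day and cumulative branch length.
days = [30, 45, 60, 80, 100]
lengths = [2, 4, 4, 6, 7]

slope, _ = fit_line(days, lengths)
print("estimated mutations per day:", slope)
print("estimated day of common ancestor:", estimate_ancestor_day(days, lengths))
```

The slope of the line is an estimate of the mutation rate, and the x-intercept is the estimated date of the common ancestor. With real data the points scatter around the line, so the estimate comes with uncertainty that this sketch does not quantify.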

This gives you an idea of how such calculations can be done. A related but more sophisticated method was used in a recent paper (Faria et al. 2021) to estimate November 2020 as the date of origin of the P.1 variant. This is very much in line with the subsequent spike in cases in December and January.

One observation from the phylogenetic trees in this study is that the branch which separates the P.1 variants from other SARS-CoV-2 samples is especially long. Here is an illustration (a schematic based on the trees in Faria et al.):

Tree with long branch leading to P.1.

P.1 variants are in red here, and non-P.1 in blue. The non-P.1 variants were collected in the same time frame (late 2020) as the P.1. You can see that there’s one especially long branch that separates all P.1 from the rest. This indicates that an especially large number of mutations occurred in the common ancestor of all P.1 variants. This is also true of other variants of concern such as B.1.1.7. There are some indications that rapid evolution like this can occur inside a single immunocompromised or chronically infected patient.

Epidemiological modeling

Another computational approach that has been very helpful in understanding the pandemic is modeling. In the case of the Manaus outbreak, epidemiological models have been useful in helping us understand the importance of different factors in causing the second wave of infections.

Three factors that could have been important in causing the second peak are the degree of cross-immunity (that is, how much immunity to the original strain carries over to P.1), whether the P.1 strain shows increased transmissibility (spreads more easily), and the extent to which immunity naturally waned. (It is possible that in the many months since the original peak in April/May 2020, the immunity conferred by infection has weakened significantly. This is something that can naturally happen with an immune response.)

Faria et al. used epidemiological models to explore the importance of these different factors. Such models are based on a description of the connection between parameters (cross-immunity, transmissibility, the waning of immunity) and observations (new infections, hospitalizations, deaths). Through repeated simulations, they can arrive at some plausible ranges for the parameters. In this case, the authors found that if natural immunity wanes 50% within a year, then P.1 might be in the range of 1.7–2.4 times more transmissible, and might evade 21%–46% of previous immunity.
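To get a feel for how these three parameters interact, here is a deliberately simplified sketch. It is not the model from Faria et al.; it is a back-of-the-envelope formula of my own choosing, in which the number of new infections caused by each case scales with transmissibility and with the share of the population that is effectively unprotected. The function name, the formula, and the baseline value R0_ORIGINAL are assumptions made for illustration. The 70% prior infection figure, the 50% waning, and the parameter ranges are the ones quoted in this article.

```python
def effective_r(r0, rel_transmissibility, prior_infected, waning, evasion):
    """Toy effective reproduction number for a new variant.

    r0                   -- reproduction number of the original strain with no immunity
    rel_transmissibility -- how many times more transmissible the variant is
    prior_infected       -- fraction of the population infected in the first wave
    waning               -- fraction of that immunity lost over time
    evasion              -- fraction of the remaining immunity the variant evades
    """
    remaining_protection = (1.0 - waning) * (1.0 - evasion)
    # Share of the population that behaves as if it had no protection.
    susceptible_equivalent = 1.0 - prior_infected * remaining_protection
    return r0 * rel_transmissibility * susceptible_equivalent


R0_ORIGINAL = 2.5  # hypothetical baseline, chosen only for illustration

# Original strain, then the low and high ends of the ranges quoted above.
for rel_t, evasion in [(1.0, 0.0), (1.7, 0.21), (2.4, 0.46)]:
    r = effective_r(R0_ORIGINAL, rel_t, prior_infected=0.7, waning=0.5, evasion=evasion)
    print(f"transmissibility x{rel_t}, evasion {evasion:.0%}: effective R = {r:.2f}")
```

A value above 1 means infections grow. The point of the sketch is only that higher transmissibility, more waning, and more immune evasion all push in the same direction, which is why a model fitted to real data is needed to tease them apart.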

The outbreak in Manaus was a tragedy for the people there. (How unfair it must have felt to be facing a surge of Covid-19 all over again.) And it is sobering for the rest of us. It shows that the combination of waning immunity and the arrival of new variants can lead to large-scale reinfection. Our hope is that widespread vaccination will provide better protection against this, especially through booster shots and tweaked vaccines that better target the variants. Nevertheless, the whole thing makes clear that SARS-CoV-2 is not going to suddenly disappear.

Main reference

Faria NR, Mellan TA, Whittaker C, Claro IM, Candido DD, Mishra S, Crispim MA, Sales FC, Hawryluk I, McCrone JT, Hulswit RJ, et al. Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science. 2021 Apr 14.



Eliot Bush

Professor of computational biology and evolution at Harvey Mudd College. Current research focuses on microbial genome evolution.