Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. Because coronaviruses are known to be highly recombinant, we used three different approaches to identify non-recombinant regions for use in our Bayesian time-calibrated phylogenetic inference.
Why Can't We Just Call BA.2 Omicron? - The Atlantic
The relatively fast evolutionary rate means that it is most appropriate to estimate shallow nodes in the sarbecovirus evolutionary history. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic, https://doi.org/10.1038/s41564-020-0771-4. The SARS-CoV-2 genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. This is not surprising for diverse viral populations with relatively deep evolutionary histories. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. 5 Comparisons of GC content across taxa. The most parsimonious explanation for these shared ACE2-specific residues is that they were present in the common ancestors of SARS-CoV-2, RaTG13 and Pangolin Guangdong 2019, and were lost through recombination in the lineage leading to RaTG13. 53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences. Pangolins, shy, quirky but cute mammals, are among the most heavily trafficked yet least understood animals in the world.
Coronavirus origins: genome analysis suggests two viruses may have combined
For coronaviruses, however, recombination means that small genomic subregions can have independent origins, identifiable if sufficient sampling has been done in the animal reservoirs that support the endemic circulation, co-infection and recombination that appear to be common. Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. SARS-CoV-2 and RaTG13 are the most closely related (their most recent common ancestor nodes denoted by green circles), except in the 222-nt variable-loop region of the C-terminal domain (bar graphs at bottom). Region B showed no PI signals within the region, except one including sequence SC2018 (Sichuan), and thus this sequence was also removed from the set. The extent of sarbecovirus recombination history can be illustrated by five phylogenetic trees inferred from BFRs or concatenated adjacent BFRs (Fig. performed S recombination analysis. An initial genomic sequence analysis found that the reemergence of COVID-19 in New Zealand was caused by a SARS-CoV-2 from the (now ancestral) lineage B.1.1.1 of the Pango nomenclature (17). performed codon usage analysis. DRAGEN COVID Lineage App: this app aligns reads to a SARS-CoV-2 reference genome and reports coverage of targeted regions. COVID-19 lineage names can be confusing to navigate; there are many aliases.
All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins. is funded by The National Natural Science Foundation of China Excellent Young Scientists Fund (Hong Kong and Macau; no. = 0.00025. GitHub - cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages. Coronavirus: Pangolins found to carry related strains. Pangolin: Phylogenetic Assignment of Named Global Outbreak LINeages. The pangolin web app is maintained by the Centre for Genomic Pathogen Surveillance. GARD identified eight breakpoints that were also within 50 nt of those identified by 3SEQ. As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color).
Pangolins: What are they and why are they linked to Covid-19? - Inverse
This boundary appears to be rarely crossed. a–c, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n = 37; a), MERS (n = 35; b) and SARS (n = 69; c)). Since the release of Version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. Means and 95% HPD intervals are 0.080 [0.058–0.101] and 0.530 [0.304–0.780] for the patristic distances between SARS-CoV-2 and RaTG13 (green) and 0.143 [0.109–0.180] and 0.154 [0.093–0.231] for the patristic distances between SARS-CoV-2 and Pangolin 2019 (orange). In March, when covid cases began spiking around India, Bani Jolly went hunting for answers in the virus's genetic code. from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no. Concurrent evidence also proposed pangolins as a potential intermediate species for SARS-CoV-2 emergence and suggested them as a potential reservoir species (refs. 11–13). The variable-loop region in SARS-CoV-2 shows closer identity to the 2019 pangolin coronavirus sequence than to the RaTG13 bat virus, supported by phylogenetic inference (Fig. Because these subclades had different phylogenetic relationships in region D (Supplementary Fig. Divergence time estimates based on the HCoV-OC43-centred rate prior for the separate BFRs (Supplementary Table 3) show consistency in TMRCA estimates across the genome.
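The root-to-tip (RtT) plots mentioned above relate genetic divergence to sampling time. As a rough illustration of the idea only, and not the Bayesian procedure used in the study, the sketch below fits an ordinary least-squares line to made-up, noise-free toy data: the slope plays the role of the evolutionary rate and the x-intercept the role of the root date. The names `fit_line`, `TOY_RATE` and `TOY_ROOT_YEAR`, and all numeric values, are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy values, chosen only for illustration (not from the study).
TOY_RATE = 0.0005        # substitutions per site per year
TOY_ROOT_YEAR = 1990.0   # assumed date of the root

# Sampling dates (decimal years) and the divergence each tip would show
# under a perfectly clock-like process with the toy rate above.
sampling_years = [2003.0, 2008.5, 2013.0, 2019.9]
divergences = [TOY_RATE * (year - TOY_ROOT_YEAR) for year in sampling_years]

slope, intercept = fit_line(sampling_years, divergences)
root_estimate = -intercept / slope  # where the fitted line crosses zero divergence

print(f"estimated rate: {slope:.6f} substitutions/site/year")
print(f"estimated root date: {root_estimate:.1f}")
```

With real data the points scatter around the line, and a weak or absent positive slope is a sign that the sampling window is too short relative to the depth of the tree for this kind of estimate to be reliable.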
1 Phylogenetic relationships in the C-terminal domain (CTD). Eight other BFRs <500 nt were identified, and the regions were named BFR A–J in order of length. performed recombination and phylogenetic analysis and annotated virus names with geographical and sampling dates. This is notable because the variable-loop region contains the six key contact residues in the RBD that give SARS-CoV-2 its ACE2-binding specificity (refs. 27,37). It compares the new genome against the large, diverse population of sequenced strains. Developed by the Centre for Genomic Pathogen Surveillance. Specifically, we used a combination of six methods implemented in v.5.5 of RDP5 (ref. As informative rate priors for the analysis of the sarbecovirus datasets, we used two different normal prior distributions: one with a mean of 0.00078 and s.d. Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, USA; Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, Leuven, Belgium; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China; State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Department of Biology, University of Texas Arlington, Arlington, TX, USA; Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK; MRC-University of Glasgow Centre for Virus Research, Glasgow, UK.
covid19_mostefai2021_paper/01_CreateObjects.r at master HussinLab
Instead, similarity in codon usage metrics between SARS-CoV-2 and the eukaryotes analyzed was correlated with the coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). Conducting analogous analyses of codon usage bias as Ji et al. Pangolin relies on a novel algorithm called pangoLEARN. Boni, M.F., Lemey, P., Jiang, X. et al. Region C showed no PI signals within it. Biazzo et al. obtained the genome sequences of 10 SARS-CoV-2 virus strains through nanopore sequencing of nasopharyngeal swabs in Malta and analyzed the assembled genomes with the pangolin software; the strains were assigned to the B.1 lineage, indicating that SARS-CoV-2 was widely spread in Europe (Biazzo et al., 2021). To begin characterizing any ancestral relationships for SARS-CoV-2, NRRs of the genome must be identified so that reliable phylogenetic reconstruction and dating can be performed. The fact that they are geographically relatively distant is in agreement with their somewhat distant TMRCA, because the spatial structure suggests that migration between their locations may be uncommon. Influenza viruses reassort (ref. 17) but they do not undergo homologous recombination within RNA segments (refs. 18,19), meaning that origins questions for influenza outbreaks can always be reduced to origins questions for each of influenza's eight RNA segments.
cov-lineages/pangolin - GitHub
Using both prior distributions, this results in six highly similar posterior rate estimates for NRR1, NRR2 and NRA3, centred around 0.00055 substitutions per site per year. In our second stage, we wanted to construct non-recombinant regions where our approach to breakpoint identification was as conservative as possible. The red and blue boxplots represent the divergence time estimates for SARS-CoV-2 (red) and the 2002–2003 SARS-CoV (blue) from their most closely related bat virus, with the light- and dark-colored versions based on the HCoV-OC43 and MERS-CoV centered priors, respectively. Root-to-tip divergence as a function of sampling time for non-recombinant regions NRR1 and NRR2 and recombination-masked alignment set NRA3. As a proxy, it would be possible to model the long-term purifying selection dynamics as a major source of time-dependent rates (refs. 43,44,52), but this is beyond the scope of the current study. Accurate estimation of ages for deeper nodes would require adequate accommodation of time-dependent rate variation. One study suggests that over a century ago, one lineage of coronavirus circulating in bats gave rise to SARS-CoV-2, RaTG13 and a pangolin coronavirus known as Pangolin-2019 (Live Science).
We find that the sarbecoviruses (the viral subgenus containing SARS-CoV and SARS-CoV-2) undergo frequent recombination and exhibit spatially structured genetic diversity on a regional scale in China. We compiled a set of 69 SARS-CoV genomes including 58 sampled from humans and 11 sampled from civets and raccoon dogs. Next, we (1) collected all breakpoints into a single set, (2) complemented this set to generate a set of non-breakpoints, (3) grouped non-breakpoints into contiguous BFRs and (4) sorted these regions by length.
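The four-step construction of breakpoint-free regions (BFRs) described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: each breakpoint is treated as a single 1-based alignment position, and the function name `breakpoint_free_regions`, the toy genome length and the toy breakpoint positions are all hypothetical.

```python
def breakpoint_free_regions(breakpoints, genome_length):
    """Return BFRs as (start, end, length) tuples, longest first.

    Coordinates are 1-based and inclusive. Positions listed in
    `breakpoints` are excluded; every maximal run of remaining
    positions forms one breakpoint-free region.
    """
    # Step 1: pool all breakpoints into one de-duplicated, sorted set.
    cuts = sorted({b for b in breakpoints if 1 <= b <= genome_length})

    # Steps 2 and 3: walk the complement and group it into contiguous runs.
    regions = []
    start = 1
    for b in cuts:
        if b > start:  # positions start..b-1 contain no breakpoint
            regions.append((start, b - 1, b - start))
        start = b + 1
    if start <= genome_length:
        regions.append((start, genome_length, genome_length - start + 1))

    # Step 4: sort by length, longest region first.
    return sorted(regions, key=lambda region: region[2], reverse=True)


# Toy example: a 100-nt "genome" with breakpoints pooled from several methods.
toy_regions = breakpoint_free_regions([50, 20, 90, 51, 20], genome_length=100)
for start, end, length in toy_regions:
    print(f"BFR {start}-{end} ({length} nt)")
```

If a method reports breakpoints as intervals rather than single positions, the same walk applies after expanding each interval into the set of excluded positions.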
Phylogenetic Assignment of Named Global Outbreak Lineages
We named the length-sorted BFRs as: BFR A (nt positions 13,291–19,628, length = 6,338 nt), BFR B (nt positions 3,625–9,150, length = 5,526 nt), BFR C (nt positions 9,261–11,795, length = 2,535 nt), BFR D (nt positions 27,702–28,843, length = 1,142 nt) and six further regions (E–J). 36) gives a putative recombination-free alignment that we call non-recombinant alignment 3 (NRA3) (see Methods). Note that six of these sequences fall under the terms of use of the GISAID platform.
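The BFR coordinates quoted above can be checked with simple arithmetic: for 1-based inclusive coordinates, length = end - start + 1. The sketch below recomputes the four stated lengths and picks out the regions longer than 2 kb, which the text elsewhere says were combined into NRR1. The variable names are illustrative, and the combined length printed at the end simply assumes the selected regions are concatenated end to end.

```python
# Coordinates (1-based, inclusive) of the four longest BFRs as given in the text.
BFR_COORDS = {
    "A": (13291, 19628),
    "B": (3625, 9150),
    "C": (9261, 11795),
    "D": (27702, 28843),
}


def region_length(start, end):
    """Length in nucleotides of a 1-based inclusive interval."""
    return end - start + 1


bfr_lengths = {name: region_length(s, e) for name, (s, e) in BFR_COORDS.items()}

# Regions longer than 2 kb (2,000 nt) are the candidates for NRR1.
long_bfrs = sorted(name for name, length in bfr_lengths.items() if length > 2000)
combined_length = sum(bfr_lengths[name] for name in long_bfrs)

for name, length in bfr_lengths.items():
    print(f"BFR {name}: {length} nt")
print("BFRs > 2 kb:", ", ".join(long_bfrs), f"({combined_length} nt if concatenated)")
```

The recomputed lengths agree with the values stated in the text, and only BFR A, B and C exceed the 2 kb threshold.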
Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19
Of importance for future spillover events is the appreciation that SARS-CoV-2 has emerged from the same horseshoe bat subgenus that harbours SARS-like coronaviruses. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20, 1.68), 0.35 (0.30, 0.41) and 0.133 (0.129, 0.136) for SARS, MERS-CoV and HCoV-OC43, respectively.
More evidence Pangolin not intermediary in transmission of SARS-CoV-2
Did Pangolin Trafficking Cause the Coronavirus Pandemic?
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent for the current coronavirus disease (COVID-19) pandemic that has affected more than 35 million people. Conservatively, we combined the three BFRs >2 kb identified above into non-recombining region 1 (NRR1). Now, the two researchers used genomic sequencing to compare the genetic sequence of the new coronavirus in humans with that in animals and found a 99% match with pangolins. A phylogenetic tree, using RAxML v8.2.8 (ref. The boxplots show divergence time estimates (posterior medians) for SARS-CoV-2 (red) and the 2002–2003 SARS-CoV virus (blue) from their most closely related bat virus. Despite the SARS-CoV-2 lineage's acquisition of residues in its Spike (S) protein's receptor-binding domain (RBD) permitting the use of human ACE2 (ref. 27) receptors and its RBD being genetically closer to a pangolin virus than to RaTG13 (refs. We analyzed the diversity of SARS-CoV-2 viral genomes in India to understand the evolutionary patterns of the virus in the country through their Pango lineage and GISAID clade. Given that these pangolin viruses are ancestral to the progenitor of the RaTG13/SARS-CoV-2 lineage, it is more likely that they are also acquiring viruses from bats. Because the estimated rates and divergence dates were highly similar in the three datasets analysed, we conclude that our estimates are robust to the method of identifying a genome's NRRs. Boxplots show interquartile ranges, white lines are medians and box whiskers show the full range of posterior distribution.
Across a large region of the virus genome, corresponding approximately to ORF1b, it did not cluster with any of the known bat coronaviruses, indicating that recombination probably played a role in the evolutionary history of these viruses (refs. 5,7). Genetic lineages of SARS-CoV-2 have been emerging and circulating around the world since the beginning of the COVID-19 pandemic. Originally, PANGOLIN used a maximum-likelihood-based assignment algorithm to assign query SARS-CoV-2 sequences the most likely lineage. The authors declare no competing interests. From this perspective, it may be useful to perform surveillance for more closely related viruses to SARS-CoV-2 along the gradient from Yunnan to Hubei. 5 (NRR1) are conservative in the sense that NRR1 is more likely to be non-recombinant than NRR2 or NRA3. Even before the COVID-19 pandemic, pangolins were making headlines. Several of the recombinant sequences in these trees show that recombination events do occur across geographically divergent clades. PI signals were identified (with bootstrap support >80%) for seven of these eight breakpoints: positions 1,684, 3,046, 9,237, 11,885, 21,753, 22,773 and 24,628. Using these breakpoints, the longest putative non-recombining segment (nt 11,885–21,753) is 9.9 kb long, and we call this region NRR2. Using the most conservative approach (NRR1), the divergence time estimate for SARS-CoV-2 and RaTG13 is 1969 (95% HPD: 1930–2000), while that between SARS-CoV and its most closely related bat sequence is 1962 (95% HPD: 1932–1988); see Fig.
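The NRR2 segment quoted above follows from the listed breakpoints by simple arithmetic. As a minimal check, the sketch below takes the seven PI-supported breakpoint positions from the text and finds the longest gap between consecutive breakpoints; it comes out as positions 11,885 to 21,753, roughly 9.9 kb. The function name `longest_segment` is illustrative, and only gaps between listed breakpoints are considered (the stretches before the first and after the last breakpoint are ignored here).

```python
# The seven breakpoint positions with PI support, as listed in the text.
PI_BREAKPOINTS = [1684, 3046, 9237, 11885, 21753, 22773, 24628]


def longest_segment(breakpoints):
    """Return (start, end) of the widest gap between consecutive breakpoints."""
    points = sorted(breakpoints)
    consecutive_pairs = zip(points, points[1:])
    return max(consecutive_pairs, key=lambda pair: pair[1] - pair[0])


segment_start, segment_end = longest_segment(PI_BREAKPOINTS)
segment_kb = (segment_end - segment_start) / 1000

print(f"longest segment: nt {segment_start}-{segment_end} ({segment_kb:.1f} kb)")
```

The other gaps between consecutive breakpoints are all shorter (the next longest, 3,046 to 9,237, is about 6.2 kb), which is consistent with the 9.9 kb figure given for NRR2.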
Of the nine breakpoints defining these ten BFRs, four showed phylogenetic incongruence (PI) signals with bootstrap support >80%, adopting previously published criteria on using a combination of mosaic and PI signals to show evidence of past recombination events (ref. 19). Zhou et al. (ref. 2) concluded from the genetic proximity of SARS-CoV-2 to RaTG13 that a bat origin for the current COVID-19 outbreak is probable.
Cov-Lineages
It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. Subsequently a bat sarbecovirus (RaTG13, sampled from a Rhinolophus affinis horseshoe bat in 2013 in Yunnan Province) was reported that clusters with SARS-CoV-2 in almost all genomic regions with approximately 96% genome sequence identity (ref. 2). Although the human ACE2-compatible RBD was very likely to have been present in a bat sarbecovirus lineage that ultimately led to SARS-CoV-2, this RBD sequence has hitherto been found in only a few pangolin viruses. 62,63), the GTR+Γ model and 100 bootstrap replicates, was inferred for each BFR >500 nt.
PDF single centre retrospective study
= 0.00075 and one with a mean of 0.00024 and s.d. The first available sequence data (ref. 6) placed this novel human pathogen in the Sarbecovirus subgenus of Coronaviridae (ref. 7), the same subgenus as the SARS virus that caused a global outbreak of >8,000 cases in 2002–2003.
Current Overview on Disease and Health Research Vol. 6
3) clusters with viruses from provinces in the centre, east and northeast of China. It performs: K-mer based detection; map/align and variant calling; consensus sequence generation; and lineage/clade analysis using Pangolin and NextClade. Access the DRAGEN COVID Lineage App on BaseSpace Sequence Hub. The coverage threshold and consensus sequence generation threshold were set to 20 and 90 respectively. We use three bioinformatic approaches to remove the effects of recombination, and we combine these approaches to identify putative non-recombinant regions that can be used for reliable phylogenetic reconstruction and dating. Uncertainty measures are shown in Extended Data Fig. Because the SARS-CoV-2 S protein has been implicated in past recombination events or possibly convergent evolution (ref. 12), we specifically investigated several subregions of the S protein: the N-terminal domain of S1, the C-terminal domain of S1, the variable-loop region of the C-terminal domain, and S2. These residues are also in the Pangolin Guangdong 2019 sequence. In such cases, even moderate rate variation among long, deep phylogenetic branches will substantially impact expected root-to-tip divergences over a sampling time range that represents only a small fraction of the evolutionary history (ref. 40). 2, bottom) show that SARS-CoV-2 is unlikely to have acquired the variable loop from an ancestor of Pangolin-2019 because these two sequences are approximately 10–15% divergent throughout the entire S protein (excluding the N-terminal domain). 31922087). The research leading to these results received funding (to A.R.
Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses.