Coronaviruses (CoVs) are enveloped positive-sense RNA viruses. They are named for the club-like spikes that project from their surface. Coronaviruses possess an unusually large RNA genome as well as a unique replication strategy. They cause a variety of diseases in animals, including cows, pigs, chickens, and other birds. In humans, coronaviruses can cause potentially lethal respiratory infections.
Coronaviruses are the largest group of viruses within the order Nidovirales. This order includes the Coronaviridae, Arteriviridae, and Roniviridae families. The Coronavirinae are one of two subfamilies in the Coronaviridae family. Coronavirinae are further subdivided into four groups: the alpha, beta, gamma, and delta coronaviruses. These groups are now defined by phylogenetic clustering. Members of these virus families infect animal and human hosts. The Middle East Respiratory Syndrome Coronavirus (MERS-CoV) and Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) are examples of coronaviruses that infect humans.
Nidoviruses contain an infectious, linear, positive-sense RNA genome that is capped and polyadenylated. Based on their genome size, nidoviruses are divided into two groups: large and small nidoviruses.
All viruses of the order Nidovirales are enveloped, non-segmented positive-sense RNA viruses with very large genomes.
Common features of coronaviruses include
(i) a highly conserved genomic organization with a large replicase gene preceding structural and accessory genes,
(ii) expression of many non-structural genes by ribosomal frameshifting,
(iii) several unique or unusual enzymatic activities encoded within the large replicase-transcriptase polyprotein, and
(iv) expression of downstream genes by synthesis of 3’-nested sub-genomic mRNAs.
The typical organization of the genome is
5’-leader-UTR-replicase-S(Spike)-E(Envelope)-M(Membrane)-N(Nucleocapsid)-3’-UTR-poly(A) tail. Accessory genes are interspersed within the structural genes at the 3’-end of the genome.
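The 3'-nested expression strategy listed in (iv) can be illustrated with a small sketch built on the gene order given above. It assumes the simplified order replicase-S-E-M-N with accessory genes omitted, and the names `GENOME_ORDER` and `nested_subgenomic_mrnas` are hypothetical. It only illustrates the nesting and does not model real transcript boundaries.

```python
# Simplified 5'-to-3' gene order taken from the organization line above.
# Accessory genes are omitted; all names here are illustrative assumptions.
GENOME_ORDER = ["replicase", "S", "E", "M", "N"]


def nested_subgenomic_mrnas(order):
    """Map each gene downstream of the replicase to the parts of its sub-genomic mRNA."""
    mrnas = {}
    for index, gene in enumerate(order):
        if index == 0:
            continue  # the replicase gene is translated from the genomic RNA itself
        # Each sub-genomic mRNA starts with the leader and extends to the shared 3' end.
        mrnas[gene] = ["leader"] + order[index:] + ["3'-UTR", "poly(A)"]
    return mrnas


for gene, parts in nested_subgenomic_mrnas(GENOME_ORDER).items():
    print(f"{gene:>2}: {'-'.join(parts)}")
```

Every entry ends with the same 3' segments, which is what "3'-nested" refers to: the mRNAs differ only in how far they extend toward the 5' end of the genome.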
Accessory proteins are not needed for replication in tissue culture but appear to be important in viral pathogenesis. The synthesis of polyprotein 1ab (pp1ab) involves programmed ribosomal frameshifting during translation of open reading frame 1a (orf1a). Frameshifting moves the ribosome into a new reading frame, which yields a trans-frame protein product. In coronaviruses, a fixed proportion of the ribosomes translating orf1a change reading frame at a specific location and then decode the information contained in orf1b.
U_UUA_AAC is a universal frame-shifting site
Coronaviruses contain a frameshifting stimulation element, a conserved RNA sequence forming a stem-loop that promotes ribosomal frameshifting. Ribosomal frameshifting is the mechanism by which open reading frame 1b (orf1b) is expressed. The replicase-transcriptase proteins are encoded in orf1a and orf1b and are synthesized initially as two large polyproteins termed pp1a and pp1ab. A comparative analysis by Baranov et al. (2005) revealed the sequence U_UUA_AAC as a universal shift site. SARS-CoV frameshifting was characterized in cultured mammalian cells using a dual luciferase reporter system and mass spectrometry. Tandem tRNA slippage on the sequence U_UUA_AAC was confirmed by mutagenic analysis of the shift site. Mass spectrometry was used to analyze affinity-tagged frameshift products. Further analysis of the frameshifting site showed that a proposed RNA secondary structure in loop II and two unpaired nucleotides at the stem I-stem II junction in SARS-CoV are important for frameshift stimulation.
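The -1 slip on U_UUA_AAC can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the RNA is an invented toy sequence rather than viral sequence, the function names are hypothetical, and the slip is modeled simply as re-reading the last nucleotide of the heptamer. The stimulatory RNA structure and the efficiency of frameshifting are not modeled.

```python
SLIPPERY_SITE = "UUUAAAC"  # U_UUA_AAC written without the frame markers


def find_slippery_sites(rna):
    """Return the 0-based start index of every occurrence of the heptamer."""
    hits = []
    start = rna.find(SLIPPERY_SITE)
    while start != -1:
        hits.append(start)
        start = rna.find(SLIPPERY_SITE, start + 1)
    return hits


def codons_with_minus_one_shift(rna, orf_start, site_start):
    """Split rna into codons, slipping back one nucleotide after the heptamer."""
    # The first U of the heptamer must be the third base of a zero-frame codon.
    if (site_start - orf_start) % 3 != 2:
        raise ValueError("heptamer is not framed as U_UUA_AAC in this ORF")
    site_end = site_start + len(SLIPPERY_SITE)  # index just past the heptamer
    upstream = rna[orf_start:site_end]
    downstream = rna[site_end - 1:]  # re-read the last nucleotide of the site
    zero_frame = [upstream[i:i + 3] for i in range(0, len(upstream), 3)]
    shifted = [downstream[i:i + 3] for i in range(0, len(downstream) - 2, 3)]
    return zero_frame + shifted


toy_rna = "AUGGCUUUAAACGGGUUUGCGUAA"  # invented sequence, not viral
site = find_slippery_sites(toy_rna)[0]
print(codons_with_minus_one_shift(toy_rna, 0, site))
```

In the toy sequence, the unshifted frame would reach the UAA stop codon, whereas the shifted frame reads through it because UAA is no longer in frame. This mirrors how a frameshifting ribosome continues from orf1a into orf1b.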
Model of Coronavirus COVID19 Transcription. A possible model of the transcription mechanism of the COVID-19 coronavirus is shown here. The model is based on the genomic sequence and on the model of coronavirus transcription proposed by Sawicki et al. in 2007. The organization and expression of the genome of Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1 are depicted. Structural relationships of the genomic and subgenomic mRNAs are shown. Orfs are defined by the published genome sequence. Possible autoproteolytic processing of the orf1a and orf1ab polyproteins into proteins nsp1 to nsp16 is shown as well.
Reference
Baranov PV, Henderson CM, Anderson CB, Gesteland RF, Atkins JF, Howard MT (February 2005). "Programmed ribosomal frameshifting in decoding the SARS-CoV genome". Virology. 332 (2): 498-510. [Pubmed]
Buchan, J.R.; Stansfield, I. (2007). "Halting a cellular production line: responses to ribosomal pausing during translation". Biol Cell. 99 (9): 475–487. [Source]
Fehr AR, Perlman S. Coronaviruses: an overview of their replication and pathogenesis. Methods Mol Biol. 2015; 1282:1-23. [PMC]
Sawicki SG, Sawicki DL, Siddell SG. A contemporary view of coronavirus transcription. J Virol. 2007 Jan;81(1):20-9. doi: 10.1128/JVI.01358-06. Epub 2006 Aug 23. PMID: 16928755; PMCID: PMC1797243. [PMC]
Yang H, Yang M, Ding Y, Liu Y, Lou Z, Zhou Z, Sun L, Mo L, Ye S, Pang H, Gao GF, Anand K, Bartlam M, Hilgenfeld R, Rao Z; The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor. Proc. Natl. Acad. Sci. U.S.A. (2003) 100 p.13190-5. [Pubmed]
Genomic structure of Wuhan seafood market pneumonia virus [Now COVID-19]
Isolate 2019-nCoV/USA-AZ1/2020 - 2019 Outbreak Info
Source: Wiki Commons; CDC Commons
# | Position | 1..29903 Wuhan seafood market pneumonia virus |
1 | 1-265 | 5’-UTR |
2 | 266-21555 | Orf1ab: Polyprotein. Ribosomal slippage, id "QHQ82463.1" |
3 | 266-805 | Orf1ab; nsp1: Leader protein produced by both pp1a and pp1ab. Protein id "YP_009725297.1". Its activity results in blocking of the innate immune response. |
4 | 806-2719 | Orf1ab: nsp2, produced by both pp1a and pp1ab. Protein id "YP_009725298.1". |
5 | 2720-8554 | Orf1ab: nsp3: Large, multidomain transmembrane protein. Contains conserved domains, including an N-terminal acidic (Ac) domain and a predicted phosphoesterase. Activities include: a) ubiquitin-like 1 (Ubl1) and Ac domains interacting with N protein; b) ADRP activity promoting cytokine expression; c) papain-like protease (PLPro)/deubiquitinase domain, which cleaves the viral polyprotein and blocks the host immune response. The ubiquitin-like 2 (Ubl2), nucleic acid binding (NAB), G2M, SARS-unique domain (SUD), and Y domains are of unknown function. Known structures: |
6 | 8555-10054 | Orf1ab: nsp4, contains transmembrane domain 2 (TM2); Protein id "YP_009725300.1". Potential transmembrane scaffold protein, important for proper structure of double-membrane vesicles (DMVs). |
7 | 10055-10972 | Orf1ab: 3C-like proteinase; nsp5: Main proteinase (Mpro). Cleaves viral polyprotein; mediates cleavages downstream of nsp4. Protein id "YP_009725301.1". 1--------10--------20--------30--------40--------50 SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDML NPNYEDLLIRKSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPK YKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNHTIKGSFLNGSCGSVGF NIDYDCVSFCYMHHMELPTGVHAGTDLEGKFYGPFVDRQTAQAAGTDTTI TLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDIL GPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC SGVTFQ |
8 | 10973-11842 | Orf1ab: nsp6; putative transmembrane domain; produced by both pp1a and pp1ab. Protein id "YP_009725302.1". |
9 | 11843-12091 | Orf1ab: nsp7; produced by both pp1a and pp1ab. Protein id "YP_009725303.1". Forms a hexadecameric complex with nsp8 and may act as a processivity clamp for RNA polymerase. Structures for SARS-CoV nsp12-nsp7-nsp8 cofactors are known. |
10 | 12092-12685 | Orf1ab: nsp8; produced by both pp1a and pp1ab. Protein id "YP_009725304.1". Forms a hexadecameric complex with nsp7 and may act as processivity clamp for RNA polymerase and/or primase. |
11 | 12686-13024 | Orf1ab: nsp9; ssRNA-binding protein; produced by both pp1a and pp1ab. Protein id "YP_009725305.1". |
12 | 13025-13441 | Orf1ab: nsp10; formerly known as growth-factor-like protein (GFL). Produced by both pp1a and pp1ab. Protein id "YP_009725306.1". Cofactor for nsp16 and nsp14. Forms a heterodimer with each and stimulates viral exoribonuclease (ExoN) and 2'-O-methyltransferase (2'-O-MT) activity. |
13 | 13442-13468, 13468-16236 | Orf1ab: RNA-dependent RNA polymerase; nsp12; Produced by pp1ab only. Protein id "YP_009725307.1". |
14 | 16237-18039 | Orf1ab: helicase; nsp13; zinc-binding domain (ZD), NTPase/helicase domain (HEL), RNA 5'-triphosphatase; produced by pp1ab only. Protein id "YP_009725308.1". |
15 | 18040-19620 | Orf1ab: 3'-to-5' exonuclease; nsp14; produced by pp1ab only. Protein id "YP_009725309.1". N7 methyltransferase (MTase) and 3'-5' exoribonuclease (ExoN). ExoN activity is important for proofreading of the viral genome. |
16 | 19621-20658 | Orf1ab: endoRNAse; nsp15; produced by pp1ab only. Protein id "YP_009725310.1". Viral endoribonuclease (NendoU). A structure for the nsp15 (F307L) protein from the MHV coronavirus was solved in 2006. |
17 | 20659-21552 | Orf1ab: 2'-O-ribose methyltransferase; nsp16; 2'-O-MT; produced by pp1ab only. Protein id "YP_009725311.1". 2'-O-MT shields viral RNA from recognition by melanoma differentiation-associated protein 5 (MDA5). |
18 | 266-13483 | Orf1ab: pp1a; orf1a polyprotein. |
19 | 13442-13480 | Orf1ab: nsp11; produced by pp1a only. Protein id "YP_009725312.1". |
20 | 21563-25384 | S gene = Surface glycoprotein. "QHQ82464.1" Protein id "YP_009724390.1"; GeneID: "43740568" |
21 | 25393-26220 | orf3a: orf3a protein. Protein id "QHQ82465.1". 1--------10--------20--------30--------40--------50 MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGV ALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAG LEPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHT NCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCV VLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDG SSGVVNPVMEPIYDEPTTTTSVPL |
22 | 26245-26472 | E gene = Envelope protein. Protein id "QHQ82466.1". 1--------10--------20--------30--------40--------50 MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSL VKPSFYVYSRVKNLNSSRVPDLLV |
23 | 26523-27191 | M gene: ORF5; structural protein. start=1. Membrane glycoprotein. Protein id "YP_009724393.1". GeneID "43740571". 1--------10--------20--------30--------40--------50 MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAYANRNRFLYIIKL IFLWLLWPVTLACFVLAAVYRINWITGGIAIAMACLVGLMWLSYFIASFRL FARTRSMWSFNPETNILLNVPLHGTILTRPLLESELVIGAVILRGHLRIAG HHLGRCDIKDLPKEITVATSRTLSYYKLGASQRVAGDSGFAAYSRYRIGNY KLNTDHSSSSDNIALLVQ |
24 | 27192-27201 | ? |
25 | 27202-27387 | ORF6; protein ids "QHQ82468.1" and "YP_009724394.1". GeneID "43740572". 1--------10--------20--------30--------40--------50 MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQ LDEEQPMEID |
26 | 27387-27393 | ? |
27 | 27394-27759 | ORF7a: ORF7a protein. Protein id "YP_009724395.1". GeneID "43740573". 1--------10--------20--------30--------40--------50 MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLA DNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPI FLIVAAIVFITLCFTLKRKTE |
28 | 27894-28259 | ORF8: ORF8 protein. Protein id "YP_009724396.1". GeneID "43740577". MKFLVFLGIITTVAAFHQECSLQSCTQHQPYVVDDPCPIHFYSKWYIRVG ARKSAPLIELCVDEAGSKSPIQYIDIGNYTVSCLPFTINCQEPKLGSLVV RCSFYEDFLEYHDVRVVLDFI |
29 | 28274-29533 | N Protein: ORF9; structural protein. Nucleocapsid phosphoprotein. Protein id "YP_009724397.2". GeneID "43740575". 1--------10--------20--------30--------40--------50 MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTA SWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGK MKDLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRN PANNAAIVLQLPQGTTLPKGFYAEGSRGGSQASSRSSSRSRNSSRNSTPG SSRGTSPARMAGNGGDAALALLLLDRLNQLESKMSGKGQQQQGQTVTKKS AAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKH WPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQV ILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADL DDFSKQLQQSMSSADSTQA |
30 | 29558-29674 | ORF10: ORF10 protein. Protein_id "YP_009725255.1" GeneID "43740576" MGYINVFAFPFTIYSLLLCRMNSRNYIAQVDVVNFNLT |
31 | 29675-29903 | 3’-UTR |
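The coordinates in the table can be checked with simple arithmetic. The sketch below copies a few rows from the table (1-based, inclusive positions) and computes their lengths in nucleotides and codons. The dictionary and function names are illustrative assumptions, not part of any annotation tool.

```python
# A few rows copied from the table above; positions are 1-based and inclusive.
# A feature with two segments corresponds to a "start-end, start-end" entry.
FEATURES = {
    "5'-UTR": [(1, 265)],
    "nsp5 (Mpro)": [(10055, 10972)],
    "nsp12 (RdRp)": [(13442, 13468), (13468, 16236)],
    "E gene": [(26245, 26472)],
    "ORF10": [(29558, 29674)],
}


def length_nt(segments):
    """Total number of nucleotides covered, counting each segment separately."""
    return sum(end - start + 1 for start, end in segments)


def codon_count(segments):
    """Return (whole codons, leftover nucleotides) for a feature."""
    return divmod(length_nt(segments), 3)


for name, segments in FEATURES.items():
    codons, remainder = codon_count(segments)
    print(f"{name:<13} {length_nt(segments):>5} nt  {codons:>4} codons  remainder {remainder}")
```

For nsp12 the two segments share position 13468, so that nucleotide is counted twice. This is how the annotation encodes the -1 ribosomal slippage: the same nucleotide is read once in the orf1a frame and again in the orf1b frame.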