Viruses have several common characteristics: they are small, have DNA or RNA genomes, and are obligate intracellular parasites. The virus capsid functions to protect the nucleic acid from the environment, and some viruses surround their capsid with a membrane envelope. Most viruses have icosahedral or helical capsid structure, although a few have complex virion architecture. An icosahedron is a geometric shape with 20 sides, each composed of an equilateral triangle, and icosahedral viruses increase the number of structural units in each face to expand capsid size.
The classification of viruses is very useful, and the International Committee on Taxonomy of Viruses is the official body that classifies viruses into order, family, genus, and species taxa. There are currently seven orders of viruses. Since that time, advances in microscopy and scientific techniques have led to a better classification of viruses and their properties. Electron microscopy has allowed us to visualize viruses in great detail, while molecular and cellular assays have broadened our understanding of how viruses function and are related to one another.
Taken together, we have learned that although they can be quite diverse, viruses share several common characteristics. The smallest of viruses are about 20 nm in diameter, although influenza and the human immunodeficiency virus have a more typical size, about nm in diameter. However, some viruses are significantly larger than nm. Poxviruses, such as the variola virus that causes smallpox, can approach nm in length, and filoviruses, such as the dangerous Ebola virus and Marburg virus, are only 80 nm in diameter but extend into long threads that can reach lengths of over nm.
Several very large viruses that infect amoebas have recently been discovered: megavirus is nm in diameter, and pandoraviruses have an elliptical or ovoid structure approaching nm in length.
It is a common mistake to think that all viruses are smaller than bacteria; most bacteria are typically — nm in size, but certain bacteria called Mycoplasma can be 10 times smaller than this, putting them in the range of these large viruses. So although a characteristic of viruses is that they are all small in size, this ranges from only a few nanometers to larger than some bacteria. Human viruses can vary in size but are generally in the range of 20— nm in diameter.
Viruses are obligate intracellular parasites, meaning that they are completely dependent upon the internal environment of the cell to create new infectious virus particles, or virions. All viruses make contact with and bind the surface of a cell to gain entry into the cell. The virus disassembles, and its genetic material, made of nucleic acid, encodes the instructions for the proteins that will spontaneously assemble into the new virions.
All living cells, whether human, animal, plant, or bacterial, have double-stranded DNA (dsDNA) as their genetic material. Viral genomes, by contrast, can be made of DNA or RNA, and they are not necessarily double-stranded, either: different virus types can also have single-stranded DNA (ssDNA) genomes, and viruses with RNA genomes can be single-stranded or double-stranded. Any particular virus will only have one type of nucleic acid genome, however, and so viruses are not encountered that have both ssDNA and ssRNA genomes, for example.
Just as the size of the virus particle varies significantly, the genome size can also vary greatly from virus to virus. A typical virus genome falls in the range of 7,000 to 20,000 base pairs (bp), that is, 7 to 20 kilobase pairs (kb). Smaller-sized virions will naturally be able to hold less nucleic acid than larger virions, but large viruses do not necessarily have large genomes.
While most viruses do not contain much nucleic acid, some dsDNA viruses have very large genomes: herpesviruses have genomes that are — kb in total, and the very large pandoraviruses mentioned previously have the largest genomes: up to 2. In comparison, eukaryotic cells have much larger genomes: a red alga has the smallest known eukaryotic genome, at 8 million base pairs; a human cell contains over 3 billion nucleotides in its hereditary material; the largest genome yet sequenced, at over 22 billion base pairs, is that of the loblolly pine tree.
The infectious virus particle must be released from the host cell to infect other cells and individuals. In the extracellular environment, the virus will be exposed to enzymes that could break down or degrade nucleic acid. Physical stresses, such as the flow of air or liquid, could also shear the nucleic acid strands into pieces.
In addition, viral genomes are susceptible to damage by ultraviolet radiation or radioactivity, much in the same way that our DNA is. If the nucleic acid genome of the virus is damaged, then it will be unable to produce progeny virions.
This repeating structure forms a strong but slightly flexible capsid. Combined with its small size, the capsid is physically very difficult to break open and sufficiently protects the nucleic acid inside of it.
Together, the nucleic acid and the capsid form the nucleocapsid of the virion. Remember that the genomes of most viruses are very small. Genes encode the instructions to make proteins, so small genomes cannot encode many proteins.
It is for this reason that the capsid of the virion is composed of one or only a few proteins that repeat over and over again to form the structure.
If the capsid were composed of more than just a few proteins, the nucleic acid needed to encode them would be physically too large to fit inside it. In the same way that a roll of magnets will spontaneously assemble together, capsid proteins also exhibit self-assembly. The first to show this were H. Fraenkel-Conrat and Robley Williams, who separated the RNA genome from the protein subunits of tobacco mosaic virus; when they put them back together in a test tube, infectious virions formed automatically.
This indicated that no additional information is necessary to assemble a virus: the physical components will assemble spontaneously, primarily held together by electrostatic and hydrophobic forces.
Most viruses also have an envelope surrounding the capsid. Virus capsids are held together by some of the same bonds that are found in living organisms. Rarely are covalent bonds found in capsids; these are the strongest of bonds, formed when atoms share electrons with each other. Hydrogen bonds are also weak electrostatic forces that occur between slightly charged atoms, usually between hydrogen (slightly positively charged) and another atom that is partially negatively charged, such as oxygen.
Van der Waals forces are weak interactions that occur when an atom becomes slightly charged due to random asymmetry of its electrons. The properties of water also contribute to virus assembly and attachment to cells. Water is a polar molecule, meaning that the molecule has two distinct ends, much like a battery or magnet has a positive and a negative end.
Molecules that do not have distinct ends are termed nonpolar. Other polar molecules are attracted to water, since water is polar too.
This explains the phenomenon of oil (nonpolar) not mixing with water (polar). Enveloped viruses often have proteins, called matrix proteins, that function to connect the envelope to the capsid inside.
A virus that lacks an envelope is known as a nonenveloped or naked virus. Each virus also possesses a virus attachment protein embedded in its outermost layer. This will be found in the capsid, in the case of a naked virus, or the envelope, in the case of an enveloped virus. The virus attachment protein is the viral protein that facilitates the docking of the virus to the plasma membrane of the host cell, the first step in gaining entry into a cell.
The capsid of an enveloped virion is wrapped with a lipid membrane derived from the cell. Virus attachment proteins located in the capsid or envelope facilitate binding of the virus to its host cell.
Each virus possesses a protein capsid to protect its nucleic acid genome from the harsh environment. Virus capsids predominantly come in two shapes: helical and icosahedral. The helix (plural: helices) is a spiral shape that curves cylindrically around an axis.
It is also a common biological structure: many proteins have sections that have a helical shape, and DNA is a double helix of nucleotides. In the case of a helical virus, the viral nucleic acid coils into a helical shape and the capsid proteins wind around the inside or outside of the nucleic acid, forming a long tube or rod-like structure. The nucleic acid and capsid constitute the nucleocapsid.
In fact, the protein that winds around the nucleic acid is often called the nucleocapsid protein. Once in the cell, the helical nucleocapsid uncoils and the nucleic acid becomes accessible.
(A) Viral capsid proteins wind around the nucleic acid, forming a helical nucleocapsid. (B) Helical structure of tobacco mosaic virus.
There are several perceived advantages to forming a helical capsid. First, only one type of capsid protein is required. This protein subunit is repeated over and over again to form the capsid. This structure is simple and requires less free energy to assemble than a capsid composed of multiple proteins. In addition, having only one nucleocapsid protein means that only one gene is required instead of several, thereby reducing the length of nucleic acid required.
Because the helical structure can continue indefinitely, there are also no constraints on how much nucleic acid can be packaged into the virion: the capsid length will be the size of the coiled nucleic acid.
Helical viruses can be enveloped or naked. The first virus described, tobacco mosaic virus, is a naked helical virus. In fact, most plant viruses are helical, and it is very uncommon that a helical plant virus is enveloped.
In contrast, all helical animal viruses are enveloped. These include well-known viruses such as influenza virus, measles virus, mumps virus, rabies virus, and Ebola virus.
(A) Vesicular stomatitis virus forms bullet-shaped helical nucleocapsids. (B) Tobacco mosaic virus forms long helical tubes. (C) The helical Ebola virus forms long threads that can extend over nm in length.
Of the two major capsid structures, the icosahedron is by far more prevalent than the helical architecture. In comparison to a helical virus where the capsid proteins wind around the nucleic acid, the genomes of icosahedral viruses are packaged completely within an icosahedral capsid that acts as a protein shell. Initially these viruses were thought to be spherical, but advances in electron microscopy and X-ray crystallography revealed these were actually icosahedral in structure.
The net axes are formed by lines of the closest-packed neighboring capsomeres. In adenoviruses, the h and k axes also coincide with the edges of the triangular faces. This symmetry and number of capsomeres is typical of all members of the adenovirus family. Except in helical nucleocapsids, little is known about the packaging or organization of the viral genome within the core. Small virions are simple nucleocapsids containing 1 to 2 protein species.
The larger viruses contain, in a core, the nucleic acid genome complexed with basic protein(s) and protected by a single- or double-layered capsid (consisting of more than one species of protein) or by an envelope.
Two-dimensional diagram of HIV-1 correlating immunoelectron microscopic findings with the recent nomenclature for the structural components in a 2-letter code and with the molecular weights of the virus structural (glyco)proteins.
Because of the error rate of the enzymes involved in RNA replication, RNA viruses usually show much higher mutation rates than do the DNA viruses. Mutation rates of 10^-4 lead to the continuous generation of virus variants which show great adaptability to new hosts. The viral RNA may be single-stranded (ss) or double-stranded (ds), and the genome may occupy a single RNA segment or be distributed on two or more separate segments (segmented genomes).
In addition, the RNA strand of a single-stranded genome may be either a sense strand (plus strand), which can function as messenger RNA (mRNA), or an antisense strand (minus strand), which is complementary to the sense strand and cannot function as mRNA for protein translation.
Sense viral RNA alone can replicate if injected into cells, since it can function as mRNA and initiate translation of virus-encoded proteins. Antisense RNA, on the other hand, has no translational function and cannot per se produce viral components.
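The complementarity between plus and minus strands can be illustrated with a short sketch. It assumes standard Watson-Crick pairing (A with U, G with C) and writes both strands 5' to 3'; the function name `complementary_strand` and the example sequence are hypothetical and are not taken from the text.

```python
# Illustrative sketch only: Watson-Crick pairing for RNA (A-U, G-C).
# The function name and the example sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def complementary_strand(rna: str) -> str:
    """Return the strand complementary to `rna`, written 5'->3'."""
    rna = rna.upper()
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError(f"not an RNA sequence: {sorted(unexpected)}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return rna.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    plus = "AUGGCC"                      # sense (plus) strand; AUG is a start codon
    minus = complementary_strand(plus)   # antisense (minus) strand
    print(minus)
    # Copying the minus strand regenerates the plus strand.
    print(complementary_strand(minus) == plus)
```

Taking the complement of the minus strand returns the original plus strand, which is why a minus-strand genome must first be copied before any translatable mRNA exists.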
Schemes of 21 virus families infecting humans showing a number of distinctive criteria: presence of an envelope or double- capsid and internal nucleic acid genome. DsRNA viruses, e. Each segment consists of a complementary sense and antisense strand that is hydrogen bonded into a linear ds molecule. The replication of these viruses is complex; only the sense RNA strands are released from the infecting virion to initiate replication.
The retrovirus genome comprises two identical, plus-sense ssRNA molecules, each monomer 7 to 11 kb in size, that are noncovalently linked over a short terminal region. Retroviruses contain 2 envelope proteins encoded by the env-gene, 4 to 6 nonglycosylated core proteins and 3 non-structural functional proteins (reverse transcriptase, integrase, protease: RT, IN, PR) specified by the gag-gene. The reverse transcriptase copies the RNA genome into DNA. This DNA, mediated by the viral integrase, becomes covalently bonded into the DNA of the host cell to make possible the subsequent transcription of the sense strands that eventually give rise to retrovirus progeny.
After assembly and budding, retroviruses show structural and functional maturation. In immature virions the structural proteins of the core are present as a large precursor protein shell.
After proteolytic processing by the viral protease the proteins of the mature virion are rearranged and form the dense isometric or cone-shaped core typical of the mature virion, and the particle becomes infectious. Most DNA viruses Fig. The papovaviruses, comprising the polyoma- and papillomaviruses, however, have circular DNA genomes, about 5. Three or 2 structural proteins make up the papovavirus capsid: in addition, nonstructural proteins are encoded that are functional in virus transcription, DNA replication and cell transformation.
Single-stranded linear DNA, 4—6 kb in size, is found with the members of the Parvovirus family that comprises the parvo-, the erythro- and the dependoviruses. The virion contains 2—4 structural protein species which are differently derived from the same gene product see Ch.
The adeno-associated virus (AAV, a dependovirus) is incapable of producing progeny virions except in the presence of helper viruses (adenovirus or herpesvirus). It is therefore said to be replication defective. Circular single-stranded DNA of only 1. The isometric capsid measures 17 nm and is composed of 2 protein species only. On the basis of shared properties, viruses are grouped at different hierarchical levels of order, family, subfamily, genus and species.
More than 30, different virus isolates are known today and grouped in more than 3, species, in genera and 71 families. Viral morphology provides the basis for grouping viruses into families. A virus family may consist of members that replicate only in vertebrates, only in invertebrates, only in plants, or only in bacteria.
Certain families contain viruses that replicate in more than one of these hosts. This section concerns only the 21 families and genera of medical importance. Besides physical properties, several factors pertaining to the mode of replication play a role in classification: the configuration of the nucleic acid (ss or ds, linear or circular), whether the genome consists of one molecule of nucleic acid or is segmented, and whether the strand of ssRNA is sense or antisense.
Also considered in classification is the site of viral capsid assembly and, in enveloped viruses, the site of nucleocapsid envelopment.
Table lists the major chemical and morphologic properties of the families of viruses that cause disease in humans. The use of Latinized names ending in -viridae for virus families and ending in -virus for viral genera has gained wide acceptance. The names of subfamilies end in -virinae.
Vernacular names continue to be used to describe the viruses within a genus. In this text, Latinized endings for families and subfamilies usually are not used. Table shows the current classification of medically significant viruses.
In the early days of virology, viruses were named according to common pathogenic properties, e. From the early s until the mids, when many new viruses were being discovered, it was popular to compose virus names by using sigla abbreviations derived from a few or initial letters.
Thus the name Picornaviridae is derived from pico (small) and RNA; the name Reoviridae is derived from respiratory, enteric, and orphan viruses, because the agents were found in both respiratory and enteric specimens and were not related to other classified viruses; Papovaviridae is from papilloma, polyoma, and vacuolating agent (simian virus 40 [SV40]); retrovirus is from reverse transcriptase; Hepadnaviridae is from the replication of the virus in hepatocytes and their DNA genomes, as seen in hepatitis B virus.
Hepatitis A virus is classified now in the family Picornaviridae, genus Hepatovirus. Although the current rules for nomenclature do not prohibit the introduction of new sigla, they require that the siglum be meaningful to workers in the field and be recognized by international study groups.
Several viruses of medical importance still remain unclassified. Some are difficult or impossible to propagate in standard laboratory host systems and thus cannot be obtained in sufficient quantity to permit more precise characterization. Hepatitis E virus, the Norwalk virus and similar agents see Ch. The fatal transmissible dementias in humans and other animals scrapie in sheep and goat; bovine spongiform encephalopathy in cattle, transmissible mink encephalopathy; Kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome in humans see Ch.
The agents causing transmissible subacute spongiform encephalopathies have been linked to viroids or virinos i. Some of the transmissible amyloidoses show a familial pattern and can be explained by defined mutations which render a primary soluble glycoprotein insoluble, which in turn leads to the pathognomonic accumulation of amyloid fibers and plaques.
The pathogenesis of the sporadic amyloidoses, however, is still a matter of highly ambitious research. For multipartite viruses, the longest genome segment was considered for the plot. A Scatter plot colored by T number. Circles and squares represent eukaryotic viruses and bacteriophages, respectively.
The blue circle highlights the cluster formed by bacteriophages from the Podoviridae, Siphoviridae, and Myoviridae families. Pearson correlation results obtained from the inliers are shown in the inset. Data points contoured in red represent viruses that have more Lys than Arg in their positively charged segments. (C, D) The Q max30res values per protein fragment and Q genome values according to capsid T number, respectively.
Error bars indicate the mean and SD values. Hence, we excluded the big bacteriophages and analyzed the other viruses separately Fig. A linear fit allowing outlier identification indicated that 20 viruses, members of 9 virus families marked in grey , deviated from the fit Fig. Assuming these families as outliers, we analyzed the remaining inliers from 17 families, including the controls, in a correlation analysis.
We obtained a Pearson r of 0. This group (Fig. 3B, inlier points) represents a subset of viruses for which genome packaging capacity is highly correlated with the internal net charge of the capsid. Figure 3, panels C and D, show the Q max30res and the Q genome vs. T, respectively. Members from the Circoviridae family use only 60 subunits to pack genome sizes equivalent to the Geminiviridae subunits and Bromoviridae subunits see Fig. We conclude that positively charged domains are a common strategy for capsid assembly and stabilization employed by viruses of different genome types and hosts.
The clear separation of phages that use molecular motors to pump and concentrate the genomes under pressure into the capsid blue group, Fig. The identification of outliers from the linear regression Fig. The outlier position also indicates that these species have assembly strategies that are less dependent on electrostatic interactions between the genome and the capsid protein see Discussion for more details.
Although the fixed-frame method is fast and straightforward, it lacks size resolution. The new program starts from a pre-determined search frame e. Then, the search is re-initiated with 9 residues. The program will replace the previous stretch if the new stretch has a higher Q max value and if the Q c is higher than or equal to a predetermined threshold.
This approach minimizes the identification of long stretches that have an uneven distribution of the positively charged amino acids i. The program continues the search until it exhausts all the frame size possibilities, limited by the sequence size Fig.
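The variable-frame search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: Lys and Arg count as +1, Asp and Glu as -1, and all other residues as 0; Q c is assumed to be the charge density of the stretch (net charge divided by stretch length), which the text does not define; and the starting frame of 8 residues and the threshold of 0.5 are placeholder defaults, since the source values are not given. All function and parameter names are hypothetical.

```python
# Sketch of a variable-frame search for the most positively charged stretch.
# Assumptions (not from the source): K/R = +1, D/E = -1, everything else 0;
# Q_c is taken to be charge density (net charge / stretch length);
# min_frame=8 and qc_threshold=0.5 are placeholder defaults.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def best_window(charges, frame):
    """Highest net charge among all windows of `frame` residues.

    Returns (net_charge, 0-based start); the first window wins ties.
    """
    current = sum(charges[:frame])
    best, best_start = current, 0
    for start in range(1, len(charges) - frame + 1):
        # Slide one residue: add the entering residue, drop the leaving one.
        current += charges[start + frame - 1] - charges[start - 1]
        if current > best:
            best, best_start = current, start
    return best, best_start


def variable_frame_search(sequence, min_frame=8, qc_threshold=0.5):
    """Return (Q_max, 1-based start, stretch length) for `sequence`."""
    charges = [CHARGE.get(aa, 0) for aa in sequence.upper()]
    if len(charges) < min_frame:
        raise ValueError("sequence is shorter than the starting frame")
    q, start = best_window(charges, min_frame)
    best = (q, start + 1, min_frame)
    # Re-run the search with ever longer frames, up to the full sequence.
    for frame in range(min_frame + 1, len(charges) + 1):
        q, start = best_window(charges, frame)
        # Accept a longer stretch only if it is more charged AND dense enough.
        if q > best[0] and q / frame >= qc_threshold:
            best = (q, start + 1, frame)
    return best


if __name__ == "__main__":
    print(variable_frame_search("AAAA" + "R" * 12 + "AAAA"))
```

Raising `qc_threshold` makes the search reject long stretches in which the positive residues are diluted by neutral ones, which is the behavior the density criterion is meant to enforce.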
We tested the new program using the same dataset analyzed in Fig. The frequency distribution of the Q max n res stretch sizes is shown in Fig. In these cases, the Q c threshold could be increased to better capture the domain enriched in positively charged residues. Next, we compared the Total Q max values found by the fixed and variable frame programs for each of the sequences in the data set used in Fig. Indeed, the variable frame outputs closely reproduced the plot from Fig. Figure 4B shows examples of sequences found by the variable frame red and fixed 30 residues frame program blue.
In the case of MS2, the variable frame program retrieved the same domain indicated by the fixed frame program, but for CHIK, the new program correctly added an important arginine-rich region to the positively charged domain. The overlap between the sequences identified by the fixed frame and the variable frame program demonstrates that the methods are consistently retrieving the same protein regions.
Therefore, we concluded that while using the variable frame could be useful to finely locate positively charged domains, the 30 amino acid residue frame is enough to capture the positively charged regions involved in genome stabilization and can be used for further analysis.
Automatic identification of positively charged domains with variable sizes. A Frequency distribution of the domain sizes retrieved by the variable frame program using the same viruses analyzed in Fig. The inset shows the Pearson correlation between the Total Q max30res calculated using the fixed frame program and the Total Q max n res calculated using the variable frame program. B Examples of sequences found by the variable frame marked in red and fixed 30 residues frame program marked in blue.
Next, we examined the composition and location of these positively charged protein segments in viral capsid proteins Fig. We complemented the protein dataset analyzed in Fig. The bias towards arginine was stronger among the families that are included in the linear fit shown in Fig. Another pattern that emerged from Fig. Although this feature is not exclusive to Group 1, we used this value as a threshold to the calculation with flexible sequence frames Fig.
To allow a direct comparison, protein lengths were normalized and split into bins of 0. Helical viruses presented a more scattered and fragmented pattern of charge distribution than the inliers, which tend to have their positively charged segments concentrated in one or both extremities of the capsid protein, usually in the N-terminus.
Finally, in Fig. Viruses had more arginine, proline, and tryptophan than the human dataset. We looked for recurring patterns or known motifs in these sequences using MEME (data not shown). However, for the viral data set, no known nucleic-acid motifs were identified, and the few patterns retrieved by the program matched entries from the same family (not shown). This result confirms the unique structural makeup of viral capsid positively charged domains compared with other DNA- and RNA-binding proteins.
Groups 1 and 2 grey box contain the icosahedral viruses shown in Fig. Group 3 contains bacteriophages and complex multicomponent icosahedral capsids that were not analyzed in Fig.
From this plot, we see that all groups included in the linear fit of Fig. The protein lengths were normalized and divided into bins of 0.
The panel shows the amino acid enrichment in relation to the total Swiss-Prot proteome amino acid composition. We found that the high frequency of positively charged domains found in many viruses Fig.
Eukaryotes are the only other group with a considerable number of proteins with a similar constitution. Among these proteins are nucleic acid binding proteins and, more notably, protamines, small proteins that are expressed exclusively during spermatogenesis and are involved in DNA hyper-condensation 31.
The arginine side chain possesses a guanidinium group, able to form bidentate bonds that are advantageous to maximize nucleic acid folding and packing compared to Lys 32. Moreover, arginine-rich cell-penetrating peptides are more efficient than lysine-rich peptides, probably because the bidentate interaction forces membrane curvature and destabilization. Hence, arginine seems to be the optimal amino acid to condense and stabilize the viral genome and to facilitate membrane interaction.
Nevertheless, unlike the negatively charged amino acids that can be found in stretches of 30 consecutive residues, the concentration of R and K in a short protein segment is limited Fig. The adverse effect of exceptionally positively charged protein segments on ribosomal synthesis efficiency may be among the selective pressures acting against repetitions of R or K in all organisms Additionally, the size and composition of positively charged viral domains might be controlled by other factors.
Viral nucleic-acid structural features that are rare in host cells usually serve as molecular targets for the innate immune response 35 , and R-rich domains may function as a viral protein-specific pattern. The proportion of proteins containing positively charged segments in the Swiss-Prot database. Protein sequences derived from the reviewed Swiss-Prot data-bank were used as input for a program that calculates the net charge of every consecutive 30 residue amino acid segments Q 30res.
The calculation of the capsid internal net charge shown in Fig. From the correlation between capsid total Q max30res and Q genome , we could distinguish at least 3 groups: complex bacteriophages that do not have a typical R-arm and pack the genome through molecular motors blue group Fig. This last group includes at least four RNA viruses for each the involvement of positively charged domains in genome packaging was experimentally demonstrated 4 , A strong correlation does not necessarily imply causation ie.
We tested whether R-arm size would correlate with the number of capsid subunits as an approximation for capsid radius. While our analysis implies a general role for positively charged domains in capsid assembly and genome interaction for all inlier families including some DNA viruses, it is important to note that the details of the assembly pathways can be highly diversified.
Some viral capsids rely more heavily on CP-CP interactions for assembly, as suggested by the formation of empty capsids in the absence of positively charged domains e. Johnson, personal communication.
Polyomaviridae and Papillomaviridae are known to pack their genome with histones, suggesting that the R-arms are not sufficient to stabilize or condense the stiffer dsDNA One unexpected finding among the inliers was MS2 and other Leviviridae bacteriophages.
MS2 depends on the RNA binding protein A for genome packaging 37 and is the prototype virus for assembly mechanisms driven by specific interaction between the capsid protein and RNA structural elements 38 , In fact, instead of being in a flexible arm, the most positively charged segment of the MS2 capsid was located on the internal beta-strand in close contact with the packaged RNA Fig.
However, mutations in some, but not all, positively charged amino acid residues of this domain interfere with the RNA packaging capacity 40, which indicates that charge balance and neutralization play, at most, a secondary role in Leviviridae assembly and stability.
The bacteriophages highlighted in blue in Fig. In all cases, the genome charge exceeded the expected internal capsid charge Fig. More than transporting the genome, these particles are part of the viral factory, preventing the detection of viral dsRNA species by cellular proteins Because these capsids must sustain variable levels of RNA content during viral replication, it is reasonable that these families diverged from the group belonging to the linear fit.
Among the ssRNA outliers, we found Caliciviridae , the 3 families of picornavirales present in the dataset Dicistroviridae , Secoviridae , and Picornaviridae ; and Tymoviridae. A recent sequence-similarity network analysis of single jelly-roll capsid proteins from RNA viruses revealed two large clusters, one containing most of the ssRNA viruses present in our data set and another formed by picornavirales and Caliciviridae The primary role of these domains is unknown, but they may participate in membrane interaction, as already demonstrated for dicistroviruses CP4 Our data reinforce the structural similarities between these two groups and suggest a common yet unknown mechanism for genome stabilization and assembly.
Parvoviruses present 3 variations of the cap gene product, all having an overlapping amino acid sequence with similar C-termini. The most charged segment is a short Lys-enriched region unique to VP1. Because this CP variant is the least abundant, our charge calculation is probably overestimated. This stable and structured contact between the genome and the protein shell may represent an alternative strategy to the long super-charged R-arms that are observed in circovirus and anellovirus 6 , Both are thought to escort and direct the genome towards the interior of a pre-formed capsid in a packaging process that is coupled to ssDNA synthesis 1 , It should be noted that there is a considerable degree of uncertainty regarding the stoichiometry of these small peptides, especially for H protein 1 , 27 , which can explain the observed wide variation in the Total Q max30res for similar genomes sizes Fig.
Moreover, we do not rule out that these R-arms assist capsid assembly in other ways besides or alternative to the genome charge neutralization Moreover, viral proteins are notably versatile and more than one functional pressure might be shaping the final composition and charge of R-arms.
For example, positively charged domains of polyomavirus are known to drive endosome escape 20; and R-rich segments of Tetraviridae function as lytic peptides. The protein database Swiss-Prot at Uniprot. Protein function, taxonomic, and structural information were retrieved from Uniprot. Reference sequences were used when available. UniRef advanced search options were used to retrieve datasets according to organism or protein function.
Advanced search options for H. Viral capsid proteins were separated in helical or non-helical regions by helical viral capsid Gene Ontology ID GO We used a program that screens the primary sequence of a given protein and calculates the net charge in consecutive frames of a predetermined number of amino acids 10, 30, or 60 were used.
The N and C termini charges were disregarded. In a previous publication, we have shown that these simplified parameters generated similar results to a calculation using partial charges of individual amino acids at pH 7. Our algorithm uses, as input, a fasta file containing the amino acid sequence of multiple proteins see Data sources section.
The algorithm initially establishes a stretch containing a predetermined number of amino acids (1 to 30, for example). The stretch charge is calculated, and the charge value and the position of the first amino acid are temporarily saved to memory. Then, the algorithm advances to the next stretch, from amino acid 2 to amino acid 31, and performs the same analysis. For a 30-residue frame, the algorithm continues advancing through the protein until it reaches the stretch between amino acids N - 29 and N, where N is the total number of amino acids in that protein.
At each step, if the charge of the current stretch is higher than the charge previously saved in memory, the saved charge and position are replaced with the current values.
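The fixed-frame scan described above can be sketched in a few lines. This is an illustration under assumptions rather than the authors' code: the simplified charge parameters are taken to be +1 for Lys and Arg, -1 for Asp and Glu, and 0 for every other residue, with the N and C termini ignored; the function name `max_charge_window` is hypothetical, and FASTA parsing is left out so that the function takes a plain sequence string.

```python
# Sketch of the fixed-frame scan for the most positively charged stretch.
# Assumed simplified charges (not confirmed by the source):
# K/R = +1, D/E = -1, all other residues 0; termini are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def max_charge_window(sequence, frame=30):
    """Return (highest net charge, 1-based position of its first residue)."""
    charges = [CHARGE.get(aa, 0) for aa in sequence.upper()]
    n = len(charges)
    if n < frame:
        raise ValueError("sequence is shorter than the frame")
    # First stretch: residues 1..frame.
    current = sum(charges[:frame])
    best_charge, best_start = current, 1
    # Advance one residue at a time until the stretch (N - frame + 1)..N.
    for start in range(1, n - frame + 1):
        current += charges[start + frame - 1] - charges[start - 1]
        # Replace the saved values only when the new stretch is more charged.
        if current > best_charge:
            best_charge, best_start = current, start + 1
    return best_charge, best_start


if __name__ == "__main__":
    toy = "AAAAA" + "R" * 10 + "AAAAA" + "DDD"  # made-up sequence
    print(max_charge_window(toy, frame=10))
```

Because a saved value is replaced only when a strictly higher charge is found, the earliest stretch is reported when several stretches tie.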
A second algorithm was developed to compare the net charge of stretches with different sizes.