Our previous metagenomic study16 identified two unrelated human microbiome samples that each contained an abundant crAss-like phage. The adult sample contained a 191 kilobase crAss-like phage genome with the potential to circularize, and the infant sample had a 94 kilobase crAss-like phage genome, which was curated to completion (Supplementary Fig. 1). These samples were prioritized for metaproteomic measurements to address two key questions: (1) can proteins of phages be detected in the presence of highly abundant bacterial, human, and dietary proteins, and (2) can phage proteins be detected that confirm the expression of alternative genetic code 15?
To answer these questions, paired metagenomic and metaproteomic measurements were conducted on fecal samples containing abundant crAss-like phages from one infant and one adult. Metagenomic data indicated these phages are predicted to use genetic code 15, based on the increased coding density observed with translations using genetic code 15 relative to genetic code 11. To ensure accurate peptide identifications from the metaproteomes, assembled metagenomic data from the same samples were used to generate databases that included phage proteins that were predicted using either the standard genetic code 11 (TAG→stop) or alternative genetic code 15 (TAG→Q), as well as all other bacterial proteins in the sample, the human reference proteome, and proteins commonly found as contaminants.
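The coding-density comparison described above can be illustrated with a minimal sketch. It is not the gene-prediction pipeline used in the study. It assumes forward-strand ORFs only, ATG as the sole start codon for both codes, and a toy sequence; `codon_table`, `orfs`, `coding_density`, and `TOY` are hypothetical names introduced for illustration.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for NCBI translation table 11, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AAS_CODE11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def codon_table(tag_is_glutamine):
    """Build a codon -> amino acid map; code 15 reassigns only TAG (stop -> Q)."""
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS_CODE11)}
    if tag_is_glutamine:
        table["TAG"] = "Q"
    return table

CODE11 = codon_table(False)
CODE15 = codon_table(True)

def orfs(seq, table, min_codons=2):
    """Yield (start, end) spans of ATG-initiated, stop-terminated forward-strand ORFs.

    Simplification: ORFs that run off the end of the sequence are ignored.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif table.get(codon, "X") == "*":
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3  # span includes the stop codon
                start = None

def coding_density(seq, table, min_codons=2):
    """Fraction of nucleotides covered by at least one predicted ORF."""
    covered = [False] * len(seq)
    for start, end in orfs(seq, table, min_codons):
        for j in range(start, end):
            covered[j] = True
    return sum(covered) / len(seq)

# Toy sequence with an in-frame TAG: ATG GCT TAG GCA AAA GAA CTT TAA
TOY = "ATGGCTTAGGCAAAAGAACTTTAA"
density11 = coding_density(TOY, CODE11)
density15 = coding_density(TOY, CODE15)
```

In this toy case the in-frame TAG truncates the ORF under code 11, so a smaller fraction of the sequence is coding than under code 15. This is the kind of signal used to infer the code from coding density.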
Phages contribute a relatively small proportion of the proteinaceous biomass in fecal samples, which makes detecting their proteins by shotgun proteomics particularly challenging. In fact, initial measurements of the fecal samples detected no phage proteins. Thus, a combination of centrifugation and filtration-based enrichment techniques was employed to enrich phage particles and their proteins irrespective of the phage’s physical size. Fecal phage enrichment strategies for proteomics typically separate phage particles from other microbial biomass in the sample with a 0.2 μm filter, under the assumption that phage particles will be smaller than bacterial cells17. Previous work has shown that alternatively coded phages have genome sizes, and presumably corresponding physical sizes, that range from very small to very large5. We therefore developed a workflow that first separates phage particles, regardless of size, from bacterial cells in the sample using a low-speed centrifugation step (Supplementary Fig. 2). The resulting supernatant is then passed through a 0.8 μm filter to further remove non-proteinaceous debris. The eluted material is finally passed over a 300 kDa MWCO filter, which retains intact phage particles on top of the filter while allowing highly abundant human proteins, released from epithelial cells lysed during the initial thawing and homogenization steps, to pass through. In addition, the pellet from the low-speed centrifugation step can be further processed to examine phage proteins that are present inside the host bacterial cells. Overall, this enrichment strategy enables the successful detection of low-abundance phage proteins in the presence of highly abundant proteins from the human host and bacteria. The LC-MS/MS data were searched against the comprehensive sample-matched databases that included phage proteins predicted using either code 11 or code 15.
Identified peptides were evaluated codon by codon to determine whether translation using standard or alternative genetic code was appropriate. To complement the database search strategy, de novo peptide sequencing, which derives peptide sequence information directly from the MS/MS spectra, was incorporated into the traditional database search workflow to provide a database-independent confirmation of phage translation that is agnostic to the translation code used for gene predictions.
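The codon-by-codon evaluation can be sketched as follows. This is a minimal illustration and not the study's actual procedure. It assumes the nucleotide sequence is supplied already in frame with the peptide; `translate`, `recoded_positions`, and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for NCBI translation table 11, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AAS_CODE11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE15 = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS_CODE11)}
CODE15["TAG"] = "Q"  # code 15 reassigns TAG from stop to glutamine

def translate(nt_seq, table):
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    return "".join(table.get(nt_seq[i:i + 3], "X")
                   for i in range(0, len(nt_seq) - 2, 3))

def recoded_positions(nt_seq, peptide, table=CODE15):
    """Return 0-based peptide positions whose underlying codon is TAG.

    Returns None if the peptide is absent from the code 15 translation.
    Only the first occurrence of the peptide is examined.
    """
    protein = translate(nt_seq, table)
    idx = protein.find(peptide)
    if idx == -1:
        return None
    return [k for k in range(len(peptide))
            if nt_seq[3 * (idx + k):3 * (idx + k) + 3] == "TAG"]

# Toy region: ATG GCT TAG GCA AAA GAA CTT TAA -> M A Q A K E L * under code 15
REGION = "ATGGCTTAGGCAAAAGAACTTTAA"
spanning = recoded_positions(REGION, "AQAK")   # peptide crosses the TAG codon
downstream = recoded_positions(REGION, "EL")   # peptide lies past the TAG codon
missing = recoded_positions(REGION, "WWW")     # peptide not encoded here
```

A peptide with a non-empty result contains a glutamine encoded by TAG and can only arise under code 15. An empty list means the peptide contains no recoded residue, although it may still lie outside the reading frame predicted with code 11.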
Database searching of the phage-enriched fraction of the samples yielded 173 phage-specific peptides in total, at a peptide-level false discovery rate of <1%. These peptides mapped to 16 and 14 phage proteins in the infant and adult samples, respectively. In addition, numerous peptides and proteins from bacteria and humans were identified (Supplementary Data 1, 2, 4, 5). Many of the phage peptides identified by database searching were further supported by de novo sequencing tags. Roughly half of the identified phage peptides in each sample mapped only to proteins predicted using genetic code 15. Figure 1 shows the genome maps of the target phages in each sample, with the locations of predicted and detected proteins using either code 11 or code 15 translation. Some of the proteins identified with code 15 predictions were annotated as structural proteins, including capsid, portal, and tail-associated proteins (Supplementary Data 3 and 6), while the remaining proteins were unannotated. The detection of mostly late-infection structural proteins was expected, given that sample preparation enriched for virus-like particles.
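The text does not state how the peptide-level false discovery rate was estimated. One widely used approach is target-decoy estimation, sketched below purely as an assumption about what such a filter could look like. The function name, the `(score, is_decoy)` input format, and the toy scores are hypothetical.

```python
def score_cutoff_at_fdr(hits, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    hits: iterable of (score, is_decoy) pairs, where higher scores are better.
    FDR is estimated as (#decoys) / (#targets) among hits at or above the cutoff.
    Returns None if no cutoff satisfies the bound. Score ties are not handled.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff

# Toy data: 50 target hits (scores 100..51), one decoy (50), 10 more targets (49..40)
toy_hits = [(s, False) for s in range(100, 50, -1)]
toy_hits.append((50, True))
toy_hits += [(s, False) for s in range(49, 39, -1)]
toy_cutoff = score_cutoff_at_fdr(toy_hits)
```

In the toy data the single decoy pushes the estimated rate above 1% for every cutoff at or below its score, so only hits ranked above the decoy are retained.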
Across all identified phage proteins in these samples, 67% of genetic code 15 proteins could be confidently annotated, while only 34% of standard genetic code 11 proteins could be confidently annotated. Because incorrect code prediction leads to genes predicted in incorrect reading frames and to truncated gene products, this discrepancy in annotation levels is alarming. It emphasizes the need for correct code usage during gene prediction in order to accurately catalog phage gene inventories, as very few insights into biological function can be drawn from incorrectly and poorly annotated genomes.
Figure 2 shows the protein sequence coverage map for the alternatively coded phage tail fiber protein (L3_063_250G2_scaffold_974_curated_39.code15) identified in the infant fecal sample. The region of the phage genome corresponding to this single protein would have constituted six truncated proteins when predicted using the standard code. With code 15, however, the full-length alternatively coded protein contained 23 peptides identified through database matching, of which 11 were identified exclusively using code 15. Four peptides, highlighted in red boxes, directly confirm that the TAG stop codon is reassigned to glutamine. The identification of several de novo sequencing tags provides additional evidence of the existence and expression of recoded stop codons in this alternatively coded protein.
Numerous identified peptides in both the infant and adult fecal samples further substantiate phage reassignment of the TAG stop codon to glutamine. Figure 3 shows two examples of high-quality MS/MS spectra for alternatively coded phage peptides. In both instances, the glutamine residue from the recoded stop codon was positioned in the middle of a tryptic peptide. In the figure, only the direct y-type fragment ion series was chosen for annotation because y-type ions are preferentially generated by higher-energy C-trap dissociation (HCD) fragmentation during MS/MS measurement18.
The peptide in Fig. 3A contains three glutamine residues: one canonical glutamine and two glutamines from recoded stop codons. One of the recoded glutamines was predicted as a stop codon at the end of a protein predicted through standard code translation. With a nearly complete fragment ion series, the detected tryptic peptide shows several amino acids flanking this recoded stop codon, covering an amino acid sequence that would not exist in a standard code open reading frame. In addition, a de novo sequencing tag matching nearly the entire length of the database match had high local confidence scores for every amino acid residue, including the recoded glutamines, providing additional support that this peptide, and others like it, do in fact exist (Supplementary Fig. 3). Figure 3B shows a peptide that contains, in the middle of its sequence, a methionine corresponding to a start codon predicted using the standard code, in addition to a glutamine from a recoded stop codon. As several amino acids depicted here map to codons upstream of the standard code methionine start codon, this tryptic peptide would not exist if the phage were using standard code translation.
To better understand when these phages might deploy alternative coding, additional LC-MS/MS measurements were conducted on the unenriched fraction of the fecal samples, which primarily contained unlysed bacterial cells and host proteins. The aim was to determine whether any additional phage peptides could be detected for early infection proteins that would be present in the host bacterium at the time of sample processing. These measurements detected numerous phage peptides in the infant sample, including peptides for three additional proteins that were not identified in the original phage-enriched samples. These three proteins included two hypothetical proteins and one ribosomal protein found in a region of the genome predicted to use genetic code 11. We found no evidence of stop codon recoding (code 15) in any of these new peptides in the sample fraction expected to contain intracellular early infection phage proteins. This supports the current hypothesis that alternatively coded phages employ stop codon recoding to prevent premature expression of structural and lytic phage genes at inappropriate times during the phage infection cycle.
Finally, as genetic code 15 only utilizes ATG as a start codon for translation initiation, confirmation of expression of this start codon was necessary to validate this genetic code. Supplementary Fig. 4 shows an example of direct peptide sequencing of a peptide containing a methionine from the ATG start codon for a genetic code 15 predicted protein, confirming translation initiation at this site in the genome. To confirm that alternative start codons were not being utilized, additional databases were generated to determine if translation was being initiated upstream of the predicted ATG start codon. Searches with databases that extended the protein-coding sequences several amino acids upstream of the predicted start codons yielded no peptide evidence that translation was occurring upstream of the predicted code 15 open reading frame. These examples provide experimental validation that standard genetic code 11 is not being utilized by the phage in the translation of this region of the genome, and instead, genetic code 15 is being used. In total, there is copious expressional evidence of genetic code 15, including direct evidence of stop codon readthrough and peptides existing outside of genetic code 11 predicted open reading frames, in regions of the genome with increased coding density using genetic code 15 predictions compared to code 11 predictions. There is no peptide evidence of TAG stop codon recoding in genome regions predicted to use standard genetic code 11 based on a similar coding density for each of the genetic codes. This peptide evidence supports the assignments of genetic codes based on relative coding densities for these regions of the genome.
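The upstream-extended database entries mentioned above could be generated along the following lines. This is a minimal sketch under stated assumptions and not the procedure used in the study. The extension walks in-frame codons upstream of the predicted ATG and stops at the first stop codon; `upstream_extension`, the number of codons, and the toy genome are all hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for NCBI translation table 11, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AAS_CODE11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE11 = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS_CODE11)}
CODE15 = dict(CODE11, TAG="Q")  # code 15 reassigns TAG from stop to glutamine

def upstream_extension(genome, orf_start, n_codons, table):
    """Translate up to n_codons in-frame codons directly upstream of orf_start.

    Walks upstream one codon at a time and stops at the first stop codon,
    at the start of the sequence, or after n_codons residues.
    """
    residues = []
    pos = orf_start - 3
    while pos >= 0 and len(residues) < n_codons:
        aa = table.get(genome[pos:pos + 3], "X")
        if aa == "*":
            break
        residues.append(aa)
        pos -= 3
    return "".join(reversed(residues))  # restore N-to-C terminal order

# Toy genome: TAA GCT TAG AAA | ATG GAA CTT TAA, predicted ORF starts at index 12
GENOME = "TAAGCTTAGAAAATGGAACTTTAA"
ext15 = upstream_extension(GENOME, 12, 5, CODE15)
ext11 = upstream_extension(GENOME, 12, 5, CODE11)
```

Prepending the extension to the predicted protein sequence yields a database entry in which any peptide matching the added residues would indicate translation initiating upstream of the predicted start codon.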