HGVS Protein Nomenclature: Silent Changes, Substitutions, Deletions, Duplications, Insertions, Indels, Extensions

  • the protein reference sequence should represent the primary translation product, not a processed mature protein, and thus include signal peptide sequences (see FAQ)
  • amino acids originating from changes introducing upstream translation initiation are numbered like nucleotides; …, Gln-2, Thr-1
  • amino acids originating from changes resulting in translation of intronic sequences are numbered like nucleotides; Val4+1, Ser4+2, …, Phe5-2, Gln5-1
  • amino acids originating from no-stop changes causing translation downstream of the translation termination codon are numbered like nucleotides; Gln*1, Ser*2, …
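The three numbering rules above reduce to simple string patterns. The helpers below are a minimal sketch; their names (`upstream_label`, `intronic_label`, `downstream_label`) are hypothetical and do not come from any HGVS library.

```python
def upstream_label(aa: str, n: int) -> str:
    """Residue n positions upstream of Met1, e.g. ('Thr', 1) -> 'Thr-1'."""
    if n < 1:
        raise ValueError("upstream distance must be >= 1")
    return f"{aa}-{n}"


def intronic_label(aa: str, exon_residue: int, offset: int) -> str:
    """Residue translated from intronic sequence.

    offset > 0 counts forward from the last exonic residue (Val4+1);
    offset < 0 counts backward from the next exonic residue (Phe5-2).
    """
    if offset == 0:
        raise ValueError("offset 0 is an exonic residue, not an intronic one")
    sign = "+" if offset > 0 else "-"
    return f"{aa}{exon_residue}{sign}{abs(offset)}"


def downstream_label(aa: str, n: int) -> str:
    """Residue n positions after the normal stop codon, e.g. ('Gln', 1) -> 'Gln*1'."""
    if n < 1:
        raise ValueError("downstream distance must be >= 1")
    return f"{aa}*{n}"


print(upstream_label("Gln", 2), intronic_label("Val", 4, 1),
      intronic_label("Phe", 5, -2), downstream_label("Ser", 2))
# Gln-2 Val4+1 Phe5-2 Ser*2
```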

Silent changes

So-called “silent” changes are described using p.(Leu54=) (see SVD-WG001). The formats p.(Leu54Leu) and p.(L54L) should not be used. Such descriptions can only be given in addition to a description at the DNA level (see Discussion).
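A minimal Python sketch of the silent-change rule: build the p.(Leu54=) form and flag the discouraged repeated-residue forms. The function names and the regular expression are illustrative assumptions; they only check the notation, not whether a DNA-level description accompanies it.

```python
import re

# A residue code repeated around the position (Leu54Leu or L54L) marks the
# discouraged format; the backreference \1 requires the same code on both sides.
_REPEATED_RESIDUE = re.compile(r"^p\.\(?([A-Z][a-z]{2}|[A-Z])(\d+)\1\)?$")


def silent_change(aa3: str, pos: int, predicted: bool = True) -> str:
    """Describe a silent change; predicted consequences are wrapped in parentheses."""
    core = f"{aa3}{pos}="
    return f"p.({core})" if predicted else f"p.{core}"


def uses_discouraged_format(description: str) -> bool:
    """True for descriptions such as p.(Leu54Leu) or p.(L54L)."""
    return _REPEATED_RESIDUE.match(description) is not None


print(silent_change("Leu", 54))                 # p.(Leu54=)
print(uses_discouraged_format("p.(Leu54Leu)"))  # True
print(uses_discouraged_format("p.(Leu54=)"))    # False
```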

Substitutions

Substitutions (missense changes) replace one amino acid by one other amino acid and are described using the format p.Trp26Cys. The description does not use the “>” character used at the DNA and RNA level (indicating “changes to”).

  • missense variant
    p.Trp26Cys denotes that amino acid Tryptophan-26 (Trp, W) is changed to a Cysteine (Cys, C)
  • start codon (initiating methionine change - Met1) (see Discussion, see Examples)
    a change affecting the translation initiation codon (Met1) is, depending on its consequence, described as one of the following
    • a change which results in no protein being produced (p.0)
    • a change with unknown consequence (p.Met1?), denoting that amino acid Methionine-1 (translation initiation site) is changed and that it is unclear what the consequences of the change are
    • an N-terminal deletion (p.Phe2_Met46del, i.e. activating downstream translation initiation)
      NOTE: up to August 2015 the example given was p.Met1_Lys45del, which is not correct; the 3’ rule should be applied
    • an extension (p.Met1ValextMet-12, activating upstream translation initiation)
  • nonsense variant
    is a special type of amino acid deletion introducing an immediate translation termination (stop) codon and is described like an amino acid substitution (p.Trp26Ter or p.Trp26*)
    NOTE: the description does not include the deletion at protein level of the entire C-terminal amino acid sequence, like p.Trp26_Leu833del
  • no-stop change (Ter) (change in stop codon, Ter/*)
    a change affecting the translation termination codon (Ter, *) is described as an extension (p.Ter110GlnextTer17 or p.*110Glnext*17)
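The missense and nonsense formats can be sketched as a lookup from one-letter to three-letter codes. This assumes a one-letter reference protein without the terminal stop; `substitution` is a hypothetical helper, and initiation codon changes are deliberately rejected because the text above describes them as p.0, p.Met1?, a deletion or an extension rather than as substitutions.

```python
# One-letter to three-letter amino acid codes.
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def substitution(ref: str, pos: int, alt: str, stop: str = "Ter") -> str:
    """Describe a one-residue change at 1-based position pos.

    alt is a one-letter code, or '*' for a nonsense variant (written as
    `stop`, i.e. 'Ter' or '*').
    """
    if not 1 <= pos <= len(ref):
        raise ValueError("position outside the reference protein")
    if pos == 1:
        # Met1 changes are p.0, p.Met1?, an N-terminal deletion or an extension.
        raise ValueError("initiation codon changes are not described as substitutions")
    old = ref[pos - 1]
    if alt == old:
        raise ValueError("no change: silent changes are written p.(Xaa#=)")
    new = stop if alt == "*" else THREE[alt]
    return f"p.{THREE[old]}{pos}{new}"


ref = "M" + "A" * 24 + "W" + "LK"            # toy protein with Trp at position 26
print(substitution(ref, 26, "C"))            # p.Trp26Cys
print(substitution(ref, 26, "*"))            # p.Trp26Ter
print(substitution(ref, 26, "*", stop="*"))  # p.Trp26*
```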

Deletions

Deletions remove one or more amino acid residues from the protein and are described using “del” after an indication of the first and last amino acid(s) deleted, separated by “_” (underscore). Deletions remove either a small internal segment of the protein (in-frame deletion), part of the N-terminus of the protein (initiation codon change) or the entire C-terminal part of the protein (nonsense change). A nonsense change is a special type of deletion removing the entire C-terminal part of a protein starting at the site of the variant (specified 2013-03-16).

  • in-frame deletions - are described using “del” after an indication of the first and last amino acid(s) deleted, separated by “_” (underscore).
    • p.Gln8del in the sequence MKMGHQQQCC denotes a Glutamine-8 (Gln, Q) deletion, changing the sequence to MKMGHQQCC
    • p.(Cys28_Met30del) denotes that neither RNA nor protein was analysed, but the predicted change is a deletion of three amino acids, Cysteine-28 to Methionine-30
  • initiating methionine change (Met1) causing an N-terminal deletion (see Discussion, see Examples)
    NOTE: changes extending the N-terminal protein sequence are described as an extension
    • p.0 - no protein is produced (experimental data should be available)
      NOTE: this change is not described as p.Met1_Leu833del, i.e. as a deletion removing the entire protein coding sequence
    • p.Met1? - denotes that amino acid Methionine-1 (translation initiation site) is changed and that it is unclear what the consequences of this change are
    • p.Phe2_Met46del - a new translation initiation site is activated (at Met46); applying the 3’ rule, this is not described as p.Met1_Lys45del
  • nonsense variant - is a special type of amino acid deletion removing the entire C-terminal part of a protein starting at the site of the variant. A nonsense change is described as a substitution, using the format p.Trp26Ter (alternatively p.Trp26*). The description does not include the deletion at protein level from the site of the change to the C-terminal end of the protein (stop codon), like p.Trp26_Leu833del (the deletion of amino acid residues Trp26 to the last amino acid of the protein, Leu833).
    • p.(Trp26Ter) indicates that neither RNA nor protein was analysed, but amino acid Tryptophan-26 (Trp, W) is predicted to change to a translation termination codon (Ter) (alternatively p.(W26*) or p.(Trp26*))

NOTE: for all descriptions the most C-terminal position possible is arbitrarily assigned to have been changed
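The note above (the 3’ rule) can be made concrete: a deleted segment is slid toward the C-terminus for as long as the resulting protein stays the same. The sketch assumes one-letter sequences and 1-based positions; `deletion` is a hypothetical helper. The second call shows why the N-terminal deletion example is written p.Phe2_Met46del rather than p.Met1_Lys45del, using a made-up sequence that only has Lys45 and Met46 in the right places.

```python
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def residue(ref: str, pos: int) -> str:
    """Three-letter code plus 1-based position, e.g. 'Gln8'."""
    return f"{THREE[ref[pos - 1]]}{pos}"


def deletion(ref: str, start: int, end: int) -> str:
    """Describe deletion of ref[start..end] (1-based, inclusive), shifted C-terminally.

    While the residue just after the deleted segment equals the segment's
    first residue, deleting one position further on yields the same protein,
    so the segment moves one step toward the C-terminus.
    """
    if not 1 <= start <= end <= len(ref):
        raise ValueError("invalid deletion range")
    while end < len(ref) and ref[end] == ref[start - 1]:
        start += 1
        end += 1
    if start == end:
        return f"p.{residue(ref, start)}del"
    return f"p.{residue(ref, start)}_{residue(ref, end)}del"


print(deletion("MKMGHQQQCC", 6, 6))  # p.Gln8del
toy = "MF" + "A" * 42 + "KMS"         # toy sequence: Lys45, Met46
print(deletion(toy, 1, 45))          # p.Phe2_Met46del
```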

Duplications

Duplications are described using “dup” after an indication of the first and last amino acid(s) duplicated, separated by “_” (underscore). In-frame duplications containing a translation stop codon in the duplicated sequence are described as an insertion of a nonsense variant, not as a deletion-insertion removing the entire C-terminal amino acid sequence.

  • p.Gly4_Gln6dup in the sequence MKMGHQQCC denotes a duplication of amino acids Glycine-4 (Gly, G) to Glutamine-6 (Gln, Q) (i.e. MKMGHQGHQQCC)
  • duplicating insertions in single amino acid stretches (or short tandem repeats) are described as a duplication, e.g. a duplicating insertion in the HQ tandem repeat sequence MKMGHQHQCC, changing it to MKMGHQHQHQCC, is described as p.His7_Gln8dup, not as p.Gln8_Cys9insHisGln.

NOTE: for all descriptions the most C-terminal position possible is arbitrarily assigned to have been changed
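The same C-terminal shift applies to duplications: duplicating a segment gives the same protein as duplicating the segment one position further on whenever the residue after it equals its first residue. A minimal sketch, assuming one-letter sequences and 1-based positions; `duplication` and `apply_duplication` are hypothetical helper names.

```python
from typing import Tuple

THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def name(ref: str, pos: int) -> str:
    """Three-letter code plus 1-based position, e.g. 'His7'."""
    return f"{THREE[ref[pos - 1]]}{pos}"


def shift_c_terminal(ref: str, start: int, end: int) -> Tuple[int, int]:
    """Slide ref[start..end] toward the C-terminus while its first residue
    equals the residue just after it; the duplicated protein is unchanged."""
    if not 1 <= start <= end <= len(ref):
        raise ValueError("invalid range")
    while end < len(ref) and ref[end] == ref[start - 1]:
        start += 1
        end += 1
    return start, end


def duplication(ref: str, start: int, end: int) -> str:
    start, end = shift_c_terminal(ref, start, end)
    span = name(ref, start) if start == end else f"{name(ref, start)}_{name(ref, end)}"
    return f"p.{span}dup"


def apply_duplication(ref: str, start: int, end: int) -> str:
    """Variant protein with ref[start..end] repeated once in tandem."""
    return ref[:end] + ref[start - 1:end] + ref[end:]


print(duplication("MKMGHQHQCC", 5, 6))        # p.His7_Gln8dup
print(apply_duplication("MKMGHQHQCC", 7, 8))  # MKMGHQHQHQCC
print(duplication("MKMGHQQCC", 4, 6))         # p.Gly4_Gln6dup
```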

Insertions

Insertions add one or more amino acid residues between two existing amino acids, where the inserted sequence is not a copy of the sequence immediately 5’-flanking (see Duplications). Insertions are described using “ins” after an indication of the two amino acids flanking the insertion site, separated by “_” (underscore), followed by a description of the amino acid(s) inserted. In-frame insertions containing a translation stop codon are described as an insertion of a nonsense variant, not as a deletion-insertion removing the entire C-terminal amino acid sequence. Since for large insertions the amino acids can be derived from the DNA and/or RNA descriptions, they need not be described exactly; the total number may be given instead (e.g. “ins17”).

  • in-frame
    • p.Lys2_Met3insGlnSerLys denotes that the sequence GlnSerLys (QSK) was inserted between amino acids Lysine-2 (Lys, K) and Methionine-3 (Met, M), changing MKMGHQQCC to MKQSKMGHQQCC
    • p.(Pro2_Ile3insGlyTer) is the predicted consequence of the insertion c.6_7insGGGTAG (coding reference sequence NM_000059.3)
      NOTE: this is not described as p.(Ile3_Ile3418delinsGly), a deletion-insertion removing the entire C-terminal amino acid sequence
    • p.Trp182_Gln183ins17 describes a variant that inserts 17 amino acids between amino acids Trp182 and Gln183
      NOTE: it must be possible to deduce the 17 inserted amino acids from the description given at DNA or RNA level
  • duplicating insertions should be described as duplications (see Discussion), not as insertions.
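Putting the insertion rules together: shift the insertion as far C-terminal as possible, report it as a duplication if it then copies the residues immediately 5’ of the insertion site, and otherwise list the inserted residues (or only their number, as in “ins17”). This is a hedged sketch; `insertion` is a hypothetical helper, and not shifting insertions that contain a stop codon is a simplifying assumption rather than a rule stated above.

```python
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def name(ref: str, pos: int) -> str:
    return f"{THREE[ref[pos - 1]]}{pos}"


def insertion(ref: str, after: int, inserted: str, count_only: bool = False) -> str:
    """Describe `inserted` placed between ref positions `after` and `after + 1` (1-based)."""
    if not inserted or not 1 <= after < len(ref):
        raise ValueError("need a non-empty insertion between two existing residues")
    if "*" in inserted:
        # Residues after the new stop are not translated; drop them and do not
        # shift (simplifying assumption for this sketch).
        inserted = inserted[: inserted.index("*") + 1]
    else:
        # Slide the insertion C-terminally, rotating the inserted residues;
        # the resulting protein is the same at every step.
        while after < len(ref) and ref[after] == inserted[0]:
            inserted = inserted[1:] + inserted[0]
            after += 1
        n = len(inserted)
        if after >= n and ref[after - n:after] == inserted:
            start = after - n + 1
            span = name(ref, after) if n == 1 else f"{name(ref, start)}_{name(ref, after)}"
            return f"p.{span}dup"
        if after == len(ref):
            raise ValueError("insertion slid past the last residue")
    body = str(len(inserted)) if count_only else "".join(THREE[a] for a in inserted)
    return f"p.{name(ref, after)}_{name(ref, after + 1)}ins{body}"


print(insertion("MKMGHQQCC", 2, "QSK"))   # p.Lys2_Met3insGlnSerLys
print(insertion("MKMGHQHQCC", 5, "QH"))   # p.His7_Gln8dup
print(insertion("MPIAAA", 2, "G*"))       # p.Pro2_Ile3insGlyTer (toy sequence)
print(insertion("A" * 181 + "WQAAAAA", 182, "G" * 17, count_only=True))
# p.Trp182_Gln183ins17
```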

Variability of short sequence repeats

Variability of short sequence repeats is described as p.Gln6(3_6): a stretch of Glutamines (Gln, Q) starting at amino acid position 6 (e.g. in MKMGHQQQCC) is found with a variable length of 3 to 6 residues in the population (the underscore indicates the range, 3 to 6 times).
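The repeat notation records the range of observed lengths. A minimal sketch, assuming the lengths seen in a population are already available as a list (the values below are made up for illustration); `run_length` and `repeat_variability` are hypothetical helpers.

```python
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def run_length(ref: str, pos: int) -> int:
    """Number of identical residues starting at 1-based position pos."""
    aa, n = ref[pos - 1], 0
    while pos - 1 + n < len(ref) and ref[pos - 1 + n] == aa:
        n += 1
    return n


def repeat_variability(ref: str, pos: int, lengths: list) -> str:
    """Format a repeat stretch observed with variable lengths, e.g. p.Gln6(3_6)."""
    if not lengths:
        raise ValueError("need at least one observed length")
    return f"p.{THREE[ref[pos - 1]]}{pos}({min(lengths)}_{max(lengths)})"


ref = "MKMGHQQQCC"
print(run_length(ref, 6))                        # 3
print(repeat_variability(ref, 6, [3, 4, 6, 5]))  # p.Gln6(3_6)
```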

Deletion/insertions (indels)

Deletion/insertions (indels) replace one or more amino acid residues with one or more other amino acid residues. Deletion/insertions are described using “delins” after an indication of the first and last amino acid(s) deleted, separated by “_” (underscore), followed by the amino acid(s) inserted (see Discussion). Frame shifts are a special type of amino acid deletion/insertion affecting an amino acid between the first (initiation, ATG) and last codon (termination, stop), replacing the normal C-terminal sequence with one encoded by another reading frame (specified 2013-10-11). A frame shift is described using “fs” after the first amino acid affected by the change. Descriptions can be short (“fs”) or long (“fsTer#”). The description of frame shifts does not include the deletion at protein level from the site of the first amino acid changed to the natural end of the protein (stop codon). The inserted amino acid residues are not described; only the total length of the new shifted reading frame is given (i.e. including the first amino acid changed).
NOTE: there is a typing error in den Dunnen & Antonarakis (2000): the suggestion to use “>” to indicate “delins” in frame shift descriptions has been retracted.
NOTE: when one amino acid is replaced by one other amino acid the change is called a substitution
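The delins rules above, including the note that a one-for-one replacement is a substitution, can be illustrated by comparing an in-frame reference and variant protein. A hedged sketch: `describe_in_frame` is a hypothetical helper; trimming the longest common prefix before the common suffix places the change as far C-terminal as possible, while duplications, stop codons and frame shifts are outside its scope.

```python
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def name(ref: str, pos: int) -> str:
    """Three-letter code plus 1-based position, e.g. 'Cys28'."""
    return f"{THREE[ref[pos - 1]]}{pos}"


def describe_in_frame(ref: str, alt: str) -> str:
    """Describe an in-frame change between two one-letter protein sequences."""
    if ref == alt:
        raise ValueError("sequences are identical")
    limit = min(len(ref), len(alt))
    p = 0  # length of the common prefix
    while p < limit and ref[p] == alt[p]:
        p += 1
    s = 0  # length of the common suffix, never overlapping the prefix
    while s < limit - p and ref[-1 - s] == alt[-1 - s]:
        s += 1
    deleted = ref[p:len(ref) - s]
    added = alt[p:len(alt) - s]
    inserted = "".join(THREE[a] for a in added)
    if not deleted:
        if p == 0 or p == len(ref):
            raise ValueError("insertion at a protein end is not handled")
        return f"p.{name(ref, p)}_{name(ref, p + 1)}ins{inserted}"
    start, end = p + 1, p + len(deleted)
    span = name(ref, start) if start == end else f"{name(ref, start)}_{name(ref, end)}"
    if not added:
        return f"p.{span}del"
    if len(deleted) == 1 and len(added) == 1:
        return f"p.{span}{inserted}"  # one residue replaced by one: a substitution
    return f"p.{span}delins{inserted}"


print(describe_in_frame("A" * 27 + "CKLL", "A" * 27 + "WLL"))  # p.Cys28_Lys29delinsTrp
print(describe_in_frame("A" * 27 + "CLL", "A" * 27 + "WVLL"))  # p.Cys28delinsTrpVal
```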

  • in-frame
    • p.(Cys28_Lys29delinsTrp) indicates that neither RNA nor protein was analysed, but the predicted change is a 3 bp deletion in the codons for Cysteine-28 and Lysine-29, replacing them with a codon for Tryptophan (Trp, W)
    • p.Cys28delinsTrpVal denotes a 3 bp insertion in the codon for Cysteine-28, generating codons for Tryptophan (Trp, W) and Valine (Val, V)
    • p.(Pro578_Lys579delinsLeuTer) is a deletion-insertion variant, while the variant resulting from the change c.1732_1734del is p.(Pro578_Gln598del). Note that although the resulting proteins are identical, their HGVS descriptions are different.
      NOTE: these examples derive from the SLC34A3 gene (NM_080877.2)
  • frame shifts
    are described using the format p.Arg97Glyfs*26 (alternatively p.Arg97GlyfsTer26, or short p.Arg97fs), where Arg97Gly denotes the change of the first amino acid affected (Arg97 replaced by a Gly residue), “fs” indicates the frame shift and *26 gives the position of the translation termination codon (stop codon) in the new reading frame.
    NOTE: the description does not include the deletion from the site of the change to the C-terminal end of the protein (stop codon), like p.Arg97_Leu833delinsGlyfsTer26, nor a specific description of the inserted amino acid residues.
    NOTE: the shifted reading frame includes the first new amino acid (Gly) and encounters a translation termination codon at position 26 (Ter26 or *26). The shifted reading frame is thus open for 26 - 1 = 25 amino acids.
    • short description - uses “fs” only, e.g. p.Arg97fs
    • long description - uses “fsTer#” (alternatively “fs*#”) (see Discussion)
      • includes the change occurring at the site of the frame shift, e.g. p.Arg97Gly
      • “fsTer#” (or “fs*#”) indicates the position in the new reading frame at which a translation termination codon (Ter# / *#) is encountered. This position is counted from the first amino acid affected by the frame shift (position 1) up to and including the first stop codon (fsTer# or fs*#)
    • Examples
      • p.Arg97ProfsTer23 (alternatively p.Arg97Profs*23; short p.Arg97fs) denotes a frame shifting change with Arginine-97 as the first affected amino acid, replacing it with a Proline and creating a new reading frame ending at a stop at position 23 (counting starts with the Proline as amino acid 1)
      • p.Glu5Valfs*5 describes a frame shifting insertion (do not use p.Glu5Valins2fs*3)
      • p.(Tyr4*) indicates that neither RNA nor protein was analysed, but the predicted consequence of the change c.12delC, changing ATG-GAT-GCA-TAC-GTG-ACG to ATG-GAT-GCA-TA.-GTG-ACG, is a change of Tyr4 to a translation termination codon
      • p.Asp2Metfs*4 (alternatively p.Asp2fs) describes the consequence of the change c.4delG in the sequence ATG-GAT-GCA-TAC-GTG-ACG, giving ATG-.AT-GCA-TAC-GTG-ACG, which in the new reading frame reads ATG-ATG-CAT-ACG-TGA (Met-Met-His-Thr-Ter)
      • p.Glu5Valfs*5 (alternatively p.Glu5fs) describes the consequence of the change c.6_13dup in the sequence GCA-TAC-GAG-AT-G CA-G GAG-G AG-AG AG-A G-G GG.
      • p.Ile327Argfs*? (alternatively p.Ile327fs) describes the consequence of a frame shifting change (a nucleotide insertion) with Isoleucine-327 as the first affected amino acid, replacing it with an Arginine and creating a new reading frame that does not encounter a new stop codon (see FAQ) (specified 2012-11-01).
        NOTE: the changes observed should be described at the protein level, without trying to incorporate knowledge of the changed reading frame at the DNA level (see Recommendation); the frame shift starts at the first amino acid that is actually changed. Thus, p.His150Hisfs*10 is not correct, but p.Gln151Thrfs*9 is.
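A frame shift description can be derived by translating the reference and variant coding sequences and counting from the first changed amino acid to the first stop in the new frame. The sketch below assumes the standard genetic code, coding sequences starting at the A of the ATG, and a variant whose length change is not a multiple of three; `frameshift` is a hypothetical helper. It reproduces the c.4delG and c.12delC examples above and returns fs*? when the new frame reaches the end of the supplied sequence without a stop.

```python
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(cds: str) -> str:
    """Translate complete codons; an incomplete trailing codon is ignored."""
    return "".join(CODON[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def frameshift(ref_cds: str, alt_cds: str) -> str:
    """Describe a frame shifting change as p.Xaa#Yaafs*# (p.Xaa#* for an immediate stop)."""
    if (len(alt_cds) - len(ref_cds)) % 3 == 0:
        raise ValueError("length change is a multiple of 3: not a frame shift")
    ref_p, alt_p = translate(ref_cds), translate(alt_cds)
    if "*" in ref_p:
        ref_p = ref_p[: ref_p.index("*") + 1]  # reference protein ends at its stop
    i = 0
    while i < min(len(ref_p), len(alt_p)) and ref_p[i] == alt_p[i]:
        i += 1
    if i == min(len(ref_p), len(alt_p)):
        raise ValueError("no amino acid change within the given sequence")
    if i == 0:
        raise ValueError("initiation codon affected: not described as a frame shift")
    if ref_p[i] == "*":
        raise ValueError("stop codon affected: describe as an extension")
    if alt_p[i] == "*":
        # The first changed codon is already a stop: written like a substitution.
        return f"p.{THREE[ref_p[i]]}{i + 1}*"
    stop = alt_p.find("*", i)
    length = "?" if stop == -1 else str(stop - i + 1)  # first changed residue = 1
    return f"p.{THREE[ref_p[i]]}{i + 1}{THREE[alt_p[i]]}fs*{length}"


ref = "ATGGATGCATACGTGACG"                   # ATG-GAT-GCA-TAC-GTG-ACG
print(frameshift(ref, ref[:3] + ref[4:]))    # c.4delG:  p.Asp2Metfs*4
print(frameshift(ref, ref[:11] + ref[12:]))  # c.12delC: p.Tyr4*
```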

Extensions

Extensions affect either the first (start, translation initiation, N-terminus, ATG) or last codon (translation termination, stop) and extend the protein sequence N- or C-terminally with one or more amino acids. Extensions are described using “ext” after an indication of the change at the first amino acid affected, followed by the position of the new translation initiation codon (N-terminal extension) or of the new translation termination codon (C-terminal extension).

  • new translation initiation site (see Discussion) (specified 2012-08-31)
    a change affecting the translation initiation codon (Met1) that introduces a new upstream initiation codon, extending the N-terminus of the encoded protein, is described using “ext-#”, where “-#” is the position of the new initiation codon (Met-#)
    • p.Met1ext-5 - a variant in the 5’ UTR activates a new upstream translation initiation site at position -5 (Met-5)
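Finding the position of a new upstream initiation codon amounts to walking the 5’ UTR codon by codon, in frame with Met1. A hedged sketch: it assumes the variant 5’ UTR is given as a DNA string ending immediately before Met1, takes the most upstream in-frame ATG that is not separated from Met1 by an in-frame stop, and ignores initiation context and out-of-frame ATGs. The function names are hypothetical; `c_terminal_extension` only formats the no-stop notation given under Substitutions and does not compute the new stop position.

```python
from typing import Optional

STOPS = {"TAA", "TAG", "TGA"}


def n_terminal_extension(utr5: str) -> Optional[str]:
    """Return p.Met1ext-# for the most upstream usable in-frame ATG, else None."""
    best = None
    for k in range(1, len(utr5) // 3 + 1):  # codon -k, counted back from Met1
        start = len(utr5) - 3 * k
        codon = utr5[start:start + 3]
        if codon in STOPS:
            break  # an in-frame stop separates any further ATG from Met1
        if codon == "ATG":
            best = k
    return None if best is None else f"p.Met1ext-{best}"


def c_terminal_extension(stop_pos: int, new_aa3: str, new_stop: Optional[int],
                         style: str = "Ter") -> str:
    """Format a no-stop change, e.g. p.Ter110GlnextTer17 or p.*110Glnext*17."""
    tail = "?" if new_stop is None else str(new_stop)
    return f"p.{style}{stop_pos}{new_aa3}ext{style}{tail}"


print(n_terminal_extension("CCATGAAACCCGGGTTT"))         # p.Met1ext-5
print(c_terminal_extension(110, "Gln", 17))              # p.Ter110GlnextTer17
print(c_terminal_extension(110, "Gln", 17, style="*"))   # p.*110Glnext*17
```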
