
N-linked glycosylation is one of the predominant post-translational modifications and is involved in a number of biological functions. Since experimental characterization of glycosites is challenging, glycosite prediction is crucial. Several predictors have been made available and report high performance. Most of them, however, evaluate performance at every asparagine in a protein sequence rather than only at asparagines within the N-X-S/T sequon. In this paper, we present N-GlyDE, a two-stage prediction tool trained on rigorously constructed non-redundant datasets to predict N-linked glycosites in the human proteome. The first stage uses a protein similarity voting algorithm, trained on both glycoproteins and non-glycoproteins, to assign each protein a score that is used to improve glycosite prediction. The second stage uses a support vector machine to predict N-linked glycosites from gapped dipeptide features, pattern-based predicted surface accessibility, and predicted secondary structure. N-GlyDE's final predictions are derived by adjusting the weight of the second-stage prediction results according to the first-stage prediction score. Evaluated on the N-X-S/T sequons of an independent dataset comprising 53 glycoproteins and 33 non-glycoproteins, N-GlyDE achieves an accuracy of 0.740 and an MCC of 0.499, outperforming the compared tools.

Protein glycosylation is a highly complex post-translational modification (PTM) that influences a variety of biological processes such as protein folding, signaling, trafficking, cell-cell interactions, and immune response 1, 2, 3. More than one-third of approved biopharmaceuticals among marketed therapeutic proteins are glycoproteins 4. More than 50% of all polypeptides are covalently modified by the addition of structurally diverse oligosaccharides to specific functional groups 5. Based on the nature of the chemical linkage between the specific acceptor amino acid and the glycan, glycosylation can be classified into four major categories: N-linked glycosylation, O-linked glycosylation, C-mannosylation, and glycosylphosphatidylinositol (GPI) anchors 6. Among these, N-linked glycosylation is the most abundant type.
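
To make the notions of an N-X-S/T sequon and of gapped dipeptide features concrete, the following is a minimal Python sketch and not the N-GlyDE implementation. It assumes the conventional sequon definition, in which X is any residue except proline, and it counts residue pairs separated by a fixed number of positions within a window around each candidate asparagine. The function names, the window size (`flank`), and the maximum gap (`max_gap`) are hypothetical choices made for illustration only.

```python
import re
from collections import Counter

# Asparagine followed by any residue except proline, then serine or threonine.
# Assumption: X != P, the conventional sequon definition.
# The lookahead lets overlapping sequons (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq):
    """Return 0-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


def gapped_dipeptides(window, max_gap=2):
    """Count residue pairs (a, gap, b) where b follows a after `gap` residues."""
    counts = Counter()
    for gap in range(max_gap + 1):
        step = gap + 1
        for i in range(len(window) - step):
            counts[(window[i], gap, window[i + step])] += 1
    return counts


def candidate_features(seq, flank=5, max_gap=2):
    """Map each sequon position to gapped dipeptide counts in its window.

    `flank` and `max_gap` are illustrative values, not taken from the paper.
    """
    seq = seq.upper()
    features = {}
    for pos in find_sequons(seq):
        window = seq[max(0, pos - flank): pos + flank + 1]
        features[pos] = gapped_dipeptides(window, max_gap)
    return features
```

Restricting candidates to sequon positions in this way mirrors the evaluation setting described above, in which predictions are assessed only at asparagines within N-X-S/T sequons and not at every asparagine in the sequence.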
