Type: Package
Title: Yeast-Proteome Secondary-Structure Calculator
Description: An extension for 'NetSurfP-2.0' (Klausen et al. (2019) <doi:10.1002/prot.25674>) which is specifically designed to analyze the results of bottom-up proteomics that are primarily analyzed with 'MaxQuant' (Cox, J., Mann, M. (2008) <doi:10.1038/nbt.1511>). This tool is designed to process a large number of yeast peptides produced as a result of whole yeast cell-proteome digestion and to provide a coherent picture of the secondary structure of proteins.
Version: 1.1.0
License: GPL (>= 3)
Encoding: UTF-8
StagedInstall: yes
RoxygenNote: 7.1.2
Suggests: rmarkdown, testthat (>= 3.0.0)
Imports: spelling, dplyr, readxl, stringr, eulerr, Peptides, utils, svDialogs, tcltk
Depends: R (>= 2.10)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2021-11-27 21:48:29 UTC; Shashank
Author: Sajad Tasharofi [aut, cph], Shashank Kumbhare [aut, cre, cph], Bent Petersen [aut], Morteza Khaledi [aut], Amir Shahmoradi [aut]
Maintainer: Shashank Kumbhare <shashank.kumbhare@mavs.uta.edu>
Repository: CRAN
Date/Publication: 2021-11-27 22:10:03 UTC

ypssc: Yeast-Proteome Secondary-Structure Calculator

Description

An extension for 'NetSurfP-2.0' (Klausen et al. (2019) <doi:10.1002/prot.25674>) which is specifically designed to analyze the results of bottom-up proteomics that are primarily analyzed with 'MaxQuant' (Cox, J., Mann, M. (2008) <doi:10.1038/nbt.1511>). This tool is designed to process a large number of yeast peptides produced as a result of whole yeast cell-proteome digestion and to provide a coherent picture of the secondary structure of proteins.

Details

What is ypssc?

ypssc is an extension for NetSurfP-2.0 which is specifically designed to analyze the results of bottom-up proteomics that are primarily analyzed with MaxQuant. We call this tool the Yeast Proteome Secondary Structure Calculator (ypssc).


Functionalities in ypssc:

  1. findSecondary

  2. findAlpha

  3. findBeta

  4. findChain

    (See the help page of each function above to find out more about these functionalities and their usage.)


Note: NetSurfP

  • NetSurfP-1.0 is a prediction tool for secondary structures that uses a neural network.

  • NetSurfP-2.0 is an extension of NetSurfP-1.0 which uses deep neural networks to predict secondary structures with an accuracy of 85%. In addition to its accuracy, this tool requires less computation time than other methods.

  • NetSurfP-2.0 is designed to be user friendly and efficient in calculation time for large numbers of sequences. In addition, the output of the calculation is available in many formats, which makes further data analysis even easier.

  • NetSurfP-2.0 is available as a web server (http://www.cbs.dtu.dk/services/NetSurfP-2.0/) which can accept up to 4000 sequences at a time.
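
To make the 4000-sequence limit concrete, the following is a minimal sketch of how a list of sequence identifiers could be divided into batches that each fit one submission. The identifiers and the variable names are placeholders invented for illustration; nothing here is part of ypssc or NetSurfP-2.0.

```r
# Placeholder IDs standing in for a proteome of about 6700 proteins.
seq_ids <- sprintf("protein_%04d", 1:6700)

# Maximum number of sequences accepted per submission (from the note above).
batch_size <- 4000

# Assign each sequence to a batch number and split accordingly.
batches <- split(seq_ids, ceiling(seq_along(seq_ids) / batch_size))

# Number of sequences in each batch.
lengths(batches)
```

With these placeholder values the proteome needs two submissions, one full batch and one partial batch.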


Why this package?

This tool is designed to process a large number of yeast peptides produced as a result of whole yeast cell proteome digestion and to provide a coherent picture of the secondary structure of proteins. NetSurfP-2.0 is not designed for this task.


Drawbacks of NetSurfP-2.0

  • First, NetSurfP-2.0 is not designed to accept that many peptides at once; uploading the sequences and waiting for the calculations to complete is therefore extremely time consuming.

  • Second, even if all sequences are uploaded successfully and the results come back, it would be almost impossible to combine the results produced for each individual peptide (hundreds of thousands of spreadsheets) into a coherent picture of the secondary structure of the proteins.


Advantages of ypssc

  • ypssc on the one hand benefits from the accuracy of NetSurfP-2.0 in calculating secondary structure, and on the other hand addresses the issue of analyzing so many peptides with NetSurfP-2.0 by eliminating the need for direct analysis of the peptides from bottom-up proteomics.

  • Instead of analyzing the peptides directly with NetSurfP-2.0, which raises the problem of combining peptide-level results into proteins, the whole yeast proteome has been analyzed once by NetSurfP-2.0 and stored as the Secondary Structure Database for Yeast Proteome (SSDYP). The peptides from the experiment are then matched against this database to extract their secondary structure.


Methodology

The SSDYP contains structural information for all amino acids of the whole yeast proteome (over 3,000,000 amino acids), which comprises over 6700 proteins. For a hypothetical protein, the SSDYP contains the ID of the protein, its numbered amino acids, and structural information for each amino acid. In a real sample, many peptides are identified from this hypothetical protein. ypssc first finds all the peptides that belong to the protein and arranges them by amino acid number; it then removes the parts of the protein that have been identified more than once in multiple peptides and collapses the population of identified peptides into one sequence that represents the coverage of the protein. The result shows which parts of the protein are identified and which parts are missing. Finally, ypssc matches the sequence identified in the sample against the SSDYP to find the structural information for its amino acids.
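
The idea of collapsing overlapping peptides into a single coverage and then looking up the structure of the covered residues can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the package's actual implementation: the protein sequence, the peptides, the one-letter structure labels ("H", "E", "C") and all variable names are invented for illustration, and each peptide is matched only at its first occurrence in the protein.

```r
# Toy stand-in for one SSDYP entry: one structure label per residue.
# The labels "H" (helix), "E" (sheet) and "C" (chain) are assumptions.
protein_seq <- "MKTAYIAKQRQISFVKSHFSRQ"
structure   <- strsplit("CCHHHHHHHHCCEEEEECCCCC", "")[[1]]

# Peptides identified in the sample; the first two overlap on purpose.
peptides <- c("TAYIAK", "YIAKQRQ", "SHFSR")

# Mark every residue covered by at least one peptide. Residues seen in
# several peptides are still counted once, because the flag is logical.
covered <- logical(nchar(protein_seq))
for (pep in peptides) {
  start <- as.integer(regexpr(pep, protein_seq, fixed = TRUE))
  if (start > 0) {
    covered[start:(start + nchar(pep) - 1)] <- TRUE
  }
}

# Counts comparable to those described for the output files.
n_identified       <- sum(covered)
n_identified_helix <- sum(covered & structure == "H")
n_total            <- nchar(protein_seq)
n_total_helix      <- sum(structure == "H")
```

Using a logical coverage vector is one simple way to discard repeated identifications; the package may organize this step differently.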


Author(s)

Maintainer: Shashank Kumbhare shashank.kumbhare@mavs.uta.edu [copyright holder]

Authors:

  • Sajad Tasharofi [copyright holder]

  • Bent Petersen

  • Morteza Khaledi

  • Amir Shahmoradi

See Also

findSecondary, findAlpha, findBeta, findChain


Alpha Helix Calculator

Description

From bottom-up proteomics data of proteins (peptides), this function determines the sections of proteins (in percentage) with alpha-helix structure.

Usage

findAlpha(pathFileInput = NULL, pathDirOutput = NULL, ...)

Arguments

pathFileInput

Path of the input csv file generated from MaxQuant.

MaxQuant is quantitative proteomics software designed to analyze large mass-spectrometric data sets. The input of MaxQuant is a raw file (.raw) from a high-resolution mass spectrometer. After analyzing the raw file, MaxQuant generates a folder named “combined”.

In this folder there is another folder named “txt”, which contains many files in text format (.txt). One of these files, called “peptides”, is the input that ypssc uses to calculate secondary structures. ypssc is designed so that it can analyze and extract information about the sample regardless of the name the user has chosen for the sample.

pathDirOutput

Path of the directory to which the output files will be generated.

...

(for developer use only)
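
The argument description above asks for a csv file, while MaxQuant writes its “peptides” table as tab-delimited text. Whether a conversion is required is not stated here; if it is, the following base-R sketch shows one way to do it. The folder is created in a temporary directory with a two-row toy table so that the code is runnable; the column names and values of the toy table are illustrative only, and the paths must be replaced with those of a real MaxQuant run.

```r
# Build a tiny stand-in for the "combined/txt" folder of a MaxQuant run.
txt_dir <- file.path(tempdir(), "combined", "txt")
dir.create(txt_dir, recursive = TRUE, showWarnings = FALSE)
toy <- data.frame(Sequence = c("TAYIAK", "SHFSR"),
                  Proteins = c("YAL001C", "YAL001C"))
write.table(toy, file.path(txt_dir, "peptides.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read the tab-delimited peptides table.
peptides_txt <- file.path(txt_dir, "peptides.txt")
peptides <- read.delim(peptides_txt, stringsAsFactors = FALSE)

# Save a comma-separated copy that can be passed as pathFileInput.
peptides_csv <- file.path(tempdir(), "peptides.csv")
write.csv(peptides, peptides_csv, row.names = FALSE)
```

The resulting path in `peptides_csv` plays the role of "some/path/to/inputFile.csv" in the examples of this help page.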

Value

The output of the program is a csv file (.csv) that contains 5 columns, and the number of rows depends on the number of proteins in the sample.

The first column contains the ID of the identified alpha-helix proteins in the sample, the second column contains the number of identified amino acids from the corresponding protein, the third column contains the number of identified amino acids with alpha-helix structure, the fourth column contains the number of amino acids that the protein has in the SSDYP, and the fifth column contains the number of amino acids with alpha-helix structure that the protein has in the SSDYP.

These columns should provide all the information that the user needs about the protein and its structure, as well as structural information about the parts of the protein that have been identified in the sample.
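
As a sketch of how the five columns might be turned into percentages, the example below builds a toy table and derives sequence coverage and alpha-helix content for both the sample and the database. The column names and the numbers are hypothetical; the actual headers in the csv file written by the function may differ and should be checked before adapting this.

```r
# Toy table mimicking the five columns described above.
# Column names are hypothetical, not the headers written by ypssc.
res <- data.frame(
  protein_id    = c("P1", "P2"),
  n_identified  = c(120, 50),   # identified amino acids
  n_ident_alpha = c(60, 10),    # identified amino acids in alpha-helix
  n_total       = c(400, 200),  # amino acids of the protein in the SSDYP
  n_total_alpha = c(160, 80)    # alpha-helix amino acids in the SSDYP
)

# Share of each protein that was identified in the sample.
res$coverage_pct <- 100 * res$n_identified / res$n_total

# Alpha-helix content of the identified part versus the whole protein.
res$alpha_sample_pct   <- 100 * res$n_ident_alpha / res$n_identified
res$alpha_database_pct <- 100 * res$n_total_alpha / res$n_total
```

Comparing the last two columns shows whether the identified part of a protein is richer or poorer in alpha-helix than the protein as a whole.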

In addition, the function generates 4 more '.csv' files:

  1. The number of proteins found in the sample.

  2. The number of peptides found in the sample.

  3. The number of amino acids for each protein in the database.

  4. The input file from MaxQuant, cleaned up for the sole purpose of calculating secondary structures.

See Also

findSecondary, findBeta, findChain

Examples

## Not run: 
findAlpha( pathFileInput = "some/path/to/inputFile.csv",
           pathDirOutput = "some/path/to/outputDir/" )

findAlpha()

## End(Not run)

Beta Sheet Calculator

Description

From bottom-up proteomics data of proteins (peptides), this function determines the sections of proteins (in percentage) with beta-sheet structure.

Usage

findBeta(pathFileInput = NULL, pathDirOutput = NULL, ...)

Arguments

pathFileInput

Path of the input csv file generated from MaxQuant.

MaxQuant is quantitative proteomics software designed to analyze large mass-spectrometric data sets. The input of MaxQuant is a raw file (.raw) from a high-resolution mass spectrometer. After analyzing the raw file, MaxQuant generates a folder named “combined”.

In this folder there is another folder named “txt”, which contains many files in text format (.txt). One of these files, called “peptides”, is the input that ypssc uses to calculate secondary structures. ypssc is designed so that it can analyze and extract information about the sample regardless of the name the user has chosen for the sample.

pathDirOutput

Path of the directory to which the output files will be generated.

...

(for developer use only)

Value

The output of the program is a csv file (.csv) that contains 5 columns, and the number of rows depends on the number of proteins in the sample.

The first column contains the ID of the identified beta-sheet proteins in the sample, the second column contains the number of identified amino acids from the corresponding protein, the third column contains the number of identified amino acids with beta-sheet structure, the fourth column contains the number of amino acids that the protein has in the SSDYP, and the fifth column contains the number of amino acids with beta-sheet structure that the protein has in the SSDYP.

These columns should provide all the information that the user needs about the protein and its structure, as well as structural information about the parts of the protein that have been identified in the sample.

In addition, the function generates 4 more '.csv' files:

  1. The number of proteins found in the sample.

  2. The number of peptides found in the sample.

  3. The number of amino acids for each protein in the database.

  4. The input file from MaxQuant, cleaned up for the sole purpose of calculating secondary structures.

See Also

findSecondary, findAlpha, findChain

Examples

## Not run: 
findBeta( pathFileInput = "some/path/to/inputFile.csv",
          pathDirOutput = "some/path/to/outputDir/" )

findBeta()

## End(Not run)

Chain Calculator

Description

From bottom-up proteomics data of proteins (peptides), this function determines the sections of proteins (in percentage) with primary structure (chain).

Usage

findChain(pathFileInput = NULL, pathDirOutput = NULL, ...)

Arguments

pathFileInput

Path of the input csv file generated from MaxQuant.

MaxQuant is quantitative proteomics software designed to analyze large mass-spectrometric data sets. The input of MaxQuant is a raw file (.raw) from a high-resolution mass spectrometer. After analyzing the raw file, MaxQuant generates a folder named “combined”.

In this folder there is another folder named “txt”, which contains many files in text format (.txt). One of these files, called “peptides”, is the input that ypssc uses to calculate secondary structures. ypssc is designed so that it can analyze and extract information about the sample regardless of the name the user has chosen for the sample.

pathDirOutput

Path of the directory to which the output files will be generated.

...

(for developer use only)

Value

The output of the program is a csv file (.csv) that contains 5 columns, and the number of rows depends on the number of proteins in the sample.

The first column contains the ID of the identified proteins with chain structure in the sample, the second column contains the number of identified amino acids from the corresponding protein, the third column contains the number of identified amino acids in chain structure, the fourth column contains the number of amino acids that the protein has in the SSDYP, and the fifth column contains the number of amino acids in chain structure that the protein has in the SSDYP.

These columns should provide all the information that the user needs about the protein and its structure, as well as structural information about the parts of the protein that have been identified in the sample.

In addition, the function generates 4 more '.csv' files:

  1. The number of proteins found in the sample.

  2. The number of peptides found in the sample.

  3. The number of amino acids for each protein in the database.

  4. The input file from MaxQuant, cleaned up for the sole purpose of calculating secondary structures.

See Also

findSecondary, findAlpha, findBeta

Examples

## Not run: 
findChain( pathFileInput = "some/path/to/inputFile.csv",
           pathDirOutput = "some/path/to/outputDir/" )

findChain()

## End(Not run)

Secondary Structure Calculator

Description

From bottom-up proteomics data of proteins (peptides), this function determines the sections of proteins (in percentage) with secondary structure such as alpha-helix and beta-sheet; it also determines the parts that have primary structure.

Usage

findSecondary(pathFileInput = NULL, pathDirOutput = NULL, ...)

Arguments

pathFileInput

Path of the input csv file generated from MaxQuant.

MaxQuant is quantitative proteomics software designed to analyze large mass-spectrometric data sets. The input of MaxQuant is a raw file (.raw) from a high-resolution mass spectrometer. After analyzing the raw file, MaxQuant generates a folder named “combined”.

In this folder there is another folder named “txt”, which contains many files in text format (.txt). One of these files, called “peptides”, is the input that ypssc uses to calculate secondary structures. ypssc is designed so that it can analyze and extract information about the sample regardless of the name the user has chosen for the sample.

pathDirOutput

Path of the directory to which the output files will be generated.

...

(for developer use only)

Value

The output of the program is a csv file (.csv) that contains 5 columns, and the number of rows depends on the number of proteins in the sample.

The first column contains the ID of the identified proteins in the sample, the second column contains the number of identified amino acids from the corresponding protein, the third column contains the number of identified amino acids with secondary structure, the fourth column contains the number of amino acids that the protein has in the SSDYP, and the fifth column contains the number of amino acids with secondary structure that the protein has in the SSDYP.

These columns should provide all the information that the user needs about the protein and its structure, as well as structural information about the parts of the protein that have been identified in the sample.

In addition, the function generates 4 more '.csv' files:

  1. The number of proteins found in the sample.

  2. The number of peptides found in the sample.

  3. The number of amino acids for each protein in the database.

  4. The input file from MaxQuant, cleaned up for the sole purpose of calculating secondary structures.

See Also

findAlpha, findBeta, findChain

Examples

## Not run: 
findSecondary( pathFileInput = "some/path/to/inputFile.csv",
               pathDirOutput = "some/path/to/outputDir/" )

findSecondary()

## End(Not run)