Specify the proteins to query.
This field is mandatory. The accepted formats are [*]:
Provide a list of database accessions, one per line.
GUI: See examples E1 and E5. Example E3 also uses this input format, but then goes on to focus the peptide search.
Provide FASTA content with protein accessions and sequences.
GUI: See example E2.
Sometimes we want to evaluate specific peptides without being concerned about the other peptides in the protein. In this case, a slash (/) can be used to mark where the protein is split. Short segments (3 residues or fewer) should also be included so that the flanking sequences of each peptide are identified correctly.
GUI: See example E4.
Protein accessions and sequences can also be provided in tabular format, e.g. CSV or TSV.
Proteins can also be provided in a JSON format internal to the software. You won't generally type it manually, but you might see it if you export a model and later restore it.
[*] This list shows the formats supported by default. GUI: The exact protein formats supported depend on the providers enabled in the advanced section below.
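To illustrate the FASTA input described above, here is a minimal Python sketch that reads FASTA text into (accession, sequence) pairs. It is not Alacat's own provider. The function name parse_fasta is hypothetical, and the sketch assumes that the accession is the first whitespace-delimited token on each ">" header line.

```python
from __future__ import annotations


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (accession, sequence) pairs.

    Assumes the accession is the first whitespace-delimited token of each
    '>' header line; wrapped sequence lines are concatenated.
    """
    records: list[tuple[str, str]] = []
    accession: str | None = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if accession is not None:
                records.append((accession, "".join(chunks)))
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if accession is not None:
        records.append((accession, "".join(chunks)))
    return records


example = """>PROT1 hypothetical protein
MKWVTFISLL
LLFSSAYS
>PROT2
PEPTIDEK
"""
print(parse_fasta(example))  # [('PROT1', 'MKWVTFISLLLLFSSAYS'), ('PROT2', 'PEPTIDEK')]
```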
Specify the peptides to use. Peptides in the following formats are generally accepted [*].
Peptide list:
    (PEPTIDE 1)
    (PEPTIDE 2)
    ...
FASTA-like peptide list per protein:
    >(PROTEIN 1)
    (PEPTIDE 1)
    (PEPTIDE 2)
    ...
Network edge-list:
    (PROTEIN 1):(PEPTIDE 1)
    (PROTEIN 1):(PEPTIDE 2)
    ...
Alacat-UIDs (obtained from a previous model) are also accepted.
If you leave this field blank, the system will suggest the peptides for you.
[*] This list shows the formats supported by default. GUI: The exact peptide formats supported depend on the providers enabled in the advanced section below.
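The three peptide layouts listed above can be read into a protein-to-peptides mapping with a short Python sketch. The function name parse_peptides is hypothetical and this is not Alacat's own parser. It assumes one item per line, groups bare peptides (no protein given) under the key None, and does not handle Alacat-UIDs.

```python
from __future__ import annotations


def parse_peptides(text: str) -> dict[str | None, list[str]]:
    """Group peptides by protein, assuming one item per line.

    Bare peptides go under None; '>PROTEIN' assigns the peptides that follow
    it to PROTEIN; 'PROTEIN:PEPTIDE' lines are network edges.
    """
    groups: dict[str | None, list[str]] = {}
    current: str | None = None  # protein named by the latest '>' header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
        elif ":" in line:
            protein, peptide = line.split(":", 1)
            groups.setdefault(protein.strip(), []).append(peptide.strip())
        else:
            groups.setdefault(current, []).append(line)
    return groups


print(parse_peptides("PEPTIDEK\nSAMPLER"))               # {None: ['PEPTIDEK', 'SAMPLER']}
print(parse_peptides(">PROT1\nPEPTIDEK\nSAMPLER"))       # {'PROT1': ['PEPTIDEK', 'SAMPLER']}
print(parse_peptides("PROT1:PEPTIDEK\nPROT2:SAMPLER"))   # {'PROT1': ['PEPTIDEK'], 'PROT2': ['SAMPLER']}
```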
Specify the qbrick sets (qblocks) to use. Qblocks are generally provided as an Alacat-UID (obtained from a previous model). The exact formats supported depend on the handlers selected. If you leave this field blank, the system will suggest the qblocks for you.
Specify the lists of qconcats (qmenus) to favour. Qmenus are generally provided as an Alacat-UID (obtained from a previous model). The exact formats supported depend on the handlers selected. If you leave this field blank, the system will suggest the qmenus for you.
Model title, for your reference only. Can be left blank.
GUI: If the model already exists on the system, this field will be ignored and the existing title will be used.
Minimum number of qbricks per protein.
Specify the digestion you are using. This will almost always be set to TRYPSIN.
Description of options:
Digestions that the pipeline is capable of simulating.
Warning
Most of the digestion methods are included to support OpenMS's digester, a fast, regular-expression-based digester. However, digestion and analysis are two different stages of the pipeline.
While we can easily simulate non-tryptic digests, the ability of many tools and databases to actually provide insight into such peptides is questionable. Proceed with caution when using these.
The regular expressions and descriptions are taken from OpenMS.
Preprogrammed: Indicates a digestion method other than those listed in this enumeration. Typically this means the user has specified the peptides verbatim, or provided a custom Handler capable of simulating a unique digestion. Only for advanced API use; not useful from the GUI.
Inherit model: Inherits the digestion of the parent model. This is only valid for Handlers and does not make sense to use on the model itself. Only for advanced API use; not useful from the GUI.
Custom digestion, with known flanks: Indicates that the input protein sequences are the peptides.
- Separate multiple peptides assigned to a single protein by a slash ('/') in the sequence.
- Segments of 0 to 3 amino acids are not included as peptides, but should still be given to specify the flanks.
- You do not need to include all peptides for the protein(s) - only those you wish to be analysed.
Important
Flanking sequences must be specified! If you do not know the flanks, use the "Custom digestion, with unknown flanks" option instead.
Example: Given peptides A, B and C, where A and B are adjacent and C is at the end of the protein, the sequence may read:
    xxx/AAAAA/BBBBB/yyy/zzz/CCCCC
where x, y and z are the flanks for A's N-terminus, B's C-terminus and C's N-terminus respectively. Note that all flanking sequences could (redundantly) be specified if this is easier.
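The known-flanks rule above can be sketched in Python. The sketch splits a slash-delimited sequence, treats segments of 3 residues or fewer as flanks only, and reads up to 3 residues on either side of each peptide. The names split_known_flanks and Peptide and the 3-residue flank width are assumptions for illustration, not Alacat's API. Because the flank window only reaches back 3 residues, consecutive flank segments such as "yyy/zzz" resolve to the nearest one. A flank segment shorter than 3 residues would let the window reach into the previous segment.

```python
from __future__ import annotations

from typing import NamedTuple

FLANK = 3  # assumed flank width, matching the three-residue segments in the example


class Peptide(NamedTuple):
    sequence: str
    n_flank: str  # up to FLANK residues before the peptide
    c_flank: str  # up to FLANK residues after the peptide


def split_known_flanks(sequence: str) -> list[Peptide]:
    """Split a slash-delimited sequence into peptides and their flanks.

    Segments of FLANK residues or fewer only supply flanking residues.
    """
    segments = sequence.split("/")
    joined = "".join(segments)
    peptides: list[Peptide] = []
    offset = 0  # start of the current segment within `joined`
    for segment in segments:
        end = offset + len(segment)
        if len(segment) > FLANK:
            peptides.append(Peptide(
                sequence=segment,
                n_flank=joined[max(0, offset - FLANK):offset],
                c_flank=joined[end:end + FLANK],
            ))
        offset = end
    return peptides


for p in split_known_flanks("xxx/AAAAA/BBBBB/yyy/zzz/CCCCC"):
    print(p)
# Peptide(sequence='AAAAA', n_flank='xxx', c_flank='BBB')
# Peptide(sequence='BBBBB', n_flank='AAA', c_flank='yyy')
# Peptide(sequence='CCCCC', n_flank='zzz', c_flank='')
```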
Custom digestion, with unknown flanks: This is identical to "Custom digestion, with known flanks", above, but indicates that the peptide linkers are unknown.
Dummy "AAA" linkers will be assumed.
Example: Given peptides A, B and C, where their positions and flanking sequences are unknown, the sequence may read:
    AAAAA/BBBBB/CCCCC
TrypChymo: cuts after F, Y, W, L (or J), K or R if not followed by P. RGX: '(?<=[FYWLJKRX])(?!P)'
Asp-N_ambic: Asp-N in ammonium bicarbonate cleaves before D (or B) or E (or Z). RGX: '(?=[DBEZX])'
Lys-C/P: cuts after K. RGX: '(?<=[KX])'
V8_DE: V8-DE cuts after D (or B) or E (or Z) if not followed by P. RGX: '(?<=[DBEZX])(?!P)'
V8_E: V8-E cuts after E (or Z) if not followed by P. RGX: '(?<=[EZX])(?!P)'
Formic acid: cuts both before and after D (or B). RGX: '((?<=[DBX]))|((?=[DBX]))'
Chymotrypsin/P: cleaves following an F, Y, W or L (or J) residue. RGX: '(?<=[FYWLJX])'
Lys-C (AKA: lys_c): cuts after K if not followed by P. RGX: '(?<=[KX])(?!P)'
PepsinA: cuts after F or L (or J). RGX: '(?<=[FLJX])'
Trypsin/P: cuts after K or R. RGX: '(?<=[KRX])'
Arg-C (AKA: arg_c; Clostripain; argc): cleaves following an R residue unless the next residue is P. RGX: '(?<=[RX])(?!P)'
Trypsin: cleaves following a K or R residue unless the next residue is P. RGX: '(?<=[KRX])(?!P)'
Chymotrypsin: cleaves following an F, Y, W or L (or J) residue unless the next residue is P. RGX: '(?<=[FYWLJX])(?!P)'
Asp-N (AKA: asp_n): cleaves before D (or B). RGX: '(?=[DBX])'
CNBr: cleaves following M. RGX: '(?<=[MX])'
Arg-C/P: cleaves after R residues. RGX: '(?<=[RX])'
Asp-N/B: cleaves before D (B is ignored). RGX: '(?=[DX])'
Lys-N (AKA: lys_n): cuts before K. RGX: '(?=[KX])'
leukocyte elastase: cuts after A, L, I (or J) or V if not followed by P. RGX: '(?<=[ALIJVX])(?!P)'
cyanogen-bromide: cuts after M. RGX: '(?<=[MX])'
iodosobenzoate: cuts after W. RGX: '(?<=W)'
staphylococcal protease/D (AKA: Glu-C/D): cuts after E (or Z). RGX: '(?<=[EZX])'
PepsinA + P: cuts after F or L (or J) unless followed by P. RGX: '(?<=[FLJX])(?!P)'
proline endopeptidase: cuts after HP, KP or RP if not followed by P. RGX: '(?<=[HKRX][PX])(?!P)'
Clostripain/P: cuts after R. RGX: '(?<=[RX])'
elastase-trypsin-chymotrypsin: cuts after A, L, I (or J), V, K, R, W, F or Y unless followed by P. RGX: '(?<=[ALIVKRWFYX])(?!P)'
Alpha-lytic protease (aLP): cuts after T, A, S or V. RGX: '(?<=[TASVX])'
2-iodobenzoate: cuts after W. RGX: '(?<=[WX])'
proline-endopeptidase/HKR: cuts after P. RGX: '(?<=[PX])'
glutamyl endopeptidase (AKA: Glu-C; glu_c; staphylococcal protease): cuts after D (or B) or E (or Z). RGX: '(?<=[DBEZX])'
Glu-C+P (AKA: staphylococcal protease+P): cuts after D (or B) or E (or Z) unless followed by P. RGX: '(?<=[DBEZX])(?!P)'
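The RGX patterns above are zero-width regular expressions, so a sequence can be digested by splitting it at every match. The following Python sketch does this with the standard re module. The pipeline itself uses OpenMS's digester, so this is only a plain stand-in and not OpenMS's API. The names digest and RULES are hypothetical. Only fully cleaved peptides are produced: the sketch applies no missed cleavages and no length filter. Note that the patterns also cut at the unknown-residue code X.

```python
import re

# A few of the cleavage rules listed above, copied from their RGX entries.
RULES = {
    "Trypsin": r"(?<=[KRX])(?!P)",
    "Trypsin/P": r"(?<=[KRX])",
    "Asp-N": r"(?=[DBX])",
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Cut `sequence` at every zero-width match of the enzyme's pattern.

    Fully cleaved peptides only: no missed cleavages, no length filter.
    """
    # re.split splits on empty matches from Python 3.7 onward; a match at
    # either end of the sequence yields an empty string, which is dropped.
    return [fragment for fragment in re.split(RULES[enzyme], sequence) if fragment]


print(digest("MKWVTKPRAEFR", "Trypsin"))    # ['MK', 'WVTKPR', 'AEFR']
print(digest("MKWVTKPRAEFR", "Trypsin/P"))  # ['MK', 'WVTK', 'PR', 'AEFR']
print(digest("AKDLEDR", "Asp-N"))           # ['AK', 'DLE', 'DR']
```

The Trypsin and Trypsin/P results differ only at "KP", where the "(?!P)" lookahead blocks the cut.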
Specify one or more organisms to run your query against, by their NCBI taxonomy ID or their scientific name. If you leave this field blank, it will be completed automatically based on the protein sequences.
GUI: Use a comma to delimit multiple organisms.
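A comma-delimited organism field can be split as in the following Python sketch. All-digit entries are treated as NCBI taxonomy IDs (for example, 9606 is Homo sapiens), and anything else is kept as a scientific name. The function name parse_organisms is hypothetical. The sketch does not resolve names to IDs or check that they exist.

```python
from __future__ import annotations


def parse_organisms(field: str) -> list[int | str]:
    """Split a comma-delimited organism field.

    All-digit entries are read as NCBI taxonomy IDs; anything else is kept
    as a scientific name.
    """
    organisms: list[int | str] = []
    for item in field.split(","):
        item = item.strip()
        if item:  # ignore empty entries, e.g. from a trailing comma
            organisms.append(int(item) if item.isdecimal() else item)
    return organisms


print(parse_organisms("9606, Mus musculus"))  # [9606, 'Mus musculus']
```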
In mandated mode, all of the peptides you specify will be used. This may result in multiple qbricks per protein. In non-mandated mode, if you specify more peptides than necessary, those that the system considers least quantotypic will be dropped. If you specify fewer peptides than a qbrick holds, then, regardless of this setting, the system will always supplement your selection with the most quantotypic peptides from the remaining pool.
The same logic is applied to qblock and qmenu selection.
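The selection rules above can be sketched in Python for a single protein. Everything here is a hypothetical stand-in: the function select_peptides, the score dictionary (representing quantotypicity, where higher is better) and the per_qbrick size. The text does not say how a partly filled final qbrick is handled in mandated mode, so the sketch leaves it short.

```python
from __future__ import annotations


def select_peptides(
    requested: list[str],
    pool: list[str],
    score: dict[str, float],
    per_qbrick: int,
    mandated: bool,
) -> list[list[str]]:
    """Choose peptides for one protein and group them into qbricks.

    `score` is a stand-in quantotypicity score (higher is better).
    """
    def best_first(peptides: list[str]) -> list[str]:
        return sorted(peptides, key=lambda p: score.get(p, 0.0), reverse=True)

    if mandated:
        chosen = list(requested)  # keep every requested peptide
    else:
        chosen = best_first(requested)[:per_qbrick]  # drop the least quantotypic
    if len(chosen) < per_qbrick:
        # In either mode, top up a short selection from the remaining pool.
        spares = [p for p in best_first(pool) if p not in chosen]
        chosen += spares[:per_qbrick - len(chosen)]
    # Mandated selections larger than one qbrick are split across several.
    return [chosen[i:i + per_qbrick] for i in range(0, len(chosen), per_qbrick)]


score = {"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.5, "E": 0.8}
print(select_peptides(["A", "B", "C"], ["D", "E"], score, 2, mandated=True))   # [['A', 'B'], ['C']]
print(select_peptides(["A", "B", "C"], ["D", "E"], score, 2, mandated=False))  # [['A', 'C']]
print(select_peptides(["B"], ["D", "E"], score, 2, mandated=False))            # [['B', 'E']]
```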
SQL backing mode. Turning backing on means you might not get the latest version of the scores, but operation will be considerably faster. Old scores can be purged from the database manually using their date column; see the alacat.utilities.sql_backing module for details.
Description of options:
No database: The database is not used.
Read-only: When set, scores are restored from the database. Use this to use, but not change, the database.
Write-only: New scores are stored in the database. Use this to update scores with new ones.
Read-write: Use this to retrieve scores from the database where possible, and to remember newly acquired scores in the database.
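The four backing modes above behave like combinations of a read flag and a write flag. The following Python sketch models them with enum.Flag and uses a plain dict in place of the SQL database. The names Backing and get_score are hypothetical. This is not the interface of the alacat.utilities.sql_backing module.

```python
from __future__ import annotations

from enum import Flag
from typing import Callable


class Backing(Flag):
    """The four backing modes modelled as combinations of read and write."""
    NONE = 0        # No database
    READ = 1        # Read-only
    WRITE = 2       # Write-only
    READ_WRITE = 3  # Read-write


def get_score(key: str, compute: Callable[[str], float],
              store: dict[str, float], mode: Backing) -> float:
    """Return a score for `key`, consulting `store` according to `mode`."""
    if Backing.READ in mode and key in store:
        return store[key]  # reuse a stored, possibly outdated, score
    value = compute(key)  # the slow path: recompute the score
    if Backing.WRITE in mode:
        store[key] = value  # remember it for later runs
    return value


store = {}
print(get_score("PEPTIDEK", lambda k: float(len(k)), store, Backing.READ_WRITE))  # 8.0
print(store)  # {'PEPTIDEK': 8.0}
```

The speed-up described above comes from the early return: in Read-only and Read-write modes, a stored score skips the recomputation entirely.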