Comet releases 2020.01
Documentation for parameters for release 2020.01 can be found here. Download release here.
release 2020.01 rev. 4 (2020.01.4), release date 2021/05/11
- Bug fix: for Percolator .pin output, correctly calculate the dM (mass difference) for all lower hit entries. The top hit’s dM value was being reported for all lower hits for each spectrum query. Issue and fix reported by D. Goldfarb.
- Bug fix: address potential ThreadPool sleep issue. Issue identified and fixed by D. Shteynberg.
release 2020.01 rev. 3 (2020.01.3), release date 2021/03/18
- Revert use of maps (ala HiXcorr) to spectral arrays for spectral preprocessing. The use of maps had a measurable performance hit under Windows that was especially noticeable for real-time search applications.
- For each “variable_mod0X” parameter, extend the 4th field to allow specifying a minimum (in addition to the maximum) number of modified residues in each peptide. Feature request by J. Mohr.
- In the text output, the xcorr rank column (2nd column) will now correctly handle ties in its rank reporting.
- Extend the maximum length of reported protein accession string from 99 to 511 characters. Feature request by L. Liu.
- Real-time search: Comet was inadvertently looping over all PSMs for a spectrum query to retrieve matched protein names. This was unnecessary and caused a performance hit; protein names are now retrieved only for the top scoring peptide(s).
- Bug fix: added a check to see if a directory path was specified as the sequence database. Under linux, if a directory were specified as the database, Comet would simply sit there and not report any error. Thanks to L. Mendoza for reporting this issue.
- Bug fix: report proper flanking residues for a peptide when it is identified in multiple proteins; use the flanking residues from the first protein in the database. Similarly, flanking residues for start methionine clipped sequences were addressed as those were not being handled correctly either.
- Bug fix: for real-time search using Comet’s internal decoy peptides, storing and reporting of duplicate peptides, e.g. a peptide that is present in both target and decoy forms, was not being handled correctly; this has been addressed.
- Bug fix: in the pep.xml output, the “index” attribute of the “spectrum_query” element was not being populated correctly. With each spectrum batch, the index value was being reset to 1. This has been corrected.
- Known bug: per this thread, the mass difference (dM) column of the Percolator .pin output is always reporting the mass difference of the top hit for all lower ranked hits of a spectrum query.
release 2020.01 rev. 2 (2020.01.2), release date 2021/01/05
- Bug fix: Fixed issue where spectra were not being searched. This was due to the poor attempt at a fix in release 2020.01.1 for spectra with all zero intensity peaks.
release 2020.01 rev. 1 (2020.01.1), release date 2020/12/17
- For TIMS-TOF mzML files, changed the scan number reporting to be the scan “index” value plus “1”.
- Bug fix: Fixed issue where spectra that have all peak intensities of zero would cause the program to crash. Issue reported by D. Shteynberg.
- Bug fix: Fixed issue where mzML scans without a precursor charge were not being searched. This issue was limited to mzML and not a problem with mzXML files. Issue reported by D. Shteynberg.
release 2020.01 rev. 0 (2020.01.0), release date 2020/11/09
- Known bug: scans where all peak intensities are zero will cause the program to crash.
- Known bug: scans in mzML files without a precursor charge will not be searched. This issue appears limited to mzML files themselves and is not a problem with mzXML inputs.
- Implemented mzIdentML output via the parameter entry “output_mzidentmlfile”. The mzIdentML format does not fully support the reporting of Comet results so this is considered preliminary support. Issues with the mzIdentML format for Comet: (a) It appears as if the mzIdentML format expects decoy entries to exist in the input FASTA file; at least that seems to be the expectation in the documentation that I’ve read. Comet’s mzIdentML output will report decoy protein references even though they do not exist in the underlying FASTA file. The reported decoy protein, like in the other output formats, is generated by appending the decoy prefix to the protein accession that the decoy peptide was generated from. I have no idea if the mzIdentML format allows for this or not. (b) Within the “FragmentTolerance” element, the requirement of fragment “search tolerance plus value” (MS:1001412) and “search tolerance minus value” (MS:1001413) make no sense in the context of spectral correlation matching used in Comet, SEQUEST and other tools that perform the cross-correlation score. A fragment bin value and bin offset are needed to encapsulate the corresponding fragment settings in Comet. Currently 1/2 of the “fragment_bin_offset” value is reported for the search tolerance plus/minus values but this is a sad hack that should not be required.
- The preliminary score has been modified in a few ways. First, I extended the number of preliminary score ion “bins” considered from 200 to 1000. This allows matching many more fragment ions in the preliminary score algorithm to get a more representative matched ion count reported in the output files. Before this change, the matched fragment numbers are likely under-reported for many spectral matches of longer peptides. I have also gotten rid of the peak picking, smoothing, and “stair-stepping” that used to be applied to the preliminary score spectrum. All of these things contribute to a change in the calculated preliminary scores. This score is still reported because some post-processing tools make use of it but if it were up to me, I would get rid of it entirely.
- Added the parameter entry “use_Z1_ions” to consider “Z• + 1” ions (typically for ETD/ECD searches). Feature requested by A. Grimaud.
- Add support for Comet’s internal decoy peptides in the indexed database search (intended for real-time search application). As decoy peptides don’t need to be explicitly present in the indexed database, this reduces the index database size and indexing time by nearly half compared to indexing a FASTA file composed of target+decoy sequences.
- Add support for reporting multiple matched proteins in an indexed database search. Previously only one protein name was returned for each peptide identification. Now up to 20 matched/duplicate proteins are stored and reported.
- Corrected/changed residue ‘O’ from Ornithine to Pyrrolysine. This entails retiring the “add_O_ornithine” parameter entry and adding “add_O_pyrrolysine”. Thanks to P. Charles for reporting the correction.
- Added support for changing the text file output file extension from its default “txt” extension to a custom file extension via a new parameter entry “text_file_extension”. Text file outputs are generated when the “output_txtfile” parameter is set to “1”. This custom/hidden parameter will need to be manually added to your params file for use (i.e. it is not present in the example params file available for download nor is it written in the comet.params.new file generated by the command “comet -p”). Feature requested by PatternLab for Proteomics.
- Added a parameter entry “explicit_deltacn” which controls how the deltaCn output score is calculated. By default, Comet will very crudely analyze sequence similarity when calculating the deltaCn score. This results in the deltaCn being calculated between the top hit and the first dissimilar peptide in order to avoid very small deltaCn values when the top N peptides are all the same (such as different modified forms of the same peptide). When “explicit_deltacn” is set to “1”, the sequence similarity analysis is not used and the deltaCn is calculated as the difference between the top scoring peptide and the second best scoring peptide. This custom parameter will need to be manually added to your params file for use. Feature requested by PatternLab for Proteomics.
- Added “sp_rank” and “retention_time_sec” columns to the text output. Feature requested by PatternLab for Proteomics.
- Added preliminary support for searching TIMS-TOF mzML files. MSToolkit updates now allow TIMS-TOF mzML files to be searched. Previous parsing of the mzML file would return no spectra to search. I should note that I’m not sure what the returned scan numbers represent. Also, a function to return the file’s last scan number was not updated for these TIMS-TOF files; this causes Comet’s search progress percentage reporting to return nonsensical values. Thanks to D. Shteynberg and M. Hoopmann for the MSToolkit mods to support this.
- Extend the “activation_method”
- Removed reporting of “deltacnstar” in the pepXML output. It appears that the score has been reported as “0.0” for every result and I’ve never understood what it represented so I’m taking this opportunity to get rid of it now. NOTE: PeptideProphet apparently expects to parse this score so stick to Comet 2019.01.5 if you need PeptideProphet compatibility.
- Changed the “No_enzyme” text to “Cut_everywhere” in the enzyme definition of the exported comet.params file. Hopefully this clarifies the purpose of that enzyme definition of cleaving everywhere (as opposed to no digestion which “No_enzyme” could convey). The enzyme text strings in the params file have no function in Comet and can be named anything.
- In the spectral processing, large spectral arrays were replaced by peak vectors as in the HiXcorr implementation of Comet. This makes the spectral processing a bit more efficient, especially for small fragment_bin_tol settings.
- Bug fix: When a protein N-terminal static modification is specified, it was not being applied in the preliminary score routine for N-terminal peptides resulting from a clipped methionine. Thanks to Thermo’s BioPharma Finder group for reporting the bug.
- Bug fix: In the text file output, the position of static C-terminal modifications in the “modifications” column was always incorrectly reported as “1” in the encoded modfication string; this has been corrected. Thanks to Thermo’s BioPharma Finder group for reporting the bug.
- Bug fix: When running multiple searches via interfacing with CometWrapper.dll under Windows, the output file of subsequent searches was not updating. Thus searches were simply overwriting the first output file. This has been addressed with a simple variable initialization. Thanks to Thermo’s BioPharma Finder group for reporting the bug.
- Bug fix: Corrected a bug where every matched/duplicate protein was not always reported due to a rounding/precision issue with storing of the xcorr. This was noticed for an edge case (bad spectrum, small database) that should not affect users in practice.
- I’ve sadly had to push off Comet-PTM integration to the next release again.