https://github.com/jedick/canprot
Tip revision: f238d0f107d9a21c15e73e3f8a81062211424168 authored by jedick on 14 June 2017, 19:52:51 UTC
Sumbit version 0.1.0 to CRAN
colorectal.Rmd
---
title: "Colorectal Cancer"
output: html_vignette
vignette: >
  %\VignetteIndexEntry{Data: Colorectal Cancer}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
bibliography: data_sources.bib
csl: peerj.csl
---

This [<span style="color:red;text-decoration:underline">vignette</span>](http://chnosz.net/canprot/doc/index.html) from the R package [canprot](http://github.com/jedick/canprot) reproduces the calculations of compositional oxidation state, hydration state, and thermodynamic potential described in two papers published in _PeerJ_ ([Dick, 2016](https://doi.org/10.7717/peerj.2238) and [Dick, 2017](https://doi.org/10.7717/peerj.3421)).

The sections below give the [Abbreviations](#abbreviations) used here,
a [Summary Table](#summary-table) comparing the chemical compositions of groups of human proteins that are relatively down- and up-expressed (`n1` and `n2`, respectively) in colorectal cancer (CRC),
the [Data Sources](#data-sources) for protein expression in CRC,
a plot of the [Mean Differences](#mean-differences) of average oxidation state of carbon (`ZC`) and water demand per residue (`nH2O`),
[Potential Diagrams](#potential-diagrams) showing the weighted rank difference of chemical affinity between down- and up-expressed groups,
and [References](#references).

## Abbreviations
T (tumor), N (normal), C (carcinoma or adenocarcinoma), A (adenoma), CM (conditioned media), AD (adenomatous colon polyps), CIS (carcinoma in situ), ICC (invasive colonic carcinoma), NC (non-neoplastic colonic mucosa).

## Summary Table

```{r options, echo=FALSE}
options(width = 90)
```

```{r canprot, message=FALSE}
library(canprot)
data(canprot)
```

Identify the datasets for protein expression.

```{r datasets}
datasets <- pdat_CRC()
```

Get the amino acid compositions of the up- and down-expressed proteins (`get_pdat`) and make comparisons of indicators of oxidation and hydration state (`ZC_nH2O`).

```{r comptab, message=FALSE}
comptab <- lapply_canprot(datasets, function(dataset) {
  pdat <- get_pdat(dataset)
  ZC_nH2O(pdat, plot.it=FALSE)
})
```
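
As a reminder of what `ZC` measures, the average oxidation state of carbon can be computed from elemental counts alone. The sketch below uses the standard formula for a species containing C, H, N, O and S with net charge Z; the helper name `zc_from_counts` and its argument names are hypothetical and are not part of canprot, which works from amino acid compositions instead.

```r
# Hypothetical helper (not part of canprot): average oxidation state of carbon
# for a species with nC carbon, nH hydrogen, nN nitrogen, nO oxygen and
# nS sulfur atoms and net charge Z
zc_from_counts <- function(nC, nH, nN = 0, nO = 0, nS = 0, Z = 0) {
  (-nH + 3 * nN + 2 * nO + 2 * nS + Z) / nC
}

zc_methane <- zc_from_counts(nC = 1, nH = 4)                   # CH4, most reduced carbon
zc_co2     <- zc_from_counts(nC = 1, nH = 0, nO = 2)           # CO2, most oxidized carbon
zc_alanine <- zc_from_counts(nC = 3, nH = 7, nN = 1, nO = 2)   # C3H7NO2
print(c(methane = zc_methane, CO2 = zc_co2, alanine = zc_alanine))
```

Methane and carbon dioxide give the extreme values of -4 and +4, and alanine gives 0; values for proteins fall in a much narrower range between these limits.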

Generate the HTML table with `xsummary`, which adds bold and underline formatting to the output of `xtable`.
The columns show the difference in means (`DM`), common language effect size (`ES`), and _p_-value (`p`) for comparisons between groups of the average oxidation state of carbon (`ZC`) and water demand per residue (`nH2O`).

```{r xsummary, results="asis"}
library(xtable)
xsummary(comptab)
```
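
The three statistics in the table can be illustrated with base R on a pair of made-up samples. This is only a sketch under stated assumptions: the values are invented, the effect size is computed here as the percentage of all pairs in which the up-expressed value is larger (one common definition of the common language effect size), and the Wilcoxon rank-sum test is assumed for the _p_-value. The package may compute these quantities differently.

```r
# Made-up ZC values for two groups of proteins (not taken from any dataset)
zc_down <- c(-0.20, -0.15, -0.18, -0.10, -0.22)
zc_up   <- c(-0.12, -0.08, -0.16, -0.05, -0.11)

# Difference in means (up-expressed minus down-expressed)
dm <- mean(zc_up) - mean(zc_down)

# Common language effect size, assumed here to be the percentage of all
# (up, down) pairs in which the up-expressed value is larger; ties count as half
diffs <- outer(zc_up, zc_down, "-")
es <- 100 * (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)

# p-value from a Wilcoxon rank-sum test (an assumption of this sketch)
p <- wilcox.test(zc_up, zc_down)$p.value

print(c(DM = dm, ES = es, p = p))
```

An `ES` of 50 would mean that neither group tends to have higher values than the other.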

## Data Sources

<b>a</b>. @WTK+08 used 2-nitrobenzenesulfenyl labeling and MS/MS analysis to identify 128 proteins with differential expression in paired CRC and normal tissue specimens from 12 patients.
The list of proteins used in this study was generated by combining the lists of up- and down-regulated proteins from Table 1 and Supplementary Data 1 of @WTK+08 with the Swiss-Prot and UniProt accession numbers from their Supplementary Data 2.
<b>b</b>. <b>c</b>. <b>d</b>. @AKP+10 used nano-LC-MS/MS to characterize proteins from the nuclear matrix fraction in samples from 2 patients each with adenoma (ADE), chromosomal instability CRC (CIN+) and microsatellite instability CRC (MIN+).
Cluster analysis was used to classify proteins with differential expression between ADE and CIN+, MIN+, or in both subtypes of carcinoma (CRC).
Here, gene names from Supplementary Tables 5--7 of @AKP+10 were converted to UniProt IDs using the UniProt mapping tool.
<b>e</b>. @JKMF10 compiled a list of candidate serum biomarkers from a meta-analysis of the literature.
In the meta-analysis, 99 up- or down-expressed proteins were identified in at least 2 studies.
The list of UniProt IDs used in this study was taken from Table 4 of @JKMF10.
<b>f</b>. <b>g</b>. @XZC+10 used a gel-enhanced LC-MS method to analyze proteins in pooled tissue samples from 13 stage I and 24 stage II CRC patients and pooled normal colonic tissues from the same patients.
Here, IPI accession numbers from Supplemental Table 4 of @XZC+10 were converted to UniProt IDs using the DAVID conversion tool.
<b>h</b>. @ZYS+10 used acetylation stable isotope labeling and LTQ-FT MS to analyze proteins in pooled microdissected epithelial samples of tumor and normal mucosa from 20 patients, finding 67 and 70 proteins with increased or decreased expression (ratios ≥2 or ≤0.5).
Here, IPI accession numbers from Supplemental Table 4 of @ZYS+10 were converted to UniProt IDs using the DAVID conversion tool.
<b>i</b>. <b>j</b>. <b>k</b>. <b>l</b>. <b>m</b>. @BPV+11 analyzed microdissected cancer and normal tissues from 28 patients (4 adenoma samples and 24 CRC samples at different stages) using iTRAQ labeling and MALDI-TOF/TOF MS to identify 555 proteins with differential expression between adenoma and stage I, II, III, and IV CRC.
Here, gene names from supplemental Table 9 of @BPV+11 were converted to UniProt IDs using the UniProt mapping tool.
<b>n</b>. @JCF+11 analyzed paired samples from 16 patients using iTRAQ-MS to identify 118 proteins with >1.3-fold differential expression between CRC tumors and adjacent normal mucosa.
The protein list used in this study was taken from Supplementary Table 2 of @JCF+11.
<b>o</b>. <b>p</b>. <b>q</b>. @MRK+11 used iTRAQ labeling with LC-MS/MS to identify a total of 1061 proteins with differential expression (fold change ≥1.5 and false discovery rate ≤0.01) between pooled samples of 4 normal colon (NC), 12 tubular or tubulo-villous adenoma (AD) and 5 adenocarcinoma (AC) tissues.
The list of proteins used in this study was taken from Table S8 of @MRK+11.
<b>r</b>. @KKL+12 used difference in-gel electrophoresis (DIGE) and cleavable isotope-coded affinity tag (cICAT) labeling followed by mass spectrometry to identify 175 proteins with more than 2-fold abundance ratios between microdissected and pooled tumor tissues from stage-IV CRC patients with good outcomes (survived more than five years; 3 patients) and poor outcomes (died within 25 months; 3 patients).
The protein list used in this study was made by filtering the cICAT data from Supplementary Table 5 of @KKL+12 with an abundance ratio cutoff of >2 or <0.5, giving 147 proteins. IPI accession numbers were converted to UniProt IDs using the DAVID conversion tool.
<b>s</b>. @KYK+12 used mTRAQ and cICAT analysis of pooled microsatellite stable (MSS-type) CRC tissues and pooled matched normal tissues from 3 patients to identify 1009 and 478 proteins in cancer tissue with expression increased or decreased, respectively, by more than 2-fold.
Here, the list of proteins from Supplementary Table 4 of @KYK+12 was filtered to include proteins with expression ratio >2 or <0.5 in both mTRAQ and cICAT analyses, leaving 175 up-expressed and 248 down-expressed proteins in CRC.
Gene names were converted to UniProt IDs using the UniProt mapping tool.
<b>t</b>. @WOD+12 used LC-MS/MS to analyze proteins in microdissected samples of formalin-fixed paraffin-embedded (FFPE) tissue from 8 patients; at _P_ < 0.01, 762 proteins had differential expression between normal mucosa and primary tumors.
The list of proteins used in this study was taken from Supplementary Table 4 of @WOD+12.
<b>u</b>. @YLZ+12 analyzed the conditioned media of paired stage I or IIA CRC and normal tissues from 9 patients using lectin affinity capture for glycoprotein (secreted protein) enrichment followed by nano LC-MS/MS, identifying 68 up-regulated and 55 down-regulated proteins.
IPI accession numbers listed in Supplementary Table 2 of @YLZ+12 were converted to UniProt IDs using the DAVID conversion tool.
<b>v</b>. @MCZ+13 used laser capture microdissection (LCM) to separate stromal cells from 8 colon adenocarcinoma and 8 non-neoplastic tissue samples, which were pooled and analyzed by iTRAQ to identify 70 differentially expressed proteins.
Here, gi numbers listed in Table 1 of @MCZ+13 were converted to UniProt IDs using the UniProt mapping tool; FASTA sequences of 31 proteins not found in UniProt were downloaded from NCBI and amino acid compositions were added to [human_extra.csv](../html/human.html).
<b>w</b>. @KWA+14 used differential biochemical extraction to isolate the chromatin-binding fraction in frozen samples of colon adenomas (3 patients) and carcinomas (5 patients), and LC-MS/MS was used for protein identification and label-free quantification.
The results were combined with a database search to generate a list of 106 proteins with nuclear annotations and at least a three-fold expression difference.
Here, gene names from Table 2 of @KWA+14 were converted to UniProt IDs.
<b>x</b>. @UNS+14 analyzed 30 samples of colorectal adenomas and paired normal mucosa using iTRAQ labeling, OFFGEL electrophoresis and LC-MS/MS.
A total of 111 proteins with expression fold changes (log~2~) of at least ±0.5 and statistical significance of _q_ < 0.02 that were also quantified in cell-line experiments were classified as “epithelial cell signature proteins”.
UniProt IDs were taken from Table III of @UNS+14.
<b>y</b>. @WKP+14 analyzed the secretome of paired CRC and normal tissue from 4 patients, adopting a five-fold enrichment cutoff for identification of candidate biomarkers.
Here, the list of proteins from Supplementary Table 1 of @WKP+14 was filtered to include those with at least five-fold greater or lower abundance in CRC samples and _p_ < 0.05.
Two proteins listed as “Unmapped by Ingenuity” were removed, and gene names were converted to UniProt IDs using the UniProt mapping tool.
<b>z</b>. @STK+15 analyzed the membrane-enriched proteome from tumor and adjacent normal tissues from 8 patients using label-free nano-LC-MS/MS to identify 184 proteins with a fold change > 1.5 and _p_-value < 0.05.
Here, protein identifiers from Supporting Table 2 of @STK+15 were used to find the corresponding UniProt IDs.
<b>A</b>. <b>B</b>. <b>C</b>. @WDO+15 analyzed 8 matched formalin-fixed and paraffin-embedded (FFPE) samples of normal tissue (N) and adenocarcinoma (C) and 16 nonmatched adenoma samples (A) using LC-MS to identify 2300 (N/A), 1780 (A/C) and 2161 (N/C) up- or down-regulated proteins at _p_ < 0.05.
The list of proteins used in this study includes only those marked as having a significant change in SI Table 3 of @WDO+15.
<b>D</b>. <b>E</b>. <b>F</b>. @LPL+16 used iTRAQ and 2D LC-MS/MS to analyze pooled samples of stroma purified by laser capture microdissection (LCM) from 5 cases of non-neoplastic colonic mucosa (NC), 8 of adenomatous colon polyps (AD), 5 of colon carcinoma in situ (CIS) and 9 of invasive colonic carcinoma (ICC).
A total of 222 differentially expressed proteins between NC and other stages were identified.
Here, gene symbols from Supplementary Table S3 of @LPL+16 were converted to UniProt IDs using the UniProt mapping tool.
<b>G</b>. Data were extracted from SI Table S3 of @LXM+16, including proteins with _p_-value <0.05.
<b>H</b>. <b>I</b>. <b>J</b>. @PHL+16 used iTRAQ 2D LC-MS/MS to analyze pooled samples from 5 cases of normal colonic mucosa (NC), 8 of adenoma (AD), 5 of carcinoma in situ (CIS) and 9 of invasive colorectal cancer (ICC).
A total of 326 proteins with differential expression between two successive stages (and, for CIS and ICC, also differentially expressed with respect to NC) were detected.
The list of proteins used in this study was generated by converting the gene names in Supplementary Table 4 of @PHL+16 to UniProt IDs using the UniProt mapping tool.
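
Several of the lists above were made by applying an abundance ratio cutoff (for example, >2 or <0.5) to a published table. A minimal sketch of that kind of filter is shown here; the data frame, its column names and its values are all hypothetical and do not correspond to any of the supplementary tables.

```r
# Hypothetical table of expression ratios (tumor / normal); values are made up
ratios <- data.frame(
  id = c("P1", "P2", "P3", "P4", "P5"),
  ratio = c(3.1, 0.4, 1.2, 0.5, 2.6),
  stringsAsFactors = FALSE
)

# Keep proteins with ratio > 2 (up-expressed) or < 0.5 (down-expressed);
# the cutoffs are strict, so a ratio of exactly 0.5 is excluded
keep <- ratios$ratio > 2 | ratios$ratio < 0.5
filtered <- ratios[keep, ]

# Flag the direction of the expression change
filtered$up <- filtered$ratio > 2
print(filtered)
```

Whether a cutoff is strict or inclusive differs between studies (compare "≥2 or ≤0.5" with ">2 or <0.5" above), so the comparison operators would need to be adjusted for each source.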

## Mean Differences

Using data from the table above, this plot shows that the groups of proteins that are relatively up-expressed in colorectal cancer or more advanced cancer stages predominantly have higher `ZC` and/or `nH2O`.
The datasets comparing adenoma to normal proteomes are highlighted in red.

```{r diffplot, fig.width=6, fig.height=6, fig.align="center"}
col <- rep("black", length(datasets))
# highlight adenoma / normal datasets
col[grepl("=AD", datasets)] <- "red"
diffplot(comptab, col=col)
```

## Potential Diagrams

[These plots](../html/rankplot.html) show the [weighted rank-sum differences](../html/rankdiff.html) of [chemical affinities of formation](../../CHNOSZ/html/affinity.html) from [basis species](../../CHNOSZ/html/basis.html) of proteins in different groups.
A higher ranking of relatively down- or up-expressed proteins in colorectal cancer is represented by blue or red color, respectively.
The last plot shows effective [values of Eh](../html/Ehplot.html) (redox potential) as a function of the same variables (oxygen fugacity and water activity).
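
The idea of comparing rankings between groups can be sketched with a simplified stand-in for the weighted rank difference. The helper `mean_rank_diff` below is hypothetical and is not canprot's `rankdiff`; it assumes that the statistic of interest is the difference of mean ranks of the two groups, scaled by the largest difference possible for the given group sizes. The affinity values are made up.

```r
# Hypothetical helper (not canprot's rankdiff): difference of mean ranks
# between two groups, scaled by the largest difference possible for these
# group sizes, so that the result lies between -1 and 1
mean_rank_diff <- function(a_down, a_up) {
  n1 <- length(a_down)
  n2 <- length(a_up)
  # Rank all proteins together by their affinity values
  r <- rank(c(a_down, a_up))
  mean_down <- mean(r[seq_len(n1)])
  mean_up <- mean(r[n1 + seq_len(n2)])
  # The maximum possible difference of mean ranks is (n1 + n2) / 2
  (mean_up - mean_down) / ((n1 + n2) / 2)
}

# Made-up affinity values at a single combination of conditions
d_separated   <- mean_rank_diff(a_down = c(1, 2, 3), a_up = c(4, 5))
d_interleaved <- mean_rank_diff(a_down = c(1, 4), a_up = c(2, 3, 5))
print(c(separated = d_separated, interleaved = d_interleaved))
```

In this sketch a positive value means that the up-expressed proteins rank higher, which corresponds to the red regions of the diagrams; evaluating such a difference over a grid of oxygen fugacity and water activity would give a surface like those plotted below.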

Identify the datasets for protein expression.

```{r datasets2}
datasets <- c("JKMF10", "UNS+14", "WKP+14", "MRK+11_AD.NC", "MRK+11_AC.AD", "MRK+11_AC.NC", "JCF+11", "KWA+14")
```

Set up the diagram and make the plots.

```{r rankplot, fig.width=10, fig.height=10, message=FALSE, results="hide", out.width=685}
par(mfrow=c(3, 3))
par(xaxs="i", yaxs="i", las=1, mar=c(4, 4, 2, 2), mgp=c(2.6, 1, 0), cex=1)
for(i in seq_along(datasets)) {
  pdat <- get_pdat(datasets[i], basis="QEC4")
  rankplot(pdat, res=50)
  CHNOSZ::label.figure(LETTERS[i], paren=FALSE, font=2, yfrac=0.94)
}
Ehplot()
CHNOSZ::label.figure("I", paren=FALSE, font=2, yfrac=0.94)
```

This is similar to Figure 6 of [Dick (2016)](http://doi.org/10.7717/peerj.2238).
To reduce the time needed to create the vignette, the plots here are made at lower resolution than those in the paper.

## References