The breast cancer proteome


Breast cancer is the most common form of invasive cancer in women worldwide and the leading cause of cancer-related mortality in women. The global age-adjusted incidence rate for breast cancer is 124 per 100,000 women per year. Male breast cancer is exceedingly rare and accounts for around 1% of cases. Although the rate of breast cancer diagnosis increased during the 1990s, it has decreased since the year 2000, and the overall breast cancer death rate has dropped steadily in the western world. The majority of breast cancers develop sporadically, but for 5-10% of patients there is a hereditary component. The best-known genes associated with increased breast cancer risk are BRCA1 and BRCA2. Women with abnormal BRCA1 or BRCA2 have up to a 60% risk of developing breast cancer by the age of 90. Other risk factors include early menarche and late menopause. Pregnancy has been reported to decrease risk, probably due to the changes in breast tissue.

Breast cancer forms in tissues of the breast, usually the ducts (tubes that carry milk to the nipple) and lobules (glands where the milk is produced). Based on the presumed site of origin and morphology, breast cancers are broadly classified as ductal or lobular.

Here, we explore the breast cancer proteome using TCGA transcriptomics data and antibody-based protein data. In total, 578 genes are suggested as prognostic based on transcriptomics data from 1075 patients: 210 genes are associated with unfavourable prognosis and 368 genes with favourable prognosis.

TCGA data analysis


In this metadata study we used data from TCGA, where transcriptomics data was available for 1075 patients in total. The dataset included 1063 females and 12 males. Most of the patients (923) were still alive at the time of data collection. The stage distribution was: stage I, 180 patients; stage II, 609 patients; stage III, 243 patients; stage IV, 20 patients; and 11 patients with missing stage information.

Unfavourable prognostic genes in breast cancer


For unfavourable genes, higher relative expression levels at diagnosis give significantly lower overall survival for the patients. There are 210 genes associated with unfavourable prognosis in breast cancer. In Table 1, the top 20 most significant genes related to unfavourable prognosis are listed.

CBX3 is a gene associated with unfavourable prognosis in breast cancer. The best separation is achieved by an expression cutoff at 59.2 FPKM, which divides the patients into two groups with 78% 5-year survival for patients with high expression versus 84% for patients with low expression (p-value: 7.92e-4). The CBX3 gene encodes chromobox protein homolog 3, a DNA-binding protein that is a component of heterochromatin and is thus important in transcriptional silencing. Immunohistochemical staining using an antibody targeting CBX3 (HPA004902) shows a differential expression pattern in breast cancer samples.

[Figure panels: CBX3 - survival analysis (p<0.001); CBX3 - high expression; CBX3 - low expression]
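
The cutoff-based comparison described above can be illustrated with a small sketch. It splits patients into high and low expression groups at a given FPKM cutoff and estimates 5-year survival in each group with a Kaplan-Meier estimator. The patient records are invented for illustration, the function names are hypothetical, and treating expression equal to the cutoff as "high" is an assumption; this is not the pipeline used to produce the figures reported here.

```python
from typing import List, Tuple

# (expression in FPKM, follow-up time in years, died during follow-up)
Patient = Tuple[float, float, bool]


def split_by_cutoff(patients: List[Patient], cutoff: float):
    """Split patients into (high, low) lists of (time, died) pairs.

    Assumption: expression >= cutoff counts as "high".
    """
    high = [(t, died) for expr, t, died in patients if expr >= cutoff]
    low = [(t, died) for expr, t, died in patients if expr < cutoff]
    return high, low


def km_survival(group: List[Tuple[float, bool]], horizon: float) -> float:
    """Kaplan-Meier estimate of the survival probability at `horizon` years."""
    survival = 1.0
    # Only deaths up to the horizon change the estimate; censored patients
    # simply leave the risk set when their follow-up ends.
    death_times = sorted({t for t, died in group if died and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for u, _ in group if u >= t)
        deaths = sum(1 for u, died in group if died and u == t)
        survival *= 1.0 - deaths / at_risk
    return survival


# Invented toy cohort, for illustration only.
toy_patients: List[Patient] = [
    (80.0, 2.0, True),
    (75.0, 6.0, False),
    (70.0, 4.0, True),
    (65.0, 7.0, False),
    (40.0, 8.0, False),
    (35.0, 3.0, True),
    (30.0, 6.5, False),
    (20.0, 9.0, False),
]

high_group, low_group = split_by_cutoff(toy_patients, cutoff=59.2)
print("5-year survival, high expression:", km_survival(high_group, 5.0))
print("5-year survival, low expression:", km_survival(low_group, 5.0))
```

For an unfavourable gene, the high expression group is expected to show the lower survival estimate, as in this toy cohort.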

Table 1. The 20 genes with highest significance associated with unfavourable prognosis in breast cancer.

Gene | Description | Predicted localization | mRNA (cancer) | p-value
LRP11 | LDL receptor related protein 11 | Membrane, Secreted | 18.6 | 2.31e-8
PGK1 | phosphoglycerate kinase 1 | Intracellular | 86.5 | 7.05e-8
PCMT1 | protein-L-isoaspartate (D-aspartate) O-methyltransferase | Intracellular, Membrane | 30.9 | 4.02e-7
FAM173B | family with sequence similarity 173 member B | Intracellular, Membrane | 6.2 | 1.32e-6
MAL2 | mal, T-cell differentiation protein 2 (gene/pseudogene) | Membrane | 98.0 | 4.57e-6

Favourable prognostic genes in breast cancer


For favourable genes, higher relative expression levels at diagnosis give significantly higher overall survival for the patients. There are 368 genes associated with favourable prognosis in breast cancer. In Table 2, the top 20 most significant genes related to favourable prognosis are listed.

MVP is a gene associated with favourable prognosis in breast cancer. The best separation is achieved by an expression cutoff at 26.9 FPKM, which divides the patients into two groups with 87% 5-year survival for patients with high expression versus 75% for patients with low expression (p-value: 4.51e-4). The Major Vault Protein is encoded by MVP and is an important component of large ribonucleoprotein particles found in eukaryotic cells. The MVP protein may play a role in several cellular functions, such as regulation of the JAK/STAT, PI3K/AKT and MAP kinase signaling pathways. MVP also seems to be implicated in multi-drug resistance, and previous reports link the expression of MVP to prognosis in several cancer types. Immunohistochemical staining using an antibody targeting MVP (HPA064740) shows a differential expression pattern in breast cancer samples.

[Figure panels: MVP - survival analysis (p<0.001); MVP - high expression; MVP - low expression]
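
The text reports a "best separation" cutoff together with a p-value but does not spell out how they are obtained. One common approach, assumed here purely for illustration, is to compare the high and low expression groups with a log-rank test and to scan candidate cutoffs for the one giving the lowest p-value. The sketch below implements that idea with the Python standard library only; `logrank_p`, `best_cutoff`, and the `min_group` parameter are hypothetical names, and no correction for testing multiple cutoffs is applied.

```python
import math
from typing import List, Optional, Tuple

Observation = Tuple[float, bool]      # (follow-up time, died)
Patient = Tuple[float, float, bool]   # (expression, follow-up time, died)


def logrank_p(group_a: List[Observation], group_b: List[Observation]) -> float:
    """Two-sided log-rank p-value comparing two survival groups."""
    event_times = sorted({t for t, died in group_a + group_b if died})
    observed = expected = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _ in group_a if u >= t)   # at risk in group A
        n_b = sum(1 for u, _ in group_b if u >= t)   # at risk in group B
        d_a = sum(1 for u, died in group_a if died and u == t)
        d_b = sum(1 for u, died in group_b if died and u == t)
        n, d = n_a + n_b, d_a + d_b
        observed += d_a
        expected += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0  # no information to separate the groups
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def best_cutoff(patients: List[Patient],
                min_group: int = 2) -> Optional[Tuple[float, float]]:
    """Return (cutoff, p-value) with the lowest log-rank p-value, or None."""
    best = None
    for cutoff in sorted({expr for expr, _, _ in patients}):
        high = [(t, died) for expr, t, died in patients if expr >= cutoff]
        low = [(t, died) for expr, t, died in patients if expr < cutoff]
        if len(high) < min_group or len(low) < min_group:
            continue  # skip splits that leave one group too small
        p = logrank_p(high, low)
        if best is None or p < best[1]:
            best = (cutoff, p)
    return best
```

Because the cutoff is selected to minimize the p-value, the resulting p-value is optimistic; a real analysis would need to account for that selection step.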

Table 2. The 20 genes with highest significance associated with favourable prognosis in breast cancer.

Gene | Description | Predicted localization | mRNA (cancer) | p-value
ZNF385B | zinc finger protein 385B | Intracellular | 2.0 | 4.86e-8
TFPI2 | tissue factor pathway inhibitor 2 | Intracellular, Secreted | 7.8 | 8.33e-7
IL27RA | interleukin 27 receptor subunit alpha | Membrane | 8.3 | 1.74e-6
JCHAIN | joining chain of multimeric IgA and IgM | Secreted | 115.8 | 1.81e-6
ARID5A | AT-rich interaction domain 5A | Intracellular | 11.8 | 2.05e-6

The breast cancer transcriptome


The transcriptome analysis shows that 72% (n=14027) of all human genes (n=19479) are expressed in breast cancer. All genes were classified according to their breast cancer-specific expression into one of five categories, based on the ratio between mRNA levels in breast cancer and mRNA levels in the other 16 analyzed cancer tissues. In total, 154 genes show some level of elevated expression in breast cancer compared to other cancers (Figure 1). The elevated category is further subdivided into three categories, as shown in Table 3.

Figure 1. The distribution of all genes across the five categories based on transcript abundance in breast cancer as well as in all other cancer tissues.

Table 3. Number of genes in the subdivided categories of elevated expression in breast cancer.

Category | Number of genes | Description
Tissue enriched | 39 | At least five-fold higher mRNA levels in a particular cancer as compared to all other cancers
Group enriched | 71 | At least five-fold higher mRNA levels in a group of 2-7 cancers
Tissue enhanced | 44 | At least five-fold higher mRNA levels in a particular cancer as compared to average levels in all cancers
Total | 154 | Total number of elevated genes in breast cancer
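
The three elevated categories in Table 3 can be expressed as a small classification routine. The sketch below is one reading of the table's descriptions, not the classifier actually used: the function and cancer names are hypothetical, the expression values are invented, "group enriched" is interpreted as the group's lowest level being at least five-fold above the highest level outside the group, and any minimum expression threshold that a real pipeline might apply is omitted.

```python
from typing import Dict, Optional

FOLD = 5.0  # fold-change threshold taken from Table 3


def classify_elevated(expr: Dict[str, float], cancer: str,
                      max_group: int = 7) -> Optional[str]:
    """Classify one gene's expression profile for `cancer`, or return None."""
    target = expr[cancer]
    if target <= 0:
        return None
    others = [v for c, v in expr.items() if c != cancer]

    # Tissue enriched: at least five-fold above every other cancer.
    if all(target >= FOLD * v for v in others):
        return "Tissue enriched"

    # Group enriched (assumed reading): the cancer belongs to a group of
    # 2-7 cancers whose lowest level is at least five-fold above the
    # highest level outside the group.
    ranked = sorted(expr.items(), key=lambda kv: kv[1], reverse=True)
    for size in range(2, max_group + 1):
        if size >= len(ranked):
            break
        group, rest = ranked[:size], ranked[size:]
        in_group = cancer in dict(group)
        lowest_in_group = group[-1][1]
        if in_group and lowest_in_group > 0 and lowest_in_group >= FOLD * rest[0][1]:
            return "Group enriched"

    # Tissue enhanced: at least five-fold above the average of all cancers.
    mean_all = sum(expr.values()) / len(expr)
    if target >= FOLD * mean_all:
        return "Tissue enhanced"
    return None


# Breast cancer plus 16 other cancers, with placeholder names.
CANCERS = ["breast"] + [f"other_{i}" for i in range(1, 17)]


def profile(values):
    """Pair invented expression values with the placeholder cancer names."""
    return dict(zip(CANCERS, values))


print(classify_elevated(profile([50.0] + [5.0] * 16), "breast"))
print(classify_elevated(profile([50.0, 40.0] + [2.0] * 15), "breast"))
print(classify_elevated(profile([100.0, 30.0, 10.0] + [5.0] * 14), "breast"))
```

The categories are checked in order of decreasing specificity, so a gene that satisfies several definitions is reported under the strictest one.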

Additional information


Staging of breast cancer is based on the presence of local and/or distant spread. Localized disease (Stage I) comprises approximately 60% of cases, while in about 5% the cancer has spread to distant organs such as liver and bone (Stage IV). Approximately 35% are Stage II or III, indicating tumor spread to regional lymph nodes.

All breast cancers may be differentiated histologically into three grades using the Nottingham Grading System (NGS), also termed the Elston-Ellis grading system. The three tumor parameters evaluated in this system are (i) extent of tubular differentiation, (ii) nuclear pleomorphism and (iii) mitotic activity, assessed by counting mitotic figures in ten high power fields. Each parameter is given a score of 1 to 3, and the scores of all three components are added together to give a final score, e.g. 1+1+1=3. The lowest final scores of 3, 4 and 5 represent well-differentiated tumors (Grade I) associated with better survival. The highest possible score is 9 (3+3+3=9), reflecting a poorly differentiated (Grade III) tumor associated with poor overall survival. In a large proportion of breast cancers, precursor lesions such as intraductal carcinoma are present adjacent to the invasive component of the tumor. Such regions of non-invasive cancer are denoted as cancer in situ and are important to recognize in the diagnostic procedure.
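
The scoring arithmetic described above can be written as a short function. The text only states that totals of 3 to 5 correspond to Grade I and that the maximum of 9 corresponds to Grade III; the sketch assumes the standard Nottingham convention that totals of 6 to 7 are Grade II and totals of 8 to 9 are Grade III. The function name is hypothetical.

```python
from typing import Tuple


def nottingham_grade(tubular: int, pleomorphism: int,
                     mitoses: int) -> Tuple[int, int]:
    """Return (total score, grade) from three component scores of 1 to 3."""
    for score in (tubular, pleomorphism, mitoses):
        if score not in (1, 2, 3):
            raise ValueError("each component score must be 1, 2 or 3")
    total = tubular + pleomorphism + mitoses
    if total <= 5:
        grade = 1   # well differentiated (stated in the text)
    elif total <= 7:
        grade = 2   # assumed: standard intermediate range, 6 to 7
    else:
        grade = 3   # poorly differentiated; 8 to 9 assumed, 9 stated
    return total, grade


print(nottingham_grade(1, 1, 1))
print(nottingham_grade(3, 3, 3))
```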

Immunohistochemistry is used routinely on all breast cancers to gain important information about the prognosis as well as to predict response to specific anticancer therapies. The most commonly used antibodies include those detecting estrogen receptor alpha (ER, ESR1), progesterone receptor (PR, PGR), HER-2 (ERBB2) and the proliferation marker Ki-67 (MKI67). The tumor stage and grade, as well as results from immunohistochemistry, are used to personalize treatment options.

Relevant links and publications


Uhlén M et al, 2017. A pathology atlas of the human cancer transcriptome. Science.
PubMed: 28818916 DOI: 10.1126/science.aan2507

Cancer Genome Atlas Research Network et al, 2013. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet.
PubMed: 24071849 DOI: 10.1038/ng.2764

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science.
PubMed: 25613900 DOI: 10.1126/science.1260419

Tao Z et al, 2015. Breast Cancer: Epidemiology and Etiology. Cell Biochem Biophys.
PubMed: 25543329 DOI: 10.1007/s12013-014-0459-6

Key TJ et al, 2001. Epidemiology of breast cancer. Lancet Oncol.
PubMed: 11902563 DOI: 10.1016/S1470-2045(00)00254-0

Histology dictionary - Breast cancer