The breast-specific proteome

The human breast consists mainly of skin, adipose and glandular tissue, and its main function is to produce milk for infants. Transcriptome analysis shows that 70% (n=13789) of all human proteins (n=19613) are expressed in the breast, and 130 of these genes show elevated expression in breast compared to other tissue types. A Gene Ontology analysis of the genes with elevated expression in the breast reveals an overrepresentation of proteins associated with secretory functions.

  • 23 breast enriched genes
  • Most elevated genes encode proteins involved in secretion and structural integrity of epithelial cells.
  • 130 genes defined as elevated in the breast

Figure 1. The distribution of all genes across the five categories based on transcript abundance in breast as well as in all other tissues.

In total, 130 genes show some level of elevated expression in the breast compared to other tissues. The three categories of genes with elevated expression in breast compared to other organs are shown in Table 1. The function and cellular localization of the known genes with tissue enriched expression in breast (n=23) are well in line with the function of the breast. Table 2 lists the 12 genes with the highest level of enrichment among the 23 enriched genes.

Table 1. Number of genes in the subdivided categories of elevated expression in breast.

Category | Number of genes | Description
Tissue enriched | 23 | At least five-fold higher mRNA levels in a particular tissue as compared to all other tissues
Group enriched | 41 | At least five-fold higher mRNA levels in a group of 2-7 tissues
Tissue enhanced | 66 | At least five-fold higher mRNA levels in a particular tissue as compared to average levels in all tissues
Total | 130 | Total number of elevated genes in breast
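The category rules in Table 1 can be illustrated with a short Python sketch. This is only one possible reading of the definitions, not the pipeline used to produce the numbers above. The function name `classify_elevation`, the example TPM values, and the choice to compare a group's average against the highest remaining tissue are all assumptions made for illustration. No detection cutoff is applied because none is stated here.

```python
from statistics import mean

FOLD = 5.0  # fold-change threshold quoted in Table 1


def classify_elevation(tpm, tissue="breast", fold=FOLD, max_group=7):
    """Return a Table 1 category for one gene, or None if it is not elevated.

    `tpm` maps tissue name -> mRNA level (hypothetical input format).
    """
    target = tpm[tissue]
    if target <= 0:
        return None
    others = [v for t, v in tpm.items() if t != tissue]

    # Tissue enriched: at least five-fold above every other tissue.
    if target >= fold * max(others):
        return "tissue enriched"

    # Group enriched (assumed reading): the average of the top 2-7 tissues,
    # which must include the target tissue, is at least five-fold above
    # the highest tissue outside the group.
    ranked = sorted(tpm, key=tpm.get, reverse=True)
    for k in range(2, min(max_group, len(ranked) - 1) + 1):
        group, rest = ranked[:k], ranked[k:]
        group_mean = mean([tpm[t] for t in group])
        if tissue in group and group_mean >= fold * max(tpm[t] for t in rest):
            return "group enriched"

    # Tissue enhanced: at least five-fold above the average of all tissues
    # (the average here includes the target tissue, which is an assumption).
    if target >= fold * mean(list(tpm.values())):
        return "tissue enhanced"
    return None


# Hypothetical expression profiles, not real measurements.
print(classify_elevation({"breast": 100, "skin": 10, "liver": 5, "lung": 1}))  # tissue enriched
print(classify_elevation({"breast": 80, "skin": 100, "liver": 5, "lung": 2}))  # group enriched
```

The checks run from the strictest category to the most permissive, so each gene receives exactly one label, which is consistent with the three categories in Table 1 summing to the total of 130.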

Table 2. The 12 genes with the highest level of enriched expression in breast. "Predicted localization" shows the classification of each gene into three main classes: Secreted, Membrane, and Intracellular, where the latter consists of genes without any predicted membrane or secreted features. "mRNA (tissue)" shows the transcript level as TPM values. "TS-score" (Tissue Specificity score) is calculated as the fold change relative to the tissue with the second highest expression.

Gene | Description | Predicted localization | mRNA (tissue) | TS-score
LALBA | lactalbumin alpha | Secreted | 2476.8 | 566
CSN2 | casein beta | Secreted | 347.6 | 348
SULT1C3 | sulfotransferase family 1C member 3 | Intracellular | 450.3 | 142
DCD | dermcidin | Secreted | 336.4 | 96
CSN1S1 | casein alpha s1 | Intracellular, Secreted | 1600.9 | 85
ACSM1 | acyl-CoA synthetase medium-chain family member 1 | Intracellular, Membrane | 298.7 | 49
BTN1A1 | butyrophilin subfamily 1 member A1 | Membrane | 36.8 | 49
MUCL1 | mucin like 1 | Intracellular, Secreted | 3798.1 | 43
UGT2B11 | UDP glucuronosyltransferase family 2 member B11 | Membrane | 320.4 | 17
SLCO1B7 | solute carrier organic anion transporter family member 1B7 (putative) | Membrane | 43.2 | 17
ANKRD30A | ankyrin repeat domain 30A | Intracellular | 49.8 | 15
CSN3 | casein kappa | Secreted | 428.0 | 13

Some of the proteins predicted to be membrane-spanning are intracellular, e.g. in the Golgi or mitochondrial membranes, and some of the proteins predicted to be secreted can potentially be retained in a compartment belonging to the secretory pathway, such as the ER, or remain attached to the outer surface of the cell membrane by a GPI anchor.
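The TS-score described in the caption of Table 2 is a simple ratio, and a minimal Python sketch may make the definition concrete. The function name `ts_score`, the input format, and the handling of a zero denominator are assumptions for illustration. The example values are hypothetical and are not taken from the table.

```python
def ts_score(tpm, tissue="breast"):
    """Fold change of `tissue` over the highest-expressing other tissue.

    `tpm` maps tissue name -> transcript level in TPM (hypothetical format).
    The score is only meaningful as a specificity measure when `tissue`
    has the highest expression, so that the comparison tissue is the
    second highest overall.
    """
    runner_up = max(v for t, v in tpm.items() if t != tissue)
    if runner_up == 0:
        # Assumption: treat "not detected anywhere else" as infinite fold change.
        return float("inf")
    return tpm[tissue] / runner_up


# Hypothetical profile: 200 TPM in breast, 4 TPM in the next highest tissue.
print(ts_score({"breast": 200.0, "skin": 4.0, "liver": 1.0}))  # 50.0
```

Under this definition, a gene qualifies as tissue enriched according to Table 1 when its TS-score is at least 5.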

The breast transcriptome

An analysis of the expression levels of each gene made it possible to calculate the relative mRNA pool for each of the categories. The analysis shows that 86% of the mRNA molecules derived from breast correspond to housekeeping genes and only 5% of the mRNA pool corresponds to genes categorized as breast enriched, group enriched or breast enhanced. Thus, most of the transcriptional activity in the breast relates to proteins with presumed housekeeping functions, as they are found in all tissues and cells analyzed.
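The relative mRNA pool per category can be thought of as each category's share of the summed transcript levels. The following Python sketch shows that arithmetic under stated assumptions: the function name `mrna_pool_fractions`, the category labels, and the input values are hypothetical, chosen so that the shares resemble those quoted above. The actual calculation behind the reported percentages may differ in detail.

```python
from collections import defaultdict


def mrna_pool_fractions(genes):
    """Return each category's share of the total transcript abundance.

    `genes` is an iterable of (category, tpm) pairs, one per gene
    (hypothetical input format).
    """
    totals = defaultdict(float)
    for category, tpm in genes:
        totals[category] += tpm
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no expression detected in any category")
    return {category: value / grand_total for category, value in totals.items()}


# Hypothetical gene-level values, not real measurements.
example = [
    ("expressed in all", 800.0),
    ("expressed in all", 60.0),
    ("elevated in breast", 50.0),
    ("mixed", 90.0),
]
for category, share in mrna_pool_fractions(example).items():
    print(f"{category}: {share:.0%}")
```

Because the shares are weighted by abundance rather than by gene count, a small number of highly expressed genes can dominate the pool.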

Gene Ontology-based analysis of all the 130 genes elevated in breast indicates a clear overrepresentation of proteins associated with secretory functions highly related to lactation. Several other elevated genes were associated with structural integrity in epithelial cells of the breast glands.

Protein expression of genes elevated in breast

In-depth analysis of the elevated genes in breast using antibody-based protein profiling allowed us to visualize the expression patterns of the corresponding proteins. Several proteins were expressed in mammary glands of lactating breast.

LALBA, CSN1S1, CSN3 and CSN2 are examples of secreted proteins that are detected in milk and highly expressed in mammary glands. These proteins are specifically expressed during pregnancy as a result of hormonal changes, which induce morphological alterations that activate the inactive mammary glands so that they can provide milk for infants.

Genes shared between the breast and other tissues

There are 41 group enriched genes expressed in the breast. Group enriched genes are defined as genes showing an at least five-fold higher average level of mRNA expression in a group of 2-7 tissues, including breast, compared to all other tissues.

In order to illustrate the relation of breast to other tissue types, a network plot was generated, displaying the number of genes shared between different tissue types.

Figure 2. An interactive network plot of the breast enriched and group enriched genes connected to their respective enriched tissues (grey circles). Red nodes represent the number of breast enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Clicking a node displays a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations of up to 3 tissues, but the resulting lists show the complete set of group enriched genes in the particular tissue.

A Gene Ontology based analysis of the group enriched genes shows enrichment for genes in epithelia. One example of a protein expressed in breast, skin and esophagus is Keratin 15 (KRT15). Keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells.

KRT15 - breast
KRT15 - skin
KRT15 - esophagus

Hormone receptors in breast

The function of the breast is dependent on hormones such as progesterone, prolactin, placental lactogen and estrogen, and their respective receptors are present in the mammary epithelium. Estrogen receptor 1 (ESR1) is a receptor present in female reproductive tissues that binds estrogen and transmits the signal to alter the activity of genes in the nucleus. ESR1 shows distinct nuclear positivity in breast, fallopian tube, cervix and endometrium.

ESR1 - breast
ESR1 - cervix
ESR1 - fallopian tube
ESR1 - endometrium

Breast function

The function of the breast can be summarized simply: to provide nutritious milk for newborn infants. Milk secretion, or lactation, is a characteristic unique to mammals, but the composition of the milk and the duration of lactation differ between species. Lactation is evolutionarily well established in mammals, since it is an easy way to nourish the offspring.

The mammary gland is a complex structure that includes a layer of secretory epithelial cells that secrete milk into the ducts and cavities by different mechanisms, including exocytosis of secretory vesicles and budding-off of milk fat globules. The milk-producing acinar cells are surrounded by a layer of myoepithelial cells that eject milk through contraction.

Breast histology

The main components of the human breast are the skin, subcutaneous adipose tissue and glandular tissue, but the breast begins as a very basic construct: a nipple connected to a simple duct system. At puberty the breasts undergo a transformation under the influence of hormones that lead to an increase in adipose tissue and complex branching of the previously basic ductal system. Below the nipple, the collecting ducts dilate to form the lactiferous sinuses. The breast is divided into 15-25 lobes, each based on a branching duct system that leads from the collecting ducts to the terminal duct-lobular units. The terminal duct-lobular units are the functional sites of milk production. Each collecting duct drains a lobe made up of 20-40 lobules. In addition to glandular cells, the lobe is composed chiefly of adipose tissue and fibrous stroma, referred to as the inter- and perilobular connective tissue.

In the nipple, the stratified squamous epithelium from the surface extends into the collecting ducts for a short, variable distance. There is then an abrupt change into the glandular epithelium that is present throughout the duct and lobular system. The glandular epithelium is composed of two distinct types of cells, the secretory or luminal cells and the myoepithelial cells. In the collecting ducts, the lining cells are usually columnar whereas in the acini they are usually cuboidal. Two types of luminal secretory cells have been identified: basal cells, which have relatively clear cytoplasm and an oval nucleus lacking a visible nucleolus, and superficial luminal cells with darker, basophilic cytoplasm. The myoepithelial cells usually form a discontinuous layer between the luminal secretory cells and the basement membrane. The myoepithelial cells appear small and flattened, with dark nuclei.

The histology of human breast including detailed images and information about the different cell types can be viewed in the Protein Atlas Histology Dictionary.


Here, the protein-coding genes expressed in the breast are described and characterized, together with examples of immunohistochemically stained tissue sections that visualize the expression patterns of proteins corresponding to genes with elevated expression in the breast.

Transcript profiling and RNA-data analyses based on normal human tissues have been described previously (Fagerberg et al., 2014). Analyses of mRNA expression including over 99% of all human protein-coding genes were performed using deep RNA sequencing of 172 individual samples corresponding to 37 different human normal tissue types. RNA sequencing results of 4 fresh frozen tissues representing normal breast were compared to 168 other tissue samples corresponding to 36 tissue types, in order to determine genes with elevated expression in breast. A tissue-specific score, defined as the ratio between mRNA levels in breast and mRNA levels in all other tissues, was used to divide the genes into different categories of expression. These categories include: genes with elevated expression in breast, genes expressed in all tissues, genes with a mixed expression pattern, genes not expressed in breast, and genes not expressed in any tissue. Genes with elevated expression in breast were further sub-categorized as i) genes with enriched expression in breast, ii) genes with group enriched expression including breast and iii) genes with enhanced expression in breast.

Human tissue samples used for protein and mRNA expression analyses were collected and handled in accordance with Swedish laws and regulations and obtained from the Department of Pathology, Uppsala University Hospital, Uppsala, Sweden as part of the sample collection governed by the Uppsala Biobank. All human tissue samples used in the present study were anonymized in accordance with approval and advisory report from the Uppsala Ethical Review Board.

Relevant links and publications

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science
PubMed: 25613900 DOI: 10.1126/science.1260419

Yu NY et al, 2015. Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium. Nucleic Acids Res.
PubMed: 26117540 DOI: 10.1093/nar/gkv608

Fagerberg L et al, 2014. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics.
PubMed: 24309898 DOI: 10.1074/mcp.M113.035600

Histology dictionary - the breast