Immunocytochemistry/IF - cells
The Human Protein Atlas displays high resolution, multicolor images of proteins labeled with immunofluorescence at the single cell level. This provides spatial information on protein expression patterns to define the subcellular localization to cellular organelles and structures.
Originally, three cell lines (U-2 OS, A-431 and U-251 MG), originating from different human tissues, were chosen for the immunofluorescent analysis. The cell line panel has since been expanded with additional cell lines to increase the probability of covering a large number of expressed proteins. The cell lines were selected from different lineages, e.g. tumor cell lines from mesenchymal, epithelial and glial tumors as well as cells immortalized by introduction of telomerase. The selection was furthermore based on morphological characteristics, widespread use and the multitude of publications using these cell lines. Information regarding sex and age of the donor, cellular origin and source is listed here. Based on mRNA expression data, two suitable cell lines from the cell line panel are selected for the immunofluorescent analysis of each protein. In addition, U-2 OS is always included as a third cell line, so that the whole human proteome is localized on a subcellular level in one specific cell line.
In addition to the human cell lines, the mouse cell line NIH 3T3 is stained. This is only done for the antibodies corresponding to genes where the mouse and human genes are orthologous.
In order to facilitate the annotation of the subcellular localization of the protein targeted by the HPA antibody, the cells are also stained with reference markers. The following probes/organelles are used as references: (i) DAPI for the nucleus, (ii) anti-tubulin antibody as internal control and marker of microtubules, and (iii) anti-calreticulin or anti-KDEL for the endoplasmic reticulum (ER).
The resulting confocal images are single slice images representing one optical section of the cells. The microscope settings are optimized for each sample. The different organelle probes are displayed as different channels in the multicolor images; the HPA antibody staining is shown in green, nuclear stain in blue, microtubules in red and ER in yellow.
Annotation
In order to provide an interpretation of the staining patterns, all images of cell lines stained by indirect immunofluorescence are manually annotated. For each cell line and antibody, the intensity, subcellular location and single-cell variations (SCV) of the staining are described. The staining intensity is classified as negative, weak, moderate or strong based on the laser power and detector gain settings used for image acquisition in combination with the visual appearance of the image. The table below lists the subcellular locations used for annotation, links to the cell structure dictionary entry and corresponding GO terms. SCVs within an immunofluorescence image are annotated as intensity (variation in their expression level), or as spatial (variation in the spatial distribution).
Knowledge-based annotation
The knowledge-based annotation aims to provide an interpretation of the detected subcellular location of a protein. In the first step, stainings in different cell lines with the same antibody are reviewed and the results are compared with available protein/gene characterization data for subcellular location. In the second step, all antibodies targeting the same protein are taken into consideration for the final annotation of the subcellular localization. Each location is separately assigned one of four reliability scores (Enhanced, Supported, Approved or Uncertain). Together with additional factors (e.g. correlation of the signal strength with RNA-seq data, similarity between sibling antibodies), these scores result in an overall gene reliability score.
Reliability score
A reliability score is set manually for all genes and indicates the level of reliability of the analyzed protein expression pattern based on available protein/RNA/gene characterization data from both HPA and the UniProtKB/Swiss-Prot database.
The overall score encompasses several factors: (i) reproducibility of the antibody staining in different cell lines, and whether signal strength correlates with RNA expression levels; (ii) assays for enhanced antibody validation, using antibodies that bind to different epitopes on the same target protein (independent antibodies), knockdown/knockout of the target protein (genetic methods), or matching of the signal with a GFP-tagged protein (recombinant expression); and (iii) experimental evidence for the location described in the literature. Whether the antibody performs in non-IF methods such as Western blot or immunohistochemistry is also considered.
The final score leads to the assignment into one of the following four classes:
- Enhanced - One or more antibodies have enhanced validation and there is no contradicting data (for example, literature describing experimental evidence for a different location).
- Supported - There is no enhanced validation of the used antibody, but the annotated localization is reported in literature.
- Approved - If the localization of the protein has not been previously described and was detected by only one antibody without additional antibody validation.
- Uncertain - If the antibody-staining pattern contradicts experimental data or no expression is detected on the RNA level.
Immunohistochemistry - tissues
The Human Protein Atlas contains images of histological sections from normal and cancer tissues obtained by immunohistochemistry. Antibodies are labeled with DAB (3,3'-diaminobenzidine) and the resulting brown staining indicates where an antibody has bound to its corresponding antigen. The section is furthermore counterstained with hematoxylin to enable visualization of microscopical features.
Tissue microarrays are used to show antibody staining in samples from 144 individuals corresponding to 44 different normal tissue types, and samples from 216 cancer patients corresponding to 20 different types of cancer (movie about tissue microarray production and immunohistochemical staining). Each sample is represented by 1 mm tissue cores, resulting in a total number of 576 images for each antibody. Normal tissues are represented by samples from three individuals each, one core per individual, except for endometrium, skin, soft tissue and stomach, which are represented by samples from six individuals each and parathyroid gland, which is represented by one sample. Protein expression is annotated in 76 different normal cell types present in these tissue samples. For cancer tissues, two cores are sampled from each individual and protein expression is annotated in tumor cells. A small fraction of the 576 images are missing for most antibodies due to technical issues. Specimens containing normal and cancer tissue have been collected and sampled from anonymized paraffin embedded material of surgical specimens, in accordance with approval from the local ethics committee.
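The total of 576 images per antibody follows from the sampling scheme described above: one core for each of the 144 individuals contributing normal tissue and two cores for each of the 216 cancer patients. A minimal arithmetic check is sketched below; the variable names are illustrative and not part of any Human Protein Atlas tool.

```python
# Sampling scheme as stated in the text; variable names are illustrative only.
normal_individuals = 144
cores_per_normal_individual = 1   # "one core per individual"
cancer_patients = 216
cores_per_cancer_patient = 2      # "two cores are sampled from each individual"

images_per_antibody = (
    normal_individuals * cores_per_normal_individual
    + cancer_patients * cores_per_cancer_patient
)
print(images_per_antibody)  # 576
```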
For selected proteins extended tissue profiling is performed in addition to standard tissue microarrays. Examined tissues include mouse brain, human lactating breast, eye, thymus and extended samples of adrenal gland, skin and brain.
Since specimens are derived from surgical material, normal is here defined as non-neoplastic and morphologically normal. It is not always possible to obtain fully normal tissues and thus several of the tissues denoted as normal will include alterations due to inflammation, degeneration and tissue remodeling. In rare tissues, hyperplasia or benign proliferations are included as exceptions. It should also be noted that within normal morphology there may exist interindividual differences and variations due to primary diseases, age, sex etc. Such differences may also affect protein expression and thereby immunohistochemical staining patterns.
Samples from cancer are also derived from surgical material. Due to subgroups and heterogeneity of tumors within each cancer type, included cases represent a typical mix of specimens from surgical pathology. The inclusion of tumors is based on availability and representativity; however, an effort has been made to include high and low grade malignancies where applicable. In certain tumor groups, subtypes have been included, e.g. breast cancer includes both ductal and lobular cancer, lung cancer includes both squamous cell carcinoma and adenocarcinoma, and liver cancer includes both hepatocellular and cholangiocellular carcinoma. Tumor heterogeneity and interindividual differences may be reflected in diverse expression of proteins, resulting in variable immunohistochemical staining patterns.
Annotation
In order to provide an overview of protein expression patterns, all images of tissues stained by immunohistochemistry are manually annotated by a specialist, followed by verification by a second specialist. Annotation of each normal and cancer tissue is performed using fixed guidelines for classification of immunohistochemical results. Each tissue is examined for representativity, and immunoreactivity in the different cell types present in normal or cancer tissues is subsequently annotated. Basic annotation parameters include an evaluation of i) staining intensity (negative, weak, moderate or strong), ii) fraction of stained cells (rare, <25%, 25-75% or >75%) and iii) subcellular localization (nuclear and/or cytoplasmic/membranous). The manual annotation also provides two summarizing texts describing the staining pattern for each antibody in normal tissues and in cancer tissues.
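The three basic annotation parameters can be pictured as a small record with a fixed vocabulary per field. The sketch below is only an illustration of that idea: the class name, field names and validation behavior are assumptions made here, not the structure used internally by the Human Protein Atlas.

```python
from dataclasses import dataclass

# Allowed values are taken from the text; everything else is hypothetical.
INTENSITIES = ("negative", "weak", "moderate", "strong")
FRACTIONS = ("rare", "<25%", "25-75%", ">75%")
LOCATIONS = ("nuclear", "cytoplasmic/membranous")


@dataclass(frozen=True)
class CellTypeAnnotation:
    """One annotated cell type in one tissue (hypothetical record layout)."""
    tissue: str
    cell_type: str
    intensity: str
    fraction: str
    locations: tuple  # any combination of LOCATIONS ("and/or" in the text)

    def __post_init__(self):
        # Reject values outside the controlled vocabulary.
        if self.intensity not in INTENSITIES:
            raise ValueError(f"unknown intensity: {self.intensity!r}")
        if self.fraction not in FRACTIONS:
            raise ValueError(f"unknown fraction: {self.fraction!r}")
        for location in self.locations:
            if location not in LOCATIONS:
                raise ValueError(f"unknown location: {location!r}")


# Illustrative values, not real annotation data.
example = CellTypeAnnotation(
    tissue="liver",
    cell_type="hepatocytes",
    intensity="moderate",
    fraction=">75%",
    locations=("cytoplasmic/membranous",),
)
```

Restricting each field to the listed categories mirrors the use of fixed guidelines: an annotation that falls outside the vocabulary is rejected rather than stored.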
The terminology and ontology used is compliant with standards used in pathology and medical science. SNOMED classification is used for assignment of topography and morphology. SNOMED classification also underlies the given original diagnosis from which normal as well as cancer samples were collected.
A histological dictionary used in the annotation is available as a PDF-document, containing images stained by immunohistochemistry using antibodies included in the Human Protein Atlas. The dictionary displays subtypes of cells distinguishable from each other and also shows specific expression patterns in different intracellular structures. Annotation dictionary: screen usage (15 MB), printing (95 MB).
Knowledge-based annotation
Knowledge-based annotation aims to create a comprehensive overview of protein expression patterns in normal human tissues. This is achieved by stringent evaluation of the immunohistochemical staining pattern, RNA-seq data from internal and external sources and available protein/gene characterization data, with special emphasis on RNA-seq. Annotated protein expression profiles are created using single antibodies as well as independent antibodies (two or more independent antibodies directed against different, non-overlapping epitopes on the same protein). For independent antibodies, the immunohistochemical data from all the different antibodies are taken into consideration. The immunohistochemical staining pattern in normal tissues is subjectively annotated according to strict guidelines. It is based on the experienced evaluation of positive immunohistochemical signals in the 76 normal cell types analyzed. The review also takes suboptimal experimental procedures and interindividual variations into consideration.
The final annotated protein expression is considered a best estimate and as such reflects the most probable histological distribution and relative expression level for each protein. To enable a protein expression profile, one or several of the following additional data sources are necessary: i) an independent antibody targeting another epitope of the same protein, ii) RNA-seq data, and iii) available protein/gene characterization data. The result of the knowledge-based annotation is considered inconclusive when the information available at the time of analysis is evaluated as not sufficient for verification of the staining pattern and an estimation of the expected protein expression.
The knowledge-based protein expression profiles are created using fixed guidelines on evaluation and presentation of the resulting expression profiles. Standardized explanatory sentences are used when necessary to provide additional information required for full understanding of the expression profile. A reliability score (Enhanced, Supported, Approved or Uncertain) is set for each annotated protein expression profile based on evaluation of all available data.
Reliability score
A reliability score is manually set for all genes and indicates the level of reliability of the analyzed protein expression pattern based on available RNA-seq data, protein/gene characterization data and immunohistochemical data from one or several antibodies with non-overlapping epitopes. The reliability score is based on the 44 normal tissues analyzed, and is displayed on both Tissue Atlas and Pathology Atlas.
The reliability score is divided into Enhanced, Supported, Approved, or Uncertain. If there is available data from more than one antibody, the staining patterns of all antibodies are taken into consideration during evaluation of the reliability score.
- Enhanced - One or several antibodies with non-overlapping epitopes targeting the same gene have obtained enhanced validation based on an orthogonal or independent antibody validation method.
- Supported - Consistency with RNA-seq and/or protein/gene characterization data, in combination with similar staining pattern if independent antibodies are available.
- Approved - Consistency with RNA-seq data in combination with inconsistency with, or lack of, protein/gene characterization data. Alternatively, consistency with protein/gene characterization data in combination with inconsistency with RNA-seq data. If independent antibodies are available, the staining pattern is partly similar or dissimilar.
- Uncertain - Inconsistency with, or lack of, RNA-seq and/or protein/gene characterization data, in combination with dissimilar staining pattern if independent antibodies are available.
Immunohistochemistry/IF - mouse brain
As a complement to the immunohistochemically stained tissues, the Protein Atlas also includes the mouse brain atlas as a subcompartment of the normal tissue atlas, in which comprehensive protein profiles of the mouse brain are available. A selected set of targets has been analyzed by applying the antibodies to serial sections of mouse brain covering 129 areas and subfields of the brain, several of which are difficult to cover in the human brain.
In addition, pituitary, retina and trigeminal ganglia are included in recent and future image series but are not yet annotated.
The tissue microarray method used within the Human Protein Atlas enabled the global mapping of proteins in the human body, including the brain. Currently, the human tissue atlas covers four areas of the human brain: cerebral cortex, hippocampus, caudate and cerebellum. Due to the heterogeneous structure of the brain, with many nuclei and cell types organized in complex networks, it is difficult to achieve a comprehensive overview in a 1 mm tissue sample. Analysis of more human brain samples, including smaller brain nuclei, is thus desirable in order to generate a more detailed map of protein distribution in the brain. Therefore, we have complemented the human brain atlas effort with a more comprehensive analysis of the mouse brain. A series of mouse brain sections is explored for protein expression and distribution in a large number of brain regions.
Antibodies are selected against proteins involved in normal brain physiology, brain development and neuropathological processes. A limit of 60% homology (human vs mouse) is used as the cutoff when comparing the PrEST sequence for the antibody targets.
Selected antibodies are applied to test sections containing brain regions or cell types with known expression based on in situ hybridization (Allen Brain Atlas) and single cell RNA-seq data (Linnarsson Lab and Barres Lab).
Staining patterns are evaluated based on consistency between staining patterns of multiple antibodies against the same target and match to transcriptomics data.
Antibody immunoreactivity is visualized using tyramide signal amplification, shown in green. A nuclear reference staining (DAPI) is visualized in blue. The immunofluorescence protocol is standardized, although antibody concentration and incubation time vary depending on protein abundance and antibody affinity, as determined during the test staining. The complete mouse brain profile is represented by serial coronal sections of adult mouse brain, 16 µm thick. Stained slides are then scanned and digitalized before further processing.
Table 1. Brain regions. Abbreviations are based on The Mouse Brain in Stereotaxic Coordinates, Third Edition: The coronal plates and diagrams (ISBN: 9780123742445)
Region (division) | Region (area) | Region (name) | Abbreviation | Allen Brain Atlas
forebrain | olfactory bulb | anterior olfactory nucleus | aon | AON
forebrain | olfactory bulb | granule cell layer | gro | MOBgr
forebrain | olfactory bulb | internal plexiform layer | ipl | MOBipl
forebrain | olfactory bulb | mitral cell layer | mi | MOBmi
forebrain | olfactory bulb | glomerular layer | gl | MOBgl
forebrain | olfactory bulb | rostral migratory stream | rms | SEZ
forebrain | olfactory bulb | external plexiform layer | epl | MOBopl
forebrain | olfactory bulb | external plexiform layer of the accessory OB | epla |
forebrain | olfactory bulb | granule cell layer of the accessory OB | gra | AOBgr
forebrain | olfactory bulb | glomerular layer of the accessory OB | gla | AOBgl
forebrain | basal forebrain | dorsal tenia tecta | dtt | TTd
forebrain | basal forebrain | caudate putamen | cpu | CP
forebrain | basal forebrain | accumbens nucleus, core | acbc | ACB
forebrain | basal forebrain | accumbens nucleus, shell | acbsh | ACB
forebrain | basal forebrain | island of Calleja | icj | isl
forebrain | basal forebrain | ventral pallidum | vp | PALv
forebrain | basal forebrain | medial septum | ms | MS
forebrain | basal forebrain | nucleus of the vertical limb of the diagonal band | vdb | NDB
forebrain | basal forebrain | lateral septum | ls | LS
forebrain | basal forebrain | nucleus of the horizontal limb of the diagonal band | hdb | NDB
forebrain | basal forebrain | globus pallidus | gp | PALd
forebrain | cerebral cortex | frontal association cortex | fra | FRP
forebrain | cerebral cortex | motor cortex | m | MO
forebrain | cerebral cortex | cingulate cortex | cg | ACA
forebrain | cerebral cortex | piriform cortex, L1 | pirl1 | PIR1
forebrain | cerebral cortex | piriform cortex, L2 | pirl2 | PIR2
forebrain | cerebral cortex | piriform cortex, L3 | pirl3 | PIR3
forebrain | cerebral cortex | insular cortex | i | AI
forebrain | cerebral cortex | somatosensory cortex | s | SS
forebrain | cerebral cortex | retrosplenial granular cortex | rsg | RSP
forebrain | cerebral cortex | parietal association cortex | p | PTLp
forebrain | cerebral cortex | entorhinal cortex | ent | ENT
forebrain | cerebral cortex | visual cortex | v | VIS
forebrain | hippocampus | polymorph layer of the dentate gyrus | podg | DG-po
forebrain | hippocampus | molecular layer of the dentate gyrus | modg | DG-mo
forebrain | hippocampus | granular dentate gyrus | grdg | DG-sg
forebrain | hippocampus | CA1 - oriens layer | ca1or | CA1so
forebrain | hippocampus | CA1 - pyramidal layer | ca1py | CA1sp
forebrain | hippocampus | CA1 - radiatum layer | ca1ra | CA1sr
forebrain | hippocampus | CA2 - oriens layer | ca2or | CA2so
forebrain | hippocampus | CA2 - pyramidal layer | ca2py | CA2sp
forebrain | hippocampus | CA2 - radiatum layer | ca2ra | CA2sr
forebrain | hippocampus | CA3 - oriens layer | ca3or | CA3so
forebrain | hippocampus | CA3 - pyramidal layer | ca3py | CA3sp
forebrain | hippocampus | CA3 - radiatum layer | ca3ra | CA3sr
forebrain | hippocampus | stratum lucidum | slu | CA3slu
forebrain | hippocampus | lacunosum moleculare | lmol | CA1slm
forebrain | hippocampus | subiculum | sub | SUB
forebrain | circumventricular organs | subfornical organ | sfo | SFO
forebrain | amygdala | nucleus of the lateral olfactory tract | lot | NLOT
forebrain | amygdala | basal medial amygdaloid nucleus | bma | BMA
forebrain | amygdala | basal lateral amygdaloid nucleus | bla | BLA
forebrain | amygdala | cortical amygdala | aco | COA
forebrain | amygdala | central amygdala | ce | CEA
forebrain | amygdala | medial amygdaloid nucleus | mea | MEA
interbrain | hypothalamus | dorsal tuberomammillary nucleus | dtm | TMd
interbrain | hypothalamus | mammillary nucleus | mn | MBO
interbrain | hypothalamus | periventricular hypothalamic nucleus | pe | PVi
interbrain | hypothalamus | supraoptic nucleus | so | SO
interbrain | hypothalamus | tuberal nucleus | tu | TU
interbrain | hypothalamus | ventral tuberomammillary nucleus | vtm | TMv
interbrain | hypothalamus | lateral preoptic area | lpo | LPO
interbrain | hypothalamus | medial preoptic area | mpo | MEPO
interbrain | hypothalamus | suprachiasmatic nucleus | sch | SCH
interbrain | hypothalamus | paraventricular hypothalamic nucleus | pa | PVH
interbrain | hypothalamus | anterior hypothalamic area, central | ahc | AHN
interbrain | hypothalamus | ventral medial hypothalamic nucleus | vmh | VMH
interbrain | hypothalamus | arcuate nucleus | arc | ARH
interbrain | hypothalamus | peduncular part of lateral hypothalamus | plh | PH
interbrain | hypothalamus | dorsal medial hypothalamic nucleus | dm | DMH
interbrain | circumventricular organs | subcommissural organ | sco |
interbrain | circumventricular organs | median eminence | me | ME
interbrain | thalamus | medial geniculate nucleus | mg | MG
interbrain | thalamus | parafascicular thalamic nucleus | pf | PF
interbrain | thalamus | pregeniculate nucleus | pg | GENd
interbrain | thalamus | stria terminalis | st | st
interbrain | thalamus | zona incerta | zi | ZI
interbrain | thalamus | anterodorsal thalamic nucleus | ad | AD
interbrain | thalamus | reticular thalamic nucleus | rt | RT
interbrain | thalamus | ventral anterior thalamic nucleus | va | VAL
interbrain | thalamus | medial habenular nucleus | mhb | MH
interbrain | thalamus | laterodorsal thalamic area | ld | LD
interbrain | thalamus | paraventricular thalamic nucleus | pv | PVT
interbrain | thalamus | central medial thalamic area | cm | CM
interbrain | thalamus | ventral lateral thalamic area | vl | VP
interbrain | thalamus | ventral medial thalamic area | vm | VM
interbrain | thalamus | lateral habenular nucleus | lhb | LH
interbrain | thalamus | ventral posterior thalamus | vpt | VP
interbrain | thalamus | anterior pretectal nucleus | apt | PRT
interbrain | thalamus | retromammillary nucleus | rm | SUM
midbrain | midbrain motor | substantia nigra, reticular | snr | SNr
midbrain | midbrain motor | periaqueductal grey | pag | PAG
midbrain | midbrain motor | interpeduncular nucleus | ip | IPN
midbrain | midbrain motor | mesencephalic reticular formation | mrt | MRN
midbrain | midbrain motor | red nucleus | r | RN
midbrain | midbrain motor | oculomotor nucleus | 3n | III
midbrain | midbrain motor | mesencephalic trigeminal nucleus | me5 | MEV
midbrain | midbrain motor | ventral tegmental area | vta | VTA
midbrain | midbrain behavioral | substantia nigra, compact | snc | SNc
midbrain | midbrain behavioral | dorsal raphe nucleus | dr | DR
midbrain | midbrain behavioral | median raphe nucleus | mnr | CLI
midbrain | midbrain sensory | superior colliculi | sc |
midbrain | midbrain sensory | external cortical inferior colliculi | ecic | ICe
hindbrain | cerebellum | molecular layer of the cerebellum | cemol | CBXmo
hindbrain | cerebellum | Purkinje layer of the cerebellum | cepur | CBXpu
hindbrain | cerebellum | granular layer of the cerebellum | cegr | CBXgr
hindbrain | circumventricular organs | medulla | ap | AP
hindbrain | pons | Kölliker-Fuse nucleus | kf | KF
hindbrain | pons | motor trigeminal nucleus | 5n | V
hindbrain | pons | parabrachial nucleus | pbp | PB
hindbrain | pons | principal sensory trigeminal nucleus | pr5 | PSV
hindbrain | pons | locus coeruleus | lc | LC
hindbrain | pons | pontine nucleus | pn | PG
hindbrain | pons | vestibular nucleus | ve | VNC
hindbrain | pons | pontine reticular nucleus, oral | pno | PRNr
hindbrain | pons | lateral lemniscus | ll | NLL
hindbrain | pons | superior paraolivary nucleus | spo | POR
hindbrain | medulla | nucleus of the solitary tract | sol | NTS
hindbrain | medulla | raphe magnus nucleus | rmg | RM
hindbrain | medulla | cochlear nucleus | cn | CN
hindbrain | medulla | lateral paragigantocellular nucleus | lpg | PGRNl
hindbrain | medulla | raphe pallidus nucleus | rpa | RPA
hindbrain | medulla | facial nucleus | 7n | VII
hindbrain | medulla | hypoglossal nucleus | 12n | XII
hindbrain | medulla | ambiguus nucleus | amb | AMB
hindbrain | medulla | external cuneate nucleus | ecu | CU
hindbrain | medulla | inferior olivary nucleus | io | IO
hindbrain | medulla | raphe obscurus nucleus | rob | RO
hindbrain | medulla | dorsal motor nucleus of vagus | 10n | DMX
Annotation
The digitalized images are processed (axel-adjusted and tissue edges defined) and regions of interest (ROIs) are then marked according to the table above. These ROIs are then used for image analysis, and the relative fluorescence intensity is listed for each region. The relative fluorescence is defined as the intensity of the annotated region relative to the intensity of the region with the highest intensity.
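The definition of relative fluorescence amounts to dividing each ROI intensity by the intensity of the brightest ROI. The following is a minimal sketch under that reading; the function name and the intensity values are hypothetical, and how the per-ROI intensity itself is measured is not specified in the text.

```python
def relative_intensity(roi_intensity):
    """Scale each ROI intensity to the brightest ROI, which becomes 1.0."""
    if not roi_intensity:
        raise ValueError("no ROIs given")
    brightest = max(roi_intensity.values())
    if brightest <= 0:
        raise ValueError("no positive signal to normalize against")
    return {roi: value / brightest for roi, value in roi_intensity.items()}


# Hypothetical measurements keyed by region abbreviation (arbitrary units).
measured = {"ca1py": 820.0, "cpu": 410.0, "cepur": 205.0}
relative = relative_intensity(measured)
print(relative)  # {'ca1py': 1.0, 'cpu': 0.5, 'cepur': 0.25}
```

Because the values are scaled within one protein's image series, they describe the distribution across regions rather than an absolute expression level.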
The overview and preserved orientation in the mouse brain has enabled us to annotate additional cell classes (ependymal), glial subpopulations (microglia, oligodendrocytes, and astrocytes), and additional brain specific subcellular locations (axon, dendrite, synapse, and glia endfeet) for each investigated protein.
All images of immunofluorescence stained sections were manually annotated by specially educated personnel followed by review and verification by a second qualified member of the staff. The cellular and subcellular location of the immunoreactivity is defined and a summarizing text is provided describing the general staining pattern.
Specificity is validated by comparing the data with in situ hybridization data (Allen Brain Atlas) and/or available literature. Support from other data leads to a Supported reliability score, while less characterized targets are scored as Uncertain and await further validation.
Reliability score
A reliability score is set for all genes and indicates the level of reliability of the analyzed protein expression pattern based on available protein/RNA/gene characterization data.
Antibodies in the mouse brain atlas are scored as Supported or Uncertain depending on support from in situ hybridization data (Allen Brain Atlas) and/or previously published data and the UniProtKB/Swiss-Prot database.
Protein array
All purified antibodies are analyzed on antigen microarrays. The specificity profile for each antibody is determined based on its interaction with 384 different antigens, including its own target. The antigens present on the arrays are consecutively exchanged in order to correspond to the next set of 384 purified antibodies. Each microarray is divided into 21 replicated subarrays, enabling the analysis of 21 antibodies simultaneously. The antibodies are detected through a fluorescently labeled secondary antibody, and a dual color system is used to verify the presence of the spotted proteins. A specificity profile plot is generated for each antibody, in which the signal from binding to its own antigen is compared to any off-target interactions with the other antigens. The vast majority (86%) of antibodies are given a pass; the remainder fail due to either low signal or low specificity.
Western blot
Western blot analysis of antibody specificity has been done using a routine sample setup composed of IgG/HSA-depleted human plasma and protein lysates from a limited number of human tissues and cell lines. Antibodies with an uncertain routine WB have been revalidated using an over-expression lysate (VERIFY Tagged Antigen(TM), OriGene Technologies, Rockville, MD) as a positive control. Antibody binding was visualized by chemiluminescence detection in a CCD-camera system using a peroxidase (HRP) labeled secondary antibody.
Antibodies included in the Human Protein Atlas have been analyzed without further efforts to optimize the procedure and therefore it cannot be excluded that certain observed binding properties are due to technical rather than biological reasons and that further optimization could result in a different outcome.
HPA RNA-seq data
In total, 64 cell lines and 37 tissues have been analyzed by RNA-seq to estimate the transcript abundance of each protein-coding gene.
For cell lines, early-split samples were used as duplicates and total RNA was extracted using the RNeasy mini kit. Information regarding cellular origin and source of each cell line is listed here.
For normal tissue, specimens were collected with consent from patients and all samples were anonymized in accordance with approval from the local ethics committee (ref #2011/473) and Swedish rules and legislation. All tissues were collected from the Uppsala Biobank and RNA samples were extracted from frozen tissue sections.
For a total number of 131 cell line samples and 172 tissue samples, mRNA sequencing was performed on Illumina HiSeq2000 and 2500 machines (Illumina, San Diego, CA, USA) using the standard Illumina RNA-seq protocol with a read length of 2x100 bases.
Transcript abundance estimation was performed using Kallisto v0.42.4.
For each gene, we report the abundance in 'Transcript Per Million' (TPM) as the sum of the TPM values of all its protein-coding transcripts.
For each cell line and tissue type, the average TPM value for replicate samples was used as the abundance score.
The threshold level to detect presence of a transcript for a particular gene was set to ≥ 1 TPM.
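The three steps above (summing protein-coding transcript TPM per gene, averaging over replicates, and applying the 1 TPM detection threshold) can be sketched as follows. This is an illustration only: it assumes per-transcript TPM values have already been paired with a gene identifier and a biotype label, and the function names, the "protein_coding" label and the sample values are assumptions made here.

```python
from collections import defaultdict
from statistics import mean

DETECTION_TPM = 1.0  # detection threshold stated in the text


def gene_tpm(transcripts):
    """Sum TPM over the protein-coding transcripts of each gene (one sample).

    `transcripts` is an iterable of (gene_id, biotype, tpm) tuples.
    """
    totals = defaultdict(float)
    for gene, biotype, tpm in transcripts:
        if biotype == "protein_coding":
            totals[gene] += tpm
    return dict(totals)


def abundance_score(replicates):
    """Average the gene-level TPM across replicate samples."""
    genes = set().union(*replicates)
    return {g: mean(r.get(g, 0.0) for r in replicates) for g in genes}


# Hypothetical duplicate samples of one cell line.
replicate_1 = [
    ("GENE_A", "protein_coding", 3.0),
    ("GENE_A", "protein_coding", 1.5),
    ("GENE_A", "retained_intron", 9.0),  # ignored: not protein-coding
    ("GENE_B", "protein_coding", 0.4),
]
replicate_2 = [
    ("GENE_A", "protein_coding", 5.5),
    ("GENE_B", "protein_coding", 0.2),
]

scores = abundance_score([gene_tpm(replicate_1), gene_tpm(replicate_2)])
detected = {gene: score >= DETECTION_TPM for gene, score in scores.items()}
```

In this made-up example GENE_A averages 5.0 TPM and counts as detected, while GENE_B stays below the threshold.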
The RNA-seq data was used to classify all genes according to their tissue specific or cell line specific expression into one of six different categories, defined based on the total set of all TPM values in 37 tissues or 64 cell lines:
- Tissue/Cell line enriched (expression in one tissue/cell line at least five-fold higher than in all other tissues/cell lines)
- Group enriched (five-fold higher average TPM in a group of two to seven tissues/cell lines compared to all other tissues/cell lines)
- Tissue/Cell line enhanced (five-fold higher average TPM in one or more tissues/cell lines compared to the mean TPM of all tissues/cell lines)
- Expressed in all (≥ 1 TPM in all tissues/cell lines)
- Not detected (< 1 TPM in all tissues/cell lines)
- Mixed (detected in at least one tissue/cell line and in none of the above categories)
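The six categories can be read as a rule cascade over the TPM values of one gene. The sketch below is one possible reading, not the Human Protein Atlas implementation. It rests on assumptions that the text does not settle: "Not detected" is tested first, the remaining categories are tested in the listed order, "higher than all other tissues" is taken as higher than the next-highest value, and a group is formed from the top-ranked tissues.

```python
from statistics import mean

DETECTION_TPM = 1.0  # detection threshold stated in the text
FOLD = 5.0           # five-fold criterion stated in the text
MAX_GROUP = 7        # groups of two to seven tissues/cell lines


def classify_gene(tpm_by_sample):
    """Assign one category to a gene from its TPM per tissue or cell line."""
    values = sorted(tpm_by_sample.values(), reverse=True)
    if len(values) < 2:
        raise ValueError("need at least two tissues or cell lines")
    if values[0] < DETECTION_TPM:
        return "Not detected"
    # One sample at least five-fold above every other sample.
    if values[0] >= FOLD * values[1]:
        return "Tissue/Cell line enriched"
    # Assumption: a group is the k top-ranked samples, compared with the
    # highest value outside the group.
    for k in range(2, min(MAX_GROUP, len(values) - 1) + 1):
        if mean(values[:k]) >= FOLD * values[k]:
            return "Group enriched"
    # At least one sample five-fold above the mean of all samples.
    if values[0] >= FOLD * mean(values):
        return "Tissue/Cell line enhanced"
    if values[-1] >= DETECTION_TPM:
        return "Expressed in all"
    return "Mixed"


# Hypothetical TPM values.
print(classify_gene({"liver": 100, "lung": 2, "skin": 1}))
# Tissue/Cell line enriched
print(classify_gene({"a": 50, "b": 40, "c": 2, "d": 1}))
# Group enriched
```

With only a handful of samples almost any gene with a low tail ends up "Group enriched"; the categories are meant to be applied across the full set of 37 tissues or 64 cell lines.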
An additional category "elevated", containing all genes in the first three categories (tissue/cell line enriched, group enriched and tissue/cell line enhanced), has been used for some parts of the analysis. A TS/CS-score (Tissue Specificity/Cell Specificity score) is calculated for "elevated" tissues/cell lines, as the fold change from the tissue/cell line with the highest RNA level to the tissue/cell line with the second highest RNA level.
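As a worked illustration of the TS/CS-score, the fold change between the two highest-ranked samples can be computed as below. The function name and values are hypothetical, and returning infinity when the second highest value is zero is an assumption; the text does not say how that case is handled.

```python
def specificity_score(tpm_by_sample):
    """Fold change from the highest to the second highest TPM value."""
    ranked = sorted(tpm_by_sample.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two tissues or cell lines")
    highest, second = ranked[0], ranked[1]
    if second == 0:
        return float("inf")  # assumption: undefined ratio reported as inf
    return highest / second


# Hypothetical TPM values for one gene.
print(specificity_score({"liver": 120.0, "kidney": 8.0, "lung": 2.0}))  # 15.0
```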
GTEx RNA-seq data
The Genotype-Tissue Expression (GTEx) project collects and analyzes multiple human post mortem tissues.
RNA-seq data from 31 of their tissues that have a corresponding tissue in the Human Protein Atlas have been included to allow for comparisons between the Human Protein Atlas data and GTEx data. The GTEx RNA-seq data have been mapped using the Ensembl gene ID available from GTEx, and the RPKM values (Reads Per Kilobase of gene model per Million mapped reads) for each gene were subsequently used to categorize the genes using the same classification as described above, but with 0.5 RPKM as the threshold for detection.
Tissue | GTEx tissue | Number of samples |
Adipose tissue | Adipose - Subcutaneous | 350 |
Adipose tissue | Adipose - Visceral (Omentum) | 227 |
Adrenal gland | Adrenal Gland | 145 |
Breast | Breast - Mammary Tissue | 214 |
Caudate | Brain - Caudate (basal ganglia) | 117 |
Cerebellum | Brain - Cerebellar Hemisphere | 105 |
Cerebellum | Brain - Cerebellum | 125 |
Cerebral cortex | Brain - Cortex | 114 |
Cerebral cortex | Brain - Frontal Cortex (BA9) | 108 |
Cervix, uterine | Cervix - Ectocervix | 6 |
Cervix, uterine | Cervix - Endocervix | 5 |
Colon | Colon - Sigmoid | 149 |
Colon | Colon - Transverse | 196 |
Endometrium | Uterus - Endometrium | 14 |
Esophagus | Esophagus - Mucosa | 286 |
Fallopian tube | Fallopian Tube | 6 |
Heart muscle | Heart - Atrial Appendage | 194 |
Heart muscle | Heart - Left Ventricle | 218 |
Hippocampus | Brain - Hippocampus | 94 |
Hypothalamus | Brain - Hypothalamus | 96 |
Kidney | Kidney - Cortex | 32 |
Liver | Liver | 119 |
Lung | Lung | 320 |
Ovary | Ovary | 97 |
Pancreas | Pancreas | 171 |
Pituitary gland | Pituitary | 103 |
Prostate | Prostate | 106 |
Salivary gland | Minor Salivary Gland | 57 |
Skeletal muscle | Muscle - Skeletal | 430 |
Skin | Skin - Not Sun Exposed (Suprapubic) | 250 |
Skin | Skin - Sun Exposed (Lower leg) | 357 |
Small intestine | Small Intestine - Terminal Ileum | 88 |
Spleen | Spleen | 104 |
Stomach | Stomach | 193 |
Testis | Testis | 172 |
Thyroid gland | Thyroid | 323 |
Urinary bladder | Bladder | 11 |
Vagina | Vagina | 96 |
FANTOM5 CAGE data
The Functional Annotation of Mammalian Genomes 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type specific transcriptomes using Cap Analysis of Gene Expression (CAGE) (Takahashi et al 2012), which is based on a series of full-length cDNA technologies developed at RIKEN. CAGE data for 36 of their tissues was obtained from the FANTOM5 repository and mapped to Ensembl. The normalized Tags Per Million for each gene were calculated and subsequently used to categorize the genes using the same classification as described above, with Tags Per Million ≥ 1 as the threshold for detection, to allow for comparisons with the Human Protein Atlas data.
Tissue | FANTOM5 tissue | Sample description | FANTOM5 sample id |
Adipose tissue | Adipose tissue | 65,65,76 years, mixed | FF:10010-101C1 |
Appendix | Appendix | 29 years, male | FF:10189-103D9 |
Brain | Brain | 77,79,81 years, mixed | FF:10012-101C3 |
Breast | Breast | 77 years, female | FF:10080-102A8 |
Caudate | Caudate nucleus | 76 years, female | FF:10164-103B2 |
Cerebellum | Cerebellum | 22-68 years, mixed | FF:10083-102B2 |
Cerebellum | Cerebellum | 76 years, female | FF:10166-103B4 |
Cervix, uterine | Cervix | 40,46,57,65 years, female | FF:10013-101C4 |
Colon | Colon | 62,83,84 years, mixed | FF:10014-101C5 |
Endometrium | Uterus | 23-63 years, female | FF:10100-102D1 |
Epididymis | Epididymis | 24 years, male | FF:10197-103E8 |
Esophagus | Esophagus | 68,74,75 years, mixed | FF:10015-101C6 |
Gallbladder | Gall bladder | 57 years, male | FF:10198-103E9 |
Heart muscle | Heart | 70,73,74 years, mixed | FF:10016-101C7 |
Heart muscle | Left ventricle | 73 years, female | FF:10078-102A6 |
Hippocampus | Hippocampus | 76 years, female | FF:10153-102I9 |
Hippocampus | Hippocampus | 60 years, female | FF:10169-103B7 |
Kidney | Kidney | 60,62,63 years, female | FF:10017-101C8 |
Liver | Liver | 64,69,70 years, mixed | FF:10018-101C9 |
Lung | Lung | 46,65,94 years, mixed | FF:10019-101D1 |
Lung | Lung - right lower lobe | 29 years, male | FF:10075-102A3 |
Lymph node | Lymph node | 30 years, male | FF:10077-102A5 |
Ovary | Ovary | 47,75,84 years, female | FF:10020-101D2 |
Pancreas | Pancreas | 52 years, male | FF:10049-101G4 |
Pituitary gland | Pituitary gland | 76 years, female | FF:10162-103A9 |
Placenta | Placenta | female | FF:10021-101D3 |
Prostate | Prostate | 73,79,93 years, male | FF:10022-101D4 |
Retina | Retina | 24-65 years, mixed | FF:10030-101E3 |
Salivary gland | Salivary gland | 16-60 years, mixed | FF:10093-102C3 |
Seminal vesicle | Seminal vesicle | 24 years, male | FF:10201-103F3 |
Skeletal muscle | Skeletal muscle | 55,79,79 years, mixed | FF:10023-101D5 |
Skeletal muscle | Skeletal muscle - soleus muscle | male | FF:10282-104F3 |
Small intestine | Small intestine | 15,40,85 years, mixed | FF:10024-101D6 |
Smooth muscle | Smooth muscle | 20-68 years, male | FF:10048-101G3 |
Spleen | Spleen | 39,50,70 years, male | FF:10025-101D7 |
Testis | Testis | 34,53,86 years, male | FF:10026-101D8 |
Testis | Testis | 14-64 years, male | FF:10096-102C6 |
Thymus | Thymus | 0.5,0.5,0.83 years (infant), male | FF:10027-101D9 |
Thyroid gland | Thyroid | 67,68,78 years, mixed | FF:10028-101E1 |
Tonsil | Tonsil | 22-61 years, mixed | FF:10047-101G2 |
Urinary bladder | Bladder | 55,58,79 years, mixed | FF:10011-101C2 |
Vagina | Vagina | 68 years, female | FF:10204-103F6 |
TCGA RNA-seq data
The Cancer Genome Atlas (TCGA) project of the Genomic Data Commons (GDC) collects and analyzes multiple human cancer samples. RNA-seq data from 17 cancer types representing 21 cancer subtypes with a corresponding major cancer type in the Human Pathology Atlas were included to allow for comparisons between the protein staining data from the Human Protein Atlas and the RNA-seq data from TCGA.
The TCGA RNA-seq data was mapped using the Ensembl gene ID available from TCGA, and the FPKM values (Fragments Per Kilobase of exon per Million reads) for each gene were subsequently used for quantification of expression, with a detection threshold of 1 FPKM. Genes were categorized using the same classification as described above.
HPA cancer type | TCGA cancer | No. of samples in TCGA |
Breast cancer | Breast Invasive Carcinoma (BRCA) | 1075 |
Cervical cancer | Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC) | 291 |
Colorectal cancer | Colon Adenocarcinoma (COAD) | 438 |
Colorectal cancer | Rectum Adenocarcinoma (READ) | 159 |
Endometrial cancer | Uterine Corpus Endometrial Carcinoma (UCEC) | 541 |
Glioma | Glioblastoma Multiforme (GBM) | 153 |
Head and neck cancer | Head and Neck Squamous Cell Carcinoma (HNSC) | 499 |
Liver cancer | Liver Hepatocellular Carcinoma (LIHC) | 365 |
Lung cancer | Lung Adenocarcinoma (LUAD) | 500 |
Lung cancer | Lung Squamous Cell Carcinoma (LUSC) | 494 |
Melanoma | Skin Cutaneous Melanoma (SKCM) | 102 |
Ovarian cancer | Ovary Serous Cystadenocarcinoma (OV) | 373 |
Pancreatic cancer | Pancreatic Adenocarcinoma (PAAD) | 176 |
Prostate cancer | Prostate Adenocarcinoma (PRAD) | 494 |
Renal cancer | Kidney Chromophobe (KICH) | 64 |
Renal cancer | Kidney Renal Clear Cell Carcinoma (KIRC) | 528 |
Renal cancer | Kidney Renal Papillary Cell Carcinoma (KIRP) | 285 |
Stomach cancer | Stomach Adenocarcinoma (STAD) | 354 |
Testis cancer | Testicular Germ Cell Tumor (TGCT) | 134 |
Thyroid cancer | Thyroid Carcinoma (THCA) | 501 |
Urothelial cancer | Bladder Urothelial Carcinoma (BLCA) | 406 |
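The four expression datasets described above use different units and detection thresholds. A compact way to keep them apart is a small lookup table, sketched below. The dictionary layout and the function name are illustrative only, and treating the GTEx and TCGA thresholds as inclusive (≥) is an assumption, since the text states an explicit "≥" only for the TPM and Tags Per Million thresholds.

```python
# Unit and detection threshold per dataset, as quoted in the text.
DETECTION = {
    "HPA": ("TPM", 1.0),
    "GTEx": ("RPKM", 0.5),
    "FANTOM5": ("Tags Per Million", 1.0),
    "TCGA": ("FPKM", 1.0),
}


def is_detected(dataset, value):
    """Return True if `value` reaches the detection threshold of `dataset`."""
    _unit, threshold = DETECTION[dataset]
    # Assumption: the comparison is inclusive for every dataset.
    return value >= threshold


# The same numeric value can be "detected" in one dataset and not in another.
calls = {name: is_detected(name, 0.7) for name in DETECTION}
```

With a value of 0.7, only the GTEx threshold is reached, which illustrates why values from the different datasets should be thresholded in their own units before any comparison.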
Survival
Based on the FPKM value of each gene, patients were classified into two expression groups and the correlation between expression level and patient survival was examined. Genes with a median expression of less than 1 FPKM were excluded. The prognosis of each group of patients was examined by Kaplan-Meier survival estimators, and the survival outcomes of the two groups were compared by log-rank tests. Both median-separated and maximally separated Kaplan-Meier plots are presented in the Human Protein Atlas, and genes with log-rank P values less than 0.001 in the maximally separated Kaplan-Meier analysis were defined as prognostic genes. If the group of patients with high expression of a selected prognostic gene has more observed events than expected events, the gene is an unfavourable prognostic gene; otherwise, it is a favourable prognostic gene.
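The procedure can be illustrated with a small, dependency-free sketch: a two-group log-rank test, a scan over candidate FPKM cutoffs that keeps the split with the lowest P value, and the favourable/unfavourable call from observed versus expected events in the high-expression group. This is not the code used for the Human Protein Atlas. The function names and input layout are hypothetical, and restricting the scan to cutoffs between the 20th and 80th percentiles is an assumption, since the text does not say which cutoffs are tried.

```python
import math
from statistics import median


def logrank(times, events, high):
    """Two-group log-rank test.

    times:  follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    high:   True if the patient is in the high-expression group
    Returns (p_value, observed_high, expected_high).
    """
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        died = [i for i in at_risk if times[i] == t and events[i]]
        n, d = len(at_risk), len(died)
        n_high = sum(1 for i in at_risk if high[i])
        observed += sum(1 for i in died if high[i])
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0, observed, expected
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2)), observed, expected


def prognostic_call(fpkm, times, events, lo=0.2, hi=0.8, alpha=0.001):
    """Maximally separated split and prognostic label for one gene."""
    if median(fpkm) < 1:  # from the text: such genes are excluded
        return None
    ranked, n, best = sorted(fpkm), len(fpkm), None
    # Assumption: only cutoffs between the lo and hi percentiles are scanned.
    for cutoff in sorted(set(ranked[int(lo * n):int(hi * n) + 1])):
        high = [x > cutoff for x in fpkm]
        if all(high) or not any(high):
            continue
        p, obs, exp = logrank(times, events, high)
        if best is None or p < best[0]:
            best = (p, cutoff, obs, exp)
    if best is None:
        return None
    p, cutoff, obs, exp = best
    if p >= alpha:
        label = "not prognostic"
    else:
        label = "unfavourable" if obs > exp else "favourable"
    return {"p_value": p, "cutoff": cutoff, "label": label}


# Toy cohort: the ten patients with high expression die first.
fpkm = [20 + i for i in range(10)] + [1 + i for i in range(10)]
times = list(range(1, 21))
events = [1] * 20
result = prognostic_call(fpkm, times, events)
```

In this toy cohort the high-expression group has more observed than expected events and the best split gives a P value below 0.001, so the gene would be called unfavourable. A survival analysis library would normally replace the hand-written test; it is written out here only to make the observed-versus-expected comparison visible.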
Evidence
Protein evidence is calculated for each gene based on three different sources: UniProt protein existence (UniProt evidence); a Human Protein Atlas antibody- or RNA based score (HPA evidence); and evidence based on two proteogenomics studies (MS evidence). In addition, for each gene, a protein evidence summary score is based on the maximum level of evidence in all three independent evidence scores (Evidence summary).
All scores are classified into the following categories:
- Evidence at protein level
- Evidence at transcript level
- No evidence
- Not available
UniProt evidence is based on UniProt protein existence data, which uses five types of evidence for the existence of a protein. All genes in the classes "Experimental evidence at protein level" or "Experimental evidence at transcript level" are classified into the first two evidence categories, whereas genes from the "Inferred from homology", "Predicted", or "Uncertain" classes are classified as "No evidence". Genes where the gene identifier could not be mapped to UniProt from Ensembl version 88.38 are classified as "Not available".
The HPA evidence is calculated based on the manual curation of Western blot, tissue profiling and subcellular location as well as transcript profiling using RNA-seq. All genes with Data reliability "Supported" in one or both of the two methods immunohistochemistry and immunofluorescence, or standard validation "Supported" for the Western blot application (assays using over-expression lysates not included) are classified as "Evidence at protein level". For the remaining genes, all genes detected at TPM > 1 in at least one of the tissues or cell lines used in the RNA-seq analysis are classified as "Evidence at transcript level". The remaining genes are classified as "No evidence".
MS evidence is based on two proteogenomics studies, Kim et al 2014 and Ezkurdia et al 2014. Each gene detected by at least one of the MS-based studies is classified as "Evidence at protein level" and all remaining genes as "Not available".
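The three evidence scores and the summary score can be expressed as a small set of mapping rules. The sketch below follows the rules as described, with two assumptions that the text does not state: that "Not available" ranks below "No evidence" when the maximum is taken, and that the inputs arrive as simple flags (the function and parameter names are hypothetical).

```python
# Evidence categories, ordered from weakest to strongest.
# Assumption: "Not available" ranks below "No evidence".
LEVELS = ["Not available", "No evidence",
          "Evidence at transcript level", "Evidence at protein level"]
RANK = {name: i for i, name in enumerate(LEVELS)}

UNIPROT_TO_EVIDENCE = {
    "Experimental evidence at protein level": "Evidence at protein level",
    "Experimental evidence at transcript level": "Evidence at transcript level",
    "Inferred from homology": "No evidence",
    "Predicted": "No evidence",
    "Uncertain": "No evidence",
}


def uniprot_evidence(protein_existence):
    """`None` stands for a gene identifier that could not be mapped to UniProt."""
    if protein_existence is None:
        return "Not available"
    return UNIPROT_TO_EVIDENCE[protein_existence]


def hpa_evidence(ihc_supported, if_supported, wb_supported, rna_detected):
    """Antibody-based support first, RNA-seq detection second."""
    if ihc_supported or if_supported or wb_supported:
        return "Evidence at protein level"
    if rna_detected:
        return "Evidence at transcript level"
    return "No evidence"


def ms_evidence(detected_in_any_study):
    """Detected in at least one of the two proteogenomics studies, or not."""
    return "Evidence at protein level" if detected_in_any_study else "Not available"


def evidence_summary(*scores):
    """Maximum level of evidence over the individual scores."""
    return max(scores, key=RANK.__getitem__)


summary = evidence_summary(
    uniprot_evidence("Predicted"),
    hpa_evidence(False, False, False, rna_detected=True),
    ms_evidence(False),
)
```

For this hypothetical gene the three scores are "No evidence", "Evidence at transcript level" and "Not available", so the summary is "Evidence at transcript level".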
|