Recent integration and visualization of single cell level big data in the Human Protein Atlas
A review of some of the latest developments within the HPA was recently published in the journal Protein Science. It summarizes new data and features of the website and describes recent advances in the ongoing move towards using multiple large datasets and computational analysis tools to decipher the human proteome at the single cell level.
In the last decade, the Human Protein Atlas portal has grown into one of the world's most visited databases. What started as an antibody-based exploration of the 20,000 human proteins has expanded, through the integration of several transcriptomics datasets and other types of data, into a multidimensional spatial map of the human proteome at the RNA and protein level. The detailed mapping covers health and disease at tissue, single cell and subcellular resolution, and has recently been organized into sections that each represent a unique perspective on the proteins.
During the last couple of years, an effort has been made to add single cell level data to the HPA in order to investigate and reveal expression variation between cells. A major single cell level resource was added in the form of the Single Cell Type section, where single cell RNA-seq data from a large number of tissues was imported and re-analyzed to generate mRNA levels for a majority of the cell types in the human body. The Tissue Cell Type section was another single cell level addition. It is based on deconvolution of bulk RNA-seq data and generates cell type specificity predictions for each gene from its expression correlation with cell type-specific reference genes. In addition, the spatial protein information gathered from antibody-stained tissue sections has been re-evaluated in greater depth to include expression levels for cell types and cellular structures that were previously not annotated.
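To make the correlation idea behind the Tissue Cell Type section more concrete, here is a minimal sketch in Python. It is not the HPA pipeline: the gene names, cell type labels, expression values, the averaging step and the 0.8 threshold are all invented for illustration. It only shows the general principle of assigning a gene to the cell type whose reference genes it correlates with most strongly across bulk samples.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Toy bulk expression values across six hypothetical samples (invented numbers).
EXPRESSION = {
    "REF_A1": [1, 2, 3, 4, 5, 6],
    "REF_A2": [2, 4, 5, 8, 10, 13],
    "REF_B1": [6, 5, 4, 3, 2, 1],
    "REF_B2": [9, 9, 6, 5, 3, 2],
    "GENE_X": [1.5, 2.5, 3.5, 5, 5.5, 7],
    "GENE_Y": [3, 3, 3, 3, 3, 3],
}

# Hypothetical reference genes assumed to mark each cell type.
REFERENCES = {
    "cell_type_A": ["REF_A1", "REF_A2"],
    "cell_type_B": ["REF_B1", "REF_B2"],
}


def predict_cell_type(gene, expression=EXPRESSION, references=REFERENCES,
                      threshold=0.8):
    """Return the cell type with the highest mean correlation to `gene`,
    or None if no cell type reaches the (assumed) threshold."""
    best_type, best_score = None, threshold
    for cell_type, ref_genes in references.items():
        scores = [pearson(expression[gene], expression[ref]) for ref in ref_genes]
        mean_score = sum(scores) / len(scores)
        if mean_score >= best_score:
            best_type, best_score = cell_type, mean_score
    return best_type


print(predict_cell_type("GENE_X"))
print(predict_cell_type("GENE_Y"))
```

In this toy data, GENE_X rises and falls together with the two cell_type_A reference genes and is assigned to that cell type, while GENE_Y is constant across samples and receives no prediction.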
The substantial expansion of the database has involved the development of advanced methods for analyzing and visualizing large, complex datasets, as well as the creation of easily interpreted graphics that make the information accessible to visitors. The transcriptomics data has been reduced in dimensionality to generate UMAP cluster plots, which facilitate the identification of single cells of the same cell type and of groups of genes with similar expression patterns across the body. The new section layout of the website, with additions such as user-friendly graphics and cluster representations of data, will hopefully simplify further research into the human proteome at new levels of resolution.