dataset.data_prep.RETINA

class dataset.data_prep.RETINA(file_dir='/home/longlab/Data/Thesis/Data/retina.loom')

Bases: Dataset

Loads the RETINA dataset.

A class with the necessary pre-processing steps for the RETINA dataset. It separates the raw count data, the cluster numbers, and the batch IDs.

This class uses the data provided by Lopez et al., which can be downloaded from: https://github.com/YosefLab/scVI-data/raw/master/retina.loom. The file is in the Loom format used by loompy (http://linnarssonlab.org/loompy/index.html).

The original dataset is available under the GEO accession number GSE81904. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81904

The GEO page provides the raw matrix, BAM files, Rdata files, and R code. To reproduce the pre-processing steps and analyses of the original paper, one can use the following GitHub repository:

https://github.com/broadinstitute/BipolarCell2016

Parameters

file_dir: str: Path to the .loom file, which should be provided by the user.

Attributes

file_dir: str: Path to the .loom (HDF5) file.
n_genes: int: Total number of genes in the dataset.
n_cells: int: Total number of cells.
batchID: ndarray: One-dimensional array of the batch IDs of the cells.
cluster: ndarray: One-dimensional array of the cluster numbers assigned by the authors of the original paper.
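
To illustrate how these attributes relate to one another, here is a minimal sketch of a map-style dataset that keeps counts, batch IDs, and cluster numbers as separate, index-aligned sequences. It is not the RETINA implementation: the class name ToyCountDataset, the counts attribute, and the tuple returned by __getitem__ are assumptions made for illustration, and plain Python lists stand in for ndarrays.

```python
class ToyCountDataset:
    """Hypothetical stand-in for a map-style dataset (not the RETINA source)."""

    def __init__(self, counts, batch_ids, clusters):
        # All three sequences must describe the same cells, in the same order.
        if not (len(counts) == len(batch_ids) == len(clusters)):
            raise ValueError("counts, batch_ids and clusters must have equal length")
        self.counts = counts          # one row of gene counts per cell (assumed name)
        self.batchID = batch_ids      # one batch ID per cell
        self.cluster = clusters       # one cluster number per cell
        self.n_cells = len(counts)
        self.n_genes = len(counts[0]) if counts else 0

    def __len__(self):
        return self.n_cells

    def __getitem__(self, idx):
        # Assumed return shape: (counts row, batch ID, cluster number).
        return self.counts[idx], self.batchID[idx], self.cluster[idx]


# Toy values only; these are not taken from the RETINA data.
toy = ToyCountDataset(
    counts=[[0, 3, 1], [2, 0, 0]],
    batch_ids=[0, 1],
    clusters=[4, 7],
)
```

Any object that defines __len__ and __getitem__ in this way can be indexed per cell, which is the property a batching loader relies on.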

Notes

Because the dataset is not very large (approximately 2 GB in memory), the class loads it into memory in full.
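
As a rough way to check whether a count matrix can be held in memory, the dense footprint can be estimated from the number of cells, the number of genes, and the bytes per value. The sketch below uses hypothetical helper names (dense_matrix_bytes, fits_in_memory) and made-up dimensions; it does not reproduce the real RETINA sizes or the dtype the class uses.

```python
def dense_matrix_bytes(n_cells: int, n_genes: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for a dense n_cells x n_genes matrix (4 bytes = 32-bit values)."""
    return n_cells * n_genes * bytes_per_value


def fits_in_memory(n_cells: int, n_genes: int, budget_bytes: int,
                   bytes_per_value: int = 4) -> bool:
    """True if the dense matrix fits within the given memory budget."""
    return dense_matrix_bytes(n_cells, n_genes, bytes_per_value) <= budget_bytes


# Hypothetical dimensions, chosen only to make the arithmetic easy to follow.
size = dense_matrix_bytes(n_cells=10_000, n_genes=5_000, bytes_per_value=4)
print(f"{size / 1e9:.2f} GB")  # prints "0.20 GB"
```

The estimate covers the matrix alone; per-cell arrays such as batch IDs and cluster numbers add comparatively little.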

Examples

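>>> # Assumes DataLoader is the PyTorch one; batch_size value is illustrative.
>>> from torch.utils.data import DataLoader
>>> import data_prep
>>> retina = data_prep.RETINA()
>>> batch_size = 128
>>> dl = DataLoader(retina, batch_size=batch_size, shuffle=True)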

Methods