dataset.data_prep.RETINA

class dataset.data_prep.RETINA(file_dir='/home/longlab/Data/Thesis/Data/retina.loom')

Bases: Dataset

Loads RETINA data set.

A class with the necessary pre-processing steps for the RETINA dataset. It separates the raw count data, the cluster numbers, and the batch IDs.

The class uses the data provided by Lopez et al., which can be downloaded from https://github.com/YosefLab/scVI-data/raw/master/retina.loom. The data is in loompy format (http://linnarssonlab.org/loompy/index.html).
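A minimal sketch of fetching the file before constructing the class, assuming a local destination path of your choice. The helper name `fetch_retina` and the default destination `retina.loom` are hypothetical and not part of `data_prep`; only the URL comes from the text above.

```python
import os
import urllib.request

# URL quoted in the documentation above.
RETINA_URL = "https://github.com/YosefLab/scVI-data/raw/master/retina.loom"


def fetch_retina(dest="retina.loom", url=RETINA_URL):
    """Download the .loom file to `dest` unless a file already exists there.

    Hypothetical helper: `dest` is an illustrative default, not a path
    that data_prep expects.
    """
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

The returned path can then be passed as `file_dir` when constructing `RETINA`.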

The original dataset is available under GEO accession number GSE81904: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81904

The GEO page provides the raw matrix, BAM files, Rdata files, and R code. To perform the pre-processing steps and analyses of the original paper, one can use the following GitHub repository:

Parameters

file_dir: str

Path to the .loom file. The default points to a local path on the author's machine, so users should provide their own.

Attributes

file_dir: str

Path to the .loom (HDF5) file.

n_genes: int

Total number of genes in the data set.

n_cells: int

Total number of cells.

batchID: ndarray

One-dimensional array of the batch IDs of the cells.

cluster: ndarray

One-dimensional array of the cluster numbers assigned by the authors of the original paper.
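As an illustration of how the two label arrays might be inspected, the sketch below counts cells per batch and per cluster. The arrays are small synthetic stand-ins for `batchID` and `cluster`, not real RETINA values, and `count_labels` is a hypothetical helper.

```python
import numpy as np

# Synthetic stand-ins for retina.batchID and retina.cluster (not real data).
batchID = np.array([0, 0, 1, 1, 1, 0])
cluster = np.array([2, 2, 0, 1, 2, 0])


def count_labels(labels):
    """Return {label: number of cells} for a 1-D integer label array."""
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


cells_per_batch = count_labels(batchID)
cells_per_cluster = count_labels(cluster)
```

With a loaded dataset, the same helper could presumably be applied to the `batchID` and `cluster` attributes directly.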

Notes

Because the data set is not very large (approximately 2 GB in memory), the class loads all of it into memory.
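The in-memory design can be sketched as a map-style dataset that holds every array in RAM and serves items by index. This is not the actual `RETINA` implementation: the class name `InMemoryCounts`, the `counts` attribute, the cells-by-genes layout, and the tuple returned by `__getitem__` are all assumptions made for illustration, and the data is synthetic.

```python
import numpy as np


class InMemoryCounts:
    """Hypothetical stand-in for RETINA that keeps all arrays in memory."""

    def __init__(self, counts, batchID, cluster):
        # Assumed layout: one row per cell, one column per gene.
        self.counts = np.asarray(counts)
        self.batchID = np.asarray(batchID)
        self.cluster = np.asarray(cluster)
        self.n_cells, self.n_genes = self.counts.shape

    def __len__(self):
        # Number of cells, so a data loader knows how many items exist.
        return self.n_cells

    def __getitem__(self, idx):
        # Assumed item format: (count vector, batch ID, cluster number).
        return self.counts[idx], self.batchID[idx], self.cluster[idx]


# Synthetic toy data: 5 cells, 3 genes.
rng = np.random.default_rng(0)
toy = InMemoryCounts(
    rng.poisson(1.0, size=(5, 3)),
    batchID=[0, 0, 1, 1, 1],
    cluster=[2, 0, 1, 1, 2],
)
```

Keeping everything in memory makes random access by index cheap, which suits shuffled mini-batch loading; the trade-off is that the whole data set must fit in RAM.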

Examples

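>>> from torch.utils.data import DataLoader  # assumed: PyTorch's DataLoader
>>> from dataset import data_prep
>>> retina = data_prep.RETINA(file_dir='retina.loom')  # path to your local copy
>>> batch_size = 128  # illustrative value
>>> dl = DataLoader(retina, batch_size=batch_size, shuffle=True)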

Methods