dataset.data_prep.RETINA
- class dataset.data_prep.RETINA(file_dir='/home/longlab/Data/Thesis/Data/retina.loom')
Bases: Dataset
Loads RETINA data set.
A class with the necessary pre-processing steps for the RETINA dataset. It separates the raw count data, the cluster numbers, and the batch IDs.
This class uses the version of the data provided by Lopez et al., in loompy format (http://linnarssonlab.org/loompy/index.html), which can be downloaded from: https://github.com/YosefLab/scVI-data/raw/master/retina.loom
The original dataset is available under GEO accession number GSE81904: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81904
The GEO page provides the raw matrix, BAM files, Rdata files, and R code. To perform the pre-processing steps and analyses of the original paper, one can use the following GitHub repository:
Parameters
- file_dir: str
Path to the .loom file, which should be provided by the user.
Attributes
- file_dir: str
Path to the .loom file (loom files are stored as HDF5).
- n_genes: int
Total number of genes in the data set.
- n_cells: int
Total number of cells.
- batchID: ndarray
One-dimensional array of the batch IDs of the cells.
- cluster: ndarray
One-dimensional array of the cluster numbers assigned by the authors of the original paper.
Notes
Because the data set is not very large (approximately 2 GB in memory), the class loads it entirely into memory.
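As a rough illustration of what loading everything into memory implies, the sketch below shows a hypothetical map-style dataset that keeps the count matrix and the per-cell labels as in-memory arrays and exposes the attributes listed above. The class name `InMemoryCounts`, the `counts` attribute, the (cells x genes) layout, and the tuple returned by `__getitem__` are assumptions made for the example; the actual internals of RETINA are not documented here. Synthetic data stands in for the .loom file.

```python
import numpy as np


class InMemoryCounts:
    """Hypothetical map-style dataset that holds every cell in memory."""

    def __init__(self, counts, batch_id, cluster):
        # counts is assumed to be laid out as (n_cells, n_genes).
        self.counts = np.asarray(counts)
        self.batchID = np.asarray(batch_id)
        self.cluster = np.asarray(cluster)
        self.n_cells, self.n_genes = self.counts.shape

    def __len__(self):
        # One item per cell.
        return self.n_cells

    def __getitem__(self, idx):
        # Plain array indexing; no disk access is needed per item.
        return self.counts[idx], self.batchID[idx], self.cluster[idx]


# Synthetic stand-in for the real data: 6 cells, 4 genes.
rng = np.random.default_rng(0)
toy = InMemoryCounts(
    counts=rng.poisson(2.0, size=(6, 4)),
    batch_id=np.array([0, 0, 1, 1, 0, 1]),
    cluster=np.array([2, 0, 1, 1, 2, 0]),
)
x, b, c = toy[3]  # counts, batch ID, and cluster of the fourth cell
```

Any object that implements `__len__` and `__getitem__` in this way can be handed to a batching loader; the trade-off is that memory use grows with the full matrix, which is acceptable at the size stated above.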
Examples
>>> from torch.utils.data import DataLoader  # assumes PyTorch
>>> import data_prep
>>> batch_size = 128  # example value
>>> retina = data_prep.RETINA()
>>> dl = DataLoader(retina, batch_size=batch_size, shuffle=True)
Methods