dataset.data_prep.CORTEX

class dataset.data_prep.CORTEX(file_dir='/home/longlab/Data/Thesis/Data/expression_mRNA_17-Aug-2014.txt', n_genes=558)

Bases: Dataset

Loads CORTEX dataset.

Applies the necessary pre-processing steps to the gold-standard Zeisel data set, which contains 3005 mouse cortex cells and gold-standard labels for seven distinct cell types. Each cell type corresponds to a cluster to recover.

The pre-processing steps are:

1. Extracting the cell-type labels from the data

2. Choosing the genes that are transcribed in more than 25 cells

3. Selecting the 558 genes with the highest variance among the genes remaining after the previous step

4. Performing a random permutation of the genes
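Steps 2 to 4 above can be sketched with NumPy as follows. This is a minimal illustration, not the class's actual implementation: the function name `select_genes`, the genes-by-cells matrix layout, the `seed` argument, and the reading of "transcribed" as "count greater than zero" are all assumptions. Step 1 (label extraction) depends on the layout of the input file and is left out.

```python
import numpy as np


def select_genes(expression, min_cells=25, n_genes=558, seed=0):
    """Hypothetical helper: filter, rank, and permute genes.

    expression: array of shape (n_all_genes, n_cells) holding counts.
    Returns the reduced matrix and the indices of the kept genes.
    """
    expression = np.asarray(expression, dtype=float)

    # Step 2: keep genes with a non-zero count in more than `min_cells` cells
    # (assumption: "transcribed" means count > 0).
    cells_expressing = (expression > 0).sum(axis=1)
    expressed_idx = np.flatnonzero(cells_expressing > min_cells)

    # Step 3: among the remaining genes, keep the `n_genes` with the
    # highest variance across cells.
    variances = expression[expressed_idx].var(axis=1)
    top = np.argsort(variances)[::-1][:n_genes]
    selected_idx = expressed_idx[top]

    # Step 4: randomly permute the selected genes.
    rng = np.random.default_rng(seed)
    selected_idx = rng.permutation(selected_idx)

    return expression[selected_idx], selected_idx


# Toy input: 50 genes x 100 cells; the first 10 genes are expressed
# in only 5 cells and are therefore dropped in step 2.
toy = np.random.default_rng(1).poisson(1.0, size=(50, 100))
toy[:10] = 0
toy[:10, :5] = 3
matrix, idx = select_genes(toy, min_cells=25, n_genes=20)
print(matrix.shape)
```

With the defaults (`min_cells=25`, `n_genes=558`) the thresholds match the ones listed in the steps above.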

Parameters

file_dir (str): The path to the .csv file.
n_genes (int): Number of highly variable genes to select.
n_cells: Total number of cells.
data (torch.Tensor): The data.
labels (torch.Tensor): The labels.

Examples

>>> import data_prep
>>> from torch.utils.data import DataLoader
>>> cortex = data_prep.CORTEX()
>>> dl = DataLoader(cortex, batch_size=128, shuffle=True)
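To show what iterating over such a loader looks like without the original data file, the sketch below uses a stand-in dataset filled with random values. `ToyCortex` is hypothetical; it only mirrors the documented `n_cells`, `data`, and `labels` attributes, and it assumes that indexing returns a `(data, label)` pair, which the source does not confirm for `CORTEX`.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyCortex(Dataset):
    """Hypothetical stand-in with the same attributes as CORTEX."""

    def __init__(self, n_cells=3005, n_genes=558, n_types=7):
        gen = torch.Generator().manual_seed(0)
        self.n_cells = n_cells
        # Random placeholders instead of real expression values and labels.
        self.data = torch.rand(n_cells, n_genes, generator=gen)
        self.labels = torch.randint(0, n_types, (n_cells,), generator=gen)

    def __len__(self):
        return self.n_cells

    def __getitem__(self, index):
        # Assumption: one item is a (cell expression vector, cell-type label) pair.
        return self.data[index], self.labels[index]


loader = DataLoader(ToyCortex(), batch_size=128, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([128, 558]) torch.Size([128])
print(len(loader))       # 24 batches: 23 full ones plus a final batch of 61 cells
```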

Methods