dataset.data_prep.CORTEX

class dataset.data_prep.CORTEX(file_dir='/home/longlab/Data/Thesis/Data/expression_mRNA_17-Aug-2014.txt', n_genes=558)

Bases: Dataset

Loads the CORTEX dataset.

Applies the pre-processing steps required for the gold-standard Zeisel data set, which contains 3005 mouse cortex cells with gold-standard labels for seven distinct cell types. Each cell type corresponds to one cluster to be recovered.

The pre-processing steps are:

  1. Extracting the cell-type labels from the data.

  2. Keeping only the genes that are transcribed in more than 25 cells.

  3. Selecting the n_genes (default 558) genes with the highest variance among the genes remaining after step 2.

  4. Randomly permuting the order of the selected genes.
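The four steps can be sketched with NumPy as below. This is a hypothetical illustration, not the CORTEX implementation: it assumes a count matrix with genes as rows and cells as columns, treats "transcribed" as a non-zero count, and uses a seeded generator for the permutation. The function name `preprocess_counts` and its `min_cells` and `seed` parameters are invented for the example.

```python
import numpy as np


def preprocess_counts(counts, cell_types, n_genes=558, min_cells=25, seed=0):
    """Hypothetical sketch of the four CORTEX pre-processing steps.

    counts: array of shape (n_genes_total, n_cells); genes as rows is an assumption.
    cell_types: sequence of n_cells cell-type names.
    Returns (cells x genes matrix, integer labels, class names).
    """
    # Step 1: map cell-type names to integer labels 0..K-1 (classes sorted).
    classes, labels = np.unique(np.asarray(cell_types), return_inverse=True)

    # Step 2: keep genes with a non-zero count in more than `min_cells` cells.
    expressed = (counts > 0).sum(axis=1) > min_cells
    counts = counts[expressed]

    # Step 3: keep the `n_genes` genes with the highest variance across cells.
    variances = counts.var(axis=1)
    top = np.argsort(variances)[::-1][:n_genes]
    counts = counts[top]

    # Step 4: randomly permute the order of the selected genes.
    rng = np.random.default_rng(seed)
    counts = counts[rng.permutation(counts.shape[0])]

    # Transpose so that each row is one cell, the layout a Dataset indexes by.
    return counts.T, labels, classes
```

Filtering before ranking by variance means a gene detected in only a handful of cells cannot be selected merely because a few large counts inflate its variance.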

Parameters

file_dir : str

Path to the expression data file.

n_genes : int

Number of highly variable genes to select.

Attributes

n_cells : int

Total number of cells.

data : torch.Tensor

The pre-processed expression data.

labels : torch.Tensor

The cell-type labels.
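These attributes suggest how a map-style Dataset serves individual cells to a DataLoader. The following is a minimal sketch, assuming that `__len__` returns `n_cells` and that `__getitem__` returns one `(expression vector, label)` pair; the class name `ExpressionDataset` is hypothetical, and the actual CORTEX methods may differ.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ExpressionDataset(Dataset):
    """Hypothetical map-style dataset mirroring the documented attributes."""

    def __init__(self, data, labels):
        # Rows are cells and columns are genes (assumed layout).
        self.data = torch.as_tensor(data, dtype=torch.float32)
        self.labels = torch.as_tensor(labels, dtype=torch.long)
        self.n_cells = self.data.shape[0]

    def __len__(self):
        # DataLoader samplers use this to know how many cells exist.
        return self.n_cells

    def __getitem__(self, idx):
        # Assumed return value: one cell's expression vector and its label.
        return self.data[idx], self.labels[idx]


# Usage with random data standing in for the real expression matrix.
ds = ExpressionDataset(torch.rand(100, 558), torch.randint(0, 7, (100,)))
loader = DataLoader(ds, batch_size=32, shuffle=True)
x, y = next(iter(loader))  # x: (32, 558), y: (32,)
```

The default collate function stacks the per-cell tensors, so each batch arrives as a (batch_size, n_genes) matrix plus a label vector.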

Examples

>>> from torch.utils.data import DataLoader
>>> from dataset.data_prep import CORTEX
>>> cortex = CORTEX()
>>> dl = DataLoader(cortex, batch_size=128, shuffle=True)

Methods