utils.helper.entropy_batch_mixing

utils.helper.entropy_batch_mixing(latent_space, batches, K=50, n_jobs=8, n=100, n_iter=50)

Adapted from:

1) Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat Biotechnol. 2018 Jun;36(5):421-427. doi: 10.1038/nbt.4091.

2) Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018 Dec;15(12):1053-1058. doi: 10.1038/s41592-018-0229-2.

This function randomly chooses n cells, finds the K nearest neighbors of each chosen cell in the latent space, and calculates the average regional entropy of the batch labels over all n cells.

The procedure is repeated n_iter times, and the mean over all iterations is returned as the batch mixing score.
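The procedure described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the source of `utils.helper.entropy_batch_mixing`: the name `entropy_batch_mixing_sketch`, the `seed` argument, the use of scikit-learn's `NearestNeighbors`, and the base-2 Shannon entropy are all assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def entropy_batch_mixing_sketch(latent_space, batches, K=50, n_jobs=8,
                                n=100, n_iter=50, seed=0):
    """Illustrative sketch only; `seed` is a hypothetical extra argument."""
    latent_space = np.asarray(latent_space)
    batches = np.asarray(batches)
    rng = np.random.default_rng(seed)

    # Fit the neighbor index once. K + 1 neighbors are requested because a
    # query cell is normally returned as its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=K + 1, n_jobs=n_jobs).fit(latent_space)

    def neighborhood_entropy(labels):
        # Shannon entropy (base 2, an assumption) of the batch frequencies.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    iteration_scores = []
    for _ in range(n_iter):
        # Choose n distinct cells at random (requires n <= number of cells).
        chosen = rng.choice(latent_space.shape[0], size=n, replace=False)
        # Drop the first column, which is assumed to be the query cell itself.
        neighbors = nn.kneighbors(latent_space[chosen],
                                  return_distance=False)[:, 1:]
        entropies = [neighborhood_entropy(batches[idx]) for idx in neighbors]
        iteration_scores.append(np.mean(entropies))

    # Average over iterations gives the final score.
    return float(np.mean(iteration_scores))


# Synthetic data: two batches of 200 cells in a 10-dimensional latent space.
data_rng = np.random.default_rng(1)
batches = np.repeat([0, 1], 200)

# Batch labels unrelated to position: neighborhoods contain both batches.
mixed = data_rng.normal(size=(400, 10))
score_mixed = entropy_batch_mixing_sketch(mixed, batches, K=20, n_jobs=1,
                                          n=50, n_iter=5)

# Second batch shifted far away: neighborhoods contain a single batch.
separated = mixed.copy()
separated[batches == 1] += 100.0
score_separated = entropy_batch_mixing_sketch(separated, batches, K=20,
                                              n_jobs=1, n=50, n_iter=5)
```

In this sketch, the well-mixed data scores close to the two-batch maximum while the separated data scores close to zero, which matches the "higher is better" reading of the returned value.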

Parameters

latent_space : numpy ndarray

The latent space matrix.

batches : numpy array or list

The batch number of each sample in the latent space matrix.

K : int

Number of nearest neighbors.

n_jobs : int

Number of parallel jobs. See the scikit-learn documentation for details.

n : int

Number of cells to be chosen randomly.

n_iter : int

Number of iterations, each of which randomly chooses n cells.

Returns

score : float <= 1

The batch mixing score; the higher, the better.
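To give a feel for the range of the score, the following minimal sketch computes the entropy of the batch labels within a single neighborhood. It assumes two batches and a base-2 logarithm, in which case the entropy peaks at exactly 1; the logarithm base used by the function itself is not stated here, and `batch_entropy_bits` is a hypothetical helper, not part of `utils.helper`.

```python
import numpy as np


def batch_entropy_bits(labels):
    # Shannon entropy (base 2) of the batch frequencies in one neighborhood.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# 50 neighbors split evenly between two batches: maximal mixing.
balanced = batch_entropy_bits([0, 1] * 25)

# 45 neighbors from one batch and 5 from the other: partial mixing.
skewed = batch_entropy_bits([0] * 45 + [1] * 5)

# All 50 neighbors from the same batch: no mixing.
single = batch_entropy_bits([0] * 50)
```

Under these assumptions, `balanced` is the largest of the three values and `single` the smallest, so a higher average over many neighborhoods indicates better batch mixing.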