utils.helper.corrupting

utils.helper.corrupting(data, p=0.1, method='Uniform', percentage=0.1)

Adapted from “Deep Generative modeling for transcriptomics data”.

Corrupts a dataset (by adding noise or dropouts) for imputation benchmarking.

Two approaches to data corruption are available:

1. Uniform zero introduction: randomly select a percentage of the nonzero entries and multiply each selected entry n by a Ber(0.9) random variable.
2. Binomial data corruption: randomly select a percentage of the matrix and replace each selected entry n with a Bin(n, 0.2) random variable.
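The two approaches above can be sketched with NumPy as follows. This is a minimal illustration, not the library's source: the name corrupt_sketch, the seed argument, and the return order are hypothetical, and the sketch assumes that both methods draw their targets from the nonzero entries and that p is the success probability passed to the Bernoulli or Binomial draw.

```python
import numpy as np


def corrupt_sketch(data, p=0.1, method="Uniform", percentage=0.1, seed=None):
    """Hypothetical sketch of the two corruption schemes (not the library code)."""
    rng = np.random.default_rng(seed)
    data_c = data.copy()

    # Candidate positions: row/column indices of the nonzero entries.
    x, y = np.nonzero(data)

    # Select a fixed share of the candidates, without replacement.
    n_pick = int(np.floor(percentage * len(x)))
    ind = rng.choice(len(x), size=n_pick, replace=False)
    rows, cols = x[ind], y[ind]

    if method == "Uniform":
        # Multiply each selected entry by a Bernoulli(p) draw:
        # the entry is kept with probability p and zeroed otherwise.
        keep = rng.binomial(1, p, size=n_pick)
        data_c[rows, cols] = data[rows, cols] * keep
    elif method == "Binomial":
        # Replace each selected entry n with a Binomial(n, p) draw.
        n = data[rows, cols].astype(np.int64)
        data_c[rows, cols] = rng.binomial(n, p)
    else:
        raise ValueError("method must be 'Uniform' or 'Binomial'")

    return data_c, x, y, ind


# Example on synthetic count data (values are illustrative only).
counts = np.random.default_rng(0).poisson(5, size=(50, 40))
uniform_c, x, y, ind = corrupt_sketch(counts, p=0.9, method="Uniform", seed=1)
binomial_c, _, _, _ = corrupt_sketch(counts, p=0.2, method="Binomial", seed=1)
```

Passing p=0.9 and p=0.2 explicitly reproduces the Ber(0.9) and Bin(n, 0.2) settings named in the description; entries outside the selected positions are left unchanged in both cases.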

Parameters

data: numpy ndarray: The data.
p: float >= 0 and <= 1: The probability of success in the Bernoulli or Binomial distribution.
method: str: Specifies the method of data corruption, one of two options: “Uniform” or “Binomial”.
percentage: float > 0 and < 1: The percentage of non-zero elements to be selected for corruption.

Returns

data_c: numpy ndarray: The corrupted data.
x, y, ind: int: The indices at which corruption is applied.
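For imputation benchmarking, the returned indices make it possible to score an imputed matrix only at the corrupted positions. The sketch below is one possible way to do that, under assumptions: the helper name imputation_error and the choice of median absolute error are illustrative, and the function takes the row and column indices of the corrupted entries directly, since how they are derived from x, y and ind is not specified here.

```python
import numpy as np


def imputation_error(original, imputed, rows, cols):
    """Median absolute error over the corrupted positions only (hypothetical helper)."""
    diff = np.abs(original[rows, cols] - imputed[rows, cols])
    return float(np.median(diff))


# Toy example: two corrupted positions, (0, 1) and (1, 1).
original = np.array([[1.0, 2.0], [3.0, 4.0]])
imputed = np.array([[1.0, 0.0], [3.0, 5.0]])
rows = np.array([0, 1])
cols = np.array([1, 1])

error = imputation_error(original, imputed, rows, cols)
```

Restricting the score to the corrupted positions keeps the untouched entries, which any method reproduces trivially, from diluting the comparison.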