utils.helper.corrupting

utils.helper.corrupting(data, p=0.1, method='Uniform', percentage=0.1)

Adapted from “Deep Generative modeling for transcriptomics data”.

Corrupts a dataset (by adding noise or dropouts) for imputation benchmarking.

Two different approaches for data corruption:

1. Uniform zero introduction: Randomly select a percentage of the nonzero entries and multiply each selected entry n by a Ber(0.9) random variable.

2. Binomial data corruption: Randomly select a percentage of the matrix entries and replace each selected entry n with a Bin(n, 0.2) random variable.
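The two approaches above can be sketched with NumPy as follows. This is a minimal illustration, not the library's actual implementation: the function name `corrupt_sketch`, the `seed` argument, and the choice to use `p` as the success probability in both branches are assumptions. Under these assumptions, `p=0.9` with “Uniform” corresponds to Ber(0.9), and `p=0.2` with “Binomial” corresponds to Bin(n, 0.2).

```python
import numpy as np


def corrupt_sketch(data, p=0.1, method="Uniform", percentage=0.1, seed=0):
    """Illustrative sketch only (hypothetical helper, not utils.helper.corrupting)."""
    rng = np.random.default_rng(seed)
    # Work on an integer copy so the input is left untouched.
    data_c = np.array(data, dtype=np.int64, copy=True)

    if method == "Uniform":
        # Candidates are the nonzero entries only.
        x, y = np.nonzero(data_c)
    elif method == "Binomial":
        # Candidates are all entries of the matrix.
        x, y = np.indices(data_c.shape).reshape(2, -1)
    else:
        raise ValueError("method must be 'Uniform' or 'Binomial'")

    # Pick a fraction of the candidates without replacement.
    n_pick = int(np.floor(percentage * x.size))
    ind = rng.choice(x.size, size=n_pick, replace=False)
    rows, cols = x[ind], y[ind]

    if method == "Uniform":
        # Multiply each selected entry by a Bernoulli(p) draw (0 or 1).
        keep = rng.binomial(1, p, size=n_pick)
        data_c[rows, cols] = data_c[rows, cols] * keep
    else:
        # Replace each selected entry n with a Binomial(n, p) draw.
        data_c[rows, cols] = rng.binomial(data_c[rows, cols], p)

    return data_c, x, y, ind
```

In this sketch, `x` and `y` hold the coordinates of all candidate entries and `ind` selects the corrupted ones among them; the real function may return its indices in a different form.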

Parameters

data: numpy ndarray

The data.

p: float >= 0 and <= 1

The probability of success in the Bernoulli or Binomial distribution.

method: str

Specifies the method of data corruption. One of two options: “Uniform” or “Binomial”.

percentage: float > 0 and < 1

The percentage, expressed as a fraction, of candidate entries to be selected for corruption (non-zero entries for “Uniform”, all matrix entries for “Binomial”).

Returns

data_c: numpy ndarray

The corrupted data.

x, y, ind: int

The indices where corruption is applied.
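For imputation benchmarking, the returned indices can be used to score an imputed matrix only at the corrupted positions. The sketch below is an assumption-laden illustration: the helper `median_abs_error` is hypothetical, and it assumes that `x` and `y` are coordinate arrays of candidate entries and that `ind` selects the corrupted ones among them. The actual layout of the returned indices may differ.

```python
import numpy as np


def median_abs_error(original, imputed, x, y, ind):
    """Median absolute error at the corrupted positions (hypothetical helper)."""
    # Assumed layout: x, y are candidate coordinates, ind picks the corrupted ones.
    rows, cols = np.asarray(x)[ind], np.asarray(y)[ind]
    diff = np.abs(original[rows, cols] - imputed[rows, cols])
    return float(np.median(diff))


# Toy data standing in for a real dataset and a real imputation result.
original = np.array([[3.0, 0.0, 5.0],
                     [1.0, 2.0, 0.0]])
x, y = np.nonzero(original)   # candidate (nonzero) coordinates
ind = np.array([0, 3])        # pretend entries (0, 0) and (1, 1) were corrupted

imputed = original.copy()
imputed[0, 0] = 2.0           # imputed value for the true value 3.0
imputed[1, 1] = 2.5           # imputed value for the true value 2.0

err = median_abs_error(original, imputed, x, y, ind)
```

Restricting the error to the corrupted entries keeps untouched values, which any method reproduces trivially, from diluting the score.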