utils.helper.corrupting
- utils.helper.corrupting(data, p=0.1, method='Uniform', percentage=0.1)
Adapted from “Deep Generative modeling for transcriptomics data”.
This function corrupts a dataset (by adding noise or dropouts) for imputation benchmarking.
- Two approaches to data corruption are available:
1. Uniform zero introduction: randomly select a percentage of the nonzero entries and multiply each selected entry n by a Ber(0.9) random variable.
2. Binomial data corruption: randomly select a percentage of the matrix and replace each selected entry n with a Bin(n, 0.2) random variable.
The success probability shown in these descriptions (0.9 or 0.2) is controlled by the p parameter.
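The two schemes can be sketched with NumPy as follows. This is a minimal illustration, not the library's source: the name `corrupt_sketch`, the `seed` argument, the order of the returned values, and the integer cast (count data is assumed) are all assumptions. For simplicity the sketch draws the selected entries from the nonzero positions in both modes, whereas the description above says the Binomial scheme selects from the whole matrix.

```python
import numpy as np

def corrupt_sketch(data, p=0.1, method="Uniform", percentage=0.1, seed=0):
    """Hypothetical sketch of the two corruption schemes (not the library code)."""
    rng = np.random.default_rng(seed)
    # Work on an integer copy so the input is left untouched (assumes count data).
    data_c = np.array(data, dtype=np.int64, copy=True)

    # Candidate entries: coordinates of all nonzero values.
    x, y = np.nonzero(data_c)

    # Select a fraction of the candidates, without replacement.
    n_pick = int(np.floor(percentage * x.size))
    ind = rng.choice(x.size, size=n_pick, replace=False)
    rows, cols = x[ind], y[ind]

    if method == "Uniform":
        # Multiply each selected entry by an independent Bernoulli(p) draw,
        # so an entry survives with probability p and becomes 0 otherwise.
        data_c[rows, cols] *= rng.binomial(1, p, size=n_pick)
    elif method == "Binomial":
        # Replace each selected entry n with a Binomial(n, p) draw.
        data_c[rows, cols] = rng.binomial(data_c[rows, cols], p)
    else:
        raise ValueError("method must be 'Uniform' or 'Binomial'")

    return data_c, x, y, ind
```

In this sketch, `x[ind]` and `y[ind]` give the row and column coordinates of the corrupted entries, and every other entry of `data_c` equals the corresponding entry of `data`.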
Parameters
- data: numpy ndarray
The data.
- p: float >= 0 and <= 1
The probability of success in Bernoulli or Binomial distribution.
- method: str
Specifies the method of data corruption, either “Uniform” or “Binomial”.
- percentage: float > 0 and < 1
The percentage of non-zero elements to be selected for corruption.
Returns
- data_c: numpy ndarray
The corrupted data.
- x, y, ind: int
The indices of where corruption is applied.
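For imputation benchmarking, the returned indices are typically used to score an imputed matrix only at the corrupted positions. The sketch below assumes that `x` and `y` hold the row and column coordinates of the candidate entries and that `ind` selects the corrupted ones among them; this layout is an assumption, not something the documentation above states, and `corrupted_entry_error` is a hypothetical helper.

```python
import numpy as np

def corrupted_entry_error(original, imputed, x, y, ind):
    """Mean absolute error over the corrupted entries only (hypothetical helper)."""
    # Assumed layout: (x[ind], y[ind]) are the coordinates that were corrupted.
    rows, cols = x[ind], y[ind]
    diff = original[rows, cols].astype(float) - imputed[rows, cols].astype(float)
    return float(np.mean(np.abs(diff)))

# Toy example with hand-built inputs.
original = np.array([[1, 0, 3],
                     [4, 5, 0]])
x, y = np.nonzero(original)   # coordinates of the nonzero entries
ind = np.array([1, 3])        # pretend entries (0, 2) and (1, 1) were corrupted
imputed = original.astype(float)
imputed[0, 2] = 2.0           # imputation is off by 1 at (0, 2)
error = corrupted_entry_error(original, imputed, x, y, ind)
```

Restricting the error to the corrupted entries keeps untouched values, which any method reproduces trivially, from diluting the score.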