`R/calculate_diff_abundance.R`

`calculate_diff_abundance.Rd`

Performs differential abundance calculations and statistical hypothesis tests on data frames with protein, peptide or precursor data. Different methods for statistical testing are available.

    calculate_diff_abundance(
      data,
      sample,
      condition,
      grouping,
      intensity_log2,
      missingness = missingness,
      comparison = comparison,
      mean = NULL,
      sd = NULL,
      n_samples = NULL,
      ref_condition = "all",
      filter_NA_missingness = TRUE,
      method = c("moderated_t-test", "t-test", "t-test_mean_sd", "proDA"),
      p_adj_method = "BH",
      retain_columns = NULL
    )

data | a data frame containing at least the input variables that are required for the
selected method. Ideally the output of |
---|---|

sample | a character column in the |

condition | a character or numeric column in the |

grouping | a character column in the |

intensity_log2 | a numeric column in the |

missingness | a character column in the |

comparison | a character column in the |

mean | a numeric column in the |

sd | a numeric column in the |

n_samples | a numeric column in the |

ref_condition | optional, character value providing the condition that is used as a
reference for differential abundance calculation. Only required for |

filter_NA_missingness | a logical value, default is `TRUE`. |

method | a character value, specifies the method used for statistical hypothesis testing.
Methods include Welch test ( |

p_adj_method | a character value, specifies the p-value correction method. Possible methods are `"holm"`, `"hochberg"`, `"hommel"`, `"bonferroni"`, `"BH"`, `"BY"`, `"fdr"` and `"none"`. Default method is `"BH"`. |

retain_columns | a vector indicating if certain columns should be retained from the input data frame. Default is not retaining additional columns (`retain_columns = NULL`). |
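
The method names listed for `p_adj_method` are the same ones accepted by base R's `stats::p.adjust()`. As a minimal sketch of what the correction does, the example below applies two of them to a small vector of made-up p-values. The numbers are purely illustrative, and whether `calculate_diff_abundance()` calls `p.adjust()` internally is an assumption, not something this page states.

```r
# Hypothetical raw p-values for four peptides (illustrative numbers only)
pval <- c(0.01, 0.04, 0.03, 0.2)

# Benjamini-Hochberg adjustment, the method named by the default "BH"
adj_bh <- p.adjust(pval, method = "BH")

# Bonferroni adjustment for comparison; it is more conservative
adj_bonferroni <- p.adjust(pval, method = "bonferroni")

# Show raw and adjusted values side by side
print(data.frame(pval, adj_bh, adj_bonferroni))
```

Adjusted values are never smaller than the raw p-values, and Bonferroni is never smaller than BH, which is why BH is the usual choice when many proteins, peptides or precursors are tested at once.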

A data frame that contains differential abundances (`diff`), p-values (`pval`) and adjusted p-values (`adj_pval`) for each protein, peptide or precursor (depending on the `grouping` variable) and the associated treatment/reference pair. Depending on the method the data frame contains additional columns:

- "t-test": The `std_error` column contains the standard error of the differential abundances. `n_obs` contains the number of observations for the specific protein, peptide or precursor (depending on the `grouping` variable) and the associated treatment/reference pair.

- "t-test_mean_sd": Columns labeled as control refer to the second condition of the comparison pairs. Treated refers to the first condition. The `mean_control` and `mean_treated` columns contain the means for the reference and treatment condition, respectively. The `sd_control` and `sd_treated` columns contain the standard deviations for the reference and treatment condition, respectively. The `n_control` and `n_treated` columns contain the numbers of samples for the reference and treatment condition, respectively. The `std_error` column contains the standard error of the differential abundances. `t_statistic` contains the t-statistic for the t-test.

- "moderated_t-test": `CI_2.5` and `CI_97.5` contain the 2.5% and 97.5% confidence interval borders for differential abundances. `avg_abundance` contains average abundances for treatment/reference pairs (mean of the two group means). `t_statistic` contains the t-statistic for the t-test. `B` contains the B-statistic, which is the log-odds that the protein, peptide or precursor (depending on `grouping`) has a differential abundance between the two groups. Suppose B = 1.5. The odds of differential abundance are exp(1.5) = 4.48, i.e., about four and a half to one. The probability that there is a differential abundance is 4.48 / (1 + 4.48) = 0.82, i.e., the probability is about 82% that this group is differentially abundant. A B-statistic of zero corresponds to a 50-50 chance that the group is differentially abundant. `n_obs` contains the number of observations for the specific protein, peptide or precursor (depending on the `grouping` variable) and the associated treatment/reference pair.

- "proDA": The `std_error` column contains the standard error of the differential abundances. `avg_abundance` contains average abundances for treatment/reference pairs (mean of the two group means). `t_statistic` contains the t-statistic for the t-test. `n_obs` contains the number of observations for the specific protein, peptide or precursor (depending on the `grouping` variable) and the associated treatment/reference pair.
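
The B-statistic arithmetic described for the "moderated_t-test" output can be reproduced in a few lines of base R. This is only a sketch of the log-odds to probability conversion; the helper name `b_to_probability()` is hypothetical and not part of the package.

```r
# Hypothetical helper: convert a B-statistic (log-odds) into a probability
b_to_probability <- function(b) {
  odds <- exp(b)     # log-odds -> odds
  odds / (1 + odds)  # odds -> probability
}

b <- 1.5
odds <- exp(b)
prob <- b_to_probability(b)

round(odds, 2) # 4.48
round(prob, 2) # 0.82

# A B-statistic of zero corresponds to a 50-50 chance
b_to_probability(0) # 0.5

# The same conversion is available in base R as the logistic function
all.equal(prob, plogis(b)) # TRUE
```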

    set.seed(123) # Makes example reproducible

    # Create synthetic data
    data <- create_synthetic_data(
      n_proteins = 10,
      frac_change = 0.5,
      n_replicates = 4,
      n_conditions = 2,
      method = "effect_random",
      additional_metadata = FALSE
    )

    # Assign missingness information
    data_missing <- assign_missingness(
      data,
      sample = sample,
      condition = condition,
      grouping = peptide,
      intensity = peptide_intensity_missing,
      ref_condition = "all",
      retain_columns = c(protein, change_peptide)
    )

    # Calculate differential abundances
    # Using "moderated_t-test" and "proDA" improves
    # true positive recovery progressively
    diff <- calculate_diff_abundance(
      data = data_missing,
      sample = sample,
      condition = condition,
      grouping = peptide,
      intensity_log2 = peptide_intensity_missing,
      missingness = missingness,
      comparison = comparison,
      method = "t-test",
      retain_columns = c(protein, change_peptide)
    )
    #> # A tibble: 10 × 10
    #>    protein   change_peptide comparison    peptide  missingness    pval std_error
    #>    <chr>     <lgl>          <chr>         <chr>    <chr>         <dbl>     <dbl>
    #>  1 protein_5 TRUE           condition_1_… peptide… complete    9.38e-9   0.0557
    #>  2 protein_3 TRUE           condition_1_… peptide… complete    7.01e-7   0.0919
    #>  3 protein_1 TRUE           condition_1_… peptide… complete    6.01e-6   0.0670
    #>  4 protein_1 FALSE          condition_1_… peptide… MAR         5.12e-2   0.00809
    #>  5 protein_4 TRUE           condition_1_… peptide… MAR         6.66e-2   0.308
    #>  6 protein_2 FALSE          condition_1_… peptide… complete    7.77e-2   0.275
    #>  7 protein_9 FALSE          condition_1_… peptide… MAR         1.89e-1   0.486
    #>  8 protein_3 FALSE          condition_1_… peptide… complete    2.23e-1   0.0752
    #>  9 protein_1 FALSE          condition_1_… peptide… MAR         2.23e-1   0.0466
    #> 10 protein_3 FALSE          condition_1_… peptide… complete    2.71e-1   0.0843
    #> # … with 3 more variables: diff <dbl>, adj_pval <dbl>, n_obs <int>
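
To make the relationship between the `diff`, `std_error`, `t_statistic` and `pval` columns more concrete, the sketch below computes a Welch test for a single hypothetical peptide from group means, standard deviations and sample sizes, and checks the result against base R's `t.test()`. The helper name `welch_from_summary()`, the simulated intensities, and the treated-minus-control direction of the difference are assumptions made for illustration. This is not the package's internal implementation.

```r
set.seed(1)

# Simulated log2 intensities for one peptide (illustrative values only)
treated <- rnorm(4, mean = 21, sd = 0.5)
control <- rnorm(4, mean = 20, sd = 0.5)

# Hypothetical helper: Welch test from summary statistics
welch_from_summary <- function(mean_treated, mean_control,
                               sd_treated, sd_control,
                               n_treated, n_control) {
  # Squared standard errors of the two group means
  se2_treated <- sd_treated^2 / n_treated
  se2_control <- sd_control^2 / n_control

  diff <- mean_treated - mean_control # assumed direction: treated - control
  std_error <- sqrt(se2_treated + se2_control)
  t_statistic <- diff / std_error

  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (se2_treated + se2_control)^2 /
    (se2_treated^2 / (n_treated - 1) + se2_control^2 / (n_control - 1))

  # Two-sided p-value from the t distribution
  pval <- 2 * pt(-abs(t_statistic), df)

  list(diff = diff, std_error = std_error,
       t_statistic = t_statistic, df = df, pval = pval)
}

res <- welch_from_summary(
  mean(treated), mean(control),
  sd(treated), sd(control),
  length(treated), length(control)
)

# Reference result from base R (var.equal = FALSE gives the Welch test)
ref <- t.test(treated, control, var.equal = FALSE)

all.equal(res$pval, ref$p.value)
all.equal(res$t_statistic, unname(ref$statistic))
```

Working from summary statistics is convenient when only means, standard deviations and sample counts are available, which is the situation the `mean`, `sd` and `n_samples` arguments appear to address.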