Title: | Compositional Data Analysis for Microbiome Studies |
---|---|
Description: | Functions for microbiome data analysis that take into account its compositional nature. Performs variable selection through penalized regression for both, cross-sectional and longitudinal studies, and for binary and continuous outcomes. |
Authors: | Malu Calle [aut] |
Maintainer: | Toni Susin <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.4 |
Built: | 2025-02-13 04:13:30 UTC |
Source: | https://github.com/cran/coda4microbiome |
Microbial signatures in survival studies The algorithm performs variable selection through an elastic-net penalized Cox regression conveniently adapted to CoDA. The result is expressed as the (weighted) balance between two groups of taxa. It allows the use of non-compositional covariates.
coda_coxnet( x, time, status, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0 )
coda_coxnet( x, time, status, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0 )
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
time |
time to event or follow up time for right censored data. Must be a numericvector. |
status |
event occurrence. Vector (type: numeric or logical) specifying 0, or FALSE, for no event occurrence, and 1, or TRUE, for event occurrence. |
covar |
data frame with covariates (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
nvar |
number of variables to use in the glmnet.fit function (default = NULL) |
alpha |
elastic net parameter (default = 0.9) |
nfolds |
number of folds |
showPlots |
if TRUE, shows the plots (default = TRUE) |
coef_threshold |
coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) |
list with "taxa.num","taxa.name","log-contrast coefficients","risk.score","apparent Cindex","mean cv-Cindex","sd cv-Cindex","risk score plot","signature plot".
M. Calle, M. Pujolassos, T. Susin
data(data_survival, package = "coda4microbiome") time <- Event_time status <- Event set.seed(12345) coda_coxnet(x = x, time = Event_time, status = Event, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0)
data(data_survival, package = "coda4microbiome") time <- Event_time status <- Event set.seed(12345) coda_coxnet(x = x, time = Event_time, status = Event, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0)
Microbial signatures in cross-sectional studies. The algorithm performs variable selection through penalized regression on the set of all pairwise log-ratios. The result is expressed as the (weighted) balance between two groups of taxa. It allows the use of non-compositional covariates.
coda_glmnet( x, y, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0 )
coda_glmnet( x, y, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0 )
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
y |
outcome (binary or continuous); data type: numeric, character or factor vector |
covar |
data frame with covariates (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
nvar |
number of variables to use in the glmnet.fit function (default = NULL) |
alpha |
elastic net parameter (default = 0.9) |
nfolds |
number of folds |
showPlots |
if TRUE, shows the plots (default = TRUE) |
coef_threshold |
coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) |
if y is binary: list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent AUC","mean cv-AUC","sd cv-AUC","predictions plot","signature plot" if not:list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent Rsq","mean cv-MSE","sd cv-MSE","predictions plot","signature plot"
M. Calle - T. Susin
data(Crohn, package = "coda4microbiome") set.seed(123) coda_glmnet(x_Crohn[,(1:10)],y_Crohn,showPlots= FALSE)
data(Crohn, package = "coda4microbiome") set.seed(123) coda_glmnet(x_Crohn[,(1:10)],y_Crohn,showPlots= FALSE)
Microbial signatures in longitudinal studies. Identification of a set of microbial taxa whose joint dynamics is associated with the phenotype of interest. The algorithm performs variable selection through penalized regression over the summary of the log-ratio trajectories (AUC). The result is expressed as the (weighted) balance between two groups of taxa.
coda_glmnet_longitudinal( x, y, x_time, subject_id, ini_time, end_time, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0 )
coda_glmnet_longitudinal( x, y, x_time, subject_id, ini_time, end_time, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0 )
x |
abundance matrix or data frame in long format (several rows per individual) |
y |
outcome (binary); data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
covar |
data frame with covariates (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
nvar |
number of variables to use in the glmnet.fit function (default = NULL) |
alpha |
elastic net parameter (default = 0.9) |
nfolds |
number of folds (default = 10) |
showPlots |
if TRUE, shows the plots (default = FALSE) |
coef_threshold |
coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) |
in case of binary outcome: list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent AUC","mean cv-AUC","sd cv-AUC","predictions plot","signature plot","trajectories plot"
M. Calle - T. Susin
data(ecam_filtered, package = "coda4microbiome") # load the data ecam_results<-coda_glmnet_longitudinal (x=x_ecam[,(1:4)],y= metadata$diet, x_time= metadata$day_of_life, subject_id = metadata$studyid, ini_time=0, end_time=60,lambda="lambda.min",nfolds=4, showPlots=FALSE) ecam_results$taxa.num
data(ecam_filtered, package = "coda4microbiome") # load the data ecam_results<-coda_glmnet_longitudinal (x=x_ecam[,(1:4)],y= metadata$diet, x_time= metadata$day_of_life, subject_id = metadata$studyid, ini_time=0, end_time=60,lambda="lambda.min",nfolds=4, showPlots=FALSE) ecam_results$taxa.num
Performs a permutational test for the coda_glmnet_longitudinal() algorithm: It provides the distribution of results under the null hypothesis by implementing the coda_glmnet_longitudinal() on different rearrangements of the response variable.
coda_glmnet_longitudinal_null( x, y, x_time, subject_id, ini_time, end_time, niter = 100, covar = NULL, alpha = 0.9, lambda = "lambda.1se", nfolds = 10, sig = 0.05 )
coda_glmnet_longitudinal_null( x, y, x_time, subject_id, ini_time, end_time, niter = 100, covar = NULL, alpha = 0.9, lambda = "lambda.1se", nfolds = 10, sig = 0.05 )
x |
abundance matrix or data frame in long format (several rows per individual) |
y |
outcome (binary); data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
niter |
number of sample iterations |
covar |
data frame with covariates (default = NULL) |
alpha |
elastic net parameter (default = 0.9) |
lambda |
penalization parameter (default = "lambda.1se") |
nfolds |
number of folds |
sig |
significance value (default = 0.05) |
list with "accuracy" and "confidence interval"
M. Calle - T. Susin
set.seed(123) # to reproduce the results data(ecam_filtered, package = "coda4microbiome") # load the data x=x_ecam # microbiome abundance x_time = metadata$day_of_life # observation times subject_id = metadata$studyid # subject id y= metadata$diet # diet ("bd"= breast diet, "fd"=formula diet) ini_time = 0 end_time = 90 coda_glmnet_longitudinal_null (x,y, x_time, subject_id, ini_time, end_time, lambda="lambda.min",nfolds=4, niter=3)
set.seed(123) # to reproduce the results data(ecam_filtered, package = "coda4microbiome") # load the data x=x_ecam # microbiome abundance x_time = metadata$day_of_life # observation times subject_id = metadata$studyid # subject id y= metadata$diet # diet ("bd"= breast diet, "fd"=formula diet) ini_time = 0 end_time = 90 coda_glmnet_longitudinal_null (x,y, x_time, subject_id, ini_time, end_time, lambda="lambda.min",nfolds=4, niter=3)
internal function
coda_glmnet_longitudinal0( x, lrX, idlrX, nameslrX, y, x_time, subject_id, ini_time, end_time, covar = NULL, ktop = NULL, lambda = "lambda.1se", alpha = 0.9, nfolds = 10 )
coda_glmnet_longitudinal0( x, lrX, idlrX, nameslrX, y, x_time, subject_id, ini_time, end_time, covar = NULL, ktop = NULL, lambda = "lambda.1se", alpha = 0.9, nfolds = 10 )
x |
abundance matrix or data frame in long format (several rows per individual) |
lrX |
log-ratio matrix |
idlrX |
indices table in the log-ratio matrix |
nameslrX |
colnames of the log-ratio matrix |
y |
outcome (binary); data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
covar |
data frame with covariates (default = NULL) |
ktop |
given number of selected taxa or compute the best number in case it is NULL (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
alpha |
elastic net parameter (default = 0.9) |
nfolds |
number of folds |
.
M. Calle - T. Susin
Performs a permutational test for the coda_glmnet() algorithm: It provides the distribution of results under the null hypothesis by implementing the coda_glmnet() on different rearrangements of the response variable.
coda_glmnet_null( x, y, niter = 100, covar = NULL, lambda = "lambda.1se", alpha = 0.9, sig = 0.05 )
coda_glmnet_null( x, y, niter = 100, covar = NULL, lambda = "lambda.1se", alpha = 0.9, sig = 0.05 )
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
y |
outcome (binary or continuous); data type: numeric, character or factor vector |
niter |
number of iterations (default = 100) |
covar |
data frame with covariates (default = NULL) |
lambda |
penalization parameter (default = "lambda.1se") |
alpha |
elastic net parameter (default = 0.9) |
sig |
significance level for the confidence interval (default = 0.05) |
a list with "accuracy" and "confidence interval"
M. Calle - T. Susin
data(Crohn, package = "coda4microbiome") coda_glmnet_null(x=x_Crohn[,(1:10)], y=y_Crohn, niter=2,covar=NULL,lambda="lambda.1se", alpha=0.9,sig=0.05)
data(Crohn, package = "coda4microbiome") coda_glmnet_null(x=x_Crohn[,(1:10)], y=y_Crohn, niter=2,covar=NULL,lambda="lambda.1se", alpha=0.9,sig=0.05)
Internal function for the permutational test
coda_glmnet0( x, lrX, idlrX, nameslrX, y, covar = NULL, lambda = "lambda.1se", alpha = 0.9 )
coda_glmnet0( x, lrX, idlrX, nameslrX, y, covar = NULL, lambda = "lambda.1se", alpha = 0.9 )
x |
. |
lrX |
. |
idlrX |
. |
nameslrX |
. |
y |
. |
covar |
. |
lambda |
. |
alpha |
. |
.
M. Calle - T. Susin
This package provides a set of functions to explore and study microbiome data within the CoDA framework, with a special focus on identification of microbial signatures (variable selection).
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
The dataset contains two objects:
microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns)
a factor
, indicating if the sample corresponds to a case (CD) or a control (no).
doi:10.1016/j.chom.2014.02.005
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
The dataset contains three objects:
microbiome abundance matrix in long format (873 rows) and 37 genera (columns)
vector containing the taxonomy of the 37 taxa
matrix with information on the individuals at the observation time
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
The dataset contains three objects:
microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns)
a numeric
, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.
a numeric
, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
doi:10.1016/j.chom.2014.02.005
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
The dataset contains three objects:
microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns)
a numeric
, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.
a numeric
, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
doi:10.1016/j.chom.2014.02.005
Explores the association of each log-ratio with the outcome. Summarizes the importance of each variable (taxa) as the aggregation of the association measures of those log-ratios involving the variable. The output includes a plot of the association of the log-ratio with the outcome where the variables (taxa) are ranked by importance
explore_logratios( x, y, decreasing = TRUE, measure = "AUC", covar = NULL, shownames = FALSE, maxrow = 15, maxcol = 15, showtitle = TRUE, mar = c(0, 0, 1, 0) )
explore_logratios( x, y, decreasing = TRUE, measure = "AUC", covar = NULL, shownames = FALSE, maxrow = 15, maxcol = 15, showtitle = TRUE, mar = c(0, 0, 1, 0) )
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
y |
outcome (binary or continuous); data type: numeric, character or factor vector |
decreasing |
order of importance (default = TRUE) |
measure |
association measures "AUC", "Pearson","Spearman", "glm" (default = "AUC") |
covar |
data frame with covariates (default = NULL) |
shownames |
logical, if TRUE, shows the names of the variables in the rows of the plot (default = FALSE) |
maxrow |
maximum number of rows to display in the plot (default = 15) |
maxcol |
maximum number of columns to display in the plot (default = 15) |
showtitle |
logical, if TRUE, shows the title of the plot (default = TRUE) |
mar |
mar numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot (default mar=c(0,0,1,0)) |
list with "max log-ratio","names max log-ratio", "order of importance", "name of most important variables", "association log-ratio with y" and "top log-ratios plot"
M. Calle - T. Susin
data(HIV, package = "coda4microbiome") explore_logratios(x_HIV,y_HIV)
data(HIV, package = "coda4microbiome") explore_logratios(x_HIV,y_HIV)
Explores the association of summary (integral) of each log-ratio trajectory with the outcome. Summarizes the importance of each variable (taxa) as the aggregation of the association measures of those log-ratios involving the variable. The output includes a plot of the association of the log-ratio with the outcome where the variables (taxa) are ranked by importance
explore_lr_longitudinal( x, y, x_time, subject_id, ini_time, end_time, showPlots = FALSE, decreasing = TRUE, covar = NULL, shownames = FALSE, maxrow = 15, maxcol = 15, showtitle = TRUE, mar = c(0, 0, 1, 0) )
explore_lr_longitudinal( x, y, x_time, subject_id, ini_time, end_time, showPlots = FALSE, decreasing = TRUE, covar = NULL, shownames = FALSE, maxrow = 15, maxcol = 15, showtitle = TRUE, mar = c(0, 0, 1, 0) )
x |
abundance matrix or data frame in long format (several rows per individual) |
y |
outcome (binary); data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
showPlots |
if TRUE, shows the plot (default = FALSE) |
decreasing |
order of importance (default = TRUE) |
covar |
data frame with covariates (default = NULL) |
shownames |
if TRUE, shows the names of the variables in the rows of the plot (default = FALSE) |
maxrow |
maximum number of rows to display in the plot (default = 15) |
maxcol |
maximum number of columns to display in the plot (default = 15) |
showtitle |
logical, if TRUE, shows the title of the plot (default = TRUE) |
mar |
mar numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot (default mar=c(0,0,1,0)) |
list with "max log-ratio","names max log-ratio","order of importance","name of most important variables","association log-ratio with y","top log-ratios plot"
M. Calle - T. Susin
set.seed(123) # to reproduce the results data(ecam_filtered, package = "coda4microbiome") # load the data x=x_ecam # microbiome abundance x_time = metadata$day_of_life # observation times subject_id = metadata$studyid # subject id y= metadata$diet # diet ("bd"= breast diet, "fd"=formula diet) ini_time = 0 end_time = 90 ecam_logratios<-explore_lr_longitudinal(x,y,x_time,subject_id,ini_time,end_time)
set.seed(123) # to reproduce the results data(ecam_filtered, package = "coda4microbiome") # load the data x=x_ecam # microbiome abundance x_time = metadata$day_of_life # observation times subject_id = metadata$studyid # subject id y= metadata$diet # diet ("bd"= breast diet, "fd"=formula diet) ini_time = 0 end_time = 90 ecam_logratios<-explore_lr_longitudinal(x,y,x_time,subject_id,ini_time,end_time)
Provides the proportion of zeros for a pair of variables (taxa) in table x and the proportion of samples with zero in both variables. A bar plot with this information is also provided. Results can be stratified by a categorical variable.
explore_zeros(x, id1, id2, strata = NULL)
explore_zeros(x, id1, id2, strata = NULL)
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
id1 |
column number in x for the first taxa |
id2 |
column number in x for the second taxa |
strata |
stratification variable (default = NULL) |
a list with the frequency table and the associated bar plot
M. Calle - T. Susin
data(HIV, package = "coda4microbiome") explore_zeros(x_HIV,5,6) explore_zeros(x_HIV,5,6, strata=y_HIV)
data(HIV, package = "coda4microbiome") explore_zeros(x_HIV,5,6) explore_zeros(x_HIV,5,6, strata=y_HIV)
Filters those individuals and taxa with enough longitudinal information
filter_longitudinal( x, taxanames = NULL, x_time, subject_id, metadata, ini_time = min(x_time), end_time = max(x_time), percent_indv = 0.7, min_obs = 3 )
filter_longitudinal( x, taxanames = NULL, x_time, subject_id, metadata, ini_time = min(x_time), end_time = max(x_time), percent_indv = 0.7, min_obs = 3 )
x |
abundance matrix or data frame in long format (several rows per individual) |
taxanames |
names of different taxa |
x_time |
observation times |
subject_id |
subject id |
metadata |
matrix sample data |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
percent_indv |
percentage of individuals with more than min_obs observations |
min_obs |
required minimum number of observations per individual |
list with filtered abundance table, taxanames and metadata
M. Calle - T. Susin
data(ecam_filtered, package = "coda4microbiome") # load the data x=x_ecam # microbiome abundance x_time = metadata$day_of_life # observation times subject_id = metadata$studyid # subject id ini_time = 0 end_time = 360 data_filtered<-filter_longitudinal(x,taxanames,x_time, subject_id, metadata, ini_time, end_time, min_obs=4)
data(ecam_filtered, package = "coda4microbiome") # load the data x=x_ecam # microbiome abundance x_time = metadata$day_of_life # observation times subject_id = metadata$studyid # subject id ini_time = 0 end_time = 360 data_filtered<-filter_longitudinal(x,taxanames,x_time, subject_id, metadata, ini_time, end_time, min_obs=4)
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
The dataset contains three objects:
microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns)
a factor, specifying if the individual is HIV positive or (Pos
) or negative (Neg
).
a factor, indicating sexual preferences: MSM
(Men who have Sex with Men) or not (nonMSM
).
doi:10.1016/j.ebiom.2016.01.032
Simple imputation: When the abundance table contains zeros, a positive value is added to all the values in the table. It adds 1 when the minimum of table is larger than 1 (i.e. tables of counts) or it adds half of the minimum value of the table, otherwise.
impute_zeros(x)
impute_zeros(x)
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
x abundance matrix or data frame with zeros substituted by imputed positive values
M. Calle - T. Susin
data(HIV, package = "coda4microbiome") x<-impute_zeros(x_HIV)
data(HIV, package = "coda4microbiome") x<-impute_zeros(x_HIV)
Integral of the curve trajectory of subject_id in the interval a,b
integralFun(x, y, id, a, b)
integralFun(x, y, id, a, b)
x |
abundance matrix or data frame in long format (several rows per individual) |
y |
outcome (binary); data type: numeric, character or factor vector |
id |
subjects-ids |
a |
interval initial time |
b |
interval final time |
matrix with integrals for each individual (rows) and each taxa (columns)
M. Calle - T. Susin
Computes a large matrix with all the log-ratios between pairs of taxa (columns) in the abundance table
logratios_matrix(x)
logratios_matrix(x)
x |
abundance matrix or data frame (rows are samples, columns are variables (taxa)) |
list with matrix of log-ratios, matrix indicating the pairs of variables involved in each log-ratio, and a matrix indicating the names of the variables involved in each log-ratio.
M. Calle - T. Susin
data(HIV, package = "coda4microbiome") lrHIV<-logratios_matrix(x_HIV[,(1:4)]) # matrix of log-ratios (first 6 rows and 6 columns): lrHIV[[1]][1:6,1:6] # variables involved in each log-ratio head(lrHIV[[2]]) # names of the variables involved in each log-ratio head(lrHIV[[3]])
data(HIV, package = "coda4microbiome") lrHIV<-logratios_matrix(x_HIV[,(1:4)]) # matrix of log-ratios (first 6 rows and 6 columns): lrHIV[[1]][1:6,1:6] # variables involved in each log-ratio head(lrHIV[[2]]) # names of the variables involved in each log-ratio head(lrHIV[[3]])
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
The dataset contains three objects:
microbiome abundance matrix in long format (873 rows) and 37 genera (columns)
vector containing the taxonomy of the 37 taxa
matrix with information on the individuals at the observation time
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
The dataset contains three objects:
microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns)
a factor, specifying if the individual is HIV positive or (Pos
) or negative (Neg
).
a factor, indicating sexual preferences: MSM
(Men who have Sex with Men) or not (nonMSM
).
doi:10.1016/j.ebiom.2016.01.032
Plot of the predictions of a fitted model: Multiple box-plot and density plots for binary outcomes and Regression plot for continuous outcome
plot_prediction(prediction, y, strata = NULL, showPlots = TRUE)
plot_prediction(prediction, y, strata = NULL, showPlots = TRUE)
prediction |
the fitted values of predictions for the model |
y |
outcome (binary or continuous); data type: numeric, character or factor vector |
strata |
stratification variable (default = NULL) |
showPlots |
if TRUE, shows the plots (default = TRUE) |
prediction plot
M. Calle - T. Susin
# prediction plot for the log-ratio between columns 3 and 32 on HIV status data(HIV, package = "coda4microbiome") x<-impute_zeros(x_HIV) lr<-log(x[,3])-log(x[,32]) plot_prediction(lr, y_HIV)
# prediction plot for the log-ratio between columns 3 and 32 on HIV status data(HIV, package = "coda4microbiome") x<-impute_zeros(x_HIV) lr<-log(x[,3])-log(x[,32]) plot_prediction(lr, y_HIV)
Plots samples ordered by microbial risk score values along time to event.
plot_riskscore(risk.score, x, time, status, showPlots = TRUE)
plot_riskscore(risk.score, x, time, status, showPlots = TRUE)
risk.score |
microbial risk score values resulting from the coda_coxnet model |
x |
original survival data |
time |
time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data. |
status |
event occurrence. Vector (numeric or logical) specifying 0 (or FALSE) for no event occurrence, and 1 (or TRUE) for event occurrence. |
showPlots |
(default: TRUE) |
returns an object of class HeatmapList.
M. Calle, M. Pujolassos, T. Susin
set.seed(12345) data(data_survival, package = "coda4microbiome") time <- Event_time status <- Event crohn_cox <- coda_coxnet(x = x, time = Event_time, status = Event, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0) plot_riskscore(risk.score = crohn_cox$risk.score, x = x, time = Event_time, status = Event, showPlots = TRUE) #-------------------------------------------------------------------------------
set.seed(12345) data(data_survival, package = "coda4microbiome") time <- Event_time status <- Event crohn_cox <- coda_coxnet(x = x, time = Event_time, status = Event, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0) plot_riskscore(risk.score = crohn_cox$risk.score, x = x, time = Event_time, status = Event, showPlots = TRUE) #-------------------------------------------------------------------------------
Graphical representation of the variables selected and their coefficients
plot_signature(vars, coeff, showPlots = TRUE, varnames = NULL)
plot_signature(vars, coeff, showPlots = TRUE, varnames = NULL)
vars |
variables selected |
coeff |
associated coefficients |
showPlots |
if TRUE, shows the plots (default = TRUE) |
varnames |
if TRUE, shows the names of the variables |
bar plot
M. Calle - T. Susin
plot_signature(c(2,10, 3, 15, 4), c(0.8, -0.1, 0.2, -0.6, -0.3))
plot_signature(c(2,10, 3, 15, 4), c(0.8, -0.1, 0.2, -0.6, -0.3))
Graphical representation of the signature trajectories
plot_signature_curves( varNum, coeff, x, y, x_time, subject_id, ini_time, end_time, color = c("chocolate1", "slateblue2"), showLabel = TRUE, location = "bottomright", inset = c(0.01, 0.02), cex = 0.8, y.intersp = 0.7, main_title = NULL )
plot_signature_curves( varNum, coeff, x, y, x_time, subject_id, ini_time, end_time, color = c("chocolate1", "slateblue2"), showLabel = TRUE, location = "bottomright", inset = c(0.01, 0.02), cex = 0.8, y.intersp = 0.7, main_title = NULL )
varNum |
column number of the variables (taxa) |
coeff |
coefficients (coefficients must sum-up zero) |
x |
microbiome abundance matrix in long format |
y |
binary outcome; data type: numeric, character or factor vector |
x_time |
observation times |
subject_id |
subject id |
ini_time |
initial time to be analyzed |
end_time |
end time to be analyzed |
color |
color graphical parameter |
showLabel |
graphical parameter (see help(label)) |
location |
graphical parameter (see help(label)) |
inset |
graphical parameter (see help(label)) |
cex |
graphical parameter (see help(label)) |
y.intersp |
graphical parameter (see help(label)) |
main_title |
title plot |
trajectories plot
M. Calle - T. Susin
x=matrix(c(2, 3, 4, 1, 2, 5, 10, 20, 15, 30, 40, 12), ncol=2) x_time = c(0,10,20,1,15, 25) subject_id = c(1,1,1,2,2,2) y=c(0,0,0,1,1,1) plot_signature_curves(varNum=c(1,2), coeff=c(1,-1), x, y,x_time, subject_id, ini_time=0, end_time=25)
x=matrix(c(2, 3, 4, 1, 2, 5, 10, 20, 15, 30, 40, 12), ncol=2) x_time = c(0,10,20,1,15, 25) subject_id = c(1,1,1,2,2,2) y=c(0,0,0,1,1,1) plot_signature_curves(varNum=c(1,2), coeff=c(1,-1), x, y,x_time, subject_id, ini_time=0, end_time=25)
Plots survival curves stratifying samples based on their signature value. Signature value for stratification can be set by the user.
plot_survcurves(risk.score, time, status, strata.quantile = 0.5)
plot_survcurves(risk.score, time, status, strata.quantile = 0.5)
risk.score |
microbial risk score values resulting from the coda_coxnet model |
time |
time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data (what about interval data?). |
status |
event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence. |
strata.quantile |
cut-off quantile (values 0, 1) for sample stratification based on signature value. Default is set to 0.5 quantile of the signature. |
return an object of class ggsurvplot which is list containing the following: plot: the survival plot (ggplot object). table: the number of subjects at risk table per time (ggplot object). data.survplot: data used to plot the survival curves (data.frame). data.survtable: data used to plot the tables under the main survival curves (data.frame).
M. Calle, M. Pujolassos, T. Susin
set.seed(12345) data(data_survival, package = "coda4microbiome") time <- Event_time status <- Event crohn_cox <- coda_coxnet(x = x, time = Event_time, status = Event, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0) plot_survcurves(risk.score = crohn_cox$risk.score, time, status, strata.quantile = 0.5) #-------------------------------------------------------------------------------
set.seed(12345) data(data_survival, package = "coda4microbiome") time <- Event_time status <- Event crohn_cox <- coda_coxnet(x = x, time = Event_time, status = Event, covar = NULL, lambda = "lambda.1se", nvar = NULL, alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0) plot_survcurves(risk.score = crohn_cox$risk.score, time, status, strata.quantile = 0.5) #-------------------------------------------------------------------------------
Internal plot function
plotMedianCurve(iNum, iDen, X, Y, x_time, subject_id, ini_time, end_time)
plotMedianCurve(iNum, iDen, X, Y, x_time, subject_id, ini_time, end_time)
iNum |
. |
iDen |
. |
X |
. |
Y |
. |
x_time |
. |
subject_id |
. |
ini_time |
. |
end_time |
. |
.
M. Calle - T. Susin
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
The dataset contains two objects:
microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns)
a numeric
variable with the value of the inflammation parameter sCD14 for each sample
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)
Shannon information
shannon(x)
shannon(x)
x |
abundance composition (vector) |
shannon information
M. Calle - T. Susin
data(HIV, package = "coda4microbiome") shannon(x_HIV[1,])
data(HIV, package = "coda4microbiome") shannon(x_HIV[1,])
Shannon effective number of variables in a composition
shannon_effnum(x)
shannon_effnum(x)
x |
abundance composition (vector) |
shannon information
M. Calle - T. Susin
data(HIV, package = "coda4microbiome") shannon_effnum(x_HIV[1,])
data(HIV, package = "coda4microbiome") shannon_effnum(x_HIV[1,])
Shannon similarity between two compositions
shannon_sim(x, y)
shannon_sim(x, y)
x |
abundance composition (vector) |
y |
abundance composition (vector) |
shannon similarity (value between 0 and 1)
M. Calle - T. Susin
data(HIV, package = "coda4microbiome") shannon_sim(x_HIV[1,],x_HIV[2,])
data(HIV, package = "coda4microbiome") shannon_sim(x_HIV[1,],x_HIV[2,])
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
The dataset contains three objects:
microbiome abundance matrix in long format (873 rows) and 37 genera (columns)
vector containing the taxonomy of the 37 taxa
matrix with information on the individuals at the observation time
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
The dataset contains three objects:
microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns)
a numeric
, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.
a numeric
, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
doi:10.1016/j.chom.2014.02.005
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
The dataset contains two objects:
microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns)
a factor
, indicating if the sample corresponds to a case (CD) or a control (no).
doi:10.1016/j.chom.2014.02.005
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
The dataset contains three objects:
microbiome abundance matrix in long format (873 rows) and 37 genera (columns)
vector containing the taxonomy of the 37 taxa
matrix with information on the individuals at the observation time
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
The dataset contains three objects:
microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns)
a factor, specifying if the individual is HIV positive or (Pos
) or negative (Neg
).
a factor, indicating sexual preferences: MSM
(Men who have Sex with Men) or not (nonMSM
).
doi:10.1016/j.ebiom.2016.01.032
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
The dataset contains two objects:
microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns)
a numeric
variable with the value of the inflammation parameter sCD14 for each sample
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
The dataset contains two objects:
microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns)
a factor
, indicating if the sample corresponds to a case (CD) or a control (no).
doi:10.1016/j.chom.2014.02.005
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
The dataset contains three objects:
microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns)
a factor, specifying if the individual is HIV positive or (Pos
) or negative (Neg
).
a factor, indicating sexual preferences: MSM
(Men who have Sex with Men) or not (nonMSM
).
doi:10.1016/j.ebiom.2016.01.032
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
The dataset contains two objects:
microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns)
a numeric
variable with the value of the inflammation parameter sCD14 for each sample
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)