Title: | Simultaneous Selection by Trait and WAASB Index |
---|---|
Description: | This tool proposes a new ranking algorithm that uses the "Y*WAASB" biplot generated by the 'metan' package. The aim of the package is to effectively distinguish the top-ranked genotypes in multi-environment trials (MET). For a detailed explanation of how the "WAASB" and "WAASBY" indices and the "Y*WAASB" biplot are obtained, refer to the manual included in this package and to the study by Olivoto & Lúcio (2020) <doi:10.1111/2041-210X.13384>. Here, "WAASB" refers to the "Weighted Average of Absolute Scores" proposed by Olivoto et al. (2019) <doi:10.2134/agronj2019.03.0220>, which quantifies the stability of genotypes across environments using linear mixed-effect models. To run the package, extract the "WAASB" and "WAASBY" coefficients with 'metan' and supply them as input. The tool uses PCA (principal component analysis) and differentiates the entries (which may be genotypes, hybrids, varieties, etc.) using "WAASB", "WAASBY", and a combination of the specified trait and the WAASB index. |
Authors: | Ali Arminian [aut, cre, cph] |
Maintainer: | Ali Arminian <[email protected]> |
License: | GPL-3 |
Version: | 0.2 |
Built: | 2024-11-18 03:55:14 UTC |
Source: | https://github.com/cran/rYWAASB |
rYWAASB implements a new ranking algorithm based on the "Y*WAASB" biplot generated by the 'metan' package for multi-environment trials (MET). The package effectively distinguishes the top-ranked genotypes based on a given trait (e.g., grain yield or any other trait in agricultural experiments) and the "WAASB" index in genotype-by-environment interaction studies. Note: this package can impute missing observations, so their presence in the data set need not be a concern. A complete guide may be found at: https://github.com/abeyran/rYWAASB/issues
Ali Arminian [email protected]
bar_plot1()
creates a bar plot of the new index (rYWAASB) for individuals, for simultaneous selection of genotypes by trait and WAASB index, using ggplot2.
bar_plot1(datap)
datap | The data set |
Returns an object of class gg, ggmatrix.
Ali Arminian [email protected]
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
library(rYWAASB)
data(maize)
bar_plot1(maize)
bar_plot2()
creates a second bar plot, of the ranks of genotypes, using the graphics package.
bar_plot2(datap, verbose = FALSE)
datap | The data set |
verbose | If |
Returns an object of class gg, graphics
Ali Arminian [email protected]
library(rYWAASB)
data(maize)
bar_plot2(maize, verbose = FALSE)
Dataset2: a tibble from the metan package containing the factors ENV, GEN, and REP and the agronomic traits GY (grain yield) and HM.
data(data_ge)
A data.frame with 420 rows and 5 columns:
ENV | a character vector |
GEN | a character vector |
REP | a character vector |
GY | a numeric vector |
HM | a numeric vector |
Olivoto, T., & Lúcio, A.D.C. 2020. metan: An R package for multi‐environment trial analysis. Methods in Ecology and Evolution, 11(6), 783-789.
library(rYWAASB)
data(data_ge)
Dataset1: a tibble containing GEN, the trait, and the WAASB and WAASBY indexes.
data(maize)
A data.frame with 20 observations (genotypes) in rows, and columns for the trait (named Y) and the WAASB and WAASBY index values:
GEN | a character vector stored as a factor |
Y | a numeric vector |
WAASB | a numeric vector |
WAASBY | a numeric vector |
The input data table (free of NAs), here the maize data, should have the following format:
GEN | Y | WAASB | WAASBY |
---|---|---|---|
Dracma | 262.22 | 0.81 | 81.6 |
DKC6630 | 284.04 | 2.20 | 88.5 |
NS770 | 243.48 | 0.33 | 71.4 |
... |
library(rYWAASB)
data(maize)
ranki(maize)
bar_plot1(maize)
bar_plot2(maize)
PCA_biplot(maize)
nbclust()
Reads and prepares the data, and determines the optimal number of clusters using the average silhouette method from the factoextra package.
The average silhouette approach assesses the quality of a clustering by evaluating how well each object fits within its cluster. A high average silhouette width indicates good clustering. The method computes the average silhouette for different values of k, and the optimal number of clusters k is the one that maximizes the average silhouette over a range of candidate k values.
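The average silhouette method can be sketched in base R with the recommended cluster package. This is a minimal illustration, not the internal code of nbclust(): it assumes average-linkage (UPGMA) hierarchical clustering on Euclidean distances, and the helper best_k_by_silhouette() and the simulated data are hypothetical.

```r
# Sketch: choose k by maximizing the average silhouette width.
# Assumption: average-linkage (UPGMA) clustering on Euclidean distances.
library(cluster)  # provides silhouette()

best_k_by_silhouette <- function(x, k_range = 2:6, method = "average") {
  d <- dist(x, method = "euclidean")
  tree <- hclust(d, method = method)
  avg_width <- sapply(k_range, function(k) {
    groups <- cutree(tree, k = k)          # cluster labels for this k
    sil <- silhouette(groups, d)           # s(i) for every observation
    mean(sil[, "sil_width"])               # average silhouette width
  })
  names(avg_width) <- k_range
  list(widths = avg_width, best_k = k_range[which.max(avg_width)])
}

# Hypothetical data: three well-separated groups of 10 points each
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0,  sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5,  sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2))
res <- best_k_by_silhouette(scale(x))
res$widths
res$best_k
```

Swapping method = "ward.D2" into the helper gives the Ward variant; the selection rule (arg max of the average width) stays the same.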
nbclust(datap, verbose = FALSE)
datap | The data set |
verbose | If |
The silhouette coefficient (SC) is a criterion for deciding the number of clusters. Although numerous other methods exist for determining the number of clusters, such as the gap statistic, the silhouette is used here. For each observation \(i\), the silhouette width \(s(i)\) is defined as follows.
Let \(a(i)\) be the average dissimilarity between \(i\) and all other points of the cluster to which \(i\) belongs (if \(i\) is the only observation in its cluster, \(s(i) := 0\) without further calculation). For every other cluster \(C\), let \(d(i, C)\) be the average dissimilarity of \(i\) to all observations of \(C\). The smallest of these, \(b(i) = \min_{C} d(i, C)\), can be seen as the dissimilarity between \(i\) and its “neighbor” cluster, i.e., the nearest cluster to which it does not belong. Finally,
\[s(i) := \frac{b(i)-a(i)}{\max(a(i), b(i))}\]
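The definition of \(s(i)\) can be coded directly. The hypothetical helper silhouette_width() below is a sketch written from the formula above (not the factoextra or cluster implementation), applied to a tiny one-dimensional example that can be checked by hand: for the point 0, \(a(i) = 1\) and \(b(i) = 5\), so \(s(i) = 0.8\).

```r
# Sketch of s(i) = (b(i) - a(i)) / max(a(i), b(i)) from the definition.
# d:  a dissimilarity object or full matrix; cl: cluster labels.
silhouette_width <- function(d, cl) {
  d <- as.matrix(d)
  n <- nrow(d)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- cl == cl[i]
    if (sum(own) == 1) {                        # singleton: s(i) := 0
      s[i] <- 0
      next
    }
    a_i <- sum(d[i, own]) / (sum(own) - 1)      # mean over other members (d(i, i) = 0)
    others <- setdiff(unique(cl), cl[i])
    d_iC <- sapply(others, function(C) mean(d[i, cl == C]))
    b_i <- min(d_iC)                            # nearest "neighbor" cluster
    s[i] <- (b_i - a_i) / max(a_i, b_i)
  }
  s
}

# Points 0 and 1 form cluster 1; point 5 forms cluster 2
x  <- c(0, 1, 5)
cl <- c(1, 1, 2)
s  <- silhouette_width(dist(x), cl)
s          # 0.80 0.75 0.00
mean(s)    # average silhouette width for this partition
```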
Note: The clustering methods can be "average", "centroid", "complete", "mcquitty", "median", "single", "ward.D", or "ward.D2", and the distance methods can be "binary", "canberra", "euclidean", "manhattan", "minkowski", "maximum", "pearson", "spearman", or "kendall", as used in the shipunov or factoextra packages. This package applies only the average (UPGMA) and Ward algorithms.
Returns a data frame
Ali Arminian [email protected]
Lletí, R., Ortiz, M.C., Sarabia, L.A., Sánchez, M.S. 2004. Selecting variables for k-means cluster analysis by using a genetic algorithm that optimises the silhouettes. Analytica Chimica Acta, 515(1): 87-100.
Rousseeuw, P.J. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20: 53-65.
https://uc-r.github.io/
library(factoextra)
library(rYWAASB)
data(maize)
maize <- as.data.frame(maize)
row.names(maize) <- maize[, 1]   # use genotype names as row names
maize[, 1] <- NULL               # drop the GEN column
GEN <- row.names(maize)
maize <- scale(maize)
nbclust(maize, verbose = FALSE)

# Performing bootstrap or jackknife clustering with the shipunov package.
# These examples should be run manually in the console because of
# problems that occur in the ORPHANED package `shipunov`.
#
# 1- Bootstrap clustering:
# data.jb <- Jclust(maize,
#   method.d = "euclidean",
#   method.c = "average", n.cl = 2,
#   bootstrap = TRUE)
#
# plot.Jclust(data.jb, top = TRUE, lab.pos = 1,
#   lab.offset = 1, lab.col = 2, lab.font = 2)
# Fence(data.jb$hclust, GEN)
#
# data.jb <- Jclust(maize,
#   method.d = "euclidean",
#   method.c = "ward.D", n.cl = 2,
#   bootstrap = TRUE)
#
# plot.Jclust(data.jb, top = TRUE, lab.pos = 1,
#   lab.offset = 1, lab.col = 2, lab.font = 2)
# Fence(data.jb$hclust, GEN)
#
# if (verbose) {
#   cat("\nnumber of iterations:\n", data.jb$iter, "\n")
#   # for "bootstrap":
#   data.jb$mat <- as.matrix(data.jb$mat)
#   data.jb$mat
#   cat("\nmatrix of results:\n", data.jb$mat, "\n")
#   cat("clustering info, by euclidean distance measure:\n")
#   print(data.jb$hclust)
#   cat("groups:\n", data.jb$gr, "\n")
#   cat("\nsupport values:\n", data.jb$supp, "\n")
#   cat("\nnumber of clusters used:\n", data.jb$n.cl, "\n")
# }
#
# 2- Jackknife clustering:
# data.jb <- Bclust(maize,
#   method.d = "euclidean", method.c = "average",
#   bootstrap = FALSE)
# plot(data.jb)
#
# data.jb <- Bclust(maize,
#   method.d = "euclidean", method.c = "ward.D",
#   bootstrap = FALSE)
# plot(data.jb)
#
# if (verbose) {
#   # for "jackknife":
#   cat("Consensus:\n", data.jb$consensus, "\n")
#   cat("Values:\n", data.jb$values, "\n")
# }
PCA_biplot()
creates a PCA (principal component analysis) biplot with loadings for the new index rYWAASB, for simultaneous selection of genotypes by trait and WAASB index. It displays the rYWAASB, rWAASB, and rWAASBY indices (r: ranked) together in one biplot to better differentiate the genotypes. In the PCA biplots, the variables are colored according to their contributions (contrib) and their quality of representation (cos2).
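The contrib and cos2 values used for coloring can be sketched with prcomp(). This assumes the definitions commonly used in factoextra-style PCA summaries (variable coordinate = loading × component standard deviation, cos2 = squared coordinate, contrib = cos2 as a percentage of its component total); the helper pca_var_quality() and the rank values are hypothetical and are not output of PCA_biplot().

```r
# Sketch: variable coordinates, cos2 and contrib from a standardized PCA.
# Assumed definitions (factoextra-style):
#   coord = loading * sdev, cos2 = coord^2,
#   contrib = 100 * cos2 / (column total of cos2 for that component)
pca_var_quality <- function(x) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  coord <- sweep(pca$rotation, 2, pca$sdev, `*`)      # scale each PC column
  cos2 <- coord^2                                      # quality of representation
  contrib <- sweep(cos2, 2, colSums(cos2), `/`) * 100  # % contribution per PC
  list(coord = coord, cos2 = cos2, contrib = contrib)
}

# Hypothetical ranked indices for six genotypes (not real data)
ranks <- data.frame(
  rYWAASB = c(3, 7, 5, 10, 8, 9),
  rWAASB  = c(1, 4, 2, 6, 5, 3),
  rWAASBY = c(2, 3, 1, 6, 4, 5)
)
q <- pca_var_quality(ranks)
round(q$cos2[, 1:2], 3)
round(q$contrib[, 1:2], 1)
```

With standardized variables each coordinate is the correlation between a variable and a component, so a variable's cos2 values sum to 1 over all components, while contributions sum to 100 within each component.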
PCA_biplot(datap)
datap | The data set |
PCA is a machine learning method and a dimension reduction technique. It is used to simplify large data sets by extracting a smaller set of variables that preserves significant patterns and trends (1).
According to Johnson and Wichern (2007), PCA explains the variance-covariance structure of a set of variables \(X_1, X_2, ..., X_p\) through a few linear combinations of these variables. Its general objectives are 1) data reduction and 2) interpretation.
Biplot and PCA: The biplot is a method for visually representing both the rows and the columns of a data table. It approximates the table by a two-dimensional matrix product, so that rows and columns can be represented in a single plane. The techniques used in a biplot typically involve an eigendecomposition, similar to the one used in PCA, and the biplot is commonly computed on mean-centered and scaled data (2).
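A biplot can be sketched from the singular value decomposition of the centered and scaled data, \(\mathbf{Z} = \mathbf{U}\mathbf{D}\mathbf{V}'\). The sketch below (hypothetical helper biplot_coords() and simulated data) takes row markers as \(\mathbf{U}\mathbf{D}\) and column markers as \(\mathbf{V}\), which is only one of several possible biplot scalings; it also checks that the residual of the two-dimensional approximation equals the sum of the discarded squared singular values.

```r
# Sketch: rank-2 biplot coordinates via the SVD of centered, scaled data.
# Assumed scaling: row markers = U D, column markers = V.
biplot_coords <- function(x, dims = 2) {
  z <- scale(x, center = TRUE, scale = TRUE)
  s <- svd(z)
  keep <- seq_len(dims)
  rows <- s$u[, keep, drop = FALSE] %*% diag(s$d[keep], nrow = dims)
  cols <- s$v[, keep, drop = FALSE]
  approx <- rows %*% t(cols)               # two-dimensional matrix product
  list(rows = rows, cols = cols,
       residual_ss  = sum((z - approx)^2),  # squared approximation error
       discarded_ss = sum(s$d[-keep]^2))    # sum of dropped d_k^2
}

set.seed(42)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)  # hypothetical 10 x 4 table
b <- biplot_coords(x)
c(b$residual_ss, b$discarded_ss)             # equal, by the Eckart-Young theorem
```

Plotting b$rows as points and b$cols as arrows in the same plane gives the usual biplot picture.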
Algebra of PCA: Following Johnson and Wichern (2007) (3), let the random vector \(\mathbf{X}' = [X_1, X_2, ..., X_p]\) have the covariance matrix \(\boldsymbol{\Sigma}\) with eigenvalues \(\lambda_1 \ge \lambda_2 \ge ... \ge \lambda_p \ge 0\).
Consider the linear combinations: \[Y_1 = \mathbf{a}'_1\mathbf{X} = a_{11}X_1 + a_{12}X_2 + ... + a_{1p}X_p\] \[Y_2 = \mathbf{a}'_2\mathbf{X} = a_{21}X_1 + a_{22}X_2 + ... + a_{2p}X_p\] \[...\] \[Y_p = \mathbf{a}'_p\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + ... + a_{pp}X_p\]
where \(Var(Y_i) = \mathbf{a}'_i\boldsymbol{\Sigma}\mathbf{a}_i\), i = 1, 2, ..., p, and \(Cov(Y_i, Y_k) = \mathbf{a}'_i\boldsymbol{\Sigma}\mathbf{a}_k\), i, k = 1, 2, ..., p.
The principal components are those uncorrelated linear combinations \(Y_1, Y_2, ..., Y_p\) whose variances are as large as possible.
For the random vector \(\mathbf{X}' = [X_1, X_2, ..., X_p]\), let \(\boldsymbol{\Sigma}\) be the associated covariance matrix, with eigenvalue-eigenvector pairs \((\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), ..., (\lambda_p, \mathbf{e}_p)\), where, as above, \(\lambda_1 \ge \lambda_2 \ge ... \ge \lambda_p \ge 0\).
Then the \(i\)th principal component is \[Y_i = \mathbf{e}'_i\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + ... + e_{ip}X_p, \quad i = 1, 2, ..., p,\] where \(Var(Y_i) = \mathbf{e}'_i\boldsymbol{\Sigma}\mathbf{e}_i = \lambda_i\), i = 1, 2, ..., p, and \(Cov(Y_i, Y_k) = \mathbf{e}'_i\boldsymbol{\Sigma}\mathbf{e}_k = 0\) for \(i \ne k\). Moreover, \(\sigma_{11} + \sigma_{22} + ... + \sigma_{pp} = \sum_{i=1}^p Var(X_i) = \lambda_1 + \lambda_2 + ... + \lambda_p = \sum_{i=1}^p Var(Y_i)\).
Thus, the total population variance is \(\sigma_{11} + \sigma_{22} + ... + \sigma_{pp} = \lambda_1 + \lambda_2 + ... + \lambda_p\).
Two further quantities are important in PCA:
The proportion of the total variance due to (explained by) the \(k\)th principal component: \[\frac{\lambda_k}{\lambda_1 + \lambda_2 + ... + \lambda_p}, \quad k = 1, 2, ..., p\]
The correlation coefficients between the components \(Y_i\) and the variables \(X_k\): \[\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, ..., p\]
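The results above can be checked numerically with eigen() on a sample covariance matrix, which stands in for \(\boldsymbol{\Sigma}\). The simulated variables below are hypothetical; the sketch verifies \(Var(Y_i) = \lambda_i\), the zero covariances, the total-variance identity, the explained proportions, and the correlation formula \(\rho_{Y_i, X_k}\).

```r
# Numerical check of the PCA algebra with a sample covariance matrix.
set.seed(123)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)      # correlated with x1
x3 <- rnorm(n, sd = 2)
X  <- cbind(x1, x2, x3)

S  <- cov(X)                             # sample version of Sigma
eg <- eigen(S, symmetric = TRUE)         # eigenvalues in decreasing order
lambda <- eg$values
E      <- eg$vectors                     # column i is e_i

Xc <- scale(X, center = TRUE, scale = FALSE)
Y  <- Xc %*% E                           # Y_i = e_i' X (centered data)

round(cov(Y), 10)                        # diag = lambda, off-diagonal = 0
c(sum(diag(S)), sum(lambda))             # total variance identity

prop <- lambda / sum(lambda)             # proportion explained by each PC

# rho[i, k] = e_ik * sqrt(lambda_i) / sqrt(sigma_kk)
rho <- t(E) * sqrt(lambda)               # row i multiplied by sqrt(lambda_i)
rho <- sweep(rho, 2, sqrt(diag(S)), `/`) # column k divided by sqrt(sigma_kk)
round(rho, 3)
```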
Note that PCA can be performed on either the covariance matrix or the correlation matrix. In general, the data should be centered before PCA.
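In R, the choice between the covariance and the correlation matrix corresponds to the scale. argument of prcomp(). In this sketch with hypothetical traits on very different scales, the squared component standard deviations match the eigenvalues of cov(X) and of cor(X), respectively.

```r
# Covariance-based vs. correlation-based PCA with prcomp().
set.seed(7)
X <- cbind(height = rnorm(50, 200, 20),   # hypothetical traits measured
           weight = rnorm(50, 5, 1),      # on very different scales
           yield  = rnorm(50, 300, 50))

pca_cov <- prcomp(X, center = TRUE, scale. = FALSE)  # covariance matrix
pca_cor <- prcomp(X, center = TRUE, scale. = TRUE)   # correlation matrix

# Squared standard deviations are the eigenvalues of the respective matrix
ev_cov <- eigen(cov(X), symmetric = TRUE)$values
ev_cor <- eigen(cor(X), symmetric = TRUE)$values

rbind(prcomp_cov = pca_cov$sdev^2, eigen_cov = ev_cov)
rbind(prcomp_cor = pca_cor$sdev^2, eigen_cor = ev_cor)
```

Without scaling, the trait with the largest variance tends to dominate the first component, which is why the correlation matrix is usually preferred when traits are measured in different units.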
Returns a list of data frames
Ali Arminian [email protected]
(2) https://pca4ds.github.io/biplot-and-pca.html.
(3) Johnson, R.A. and Wichern, D.W. 2007. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 773 p.
data(maize) PCA_biplot(maize)
ranki()
ranks the genotypes (or entries) by a new index that combines the given trait and the "WAASB" index, in order to simultaneously select the top-ranked ones. The result can be compared with the WAASBY index of Olivoto et al. (2019). We suggest that users handle missing data in the inputs before the analysis, because the ranking code does not implement a general-purpose algorithm for this task.
WAASB (Weighted Average of Absolute Scores): computes the weighted average of absolute scores (Olivoto et al., 2019) to quantify the stability of g genotypes tested in e environments, using linear mixed-effect models.
ranki(datap)
datap | The data set |
According to Olivoto et al. (2019a), WAASB (the weighted average of absolute scores) is computed from all Interaction Principal Component Axes (IPCA) of the Singular Value Decomposition (SVD) of the matrix of genotype-environment interaction (GEI) effects generated by a linear mixed-effect model, as follows:
\[WAASB_i = \frac{\sum_{k=1}^{p} |IPCA_{ik} \times EP_k|}{\sum_{k=1}^{p} EP_k}\] where \(WAASB_i\) is the weighted average of absolute scores of the \(i\)th genotype; \(IPCA_{ik}\) is the score of the \(i\)th genotype in the \(k\)th Interaction Principal Component Axis (IPCA); and \(EP_k\) is the explained variance of the \(k\)th IPCA for \(k = 1, 2, ..., p\), with \(p = \min(g-1, e-1)\).
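The WAASB formula can be sketched as a weighted average of absolute IPCA scores. The first part (hypothetical helper waasb()) takes IPCA scores and explained variances as given, for example as extracted from 'metan', and uses a hand-checkable example. The second part (hypothetical helper ipca_from_gei()) is an illustration only: it derives AMMI-style scores from the SVD of a double-centered genotype × environment matrix, scaled as \(\mathbf{U}\sqrt{\mathbf{D}}\), whereas 'metan' derives the GEI effects from a linear mixed-effect model.

```r
# WAASB_i = sum_k |IPCA_ik * EP_k| / sum_k EP_k
# ipca: genotypes x p matrix of IPCA scores
# ep:   explained variance (%) of each of the p IPCAs (non-negative)
waasb <- function(ipca, ep) {
  stopifnot(ncol(ipca) == length(ep), all(ep >= 0))
  # |IPCA_ik * EP_k| = |IPCA_ik| * EP_k because EP_k >= 0
  as.vector(abs(ipca) %*% ep) / sum(ep)
}

# Hand-checkable example: 3 genotypes, 2 IPCAs explaining 60% and 40%
ipca <- rbind(c( 1.0, -2.0),
              c(-0.5,  1.0),
              c( 0.0,  3.0))
ep <- c(60, 40)
waasb(ipca, ep)   # 1.4 0.7 1.2

# Illustration only (assumption): AMMI-style IPCA scores from the SVD of a
# double-centered GEI matrix, with scores = U * sqrt(d).
ipca_from_gei <- function(gei) {
  p <- min(nrow(gei) - 1, ncol(gei) - 1)       # p = min(g - 1, e - 1)
  keep <- seq_len(p)
  s <- svd(gei)
  scores <- s$u[, keep, drop = FALSE] %*% diag(sqrt(s$d[keep]), nrow = p)
  ep <- 100 * s$d[keep]^2 / sum(s$d^2)         # % of GEI variation per IPCA
  list(scores = scores, ep = ep)
}

set.seed(10)
gei <- matrix(rnorm(4 * 3), nrow = 4)          # hypothetical 4 genotypes x 3 envs
gei <- sweep(gei, 1, rowMeans(gei))            # remove genotype main effects
gei <- sweep(gei, 2, colMeans(gei))            # remove environment main effects
fit <- ipca_from_gei(gei)
waasb(fit$scores, fit$ep)
```

Smaller WAASB values indicate a more stable genotype, since its scores lie closer to the origin on the heavily weighted axes.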
Further, \(WAASBY_i\) is a superiority (simultaneous selection) index that weights mean performance against stability: \[WAASBY_i = \frac{(rY_i \times \theta_Y) + (rW_i \times \theta_s)}{\theta_Y + \theta_s}\] where \(WAASBY_i\) is the superiority index of genotype \(i\); \(\theta_Y\) and \(\theta_s\) are the weights for mean performance and stability, respectively; and \(rY_i\) and \(rW_i\) are the rescaled values of the mean performance \(\bar{Y}_i\) and of the stability \(W_i\) of genotype \(i\), respectively. For details of the calculations, rescaling, and notation, see Olivoto et al. (2019).
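A sketch of WAASBY follows, assuming the linear 0-100 rescaling of Olivoto et al. (2019), under which the highest trait mean and the lowest WAASB (most stable genotype) both map to 100. The helpers rescale_0_100() and waasby(), the equal default weights, and the three-genotype data are hypothetical.

```r
# Linear rescaling to 0-100 (assumed): best value -> 100, worst value -> 0.
rescale_0_100 <- function(v, higher_is_better = TRUE) {
  r <- 100 * (v - min(v)) / (max(v) - min(v))
  if (higher_is_better) r else 100 - r
}

# WAASBY_i = (rY_i * theta_Y + rW_i * theta_s) / (theta_Y + theta_s)
waasby <- function(y, w, theta_y = 50, theta_s = 50) {
  rY <- rescale_0_100(y, higher_is_better = TRUE)   # mean performance
  rW <- rescale_0_100(w, higher_is_better = FALSE)  # low WAASB = stable
  (rY * theta_y + rW * theta_s) / (theta_y + theta_s)
}

# Hypothetical trait means and WAASB values for three genotypes
y <- c(10, 20, 30)
w <- c(1, 3, 2)
waasby(y, w)                               # 50 25 75
waasby(y, w, theta_y = 100, theta_s = 0)   # performance only: 0 50 100
```

Shifting the weights moves the index between pure mean performance (\(\theta_s = 0\)) and pure stability (\(\theta_Y = 0\)).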
Finally, the \(rYWAASB_i\) index of each individual is the sum of its rank for the trait (\(rY_i\)) and its rank for the WAASB index (\(rWAASB_i\)):
\[rYWAASB_i = rY_i + rWAASB_i\]
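The rank-sum idea (in the spirit of Kang's 1988 rank-sum method cited below) can be sketched in a few lines. This is not the code of ranki(); it assumes that a larger trait value is better, that a smaller WAASB means greater stability, that rank 1 is best, and that ties receive average ranks. The helper rywaasb() and the data are hypothetical.

```r
# Rank-sum sketch of rYWAASB (assumed direction: high Y good, low WAASB good).
rywaasb <- function(gen, y, w) {
  rY <- rank(-y)     # rank 1 = highest trait mean
  rW <- rank(w)      # rank 1 = lowest WAASB (most stable)
  out <- data.frame(GEN = gen, rY = rY, rWAASB = rW, rYWAASB = rY + rW,
                    stringsAsFactors = FALSE)
  out[order(out$rYWAASB), ]          # smallest rank sum first
}

res <- rywaasb(gen = c("G1", "G2", "G3"),
               y   = c(10, 20, 30),
               w   = c(1, 3, 2))
res   # G3 (1 + 2 = 3), G1 (3 + 1 = 4), G2 (2 + 3 = 5)
```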
The input data table (free of NAs), here the maize data, should have the following format:
GEN | Y | WAASB | WAASBY |
---|---|---|---|
Dracma | 262.22 | 0.81 | 81.6 |
DKC6630 | 284.04 | 2.20 | 88.5 |
NS770 | 243.48 | 0.33 | 71.4 |
... |
Returns a data frame showing numerical rankings
Ali Arminian [email protected]
Olivoto, T., Lúcio, A.D.C., da Silva, J.A.G., Sari, B.G. and Diel, M. 2019. Mean performance and stability in multi-environment trials II: Selection based on multiple traits. Agronomy Journal, 111(6): 2961-2969.
Olivoto, T., & Lúcio, A.D.C. 2020. metan: An R package for multi‐environment trial analysis. Methods in Ecology and Evolution, 11(6), 783-789.
Kang, M.S. 1988. “A Rank-Sum Method for Selecting High-Yielding, Stable Corn Genotypes.” Cereal Research Communications 16: 113–15.
data(maize) ranki(maize)