Simultaneous Selection by Trait and WAASB Index

1-Introduction

I dedicate this package to my colleague, Professor Dr. Manjit S. Kang, a great biostatistician and quantitative geneticist, who passed away recently and has devoted his life to PLANT BREEDING science and has provided me with invaluable supports!.

Stability analysis is crucial in plant breeding to select superior genotypes that perform consistently across different environments. Several models and methods have been developed for this purpose, including Additive Main Effect and Multiplicative Interaction (AMMI), Weighted Average of Absolute Scores (WAASB), and Genotype plus Genotype-Environment (GGE) interactions in multi-environmental trials (MET) (Mishra et al. 2024; Sakata 2021; Pour-Aboughadareh et al. 2022; Danakumara et al. 2023). These analyses help breeders identify genotypes with stable performance and adaptability under varying conditions, leading to optimal yield stability and successful crop production. Utilizing these stability analysis tools, breeders can navigate the complexities of genotype-environment interactions and select genotypes that consistently excel across different locations and seasons, ensuring the development of resilient and high-performing plant varieties (SWARUP and SINGH 2014).

The WAASB and WAASBY objects

By combining the strengths of AMMI for assessing stability and BLUP for prediction accuracy, breeders can effectively select genotypes that consistently perform well across different environmental conditions. This is crucial for developing sustainable agricultural systems and improving food security. To estimate the stability index of genotypes in multi-environment trials (METs), the WAASB index (weighted average of the absolute values obtained from the singular value decomposition of the BLUP matrix for the genotype by environment interaction effects, generated by a linear mixed-effects model) is calculated using the formula provided by (Olivoto et al. 2019). They suggest that the function *waasb()* computes the Weighted Average of the Absolute Scores considering all possible IPCA from the Singular Value Decomposition of the BLUPs for genotype-vs-environment interaction effects obtained by an Linear Mixed-effect Model (Olivoto et al. 2019) , as follows:

$$ WAASB_i = \sum_{k = 1}^{p} |IPCA_{ik} \times EP_k|/ \sum_{k = 1}^{p}EP_k $$

where WAASBi is the weighted average of absolute scores of the ith genotype; IPCAik is the scores of the ith genotype in the kth IPCA; and EPk is the explained variance of the kth PCA for k = 1, 2, .., p, p = min(g − 1; e − 1).

Further, WAABSY is a superiority or simultaneous selection index allowing weighting between mean performance and stability (Olivoto et al. 2019):

$$ WAASBY_i=\frac{\left({rY}_i\times\theta_Y\right)+\left({rW}_i\times\theta_s\right)}{\theta_Y+\theta_s} $$ where WAASBYi is the superiority index for genotype }_i and {rW}_i are the rescaled values for mean performance $\bar{Y_i}$ and stability Wi, respectively of the genotype i. For the details of calculations, rescalling and mathematics notations see (Olivoto et al. 2019).

In this package, my new index (rYWAASB) gives results which can be compared with WAASBY index provided by Olivoto et al. (Olivoto et al. 2019).

For working with rYWAASB package, firstly, if the metan or rYWAASB packages are not already installed, they should be installed on the system. The analysis requires the following packages to be installed:

if(!require('rYWAASB')){
    install.packages('rYWAASB') # call the package
}
#> Loading required package: rYWAASB
library('rYWAASB')

The analyses requires the following R packages:

## For graphical displays
library(metan)
library(ggplot2)
library(graphics)
library(factoextra)
library(FactoMineR)

The codes provided below form the metan package, allow you to access the WAASB index values, rankings, and other information for genotypes (or entries) in general.

waasb_model <- 
  waasb(data_ge,
        env = ENV,
        gen = GEN,
        rep = REP,
        resp = everything(),
        random = "gen", #Default
        verbose = TRUE) #Default
#> Evaluating trait GY |======================                      | 50% 00:00:01 Evaluating trait HM |============================================| 100% 00:00:02 
#> Method: REML/BLUP
#> Random effects: GEN, GEN:ENV
#> Fixed effects: ENV, REP(ENV)
#> Denominador DF: Satterthwaite's method
#> ---------------------------------------------------------------------------
#> P-values for Likelihood Ratio Test of the analyzed traits
#> ---------------------------------------------------------------------------
#>     model       GY       HM
#>  COMPLETE       NA       NA
#>       GEN 1.11e-05 5.07e-03
#>   GEN:ENV 2.15e-11 2.27e-15
#> ---------------------------------------------------------------------------
#> All variables with significant (p < 0.05) genotype-vs-environment interaction

data <- waasb_model$GY$model
print(data)
#> # A tibble: 24 × 22
#>    type  Code      Y     PC1     PC2     PC3     PC4     PC5     PC6     PC7
#>    <chr> <chr> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 GEN   G1     2.60  0.246  -0.0482 -0.0314 -0.0513 -0.146  -0.419   0.117 
#>  2 GEN   G10    2.47 -0.819  -0.421  -0.128  -0.266  -0.116   0.0575  0.0480
#>  3 GEN   G2     2.74  0.119   0.154  -0.585   0.376  -0.0810  0.202   0.0975
#>  4 GEN   G3     2.96  0.0437 -0.0793  0.187   0.138  -0.175   0.155  -0.370 
#>  5 GEN   G4     2.64 -0.243   0.393  -0.0723  0.113  -0.113  -0.279  -0.126 
#>  6 GEN   G5     2.54 -0.256   0.206   0.193   0.145   0.365   0.0546  0.187 
#>  7 GEN   G6     2.53 -0.0753  0.195   0.445   0.192   0.0685  0.0353  0.0180
#>  8 GEN   G7     2.74  0.247   0.458  -0.166  -0.565   0.172   0.103  -0.0930
#>  9 GEN   G8     3.00  0.404  -0.166   0.259  -0.136  -0.310   0.175   0.222 
#> 10 GEN   G9     2.51  0.334  -0.692  -0.102   0.0537  0.335  -0.0845 -0.0991
#> # ℹ 14 more rows
#> # ℹ 12 more variables: PC8 <dbl>, PC9 <dbl>, WAASB <dbl>, PctResp <dbl>,
#> #   PctWAASB <dbl>, wRes <dbl>, wWAASB <dbl>, OrResp <dbl>, OrWAASB <dbl>,
#> #   OrPC1 <dbl>, WAASBY <dbl>, OrWAASBY <dbl>

The output generated by the waasb() function is very similar to that generated by the waas() function. The main difference is that the singular value decomposition is based on the BLUP for GEI effects matrix. For more information, refer to (Olivoto et al. 2019).

Several indexes have been developed to identify genotypes that exhibit both high performance and stability in plant breeding programs. One example is the kangranksum index, which was developed by Kang (Kang 1988). This index combines yield and stability ranks based on the Shukla stability index. Olivoto (Olivoto et al. 2019), on the other hand, created the WAASBY index by assigning weights to yield and stability. Our rYWAASB index can be compared to these earlier indexes, as it follows a similar computational approach. However, our index (rYWAASB) is a powerful tool, because it incorporates a trait and WAASB rankings in the process.

2- How to apply the package

First, let’s examine the Y*WAASB biplot generated by the metan package. This will allow us to compare the results of the rYWAASB package with the Y*WAASB biplot. In the Y*WAASB or GY*WAASB biplot proposed by (Olivoto et al. 2019) (Fig. 1), the quadrants illustrate the stability and trait patterns (specifically, the grain yield of oat genotypes in the data_ge dataset) as follows:

  • Quadrant I: This quadrant consists of unstable genotypes or environments that have high discrimination ability but productivity below the grand mean.
  • Quadrant II: This quadrant includes unstable genotypes with productivity above the grand mean.
  • Quadrant III: This quadrant comprises low productive but stable genotypes.
  • Quadrant IV: This quadrant contains productive and broadly adapted genotypes.
plot_scores(waasb_model, type = 3)

Fig. 1: Y*WAASB biplot of metan package built-in oat data

In this section, we will utilize the built-in data maize to generate ranking scores for different genotypes, along with their corresponding plots. For further details, please refer to the ?maize documentation. It is also possible to use other datasets as long as they contain the following columns: genotype, trait, and WAASB index for genotypes. To understand how the HTML tables were created, please refer to the Rendering engine section.

data(maize)
head(maize)
#> # A tibble: 6 × 4
#>   GEN           Y WAASB WAASBY
#>   <chr>     <dbl> <dbl>  <dbl>
#> 1 Dracma     262. 0.811   81.6
#> 2 DKC6630    284. 2.20    88.5
#> 3 NS770      243. 0.327   71.4
#> 4 89 MAY 70  259. 1.96    72.2
#> 5 BOLSON     253. 1.15    72.9
#> 6 Sy Hydro   244. 0.974   67.6

Applying the rYWAASB package. Ranking table:

We recommend that users address (handle/overcome/substitute) any missing data in their inputs before proceeding with analyses. This is because the rank codes do not incorporate a comprehensive algorithm to handle this task.

data(maize)
ranki(maize)
#>          GEN  Y=Trait     WAASB WAASBY rY rWAASB rWAASBY rY+rWAASB rYWAASB
#> 1     Dracma 262.2230 0.8107018   81.6  5      3     2.0         8     1.0
#> 2    DKC6630 284.0391 2.2006718   88.5  1      8     1.0         9     2.0
#> 3      NS770 243.4864 0.3272558   71.4 10      1     7.0        11     3.0
#> 4  89 MAY 70 258.8993 1.9638360   72.2  6      7     6.0        13     5.0
#> 5     BOLSON 252.7882 1.1512639   72.9  8      5     5.0        13     5.0
#> 6   Sy Hydro 243.7789 0.9741668   67.6  9      4     8.0        13     5.0
#> 7     KSC704 234.5755 0.6564340   63.1 13      2    11.0        15     8.0
#> 8     NS6010 277.7849 3.1780198   78.0  2     13     3.0        15     8.0
#> 9   Sy Inove 276.2174 3.0619396   77.6  3     12     4.0        15     8.0
#> 10     ZP606 255.3309 2.6157585   65.7  7     10     9.0        17    10.0
#> 11     ZP600 265.8830 4.2201795   63.2  4     17    10.0        21    11.0
#> 12     BK 74 217.7637 1.2910417   47.3 16      6    13.5        22    12.0
#> 13 Mv Massil 228.5190 2.2487080   49.0 14      9    12.0        23    13.0
#> 14  Barekat2 239.1610 3.7488510   47.3 11     15    13.5        26    14.0
#> 15    KSC705 218.7118 3.2878162   35.7 15     14    16.0        29    15.0
#> 16   DKC6589 234.7383 4.4766285   39.7 12     18    15.0        30    16.5
#> 17  Sy Miami 212.3417 2.7648339   34.4 19     11    17.0        30    16.5
#> 18     Gazda 214.8292 3.8871065   29.3 18     16    18.0        34    18.0
#> 19   DKC7211 215.7097 4.7286087   24.7 17     19    19.0        36    19.0
#> 20   DKC6101 191.8700 6.0110405    0.0 20     20    20.0        40    20.0

In the table above, the genotype with the lowest rank (Dracma) is considered the best due to its high grain yield and low WAASB score.

Drawing the first plot bar_plot1:

data(maize)
bar_plot1(maize)

Fig.2. The first barplot of the rYWASSB package

Drawing the second plot bar_plot2:

data(maize)
bar_plot2(maize, verbose=TRUE)
#> Plot the bar chart with text features:

#>        [,1]
#>  [1,]  1.25
#>  [2,]  3.00
#>  [3,]  4.75
#>  [4,]  6.50
#>  [5,]  8.25
#>  [6,] 10.00
#>  [7,] 11.75
#>  [8,] 13.50
#>  [9,] 15.25
#> [10,] 17.00
#> [11,] 18.75
#> [12,] 20.50
#> [13,] 22.25
#> [14,] 24.00
#> [15,] 25.75
#> [16,] 27.50
#> [17,] 29.25
#> [18,] 31.00
#> [19,] 32.75
#> [20,] 34.50

Fig.3. The second barplot of the rYWASSB package

Performing PCA and textual and graphical representations (scree plot and PCA-dependent biplots (3 plots))

data(maize)
PCA_biplot(maize)
#> $`Coordinates of variables making a scatter plot`
#>              Dim.1      Dim.2       Dim.3        Dim.4        Dim.5
#> Y=Trait -0.8795116  0.4699999  0.06670390  0.023109283 -0.023999674
#> WAASBY  -0.9873211  0.0945990  0.12448256  0.009671875  0.025662431
#> rWAASB   0.7315848  0.6774116 -0.07122906  0.025521488  0.013121693
#> rWAASBY  0.9779744 -0.1480762  0.10777970  0.100077607 -0.002739954
#> rYWAASB  0.9742173  0.1601292  0.13167024 -0.088964078 -0.002762174
#> 
#> $`Squaring the variable coordinates: var.cos2 = var.coord^2`
#>             Dim.1      Dim.2       Dim.3        Dim.4        Dim.5
#> Y=Trait 0.7735406 0.22089995 0.004449410 5.340389e-04 5.759843e-04
#> WAASBY  0.9748030 0.00894897 0.015495908 9.354516e-05 6.585604e-04
#> rWAASB  0.5352164 0.45888653 0.005073579 6.513464e-04 1.721788e-04
#> rWAASBY 0.9564339 0.02192657 0.011616464 1.001553e-02 7.507345e-06
#> rYWAASB 0.9490993 0.02564137 0.017337053 7.914607e-03 7.629606e-06
#> 
#> $`The % contributions of the variables to the PCs`
#>            Dim.1     Dim.2     Dim.3      Dim.4      Dim.5
#> Y=Trait 18.46559 30.001213  8.243859  2.7801402 40.5092017
#> WAASBY  23.27002  1.215392 28.710794  0.4869844 46.3168062
#> rWAASB  12.77643 62.323023  9.400319  3.3908280 12.1094044
#> rWAASBY 22.83153  2.977926 21.522965 52.1395883  0.5279945
#> rYWAASB 22.65644  3.482446 32.122063 41.2024591  0.5365932
#> 
#> $`eigenvalues data`
#>        eigenvalue variance.percent cumulative.variance.percent
#> Dim.1 4.189093275      83.78186550                    83.78187
#> Dim.2 0.736303386      14.72606772                    98.50793
#> Dim.3 0.053972413       1.07944826                    99.58738
#> Dim.4 0.019209065       0.38418130                    99.97156
#> Dim.5 0.001421861       0.02843721                   100.00000

Figs.4-6. The PCA biplot with loadings for compare genotypes

Designing the cluster analysis on data and define the optimum number of clusters uning The Average Silhouette Method. The details of this method is given in the description and details parts of the function h_clust(). For running the cluster analysis by menas of shipunov package (courtesy of late professor Alexy Shipunov) act as:

 data(maize)
 maize <- as.data.frame(maize)
 row.names(maize) <- maize[, 1]
 maize[, 1] = NULL
 GEN <- row.names(maize)
 maize <- scale(maize)
 nbclust(maize, verbose = FALSE)
#> [1] 2
#> attr(,"class")
#> [1] "data frame"

 # Perform bootstrap or jackknife clustering by shipunov package. 
 # The examples should be run in the console manually due to 
 # problems occurs in the ORPHANED package "shipunov".
 #
 # 1- Bootstrap clustering:
 # data.jb <- Jclust(maize,
 #  method.d = "euclidean",
 #   method.c = "average", n.cl = 2,
 #    bootstrap = TRUE)
 #
 # plot.Jclust(data.jb, top=TRUE, lab.pos=1,
 #   lab.offset=1, lab.col=2, lab.font=2)
 # Fence(data.jb$hclust, GEN)
 #
 # data.jb <- Jclust(maize,
 #  method.d = "euclidean",
 #   method.c = "ward.D", n.cl = 2,
 #    bootstrap = TRUE)
 #
 # plot.Jclust(data.jb, top=TRUE, lab.pos=1,
 #  lab.offset=1, lab.col=2, lab.font=2)
 # Fence(data.jb$hclust, GEN)
 #
 #
 # if(verbose = TRUE):
 # cat("\nnumber of iterations:\n", data.jb$iter, "\n")
 #
 # For "bootstrap":
 # data.jb$mat <- as.matrix((data.jb$mat))
 # cat("\nmatrix of results:\n", data.jb$mat, "\n")
 # cat("clustering info, by eucledean distance measure:\n")
 # print(data.jb$hclust)
 # cat("groups:\n", data.jb$gr, "\n")
 # cat("\nsupport values:\n", data.jb$supp, "\n")
 # cat("\nnumber of clusters used:\n", data.jb$n.cl, "\n")
 #
 # 2- Jackknife clustering:
 # data.jb <- Bclust(maize,
 #   method.d = "euclidean", method.c = "average",
 #    bootstrap = FALSE)
 # plot(data.jb)
 #
 # data.jb <- Bclust(maize,
 #   method.d = "euclidean", method.c = "ward.D",
 #    bootstrap = FALSE)
 # plot(data.jb)
 #
 # if(verbose = TRUE):
 # For"jackknife":
 # cat("Consensus:\n", data.jb$consensus, "\n")
 # cat("Vlaues:\n", data.jb$values, "\n")

Figures 7-10. Cluster analysis by 1000 bootstrap iterations and determine optimum N clusters using The Average Silhouette algorithm.

References

Danakumara, Thippeswamy, Tapan Kumar, Neeraj Kumar, Basavanagouda Siddanagouda Patil, Chellapilla Bharadwaj, Umashankar Patel, Nilesh Joshi, et al. 2023. “A Multi-Model Based Stability Analysis Employing Multi-Environmental Trials (METs) Data for Discerning Heat Tolerance in Chickpea (Cicer Arietinum l.) Landraces.” Plants 12 (21): 3691.
Kang, M. S. 1988. “A Rank-Sum Method for Selecting High-Yielding, Stable Corn Genotypes.” Cereal Research Communications 16: 113–15.
Mishra, Smaranika, AVV Koundinya, TS Aghora, et al. 2024. “Stability Analysis to Identify Improved Lines of Cluster Bean (Cyamopsis Tetragonoloba l. Taub.).” Plant Genetic Resources 22 (3): 173–80.
Olivoto, Tiago, Alessandro DC Lúcio, José AG da Silva, Bruno G Sari, and Maria I Diel. 2019. “Mean Performance and Stability in Multi-Environment Trials II: Selection Based on Multiple Traits.” Agronomy Journal 111 (6): 2961–69.
Pour-Aboughadareh, Alireza, Marouf Khalili, Peter Poczai, and Tiago Olivoto. 2022. “Stability Indices to Deciphering the Genotype-by-Environment Interaction (GEI) Effect: An Applicable Review for Use in Plant Breeding Programs.” Plants 11 (3): 414.
Sakata, Wakuma Merga. 2021. “An Overview of Genotype x Environment Interaction and Yield Stability Analysis in Applied Plant Breeding: Great Emphasis Given to Coffee (Coffea Arabica l.).” International Journal of Agricultural Research, Innovation and Technology (IJARIT) 11 (2): 117–23.
SWARUP, INDU, and JAGDISH SINGH. 2014. “Stability Analysis in Safflower (Carthamus Tinctorious l.).” THE INDIAN SOCIETY OF OILSEEDS RESEARCH, 29.
  • true