Title: | Path Coefficient Analysis |
---|---|
Description: | Facilitates several analyses, including simple and sequential path coefficient analysis, correlation estimation, and the drawing of correlograms, heatmaps, and path diagrams. Path coefficient analysis can be conducted when raw data with one or more dependent variables and one or more independent variables are available. The package also allows testing of direct effects, which can be a vital indicator in path coefficient analysis. Dataset preparation is explained in detail in the vignette file "Path.Analysis_manual.Rmd", which can be found in the folders labelled "data" and "~/inst/extdata". Also see: 1) the 'lavaan' package, 2) a sample of sequential path analysis in 'metan' suggested by Olivoto and Lúcio (2020) <doi:10.1111/2041-210X.13384>, 3) the simple 'PATHSAS' macro written in 'SAS' by Cramer et al. (1999) <doi:10.1093/jhered/90.1.260>, and 4) the semPlot() function of 'OpenMx' as initial tools for conducting path coefficient analyses and SEM (Structural Equation Modeling). For a comprehensive understanding of path coefficient analysis, both in theory and practice, see the 'Minitab' macro developed by Arminian, A. in the paper by Arminian et al. (2008) <doi:10.1080/15427520802043182>. |
Authors: | Ali Arminian [aut, cre, cph] |
Maintainer: | Ali Arminian <[email protected]> |
License: | GPL-3 |
Version: | 0.1 |
Built: | 2024-10-26 05:13:57 UTC |
Source: | https://github.com/cran/Path.Analysis |
cor_plot()
draws a correlogram for data
cor_plot(datap)
datap |
The data set |
Returns an object of class gg, ggmatrix.
Ali Arminian [email protected]
Olivoto, T, and A Dal’Col Lúcio. 2020. “Metan: An R Package for Multi‐environment Trial Analysis.” Methods in Ecology and Evolution, 11(6): 783–89. https://doi.org/10.1111/2041-210X.13384.
correlogram, diagram, and lavaan packages for drawing path diagrams.
data(dtsimp)
cor_plot(dtsimp)
corr()
estimates Pearson correlation coefficients
among parametric numerical characteristics as follows:
The Pearson correlation coefficient:
\[
r_{x,y} = \frac{n\sum{xy}-(\sum{x})(\sum{y})}
{\sqrt{(n\sum{x^2}-(\sum{x})^2)(n\sum{y^2}-(\sum{y})^2)}}\]
or: \[ r_{x,y} =\frac{\sum(x-\bar{x})(y-\bar{y})} {\sqrt{\sum(x-\bar{x})^2\sum(y-\bar{y})^2}} \]
where \(r_{x,y}\) is the correlation coefficient
between \(x\) and \(y\) variables.
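As a quick check, the two forms of the Pearson formula can be evaluated by hand and compared with base R's `cor()`. This is a minimal sketch in base R; the vectors `x` and `y` are made-up illustrative values, not package data, and the code does not use or describe the internals of `corr()`.

```r
# Illustrative values only (hypothetical, not from the package data sets)
x <- c(2.1, 3.4, 4.0, 5.6, 6.3, 7.9)
y <- c(1.8, 2.9, 4.4, 5.1, 6.8, 7.2)
n <- length(x)

# Computational form: built from raw sums of products and squares
r_raw <- (n * sum(x * y) - sum(x) * sum(y)) /
  sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))

# Deviation form: built from centered values
dx <- x - mean(x)
dy <- y - mean(y)
r_dev <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Both forms should agree with each other and with cor()
all.equal(r_raw, r_dev)
all.equal(r_raw, cor(x, y))
```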
corr(datap, verbose = FALSE)
datap |
The data set |
verbose |
If |
The corr() function estimates the correlation coefficients among one or
more independent (exogenous) variables and a dependent (endogenous)
variable, and tests their significance. The results are returned as tables.
Returns a list of two objects:
the data frame of Pearson's correlation coefficients
the data frame of significance of correlation coefficients (r):
p: p-value for testing r
lowCI: lower limit of the confidence interval of r
uppCI: upper limit of the confidence interval of r
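The quantities listed above (r, its p-value, and the confidence limits) can be obtained for a single pair of variables with base R's `cor.test()`. This is a sketch only: the data are made up, the column names merely mirror those listed above, and whether `corr()` itself relies on `cor.test()` is an assumption that the documentation does not confirm.

```r
# Illustrative values only (hypothetical, not from the package data sets)
x <- c(2.1, 3.4, 4.0, 5.6, 6.3, 7.9)
y <- c(1.8, 2.9, 4.4, 5.1, 6.8, 7.2)

# Pearson correlation test with a 95% confidence interval
ct <- cor.test(x, y, method = "pearson", conf.level = 0.95)

# Collect the estimate, p-value, and confidence limits in one row
res <- data.frame(
  r     = unname(ct$estimate),
  p     = ct$p.value,
  lowCI = ct$conf.int[1],
  uppCI = ct$conf.int[2]
)
res
```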
Ali Arminian [email protected]
correlation
data(dtsimp)
corr(dtsimp, verbose = FALSE)
data(dtraw)
corr(dtraw[, -1], verbose = FALSE)
Prepares data for analyses
dataprep(datap)
datap |
dataset |
Returns a data frame
desc() estimates descriptive statistics of the data set, such as Min (minimum), 1st Qu. (1st quartile), Median, Mean (average), 3rd Qu. (3rd quartile), Max (maximum), var (variance), std.dev (standard deviation), and coef.var (CV or coefficient of variation).
desc(datap, resp)
datap |
The data set |
resp |
an integer value indicating the column
in |
The desc() function estimates the descriptive statistics, presented in
tables, of one or more independent (exogenous) variables and
a dependent (endogenous) variable. It acts only on numerical
variables.
For example, for the variable x:
1st quartile (position in the ordered data):
\[Q_1 = (n + 1) \times \frac{1}{4}\]
2nd quartile or Median:
\[md = (n + 1) \times \frac{2}{4}\]
3rd quartile:
\[Q_3 = (n + 1) \times \frac{3}{4}\]
Arithmetic mean:
\[\bar{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i}\]
Range:
\[R_x = \max(x) - \min(x)\]
Variance:
\[\sigma_{x}^2 = \frac{\sum_{i=1}^n(x_i-\bar{x})^2}{n}\]
Standard deviation:
\[sd_x = \sqrt{\frac{\sum_{i=1}^n (x_{i} - \bar{x})^2}{n}}\]
SEM or SE.mean, the standard error of the mean, is
calculated by dividing the standard deviation
by the square root of the sample size:
\[SEM_x = \frac{sd(x)}{\sqrt{n}}\]
coef.var or coefficient of variation:
\[CV = \frac{sd(x)}{\bar{x} }\times 100\]
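The statistics above can be reproduced in base R as a rough check. This sketch uses a made-up vector `x` and does not describe how `desc()` computes its output. Two points are worth noting: `quantile(..., type = 6)` places quartiles at positions based on (n + 1), and base R's `sd()` divides by n - 1, whereas the variance formula above divides by n. Which denominator the package uses is not stated here, so both are shown.

```r
# Illustrative values only (hypothetical, not from the package data sets)
x <- c(12, 15, 11, 18, 20, 14, 16)
n <- length(x)

# Quartiles located at positions (n + 1) * k / 4 in the ordered data
q <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 6)

# Standard deviation with the n denominator (as in the formula above)
sd_pop <- sqrt(sum((x - mean(x))^2) / n)

# Standard deviation with the n - 1 denominator (base R default)
sd_smp <- sd(x)

# Standard error of the mean and coefficient of variation (in percent)
sem <- sd_smp / sqrt(n)
cv  <- sd_smp / mean(x) * 100

list(quartiles = q, range = max(x) - min(x),
     sd_pop = sd_pop, sd_smp = sd_smp, SEM = sem, CV = cv)
```

The two standard deviations differ only by the factor sqrt((n - 1) / n), which becomes negligible for large samples.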
Returns a list of 3 objects:
Descriptive statistics1 of input data
Descriptive statistics2 of input data
A table of correlation coefficients
Ali Arminian [email protected]
Bhattacharyya GK and Johnson RA 1997. Statistical Concepts and Methods, John Wiley and Sons, New York.
Draper N and Smith H 1981. Applied Regression Analysis, John Wiley & Sons, New York.
Neter, J, Whitmore, GA, Wasserman, W 1992. Applied Statistics. Allyn & Bacon, Incorporated, ISBN 10: 0205134785 / ISBN 13: 9780205134786.
Snedecor, G.W., Cochran, W.G. 1980. Statistical Methods. Iowa State University Press.
correlation, multiple linear regression
data(dtsimp)
desc(dtsimp, 1)
data(dtraw)
desc(dtraw[, -1], 1)
data(heart)
desc(heart, 2)
Dataset 2: nine traits measured on 35 Camelina DH lines.
data(dtraw)
A data.frame with 35 observations of 9 variables.
DH lines: a character vector
y: a numeric vector
X1: a numeric vector
X2: a numeric vector
X3: a numeric vector
X4: a numeric vector
X5: a numeric vector
X6: a numeric vector
X7: a numeric vector
X8: a numeric vector
library(Path.Analysis)
data(dtraw)
Dataset 3: nine traits measured on 35 Camelina DH lines.
data(dtraw2)
A data.frame with 35 observations of 9 variables.
DH lines: a character vector considered as rownames
y: a numeric vector
X1: a numeric vector
X2: a numeric vector
X3: a numeric vector
X4: a numeric vector
X5: a numeric vector
X6: a numeric vector
X7: a numeric vector
X8: a numeric vector
library(Path.Analysis)
data(dtraw2)
Dataset 4: a data frame consisting of 7 variables measured on 8 observations.
data(dtseq)
A data.frame with 8 observations of 7 variables.
Genotypes: a character vector
YLD: a numeric vector
DFT: a numeric vector
FS: a numeric vector
FV: a numeric vector
FW: a numeric vector
DFL: a numeric vector
FLP: a numeric vector
library(Path.Analysis)
data(dtseq)
Dataset 5
data(dtseqr)
A data.frame with 24 observations of 7 variables.
Genotypes: a character vector
Rep: a numeric vector
YLD: a numeric vector
DFT: a numeric vector
FS: a numeric vector
FV: a numeric vector
FW: a numeric vector
DFL: a numeric vector
FLP: a numeric vector
library(Path.Analysis)
data(dtseqr)
Dataset 1: a dependent (y) and 3 independent (x1 to x3) variables.
data(dtsimp)
A data.frame with 105 observations of 4 variables.
y: a numeric vector
x1: a numeric vector
x2: a numeric vector
x3: a numeric vector
library(Path.Analysis)
data(dtsimp)
A mixed-variable dataset containing 14 variables recorded on 297 patients for heart disease diagnosis.
data(heart)
A data.frame including 297 rows and 14 variables:
Age in years (numerical).
Sex: 1 = male, 0 = female (logical).
heart.disease: a numeric vector as dependent.
biking: a numeric vector as the first independent.
smoking: a numeric vector as the 2nd independent.
The data set belongs to the UCI Machine Learning Repository. The original data set includes 303 patients with 6 NAs. After removing missing values, it was reduced to 297 patients.
https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Lichman, M. (2013). UCI machine learning repository.
library(Path.Analysis)
data(heart)
Heatmap chart
heat_map() draws a double-clustered heatmap for path coefficient analysis. Note
that this function acts only on numeric variables/columns (see the example on
the dtraw2 data set).
To draw other types of heatmaps, users may use the heatmap.3, ComplexHeatmap,
and pheatmap R packages. An example is given in the vignette manual of this
package (Path.Analysis_manual.Rmd).
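The idea of a double-clustered heatmap, where rows and columns are each reordered by hierarchical clustering, can be sketched in base R. This is not the implementation of `heat_map()`: the matrix `m` is simulated, and `stats::heatmap()` stands in for the heatmap.2 object that the package is documented to return.

```r
# Simulated numeric data (hypothetical: 8 lines measured on 5 traits)
set.seed(1)
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("line", 1:8), paste0("X", 1:5)))

# Standardize columns so that traits are comparable
z <- scale(m)

# Cluster rows and columns separately (the "double" clustering)
row_hc <- hclust(dist(z))
col_hc <- hclust(dist(t(z)))

# Matrix reordered according to both clusterings
z_ord <- z[row_hc$order, col_hc$order]

# Draw the heatmap with both dendrograms on a null device (no file written)
grDevices::pdf(NULL)
heatmap(z, Rowv = as.dendrogram(row_hc), Colv = as.dendrogram(col_hc),
        scale = "none")
invisible(grDevices::dev.off())
```

Passing non-numeric columns to `dist()` or `scale()` would fail or give meaningless results, which is consistent with the restriction to numeric columns noted above.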
heat_map(datap)
datap |
The data set |
Returns an object of class heatmap.2.
Ali Arminian [email protected]
lavaan and diagram packages for drawing path diagrams.
data(dtraw2)
dtraw2 <- scale(as.data.frame(dtraw2))
heat_map(dtraw2)
matdiag() extracts the matrices of direct and indirect effects in path
analysis, along with the significance of the direct effects. Direct effects
are shown as a vector (a columnar matrix of 1*n dimensions), and indirect
effects are the off-diagonal effects. It then draws a diagram
for path coefficient analysis based on the DiagrammeR package.
matdiag(datap, resp, verbose = FALSE)
datap |
The data set |
resp |
The response variable |
verbose |
If |
The matdiag function estimates the direct and indirect effects in path
coefficient analysis as tables and draws the path diagram.
This is apparently the only program that tests the significance of direct effects
in a path analysis. Note: all variables must be numeric for the matrix calculations
and the subsequent plotting.
In a path model, path coefficients or direct effects (Pi's) indicate the direct effect of one variable on another. They are standardized partial regression coefficients (in Wright's terminology) because they are estimated from correlations or from the transformed (standardized) data.
The path equations are as follows, where \(r_{ij}\) is the correlation between independent variables \(i\) and \(j\), and \(r_{Yi}\) is the correlation between the dependent variable and independent variable \(i\):
One dependent variable: \[P_1 + P_2r_{12} + P_3r_{13} + ... + P_nr_{1n} = r_{Y1}\] \[P_1r_{21} + P_2 + P_3r_{23} + ... + P_nr_{2n} = r_{Y2}\] \[...\] \[P_1r_{n1} + P_2r_{n2} + P_3r_{n3} + ... + P_n = r_{Yn}\]
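In matrix form, the system above reads R P = r, where R is the correlation matrix of the independent variables and r is the vector of their correlations with the dependent variable. The direct effects can therefore be obtained by solving a linear system. The following base R sketch uses simulated data and hypothetical variable names; it illustrates the equations and is not the code of `matdiag()`. The residual effect is computed with the commonly used expression sqrt(1 - R^2), which is an assumption about what the returned "constant of residuals" represents.

```r
# Simulated data (hypothetical; x1..x3 independent, y dependent)
set.seed(42)
n  <- 60
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- rnorm(n)
y  <- 1.2 * x1 - 0.7 * x2 + 0.4 * x3 + rnorm(n)
X  <- cbind(x1, x2, x3)

Rxx <- cor(X)      # correlations among independent variables
rxy <- cor(X, y)   # correlations of each independent variable with y

# Direct effects (path coefficients): solve Rxx %*% P = rxy
P <- as.vector(solve(Rxx, rxy))

# Cross-check: standardized regression coefficients give the same values
ys <- as.vector(scale(y))
Xs <- scale(X)
b_std <- unname(coef(lm(ys ~ Xs - 1)))

# Effects matrix: entry [i, j] is r_ij * P_j, so the diagonal holds the
# direct effects and the off-diagonal entries hold the indirect effects
E <- sweep(Rxx, 2, P, FUN = "*")

# Each row of E sums to the correlation of that variable with y
rowSums(E)

# Residual effect, assuming the usual definition sqrt(1 - R^2)
residual <- sqrt(1 - sum(P * rxy))
```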
Extension to more dependent variables:
Path.Analysis is capable of performing this function in a straightforward way,
as explained in detail below. The linear regression
model with a single response takes the following form (Johnson
and Wichern, 2007):
\(Y = \beta_0 + \beta_1Z_1 + ... + \beta_rZ_r + \epsilon\)
where the multiple linear regression model, written out for \(n\) observations, is as follows: \[Y_1 = \beta_0 + \beta_1Z_{11} + \beta_2Z_{12} + ... + \beta_rZ_{1r} + \epsilon_1\] \[Y_2 = \beta_0 + \beta_1Z_{21} + \beta_2Z_{22} + ... + \beta_rZ_{2r} + \epsilon_2\] \[...\] \[Y_n = \beta_0 + \beta_1Z_{n1} + \beta_2Z_{n2} + ... + \beta_rZ_{nr} + \epsilon_n\]
As stated by Bondari (1990), for two dependent variables \(Y_1\) and \(Y_2\): \[ Y_1 = p_1X_1 + p_2X_2 + p_3X_3 + ... + p_nX_n \] \[ Y_2 = p'_1X_1 + p'_2X_2 + p'_3X_3 + ... + p'_nX_n \] \[ ... \]
where: \[ r_{Y_1Y_2} = p_1p'_1 + p_2p'_2 + p_3p'_3 + ... + p_np'_n + \sum_{i \neq j}p_ip'_jr_{ij} = \sum_{i,j}p_ip'_jr_{ij} \]
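The identity for the correlation between two dependent variables can be checked numerically. The sketch below is in base R with simulated data, and it assumes that both dependent variables are exact linear combinations of the standardized independent variables (no residual term), as in the two equations above. All names and weights are hypothetical.

```r
# Simulated independent variables (hypothetical), two of them correlated
set.seed(7)
X <- matrix(rnorm(50 * 3), ncol = 3)
X[, 2] <- X[, 2] + 0.6 * X[, 1]
Xs <- scale(X)   # standardized independent variables
R  <- cor(X)     # correlations r_ij among them

# Two dependent variables built as exact linear combinations
a  <- c(0.8, -0.3, 0.5)
b  <- c(0.2, 0.9, -0.4)
y1 <- as.vector(Xs %*% a)
y2 <- as.vector(Xs %*% b)

# Path coefficients on the standardized scale of y1 and y2
p  <- a / sd(y1)
pp <- b / sd(y2)

# Left-hand side: the observed correlation between y1 and y2
lhs <- cor(y1, y2)

# Right-hand side: sum over all i, j of p_i * p'_j * r_ij
terms <- outer(p, pp) * R
rhs   <- sum(terms)

# Same sum split into the i = j part and the i != j part
rhs_split <- sum(diag(terms)) + sum(terms[row(terms) != col(terms)])

all.equal(lhs, rhs)
```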
Returns a list with three objects
a data frame of direct effects
a matrix of direct and indirect effects
a constant of residuals
Ali Arminian [email protected]
Arminian, A, MS Kang, M Kozak, S Houshmand, and P Mathews. 2008. “MULTPATH: A Comprehensive Minitab Program for Computing Path Coefficients and Multiple Regression for Multivariate Analyses.” Journal of Crop Improvement, 22(1): 82–120.
Bondari, K. 1990. "PATH ANALYSIS IN AGRICULTURAL RESEARCH," Conference on Applied Statistics in Agriculture. https://doi.org/10.4148/2475-7772.1439
Cramer, C.S, TC Wehner, and SB Donaghy. 1999. “PATHSAS: A SAS Computer Program for Path Coefficient Analysis of Quantitative Data.” Journal of Heredity, 90(1): 260–62. https://doi.org/10.1093/jhered/90.1.260.
Johnson, R.A., Wichern, D.W. 2007. Applied Multivariate Statistical Analysis. Prentice Hall, USA.
Li, C.C. 1975. Path Analysis: A Primer. Boxwood Pr. 346 p.
Olivoto, T, and A Dal’Col Lúcio. 2020. “Metan: An R Package for Multi‐environment Trial Analysis.” Methods in Ecology and Evolution, 11(6): 783–89. https://doi.org/10.1111/2041-210X.13384.
Wolfle, LM. 2003. “The Introduction of Path Analysis to the Social Sciences, and Some Emergent Themes: An Annotated Bibliography.” Structural Equation Modeling, 10(1): 1–34.
Wright, S. 1923. “The Theory of Path Coefficients a Reply to Niles’s Criticism.” Genetics, 8(3): 239.
———. 1934. “The Method of Path Coefficients.” The Annals of Mathematical Statistics, 5(3): 161–215.
———. 1960. “Path Coefficients and Path Regressions: Alternative or Complementary Concepts?” Biometrics, 16(2): 189–202.
correlation, multiple linear regression, and matrix notations in mathematics.
lavaan and DiagrammeR packages for drawing path diagrams.
data(dtsimp)
matdiag(dtsimp, 1, verbose = FALSE)
data(dtraw)
matdiag(dtraw[, -1], 1, verbose = FALSE)
data(heart)
matdiag(heart, 2, verbose = FALSE)
network.plot()
draws the network plot of path coefficients analysis
network.plot(datap)
datap |
The data set |
The network.plot() function draws a correlogram and a heatmap
for the data, if requested by the user.
Returns an object of class network_plot.
Ali Arminian [email protected]
Kuhn et al. 2022. corrr package. doi:10.32614/CRAN.package.corrr. https://github.com/tidymodels/corrr
correlogram, diagram, and lavaan packages for drawing path diagrams.
data(dtraw2)
network.plot(dtraw2)
Path.Analysis computes descriptive statistics on a dataset and, importantly, provides graphical representations of the data such as heatmaps, correlograms, and path diagrams.
Ali Arminian [email protected]
Useful links:
Report bugs at https://github.com/abeyran/Path.Analysis/issues
reg()
performs a multiple linear regression analysis and extracts the associated parameters
reg(datap, resp, verbose = FALSE)
datap |
The data set |
resp |
an integer value indicating the column in |
verbose |
If |
The reg function fits a multiple linear regression
of a dependent (endogenous) variable on one or more independent (exogenous)
variables and tests the significance of the
parameters. Depending on the type of data, it may produce warnings, e.g., for dtsimp:
Warning message: In summary.lm(mlreg): essentially perfect fit: summary may be unreliable.
This warning is due to the intrinsic characteristics of the data.
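For orientation, a multiple linear regression with significance tests can be fitted in base R with `lm()` and `summary()`. The sketch below uses simulated data with hypothetical column names. It does not reproduce the internals or the return value of `reg()`. The second fit uses a response with no noise, a situation in which `summary.lm()` may emit the "essentially perfect fit" warning quoted above, so the warning is caught rather than printed.

```r
# Simulated data (hypothetical): y depends linearly on x1 and x2 plus noise
set.seed(3)
d <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
d$y <- 2 + 1.5 * d$x1 - 0.8 * d$x2 + rnorm(40, sd = 0.5)

# Fit the model and extract the usual parameters
fit   <- lm(y ~ x1 + x2, data = d)
s     <- summary(fit)
coefs <- s$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
r2    <- s$r.squared      # coefficient of determination

# A noise-free response: summary() may warn about an essentially perfect fit
d$y0 <- 2 + 1.5 * d$x1 - 0.8 * d$x2
fit0 <- lm(y0 ~ x1 + x2, data = d)
warned <- FALSE
s0 <- withCallingHandlers(
  summary(fit0),
  warning = function(w) {
    warned <<- TRUE              # record that a warning was raised
    invokeRestart("muffleWarning")
  }
)

coefs
```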
An object of class list
Ali Arminian [email protected]
multiple linear regression
data(dtsimp)
reg(dtsimp, 1, verbose = FALSE)
data(heart)
reg(heart, 1, verbose = FALSE)