Package 'Path.Analysis'

Title: Path Coefficient Analysis
Description: Facilitates several analyses, including simple and sequential path coefficient analysis, correlation estimation, and drawing correlograms, heatmaps, and path diagrams. Path coefficient analysis can be conducted when raw data with one or more dependent variables and one or more independent variables are available. It allows for testing direct effects, which can be a vital indicator in path coefficient analysis. The rules for preparing the dataset are explained in detail in the vignette file "Path.Analysis_manual.Rmd". You can find this in the folders labelled "data" and "~/inst/extdata". Also see: 1) the 'lavaan' package, 2) a sample of sequential path analysis in 'metan' suggested by Olivoto and Lúcio (2020) <doi:10.1111/2041-210X.13384>, 3) the simple 'PATHSAS' macro written in 'SAS' by Cramer et al. (1999) <doi:10.1093/jhered/90.1.260>, and 4) the semPlot() function of 'OpenMx' as initial tools for conducting path coefficient analyses and SEM (Structural Equation Modeling). To gain a comprehensive understanding of path coefficient analysis, both in theory and practice, see the 'Minitab' macro developed by Arminian, A. in the paper by Arminian et al. (2008) <doi:10.1080/15427520802043182>.
Authors: Ali Arminian [aut, cre, cph]
Maintainer: Ali Arminian <[email protected]>
License: GPL-3
Version: 0.1
Built: 2024-10-26 05:13:57 UTC
Source: https://github.com/cran/Path.Analysis


Drawing the correlogram

Description

[Stable]

  • cor_plot() draws a correlogram for the data

Usage

cor_plot(datap)

Arguments

datap

The data set

Value

Returns an object of class gg, ggmatrix.

Author(s)

Ali Arminian [email protected]

References

Olivoto, T, and A Dal’Col Lúcio. 2020. “Metan: An R Package for Multi‐environment Trial Analysis.” Methods in Ecology and Evolution, 11(6): 783–89. https://doi.org/10.1111/2041-210X.13384.

See Also

correlogram, diagram, and lavaan package for drawing path diagrams.

Examples

data(dtsimp)
cor_plot(dtsimp)

Correlation Analysis

Description

[Stable]

  • corr() estimates Pearson correlation coefficients among parametric numerical characteristics as follows:

  • The Pearson correlation coefficient: \[ r_{x,y} = \frac{n\sum{xy}-(\sum{x})(\sum{y})} {\sqrt{(n\sum{x^2}-(\sum{x})^2)(n\sum{y^2}-(\sum{y})^2)}}\]

or: \[ r_{x,y} = \frac{\sum(x-\bar{x})(y-\bar{y})} {\sqrt{\sum(x-\bar{x})^2 \sum(y-\bar{y})^2}} \]

where \(r_{x,y}\) is the correlation coefficient between the \(x\) and \(y\) variables.
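The two forms of the formula above are algebraically equivalent. As a minimal sketch in base R (the vectors x and y are made-up illustration values, not one of the package data sets, and this is not necessarily how corr() computes the coefficient internally), both can be checked against stats::cor():

```r
# Made-up illustration data (an assumption, not a package data set)
x <- c(2.1, 3.4, 4.0, 5.6, 6.3, 7.8)
y <- c(1.9, 3.0, 4.4, 5.1, 6.9, 7.2)
n <- length(x)

# First form: raw sums and sums of squares
r_sums <- (n * sum(x * y) - sum(x) * sum(y)) /
  sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))

# Second form: deviations from the means
r_dev <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Reference value from base R
r_base <- cor(x, y, method = "pearson")

# Both comparisons should report TRUE (up to floating-point tolerance)
all.equal(r_sums, r_dev)
all.equal(r_sums, r_base)
```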

Usage

corr(datap, verbose = FALSE)

Arguments

datap

The data set

verbose

If verbose = TRUE then some results are printed in the console.

Details

The corr() function estimates the correlation coefficients between one or more independent (exogenous) variables and a dependent (endogenous) variable, and reports them in a table together with their significance tests.
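The significance test and confidence limits for a single coefficient can be reproduced with base R. The sketch below uses stats::cor.test() on simulated data; whether corr() relies on cor.test() internally is an assumption, and only the column names p, lowCI, and uppCI are taken from the Value section of this page.

```r
# Simulated illustration data (an assumption, not a package data set)
set.seed(1)
n <- 30
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)

# Pearson correlation with a two-sided test and a 95% confidence interval
ct <- cor.test(x, y, method = "pearson", conf.level = 0.95)
res <- data.frame(r     = unname(ct$estimate),
                  p     = ct$p.value,
                  lowCI = ct$conf.int[1],
                  uppCI = ct$conf.int[2])

# The same quantities by hand: t test with n - 2 degrees of freedom,
# and a confidence interval from Fisher's z transformation
r      <- cor(x, y)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_man  <- 2 * pt(-abs(t_stat), df = n - 2)
ci_man <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

res
```

The manual values p_man and ci_man should agree with the columns of res up to floating-point tolerance.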

Value

Returns a list of two objects:

Correlations

the data frame of Pearson's correlation coefficients

P_values

the data frame of significance of correlation coefficients (r):

  • p p-value for testing the r

  • lowCI lower confidence interval of r

  • uppCI upper confidence interval of r

Author(s)

Ali Arminian [email protected]

See Also

correlation

Examples

data(dtsimp)
corr(dtsimp, verbose = FALSE)


data(dtraw)
corr(dtraw[, -1], verbose = FALSE)

Data preparation

Description

[Experimental]

Prepares data for analyses

Usage

dataprep(datap)

Arguments

datap

dataset

Value

Returns a data frame


Descriptive statistics

Description

[Experimental]

  • desc() estimates descriptive statistics of the data set: Min (minimum), 1st Qu. (first quartile), Median, Mean (average), 3rd Qu. (third quartile), Max (maximum), var (variance), std.dev (standard deviation), and coef.var (CV, the coefficient of variation).

Usage

desc(datap, resp)

Arguments

datap

The data set

resp

an integer value indicating the column in datap that corresponds to the response variable.

Details

The desc() function estimates the descriptive statistics and returns them as tables for one or more independent (exogenous) variables and a dependent (endogenous) variable. It acts only on numerical variables. For example, for a variable x with n observations (the quartile formulas give the position of each quartile in the sorted data):

  • 1st quartile: \[Q_1 = (n + 1) \times \frac{1}{4}\]

  • 2nd quartile or Median: \[md = (n + 1) \times \frac{2}{4}\]

  • 3rd quartile: \[Q_3 = (n + 1) \times \frac{3}{4}\]

  • Arithmetic mean: \[\bar{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i}\]

  • Range: \[R_x = \max(x) - \min(x)\]

  • Variance: \[\sigma_{x}^2 = \frac{\sum_{i=1}^n(x_i-\bar{x})^2}{n} \]

  • Standard deviation: \[sd_x = \sqrt{\frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^2}{n}}\]

  • SEM or SE.mean, the standard error of the mean, is calculated by dividing the standard deviation by the square root of the sample size: \[SEM_x = \frac{sd(x)}{\sqrt{n}}\]

  • coef.var or coefficient of variation: \[CV = \frac{sd(x)}{\bar{x} }\times 100\]
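The formulas above can be worked through in base R. This is a minimal sketch on a made-up vector x, following the formulas as written on this page; it is not a claim about what desc() does internally. The helper value_at() is hypothetical and reads a value at a fractional position by linear interpolation. Note that base R's var() and sd() divide by n - 1 rather than n, and that quantile() and summary() default to a different quartile rule (type 7); the (n + 1) position rule corresponds to quantile(..., type = 6).

```r
# Made-up illustration data (an assumption, not a package data set)
x  <- c(12, 15, 11, 18, 20, 14, 16, 13, 19)
n  <- length(x)
xs <- sort(x)

# Positions of Q1, median, and Q3 in the sorted data: (n + 1) * k / 4
pos <- (n + 1) * c(1, 2, 3) / 4

# Hypothetical helper: value at a (possibly fractional) position,
# interpolating linearly between the two neighbouring order statistics
value_at <- function(p) {
  lo <- floor(p)
  hi <- ceiling(p)
  xs[lo] + (p - lo) * (xs[hi] - xs[lo])
}
quartiles <- sapply(pos, value_at)

x_bar <- mean(x)                    # arithmetic mean
rng   <- max(x) - min(x)            # range
var_n <- sum((x - x_bar)^2) / n     # variance with divisor n, as in the formula
sd_n  <- sqrt(var_n)                # standard deviation with divisor n
sem   <- sd_n / sqrt(n)             # standard error of the mean
cv    <- sd_n / x_bar * 100         # coefficient of variation, in percent

# Base R's var() uses the divisor n - 1, so it is slightly larger than var_n
var_n1 <- var(x)
```

With n = 9 the positions are 2.5, 5, and 7.5, so Q1 and Q3 fall halfway between two sorted values while the median is the fifth sorted value.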

Value

Returns a list of 3 objects:

desc1

Descriptive statistics1 of input data

desc2

Descriptive statistics2 of input data

corcf

A table of correlation coefficients

Author(s)

Ali Arminian [email protected]

References

Bhattacharyya GK and Johnson RA 1997. Statistical Concepts and Methods, John Wiley and Sons, New York.

Draper N and Smith H 1981. Applied Regression Analysis, John Wiley & Sons, New York.

Neter, J, Whitmore, GA, Wasserman, W 1992. Applied Statistics. Allyn & Bacon, Incorporated, ISBN 10: 0205134785 / ISBN 13: 9780205134786.

Snedecor, G.W., Cochran, W.G. 1980. Statistical Methods. Iowa State University Press.

See Also

correlation, multiple linear regression

Examples

data(dtsimp)
desc(dtsimp, 1)


data(dtraw)
desc(dtraw[, -1], 1)

data(heart)
desc(heart, 2)

Dataset 2: nine traits measured on 35 Camelina DH lines.

Description

Dataset 2: nine traits measured on 35 Camelina DH lines.

Usage

data(dtraw)

Format

A data.frame with 35 observations of 9 variables.

⁠DH lines⁠

a character vector

y

a numeric vector

X1

a numeric vector

X2

a numeric vector

X3

a numeric vector

X4

a numeric vector

X5

a numeric vector

X6

a numeric vector

X7

a numeric vector

X8

a numeric vector

Examples

library(Path.Analysis)
data(dtraw)

Dataset 3: nine traits measured on 35 Camelina DH lines.

Description

Dataset 3: nine traits measured on 35 Camelina DH lines.

Usage

data(dtraw2)

Format

A data.frame with 35 observations of 9 variables.

⁠DH lines⁠

a character vector considered as rownames

y

a numeric vector

X1

a numeric vector

X2

a numeric vector

X3

a numeric vector

X4

a numeric vector

X5

a numeric vector

X6

a numeric vector

X7

a numeric vector

X8

a numeric vector

Examples

library(Path.Analysis)
data(dtraw2)

Dataset 4: a dataframe consisting of 7 variables measured on 8 observations.

Description

Dataset 4: a dataframe consisting of 7 variables measured on 8 observations.

Usage

data(dtseq)

Format

A data.frame with 8 observations of 7 variables.

Genotypes

a character vector

YLD

a numeric vector

DFT

a numeric vector

FS

a numeric vector

FV

a numeric vector

FW

a numeric vector

DFL

a numeric vector

FLP

a numeric vector

Examples

library(Path.Analysis)
data(dtseq)

Dataset 5

Description

Dataset 5

Usage

data(dtseqr)

Format

A data.frame with 24 observations of 7 variables.

Genotypes

a character vector

Rep

a numeric vector

YLD

a numeric vector

DFT

a numeric vector

FS

a numeric vector

FV

a numeric vector

FW

a numeric vector

DFL

a numeric vector

FLP

a numeric vector

Examples

library(Path.Analysis)
data(dtseqr)

Dataset 1: a dependent variable (y) and 3 independent variables (x1 to x3).

Description

Dataset 1: a dependent variable (y) and 3 independent variables (x1 to x3).

Usage

data(dtsimp)

Format

A data.frame with 105 observations of 4 variables.

y

a numeric vector

x1

a numeric vector

x2

a numeric vector

x3

a numeric vector

Examples

library(Path.Analysis)
data(dtsimp)

Dataset 6: Heart Disease data set

Description

A mixed variable dataset containing 14 variables of 297 patients for their heart disease diagnosis.

Usage

data(heart)

Format

A data.frame including 297 rows and 14 variables:

age

Age in years (numerical).

sex

Sex: 1 = male, 0 = female (logical).

heart.disease

a numeric vector as dependent.

biking

a numeric vector as the first independent.

smoking

a numeric vector as the 2nd independent.

Source

The data set belongs to the UCI Machine Learning Repository. The original data set includes 303 patients with 6 NA's. After removing the missing values, 297 patients remain.

https://archive.ics.uci.edu/ml/datasets/Heart+Disease

References

Lichman, M. (2013). UCI machine learning repository.

Examples

library(Path.Analysis)
data(heart)

Creating the Heatmap chart

Description

[Stable]

  • heat_map() draws a double-clustered heatmap for path coefficient analysis. Note that this function acts only on numeric variables/columns (see the example on the dtraw2 data set). To draw other types of heatmaps, users may use the heatmap.3, ComplexHeatmap, and pheatmap R packages; an example is given in the vignette manual of this package (Path.Analysis_manual.Rmd).
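Because only numeric columns are accepted, a mixed data frame has to be reduced to its numeric part before plotting. The sketch below shows one way to do that in base R and to compute the two clusterings (rows and columns) that a double-clustered heatmap is built from. The data frame df and its column names are hypothetical, and the use of scale(), dist(), and hclust() with their defaults is an assumption, not a description of heat_map() internals.

```r
# Hypothetical mixed data frame (one label column, three numeric columns)
df <- data.frame(
  line = paste0("L", 1:6),
  y    = c(10.2, 11.5, 9.8, 12.1, 10.9, 11.7),
  x1   = c(3.1, 3.6, 2.9, 3.9, 3.3, 3.7),
  x2   = c(55, 48, 60, 45, 52, 47),
  stringsAsFactors = FALSE
)

# Keep numeric columns only and carry the labels over as row names
num <- df[sapply(df, is.numeric)]
rownames(num) <- df$line

# Standardize each column to mean 0 and standard deviation 1
z <- scale(num)

# Cluster the rows (observations) and the columns (variables) separately
row_clust <- hclust(dist(z))
col_clust <- hclust(dist(t(z)))

# Display order of rows and columns implied by the two dendrograms
rownames(z)[row_clust$order]
colnames(z)[col_clust$order]
```

The standardized matrix z can then be passed to a heatmap function; for example, stats::heatmap(z) in base R draws a heatmap with row and column dendrograms.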

Usage

heat_map(datap)

Arguments

datap

The data set

Value

Returns an object of class heatmap.2.

Author(s)

Ali Arminian [email protected]

See Also

lavaan and diagram packages for drawing path diagrams.

Examples

data(dtraw2)
dtraw2 <- scale(as.data.frame(dtraw2))
heat_map(dtraw2)

Direct and Indirect Effects Matrices and Diagram

Description

[Stable]

  • matdiag() extracts the direct effect and indirect effect matrices of the data in path analysis, along with the significance of the direct effects. Direct effects are shown as a vector (columnar matrix of 1*n dimensions) and indirect effects are the off-diagonal effects. It then draws a diagram for path coefficient analysis based on the DiagrammeR package.

Usage

matdiag(datap, resp, verbose = FALSE)

Arguments

datap

The data set

resp

The response variable

verbose

If verbose = TRUE then some results are printed

Details

The matdiag() function estimates the direct and indirect effects in path coefficient analysis as tables and draws the path analysis diagram. This is apparently the only program testing the significance of direct effects in a path analysis. Note: all variables must be numeric for the matrix calculations and the subsequent plotting.

  • In a path model, path coefficients or direct effects (Pi's) indicate the direct effect of one variable on another. They are standardized partial regression coefficients (in Wright's terminology) because they are estimated from correlations or from the transformed (standardized) data as:

\[P_i = \beta_i\frac{\sigma_{X_i}}{\sigma_Y} \]
  • The path equations are as follows:

  • One dependent variable: \[P_1 + P_2r_{12} + P_3r_{13} + ... + P_nr_{1n} = r_{Y1}\] \[P_1r_{21} + P_2 + P_3r_{23} + ... + P_nr_{2n} = r_{Y2}\] \[...\] \[P_1r_{n1} + P_2r_{n2} + P_3r_{n3} + ... + P_n = r_{Yn}\]

  • Extension to more dependent variables: Path.Analysis is capable of handling this case in a straightforward way. The linear regression model with a single response has the following form (Johnson and Wichern, 2007): \(Y = \beta_0 + \beta_1Z_1 + ... + \beta_rZ_r + \epsilon\)

    where the multivariate multiple linear regression model is as follows: \[Y_1 = \beta_0 + \beta_1Z_{11} + \beta_2Z_{12} + ... + \beta_rZ_{1r} + \epsilon_1\] \[Y_2 = \beta_0 + \beta_1Z_{21} + \beta_2Z_{22} + ... + \beta_rZ_{2r} + \epsilon_2\] \[...\] \[Y_n = \beta_0 + \beta_1Z_{n1} + \beta_2Z_{n2} + ... + \beta_rZ_{nr} + \epsilon_n\]

    As stated by Bondari (1990), for two dependent variables \(Y_1\) and \(Y_2\): \[ Y_1 = p_1X_1 + p_2X_2 + p_3X_3 + ... + p_nX_n \] \[ Y_2 = p'_1X_1 + p'_2X_2 + p'_3X_3 + ... + p'_nX_n \] \[ ... \]

where: \[ r_{Y_1Y_2} = p_1p'_1 + p_2p'_2 + p_3p'_3 + ... + p_np'_n + \sum_{i \neq j}p_ip'_jr_{ij} = \sum_{i,j}p_ip'_jr_{ij} \] (the last equality uses \(r_{ii} = 1\)).
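For one dependent variable, the path equations form a linear system: the correlation matrix among the predictors, multiplied by the vector of direct effects, equals the vector of correlations between each predictor and Y. The sketch below solves that system in base R on simulated data and checks the result against the standardized regression coefficients \(P_i = \beta_i \sigma_{X_i} / \sigma_Y\). The data and variable names are assumptions for illustration, and this is not presented as the internal implementation of matdiag().

```r
# Simulated illustration data (an assumption, not a package data set)
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- rnorm(n)
y  <- 1.0 * x1 + 0.5 * x2 - 0.7 * x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# Correlation matrix, split into predictor block and predictor-response part
R   <- cor(dat)
Rxx <- R[-1, -1]   # r_ij among x1, x2, x3
rxy <- R[-1, 1]    # r_Yi between each predictor and y

# Path equations in matrix form: Rxx %*% P = rxy, solved for P
P <- solve(Rxx, rxy)

# Same direct effects from the raw regression coefficients,
# rescaled by sd(X_i) / sd(Y)
fit   <- lm(y ~ x1 + x2 + x3, data = dat)
beta  <- coef(fit)[-1]
P_reg <- beta * sapply(dat[-1], sd) / sd(dat$y)

# Should report TRUE (up to floating-point tolerance)
all.equal(unname(P), unname(P_reg))
```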

Value

Returns a list with three objects

direff

a data frame of direct effects

matall

a matrix of direct and indirect effects

Residual

the residual effect, returned as a single constant
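As an illustration of how direct effects, indirect effects, and the residual relate to each other, the sketch below builds an effects matrix in base R from simulated data. It assumes the common convention that the indirect effect of X_i on Y through X_j is r_ij * P_j and that the residual effect is sqrt(1 - R^2); the layout of the matrix (rows are variables, columns are the paths they act through) is also an assumption and may differ from the matall object returned by matdiag().

```r
# Simulated illustration data (an assumption, not a package data set)
set.seed(7)
n  <- 40
x1 <- rnorm(n)
x2 <- 0.4 * x1 + rnorm(n)
y  <- 0.8 * x1 + 0.3 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

R   <- cor(dat)
Rxx <- R[-1, -1]          # correlations among predictors
rxy <- R[-1, 1]           # correlations of predictors with y
P   <- solve(Rxx, rxy)    # direct effects (path coefficients)

# Effects matrix: entry [i, j] = r_ij * P_j.
# Diagonal entries are the direct effects (r_ii = 1);
# off-diagonal entries are the indirect effects of X_i via X_j.
effects <- sweep(Rxx, 2, P, "*")

# Each row sums to the correlation of that predictor with y
total <- rowSums(effects)

# Coefficient of determination and residual effect
R2       <- sum(P * rxy)
residual <- sqrt(1 - R2)

effects
residual
```

The row sums reproduce the path equations for one dependent variable, which gives a quick consistency check on any direct and indirect effect table.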

Author(s)

Ali Arminian [email protected]

References

Arminian, A, MS Kang, M Kozak, S Houshmand, and P Mathews. 2008. “MULTPATH: A Comprehensive Minitab Program for Computing Path Coefficients and Multiple Regression for Multivariate Analyses.” Journal of Crop Improvement, 22(1): 82–120.

Bondari, K. 1990. "PATH ANALYSIS IN AGRICULTURAL RESEARCH," Conference on Applied Statistics in Agriculture. https://doi.org/10.4148/2475-7772.1439

Cramer, C.S, TC Wehner, and SB Donaghy. 1999. “PATHSAS: A SAS Computer Program for Path Coefficient Analysis of Quantitative Data.” Journal of Heredity, 90(1): 260–62. https://doi.org/10.1093/jhered/90.1.260.

Johnson, R.A., Wichern, D.W. 2007. Applied Multivariate Statistical Analysis. Prentice Hall, USA.

Li, C.C. 1975. Path Analysis: A Primer. Boxwood Pr. 346 p.

Olivoto, T, and A Dal’Col Lúcio. 2020. “Metan: An R Package for Multi‐environment Trial Analysis.” Methods in Ecology and Evolution, 11(6): 783–89. https://doi.org/10.1111/2041-210X.13384.

Wolfle, LM. 2003. “The Introduction of Path Analysis to the Social Sciences, and Some Emergent Themes: An Annotated Bibliography.” Structural Equation Modeling, 10(1): 1–34.

Wright, S. 1923. “The Theory of Path Coefficients a Reply to Niles’s Criticism.” Genetics, 8(3): 239.

———. 1934. “The Method of Path Coefficients.” The Annals of Mathematical Statistics, 5(3): 161–215.

———. 1960. “Path Coefficients and Path Regressions: Alternative or Complementary Concepts?” Biometrics, 16(2): 189–202.

See Also

correlation, multiple linear regression, and matrix notations in mathematics.

lavaan and DiagrammeR packages for drawing path diagrams

Examples

data(dtsimp)
matdiag(dtsimp, 1, verbose = FALSE)

data(dtraw)
matdiag(dtraw[, -1], 1, verbose = FALSE)

data(heart)
matdiag(heart, 2, verbose = FALSE)

Network plot

Description

[Stable]

  • network.plot() draws the network plot of path coefficients analysis

Usage

network.plot(datap)

Arguments

datap

The data set

Details

The network.plot() draws a correlogram and a heatmap for data, if requested by user

Value

Returns an object of class network_plot.

Author(s)

Ali Arminian [email protected]

References

Kuhn et al. 2022. corrr package. doi: <10.32614/CRAN.package.corrr> https://github.com/tidymodels/corrr

See Also

correlogram, diagram, and lavaan packages for drawing path diagrams.

Examples

data(dtraw2)
network.plot(dtraw2)

Path Coefficient Analysis

Description

Path.Analysis computes descriptive statistics for a dataset and, importantly, provides graphical representations of the data such as heatmaps, correlograms, and path diagrams.

Author(s)

Ali Arminian [email protected]


Multiple Linear Regression

Description

[Experimental]

  • reg() performs a multiple linear regression analysis and extracts the associated parameters

Usage

reg(datap, resp, verbose = FALSE)

Arguments

datap

The data set

resp

an integer value indicating the column in datap that corresponds to the response variable.

verbose

If verbose = TRUE then some results are printed in the console.

Details

The reg() function fits a multiple linear regression of a dependent (endogenous) variable on one or more independent (exogenous) variables and tests the significance of the parameters. Depending on the data, it may produce warnings, e.g., for dtsimp: Warning message: In summary.lm(mlreg): essentially perfect fit: summary may be unreliable. This warning is due to the intrinsic characteristics of the data.
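The warning quoted above comes from base R's summary.lm() and typically appears when the response is an (almost) exact linear function of the predictors. The sketch below fits such a model with stats::lm() on a small made-up data frame and collects any warnings instead of printing them. The data frame d is an assumption for illustration, and the code is not the internal implementation of reg().

```r
# Made-up data in which y is a noise-free linear function of x1 and x2
d <- data.frame(x1 = c(1, 2, 3, 4, 5, 6),
                x2 = c(2, 1, 4, 3, 6, 5))
d$y <- 1 + 2 * d$x1 - 0.5 * d$x2

# Multiple linear regression of y on x1 and x2
fit <- lm(y ~ x1 + x2, data = d)

# Summarize the fit while recording warnings raised by summary.lm()
msgs <- character(0)
s <- withCallingHandlers(
  summary(fit),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

coef(fit)   # estimated intercept and slopes
msgs        # any warning messages collected from summary()
```

With real measurements that contain noise the residual variance is not near zero, and the summary is produced without this warning.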

Value

An object of class list

Author(s)

Ali Arminian [email protected]

See Also

⁠multiple linear regression⁠

Examples

data(dtsimp)
reg(dtsimp, 1, verbose = FALSE)


data(heart)
reg(heart, 1, verbose = FALSE)