Lecture notes.
Many examples were in SAS. Since currently SAS is unavailable in the institute network it may not be straight forward for non-SAS users. However with the SAS source codes provded in this notes the burden should be relieved quite a lot. We also give some details about MLn, Mx and SAGE.
In the analysis of the the neuroticism dataset, we used a few packages: SAS in examples 5.1, 5.2, 5.3, 5.9 5.16, 5.18, SPSS in example 5.10, SPSS and MLn in example 5.11, Mx in examples 5.20 and 5.21.
Important !!!
We do not intend to distribute
many files. The exceptions are the original data files which are
too long to include or too difficult to make, and some Mx programs
that are very similar to the codes within this notes. In order
to replicate the examples you could take advantage of the copy/cut
and paste utility under MS Windows and create your own
programs. For example if you want to run the programs for SPSS
you could first block the code from web browser (Netscape or Internet
Explorer), then select copy from its Edit menu, and finally
switch and paste to your SPSS syntax window. From there
you block the code and click the execute button to get the result.
SPSS can also paste your selections from its own menu. By doing
this you would be able to check your choices with ours.
The book chapter makes extensive
use of Dr
Alison MacDonald's data on personality dimension neuroticism.
Originally the dataset contains the following variables.
From these, new variables t1, t2 and group were created for the subsequent analyses.
Table 5.1 Cross-tabulation of neuroticism for MZ and DZ twin pairs.
MZ
------------------------------------------------------------------
T2
----------------------------------------------------------
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16 17 18 19 ALL
------------------------------------------------------------------
T1
3
1 1
2
4
1 3 4 2
10
5
1 1 12 4
18
6
1 2 6 11 6
26
7
5 5 8 7 6
31
8
1 1 1 4 8 6 3
24
9
1 2 1 5 12 9 8 7
45
10
5 3 2 3 8 5 5 4
35
11
1 1 4 3 3 9 8 3
5
37
12
2 3 3 4 4 6 8 9
4 3
46
13
1 2 4 5 4 7
7 6 6 6 3
51
14
1 1 3 2 3 1
4 4 9 4 13 5
50
15
2 1 2 1 4 4 1 2
5 7 7 3
39
16
1 1 2 1 2 4
4 5 6 4 4 1
35
17
1 1 2 3 2 1
4 6 6 9 1
36
18
1 1 1 2
1 2 2 5 1 4
1 21
19
1
1 1 2 1 1 1
2 3 1 14
20
1
1
21
1
1
ALL 1 11
24 41 49 55 47 47 52 35 34 28 35 22 18 11 7 4
1 522
------------------------------------------------------------------
DZ
------------------------------------------------------------
T2
----------------------------------------------------
2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 19 ALL
------------------------------------------------------------
T1
4
1 1
2
5
1 3 1
5
6
1 2 3
6
7
4 1 4 6 1
16
8
3 3 1 6 4 4
21
9
3 4 2 6 2 5 3
25
10
1 3 4 2 1 2
2 3
18
11
2 1 2 4 4 4 4 2
23
12
2 2 1 2 2 4
2 4 1 1
21
13
2 1 2 1 1 3
1 3 7 1
22
14
1 2 2 5 2 1
2 4 2 1 1
23
15
1 2 1 1 2 4 5 2
4 2
24
16
1 3 2 1 1
4 1 2 1 4 3
2 25
17
1 2 1 1 2 1
1 2 2 3
1 1 18
18
1 1 3 1
3 1 2 1
1 14
19
1 1 1 1
2 1 7
20
1
1 2
ALL 7 21
23 24 36 25 23 23 22 15 18 7 11 8 3 5
1 272
------------------------------------------------------------
The above table was created by the following SAS program.
libname x '';
data subset;
set x.alison;
where zyg2=2
or zyg2=4;
if zyg2=2 then
do; t1=max(n_1,n_2); t2=min(n_1,n_2); end;
if zyg2=4 then
do; t1=max(n_1,n_2); t2=min(n_1,n_2); end;
t1=max(n_1,n_2);
t2=min(n_1,n_2);
td=t1-t2;tm=mean(t1,t2);
group=zyg2/2;
run;
options nocenter ls=120
missing=' ';
title2 'MZ';
proc tabulate data=subset
fc=' ----------' noseps f=2. missing;
class t1 t2;
where zyg2=2;
table t1 all,
t2 all/rts=5;
keylabel n='
';
run;
title2 'DZ';
proc tabulate data=subset
fc=' ----------' noseps f=2. missing;
class t1 t2;
where zyg2=4;
table t1 all,
t2 all/rts=5;
keylabel n='
';
run;
Example 5.1 One-way ANOVA of neuroticism.
MZ sample
Dependent Variable:
N
Source
DF Sum of Squares Mean Square
F Value Pr > F
Model
521 13445.23
25.81 2.72
.00
Error
522 4953.00
9.49
Corrected Total 1043
18398.23
R-Square
C.V. Root MSE
N Mean
.73
30.04 3.08
10.25
Source
DF Anova SS
Mean Square F Value Pr > F
PAIRID
521 13445.23
25.81 2.72
.00
DZ sample
Dependent Variable:
N
Source
DF Sum of Squares Mean Square
F Value Pr > F
Model
271 5970.45
22.03 1.48
.00
Error
272 4053.50
14.90
Corrected Total
543 10023.95
R-Square
C.V. Root MSE
N Mean
.60
37.86 3.86
10.20
Source
DF Anova SS
Mean Square F Value Pr > F
PAIRID
271 5970.45
22.03 1.48
.00
The F-test for a genetic
contribution.
F=MSW(DZ)/MSW(MZ):=1.5700737619
p=.000006513
The corresponding SAS program is as follows.
data anova;
set x.alison
(keep=pairid n_1 n_2 zyg2);
where zyg2=2
or zyg2=4;
keep pairid
n nc a c d z twin group wgt;
group=zyg2/2;wgt=.5;
n=n_1;nc=n_2;c=nc;
if group=1 then do; a=nc; d=nc; z=1; end;
else do;a=nc/2; d=nc/4; z=0;end;
twin=1;output;
n=n_2;nc=n_1;c=nc;
if group=1 then do; a=nc; d=nc; z=1; end;
else do;a=nc/2; d=nc/4; z=0;end;
twin=2;output;
run;
title2' test for MZ
correlation';
proc anova;
where group=1;
class pairid;
model n=pairid;
run;
title2' test for DZ
correlation';
proc anova;
where group=2;
class pairid;
model n=pairid;
run;
data _null_;
file print;
put' Test of
genetic contribution';
f=14.90/9.49;p=1-probf(f,272,522);
put 'F=MSW(DZ)/MSW(MZ):='
f 'p=' p 10.9;
run;
This part shows testing for heteroscedasticity, intraclass correlation, the SAS codes are self-explanatory.
Example 5.2 The F-test for homoscedasticity.
F=MST(DZ)/MST(MZ):=1.0465195042 p=.268726137
The Z-tests for twin pair
correlations are as follows.
MZ z=0.0468542566 p=.142892555
DZ z=0.0038100184 p=.475086731
Example 5.3 The intraclass correlations for MZ and DZ twins.
MZ r=(MSB-MSW)/(MSB+MSW)=0.4623229462
DZ r=(MSB-MSW)/(MSB+MSW)=0.1930679664
The Z-test for equal intraclass
correlations is as follows.
z =4.0625533933 p =.000024269
As a measure of the proportion
of phenotypic variance accounted for by genetic factors.
Holzinger's H =
0.33368
Example 5.9 The MZ:DZ ratio in intraclass correlations.
Ratio of intraclass
correlations=2.3946123986
Va=0.3099489195 Vd=0.1523740267
Ve=0.5376770538
Narrow heritability
(S.E.)=0.3099489195 0.2360174398
Broad heritability
(S.E.)=0.4623229462 0.0344135556
The SAS program is as follows. It is obvious that you could use QBASIC or a calculator to perform most of the computations.
data _null_;
file print;
put' Test
of homoscedasticity';
f=(10023.95/543)/(18398.23/1043);p=1-probf(f,543,1043);
put 'F=MST(DZ)/MST(MZ):='
f 'p=' p 10.9;
put 'Test
of twin pair correlation';
r=0.04682;n=522;
z=0.5*log((1+r)/(1-r));p=1-probnorm(z/sqrt(1/(n-3)));
put 'MZ z='
z 'p=' p 10.9;
r=0.00381;n=272;
z=0.5*log((1+r)/(1-r));p=1-probnorm(z/sqrt(1/(n-3)));
put 'DZ z='
z 'p=' p 10.9;
put 'Intraclass
correlation for MZ';
rm=(25.81-9.49)/(25.81+9.49);
zm=0.5*log((1+rm)/(1-rm));
put 'MZ r=(MSB-MSW)/(MSB+MSW)='
rm;
put 'Intraclass
correlation for DZ';
rd=(22.03-14.90)/(22.03+14.90);
zd=0.5*log((1+rd)/(1-rd));
put 'DZ r=(MSB-MSW)/(MSB+MSW)='
rd;
put 'Test for
equal intraclass correlations';
z=(zm-zd)/(sqrt(1/(522-2)+1/(272-2)));
p=1-probnorm(z);
put 'z =' z
'p =' p 10.9;
h=(rm-rd)/(1-rd);
put 'Holzinger''s
H =' h 10.5;
rr=rm/rd;
put 'Ratio of
intraclass correlations=' rr;
va=4*rd-rm;vd=2*rm-4*rd;ve=1-va-vd;
hn=va; hb=va+vd;
put 'Components
of genetic variances:';
put 'Va=' va
'Vd=' vd 'Ve=' ve;
var_rm=(1-rm**2)**2/522;
var_rd=(1-rd**2)**2/272;
var_hn=16*var_rd+var_rm;se_hn=sqrt(var_hn);
var_hb=var_rm;
se_hb=sqrt(var_hb);
put 'Narrow
heritability (S.E.)=' hn se_hn;
put 'Broad heritability
(S.E.)=' hb se_hb;
run;
Example 5.10 The linear regression approach for estimating heritability.
The dataset has already been created in example 5.1. Now denote y the mean centered trait value of cotwin we can perform a linear regresssion of y on A2 and D2 for MZ and DZ twins together using SPSS.
* * * * MULTIPLE REGRESSION THROUGH THE ORIGIN * * * *
Listwise Deletion of Missing Data
Equation Number 1 Dependent Variable.. Y
Block Number 1. Method: Enter A2 D2 Z
Variable(s) Entered
on Step Number
1..
Z
2..
A2
3..
D2
Multiple R
.38838
R Square
.15084
Adjusted R Square
.14762
Standard Error
3.90599
Analysis of Variance
DF Sum of Squares
Mean Square
Regression
3 2143.63633
714.54544
Residual
791 12068.07627
15.25673
F = 46.83476 Signif F = .0000
------------------ Variables in the Equation ------------------
Variable B SE B Beta T Sig T
A2
.303566 .224411 .260339
1.353 .1765
D2
.158013 .235236 .129277
.672 .5020
Z
.013347 .170963 .002558
.078 .9378
* * * * MULTIPLE REGRESSION THROUGH THE ORIGIN * * * *
Equation Number 1 Dependent Variable.. Y
Block Number
2. Method: Backward Criterion
POUT .1000
A2
D2 Z
Variable(s) Removed
on Step Number
4..
Z
Multiple R
.38837
R Square
.15083
Adjusted R Square
.14868
Standard Error
3.90354
Analysis of Variance
DF Sum of Squares
Mean Square
Regression
2 2143.54334
1071.77167
Residual
792 12068.16926
15.23759
F = 70.33736 Signif F = .0000
------------------ Variables in the Equation ------------------
Variable B SE B Beta T Sig T
A2
.303547 .224270 .260323
1.353 .1763
D2
.158051 .235088 .129308
.672 .5016
Equation Number 1 Dependent Variable.. Y
Variable(s) Removed
on Step Number
5..
D2
Multiple R
.38774
R Square
.15034
Adjusted R Square
.14927
Standard Error
3.90219
Analysis of Variance
DF Sum of Squares
Mean Square
Regression
1 2136.65606
2136.65606
Residual
793 12075.05654
15.22706
F = 140.31970 Signif F = .0000
------------------ Variables in the Equation ------------------
Variable B SE B Beta T Sig T
A2 .452124 .038168 .387743 11.846 .0000
The SPSS program should be similar to the following.
* Program for Page
216.
GET
FILE='C:\iop\lecture\anova.sav'.
EXECUTE .
WEIGHT BY wgt.
REGRESSION
/MISSING LISTWISE
/STATISTICS
COEFF OUTS R ANOVA
/CRITERIA=PIN(.05)
POUT(.10)
/ORIGIN
/DEPENDENT y
/METHOD=ENTER
a2 d2 z /METHOD=BACKWARD a2 d2 z.
Example 5.11 ADE and AE models using SPSS and MLn.
Linear regression under selection with weight=(108+66)/(131+75)=0.845 using SPSS
* * * * MULTIPLE REGRESSION THROUGH THE ORIGIN * * * *
Listwise Deletion of Missing Data
Equation Number 1 Dependent Variable.. Y
Block Number 1. Method: Enter A2 D2
Variable(s) Entered
on Step Number
1..
D2
2..
A2
Multiple R
.54802
R Square
.30033
Adjusted R Square
.29220
Standard Error
4.15175
Analysis of Variance
DF Sum of Squares
Mean Square
Regression
2 1273.13406
636.56703
Residual
172 2965.97524
17.23703
F = 36.93021 Signif F = .0000
------------------ Variables in the Equation ------------------
Variable B SE B Beta T Sig T
A2
.373490 .304826 .450112
1.225 .2222
D2
.086413 .320181 .099147
.270 .7876
* * * * MULTIPLE REGRESSION THROUGH THE ORIGIN * * * *
Equation Number 1 Dependent Variable.. Y
Block Number
2. Method: Backward Criterion
POUT .1000
A2
D2
Variable(s) Removed
on Step Number
3..
D2
Multiple R
.54775
R Square
.30003
Adjusted R Square
.29599
Standard Error
4.14061
Analysis of Variance
DF Sum of Squares
Mean Square
Regression
1 1271.87852
1271.87852
Residual
173 2967.23078
17.14469
F = 74.18500 Signif F = .0000
------------------ Variables in the Equation ------------------
Variable B SE B Beta T Sig T
A2
.454511 .052770 .547754
8.613 .0000
The SPSS program should look like the following.
* Program for page
218.
GET
FILE='C:\iop\lecture\anova.sav'.
EXECUTE .
* PROCESS IF is not
recognized now.
SELECT IF (nc>15).
COMPUTE wgt_new=0.845.
WEIGHT BY wgt_new.
REGRESSION
/MISSING LISTWISE
/STATISTICS
COEFF OUTS R ANOVA
/CRITERIA=PIN(.05)
POUT(.10)
/ORIGIN
/DEPENDENT y
/METHOD=ENTER
a2 d2 /METHOD=BACKWARD a2 d2.
MLn
MLN - Software for N-level analysis. Tue Feb 17 16:59:22 1998
name c1 'ID' c2
'Y' c3 'wgt' c4 'A2' c5 'D2' c6 'C2' c7 'CONS' c8 'CONSD'
resp 'Y'
note weight==.5
iden 1 'id' 2 'cons'
expl 'cons' 'consd'
'a2' 'd2'
fpar 'cons' 'consd'
setv 1 'cons' 'consd'
sett
EXPLanatory variables
in CONS
CONSD A2
D2
FPARameters
A2 D2
RESPonse variable in
Y
FSDErrors : uncorrected
RSDErrors : uncorrected
MAXIterations
20 TOLErance 2
METHod is IGLS BATCh is OFF
IDENtifying codes :
1-ID, 2-CONS
LEVEL 1 RPM
CONS CONSD
CONS
1
CONSD
1 1
star
rand
fixe
like
next
Convergence achieved
rand
LEV. PARAMETER
(NCONV) ESTIMATE S. ERROR(U)
PREV. ESTIM CORR.
-------------------------------------------------------------------------------
1
CONS /CONS ( 1)
14.11 1.743
14.11
1
CONSD /CONS ( 1)
4.028 2.008
4.028
1
CONSD /CONSD ( 2)
0
0
0
fixe
PARAMETER
ESTIMATE S. ERROR(U) PREV.
ESTIMATE
A2
0.3722 0.3144
0.3722
D2
0.08768 0.3247
0.08768
like
-2*log(lh) is
1163.69
fpar 'd2'
sett
EXPLanatory variables
in CONS
CONSD A2
D2
FPARameters
A2
RESPonse variable in
Y
FSDErrors : uncorrected
RSDErrors : uncorrected
MAXIterations
20 TOLErance 2
METHod is IGLS BATCh is OFF
IDENtifying codes :
1-ID, 2-CONS
LEVEL 1 RPM
CONS CONSD
CONS
1
CONSD
1 1
star
rand
fixe
like
next
Convergence achieved
rand
LEV. PARAMETER
(NCONV) ESTIMATE S. ERROR(U)
PREV. ESTIM CORR.
-------------------------------------------------------------------------------
1
CONS /CONS ( 1)
14.11 1.743
14.11
1
CONSD /CONS ( 1)
4.038
2.01 4.037
1
CONSD /CONSD ( 2)
0
0
0
fixe
PARAMETER
ESTIMATE S. ERROR(U) PREV.
ESTIMATE
A2
0.4562 0.04503
0.4562
like
-2*log(lh) is
1163.76
clrv 1 'consd'
tole 4
maxi 100
sett
EXPLanatory variables
in CONS
CONSD A2
D2
FPARameters
A2
RESPonse variable in
Y
FSDErrors : uncorrected
RSDErrors : uncorrected
MAXIterations 100
TOLErance 4 METHod
is IGLS BATCh is OFF
IDENtifying codes :
1-ID, 2-CONS
LEVEL 1 RPM
CONS
CONS
1
star
rand
fixe
like
next
Convergence achieved
rand
LEV. PARAMETER
(NCONV) ESTIMATE S. ERROR(U)
PREV. ESTIM CORR.
-------------------------------------------------------------------------------
1
CONS /CONS ( 1)
17.05
1.68 17.05
fixe
PARAMETER
ESTIMATE S. ERROR(U) PREV.
ESTIMATE
A2
0.4544 0.04836
0.4544
like
-2*log(lh) is
1168.81
stop
The MLn script is as follows. Suppose our MLn command file is anova.ml1, start MLn from network (by typing N:\MLN) and then obey anova.ml1, then you should be able to get the above output. Note the location of data and logo files need be changed.
dinp c1-c8
u:\lecture\genetics\lecture\anova.1
logo u:\lecture\genetics\anova.out
name c1 'ID' c2 'Y'
c3 'wgt' c4 'A2' c5 'D2' c6 'C2' c7 'CONS' c8 'CONSD'
resp 'Y'
note weight==.5
iden 1 'id' 2 'cons'
expl 'cons' 'consd'
'a2' 'd2'
fpar 'cons' 'consd'
setv 1 'cons' 'consd'
sett
star
rand
fixe
like
next
rand
fixe
like
fpar 'd2'
sett
star
rand
fixe
like
next
rand
fixe
like
clrv 1 'consd'
tole 4
maxi 100
sett
star
rand
fixe
like
next
rand
fixe
like
stop
Example 5.16 Falconer's approximation for MZ and DZ tetrachoric correlations.
MZ A=1.6424560179
RM=0.4644076033
DZ A=1.5928158852 RD=0.2371256092
Example 5.18 The tetrachoric correlations and VA, VD estimates.
RM=0.4883032777
SEM=0.1044075369
RD=0.2483505375
SED=0.3640990882
VA=0.4799054804
SVA=0.7575462488
VC=0.0083977973 SVC=0.7356449672
The thresholds of the liability to have a high neuroticism score (>15) are given by example 5.12, the SAS program for examples 5.16 and 5.18 is as follows.
data _null_;
n=108;
t=1.145;
a=pdf('normal',t,0,1);
a=a/(1-probnorm(t));
qm=46/131;
rm=(t-probit(1-qm))/a;
sem=(1/(a**2*rm))*sqrt((1-qm)/(qm*n));
put a= rm= ;
d=probit(1-qm);
rm=(t-d*sqrt(1-(t**2-d**2)*(1-t/a)))/(a+d**2*(a-t));
put rm= sem=;
n=66;
t=1.084;
a=pdf('normal',t,0,1);
a=a/(1-probnorm(t));
qd=18/75;
rd=(t-probit(1-qd))/a;
sed=(1/(a**2*rd))*sqrt((1-qd)/(qd*n));
put a= rd= ;
d=probit(1-qd);
rd=(t-d*sqrt(1-(t**2-d**2)*(1-t/a)))/(a+d**2*(a-t));
put rd= sed=;
va=2*(rm-rd);
sva=sqrt(4*(sed**2+sem**2));
vc=2*rd-rm;
svc=sqrt(4*sed**2+sem**2);
put va= sva=;
put vc= svc=;
run;
Example 5.20 Structural equation models for continuous twin data using Mx.
Mx could simply be started with the following ways:
mx
mx
inputfile outputfile
mx
<myfile.mx >myfile.mxo &
Under the first format, the program prompts: Please enter the
input filename (Ctrl-c to quit).
The second format is to provide
input and output filenames as command line parameters. Usually
inputfile and outputfile have extension .mx and .mxo, respetively.
The third format start Mx in the background under Unix environment.
Under MS Windows, Mx could be setup as short cut; after execution
you could open and execute an existing command file.
The script have three things: keywords, parameters and numbers. Keywords must start on a new line. Parameters pass information , e.g. filenames or numbers to commands. Anything after ! is comment. An minimum specifications will include the following commands.
Take the script of the second example from Mx, the observed covariance matrix (COV) was obtained from 150 observations about three variables,. It tries to establish the factor model COV=AA'+D, where A and D are the factor loading matrix and unique factor matrix, respectively. Each of them has three unknown parameters.
!
! Anything after a
! is a comment
!
! Mx script to fit
a simple model
! One factor in A plus
specifics in D
! Use #define to simplify
setting matrix dimensions
!
#define nvar 3
#define nfac 1
Simple MX example
file
Ngroups=1
Data NObservations=150
NInput_vars=nvar
CMatrix
1.2
.8 1.3
.4 .3 1.4
Begin Matrices;
A Full nvar
nfac Free
D Diag nvar
nvar Free
End Matrices;
Start .5 All
! All free parameters to start at .5
Bound 0 100 D
1 1 to D nvar nvar ! Keep specific variances positive
Covariance A*A' + D ;
Option RS
! to get residuals
End
The output from Mx is as follows.
MATRIX A
This is a FULL
matrix of order 3 by 1
1
1
1.0328
2
0.7746
3
0.3873
MATRIX D
This is a DIAGONAL
matrix of order 3 by 3
1 2
3
1
0.1333
2
0.0000 0.7000
3
0.0000 0.0000
1.2500
OBSERVED COVARIANCE
MATRIX
1 2
3
1
1.2000
2
0.8000 1.3000
3
0.4000 0.3000
1.4000
EXPECTED COVARIANCE
MATRIX
1 2
3
1
1.2000
2
0.8000 1.3000
3
0.4000 0.3000
1.4000
RESIDUAL MATRIX
1
2
3
1
-1.7704E-07
2
-2.0424E-07 -1.4347E-07
3
-1.2676E-07 3.5314E-08 -5.2318E-08
Function value
of this group: 6.7271E-12
Where the fit
function is Maximum Likelihood
Your model has
6 estimated parameters and 6 Observed
statistics
Chi-squared fit
of model >>>>>>>
0.000
Degrees of freedom
>>>>>>>>>>>>>
0
Probability incalculable
Akaike's Information
Criterion > 0.000
RMSEA >>>>>>>>>>>>>>>>>>>>>>>>>>
0.000
It is a saturated model so the Chi-square fit is zero and none degree of freedom left. Now turn to our neuroticism dataset, we can use MxWin to generate the path diagram (alison.mxd) such as figure 5.5.
The script for ADE model is as follows.
!A D E model fitted
to Alison's Data
G1: model parameters
Data Calc NGroups=4
Begin Matrices;
X Lower 1 1 Free
! genetic structure
Y Lower 1 1 Fixed
! common environmental structure
Z Lower 1 1 Free
! specific environmental structure
W Lower 1 1 Free
! dominance structure
Begin Algebra;
A= X*X' ;
C= Y*Y' ;
E= Z*Z' ;
D= W*W' ;
End Algebra;
End
G2: Monozygotic
twin pairs
Data NInput-vars=2
NObservations=522
LAbels N_T1 N_T2
CMatrix
17.64
8.1421 17.64
Matrices= Group 1
COvariances A+C+D+E
| A+C+D _
A+C+D |
A+C+D+E /
Options RSidual SE
CI=95
END
G3: Dizygotic twin
pairs
Data NInput_vars=2
NObservations=272
LAbels N_T1 N_T2
CMATRIX
18.4599
3.5303 18.4599
MAtrices= GROUP 1
H Full 1 1
Q Full 1 1
Covariances A+C+D+E
| H@A+C+Q@D _
H@A+C+Q@D | A+C+D+E
/
MAtrix H .5
MAtrix Q .25
STart .6 ALL
Options Multiple RSidual
SE CI=95
End
G4: beta-test
DAta calc
MAtrices= Group 1
compute (A|C|D|E) @
(A+C+D+E)~/
Options RS ND=3 SE
CI=95
Options multiple
End
Mx scripts can be split into many groups and allows communication between them, it is thus more flexible than program such as LISREL. The program gives the following estimates.
EXPECTED MATRIX of this
CALCULATION group
1 2
3 4
1
0.279 0.000 0.188
0.533
Example 5.21 Structural equation model analysis assuming complete ascertainment.
!A D E model fitted
to Alison's Data with ascertainment
G1: model parameters
Data Calc NGroups=9
Begin Matrices;
X Lower 1 1 Free
! genetic structure
Y Lower 1 1 Fixed
! common environmental structure
Z Lower 1 1 Free
! specific environmental structure
W Lower 1 1 Free
! dominance structure
M Full 1 2 Fixed
T Full 2 1 Fixed
V Full 2 1 Free
End Matrices
Matrix T 15.5 15.5
Matrix M 10.23 10.23
! Specify M
! 0 0
Begin Algebra;
A= X*X' ;
C= Y*Y' ;
E= Z*Z' ;
D= W*W' ;
V= T-M' ;
End Algebra;
End
G2: Monozygotic
twin pairs
Data NInput-vars=2
NObservations=131
Raw_data file=mz.dat
Labels N_t1 N_t2
Matrices= Group 1
Mean M /
Covariances A+C+D+E
| A+C+D _
A+C+D | A+C+D+E /
Options RSidual
End
G3: dummay group
Data Ninput=2
CTable 2 2
0 0
0 0
! It's full of zeros
so it contributes zero to the function
Matrices = Group 1
Thresholds V /
Covariances A+C+D+E
| A+C+D _
A+C+D | A+C+D+E /
Option RSiduals
End
G4: Calculate ascertainment
correction
Data Ninput=0
Matrices
I Iden 1 1
J Izero 1 2
P Full 2 2 = %P3
T Full 1 1
K Unit 2 1
Compute T*\ln(I-J*P*K)
/
Matrix T 262
! twice the sample size of group 2
Options User-defined
RSiduals Multiple
End
G5: Dizygotic twin
pairs
Data NInput_vars=2
NObservations=75
Raw_data file=dz.dat
Labels N_t1 N_t2
Matrices= Group 1
H Full 1 1
Q Full 1 1
Mean M /
Covariances A+C+D+E
| H@A+C+Q@D _
H@A+C+Q@D | A+C+D+E /
Matrix H .5
Matrix Q .25
Start 3.5 All
Options Multiple RSidual
End
G6: dummy group
Data Ninput=2
CTable 2 2
0 0
0 0
! It's full of zeros
so it contributes zero to the function
Matrices = Group 1
H Full 1 1
Q Full 1 1
Thresholds V /
Covariances A+C+D+E
| H@A+C+Q@D _
H@A+C+Q@D | A+C+D+E /
Matrix H .5
Matrix Q .25
Option RSiduals
End
G7: Calculate ascertainment
correction
Data Ninput=0
Matrices
I Iden 1 1
J Izero 1 2
P Full 2 2 = %P6
T Full 1 1
K Unit 2 1
Compute T*\ln(I-J*P*K)
/
Matrix T 150
! twice the sample size of group 5
Options User-defined
RSiduals Multiple
End
G8: standardization
data calc
matrices= Group 1
compute (A|C|D|E) @
(A+C+D+E)~/
options rs nd=3
options multiple
end
G9: Constraint group
Data Constraint NInput=1
matrices= Group 1
U Full 1 1
Begin Algebra;
S= (X|Y|W|Z);
End Algebra;
Matrix U 18.0
Constraint U - S*S'/
end
The program gives the following estimates for (A, D, E).
EXPECTED MATRIX
of this CALCULATION group
1 2
3 4
1
0.295 0.000 0.166
0.539
The programs allowing for adjustment
of ascertainment are given for ADE,
ACE, AE, DE, CE, and E models, respectively.
Models with means and variances
as free parameters are also given for ADE,
ACE, AE, DE, CE and E only.
Chapter 5 does not provide an example for this. Bonney's regressive models and Haseman-Elston sib pair analysis have been sucessfully implemented in the package SAGE (Statistical Analysis for Genetic Epidemiology). It is composed of series of programs and famous for its capability to include covariates in the analysis. Here we introduce programs FSP, REGC and SIBPAL. FSP is a program for family structure processing. REGC is a program for segregation analysis of continuous trait using regressive models. SIBPAL is a program for Haseman-Elston sib pair analysis.
FSP builds a data structure
for each pedigree, checks pedigree data for correctness of coding,
prepares data for use in SIBPAL and REGC. As of SAGE 2.2, the
typical input and output files for FSP, REGC and SIBPAL are listed
as follows.
Here we set up examples from sage. You can examine these sample files, they are in strict Fortran format. Now you can type name of the program (eg fsp) at DOS prompt, you can issue exit command to return to Windows. We will not go into more details of these programs. Since Fortran refuses to open an existing file for output, you need delete the previous results in order to run the program again. We have a very naive DOS program clean.bat to do this.
Any suggestions and problems
brought to our attention is appreciated.
Pak
Chung Sham and Jing
Hua Zhao 16/2/1998