CAD/MI

The summary statistics is from http://www.cardiogramplusc4d.org/, notably * CARDIoGRAM GWA meta-analysis

Schunkert H, König IR, Kathiresan S, Reilly MP, Assimes TL, Holm H et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet. 2011 43: 333-338

Nikpey M, Goel A, Won H, Hall LM, Willenborg C, Kanoni S, Saleheen S, et al. A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease. Nat Genet 2015 47:1121-1130

Nelson CP, Goel A, Butterworth AS, Kanoni S, Webb TR, et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet 2017 Jul 17 49(9): 1385-1391. doi: 10.1038/ng.3913

For the CARDIoGRAMplusC4D 1000 Genomes-based GWAS, the setup is

wget http://www.cardiogramplusc4d.org/media/cardiogramplusc4d-consortium/data-downloads/cad.additive.Oct2015.pub.zip
unzip cad.additive.Oct2015.pub.zip

giving `cad.add.160614.website.txt.

MR

We examine MMP-12 and CAD as in the Nature paper, the input data is generated with MMP12.sh.

THe analogous coding for TwoSampleMR as with MendelianRandomization in software-notes is contained in MMP12.R.

We could also follow Generalised Summary-data-based Mendelian Randomisation (GSMR).

# 16-7-2019 JHZ

awk '
{
   if (NR==1) print "SNP", "A1", "A2", "freq", "b", "se", "p", "N";
   else {
     CHR=$2
     POS=$3
     a1=$4
     a2=$5
     if (a1>a2) snpid="chr" CHR ":" POS "_" a2 "_" a1;
     else snpid="chr" CHR ":" POS "_" a1 "_" a2
     $1=snpid
     print snpid, a1, a2, $6, $9, $10, $11, 185000
   }
}' CAD/cad.add.160614.website.txt > gsmr_outcome.txt

# awk 'NR==1' CAD/cad.add.160614.website.txt | sed 's|\t|\n|g' | awk '{print "# " NR, $1}'
# 1 markername
# 2 chr
# 3 bp_hg19
# 4 effect_allele
# 5 noneffect_allele
# 6 effect_allele_freq
# 7 median_info
# 8 model
# 9 beta
# 10 se_dgc
# 11 p_dgc
# 12 het_pvalue
# 13 n_studies

xport INF=/rds/project/jmmh2/rds-jmmh2-projects/olink_proteomics/scallop/INF
echo $INF/INTERVAL/INTERVAL > gsmr_ref_data
echo IL.6 $INF/work/IL.6.ma > gsmr_exposure
echo CAD gsmr_outcome.txt > gsmr_outcome

gcta-1.9 --mbfile gsmr_ref_data --gsmr-file gsmr_exposure gsmr_outcome --gsmr-direction 0 --effect-plot --out gsmr_result

R --no-save -q <<END
  source("http://cnsgenomics.com/software/gcta/static/gsmr_plot.r")
  gsmr_data <- read_gsmr_data("gsmr_result.eff_plot.gz")
  gsmr_summary(gsmr_data)
  plot_gsmr_effect(gsmr_data, "IL.6", "cad", colors()[75])
END

where we use IL.6 as exposure data; the effect-size plots are generated.

Note that gcta-1.9 indcates GCTA 1.9.x is necessary since the --mfile option is not recognised by GCTA 1.26.0.

Pathway analysis

MAGMA is illustrated here,

The GWAS summary data can either be formatted with R,

R --no-save <<END
d <- read.delim("`cad.add.160614.website.txt",as.is=TRUE)
db <- "CAD"
write.table(d[c("SNP","CHR","BP")],file=paste0(db,".snploc"),quote=FALSE,row.name=FALSE,col.names=FALSE,sep="\t")
write.table(d[c("SNP","P","NOBS")],file=paste0(db,".pval"),quote=FALSE,row.name=FALSE,sep="\t")
END

or more efficiently with bash before pathway analysis

awk -vOFS="\t" '{print $2,$1,$4}' g1000_eur.bim > g1000_eur.snploc
awk -vOFS="\t" '{if(NR==1) print "SNP", "P", "NOBS"; else print $1,$11,1000}' `cad.add.160614.website.txt > CAD.pval

# Annotation
magma --annotate window=50,50 --snp-loc g1000_eur.snploc --gene-loc NCBI37.3.gene.loc --out CAD
# Gene analysis - SNP p-values
magma --bfile g1000_eur --pval CAD.pval ncol=NOBS --gene-annot CAD.genes.annot --out CAD
# Pathway analysis
# http://software.broadinstitute.org/gsea/downloads.jsp
magma --gene-results CAD.genes.raw --set-annot msigdb.v6.2.entrez.gmt self-contained --model fwer --out CAD