1 Introduction

This package was initiated to integrate some C/Fortran/SAS programs I have written or used over the years. As such, it would rather be a long-term project, but an immediate benefit is something complementary to other packages currently available from CRAN, e.g. genetics, hwde, etc. I hope eventually this will be part of a bigger effort to fulfil most of the requirements foreseen by many1, within the portable environment of R for data management, analysis, graphics and object-oriented programming. My view has been outlined more formally2,3 in relation to other package systems and also on package kinship4,5. I feel the enormous advantage by shifting to R and would like to my work with others as it is available, which I will not claim this work as exclusively done by myself, but would like to invite others to join me and enlarge the collections and improve them.

With recent work on genomewide association studies (GWASs) especially protein GWASs, I have added many functions such as METAL_forestplot which handles data from software METAL6 and qtlFinder which extracts sentinels from GWAS summary statistics in a way that is very appealing to us compared to some other established software. Meanwhile, the size of the package surpasses the limit as imposed by CRAN, thus the old good feature of S as with R that value both code and data alike has to suffer slightly in that gap.datasets and gap.examples are spun off as two separate packages; they do deserve a glimpse however for some general ideas. A separate initiative has been made in the pQTLtools package.

Notable recent technical updates include:

  1. documentaion from R code via roxygen2 and devtools to allow for easier generation of the .Rd files and NAMESPACE.

  2. vignettes with noweb/Sweave as well as rmarkdown and bookdown that allows for numbered sectioning, multiple figures generated from a code chunk, and Citation Style Language (CSL) (https://citationstyles.org/). Rdpack is now employed for BibTeX citations in the .Rd files.

  3. an experimental shiny App (runshinygap()).

These experiences can be useful for others in their package development. I found it useful to use specific functions without loading the whole package, i.e., library(gap), e.g.,invnormal <- gap::invnormal, log10p <- gap::log10p.

2 Implementation

The currently available functions are shown below,

Name Description
ANALYSIS  
AE3 AE model using nuclear family trios
bt Bradley-Terry model for contingency table
ccsize Power and sample size for case-cohort design
cs Credibel set
fbsize Sample size for family-based linkage and association design
gc.em Gene counting for haplotype analysis
gcontrol genomic control
gcontrol2 genomic control based on p values
gcp Permutation tests using GENECOUNTING
gc.lambda Estionmation of the genomic control inflation statistic (lambda)
genecounting Gene counting for haplotype analysis
gif Kinship coefficient and genetic index of familiality
MCMCgrm Mixed modeling with genetic relationship matrices
hap Haplotype reconstruction
hap.em Gene counting for haplotype analysis
hap.score Score statistics for association of traits with haplotypes
htr Haplotype trend regression
hwe Hardy-Weinberg equilibrium test for a multiallelic marker
hwe.cc A likelihood ratio test of population Hardy-Weinberg equilibrium
hwe.hardy Hardy-Weinberg equilibrium test using MCMC
invnormal Inverse normal transformation
kin.morgan kinship matrix for simple pedigree
LD22 LD statistics for two diallelic markers
LDkl LD statistics for two multiallelic markers
lambda1000 A standardized estimate of the genomic inflation scaling to
  a study of 1,000 cases and 1,000 controls
log10p log10(p) for a standard normal deviate
log10pvalue log10(p) for a P value including its scientific format
logp log(p) for a normal deviate
masize Sample size calculation for mediation analysis
mia multiple imputation analysis for hap
mr Mendelian randomization analysis
mtdt Transmission/disequilibrium test of a multiallelic marker
mtdt2 Transmission/disequilibrium test of a multiallelic marker
  by Bradley-Terry model
mvmeta Multivariate meta-analysis based on generalized least squares
pbsize Power for population-based association design
pbsize2 Power for case-control association design
pfc Probability of familial clustering of disease
pfc.sim Probability of familial clustering of disease
pgc Preparing weight for GENECOUNTING
print.hap.score Print a hap.score object
s2k Statistics for 2 by K table
sentinels Sentinel identification from GWAS summary statistics
tscc Power calculation for two-stage case-control design
   
GRAPHICS  
asplot Regional association plot
ESplot Effect-size plot
circos.cnvplot circos plot of CNVs
circos.cis.vs.trans.plot circos plot of cis/trans classification
circos.mhtplot circos Manhattan plot with gene annotation
circos.mhtplot2 Another circos Manhattan plot
cnvplot genomewide plot of CNVs
labelManhattan Annotate Manhattan or Miami Plot
METAL_forestplot forest plot as R/meta’s forest for METAL outputs
makeRLEplot make relative log expression plot
mhtplot Manhattan plot
mhtplot2 Manhattan plot with annotations
mhtplot.trunc truncated Manhattan plot
miamiplot Miami plot
miamiplot2 Miami plot
mr_forestplot Mendelian Randomization forest plot
pedtodot Converting pedigree(s) to dot file(s)
pedtodot_verbatim Pedigree-drawing with graphviz
plot.hap.score Plot haplotype frequencies versus haplotype score statistics
qqfun Quantile-comparison plots
qqunif Q-Q plot for uniformly distributed random variable
qtl2dplot 2D QTL plot
qtl2dplotly 2D QTL plotly
qtl3dplotly 3D QTL plotly
   
UTILITIES  
SNP Functions for single nucleotide polymorphisms (SNPs)
BFDP Bayesian false-discovery probability
FPRP False-positive report probability
ab Test/Power calculation for mediating effect
b2r Obtain correlation coefficients and their variance-covariances
chow.test Chow’s test for heterogeneity in two regressions
chr_pos_a1_a2 Form SNPID from chromosome, posistion and alleles
cis.vs.trans.classification a cis/trans classifier
ci2ms Effect size and standard error from confidence interval
comp.score score statistics for testing genetic linkage of quantitative trait
GRM functions ReadGRM, ReadGRMBin, ReadGRMPLINK,
  ReadGRMPCA, WriteGRM,
  WriteGRMBin, WriteGRMSAS
  handle genomic relationship matrix involving other software
get_b_se Get b and se from AF, n, and z
get_pve_se Get pve and its standard error from n, z
get_sdy Get sd(y) from AF, n, b, se
h2G A utility function for heritability
h2GE A utility function for heritability involving gene-environment interaction
h2l A utility function for converting observed heritability to its counterpart
  under liability threshold model
h2_mzdz Heritability estimation according to twin correlations
klem Haplotype frequency estimation based on a genotype table
  of two multiallelic markers
makeped A function to prepare pedigrees in post-MAKEPED format
metap Meta-analysis of p values
metareg Fixed and random effects model for meta-analysis
muvar Means and variances under 1- and 2- locus (diallelic) QTL model
pvalue P value for a normal deviate
qtlClassifier A QTL cis/trans classifier
qtlFinder Distance-based signal identification
read.ms.output A utility function to read ms output
revStrand Allele on the reverse strand
runshinygap Start shinygap
snptest_sample A utility to generate SNPTEST sample file
whscore Whittemore-Halpern scores for allele-sharing
weighted.median Weighted median with interpolation

After installation, you will be able to obtain the list by typing library(help=gap) in alphabetical order, or ?gap::gap ordered by category, or view it within a web browser via help.start(). A full list of functions is provided in the Appendix. This file can be viewed with command vignette("gap", package="gap"). You can cut and paste examples at end of each function’s documentation.

Both genecounting and hap are able to handle SNPs and multiallelic markers, with the former be flexible enough to include features such as X-linked data and the later being able to handle large number of SNPs. But they are unable to recode allele labels automatically, so functions gc.em and hap.em are in haplo.em format and used by a modified function hap.score in association testing.

It is notable that multilocus data are handled differently from that in hwde and elegant definitions of basic genetic data can be found in the genetics package. Incidentally, I found my C mixed-radixed sorting routine7 is much faster than R’s internal function.

With exceptions such as function pfc which is very computer-intensive, most functions in the package can easily be adapted for analysis of large datasets involving either SNPs or multiallelic markers. Some are utility functions, e.g. muvar and whscore, which will be part of the other analysis routines in the future.

The benefit with R compared to standalone programs is that for users, all functions have unified format. For developers, it is able to incorporate their C/C++ programs more easily and avoid repetitive work such as preparing own routines for matrix algebra and linear models. Further advantage can be taken from packages in Bioconductor, which are designed and written to deal with large number of genes.

3 Independent programs

To facilitate comparisons and individual preferences, the source codes for EHPLUS8, 2LD, GENECOUNTING, HAP9, now hosted at GitHub, have enjoyed great popularity ahead of GWASs therefore are likely to be more familiar than their R counterparts in gap but you need to follow their instructions to compile for a particular computer system.

I have kept the original pedtodot.sh by David Duffy which enables contrast with pedtodot_verbatim() and pedtodot() reported as application notes. I have also included ms code10 to couple with read.ms.output.

A final note is concerned about twinan90, which is now dropped from the package function list due to difficulty to keep up with the requirements by the R environment nevertheless you will still be able to compile and use otherwise from gap.examples.

4 Demos

This has been a template for adding self-contained examples:

library(gap)
demo(gap)

See examples of haplotype analysis there.

5 Pedigrees and kinship

5.1 Pedigree drawing

I have included the original file for the R News as well as put examples in separate vignettes. They can be accessed via vignette("rnews",package="gap.examples") and vignette("pedtodot", package="gap.examples"), respectively.

We also demonstrate through pedigree 10081 example5 with pedtodot_verbatim.

library(gap)
#> Loading required package: gap.datasets
#> gap version 1.7
knitr::kable(p3,caption="An example pedigree")
Table 5.1: An example pedigree
pid id fid mid sex aff GABRB1 D4S1645
10081 1 2 3 2 2 7/7 7/10
10081 2 0 0 1 1 -/- -/-
10081 3 0 0 2 2 7/9 3/10
10081 4 2 3 2 2 7/9 3/7
10081 5 2 3 2 1 7/7 7/10
10081 6 2 3 1 1 7/7 7/10
10081 7 2 3 2 1 7/7 7/10
10081 8 0 0 1 1 -/- -/-
10081 9 8 4 1 1 7/9 3/10
10081 10 0 0 2 1 -/- -/-
10081 11 2 10 2 1 7/7 7/7
10081 12 2 10 2 2 6/7 7/7
10081 13 0 0 1 1 -/- -/-
10081 14 13 11 1 1 7/8 7/8
10081 15 0 0 1 1 -/- -/-
10081 16 15 12 2 1 6/6 7/7
library(DOT)
# one can see the diagram in RStudio
pedtodot_verbatim(p3,run=TRUE,toDOT=TRUE,return="verbatim")
#> Pedigree  10081 
#> running DOT::dot on 10081
library(DiagrammeR)
pedtodot_verbatim(p3)
#> Pedigree  10081
grViz(readr::read_file("10081.dot"))

Figure 5.1: An example pedigree

5.2 Kinship calculation

Next, I will provide an example for kinship calculation using kin.morgan. It is recommended that individuals in a pedigree are ordered so that parents always precede their children. In this regard, package pedigree can be used, and package kinship2 can be used to produce pedigree diagram as with kinship matrix.

The pedigree diagram is as follows,

# pedigree diagram
data(lukas, package="gap.datasets")
library(kinship2)
#> Loading required package: Matrix
#> Loading required package: quadprog
ped <- with(lukas,pedigree(id,father,mother,sex))
plot(ped,cex=0.4)
A pedigree diagram

Figure 5.2: A pedigree diagram

We then turn to the kinship calculation.

# unordered individuals
gk1 <- kin.morgan(lukas)
write.table(gk1$kin.matrix,"gap_1.txt",quote=FALSE)

library(kinship2)
kk1 <- kinship(lukas[,1],lukas[,2],lukas[,3])
write.table(kk1,"kinship_1.txt",quote=FALSE)

d <- gk1$kin.matrix-kk1
sum(abs(d))
#> [1] 2.443634

# order individuals so that parents precede their children
library(pedigree)
op <- orderPed(lukas)
olukas <- lukas[order(op),]
gk2 <- kin.morgan(olukas)

write.table(olukas,"olukas.csv",quote=FALSE)
write.table(gk2$kin.matrix,"gap_2.txt",quote=FALSE)

kk2 <- kinship(olukas[,1],olukas[,2],olukas[,3])
write.table(kk2,"kinship_2.txt",quote=FALSE)

z <- gk2$kin.matrix-kk2
sum(abs(z))
#> [1] 0

We see that in the second case, the result agrees with kinship2.

6 Study designs

I would like to highlight fbsize, pbsize and ccsize functions used for power/sample calculations in a genome-wide asssociatoin study as reported11,12,13.

It now has an experimental work via Shiny from inst/shinygap.

6.1 Family-based design

The example is as follows,

options(width=150)
library(gap)
models <- matrix(c(
         4.0, 0.01,
         4.0, 0.10,
         4.0, 0.50, 
         4.0, 0.80,
         2.0, 0.01,
         2.0, 0.10,
         2.0, 0.50,
         2.0, 0.80,
         1.5, 0.01,    
         1.5, 0.10,
         1.5, 0.50,
         1.5, 0.80), ncol=2, byrow=TRUE)
outfile <- "fbsize.txt"
cat("gamma","p","Y","N_asp","P_A","H1","N_tdt","H2","N_asp/tdt",
    "L_o","L_s\n",file=outfile,sep="\t")
for(i in 1:12) {
    g <- models[i,1]
    p <- models[i,2]
    z <- fbsize(g,p)
    cat(z$gamma,z$p,z$y,z$n1,z$pA,z$h1,z$n2,z$h2,z$n3,
        z$lambdao,z$lambdas,file=outfile,append=TRUE,sep="\t")
    cat("\n",file=outfile,append=TRUE)
}
table1 <- read.table(outfile,header=TRUE,sep="\t")
nc <- c(4,7,9)
table1[,nc] <- ceiling(table1[,nc])
dc <- c(3,5,6,8,10,11)
table1[,dc] <- round(table1[,dc],2)
unlink(outfile)
knitr::kable(table1,caption="Power/Sample size of family-based designs")
Table 6.1: Power/Sample size of family-based designs
gamma p Y N_asp P_A H1 N_tdt H2 N_asp.tdt L_o L_s
4.0 0.01 0.52 6402 0.80 0.05 1201 0.11 257 1.08 1.09
4.0 0.10 0.60 277 0.80 0.35 165 0.54 53 1.48 1.54
4.0 0.50 0.58 446 0.80 0.50 113 0.42 67 1.36 1.39
4.0 0.80 0.53 3024 0.80 0.24 244 0.16 177 1.12 1.13
2.0 0.01 0.50 445964 0.67 0.03 6371 0.04 2155 1.01 1.01
2.0 0.10 0.52 8087 0.67 0.25 761 0.32 290 1.07 1.08
2.0 0.50 0.53 3753 0.67 0.50 373 0.47 197 1.11 1.11
2.0 0.80 0.51 17909 0.67 0.27 701 0.22 431 1.05 1.05
1.5 0.01 0.50 6944779 0.60 0.02 21138 0.03 8508 1.00 1.00
1.5 0.10 0.51 101926 0.60 0.21 2427 0.25 1030 1.02 1.02
1.5 0.50 0.51 27048 0.60 0.50 1039 0.49 530 1.04 1.04
1.5 0.80 0.51 101926 0.60 0.29 1820 0.25 1030 1.02 1.02

As for APOE4 and Alzheimer’s14

g <- 4.5
p <- 0.15
alz <- data.frame(fbsize(g,p))
knitr::kable(alz,caption="Power/Sample size of study on Alzheimer's disease")
Table 6.2: Power/Sample size of study on Alzheimer’s disease
gamma p y n1 pA h1 n2 h2 n3 lambdao lambdas
4.5 0.15 0.6256916 162.6246 0.8181818 0.4598361 108.994 0.6207625 39.97688 1.671594 1.784353

6.2 Population-based design

The example is as follows,

library(gap)
kp <- c(0.01,0.05,0.10,0.2)
models <- matrix(c(
          4.0, 0.01,
          4.0, 0.10,
          4.0, 0.50, 
          4.0, 0.80,
          2.0, 0.01,
          2.0, 0.10,
          2.0, 0.50,
          2.0, 0.80,
          1.5, 0.01,    
          1.5, 0.10,
          1.5, 0.50,
          1.5, 0.80), ncol=2, byrow=TRUE)
outfile <- "pbsize.txt"
cat("gamma","p","p1","p5","p10","p20\n",sep="\t",file=outfile)
for(i in 1:dim(models)[1])
{
   g <- models[i,1]
   p <- models[i,2]
   n <- vector()
   for(k in kp) n <- c(n,ceiling(pbsize(k,g,p)))
   cat(models[i,1:2],n,sep="\t",file=outfile,append=TRUE)
   cat("\n",file=outfile,append=TRUE)
} 
table5 <- read.table(outfile,header=TRUE,sep="\t")
knitr::kable(table5,caption="Sample size of population-based design")
Table 6.3: Sample size of population-based design
gamma p p1 p5 p10 p20
4.0 0.01 46681 8959 4244 1887
4.0 0.10 8180 1570 744 331
4.0 0.50 10891 2091 991 441
4.0 0.80 31473 6041 2862 1272
2.0 0.01 403970 77530 36725 16323
2.0 0.10 52709 10116 4792 2130
2.0 0.50 35285 6772 3208 1426
2.0 0.80 79391 15237 7218 3208
1.5 0.01 1599920 307056 145448 64644
1.5 0.10 192105 36869 17465 7762
1.5 0.50 98013 18811 8911 3961
1.5 0.80 192105 36869 17465 7762

6.3 Case-cohort design

We obtain results for ARIC and EPIC studies.

library(gap)
# ARIC study
outfile <- "aric.txt"
n <- 15792
pD <- 0.03
p1 <- 0.25
alpha <- 0.05
theta <- c(1.35,1.40,1.45)
beta <- 0.2
s_nb <- c(1463,722,468)
cat("n","pD","p1","hr","q","power","ssize\n",file=outfile,sep="\t")
for(i in 1:3)
{
  q <- s_nb[i]/n
  power <- ccsize(n,q,pD,p1,log(theta[i]),alpha,beta,TRUE)
  ssize <- ccsize(n,q,pD,p1,log(theta[i]),alpha,beta,FALSE)
  cat(n,"\t",pD,"\t",p1,"\t",theta[i],"\t",q,"\t",
      signif(power,3),"\t",ssize,"\n",
      file=outfile,append=TRUE)
}
read.table(outfile,header=TRUE,sep="\t")
#>       n   pD   p1   hr          q power ssize
#> 1 15792 0.03 0.25 1.35 0.09264184   0.8  1463
#> 2 15792 0.03 0.25 1.40 0.04571935   0.8   722
#> 3 15792 0.03 0.25 1.45 0.02963526   0.8   468
unlink(outfile)
# EPIC study
outfile <- "epic.txt"
n <- 25000
alpha <- 0.00000005
beta <- 0.2
s_pD <- c(0.3,0.2,0.1,0.05)
s_p1 <- seq(0.1,0.5,by=0.1)
s_hr <- seq(1.1,1.4,by=0.1)
cat("n","pD","p1","hr","alpha","ssize\n",file=outfile,sep="\t")
# direct calculation
for(pD in s_pD)
{
   for(p1 in s_p1)
   {
      for(hr in s_hr)
      {
         ssize <- ccsize(n,q,pD,p1,log(hr),alpha,beta,FALSE)
         if (ssize>0) cat(n,"\t",pD,"\t",p1,"\t",hr,"\t",alpha,"\t",
                          ssize,"\n",
                          file=outfile,append=TRUE)
      }
   }
}
knitr::kable(read.table(outfile,header=TRUE,sep="\t"),caption="Sample size of case-cohort designs")
Table 6.4: Sample size of case-cohort designs
n pD p1 hr alpha ssize
25000 0.3 0.1 1.3 0 14391
25000 0.3 0.1 1.4 0 5732
25000 0.3 0.2 1.2 0 21529
25000 0.3 0.2 1.3 0 5099
25000 0.3 0.2 1.4 0 2613
25000 0.3 0.3 1.2 0 11095
25000 0.3 0.3 1.3 0 3490
25000 0.3 0.3 1.4 0 1882
25000 0.3 0.4 1.2 0 8596
25000 0.3 0.4 1.3 0 2934
25000 0.3 0.4 1.4 0 1611
25000 0.3 0.5 1.2 0 7995
25000 0.3 0.5 1.3 0 2786
25000 0.3 0.5 1.4 0 1538
25000 0.2 0.1 1.4 0 9277
25000 0.2 0.2 1.3 0 7725
25000 0.2 0.2 1.4 0 3164
25000 0.2 0.3 1.3 0 4548
25000 0.2 0.3 1.4 0 2152
25000 0.2 0.4 1.2 0 20131
25000 0.2 0.4 1.3 0 3648
25000 0.2 0.4 1.4 0 1805
25000 0.2 0.5 1.2 0 17120
25000 0.2 0.5 1.3 0 3422
25000 0.2 0.5 1.4 0 1713
25000 0.1 0.2 1.4 0 8615
25000 0.1 0.3 1.4 0 3776
25000 0.1 0.4 1.3 0 13479
25000 0.1 0.4 1.4 0 2824
25000 0.1 0.5 1.3 0 10837
25000 0.1 0.5 1.4 0 2606
unlink(outfile)

7 Graphics

Some figures from the documentation may be of interest.

7.1 Genome-wide association

The following code is used to obtain a Q-Q plot via qqunif function,

library(gap)
u_obs <- runif(1000)
r <- qqunif(u_obs,pch=21,bg="blue",bty="n")
u_exp <- r$y
hits <- u_exp >= 2.30103
points(r$x[hits],u_exp[hits],pch=21,bg="green")
legend("topleft",sprintf("GC.lambda=%.4f",gc.lambda(u_obs)))
A Q-Q plot

Figure 7.1: A Q-Q plot

Based on a chicken genome scan data, the code below generates a Manhattan plot, demonstrating the use of gaps to separate chromosomes.

ord <- with(w4,order(chr,pos))
w4 <- w4[ord,]
oldpar <- par()
par(cex=0.6)
colors <- c(rep(c("blue","red"),15),"red")
mhtplot(w4,control=mht.control(colors=colors,gap=1000),pch=19,srt=0)
axis(2,cex.axis=2)
suggestiveline <- -log10(3.60036E-05)
genomewideline <- -log10(1.8E-06)
abline(h=suggestiveline, col="blue")
abline(h=genomewideline, col="green")
abline(h=0)
A genome-wide association study on chickens

Figure 7.2: A genome-wide association study on chickens

The code below obtains a Manhattan plot with gene annotation15,

data <- with(mhtdata,cbind(chr,pos,p))
glist <- c("IRS1","SPRY2","FTO","GRIK3","SNED1","HTR1A","MARCH3","WISP3",
           "PPP1R3B","RP1L1","FDFT1","SLC39A14","GFRA1","MC4R")
hdata <- subset(mhtdata,gene%in%glist)[c("chr","pos","p","gene")]
color <- rep(c("lightgray","gray"),11)
glen <- length(glist)
hcolor <- rep("red",glen)  
par(las=2, xpd=TRUE, cex.axis=1.8, cex=0.4)
ops <- mht.control(colors=color,yline=1.5,xline=3)
hops <- hmht.control(data=hdata,colors=hcolor)
mhtplot(data,ops,hops,pch=19)
axis(2,pos=2,at=1:16,cex.axis=0.5)
A Manhattan plot with gene annotation

Figure 7.3: A Manhattan plot with gene annotation

All these look familiar, so revised form of the function called mhtplot2 was created for additional features such as centering the chromosome ticks, allowing for more sophisticated coloring schemes, using prespecified fonts, etc. Please refer to the function’s documentation example.

We could also go further with a circos Manhattan plot,

circos.mhtplot(mhtdata, glist)
#> Note: 11 points are out of plotting region in sector 'chr16', track '3'.
A circos Manhattan plot

Figure 7.4: A circos Manhattan plot

and a version with y-axis,

require(gap.datasets)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
testdat <- mhtdata[c("chr","pos","p","gene","start","end")] %>%
           rename(log10p=p) %>%
           mutate(chr=paste0("chr",chr),log10p=-log10(log10p))
dat <- mutate(testdat,start=pos,end=pos) %>%
       select(chr,start,end,log10p)
labs <- subset(testdat,gene %in% glist) %>%
        group_by(gene,chr,start,end) %>%
        summarize() %>%
        mutate(cols="blue") %>%
        select(chr,start,end,gene,cols)
#> `summarise()` has grouped output by 'gene', 'chr', 'start'. You can override using the `.groups` argument.
labs[2,"cols"] <- "red"
ticks <- 0:2*5
circos.mhtplot2(dat,labs,ticks=ticks,ymax=max(ticks))
#> Note: 43 points are out of plotting region in sector 'chr16', track '5'.
Another circos Manhattan plot

Figure 7.5: Another circos Manhattan plot

As a side note, the data is used by manhattanly.

#{r mhttest, fig.cap="Manhattanly plot", fig.height=8, fig.width=8, plotly=TRUE}
library(manhattanly)
mhttest <- manhattanly(mhtdata, chr = "chr", bp = "pos",
                       snp = "rsn", annotation1 = "gene", suggestiveline = TRUE,
                       annotation2 = "rsn", p = "p")
mhttest
htmlwidgets::saveWidget(mhttest,"mhttest.html")

We now experiment with Miami plot,

mhtdata <- within(mhtdata,{pr=p})
miamiplot(mhtdata,chr="chr",bp="pos",p="p",pr="pr",snp="rsn")
Miami plots

Figure 7.6: Miami plots

# An alternative implementation
gwas <- select(mhtdata,chr,pos,p) %>%
        mutate(z=qnorm(p/2))
chrmaxpos <- miamiplot2(gwas,gwas,name1="Batch 2",name2="Batch 1",z1="z",z2="z")
#> Warning in max(c(dat1$pos[dat1$chr == i], dat2$pos[dat2$chr == i]), na.rm = TRUE): no non-missing arguments to max; returning -Inf
labelManhattan(chr=c(2,16),pos=c(226814165,52373776),name=c("AnonymousGene","FTO"),gwas,gwasZLab="z",chrmaxpos=chrmaxpos)
Miami plots

Figure 7.7: Miami plots

and a truncated Manhattan plot, noting that only data points with -log10(P)>=2 are shown,

par(oma=c(0,0,0,0), mar=c(5,6.5,1,1))
mhtplot.trunc(filter(IL.12B,log10P>=2), chr="Chromosome", bp="Position", z="Z",
   snp="MarkerName",
   suggestiveline=FALSE, genomewideline=-log10(5e-10),
   cex.mtext=1.2, cex.text=1.2,
   annotatelog10P=-log10(5e-10), annotateTop = FALSE,
   highlight=with(genes,gene),
   mtext.line=3, y.brk1=115, y.brk2=300, trunc.yaxis=TRUE, delta=0.01,
   cex.axis=1.5, cex=0.8, font=3, font.axis=1.5,
   y.ax.space=20,
   col = c("blue4", "skyblue")
)
Association of IL-12B

Figure 7.8: Association of IL-12B

The code below obtains a regional association plot with the asplot function,

asplot(CDKNlocus, CDKNmap, CDKNgenes, best.pval=5.4e-8, sf=c(3,6))
#> - CDKN2A 
#> - CDKN2B
title("CDKN2A/CDKN2B Region")
A regional association plot

Figure 7.9: A regional association plot

The function predates the currently popular locuszoom software but leaves the option open for generating such plots on the fly and locally.

Note that all these can serve as templates to customize features of your own.

7.2 Copy number variation

A plot of copy number variation (CNV) is shown here,

cnvplot(gap.datasets::cnv)
A CNV plot

Figure 7.10: A CNV plot

7.3 Effect size plot

The code below obtains an effect size plot via the ESplot function.

library(gap)
rs12075 <- data.frame(id=c("CCL2","CCL7","CCL8","CCL11","CCL13","CXCL6","Monocytes"),
                      b=c(0.1694,-0.0899,-0.0973,0.0749,0.189,0.0816,0.0338387),
                      se=c(0.0113,0.013,0.0116,0.0114,0.0114,0.0115,0.00713386))
ESplot(rs12075)
rs12075 and traits

Figure 7.11: rs12075 and traits

7.4 Forest plot

It draws many forest plots given a list of variants, e.g.,

data(OPG,package="gap.datasets")
meta::settings.meta(method.tau="DL")
METAL_forestplot(OPGtbl,OPGall,OPGrsid,width=6.75,height=5,digits.TE=2,digits.se=2,
                 col.diamond="black",col.inside="black",col.square="black")
#> Joining with `by = join_by(MarkerName)`
#> Joining with `by = join_by(MarkerName)`