R Programming for Mass Spectrometry
The supplement
Web: https://books.wiley.com/titles/9781119872351/
Data download instructions
The supplement is a zip file with one .html file per chapter. It is more useful to convert these to markdown (.Rmd), which is handiest from a Linux terminal. The conversion and rendering can be done as follows,
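for f in data-analysis intro-ms wrangle-data eda spectra-analysis chrom machine-learning
do
  # export f so that Rscript can read it via Sys.getenv()
  export f=${f}
  # convert the chapter HTML to R Markdown using the html.lua filter
  pandoc ${f}.html --lua-filter=html.lua -t markdown -o ${f}.Rmd
  # turn pandoc-style fence attributes into knitr chunk headers
  sed -i 's/``` {.r/```{r/' ${f}.Rmd
  # render the chapter into output/
  Rscript -e 'f=Sys.getenv("f");rmarkdown::render(paste0(f,".Rmd"),output_dir="output")'
done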
with html.lua. For instance, to run through the R code:
- data-analysis.Rmd requires caution over the Bash code blocks (Rscript hello.R and R CMD BATCH hello.R) and one non-R code block.
- intro-ms.Rmd needs c("tidyverse").
- wrangle-data.Rmd requires c("tidyverse") and "tandem_result/" created by X!Tandem (tandem.sh, input.xml, default_input.xml, taxonomy.xml) shown here following ftp://ftp.thegpm.org/projects/tandem/source/.
- eda.Rmd requires c("Spectra").
- spectra-analysis.Rmd needs c("tidyverse", "Spectra", "infer", "xml2", "mzID", "MSnbase") along with inten_label and pal.
- chrom.Rmd needs c("tidyverse", "baseline", "signal", "EnvStats", "MassSpecWavelet", "MSnbase", "xcms", "latex2exp", "ggpubr", "fda.usc") along with inten_label and pal.
- machine-learning.Rmd requires c("tidymodels", "tidyverse", "visdat", "ggfortify", "factoextra", "colino", "heatmaply", "Spectra").
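The sed step in the conversion loop rewrites pandoc-style fence attributes into knitr chunk headers. A minimal sketch of its effect, assuming pandoc emits an opening fence of the form shown (the sample lines are hypothetical, not taken from the supplement):

```shell
# Hypothetical sample of what pandoc might write for an R code block.
sample=$(printf '%s\n' '``` {.r}' 'x <- 1' '```')

# Same substitution as in the loop, applied to stdin instead of in place.
converted=$(printf '%s\n' "$sample" | sed 's/``` {.r/```{r/')

printf '%s\n' "$converted"
```

Only the opening fence changes, so that knitr can treat the block as an executable chunk; the code and the closing fence pass through untouched.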
Set options(lifecycle_verbosity = "quiet") to use the deprecated progress_estimated() in wrangle-data.Rmd without warnings, although a switch to the progress package has been suggested:
library(progress)
n <- 100
pb <- progress_bar$new(
format = " processing [:bar] :percent eta: :eta",
total = n, clear = FALSE, width = 60
)
for (i in seq_len(n)) {
pb$tick()
Sys.sleep(0.1)
}
inten_label and pal are from intro-ms.Rmd and data-analysis.Rmd, respectively. Packages can be loaded in a batch, e.g., pkgs <- c("tidyverse", "Spectra", "infer", "mzID", "MSnbase"); lapply(pkgs, library, character.only = TRUE).
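Before rendering, it may help to check that the packages a chapter needs are installed. The following is a sketch only: the package list is the one from the batch-load example, the variable names are made up for illustration, and the check is skipped when Rscript is not on the PATH.

```shell
# Packages from the batch-load example; adjust per chapter.
pkgs="tidyverse Spectra infer mzID MSnbase"

# Build an R character vector literal such as c("a","b").
# $pkgs is left unquoted on purpose so that it splits into words.
r_vec=$(printf '"%s",' $pkgs)
r_vec="c(${r_vec%,})"

# R expression that reports packages not yet installed.
r_expr="absent <- setdiff(${r_vec}, rownames(installed.packages())); if (length(absent)) cat('missing:', absent, '\n') else cat('all installed\n')"

if command -v Rscript >/dev/null 2>&1; then
  Rscript -e "$r_expr"
else
  echo "Rscript not found; skipping package check"
fi
```

Any package reported as missing would need to be installed from CRAN or Bioconductor, as appropriate, before the corresponding .Rmd is rendered.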
large-data/mona/ (Chapter 7)
MoNA-export-LC-MS-MS_Positive_Mode.msp
MTBLS4938 (Chapter 7)
large-data/MSV000081318/MSV000086195
We start with wget
wget -r -nH --cut-dirs=2 -R "index.html*" ftp://massive-ftp.ucsd.edu/v01/MSV000081318/
wget -r -nH --cut-dirs=1 -R "index.html*" ftp://massive-ftp.ucsd.edu/v03/MSV000086195/
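The two commands differ in --cut-dirs, which changes where files land locally. With -nH the host name is dropped, and --cut-dirs=N then drops the first N remote directory components. A small sketch that mimics this path mapping (the helper cut_dirs and the file name sample.mzML are hypothetical, for illustration only):

```shell
cut_dirs() {
  # $1: remote path without the host name
  # $2: number of leading directory components to drop
  printf '%s\n' "$1" | cut -d/ -f"$(( $2 + 1 ))"-
}

cut_dirs v01/MSV000081318/peak/sample.mzML 2        # peak/sample.mzML
cut_dirs v03/MSV000086195/ccms_peak/sample.mzML 1   # MSV000086195/ccms_peak/sample.mzML
```

So the first command places the contents of MSV000081318 directly under the current directory, whereas the second keeps the MSV000086195/ directory.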
Directory listing and file transfer can also be done with ftp,
ftp massive-ftp.ucsd.edu <<EOF
anonymous
ls
cd z01/MSV000086195/ccms_peak/
prompt
mget *
EOF
where anonymous is the user name, or preferably with lftp,
lftp massive-ftp.ucsd.edu <<EOF
mirror --parallel=10 --verbose /v03/MSV000086195 ./MSV000086195
bye
EOF
# to resume
lftp -e "mirror --continue --parallel=4 /z01/MSV000086195/ccms_peak/ ccms_peak/; quit" \
ftp://massive-ftp.ucsd.edu
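After a mirror or a resumed transfer, a quick local summary can help confirm what arrived. A minimal sketch using only standard tools; the function name summarise_dir is made up here, and the directory argument is whatever target was passed to mirror:

```shell
summarise_dir() {
  # $1: local directory to summarise
  dir=$1
  # Count regular files; tr strips padding that some wc versions add.
  n_files=$(find "$dir" -type f | wc -l | tr -d ' ')
  # Total size in human-readable form (first field of du output).
  total=$(du -sh "$dir" | cut -f1)
  printf '%s: %s files, %s\n' "$dir" "$n_files" "$total"
}

# Example: summarise the ccms_peak/ directory if it exists.
if [ -d ccms_peak ]; then
  summarise_dir ccms_peak
fi
```

Comparing the file count against the remote listing is a simple way to spot an interrupted transfer before rerunning mirror --continue.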
ScltlMsclsMAvsCntr_Batch1_BRPhsFr5_prof.mzML in Chapters 4 & 5 is made with MSConvert (6 GB!) or ThermoRawFileParser/1.4.4 (6.2 GB with -p but 750 MB without), as in mzML.sh, following exercises in the Caprion project.
schema/ (Chapter 3):
Reference
Julian RK (2025). R Programming for Mass Spectrometry: Effective and Reproducible Data Analysis. ISBN: 978-1-119-87235-1.