Model temporal gene expression patterns
Haotian Zhuang
May 15, 2024
Source:vignettes/PreTSA_temporal.Rmd
PreTSA_temporal.Rmd
Introduction
We will demonstrate how to use PreTSA to fit the gene expression along pseudotime values and identify temporally variable genes (TVGs).
The 3K peripheral blood mononuclear cells (PBMC3K) dataset was
downloaded from the 10x
website. The processed PBMC3K dataset was downloaded using the R
package SeuratData
. TSCAN
was used to
construct a pseudotime trajectory from naive CD4 T cells to memory CD4 T
cells with the top 10 principal components (PCs) and the cell clusters
obtained from the dataset. Genes that are expressed in fewer than 5
cells within the trajectory were filtered out. We analyzed a final set
of 10,509 genes on 1,180 cells.
Load packages and datasets
library(PreTSA)
load(system.file("extdata", "pbmc3k", "pbmc3k_cleaned.Rdata", package = "PreTSA"))
str(expr)
#> Loading required package: Matrix
#> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> ..@ i : int [1:953387] 22 52 157 168 190 252 350 430 486 554 ...
#> ..@ p : int [1:1181] 0 238 508 958 1290 1691 2211 2596 2842 3257 ...
#> ..@ Dim : int [1:2] 10509 1180
#> ..@ Dimnames:List of 2
#> .. ..$ : chr [1:10509] "LINC00115" "NOC2L" "HES4" "ISG15" ...
#> .. ..$ : chr [1:1180] "CTAATAGAGCTATG" "TAGTCTTGGCTGTA" "ACTTGTACCTGTCC" "CACCCATGTTCTGT" ...
#> ..@ x : num [1:953387] 2.87 2.87 5.46 2.87 3.54 ...
#> ..@ factors : list()
str(pseudotime)
#> Named int [1:1180] 1 2 3 4 5 6 7 8 9 10 ...
#> - attr(*, "names")= chr [1:1180] "CTAATAGAGCTATG" "TAGTCTTGGCTGTA" "ACTTGTACCTGTCC" "CACCCATGTTCTGT" ...
Fit the gene expression along pseudotime values
fitRes <- temporalFit(expr = expr, pseudotime = pseudotime, knot = F)
str(fitRes)
#> num [1:10509, 1:1180] 0.00238 0.16975 0.00847 0.34266 -0.00666 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:10509] "LINC00115" "NOC2L" "HES4" "ISG15" ...
#> ..$ : chr [1:1180] "CTAATAGAGCTATG" "TAGTCTTGGCTGTA" "ACTTGTACCTGTCC" "CACCCATGTTCTGT" ...
The argument “knot” indicates whether to select the optimal number of knots automatically (FALSE by default). The argument “maxknotallowed” is the user-defined maximum number of knots (10 by default).
temporalFit(expr = expr, pseudotime = pseudotime, knot = T, maxknotallowed = 10)
Identify temporally variable genes (TVGs)
To account for the pseudotime uncertainty, we apply the similar
strategy used in PseudotimeDE
. The argument
“pseudotime_permute” is a list of permuted pseudotime values from
subsampled cells. Each element in the list has the same format of the
argument “pseudotime”.
pseudotime_permute <- readRDS(system.file("extdata", "pbmc3k", "pbmc3k_pseudotime_permute.rds", package = "PreTSA"))
length(pseudotime_permute)
#> [1] 100
str(pseudotime_permute[[1]])
#> Named int [1:944] 1 2 3 4 5 6 7 8 9 10 ...
#> - attr(*, "names")= chr [1:944] "AAATTCGAGGAGTG" "GAGATGCTGAATGA" "TATCTTCTAAACAG" "GCAAGACTACTGGT" ...
It returns a data frame with the p-value, FDR, test statistic and number of knots selected for each gene. Genes are ordered first by p-value (increasing) then by test statistic (decreasing).
testRes <- temporalTest(expr = expr[1:100, ], pseudotime = pseudotime, pseudotime_permute = pseudotime_permute, knot = F)
head(testRes)
#> fdr.parametric pval.parametric fdr.empirical pval.empirical fstat.ori
#> ISG15 4.939088e-19 4.939088e-21 0.07616146 0.00990099 28.32494
#> TNFRSF4 1.787031e-13 3.574063e-15 0.07616146 0.00990099 26.34783
#> ENO1 1.979526e-10 5.938578e-12 0.07616146 0.00990099 18.76528
#> RPL22 4.865585e-09 1.946234e-10 0.07616146 0.00990099 14.92563
#> TNFRSF25 2.798111e-08 1.399055e-09 0.07616146 0.00990099 14.02817
#> RER1 4.449211e-07 2.823722e-08 0.07616146 0.00990099 14.15083
To reduce the computational time, users can ignore the pseudotime uncertainty, which is the default setting.
testResFixed <- temporalTest(expr = expr, pseudotime = pseudotime, pseudotime_permute = NULL, knot = T, maxknotallowed = 10)
head(testResFixed)
#> fdr logpval pval fstat knotnum
#> S100A4 5.688841e-65 -157.1895 5.413304e-69 122.40125 0
#> KIAA0101 5.939572e-60 -144.9403 1.130378e-63 29.69045 10
#> MALAT1 4.759637e-58 -140.1511 1.358732e-61 107.66759 0
#> ANXA1 5.577562e-42 -102.8635 2.122966e-45 76.85722 0
#> B2M 5.982524e-41 -100.2677 2.846381e-44 74.78293 0
#> S100A11 2.061180e-37 -91.9406 1.176808e-40 68.18884 0
Visualize the fitted expression
As an example, IL32 is an known gene associated to T cell activation process. The fitted PreTSA curve shows an increasing trend.
library(ggplot2)
genename <- "IL32"
df.gene = data.frame(ptime = pseudotime, ori = expr[genename, ], pretsa = fitRes[genename, ])
ggplot(data = df.gene) + geom_point(aes(x = ptime, y = ori), color = "orange") +
geom_line(aes(x = ptime, y = pretsa), linewidth = 2, color = "royalblue") +
labs(x = "Pseudotime", y = "Expression", title = genename) + theme_classic() +
theme(plot.title = element_text(hjust = 0.5, face = "italic"))
Session Info
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_3.4.2 Matrix_1.5-1 PreTSA_1.1
#>
#> loaded via a namespace (and not attached):
#> [1] highr_0.10 pillar_1.8.1 bslib_0.4.2 compiler_4.2.1
#> [5] jquerylib_0.1.4 tools_4.2.1 fitdistrplus_1.1-8 digest_0.6.31
#> [9] tibble_3.2.1 gtable_0.3.1 jsonlite_1.8.4 evaluate_0.19
#> [13] memoise_2.0.1 lifecycle_1.0.3 lattice_0.20-45 pkgconfig_2.0.3
#> [17] rlang_1.1.0 cli_3.6.1 rstudioapi_0.14 yaml_2.3.6
#> [21] pkgdown_2.0.7 xfun_0.39 fastmap_1.1.0 withr_2.5.0
#> [25] dplyr_1.1.1 stringr_1.5.0 knitr_1.41 generics_0.1.3
#> [29] desc_1.4.2 fs_1.5.2 vctrs_0.6.1 sass_0.4.4
#> [33] systemfonts_1.0.4 tidyselect_1.2.0 rprojroot_2.0.3 grid_4.2.1
#> [37] glue_1.6.2 R6_2.5.1 textshaping_0.3.6 fansi_1.0.3
#> [41] survival_3.3-1 rmarkdown_2.19 farver_2.1.1 purrr_1.0.1
#> [45] magrittr_2.0.3 scales_1.2.1 htmltools_0.5.4 matrixStats_0.63.0
#> [49] splines_4.2.1 MASS_7.3-57 colorspace_2.0-3 labeling_0.4.2
#> [53] ragg_1.2.5 utf8_1.2.2 stringi_1.7.8 munsell_0.5.0
#> [57] cachem_1.0.6