Model temporal gene expression patterns

Introduction

We will demonstrate how to use PreTSA to fit the gene expression along pseudotime values and identify temporally variable genes (TVGs).

The 3K peripheral blood mononuclear cells (PBMC3K) dataset was downloaded from the 10x website. The processed PBMC3K dataset was downloaded using the R package SeuratData. TSCAN was used to construct a pseudotime trajectory from naive CD4 T cells to memory CD4 T cells with the top 10 principal components (PCs) and the cell clusters obtained from the dataset. Genes that are expressed in fewer than 5 cells within the trajectory were filtered out. We analyzed a final set of 10,509 genes on 1,180 cells.

Load packages and datasets

library(PreTSA)
load(system.file("extdata", "pbmc3k", "pbmc3k_cleaned.Rdata", package = "PreTSA"))
str(expr)
#> Loading required package: Matrix
#> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#>   ..@ i       : int [1:953387] 22 52 157 168 190 252 350 430 486 554 ...
#>   ..@ p       : int [1:1181] 0 238 508 958 1290 1691 2211 2596 2842 3257 ...
#>   ..@ Dim     : int [1:2] 10509 1180
#>   ..@ Dimnames:List of 2
#>   .. ..$ : chr [1:10509] "LINC00115" "NOC2L" "HES4" "ISG15" ...
#>   .. ..$ : chr [1:1180] "CTAATAGAGCTATG" "TAGTCTTGGCTGTA" "ACTTGTACCTGTCC" "CACCCATGTTCTGT" ...
#>   ..@ x       : num [1:953387] 2.87 2.87 5.46 2.87 3.54 ...
#>   ..@ factors : list()
str(pseudotime)
#>  Named int [1:1180] 1 2 3 4 5 6 7 8 9 10 ...
#>  - attr(*, "names")= chr [1:1180] "CTAATAGAGCTATG" "TAGTCTTGGCTGTA" "ACTTGTACCTGTCC" "CACCCATGTTCTGT" ...

Fit the gene expression along pseudotime values

fitRes <- temporalFit(expr = expr, pseudotime = pseudotime, knot = F)
str(fitRes)
#>  num [1:10509, 1:1180] 0.00238 0.16975 0.00847 0.34266 -0.00666 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:10509] "LINC00115" "NOC2L" "HES4" "ISG15" ...
#>   ..$ : chr [1:1180] "CTAATAGAGCTATG" "TAGTCTTGGCTGTA" "ACTTGTACCTGTCC" "CACCCATGTTCTGT" ...

The argument “knot” indicates whether to select the optimal number of knots automatically (FALSE by default). The argument “maxknotallowed” is the user-defined maximum number of knots (10 by default).

temporalFit(expr = expr, pseudotime = pseudotime, knot = T, maxknotallowed = 10)

Identify temporally variable genes (TVGs)

To account for the pseudotime uncertainty, we apply the similar strategy used in PseudotimeDE. The argument “pseudotime_permute” is a list of permuted pseudotime values from subsampled cells. Each element in the list has the same format of the argument “pseudotime”.

pseudotime_permute <- readRDS(system.file("extdata", "pbmc3k", "pbmc3k_pseudotime_permute.rds", package = "PreTSA"))
length(pseudotime_permute)
#> [1] 100
str(pseudotime_permute[[1]])
#>  Named int [1:944] 1 2 3 4 5 6 7 8 9 10 ...
#>  - attr(*, "names")= chr [1:944] "AAATTCGAGGAGTG" "GAGATGCTGAATGA" "TATCTTCTAAACAG" "GCAAGACTACTGGT" ...

It returns a data frame with the p-value, FDR, test statistic and number of knots selected for each gene. Genes are ordered first by p-value (increasing) then by test statistic (decreasing).

testRes <- temporalTest(expr = expr[1:100, ], pseudotime = pseudotime, pseudotime_permute = pseudotime_permute, knot = F)
head(testRes)
#>          fdr.parametric pval.parametric fdr.empirical pval.empirical fstat.ori
#> ISG15      4.939088e-19    4.939088e-21    0.07616146     0.00990099  28.32494
#> TNFRSF4    1.787031e-13    3.574063e-15    0.07616146     0.00990099  26.34783
#> ENO1       1.979526e-10    5.938578e-12    0.07616146     0.00990099  18.76528
#> RPL22      4.865585e-09    1.946234e-10    0.07616146     0.00990099  14.92563
#> TNFRSF25   2.798111e-08    1.399055e-09    0.07616146     0.00990099  14.02817
#> RER1       4.449211e-07    2.823722e-08    0.07616146     0.00990099  14.15083

To reduce the computational time, users can ignore the pseudotime uncertainty, which is the default setting.

testResFixed <- temporalTest(expr = expr, pseudotime = pseudotime, pseudotime_permute = NULL, knot = T, maxknotallowed = 10)
head(testResFixed)
#>                   fdr   logpval         pval     fstat knotnum
#> S100A4   5.688841e-65 -157.1895 5.413304e-69 122.40125       0
#> KIAA0101 5.939572e-60 -144.9403 1.130378e-63  29.69045      10
#> MALAT1   4.759637e-58 -140.1511 1.358732e-61 107.66759       0
#> ANXA1    5.577562e-42 -102.8635 2.122966e-45  76.85722       0
#> B2M      5.982524e-41 -100.2677 2.846381e-44  74.78293       0
#> S100A11  2.061180e-37  -91.9406 1.176808e-40  68.18884       0

Visualize the fitted expression

As an example, IL32 is an known gene associated to T cell activation process. The fitted PreTSA curve shows an increasing trend.

library(ggplot2)
genename <- "IL32"
df.gene = data.frame(ptime = pseudotime, ori = expr[genename, ], pretsa = fitRes[genename, ])

ggplot(data = df.gene) + geom_point(aes(x = ptime, y = ori), color = "orange") +
  geom_line(aes(x = ptime, y = pretsa), linewidth = 2, color = "royalblue") +
  labs(x = "Pseudotime", y = "Expression", title = genename) + theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "italic"))

Session Info

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.4.2 Matrix_1.5-1  PreTSA_1.1   
#> 
#> loaded via a namespace (and not attached):
#>  [1] highr_0.10         pillar_1.8.1       bslib_0.4.2        compiler_4.2.1    
#>  [5] jquerylib_0.1.4    tools_4.2.1        fitdistrplus_1.1-8 digest_0.6.31     
#>  [9] tibble_3.2.1       gtable_0.3.1       jsonlite_1.8.4     evaluate_0.19     
#> [13] memoise_2.0.1      lifecycle_1.0.3    lattice_0.20-45    pkgconfig_2.0.3   
#> [17] rlang_1.1.0        cli_3.6.1          rstudioapi_0.14    yaml_2.3.6        
#> [21] pkgdown_2.0.7      xfun_0.39          fastmap_1.1.0      withr_2.5.0       
#> [25] dplyr_1.1.1        stringr_1.5.0      knitr_1.41         generics_0.1.3    
#> [29] desc_1.4.2         fs_1.5.2           vctrs_0.6.1        sass_0.4.4        
#> [33] systemfonts_1.0.4  tidyselect_1.2.0   rprojroot_2.0.3    grid_4.2.1        
#> [37] glue_1.6.2         R6_2.5.1           textshaping_0.3.6  fansi_1.0.3       
#> [41] survival_3.3-1     rmarkdown_2.19     farver_2.1.1       purrr_1.0.1       
#> [45] magrittr_2.0.3     scales_1.2.1       htmltools_0.5.4    matrixStats_0.63.0
#> [49] splines_4.2.1      MASS_7.3-57        colorspace_2.0-3   labeling_0.4.2    
#> [53] ragg_1.2.5         utf8_1.2.2         stringi_1.7.8      munsell_0.5.0     
#> [57] cachem_1.0.6

Haotian Zhuang

May 15, 2024

Introduction

Load packages and datasets

Fit the gene expression along pseudotime values

Identify temporally variable genes (TVGs)

Visualize the fitted expression