This script illustrates a simple differential gene expression (DGE) analysis with DESeq2:
The results are exported into a text file and visualized using heatmaps and volcano-plots.
Please note that this is only an introduction into DGE
analysis.
A detailed tutorial for DGE analysis with DESEq2
package could be found here:
# Packages should be installed only once.
# They have been pre-installed.
# Hence these lines are commented.
# CRAN packages
# install.packages("dplyr")
# install.packages("ggplot2")
# install.packages("pheatmap")
# install.packages("BiocManager")
# Bioconductor packages
# BiocManager::install("tximeta")
# BiocManager::install("DESeq2")
# BiocManager::install("org.Hs.eg.db")
# BiocManager::install('EnhancedVolcano')
# Time stamp
Sys.time()
## [1] "2024-06-26 16:02:43 BST"
# Clean environment
rm(list=ls())
graphics.off()
tximeta stands for “Transcripts Import with Metadata”. This package allows a user to import Salmon counts into a data structure called Summarized Experiment (SE). Importantly, tximeta can detect the version of human genome resources (genome, transcriptome and annotation) used by Salmon during the transcripts count. Of course, tximeta can only recognize commonly used human (or mouse) genome resources from Gencode, Ensembl or Refseq. For other species or customized genome resources you may use tximport package, or even construct the Summarized Experiment object manually.
Luckily, we used a compatible Gencode version of resources (Human Gencode v45) during the Salmon count. So, we may use tximeta for data import. In addition to recognizing the transcripts names, tximeta will be able to aggregate the transcripts into genes, and even will add the genes/transcripts genomic coordinates to the produced Summarized Experiment (making a Ranged Summarized Experiment).
You may find detailed tutorials for using Summarised Experiment and tximeta here:
Note that after loading tximeta we will to check its version. This is needed because older versions of tximeta may be out of sync with recent versions of genomic resources. tximeta v.1.20.+ is already OK for the GENCODE v.45
The first time when tximeta calls for GENCODE v45, it downloads the resources from the internet and performs some necessary configuration (e.g. asks user where to store the files etc). This needs to be done only once. To save time, all required resources had already been downloaded to the VM.
# Load tximeta library
library(tximeta)
packageVersion("tximeta")
## [1] '1.20.3'
# Set working folder
# Note that when rendering RMD, setwd() may not persist beyond the current chunk:
# https://bookdown.org/yihui/rmarkdown-cookbook/working-directory.html#working-directory
#base_folder="/home/ubuntu/Documents/bulk-rnaseq/short-reads"
base_folder <- "C:\\Users\\alexe\\OneDrive - Cranfield University\\External\\ebi\\short_reads"
data_folder <- file.path(base_folder, "data", "salmon_counts")
setwd(data_folder)
# Read samples information
coldata <- read.table("samples.txt", header=T, sep = "\t")
coldata
## files names SRA Depth CCLE condition
## 1 SRR8615612/quant.sf AC1 SRR8615612 83450360 NCIH1373 Adenocarcinoma
## 2 SRR8616105/quant.sf AC2 SRR8616105 84706828 NCIH23 Adenocarcinoma
## 3 SRR8615805/quant.sf AC3 SRR8615805 109186348 NCIH358 Adenocarcinoma
## 4 SRR8616002/quant.sf SCC1 SRR8616002 79854702 DMS114 Small_Cell_Carcinoma
## 5 SRR8615428/quant.sf SCC2 SRR8615428 77000229 CORL311 Small_Cell_Carcinoma
## 6 SRR8615609/quant.sf SCC3 SRR8615609 60897689 NCIH1836 Small_Cell_Carcinoma
# Read Salmon counts to Summarized Experiment (SE) object
se <- tximeta(coldata)
# Show summary of the SE object:
# it should be RangedSummarizedExperiment
# with 252,048 transcripts and 6 samples
se
## class: RangedSummarizedExperiment
## dim: 252048 6
## metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
## assays(3): counts abundance length
## rownames(252048): ENST00000456328.2 ENST00000450305.2 ... ENST00000387460.2 ENST00000387461.2
## rowData names(3): tx_id gene_id tx_name
## colnames(6): AC1 AC2 ... SCC2 SCC3
## colData names(5): names SRA Depth CCLE condition
# Load SummarizedExperiment library:
# it provides core functions for handling SE objects
library(SummarizedExperiment)
# Information about the genome
seqinfo(se)
## Seqinfo object with 25 sequences (1 circular) from hg38 genome:
## seqnames seqlengths isCircular genome
## chr1 248956422 FALSE hg38
## chr2 242193529 FALSE hg38
## chr3 198295559 FALSE hg38
## chr4 190214555 FALSE hg38
## chr5 181538259 FALSE hg38
## ... ... ... ...
## chr21 46709983 FALSE hg38
## chr22 50818468 FALSE hg38
## chrX 156040895 FALSE hg38
## chrY 57227415 FALSE hg38
## chrM 16569 TRUE hg38
# Information about the samples
colData(se)
## DataFrame with 6 rows and 5 columns
## names SRA Depth CCLE condition
## <character> <character> <integer> <character> <character>
## AC1 AC1 SRR8615612 83450360 NCIH1373 Adenocarcinoma
## AC2 AC2 SRR8616105 84706828 NCIH23 Adenocarcinoma
## AC3 AC3 SRR8615805 109186348 NCIH358 Adenocarcinoma
## SCC1 SCC1 SRR8616002 79854702 DMS114 Small_Cell_Carcinoma
## SCC2 SCC2 SRR8615428 77000229 CORL311 Small_Cell_Carcinoma
## SCC3 SCC3 SRR8615609 60897689 NCIH1836 Small_Cell_Carcinoma
# Assays: matrices with the counts
# (and other data relevant to counts)
assayNames(se)
## [1] "counts" "abundance" "length"
# Information about transcripts:
# note that SE uses Ensembl transcript IDs (ENST...),
# also note the genomic coordinates (given in Rle/IRanges formats)
rowRanges(se)
## GRanges object with 252048 ranges and 3 metadata columns:
## seqnames ranges strand | tx_id gene_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <CharacterList> <character>
## ENST00000456328.2 chr1 11869-14409 + | 1 ENSG00000290825.1 ENST00000456328.2
## ENST00000450305.2 chr1 12010-13670 + | 2 ENSG00000223972.6 ENST00000450305.2
## ENST00000488147.2 chr1 14696-24886 - | 11638 ENSG00000227232.6 ENST00000488147.2
## ENST00000619216.1 chr1 17369-17436 - | 11639 ENSG00000278267.1 ENST00000619216.1
## ENST00000473358.1 chr1 29554-31097 + | 3 ENSG00000243485.5 ENST00000473358.1
## ... ... ... ... . ... ... ...
## ENST00000361681.2 chrM 14149-14673 - | 252928 ENSG00000198695.2 ENST00000361681.2
## ENST00000387459.1 chrM 14674-14742 - | 252929 ENSG00000210194.1 ENST00000387459.1
## ENST00000361789.2 chrM 14747-15887 + | 252920 ENSG00000198727.2 ENST00000361789.2
## ENST00000387460.2 chrM 15888-15953 + | 252921 ENSG00000210195.2 ENST00000387460.2
## ENST00000387461.2 chrM 15956-16023 - | 252930 ENSG00000210196.2 ENST00000387461.2
## -------
## seqinfo: 25 sequences (1 circular) from hg38 genome
# Summarize to gene level
# Use most abundant transcript to select the most relevant gene coordinates
gse <- summarizeToGene(se, assignRanges="abundant")
# Explore the result
# Note that GSE already uses genes IDs
# (ENSG..., 62,729 genes)
gse
## class: RangedSummarizedExperiment
## dim: 62749 6
## metadata(7): tximetaInfo quantInfo ... txdbInfo assignRanges
## assays(3): counts abundance length
## rownames(62749): ENSG00000000003.16 ENSG00000000005.6 ... ENSG00000293559.1 ENSG00000293560.1
## rowData names(5): gene_id tx_ids isoform max_prop iso_prop
## colnames(6): AC1 AC2 ... SCC2 SCC3
## colData names(5): names SRA Depth CCLE condition
rowRanges(gse)
## GRanges object with 62749 ranges and 5 metadata columns:
## seqnames ranges strand | gene_id tx_ids isoform max_prop iso_prop
## <Rle> <IRanges> <Rle> | <character> <CharacterList> <character> <numeric> <NumericList>
## ENSG00000000003.16 chrX 100627108-100636806 - | ENSG00000000003.16 ENST00000373020.9,ENST00000612152.4,ENST00000496771.5,... ENST00000373020.9 0.687237 0.6872372,0.0025714,0.2824258,...
## ENSG00000000005.6 chrX 100593624-100597531 + | ENSG00000000005.6 ENST00000373031.5,ENST00000485971.1 ENST00000485971.1 1.000000 0,1
## ENSG00000000419.14 chr20 50934870-50958532 - | ENSG00000000419.14 ENST00000466152.5,ENST00000371582.8,ENST00000371588.10,... ENST00000371588.10 0.758189 0.0161384,0.0665779,0.0422139,...
## ENSG00000000457.14 chr1 169853074-169893959 - | ENSG00000000457.14 ENST00000367771.11,ENST00000367770.5,ENST00000367772.8,... ENST00000367772.8 0.515659 0.213536,0.000000,0.515659,...
## ENSG00000000460.17 chr1 169795043-169804347 + | ENSG00000000460.17 ENST00000498289.5,ENST00000472795.5,ENST00000359326.9,... ENST00000496973.5 0.383497 0.00838816,0.01530056,0.15528023,...
## ... ... ... ... . ... ... ... ... ...
## ENSG00000293556.1 chr7 149776435-149783291 + | ENSG00000293556.1 ENST00000486824.3,ENST00000493502.3 ENST00000486824.3 0.921758 0.9217577,0.0782423
## ENSG00000293557.1 chr7 149824562-149833978 + | ENSG00000293557.1 ENST00000461331.5,ENST00000488835.5,ENST00000492965.5,... ENST00000472850.3 0.411791 0.000000,0.173459,0.342876,...
## ENSG00000293558.1 chr7 149810915-149812635 + | ENSG00000293558.1 ENST00000481772.3 ENST00000481772.3 1.000000 1
## ENSG00000293559.1 chr7 99238911-99261636 + | ENSG00000293559.1 ENST00000453194.2 ENST00000453194.2 1.000000 1
## ENSG00000293560.1 chr7 99297912-99311130 + | ENSG00000293560.1 ENST00000452010.2,ENST00000429079.1 ENST00000452010.2 1.000000 1,0
## -------
## seqinfo: 25 sequences (1 circular) from hg38 genome
# Clean-up
# (delete interim data that are not any longer needed)
rm(coldata,se,data_folder)
Bioconductor provides a large collection of Annotation Data
Packages:
- https://bioconductor.org/packages/release/data/annotation/
Many annotations for human genes are provided via
org.Hs.eg.db data package:
- https://bioconductor.org/packages/release/data/annotation/html/org.Hs.eg.db.html
Information from org.Hs.eg.db can be easily added to Summarised Experiment objects.
# Load database with human genes annotations
library(org.Hs.eg.db)
# Explore the content of org.Hs.eg.db
org.Hs.eg.db
## OrgDb object:
## | DBSCHEMAVERSION: 2.1
## | Db type: OrgDb
## | Supporting package: AnnotationDbi
## | DBSCHEMA: HUMAN_DB
## | ORGANISM: Homo sapiens
## | SPECIES: Human
## | EGSOURCEDATE: 2023-Sep11
## | EGSOURCENAME: Entrez Gene
## | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | CENTRALID: EG
## | TAXID: 9606
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: http://current.geneontology.org/ontology/go-basic.obo
## | GOSOURCEDATE: 2023-07-27
## | GOEGSOURCEDATE: 2023-Sep11
## | GOEGSOURCENAME: Entrez Gene
## | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | KEGGSOURCENAME: KEGG GENOME
## | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
## | KEGGSOURCEDATE: 2011-Mar15
## | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
## | GPSOURCEURL:
## | GPSOURCEDATE: 2023-Aug20
## | ENSOURCEDATE: 2023-May10
## | ENSOURCENAME: Ensembl
## | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
## | UPSOURCENAME: Uniprot
## | UPSOURCEURL: http://www.UniProt.org/
## | UPSOURCEDATE: Mon Sep 18 16:12:39 2023
# Add gene SYMBOLs to our GSE object
gse <- addIds(gse, "SYMBOL", gene=TRUE)
# Explore the result
# (the SYMBOLs column is added to the right side of the table)
rowRanges(gse)
## GRanges object with 62749 ranges and 6 metadata columns:
## seqnames ranges strand | gene_id tx_ids isoform max_prop iso_prop SYMBOL
## <Rle> <IRanges> <Rle> | <character> <CharacterList> <character> <numeric> <NumericList> <character>
## ENSG00000000003.16 chrX 100627108-100636806 - | ENSG00000000003.16 ENST00000373020.9,ENST00000612152.4,ENST00000496771.5,... ENST00000373020.9 0.687237 0.6872372,0.0025714,0.2824258,... TSPAN6
## ENSG00000000005.6 chrX 100593624-100597531 + | ENSG00000000005.6 ENST00000373031.5,ENST00000485971.1 ENST00000485971.1 1.000000 0,1 TNMD
## ENSG00000000419.14 chr20 50934870-50958532 - | ENSG00000000419.14 ENST00000466152.5,ENST00000371582.8,ENST00000371588.10,... ENST00000371588.10 0.758189 0.0161384,0.0665779,0.0422139,... DPM1
## ENSG00000000457.14 chr1 169853074-169893959 - | ENSG00000000457.14 ENST00000367771.11,ENST00000367770.5,ENST00000367772.8,... ENST00000367772.8 0.515659 0.213536,0.000000,0.515659,... SCYL3
## ENSG00000000460.17 chr1 169795043-169804347 + | ENSG00000000460.17 ENST00000498289.5,ENST00000472795.5,ENST00000359326.9,... ENST00000496973.5 0.383497 0.00838816,0.01530056,0.15528023,... FIRRM
## ... ... ... ... . ... ... ... ... ... ...
## ENSG00000293556.1 chr7 149776435-149783291 + | ENSG00000293556.1 ENST00000486824.3,ENST00000493502.3 ENST00000486824.3 0.921758 0.9217577,0.0782423 <NA>
## ENSG00000293557.1 chr7 149824562-149833978 + | ENSG00000293557.1 ENST00000461331.5,ENST00000488835.5,ENST00000492965.5,... ENST00000472850.3 0.411791 0.000000,0.173459,0.342876,... <NA>
## ENSG00000293558.1 chr7 149810915-149812635 + | ENSG00000293558.1 ENST00000481772.3 ENST00000481772.3 1.000000 1 <NA>
## ENSG00000293559.1 chr7 99238911-99261636 + | ENSG00000293559.1 ENST00000453194.2 ENST00000453194.2 1.000000 1 <NA>
## ENSG00000293560.1 chr7 99297912-99311130 + | ENSG00000293560.1 ENST00000452010.2,ENST00000429079.1 ENST00000452010.2 1.000000 1,0 <NA>
## -------
## seqinfo: 25 sequences (1 circular) from hg38 genome
Unfortunately, many ENSEMBL genes still don’t have corresponding SYMBOLS, as could be seen by NA-s at the end of our rowRanges.
# How many Ensembl genes have no SYMBOLs yet?
sum(is.na(rowRanges(gse)$SYMBOL))
## [1] 26279
# The total number of the genes in our GSE:
length(rowRanges(gse))
## [1] 62749
To avoid potential problems caused by the missed SYMBOLS, you may get the “gene names” directly from the GTF file provided by Gencode. Gencode uses SYMBOLs to name the gens where possible, and uses other identifiers (e.g. Ensembl IDs) if the SYMBOL is not available.
IMPORTANT: use the same version of Gencode that was used for Salmon counts!
# Library containing function for reading GFF files
library(rtracklayer)
# GenCode 45 annotations (GTF) file name
gtf_file <- file.path(base_folder, "resources","gencode_45","gencode.v45.primary_assembly.annotation.gtf.gz")
# Read GTF file to a data frame (this may take a couple of minutes ...)
gtf.df <- readGFF(gtf_file)
# Explore annotations imported from the GTF file
head(gtf.df)
## seqid source type start end score strand phase gene_id gene_type gene_name level tag transcript_id transcript_type transcript_name transcript_support_level havana_transcript exon_number exon_id hgnc_id havana_gene ont protein_id ccdsid artif_dupl
## 1 chr1 HAVANA gene 11869 14409 NA + NA ENSG00000290825.1 lncRNA DDX11L2 2 overlaps_pseudogene <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 chr1 HAVANA transcript 11869 14409 NA + NA ENSG00000290825.1 lncRNA DDX11L2 2 Ensembl_canonical ENST00000456328.2 lncRNA DDX11L2-202 1 OTTHUMT00000362751.1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 chr1 HAVANA exon 11869 12227 NA + NA ENSG00000290825.1 lncRNA DDX11L2 2 Ensembl_canonical ENST00000456328.2 lncRNA DDX11L2-202 1 OTTHUMT00000362751.1 1 ENSE00002234944.1 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 chr1 HAVANA exon 12613 12721 NA + NA ENSG00000290825.1 lncRNA DDX11L2 2 Ensembl_canonical ENST00000456328.2 lncRNA DDX11L2-202 1 OTTHUMT00000362751.1 2 ENSE00003582793.1 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 chr1 HAVANA exon 13221 14409 NA + NA ENSG00000290825.1 lncRNA DDX11L2 2 Ensembl_canonical ENST00000456328.2 lncRNA DDX11L2-202 1 OTTHUMT00000362751.1 3 ENSE00002312635.1 <NA> <NA> <NA> <NA> <NA> <NA>
## 6 chr1 HAVANA gene 12010 13670 NA + NA ENSG00000223972.6 transcribed_unprocessed_pseudogene DDX11L1 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> HGNC:37102 OTTHUMG00000000961.2 <NA> <NA> <NA> <NA>
# Make table linking Ensembl gene ID to the Gencode "gene_name"
library(dplyr)
geneId2Name.df <- gtf.df %>%
select(gene_id, gene_name) %>%
distinct()
# Explore result
head(geneId2Name.df)
## gene_id gene_name
## 1 ENSG00000290825.1 DDX11L2
## 2 ENSG00000223972.6 DDX11L1
## 3 ENSG00000227232.6 WASH7P
## 4 ENSG00000278267.1 MIR6859-1
## 5 ENSG00000243485.5 MIR1302-2HG
## 6 ENSG00000284332.1 MIR1302-2
tail(geneId2Name.df)
## gene_id gene_name
## 63236 ENSG00000278573.1 ENSG00000278573
## 63237 ENSG00000276017.1 ENSG00000276017
## 63238 ENSG00000278817.1 ENSG00000278817
## 63239 ENSG00000277196.4 ENSG00000277196
## 63240 ENSG00000278625.1 U6
## 63241 ENSG00000277374.1 U1
# The following 3 lines of code add the Gencode "gene_names" to our GSE object
rownames(geneId2Name.df) <- geneId2Name.df$gene_id
geneId2Name.df <- geneId2Name.df[rownames(gse),]
rowRanges(gse)$gene_name <- geneId2Name.df$gene_name
# Check result
# (column with "gene name" is added to the right side of the table)
rowRanges(gse)
## GRanges object with 62749 ranges and 7 metadata columns:
## seqnames ranges strand | gene_id tx_ids isoform max_prop iso_prop SYMBOL gene_name
## <Rle> <IRanges> <Rle> | <character> <CharacterList> <character> <numeric> <NumericList> <character> <character>
## ENSG00000000003.16 chrX 100627108-100636806 - | ENSG00000000003.16 ENST00000373020.9,ENST00000612152.4,ENST00000496771.5,... ENST00000373020.9 0.687237 0.6872372,0.0025714,0.2824258,... TSPAN6 TSPAN6
## ENSG00000000005.6 chrX 100593624-100597531 + | ENSG00000000005.6 ENST00000373031.5,ENST00000485971.1 ENST00000485971.1 1.000000 0,1 TNMD TNMD
## ENSG00000000419.14 chr20 50934870-50958532 - | ENSG00000000419.14 ENST00000466152.5,ENST00000371582.8,ENST00000371588.10,... ENST00000371588.10 0.758189 0.0161384,0.0665779,0.0422139,... DPM1 DPM1
## ENSG00000000457.14 chr1 169853074-169893959 - | ENSG00000000457.14 ENST00000367771.11,ENST00000367770.5,ENST00000367772.8,... ENST00000367772.8 0.515659 0.213536,0.000000,0.515659,... SCYL3 SCYL3
## ENSG00000000460.17 chr1 169795043-169804347 + | ENSG00000000460.17 ENST00000498289.5,ENST00000472795.5,ENST00000359326.9,... ENST00000496973.5 0.383497 0.00838816,0.01530056,0.15528023,... FIRRM FIRRM
## ... ... ... ... . ... ... ... ... ... ... ...
## ENSG00000293556.1 chr7 149776435-149783291 + | ENSG00000293556.1 ENST00000486824.3,ENST00000493502.3 ENST00000486824.3 0.921758 0.9217577,0.0782423 <NA> ENSG00000293556
## ENSG00000293557.1 chr7 149824562-149833978 + | ENSG00000293557.1 ENST00000461331.5,ENST00000488835.5,ENST00000492965.5,... ENST00000472850.3 0.411791 0.000000,0.173459,0.342876,... <NA> ENSG00000293557
## ENSG00000293558.1 chr7 149810915-149812635 + | ENSG00000293558.1 ENST00000481772.3 ENST00000481772.3 1.000000 1 <NA> ENSG00000293558
## ENSG00000293559.1 chr7 99238911-99261636 + | ENSG00000293559.1 ENST00000453194.2 ENST00000453194.2 1.000000 1 <NA> ENSG00000293559
## ENSG00000293560.1 chr7 99297912-99311130 + | ENSG00000293560.1 ENST00000452010.2,ENST00000429079.1 ENST00000452010.2 1.000000 1,0 <NA> ENSG00000293560
## -------
## seqinfo: 25 sequences (1 circular) from hg38 genome
# Clean-up
rm(gtf_file, gtf.df, geneId2Name.df)
At this point we have imported Salmon counts to the Summarized Experiment object, and have added some annotations (such as SYMBOLS and “gene names”). Now we are prepared to import data into DESeq DataSet.
# Load DESeq2 library
library(DESeq2)
# Read data
dds <- DESeqDataSet(gse, design = ~ condition)
# Explore result
dds
## class: DESeqDataSet
## dim: 62749 6
## metadata(8): tximetaInfo quantInfo ... assignRanges version
## assays(3): counts abundance avgTxLength
## rownames(62749): ENSG00000000003.16 ENSG00000000005.6 ... ENSG00000293559.1 ENSG00000293560.1
## rowData names(7): gene_id tx_ids ... SYMBOL gene_name
## colnames(6): AC1 AC2 ... SCC2 SCC3
## colData names(5): names SRA Depth CCLE condition
# Clean-up
rm(gse)
Note the design element of the DESeq object: it describes what analysis we want to perform. In this example we will consider a very simple design: comparing gene expressions between the two types of lung cancer given in the condition column:
design = ~ condition
More complex designs might include covariates (e.g. sex or batch). In such cases, the covariate data should be provided in the samples table, and then included into the design formula. If the design includes covariates, then the main variable of interest (e.g. condition in our example) MUST be kept at the end of the DESeq2 formula. For instance, a design with batch-correction and adjustment for sex could look like this:
design = ~ batch + sex + condition
In other packages this order could be different.
Discussing more complex designs is beyond this introductory course.
However, information about advanced use of DESeq2 could be found in the
tutorial mentioned above:
- http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
The low-expressed genes are not informative. At the same time, they may complicate downstream tasks (e.g. make calculations slower, or inflate multiple testing correction). Manual removal of low-expressed genes is optional for DESeq2 analysis, because DESeq2 automatically takes care of them later. However, it is not optional for edgeR analysis.
Conventionally, you may exclude genes with median count per sample less than 10. Alternatively, you may explore the genes abundances and select a customized threshold.
# Calculate median counts
median_counts <- apply(counts(dds),1,median)
# Explore the median counts distribution
# (vertical line shows arbitrary threshold of count=10)
hist(log2(median_counts),
main="Median counts per gene (Log2)",
xlab="Log2(median count)")
abline(v=log2(10), col="red", lty=2)
# Select the low expressed genes
# using an arbitrary threshold of median count 10
low_expressed_genes <- median_counts < 10
sum(low_expressed_genes)
## [1] 42184
# Remove the low expressed genes
dds <- dds[!low_expressed_genes, ]
dds
## class: DESeqDataSet
## dim: 20565 6
## metadata(8): tximetaInfo quantInfo ... assignRanges version
## assays(3): counts abundance avgTxLength
## rownames(20565): ENSG00000000003.16 ENSG00000000419.14 ... ENSG00000293552.1 ENSG00000293557.1
## rowData names(7): gene_id tx_ids ... SYMBOL gene_name
## colnames(6): AC1 AC2 ... SCC2 SCC3
## colData names(5): names SRA Depth CCLE condition
# Clean-up
rm(low_expressed_genes, median_counts)
It is recommended to normalize data for exploration. Here we use Variance-Stabilizing-Transformation (VST), provided by DESeq2 (an alternative is Rlog).
Importantly, we apply VST/Rlog transformation ONLY to the exploratory data! The actual DGE analysis is performed later starting from the raw data.
The boxplots below show that there is no gross outliers in our data.
# Distributions of raw counts
boxplot(log2(assay(dds)), main="Raw Data", ylab="log2(expr)")
# VST (or Rlog): for exploring and visualizing ONLY !
vst <- varianceStabilizingTransformation(dds)
#rlg <- rlog(dds)
# Counts after VST
boxplot(log2(assay(vst)), main="VST Data", ylab="log2(expr)")
To speed-up computation, we only use 1000 most variable genes for this step. The heatmap below suggests that 1000 most variable genes could separate lung adenocarcinoma and small-cells lung cancers in our dataset.
# Select 1000 most variable genes
variable_genes <- sort(apply(assay(vst),1,var), decreasing=T)[1:1000]
selected_data <- assay(vst)[names(variable_genes),]
# Plot heatmap
heatmap(selected_data, labRow=NA)
# Clean-up
rm(variable_genes, selected_data)
plotPCA is a built-in DESeq2 function. The PCA plot confirms that lung adenocarcinomas and small-cell lung cancers have very distinct gene expression pasterns.
plotPCA(vst)
# Clean-up:
# Note that we don't need VST data any longer
# the actual DGE analysis will start with raw data
rm(vst)
After the exploration suggested that there is no gross outliers in the data, and that transcriptional profiles are different between the studied groups, we will look for the genes differentially expressed between these groups.
The core steps of DESeq analysis includes fitting the negative binomial model and dispersion adjustments are all done with a single DESeq command.
The results of calculations are saved as a DeSEq DataSet object with additional fields containing the calculated statistical metrics. For simplicity, in the code below the result is saved under the same name as the initial DeSEq DataSet object. However, you may note some new “assays” within the new object, and new columns added to the rowData table.
# Learn DESeq command parameters
#?DESeq
# Add information about differential expression to DESeq-DataSet object
dds <- DESeq(dds)
# Check result (there are 7 assays instead of 3)
dds
## class: DESeqDataSet
## dim: 20565 6
## metadata(8): tximetaInfo quantInfo ... assignRanges version
## assays(7): counts abundance ... H cooks
## rownames(20565): ENSG00000000003.16 ENSG00000000419.14 ... ENSG00000293552.1 ENSG00000293557.1
## rowData names(29): gene_id tx_ids ... deviance maxCooks
## colnames(6): AC1 AC2 ... SCC2 SCC3
## colData names(5): names SRA Depth CCLE condition
# Row data: many statistical metrics added for for each gene
colnames(rowData(dds))
## [1] "gene_id" "tx_ids" "isoform" "max_prop" "iso_prop" "SYMBOL" "gene_name" "baseMean" "baseVar" "allZero" "dispGeneEst" "dispGeneIter" "dispFit" "dispersion" "dispIter"
## [16] "dispOutlier" "dispMAP" "Intercept" "condition_Small_Cell_Carcinoma_vs_Adenocarcinoma" "SE_Intercept" "SE_condition_Small_Cell_Carcinoma_vs_Adenocarcinoma" "WaldStatistic_Intercept" "WaldStatistic_condition_Small_Cell_Carcinoma_vs_Adenocarcinoma" "WaldPvalue_Intercept" "WaldPvalue_condition_Small_Cell_Carcinoma_vs_Adenocarcinoma" "betaConv" "betaIter" "deviance" "maxCooks"
The plot below illustrates the dispersion adjustments: the black dots show observed dispersions, the red line shows the trend (average dispersion depending on the level of gene expression), and the blue dots show the adjusted dispersions used for DGE calculation.
plotDispEsts(dds, main="Dispersion estimates and adjustments")
The results of DGE analysis should be extracted from DDESeq Data Set object results() function, which calculates p-values and FDR for a given fold change.
Importantly, the defaults are to calculate FDR<0.1 for any fold change. These are very relaxed thresholds, which could be appropriate only for a very small number of samples per group (a typical case just several years ago, when the high costs of sequencing were prohibitive for larger group sizes).
# Extract results for default FDR and log-fold-change
FDR_0.1_any_FC <- results(dds)
# --------- Explore results --------- #
# Summary: numbers of detected genes and
# genes excluded excluded as outliers.
# Note that DESeq selected 13 as the threshold for excluding low expressed genes
summary(FDR_0.1_any_FC)
##
## out of 20565 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 1131, 5.5%
## LFC < 0 (down) : 1097, 5.3%
## outliers [1] : 947, 4.6%
## low counts [2] : 399, 1.9%
## (mean count < 13)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
# Head and tail of the results table
FDR_0.1_any_FC
## log2 fold change (MLE): condition Small Cell Carcinoma vs Adenocarcinoma
## Wald test p-value: condition Small Cell Carcinoma vs Adenocarcinoma
## DataFrame with 20565 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000000003.16 2490.54 -0.569151 0.740499 -0.768604 0.4421282 0.770363
## ENSG00000000419.14 6779.61 1.632398 0.835378 1.954083 0.0506914 0.265677
## ENSG00000000457.14 1145.30 -0.473832 0.346140 -1.368903 0.1710295 0.505487
## ENSG00000000460.17 1711.59 -0.313666 0.523161 -0.599559 0.5488003 0.832673
## ENSG00000000971.17 1095.35 -1.317647 2.109042 -0.624761 0.5321278 0.823030
## ... ... ... ... ... ... ...
## ENSG00000293546.1 79.8270 0.2848225 0.727473 0.391523 0.6954107 0.898247
## ENSG00000293548.1 42.8602 -0.0795315 0.525817 -0.151253 0.8797759 0.963976
## ENSG00000293550.1 402.8264 1.4794925 0.821960 1.799957 0.0718675 0.323623
## ENSG00000293552.1 101.8336 -0.8594089 0.753021 -1.141281 0.2537530 0.605825
## ENSG00000293557.1 67.2056 1.4346120 2.442988 0.587237 0.5570448 0.837506
# Explaining column names of the results table
as.data.frame(mcols(FDR_0.1_any_FC, use.names=T))
## type description
## baseMean intermediate mean of normalized counts for all samples
## log2FoldChange results log2 fold change (MLE): condition Small Cell Carcinoma vs Adenocarcinoma
## lfcSE results standard error: condition Small Cell Carcinoma vs Adenocarcinoma
## stat results Wald statistic: condition Small Cell Carcinoma vs Adenocarcinoma
## pvalue results Wald test p-value: condition Small Cell Carcinoma vs Adenocarcinoma
## padj results BH adjusted p-values
MA plot shows fold change depending on the average gene expression. The blue dots show genes,for which changes are considered significant under the selected thresholds.
plotMA(dds, main="MA plot: FDR<0.1 for any fold change")
The more conventional thresholds would be FDR<0.05 for at least 2x fold change. These, or even stricter thresholds, are usually used nowadays when it is affordable to sequence tens (or hundreds) of samples per group.
Note that the number of differentially expressed genes has dropped under the stricter thresholds.
# Calculate results for 2-fold change and FDR<0.05
# Note that in log2-scale the 2-fold change is equivalent to the lfcThreshold=1
FDR_0.05_2x_change <- results(dds, lfcThreshold=1, alpha=0.05)
# Explore results
summary(FDR_0.05_2x_change)
##
## out of 20565 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 1.00 (up) : 405, 2%
## LFC < -1.00 (down) : 379, 1.8%
## outliers [1] : 947, 4.6%
## low counts [2] : 0, 0%
## (mean count < 8)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
FDR_0.05_2x_change
## log2 fold change (MLE): condition Small Cell Carcinoma vs Adenocarcinoma
## Wald test p-value: condition Small Cell Carcinoma vs Adenocarcinoma
## DataFrame with 20565 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000000003.16 2490.54 -0.569151 0.740499 0.000000 1.000000 1
## ENSG00000000419.14 6779.61 1.632398 0.835378 0.757020 0.449038 1
## ENSG00000000457.14 1145.30 -0.473832 0.346140 0.000000 1.000000 1
## ENSG00000000460.17 1711.59 -0.313666 0.523161 0.000000 1.000000 1
## ENSG00000000971.17 1095.35 -1.317647 2.109042 -0.150612 0.880282 1
## ... ... ... ... ... ... ...
## ENSG00000293546.1 79.8270 0.2848225 0.727473 0.000000 1.000000 1
## ENSG00000293548.1 42.8602 -0.0795315 0.525817 0.000000 1.000000 1
## ENSG00000293550.1 402.8264 1.4794925 0.821960 0.583353 0.559656 1
## ENSG00000293552.1 101.8336 -0.8594089 0.753021 0.000000 1.000000 1
## ENSG00000293557.1 67.2056 1.4346120 2.442988 0.177902 0.858800 1
# Make MA-plot
plotMA(dds, lfcThreshold=1, alpha=0.05, main="MA plot: FDR<0.05 for at least 2x change")
Typical steps after selecting the desired DESeq thresholds include:
# Extract results to a data frame
result.df <- as.data.frame(FDR_0.05_2x_change)
dim(result.df)
## [1] 20565 6
head(result.df)
## baseMean log2FoldChange lfcSE stat pvalue padj
## ENSG00000000003.16 2490.543 -0.5691506 0.7404988 0.0000000 1.0000000 1
## ENSG00000000419.14 6779.605 1.6323983 0.8353780 0.7570205 0.4490376 1
## ENSG00000000457.14 1145.300 -0.4738320 0.3461398 0.0000000 1.0000000 1
## ENSG00000000460.17 1711.593 -0.3136660 0.5231614 0.0000000 1.0000000 1
## ENSG00000000971.17 1095.345 -1.3176474 2.1090420 -0.1506122 0.8802816 1
## ENSG00000001036.14 4424.283 -2.0144005 0.7382122 -1.3741314 0.1694009 1
tail(result.df)
## baseMean log2FoldChange lfcSE stat pvalue padj
## ENSG00000293541.1 75.51924 -3.2150821 1.3931679 -1.5899606 0.1118437 1
## ENSG00000293546.1 79.82702 0.2848225 0.7274733 0.0000000 1.0000000 1
## ENSG00000293548.1 42.86017 -0.0795315 0.5258167 0.0000000 1.0000000 1
## ENSG00000293550.1 402.82636 1.4794925 0.8219601 0.5833526 0.5596560 1
## ENSG00000293552.1 101.83360 -0.8594089 0.7530214 0.0000000 1.0000000 1
## ENSG00000293557.1 67.20561 1.4346120 2.4429881 0.1779018 0.8588001 1
# Get gene names from DDS object
gene_names.df <- as.data.frame(rowData(dds)[,c("gene_id","SYMBOL","gene_name")])
dim(gene_names.df)
## [1] 20565 3
head(gene_names.df)
## gene_id SYMBOL gene_name
## ENSG00000000003.16 ENSG00000000003.16 TSPAN6 TSPAN6
## ENSG00000000419.14 ENSG00000000419.14 DPM1 DPM1
## ENSG00000000457.14 ENSG00000000457.14 SCYL3 SCYL3
## ENSG00000000460.17 ENSG00000000460.17 FIRRM FIRRM
## ENSG00000000971.17 ENSG00000000971.17 CFH CFH
## ENSG00000001036.14 ENSG00000001036.14 FUCA2 FUCA2
tail(gene_names.df)
## gene_id SYMBOL gene_name
## ENSG00000293541.1 ENSG00000293541.1 <NA> ENSG00000293541
## ENSG00000293546.1 ENSG00000293546.1 <NA> ALG1L9P
## ENSG00000293548.1 ENSG00000293548.1 <NA> ENSG00000293548
## ENSG00000293550.1 ENSG00000293550.1 <NA> ENSG00000293550
## ENSG00000293552.1 ENSG00000293552.1 <NA> ENSG00000293552
## ENSG00000293557.1 ENSG00000293557.1 <NA> ENSG00000293557
# Add gene names to the data frame with results
all(rownames(gene_names.df) == rownames(result.df))
## [1] TRUE
result.df <- data.frame(gene_names.df,result.df)
dim(result.df)
## [1] 20565 9
head(result.df)
## gene_id SYMBOL gene_name baseMean log2FoldChange lfcSE stat pvalue padj
## ENSG00000000003.16 ENSG00000000003.16 TSPAN6 TSPAN6 2490.543 -0.5691506 0.7404988 0.0000000 1.0000000 1
## ENSG00000000419.14 ENSG00000000419.14 DPM1 DPM1 6779.605 1.6323983 0.8353780 0.7570205 0.4490376 1
## ENSG00000000457.14 ENSG00000000457.14 SCYL3 SCYL3 1145.300 -0.4738320 0.3461398 0.0000000 1.0000000 1
## ENSG00000000460.17 ENSG00000000460.17 FIRRM FIRRM 1711.593 -0.3136660 0.5231614 0.0000000 1.0000000 1
## ENSG00000000971.17 ENSG00000000971.17 CFH CFH 1095.345 -1.3176474 2.1090420 -0.1506122 0.8802816 1
## ENSG00000001036.14 ENSG00000001036.14 FUCA2 FUCA2 4424.283 -2.0144005 0.7382122 -1.3741314 0.1694009 1
tail(result.df)
## gene_id SYMBOL gene_name baseMean log2FoldChange lfcSE stat pvalue padj
## ENSG00000293541.1 ENSG00000293541.1 <NA> ENSG00000293541 75.51924 -3.2150821 1.3931679 -1.5899606 0.1118437 1
## ENSG00000293546.1 ENSG00000293546.1 <NA> ALG1L9P 79.82702 0.2848225 0.7274733 0.0000000 1.0000000 1
## ENSG00000293548.1 ENSG00000293548.1 <NA> ENSG00000293548 42.86017 -0.0795315 0.5258167 0.0000000 1.0000000 1
## ENSG00000293550.1 ENSG00000293550.1 <NA> ENSG00000293550 402.82636 1.4794925 0.8219601 0.5833526 0.5596560 1
## ENSG00000293552.1 ENSG00000293552.1 <NA> ENSG00000293552 101.83360 -0.8594089 0.7530214 0.0000000 1.0000000 1
## ENSG00000293557.1 ENSG00000293557.1 <NA> ENSG00000293557 67.20561 1.4346120 2.4429881 0.1779018 0.8588001 1
# Remove genes with no p-value
# (this includes the low-expressed genes)
excluded_genes <- is.na(result.df$padj)
sum(excluded_genes)
## [1] 947
result.df <- result.df[!excluded_genes,]
dim(result.df)
## [1] 19618 9
# Select columns to save and order by p-value (and fold change)
result.df <- result.df %>%
select(gene_id, gene_name, baseMean, log2FoldChange, padj) %>%
arrange(padj, desc(log2FoldChange))
dim(result.df)
## [1] 19618 5
head(result.df)
## gene_id gene_name baseMean log2FoldChange padj
## ENSG00000130294.18 ENSG00000130294.18 KIF1A 42564.292 11.143601 2.609015e-57
## ENSG00000075340.23 ENSG00000075340.23 ADD2 6060.279 9.964503 5.089675e-41
## ENSG00000105971.15 ENSG00000105971.15 CAV2 9138.902 -8.356510 1.852796e-37
## ENSG00000167644.12 ENSG00000167644.12 C19orf33 1720.753 -9.048687 4.554644e-30
## ENSG00000069188.17 ENSG00000069188.17 SDK2 1194.699 8.980931 1.987185e-29
## ENSG00000159164.10 ENSG00000159164.10 SV2A 3426.134 7.556652 4.332935e-26
tail(result.df)
## gene_id gene_name baseMean log2FoldChange padj
## ENSG00000237330.3 ENSG00000237330.3 RNF223 333.23756 -4.035052 1
## ENSG00000285294.1 ENSG00000285294.1 LINC00842 56.07101 -4.059909 1
## ENSG00000260537.2 ENSG00000260537.2 ENSG00000260537 141.57984 -4.460399 1
## ENSG00000236362.9 ENSG00000236362.9 GAGE12F 2533.10619 -4.492739 1
## ENSG00000088340.18 ENSG00000088340.18 FER1L4 386.19624 -4.982070 1
## ENSG00000291034.1 ENSG00000291034.1 ENSG00000291034 130.23387 -5.283913 1
# Save results table to a text file
output_folder <- file.path(base_folder,"results","s04_differential_gene_expression")
dir.create(output_folder)
results_file <- file.path(output_folder,"DESeq2_FDR_0.05_2x_change_genes.txt")
write.table(result.df, file=results_file, quote=FALSE, sep="\t", row.names = FALSE, col.names = TRUE)
# Clean-up
rm(excluded_genes, results_file, output_folder, gene_names.df)
We will use EnhancedVolcano package to make the volcano plot:
library(EnhancedVolcano)
#?EnhancedVolcano
EnhancedVolcano(result.df,
lab = result.df$gene_name,
x = "log2FoldChange",
y = "padj",
title="DESeq2 results",
subtitle="Enhanced Volcano Plot")
This chunk of code illustrates how to use the pheatmap function for advanced heatmap plotting.
We will use log-transformed TPM values from tximeta. Normalized abundances calculated by VSD or Rlog could also be used.
# Library for advanced heatmap plotting
library(pheatmap)
#?pheatmap
# Library to make green-red palette
library(RColorBrewer)
# Get log-transformed TPM values from tximeta
# +1 in case if some expressions are 0
log_expr_tpm.mx <- log2(assay(dds,"abundance") + 1)
# Select (arbitrary) cut-offs for top genes
log_fc_cutoff <- 10
adj_p_cutoff <- 1e-5
# Extract results for top genes
top_genes.df <- result.df %>%
filter(abs(log2FoldChange) > log_fc_cutoff, padj < adj_p_cutoff)
dim(top_genes.df)
## [1] 31 5
# Print top 10 genes
top_genes.df %>%
arrange(padj) %>%
head(n=10)
## gene_id gene_name baseMean log2FoldChange padj
## ENSG00000130294.18 ENSG00000130294.18 KIF1A 42564.2922 11.14360 2.609015e-57
## ENSG00000136155.17 ENSG00000136155.17 SCEL 1775.8663 -10.13023 2.637792e-20
## ENSG00000185666.16 ENSG00000185666.16 SYN3 1161.9712 10.83813 4.818606e-17
## ENSG00000153292.16 ENSG00000153292.16 ADGRF1 10662.7983 -11.25918 2.508326e-16
## ENSG00000286190.2 ENSG00000286190.2 ENSG00000286190 1026.7065 12.56927 1.215160e-15
## ENSG00000158825.6 ENSG00000158825.6 CDA 1308.2019 -10.21058 3.274156e-14
## ENSG00000163873.10 ENSG00000163873.10 GRIK3 1001.4176 10.96557 5.768153e-11
## ENSG00000132932.19 ENSG00000132932.19 ATP8A2 4731.8992 10.52145 5.099518e-10
## ENSG00000196132.14 ENSG00000196132.14 MYT1 3187.1319 11.05015 9.609516e-10
## ENSG00000259495.3 ENSG00000259495.3 ARNT2-DT 986.7009 10.49658 1.676677e-09
# Get log-transformed expression values for the top genes
top_genes_exprs.mx <- log_expr_tpm.mx[top_genes.df$gene_id, ]
# Change gene IDs to gene names
rownames(top_genes_exprs.mx) <- top_genes.df$gene_name
rownames(top_genes.df) <- top_genes.df$gene_name
# Make palette of 99 colours for heatmap
gbr_99 <- colorRampPalette(c("green", "black", "red"))(n = 99)
# Data frame with condition bar (used by pheatmap)
cases.df <- as.data.frame(colData(dds))
cases.df <- cases.df %>% select(condition)
# Prepare heatmap title
num_genes <- nrow(top_genes.df)
main <- paste(num_genes, "genes with log2(FC) >", log_fc_cutoff, "and adj.P <", adj_p_cutoff, "\n")
# Plot heatmap
pheatmap(
mat = top_genes_exprs.mx, # matrix with data
color=gbr_99, # user-defined palette
scale = "row",
annotation_col = cases.df,
border_color = NA, # don't draw borders in heatmap
main = main)
# Clean-up
rm(log_expr_tpm.mx, log_fc_cutoff, adj_p_cutoff, main, cases.df, gbr_99, top_genes_exprs.mx, num_genes, result.df)
Finally, DESeq2 provides plotCounts function for plotting expressions of individual genes.
gene_id <- "ENSG00000130294.18"
gene_name <- top_genes.df[top_genes.df$gene_id==gene_id,"gene_name"]
plotCounts(dds, gene=gene_id, main=gene_name)
gene_id <- "ENSG00000136155.17"
gene_name <- top_genes.df[top_genes.df$gene_id==gene_id,"gene_name"]
plotCounts(dds, gene=gene_id, main=gene_name)
rm(gene_id, gene_name, top_genes.df)
# save image
#save.image(file.path(base_folder, "scripts","s04_differential_gene_expression","DESeq2_analysis.RData"))
save.image("DESeq2_analysis.RData")
# List objects
ls()
## [1] "base_folder" "dds" "FDR_0.05_2x_change" "FDR_0.1_any_FC"
# List libraries etc
# (for reproducible coding/research)
sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8 LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C LC_TIME=English_United Kingdom.utf8
##
## time zone: Europe/London
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-3 pheatmap_1.0.12 EnhancedVolcano_1.20.0 ggrepel_0.9.5 ggplot2_3.5.1 DESeq2_1.42.1 dplyr_1.1.4 rtracklayer_1.62.0 org.Hs.eg.db_3.18.0 SummarizedExperiment_1.32.0 MatrixGenerics_1.14.0 matrixStats_1.3.0 GenomicFeatures_1.54.4 AnnotationDbi_1.64.1 Biobase_2.62.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.8 IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1 tximeta_1.20.3
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.2 bitops_1.0-7 biomaRt_2.58.2 rlang_1.1.1 magrittr_2.0.3 compiler_4.3.1 RSQLite_2.3.6 png_0.1-8 vctrs_0.6.5 stringr_1.5.1 ProtGenerics_1.34.0 pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.1.1 dbplyr_2.5.0 XVector_0.42.0 labeling_0.4.3 utf8_1.2.4 Rsamtools_2.18.0 promises_1.3.0 rmarkdown_2.26 tzdb_0.4.0 purrr_1.0.2 bit_4.0.5 xfun_0.43 zlibbioc_1.48.2 cachem_1.0.8 jsonlite_1.8.8 progress_1.2.3 blob_1.2.4 highr_0.10 later_1.3.2 DelayedArray_0.28.0
## [34] BiocParallel_1.36.0 interactiveDisplayBase_1.40.0 parallel_4.3.1 prettyunits_1.2.0 R6_2.5.1 bslib_0.7.0 stringi_1.8.3 jquerylib_0.1.4 Rcpp_1.0.12 knitr_1.46 readr_2.1.5 httpuv_1.6.15 Matrix_1.6-0 tidyselect_1.2.1 rstudioapi_0.16.0 abind_1.4-5 yaml_2.3.8 codetools_0.2-20 curl_5.2.1 lattice_0.22-6 tibble_3.2.1 withr_3.0.0 shiny_1.8.1.1 KEGGREST_1.42.0 evaluate_0.23 BiocFileCache_2.10.2 xml2_1.3.6 Biostrings_2.70.3 pillar_1.9.0 BiocManager_1.30.22 filelock_1.0.3 generics_0.1.3 vroom_1.6.5
## [67] RCurl_1.98-1.14 BiocVersion_3.18.1 ensembldb_2.26.1 hms_1.1.3 munsell_0.5.1 scales_1.3.0 xtable_1.8-4 glue_1.7.0 lazyeval_0.2.2 tools_4.3.1 AnnotationHub_3.10.1 BiocIO_1.12.0 locfit_1.5-9.9 GenomicAlignments_1.38.2 XML_3.99-0.16.1 grid_4.3.1 colorspace_2.1-0 GenomeInfoDbData_1.2.11 restfulr_0.0.15 cli_3.6.1 rappdirs_0.3.3 fansi_1.0.6 S4Arrays_1.2.1 AnnotationFilter_1.26.0 gtable_0.3.5 sass_0.4.9 digest_0.6.33 SparseArray_1.2.4 tximport_1.30.0 farver_2.1.1 rjson_0.2.21 memoise_2.0.1 htmltools_0.5.8
## [100] lifecycle_1.0.4 httr_1.4.7 mime_0.12 bit64_4.0.5