scRank icon indicating copy to clipboard operation
scRank copied to clipboard

Request for Comparison Scripts (scRank vs. beyondcell, scDEAL) from Cell Reports Medicine Paper

Open luisherfurth opened this issue 9 months ago • 2 comments

Dear scRank Authors,

Thank you for developing scRank and sharing it with the community. I have been reading your recent paper "scRank infers drug-responsive cell types from untreated scRNA-seq data using a target-perturbed gene regulatory network" (Li et al., 2024, Cell Reports Medicine 5, 101568) and found it very insightful.

In the paper, specifically in Figure 3E and Figure S7, you present comparisons of scRank's performance against other drug response prediction tools like beyondcell and scDEAL on specific cancer cell line datasets (Melanoma, SCC47, JHU006).

To better understand how these comparisons were performed and potentially reproduce or adapt the comparison methodology for my own work, would it be possible for you to share the specific scripts or notebooks used to:

  1. Run beyondcell (including the different strategies mentioned: beyondcell, bc_SwitchBinary, bc_BCS) on the datasets?
  2. Run scDEAL (including scDEAL and scDEAL_binary) on the datasets?
  3. Process the outputs from all tools (scRank, beyondcell, scDEAL) to generate the comparison metrics and plots shown in Figure 3E and Figure S7?

Having access to these comparison scripts would be extremely helpful for ensuring reproducibility and understanding the precise parameters and workflows used for the benchmarking presented in the paper.

Thank you for considering my request!

luisherfurth avatar Apr 28 '25 12:04 luisherfurth

@Lee0498

luisherfurth avatar May 13 '25 11:05 luisherfurth

Hi, Luis

Sorry for the delayed response. For your questions:

  1. beyondcell (v2.2.0) We ran beyondcell following the tutorial’s default settings on each Seurat object (Melanoma, SCC47, JHU006). For example, on the scc47 object:
EpisenHigh = 2
EpisenLow = 1
# 1. Run beyondcell (after creating bc via the tutorial workflow)
#    see: https://github.com/cnio-bu/beyondcell/blob/master/tutorial/analysis_workflow/README.md

# 2. Extract scaled BCS and switch point
drug_ids    <- FindDrugs(bc, "Afatinib")$IDs
Afatinib_BSC <- bc@scaled[drug_ids, ] %>% colMeans()
sw_point     <- bc@scaled_scores[drug_ids]

# 3. Add BSC to Seurat metadata
scc47 <- AddMetaData(scc47,
    metadata = Afatinib_BSC,
    col.name = 'Afatinib_BSC')

# 4a. “beyondcell” (median BSC per cluster)
[email protected] %>% 
group_by(seurat_clusters) %>% 
summarise(avgBSC = median(Afatinib_BSC)) %>% 
mutate(
    rank = rank(avgBSC),
    rank = rank / dplyr::n(),
    rank = scales::rescale(rank, c(0, 1)) * 100,
  ) %>%
filter(seurat_clusters == EpisenHigh) %>%
pull(rank)

# 4b. “bc_BCS” (cell‑level percentile then median per cluster)
[email protected] %>%
arrange(Afatinib_BSC) %>%
mutate(`top rank%` = (row_number() / n()) * 100) %>%
group_by(seurat_clusters) %>%
summarise(avgRank = median(`top rank%`)) %>%
filter(seurat_clusters == EpisenHigh) %>%
pull(avgRank)

# 4c. “bc_SwitchBinary” (binarized by switch point)
[email protected] %>%
mutate(Afatinib_BSC = as.vector(Afatinib_BSC),
                   binary_sensitivity = case_when(Afatinib_BSC >= sw_point ~ 1,
                                                  Afatinib_BSC < sw_point ~ 0)) %>% 
  group_by(seurat_clusters) %>% 
summarise(mean_binary = mean(binary_sensitivity)) %>%
filter(seurat_clusters == EpisenHigh) %>%
pull(mean_binary)
  1. scDEAL
# 1. Running the models

# Train bulk model (bash code)
python bulkmodel.py \
  --drug "AFATINIB" --data_name "SCC47" \
  --printgene "F" --checkpoint "False"

# Run single-cell inference
python scmodel.py \
  --sc_data "SCC47" --drug "AFATINIB" \
  --checkpoint "False" --bottleneck 32 \
  --min_g 0 --min_c 0 --min_n_genes 0 --max_n_genes 30000
# 2. Loading predictions (r code)
# The output CSV (SCC47_AFATINIB_sens_label.csv) in scDEAL/save/adata/ 

pred <- rio::import("scDEAL/save/adata/SCC47_AFATINIB_sens_label.csv") %>%
  tibble::column_to_rownames("V1")

# 3. Add to Seurat
scc47 <- Seurat::AddMetaData(scc47,
  metadata = pred,
  col.name = c("scDEAL_score"))

# 4a. “scDEAL” median score per cluster
[email protected]  %>%
  group_by(seurat_clusters) %>% 
  summarise(avgScore = median(Afatinib_scDEAL_sens_preds)) %>%
  mutate(
    rank = rank(avgScore),
    rank = rank / dplyr::n(),
    rank = scales::rescale(rank, c(0, 1)) * 100,
  ) %>%
  filter(seurat_clusters == EpisenHigh) %>%
  pull(rank)

# 4b. “scDEAL_score” cell‑level percentile then median per cluster
[email protected] %>% 
  arrange(Afatinib_scDEAL_sens_preds) %>%
  mutate(`top rank%` = (row_number() / n()) * 100) %>% 
  group_by(seurat_clusters) %>% summarise(avgRank = median(`top rank%`)) %>%
  filter(seurat_clusters == EpisenHigh) %>%
  pull(avgRank)
  1. Process output

# Add all result into a dataframe (result) like :

#    Dataset                  Method        Drug                     MOA  TopRank% shape

ggplot() + 
  geom_boxplot(data = result, mapping = aes(x = `TopRank%`, y = Method), outliers = F)+
  geom_jitter(data=result, aes(x=`TopRank%`, y=Method, color=MOA, shape = shape), size=3.0,width = -0.01,height = 0.3) +
  scale_shape_identity() +
  theme_bw() +
  theme(
    aspect.ratio = 9/16,
    axis.text = element_text(size = 15),
    text = element_text(size = 15)
  ) +
  scale_x_continuous(breaks=c(0,0.25,0.5,0.75,1)) +
  xlim(0,1) +
  scale_y_discrete(position = "right") +
  scale_color_manual(values = c("#629934","#99CA60","#1771CD","#19227E","#6639B6","#4A1A8C","#B0D9FA","#BBC1E5","#C9BAE5","#6A4A3F","#678392","#777777")) +
  xlab('Top Rank %')

Lee0498 avatar May 14 '25 02:05 Lee0498

Hi @Lee0498 ,

Thank you so much for sending over the comparison scripts with different tools! That's incredibly helpful and much appreciated.

I have a follow-up question regarding the interpretation of the perturbation scores. You mentioned in a different issue that these scores cannot be directly interpreted between experiments, which I understand to mean they are relative within a given experiment.

My question is, are these scores totally relative, or do they hold some absolute meaning within a single experiment? For instance, if I have a dataset consisting entirely of cells that are resistant to a drug, the scRank tool will still output a ranking based on its calculated scores. In such a scenario, if I don't have access to ground truth resistance values, could I interpret the perturbation scores in a way that suggests most cells in that dataset are resistant (e.g., if all scores are very low or difference between the perturbation scores is very small, indicating a uniform response)? Or are they strictly about relative ranking, making it impossible to infer a general state of resistance/susceptibility across the entire population without an external reference?

Any clarification on this would be greatly appreciated!

Thanks again, Luis

luisherfurth avatar Jun 29 '25 13:06 luisherfurth

Hi @luisherfurth ,

Thanks for raising this important point, it’s a valuable perspective.

To clarify: When we designed the perturbation score in scRank, our primary goal was to rank cells (or cell types) to identify the most responsive populations to a given perturbation. Although we haven’t really tested how well the scores reflect the overall responsiveness of the cells, your point makes sense and fits with what we’ve observed. For example, in our cell line experiments:

  • Resistant cell lines tend to have perturbation scores around 1e-16 to 1e-18,
  • While sensitive ones often show higher scores, typically around 1e-6.

This suggests that the absolute magnitude of the score may carry signal of overall responsiveness.

However, it’s important to interpret this carefully:

  • The perturbation score fundamentally reflects a relative impact within a given network context, and its absolute value is influenced by the network structure and properties of the perturbed gene set.
  • In some datasets (e.g., CTRP), even the definition of “resistant” vs. “sensitive” is relative, often determined by metrics such as AUC.

So yes — in your scenario, scores that are all similar and low might suggest uniformly low responsiveness across the entire population, but any interpretation beyond relative ranking should be made with caution.

Lee0498 avatar Jun 30 '25 08:06 Lee0498

Hi @Lee0498,

First off, thank you for your quick and very helpful responses to my previous questions.

I've been working with scRank on the Kinker et al. dataset (specifically JHU006 and SCC47 from GSE157220), but my results are significantly different from those you've shared.

To help me understand the discrepancy, could you please clarify the following:

  1. Preprocessing: What were the specific preprocessing steps you applied to the CPM data before running it through scRank?

  2. Drug-Target Inputs: For Afatinib, Gefitinib, and Sorafenib, which gene targets did you use? I noticed the database in the repository lists up to 14 potential targets for some drugs and was unsure which ones to select.

Thanks in advance for your help!

Best, Luis

luisherfurth avatar Jul 07 '25 14:07 luisherfurth

Hi @luisherfurth,

Below is a detailed breakdown of how to reproduce our results on the Kinker et al. dataset,

  1. Preprocessing We followed a standard Seurat preprocessing pipeline with slight modifications in clustering resolution and dimension numbers to better separate EpiSen-high and EpiSen-low cells into distinct clusters:
# For JHU006
JHU006 <- NormalizeData(JHU006, normalization.method = "LogNormalize", scale.factor = 10000)
JHU006 <- FindVariableFeatures(JHU006, selection.method = "vst", nfeatures = 2000)
JHU006 <- ScaleData(JHU006)
JHU006 <- RunPCA(JHU006, features = VariableFeatures(JHU006))
JHU006 <- FindNeighbors(JHU006, dims = 1:50)
JHU006 <- FindClusters(JHU006, resolution = 0.8)
JHU006 <- RunUMAP(JHU006, dims = 1:50)

# For SCC47
scc47 <- NormalizeData(scc47, normalization.method = "LogNormalize", scale.factor = 10000)
scc47 <- FindVariableFeatures(scc47, selection.method = "vst", nfeatures = 7000)
scc47 <- ScaleData(scc47)
scc47 <- RunPCA(scc47, features = VariableFeatures(scc47))
scc47 <- FindNeighbors(scc47, dims = 1:20)
scc47 <- FindClusters(scc47, resolution = 0.7)
scc47 <- RunUMAP(scc47, dims = 1:20)
  1. Running scRank
# JHU006
JHU006_rank <- scRank::CreateScRank(JHU006, cell_type = 'seurat_clusters', species = 'human', target = 'EGFR')
JHU006_rank <- scRank::Constr_net(JHU006_rank,min_cells = 50) # Tuned due to smaller cell numbers
JHU006_rank <- scRank::rank_celltype(JHU006_rank)

# SCC47
scc47_rank <- scRank::CreateScRank(scc47, cell_type = 'seurat_clusters', species = 'human', target = 'EGFR')
scc47_rank <- scRank::Constr_net(scc47_rank, select_ratio = 0.9)  # Tuned due to smaller cell numbers
scc47_rank <- scRank::rank_celltype(scc47_rank)
  1. Drug Target Inputs We input drug targets based on mechanistic knowledge of each drug and biological context of cell lines.
  • For Afatinib and Gefitinib, we used EGFR as the target for both cell lines.
  • For Sorafenib, we used SLC7A11 for SCC47, based on ferroptosis-related sensitivity reported in EpiSen-low cells (Kinker et al.); and its canonical kinase target RAF1 for JHU006, where ferroptosis was not reported.

Lee0498 avatar Jul 14 '25 11:07 Lee0498