We first read the expression matrix and the subtype annotation, then clean and quantile-normalize the matrix.
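library(cola)            # adjust_matrix(), consensus_partition(), hierarchical_partition()
library(preprocessCore)  # normalize.quantiles()

# Expression matrix: features in rows, samples in columns
m = read.table("https://jokergoo.github.io/cola_examples/TCGA_GBM/unifiedScaled.txt",
    header = TRUE, row.names = 1, check.names = FALSE)
m = as.matrix(m)

# Subtype labels are taken from the first row; the first two columns are skipped
subtype = read.table("https://jokergoo.github.io/cola_examples/TCGA_GBM/TCGA_unified_CORE_ClaNC840.txt",
    sep = "\t", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)
subtype = structure(unlist(subtype[1, -(1:2)]), names = colnames(subtype)[-(1:2)])
subtype_col = structure(seq_len(4), names = unique(subtype))

# Keep only the samples that have a subtype label, then clean the matrix
m = m[, names(subtype)]
m = adjust_matrix(m)

# Save the dimnames so they can be restored after quantile normalization
cn = colnames(m)
rn = rownames(m)
m = normalize.quantiles(m)
colnames(m) = cn
rownames(m) = rn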
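To make the normalization step more concrete, here is a minimal base-R sketch of what quantile normalization does. It is not the code of `normalize.quantiles()`: the function name `quantile_normalize` and the toy matrix are hypothetical, and ties are broken by order of appearance, which may differ from how preprocessCore handles them.

```r
# Illustrative sketch only; quantile_normalize() is a hypothetical helper,
# not part of cola or preprocessCore.
quantile_normalize = function(x) {
    # Rank the values within each column; "first" makes ranks a permutation of 1..n
    rk = apply(x, 2, rank, ties.method = "first")
    # Reference distribution: mean of the sorted columns, position by position
    ref = unname(rowMeans(apply(x, 2, sort)))
    # Replace every value by the reference value at its rank
    out = apply(rk, 2, function(r) ref[r])
    # Restore row and column names explicitly
    dimnames(out) = dimnames(x)
    out
}

# Toy matrix: 4 features (rows) by 3 samples (columns)
toy = matrix(c(5, 2, 3, 4,
               4, 1, 4, 2,
               3, 4, 6, 8), nrow = 4,
             dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))
toy_qn = quantile_normalize(toy)

# After normalization every column contains the same set of values
apply(toy_qn, 2, sort)
```

The sketch restores the dimnames inside the function, which plays the same role as saving `cn` and `rn` before the call and assigning them back afterwards.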
First, we apply standard consensus partitioning (CP) with "ATC" as the top-value method and "skmeans" as the partitioning method.
res = consensus_partition(m, top_value_method = "ATC", partition_method = "skmeans",
    cores = 4, anno = subtype, anno_col = subtype_col)
In the following plot, cola suggests 5 as the best number of subgroups, but we select k = 4 because it gives a more stable classification.
## The best k suggested by this function might not reflect the real
## subgroups in the data (especially when you expect a large best k). It
## is recommended to directly look at the plots from
## select_partition_number() or other related plotting functions.
## [1] 5
## attr(,"optional")
## [1] 2 3 4
Figure S6.1. Select the best number of groups.
The signature heatmap with 4 subgroups:
get_signatures(res, k = 4)
Figure S6.2. Signature heatmap of CP classification with 4 subgroups.
Next, we apply hierarchical consensus partitioning (HCP) to the same matrix:
rh = hierarchical_partition(m, cores = 4, anno = subtype, anno_col = subtype_col)
The subgroup hierarchy:
Figure S6.3. Subgroup hierarchy under HCP.
And the signature heatmap under HCP classification:
Figure S6.4. Signature heatmap under HCP classification.
The statistics on each node:
df = node_info(rh)
df
##      id best_method depth best_k n_columns n_signatures p_signatures is_leaf
## 1     0 ATC:skmeans     1      4       173         9686 0.8596787077   FALSE
## 2    01 ATC:skmeans     2      3        52         4051 0.3595455756   FALSE
## 3   011 ATC:skmeans     3      2        17           55 0.0048815124    TRUE
## 4   012 ATC:skmeans     3      2        21          625 0.0554717316   FALSE
## 5  0121 not applied     4     NA        10           NA           NA    TRUE
## 6  0122 not applied     4     NA        11           NA           NA    TRUE
## 7   013 ATC:skmeans     3      2        14            8 0.0007100382    TRUE
## 8    02 ATC:skmeans     2      3        66         4781 0.4243365581   FALSE
## 9   021 ATC:skmeans     3      3        25          806 0.0715363451   FALSE
## 10 0211 not applied     4     NA        11           NA           NA    TRUE
## 11 0212 not applied     4     NA         9           NA           NA    TRUE
## 12 0213 not applied     4     NA         5           NA           NA    TRUE
## 13  022 ATC:skmeans     3      2        24          666 0.0591106772   FALSE
## 14 0221 ATC:skmeans     4      2        14           30 0.0026626431    TRUE
## 15 0222 not applied     4     NA        10           NA           NA    TRUE
## 16  023 ATC:skmeans     3      2        17          227 0.0201473329    TRUE
## 17   03 ATC:skmeans     2      4        24         1376 0.1221265643   FALSE
## 18  031 not applied     3     NA         7           NA           NA    TRUE
## 19  032 not applied     3     NA         6           NA           NA    TRUE
## 20  033 not applied     3     NA         6           NA           NA    TRUE
## 21  034 not applied     3     NA         5           NA           NA    TRUE
## 22   04 ATC:skmeans     2      2        31         1954 0.1734268217   FALSE
## 23  041 ATC:skmeans     3      2        18          257 0.0228099760    TRUE
## 24  042 ATC:skmeans     3      2        13          175 0.0155320848    TRUE
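The node ids in this table appear to encode the hierarchy: each id looks like its parent's id with one extra digit appended, so the number of characters matches the `depth` column. The following base-R sketch checks that reading against the table. The id scheme is an inference from the printed output, not something taken from cola's documentation, and the vectors are copied by hand from the table rather than extracted from `rh`.

```r
# Columns copied by hand from the node_info(rh) output
id = c("0", "01", "011", "012", "0121", "0122", "013", "02", "021", "0211",
       "0212", "0213", "022", "0221", "0222", "023", "03", "031", "032",
       "033", "034", "04", "041", "042")
n_columns = c(173, 52, 17, 21, 10, 11, 14, 66, 25, 11, 9, 5, 24, 14, 10, 17,
              24, 7, 6, 6, 5, 31, 18, 13)

# Assumption: a node's parent id is its own id without the last character
depth  = nchar(id)
parent = ifelse(depth > 1, substr(id, 1, depth - 1), NA)

# Sum the sample counts of the children of each parent node
child_sum = tapply(n_columns, parent, sum)
# Look up the parent's own sample count by id
parent_n  = setNames(n_columns, id)[names(child_sum)]

# Side-by-side comparison: the two count columns should agree
check = data.frame(parent = names(child_sum),
                   n_columns = as.vector(parent_n),
                   sum_of_children = as.vector(child_sum))
check
```

If the two count columns agree for every parent, each split distributes all samples of a node among its children, which is what a strict hierarchy would require.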
And the statistics on non-leaf nodes:
df[!df$is_leaf, ]
##     id best_method depth best_k n_columns n_signatures p_signatures is_leaf
## 1    0 ATC:skmeans     1      4       173         9686   0.85967871   FALSE
## 2   01 ATC:skmeans     2      3        52         4051   0.35954558   FALSE
## 4  012 ATC:skmeans     3      2        21          625   0.05547173   FALSE
## 8   02 ATC:skmeans     2      3        66         4781   0.42433656   FALSE
## 9  021 ATC:skmeans     3      3        25          806   0.07153635   FALSE
## 13 022 ATC:skmeans     3      2        24          666   0.05911068   FALSE
## 17  03 ATC:skmeans     2      4        24         1376   0.12212656   FALSE
## 22  04 ATC:skmeans     2      2        31         1954   0.17342682   FALSE
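As a reading aid for the two signature columns, the sketch below rebuilds a few rows of the table by hand and divides `n_signatures` by `p_signatures`. The ratio comes out the same for every node, which is consistent with `p_signatures` being the fraction of rows of `m` that are signatures. That interpretation and the implied row count are inferred from the printed numbers and are not stated in the text. The data frame `df_toy` is a hypothetical stand-in for `df`.

```r
# A few rows copied by hand from the node_info(rh) output
df_toy = data.frame(
    id           = c("0", "01", "011", "0121"),
    n_signatures = c(9686, 4051, 55, NA),
    p_signatures = c(0.8596787077, 0.3595455756, 0.0048815124, NA),
    is_leaf      = c(FALSE, FALSE, TRUE, TRUE),
    stringsAsFactors = FALSE
)

# Implied number of rows in the matrix; NA where no partitioning was applied
implied_rows = round(df_toy$n_signatures / df_toy$p_signatures)
implied_rows

# The same logical filter as df[!df$is_leaf, ], applied to the toy rows
non_leaf = df_toy[!df_toy$is_leaf, ]
non_leaf$id
```

Nodes marked "not applied" carry `NA` in both columns, so they drop out of the ratio without any special handling.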
