Internal Function to Get Mutations/CNA/Fusion By Sample ID
Source: R/genomics_by_sample.R
dot-get_data_by_sample.Rd
Usage
.get_data_by_sample(
sample_id = NULL,
study_id = NULL,
molecular_profile_id = NULL,
sample_study_pairs = NULL,
data_type = c("mutation", "cna", "fusion", "structural_variant", "segment"),
genes = NULL,
panel = NULL,
add_hugo = TRUE,
base_url = NULL
)
Arguments
- sample_id
A character vector of sample IDs.
- study_id
A string indicating the study ID from which to pull data. If no study ID is supplied, the function will guess the study ID based on your URL and inform you. Only one study ID can be passed. If mutations or CNA from more than one study are needed, see sample_study_pairs.
- molecular_profile_id
A string indicating the molecular profile ID from which to pull data. If no ID is supplied, the function will guess the molecular profile ID based on the study ID. Only one molecular profile ID can be passed. If mutations from more than one study are needed, see sample_study_pairs.
- sample_study_pairs
A data frame with columns sample_id, study_id, and molecular_profile_id (optional). Variations in capitalization of column names are accepted. This can be used in place of the sample_id, study_id, and molecular_profile_id arguments above if you need to pull samples from several different studies at once. If passed, this will overwrite sample_id, study_id, and molecular_profile_id if they are also passed.
- data_type
Specify what type of data to return. Options are mutation, cna, fusion, or structural_variant (same as fusion). The usage above also lists segment.
- genes
A vector of Entrez IDs or Hugo symbols. If Hugo symbols are supplied, they will be converted to Entrez IDs using the get_entrez_id() function. If panel and genes are both supplied, genes from both arguments will be returned. If both are NULL (default), the function will return gene results for all available genomic data for that sample.
- panel
One or more panel IDs to query (e.g. 'IMPACT468'). If panel and genes are both supplied, genes from both arguments will be returned. If both are NULL (default), the function will return gene results for all available genomic data for that sample.
- add_hugo
Logical indicating whether hugoGeneSymbol should be added to your resulting data frame, if not already present in raw API results. Argument is TRUE by default. If FALSE, results will be returned as is (i.e. any existing Hugo symbol columns in raw results will not be removed).
- base_url
The database URL to query. If NULL, defaults to the URL set with set_cbioportal_db(<your_db>).
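A minimal sketch of preparing a sample_study_pairs data frame for a multi-study query. The helper normalize_pair_names() is hypothetical and not part of the package; it only illustrates one way that variations in column-name capitalization could be handled, and the function itself may do this differently. The sample and study IDs are taken from the examples on this page.

```r
# Hypothetical helper (not part of the package): lower-case the column names
# and check that the two required columns are present.
normalize_pair_names <- function(df) {
  names(df) <- tolower(names(df))
  required <- c("sample_id", "study_id")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  df
}

# Column names deliberately use mixed capitalization.
df_pairs_mixed <- data.frame(
  Sample_ID = c("P-0002146-T01-IM3", "s_C_CAUWT7_P001_d"),
  Study_ID  = c("blca_plasmacytoid_mskcc_2016", "prad_msk_2019")
)

df_pairs_clean <- normalize_pair_names(df_pairs_mixed)
names(df_pairs_clean)
```

The cleaned data frame could then be passed as sample_study_pairs = df_pairs_clean. A molecular_profile_id column is optional and can be added in the same way.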
Examples
# \dontrun{
set_cbioportal_db("public")
#> ✔ You are successfully connected!
#> ✔ base_url for this R session is now set to "www.cbioportal.org/api"
.get_data_by_sample(sample_id = c("TCGA-OR-A5J2-01","TCGA-OR-A5J6-01"),
study_id = "acc_tcga", data_type = "mutation")
#> The following parameters were used in query:
#> Study ID: "acc_tcga"
#> Molecular Profile ID: "acc_tcga_mutations"
#> Genes: "All available genes"
#> # A tibble: 173 × 28
#> hugoGeneSymbol entrezGeneId uniqueSampleKey uniquePatientKey
#> <chr> <int> <chr> <chr>
#> 1 ZFPM1 161882 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 2 ZNF787 126208 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 3 PODXL 5420 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 4 CCDC102A 92922 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 5 TVP23C 201158 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 6 ZNF628 89887 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 7 TBP 6908 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 8 SEMA5B 54437 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 9 CELSR2 1952 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> 10 MUC5B 727897 VENHQS1PUi1BNUoyLTAxOmFjY190Y2dh VENHQS1PUi1BNUo…
#> # ℹ 163 more rows
#> # ℹ 24 more variables: molecularProfileId <chr>, sampleId <chr>,
#> # patientId <chr>, studyId <chr>, center <chr>, mutationStatus <chr>,
#> # validationStatus <chr>, tumorAltCount <int>, tumorRefCount <int>,
#> # normalAltCount <int>, normalRefCount <int>, startPosition <int>,
#> # endPosition <int>, referenceAllele <chr>, proteinChange <chr>,
#> # mutationType <chr>, ncbiBuild <chr>, variantType <chr>, keyword <chr>, …
.get_data_by_sample(sample_id = c("DS-sig-010-P2"),
molecular_profile_id = "blca_plasmacytoid_mskcc_2016_cna", data_type = "cna")
#> The following parameters were used in query:
#> Study ID: "blca_plasmacytoid_mskcc_2016"
#> Molecular Profile ID: "blca_plasmacytoid_mskcc_2016_cna"
#> Genes: "All available genes"
#> # A tibble: 2 × 9
#> hugoGeneSymbol entrezGeneId uniqueSampleKey uniquePatientKey
#> <chr> <int> <chr> <chr>
#> 1 YAP1 10413 RFMtc2lnLTAxMC1QMjpibGNhX3BsYXNt… RFMtc2lnLTAxMDp…
#> 2 CD79B 974 RFMtc2lnLTAxMC1QMjpibGNhX3BsYXNt… RFMtc2lnLTAxMDp…
#> # ℹ 5 more variables: molecularProfileId <chr>, sampleId <chr>,
#> # patientId <chr>, studyId <chr>, alteration <int>
.get_data_by_sample(sample_id = c("P-0002146-T01-IM3"),
study_id = "blca_plasmacytoid_mskcc_2016", data_type = "fusion")
#> The following parameters were used in query:
#> Study ID: "blca_plasmacytoid_mskcc_2016"
#> Molecular Profile ID: "blca_plasmacytoid_mskcc_2016_structural_variants"
#> Genes: "All available genes"
#> # A tibble: 1 × 44
#> uniqueSampleKey uniquePatientKey molecularProfileId sampleId patientId studyId
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 UC0wMDAyMTQ2LV… UC0wMDAyMTQ2OmJ… blca_plasmacytoid… P-00021… P-0002146 blca_p…
#> # ℹ 38 more variables: site1EntrezGeneId <int>, site1HugoSymbol <chr>,
#> # site1EnsemblTranscriptId <chr>, site1Chromosome <chr>, site1Position <int>,
#> # site1Contig <chr>, site1Region <chr>, site1RegionNumber <int>,
#> # site1Description <chr>, site2EntrezGeneId <int>, site2HugoSymbol <chr>,
#> # site2EnsemblTranscriptId <chr>, site2Chromosome <chr>, site2Position <int>,
#> # site2Contig <chr>, site2Region <chr>, site2RegionNumber <int>,
#> # site2Description <chr>, site2EffectOnFrame <chr>, ncbiBuild <chr>, …
df_pairs <- data.frame(
"sample_id" = c("s_C_36924L_P001_d",
"s_C_03LNU8_P001_d"),
"study_id" = c("prad_msk_2019"))
.get_data_by_sample(sample_study_pairs = df_pairs, data_type = "mutation")
#> Joining with `by = join_by(study_id)`
#> The following parameters were used in query:
#> Study ID: "prad_msk_2019"
#> Molecular Profile ID: prad_msk_2019_mutations
#> Genes: "All available genes"
#> # A tibble: 1 × 28
#> hugoGeneSymbol entrezGeneId uniqueSampleKey uniquePatientKey
#> <chr> <int> <chr> <chr>
#> 1 TP53 7157 c19DXzAzTE5VOF9QMDAxX2Q6cHJhZF9t… cF9DXzAzTE5VODp…
#> # ℹ 24 more variables: molecularProfileId <chr>, sampleId <chr>,
#> # patientId <chr>, studyId <chr>, center <chr>, mutationStatus <chr>,
#> # validationStatus <chr>, tumorAltCount <int>, tumorRefCount <int>,
#> # normalAltCount <int>, normalRefCount <int>, startPosition <int>,
#> # endPosition <int>, referenceAllele <chr>, proteinChange <chr>,
#> # mutationType <chr>, ncbiBuild <chr>, variantType <chr>, keyword <chr>,
#> # chr <chr>, variantAllele <chr>, refseqMrnaId <chr>, …
.get_data_by_sample(sample_study_pairs = df_pairs, genes = 7157, data_type = "mutation")
#> Joining with `by = join_by(study_id)`
#> The following parameters were used in query:
#> Study ID: "prad_msk_2019"
#> Molecular Profile ID: prad_msk_2019_mutations
#> Genes: 7157
#> # A tibble: 1 × 28
#> hugoGeneSymbol entrezGeneId uniqueSampleKey uniquePatientKey
#> <chr> <int> <chr> <chr>
#> 1 TP53 7157 c19DXzAzTE5VOF9QMDAxX2Q6cHJhZF9t… cF9DXzAzTE5VODp…
#> # ℹ 24 more variables: molecularProfileId <chr>, sampleId <chr>,
#> # patientId <chr>, studyId <chr>, center <chr>, mutationStatus <chr>,
#> # validationStatus <chr>, tumorAltCount <int>, tumorRefCount <int>,
#> # normalAltCount <int>, normalRefCount <int>, startPosition <int>,
#> # endPosition <int>, referenceAllele <chr>, proteinChange <chr>,
#> # mutationType <chr>, ncbiBuild <chr>, variantType <chr>, keyword <chr>,
#> # chr <chr>, variantAllele <chr>, refseqMrnaId <chr>, …
.get_data_by_sample(sample_study_pairs = df_pairs, data_type = "cna")
#> Joining with `by = join_by(study_id)`
#> The following parameters were used in query:
#> Study ID: "prad_msk_2019"
#> Molecular Profile ID: prad_msk_2019_cna
#> Genes: "All available genes"
#> # A tibble: 1 × 9
#> hugoGeneSymbol entrezGeneId uniqueSampleKey uniquePatientKey
#> <chr> <int> <chr> <chr>
#> 1 PTEN 5728 c19DXzM2OTI0TF9QMDAxX2Q6cHJhZF9t… cF9DXzM2OTI0TDp…
#> # ℹ 5 more variables: molecularProfileId <chr>, sampleId <chr>,
#> # patientId <chr>, studyId <chr>, alteration <int>
.get_data_by_sample(sample_study_pairs = df_pairs, data_type = "fusion")
#> Joining with `by = join_by(study_id)`
#> The following parameters were used in query:
#> Study ID: "prad_msk_2019"
#> Molecular Profile ID: prad_msk_2019_structural_variants
#> Genes: "All available genes"
#> # A tibble: 0 × 0
df_pairs2 <- data.frame(
"sample_id" = c("P-0002146-T01-IM3", "s_C_CAUWT7_P001_d"),
"study_id" = c("blca_plasmacytoid_mskcc_2016", "prad_msk_2019"))
.get_data_by_sample(sample_study_pairs = df_pairs2, data_type = "mutation")
#> Joining with `by = join_by(study_id)`
#> The following parameters were used in query:
#> Study ID: "blca_plasmacytoid_mskcc_2016" and "prad_msk_2019"
#> Molecular Profile ID: blca_plasmacytoid_mskcc_2016_mutations and
#> prad_msk_2019_mutations
#> Genes: "All available genes"
#> # A tibble: 7 × 28
#> hugoGeneSymbol entrezGeneId uniqueSampleKey uniquePatientKey
#> <chr> <int> <chr> <chr>
#> 1 TERT 7015 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> 2 NOTCH4 4855 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> 3 TP53 7157 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> 4 BLM 641 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> 5 TP53 7157 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> 6 CDKN1A 1026 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> 7 RB1 5925 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> # ℹ 24 more variables: molecularProfileId <chr>, sampleId <chr>,
#> # patientId <chr>, studyId <chr>, center <chr>, mutationStatus <chr>,
#> # validationStatus <chr>, tumorAltCount <int>, tumorRefCount <int>,
#> # normalAltCount <int>, normalRefCount <int>, startPosition <int>,
#> # endPosition <int>, referenceAllele <chr>, proteinChange <chr>,
#> # mutationType <chr>, ncbiBuild <chr>, variantType <chr>, chr <chr>,
#> # variantAllele <chr>, refseqMrnaId <chr>, proteinPosStart <int>, …
.get_data_by_sample(sample_study_pairs = df_pairs2, genes = 7157)
#> Joining with `by = join_by(study_id)`
#> The following parameters were used in query:
#> Study ID: "blca_plasmacytoid_mskcc_2016" and "prad_msk_2019"
#> Molecular Profile ID: blca_plasmacytoid_mskcc_2016_mutations and
#> prad_msk_2019_mutations
#> Genes: 7157
#> # A tibble: 2 × 28
#> hugoGeneSymbol entrezGeneId uniqueSampleKey uniquePatientKey
#> <chr> <int> <chr> <chr>
#> 1 TP53 7157 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> 2 TP53 7157 UC0wMDAyMTQ2LVQwMS1JTTM6YmxjYV9w… UC0wMDAyMTQ2OmJ…
#> # ℹ 24 more variables: molecularProfileId <chr>, sampleId <chr>,
#> # patientId <chr>, studyId <chr>, center <chr>, mutationStatus <chr>,
#> # validationStatus <chr>, tumorAltCount <int>, tumorRefCount <int>,
#> # normalAltCount <int>, normalRefCount <int>, startPosition <int>,
#> # endPosition <int>, referenceAllele <chr>, proteinChange <chr>,
#> # mutationType <chr>, ncbiBuild <chr>, variantType <chr>, keyword <chr>,
#> # chr <chr>, variantAllele <chr>, refseqMrnaId <chr>, …
.get_data_by_sample(sample_study_pairs = df_pairs2, data_type = "cna")
#> Joining with `by = join_by(study_id)`
#> The following parameters were used in query:
#> Study ID: "blca_plasmacytoid_mskcc_2016" and "prad_msk_2019"
#> Molecular Profile ID: blca_plasmacytoid_mskcc_2016_cna and prad_msk_2019_cna
#> Genes: "All available genes"
#> NULL
.get_data_by_sample(sample_study_pairs = df_pairs2, data_type = "fusion")
#> Joining with `by = join_by(study_id)`
#> The following parameters were used in query:
#> Study ID: "blca_plasmacytoid_mskcc_2016" and "prad_msk_2019"
#> Molecular Profile ID: blca_plasmacytoid_mskcc_2016_structural_variants and
#> prad_msk_2019_structural_variants
#> Genes: "All available genes"
#> # A tibble: 2 × 44
#> uniqueSampleKey uniquePatientKey molecularProfileId sampleId patientId studyId
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 UC0wMDAyMTQ2LV… UC0wMDAyMTQ2OmJ… blca_plasmacytoid… P-00021… P-0002146 blca_p…
#> 2 c19DX0NBVVdUN1… cF9DX0NBVVdUNzp… prad_msk_2019_str… s_C_CAU… p_C_CAUW… prad_m…
#> # ℹ 38 more variables: site1EntrezGeneId <int>, site1HugoSymbol <chr>,
#> # site1EnsemblTranscriptId <chr>, site1Chromosome <chr>, site1Position <int>,
#> # site1Contig <chr>, site1Region <chr>, site1RegionNumber <int>,
#> # site1Description <chr>, site2EntrezGeneId <int>, site2HugoSymbol <chr>,
#> # site2EnsemblTranscriptId <chr>, site2Chromosome <chr>, site2Position <int>,
#> # site2Contig <chr>, site2Region <chr>, site2RegionNumber <int>,
#> # site2Description <chr>, site2EffectOnFrame <chr>, ncbiBuild <chr>, …
# }
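Two of the calls above come back empty in different ways: one prints "# A tibble: 0 × 0" and another prints NULL. A minimal sketch of a guard that treats both as "no results" before further processing. has_results() is a hypothetical helper, not part of the package, and plain data frames stand in for the API results here.

```r
# Hypothetical helper (not part of the package): TRUE only when the result
# is a data frame with at least one row.
has_results <- function(x) {
  is.data.frame(x) && nrow(x) > 0
}

empty_tbl <- data.frame()   # stands in for a 0 x 0 result
no_result <- NULL           # stands in for a NULL result
one_row   <- data.frame(hugoGeneSymbol = "TP53", entrezGeneId = 7157L)

has_results(empty_tbl)  # FALSE
has_results(no_result)  # FALSE
has_results(one_row)    # TRUE
```

Checking is.data.frame() first means a NULL return short-circuits before nrow() is called.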