• Frequently Asked Questions

    What is the intention of CancerResource?

    CancerResource is a comprehensive knowledgebase for cancer-related drug-target relationships and for supporting information and experimental data. Drug-target relationships are determined by manually curated text mining of publicly available literature. Several resources that provide similar data with a slightly different background and intention are mined (a) for comparison with the CancerResource text mining and (b) for integration into this knowledgebase. Thus, CancerResource reflects the current knowledge in this field in an integrative way. To strengthen the literature mining, which results in a compilation of direct knowledge of drug-target relationships, interactions known from the PDB are added to that part.

    To improve the functionality of CancerResource as an oncologic web exploration tool, several kinds of experimental data are added. Indirect drug interactions on target genes that are known from whole-genome explorations by microarray technology or on cancer cell lines, characterized as cellular fingerprints, can be immediately compared with the text-mining results for direct interactions.

    To this end, CancerResource provides a set of exploration tools and access points to data. It provides information for lead compound design, on drug action on genes, for detecting novel target genes, and for estimating or predicting the activity of compounds on cancer cell lines. A key feature is the detection and visualization of the targeting of multiple genes by a particular drug as well as the targeting of a single gene by multiple drugs.

    Considering all these aspects, this oncologic web exploration tool is meant to allow researchers to initiate workflows in several directions. In conclusion, this site is designed for biochemists, pharmacists, and scientists in biomedicine who intend to develop novel drugs or who want a quick overview of the field of drug-target relationships.



    What kind of information can I find in CancerResource?

    First of all, CancerResource comprises a large number of drug-target interactions and serves as a discovery tool for such information. It can be found in the central part of any detail result page, for a drug page as well as for a target gene page. When a drug is queried, all of its targets are displayed; when a potential target is queried, all drugs targeting it are displayed.
    For a group of drugs, as well as for a group of targets, a matrix of drugs and targets is displayed. It shows in a single picture both the targeting of multiple genes by a single drug and the targeting of a single gene by multiple drugs.
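
    The drug-target matrix described above can be illustrated with a minimal sketch. The drug and gene names are hypothetical placeholders, not CancerResource data, and build_matrix is an illustrative helper rather than part of the site:

```python
# Hypothetical drug-target pairs, for illustration only.
interactions = [
    ("drugA", "GENE1"),
    ("drugA", "GENE2"),
    ("drugB", "GENE2"),
    ("drugC", "GENE3"),
]

def build_matrix(pairs):
    """Return (drugs, targets, matrix); matrix[i][j] is 1 if drug i targets gene j."""
    drugs = sorted({drug for drug, _ in pairs})
    targets = sorted({target for _, target in pairs})
    known = set(pairs)
    matrix = [[1 if (d, t) in known else 0 for t in targets] for d in drugs]
    return drugs, targets, matrix

drugs, targets, matrix = build_matrix(interactions)
print("      ", targets)
for drug, row in zip(drugs, matrix):
    print(drug, row)
```

    A row with several 1 entries is a drug that targets multiple genes; a column with several 1 entries is a gene targeted by multiple drugs.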

    CancerResource is a comprehensive database of information about cancer-related targets in conjunction with experimental data (such as gene expression, drug influence, drug influence on gene expression, i.e. differential gene expression, and pathway affiliation).

    Cellular fingerprints are the computational representation of the activity of a drug on a defined set of cancer cell lines (integration of DTP / NCI data). With this tool, relationships between the structure and the activity of a compound can be elucidated.

    Gene expression data of 1,821 cancer cell lines, obtained from NCI60, CCLE and CoSMIC, can be explored (by KEGG pathway affiliation or user-defined expression data). Here, the user can detect genes whose expression differs significantly in individual cancer cell lines.

    Mutation data of 2,037 cancer cell lines, obtained from NCI60, CCLE and CoSMIC, are available. A total of 872,658 mutations in 19,834 genes is stored in CancerResource.


    How to use CancerResource?

    The illustrated use case gives a starting point for using CancerResource by uploading either external mutational data or mRNA expression values:
    1. Upload external mRNA expression data on the Cell line/Expression site using the "Query database with your own data" box.
    2. You can choose between Affymetrix probe names and HGNC gene symbols, each with normalized and log2-transformed expression values, separated by tabs.
    3. Please be aware that a calculation can take up to 10 minutes, depending on the number of genes that have to be analyzed.
    4. The most similar cancer cell lines are computed, either by Pearson correlation distance or by fold changes.
    5. In this example, the REH cell line is calculated as the most similar cancer cell line by Pearson correlation distance.
    6. Clicking on the REH cell line link displays the most effective drugs for the REH cancer cell line.
    7. With regard to IC50, Patupilone is determined to be the most effective drug on the REH cancer cell line.
    8. Clicking on the Patupilone link directs the user to Patupilone's detail site, which displays information about Patupilone as well as its cellular fingerprint.
    9. The mutation profile of REH shows the 10 most affected cancer-relevant genes with regard to the PolyPhen prediction.
    10. The heatmap illustration gives an overview of the similarity of the REH cancer cell line to other cell lines of the same consortium, which can be selected at the top of the page.
    11. To compare the expression levels of the most affected cancer-relevant genes, copy their gene symbols into the "Query database by expression data" box on the Cell line/Expression site.
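
    Steps 2 and 4 can be sketched as follows. This is only an illustration of tab-separated input and of ranking by Pearson correlation distance, here assumed to be 1 - r; the exact definition used by CancerResource is not stated. The reference profiles, the names line_1 and line_2, and all values are hypothetical:

```python
import math

def parse_expression(text):
    """Parse lines of 'GENE_SYMBOL<TAB>log2_value' into a dict."""
    profile = {}
    for line in text.strip().splitlines():
        gene, value = line.split("\t")
        profile[gene.strip()] = float(value)
    return profile

def pearson_distance(a, b):
    """Return 1 - Pearson r, computed over the genes present in both profiles."""
    genes = sorted(set(a) & set(b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    xs = [a[g] for g in genes]
    ys = [b[g] for g in genes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile, correlation undefined")
    return 1.0 - cov / (sx * sy)

# Hypothetical user upload (HGNC symbols, log2 values, tab-separated).
upload = "TP53\t8.1\nMYC\t10.4\nEGFR\t5.2\nKRAS\t7.0"
query = parse_expression(upload)

# Hypothetical reference cell line profiles.
reference = {
    "line_1": {"TP53": 8.0, "MYC": 10.0, "EGFR": 5.0, "KRAS": 7.2},
    "line_2": {"TP53": 5.0, "MYC": 6.0, "EGFR": 11.0, "KRAS": 9.0},
}

# Smallest distance first: the most similar cell line leads the list.
ranked = sorted((pearson_distance(query, p), name) for name, p in reference.items())
ranked = [(name, dist) for dist, name in ranked]
print(ranked)
```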



    How are compound-target gene interactions for CancerResource retrieved?

    Compound-target relationships were detected automatically by our own literature text mining of more than 19 million PubMed abstracts, using our vocabularies for drugs and targets.

    The drug vocabulary was generated from compounds that have a cancer-related ATC classification in SuperDrug or whose name or synonym occurs in the NCI compound set. The cancer relationship of a gene was determined from annotations in cancer-related KEGG pathways and in the Gene Ontology (GO). Abstracts, titles and MeSH terms were converted into a text index using the LingPipe (http://alias-i.com/lingpipe/index.html) and Lucene software packages. Both vocabularies were searched against each indexed abstract, and the result was scored by our own rule-based validation algorithm. This automatic procedure and a subsequent ranking yielded about 8,000 publications; a manual revision of the hits followed, resulting in about 900 highly significant publications on direct interactions.

    To complement our own literature mining, cancer-related drug-target interactions were collected from several established data sources such as ChEMBL, CTD, PharmGKB, TTD, DrugBank and PDB.


    How is a drug (or compound, generally) defined?

    A drug is defined as a cancer-target-related compound, as a chemical with known cancer-relevant activity, or as a compound that has been shown to influence cancer development. Drugs in this database were collected from several data sources:
    • Drugs (and compounds) found by our own text mining of cancer-related literature in PubMed: abstracts were computationally processed to find experimentally verified drugs (see above).
    • Drugs associated with target genes and a cancer disease annotation from CTD, PharmGKB, and TTD.
    • Drugs associated with target genes without a particular disease annotation from DrugBank, ChEMBL and PDB, to complement the set and enable a re-positioning query.
    • Compounds that have an influence on cell lines (DTP / NCI).
    Additionally, general resources for compounds are mined to provide backbone information on drugs as described.


    How is a target defined?

    Targets in CancerResource are genes or proteins that are involved in the appearance and development of cancer. For CancerResource, cancer-associated targets are selected from different sources:
    • Text mining: thousands of PubMed abstracts were processed computationally to find experimentally verified drugs, targets and drug-target relations. The abstracts were filtered by cancer-relevant terms (e.g. 'antineoplastic'). All results were manually curated.
    • Target genes or proteins associated with drugs from ChEMBL, CTD, PharmGKB, TTD, DrugBank and PDB.
    Additionally, general resources are mined to provide backbone information on genes or proteins as described.


    How is KEGG used in CancerResource?

    The KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of database resources for linking genomes to life and the environment. KEGG PATHWAY provides a collection of manually drawn pathway maps which visualize molecular interaction and reaction networks.

    In CancerResource, KEGG maps for more than 50 cancer relevant (signalling) pathways are used to picture the role of targets and drugs acting on them. Targets with annotated drug-target interactions are highlighted in yellow, and information about the drugs acting on them is given on-click.

    Expression data are inserted into pathway maps as colored icon borders if the user has previously performed a corresponding search and requested the link to the pathway map. Colored maps are retrieved via Web Service.


    What is an Over-Representation Analysis (ORA)?

    The over-representation analysis (ORA) is a statistical estimate of how strongly a given KEGG pathway is affected by a (drug) treatment.

    The number of differentially expressed genes of one expression direction in the pathway (both directions are evaluated separately) is compared to the number of all differentially expressed genes of that direction. Additionally, the number of all genes in the pathway and the total number of genes enter the calculation. The ORA uses the hypergeometric distribution: for the data points i in the tail beyond the observed case, the hypergeometric function HF is applied. The sum of all HF(i) yields the p-value; a p-value lower than 0.05 indicates a significant influence (of a drug) on the pathway considered.
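
    As a minimal sketch of such a calculation, the p-value can be written as the upper tail of the hypergeometric distribution. This assumes the conventional ORA form in which the observed count itself is included in the sum; the text above does not state whether CancerResource does so. The function name and the example counts are hypothetical:

```python
from math import comb

def ora_p_value(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: total number of genes
    K: number of genes in the pathway
    n: number of all differentially expressed genes of one direction
    k: number of those differentially expressed genes that lie in the pathway
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

# Hypothetical counts: 20,000 genes, a pathway of 100 genes,
# 500 up-regulated genes, 10 of which lie in the pathway.
p = ora_p_value(k=10, n=500, K=100, N=20000)
print(p, "significant" if p < 0.05 else "not significant")
```

    Each term of the sum corresponds to one HF(i); up- and down-regulated genes would be passed through the function separately.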


    What is a Cellular Fingerprint?

    A cellular fingerprint represents the growth rates of 2,037 human cancer cell lines in response to treatment with a particular compound. A boolean array is generated in the following way:
    1. A bit comparison is only possible for a pre-defined vector of cell lines
    2. GI-50 values of the 2,037 given cell lines were normalized using the z-score normalization, z = ( x - μ ) / σ
    • x = the 2,037 values for one compound
    • μ = mean of x
    • σ = standard deviation of x
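    3. Each individual normalized GI-50 value is transformed into a 42-bit vector
    4. In consequence, a cellular fingerprint has 42 bits per cell line; the stated length of 2,520 bits corresponds to a vector of 60 cell lines (60 x 42 = 2,520), as in the NCI-60 panel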
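
    The normalization and bit-vector steps can be sketched as below. The z-score follows the formula given above. How a normalized value is mapped onto its 42 bits is not described in this FAQ, so the thermometer encoding over an assumed range of -3 to 3 is purely a hypothetical stand-in, as are the function names and the use of the population standard deviation:

```python
import statistics

BITS_PER_VALUE = 42  # bits per normalized GI-50 value, as stated above

def z_scores(values):
    """z = (x - mu) / sigma over all values of one compound."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all values identical, z-score undefined")
    return [(x - mu) / sigma for x in values]

def encode(z, bits=BITS_PER_VALUE, z_min=-3.0, z_max=3.0):
    """Hypothetical thermometer encoding: higher z sets more leading bits."""
    clipped = min(max(z, z_min), z_max)
    ones = round((clipped - z_min) / (z_max - z_min) * bits)
    return [1] * ones + [0] * (bits - ones)

def fingerprint(gi50_values):
    """Concatenate the 42-bit vectors of all cell lines, in a fixed order."""
    bits = []
    for z in z_scores(gi50_values):
        bits.extend(encode(z))
    return bits

# Hypothetical GI-50 values for a pre-defined vector of 60 cell lines.
values = [4.0 + 0.05 * i for i in range(60)]
fp = fingerprint(values)
print(len(fp))
```

    Because every compound is encoded over the same ordered vector of cell lines, two fingerprints can be compared bit by bit.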


    What is a Tanimoto Coefficient?

    A simple count of shared features (common fragment substructures) can be a measure of chemical distance when used in some similarity coefficient. Dictionaries of predefined structural fragments, such as MDL Information Systems MACCS keys, are used to identify features contained in a molecule. The structural fragments or features that are present in the given molecule are turned ON (set as 1) and the ones that are absent are kept OFF (set as 0). Thus, for each molecule one ends up having a string containing 1s and 0s (bit string). Once the molecules have been represented by such bit-strings the Tanimoto Coefficient can be used as a measure to assess similarity.
    Suppose we are comparing two molecules A and B. If NA is the number of features (ON bits) in A, NB is the number of features (ON bits) in B, and NAB is the number of features (ON bits) common to both A and B, then the Tanimoto Coefficient T is:

    T = NAB / ( NA + NB - NAB )

    • NAB = number of "1" bits that occur in both row A and in row B
    • NA = number of "1" bits in row A
    • NB = number of "1" bits in row B
    • row A contains the fingerprint of molecule A

    Two structures with a Tanimoto Coefficient greater than or equal to 0.85 (which corresponds to a similarity of 85%) are considered similar enough to transfer the biological activities of one molecule to the other and, thus, to predict toxicities, pathways the molecule might participate in, and potential binding partners.

    Note that OFF bits do not determine the similarity. In other words, if some molecular features are absent in both molecules then that is not taken as an indication of similarity between the two.
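
    A short sketch of this calculation on two bit strings follows. The strings are made-up examples rather than real MACCS fingerprints, and returning 0.0 when neither molecule has any ON bit is an assumed convention:

```python
def tanimoto(row_a, row_b):
    """Tanimoto Coefficient NAB / (NA + NB - NAB) for two equal-length bit strings."""
    if len(row_a) != len(row_b):
        raise ValueError("fingerprints must have the same length")
    n_a = row_a.count("1")
    n_b = row_b.count("1")
    n_ab = sum(1 for x, y in zip(row_a, row_b) if x == "1" and y == "1")
    denominator = n_a + n_b - n_ab
    # Assumption: two fingerprints without any ON bit are scored as 0.0.
    return n_ab / denominator if denominator else 0.0

row_a = "11010010"  # NA = 4
row_b = "11000011"  # NB = 4, NAB = 3
t = tanimoto(row_a, row_b)
print(t)            # 3 / (4 + 4 - 3) = 0.6
print(t >= 0.85)    # below the 0.85 similarity threshold
```

    Positions that are OFF in both strings change neither NA, NB nor NAB, which is why shared absent features do not raise the score.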


    What is a mean graph?

    A mean graph is the graphical representation of the vector of cancer cell lines (vertically) and their normalized GI-50 values (horizontally; see also cellular fingerprint). The individual cell lines are listed in the middle of the plot, and the GI-50 value for each cell line is drawn as a green bar showing its difference from the mean value. The mean value is given as the Z-score of the GI50 values in units of the negative decadic logarithm (CellMiner), as the Z-score of the GI50 values in units of the decadic logarithm (CCLE), or as the Z-score of the IC50 values in units of the decadic logarithm.



    What are IC-50 and GI-50 values?

    The half maximal inhibitory concentration (IC-50 value) is a measure of the effectiveness of a compound in inhibiting biological or biochemical function. If a significant effect is measurable, the compound in question is a drug candidate. This quantitative measure indicates how much of a particular drug or other substance (inhibitor) is needed to inhibit a given biological process by half. In other words, it is the half maximal (50%) Inhibitory Concentration (IC) of a substance (50% IC, or IC-50). It is commonly used as a measure of antagonist drug potency in pharmacological research.

    The NCI renamed the value for the concentration that causes 50% growth inhibition to GI-50 to emphasize the correction for the cell count at time zero. Thus, the GI-50 value is the concentration of test drug where 100 x (T - T0)/(C - T0) = 50. Here, T is the optical density of the test well after a 48h period of exposure to test drug, T0 is the optical density at time zero, and C is the control optical density. The "50" is called the GI50PRCNT, a T/C-like parameter that can have values from +100 to -100. The GI-50 measures the growth inhibitory power of the test agent. The TGI is the concentration of test drug where 100 x (T - T0)/(C - T0) = 0. Thus, the TGI signifies a cytostatic effect.
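
    The formula can be tried out with a small sketch. The growth percentage follows the expression above; locating the GI-50 and TGI concentrations by linear interpolation on a log10 concentration scale is an assumption made for illustration, and the optical densities and concentrations are invented:

```python
import math

def growth_percent(t, t0, c):
    """100 x (T - T0) / (C - T0) for one test well."""
    return 100.0 * (t - t0) / (c - t0)

def concentration_at(target, concentrations, growth):
    """Interpolate (on a log10 scale) the concentration where growth equals target.

    Assumes concentrations are ascending; returns None if target is never crossed.
    """
    points = list(zip(concentrations, growth))
    for (c1, g1), (c2, g2) in zip(points, points[1:]):
        if g1 >= target >= g2 and g1 != g2:
            fraction = (g1 - target) / (g1 - g2)
            log_c = math.log10(c1) + fraction * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None

# Hypothetical optical densities: T0 at time zero, C for the control,
# and T after 48h at four molar concentrations of the test drug.
t0, c = 0.2, 1.2
concentrations = [1e-8, 1e-7, 1e-6, 1e-5]
t_values = [1.1, 0.9, 0.5, 0.2]

growth = [growth_percent(t, t0, c) for t in t_values]
gi50 = concentration_at(50.0, concentrations, growth)  # growth reduced to 50%
tgi = concentration_at(0.0, concentrations, growth)    # no net growth
print(growth, gi50, tgi)
```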


    What is a “similarity search”?

    During an activity similarity search, the fingerprint of a search compound is compared with the fingerprints of all other compounds in the database in order to find molecules with a similar reaction pattern on cell lines. For this, a set of compounds must be pre-selected. This can be done by chemical structure: the structures are determined by a structure similarity search ("similarity search").

    As the result of the complete query, the structure similarity vector (all structures found with correspondence to a given structure) can be compared with the activity profile vector, i.e., a vector of cellular fingerprints for the compounds found.

    Similarity search: A structure similarity search is performed by the calculation of the Tanimoto Coefficient.


    Which Expression Data are available in CancerResource?

    NCI-60, CCLE and CoSMIC cell line Expression Data

    Such data represent the expression of a single gene in a single cell line compared to all cell lines in the case of CellMiner, or to all cell lines of the same tissue type in the case of CCLE and CoSMIC. The expression profile can also be displayed for other tissue types. This is not differential gene expression, only the relative abundance compared to other cell lines.


    User-defined Expression Data

    A user can import their own experimental (microarray) data from a single chip to compare it against the NCI-60, CCLE and CoSMIC cell lines. Differential expression of genes is calculated online. The results can be projected onto KEGG pathways; in that case, an Over-Representation Analysis is calculated to determine significance.


    Which Mutation Data are available in CancerResource?

    NCI-60, CCLE and CoSMIC gene Mutation Data

    When searching for a selected gene, information is given about the cell lines in which the mutated gene occurs. A detailed overview of the mutation is displayed, showing the amino acid change and its position in the gene, together with a PolyPhen prediction for the 5 most affected cancer-relevant genes indicating whether the mutation might be damaging. Further information about a cell line's mutation profile can be found by clicking on the selected cell line.
    Mutations can also be searched for cell lines of a specific tissue type. The overview displays the number of cell lines found for the tissue type and lists all cell lines including their mutated genes. Again, a PolyPhen prediction is available for the 5 most affected cancer-relevant genes, indicating whether the mutation might be damaging.
    Furthermore, a search for cell lines is included. It returns a list of the genes that are mutated in the selected cell line and the number of mutations known for each gene.

    User-defined mutational Data

    A user can import their own mutational data to compare it against the NCI-60, CCLE and CoSMIC cell lines. A list of similar cell lines is calculated. Choosing a similar cell line displays the information given by the three consortia: the most effective drugs, the mutation profile including PolyPhen predictions for the 10 most affected cancer-relevant genes, and similar cell lines based on compound activity profile, mutated genes or expression level of genes.

    Heatmap representation for similar cell lines (Compounds Activity Profile):


    Clicking on a colored square directs the user to a comparison of the two selected cell lines, giving an overview of the most effective drugs and their sensitivity data.