indigopy package

Submodules

indigopy.core module

This module contains all custom functions at the core of the INDIGO algorithm.

indigopy.core.classify(scores: list, thresholds: tuple = (-0.1, 0.1), classes: tuple = ('Synergy', 'Neutral', 'Antagonism'))

Converts drug interaction scores into interaction classes.

Score-to-class conversion follows the general convention for interaction outcomes measured with the Loewe Additivity or Bliss Independence models: negative scores indicate synergy and positive scores indicate antagonism.

Parameters
  • scores (list) – A list of drug interaction scores.

  • thresholds (tuple, optional) – A tuple of two floating-point numbers indicative of (inclusive) thresholds for synergy and antagonism, respectively (default is (-0.1, 0.1)).

  • classes (tuple, optional) – A tuple of three strings representative of class labels. By default, the three classes are Synergy, Neutral, and Antagonism.

Returns

A list of class labels for the given list of interaction scores.

Return type

list

Raises
  • AssertionError – Raised when the wrong number of elements is provided for thresholds or classes.

  • TypeError – Raised when a given input type is incorrect.

Examples

Usage cases of the classify function.

>>> scores = [-2, 1.5, 0.5, -0.1, 1]
>>> classify(scores)
['Synergy', 'Antagonism', 'Antagonism', 'Synergy', 'Antagonism']
>>> thresholds = (-1, 1)
>>> classify(scores, thresholds=thresholds)
['Synergy', 'Antagonism', 'Neutral', 'Neutral', 'Antagonism']
>>> classes = ('S', 'N', 'A')
>>> classify(scores, thresholds=thresholds, classes=classes)
['S', 'A', 'N', 'N', 'A']
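
The thresholding rule behind these results can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the name classify_sketch is hypothetical, and the inclusive comparisons are an assumption chosen to match the doctest results above (for example, a score of -0.1 maps to Synergy under the default thresholds).

```python
def classify_sketch(scores, thresholds=(-0.1, 0.1),
                    classes=("Synergy", "Neutral", "Antagonism")):
    """Map interaction scores to class labels (illustrative only)."""
    # Mirror the documented checks on argument sizes.
    assert len(thresholds) == 2, "thresholds must contain two values"
    assert len(classes) == 3, "classes must contain three labels"
    low, high = thresholds
    synergy, neutral, antagonism = classes
    labels = []
    for score in scores:
        if score <= low:        # at or below the lower threshold
            labels.append(synergy)
        elif score >= high:     # at or above the upper threshold
            labels.append(antagonism)
        else:                   # strictly between the two thresholds
            labels.append(neutral)
    return labels


scores = [-2, 1.5, 0.5, -0.1, 1]
print(classify_sketch(scores))
print(classify_sketch(scores, thresholds=(-1, 1)))
```

Scores that fall strictly between the two thresholds are the only ones labelled with the middle class, so widening the thresholds moves more scores into Neutral.
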
indigopy.core.featurize(interactions: list, profiles: dict, feature_names: Optional[list] = None, key: Optional[list] = None, normalize: bool = False, norm_method: str = 'znorm', na_handle: float = 0.0, binarize: bool = True, thresholds: tuple = (-2, 2), remove_zero_rows: bool = False, entropy: bool = False, time: bool = False, time_values: Optional[list] = None, strains: Optional[list] = None, orthology_map: Optional[dict] = None, silent: bool = False)

Determines ML features for a list of drug combinations.

This function determines the feature information (i.e., the joint profile) for each given drug combination. The feature information consists of four parts, two of which are optional:

  • sigma scores: indicative of the combined drug effect

  • delta scores: indicative of drug-unique effects

  • entropy scores: indicative of the combined entropy (optional)

  • time score: indicative of time difference between treatments (optional)
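
As a rough illustration of the first two items, the sketch below derives sigma and delta scores for a simultaneous combination from binarized profiles. Everything in it is an assumption made for illustration rather than the package's implementation: the helper names binarize_profile and joint_profile are hypothetical, and the strict threshold comparison and the 2/n scaling of sigma were chosen because they reproduce the values in the example outputs on this page, even though the thresholds parameter is described as inclusive. Entropy, time, and orthology handling are left out.

```python
def binarize_profile(values, thresholds=(-2, 2)):
    """Split a profile into negative and positive indicator halves."""
    low, high = thresholds
    # Assumption: strict comparisons, so a value equal to a threshold maps to 0.
    neg = [1 if v < low else 0 for v in values]
    pos = [1 if v > high else 0 for v in values]
    return neg + pos


def joint_profile(drugs, profiles, thresholds=(-2, 2)):
    """Return (sigma, delta) score lists for one drug combination."""
    binary = [binarize_profile(profiles[d], thresholds) for d in drugs]
    totals = [sum(column) for column in zip(*binary)]
    # Sigma: combined effect, scaled so that a pair yields the plain sum.
    sigma = [2 * t / len(drugs) for t in totals]
    # Delta: 1 where exactly one drug in the combination shows the effect.
    delta = [1 if t == 1 else 0 for t in totals]
    return sigma, delta


profiles = {'A': [1, 0, 1], 'B': [-2, 1.5, -0.5], 'C': [1, 2, 3]}
sigma, delta = joint_profile(['A', 'B', 'C'], profiles, thresholds=(-1, 1))
print([round(s, 6) for s in sigma])  # [0.666667, 0.0, 0.0, 0.0, 1.333333, 0.666667]
print(delta)                         # [1, 0, 0, 0, 0, 1]
```

Each returned list holds the negative-direction features first and the positive-direction features second, which matches the neg/pos row order used in the example outputs.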

Parameters
  • interactions (list) – A list of lists containing the drug names involved in a combination.

  • profiles (dict) – A dictionary of profile information for individual drug treatments.

  • feature_names (list, optional) – A list of feature names corresponding to the profile information (default is None).

  • key (list, optional) – A list of tuple pairs containing mapping information between drug names in interactions and profiles (default is None). The first element of each tuple must correspond to drug names in interactions. The second element of each tuple must exist in profiles.keys().

  • normalize (bool, optional) – Boolean flag to normalize drug profile data (default is False).

  • norm_method (str, optional) – A string specifying the normalization method to use; choose between ‘znorm’ or ‘minmax’ (default is ‘znorm’).

  • na_handle (float, optional) – A numeric value used for replacing any NaN values in profiles (default is 0.).

  • binarize (bool, optional) – Boolean flag to binarize drug profile data (default is True).

  • thresholds (tuple, optional) – A tuple of floating-point numbers indicative of (inclusive) thresholds for data binarization (default is (-2, 2)).

  • remove_zero_rows (bool, optional) – Boolean flag to remove all-zero rows from profile data (default is False).

  • entropy (bool, optional) – Boolean flag to determine entropy scores (default is False).

  • time (bool, optional) – Boolean flag to determine time score (default is False).

  • time_values (list, optional) – A list of time values to use for the time feature (default is None). Length must match length of interactions list.

  • strains (list, optional) – A list of strain names that correspond to interactions (default is None). Length must match length of interactions list.

  • orthology_map (dict, optional) – A dictionary of orthology information for each unique strain in strains (default is None). Key entries must match the unique strain names in strains.

  • silent (bool, optional) – Boolean flag to silence warnings (default is False).
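
The constraints that tie interactions, key, time_values, strains, and orthology_map together can be checked before calling featurize. The helper below is a hypothetical pre-check written for illustration (it is not part of indigopy), and it encodes only the constraints stated in the parameter descriptions above.

```python
def check_featurize_inputs(interactions, profiles, key=None,
                           time_values=None, strains=None, orthology_map=None):
    """Validate argument shapes and translate drug names through key."""
    # key pairs are (name used in interactions, name used in profiles).
    name_map = dict(key) if key is not None else {}
    for profile_name in name_map.values():
        assert profile_name in profiles, f"{profile_name!r} not in profiles"
    if time_values is not None:
        assert len(time_values) == len(interactions), "time_values length mismatch"
    if strains is not None:
        assert len(strains) == len(interactions), "strains length mismatch"
        if orthology_map is not None:
            missing = set(strains) - set(orthology_map)
            assert not missing, f"strains missing from orthology_map: {sorted(missing)}"
    # Names without a key entry are passed through unchanged.
    return [[name_map.get(drug, drug) for drug in combo] for combo in interactions]


interactions = [['A', 'B'], ['A', 'C']]
profiles_alt = {'Drug_A': [1, 0, 1], 'Drug_B': [-2, 1.5, -0.5], 'Drug_C': [1, 2, 3]}
key = [('A', 'Drug_A'), ('B', 'Drug_B'), ('C', 'Drug_C')]
print(check_featurize_inputs(interactions, profiles_alt, key=key))
# [['Drug_A', 'Drug_B'], ['Drug_A', 'Drug_C']]
```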

Returns

A dictionary of feature information for the given list of drug combinations.

Return type

dict

Raises
  • AssertionError – Raised when argument data dimensions do not correctly correspond to one another.

  • KeyError – Raised when a drug profile is not provided; bypassed with a warning indicating missing information.

  • TypeError – Raised when a given input type is incorrect.

  • ValueError – Raised when a given input value is incorrect.

Examples

Usage cases of the featurize function.

>>> interactions = [['A', 'B'], ['A', 'C'], ['B', 'C'], ['A', 'B', 'C']]
>>> profiles = {'A': [1, 0, 1], 'B': [-2, 1.5, -0.5], 'C': [1, 2, 3]}
>>> out = featurize(interactions, profiles)
>>> print(out['feature_df'])
                 A + B  A + C  B + C  A + B + C
sigma-neg-feat1      0      0      0          0
sigma-neg-feat2      0      0      0          0
sigma-neg-feat3      0      0      0          0
sigma-pos-feat1      0      0      0          0
sigma-pos-feat2      0      0      0          0
sigma-pos-feat3      0      1      1   0.666667
delta-neg-feat1      0      0      0          0
delta-neg-feat2      0      0      0          0
delta-neg-feat3      0      0      0          0
delta-pos-feat1      0      0      0          0
delta-pos-feat2      0      0      0          0
delta-pos-feat3      0      1      1          1

>>> feature_names = ['G1', 'G2', 'G3']
>>> out = featurize(interactions, profiles, feature_names=feature_names)
>>> print(out['feature_df'])
              A + B  A + C  B + C  A + B + C
sigma-neg-G1      0      0      0          0
sigma-neg-G2      0      0      0          0
sigma-neg-G3      0      0      0          0
sigma-pos-G1      0      0      0          0
sigma-pos-G2      0      0      0          0
sigma-pos-G3      0      1      1   0.666667
delta-neg-G1      0      0      0          0
delta-neg-G2      0      0      0          0
delta-neg-G3      0      0      0          0
delta-pos-G1      0      0      0          0
delta-pos-G2      0      0      0          0
delta-pos-G3      0      1      1          1

>>> profiles_alt = {'Drug_A': [1, 0, 1], 'Drug_B': [-2, 1.5, -0.5], 'Drug_C': [1, 2, 3]}
>>> key = [('A', 'Drug_A'), ('B', 'Drug_B'), ('C', 'Drug_C')]
>>> silent = True
>>> out = featurize(interactions, profiles_alt, key=key, silent=silent)
>>> print(out['feature_df'])
                 A + B  A + C  B + C  A + B + C
sigma-neg-feat1      0      0      0          0
sigma-neg-feat2      0      0      0          0
sigma-neg-feat3      0      0      0          0
sigma-pos-feat1      0      0      0          0
sigma-pos-feat2      0      0      0          0
sigma-pos-feat3      0      1      1   0.666667
delta-neg-feat1      0      0      0          0
delta-neg-feat2      0      0      0          0
delta-neg-feat3      0      0      0          0
delta-pos-feat1      0      0      0          0
delta-pos-feat2      0      0      0          0
delta-pos-feat3      0      1      1          1

>>> normalize, norm_method = True, 'minmax'
>>> out = featurize(interactions, profiles, normalize=normalize, norm_method=norm_method)
>>> print(out['feature_df'])
                 A + B  A + C  B + C  A + B + C
sigma-neg-feat1      0      0      0          0
sigma-neg-feat2      0      0      0          0
sigma-neg-feat3      0      0      0          0
sigma-pos-feat1      0      0      0          0
sigma-pos-feat2      0      0      0          0
sigma-pos-feat3      0      0      0          0
delta-neg-feat1      0      0      0          0
delta-neg-feat2      0      0      0          0
delta-neg-feat3      0      0      0          0
delta-pos-feat1      0      0      0          0
delta-pos-feat2      0      0      0          0
delta-pos-feat3      0      0      0          0
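
One way to read the all-zero result above: if min-max scaling maps every profile value into the range [0, 1], no value can pass the default binarization thresholds of (-2, 2). The sketch below uses the textbook definitions of min-max and z-score scaling on a single profile. The function names are hypothetical, and the scope over which indigopy normalizes (per drug, per feature, or over the whole matrix) is not stated on this page, so the per-profile scope and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def minmax(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant profile: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def znorm(values):
    """Center values on their mean and scale by the population std. dev."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:                       # constant profile: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


profile_c = [1, 2, 3]
scaled = minmax(profile_c)
print(scaled)                         # [0.0, 0.5, 1.0]
# With thresholds (-2, 2), nothing in a [0, 1] range is flagged.
print([v for v in scaled if v < -2 or v > 2])   # []
```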

>>> binarize, thresholds, remove_zero_rows = True, (-1, 1), True
>>> out = featurize(interactions, profiles, binarize=binarize, thresholds=thresholds, remove_zero_rows=remove_zero_rows)
>>> print(out['feature_df'])
                 A + B  A + C  B + C  A + B + C
sigma-neg-feat1      1      0      1   0.666667
sigma-pos-feat2      1      1      2   1.333333
sigma-pos-feat3      0      1      1   0.666667
delta-neg-feat1      1      0      1          1
delta-pos-feat2      1      1      0          0
delta-pos-feat3      0      1      1          1

>>> entropy, time, time_values = True, True, [[0, 0], [1, 1], [1, 2], [1, 2, 3]]
>>> out = featurize(interactions, profiles, entropy=entropy, time=time, time_values=time_values)
>>> print(out['feature_df'])
                     A + B     A -> C    B -> C  A -> B -> C
sigma-neg-feat1          0          0         0            0
sigma-neg-feat2          0          0         0            0
sigma-neg-feat3          0          0         0            0
sigma-pos-feat1          0          0         0            0
sigma-pos-feat2          0          0         0            0
sigma-pos-feat3          0          1         1     0.666667
delta-neg-feat1          0          0         0            0
delta-neg-feat2          0          0         0            0
delta-neg-feat3          0          0         0            0
delta-pos-feat1          0          0         0            0
delta-pos-feat2          0          0         0            0
delta-pos-feat3          0        0.5  0.666667          0.5
entropy-mean     0.0136995  -0.549306  0.563006   0.00913299
entropy-sum       0.027399   -1.09861   1.12601     0.027399
time                     0          1         1            3

>>> feature_names = ['G1', 'G2', 'G3']
>>> strains = ['MG1655', 'MG1655', 'MC1400', 'IAI1']
>>> orthology_map = {'MG1655': ['G1', 'G2'], 'MC1400': ['G1', 'G3'], 'IAI1': ['G1']}
>>> out = featurize(interactions, profiles, feature_names=feature_names, strains=strains, orthology_map=orthology_map)
>>> print(out['feature_df'])
              A + B  A + C  B + C  A + B + C
sigma-neg-G1      0      0      0          0
sigma-neg-G2      0      0      0          0
sigma-neg-G3      0      0      0          0
sigma-pos-G1      0      0      0          0
sigma-pos-G2      0      0      0          0
sigma-pos-G3      0      0      1          0
delta-neg-G1      0      0      0          0
delta-neg-G2      0      0      0          0
delta-neg-G3      0      0      0          0
delta-pos-G1      0      0      0          0
delta-pos-G2      0      0      0          0
delta-pos-G3      0      1      1          1

indigopy.core.load_sample(dataset: str)

Loads a sample dataset.

This function loads a dictionary containing data relevant to a sample organism. Currently supports data for Escherichia coli, Mycobacterium tuberculosis, Staphylococcus aureus, and Acinetobacter baumannii.

Parameters

dataset (str) – A string specifying the organism for which to load the sample data. Choose from ‘ecoli’, ‘mtb’, ‘saureus’, or ‘abaumannii’.

Returns

A dictionary object containing data for an organism of interest. Specifically:
  • For ‘ecoli’, the dictionary contains the following keys:
    • key: a dictionary for drug name mapping

    • profiles: a dictionary of drug profile data (i.e., chemogenomic data)

    • feature_names: a list of feature (i.e., gene) names associated with drug profile data

    • train: a dictionary for the train subset of the drug interaction data

    • test: a dictionary for the test subset of the drug interaction data

  • For ‘mtb’, the dictionary contains the following keys:
    • key: a dictionary for drug name mapping

    • profiles: a dictionary of drug profile data (i.e., transcriptomic data)

    • feature_names: a list of feature (i.e., gene) names associated with drug profile data

    • train: a dictionary for the train subset of the drug interaction data

    • test: a dictionary for the test subset of the drug interaction data

    • clinical: a dictionary for the clinical subset of the drug interaction data

  • For ‘saureus’, the dictionary contains the following keys:
    • key: a dictionary for drug name mapping

    • profiles: a dictionary of drug profile data (i.e., chemogenomic data)

    • feature_names: a list of feature (i.e., gene) names associated with drug profile data

    • train: a dictionary for the train subset of the drug interaction data

    • test: a dictionary for the test subset of the drug interaction data

    • orthology: a dictionary for the orthology data between E. coli and S. aureus

  • For ‘abaumannii’, the dictionary contains the following keys:
    • key: a dictionary for drug name mapping

    • profiles: a dictionary of drug profile data (i.e., chemogenomic data)

    • feature_names: a list of feature (i.e., gene) names associated with drug profile data

    • train: a dictionary for the train subset of the drug interaction data

    • test: a dictionary for the test subset of the drug interaction data

    • orthology: a dictionary for the orthology data between E. coli and A. baumannii

Return type

dict

Raises
  • TypeError – Raised when the input type is not a string.

  • ValueError – Raised when the function argument does not match accepted values (‘ecoli’, ‘mtb’, ‘saureus’, ‘abaumannii’).
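
The two error conditions above amount to a type check followed by a membership check. A minimal sketch follows; validate_dataset_name and ACCEPTED_DATASETS are hypothetical names used for illustration and are not taken from the package's code.

```python
# Dataset names listed in the documentation above.
ACCEPTED_DATASETS = ('ecoli', 'mtb', 'saureus', 'abaumannii')


def validate_dataset_name(dataset):
    """Return dataset unchanged if it is an accepted name, else raise."""
    if not isinstance(dataset, str):
        raise TypeError(f"dataset must be a string, got {type(dataset).__name__}")
    if dataset not in ACCEPTED_DATASETS:
        raise ValueError(f"dataset must be one of {ACCEPTED_DATASETS}, got {dataset!r}")
    return dataset


print(validate_dataset_name('mtb'))   # mtb
```

The type check runs first, so a non-string argument is reported as a TypeError rather than as an unrecognized name.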

Examples

Usage cases of the load_sample function.

>>> ecoli_data = load_sample('ecoli')
>>> print(ecoli_data['train']['interactions'][0])
['AMK', 'CEF']
>>> mtb_data = load_sample('mtb')
>>> print(mtb_data['clinical']['interactions'][0])
['EMBx', 'INH']
>>> saureus_data = load_sample('saureus')
>>> print(saureus_data['orthology']['map']['S_aureus'][0:3])
['b0002', 'b0003', 'b0007']
>>> abaumannii_data = load_sample('abaumannii')
>>> print(abaumannii_data['orthology']['map']['A_baumannii'][0:3])
['b0002', 'b0006', 'b0007']

Module contents