Overview

BioTransformer Tasks

BioTransformer can perform two tasks for any small molecule: metabolism prediction and metabolite identification, each described in its own section below. The molecule must be organic and must not be a mixture.

BioTransformer Options

The user can select one of seven options for the prediction or identification. Five options cover a specific aspect of metabolism: CYP450, EC-Based, Phase II, Gut microbial, and Environmental microbial. Two options provide comprehensive coverage of metabolism in the human superorganism: AllHuman and Superbio.

  1. CYP450: Select this option for CYP450 metabolism prediction.
  2. EC-Based: Select this option for the prediction of promiscuous metabolism (e.g. glycerolipid metabolism).
  3. Phase II: Select this option for the prediction of major conjugative reactions, including glucuronidation, sulfation, glycine transfer, N-acetyl transfer, and glutathione transfer, among others.
  4. Gut microbial: Select this option to predict xenobiotic metabolism by gut microbial enzymes.
  5. Environmental microbial: Select this option to predict or identify products of the environmental microbial degradation of small organic molecules.
  6. AllHuman: Select this option to predict or identify products of small molecule metabolism in the human superorganism. This covers biotransformations occurring both in human tissues and in the gut microbiota. Each step of the metabolism prediction/metabolite identification covers human as well as gut microbial transformations, when applicable.
  7. Superbio: Select this option to predict or identify products of small molecule metabolism in the human superorganism. This covers biotransformations occurring both in human tissues and in the gut microbiota. In contrast to the AllHuman option, BioTransformer follows a defined sequence of reaction types, starting from promiscuous metabolic reactions alone, followed by CYP450-catalyzed reactions, gut microbial degradation, and Phase II reactions.

Web Server Accessibility

The web server is freely accessible in two ways: users can submit queries and retrieve results either manually, through the web interface, or programmatically, as described on the following pages.

Fair usage policy

To ensure that resources are shared fairly, the number of starting compounds per query is limited to 1. Thus, the tab-separated structural input must have no more than 1 line, and the SD File (SDF) must contain no more than 1 compound. Moreover, the number of steps is limited to 1. Users who would like to submit more compounds at once or predict multi-step metabolism are asked to use the command-line executable available here.
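
The limits above can be checked on the client side before a query is sent. The following is a minimal sketch in Python and is not part of BioTransformer: the function names and constants are hypothetical, and it assumes that blank lines in the text input do not count as compounds and that every SDF record is terminated by the standard `$$$$` line.

```python
MAX_COMPOUNDS = 1  # web server limit on starting compounds per query
MAX_STEPS = 1      # web server limit on biotransformation steps


def count_text_compounds(text: str) -> int:
    """Count the non-empty lines of a tab-separated structure input."""
    return sum(1 for line in text.splitlines() if line.strip())


def count_sdf_compounds(sdf: str) -> int:
    """Count SDF records, assuming each one ends with a '$$$$' line."""
    return sum(1 for line in sdf.splitlines() if line.strip() == "$$$$")


def within_fair_usage(n_compounds: int, n_steps: int = 1) -> bool:
    """Return True if a query respects the web server limits."""
    return 1 <= n_compounds <= MAX_COMPOUNDS and 1 <= n_steps <= MAX_STEPS
```

A query that fails this check would have to be run with the command-line executable instead.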

Metabolism Prediction

The Input

In order to predict metabolism, the BioTransformer Metabolism Prediction Tool (BMPT) accepts five parameters as input, described as follows:

  1. The task type (required): The user must select the “Metabolism Prediction” task.
  2. The BioTransformer option (required): The user can select one of the options depending on the aspect of metabolism that is of interest. The prediction can be specific (CYP450, EC-Based, Phase II, Gut microbial, or Environmental microbial), or comprehensive (AllHuman or Superbio). For a detailed explanation of the different options, refer to the BioTransformer Options section.
  3. The structure input (required): The input structures for starting compounds can be submitted in two different ways:
    1. A tab-separated text: Each line contains either a (preferably isomeric) SMILES string or a standard InChI of a starting compound. The structural representation (SMILES or standard InChI) can be preceded by an identifier (e.g. a name or a database ID).
    2. An SDF file: The SDF can be uploaded, and must contain the exact structure of each starting compound. For each starting compound, the optional identifier should be inserted in the header, just before the corresponding connection table. For more information, please consult the following document.
  4. The Number of steps (optional): This determines the maximal number of biotransformation steps that will be predicted. For the sake of fair usage, it was limited to 1 on the web server.
  5. The Submission title (optional): This allows the user to specify a title for the query. Please keep in mind that queries are accessible from the web server only through their query ID, as returned by BioTransformer.
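
As an illustration of the tab-separated text input described above, the sketch below builds one input line from a structure and an optional identifier. The function name is hypothetical, and the code only assembles the string; it does not check that the SMILES or InChI is chemically valid.

```python
from typing import Optional


def prediction_input_line(structure: str, identifier: Optional[str] = None) -> str:
    """Build one line of tab-separated structure input.

    `structure` is a SMILES string or a standard InChI. When an identifier
    is given, it precedes the structure and is separated from it by a tab.
    """
    structure = structure.strip()
    if not structure:
        raise ValueError("a SMILES string or standard InChI is required")
    if identifier and identifier.strip():
        return f"{identifier.strip()}\t{structure}"
    return structure


line = prediction_input_line("CC(=O)NC1=CC=C(O)C=C1", "acetaminophen")
print(repr(line))
```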

The Output

By default, the query output is returned in HTML format and displayed on the web interface. Users can also display the results in JSON or SDF format, and download them as a single JSON, SDF, or CSV file.

Metabolite Identification

The Input

To submit a query for metabolite identification, please provide the input as described below:

  1. The task type (required): The user must select the “Metabolite Identification” task.
  2. The BioTransformer option (required): The user can select one of the options depending on the aspect of metabolism that is of interest. The prediction can be specific (CYP450, EC-Based, Phase II, Gut microbial, or Environmental microbial), or comprehensive (AllHuman or Superbio). For a detailed explanation of the different options, refer to the BioTransformer Options section.
  3. The structure input (required): The input structures for starting compounds can be submitted in two different ways:
    1. A tab-separated text: Each line contains either a (preferably isomeric) SMILES string or a standard InChI of a starting compound. The structural representation (SMILES or standard InChI) can be preceded by an identifier (e.g. a name or a database ID). Moreover, the structural representation must be followed by one of:
      1. The string “MASS”, followed by a semicolon-separated list of masses of the potential metabolites to search for and, optionally, a mass tolerance between 0 and 1. The default mass tolerance is 0.01 (see Fig. 1).
      2. The string “FORMULA”, followed by a semicolon-separated list of molecular formulas of the potential metabolites to search for (see Fig. 1).
    2. An SDF file: The SDF can be uploaded, and must contain the exact structure of each starting compound. For each starting compound, the optional identifier should be inserted in the header, just before the corresponding connection table. For more information, please consult the following document. Moreover, each SDF representation must contain additional parameters as described below (see Fig. 2):
      1. MASSES: a semicolon-separated list of masses of the potential metabolites to search for.
      2. MTOLERANCE: a decimal number between 0 and 1 that represents the allowed mass tolerance. This parameter is only needed when the identification uses the mass instead of the molecular formula.
      3. MFORMULAS: a semicolon-separated list of molecular formulas of the potential metabolites to search for.
  4. The Number of steps (optional): This determines the maximal number of biotransformation steps that will be predicted. For the sake of fair usage, it was limited to 1 on the web server.
  5. The Submission title (optional): This allows the user to specify a title for the query. Please keep in mind that queries are accessible from the web server only through their query ID, as returned by BioTransformer.
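
The text input for an identification query can be assembled in the same way as for a prediction query, with the extra MASS or FORMULA fields appended. The sketch below is an illustration only: the function name is hypothetical, and it assumes that all fields are separated by tabs (as in the curl example in the Programmatic Access section) and that a line carries either masses or formulas, not both.

```python
from typing import Optional, Sequence


def identification_input_line(
    structure: str,
    identifier: Optional[str] = None,
    masses: Optional[Sequence[float]] = None,
    formulas: Optional[Sequence[str]] = None,
    tolerance: Optional[float] = None,
) -> str:
    """Build one tab-separated line for a metabolite identification query."""
    if bool(masses) == bool(formulas):
        raise ValueError("provide exactly one of masses or formulas")
    fields = [identifier, structure] if identifier else [structure]
    if masses:
        # "MASS", then the semicolon-separated masses, then the optional tolerance.
        fields += ["MASS", ";".join(str(m) for m in masses)]
        if tolerance is not None:
            if not 0 <= tolerance <= 1:
                raise ValueError("mass tolerance must be between 0 and 1")
            fields.append(str(tolerance))
    else:
        # "FORMULA", then the semicolon-separated molecular formulas.
        fields += ["FORMULA", ";".join(formulas)]
    return "\t".join(fields)
```

When no tolerance is passed, the field is simply omitted, so the documented default of 0.01 applies.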

The Output

By default, the query output is returned in HTML format and displayed on the web interface. Users can also display the results in JSON or SDF format, and download them as a single JSON, SDF, or CSV file.

Fig. 1: Example of text input for a metabolite identification query.

Fig. 2: Example of SDF input for a metabolite identification query.

Programmatic Access

Resources

There is one endpoint for submitting queries and retrieving their results:
biotransformer.ca/queries.json

Submit query (POST)

POST: Submit a query to the BioTransformer application. On the production server, the number of POST requests is limited to 2 per minute. The attributes are listed below. Exactly one of the parameters ‘query_input’ or ‘fstruc’ must be provided with content.

  1. query_label (optional): any string for labeling the query.
  2. task_type (required): ‘PREDICTION’, ‘IDENTIFICATION’.
  3. biotransformer_option (required): ‘CYP450’, ‘EC-BASED’, ‘PHASEII’, ‘HGUT’, ‘ENVMICRO’, ‘ALLHUMAN’, ‘SUPERBIO’.
    The IDENTIFICATION task is restricted to the options ‘ENVMICRO’, ‘ALLHUMAN’, and ‘SUPERBIO’.
  4. query_input (optional): this option is used only if the user wants to enter a text input, as described in the ‘Metabolism Prediction’ and ‘Metabolite Identification’ sections. The tab spaces must be replaced by a tab (‘\t’) or space ('\s') character, and the newlines must be replaced by ‘\n’.
  5. fstruct (optional): The path to the SD File to be uploaded.
  6. number_of_steps (optional): It determines the maximal number of biotransformation steps that will be predicted. For the sake of fair usage, it is limited to 1 on the production web server (http://biotransformer.ca/).
The REST API can be called with curl using the following command:

curl -i -H "Content-Type: application/json" -H "Accept: application/json" http://biotransformer.ca/queries.json -X POST -d '{"biotransformer_option":{OPTION}, "number_of_steps":1, "query_input":{QUERY_INPUT} ,"task_type":{TASK TYPE}}'
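
The same request can also be prepared without curl. The sketch below, in Python with the standard library only, validates the attributes listed above and builds (but does not send) the POST request. The function names `build_payload` and `build_request` are hypothetical, and the checks are client-side assumptions derived from the attribute list; the server may apply additional rules.

```python
import json
import urllib.request
from typing import Optional

ENDPOINT = "http://biotransformer.ca/queries.json"
TASK_TYPES = {"PREDICTION", "IDENTIFICATION"}
OPTIONS = {"CYP450", "EC-BASED", "PHASEII", "HGUT", "ENVMICRO", "ALLHUMAN", "SUPERBIO"}
IDENTIFICATION_OPTIONS = {"ENVMICRO", "ALLHUMAN", "SUPERBIO"}


def build_payload(task_type: str, option: str, query_input: str,
                  number_of_steps: int = 1,
                  query_label: Optional[str] = None) -> dict:
    """Assemble the JSON body of a text-input query after basic checks."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    if option not in OPTIONS:
        raise ValueError(f"unknown biotransformer_option: {option!r}")
    if task_type == "IDENTIFICATION" and option not in IDENTIFICATION_OPTIONS:
        raise ValueError(f"{option!r} is not available for IDENTIFICATION")
    payload = {
        "biotransformer_option": option,
        "number_of_steps": number_of_steps,
        "query_input": query_input,
        "task_type": task_type,
    }
    if query_label is not None:
        payload["query_label"] = query_label
    return payload


def build_request(payload: dict) -> urllib.request.Request:
    """Wrap the payload in a POST request with the headers used by curl."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the query. `json.dumps` writes the tab in `query_input` as the escape sequence `\t`, which matches the form used in the curl commands.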

Examples of query submissions are shown below:

curl -i -H "Content-type: application/json" -H "Accept: application/json" http://biotransformer.ca/queries.json -X POST -d '{"biotransformer_option":"CYP450", "number_of_steps":1, "query_input":"acetaminophen\tCC(=O)NC1=CC=C(O)C=C1", "task_type":"PREDICTION" }'

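Where the JSON sent is:

{
  "biotransformer_option": "CYP450",
  "number_of_steps": 1,
  "query_input": "acetaminophen\tCC(=O)NC1=CC=C(O)C=C1",
  "task_type": "PREDICTION"
}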
curl -i -H "Content-type: application/json" -H "Accept: application/json" http://biotransformer.ca/queries.json -X POST -d '{"biotransformer_option":"ALLHUMAN", "number_of_steps":1, "query_input":"Epicatechin\tO[C@@H]1CC2=C(O)C=C(O)C=C2O[C@@H]1C1=CC=C(O)C(O)=C1\tMASS\t208.0735;126.03\t0.005", "task_type":"IDENTIFICATION"}'

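Where the JSON sent is:

{
  "biotransformer_option": "ALLHUMAN",
  "number_of_steps": 1,
  "query_input": "Epicatechin\tO[C@@H]1CC2=C(O)C=C(O)C=C2O[C@@H]1C1=CC=C(O)C(O)=C1\tMASS\t208.0735;126.03\t0.005",
  "task_type": "IDENTIFICATION"
}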

If the number of allowed requests is reached on the production server, the user will see the following message:
Fig. 1: POST request limit reached on the production server.
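
A script that submits several queries in a row can avoid this message by spacing its own POST requests. The sketch below is one possible client-side limiter for the stated limit of 2 requests per minute. The class name is hypothetical, and the sliding 60-second window is an assumption: how the server actually counts requests is not documented here.

```python
import time
from collections import deque


class PostThrottle:
    """Client-side limiter allowing at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls: int = 2, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._stamps = deque()  # send times of the most recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next POST is allowed (0.0 if none)."""
        now = self.clock()
        # Forget requests that have left the sliding window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self) -> None:
        """Register that a POST request has just been sent."""
        self._stamps.append(self.clock())
```

Before each submission, a caller would sleep for `wait_time()` seconds, send the request, and then call `record()`.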

Retrieve results (GET)

curl -H "Accept: application/json" -X GET http://biotransformer.ca/queries/{QUERY_NUMBER}.json

The query number is returned in the server response to the POST request (see image below):
Fig. 2: Example of a POST request, with the query ID highlighted.
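
The GET request can also be issued from a script that waits until the query has finished. The following is a minimal sketch using only the Python standard library. The function names, the polling interval, and the number of attempts are hypothetical, and the code assumes that the ‘status’ attribute of the result takes the values listed in the attribute tables of this section.

```python
import json
import time
import urllib.request

BASE_URL = "http://biotransformer.ca"


def results_url(query_id: int) -> str:
    """Return the URL of the JSON results for a given query id."""
    return f"{BASE_URL}/queries/{query_id}.json"


def is_finished(result: dict) -> bool:
    """True once the status is 'Done' or 'failed' (compared case-insensitively)."""
    return str(result.get("status", "")).lower() in {"done", "failed"}


def fetch_results(query_id: int, poll_seconds: float = 10.0, max_polls: int = 30) -> dict:
    """Poll the results URL until the query has finished or polling runs out."""
    request = urllib.request.Request(
        results_url(query_id), headers={"Accept": "application/json"}
    )
    for _ in range(max_polls):
        with urllib.request.urlopen(request) as response:
            result = json.load(response)
        if is_finished(result):
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```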



The attributes of the query results for prediction are:
  1. id: id of the query.
  2. label: null or the string specified in the input query.
  3. task: ‘PREDICTION’, ‘IDENTIFICATION’.
  4. biotransformer_option: ‘ENVMICRO’, ‘ALLHUMAN’, ‘SUPERBIO’.
  5. number_of_steps: maximal number of biotransformation steps. Limited to 1.
  6. status: ‘failed’, ‘In progress’, ‘Done’.
  7. number_of_starting_compounds: number of compounds in the entry. Limited to 1.
  8. total_prediction_time_in_ms: number of ms used for processing the biotransformation.
  9. invalid_compounds: array of invalid compounds submitted by the user.
  10. predictions: the predicted metabolites produced by the BioTransformer processing.
  11. number_of_unique_metabolites: number of unique metabolites.
  12. prediction_errors: errors that occurred during the prediction.
A JSON example of the query results is shown below:


The attributes of the query results for identification are:
  1. id: id of the query.
  2. label: null or the string specified in the input query.
  3. task: ‘PREDICTION’, ‘IDENTIFICATION’.
  4. biotransformer_option: ‘ENVMICRO’, ‘ALLHUMAN’, ‘SUPERBIO’.
  5. number_of_steps: maximal number of biotransformation steps. Limited to 1.
  6. status: ‘failed’, ‘In progress’, ‘Done’.
  7. number_of_starting_compounds: number of compounds in the entry. Limited to 1.
  8. total_prediction_time_in_ms: number of ms used for processing the biotransformation.
  9. number_of_identified_metabolites: number of metabolites identified.
  10. invalid_compounds: array of invalid compounds submitted by the user.
  11. metabolites: the identified metabolites produced by the BioTransformer processing.
  12. number_of_unique_metabolites: number of unique metabolites.
  13. prediction_errors: errors that occurred during the prediction.
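
Once a result has been decoded from JSON, the attributes listed above can be read like any dictionary. The sketch below picks out a few of them and selects ‘predictions’ or ‘metabolites’ depending on the task. The function name is hypothetical, and the sample dictionary is hand-written for illustration only; it is not real server output.

```python
def summarize_result(result: dict) -> dict:
    """Pick a few documented summary fields out of a decoded query result."""
    # Prediction results carry "predictions"; identification results carry "metabolites".
    is_prediction = result.get("task") == "PREDICTION"
    entries_key = "predictions" if is_prediction else "metabolites"
    return {
        "id": result.get("id"),
        "status": result.get("status"),
        "entries_key": entries_key,
        "has_entries": bool(result.get(entries_key)),
        "unique_metabolites": result.get("number_of_unique_metabolites"),
        "has_errors": bool(result.get("prediction_errors")),
    }


# Hypothetical, hand-written result used only to exercise the function.
sample = {
    "id": 1,
    "task": "IDENTIFICATION",
    "status": "Done",
    "metabolites": [],
    "number_of_unique_metabolites": 0,
    "prediction_errors": None,
}
print(summarize_result(sample))
```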
A JSON example of the query results is shown below: