myChEMBL webservices version 2.x

myChEMBL team, ChEMBL Group, EMBL-EBI.

Introduction

This notebook will provide some examples of using the myChEMBL webservices.

The web services have recently been updated to the 2.x version and are not backwards compatible. The main features introduced by this latest version are:

  • more resources
  • filtering
  • paging
  • ordering

You can call the web services in the following two ways:

1) Directly via URLs (see the 'Web Services' link on the myChEMBL LaunchPad for a list of the available endpoints). The advantage of using the URLs is that it is language-agnostic: although the examples below use Python, any other language with a library for executing HTTP requests would do just as well.

2) Using the API provided by the Python package chembl_webresource_client. This has the following advantages:

  • the usage is simpler
  • some extra functionality is available
  • there are performance benefits

For the reasons above, we recommend using the API where possible.

Note that the chembl_webresource_client module is already installed on the myChEMBL VM; if you wish to use it on other machines, it can be installed using pip.

Please note that the code below attempts to balance clarity and brevity, and is not intended to be a template for production code: error checking, for example, should be much more thorough in practice.

Configuration and setup

In [3]:
from collections import Counter
from operator import itemgetter
from lxml import etree
from chembl_webresource_client.new_client import new_client

List of available resources

It's easy to get a list of available resources by invoking:

In [4]:
available_resources = [resource for resource in dir(new_client) if not resource.startswith('_')]
print available_resources
print len(available_resources)
['activity', 'assay', 'atc_class', 'binding_site', 'biotherapeutic', 'cell_line', 'chembl_id_lookup', 'description', 'document', 'image', 'mechanism', 'molecule', 'molecule_form', 'official', 'protein_class', 'similarity', 'source', 'substructure', 'target', 'target_component']
20

This means there are 20 different types of resources available via the web services. Only the most important of these are covered in this notebook.

Molecules

Molecule records may be retrieved in a number of ways, such as lookup of single molecules using various identifiers or searching for compounds via substructure or similarity.

In [5]:
# Get a molecule-handler object for API access and check the connection to the database...

molecule = new_client.molecule
molecule.set_format('json')
print "%s molecules available in myChEMBL_20" % len(molecule.all())
1463270 molecules available in myChEMBL_20

Getting a single molecule

In order to retrieve a single molecule from the web services, you need to know its unique and unambiguous identifier. For the molecule resource, this can be one of three types:

  1. ChEMBL_ID
  2. InChI Key
  3. Canonical SMILES (non-canonical SMILES will be covered later in this notebook)
In [6]:
# so this:
# 1.
m1 = molecule.get('CHEMBL25')
# 2.
m2 = molecule.get('BSYNRYMUTXBXSQ-UHFFFAOYSA-N')
# 3.
m3 = molecule.get('CC(=O)Oc1ccccc1C(=O)O')
# will return the same data:
m1 == m2 == m3
Out[6]:
False
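
The three identifier types can usually be told apart by their shape alone. The following is a minimal sketch, not part of the client: `identifier_type` is a hypothetical helper, and the two patterns are assumptions based on the public identifier formats ('CHEMBL' followed by digits; a standard InChI Key of 14, 10 and 1 upper-case letters separated by hyphens). Anything else is simply treated as SMILES without any chemical validation.

```python
import re

# Assumed patterns for the two fixed-format identifier types.
CHEMBL_ID_RE = re.compile(r'^CHEMBL\d+$')
INCHI_KEY_RE = re.compile(r'^[A-Z]{14}-[A-Z]{10}-[A-Z]$')


def identifier_type(identifier):
    """Guess which of the three identifier types a string is (hypothetical helper)."""
    if CHEMBL_ID_RE.match(identifier):
        return 'chembl_id'
    if INCHI_KEY_RE.match(identifier):
        return 'inchi_key'
    # Fallback: treat everything else as SMILES; no chemistry check is performed.
    return 'smiles'


for example in ('CHEMBL25', 'BSYNRYMUTXBXSQ-UHFFFAOYSA-N', 'CC(=O)Oc1ccccc1C(=O)O'):
    print('%s -> %s' % (example, identifier_type(example)))
```

A check like this can be useful before a batch query, where all identifiers in the list must be of the same type.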

ChEMBL ID

All the main entities in the ChEMBL database have a ChEMBL ID. It is a stable identifier designed for straightforward lookup of data.

In [7]:
# Lapatinib, the bioactive component of the anti-cancer drug Tykerb

chembl_id = "CHEMBL554" 
In [8]:
# Get compound record using client...

record_via_client = molecule.get(chembl_id)

record_via_client
Out[8]:
{u'atc_classifications': [u'L01XE07'],
 u'availability_type': u'1',
 u'biotherapeutic': None,
 u'black_box_warning': u'1',
 u'chebi_par_id': 49603,
 u'chirality': u'2',
 u'dosed_ingredient': False,
 u'first_approval': 2007,
 u'first_in_class': u'0',
 u'helm_notation': None,
 u'indication_class': None,
 u'inorganic_flag': u'0',
 u'max_phase': 4,
 u'molecule_chembl_id': u'CHEMBL554',
 u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL554',
  u'parent_chembl_id': u'CHEMBL554'},
 u'molecule_properties': {u'acd_logd': u'6.26',
  u'acd_logp': u'6.30',
  u'acd_most_apka': None,
  u'acd_most_bpka': u'6.34',
  u'alogp': u'6.04',
  u'aromatic_rings': 5,
  u'full_molformula': u'C29H26ClFN4O4S',
  u'full_mwt': u'581.06',
  u'hba': 7,
  u'hbd': 2,
  u'heavy_atoms': 40,
  u'med_chem_friendly': u'Y',
  u'molecular_species': u'NEUTRAL',
  u'mw_freebase': u'581.06',
  u'mw_monoisotopic': u'580.1347',
  u'num_alerts': 1,
  u'num_ro5_violations': 2,
  u'psa': u'114.72',
  u'qed_weighted': u'0.18',
  u'ro3_pass': u'N',
  u'rtb': 11},
 u'molecule_structures': {u'canonical_smiles': u'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2',
  u'standard_inchi': u'InChI=1S/C29H26ClFN4O4S/c1-40(36,37)12-11-32-16-23-7-10-27(39-23)20-5-8-26-24(14-20)29(34-18-33-26)35-22-6-9-28(25(30)15-22)38-17-19-3-2-4-21(31)13-19/h2-10,13-15,18,32H,11-12,16-17H2,1H3,(H,33,34,35)',
  u'standard_inchi_key': u'BCFGMOOMADDAQU-UHFFFAOYSA-N'},
 u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Lapatinib'},
  {u'syn_type': u'INN', u'synonyms': u'Lapatinib'},
  {u'syn_type': u'OTHER', u'synonyms': u'Lapatinib'},
  {u'syn_type': u'TRADE_NAME', u'synonyms': u'Tykerb'}],
 u'molecule_type': u'Small molecule',
 u'natural_product': u'0',
 u'oral': True,
 u'parenteral': False,
 u'polymer_flag': False,
 u'pref_name': u'LAPATINIB',
 u'prodrug': u'0',
 u'structure_type': u'MOL',
 u'therapeutic_flag': True,
 u'topical': False,
 u'usan_stem': None,
 u'usan_stem_definition': None,
 u'usan_substem': None,
 u'usan_year': 2003}

As noted above, URLs may also be used to access the data, and, although the examples here use Python, any other language with a library for executing HTTP requests would do as well.

In [10]:
# Import a Python module to allow URL-based access...

import requests
from urllib import quote

# Stem of URL for the web services...

url_stem = "https://www.ebi.ac.uk/chembl/api/data"
In [11]:
# Note that, for historical reasons, the URL-based webservices return XML by default, so JSON
# must be requested explicitly by appending '.json' to the URL.

# Get request object...
url = url_stem + "/molecule/" + chembl_id + ".json"
request = requests.get(url)

print url

# Check request status: should be 200 if everything went OK...
print request.status_code
https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL554.json
200
In [12]:
record_via_url = request.json()
record_via_url 
Out[12]:
{u'atc_classifications': [u'L01XE07'],
 u'availability_type': u'1',
 u'biotherapeutic': None,
 u'black_box_warning': u'1',
 u'chebi_par_id': 49603,
 u'chirality': u'2',
 u'dosed_ingredient': False,
 u'first_approval': 2007,
 u'first_in_class': u'0',
 u'helm_notation': None,
 u'indication_class': None,
 u'inorganic_flag': u'0',
 u'max_phase': 4,
 u'molecule_chembl_id': u'CHEMBL554',
 u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL554',
  u'parent_chembl_id': u'CHEMBL554'},
 u'molecule_properties': {u'acd_logd': u'6.26',
  u'acd_logp': u'6.30',
  u'acd_most_apka': None,
  u'acd_most_bpka': u'6.34',
  u'alogp': u'6.04',
  u'aromatic_rings': 5,
  u'full_molformula': u'C29H26ClFN4O4S',
  u'full_mwt': u'581.06',
  u'hba': 7,
  u'hbd': 2,
  u'heavy_atoms': 40,
  u'med_chem_friendly': u'Y',
  u'molecular_species': u'NEUTRAL',
  u'mw_freebase': u'581.06',
  u'mw_monoisotopic': u'580.1347',
  u'num_alerts': 1,
  u'num_ro5_violations': 2,
  u'psa': u'114.72',
  u'qed_weighted': u'0.18',
  u'ro3_pass': u'N',
  u'rtb': 11},
 u'molecule_structures': {u'canonical_smiles': u'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2',
  u'standard_inchi': u'InChI=1S/C29H26ClFN4O4S/c1-40(36,37)12-11-32-16-23-7-10-27(39-23)20-5-8-26-24(14-20)29(34-18-33-26)35-22-6-9-28(25(30)15-22)38-17-19-3-2-4-21(31)13-19/h2-10,13-15,18,32H,11-12,16-17H2,1H3,(H,33,34,35)',
  u'standard_inchi_key': u'BCFGMOOMADDAQU-UHFFFAOYSA-N'},
 u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Lapatinib'},
  {u'syn_type': u'INN', u'synonyms': u'Lapatinib'},
  {u'syn_type': u'OTHER', u'synonyms': u'Lapatinib'},
  {u'syn_type': u'TRADE_NAME', u'synonyms': u'Tykerb'}],
 u'molecule_type': u'Small molecule',
 u'natural_product': u'0',
 u'oral': True,
 u'parenteral': False,
 u'polymer_flag': False,
 u'pref_name': u'LAPATINIB',
 u'prodrug': u'0',
 u'structure_type': u'MOL',
 u'therapeutic_flag': True,
 u'topical': False,
 u'usan_stem': None,
 u'usan_stem_definition': None,
 u'usan_substem': None,
 u'usan_year': 2003}

Note that in both cases we are getting exactly the same results:

In [13]:
record_via_client == record_via_url
Out[13]:
True

When retrieved in JSON format, a record is a nested dictionary, so to get, say, a SMILES string we have to write:

In [14]:
smiles_from_json = record_via_client['molecule_structures']['canonical_smiles']
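
Some fields in a record can be None (for example `biotherapeutic` in the record above), in which case chained indexing raises a TypeError. A minimal sketch of a more defensive lookup follows; `get_nested` is a hypothetical helper, and `sample_record` is a cut-down copy of the record shown earlier, so no web service call is needed to try it.

```python
def get_nested(record, *keys):
    """Walk nested dictionaries, returning None if any level is missing or None."""
    current = record
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


# A cut-down record with the same shape as the JSON output above.
sample_record = {
    'molecule_chembl_id': 'CHEMBL554',
    'molecule_structures': {
        'canonical_smiles': 'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2',
    },
    'biotherapeutic': None,
}

# Present path: returns the SMILES string.
print(get_nested(sample_record, 'molecule_structures', 'canonical_smiles'))

# Path through a None value: returns None instead of raising.
print(get_nested(sample_record, 'biotherapeutic', 'helm_notation'))
```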

It is possible to retrieve data in XML format as well:

In [15]:
# Get compound record in XML format...

molecule.set_format('xml')
xml = molecule.get(chembl_id).encode('utf-8')
#print xml
# The XML must be parsed (e.g. using the lxml.etree module in Python) to enable extraction of the data...

root = etree.fromstring(xml).getroottree()
In [16]:
# Extract SMILES via xpath...

smiles_from_xml = root.xpath("/molecule/molecule_structures/canonical_smiles/text()")[0]

print smiles_from_xml
print smiles_from_xml == smiles_from_json
CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2
True
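
If lxml is not available, the standard library can extract the same values. This is a sketch under the assumption that only simple element paths are needed (ElementTree supports a limited subset of XPath); `xml_fragment` is a shortened, hand-copied piece of the molecule record rather than a live response.

```python
import xml.etree.ElementTree as ET

# A shortened fragment using the same element names as the molecule record.
xml_fragment = """<molecule>
  <molecule_chembl_id>CHEMBL554</molecule_chembl_id>
  <molecule_structures>
    <canonical_smiles>CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2</canonical_smiles>
    <standard_inchi_key>BCFGMOOMADDAQU-UHFFFAOYSA-N</standard_inchi_key>
  </molecule_structures>
</molecule>"""

root_element = ET.fromstring(xml_fragment)

# Paths are relative to the root <molecule> element.
smiles_text = root_element.findtext('molecule_structures/canonical_smiles')
key_text = root_element.findtext('molecule_structures/standard_inchi_key')

# findtext returns None for an element that does not exist.
missing_text = root_element.findtext('molecule_structures/no_such_element')

print(smiles_text)
print(key_text)
```
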
In [17]:
# Pretty-print XML...

print etree.tostring(root, pretty_print=True)
<molecule>
  <atc_classifications>
    <level5>L01XE07</level5>
  </atc_classifications>
  <availability_type>1</availability_type>
  <biotherapeutic/>
  <black_box_warning>1</black_box_warning>
  <chebi_par_id>49603</chebi_par_id>
  <chirality>2</chirality>
  <dosed_ingredient/>
  <first_approval>2007</first_approval>
  <first_in_class>0</first_in_class>
  <helm_notation/>
  <indication_class/>
  <inorganic_flag>0</inorganic_flag>
  <max_phase>4</max_phase>
  <molecule_chembl_id>CHEMBL554</molecule_chembl_id>
  <molecule_hierarchy>
    <molecule_chembl_id>CHEMBL554</molecule_chembl_id>
    <parent_chembl_id>CHEMBL554</parent_chembl_id>
  </molecule_hierarchy>
  <molecule_properties>
    <acd_logd>6.26</acd_logd>
    <acd_logp>6.30</acd_logp>
    <acd_most_apka/>
    <acd_most_bpka>6.34</acd_most_bpka>
    <alogp>6.04</alogp>
    <aromatic_rings>5</aromatic_rings>
    <full_molformula>C29H26ClFN4O4S</full_molformula>
    <full_mwt>581.06</full_mwt>
    <hba>7</hba>
    <hbd>2</hbd>
    <heavy_atoms>40</heavy_atoms>
    <med_chem_friendly>Y</med_chem_friendly>
    <molecular_species>NEUTRAL</molecular_species>
    <mw_freebase>581.06</mw_freebase>
    <mw_monoisotopic>580.1347</mw_monoisotopic>
    <num_alerts>1</num_alerts>
    <num_ro5_violations>2</num_ro5_violations>
    <psa>114.72</psa>
    <qed_weighted>0.18</qed_weighted>
    <ro3_pass>N</ro3_pass>
    <rtb>11</rtb>
  </molecule_properties>
  <molecule_structures>
    <canonical_smiles>CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2</canonical_smiles>
    <standard_inchi>InChI=1S/C29H26ClFN4O4S/c1-40(36,37)12-11-32-16-23-7-10-27(39-23)20-5-8-26-24(14-20)29(34-18-33-26)35-22-6-9-28(25(30)15-22)38-17-19-3-2-4-21(31)13-19/h2-10,13-15,18,32H,11-12,16-17H2,1H3,(H,33,34,35)</standard_inchi>
    <standard_inchi_key>BCFGMOOMADDAQU-UHFFFAOYSA-N</standard_inchi_key>
  </molecule_structures>
  <molecule_synonyms>
    <synonym>
      <syn_type>FDA</syn_type>
      <synonyms>Lapatinib</synonyms>
    </synonym>
    <synonym>
      <syn_type>INN</syn_type>
      <synonyms>Lapatinib</synonyms>
    </synonym>
    <synonym>
      <syn_type>OTHER</syn_type>
      <synonyms>Lapatinib</synonyms>
    </synonym>
    <synonym>
      <syn_type>TRADE_NAME</syn_type>
      <synonyms>Tykerb</synonyms>
    </synonym>
  </molecule_synonyms>
  <molecule_type>Small molecule</molecule_type>
  <natural_product>0</natural_product>
  <oral>True</oral>
  <parenteral/>
  <polymer_flag/>
  <pref_name>LAPATINIB</pref_name>
  <prodrug>0</prodrug>
  <structure_type>MOL</structure_type>
  <therapeutic_flag>True</therapeutic_flag>
  <topical/>
  <usan_stem/>
  <usan_stem_definition/>
  <usan_substem/>
  <usan_year>2003</usan_year>
</molecule>

InChIKey

Compound records may also be retrieved via InChI Key lookup.

In [19]:
# InChI Key for Lapatinib
inchi_key = "BCFGMOOMADDAQU-UHFFFAOYSA-N"

# getting molecule via client
molecule.set_format('json')
record_via_client = molecule.get(inchi_key)

# getting molecule via url
url = url_stem + "/molecule/" + inchi_key + ".json"
record_via_url = requests.get(url).json()

print url

# they are the same
print record_via_url == record_via_client
https://www.ebi.ac.uk/chembl/api/data/molecule/BCFGMOOMADDAQU-UHFFFAOYSA-N.json
True

SMILES

Compound records may also be retrieved via SMILES lookup.

The purpose of the get method is to return objects identified by their unique and unambiguous properties. This is why SMILES provided as arguments to the get method need to be canonical. You can still search for molecules using non-canonical SMILES; this functionality will be covered later in this notebook.

In [21]:
# Canonical SMILES for Lapatinib
canonical_smiles = "CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2"

# getting molecule via client
molecule.set_format('json')
record_via_client = molecule.get(canonical_smiles)

# getting molecule via url
url = url_stem + "/molecule/" + quote(canonical_smiles) + ".json"
record_via_url = requests.get(url).json()

print url

# they are the same
record_via_url == record_via_client
https://www.ebi.ac.uk/chembl/api/data/molecule/CS%28%3DO%29%28%3DO%29CCNCc1oc%28cc1%29c2ccc3ncnc%28Nc4ccc%28OCc5cccc%28F%29c5%29c%28Cl%29c4%29c3c2.json
Out[21]:
True
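
The URL construction used for all three identifier types can be wrapped in one small function. This is a minimal sketch: `record_url` is a hypothetical helper, not part of the client, and the import fallback is only there so the snippet also runs under Python 3. Percent-encoding is harmless for ChEMBL IDs and InChI Keys and is required for SMILES, which contain characters such as parentheses and '='.

```python
try:
    from urllib import quote  # Python 2, as used in this notebook
except ImportError:
    from urllib.parse import quote  # Python 3


def record_url(stem, resource, identifier, fmt='json'):
    """Build a single-record URL, percent-encoding the identifier (hypothetical helper)."""
    return '%s/%s/%s.%s' % (stem.rstrip('/'), resource, quote(identifier), fmt)


stem = 'https://www.ebi.ac.uk/chembl/api/data'
lapatinib_smiles = 'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2'

print(record_url(stem, 'molecule', 'CHEMBL554'))
print(record_url(stem, 'molecule', 'BCFGMOOMADDAQU-UHFFFAOYSA-N'))
print(record_url(stem, 'molecule', lapatinib_smiles))
```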

Batch queries

Multiple records may be requested at once. The get method can accept a list of homogeneous identifiers, i.e. identifiers that are all of the same type.

In [22]:
records1 = molecule.get(['CHEMBL6498', 'CHEMBL6499', 'CHEMBL6505'])
records2 = molecule.get(['XSQLHVPPXBBUPP-UHFFFAOYSA-N', 'JXHVRXRRSSBGPY-UHFFFAOYSA-N', 'TUHYVXGNMOGVMR-GASGPIRDSA-N'])
records3 = molecule.get(['CNC(=O)c1ccc(cc1)N(CC#C)Cc2ccc3nc(C)nc(O)c3c2',
            'Cc1cc2SC(C)(C)CC(C)(C)c2cc1\\N=C(/S)\\Nc3ccc(cc3)S(=O)(=O)N',
            'CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H]3CCCN3C(=O)C(CCCCN)CCCCN)C(C)(C)C)C(=O)O'])
records1 == records2 == records3
Out[22]:
True

The same can be done via URLs:

In [23]:
url1 = url_stem + "/molecule/set/%s;%s;%s" % ('CHEMBL6498', 'CHEMBL6499', 'CHEMBL6505') + ".json"
records1 = requests.get(url1).json()

url2 = url_stem + "/molecule/set/%s;%s;%s" % ('XSQLHVPPXBBUPP-UHFFFAOYSA-N', 'JXHVRXRRSSBGPY-UHFFFAOYSA-N', 'TUHYVXGNMOGVMR-GASGPIRDSA-N') + ".json"
records2 = requests.get(url2).json()

url3 = url_stem + "/molecule/set/%s;%s;%s" % (quote('CNC(=O)c1ccc(cc1)N(CC#C)Cc2ccc3nc(C)nc(O)c3c2'),
            quote('Cc1cc2SC(C)(C)CC(C)(C)c2cc1\\N=C(/S)\\Nc3ccc(cc3)S(=O)(=O)N'),
            quote('CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H]3CCCN3C(=O)C(CCCCN)CCCCN)C(C)(C)C)C(=O)O')) + ".json"
records3 = requests.get(url3).json()

print url1
print url2
print url3

records1 == records2 == records3
https://www.ebi.ac.uk/chembl/api/data/molecule/set/CHEMBL6498;CHEMBL6499;CHEMBL6505.json
https://www.ebi.ac.uk/chembl/api/data/molecule/set/XSQLHVPPXBBUPP-UHFFFAOYSA-N;JXHVRXRRSSBGPY-UHFFFAOYSA-N;TUHYVXGNMOGVMR-GASGPIRDSA-N.json
https://www.ebi.ac.uk/chembl/api/data/molecule/set/CNC%28%3DO%29c1ccc%28cc1%29N%28CC%23C%29Cc2ccc3nc%28C%29nc%28O%29c3c2;Cc1cc2SC%28C%29%28C%29CC%28C%29%28C%29c2cc1%5CN%3DC%28/S%29%5CNc3ccc%28cc3%29S%28%3DO%29%28%3DO%29N;CC%28C%29C%5BC%40H%5D%28NC%28%3DO%29%5BC%40%40H%5D%28NC%28%3DO%29%5BC%40H%5D%28Cc1c%5BnH%5Dc2ccccc12%29NC%28%3DO%29%5BC%40H%5D3CCCN3C%28%3DO%29C%28CCCCN%29CCCCN%29C%28C%29%28C%29C%29C%28%3DO%29O.json
Out[23]:
True

Please note that a URL cannot be longer than 4000 characters, so the URL-based approach should not be used for very long lists of identifiers. The molecule.get call also needs to be modified slightly in that case.

In [24]:
# Generate a list of 300 ChEMBL IDs (N.B. not all will be valid)...

chembl_ids = ['CHEMBL{}'.format(x) for x in range(1, 301)]

# Get compound records, note `molecule_chembl_id` named parameter.
# Named parameters should always be used for longer lists

records = molecule.get(molecule_chembl_id=chembl_ids)

len(records)
Out[24]:
169

Note that we expect to see fewer than 300 records (here, 169). This is because some identifiers in the range CHEMBL1, ..., CHEMBL300 have no molecule mapped to them.
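
If URLs must be used for a long list anyway, one option is to split the list into several requests, each short enough to stay within the 4000-character limit mentioned above. The sketch below only builds the batches and URLs; it makes no requests. `chunk_ids_for_url` is a hypothetical helper, and identifiers are assumed to be already percent-encoded where necessary.

```python
def chunk_ids_for_url(prefix, identifiers, suffix='.json', max_length=4000):
    """Split identifiers into ';'-joined batches whose URLs stay within max_length."""
    batches, current = [], []
    for identifier in identifiers:
        candidate = current + [identifier]
        url = prefix + ';'.join(candidate) + suffix
        if current and len(url) > max_length:
            # Adding this identifier would overflow: close the current batch.
            batches.append(current)
            current = [identifier]
        else:
            # Fits (or is a lone identifier, which is kept even if too long).
            current = candidate
    if current:
        batches.append(current)
    return batches


prefix = 'https://www.ebi.ac.uk/chembl/api/data/molecule/set/'
many_ids = ['CHEMBL{}'.format(x) for x in range(1, 1001)]

batches = chunk_ids_for_url(prefix, many_ids)
urls = [prefix + ';'.join(batch) + '.json' for batch in batches]

print(len(batches))
print(max(len(url) for url in urls))
```

Each URL in `urls` could then be fetched separately and the returned lists concatenated.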

Filtering

All resources available through ChEMBL web services can be filtered. Some examples of filtering applied to molecules:

  1. Get all approved drugs
  2. Get all molecules in ChEMBL with no Rule-of-Five violations
  3. Get all biotherapeutic molecules
  4. Return molecules with molecular weight <= 300
  5. Return molecules with molecular weight <= 300 AND pref_name ends with -nib
In [25]:
# First, filtering using the client:

# 1. Get all approved drugs
approved_drugs = molecule.filter(max_phase=4)

# 2. Get all molecules in ChEMBL with no Rule-of-Five violations
no_violations = molecule.filter(molecule_properties__num_ro5_violations=0)

# 3. Get all biotherapeutic molecules
biotherapeutics = molecule.filter(biotherapeutic__isnull=False)

# 4. Return molecules with molecular weight <= 300
light_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300)

# 5. Return molecules with molecular weight <= 300 AND pref_name ends with nib
light_nib_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300).filter(pref_name__iendswith="nib")
In [26]:
# Secondly, filtering using the URL endpoint:

# 1. Get all approved drugs
url_1 = url_stem + "/molecule.json?max_phase=4"
url_approved_drugs = requests.get(url_1).json()

# 2. Get all molecules in ChEMBL with no Rule-of-Five violations
url_2 = url_stem + "/molecule.json?molecule_properties__num_ro5_violations=0"
ulr_no_violations = requests.get(url_2).json()

# 3. Get all biotherapeutic molecules
url_3 = url_stem + "/molecule.json?biotherapeutic__isnull=false"
url_biotherapeutics = requests.get(url_3).json()

# 4. Return molecules with molecular weight <= 300
url_4 = url_stem + "/molecule.json?molecule_properties__mw_freebase__lte=300"
url_light_molecules = requests.get(url_4).json()

# 5. Return molecules with molecular weight <= 300 AND pref_name ends with nib
url_5 = url_stem + "/molecule.json?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib"
url_light_nib_molecules = requests.get(url_5).json()

print url_1
print url_2
print url_3
print url_4
print url_5
https://www.ebi.ac.uk/chembl/api/data/molecule.json?max_phase=4
https://www.ebi.ac.uk/chembl/api/data/molecule.json?molecule_properties__num_ro5_violations=0
https://www.ebi.ac.uk/chembl/api/data/molecule.json?biotherapeutic__isnull=false
https://www.ebi.ac.uk/chembl/api/data/molecule.json?molecule_properties__mw_freebase__lte=300
https://www.ebi.ac.uk/chembl/api/data/molecule.json?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib
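
Rather than concatenating query strings by hand, the filter parameters can be encoded with the standard library. This is a minimal sketch: `filter_url` is a hypothetical helper, the parameter names are the ones used in the cells above, and the import fallback is only there so the snippet also runs under Python 3.

```python
try:
    from urllib import urlencode  # Python 2, as used in this notebook
except ImportError:
    from urllib.parse import urlencode  # Python 3


def filter_url(stem, resource, filters, fmt='json'):
    """Build a filtered-list URL from (name, value) pairs (hypothetical helper)."""
    return '%s/%s.%s?%s' % (stem.rstrip('/'), resource, fmt, urlencode(filters))


stem = 'https://www.ebi.ac.uk/chembl/api/data'

# A list of pairs keeps the parameters in a predictable order.
light_nib_filters = [
    ('molecule_properties__mw_freebase__lte', 300),
    ('pref_name__iendswith', 'nib'),
]

print(filter_url(stem, 'molecule', light_nib_filters))
```

Using urlencode also takes care of escaping values that contain characters such as spaces or '&'.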

Differences between filtering with the client and the URL endpoint - paging

There are some important differences between the filtering results returned by the client and those generated using the URL endpoint. Let's have a look at them.

In [27]:
# First off, they are not the same thing:
print approved_drugs == url_approved_drugs

# Not surprisingly, the URL endpoint produced JSON data, which has been parsed into a Python dict:
print type(url_approved_drugs)

# Whereas the client has returned an object of type `QuerySet`
print type(approved_drugs)
False
<type 'dict'>
<class 'chembl_webresource_client.query_set.QuerySet'>
In [28]:
# Let's examine what data the Python dict contains:
url_approved_drugs
Out[28]:
{u'molecules': [{u'atc_classifications': [u'J05AF04'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'1',
   u'chebi_par_id': 63581,
   u'chirality': u'1',
   u'dosed_ingredient': True,
   u'first_approval': 1994,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Antiviral',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL991',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL991',
    u'parent_chembl_id': u'CHEMBL991'},
   u'molecule_properties': {u'acd_logd': u'-0.65',
    u'acd_logp': u'-0.65',
    u'acd_most_apka': u'9.47',
    u'acd_most_bpka': None,
    u'alogp': u'-0.52',
    u'aromatic_rings': 0,
    u'full_molformula': u'C10H12N2O4',
    u'full_mwt': u'224.21',
    u'hba': 4,
    u'hbd': 2,
    u'heavy_atoms': 16,
    u'med_chem_friendly': u'N',
    u'molecular_species': u'NEUTRAL',
    u'mw_freebase': u'224.21',
    u'mw_monoisotopic': u'224.0797',
    u'num_alerts': 1,
    u'num_ro5_violations': 0,
    u'psa': u'78.87',
    u'qed_weighted': u'0.62',
    u'ro3_pass': u'N',
    u'rtb': 2},
   u'molecule_structures': {u'canonical_smiles': u'CC1=CN([C@@H]2O[C@H](CO)C=C2)C(=O)NC1=O',
    u'standard_inchi': u'InChI=1S/C10H12N2O4/c1-6-4-12(10(15)11-9(6)14)8-3-2-7(5-13)16-8/h2-4,7-8,13H,5H2,1H3,(H,11,14,15)/t7-,8+/m0/s1',
    u'standard_inchi_key': u'XNKLLVCARDGLGL-JGVFFNPUSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE',
     u'synonyms': u'BMY-27857'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'D4T'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID7971664'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID855856'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID90341632'},
    {u'syn_type': u'BAN', u'synonyms': u'Stavudine'},
    {u'syn_type': u'FDA', u'synonyms': u'Stavudine'},
    {u'syn_type': u'INN', u'synonyms': u'Stavudine'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Stavudine'},
    {u'syn_type': u'USAN', u'synonyms': u'Stavudine'},
    {u'syn_type': u'USP', u'synonyms': u'Stavudine'},
    {u'syn_type': u'OTHER', u'synonyms': u'Zerit'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Zerit'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Zerit Xr'},
    {u'syn_type': u'OTHER', u'synonyms': u'd4T'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'1',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'STAVUDINE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': u'-vudine',
   u'usan_stem_definition': u'antineoplastics,antivirals (zidovudine group) (exception: edoxudine)',
   u'usan_substem': None,
   u'usan_year': 1991},
  {u'atc_classifications': [u'N04BA01'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 15765,
   u'chirality': u'1',
   u'dosed_ingredient': True,
   u'first_approval': 1970,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Antiparkinsonian',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1009',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1009',
    u'parent_chembl_id': u'CHEMBL1009'},
   u'molecule_properties': {u'acd_logd': u'-3.67',
    u'acd_logp': u'-1.15',
    u'acd_most_apka': u'2.24',
    u'acd_most_bpka': u'8.85',
    u'alogp': u'-2.09',
    u'aromatic_rings': 1,
    u'full_molformula': u'C9H11NO4',
    u'full_mwt': u'197.19',
    u'hba': 5,
    u'hbd': 4,
    u'heavy_atoms': 14,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'ZWITTERION',
    u'mw_freebase': u'197.19',
    u'mw_monoisotopic': u'197.0688',
    u'num_alerts': 1,
    u'num_ro5_violations': 0,
    u'psa': u'103.78',
    u'qed_weighted': u'0.42',
    u'ro3_pass': u'N',
    u'rtb': 3},
   u'molecule_structures': {u'canonical_smiles': u'N[C@@H](Cc1ccc(O)c(O)c1)C(=O)O',
    u'standard_inchi': u'InChI=1S/C9H11NO4/c10-6(9(13)14)3-5-1-2-7(11)8(12)4-5/h1-2,4,6,11-12H,3,10H2,(H,13,14)/t6-/m0/s1',
    u'standard_inchi_key': u'WTDRDQBEARUVNC-LURJTMIESA-N'},
   u'molecule_synonyms': [{u'syn_type': u'OTHER', u'synonyms': u'Bendopa'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Bendopa'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Dopar'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Larodopa'},
    {u'syn_type': u'BAN', u'synonyms': u'Levodopa'},
    {u'syn_type': u'FDA', u'synonyms': u'Levodopa'},
    {u'syn_type': u'INN', u'synonyms': u'Levodopa'},
    {u'syn_type': u'JAN', u'synonyms': u'Levodopa'},
    {u'syn_type': u'USAN', u'synonyms': u'Levodopa'},
    {u'syn_type': u'USP', u'synonyms': u'Levodopa'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID26753566'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID90341430'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'LEVODOPA',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': u'-dopa',
   u'usan_stem_definition': u'dopamine receptor agonists',
   u'usan_substem': None,
   u'usan_year': 1969},
  {u'atc_classifications': [u'A10BB07'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 5384,
   u'chirality': u'2',
   u'dosed_ingredient': True,
   u'first_approval': 1984,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Antidiabetic',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1073',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1073',
    u'parent_chembl_id': u'CHEMBL1073'},
   u'molecule_properties': {u'acd_logd': u'0.05',
    u'acd_logp': u'1.88',
    u'acd_most_apka': u'5.08',
    u'acd_most_bpka': u'1.33',
    u'alogp': u'1.90',
    u'aromatic_rings': 2,
    u'full_molformula': u'C21H27N5O4S',
    u'full_mwt': u'445.54',
    u'hba': 6,
    u'hbd': 3,
    u'heavy_atoms': 31,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'ACID',
    u'mw_freebase': u'445.54',
    u'mw_monoisotopic': u'445.1784',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'138.53',
    u'qed_weighted': u'0.60',
    u'ro3_pass': u'N',
    u'rtb': 7},
   u'molecule_structures': {u'canonical_smiles': u'Cc1cnc(cn1)C(=O)NCCc2ccc(cc2)S(=O)(=O)NC(=O)NC3CCCCC3',
    u'standard_inchi': u'InChI=1S/C21H27N5O4S/c1-15-13-24-19(14-23-15)20(27)22-12-11-16-7-9-18(10-8-16)31(29,30)26-21(28)25-17-5-3-2-4-6-17/h7-10,13-14,17H,2-6,11-12H2,1H3,(H,22,27)(H2,25,26,28)',
    u'standard_inchi_key': u'ZJJXGWJIGJFDTL-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE',
     u'synonyms': u'CP-28720'},
    {u'syn_type': u'BAN', u'synonyms': u'Glipizide'},
    {u'syn_type': u'FDA', u'synonyms': u'Glipizide'},
    {u'syn_type': u'INN', u'synonyms': u'Glipizide'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Glipizide'},
    {u'syn_type': u'USAN', u'synonyms': u'Glipizide'},
    {u'syn_type': u'USP', u'synonyms': u'Glipizide'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Glucotrol'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Glucotrol Xl'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'K-4024'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11111214'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11112714'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'GLIPIZIDE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': u'gli-',
   u'usan_stem_definition': u'antihyperglycemics',
   u'usan_substem': None,
   u'usan_year': 1974},
  {u'atc_classifications': [u'N06AF03'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'1',
   u'chebi_par_id': 8060,
   u'chirality': u'2',
   u'dosed_ingredient': False,
   u'first_approval': 1961,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': None,
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1089',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1089',
    u'parent_chembl_id': u'CHEMBL1089'},
   u'molecule_properties': {u'acd_logd': u'-0.51',
    u'acd_logp': u'0.30',
    u'acd_most_apka': None,
    u'acd_most_bpka': u'8.02',
    u'alogp': u'0.92',
    u'aromatic_rings': 1,
    u'full_molformula': u'C8H12N2',
    u'full_mwt': u'136.19',
    u'hba': 2,
    u'hbd': 2,
    u'heavy_atoms': 10,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'NEUTRAL',
    u'mw_freebase': u'136.19',
    u'mw_monoisotopic': u'136.1000',
    u'num_alerts': 2,
    u'num_ro5_violations': 0,
    u'psa': u'38.04',
    u'qed_weighted': u'0.48',
    u'ro3_pass': u'Y',
    u'rtb': 3},
   u'molecule_structures': {u'canonical_smiles': u'NNCCc1ccccc1',
    u'standard_inchi': u'InChI=1S/C8H12N2/c9-10-7-6-8-4-2-1-3-5-8/h1-5,10H,6-7,9H2',
    u'standard_inchi_key': u'RMUCZJUITONUFY-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'OTHER', u'synonyms': u'Nardil'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Nardil'},
    {u'syn_type': u'BAN', u'synonyms': u'Phenelzine'},
    {u'syn_type': u'FDA', u'synonyms': u'Phenelzine'},
    {u'syn_type': u'INN', u'synonyms': u'Phenelzine'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11111653'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11111654'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID50111200'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID90341694'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'PHENELZINE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': None,
   u'usan_stem_definition': None,
   u'usan_substem': None,
   u'usan_year': None},
  {u'atc_classifications': [],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 8746,
   u'chirality': u'1',
   u'dosed_ingredient': True,
   u'first_approval': 1999,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': None,
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1002',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1002',
    u'parent_chembl_id': u'CHEMBL1002'},
   u'molecule_properties': {u'acd_logd': u'-1.44',
    u'acd_logp': u'0.69',
    u'acd_most_apka': u'9.99',
    u'acd_most_bpka': u'9.62',
    u'alogp': u'0.94',
    u'aromatic_rings': 1,
    u'full_molformula': u'C13H21NO3',
    u'full_mwt': u'239.31',
    u'hba': 4,
    u'hbd': 4,
    u'heavy_atoms': 17,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'BASE',
    u'mw_freebase': u'239.31',
    u'mw_monoisotopic': u'239.1521',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'72.72',
    u'qed_weighted': u'0.62',
    u'ro3_pass': u'N',
    u'rtb': 5},
   u'molecule_structures': {u'canonical_smiles': u'CC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1',
    u'standard_inchi': u'InChI=1S/C13H21NO3/c1-13(2,3)14-7-12(17)9-4-5-11(16)10(6-9)8-15/h4-6,12,14-17H,7-8H2,1-3H3/t12-/m0/s1',
    u'standard_inchi_key': u'NDAUXUAQIAJITI-LBPRGKRZSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Levalbuterol'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Levolin'},
    {u'syn_type': u'INN', u'synonyms': u'Levosalbutamol'},
    {u'syn_type': u'USAN', u'synonyms': u'Levosalbutamol'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11111805'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11111806'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11112645'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11113612'},
    {u'syn_type': u'OTHER', u'synonyms': u'Xopenex'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': False,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'LEVOSALBUTAMOL',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': True,
   u'usan_stem': u'-sal-',
   u'usan_stem_definition': u'anti-inflammatory agents (salicylic acid derivatives)',
   u'usan_substem': None,
   u'usan_year': 1997},
  {u'atc_classifications': [u'S01EE01'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 6384,
   u'chirality': u'1',
   u'dosed_ingredient': True,
   u'first_approval': 1996,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Antiglaucoma Agent',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1051',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1051',
    u'parent_chembl_id': u'CHEMBL1051'},
   u'molecule_properties': {u'acd_logd': u'4.28',
    u'acd_logp': u'4.28',
    u'acd_most_apka': None,
    u'acd_most_bpka': None,
    u'alogp': u'4.37',
    u'aromatic_rings': 1,
    u'full_molformula': u'C26H40O5',
    u'full_mwt': u'432.59',
    u'hba': 5,
    u'hbd': 3,
    u'heavy_atoms': 31,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'NEUTRAL',
    u'mw_freebase': u'432.59',
    u'mw_monoisotopic': u'432.2876',
    u'num_alerts': 3,
    u'num_ro5_violations': 0,
    u'psa': u'86.99',
    u'qed_weighted': u'0.23',
    u'ro3_pass': u'N',
    u'rtb': 14},
   u'molecule_structures': {u'canonical_smiles': u'CC(C)OC(=O)CCC\\C=C/C[C@H]1[C@@H](O)C[C@@H](O)[C@@H]1CC[C@@H](O)CCc2ccccc2',
    u'standard_inchi': u'InChI=1S/C26H40O5/c1-19(2)31-26(30)13-9-4-3-8-12-22-23(25(29)18-24(22)28)17-16-21(27)15-14-20-10-6-5-7-11-20/h3,5-8,10-11,19,21-25,27-29H,4,9,12-18H2,1-2H3/b8-3-/t21-,22+,23+,24-,25+/m0/s1',
    u'standard_inchi_key': u'GGXICVAJURFBLW-CEYXHVGTSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'BAN', u'synonyms': u'Latanoprost'},
    {u'syn_type': u'FDA', u'synonyms': u'Latanoprost'},
    {u'syn_type': u'INN', u'synonyms': u'Latanoprost'},
    {u'syn_type': u'USAN', u'synonyms': u'Latanoprost'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'PHXA-41'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'XA-41'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Xalatan'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'1',
   u'oral': False,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'LATANOPROST',
   u'prodrug': u'1',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': True,
   u'usan_stem': u'-prost',
   u'usan_stem_definition': u'prostaglandins',
   u'usan_substem': None,
   u'usan_year': 1996},
  {u'atc_classifications': [u'M01AX01'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'1',
   u'chebi_par_id': 7443,
   u'chirality': u'2',
   u'dosed_ingredient': True,
   u'first_approval': 1991,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Anti-Inflammatory',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1070',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1070',
    u'parent_chembl_id': u'CHEMBL1070'},
   u'molecule_properties': {u'acd_logd': u'3.14',
    u'acd_logp': u'3.14',
    u'acd_most_apka': None,
    u'acd_most_bpka': None,
    u'alogp': u'2.94',
    u'aromatic_rings': 2,
    u'full_molformula': u'C15H16O2',
    u'full_mwt': u'228.29',
    u'hba': 2,
    u'hbd': 0,
    u'heavy_atoms': 17,
    u'med_chem_friendly': u'Y',
    u'molecular_species': None,
    u'mw_freebase': u'228.29',
    u'mw_monoisotopic': u'228.1150',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'26.30',
    u'qed_weighted': u'0.80',
    u'ro3_pass': u'N',
    u'rtb': 4},
   u'molecule_structures': {u'canonical_smiles': u'COc1ccc2cc(CCC(=O)C)ccc2c1',
    u'standard_inchi': u'InChI=1S/C15H16O2/c1-11(16)3-4-12-5-6-14-10-15(17-2)8-7-13(14)9-12/h5-10H,3-4H2,1-2H3',
    u'standard_inchi_key': u'BLXXJMDCKKHMKV-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE',
     u'synonyms': u'BRL-14777'},
    {u'syn_type': u'BAN', u'synonyms': u'Nabumetone'},
    {u'syn_type': u'FDA', u'synonyms': u'Nabumetone'},
    {u'syn_type': u'INN', u'synonyms': u'Nabumetone'},
    {u'syn_type': u'JAN', u'synonyms': u'Nabumetone'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Nabumetone'},
    {u'syn_type': u'USAN', u'synonyms': u'Nabumetone'},
    {u'syn_type': u'USP', u'synonyms': u'Nabumetone'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Relafen'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11112767'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID26748824'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID8139968'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'NABUMETONE',
   u'prodrug': u'1',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': None,
   u'usan_stem_definition': None,
   u'usan_substem': None,
   u'usan_year': 1986},
  {u'atc_classifications': [u'R06AX13'],
   u'availability_type': u'2',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': None,
   u'chirality': u'2',
   u'dosed_ingredient': True,
   u'first_approval': 1993,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Antihistaminic',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL998',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL998',
    u'parent_chembl_id': u'CHEMBL998'},
   u'molecule_properties': {u'acd_logd': u'3.90',
    u'acd_logp': u'3.90',
    u'acd_most_apka': None,
    u'acd_most_bpka': u'4.27',
    u'alogp': u'5.00',
    u'aromatic_rings': 2,
    u'full_molformula': u'C22H23ClN2O2',
    u'full_mwt': u'382.88',
    u'hba': 3,
    u'hbd': 0,
    u'heavy_atoms': 27,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'NEUTRAL',
    u'mw_freebase': u'382.88',
    u'mw_monoisotopic': u'382.1448',
    u'num_alerts': 0,
    u'num_ro5_violations': 1,
    u'psa': u'42.43',
    u'qed_weighted': u'0.73',
    u'ro3_pass': u'N',
    u'rtb': 2},
   u'molecule_structures': {u'canonical_smiles': u'CCOC(=O)N1CCC(=C2c3ccc(Cl)cc3CCc4cccnc24)CC1',
    u'standard_inchi': u'InChI=1S/C22H23ClN2O2/c1-2-27-22(26)25-12-9-15(10-13-25)20-19-8-7-18(23)14-17(19)6-5-16-4-3-11-24-21(16)20/h3-4,7-8,11,14H,2,5-6,9-10,12-13H2,1H3',
    u'standard_inchi_key': u'JCCNYMKQOSZNPW-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME',
     u'synonyms': u'Alavert'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u"Children's Claritin"},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Claritin'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Claritin Hives Relief'},
    {u'syn_type': u'TRADE_NAME',
     u'synonyms': u'Claritin Hives Relief Reditab'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Claritin Reditabs'},
    {u'syn_type': u'OTHER', u'synonyms': u'Claritin-D'},
    {u'syn_type': u'BAN', u'synonyms': u'Loratadine'},
    {u'syn_type': u'FDA', u'synonyms': u'Loratadine'},
    {u'syn_type': u'INN', u'synonyms': u'Loratadine'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Loratadine'},
    {u'syn_type': u'USAN', u'synonyms': u'Loratadine'},
    {u'syn_type': u'USP', u'synonyms': u'Loratadine'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Loratadine Redidose'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'SCH-29851'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID85231118'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID855843'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID90340694'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'LORATADINE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': u'-(a)tadine; -atadine',
   u'usan_stem_definition': u'tricyclic histaminic-H1 receptor antagonists, loratadine derivatives; tricyclic antiasthmatics',
   u'usan_substem': None,
   u'usan_year': 1986},
  {u'atc_classifications': [],
   u'availability_type': u'0',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 5555,
   u'chirality': u'0',
   u'dosed_ingredient': False,
   u'first_approval': 1982,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': None,
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1037',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1037',
    u'parent_chembl_id': u'CHEMBL1037'},
   u'molecule_properties': {u'acd_logd': u'-1.45',
    u'acd_logp': u'0.55',
    u'acd_most_apka': None,
    u'acd_most_bpka': u'12.29',
    u'alogp': u'0.29',
    u'aromatic_rings': 0,
    u'full_molformula': u'C10H19N3O2',
    u'full_mwt': u'213.28',
    u'hba': 3,
    u'hbd': 3,
    u'heavy_atoms': 15,
    u'med_chem_friendly': u'N',
    u'molecular_species': u'BASE',
    u'mw_freebase': u'213.28',
    u'mw_monoisotopic': u'213.1477',
    u'num_alerts': 2,
    u'num_ro5_violations': 0,
    u'psa': u'80.36',
    u'qed_weighted': u'0.47',
    u'ro3_pass': u'N',
    u'rtb': 3},
   u'molecule_structures': {u'canonical_smiles': u'NC(=N)NCC1COC2(CCCCC2)O1',
    u'standard_inchi': u'InChI=1S/C10H19N3O2/c11-9(12)13-6-8-7-14-10(15-8)4-2-1-3-5-10/h8H,1-7H2,(H4,11,12,13)',
    u'standard_inchi_key': u'HPBNRIOWIXYZFK-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Guanadrel'},
    {u'syn_type': u'INN', u'synonyms': u'Guanadrel'},
    {u'syn_type': u'OTHER', u'synonyms': u'Hylorel'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'GUANADREL',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': None,
   u'usan_stem_definition': None,
   u'usan_substem': None,
   u'usan_year': 1968},
  {u'atc_classifications': [u'N02CA04'],
   u'availability_type': u'0',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': None,
   u'chirality': u'1',
   u'dosed_ingredient': True,
   u'first_approval': 1962,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Vasoconstrictor (specific in migraine)',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1065',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1065',
    u'parent_chembl_id': u'CHEMBL1065'},
   u'molecule_properties': {u'acd_logd': u'1.67',
    u'acd_logp': u'1.81',
    u'acd_most_apka': None,
    u'acd_most_bpka': u'7.15',
    u'alogp': u'2.27',
    u'aromatic_rings': 2,
    u'full_molformula': u'C21H27N3O2',
    u'full_mwt': u'353.46',
    u'hba': 3,
    u'hbd': 2,
    u'heavy_atoms': 26,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'NEUTRAL',
    u'mw_freebase': u'353.46',
    u'mw_monoisotopic': u'353.2103',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'57.50',
    u'qed_weighted': u'0.89',
    u'ro3_pass': u'N',
    u'rtb': 4},
   u'molecule_structures': {u'canonical_smiles': u'CC[C@@H](CO)NC(=O)[C@H]1CN(C)[C@@H]2Cc3cn(C)c4cccc(C2=C1)c34',
    u'standard_inchi': u'InChI=1S/C21H27N3O2/c1-4-15(12-25)22-21(26)14-8-17-16-6-5-7-18-20(16)13(10-23(18)2)9-19(17)24(3)11-14/h5-8,10,14-15,19,25H,4,9,11-12H2,1-3H3,(H,22,26)/t14-,15+,19-/m1/s1',
    u'standard_inchi_key': u'KPJZHOPZRAFDTN-ZRGWGRIASA-N'},
   u'molecule_synonyms': [{u'syn_type': u'BAN', u'synonyms': u'Methysergide'},
    {u'syn_type': u'FDA', u'synonyms': u'Methysergide'},
    {u'syn_type': u'INN', u'synonyms': u'Methysergide'},
    {u'syn_type': u'USAN', u'synonyms': u'Methysergide'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID26751590'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID90340563'},
    {u'syn_type': u'OTHER', u'synonyms': u'Sansert'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'1',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'METHYSERGIDE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': u'-erg-',
   u'usan_stem_definition': u'ergot alkaloid derivatives',
   u'usan_substem': None,
   u'usan_year': 1970},
  {u'atc_classifications': [u'M01AE12'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'1',
   u'chebi_par_id': 7822,
   u'chirality': u'2',
   u'dosed_ingredient': True,
   u'first_approval': 1992,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Anti-Inflammatory',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1071',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1071',
    u'parent_chembl_id': u'CHEMBL1071'},
   u'molecule_properties': {u'acd_logd': u'0.09',
    u'acd_logp': u'3.15',
    u'acd_most_apka': u'4.28',
    u'acd_most_bpka': u'0.80',
    u'alogp': u'3.59',
    u'aromatic_rings': 3,
    u'full_molformula': u'C18H15NO3',
    u'full_mwt': u'293.32',
    u'hba': 3,
    u'hbd': 1,
    u'heavy_atoms': 22,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'ACID',
    u'mw_freebase': u'293.32',
    u'mw_monoisotopic': u'293.1052',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'63.33',
    u'qed_weighted': u'0.78',
    u'ro3_pass': u'N',
    u'rtb': 5},
   u'molecule_structures': {u'canonical_smiles': u'OC(=O)CCc1oc(c2ccccc2)c(n1)c3ccccc3',
    u'standard_inchi': u'InChI=1S/C18H15NO3/c20-16(21)12-11-15-19-17(13-7-3-1-4-8-13)18(22-15)14-9-5-2-6-10-14/h1-10H,11-12H2,(H,20,21)',
    u'standard_inchi_key': u'OFPXSFXSNFPTHF-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Daypro'},
    {u'syn_type': u'BAN', u'synonyms': u'Oxaprozin'},
    {u'syn_type': u'FDA', u'synonyms': u'Oxaprozin'},
    {u'syn_type': u'INN', u'synonyms': u'Oxaprozin'},
    {u'syn_type': u'JAN', u'synonyms': u'Oxaprozin'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Oxaprozin'},
    {u'syn_type': u'USAN', u'synonyms': u'Oxaprozin'},
    {u'syn_type': u'USP', u'synonyms': u'Oxaprozin'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID50106814'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID85230870'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID90340983'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'WY-21743'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'OXAPROZIN',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': None,
   u'usan_stem_definition': None,
   u'usan_substem': None,
   u'usan_year': 1973},
  {u'atc_classifications': [],
   u'availability_type': u'0',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': None,
   u'chirality': u'2',
   u'dosed_ingredient': True,
   u'first_approval': 1945,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Antibacterial',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1243',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1243',
    u'parent_chembl_id': u'CHEMBL1243'},
   u'molecule_properties': {u'acd_logd': u'-0.11',
    u'acd_logp': u'1.74',
    u'acd_most_apka': u'4.85',
    u'acd_most_bpka': u'1.11',
    u'alogp': u'1.44',
    u'aromatic_rings': 2,
    u'full_molformula': u'C13H12N2O3S',
    u'full_mwt': u'276.31',
    u'hba': 4,
    u'hbd': 2,
    u'heavy_atoms': 19,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'ACID',
    u'mw_freebase': u'276.31',
    u'mw_monoisotopic': u'276.0569',
    u'num_alerts': 1,
    u'num_ro5_violations': 0,
    u'psa': u'97.64',
    u'qed_weighted': u'0.83',
    u'ro3_pass': u'N',
    u'rtb': 3},
   u'molecule_structures': {u'canonical_smiles': u'Nc1ccc(cc1)S(=O)(=O)NC(=O)c2ccccc2',
    u'standard_inchi': u'InChI=1S/C13H12N2O3S/c14-11-6-8-12(9-7-11)19(17,18)15-13(16)10-4-2-1-3-5-10/h1-9H,14H2,(H,15,16)',
    u'standard_inchi_key': u'PBCZLFBEBARBBI-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'OTHER_PC',
     u'synonyms': u'SID11112278'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID26746972'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID855720'},
    {u'syn_type': u'BAN', u'synonyms': u'Sulfabenzamide'},
    {u'syn_type': u'FDA', u'synonyms': u'Sulfabenzamide'},
    {u'syn_type': u'INN', u'synonyms': u'Sulfabenzamide'},
    {u'syn_type': u'USAN', u'synonyms': u'Sulfabenzamide'},
    {u'syn_type': u'USP', u'synonyms': u'Sulfabenzamide'},
    {u'syn_type': u'FDA', u'synonyms': u'Sulfabenzamide (Triple Sulfa)'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': False,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'SULFABENZAMIDE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': True,
   u'usan_stem': u'sulfa-',
   u'usan_stem_definition': u'antimicrobials (sulfonamides derivatives)',
   u'usan_substem': None,
   u'usan_year': 1971},
  {u'atc_classifications': [u'N02CC02'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 7478,
   u'chirality': u'2',
   u'dosed_ingredient': True,
   u'first_approval': 1998,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': None,
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1278',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1278',
    u'parent_chembl_id': u'CHEMBL1278'},
   u'molecule_properties': {u'acd_logd': u'-0.85',
    u'acd_logp': u'1.15',
    u'acd_most_apka': u'11.52',
    u'acd_most_bpka': u'9.30',
    u'alogp': u'2.05',
    u'aromatic_rings': 2,
    u'full_molformula': u'C17H25N3O2S',
    u'full_mwt': u'335.46',
    u'hba': 3,
    u'hbd': 2,
    u'heavy_atoms': 23,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'BASE',
    u'mw_freebase': u'335.46',
    u'mw_monoisotopic': u'335.1667',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'73.58',
    u'qed_weighted': u'0.88',
    u'ro3_pass': u'N',
    u'rtb': 5},
   u'molecule_structures': {u'canonical_smiles': u'CNS(=O)(=O)CCc1ccc2[nH]cc(C3CCN(C)CC3)c2c1',
    u'standard_inchi': u'InChI=1S/C17H25N3O2S/c1-18-23(21,22)10-7-13-3-4-17-15(11-13)16(12-19-17)14-5-8-20(2)9-6-14/h3-4,11-12,14,18-19H,5-10H2,1-2H3',
    u'standard_inchi_key': u'AMKVXSZCKVJAGH-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'OTHER', u'synonyms': u'Amerge'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Amerge'},
    {u'syn_type': u'BAN', u'synonyms': u'Naratriptan'},
    {u'syn_type': u'INN', u'synonyms': u'Naratriptan'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Naratriptan'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'NARATRIPTAN',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': u'-triptan',
   u'usan_stem_definition': u'antimigraine agents (5-HT1 receptor agonists),sumatriptan derivatives',
   u'usan_substem': None,
   u'usan_year': 1993},
  {u'atc_classifications': [u'G01AG02'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 82980,
   u'chirality': u'1',
   u'dosed_ingredient': True,
   u'first_approval': 1987,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Antifungal',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1306',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1306',
    u'parent_chembl_id': u'CHEMBL1306'},
   u'molecule_properties': {u'acd_logd': u'3.57',
    u'acd_logp': u'4.19',
    u'acd_most_apka': None,
    u'acd_most_bpka': u'8.33',
    u'alogp': u'4.95',
    u'aromatic_rings': 3,
    u'full_molformula': u'C26H31Cl2N5O3',
    u'full_mwt': u'532.46',
    u'hba': 7,
    u'hbd': 0,
    u'heavy_atoms': 36,
    u'med_chem_friendly': u'N',
    u'molecular_species': u'NEUTRAL',
    u'mw_freebase': u'532.46',
    u'mw_monoisotopic': u'531.1804',
    u'num_alerts': 0,
    u'num_ro5_violations': 1,
    u'psa': u'64.88',
    u'qed_weighted': u'0.41',
    u'ro3_pass': u'N',
    u'rtb': 8},
   u'molecule_structures': {u'canonical_smiles': u'CC(C)N1CCN(CC1)c2ccc(OC[C@H]3CO[C@@](Cn4cncn4)(O3)c5ccc(Cl)cc5Cl)cc2',
    u'standard_inchi': u'InChI=1S/C26H31Cl2N5O3/c1-19(2)31-9-11-32(12-10-31)21-4-6-22(7-5-21)34-14-23-15-35-26(36-23,16-33-18-29-17-30-33)24-8-3-20(27)13-25(24)28/h3-8,13,17-19,23H,9-12,14-16H2,1-2H3/t23-,26-/m0/s1',
    u'standard_inchi_key': u'BLSQLHNBWJLIBQ-OZXSUGGESA-N'},
   u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME',
     u'synonyms': u'Fungistat'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Gyno-Terazol'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Panlomyc'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'R-42470'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID144204230'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID56463234'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Terazol'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Terazol 3'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Terazol 7'},
    {u'syn_type': u'BAN', u'synonyms': u'Terconazole'},
    {u'syn_type': u'FDA', u'synonyms': u'Terconazole'},
    {u'syn_type': u'INN', u'synonyms': u'Terconazole'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Terconazole'},
    {u'syn_type': u'USAN', u'synonyms': u'Terconazole'},
    {u'syn_type': u'OTHER', u'synonyms': u'Triaconazole'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Triaconazole'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': False,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'TERCONAZOLE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': True,
   u'usan_stem': u'-conazole',
   u'usan_stem_definition': u'systemic antifungals (miconazole type)',
   u'usan_substem': None,
   u'usan_year': 1980},
  {u'atc_classifications': [u'A16AX04'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 50378,
   u'chirality': u'2',
   u'dosed_ingredient': True,
   u'first_approval': 2002,
   u'first_in_class': u'1',
   u'helm_notation': None,
   u'indication_class': None,
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1337',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1337',
    u'parent_chembl_id': u'CHEMBL1337'},
   u'molecule_properties': {u'acd_logd': u'-1.50',
    u'acd_logp': u'0.99',
    u'acd_most_apka': u'3.14',
    u'acd_most_bpka': None,
    u'alogp': u'2.53',
    u'aromatic_rings': 1,
    u'full_molformula': u'C14H10F3NO5',
    u'full_mwt': u'329.23',
    u'hba': 5,
    u'hbd': 0,
    u'heavy_atoms': 23,
    u'med_chem_friendly': u'N',
    u'molecular_species': u'ACID',
    u'mw_freebase': u'329.23',
    u'mw_monoisotopic': u'329.0511',
    u'num_alerts': 4,
    u'num_ro5_violations': 0,
    u'psa': u'97.03',
    u'qed_weighted': u'0.37',
    u'ro3_pass': u'N',
    u'rtb': 4},
   u'molecule_structures': {u'canonical_smiles': u'[O-][N+](=O)c1cc(ccc1C(=O)C2C(=O)CCCC2=O)C(F)(F)F',
    u'standard_inchi': u'InChI=1S/C14H10F3NO5/c15-14(16,17)7-4-5-8(9(6-7)18(22)23)13(21)12-10(19)2-1-3-11(12)20/h4-6,12H,1-3H2',
    u'standard_inchi_key': u'OUBCNLGXQFSTLU-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Nitisinone'},
    {u'syn_type': u'INN', u'synonyms': u'Nitisinone'},
    {u'syn_type': u'USAN', u'synonyms': u'Nitisinone'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Orfadin'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'SC-0735'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'NITISINONE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': None,
   u'usan_stem_definition': None,
   u'usan_substem': None,
   u'usan_year': 2002},
  {u'atc_classifications': [u'N07AX03'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': None,
   u'chirality': u'0',
   u'dosed_ingredient': False,
   u'first_approval': 2000,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': None,
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL168815',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL168815',
    u'parent_chembl_id': u'CHEMBL168815'},
   u'molecule_properties': {u'acd_logd': u'0.37',
    u'acd_logp': u'2.44',
    u'acd_most_apka': None,
    u'acd_most_bpka': u'9.51',
    u'alogp': u'1.04',
    u'aromatic_rings': 0,
    u'full_molformula': u'C10H17NOS',
    u'full_mwt': u'199.31',
    u'hba': 3,
    u'hbd': 0,
    u'heavy_atoms': 13,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'BASE',
    u'mw_freebase': u'199.31',
    u'mw_monoisotopic': u'199.1031',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'37.77',
    u'qed_weighted': u'0.59',
    u'ro3_pass': u'Y',
    u'rtb': 0},
   u'molecule_structures': {u'canonical_smiles': u'C[C@H]1OC2(CS1)CN3CCC2CC3',
    u'standard_inchi': u'InChI=1S/C10H17NOS/c1-8-12-10(7-13-8)6-11-4-2-9(10)3-5-11/h8-9H,2-7H2,1H3',
    u'standard_inchi_key': u'WUTYZMFRCNBCHQ-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'INN', u'synonyms': u'Cevimeline'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'CEVIMELINE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': None,
   u'usan_stem_definition': None,
   u'usan_substem': None,
   u'usan_year': 1997},
  {u'atc_classifications': [u'G04BD04'],
   u'availability_type': u'2',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 7856,
   u'chirality': u'0',
   u'dosed_ingredient': True,
   u'first_approval': 1975,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': None,
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1231',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1231',
    u'parent_chembl_id': u'CHEMBL1231'},
   u'molecule_properties': {u'acd_logd': u'4.15',
    u'acd_logp': u'5.05',
    u'acd_most_apka': u'11.95',
    u'acd_most_bpka': u'8.24',
    u'alogp': u'4.65',
    u'aromatic_rings': 1,
    u'full_molformula': u'C22H31NO3',
    u'full_mwt': u'357.49',
    u'hba': 4,
    u'hbd': 1,
    u'heavy_atoms': 26,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'NEUTRAL',
    u'mw_freebase': u'357.49',
    u'mw_monoisotopic': u'357.2304',
    u'num_alerts': 2,
    u'num_ro5_violations': 0,
    u'psa': u'49.77',
    u'qed_weighted': u'0.49',
    u'ro3_pass': u'N',
    u'rtb': 10},
   u'molecule_structures': {u'canonical_smiles': u'CCN(CC)CC#CCOC(=O)C(O)(C1CCCCC1)c2ccccc2',
    u'standard_inchi': u'InChI=1S/C22H31NO3/c1-3-23(4-2)17-11-12-18-26-21(24)22(25,19-13-7-5-8-14-19)20-15-9-6-10-16-20/h5,7-8,13-14,20,25H,3-4,6,9-10,15-18H2,1-2H3',
    u'standard_inchi_key': u'XIQVNETUBQGFHX-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME',
     u'synonyms': u'Anturol'},
    {u'syn_type': u'OTHER', u'synonyms': u'Ditropan'},
    {u'syn_type': u'BAN', u'synonyms': u'Oxybutynin'},
    {u'syn_type': u'FDA', u'synonyms': u'Oxybutynin'},
    {u'syn_type': u'INN', u'synonyms': u'Oxybutynin'},
    {u'syn_type': u'USAN', u'synonyms': u'Oxybutynin'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Oxytrol'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Oxytrol For Women'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID50105262'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID90341037'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'OXYBUTYNIN',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': True,
   u'usan_stem': None,
   u'usan_stem_definition': None,
   u'usan_substem': None,
   u'usan_year': 1962},
  {u'atc_classifications': [u'N02CC07'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': None,
   u'chirality': u'1',
   u'dosed_ingredient': True,
   u'first_approval': 2001,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': None,
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1279',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1279',
    u'parent_chembl_id': u'CHEMBL1279'},
   u'molecule_properties': {u'acd_logd': u'-1.92',
    u'acd_logp': u'0.93',
    u'acd_most_apka': None,
    u'acd_most_bpka': u'10.38',
    u'alogp': u'1.41',
    u'aromatic_rings': 2,
    u'full_molformula': u'C14H17N3O',
    u'full_mwt': u'243.30',
    u'hba': 2,
    u'hbd': 3,
    u'heavy_atoms': 18,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'BASE',
    u'mw_freebase': u'243.30',
    u'mw_monoisotopic': u'243.1372',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'70.91',
    u'qed_weighted': u'0.75',
    u'ro3_pass': u'N',
    u'rtb': 2},
   u'molecule_structures': {u'canonical_smiles': u'CN[C@@H]1CCc2[nH]c3ccc(cc3c2C1)C(=O)N',
    u'standard_inchi': u'InChI=1S/C14H17N3O/c1-16-9-3-5-13-11(7-9)10-6-8(14(15)18)2-4-12(10)17-13/h2,4,6,9,16-17H,3,5,7H2,1H3,(H2,15,18)/t9-/m1/s1',
    u'standard_inchi_key': u'XPSQPHWEGNHMSK-SECBINFHSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'OTHER', u'synonyms': u'Frova'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Frova'},
    {u'syn_type': u'BAN', u'synonyms': u'Frovatriptan'},
    {u'syn_type': u'FDA', u'synonyms': u'Frovatriptan'},
    {u'syn_type': u'INN', u'synonyms': u'Frovatriptan'},
    {u'syn_type': u'USAN', u'synonyms': u'Frovatriptan'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'FROVATRIPTAN',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': u'-triptan',
   u'usan_stem_definition': u'antimigraine agents (5-HT1 receptor agonists),sumatriptan derivatives',
   u'usan_substem': None,
   u'usan_year': 1997},
  {u'atc_classifications': [],
   u'availability_type': u'0',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 8021,
   u'chirality': u'1',
   u'dosed_ingredient': True,
   u'first_approval': 1988,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Dopamine Agonist',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1275',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1275',
    u'parent_chembl_id': u'CHEMBL531'},
   u'molecule_properties': {u'acd_logd': u'2.50',
    u'acd_logp': u'3.90',
    u'acd_most_apka': None,
    u'acd_most_bpka': u'7.98',
    u'alogp': u'4.11',
    u'aromatic_rings': 2,
    u'full_molformula': u'C19H26N2S.CH4O3S',
    u'full_mwt': u'410.59',
    u'hba': 2,
    u'hbd': 1,
    u'heavy_atoms': 22,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'NEUTRAL',
    u'mw_freebase': u'314.49',
    u'mw_monoisotopic': u'314.1817',
    u'num_alerts': 0,
    u'num_ro5_violations': 0,
    u'psa': u'44.33',
    u'qed_weighted': u'0.92',
    u'ro3_pass': u'N',
    u'rtb': 4},
   u'molecule_structures': {u'canonical_smiles': u'CCCN1C[C@H](CSC)C[C@H]2[C@H]1Cc3c[nH]c4cccc2c34.CS(=O)(=O)O',
    u'standard_inchi': u'InChI=1S/C19H26N2S.CH4O3S/c1-3-7-21-11-13(12-22-2)8-16-15-5-4-6-17-19(15)14(10-20-17)9-18(16)21;1-5(2,3)4/h4-6,10,13,16,18,20H,3,7-9,11-12H2,1-2H3;1H3,(H,2,3,4)/t13-,16-,18-;/m1./s1',
    u'standard_inchi_key': u'UWCVGPLTGZWHGS-ZORIOUSZSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE',
     u'synonyms': u'LY-127809'},
    {u'syn_type': u'FDA', u'synonyms': u'Pergolide Mesylate'},
    {u'syn_type': u'OTHER', u'synonyms': u'Pergolide Mesylate'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Pergolide Mesylate'},
    {u'syn_type': u'USAN', u'synonyms': u'Pergolide Mesylate'},
    {u'syn_type': u'USP', u'synonyms': u'Pergolide Mesylate'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Permax'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID11533039'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID144204371'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'1',
   u'oral': True,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'PERGOLIDE MESYLATE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': False,
   u'usan_stem': u'-erg-',
   u'usan_stem_definition': u'ergot alkaloid derivatives',
   u'usan_substem': None,
   u'usan_year': 1979},
  {u'atc_classifications': [u'D10AD03', u'D10AD53'],
   u'availability_type': u'1',
   u'biotherapeutic': None,
   u'black_box_warning': u'0',
   u'chebi_par_id': 31174,
   u'chirality': u'2',
   u'dosed_ingredient': True,
   u'first_approval': 1996,
   u'first_in_class': u'0',
   u'helm_notation': None,
   u'indication_class': u'Anti-Acne',
   u'inorganic_flag': u'0',
   u'max_phase': 4,
   u'molecule_chembl_id': u'CHEMBL1265',
   u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1265',
    u'parent_chembl_id': u'CHEMBL1265'},
   u'molecule_properties': {u'acd_logd': u'5.35',
    u'acd_logp': u'8.21',
    u'acd_most_apka': u'4.23',
    u'acd_most_bpka': None,
    u'alogp': u'6.28',
    u'aromatic_rings': 3,
    u'full_molformula': u'C28H28O3',
    u'full_mwt': u'412.52',
    u'hba': 3,
    u'hbd': 1,
    u'heavy_atoms': 31,
    u'med_chem_friendly': u'Y',
    u'molecular_species': u'ACID',
    u'mw_freebase': u'412.52',
    u'mw_monoisotopic': u'412.2038',
    u'num_alerts': 0,
    u'num_ro5_violations': 1,
    u'psa': u'46.53',
    u'qed_weighted': u'0.55',
    u'ro3_pass': u'N',
    u'rtb': 4},
   u'molecule_structures': {u'canonical_smiles': u'COc1ccc(cc1C23CC4CC(CC(C4)C2)C3)c5ccc6cc(ccc6c5)C(=O)O',
    u'standard_inchi': u'InChI=1S/C28H28O3/c1-31-26-7-6-23(21-2-3-22-12-24(27(29)30)5-4-20(22)11-21)13-25(26)28-14-17-8-18(15-28)10-19(9-17)16-28/h2-7,11-13,17-19H,8-10,14-16H2,1H3,(H,29,30)',
    u'standard_inchi_key': u'LZCDAPDGXCYOEH-UHFFFAOYSA-N'},
   u'molecule_synonyms': [{u'syn_type': u'BAN', u'synonyms': u'Adapalene'},
    {u'syn_type': u'FDA', u'synonyms': u'Adapalene'},
    {u'syn_type': u'INN', u'synonyms': u'Adapalene'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Adapalene'},
    {u'syn_type': u'USAN', u'synonyms': u'Adapalene'},
    {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'CD-271'},
    {u'syn_type': u'OTHER', u'synonyms': u'Differin'},
    {u'syn_type': u'TRADE_NAME', u'synonyms': u'Differin'},
    {u'syn_type': u'OTHER_PC', u'synonyms': u'SID26758039'}],
   u'molecule_type': u'Small molecule',
   u'natural_product': u'0',
   u'oral': False,
   u'parenteral': False,
   u'polymer_flag': False,
   u'pref_name': u'ADAPALENE',
   u'prodrug': u'0',
   u'structure_type': u'MOL',
   u'therapeutic_flag': True,
   u'topical': True,
   u'usan_stem': None,
   u'usan_stem_definition': None,
   u'usan_substem': None,
   u'usan_year': 1991}],
 u'page_meta': {u'limit': 20,
  u'next': u'/chembl/api/data/molecule.json?max_phase=4&limit=20&offset=20',
  u'offset': 0,
  u'previous': None,
  u'total_count': 2795}}

Page structure

The dictionary contains two top-level keys:

  1. molecules array
  2. page_meta dictionary

This means that a request to the URL endpoint returns a single page rather than the whole result set. Each page holds one portion of the data (the molecules array) together with metadata about the page and the whole result set (the page_meta dictionary).

In [29]:
# The default size of a single page is 20 results:
len(url_approved_drugs['molecules'])
Out[29]:
20
In [30]:
# But it can be increased to up to 1000 results by providing the `limit` argument:
url = url_stem + "/molecule.json?max_phase=4&limit=200"
bigger_page = requests.get(url).json()

print url
print len(bigger_page['molecules'])
https://www.ebi.ac.uk/chembl/api/data/molecule.json?max_phase=4&limit=200
200
In [31]:
# Let's see what data is provided in the `page_meta` dictionary:
url_approved_drugs['page_meta']
Out[31]:
{u'limit': 20,
 u'next': u'/chembl/api/data/molecule.json?max_phase=4&limit=20&offset=20',
 u'offset': 0,
 u'previous': None,
 u'total_count': 2795}

It provides the following information:

  1. limit - the maximum size of a page (a page can contain fewer results if the whole result set is smaller than the page size, or if it is the last page)
  2. offset - the position of the first element on the current page, counted from the first element of the whole result set
  3. next - URL pointing to the next page (if it exists)
  4. previous - URL pointing to the previous page (if it exists)
  5. total_count - number of elements in the whole result set
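
As a rough illustration of how these fields relate to each other, the sketch below derives the page count and an absolute `next` URL from the page_meta values shown above. The `host` value is an assumption taken from the URLs printed earlier in this notebook, and the variable names are illustrative only.

```python
import math

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

# page_meta values copied from the output above
page_meta = {
    'limit': 20,
    'next': '/chembl/api/data/molecule.json?max_phase=4&limit=20&offset=20',
    'offset': 0,
    'previous': None,
    'total_count': 2795,
}

# Assumption: host taken from the URLs printed earlier in this notebook
host = "https://www.ebi.ac.uk"

# Number of pages needed to cover the whole result set at this page size
total_pages = int(math.ceil(page_meta['total_count'] / float(page_meta['limit'])))

# 1-based number of the page we are currently looking at
current_page = page_meta['offset'] // page_meta['limit'] + 1

# 'next' is a path relative to the host, so it has to be joined before use
next_url = urljoin(host, page_meta['next']) if page_meta['next'] else None

print("page {} of {}".format(current_page, total_pages))
print(next_url)
```

At the default page size, this result set spans 140 pages, so a single request returns only a small part of it.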

This means that in order to get the whole result set we need to loop through the pages:

In [32]:
# Getting all approved drugs using url endpoint
localhost = "http://localhost/"
url_approved_drugs = requests.get(localhost + "chemblws/molecule.json?max_phase=4&limit=1000").json()
results = url_approved_drugs['molecules']
while url_approved_drugs['page_meta']['next']:
    url_approved_drugs = requests.get(localhost + url_approved_drugs['page_meta']['next']).json()
    results += url_approved_drugs['molecules']
print len(results)
print len(results) == url_approved_drugs['page_meta']['total_count']
2795
True
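
The loop above could also be wrapped in a generator, so that calling code does not have to handle pages at all. The following is a minimal sketch under stated assumptions: `iter_records` and `FAKE_PAGES` are hypothetical names, and the fetch function is a stand-in that reads from a dictionary so the example runs without a server. Using `urljoin` avoids having to think about leading or trailing slashes when combining the host with the `next` path.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2


def iter_records(fetch_json, base_url, first_path, key):
    """Yield every record of a paged result set by following page_meta['next'].

    fetch_json: callable taking an absolute URL and returning the decoded page
    key:        name of the array holding the records (e.g. 'molecules')
    """
    path = first_path
    while path:
        page = fetch_json(urljoin(base_url, path))
        for record in page[key]:
            yield record
        path = page['page_meta']['next']


# Stand-in for a server: two tiny pages keyed by absolute URL (made-up data)
FAKE_PAGES = {
    "http://example.org/data/molecule.json?limit=2": {
        "molecules": [{"id": 1}, {"id": 2}],
        "page_meta": {"next": "/data/molecule.json?limit=2&offset=2"},
    },
    "http://example.org/data/molecule.json?limit=2&offset=2": {
        "molecules": [{"id": 3}],
        "page_meta": {"next": None},
    },
}

records = list(iter_records(FAKE_PAGES.__getitem__,
                            "http://example.org",
                            "/data/molecule.json?limit=2",
                            "molecules"))
print(len(records))
```

Against a real server, the stand-in would be replaced by something like `lambda url: requests.get(url).json()`.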

With the client-generated results, we no longer have to worry about pagination:

In [33]:
# The QuerySet object returned by the client is a lazily-evaluated iterator
# This means that it's ready to use and it will try to reduce the amount of server requests
# All results are cached as well so they are fetched from server only once.
approved_drugs = molecule.filter(max_phase=4)

# Getting the length of the whole result set is easy:
print len(approved_drugs)

# So is getting a single element:
print approved_drugs[123]

# Or a chunk of elements:
print approved_drugs[2:5]

# Or using it in loops or list comprehensions:
drug_smiles = [drug['molecule_structures']['canonical_smiles'] for drug in approved_drugs if drug['molecule_structures']]
print len(drug_smiles)
2795
{u'max_phase': 4, u'usan_stem': u'-steride', u'parenteral': False, u'dosed_ingredient': True, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': 521033, u'first_approval': 2001, u'atc_classifications': [u'G04CB02'], u'prodrug': u'0', u'molecule_structures': {u'standard_inchi_key': u'JWJOTENAMICLJG-QWBYCMEYSA-N', u'canonical_smiles': u'C[C@]12CC[C@H]3[C@@H](CC[C@H]4NC(=O)C=C[C@]34C)[C@@H]1CC[C@@H]2C(=O)Nc5cc(ccc5C(F)(F)F)C(F)(F)F', u'standard_inchi': u'InChI=1S/C27H30F6N2O2/c1-24-11-9-17-15(4-8-21-25(17,2)12-10-22(36)35-21)16(24)6-7-19(24)23(37)34-20-13-14(26(28,29)30)3-5-18(20)27(31,32)33/h3,5,10,12-13,15-17,19,21H,4,6-9,11H2,1-2H3,(H,34,37)(H,35,36)/t15-,16-,17-,19+,21+,24-,25+/m0/s1'}, u'chirality': u'1', u'usan_substem': None, u'pref_name': u'DUTASTERIDE', u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL1200969', u'therapeutic_flag': True, u'molecule_properties': {u'num_ro5_violations': 2, u'med_chem_friendly': u'Y', u'psa': u'58.20', u'full_mwt': u'528.53', u'ro3_pass': u'N', u'mw_freebase': u'528.53', u'num_alerts': 1, u'acd_logd': u'5.93', u'full_molformula': u'C27H30F6N2O2', u'hba': 2, u'molecular_species': u'NEUTRAL', u'mw_monoisotopic': u'528.2211', u'heavy_atoms': 37, u'aromatic_rings': 1, u'alogp': u'5.70', u'acd_most_apka': u'13.32', u'qed_weighted': u'0.49', u'acd_most_bpka': None, u'hbd': 2, u'acd_logp': u'5.93', u'rtb': 4}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': u'testosterone reductase inhibitors', u'natural_product': u'1', u'black_box_warning': u'0', u'availability_type': u'1', u'inorganic_flag': u'0', u'molecule_synonyms': [{u'synonyms': u'Avodart', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Dutasteride', u'syn_type': u'BAN'}, {u'synonyms': u'Dutasteride', u'syn_type': u'FDA'}, {u'synonyms': u'Dutasteride', u'syn_type': u'INN'}, {u'synonyms': u'Dutasteride', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Dutasteride', u'syn_type': u'USAN'}, {u'synonyms': u'GG-745', 
u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'GI-198745', u'syn_type': u'RESEARCH_CODE'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1200969', u'parent_chembl_id': u'CHEMBL1200969'}, u'indication_class': None, u'usan_year': 1997, u'first_in_class': u'0', u'topical': False, u'oral': True}
[{u'max_phase': 4, u'usan_stem': u'-vudine', u'parenteral': False, u'dosed_ingredient': True, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': 63581, u'first_approval': 1994, u'atc_classifications': [u'J05AF04'], u'prodrug': u'0', u'molecule_structures': {u'standard_inchi_key': u'XNKLLVCARDGLGL-JGVFFNPUSA-N', u'canonical_smiles': u'CC1=CN([C@@H]2O[C@H](CO)C=C2)C(=O)NC1=O', u'standard_inchi': u'InChI=1S/C10H12N2O4/c1-6-4-12(10(15)11-9(6)14)8-3-2-7(5-13)16-8/h2-4,7-8,13H,5H2,1H3,(H,11,14,15)/t7-,8+/m0/s1'}, u'chirality': u'1', u'usan_substem': None, u'pref_name': u'STAVUDINE', u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL991', u'therapeutic_flag': True, u'molecule_properties': {u'num_ro5_violations': 0, u'med_chem_friendly': u'N', u'psa': u'78.87', u'full_mwt': u'224.21', u'ro3_pass': u'N', u'mw_freebase': u'224.21', u'num_alerts': 1, u'acd_logd': u'-0.65', u'full_molformula': u'C10H12N2O4', u'hba': 4, u'molecular_species': u'NEUTRAL', u'mw_monoisotopic': u'224.0797', u'heavy_atoms': 16, u'aromatic_rings': 0, u'alogp': u'-0.52', u'acd_most_apka': u'9.47', u'qed_weighted': u'0.62', u'acd_most_bpka': None, u'hbd': 2, u'acd_logp': u'-0.65', u'rtb': 2}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': u'antineoplastics,antivirals (zidovudine group) (exception: edoxudine)', u'natural_product': u'1', u'black_box_warning': u'1', u'availability_type': u'1', u'inorganic_flag': u'0', u'molecule_synonyms': [{u'synonyms': u'BMY-27857', u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'D4T', u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'SID7971664', u'syn_type': u'OTHER_PC'}, {u'synonyms': u'SID855856', u'syn_type': u'OTHER_PC'}, {u'synonyms': u'SID90341632', u'syn_type': u'OTHER_PC'}, {u'synonyms': u'Stavudine', u'syn_type': u'BAN'}, {u'synonyms': u'Stavudine', u'syn_type': u'FDA'}, {u'synonyms': u'Stavudine', u'syn_type': u'INN'}, {u'synonyms': u'Stavudine', u'syn_type': u'TRADE_NAME'}, {u'synonyms': 
u'Stavudine', u'syn_type': u'USAN'}, {u'synonyms': u'Stavudine', u'syn_type': u'USP'}, {u'synonyms': u'Zerit', u'syn_type': u'OTHER'}, {u'synonyms': u'Zerit', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Zerit Xr', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'd4T', u'syn_type': u'OTHER'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL991', u'parent_chembl_id': u'CHEMBL991'}, u'indication_class': u'Antiviral', u'usan_year': 1991, u'first_in_class': u'0', u'topical': False, u'oral': True}, {u'max_phase': 4, u'usan_stem': u'-dopa', u'parenteral': False, u'dosed_ingredient': True, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': 15765, u'first_approval': 1970, u'atc_classifications': [u'N04BA01'], u'prodrug': u'0', u'molecule_structures': {u'standard_inchi_key': u'WTDRDQBEARUVNC-LURJTMIESA-N', u'canonical_smiles': u'N[C@@H](Cc1ccc(O)c(O)c1)C(=O)O', u'standard_inchi': u'InChI=1S/C9H11NO4/c10-6(9(13)14)3-5-1-2-7(11)8(12)4-5/h1-2,4,6,11-12H,3,10H2,(H,13,14)/t6-/m0/s1'}, u'chirality': u'1', u'usan_substem': None, u'pref_name': u'LEVODOPA', u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL1009', u'therapeutic_flag': True, u'molecule_properties': {u'num_ro5_violations': 0, u'med_chem_friendly': u'Y', u'psa': u'103.78', u'full_mwt': u'197.19', u'ro3_pass': u'N', u'mw_freebase': u'197.19', u'num_alerts': 1, u'acd_logd': u'-3.67', u'full_molformula': u'C9H11NO4', u'hba': 5, u'molecular_species': u'ZWITTERION', u'mw_monoisotopic': u'197.0688', u'heavy_atoms': 14, u'aromatic_rings': 1, u'alogp': u'-2.09', u'acd_most_apka': u'2.24', u'qed_weighted': u'0.42', u'acd_most_bpka': u'8.85', u'hbd': 4, u'acd_logp': u'-1.15', u'rtb': 3}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': u'dopamine receptor agonists', u'natural_product': u'0', u'black_box_warning': u'0', u'availability_type': u'1', u'inorganic_flag': u'0', u'molecule_synonyms': [{u'synonyms': u'Bendopa', u'syn_type': u'OTHER'}, {u'synonyms': 
u'Bendopa', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Dopar', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Larodopa', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Levodopa', u'syn_type': u'BAN'}, {u'synonyms': u'Levodopa', u'syn_type': u'FDA'}, {u'synonyms': u'Levodopa', u'syn_type': u'INN'}, {u'synonyms': u'Levodopa', u'syn_type': u'JAN'}, {u'synonyms': u'Levodopa', u'syn_type': u'USAN'}, {u'synonyms': u'Levodopa', u'syn_type': u'USP'}, {u'synonyms': u'SID26753566', u'syn_type': u'OTHER_PC'}, {u'synonyms': u'SID90341430', u'syn_type': u'OTHER_PC'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1009', u'parent_chembl_id': u'CHEMBL1009'}, u'indication_class': u'Antiparkinsonian', u'usan_year': 1969, u'first_in_class': u'0', u'topical': False, u'oral': True}, {u'max_phase': 4, u'usan_stem': u'gli-', u'parenteral': False, u'dosed_ingredient': True, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': 5384, u'first_approval': 1984, u'atc_classifications': [u'A10BB07'], u'prodrug': u'0', u'molecule_structures': {u'standard_inchi_key': u'ZJJXGWJIGJFDTL-UHFFFAOYSA-N', u'canonical_smiles': u'Cc1cnc(cn1)C(=O)NCCc2ccc(cc2)S(=O)(=O)NC(=O)NC3CCCCC3', u'standard_inchi': u'InChI=1S/C21H27N5O4S/c1-15-13-24-19(14-23-15)20(27)22-12-11-16-7-9-18(10-8-16)31(29,30)26-21(28)25-17-5-3-2-4-6-17/h7-10,13-14,17H,2-6,11-12H2,1H3,(H,22,27)(H2,25,26,28)'}, u'chirality': u'2', u'usan_substem': None, u'pref_name': u'GLIPIZIDE', u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL1073', u'therapeutic_flag': True, u'molecule_properties': {u'num_ro5_violations': 0, u'med_chem_friendly': u'Y', u'psa': u'138.53', u'full_mwt': u'445.54', u'ro3_pass': u'N', u'mw_freebase': u'445.54', u'num_alerts': 0, u'acd_logd': u'0.05', u'full_molformula': u'C21H27N5O4S', u'hba': 6, u'molecular_species': u'ACID', u'mw_monoisotopic': u'445.1784', u'heavy_atoms': 31, u'aromatic_rings': 2, u'alogp': u'1.90', u'acd_most_apka': u'5.08', u'qed_weighted': u'0.60', 
u'acd_most_bpka': u'1.33', u'hbd': 3, u'acd_logp': u'1.88', u'rtb': 7}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': u'antihyperglycemics', u'natural_product': u'0', u'black_box_warning': u'0', u'availability_type': u'1', u'inorganic_flag': u'0', u'molecule_synonyms': [{u'synonyms': u'CP-28720', u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'Glipizide', u'syn_type': u'BAN'}, {u'synonyms': u'Glipizide', u'syn_type': u'FDA'}, {u'synonyms': u'Glipizide', u'syn_type': u'INN'}, {u'synonyms': u'Glipizide', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Glipizide', u'syn_type': u'USAN'}, {u'synonyms': u'Glipizide', u'syn_type': u'USP'}, {u'synonyms': u'Glucotrol', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Glucotrol Xl', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'K-4024', u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'SID11111214', u'syn_type': u'OTHER_PC'}, {u'synonyms': u'SID11112714', u'syn_type': u'OTHER_PC'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL1073', u'parent_chembl_id': u'CHEMBL1073'}, u'indication_class': u'Antidiabetic', u'usan_year': 1974, u'first_in_class': u'0', u'topical': False, u'oral': True}]
2395
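
To make the idea of lazy evaluation with caching more concrete, here is a toy sketch. It is not the client's actual implementation: `LazyResults` and `fetch_page` are hypothetical names, the data is a plain list of numbers, and only integer indexing is supported (no slices).

```python
class LazyResults(object):
    """Toy sequence that fetches a page only on first access and caches it."""

    def __init__(self, fetch_page, total_count, page_size):
        self._fetch_page = fetch_page
        self._total = total_count
        self._page_size = page_size
        self._cache = {}          # page number -> list of records
        self.requests_made = 0    # counts simulated server requests

    def __len__(self):
        return self._total

    def __getitem__(self, index):
        if not 0 <= index < self._total:
            raise IndexError(index)
        page_no = index // self._page_size
        if page_no not in self._cache:
            offset = page_no * self._page_size
            self._cache[page_no] = self._fetch_page(offset, self._page_size)
            self.requests_made += 1
        return self._cache[page_no][index % self._page_size]


# Stand-in for the server: slices of a local list (made-up data)
data = list(range(95))

def fetch_page(offset, limit):
    return data[offset:offset + limit]

results = LazyResults(fetch_page, len(data), page_size=20)

# Two lookups on the same page trigger a single simulated request
first = results[45]
second = results[47]
requests_after_two_lookups = results.requests_made

# Iterating over everything fetches the remaining pages, each only once
all_items = list(results)

print(requests_after_two_lookups)
print(results.requests_made)
```

The point of the design is that the cost of a lookup depends on which pages have already been seen, not on how many times an element is read.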

Ordering results

As with filtering, it is also possible to order the result set. The order_by parameter is responsible for ordering; prefixing the field name with '-' reverses the order:

In [34]:
# Sort approved drugs by molecular weight ascending (from lightest to heaviest) and get the first (lightest) element
lightest_drug = molecule.filter(max_phase=4).order_by('molecule_properties__mw_freebase')[0]
lightest_drug['pref_name']
Out[34]:
u'AMMONIA N 13'
In [39]:
# Sort approved drugs by molecular weight descending (from heaviest to lightest) and get the first (heaviest) element
heaviest_drug = molecule.filter(max_phase=4).filter(molecule_properties__mw_freebase__isnull=False).order_by('-molecule_properties__mw_freebase')[0]
heaviest_drug['pref_name']
Out[39]:
u'MIPOMERSEN SODIUM'
In [40]:
# Do the same using url endpoint
url_1 = url_stem + "/molecule.json?max_phase=4&order_by=molecule_properties__mw_freebase"
lightest_drug = requests.get(url_1).json()['molecules'][0]
print url_1
print lightest_drug['pref_name']

url_2 = url_stem + "/molecule.json?max_phase=4&molecule_properties__mw_freebase__isnull=False&order_by=-molecule_properties__mw_freebase"
heaviest_drug = requests.get(url_2).json()['molecules'][0]
print url_2
print heaviest_drug['pref_name']
https://www.ebi.ac.uk/chembl/api/data/molecule.json?max_phase=4&order_by=molecule_properties__mw_freebase
AMMONIA N 13
https://www.ebi.ac.uk/chembl/api/data/molecule.json?max_phase=4&molecule_properties__mw_freebase__isnull=False&order_by=-molecule_properties__mw_freebase
MIPOMERSEN SODIUM
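
When building such URLs by hand becomes error-prone, the query string can be assembled with `urlencode`. The helper below is a small sketch: `ordered_molecule_url` is a hypothetical name, and `URL_STEM` is assumed to be the stem visible in the URLs printed above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

# Assumption: the stem visible in the URLs printed above
URL_STEM = "https://www.ebi.ac.uk/chembl/api/data"


def ordered_molecule_url(filters, field, descending=False):
    """Build a molecule URL from filters plus an order_by parameter.

    filters is a list of (name, value) pairs, so that the order of the
    parameters in the resulting URL is predictable.
    """
    order = ("-" if descending else "") + field
    params = list(filters) + [("order_by", order)]
    return URL_STEM + "/molecule.json?" + urlencode(params)


lightest_url = ordered_molecule_url(
    [("max_phase", 4)],
    "molecule_properties__mw_freebase",
)
heaviest_url = ordered_molecule_url(
    [("max_phase", 4), ("molecule_properties__mw_freebase__isnull", False)],
    "molecule_properties__mw_freebase",
    descending=True,
)

print(lightest_url)
print(heaviest_url)
```

Both URLs come out identical to the ones used in the cell above.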

Filtering molecules using SMILES

It is possible to filter molecules by SMILES:

In [41]:
# Atorvastatin...
smiles = "CC(C)c1c(C(=O)Nc2ccccc2)c(c3ccccc3)c(c4ccc(F)cc4)n1CC[C@@H](O)C[C@@H](O)CC(=O)O"

# By default, the type of search used is 'exact search', which means that only compounds with exactly the same SMILES string will be picked:
result = molecule.filter(molecule_structures__canonical_smiles=smiles)
print len(result)

# This is equivalent to:
result1 = molecule.filter(molecule_structures__canonical_smiles__exact=smiles)
print len(result1)

# For convenience, we have a shortcut call:
result2 = molecule.filter(smiles=smiles)
print len(result2)

# Checking if they are all the same: 
print result[0]['pref_name'] == result1[0]['pref_name'] == result2[0]['pref_name']

# And because SMILES strings are unique in ChEMBL, this is similar to:
result3 = molecule.get(smiles)
print result[0]['pref_name'] == result3['pref_name']
1
1
1
True
True

There are, however, other filtering operators that can be applied to SMILES. The most important one is flexmatch, which returns all structures described by the given SMILES string, even if it is a non-canonical SMILES.

In [42]:
# Flexmatch will look for structures that match the given SMILES, ignoring stereo:
records = molecule.filter(molecule_structures__canonical_smiles__flexmatch=smiles)
print len(records)

for record in records:
    print("{:15s} : {}".format(record["molecule_chembl_id"], record['molecule_structures']['canonical_smiles']))
3
CHEMBL1207181   : CC(C)c1c(C(=O)Nc2ccccc2)c(c3ccccc3)c(c4ccc(F)cc4)n1CCC(O)C[C@@H](O)CC(=O)O
CHEMBL1790017   : CC(C)c1c(C(=O)Nc2ccccc2)c(c3ccccc3)c(c4ccc(F)cc4)n1CC[C@@H](O)C[C@@H](O)CC(=O)O.CC(C)c5c(C(=O)Nc6ccccc6)c(c7ccccc7)c(c8ccc(F)cc8)n5CC[C@@H](O)C[C@@H](O)CC(=O)O
CHEMBL1487      : CC(C)c1c(C(=O)Nc2ccccc2)c(c3ccccc3)c(c4ccc(F)cc4)n1CC[C@@H](O)C[C@@H](O)CC(=O)O

Unlike with the exact string match, it is possible to retrieve multiple records when a SMILES is used for the flexmatch lookup (i.e. it is potentially one-to-many instead of one-to-one as the ID lookups are). This is due to the nature of flexmatch.

In our case three structures are returned: CHEMBL1487 (Atorvastatin); CHEMBL1207181, which is the same structure as the former but with one of the two stereocentres undefined; and CHEMBL1790017, whose SMILES contains two copies of the atorvastatin structure.

In [43]:
# The same can be achieved using url endpoint:

url_1 = url_stem + "/molecule.json?molecule_structures__canonical_smiles=" + quote(smiles)
url_2 = url_stem + "/molecule.json?molecule_structures__canonical_smiles__exact=" + quote(smiles)
url_3 = url_stem + "/molecule.json?smiles=" + quote(smiles)
url_4 = url_stem + "/molecule.json?molecule_structures__canonical_smiles__flexmatch=" + quote(smiles)

exact_match = requests.get(url_1).json()
explicit_exact_match = requests.get(url_2).json()
convenient_shortcut = requests.get(url_3).json()
flexmatch = requests.get(url_4).json()

print url_1
print len(exact_match['molecules'])

print url_2
print len(explicit_exact_match['molecules'])

print url_3
print len(convenient_shortcut['molecules'])

print url_4
print len(flexmatch['molecules'])

print exact_match == explicit_exact_match
https://www.ebi.ac.uk/chembl/api/data/molecule.json?molecule_structures__canonical_smiles=CC%28C%29c1c%28C%28%3DO%29Nc2ccccc2%29c%28c3ccccc3%29c%28c4ccc%28F%29cc4%29n1CC%5BC%40%40H%5D%28O%29C%5BC%40%40H%5D%28O%29CC%28%3DO%29O
1
https://www.ebi.ac.uk/chembl/api/data/molecule.json?molecule_structures__canonical_smiles__exact=CC%28C%29c1c%28C%28%3DO%29Nc2ccccc2%29c%28c3ccccc3%29c%28c4ccc%28F%29cc4%29n1CC%5BC%40%40H%5D%28O%29C%5BC%40%40H%5D%28O%29CC%28%3DO%29O
1
https://www.ebi.ac.uk/chembl/api/data/molecule.json?smiles=CC%28C%29c1c%28C%28%3DO%29Nc2ccccc2%29c%28c3ccccc3%29c%28c4ccc%28F%29cc4%29n1CC%5BC%40%40H%5D%28O%29C%5BC%40%40H%5D%28O%29CC%28%3DO%29O
1
https://www.ebi.ac.uk/chembl/api/data/molecule.json?molecule_structures__canonical_smiles__flexmatch=CC%28C%29c1c%28C%28%3DO%29Nc2ccccc2%29c%28c3ccccc3%29c%28c4ccc%28F%29cc4%29n1CC%5BC%40%40H%5D%28O%29C%5BC%40%40H%5D%28O%29CC%28%3DO%29O
3
True

A further note on SMILES searches

The URL-based example above used the HTTP GET method, which means the SMILES is passed via the URL. This can cause problems when the SMILES includes the '/', '\' or '#' characters, for example:

In [44]:
# CHEMBL477889 (raw string, so that the backslash is kept as a literal character)
smiles = r"[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]"

url = url_stem + "/molecule/" + smiles + ".json"
result = requests.get(url)

print url
print result.ok
print result.status_code
https://www.ebi.ac.uk/chembl/api/data/molecule/[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-].json
False
404

There are two solutions to this problem:

  1. When using GET, percent-encode the SMILES with the urllib.quote function
  2. Use POST with the X-HTTP-Method-Override: GET header
In [45]:
# Method one:
url = url_stem + "/molecule/" + quote(smiles) + ".json"
result_by_get = requests.get(url)

print url
print result_by_get.ok
print result_by_get.status_code
https://www.ebi.ac.uk/chembl/api/data/molecule/%5BNa%2B%5D.CO%5BC%40%40H%5D%28CCC%23C%5CC%3DC/CCCC%28C%29CCCCC%3DC%29C%28%3DO%29%5BO-%5D.json
True
200
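
To see what the encoding step actually does to the problematic characters, the sketch below applies `quote` to the same SMILES without contacting the server. Note that by default `quote` leaves '/' untouched; passing `safe=""` encodes it as well. Whether the server ever needs the stricter form is not established here; the default encoding was sufficient in the request above.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2

# CHEMBL477889, written as a raw string so the backslash stays literal
smiles = r"[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]"

# Default behaviour: '#' and '\' are percent-encoded, '/' is left as it is
default_encoded = quote(smiles)

# Stricter variant: no character is treated as safe, so '/' becomes %2F
strict_encoded = quote(smiles, safe="")

print(default_encoded)
print(strict_encoded)
```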
In [46]:
# Method two:
url = url_stem + "/molecule.json"
result_by_post = requests.post(url, data={"smiles": smiles}, headers={"X-HTTP-Method-Override": "GET"})

print result_by_post.ok
print result_by_post.status_code
True
200
In [47]:
print smiles
print result_by_post.json()
print result_by_get.json() == result_by_post.json()['molecules'][0]
[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]
{u'page_meta': {u'previous': None, u'total_count': 1, u'offset': 0, u'limit': 20, u'next': None}, u'molecules': [{u'max_phase': 0, u'usan_stem': None, u'parenteral': False, u'dosed_ingredient': False, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': None, u'first_approval': None, u'atc_classifications': [], u'prodrug': u'-1', u'molecule_structures': {u'standard_inchi_key': u'RLSXKIUQYFNFBI-PJZMSVRGSA-M', u'canonical_smiles': u'[Na+].CO[C@@H](CCC#C\\C=C/CCCC(C)CCCCC=C)C(=O)[O-]', u'standard_inchi': u'InChI=1S/C20H32O3.Na/c1-4-5-6-12-15-18(2)16-13-10-8-7-9-11-14-17-19(23-3)20(21)22;/h4,7-8,18-19H,1,5-6,10,12-17H2,2-3H3,(H,21,22);/q;+1/p-1/b8-7-;/t18?,19-;/m0./s1'}, u'chirality': u'-1', u'usan_substem': None, u'pref_name': None, u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL477889', u'therapeutic_flag': False, u'molecule_properties': {u'num_ro5_violations': 1, u'med_chem_friendly': u'Y', u'psa': u'46.53', u'full_mwt': u'342.45', u'ro3_pass': u'N', u'mw_freebase': u'320.47', u'num_alerts': 3, u'acd_logd': u'2.47', u'full_molformula': u'C20H31NaO3', u'hba': 3, u'molecular_species': u'ACID', u'mw_monoisotopic': u'320.2351', u'heavy_atoms': 23, u'aromatic_rings': 0, u'alogp': u'6.02', u'acd_most_apka': u'3.40', u'qed_weighted': u'0.23', u'acd_most_bpka': None, u'hbd': 1, u'acd_logp': u'6.03', u'rtb': 15}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': None, u'natural_product': u'-1', u'black_box_warning': u'0', u'availability_type': u'-1', u'inorganic_flag': u'-1', u'molecule_synonyms': [{u'synonyms': u'(Z)-Stellettic Acid B Sodium Salt', u'syn_type': u'OTHER'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL477889', u'parent_chembl_id': u'CHEMBL477888'}, u'indication_class': None, u'usan_year': None, u'first_in_class': u'-1', u'topical': False, u'oral': False}]}
True
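
The reason the POST variant sidesteps the problem is that the SMILES travels in the request body rather than in the URL path. The sketch below builds (but does not send) such a request with the standard library instead of `requests`, purely to inspect its parts; the URL is assumed to be the stem printed above plus `/molecule.json`.

```python
try:
    from urllib.request import Request   # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request          # Python 2
    from urllib import urlencode

smiles = r"[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]"

# Assumption: same endpoint as in the cell above
url = "https://www.ebi.ac.uk/chembl/api/data/molecule.json"

# The SMILES is form-encoded into the body, so the URL itself stays clean
body = urlencode({"smiles": smiles}).encode("ascii")
request = Request(url, data=body,
                  headers={"X-HTTP-Method-Override": "GET"})

print(request.get_method())     # a request with a body is sent as POST
print(request.get_full_url())   # no SMILES characters in the URL
print(body)
```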

Substructure-searching

As well as ID lookups, the web services may also be used to perform substructure searches. Currently, only SMILES-based searches are supported, although this could change if there is a need for more powerful search abilities (e.g. SMARTS searching).

In [48]:
# Lapatinib contains the following core...

query = "c4ccc(Nc2ncnc3ccc(c1ccco1)cc23)cc4"
In [49]:
# Perform substructure search on query using client

substructure = new_client.substructure
records = substructure.filter(smiles=query)
In [50]:
# ... and using raw url-endpoint

url = url_stem + "/substructure/" + quote(query) + ".json"
result = requests.get(url).json()

print url
print result['page_meta']['total_count']
https://www.ebi.ac.uk/chembl/api/data/substructure/c4ccc%28Nc2ncnc3ccc%28c1ccco1%29cc23%29cc4.json
57

Similarity searching

The web services may also be used to perform SMILES-based similarity searches.

In [52]:
# Lapatinib
smiles = "CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2"
In [53]:
# Perform similarity search on molecule using client...

# Note that a percentage similarity must be supplied.
similarity = new_client.similarity
res = similarity.filter(smiles=smiles, similarity=85)

len(res)
Out[53]:
25
In [54]:
# ... and using raw url-endpoint

url = url_stem + "/similarity/" + quote(smiles) + "/85.json"
result = requests.get(url).json()

print url
print result['page_meta']['total_count']
https://www.ebi.ac.uk/chembl/api/data/similarity/CS%28%3DO%29%28%3DO%29CCNCc1oc%28cc1%29c2ccc3ncnc%28Nc4ccc%28OCc5cccc%28F%29c5%29c%28Cl%29c4%29c3c2/85.json
25
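
The substructure and similarity URLs follow a common pattern, which can be captured in two small helpers. This is a sketch only: `substructure_url` and `similarity_url` are hypothetical names, and `URL_STEM` is assumed to be the stem visible in the URLs printed above.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2

# Assumption: the stem visible in the URLs printed above
URL_STEM = "https://www.ebi.ac.uk/chembl/api/data"


def substructure_url(smiles):
    """URL for a substructure search on a percent-encoded SMILES."""
    return "{}/substructure/{}.json".format(URL_STEM, quote(smiles))


def similarity_url(smiles, percent):
    """URL for a similarity search; percent is the similarity cut-off."""
    return "{}/similarity/{}/{}.json".format(URL_STEM, quote(smiles), int(percent))


core = "c4ccc(Nc2ncnc3ccc(c1ccco1)cc23)cc4"
lapatinib = "CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2"

sub_url = substructure_url(core)
sim_url = similarity_url(lapatinib, 85)

print(sub_url)
print(sim_url)
```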

Versions for a parent structure

The versions (e.g. salt forms) for a parent compound may be retrieved for a ChEMBL ID. Keep in mind that a parent structure is one that has had salt/solvate components removed; it corresponds to the bioactive moiety and its use facilitates structure searching, comparison etc. A compound without salt/solvate components is its own parent.

In [56]:
# Neostigmine (a parent)...

chembl_id = "CHEMBL278020" 
In [57]:
records = new_client.molecule_form.get(chembl_id)['molecule_forms']

records
Out[57]:
[{u'molecule_chembl_id': u'CHEMBL54126', u'parent': u'False'},
 {u'molecule_chembl_id': u'CHEMBL278020', u'parent': u'True'},
 {u'molecule_chembl_id': u'CHEMBL211471', u'parent': u'False'}]

The ChEMBL ID lookup service may now be used to get the full records for the salt forms...

In [58]:
for chembl_id in [x["molecule_chembl_id"] for x in records if x["parent"] == 'False']:
    record = new_client.molecule.get(chembl_id)          
    print("{:10s} : {}".format(chembl_id, record['molecule_structures']['canonical_smiles']))
CHEMBL54126 : [Br-].CN(C)C(=O)Oc1cccc(c1)[N+](C)(C)C
CHEMBL211471 : COS(=O)(=O)[O-].CN(C)C(=O)Oc1cccc(c1)[N+](C)(C)C
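
One detail worth noting in the records above is that the `parent` field holds the strings 'True' and 'False', not booleans. The sketch below, which uses the records copied from the output above, shows why the comparison has to be made against the string: any non-empty string is truthy in Python, so a plain truth test would treat every form as a parent. The helper name `is_parent` is illustrative only.

```python
# Records copied from the molecule_form output above
molecule_forms = [
    {'molecule_chembl_id': 'CHEMBL54126', 'parent': 'False'},
    {'molecule_chembl_id': 'CHEMBL278020', 'parent': 'True'},
    {'molecule_chembl_id': 'CHEMBL211471', 'parent': 'False'},
]


def is_parent(form):
    """Compare against the string 'True', since the flag is not a boolean."""
    return form['parent'] == 'True'


parent_ids = [f['molecule_chembl_id'] for f in molecule_forms if is_parent(f)]
salt_ids = [f['molecule_chembl_id'] for f in molecule_forms if not is_parent(f)]

# Pitfall: the string 'False' is truthy, so this selects all three forms
naive_parent_ids = [f['molecule_chembl_id'] for f in molecule_forms if f['parent']]

print(parent_ids)
print(salt_ids)
print(len(naive_parent_ids))
```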

Drug mechanism(s) of action

The mechanisms of action of marketed drugs may be retrieved.

Note that this data may not be recorded for the parent structure, but rather for one of its versions. For example, the marketed drug, Tykerb, containing the active ingredient Lapatinib (CHEMBL554) is actually the ditosylate monohydrate (CHEMBL1201179).

In [59]:
# Molecule forms for Lapatinib are used here...

for chembl_id in (x["molecule_chembl_id"] for x in new_client.molecule_form.get("CHEMBL554")['molecule_forms']):
        
    print("The recorded mechanisms of action of '{}' are...".format(chembl_id))
        
    mechanism_records = new_client.mechanism.filter(molecule_chembl_id=chembl_id)
    
    if mechanism_records:
    
        for mech_rec in mechanism_records:
    
            print("{:10s} : {}".format(mech_rec["molecule_chembl_id"], mech_rec["mechanism_of_action"]))
        
    print("-" * 50)
The recorded mechanisms of action of 'CHEMBL1201179' are...
CHEMBL1201179 : Epidermal growth factor receptor erbB1 inhibitor
CHEMBL1201179 : Receptor protein-tyrosine kinase erbB-2 inhibitor
--------------------------------------------------
The recorded mechanisms of action of 'CHEMBL554' are...
--------------------------------------------------

Image query

The webservice may be used to obtain a PNG image of a compound.

In [60]:
# Lapatinib ditosylate monohydrate (Tykerb)

chembl_id = "CHEMBL1201179" 
In [61]:
png = new_client.image.get(chembl_id)

Image(png)
Out[61]:

Bioactivities

All bioactivity records for a compound may be retrieved via its ChEMBL ID.

In [62]:
# Lapatinib

chembl_id = "CHEMBL554" 
In [63]:
records = new_client.activity.filter(molecule_chembl_id=chembl_id)

len(records), records[:2]
Out[63]:
(1620,
 [{u'document_journal': u'Bioorg. Med. Chem. Lett.', u'bao_endpoint': u'BAO_0000190', u'potential_duplicate': None, u'uo_units': u'UO_0000065', u'canonical_smiles': u'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2', u'assay_type': u'B', u'standard_flag': True, u'molecule_chembl_id': u'CHEMBL554', u'target_organism': u'Homo sapiens', u'assay_description': u'Inhibition of ErbB1', u'record_id': 564927, u'document_chembl_id': u'CHEMBL1149340', u'bao_format': u'BAO_0000357', u'standard_units': u'nM', u'activity_id': 1766862, u'standard_type': u'IC50', u'target_chembl_id': u'CHEMBL203', u'data_validity_comment': None, u'standard_relation': u'=', u'document_year': 2006, u'target_pref_name': u'Epidermal growth factor receptor erbB1', u'assay_chembl_id': u'CHEMBL910357', u'published_value': u'0.012', u'published_relation': u'=', u'standard_value': u'12', u'qudt_units': u'http://www.openphacts.org/units/Nanomolar', u'published_units': u'uM', u'pchembl_value': u'7.92', u'published_type': u'IC50', u'activity_comment': None}, {u'document_journal': u'Nat. 
Biotechnol.', u'bao_endpoint': u'BAO_0000034', u'potential_duplicate': None, u'uo_units': None, u'canonical_smiles': u'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2', u'assay_type': u'B', u'standard_flag': False, u'molecule_chembl_id': u'CHEMBL554', u'target_organism': u'Homo sapiens', u'assay_description': u'Average Binding Constant for p38-alpha; NA=Not Active at 10 uM', u'record_id': 405809, u'document_chembl_id': u'CHEMBL1144455', u'bao_format': u'BAO_0000357', u'standard_units': None, u'activity_id': 1650333, u'standard_type': u'Kd', u'target_chembl_id': u'CHEMBL260', u'data_validity_comment': None, u'standard_relation': None, u'document_year': 2005, u'target_pref_name': u'MAP kinase p38 alpha', u'assay_chembl_id': u'CHEMBL860914', u'published_value': None, u'published_relation': None, u'standard_value': None, u'qudt_units': None, u'published_units': None, u'pchembl_value': None, u'published_type': u'Kd', u'activity_comment': u'Not Active'}])
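
Activity records are plain dictionaries, so they can be summarised with ordinary Python tools. The sketch below works on two records abbreviated from the output above (only a few fields are kept) and shows one way to count activity types and to collect pChEMBL values, which arrive as strings and may be None.

```python
from collections import Counter

# Two records abbreviated from the output above; only a few fields are kept
sample_records = [
    {'standard_type': 'IC50',
     'pchembl_value': '7.92',
     'target_pref_name': 'Epidermal growth factor receptor erbB1',
     'activity_comment': None},
    {'standard_type': 'Kd',
     'pchembl_value': None,
     'target_pref_name': 'MAP kinase p38 alpha',
     'activity_comment': 'Not Active'},
]

# How many records there are of each activity type
type_counts = Counter(r['standard_type'] for r in sample_records)

# pChEMBL values are strings or None, so convert and skip missing ones
pchembl_by_target = dict(
    (r['target_pref_name'], float(r['pchembl_value']))
    for r in sample_records
    if r['pchembl_value'] is not None
)

print(type_counts)
print(pchembl_by_target)
```

The same expressions should work unchanged on the full `records` result set, since it can be used in loops and comprehensions like a list.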

Targets

The webservices may also be used to obtain information on biological targets, i.e. the entities, such as proteins, cells or organisms, with which compounds interact.

In [64]:
# As with any other resource type, a complete list of targets can be requested using the client:
records = new_client.target.all()
len(records)
Out[64]:
10774
In [65]:
records[:4]
Out[65]:
[{u'target_components': [{u'component_id': 774, u'accession': u'P16519', u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'NEC2', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'PCSK2', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'Neuroendocrine convertase 2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'KEX2-like endoprotease 2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Prohormone convertase 2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Proprotein convertase 2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'NEC 2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'PC2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'3.4.21.94', u'syn_type': u'EC_NUMBER'}]}], u'target_chembl_id': u'CHEMBL2433', u'target_type': u'SINGLE PROTEIN', u'pref_name': u'Prohormone convertase 2', u'species_group_flag': False, u'organism': u'Homo sapiens'}, {u'target_components': [{u'component_id': 775, u'accession': u'P25025', u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'IL8RB', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'CXCR2', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'C-X-C chemokine receptor type 2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'CDw128b', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'GRO/MGSA receptor', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'High affinity interleukin-8 receptor B', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'IL-8 receptor type 2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'CD_antigen=CD182', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'CXC-R2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'CXCR-2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'IL-8R B', u'syn_type': u'UNIPROT'}]}], u'target_chembl_id': u'CHEMBL2434', u'target_type': u'SINGLE PROTEIN', u'pref_name': u'Interleukin-8 receptor B', u'species_group_flag': False, u'organism': u'Homo 
sapiens'}, {u'target_components': [{u'component_id': 778, u'accession': u'P35813', u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'PPPM1A', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'PPM1A', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'Protein phosphatase 1A', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Protein phosphatase 2C isoform alpha', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Protein phosphatase IA', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'PP2C-alpha', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'3.1.3.16', u'syn_type': u'EC_NUMBER'}]}], u'target_chembl_id': u'CHEMBL2437', u'target_type': u'SINGLE PROTEIN', u'pref_name': u'Protein phosphatase 2C alpha', u'species_group_flag': False, u'organism': u'Homo sapiens'}, {u'target_components': [{u'component_id': 779, u'accession': u'P11678', u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'EPER ', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'EPO', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'EPP', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'EPX', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'Eosinophil peroxidase', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Eosinophil peroxidase light chain', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Eosinophil peroxidase heavy chain', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'EPO', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'1.11.1.7', u'syn_type': u'EC_NUMBER'}]}], u'target_chembl_id': u'CHEMBL2438', u'target_type': u'SINGLE PROTEIN', u'pref_name': u'Eosinophil peroxidase', u'species_group_flag': False, u'organism': u'Homo sapiens'}]
In [66]:
# Count target types...

counts = Counter([x["target_type"] for x in records if x["target_type"]])

for targetType, n in sorted(counts.items(), key=itemgetter(1), reverse=True):
    print("{:30s} {:-4d}".format(targetType, n))
SINGLE PROTEIN                 6018
ORGANISM                       2136
CELL-LINE                      1630
PROTEIN COMPLEX                 261
TISSUE                          242
PROTEIN FAMILY                  217
SELECTIVITY GROUP                97
PROTEIN COMPLEX GROUP            44
NUCLEIC-ACID                     29
PROTEIN-PROTEIN INTERACTION      21
UNKNOWN                          18
SMALL MOLECULE                   18
SUBCELLULAR                       9
METAL                             8
PROTEIN NUCLEIC-ACID COMPLEX      6
OLIGOSACCHARIDE                   6
MACROMOLECULE                     5
CHIMERIC PROTEIN                  4
PHENOTYPE                         2
NO TARGET                         1
ADMET                             1
UNCHECKED                         1
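
The same tally can be written a little more compactly with Counter.most_common, which returns the (value, count) pairs already sorted by descending count. The sketch below runs on a small hand-made list standing in for the records returned by the client (the sample dictionaries are illustrative, not real query results), so it can be tried without a running myChEMBL instance.

```python
from collections import Counter

# Hypothetical stand-in for the list returned by new_client.target.all();
# only the 'target_type' key is used here.
sample_records = [
    {"target_type": "SINGLE PROTEIN"},
    {"target_type": "CELL-LINE"},
    {"target_type": "SINGLE PROTEIN"},
    {"target_type": None},  # records without a type are skipped
    {"target_type": "ORGANISM"},
    {"target_type": "SINGLE PROTEIN"},
    {"target_type": "CELL-LINE"},
]


def count_target_types(records):
    """Return (target_type, count) pairs, most frequent first."""
    counts = Counter(r["target_type"] for r in records if r["target_type"])
    return counts.most_common()


type_counts = count_target_types(sample_records)
for target_type, n in type_counts:
    print("{:30s} {:4d}".format(target_type, n))
```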

ChEMBL ID

Data on a target of any type may be obtained by looking up its ChEMBL ID.

In [67]:
# Receptor protein-tyrosine kinase erbB-2
    
chembl_id = "CHEMBL1824"
In [68]:
record = new_client.target.get(chembl_id)

record
Out[68]:
{u'organism': u'Homo sapiens',
 u'pref_name': u'Receptor protein-tyrosine kinase erbB-2',
 u'species_group_flag': False,
 u'target_chembl_id': u'CHEMBL1824',
 u'target_components': [{u'accession': u'P04626',
   u'component_id': 120,
   u'component_type': u'PROTEIN',
   u'target_component_synonyms': [{u'component_synonym': u'2.7.10.1',
     u'syn_type': u'EC_NUMBER'},
    {u'component_synonym': u'CD_antigen=CD340', u'syn_type': u'UNIPROT'},
    {u'component_synonym': u'ERBB2', u'syn_type': u'GENE_SYMBOL'},
    {u'component_synonym': u'HER2 ', u'syn_type': u'GENE_SYMBOL'},
    {u'component_synonym': u'MLN 19', u'syn_type': u'UNIPROT'},
    {u'component_synonym': u'MLN19', u'syn_type': u'GENE_SYMBOL'},
    {u'component_synonym': u'Metastatic lymph node gene 19 protein',
     u'syn_type': u'UNIPROT'},
    {u'component_synonym': u'NEU', u'syn_type': u'GENE_SYMBOL'},
    {u'component_synonym': u'NGL', u'syn_type': u'GENE_SYMBOL'},
    {u'component_synonym': u'Proto-oncogene Neu', u'syn_type': u'UNIPROT'},
    {u'component_synonym': u'Proto-oncogene c-ErbB-2',
     u'syn_type': u'UNIPROT'},
    {u'component_synonym': u'Receptor tyrosine-protein kinase erbB-2',
     u'syn_type': u'UNIPROT'},
    {u'component_synonym': u'Tyrosine kinase-type cell surface receptor HER2',
     u'syn_type': u'UNIPROT'},
    {u'component_synonym': u'p185erbB2', u'syn_type': u'UNIPROT'}]}],
 u'target_type': u'SINGLE PROTEIN'}

Remember that all targets have ChEMBL IDs, not just proteins...

In [69]:
# SK-BR-3, a cell line over-expressing erbB-2

chembl_id = "CHEMBL613834" 
In [70]:
record = new_client.target.get(chembl_id)

record
Out[70]:
{u'organism': u'Homo sapiens',
 u'pref_name': u'SK-BR-3',
 u'species_group_flag': False,
 u'target_chembl_id': u'CHEMBL613834',
 u'target_components': [],
 u'target_type': u'CELL-LINE'}
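
Target records are nested dictionaries, so pulling out a particular kind of synonym takes a couple of loops. The helper below is a minimal sketch (the function name gene_symbols is our own, not part of the client) that collects the gene symbols of a target and copes with targets, such as cell lines, that have no components. The two sample records are trimmed copies of the ones shown above.

```python
def gene_symbols(target_record):
    """Collect the GENE_SYMBOL synonyms of every component of a target record."""
    symbols = []
    for component in target_record.get("target_components", []):
        for synonym in component.get("target_component_synonyms", []):
            if synonym["syn_type"] == "GENE_SYMBOL":
                # Some synonyms carry trailing whitespace (e.g. 'HER2 ').
                symbols.append(synonym["component_synonym"].strip())
    return symbols


# Trimmed copy of the CHEMBL1824 record (only a few synonyms kept).
erbb2 = {
    "target_chembl_id": "CHEMBL1824",
    "target_type": "SINGLE PROTEIN",
    "target_components": [{
        "accession": "P04626",
        "target_component_synonyms": [
            {"component_synonym": "2.7.10.1", "syn_type": "EC_NUMBER"},
            {"component_synonym": "ERBB2", "syn_type": "GENE_SYMBOL"},
            {"component_synonym": "HER2 ", "syn_type": "GENE_SYMBOL"},
            {"component_synonym": "MLN 19", "syn_type": "UNIPROT"},
            {"component_synonym": "NEU", "syn_type": "GENE_SYMBOL"},
        ],
    }],
}

# Copy of the CHEMBL613834 record: a cell line has no components.
skbr3 = {
    "target_chembl_id": "CHEMBL613834",
    "target_type": "CELL-LINE",
    "target_components": [],
}

erbb2_symbols = gene_symbols(erbb2)
skbr3_symbols = gene_symbols(skbr3)
print(erbb2_symbols)
print(skbr3_symbols)
```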

UniProt ID

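Data on protein targets may also be obtained using a UniProt ID. Because the lookup matches every target that has the protein among its components, a single ID may return several targets.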

In [71]:
# UniProt ID for erbB-2, a target of Lapatinib

uniprot_id = "P04626"
In [72]:
records = new_client.target.filter(target_components__accession=uniprot_id)
print [(x['target_chembl_id'], x['pref_name']) for x in records]
[(u'CHEMBL2363049', u'Epidermal growth factor receptor'), (u'CHEMBL1824', u'Receptor protein-tyrosine kinase erbB-2'), (u'CHEMBL2111431', u'Epidermal growth factor receptor and ErbB2 (HER1 and HER2)')]
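
The same filter can be expressed as a URL, which is convenient when working from a language other than Python. The sketch below only builds the URL and does not send a request. The base URL is that of the public ChEMBL service and is an assumption here; on a myChEMBL VM the local address listed under the 'Web Services' link on the LaunchPad would take its place. The helper name build_filter_url is our own.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Assumption: base URL of the public ChEMBL web services; replace with the
# address of your own myChEMBL instance.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_filter_url(resource, **filters):
    """Build a JSON list URL for a resource, with filters as query parameters."""
    # Sorting keeps the parameter order stable between runs.
    query = urlencode(sorted(filters.items()))
    return "{}/{}.json?{}".format(BASE_URL, resource, query)


url = build_filter_url("target", target_components__accession="P04626")
print(url)
```

The filter keyword is passed through unchanged, so the double-underscore syntax used with the client carries over to the query string.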

Bioactivities

All bioactivities for a target may be retrieved.

In [73]:
# Receptor protein-tyrosine kinase erbB-2

chembl_id = "CHEMBL1824"
In [74]:
records = new_client.activity.filter(target_chembl_id=chembl_id)

len(records)
Out[74]:
5893
In [75]:
# Show assays with most recorded bioactivities...

for assay, n in sorted(Counter((x["assay_chembl_id"], x["assay_description"]) for x in records).items(), key=itemgetter(1), reverse=True)[:5]:
    
    print("{:-4d} {:14s} {}".format(n, *assay))
1742 CHEMBL1909205  DRUGMATRIX: Protein Tyrosine Kinase, ERBB2 (HER2) enzyme inhibition (substrate: Poly(Glu:Tyr))
 583 CHEMBL1963780  PUBCHEM_BIOASSAY: Navigating the Kinome. (Class of assay: other) Panel member name: ERBB2
 369 CHEMBL1962280  GSK_PKIS: ERBB2 mean inhibition at 0.1 uM [Nanosyn]
 369 CHEMBL1962281  GSK_PKIS: ERBB2 mean inhibition at 1 uM [Nanosyn]
  83 CHEMBL874202   Inhibition of human epidermal growth factor receptor-2 (HER-2) autophosphorylation

Approved Drugs

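The approved drugs for a target may be retrieved by looking up the mechanisms recorded against the target and keeping only those molecules that have reached the maximum development phase (max_phase=4).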

In [76]:
# Receptor protein-tyrosine kinase erbB-2

chembl_id = "CHEMBL1824"
In [77]:
mechanisms = new_client.mechanism.filter(target_chembl_id=chembl_id)
compound_ids = [x['molecule_chembl_id'] for x in mechanisms]
approved_drugs = new_client.molecule.filter(molecule_chembl_id__in=compound_ids).filter(max_phase=4)

for record in approved_drugs:
    
    print("{:10s} : {}".format(record["molecule_chembl_id"], record["pref_name"]))
CHEMBL1201179 : LAPATINIB DITOSYLATE
CHEMBL1201585 : TRASTUZUMAB
CHEMBL1743082 : TRASTUZUMAB EMTANSINE
CHEMBL2007641 : PERTUZUMAB
CHEMBL2105712 : AFATINIB DIMALEATE
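
A mechanism lookup may list the same molecule more than once, so it can be worth de-duplicating the IDs before passing them to the molecule filter. The sketch below illustrates that step, and the max_phase selection, in plain Python on hand-made records. The first two molecules are taken from the output above; the third ID and its name are hypothetical and exist only to show a molecule being filtered out.

```python
# Sample mechanism records, reduced to the one key used below.
sample_mechanisms = [
    {"molecule_chembl_id": "CHEMBL1201585"},
    {"molecule_chembl_id": "CHEMBL1201179"},
    {"molecule_chembl_id": "CHEMBL1201585"},  # same molecule listed twice
    {"molecule_chembl_id": "CHEMBL0000000"},  # hypothetical ID
]

# Sample molecule records; the last one is hypothetical.
sample_molecules = [
    {"molecule_chembl_id": "CHEMBL1201179",
     "pref_name": "LAPATINIB DITOSYLATE", "max_phase": 4},
    {"molecule_chembl_id": "CHEMBL1201585",
     "pref_name": "TRASTUZUMAB", "max_phase": 4},
    {"molecule_chembl_id": "CHEMBL0000000",
     "pref_name": "EXAMPLE CANDIDATE", "max_phase": 2},
]


def unique_ids(records, key="molecule_chembl_id"):
    """Return the values of `key`, without duplicates, in first-seen order."""
    seen = set()
    ids = []
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


def approved(molecules, ids):
    """Keep molecules whose ID is in `ids` and whose max_phase is 4."""
    wanted = set(ids)
    return [m for m in molecules
            if m["molecule_chembl_id"] in wanted and m["max_phase"] == 4]


compound_ids = unique_ids(sample_mechanisms)
approved_names = [m["pref_name"] for m in approved(sample_molecules, compound_ids)]
print(compound_ids)
print(approved_names)
```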

Assays

Information about assays may also be retrieved via the web services.

Assay details

Details of an assay may be retrieved via its ChEMBL ID.

In [78]:
# Inhibitory activity against epidermal growth factor receptor

chembl_id = "CHEMBL674106"
In [79]:
record = new_client.assay.get(chembl_id)

record
Out[79]:
{u'assay_category': None,
 u'assay_cell_type': None,
 u'assay_chembl_id': u'CHEMBL674106',
 u'assay_organism': None,
 u'assay_strain': None,
 u'assay_subcellular_fraction': None,
 u'assay_tax_id': None,
 u'assay_test_type': None,
 u'assay_tissue': None,
 u'assay_type': u'B',
 u'assay_type_description': u'Binding',
 u'bao_format': u'BAO_0000357',
 u'cell_chembl_id': None,
 u'confidence_description': u'Homologous single protein target assigned',
 u'confidence_score': 8,
 u'description': u'Inhibitory activity against epidermal growth factor receptor',
 u'document_chembl_id': u'CHEMBL1146682',
 u'relationship_description': u'Homologous protein target assigned',
 u'relationship_type': u'H',
 u'src_assay_id': None,
 u'src_id': 1,
 u'target_chembl_id': u'CHEMBL203'}

Bioactivities

All bioactivity records for an assay may be requested.

In [80]:
records = new_client.activity.filter(assay_chembl_id=chembl_id)

len(records), records[:2]
Out[80]:
(16,
 [{u'document_journal': u'Bioorg. Med. Chem. Lett.', u'bao_endpoint': u'BAO_0000190', u'potential_duplicate': None, u'uo_units': u'UO_0000065', u'canonical_smiles': u'C=CCOc1ccc2ncnc(Nc3ccc(OCc4ccccc4)cc3)c2c1', u'assay_type': u'B', u'standard_flag': True, u'molecule_chembl_id': u'CHEMBL540701', u'target_organism': u'Homo sapiens', u'assay_description': u'Inhibitory activity against epidermal growth factor receptor', u'record_id': 15417, u'document_chembl_id': u'CHEMBL1146682', u'bao_format': u'BAO_0000357', u'standard_units': u'nM', u'activity_id': 204578, u'standard_type': u'IC50', u'target_chembl_id': u'CHEMBL203', u'data_validity_comment': None, u'standard_relation': u'=', u'document_year': 2004, u'target_pref_name': u'Epidermal growth factor receptor erbB1', u'assay_chembl_id': u'CHEMBL674106', u'published_value': u'79', u'published_relation': u'=', u'standard_value': u'79', u'qudt_units': u'http://www.openphacts.org/units/Nanomolar', u'published_units': u'nM', u'pchembl_value': u'7.10', u'published_type': u'IC50', u'activity_comment': None}, {u'document_journal': u'Bioorg. Med. Chem. 
Lett.', u'bao_endpoint': u'BAO_0000190', u'potential_duplicate': None, u'uo_units': u'UO_0000065', u'canonical_smiles': u'CS(=O)(=O)CCNCCCCOc1ccc2ncnc(Nc3cccc(c3)C#C)c2c1', u'assay_type': u'B', u'standard_flag': True, u'molecule_chembl_id': u'CHEMBL517907', u'target_organism': u'Homo sapiens', u'assay_description': u'Inhibitory activity against epidermal growth factor receptor', u'record_id': 15415, u'document_chembl_id': u'CHEMBL1146682', u'bao_format': u'BAO_0000357', u'standard_units': u'nM', u'activity_id': 203325, u'standard_type': u'IC50', u'target_chembl_id': u'CHEMBL203', u'data_validity_comment': None, u'standard_relation': u'=', u'document_year': 2004, u'target_pref_name': u'Epidermal growth factor receptor erbB1', u'assay_chembl_id': u'CHEMBL674106', u'published_value': u'11', u'published_relation': u'=', u'standard_value': u'11', u'qudt_units': u'http://www.openphacts.org/units/Nanomolar', u'published_units': u'nM', u'pchembl_value': u'7.96', u'published_type': u'IC50', u'activity_comment': None}])
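
Note that numeric fields such as pchembl_value arrive as strings, and are None when no value could be calculated. The sketch below shows one way of turning them into numbers before summarising; the helper name pchembl_values is our own, and the three sample records are cut-down copies of activity records shown earlier in this notebook.

```python
def pchembl_values(activity_records):
    """Return the pChEMBL values of the records as floats, skipping missing ones."""
    values = []
    for record in activity_records:
        raw = record.get("pchembl_value")
        if raw is not None:
            values.append(float(raw))
    return values


# Cut-down copies of activity records shown in this notebook.
sample_activities = [
    {"molecule_chembl_id": "CHEMBL540701", "pchembl_value": "7.10"},
    {"molecule_chembl_id": "CHEMBL517907", "pchembl_value": "7.96"},
    {"molecule_chembl_id": "CHEMBL554", "pchembl_value": None},  # 'Not Active'
]

values = pchembl_values(sample_activities)
mean_pchembl = sum(values) / len(values)
print("{} values, mean pChEMBL {:.2f}".format(len(values), mean_pchembl))
```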

Other resources

As noted previously, there are many other resources that can be useful. They won't be covered in great detail in this document, but some examples may be helpful.

In [81]:
# Documents - retrieve all publications in volume 5 that were published after 1985.
print new_client.document.filter(doc_type='PUBLICATION').filter(year__gt=1985).filter(volume=5)
[{u'doc_type': u'PUBLICATION', u'doi': u'10.1016/0960-894X(95)00172-P', u'title': None, u'journal': u'Bioorg. Med. Chem. Lett.', u'year': 1995, u'volume': u'5', u'first_page': u'1039', u'last_page': u'1042', u'pubmed_id': None, u'authors': None, u'document_chembl_id': u'CHEMBL1128608', u'issue': u'10'}, {u'doc_type': u'PUBLICATION', u'doi': u'10.1016/0960-894X(95)00210-K', u'title': None, u'journal': u'Bioorg. Med. Chem. Lett.', u'year': 1995, u'volume': u'5', u'first_page': u'1295', u'last_page': u'1300', u'pubmed_id': None, u'authors': None, u'document_chembl_id': u'CHEMBL1128609', u'issue': u'12'}, {u'doc_type': u'PUBLICATION', u'doi': u'10.1016/0960-894X(95)00259-V', u'title': None, u'journal': u'Bioorg. Med. Chem. Lett.', u'year': 1995, u'volume': u'5', u'first_page': u'1467', u'last_page': u'1470', u'pubmed_id': None, u'authors': None, u'document_chembl_id': u'CHEMBL1128610', u'issue': u'14'}, {u'doc_type': u'PUBLICATION', u'doi': u'10.1016/0960-894X(95)00327-P', u'title': None, u'journal': u'Bioorg. Med. Chem. Lett.', u'year': 1995, u'volume': u'5', u'first_page': u'1933', u'last_page': u'1936', u'pubmed_id': None, u'authors': None, u'document_chembl_id': u'CHEMBL1128613', u'issue': u'17'}, '...(remaining elements truncated)...']
In [82]:
# Cell lines:
print new_client.cell_line.get('CHEMBL3307242')
{u'efo_id': u'EFO_0002312', u'cell_id': 2, u'cell_source_tissue': u'Lyphoma', u'cellosaurus_id': u'CVCL_2676', u'cell_description': u'P3HR-1', u'cell_source_tax_id': 9606, u'cell_source_organism': u'Homo sapiens', u'cell_chembl_id': u'CHEMBL3307242', u'clo_id': u'CLO_0008331', u'cell_name': u'P3HR-1'}
In [83]:
# Protein class:
print new_client.protein_class.filter(l6="CAMK protein kinase AMPK subfamily")
[{u'protein_class_id': 409, u'l6': u'CAMK protein kinase AMPK subfamily', u'l7': None, u'l4': u'CAMK protein kinase group', u'l5': u'CAMK protein kinase CAMK1 family', u'l2': u'Kinase', u'l3': u'Protein Kinase', u'l1': u'Enzyme', u'l8': None}]
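
The keys l1 to l8 hold the levels of the protein classification hierarchy, from the most general to the most specific, with unused levels set to None. A minimal sketch of joining them into a readable path follows; the helper name classification_path is our own, and the sample record is a copy of the one printed above.

```python
def classification_path(protein_class, separator=" > "):
    """Join the non-empty levels l1..l8 of a protein_class record into one string."""
    levels = [protein_class.get("l{}".format(i)) for i in range(1, 9)]
    return separator.join(level for level in levels if level)


# Copy of the protein_class record shown above.
ampk_subfamily = {
    "protein_class_id": 409,
    "l1": "Enzyme",
    "l2": "Kinase",
    "l3": "Protein Kinase",
    "l4": "CAMK protein kinase group",
    "l5": "CAMK protein kinase CAMK1 family",
    "l6": "CAMK protein kinase AMPK subfamily",
    "l7": None,
    "l8": None,
}

path = classification_path(ampk_subfamily)
print(path)
```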
In [84]:
# Source:
print new_client.source.filter(src_short_name="ATLAS")
[{u'src_description': u'Gene Expression Atlas Compounds', u'src_short_name': u'ATLAS', u'src_id': 26}]
In [85]:
# Target component:
print new_client.target_component.get(375)
{u'component_id': 375, u'description': u'Reverse transcriptase/RNaseH', u'component_type': u'PROTEIN', u'sequence': u'PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKRKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIRVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLRTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNRGRQKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGID', u'accession': u'Q72547', u'protein_classifications': [{u'protein_classification_id': 646}], u'tax_id': 11676, u'organism': u'Human immunodeficiency virus 1', u'target_component_synonyms': [{u'component_synonym': u'Reverse transcriptase/RNaseH', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'pol', u'syn_type': u'GENE_SYMBOL'}]}
In [86]:
# ChEMBL ID Lookup: check if CHEMBL1 is a molecule, assay or target:
print new_client.chembl_id_lookup.get("CHEMBL1")['entity_type']
COMPOUND
In [87]:
# ATC class:
print new_client.atc_class.get('H03AA03')
{u'level1_description': u'SYSTEMIC HORMONAL PREPARATIONS, EXCL. ', u'level1': u'H', u'level2': u'H03', u'level3': u'H03A', u'level4': u'H03AA', u'level5': u'H03AA03', u'who_id': u'who1673', u'level4_description': u'Thyroid hormones', u'who_name': u'combinations of levothyroxine and liothyronine', u'level3_description': u'THYROID PREPARATIONS', u'level2_description': u'THYROID THERAPY'}