Curation
Interface

Application overview

Michał Nowotka
ChEMBL Group
EMBL-EBI

Overview

  1. Application architecture
  2. Python interface
  3. RESTful API
  4. Web interface demo
  5. Extras

Application
architecture

Languages used:

  • Python,
  • JavaScript,
  • HTML5,
  • CSS,
  • (Java, SQL, C++, PP),
  • [CoffeeScript, less]

Application
architecture

Network architecture: (thin) Client - Server

  • Presentation tier: JavaScript, HTML5, CSS
  • Application tier: Python (Django), third-party apps (Java, C++, PP)
  • Data tier: Python (Django ORM), SQL

Application
architecture

Model View Controller

  • Model: Django ORM (Python)
  • View: ICanHaz.js (JavaScript, browser side)
  • Controller: Django (Python)

Three accessibility
layers

  • Python modules
  • RESTful API
  • Web interface

Object Relational
Mapping

Technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language.

In simple words:

  • DB Table ➡ Class
  • DB Column ➡ Class attribute
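
The table-to-class mapping can be illustrated without Django. The following is a minimal sketch using only the Python standard library. The `Molecule` class, the `molecule` table and the `fetch_all` helper are made up for illustration; this is neither the ChEMBL schema nor how the Django ORM is implemented.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Molecule:
    # Each attribute corresponds to one column of the hypothetical "molecule" table.
    molregno: int
    pref_name: str


def fetch_all(conn, cls):
    """Load every row of the table named after `cls` into `cls` instances."""
    columns = ", ".join(f.name for f in fields(cls))
    table = cls.__name__.lower()
    rows = conn.execute(f"SELECT {columns} FROM {table}")
    return [cls(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecule (molregno INTEGER PRIMARY KEY, pref_name TEXT)")
conn.execute("INSERT INTO molecule VALUES (97, 'PRAZOSIN')")

molecules = fetch_all(conn, Molecule)
print(molecules[0].pref_name)  # PRAZOSIN
```

A real ORM adds much more on top of this (relations, lazy queries, type conversion), but the core idea is the same: rows become objects and columns become attributes.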

Example code


class CompoundStructures(ChemblCoreAbstractModel):

    molecule = models.OneToOneField(MoleculeDictionary)
    molfile = ChemblTextField(null=True)
    standard_inchi = ChemblCharField(max_length=4000)
    standard_inchi_key = ChemblCharField(unique=True)
    canonical_smiles = ChemblCharField(db_index=True)
    molformula = ChemblCharField(help_text="Molecular formula of compound")

    ...

                    

Class code is generated semi-automatically from the existing database.

Basic Usage


from chembl.models import MoleculeDictionary

molecule = MoleculeDictionary.objects.get(molregno=97)
assertEqual(molecule.pref_name, 'PRAZOSIN')
assertEqual(molecule.molecule_type, 'Small molecule')
                    

Filtering


Assays.objects.filter(curated_by__curated_by__startswith='Expert')
Assays.objects.filter(description__icontains='affinity')
Assays.objects.filter(assay_cell_type__startswith='CHO')
Assays.objects.filter(assay_tissue__endswith='Brain')
Assays.objects.filter(chembl__isnull=False).exclude(chembl__entity_type__exact='ASSAY')
Assays.objects.filter(updated_on__range=(start_date, end_date))
Assays.objects.filter(activity_count__isnull=False).exclude(activity_count__gte=5)
Assays.objects.filter(doc__doc_id=9964)
Assays.objects.filter(src__src_id=1).exists()
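
To give an intuition for what the `field__lookup` keywords above stand for, here is a simplified sketch of how such lookups could be turned into a SQL `WHERE` clause. The `LOOKUPS` table and the `to_where` function are hypothetical and are not Django's actual SQL compiler. The sketch covers only a handful of lookups on plain columns, so relation-spanning lookups such as `doc__doc_id` are not handled.

```python
# Hypothetical, simplified mapping from lookup suffixes to SQL operators.
LOOKUPS = {
    "exact": ("= ?", "{}"),
    "gte": (">= ?", "{}"),
    "startswith": ("LIKE ?", "{}%"),
    "endswith": ("LIKE ?", "%{}"),
    "contains": ("LIKE ?", "%{}%"),
}


def to_where(**conditions):
    """Translate keyword lookups into a WHERE clause plus bound parameters."""
    clauses, params = [], []
    for key, value in conditions.items():
        # "assay_cell_type__startswith" -> ("assay_cell_type", "startswith")
        column, _, lookup = key.partition("__")
        operator, pattern = LOOKUPS[lookup or "exact"]
        clauses.append(f"{column} {operator}")
        # Only string values are wrapped in a LIKE pattern; numbers pass through.
        params.append(pattern.format(value) if isinstance(value, str) else value)
    return " AND ".join(clauses), params


where, params = to_where(assay_cell_type__startswith="CHO", activity_count__gte=5)
print(where)   # assay_cell_type LIKE ? AND activity_count >= ?
print(params)  # ['CHO%', 5]
```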
                    

Basic relations


# test OneToOneFields:
assertEqual(molecule.compoundproperties.molecular_species, 'NEUTRAL')
assertEqual(molecule.moleculehierarchy.parent_molecule, molecule)
assertEqual(molecule.compoundstructures.standard_inchi_key, 'IENZQIKPVFGBNW-UHFFFAOYSA-N')

# test reverse ForeignKey relations:
act = molecule.activities_set.all()[0]
assertEqual(act.activity_type, 'ED50')
rec = molecule.compoundrecords_set.all()[0]

synonyms = molecule.moleculesynonyms_set.all()[0]
assertEqual(synonyms.synonyms, 'CP-12299')
                    

Many to Many relations back and forth


td = TargetDictionary.objects.get(pk=104088)
docs = td.docs.all()

doc = Docs.objects.get(pk=57482)
targets = doc.targetdictionary_set.all()
                    

Chemistry awareness


ctabs = CompoundMols.objects.with_substructure(smiles)
ids = ctabs.values_list('molecule_id').distinct()

ctabs = CompoundMols.objects.similar_to(smiles,simscore)
                    

More complicated stuff


# Is there a target type whose parent is not itself a known target type?
known_types = TargetType.objects.values_list('target_type', flat=True).distinct()
TargetType.objects.filter(parent_type__isnull=False).exclude(
    parent_type__in=known_types).exists()
                    

Real world usage


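import StringIO  # Python 2 standard library
from PIL import Image
from rdkit import Chem

# smileFromImage and OSRA_BINARIES_LOCATION are project-specific helpers.
def checkOSRA(molecule):
    img = molecule.compoundimages.png_500
    im = Image.open(StringIO.StringIO(img))
    canonical_smiles = molecule.compoundstructures.canonical_smiles
    smile = smileFromImage(img, OSRA_BINARIES_LOCATION, canonical_smiles)
    im.show()
    # Compare the stored SMILES with the one OSRA recognised in the image.
    return canonical_smiles == Chem.MolToSmiles(Chem.MolFromSmiles(smile[0]), True)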
                    

Tests

All examples are taken from the test.py file (2267 lines!). All classes, fields and relations are covered.


             python manage.py test chembl
         

All examples can be executed in an interactive shell:

             python manage.py shell
         

Tests

The shell can be configured to display the SQL statements executed by the middleware:


 python manage.py debugsqlshell
 >>> from chembl.models import MoleculeDictionary
 >>> molecules = MoleculeDictionary.objects.all()
 >>> molecules.count()
 SELECT COUNT(*)
 FROM "MOLECULE_DICTIONARY" [1.82ms]

 1254575

         

So, why not SQL?

  • There is nothing wrong with SQL
  • In fact, SQL is the only language for interacting with relational DBs
  • But it's not perfect in the ChEMBL case...

Problems with SQL

  • There are as many SQL dialects as there are DB engines
  • e.g. Oracle doesn't support LIMIT
  • Code with inline SQL is not portable across DB engines
  • Inline SQL can't be validated before it runs, so it's error-prone
  • When built from user-supplied parameters, inline SQL is exposed to injection attacks
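
The injection risk in the last bullet can be reproduced in a few lines. This is a minimal sketch using the standard-library `sqlite3` module; the `docs` table and its contents are invented for the demonstration and have nothing to do with the ChEMBL schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)", [(1, "first"), (2, "second")])

user_input = "0 OR 1=1"  # attacker-controlled value

# Unsafe: the value is pasted into the SQL text and changes the query's meaning.
unsafe = conn.execute(
    f"SELECT doc_id FROM docs WHERE doc_id = {user_input}"
).fetchall()

# Safer: the value is bound as a parameter and is only ever treated as data.
safe = conn.execute(
    "SELECT doc_id FROM docs WHERE doc_id = ?", (user_input,)
).fetchall()

print(len(unsafe), len(safe))  # 2 0
```

The interpolated query returns every row, while the bound query matches nothing. An ORM builds its statements with bound parameters, so application code never has to assemble SQL strings by hand.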

ORM

  • Provides a DB-agnostic interface
  • Takes care of generating valid SQL
  • Class objects can be de/serialised
  • Data migrations across DB engines are possible
  • Less sensitive to schema changes
  • In fact, schema changes can be made by changing the models
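
What a DB-agnostic interface means in practice can be sketched with the row-limit example: the caller asks for "the first n rows" and a backend-specific layer emits the right dialect. The `first_rows` function and its engine names are hypothetical; this is an illustration of the idea, not the SQL that Django actually generates.

```python
def first_rows(table, n, engine):
    """Return a query for the first `n` rows of `table` in the given dialect."""
    if engine == "postgresql":
        return f"SELECT * FROM {table} LIMIT {int(n)}"
    if engine == "oracle":
        # Classic Oracle idiom: filter on the ROWNUM pseudocolumn.
        return f"SELECT * FROM {table} WHERE ROWNUM <= {int(n)}"
    raise ValueError(f"unsupported engine: {engine}")


for engine in ("postgresql", "oracle"):
    print(first_rows("molecule_dictionary", 5, engine))
```

With the Django ORM the equivalent request is a queryset slice such as `MoleculeDictionary.objects.all()[:5]`, and the configured database backend decides which SQL to emit.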

Example - cross engine migration


python manage.py migrate --sourceDatabase=ora --targetDatabase=pg

Migrating chemtst to curation_interface
ChemblIdLookup [################################] 2010/2010
Version [################################] 1/1
Docs [################################] 48/48
Source [################################] 1/1
MoleculeDictionary [# ] 51/1281
                    

RESTful API

All model classes are automatically exposed as REST resources. Documentation is generated automatically as well!

Real-time examples

Basic API queries


http://localhost:8000/api/v1/moleculedictionary/?format=json

http://localhost:8000/api/v1/compoundstructures/2?format=json

http://localhost:8000/api/v1/compoundstructures/2.json

http://localhost:8000/api/v1/targetdictionary/set/11283;11292/
                    

Basic Filtering


/api/v1/moleculedictionary/?molregno__gte=3&format=json

/api/v1/moleculedictionary/?molregno__gte=3&structure_type__startswith=M&format=json

/api/v1/compoundstructures/?molformula__icontains=H20&format=json
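
Filter URLs like the ones above can also be assembled programmatically. The sketch below uses only the standard library; the `resource_url` helper is hypothetical, and `BASE_URL` assumes the local development server used in the earlier examples.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://localhost:8000/api/v1/"  # development server from the examples


def resource_url(resource, **filters):
    """Build a list URL for `resource` with the given filters, asking for JSON."""
    query = urlencode({**filters, "format": "json"})
    return urljoin(BASE_URL, f"{resource}/") + "?" + query


url = resource_url(
    "moleculedictionary", molregno__gte=3, structure_type__startswith="M"
)
print(url)
# http://localhost:8000/api/v1/moleculedictionary/?molregno__gte=3&structure_type__startswith=M&format=json
```

The filter keywords are the same `field__lookup` names used in the Python interface, which keeps the two access layers consistent.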
                    

Advanced Filtering


http://localhost:8000/api/v1/moleculedictionary/search/?q=TRIMETHOPRIM&format=json
                    

DEMO

Real-time examples

Thank you!

Questions?