Beaker

RDKit and OSRA in Bottle on Tornado

Michał Nowotka
ChEMBL Group
EMBL-EBI

Overview

  1. Motivation
  2. Ingredients
  3. Overview
  4. Examples
  5. Future

~$ whoami:

  • ChEMBL group staff member
  • Web Applications Developer
  • RDKit enthusiast and everyday user
  • Available via email, Skype, SO, GitHub, etc.

Disclaimer

  • This is a pet-project
  • Developed in free time
  • Unfinished
  • Proof-of-concept

Motivation

RDKit installation process:


Expectations:


        activate rdkit-virtualenv
        pip install rdkit
    


Reality:

  • 859 hits for "build" on rdkit-discuss
  • 498 hits for "install"

Motivation

  • Virtualenv is an essential tool for Python developers
  • Remember the PIL and Pillow case?
  • VMs and Docker are not (yet?) an answer

Motivation

Beaker is to RDKit what Solr is to Lucene.

Beaker is to RDKit what an aquarium is to fish.

  • Server platform and RDKit container
  • Provides cheminformatics tools
  • REST-like HTTP API
  • Easy to use from any programming language
  • Install Beaker on one machine instead of installing RDKit on many hardware/software configurations

Motivation

  • More and more people are using RDKit as a service
  • Why not try to standardise it?
  • Maybe even include it in the distribution...
  • So we don't have to reinvent the wheel all the time

Motivation

I'm into webservices anyway:
https://www.ebi.ac.uk/chemblws2


So why something different? Software stack!


  • Django ORM
  • Tastypie
  • Kilolines of code
  • And RDKit webservices can be dead simple!

RDKit and OSRA together?

  • Complementary Cheminformatics libraries
  • OSRA extends possible RDKit input formats
  • Both are Open
  • OSRA is even harder to install and available only in C++

Ingredients

  • OSRA - utility designed to convert graphical representations of chemical structures into SMILES or SD files.
  • RDKit - Cheminformatics and Machine Learning Software.
  • Bottle - fast, simple and lightweight WSGI micro web-framework for Python.
  • Tornado - Python web framework and asynchronous networking library.

Why this choice?

  • This needs to be lightweight.
  • But fast and efficient.
  • With a small number of small dependencies.
  • Well known, standard, virtualenv-friendly dependencies.
  • Small and simple codebase.
  • Generic, elegant, robust API

Overview

Format conversion:


  • ctab2smiles / smiles2ctab
  • ctab2inchi / inchi2ctab
  • ctab2image / image2ctab
  • inchi2inchiKey

Overview

  • All methods implemented as POST and some (x2ctab) as GET
  • For GET, parameters have to be base64 encoded
  • All methods support batch processing
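
The GET rule above can be sketched in Python. This is only an illustration under stated assumptions: the host `beaker` is the placeholder used in the demo commands, while the `<host>/<method>/<base64>` path layout and the use of the URL-safe base64 alphabet are guesses that should be checked against the repository. The helper names `encode_param` and `build_get_url` are hypothetical.

```python
import base64

# Placeholder host taken from the demo commands; replace with a real server.
BEAKER_URL = "http://beaker"


def encode_param(value: str) -> str:
    """Base64-encode a parameter so it can travel inside a URL.

    Assumption: the URL-safe alphabet is used, so '+' and '/' never
    appear in the path. The server may expect a different variant.
    """
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


def build_get_url(method: str, value: str) -> str:
    """Build a GET URL of the assumed form <host>/<method>/<base64>."""
    return f"{BEAKER_URL}/{method}/{encode_param(value)}"


smiles = "CNc1ncnc2c1ncn2Cc1cccc(F)c1F"
url = build_get_url("smiles2ctab", smiles)
print(url)
```

Encoding matters because SMILES and InChI strings contain characters such as `/`, `#`, `=` and `+` that would otherwise be misread as URL syntax.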

Demo


                        curl -X POST http://beaker/ctab2image -F filedata="@out.mol" > out.png
                        curl -X POST http://beaker/image2ctab --data-binary "@aspirin.jpg" > out.mol
                        curl -X POST http://beaker/smiles2ctab --data-binary "@a.smi"
                        curl -X POST http://beaker/smiles2ctab --data "CNc1ncnc2c1ncn2Cc1cccc(F)c1F"
                        curl -X POST http://beaker/inchi2inchiKey --data "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
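
The same kind of POST call can be made from Python with only the standard library. The sketch below is a rough equivalent of the `--data` style curl commands, not an official client: `build_post` and `call_beaker` are hypothetical helper names, `http://beaker` is the placeholder host from the demo, and sending one SMILES per line as a batch is an assumption based on the `.smi` example.

```python
import urllib.request

# Placeholder host taken from the demo commands; replace with a real server.
BEAKER_URL = "http://beaker"


def build_post(method: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying a raw body."""
    return urllib.request.Request(
        url=f"{BEAKER_URL}/{method}",
        data=payload,
        method="POST",
    )


def call_beaker(method: str, payload: bytes, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    request = build_post(method, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Assumed batch format: one SMILES per line, as in a .smi file.
batch = "\n".join(["CCO", "c1ccccc1"]).encode("ascii")
request = build_post("smiles2ctab", batch)
print(request.get_method(), request.full_url)
```

With a running server, `call_beaker("smiles2ctab", batch)` would return the response body as bytes; the snippet itself only builds the request, so it runs without network access.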
                    


Better example: Clippy

Potential use cases

  • Access from languages like JavaScript and Ruby
  • Web applications
  • Mobile apps (camera + OSRA + RDKit)
  • Small desktop apps (clippy)

Future

  • Different output formats: json, jsonp, xml
  • Implement stub methods
  • Compound descriptors: logP, TPSA, molWt, etc.
  • Editing molecules: addHs, removeHs, kekulize
  • Ring information, SSSR, sanitize...
  • IUPAC names
  • Pymol, matplotlib

Code!

Beaker code is available as a GitHub repository:
https://github.com/mnowotka/chembl_beaker

Thank you!

Questions?