Reverse engineering Marvin
Michał Nowotka
ChEMBL Group
EMBL-EBI
New Java vulnerability discovered
Unspecified vulnerability in the Java Runtime Environment (JRE) component in Oracle Java SE 7 Update 21 and earlier, 6 Update 45 and earlier, and 5.0 Update 45 and earlier, and OpenJDK 7, allows remote attackers to affect confidentiality, integrity, and availability via unknown vectors related to 2D. NOTE: the previous information is from the June 2013 CPU. Oracle has not commented on claims from another vendor that this issue allows remote attackers to bypass the Java sandbox via vectors related to “Incorrect image attribute verification” in 2D.
Consequences
Java enabled browsers are highly vulnerable
(thehackernews.com, 27-03-2013)
Firefox 26 Released With On Demand Java Plugin Feature
(omgubuntu.co.uk, 10-12-2013)
Apple updates Safari web plugin blocker to disable new Java vulnerability
(9to5mac.com, 29-08-2013)
- Chrome does not support Java 7 on the Mac platform anyway...
More consequences
JavaScript to the rescue
The Curation Interface detects Java availability and decides at runtime which version to load.
Compounds as JSON
- Smaller than images
- Highly compressible
- Better quality
- Web friendly
Use cases:
- Interactive web widgets
- Easily stored in DBs
- Webservices?
Problems with viewer
- Every web application is sandboxed by browser
- No way to access clipboard
- How to copy from viewer and paste to sketcher?
Solution - SO
Stack Overflow question:
How does Trello access the user's clipboard?
_.defer =>
  $clipboardContainer = $("#clipboard-container")
  $clipboardContainer.empty().show()
  $("<textarea id='clipboard'></textarea>")
    .val(@value)
    .appendTo($clipboardContainer)
    .focus()
    .select()
Append an invisible textarea, fill it with the molfile, set focus, and select the whole text when the user presses Ctrl.
Marvin for JS (sketcher) limitations
- Not all functionality present in Java version can be reimplemented in JavaScript
- Format conversion / 2D coords / stereo info...
- Need to be performed on the server side
- Webservices!
- ChemAxon requires separate licence to use their webservices!
- But at least they publish specification
Open source solution - Beaker
RDKit and OSRA in Bottle on Tornado
What is Beaker?
- A portable, lightweight webserver
- REST-speaking
- CORS-ready
- Wraps RDKit and OSRA
- Built on Bottle and Tornado
What does it do?
- Format conversion
- Compound recognition
- Image generation (including JSON)
- Fingerprints, descriptors
- Marvin for JS compatible webservices
Potential use cases
- Access from languages like JavaScript and Ruby
- Web applications
- Mobile apps (camera + OSRA + RDKit)
- Small desktop apps (clippy)
- Marvin Backend
- Part of webservices?
New webservices
- Released last week
- Different software stack: Java/Spring -> Python/Django
- Can run on Oracle, Postgres, (MySQL)
New webservices
Can run on any machine:
New webservices
Image generation:
- Improved quality
- Two engines: RDKit and Indigo
- Computing coordinates
- Dimensions
New webservices
JSONP and CORS support
Can be used from JavaScript
Bio.js ChEMBL component can be improved
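To illustrate what JSONP and CORS support means on the server side, here is a minimal sketch in Python. The function name `render_response` and the callback check are hypothetical and are not taken from the ChEMBL webservices code; only the mechanics (wrapping the JSON body in a callback, sending an `Access-Control-Allow-Origin` header) are standard.

```python
import json
import re

# Hypothetical check: accept only plain dotted JavaScript identifiers as callbacks.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*")


def render_response(payload, callback=None):
    """Return (headers, body) for a payload: JSONP if a callback is given, plain JSON otherwise."""
    body = json.dumps(payload)
    # CORS: this header lets scripts served from other origins read the response.
    headers = {"Access-Control-Allow-Origin": "*"}
    if callback is None:
        headers["Content-Type"] = "application/json"
        return headers, body
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("unsafe JSONP callback name")
    # JSONP: the browser loads this via a <script> tag and calls the function.
    headers["Content-Type"] = "application/javascript"
    return headers, "%s(%s);" % (callback, body)
```

JSONP works in older browsers but only for GET requests; CORS covers the remaining cases, so a service would typically offer both.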
New webservices
NoSQL approach to caching:
- New webservices are intended to be used outside ChEMBL
- They can use only public part of the DB schema
- No materialised views
- Requests can be expensive
- Caching is required
Cache characteristics
- Once cached, the response to a request won't change until the next ChEMBL release
- Cache should be shared across many production machines
- Independent of output format
- Available from Python, supported by EBI
- Fail-safe, with a timeout
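The fail-safe requirement can be sketched as a wrapper that treats a slow or broken cache as a miss. This is one possible way to get that behaviour, not necessarily how the ChEMBL webservices do it; `cached_call`, `cache_get`, and `compute` are hypothetical names, and the timeout is passed in as a parameter.

```python
from concurrent.futures import ThreadPoolExecutor

# Small shared pool so a slow cache lookup never blocks the request thread.
_pool = ThreadPoolExecutor(max_workers=4)


def cached_call(cache_get, compute, key, timeout=1.0):
    """Try the cache; on a miss, an error, or a timeout, fall back to compute()."""
    future = _pool.submit(cache_get, key)
    try:
        value = future.result(timeout=timeout)
    except Exception:
        # Covers both a timeout and any exception raised by the cache backend.
        value = None
    if value is not None:
        return value
    return compute()
```

With this shape, a cache outage slows requests down by at most the timeout but never makes them fail.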
Our implementation
- Key-value store built on MongoDB
- Key is an MD5 hash of certain request parameters
- Value is a base64-encoded, zlib-compressed pickle of a Django QuerySet
- Values are divided into 16 MB chunks to stay within MongoDB's document size limit
- Timeout set to 1 second.
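The key and value encoding listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: a plain dict stands in for MongoDB, an ordinary Python object stands in for the Django QuerySet, and all function names are hypothetical.

```python
import base64
import hashlib
import pickle
import zlib

CHUNK_SIZE = 16 * 1024 * 1024  # 16 MB chunks, as on the slide

store = {}  # stand-in for the MongoDB collection (assumption)


def make_key(params):
    """MD5 hex digest of the request parameters, independent of their order."""
    canonical = repr(sorted(params.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def encode_value(obj, chunk_size=CHUNK_SIZE):
    """Pickle, zlib-compress, base64-encode, then split into fixed-size chunks."""
    blob = base64.b64encode(zlib.compress(pickle.dumps(obj)))
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]


def decode_value(chunks):
    """Reverse encode_value: join chunks, base64-decode, decompress, unpickle."""
    return pickle.loads(zlib.decompress(base64.b64decode(b"".join(chunks))))


def cache_set(params, obj, chunk_size=CHUNK_SIZE):
    store[make_key(params)] = encode_value(obj, chunk_size)


def cache_get(params):
    chunks = store.get(make_key(params))
    return None if chunks is None else decode_value(chunks)
```

Because unpickling executes code embedded in the data, this scheme is only appropriate when the cache is written exclusively by trusted services.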
How to monitor cache?
Sentry!
- Realtime event logging and aggregation platform
- Specializes in monitoring errors
- Alternative to standard user feedback loop.
- Can be used with any programming language.
Sentry is just one part of the deployment environment
Other parts
- Fabric
- pip
- virtualenv / virtualenvwrapper
- Rsync
Relationship between components
Deployment in practice
Almost zero downtime (Apache restart)
Curation Interface update
- GitHub-like image comparison
- Document segmentation
- Use case scenarios
Document segmentation
CI use cases