After a couple of years of publishing news about Cheméo on the company website, I decided to create a dedicated blog for Cheméo. This will allow me to dive a bit deeper into the scientific and technical details of running and developing Cheméo.

A website like Cheméo is not as simple as it looks. In fact, I am a fan of the motto "Hide the structural complexity to enhance the functional simplicity". Sometimes the problems you deal with require structural complexity, but you still want the end user to experience functional simplicity.

Here is an example of complexity hidden behind simplicity. You first search for compounds with a critical temperature below 550 K and a boiling point above 370 K, and you want to display the melting point when it is available. After getting the list of compounds in less than 100 ms, you click on dipropyl sulphone and its data page loads in less than 75 ms.

To get an extremely fast search, you need an index built on a kind of columnar datastore: you basically search through sorted lists of property values. Imagine a series of big arrays, one per property; for example, one array contains all the critical temperatures, sorted. This way you can directly get all the compounds with a critical temperature below 550 K. Then you do the same for the boiling points and intersect the two results to get the identifiers of the matching compounds. You are quickly going through the yellow (Tc) and blue (Tb) columns, as shown in the image on the right.
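A minimal sketch of the sorted-column idea, assuming each property column is held in memory as a sorted array of values with a parallel array of compound identifiers. The class name `ColumnIndex` and the rounded sample values are illustrative only, not Cheméo's actual implementation. Each condition becomes a binary search into one sorted column, and the matching identifier sets are intersected.

```python
from bisect import bisect_left, bisect_right


class ColumnIndex:
    """One sorted column: parallel arrays of property values and compound ids (hypothetical)."""

    def __init__(self, pairs):
        # pairs: iterable of (value, compound_id); sorted once, at index build time.
        ordered = sorted(pairs)
        self.values = [value for value, _ in ordered]
        self.ids = [cid for _, cid in ordered]

    def below(self, limit):
        # First position with value >= limit; everything before it is < limit.
        return set(self.ids[:bisect_left(self.values, limit)])

    def above(self, limit):
        # First position with value > limit; everything from there on is > limit.
        return set(self.ids[bisect_right(self.values, limit):])


# Illustrative, rounded values in K: (value, compound id).
tc = ColumnIndex([(508.1, "acetone"), (647.1, "water"), (469.7, "pentane"), (540.2, "heptane")])
tb = ColumnIndex([(329.2, "acetone"), (373.1, "water"), (309.2, "pentane"), (371.6, "heptane")])

# Tc < 550 K AND Tb > 370 K: one slice per sorted column, then a set intersection.
matches = tc.below(550.0) & tb.above(370.0)
print(sorted(matches))
```

With these sample values only heptane satisfies both conditions. Sorting happens once when the index is built, so each query costs a binary search per condition plus the intersection, rather than a scan over every compound.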

But when you want to display the data page of a compound, you do not want to scan through all the arrays of data and collect the values belonging to that compound. You want to get all the data about the given compound directly, in one go, and for this you need a kind of document/record-based datastore. You pull one record containing all the data, as shown on the left with Tc, Tb and other details such as synonyms, drawing and chemical formula.
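The record side can be sketched in the same spirit, assuming an in-memory dictionary keyed by compound identifier. The field names (`Tc`, `Tb`, `Tm`, `formula`, `synonyms`) and the helpers `data_page` and `display_row` are hypothetical; the point is only that rendering a data page is a single key lookup rather than a pass over every column.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical record store: one self-contained document per compound id.
records: Dict[str, Dict[str, Any]] = {
    "heptane": {
        "name": "Heptane",
        "formula": "C7H16",
        "synonyms": ["n-heptane"],
        "Tc": 540.2,
        "Tb": 371.6,
        "Tm": 182.6,
    },
    "water": {
        "name": "Water",
        "formula": "H2O",
        "synonyms": ["dihydrogen oxide"],
        "Tc": 647.1,
        "Tb": 373.1,
        "Tm": 273.15,
    },
}


def data_page(compound_id: str) -> Dict[str, Any]:
    # A single hash lookup returns every stored property of the compound.
    return records[compound_id]


def display_row(compound_id: str, extra: str) -> Tuple[str, Optional[Any]]:
    # For a result list: the compound name plus an optional extra property
    # (e.g. the melting point "Tm"), or None when it is not available.
    record = records[compound_id]
    return record["name"], record.get(extra)


page = data_page("heptane")
print(page["formula"], page["Tc"], page["Tb"])
print(display_row("water", "Tm"))
print(display_row("heptane", "Pc"))
```

In this sketch each value is stored twice, once in a sorted column for searching and once inside its compound's record for display; that duplication is the structural complexity paid for fast searches and fast data pages.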

Of course, when you search and browse the Cheméo website, you just notice that it is fast, in fact faster than 99.9% of all the websites you visit. To achieve this, a lot of thought and experience went into the design of Cheméo, while in the end giving you the feeling that the website is simple. "Hide the structural complexity to enhance the functional simplicity".

This blog will be about all these technical details, the science behind Cheméo, and also reviews of scientific publications.

Update: Today, a discussion about MonetDB came up on a programmer news website. MonetDB is one of the first columnar datastores, and the publications associated with this project are very interesting to read.

Created: Tue 24 May 2016
Updated: Tue 14 June 2016
By Loïc d'Anterroches.

