Materials Platform
for Data Science
based on the PAULING FILE experimental inorganic database

The MPDS presents materials data extracted from scientific publications by the PAULING FILE project team. Even today this task cannot be fully automated, so we had to manually process and systematize around half a million articles in physics, chemistry, materials science, environmental science, engineering, geology, etc. The results are now available online.

§1. Introductory examples

If a picture is worth a thousand words, a screencast is worth a thousand pictures: see Fig. 1-3. Our general intention was to create a user interface that does not need a tutorial. Do you need a tutorial to use your smartphone or to speak to your robot assistant? So why should a working tool for scientists be different?

Fig. 1. MPDS platform: plotting the results of the search as a matrix.

The problem we faced while developing the MPDS is that only a small subset of our users are really curious. The others simply do not have enough time, which is quite understandable in our post-post-you-name-it times: we are sinking in an ocean of data. So if one needs a tutorial after all to dive in quickly, here it is.

Fig. 2. MPDS platform: using the graph to refine the search.

At the end of the day, we simply invite all materials scientists to try the MPDS. It can be used perfectly well for free; we only sell an additional level of comfort, like premium seating, provided the customer likes our approach.

Fig. 3. MPDS platform: complex search and redirection to the source article.

Another difficulty with online tutorials is that they can never be complete. Online software is constantly updated, bugfixed, and improved; figuratively speaking, it lives its own life on the internet. Therefore any tutorial eventually becomes outdated. So if something does not work as expected or differs from this tutorial, we ask for the reader's understanding and patience in advance.


The PAULING FILE is a relational database for materials scientists, grouping crystallographic data, phase diagrams, and physical properties of inorganic crystalline substances within a single framework. Its focus is on experimental observations, and the data are processed from the original publications, covering the world scientific literature from 1891 to the present. Each individual crystal structure, phase diagram, or physical property database entry originates from a particular publication.

The PAULING FILE was founded by Pierre Villars and Shuichi Iwata in 1993. The two-time Nobel laureate Linus Pauling personally endorsed the new project and gave explicit written permission to use his name. The term "file" is an old-fashioned slang designation for a database, and it has been kept unchanged.

Today the PAULING FILE project is relatively well known: on the order of a thousand publications already refer to it. Its foundations, schema design, and some data-centric observations are published e.g. in Villars 2004, Villars 2008, Xu 2011, Kong 2012, and Villars 2013. In October 2019, Pierre Villars, the main founder of the PAULING FILE project, received the prestigious NIMS Award for fundamental research supporting data-driven materials development.

The MPDS is not the only product based on the PAULING FILE. There are many others, such as SpringerMaterials, ICDD PDF, NIMS AtomWork, MedeA, etc. More information can be found at the PAULING FILE website.

§3. MPDS platform

§3.1. Overview

The MPDS platform is an online edition of the PAULING FILE materials database. All the data are presented through two online interfaces: a browser-based graphical user interface (GUI) and an application programming interface (API). The GUI is described here, whereas programmatic usage is covered in the API section.

Full access to all the data in all the supported formats (CIF, PDF, PNG, BibTeX, etc.) is provided by subscription. Free access is also possible, although limited. In addition, some parts of the data are open-access, namely: (a) cell parameters - temperature diagrams and cell parameters - pressure diagrams, (b) all data for compounds containing both Ag and K, (c) selected data for binary compounds, (d) in-house data created internally at the MPDS and clearly marked as such (not peer-reviewed in the traditional sense).

§3.2. Search criteria and modes

Data on the MPDS platform can be searched by 14 criteria: 8 in physics or chemistry (materials classes, physical properties, chemical elements, chemical formulae, space groups, crystal systems, prototypes, and atomic environments) and 6 in bibliography (publication author, years, journal, geography, organization, and DOI). There are two search modes: simple and advanced.

In the simple mode, different search terms can all be typed into a single input field (see Fig. 4). Here the 5 most frequently used criteria are supported: materials classes, physical properties, chemical elements, chemical formulae, and crystal systems. The search keywords will be recognized and attributed to the correct criteria automatically.

Fig. 4. Simple (one input field) mode of search.

In the advanced mode, each search criterion has its own input field. To use it, either click the middle search menu button or click the criteria boxes shown at the right of the results pages. Let us get acquainted with the meaning and proper usage of each search criterion.

§3.3. Materials classes

In this category various materials classes are collected, ranging from technical terms to physical categories, chemical names, element counts, periodic table groups, some isotope names, etc. There are many auxiliary terms applicable only to specific domains, e.g. cell-only, disordered, and non-disordered are valid for crystalline structures (S-entries). Another example: the term ab initio literature refers to data taken from theoretical first-principles modeling papers. Moreover, the majority of the known mineral names are supported, e.g. perovskite, baddeleyite, stishovite, yeelimite, etc. Five special (arity) classes, namely unary, binary, ternary, quaternary, and quinary, restrict the number of distinct elements in the results. Some frequently occurring terms are collected below alphabetically (refresh the page to see the other examples).

Listing 1. Some frequently occurring materials classes.

Finally, the MPDS supports the special classes isopolyhedral, dipolyhedral, and tripolyhedral, denoting the number of different atomic environment types in the crystalline structure, i.e. one, two, and three, respectively. See the section "Atomic environments" below for more details.

§3.4. Physical properties

All the supported physical properties are given by the MPDS hierarchy. A search for a higher-level property includes all its subordinate properties in the results. In addition, even more general terms like permittivity or pressure are supported: all physical properties containing these terms in their names will be found.

Some of the physical properties in the hierarchy support numerical searches. For these, the exact name of the property should be used together with the less-than or greater-than sign and the numerical value of interest (in SI units). Example: isothermal bulk modulus > 300 (the value is assumed to be in GPa).
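The structure of such a numerical query can be illustrated with a minimal Python sketch. It is not the MPDS parser: the names `NumericQuery`, `parse_numeric_query`, and `satisfies` are hypothetical, and the only assumption taken from the text is the pattern "exact property name, less or more sign, number".

```python
import re
from typing import NamedTuple


class NumericQuery(NamedTuple):
    prop: str     # exact property name, as typed
    op: str       # "<" or ">"
    value: float  # threshold, in the units assumed for that property


# Property name, then a "less" or "more" sign, then a number.
_QUERY_RE = re.compile(
    r"^\s*(?P<prop>[^<>]+?)\s*(?P<op>[<>])\s*(?P<value>[-+]?\d+(?:\.\d+)?)\s*$"
)


def parse_numeric_query(text: str) -> NumericQuery:
    """Split a query such as 'isothermal bulk modulus > 300' into its parts."""
    match = _QUERY_RE.match(text)
    if match is None:
        raise ValueError(f"not a numerical property query: {text!r}")
    return NumericQuery(match["prop"], match["op"], float(match["value"]))


def satisfies(query: NumericQuery, measured: float) -> bool:
    """Check one measured value against the parsed condition."""
    if query.op == ">":
        return measured > query.value
    return measured < query.value


query = parse_numeric_query("isothermal bulk modulus > 300")
print(query.prop, query.op, query.value)
```

A value of 350 would pass `satisfies(query, 350.0)`, while 100 would not; the sketch leaves unit handling entirely to the caller.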

There is also a general property keyword, physical properties. It can be found inside the data refinement box in the GUI.

§3.5. Chemical elements

Chemical elements can be typed as names or symbols (e.g. copper or Cu). They can be combined arbitrarily in searches, using spaces, commas, or dashes as separators. By default, an equal or greater number of elements is implied, e.g. the results for Cd-O-S may contain not only Cd, O, and S, but also Tl, H, N, K, etc.

Important: to restrict the element count, one of the arity materials classes unary, binary, ternary, quaternary, or quinary should be added, e.g. Cd-O-S ternary. We receive a lot of complaints that searching for e.g. aluminium yields many aluminium compounds, but not the data for pure aluminium. Here is the solution: just add the unary class.
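The difference between the default "equal or greater" matching and the arity-restricted matching can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the MPDS implementation: the function names are hypothetical, and for brevity the sketch accepts element symbols only, not element names.

```python
import re

# Arity classes named in the text, mapped to the exact number of distinct elements.
ARITY = {"unary": 1, "binary": 2, "ternary": 3, "quaternary": 4, "quinary": 5}


def parse_element_query(text: str):
    """Return (set of required element symbols, exact element count or None)."""
    # Spaces, commas, and dashes are all accepted as separators.
    tokens = [t for t in re.split(r"[\s,\-]+", text.strip()) if t]
    arity = None
    elements = set()
    for token in tokens:
        if token.lower() in ARITY:
            arity = ARITY[token.lower()]
        else:
            elements.add(token)
    return elements, arity


def matches(compound_elements: set, query: str) -> bool:
    """True if a compound with the given elements would satisfy the query."""
    required, arity = parse_element_query(query)
    if not required <= compound_elements:
        return False  # some requested element is missing
    # Without an arity class, extra elements are allowed.
    return arity is None or len(compound_elements) == arity


print(matches({"Cd", "O", "S", "H"}, "Cd-O-S"))
print(matches({"Cd", "O", "S", "H"}, "Cd-O-S ternary"))
print(matches({"Al"}, "Al unary"))
```

Under these assumptions, a compound containing Cd, O, S, and H satisfies Cd-O-S but not Cd-O-S ternary, and only elemental aluminium satisfies Al unary.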

§3.6. Chemical formulae

In chemical formulae the order of elements does not matter. However, the results will contain the chemical formulae with the standard order of elements (according to their electronegativity). For instance, the most frequently occurring chemical formulae are listed below alphabetically.

Listing 2. The most frequently occurring chemical formulae.
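That the order of elements in a formula does not matter can be illustrated by reducing each formula to its composition before comparing. The following Python sketch is an assumption-laden illustration, not the MPDS code: the helper `composition` is hypothetical, it handles only flat formulae without brackets, and it does not attempt to reproduce the standard element order used in the results.

```python
import re

# One element symbol followed by an optional (possibly fractional) amount.
_TOKEN = r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?"


def composition(formula: str) -> dict:
    """Map each element symbol in a flat formula to its total amount."""
    if not re.fullmatch(f"(?:{_TOKEN})+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, amount in re.findall(_TOKEN, formula):
        # A missing amount means one atom of that element.
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


# Two spellings of the same compound compare equal once reduced to compositions.
print(composition("CaTiO3") == composition("O3TiCa"))
print(composition("Ba2Cu3YO6.3"))
```

Comparing dictionaries rather than strings is the design choice here: any permutation of the same elements and amounts yields the same dictionary.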

The so-called anonymous chemical formulae are also supported. These are chemical formulae in which the element names are denoted by the letters A, B, C, and D. Only binary, ternary, and quaternary compounds are supported for such querying (i.e. the letters E, F, etc. cannot be used, although the elements of defects are not counted). The order of the atomic fractions is arbitrary; however, the anonymous chemical elements should always be given alphabetically. Examples: A2B, ABC3, etc.
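The rules for anonymous formulae can be written down as a small validity check. The Python sketch below encodes only what is stated above (letters A to D, two to four distinct letters, letters in alphabetical order, arbitrary coefficients); the function name is hypothetical, and whether the platform accepts a formula that skips a letter, such as AC, is not stated in the text and is not checked here.

```python
import re

# One placeholder letter followed by an optional coefficient.
_ANON_TOKEN = re.compile(r"([A-D])(\d+(?:\.\d+)?)?")
_ANON_FORMULA = re.compile(r"(?:[A-D](?:\d+(?:\.\d+)?)?)+")


def is_anonymous_formula(text: str) -> bool:
    """Check a candidate anonymous formula such as 'A2B' or 'ABC3'."""
    if not _ANON_FORMULA.fullmatch(text):
        return False  # contains something other than A-D letters and numbers
    letters = [m.group(1) for m in _ANON_TOKEN.finditer(text)]
    # Strictly increasing letters: alphabetical order and no repeats.
    in_order = all(a < b for a, b in zip(letters, letters[1:]))
    # Only binary, ternary, and quaternary compounds are supported.
    return in_order and 2 <= len(letters) <= 4


for candidate in ("A2B", "ABC3", "BA2", "ABCDE", "A"):
    print(candidate, is_anonymous_formula(candidate))
```

In this sketch BA2 is rejected because the letters are not alphabetical, ABCDE because the letter E is not allowed, and A because a single letter is not a binary, ternary, or quaternary pattern.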

§3.7. Crystal systems and space groups

All seven crystal systems and 230 space groups are fully supported. A space group can be specified by its number or its international short symbol. The full list of crystal systems and space groups can be found e.g. in Wikipedia. Note that crystal systems, space groups, and prototype systems (see below) are mutually exclusive, i.e. they cannot be combined in one search query.
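Since each space group belongs to exactly one crystal system, a space group number already determines the crystal system. The Python sketch below shows this lookup using the standard numbering ranges of the 230 space groups; the function name `crystal_system` is hypothetical and the code is not part of the MPDS.

```python
from bisect import bisect_left

# Last space group number of each crystal system in the standard numbering.
_UPPER_BOUNDS = [2, 15, 74, 142, 167, 194, 230]
_SYSTEMS = [
    "triclinic",
    "monoclinic",
    "orthorhombic",
    "tetragonal",
    "trigonal",
    "hexagonal",
    "cubic",
]


def crystal_system(space_group_number: int) -> str:
    """Return the crystal system of a space group given by its number (1-230)."""
    if not 1 <= space_group_number <= 230:
        raise ValueError("space group numbers run from 1 to 230")
    # Index of the first range whose upper bound is not below the number.
    return _SYSTEMS[bisect_left(_UPPER_BOUNDS, space_group_number)]


for number in (2, 62, 123, 167, 194, 225):
    print(number, crystal_system(number))
```

For instance, space group 225 falls in the cubic range and space group 123 in the tetragonal one.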

§3.8. Prototypes

Prototype crystal systems are supported in two notations: Strukturbericht and formula-space-group-based. The first notation is an old crystallographic classification system still sometimes used in the scientific literature (see the listing below). The second notation is a combination of the chemical formula, the Pearson symbol, and the space group number. For instance, the most common prototype in the world literature is NaCl cF8 225, with about 40,000 hits. Other important structural prototypes are e.g. cubic perovskite CaTiO3 cP5 221, zincblende ZnS cF8 216, superconducting cuprate Ba2Cu3YO6.3 tP14 123, etc.
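The formula-space-group-based notation can be taken apart mechanically, as the following Python sketch suggests. It is an illustration only: the names `Prototype` and `parse_prototype` are hypothetical, and the consistency check between the first Pearson letter (crystal family) and the space group number relies on the standard numbering ranges rather than on anything specific to the MPDS.

```python
import re
from typing import NamedTuple


class Prototype(NamedTuple):
    formula: str      # chemical formula of the prototype
    pearson: str      # Pearson symbol, e.g. cF8
    space_group: int  # space group number, 1-230


# Crystal family letter of the Pearson symbol -> range of space group numbers.
_FAMILY_RANGES = {
    "a": (1, 2),      # triclinic (anorthic)
    "m": (3, 15),     # monoclinic
    "o": (16, 74),    # orthorhombic
    "t": (75, 142),   # tetragonal
    "h": (143, 194),  # trigonal and hexagonal
    "c": (195, 230),  # cubic
}

# Formula, Pearson symbol (family, centering, atoms per cell), space group number.
_PROTO_RE = re.compile(r"^(\S+)\s+([amothc][PSABCIFR]\d+)\s+(\d{1,3})$")


def parse_prototype(text: str) -> Prototype:
    """Parse a prototype string such as 'NaCl cF8 225'."""
    match = _PROTO_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a prototype designation: {text!r}")
    formula, pearson, number = match.group(1), match.group(2), int(match.group(3))
    low, high = _FAMILY_RANGES[pearson[0]]
    if not low <= number <= high:
        raise ValueError(f"Pearson symbol {pearson} does not fit space group {number}")
    return Prototype(formula, pearson, number)


for text in ("NaCl cF8 225", "CaTiO3 cP5 221", "ZnS cF8 216", "Ba2Cu3YO6.3 tP14 123"):
    print(parse_prototype(text))
```

The digits at the end of the Pearson symbol give the number of atoms in the unit cell, so `int(parse_prototype("NaCl cF8 225").pearson[2:])` yields 8.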

There are nearly 14,000 distinct prototypes, including about 250 Strukturbericht symbols (see below). Although it is a very old notation, Strukturbericht is still popular in the literature.

Listing 3. All Strukturbericht symbols.

§3.9. Atomic environments

The atomic environments in crystalline structures are described as polyhedra (e.g. TiO6 or HgX12). It is possible to search the entire MPDS data by the type and the atomic composition of these polyhedra. The most frequently occurring polyhedral types are shown below, sorted by the number of vertices (i.e. the coordination number, CN, of the central atom):

Listing 4. MPDS data polyhedral types.

More details are provided in the work of Daams et al. (1992). When searching for atomic environments, particular chemical symbols can also be given. In this category, the first chemical symbol given is taken as the center of the polyhedron; it makes no sense to attach a numerical coefficient to it. The next chemical symbol is taken as the vertex type, and here a numerical coefficient is properly supported. The center and vertex atoms can be written together or separated by a space or a minus sign. The X symbol stands for any chemical element. Consider the following examples: (a) U-center, any CN, any vertices, (b) any center, any CN, Se-vertices, (c) U-center, CN = 6, O-vertices, and (d) U-center, CN = 7, any vertices.
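The reading rules for atomic environment queries can be sketched as a small parser. This Python sketch is hypothetical and not the MPDS implementation. In particular, the query spellings UX, XSe, UO6, and UX7 used for cases (a) to (d) are assumptions modeled on the TiO6 and HgX12 notation; the exact strings accepted by the platform may differ.

```python
import re
from typing import NamedTuple, Optional


class Environment(NamedTuple):
    center: Optional[str]  # None stands for any element (the X symbol)
    vertex: Optional[str]  # None stands for any element (the X symbol)
    cn: Optional[int]      # None stands for any coordination number


# Center symbol without a coefficient, then an optional separator (space or
# minus sign), then a vertex symbol with an optional coefficient.
_ENV_RE = re.compile(r"^([A-Z][a-z]?)(?:[\s\-]?([A-Z][a-z]?)(\d+)?)?$")


def _any_or(symbol: Optional[str]) -> Optional[str]:
    """Translate a missing symbol or the wildcard X into None."""
    return None if symbol in (None, "X") else symbol


def parse_environment(text: str) -> Environment:
    """Parse a polyhedron query such as 'TiO6', 'HgX12', or 'U-O6'."""
    match = _ENV_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an atomic environment query: {text!r}")
    center, vertex, cn = match.groups()
    return Environment(_any_or(center), _any_or(vertex), int(cn) if cn else None)


# Assumed spellings of the cases (a)-(d) from the text.
for text in ("UX", "XSe", "UO6", "UX7"):
    print(text, parse_environment(text))
```

A coefficient after the center symbol, as in U2O6, is rejected by this sketch, in line with the remark that such a coefficient makes no sense.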

§3.10. Bibliography

Since all the MPDS data were manually excerpted from peer-reviewed articles, they are searchable by the corresponding author names, publication years, journal issues, pages, DOIs, geography, etc. This information can also be used for citing. Citing the MPDS is generally desirable but not obligatory, as all the data already have their own publishers' citation information (DOIs etc.).

Note that the MPDS data are citable per se via permanent URLs for the entries, for the distinct phases, and for the original source articles. Even if the DOI minted by the publisher is no longer valid, the original source article can still be referenced on the MPDS. We clearly understand the importance of valid scientific references and guarantee the stability of these URLs in the future, even if our online server technology enters a new lifecycle.

Moreover, there are a number of literally lost papers, which were printed many years ago, never digitized by their publishers, and eventually lost, so that only we keep a digital copy of them, plus the extracted data entries. An example of such a lost paper is [20131]; we wish our reader good luck in finding its printed original, not to mention an electronic document.

§4. Administrative interface

Account management on the MPDS platform allows certain privileged user accounts (called super-admins) to create and manage other user accounts. This is done as outlined below. First, the super-admin must (of course!) be authenticated. Then the special management web page can be opened.

Fig. 5. MPDS platform: user login and rich administrative interface.

The screencast above also shows the password-less login via a secret one-off link emailed on request. This is very convenient, since our users do not have to remember any passwords (and we hate passwords!). Of course, passwords can still be used as an option.

On top of that, we also support so-called external OAuth logins, where a well-known online service, such as LinkedIn or ORCID, confirms the identity of our user. For that, of course, the user needs to be logged in to that service. Within the OAuth protocol, no information other than the name and the email is shared with the MPDS.

This tutorial is a work in progress. We thank the reader for their time and interest! Any questions or feedback are very welcome and greatly appreciated.
