

High-Level Architecture

This page provides a high-level description of the CM-Well hardware, software and information architecture.

Information Architecture

The CM-Well information is modeled as triples in an RDF repository. (See CM-Well Data Paradigms to learn more.) Each triple is composed of a subject, a predicate and an object. For example, companyA -> hasParentCompany -> companyB.
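As an illustration only, the sketch below models a triple as a small Python record and prints it in N-Triples, a standard line-based RDF text format. The `http://example.org/` URIs are hypothetical placeholders for the `companyA` example above, not real CM-Well identifiers.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject -> predicate -> object."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid shadowing the built-in "object"

    def to_ntriples(self) -> str:
        # In N-Triples, URIs are written in angle brackets and each
        # statement ends with " ."
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


EX = "http://example.org/"  # hypothetical namespace
t = Triple(EX + "companyA", EX + "hasParentCompany", EX + "companyB")
print(t.to_ntriples())
# <http://example.org/companyA> <http://example.org/hasParentCompany> <http://example.org/companyB> .
```

This sketch handles only URI objects; a literal object such as a name string would be written in quotes instead of angle brackets.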

The repository is conceptually a graph, with nodes representing subjects and objects, and edges representing predicates. Some of the graph nodes represent entities, such as organizations or people, while others are simple values, such as a name string or a date.
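The sketch below (plain Python, hypothetical data) builds this graph view from a list of triples: each subject gets an adjacency list of outgoing (predicate, object) edges, and a small `Literal` wrapper, introduced here purely for illustration, marks objects that are simple values rather than entities.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A simple value (name string, date, ...) rather than an entity."""
    value: str


# Hypothetical sample data: entities are plain strings, simple values are Literals.
triples = [
    ("companyA", "hasParentCompany", "companyB"),
    ("companyA", "hasName", Literal("Company A Ltd.")),
    ("companyB", "hasName", Literal("Company B Inc.")),
    ("companyB", "foundedOn", Literal("1990-05-01")),
]

# Adjacency list: each subject node maps to its outgoing (predicate, object) edges.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

# Entity nodes are all subjects plus any object that is not a Literal.
entity_nodes = {s for s, _, _ in triples} | {
    o for _, _, o in triples if not isinstance(o, Literal)
}
value_nodes = {o for _, _, o in triples if isinstance(o, Literal)}

print(sorted(entity_nodes))  # ['companyA', 'companyB']
print(len(value_nodes))      # 3
```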

To best serve CM-Well's use cases, information is not retrieved at the triple level, but rather at the infoton level. An infoton is an information object representing an entity (such as an organization, instrument, quote or person) and its attributes (the subject entity's predicates and objects).

(Infotons are similar to objects in UML, or to entities in an RDBMS entity-relationship model.)
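A minimal sketch of the triple-to-infoton view, assuming an infoton can be modeled as a plain dictionary: triples are grouped by subject, so each subject entity becomes one infoton whose fields are its predicates. The sample data is hypothetical, and the sketch says nothing about how CM-Well stores infotons internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Infoton = Dict[str, List[str]]  # field (predicate) -> values (objects)


def to_infotons(triples: List[Triple]) -> Dict[str, Infoton]:
    """Group triples by subject: one infoton per subject entity."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        # RDF allows several objects for the same predicate, so fields are lists.
        grouped[subject][predicate].append(obj)
    # Convert the nested defaultdicts into plain dicts.
    return {subject: dict(fields) for subject, fields in grouped.items()}


triples = [
    ("companyA", "hasName", "Company A Ltd."),
    ("companyA", "hasParentCompany", "companyB"),
    ("companyA", "hasSector", "Banking"),
    ("companyA", "hasSector", "Insurance"),
    ("companyB", "hasName", "Company B Inc."),
]

infotons = to_infotons(triples)
print(infotons["companyA"]["hasSector"])  # ['Banking', 'Insurance']
print(len(infotons))                      # 2
```

Retrieving `infotons["companyA"]` returns the entity together with all of its attributes in one lookup, which is the access pattern the infoton level is meant to serve.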

The CM-Well Grid

Each CM-Well system is contained within a single data center and has a grid architecture. This means that the system is composed of several identical machines (a.k.a. nodes), which enables parallel processing of multiple requests, distributed database management, and redundancy in case of failure.

Every data item has a replication factor of 3; that is, it is stored as 3 copies on different machines. If one machine fails, the other machines take over seamlessly. Once the machine is back online, data is copied to it so that full replication is restored.
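The toy model below illustrates the behavior just described: each item is written to 3 distinct nodes, reads are served by any live replica, and writes missed by a failed node are replayed when it comes back. It is an assumption-laden sketch, not how CM-Well or its storage packages implement replication; the consecutive-node placement and the queue of missed writes (similar in spirit to Cassandra's hinted handoff) are illustrative choices, and all names are hypothetical.

```python
from typing import Dict, List, Set

REPLICATION_FACTOR = 3


class ReplicatedStore:
    """Toy model of a grid where every item is stored on 3 distinct nodes."""

    def __init__(self, nodes: List[str]) -> None:
        if len(nodes) < REPLICATION_FACTOR:
            raise ValueError("need at least REPLICATION_FACTOR nodes")
        self.nodes = nodes
        self.data: Dict[str, Dict[str, str]] = {n: {} for n in nodes}
        self.down: Set[str] = set()
        # Writes a down node has missed, replayed when it comes back online.
        self.missed: Dict[str, Dict[str, str]] = {n: {} for n in nodes}

    def replicas(self, key: str) -> List[str]:
        # Deterministic placement: consecutive nodes, starting at a position
        # derived from the key's bytes (Python's hash() is randomized per run).
        start = sum(key.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(REPLICATION_FACTOR)]

    def write(self, key: str, value: str) -> None:
        for node in self.replicas(key):
            if node in self.down:
                self.missed[node][key] = value
            else:
                self.data[node][key] = value

    def read(self, key: str) -> str:
        # Any live replica can answer, so a single failure is invisible to readers.
        for node in self.replicas(key):
            if node not in self.down and key in self.data[node]:
                return self.data[node][key]
        raise KeyError(key)

    def fail(self, node: str) -> None:
        self.down.add(node)

    def recover(self, node: str) -> None:
        # Copy the missed writes back so full replication is restored.
        self.down.discard(node)
        self.data[node].update(self.missed[node])
        self.missed[node].clear()

    def live_copies(self, key: str) -> int:
        return sum(1 for n in self.replicas(key)
                   if n not in self.down and key in self.data[n])


store = ReplicatedStore(["node1", "node2", "node3", "node4"])
store.write("companyA", "v1")
first = store.replicas("companyA")[0]

store.fail(first)
store.write("companyA", "v2")        # `first` misses this update
print(store.read("companyA"))         # v2, served by another replica
print(store.live_copies("companyA"))  # 2

store.recover(first)
print(store.live_copies("companyA"))  # 3
```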

Note

There is no failover among data centers if one center becomes inaccessible. However, the production environment at Eagan, MN automatically copies its data to the production environment at Plano, TX, providing a backup in case of critical failure.

Machine-Level Architecture

The following diagram and table describe the modules running on each CM-Well node.

Module	Description
Web Service	The web service layer that handles read/write requests for users and external agents.
FS-Logic	An abstraction layer above the storage modules, allowing lower layers to be switched if necessary.
HC	Health Control: monitors the health of CM-Well nodes.
Cassandra	A 3rd-party storage package, used for table-based search and direct access to infotons.
ElasticSearch	A 3rd-party search package, used for full-text search capabilities.
Compute JVMs	Separate processes for long computations, such as SPARQL queries.
Inter-DC Synch	Synchronizes data from the master production environment to the backup environment.
Background Processing	Processes lengthy requests in the background, according to a work queue.
Active Data	Agent that updates time-triggered info for certain data items.
Calais Agent	Ingests new triples derived from Calais tagging of news metadata.
Other Agents	Agents for ingesting data from additional sources, such as Organization Authority.

Note

The Calais Agent and other agents are not an integral part of CM-Well. They are given as examples of applications that write data to CM-Well.
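The FS-Logic row describes an abstraction layer that lets the storage modules beneath it be switched. The sketch below shows that general design pattern under stated assumptions: `InfotonStore`, `DictStore`, `VersionedStore` and `FsLogic` are hypothetical names, and the two in-memory backends stand in for real storage packages. The calling code is identical for both backends because it depends only on the interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Fields = Dict[str, List[str]]


class InfotonStore(ABC):
    """Hypothetical storage interface that the layer above codes against."""

    @abstractmethod
    def put(self, path: str, fields: Fields) -> None: ...

    @abstractmethod
    def get(self, path: str) -> Optional[Fields]: ...


class DictStore(InfotonStore):
    """One backend: a plain in-memory dict keyed by path."""

    def __init__(self) -> None:
        self._rows: Dict[str, Fields] = {}

    def put(self, path: str, fields: Fields) -> None:
        self._rows[path] = dict(fields)

    def get(self, path: str) -> Optional[Fields]:
        return self._rows.get(path)


class VersionedStore(InfotonStore):
    """Another backend: keeps every version and returns the latest."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Fields]] = {}

    def put(self, path: str, fields: Fields) -> None:
        self._versions.setdefault(path, []).append(dict(fields))

    def get(self, path: str) -> Optional[Fields]:
        versions = self._versions.get(path)
        return versions[-1] if versions else None


class FsLogic:
    """Stand-in abstraction layer: callers never see which backend is used."""

    def __init__(self, store: InfotonStore) -> None:
        self.store = store

    def write(self, path: str, fields: Fields) -> None:
        self.store.put(path, fields)

    def read(self, path: str) -> Optional[Fields]:
        return self.store.get(path)


# The same calls work unchanged against either backend.
for backend in (DictStore(), VersionedStore()):
    fs = FsLogic(backend)
    fs.write("/example.org/companyA", {"hasName": ["Company A Ltd."]})
    fs.write("/example.org/companyA", {"hasName": ["Company A plc"]})
    assert fs.read("/example.org/companyA") == {"hasName": ["Company A plc"]}
```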

The CM-Well API

CM-Well provides a RESTful API that supports the following functionality:

Direct read access to infotons via URI
Querying for infotons by conditions on their field values
Applying SPARQL queries to infotons
Adding, deleting and modifying infotons and triples
Subscribing to real-time updates
Defining private groupings of objects ("named sub-graphs") and providing them as a virtual queue
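The first two operations in this list map naturally onto HTTP GET requests. The sketch below only builds request URLs and sends nothing. The host `cmwell.example.com` is hypothetical, and the query-parameter names (`format`, `op=search`, `qp`) and the `field:value` condition syntax are assumptions to check against the CM-Well API reference before use.

```python
from urllib.parse import quote, urlencode

# Assumption: a hypothetical CM-Well host; replace with your own instance.
BASE_URL = "http://cmwell.example.com"


def read_url(path: str, fmt: str = "json") -> str:
    """Direct read: an infoton is addressed by its URI path.
    The `format` parameter name is an assumption."""
    return f"{BASE_URL}/{quote(path.lstrip('/'))}?{urlencode({'format': fmt})}"


def search_url(path: str, conditions: dict, fmt: str = "json") -> str:
    """Query infotons under `path` by conditions on field values.
    `op=search` and `qp` (comma-separated field:value pairs) are assumed names."""
    qp = ",".join(f"{field}:{value}" for field, value in conditions.items())
    params = {"op": "search", "qp": qp, "format": fmt}
    return f"{BASE_URL}/{quote(path.lstrip('/'))}?{urlencode(params)}"


print(read_url("/example.org/companyA"))
# http://cmwell.example.com/example.org/companyA?format=json
print(search_url("example.org", {"hasName": "Intel"}))
# http://cmwell.example.com/example.org?op=search&qp=hasName%3AIntel&format=json
```

Sending these URLs (for example with `urllib.request.urlopen`) requires a reachable CM-Well instance; keeping URL construction separate from the network call makes the request shape easy to inspect and test.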