

Overview and Principles¶

Information Architecture Overview¶

The CM-Well architecture takes a synergistic approach to several data models and standards such as Linked Data, triple stores, Software as a Service, Web Oriented Architecture (WOA), big data analysis and more, merging them and adding to them to create a powerful, flexible and dynamic data platform.

Among the central characteristics of CM-Well's information architecture are:

Data modeled both as triples and as information entities ("infotons"). Infotons represent real-world objects and their attributes.
Data modeled as a graph, and APIs that enable graph traversal for relationship discovery.
"Schema on read" – there is no mandatory schema per infoton subject type.
No nested objects – an infoton field may contain a pointer to another infoton, but not the object itself.
Information as a Service model; REST API allows both reading and writing data.
"Information in motion" – goes beyond the mostly read-focused, mostly static-data approach of traditional Linked Data repositories, to handle frequent, dynamic updates from multiple sources.
Ability to subscribe to and stream CM-Well information and its updates.
URI as the web address of the current infoton version.
Immutable data and deep history (infoton versions saved for each update; historical versions are accessible by API).

Linked Data and CM-Well's Advanced Read/Write Features¶

These are the standard rules that a Linked Data repository should comply with:

Use Uniform Resource Identifiers (URIs) to identify things.
URIs should be accessible via HTTP, so that the objects they refer to can be looked up by people and software agents.
Provide useful information about the dereferenced entity, using standard formats such as RDF, XML and N3.
In the dereferenced entity's information, include links to other related URIs to enable discovery of other relevant information on the Web.

CM-Well complies with all these rules, but is also built for global interoperability (as opposed to using a proprietary database schema). CM-Well entities can point to other Linked Data entities, hosted either on CM-Well or on any other global repository.

In addition to allowing you to read entities, CM-Well's API also allows you to:

Update existing entities
Create new entities
Search for entities by attribute values or relationships with other entities
Subscribe to and stream updates
Perform complex queries using the SPARQL language
Create materialized views of your data, using an automated SPARQL agent.

The following figure illustrates how CM-Well's architecture differs from traditional triple store architecture.

Big Data¶

Traditional Linked Data repositories are often read-oriented, as they typically contain relatively static data. In contrast, CM-Well is intended to maintain a large, constantly growing and changing dataset – in other words, a Big Data repository.

Big Data sets are so large or complex that traditional data processing applications have a hard time handling them. They pose challenges related to many aspects of data processing, including capture, storage, transfer, sharing, curation, search, query, analysis, visualization and scaling.

CM-Well is designed to handle Big Data challenges. CM-Well's features include:

Platform architecture that allows seamless addition of nodes as needed
Big Data algorithms applied to storage and search features
Big Data Analytics layer (Spark, Hadoop)

Web Oriented Architecture (WOA)¶

Web Oriented Architecture (WOA) is a fundamental principle of CM-Well design. WOA is specific type of Service Oriented Architecture (SOA) that integrates systems and users via globally linked hypermedia, based on web architecture. WOA uses a core set of web protocols including HTTP, XML, RDF and the REST API.

However, in contrast with standard WOA that only consists of REST services with proprietary JSON-formatted messages, CM-Well principles dictate that the content "flowing" in and out of its services is Linked Data, in standard formats such as RDF, N3 and Turtle. In addition, CM-Well services use a shared information model consisting of ontologies that define different knowledge domains.

In addition, CM-Well supports:

PuSH (PubSubHubbub) subscription
Namespaces from external ontologies
Linear scalability
Representing Linked Data concepts and information objects using a RESTful API, with Web semantics