Query with Data Statistics

Description

Elastic Search is the underlying search engine that CM-Well uses when performing a query (which involves a full-text search on infoton field values). Elastic Search supports several types of statistical metrics of field values within a given group of infotons. For example, using statistical features, you can discover how many distinct values there are for a certain field in a certain group of infotons. (The Elastic Search statistical feature is called "aggregations", and you may see some references to this term in the search syntax and results.)

CM-Well passes the statistical query to Elastic Search, which performs the analysis and returns the results. CM-Well then passes these results back to the caller. To learn more about the available options, see the Elastic Search aggregations documentation.

Note

You can use all the regular CM-Well query parameters such as qp, recursive, date filters and so on, before applying a statistical query. The statistical analysis is applied on the subset of the data that passes the filters.

Syntax

URL: <hostURL>/<PATH>

REST verb: GET

Mandatory parameters: op=stats&ap=type:<statsType>,field:<statsField>


Template:

<cmwellPath>?op=stats&ap=type:<statsType>,field:<statsField>,name:<outputName>&format=<outputFormat>

URL example:

<cm-well-host>/permid.org?op=stats&ap=type:card,name:MyCurrencyStats,field:iso4217.currency&format=json&pretty

Curl example (REST API):

curl -X GET "<cm-well-host>/permid.org?op=stats&ap=type:card,name:MyCurrencyStats,field:iso4217.currency&format=json&pretty"

Special Parameters

Parameter | Description | Values | Example
--- | --- | --- | ---
ap | Aggregation parameters that define the statistical query type and field. | See below in this table. |
type | The type of statistical query to perform. See Using Elastic Search Statistics to learn more. | card, stats, term, sig | ap=type:card
name | Optional. If supplied, its value is returned as the name value in the response. | Any string | ap=name:MyQueryName
field | The name of the field on whose values you want to apply the query. | Any valid CM-Well field name | ap=field:CommonName.mdaas
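
As a rough illustration of how the template and parameters above fit together, the following Python sketch assembles a statistics URL. The function name build_stats_url and the host value used in the usage line are hypothetical, and the sketch assumes only the type, name and field aggregation parameters described in this table.

```python
from urllib.parse import urlencode


def build_stats_url(host, path, stats_type, field, name=None,
                    output_format="json", pretty=False):
    """Assemble a statistics query URL (hypothetical helper, not part of CM-Well)."""
    # Keep the documented order: type, then the optional name, then field.
    ap_parts = [f"type:{stats_type}"]
    if name:
        ap_parts.append(f"name:{name}")
    ap_parts.append(f"field:{field}")

    # safe=":," keeps the ap value readable, matching the URL examples above.
    query = urlencode(
        {"op": "stats", "ap": ",".join(ap_parts), "format": output_format},
        safe=":,",
    )
    url = f"{host.rstrip('/')}/{path.strip('/')}?{query}"
    # "pretty" is a flag without a value, so it is appended by hand.
    return url + "&pretty" if pretty else url


# The host below is a placeholder; substitute your own CM-Well host.
print(build_stats_url("http://cm-well-host", "permid.org", "card",
                      "iso4217.currency", name="MyCurrencyStats", pretty=True))
```

The printed URL has the same shape as the URL example in the Syntax section and can be passed to curl or to any HTTP client.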

Code Example

Call

curl "<cm-well-host>/permid.org?op=stats&ap=type:card,name:MyCurrencyStats,field:iso4217.currency&format=json&pretty"

Results

{
  "AggregationResponse" : [ {
    "name" : "MyCurrencyStats",
    "type" : "CardinalityAggregationResponse",
    "filter" : {
      "name" : "MyCurrencyStats",
      "type" : "CardinalityAggregation",
      "field" : "iso4217.currency"
    },
    "count" : 266
  } ]
}
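
A caller will usually want to pull the count out of this response. The following Python sketch shows one way to do that. It assumes the response has the shape shown in the results above; the function name cardinality_counts is hypothetical, and SAMPLE_RESPONSE is simply a copy of those results.

```python
import json

# Copy of the sample results shown above.
SAMPLE_RESPONSE = """
{
  "AggregationResponse" : [ {
    "name" : "MyCurrencyStats",
    "type" : "CardinalityAggregationResponse",
    "filter" : {
      "name" : "MyCurrencyStats",
      "type" : "CardinalityAggregation",
      "field" : "iso4217.currency"
    },
    "count" : 266
  } ]
}
"""


def cardinality_counts(response_text):
    """Map each cardinality aggregation name to its (approximate) count."""
    doc = json.loads(response_text)
    return {
        agg["name"]: agg["count"]
        for agg in doc.get("AggregationResponse", [])
        # Other aggregation types may use different result fields, so skip them.
        if agg.get("type") == "CardinalityAggregationResponse"
    }


print(cardinality_counts(SAMPLE_RESPONSE))  # prints {'MyCurrencyStats': 266}
```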

Note

  • All counts returned by statistical queries are approximate. This is because Elastic Search is a distributed application, and data updates may take time to replicate to all machines. Counts are usually accurate to within 5%-10% of the true value. For card queries, accuracy is affected by the optional precisionThreshold parameter (called precision_threshold in Elastic Search).
  • The values defined in ap (aggregation parameters) are passed on to Elastic Search as they are.
  • The aggregation parameters must be passed in this order:
      • term: type:term[,name:MyName],field(:|::)MyFieldName[,size:MySize][subaggregations] (defaults: size = 10)
      • sig: type:sig[,name:MyName],field(:|::)MyFieldName[,backgroundTerm:FieldName*Value][,minDocCount:MyCount][,size:MySize][subaggregations] (defaults: size = 10, minDocCount = 10)
      • card: type:card[,name:MyName],field(:|::)MyFieldName[,precisionThreshold:MyLong] (no defaults)
      • stats: type:stats[,name:MyName],field(:|::)MyFieldName (no defaults)
  • The output format must be one of: csv, json, jsonl.
  • When using sub-queries, you can only request a total of 2 queries with the csv format, as a table only has 2 dimensions. For larger numbers of queries, use the json format, which has no limit on its nesting levels.
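
Because the parameter order matters, it can help to build the ap value programmatically rather than by hand. The following Python sketch encodes the parameter orders listed above. The names AP_ORDER and build_ap are hypothetical, and sub-aggregations are left out because their syntax is not covered in this section.

```python
# Parameter order for each statistics type, as listed in the notes above.
AP_ORDER = {
    "term": ["name", "field", "size"],
    "sig": ["name", "field", "backgroundTerm", "minDocCount", "size"],
    "card": ["name", "field", "precisionThreshold"],
    "stats": ["name", "field"],
}


def build_ap(stats_type, field, field_sep=":", **options):
    """Return an ap value with its parameters in the required order."""
    if stats_type not in AP_ORDER:
        raise ValueError(f"unknown statistics type: {stats_type}")
    if field_sep not in (":", "::"):
        raise ValueError("field separator must be ':' or '::'")

    allowed = AP_ORDER[stats_type]
    unknown = set(options) - set(allowed)
    if unknown:
        raise ValueError(f"parameters not valid for {stats_type}: {sorted(unknown)}")

    parts = [f"type:{stats_type}"]
    for key in allowed:
        if key == "field":
            parts.append(f"field{field_sep}{field}")
        elif key in options:
            # Optional parameters are emitted only when supplied.
            parts.append(f"{key}:{options[key]}")
    return ",".join(parts)


print(build_ap("card", "iso4217.currency", name="MyCurrencyStats"))
print(build_ap("sig", "CommonName.mdaas", size=5, minDocCount=3))
```

Keyword arguments may be given in any order; the helper emits them in the documented sequence and rejects parameters that do not apply to the chosen type.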

Using Elastic Search Statistics