This post is part of a series describing a demo project that employs various real-world patterns and tools to provide access to data in a MongoDB database for the DevExtreme grid widgets. You can find the introduction and overview to the post series by following this link.
This post describes the functionality implemented in the devextreme-query-mongodb package, which can easily be used outside the context of the larger demo project that this blog series focuses on.
MongoDB is a NoSQL database that stores documents in collections. It is a commonly used system, and many programming languages and environments have easy access to it with the help of drivers or libraries. There is a very useful “getting started” overview of MongoDB that should be your starting point if you want to consider using MongoDB for your own projects.
From the SQL point of view, MongoDB collections are similar to tables. They are containers that contain documents, which are similar to table rows. However, documents are much more flexible and each document can differ from others in the same collection and contain nested structures. For purposes of querying data, collections are the starting points, just like tables are in SQL databases. Joins are not common with NoSQL databases, and while MongoDB supports joins for certain types of queries, these capabilities are not used by the project described here.
DevExtreme loadOptions
The goal for the implementation of devextreme-query-mongodb was to create an interface between the Custom Store querying functionality supplied by DevExtreme and the MongoDB APIs. All queries should utilize MongoDB server functionality, if possible.
At the core of the DevExtreme querying features is a structure commonly called loadOptions, which defines the parameters for a data query. Here is an overview of the supported parameters:
Parameter | Details |
---|---|
take | Integer. Restrict the number of top level results to return. |
skip | Integer. Skip a number of items from the start of the result set. In conjunction with take, this is often used to implement paging. |
sort | Array. Elements have the structure { selector: "field", desc: true/false } and are applied to the data in sequence to implement multi-level sorting. |
filter | Array. This can be a hierarchical structure of elements and arrays. Details of this can be found below. |
searchExpr, searchOperation and searchValue | Strings. Another mechanism to define a filter, restricted to one criterion. The filtering details described below apply to these parameters. |
select | Array. A simple list of field name strings that restricts the returned data objects to that set of fields. |
requireTotalCount | Bool. Indicate that a total count of result set data objects must be returned in the totalCount field of the result. The count is expected to consider any filters that are being applied, but it is supposed to disregard any take parameter used for the query. The count must reflect the number of data items available after filtering, not for instance the number of grouping nodes or similar. |
totalSummary | Array. Elements have the structure { selector: "field", summaryType: "type" }, where summaryType can be one of sum, avg, min, max and count. Results of these summary calculations are returned in an array called summary that contains the result values in the same order used for the summary definitions. |
group | Array. The structures in this array define the grouping levels that need to be applied to the data. Details of this can be found below. |
requireGroupCount | Bool, default false. When used in conjunction with group, require the return of a top-level group count in the groupCount field of the result. |
groupSummary | Array. The structure is the same as for totalSummary, but these summary values will be returned for each group if group is used. |
If parameters are not included in the options, they don’t apply. In other words, for an empty loadOptions object the full set of data objects will be returned.
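As a small illustration of how these parameters combine, here is a sketch that builds a loadOptions object for one page of sorted data. The helper name pageOptions and the field names are hypothetical and not part of DevExtreme or devextreme-query-mongodb; only the parameter names come from the table above.

```javascript
// Hypothetical helper: builds loadOptions for a zero-based page index.
function pageOptions(pageIndex, pageSize) {
  return {
    skip: pageIndex * pageSize, // items to skip from the start of the result set
    take: pageSize,             // maximum number of top-level results
    sort: [{ selector: "intval", desc: true }],
    select: ["intval", "name"], // field names are assumptions for this example
    requireTotalCount: true     // totalCount allows the caller to compute page counts
  };
}

const thirdPage = pageOptions(2, 10);
console.log(thirdPage.skip, thirdPage.take);
```

Because requireTotalCount disregards take, a caller could derive the number of pages from the returned totalCount and the page size.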
Result data structure
The result of a query operation returned by devextreme-query-mongodb looks like this:
    {
      data: [ ... ],     // list of result data objects
      summary: [ ... ],  // list of summary results if requested in totalSummary
      totalCount: 10,    // if required in requireTotalCount
      groupCount: 3      // if required in requireGroupCount
    }
Filtering
Individual filter criteria have the format [ "field", "operator", "value" ], where operator is one of =, <>, <, >, <=, >=, startswith, endswith, contains and notcontains. The last four apply to strings only; the other operators apply to all value types. Criteria can be combined in hierarchies using nested arrays with "and" and "or" operators like this:
    [
      [ "field", "=", 10 ],
      "and",
      [
        [ "otherField", "<", 3 ],
        "or",
        [ "otherField", ">", 11 ]
      ]
    ]
It is possible to negate a criterion (or hierarchy of criteria) by applying the operator ! like this: [ "!", [ "field", "=", 3 ] ].
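To make the filter format more tangible, here is a minimal sketch of how such a structure could be translated into the body of a MongoDB $match stage. This is not the code used by devextreme-query-mongodb. The names toMatch, leafOperators and escapeRegExp are hypothetical, and the sketch assumes that each nesting level uses only one kind of combining operator ("and" if none is given).

```javascript
// Escape characters with special meaning in regular expressions.
const escapeRegExp = (s) => String(s).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Leaf operators mapped to functions that build a MongoDB condition.
const leafOperators = {
  "=": (v) => ({ $eq: v }),
  "<>": (v) => ({ $ne: v }),
  "<": (v) => ({ $lt: v }),
  ">": (v) => ({ $gt: v }),
  "<=": (v) => ({ $lte: v }),
  ">=": (v) => ({ $gte: v }),
  startswith: (v) => ({ $regex: "^" + escapeRegExp(v) }),
  endswith: (v) => ({ $regex: escapeRegExp(v) + "$" }),
  contains: (v) => ({ $regex: escapeRegExp(v) }),
  notcontains: (v) => ({ $not: new RegExp(escapeRegExp(v)) })
};

// Hypothetical translation of a DevExtreme-style filter into a $match body.
function toMatch(filter) {
  if (filter[0] === "!") {
    // Negation: the nested condition must not hold.
    return { $nor: [toMatch(filter[1])] };
  }
  if (typeof filter[0] === "string") {
    // Leaf criterion: [ field, operator, value ]
    const [field, operator, value] = filter;
    const build = leafOperators[operator.toLowerCase()];
    if (!build) throw new Error("Unsupported operator: " + operator);
    return { [field]: build(value) };
  }
  // Group: nested criteria separated by "and" / "or" strings.
  // Assumption: one kind of operator per nesting level.
  const usesOr = filter.some((e) => typeof e === "string" && e.toLowerCase() === "or");
  const parts = filter.filter((e) => Array.isArray(e)).map((e) => toMatch(e));
  return { [usesOr ? "$or" : "$and"]: parts };
}

const match = toMatch([
  ["field", "=", 10], "and",
  [["otherField", "<", 3], "or", ["otherField", ">", 11]]
]);
console.log(JSON.stringify(match, null, 2));
```

The recursion mirrors the nesting of the filter arrays, so each level of the hierarchy becomes one $and, $or or $nor element in the resulting expression.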
Finally, special syntax is supported to access “parts” of the information contained in date values. This feature is used by the Pivot Grid in situations where groupInterval settings for date fields are applied, but it can be used independently. A field postfix in a criterion looks like this: [ "dateField.Month", "=", 4 ]. Supported postfixes are Year, Quarter (1-4), Month (1-12), Day and DayOfWeek (0=Sunday - 6=Saturday). The implementation doesn’t require particular casing for these postfixes.
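The following sketch shows one way a postfixed selector could be turned into a stage that adds a temporary field holding the date part. The temporary field naming follows the pipeline output shown at the end of this post, but the helper datePartStage and the expressions chosen for Quarter and DayOfWeek are my own assumptions, not the package source. Note that the sketch would misread a genuine nested field whose last path element happens to be called "day" or similar.

```javascript
// MongoDB expressions for each supported postfix (keys are lower-cased).
const datePartExpressions = {
  year: (f) => ({ $year: f }),
  quarter: (f) => ({ $ceil: { $divide: [{ $month: f }, 3] } }),
  month: (f) => ({ $month: f }),
  day: (f) => ({ $dayOfMonth: f }),
  // $dayOfWeek yields 1 (Sunday) to 7 (Saturday); shift to the 0-6 range.
  dayofweek: (f) => ({ $subtract: [{ $dayOfWeek: f }, 1] })
};

// Hypothetical helper: returns the temporary field name and the
// $addFields stage for a selector such as "date2.Month", or null
// when the selector has no recognized date part postfix.
function datePartStage(selector) {
  const dot = selector.lastIndexOf(".");
  if (dot < 0) return null;
  const field = selector.slice(0, dot);
  const part = selector.slice(dot + 1);
  const build = datePartExpressions[part.toLowerCase()];
  if (!build) return null;
  const tempField = "___" + field.replace(/\./g, "_") + "_" + part;
  return { tempField, stage: { $addFields: { [tempField]: build("$" + field) } } };
}

console.log(JSON.stringify(datePartStage("date2.Month")));
```

A filter criterion on date2.Month can then be rewritten to compare against the temporary field, which is removed again once the filter has been applied.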
Grouping
The group structure is a flat array of group definitions. Each group definition can have the following parameters:
Parameter | Details |
---|---|
selector | String, required. The field name to group on. |
desc | Bool, default false. Define descending sort order on the selector field. |
isExpanded | Bool, default false. Confusingly named, this field has meaning only for the last group definition, and setting it to true requires the group data objects to be returned with the result set. To be clear, nested grouping data is always returned, even if isExpanded is false for a group definition that is not the last one. isExpanded only defines whether data objects will be returned as well. |
groupInterval | Integer or String. For Integer values, data will be grouped in ranges of the given length. String values apply only to date fields; supported options are year, quarter, month, day, dayOfWeek, hour, minute and second. |
For group queries, the result data structure is different from the simple query result structure shown above. Here it is:
    {
      data: [ // list of groups
        {
          count: 3,         // count of items in this group, even if items=null
          key: "groupKey",  // value that grouping was applied on
          items: [ ... ],   // sub-groups or data objects (for the last group when isExpanded=true)
                            // can be null when isExpanded=false and there are no further groups
          summary: [ ... ]  // list of group summary results if requested in groupSummary
        },
        ...                 // further groups on this level
      ],
      summary: [ ... ],     // list of summary results if requested in totalSummary
      totalCount: 10,       // if required in requireTotalCount
      groupCount: 3         // if required in requireGroupCount
    }
Note that the key in each group is the value used for grouping. In simple selector-based group definitions, the key will be a value valid for the selector field, but if groupInterval is also used, the key will contain the interval value (which, in the case of date intervals, is not a valid value for/from the selector field).
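Consumers of a group result usually walk the nested items arrays. Here is a sketch of such a traversal that lists the key path and count of each innermost group. The function listGroups and the sample values are made up for illustration, and the check that tells sub-groups apart from data objects (presence of both key and count) is an assumption.

```javascript
// Recursively collect one entry per innermost group: the path of keys
// leading to it and its item count.
function listGroups(groups, parentKeys = []) {
  const rows = [];
  for (const group of groups) {
    const keys = [...parentKeys, group.key];
    const children = group.items || [];
    // Sub-groups are recognized by their "key" and "count" fields;
    // this assumes that plain data objects lack at least one of them.
    const hasSubGroups = children.length > 0 &&
      children.every((c) => c && typeof c === "object" && "key" in c && "count" in c);
    if (hasSubGroups) {
      rows.push(...listGroups(children, keys));
    } else {
      rows.push({ keys, count: group.count });
    }
  }
  return rows;
}

// Sample result shaped like the structure above (values are made up).
const sample = {
  data: [
    { key: 2017, count: 3, items: [
      { key: 1, count: 2, items: null },
      { key: 2, count: 1, items: null }
    ] },
    { key: 2018, count: 1, items: [
      { key: 4, count: 1, items: null }
    ] }
  ]
};

console.log(listGroups(sample.data));
```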
Examples
The tests implemented for the package show many examples of loadOptions structures.
Querying data
The package devextreme-query-mongodb exports a function called query. Call this and pass a MongoDB collection with a loadOptions structure, and optionally processing options (see below). Here is a simple example:
    const MongoClient = require("mongodb").MongoClient;
    const query = require("devextreme-query-mongodb");

    async function queryData() {
      // The callback is marked async so that "await" is valid inside it.
      MongoClient.connect("mongodb://localhost:27017/testdatabase", async (err, db) => {
        const results = await query(db.collection("values"), {
          // This is the loadOptions object - pass in any valid parameters
          take: 10,
          filter: [ "intval", ">", 47 ],
          sort: [ { selector: "intval", desc: true } ]
        });
        // Now "results.data" contains an array of ten or fewer documents from the
        // "values" collection that have intval > 47, sorted by intval in descending order.
      });
    }
devextreme-query-mongodb Implementation Details
The following two sections describe some details of the package implementation. The Query Pipelines section in particular is mainly of interest if you would like to work with the source code, or use it as a pattern to develop similar functionality yourself.
The query function
The entry point to the package is the query function:
    async function query(collection, loadOptions = {}, options = {}) {
      const standardContextOptions = {
        replaceIds: true,
        summaryQueryLimit: 100
      };
      const contextOptions = Object.assign(standardContextOptions, options);
      const context = createContext(contextOptions, loadOptions);

      return loadOptions.group && loadOptions.group.length > 0
        ? context.queryGroups(collection, loadOptions)
        : context.querySimple(collection, loadOptions);
    }
The function is declared async, so it returns a promise for the result. I decided to introduce a query context at this point, which holds the loadOptions required by various parts of the algorithm, and some query options.
The replaceIds query option is set to true by default. The effect of this is that _id field values are returned as strings, instead of using the MongoDB internal object representation. The default value is set under the assumption that data will be passed on to parts of the application system where the origin of the id values should not be visible. Since MongoDB doesn’t have a built-in way (as far as I know) of returning id values as strings, there is a certain overhead associated with this approach. In cases where you are going to process the data further using other MongoDB queries, it might be beneficial to pass false for replaceIds.
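To illustrate the effect of replaceIds, here is a minimal sketch of a conversion step that could be mapped over the result documents. The function body is an assumption and not the package source, and a fake id object stands in for a MongoDB ObjectId, which likewise provides toString().

```javascript
// Hypothetical conversion: returns a copy of the document with _id
// converted to a string, leaving documents without _id untouched.
function replaceId(doc) {
  return doc && doc._id != null ? { ...doc, _id: doc._id.toString() } : doc;
}

// Stand-in for an ObjectId value (the hex string is made up).
const fakeId = { toString: () => "5a0b1c2d3e4f5a6b7c8d9e0f" };
const docs = [{ _id: fakeId, intval: 48 }, { intval: 49 }];
const converted = docs.map(replaceId);
console.log(converted);
```

The extra pass over all result documents is the overhead mentioned above, which is why passing false can pay off when the ids are only fed back into further MongoDB queries.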
summaryQueryLimit is a safety-net style workaround for a situation where group queries are executed to return large result sets, and group summaries are required at the same time. Separate queries need to be executed internally to calculate various summaries, and if this is accidentally done for all groups it can result in an increase in processing time to the extent that other parts of your architecture encounter timeouts. Realistically this shouldn’t happen as long as group queries are combined with reasonable take values. But with certain combinations of flags (server-side grouping and summaries, but no groupPaging) the Data Grid executes queries that result in this issue. The default summaryQueryLimit prevents more than 100 summaries from being calculated, which should suffice in most cases. In case you want to deactivate this mechanism, you can set summaryQueryLimit to 0 (zero).
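The way the defaults interact with caller-supplied options can be tried out in isolation. This sketch repeats the Object.assign step from the query function above in a standalone helper; the name resolveOptions is hypothetical.

```javascript
// Mirrors the default handling shown in the query function above.
function resolveOptions(options = {}) {
  const standardContextOptions = { replaceIds: true, summaryQueryLimit: 100 };
  return Object.assign(standardContextOptions, options);
}

const defaults = resolveOptions();
const custom = resolveOptions({ replaceIds: false, summaryQueryLimit: 0 });
console.log(defaults, custom);

// With the package itself, the options object is the third argument:
// const results = await query(collection, loadOptions, { summaryQueryLimit: 0 });
```

Options that are not passed keep their default values, so it is enough to specify only the ones you want to change.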
Finally, the query function splits the execution path in two by distinguishing between simple and group queries. The query pipelines for the simple queries are less complicated, and I decided to implement the two variations separately. Technically it would be possible to combine the two, which is a consideration for future refactoring efforts.
Query Pipelines
MongoDB supports a simple querying interface through the collection.find() function. For grouping queries, however, a call to collection.aggregate() is required, which uses a powerful framework of processing pipeline stages. Because there is functional overlap in the implementations of the simple and group queries, I decided to use aggregation pipelines for all queries, which allows me to reuse pipeline elements in both algorithms. You can see that the function querySimple retrieves its results using a collection.aggregate(...) call:
    let resultObject = {
      data: (await collection.aggregate(dataPipeline).toArray()).map(this.replaceId)
    };
Both functions querySimple and queryGroups build their querying pipelines out of parts constructed in various helper functions. These are the parts:
Part | Details |
---|---|
completeFilterPipeline | This pipeline part combines operations that filter data, based on the loadOptions elements filter, searchExpr, searchOperation and searchValue. If filtering is performed on fields that have date part postfixes, the pipeline also includes a stage to add the nested fields. The function createCompleteFilterPipeline returns information on any nested fields that are being added by the part, so they can be referred to later. |
sortPipeline | The loadOptions element sort is implemented here. |
skipTakePipeline | The parameters skip and take from the loadOptions are reflected by this part. |
selectPipeline | This part implements a projection stage to take the loadOptions.select option into account. |
removeNestedFieldsPipeline | Nested fields added to the document stream as part of the completeFilterPipeline are removed again after they have been used. |
countPipeline, totalCountPipeline and groupCountPipeline | These three are constructed conditionally outside the main data retrieval process, to count data objects and groups in the result set. |
summaryPipeline | Constructed outside the main data retrieval process, to calculate total summary values. |
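To show why building pipelines out of parts is convenient, here is a sketch in which three of the parts are small functions that each return an array of stages, so they can be concatenated freely. Only the part names come from the table above; the function bodies and the composing function simpleDataPipeline are assumptions and not the package source.

```javascript
// Each helper returns an array of stages, so parts can be concatenated.
const sortPipeline = (sort = []) =>
  sort.length > 0
    ? [{ $sort: Object.fromEntries(sort.map((s) => [s.selector, s.desc ? -1 : 1])) }]
    : [];

const skipTakePipeline = (skip, take) => [
  ...(skip ? [{ $skip: skip }] : []),
  ...(take ? [{ $limit: take }] : [])
];

const selectPipeline = (select = []) =>
  select.length > 0
    ? [{ $project: Object.fromEntries(select.map((f) => [f, 1])) }]
    : [];

// Hypothetical composition for a simple (non-group) query without filters.
function simpleDataPipeline(loadOptions) {
  return [
    ...sortPipeline(loadOptions.sort),
    ...skipTakePipeline(loadOptions.skip, loadOptions.take),
    ...selectPipeline(loadOptions.select)
  ];
}

const examplePipeline = simpleDataPipeline({
  sort: [{ selector: "intval", desc: true }],
  skip: 20,
  take: 10,
  select: ["intval"]
});
console.log(JSON.stringify(examplePipeline));
```

A part that does not apply simply contributes an empty array, which keeps the composing function free of conditionals.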
A few additional pipelines are created in the createGroup function, which queries group data recursively for all grouping levels. The pipeline part groupKeyPipeline is used to add the group key field to the data stream, which is important in cases where nested sub-groups need to be queried at a later point, because a filter must be applied on the basis of the parent group’s key value. Group level counts and summaries are queried separately (in the case of count only when necessary), which requires separate pipelines.
Since the pipelines are nested arrays of objects, it is easy to observe the construction process in a debugger or output pipelines to the console. For illustration, consider these loadOptions:
    {
      filter: [
        [ "date2.Month", ">=", 4 ],
        "and",
        [ "date2.Month", "<", 7 ]
      ],
      group: [
        {
          groupInterval: "month",
          isExpanded: false,
          selector: "date1"
        }
      ],
      groupSummary: [
        { selector: "int1", summaryType: "sum" }
      ],
      totalSummary: [
        { selector: "int1", summaryType: "sum" }
      ],
      requireTotalCount: true
    }
The main data querying pipeline for these options is this:
    [
      { "$addFields": { "___date2_Month": { "$month": "$date2" } } },
      { "$match": { "$and": [
        { "___date2_Month": { "$gte": 4 } },
        { "___date2_Month": { "$lt": 7 } }
      ] } },
      { "$project": { "___date2_Month": 0 } },
      { "$addFields": { "___group_key_0": { "$month": "$date1" } } },
      { "$group": { "_id": "$___group_key_0", "count": { "$sum": 1 } } },
      { "$project": { "_id": 0, "key": "$_id", "count": 1 } },
      { "$sort": { "key": 1 } },
      { "$addFields": { "items": null } }
    ]
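The last five stages of the pipeline above implement the grouping itself. As a sketch of how they could be generated from a group definition, consider the following function. The name groupCountStages is hypothetical, only the month interval and plain selectors are covered, and the handling of desc is my assumption; the stage layout is copied from the output above.

```javascript
// Hypothetical builder for the grouping stages of one group level.
function groupCountStages(group, level = 0) {
  const keyField = "___group_key_" + level;
  const keyExpression = group.groupInterval === "month"
    ? { $month: "$" + group.selector }
    : "$" + group.selector;
  return [
    // Add the group key to each document in the stream.
    { $addFields: { [keyField]: keyExpression } },
    // Count the documents per key value.
    { $group: { _id: "$" + keyField, count: { $sum: 1 } } },
    // Rename _id to key to match the result structure.
    { $project: { _id: 0, key: "$_id", count: 1 } },
    { $sort: { key: group.desc ? -1 : 1 } },
    // This sketch always sets items to null, which corresponds to
    // isExpanded: false on the last group level.
    { $addFields: { items: null } }
  ];
}

const stages = groupCountStages({
  groupInterval: "month",
  isExpanded: false,
  selector: "date1"
});
console.log(JSON.stringify(stages, null, 2));
```

Printing the stages with JSON.stringify like this is also a quick way to inspect any pipeline during development.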