Query categorization
Introduction
This page collects some thoughts about query categorization. Sorting queries into categories makes them easier to compose, e.g. by filling in holes based on a category or by creating new queries as the product of two categories. It is also the logical starting point for thinking about how to interface between the user and the database.
Categories
Aggregation
Provide a summary statistic that is the result of evaluating some function, like sum or mean, over groups of values. The groups can be defined by "similar" values, such as user name or DN, or by ranges over continuous values like time.
Examples
- total count/duration of jobs per host
- total count/duration of jobs per 10-minute period
- total count/duration of jobs per type
- average users per day
Parameters
Aggregation-type queries have the following parameters:
- group factors: How to group the events, e.g. host, 10-minute period, job type, etc.
- values: Which values to aggregate for each grouping, e.g. count or duration
- function: Which aggregation function -- i.e. a function taking a vector and returning a scalar -- to use.
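The three parameters above map naturally onto a small function. The following is a minimal sketch in Python, not a proposed implementation: the `aggregate` helper, the event dictionaries, and the field names (`host`, `type`, `start`, `duration`) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical job events; the field names are illustrative only.
# "start" is a start time in seconds, "duration" a run time in seconds.
events = [
    {"host": "a", "type": "transfer", "start": 0, "duration": 10.0},
    {"host": "a", "type": "compute", "start": 700, "duration": 30.0},
    {"host": "b", "type": "transfer", "start": 100, "duration": 20.0},
]


def aggregate(events, group_factors, value, function):
    """Group events by the group factors, then reduce each group's
    vector of values to a scalar with the aggregation function."""
    groups = defaultdict(list)
    for event in events:
        # The group key is the tuple of all factor values for this event.
        key = tuple(factor(event) for factor in group_factors)
        groups[key].append(value(event))
    return {key: function(values) for key, values in groups.items()}


# Total duration of jobs per host.
duration_per_host = aggregate(
    events, [lambda e: e["host"]], lambda e: e["duration"], sum
)

# Count of jobs per 10-minute period (600 seconds): a continuous value
# is turned into a group factor by integer division.
jobs_per_period = aggregate(
    events, [lambda e: e["start"] // 600], lambda e: 1, sum
)
```

Passing the group factors and the value as callables keeps derived groupings, such as time ranges, in the same form as simple ones, such as host.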
Search/filter
Look for selected events that match some specific criteria. The criteria can be simple or complex, and match any number of values. Whether you call this search or filter mostly depends on how many results you expect; the functionality is the same.
Examples
- jobs with an error
- missing "end" events
- an unusual event sequence
- time in a range
- particular user
- particular VO
Parameters
Search queries have the following parameters:
- value: Value to evaluate during the search, e.g. the user name. In more complex criteria, this value may be derived.
- test: Criterion that the value must satisfy, e.g. a regular expression for a VO in a DN, or an inequality (< 0) for job status.
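A search can then be sketched as a list of (value, test) pairs that must all hold for an event to be selected. This is only an illustration in Python: the `search` helper, the field names, and the DN layout (`/O=<VO>/CN=<user>`) are assumptions, not the format of any real log.

```python
import re

# Hypothetical events; the field names and the DN layout are assumed.
events = [
    {"user": "alice", "dn": "/O=ExampleVO/CN=alice", "status": 0},
    {"user": "bob", "dn": "/O=OtherVO/CN=bob", "status": -1},
    {"user": "carol", "dn": "/O=ExampleVO/CN=carol", "status": -2},
]


def search(events, criteria):
    """Return the events for which every (value, test) pair holds.

    value: callable that extracts (or derives) a value from an event
    test:  callable that returns True if the value satisfies the criterion
    """
    return [
        event
        for event in events
        if all(test(value(event)) for value, test in criteria)
    ]


# Inequality test on job status: jobs with an error.
failed = search(events, [(lambda e: e["status"], lambda s: s < 0)])

# Regular-expression test for a VO in a DN.
vo_pattern = re.compile(r"/O=ExampleVO/")
in_vo = search(
    events, [(lambda e: e["dn"], lambda dn: vo_pattern.search(dn) is not None)]
)
```

Because the criteria are a list, a complex search is just more pairs; a single pair gives the simple case.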
Combining categories
To compose a query, you could combine Aggregation and Search/filter components in a series.
For example, to get the average run-time of jobs that failed in March 2008, you would combine:
- Search: value=(status, time) test=(status < 0, time in March 2008)
- Aggregate: group factors=(all) values=(run-time) function=(average)
Search + Aggregate will probably be the most common pattern of combination, but others are possible.
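The March 2008 example can be written out as a search step feeding an aggregate step. This is a rough sketch in Python under assumed data: the `jobs` records and their field names (`status`, `time`, `run_time`) are hypothetical, and a negative status is taken to mean failure, as in the example above.

```python
from datetime import datetime
from statistics import mean

# Hypothetical job records; the field names are illustrative only.
jobs = [
    {"status": -1, "time": datetime(2008, 3, 5), "run_time": 120.0},
    {"status": 0, "time": datetime(2008, 3, 9), "run_time": 300.0},
    {"status": -2, "time": datetime(2008, 3, 20), "run_time": 60.0},
    {"status": -1, "time": datetime(2008, 4, 1), "run_time": 500.0},
]


def in_march_2008(t):
    """Test: time falls in the half-open range [2008-03-01, 2008-04-01)."""
    return datetime(2008, 3, 1) <= t < datetime(2008, 4, 1)


# Search step: both tests (status < 0, time in March 2008) must hold.
failed_in_march = [
    job for job in jobs if job["status"] < 0 and in_march_2008(job["time"])
]

# Aggregate step: a single group (all events), value = run-time,
# function = average.
average_run_time = mean(job["run_time"] for job in failed_in_march)
```

Note that `statistics.mean` raises `StatisticsError` on an empty input, so a real interface would need to decide what an aggregate over zero matching events should return.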
