Query categorization

From CEDPS

Introduction

This page collects some thoughts about query categorization. Categorizing queries makes them easier to compose, e.g. by filling in holes based on a category or by creating new queries as the product of two categories. It is also the logical starting point for thinking about how to interface between the user and the database.

Categories

Aggregation

Provide a summary statistic that is the result of evaluating some function, like sum or mean, over groups of values. The groups can be defined by "similar" values, such as user name or DN, or by ranges over continuous values like time.

Examples

  • total count/duration of jobs per host
  • total count/duration of jobs per 10-minute period
  • total count/duration of jobs per type
  • average users per day

Parameters

Aggregation-type queries have the following parameters:

  • group factors: How to group the events, e.g. by host, 10-minute period, job type, ...
  • values: Which values to aggregate for each grouping, e.g. count or duration
  • function: Which aggregation function -- i.e. a function taking a vector and returning a scalar -- to use.
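
The three parameters above map naturally onto a small function. The following is only a minimal sketch in Python, assuming events are plain dictionaries; the `aggregate` helper and the field names (`host`, `start`, `duration`) are hypothetical and not part of any existing CEDPS interface.

```python
from collections import defaultdict


def aggregate(events, group_by, value, function):
    """Group events by a key, then reduce each group's values to a scalar.

    group_by: callable returning the group key for an event (the group factor)
    value:    callable returning the value to aggregate for an event
    function: callable taking a list of values and returning a scalar
    """
    groups = defaultdict(list)
    for event in events:
        groups[group_by(event)].append(value(event))
    return {key: function(vals) for key, vals in groups.items()}


# Hypothetical job events; "start" is in seconds, field names are illustrative.
jobs = [
    {"host": "a", "start": 0, "duration": 30},
    {"host": "a", "start": 650, "duration": 50},
    {"host": "b", "start": 100, "duration": 20},
]

# Total duration of jobs per host (grouping by "similar" values).
duration_per_host = aggregate(
    jobs, lambda e: e["host"], lambda e: e["duration"], sum
)

# Job count per 10-minute period (grouping by ranges over a continuous value).
count_per_10min = aggregate(
    jobs, lambda e: e["start"] // 600, lambda e: 1, sum
)
```

Passing the group factor, value, and function as callables keeps the sketch independent of any particular event schema; a real implementation would more likely translate them into a database query.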

Search/filter

Look for selected events that match some specific criteria. The criteria can be simple or complex, and match any number of values. Whether you call this search or filter mostly depends on how many results you expect; the functionality is the same.

Examples

  • jobs with an error
  • missing "end" events
  • an unusual event sequence
  • time in a range
  • particular user
  • particular VO

Parameters

Search queries have the following parameters:

  • value: Value to evaluate during the search, e.g. the user name. In more complex criteria, this value may be derived.
  • test: Criterion that the value must satisfy, e.g. a regular expression for a VO in a DN, or an inequality (< 0) for job status
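
As a rough illustration of the value/test pair, a search can be sketched as a filter over events. This assumes the same dictionary-style events as a working convention; the `search` helper, the field names, and the DN strings below are made up for the example.

```python
import re


def search(events, value, test):
    """Return the events whose extracted value satisfies the test.

    value: callable extracting (or deriving) the value to evaluate
    test:  callable returning True if the value matches the criterion
    """
    return [event for event in events if test(value(event))]


# Hypothetical events; the DN layout here is illustrative only.
events = [
    {"user": "alice", "dn": "/O=ExampleVO/CN=alice", "status": 0},
    {"user": "bob", "dn": "/O=OtherVO/CN=bob", "status": -1},
    {"user": "carol", "dn": "/O=ExampleVO/CN=carol", "status": -2},
]

# Inequality test: jobs with an error (status < 0).
failed = search(events, lambda e: e["status"], lambda s: s < 0)

# Regular-expression test: events whose DN belongs to a particular VO.
vo_pattern = re.compile(r"/O=ExampleVO/")
in_vo = search(
    events, lambda e: e["dn"], lambda dn: vo_pattern.search(dn) is not None
)
```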

Combining categories

To compose a query, you could combine Aggregation and Search/filter components in a series.

For example, to get the average run-time of jobs that failed in March 2008, you would combine:

  • Search: values=(status,time) test=(status < 0, time in March08)
  • Aggregate: groups=(All) values=(run-time) function=(average)

Search + Aggregate will probably be the most common pattern of combination, but others are possible.
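
The March 2008 example can be sketched as a two-stage pipeline: a search followed by an aggregation. This is an illustrative sketch only; the `search` and `aggregate` helpers, the field names, and the sample data are assumptions, and status < 0 is taken to mean failure as in the example above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def search(events, value, test):
    """Keep the events whose extracted value satisfies the test."""
    return [event for event in events if test(value(event))]


def aggregate(events, group_by, value, function):
    """Group events by a key, then reduce each group's values to a scalar."""
    groups = defaultdict(list)
    for event in events:
        groups[group_by(event)].append(value(event))
    return {key: function(vals) for key, vals in groups.items()}


# Hypothetical job records; run_time is in seconds.
jobs = [
    {"status": -1, "time": datetime(2008, 3, 5), "run_time": 100.0},
    {"status": 0, "time": datetime(2008, 3, 9), "run_time": 40.0},
    {"status": -2, "time": datetime(2008, 3, 20), "run_time": 300.0},
    {"status": -1, "time": datetime(2008, 4, 1), "run_time": 999.0},
]


def failed_in_march08(pair):
    """Test on the (status, time) pair: failed, and within March 2008."""
    status, time = pair
    return status < 0 and datetime(2008, 3, 1) <= time < datetime(2008, 4, 1)


# Search: values=(status, time) test=(status < 0, time in March08)
selected = search(jobs, lambda e: (e["status"], e["time"]), failed_in_march08)

# Aggregate: groups=(All) values=(run-time) function=(average)
result = aggregate(selected, lambda e: "All", lambda e: e["run_time"], mean)
```

Because the search returns events in the same form it received them, its output can be fed directly into the aggregation, which is what makes chaining the two categories in a series straightforward.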
