Techx portal integration

From CEDPS

Jump to: navigation, search

Contents

Overview

See NetLogger Mantis issue #219 for updates.

At the Star meeting in Davis we agreed to try to do the following proof-of-concept demo, with a goal of Aug 1 for the first, simple version of the demo.

Demo the Tech-X portal connecting to both the UCM database and the CEDPS Nersc database, and allow a Star production manager to drill down into the CEDPS logs when UCM logging detects a problem.

sample options:

  • show all NERSC error logs for a given job
  • show all NERSC logs for a given host
  • show all NERSC that might possibly be related (based on hostname and time)

Action items

Dan/Brian:

  • talk to NERSC security team and get permission to open up a mySQL port

to 1 host and BNL and 1 host at LBL for the portal to talk to.

  • create a sample database of Star logs from Nersc logs and send to David.
  • create sample queries for Star logs and send to David.

David:

  • need new query engine for GridSphere portal to CEDPS DB
  • work with BNL to instrument SUMS with UCM

Doug:

  • look into changing the bestman configuration to use NERSC server instead of the BNL server.
  • ask Jeff Porter about that state of condor-G testing with GT4, and consider switching to using WS-Gram on PDSF.

Sample queries

I put a dumpfile for database "osg_test" at http://acs.lbl.gov/~dang/CEDPS/osg_test.dump.bz2

Despite the dumpfile size, This is a relatively small DB with 181K entries, 122K of which are SGE jobs, and roughly 56K of those are STAR (group = 'rhstar') jobs. The table structure is the "old" one, which only differs from current by the absence of the 'hash' column in the 'event' table. This shouldn't affect much as long as you just assume that the jobs are all unique.

DB size

mysql> select count(*) from event;
+----------+
| count(*) |
+----------+
|   181908 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from event where name = 'sge.job';
+----------+
| count(*) |
+----------+
|   122861 |
+----------+
1 row in set (1.01 sec)


mysql> select count(*) from attr where name = 'group' and value = 'rhstar';
+----------+
| count(*) |
+----------+
|    56361 |
+----------+

num. of errors

How many errors were there in STAR jobs (answer = 1566 for this dataset)

select count(event.id) from attr as a1 
   join event on a1.e_id = event.id 
   join attr as a2 on a2.e_id = event.id 
   where a1.name = 'exit_status' and a1.value != '0' and 
         a2.name = 'group' and a2.value = 'rhstar' and 
         event.name = 'sge.job';

jobs with errors

List all SGE job_number 's for STAR jobs with errors -- #rows is same as result from (1)

select jobnum.value from attr as a1 
join event on a1.e_id = event.id 
join attr as a2 on a2.e_id = event.id 
join attr as jobnum on jobnum.e_id = event.id 
where a1.name = 'exit_status' and a1.value != '0' and 
      a2.name = 'group' and a2.value = 'rhstar' and 
      jobnum.name = 'job_number' and event.name = 'sge.job';


attrs of a job

You can use the previous query as a template for getting other attributes for a STAR job by simply adding joins to new aliases of the 'attr' table, selecting on the value and adding <alias>.name = 'attr_name' to the where clause, e.g., to also get 'job_name':

select jobnum.value as 'jobnum', jobname.value as 'jobname' from attr as a1 
join event on a1.e_id = event.id 
join attr as a2 on a2.e_id = event.id 
join attr as jobnum on jobnum.e_id = event.id 
join attr as jobname on jobname.e_id = event.id 
where 
  a1.name = 'exit_status' and a1.value != '0' and 
  a2.name = 'group' and a2.value = 'rhstar' and 
  jobnum.name = 'job_number' and jobname.name = 'job_name' and
  event.name = 'sge.job';

The full list of available attributes is:

mysql> select name from attr where e_id = 155;
+-----------------+
| name            |
+-----------------+
| exit_status     |
| ru_inblock      |
| ru_minflt       |
| io              |
| owner           |
| slots           |
| job_number      |
| category        |
| ru_nvcsw        |
| group           |
| ru_isrss        |
| ru_stime        |
| ru_nsignals     |
| hostname        |
| priority        |
| failed          |
| ru_nivcsw       |
| department      |
| pe_taskid       |
| dur             |
| job_name        |
| status          |
| task_number     |
| granted_pe      |
| mem             |
| start_time      |
| ru_msgsnd       |
| ru_wallclock    |
| ru_utime        |
| submission_time |
| maxvmem         |
| ru_oublock      |
| ru_nswap        |
| ru_majflt       |
| account         |
| iow             |
| qname           |
| project         |
| ru_ixrss        |
| ru_ismrss       |
| end_time        |
| ru_idrss        |
| ru_maxrss       |
| ru_msgrcv       |
| cpu             |
+-----------------+
Personal tools