Revision 50c4e9f16a0ed18f86a97f5989c2419f9348899a authored by MPillas on 07 November 2022, 10:58:35 UTC, committed by GitHub on 07 November 2022, 10:58:35 UTC
* Changes in faithsim to be used with python3

* change the first line to get the env

* create a directory for the template bank project with the workflow using Pegasus

* changes because I realized the pycbc_collect_results output is a single file

* currently trying to run the workflow and fix the issues I encounter, here I started with create inj issues

* now fixing pycbc_faithsim job

* almost final version just having problem with the plot names

* clean the code

* add bash files to run and submit the workflow

* final workflow and associated scripts problem because the kickstart job fails after submission

* fix a mistake with a parameter from the config file

* changes in submit file

* fix the run_workflow.sh script, the kickstart job seems to work now

* fix an error in create_inj script

* all scripts for the workflow, final version

* rebase and move the scripts to right directories

* remove the old files

* remove the hardcoded config path

* lot of changes, add a script to add some parameters in the dat file before the plotting script, put all the arguments in the configuration file, remove them from the workflow script ...

* add the header in the dat file

* changes in the descriptions of the scripts given to argparse and compute the derived quantities in the plotting script

* remove the path to the collect full results in the ini file

* use black on the plot script

* fix a bug in the plotting script

* changes suggested by Tito and Ian

* fix bug: change q into mchirp

* fix bug : add d in the derived_map for the s2 magnitude

* last bug fixed

* Ian's comments

Co-authored-by: Marion Pillas <marion.pillas@ldas-pcdev3.ligo.caltech.edu>
Co-authored-by: Marion Pillas <marion.pillas@ldas-pcdev1.ligo.caltech.edu>
Co-authored-by: Marion Pillas <marion.pillas@ldas-pcdev6.ligo.caltech.edu>
Co-authored-by: Marion Pillas <marion.pillas@ldas-grid.ligo.caltech.edu>
1 parent ce7ad08
pycbc_live_nagios_monitor
#!/usr/bin/env python
# Monitor the pycbc live process and log files to determine the state

# Future
# Actually check log file for errors and node specific problems
# Monitor data transfer to sites as well

import json
import lal
import argparse
import time
import os.path

parser = argparse.ArgumentParser(
    description="This script monitors the log file of the "
    "PyCBC Live process. This is used to generate a json file that can be "
    "picked up by nagios to determine if the PyCBC Live process has died.")
# All three options are needed by the loop below, so mark them as required
# instead of failing later on a None value.
parser.add_argument('--log-file', required=True,
                   help="The pycbc live log file")
parser.add_argument('--output-file', required=True,
                   help="The JSON nagios status file")
parser.add_argument('--check-interval', type=int, required=True,
                   help="Time in seconds to wait before rechecking status")
args = parser.parse_args()


while True:
    everything_ok = True
    status = {}
    status['author'] = "Alexander Harvey Nitz"
    status['email'] = "alex.nitz@ligo.org"
    status['created_gps'] = int(lal.GPSTimeNow())

    try:
        tdiff = time.time() - os.path.getmtime(args.log_file)
        # Check that the pycbc live logfile has been updated recently.
        if tdiff >= 60:
            everything_ok = False
    except (OSError, TypeError):
        # Log file missing or unreadable, or no log file path was given.
        everything_ok = False

    if everything_ok:
        status['status_intervals'] = \
            [
                {
                    "num_status": 0,
                    "txt_status": "OK: No reported problems",
                    "start_sec": 0
                },
                {
                    "num_status": 1,
                    "txt_status": "WARNING: The process is slow to report.",
                    "start_sec": 120
                },
                {
                    "num_status": 3,
                    "txt_status": "UNKNOWN: It has been 4 minutes. Has it died?",
                    "start_sec": 240
                }
            ]
    else:
        status['status_intervals'] = [{"num_status": 2,
                                       "txt_status": "PyCBC Live appears to be down!",
                                      }]
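    # Use a context manager so the status file is flushed and closed
    # before sleeping.
    with open(args.output_file, 'w') as status_file:
        status_file.write(json.dumps(status))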
    time.sleep(args.check_interval)
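
The script only writes the status file; how the consumer reads `status_intervals` is not shown here. The sketch below illustrates one plausible interpretation, assuming the consumer measures the age of the file (seconds since `created_gps`) and reports the interval with the largest `start_sec` not exceeding that age. The function name `pick_status` and the rule that a missing `start_sec` counts as 0 are assumptions for illustration, not part of PyCBC or of any nagios plugin.

```python
def pick_status(status, age_sec):
    """Return the status interval that applies at the given file age.

    Hypothetical helper: assumes the interval with the largest
    start_sec <= age_sec wins, and that an entry without "start_sec"
    (as in the 'down' branch of the script) starts at 0.
    """
    active = [entry for entry in status["status_intervals"]
              if entry.get("start_sec", 0) <= age_sec]
    if not active:
        return None
    return max(active, key=lambda entry: entry.get("start_sec", 0))


# Same structure as the status written when everything is OK.
ok_status = {
    "status_intervals": [
        {"num_status": 0, "txt_status": "OK: No reported problems",
         "start_sec": 0},
        {"num_status": 1,
         "txt_status": "WARNING: The process is slow to report.",
         "start_sec": 120},
        {"num_status": 3,
         "txt_status": "UNKNOWN: It has been 4 minutes. Has it died?",
         "start_sec": 240},
    ]
}

# Same structure as the status written when the log file is stale.
down_status = {
    "status_intervals": [
        {"num_status": 2, "txt_status": "PyCBC Live appears to be down!"}
    ]
}
```

Under this reading, a freshly written OK file reports status 0, but if the monitor itself stops rewriting the file the reported state degrades to WARNING after 120 seconds and to UNKNOWN after 240 seconds, which is why the OK branch carries three intervals and the failure branch only one.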
    