Revision - 4f75fae - Allow different burn-in iteration and ACL for independent MCMC [...]

Revision 4f75faeded2cb284dedbc856a8b2ae56075ea158 authored by Collin Capano on 20 June 2020, 18:27:09 UTC, committed by GitHub on 20 June 2020, 18:27:09 UTC

Allow different burn-in iteration and ACL for independent MCMC chains (#3311)

* use different acl for every chain in epsie

* create base burn in class, move common functions to there; rename MCMCBurnInTests EnsembleMCMC, first stab at creating MCMC tests for independent chains

* more changes to burn in module

* simplify the attributes in the burn in classes

* add write method to burn in classes

* add write_data method to base_hdf

* remove write_burn_in method from mcmc io; use the write method in burn in module instead

* make use of new burn in functions in sampler/base_mcmc

* have emcee and emcee pt use ensemble burn in tests

* add compute_acf function to epsie

* start separating ensemble and mcmc io methods

* stop saving thin settings to file; just return on the fly

* make read/write samples stand alone functions, and update emcee

* rename write functions; update emcee

* move multi temper read/write functions to stand alone and update emcee_pt

* pass kwargs from emcee(_pt) io functions

* simplify get_slice method

* add function to base_mcmc to calculate the number of samples in a chain

* use nsamples_in_chain function to calculate effective number of samples

* add read_raw_samples function that can handle differing number of samples from different chains

* add forgotten import

* use write/read functions from base_multitemper in epsie io

* use stand alone functions for computing ensemble acf/acls

* separate out ensemble-specific attributes in sampler module; update emcee and emcee_pt

* add acl and effective_nsample methods to epsie

* simplify writing acls and burn in

* fix various bugs and typos

* use a single function for writing both acl and raw_acls

* add some more logging info to burn in

* reduce identical blocks of code in burn in module

* fix self -> fp in read_raw_samples

* reduce code duplication in base io and simplify read raw samples function

* fix missed rename

* reduce code redundacy in sampler/base_multitemper

* whitespace

* fix bugs and typos in burn_in module

* fix code climate issues

* use map in compute_acl

* more code climate fixes

* remove unused variable; try to silence pylint

* fix issues reading epsie samples

* only load samples from burned in chains by default

* add act property to mcmc files

* fix act logging message

* fix effective number of samples calculation in epsie

* remap walkers option to chains for reading samples

* fix thinning update

* fix acceptance ratio and temperature data thinning in epsie

* allow for different fields to have differing number of temperatures when loading

* don't try to figure out how many samples will be loaded ahead of time

* store acts in file instead of acls

* write burn in status to file before computing acls

* drop write_acts function

* fix issue with getting specific chains

* fix typo

* code climate issues

* fix plot_acl

1 parent 926b628

Files
Changes

Permalinks

pycbc_condition_strain

#!/usr/bin/env python

# Copyright (C) 2016 Tito Dal Canton
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 3 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

"""
Read strain data, apply the standard preparation done in offline CBC searches
(highpass, downsampling, gating, injections etc) and write the result back to a
file. Optionally also write the gating data to a text file. This program can
also be used to generate frames of simulated strain data, with or without
injections.
"""

import logging
import argparse
import pycbc.strain
import pycbc.version
import pycbc.frame
from pycbc.types import float32, float64


def write_strain(file_name, channel, data):
    logging.info('Writing output strain to %s', file_name)

    if file_name.endswith('.gwf'):
        pycbc.frame.write_frame(file_name, channel, data)
    elif file_name.endswith(('.hdf', '.h5')):
        data.save(file_name, group=channel)
    else:
        raise ValueError('Unknown extension for ' + file_name)


parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--version", action="version",
                    version=pycbc.version.git_verbose_msg)
parser.add_argument('--output-strain-file', required=True,
                    help='Name of output frame file. The file format is '
                         'selected based on the extension (.gwf, .npy, .hdf '
                         'and .txt accepted)')
parser.add_argument('--output-channel-name',
                    help='Name of channel in output frame file (default: same '
                         'as input channel)')
parser.add_argument('--output-gates-file',
                    help='Save gating info to specified file, in the same '
                         'format as accepted by the --gating-file option')
parser.add_argument('--output-precision', type=str,
                    choices=['single', 'double'], default='double',
                    help='Precision of output strain, %(default)s by default')
parser.add_argument('--dyn-range-factor', action='store_true',
                    help='Scale the output strain by a large factor (%f) '
                         'to avoid underflows in subsequent '
                         'calculations' % pycbc.DYN_RANGE_FAC)
parser.add_argument('--low-frequency-cutoff', type=float,
                    help='Provide a low-frequency-cutoff for fake strain. '
                         'This is only needed if fake-strain or '
                         'fake-strain-from-file is used')
parser.add_argument('--frame-duration', metavar='SECONDS', type=int,
                    help='Split the produced data into different frame files '
                         'of the given duration. The output file name should '
                         'contain the strings {start} and {duration}, which '
                         'will be replaced by the start GPS time and duration '
                         'in seconds')

pycbc.strain.insert_strain_option_group(parser)
args = parser.parse_args()

if args.frame_duration is not None and args.frame_duration <= 0:
    parser.error('Frame duration should be positive integer, {} given'.format(args.frame_duration))

logging.basicConfig(format='%(asctime)s %(message)s', level=logging.INFO)

# read and condition strain as pycbc_inspiral would do
out_strain = pycbc.strain.from_cli(args, dyn_range_fac=pycbc.DYN_RANGE_FAC,
                                   precision=args.output_precision)

# if requested, save the gates while we have them
if args.output_gates_file:
    logging.info('Writing output gates')
    with file(args.output_gates_file, 'wb') as gate_f:
        for k, v in out_strain.gating_info.items():
            for t, w, p in v:
                gate_f.write('%.4f %.2f %.2f\n' % (t, w, p))

# force strain precision to be as requested
out_strain = out_strain.astype(
        float32 if args.output_precision == 'single' else float64)

# unless asked otherwise, revert the dynamic range factor
if not args.dyn_range_factor:
    out_strain /= pycbc.DYN_RANGE_FAC

output_channel_name = args.output_channel_name or args.channel_name

if args.frame_duration:
    start = args.gps_start_time
    stop = args.gps_end_time
    step = args.frame_duration

    # Last frame duration can be shorter than duration if stop doesn't allow
    for s in range(start, stop, step):
        ts = out_strain.time_slice(s, s+step if s+step < stop else stop)
        complete_fn = args.output_strain_file.format(
                start=s, duration=step if s+step < stop else stop - s)
        write_strain(complete_fn, output_channel_name, ts)
else:
    write_strain(args.output_strain_file, output_channel_name, out_strain)

logging.info('Done')

Showing with 0 additions and 0 deletions (0 / 0 diffs computed)

Computing file changes ...