In [None]:
import os, sys
import h5py
import numpy as np
import pandas as pd

# Reading in HDF5 files
__Author:__ Ji Won Park
 
__Created:__ 9/30/19
 
__Last run:__ 9/30/19

__Goals:__
Read in the contents of the HDF5 file written by the Baobab script `to_hdf5.py`.

__Before running this notebook:__
1. Generate some data. At the root of the `baobab` repo, run:
```
generate baobab/configs/tdlmc_diagonal_config.py --n_data 10
```
This generates 10 images as `.npy` files, together with `metadata.csv`, in the `tdlmc_train_DiagonalBNNPrior_seed1113` directory, which is where this notebook expects them.

2. Save the `.npy` image files and the `metadata.csv` into an HDF5 file. At the root of the `baobab` repo, run:
```
to_hdf5 tdlmc_train_DiagonalBNNPrior_seed1113 --format 'tf'
```
This creates an HDF5 file at `tdlmc_train_DiagonalBNNPrior_seed1113/tdlmc_train_DiagonalBNNPrior_seed1113.h5`. The `tf` (TensorFlow) format stores each image with the channel axis last, i.e. as (height, width, channels).
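To make the channel convention concrete, here is a minimal NumPy sketch that converts between a channels-first array and the channels-last (`tf`) layout. The 64x64 size and single channel are hypothetical values chosen for illustration, not the dimensions Baobab necessarily produces.

```python
import numpy as np

# Hypothetical channels-first image: (channels, height, width)
img_channels_first = np.zeros((1, 64, 64), dtype=np.float32)

# Move the channel axis to the end to get the TensorFlow-style layout:
# (height, width, channels)
img_channels_last = np.moveaxis(img_channels_first, 0, -1)
print(img_channels_last.shape)  # (64, 64, 1)

# Move it back to the front to recover the channels-first layout
restored = np.moveaxis(img_channels_last, -1, 0)
print(restored.shape)  # (1, 64, 64)
```

`np.moveaxis` returns a view rather than a copy, so this conversion is cheap even for large image stacks.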

In [None]:
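# Path relative to this notebook, which sits one directory below the repo root
hdf5_filepath = os.path.join('..',
                             'tdlmc_train_DiagonalBNNPrior_seed1113',
                             'tdlmc_train_DiagonalBNNPrior_seed1113.h5')
# Open the file in read-only mode
hdf5_file = h5py.File(hdf5_filepath, 'r')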

Here are the datasets stored in this file.

In [None]:
sorted(hdf5_file.keys())
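If you want to experiment without generating data first, the sketch below writes a small toy HDF5 file with h5py and reads it back. The dataset names mirror the ones used in this notebook (`image_0`, `pixels_mean`, `pixels_std`), but the number of images, their shape, and the temporary path are hypothetical. It also opens the file with `h5py.File` as a context manager, which closes it automatically; for the `hdf5_file` handle opened above, call `hdf5_file.close()` when you are done.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy data: 3 single-channel 8x8 images in channels-last layout
rng = np.random.default_rng(0)
images = rng.normal(size=(3, 8, 8, 1)).astype(np.float32)

tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, 'toy.h5')

# Write one dataset per image, plus pixel-wise summary statistics
with h5py.File(toy_path, 'w') as f:
    for i, img in enumerate(images):
        f.create_dataset(f'image_{i}', data=img)
    f.create_dataset('pixels_mean', data=images.mean(axis=0))
    f.create_dataset('pixels_std', data=images.std(axis=0))

# Read it back; the file is closed when the `with` block exits
with h5py.File(toy_path, 'r') as f:
    keys = sorted(f.keys())
    first_image = f['image_0'][()]  # load the whole dataset into a NumPy array

print(keys)
print(first_image.shape)  # (8, 8, 1)
```

Note that indexing with `[()]` copies the data into memory, so the resulting array stays usable after the file is closed.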

Each image is stored as a separate dataset (e.g. `image_0`). Because the file was written with `--format 'tf'`, the channel axis comes last in each image's shape, following the TensorFlow convention.

In [None]:
hdf5_file['image_0'].shape

The pixel-wise mean and standard deviation across all images are stored in the `pixels_mean` and `pixels_std` datasets.

In [None]:
hdf5_file['pixels_mean'].shape, hdf5_file['pixels_std'].shape
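One common use of these statistics is to standardize each image before feeding it to a network. The sketch below assumes that `pixels_mean` and `pixels_std` have the same shape as a single image (compare the shapes printed above for your file); with a real file you would load them via `hdf5_file['pixels_mean'][()]` and `hdf5_file['pixels_std'][()]`. The arrays here are synthetic stand-ins, and the small `eps` guarding against zero-variance pixels is an illustrative choice, not something the Baobab script defines.

```python
import numpy as np

# Synthetic stand-ins for a stack of images and their pixel-wise statistics
rng = np.random.default_rng(1)
images = rng.normal(loc=5.0, scale=2.0, size=(10, 8, 8, 1))
pixels_mean = images.mean(axis=0)  # shape (8, 8, 1)
pixels_std = images.std(axis=0)    # shape (8, 8, 1)

eps = 1e-8  # hypothetical guard against division by zero
# Broadcasting applies the per-pixel statistics to every image in the stack
standardized = (images - pixels_mean) / (pixels_std + eps)

# After standardization, every pixel has approximately zero mean and unit std
print(np.allclose(standardized.mean(axis=0), 0.0))
print(np.allclose(standardized.std(axis=0), 1.0))
```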

The metadata DataFrame can be read with the pandas function `pd.read_hdf`. The `start` and `stop` arguments select a range of rows, and the `columns` argument selects a subset of columns.

In [None]:
pd.read_hdf(hdf5_filepath, key='metadata', mode='r',
            start=3,
            stop=5,
            columns=['lens_mass_theta_E', 'lens_mass_gamma'])
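The `start` and `stop` arguments behave like a positional slice: `stop` is exclusive, so the call above returns the rows at positions 3 and 4. The sketch below illustrates the equivalent selection on an in-memory DataFrame using `iloc`; the values are made up, and only the two column names come from the call above.

```python
import pandas as pd

# Made-up metadata with the two columns selected above
metadata = pd.DataFrame({
    'lens_mass_theta_E': [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
    'lens_mass_gamma': [2.0, 2.01, 2.02, 2.03, 2.04, 2.05],
})

# In-memory analogue of start=3, stop=5, columns=[...]: positional rows 3 and 4
subset = metadata.iloc[3:5][['lens_mass_theta_E', 'lens_mass_gamma']]
print(subset)
```

Selecting rows and columns in `read_hdf` itself avoids loading the full table into memory, which matters more as the number of generated images grows.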