{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os, sys\n",
"import h5py\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading in HDF5 files\n",
"__Author:__ Ji Won Park\n",
" \n",
"__Created:__ 9/30/19\n",
" \n",
"__Last run:__ 9/30/19\n",
"\n",
"__Goals:__\n",
"Read in the contents of the HDF5 file written with the Baobab script, `to_hdf5.py`\n",
"\n",
"__Before running this notebook:__\n",
"1. Generate some data. At the root of the `baobab` repo, run:\n",
"```\n",
"generate baobab/configs/tdlmc_diagonal_config.py --n_data 10\n",
"```\n",
"This generates 10 images with the `.npy` extension at the location this notebook expects.\n",
"\n",
"2. Save the `.npy` image files and the `metadata.csv` into an HDF5 file. At the root of the `baobab` repo, run:\n",
"```\n",
"to_hdf5 tdlmc_train_DiagonalBNNPrior_seed1113 --format 'tf'\n",
"```\n",
"This creates an HDF file at the path `tdlmc_train_DiagonalBNNPrior_seed1113/tdlmc_train_DiagonalBNNPrior_seed1113.h5` in the Tensorflow (`tf`) format, which places channels at the last dimension."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hdf5_filepath = os.path.join('..',\n",
" 'tdlmc_train_DiagonalBNNPrior_seed1113', \n",
" 'tdlmc_train_DiagonalBNNPrior_seed1113.h5')\n",
"hdf5_file = h5py.File(hdf5_filepath, 'r')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are the datasets stored in this file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sorted(hdf5_file.keys())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The images can be accessed as follows. Note that the shape of the image is such that the channels go at the end, in the `tf` (TensorFlow) format as specified by the `--format` command-line argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hdf5_file['image_0'].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pixel-wise mean and std across all the images can be accessed as follows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hdf5_file['pixels_mean'].shape, hdf5_file['pixels_std'].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The metadata dataframe can be read in using the Pandas command `read_hdf5`, with the `start` and `stop` arguments specifying the rows and `columns` argument specifying the columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pd.read_hdf(hdf5_filepath, key='metadata', mode='r', \n",
" start=3, \n",
" stop=5, \n",
" columns=['lens_mass_theta_E', 'lens_mass_gamma'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (baobab)",
"language": "python",
"name": "baobab"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}