# **Material for the article *"Optimal sizing of a globally distributed low carbon cloud federation"* submitted to the [23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2023)](https://ccgrid2023.iisc.ac.in/call-for-papers/)** #

In this Git repository, you will find the code for running the experiments, the inputs used (the workload, the solar irradiation data, the parameters, and so on), and the instructions to run the experiments and obtain the results.

## **Running the experiments** ##


### **Hardware Requirements** ###
- Any modern x86 or x64 CPU is appropriate to execute the experiments.
- At least 4 GB of RAM to run the experiments sequentially, or at least 16 GB of RAM to run them in parallel; more details are in the [**Instructions**](https://gitlab.com/migvasc/lowcarboncloud/-/tree/main/#instructions) section.

### **Software Requirements** ###

1. [Git](https://git-scm.com/) (strong requirement): the version control system used to store all the material needed to run and analyze the experiments: the simulation code, the input files, and the scripts for extracting the results.
2. [Nix](https://nixos.org) (weak requirement): a multi-platform package manager (it runs on Linux, on Windows via WSL, and on macOS) that makes configuring the experiments practical and enables reproducibility. Nix was used to automate all the necessary configuration of the reproducible execution environment, execute the experiments, and obtain the same results.

If the reader does not want to install Nix, it is recommended to use a Linux distribution (preferably Ubuntu or Debian). In that case, it is necessary to install the following programs and packages/libraries:

- [R](https://cran.r-project.org/) (version 4.1.2): Used for generating the plots. Necessary packages:  [tidyverse](https://www.tidyverse.org/packages/) (1.3.1), [gridExtra](https://cran.r-project.org/web/packages/gridExtra/index.html) (2.3.0), [patchwork](https://cran.r-project.org/web/packages/patchwork/index.html) (1.1.1), [viridis](https://cran.r-project.org/web/packages/viridis/index.html) (0.6.2), [stringr](https://cran.r-project.org/web/packages/stringr/index.html) (1.4.0), [rjson](https://cran.r-project.org/web/packages/rjson/index.html) (0.2.20) and [rlist](https://cran.r-project.org/web/packages/rlist/index.html) (0.4.6.2).

- [Python 3](https://www.python.org/downloads/) (version 3.10.8): Used for modeling and solving the LP, and for extracting data. Necessary libraries: [pulp](https://coin-or.github.io/pulp/main/installing_pulp_at_home.html#installation) (2.7.0) and [argparse](https://docs.python.org/3/library/argparse.html) (1.1.0).

### **Instructions** ###

It is possible to execute the experiments in parallel to reduce the execution time, but this demands more hardware, as described in the [**Hardware Requirements**](https://gitlab.com/migvasc/lowcarboncloud/-/tree/main/#hardware-requirements) section. If your system does not meet the minimum requirements for running in parallel, you can still run the experiments sequentially. Here are the steps for running the experiments:


- Install [Git](https://git-scm.com/downloads)
- Clone this Git repository with the command:
```git clone https://gitlab.com/migvasc/lowcarboncloud```
- Access the directory of the repository. For example, in Linux the command is:
  ```cd lowcarboncloud```


If you plan to use Nix:

- Install [Nix](https://nixos.org/download.html)
- To execute the experiments in parallel, run ```bash ./scripts/workflow_parallel_nix.sh```; to run them sequentially, run ```bash ./scripts/workflow_sequential_nix.sh```. Either command builds a Nix environment with all the dependencies necessary to reproduce the experiments and then executes them.


If you do not plan to use Nix:

- To execute the experiments in parallel, run ```bash ./scripts/workflow_parallel.sh```; to run them sequentially, run ```bash ./scripts/workflow_sequential.sh```.

After the experiments finish, a directory with the results will be created. More details are in the [**Repository's directory structure**](https://gitlab.com/migvasc/lowcarboncloud/-/tree/main/#repositorys-directory-structure) section.

#### **Script Explanation** ####

Both script files, ```workflow_sequential``` and ```workflow_parallel```, have three main steps. In the first step, the LP is executed (using the ```low_carbon_cloud.py``` program) for each scenario (the years 2018, 2019, 2020, and 2021, plus the two extra scenarios of only using the grid and of only using PVs and batteries). After the LPs have been executed, the data for generating the plots and tables is extracted (files ```extract_data_figures.py```, ```extract_table_v_data.py```, ```extract_table_viii_data.py```, and ```extract_table_vi_vii_data.py```). Finally, in the last step, the figures are generated using the R script ```plots.r```. A rough sketch of this pipeline is shown below.
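As a rough illustration only (not the actual script contents; the scenario input file names here are assumptions), the three steps could be orchestrated as follows:

```python
# Hedged sketch of the three-step workflow; the real logic lives in the
# workflow_* shell scripts, and the scenario file names are illustrative.
import subprocess

scenarios = ["2018", "2019", "2020", "2021", "grid_only", "pv_bat_only"]

# Step 1: solve the LP for each scenario.
for scenario in scenarios:
    subprocess.run(["python3", "script/low_carbon_cloud.py",
                    "--input_file", f"input/{scenario}.json"], check=True)

# Step 2: extract the data for the plots and tables.
for extractor in ["extract_data_figures.py", "extract_table_v_data.py",
                  "extract_table_viii_data.py", "extract_table_vi_vii_data.py"]:
    subprocess.run(["python3", f"script/{extractor}"], check=True)

# Step 3: generate the figures.
subprocess.run(["Rscript", "script/plots.r"], check=True)
```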

#### **Experiment time** ####


The total time it takes to run the entire workflow may vary depending on the hardware configuration of the machine where the experiments are executed. For example, on our test machine, equipped with an Intel Core i9-11950H CPU and 32 GB of RAM, it took about 1 hour for the parallel version and about 6 hours for the sequential version.

This execution time differs from the one reported in the paper because this artifact uses another solver: CBC (Pulp's default solver). In the paper we used Gurobi, a commercial solver that requires a license (free licenses are available for academics), so this artifact uses CBC to avoid any costs for readers who want to reproduce the experiments.

## **Repository's directory structure** ##

- ```input``` - contains the data used as input for the LP (solar irradiation values, parameters, workload data)
- ```script``` - contains all the code that runs the LP, extracts the results, and generates the plots
- ```results``` - contains the plots (in .pdf format) and the data files (in .csv format) used to generate them. The structure of this directory is as follows:
  - For each Table *y* in the paper, there is a corresponding file *table_y_data.csv* containing the data used for the table.
  - Each scenario of the experiment has its own folder. For example, the results for the year 2021 are in the folder ```results/2021```, which contains the files:
    - ```metrics.csv``` contains the values for the metrics Green Energy Coefficient (GEC), $CO_2$ Savings, and Data Center Utilization (DCU)
    - ```solution.csv``` is an auxiliary file storing the values of the computed variables of the linear program
    - ```summary_results.csv``` contains the carbon emission values, the input used in the experiment, and the runtime
    - For each Figure *x* in the paper, there is a file ```figure_x.pdf``` with the plot and a file ```figure_x_data.csv``` with the data used to generate it. In the paper we only present the figures for the year 2021, but the figures are generated for the other years as well.


## **Evaluation and expected result** ##

For all the tables and most of the figures, the results presented in the paper and those obtained by reproducing the experiments will be the same or very similar.

The reason some results may differ is that an LP can have multiple alternative optimal solutions. In other words, the optimal value of the LP's objective function will be the same, but the variable values might differ. In our case, the computed optimal values for the PV surface area and the battery capacity are the same; differences may occur in the variables describing how much power to charge into or discharge from the batteries, how much power to draw from the grid, or how much workload to execute at a specific data center in a specific time slot.
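As a tiny self-contained illustration (not part of the artifact), the following Pulp program has a unique optimal objective value but infinitely many optimal variable assignments; which assignment the solver reports is arbitrary:

```python
# Demo: an LP whose optimal objective is unique (10) but whose optimal
# variable values are not; different solvers may return different splits.
import pulp

prob = pulp.LpProblem("alternative_optima_demo", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0)
y = pulp.LpVariable("y", lowBound=0)
prob += x + y          # objective: minimize the total
prob += x + y >= 10    # constraint: the total must be at least 10
prob.solve(pulp.PULP_CBC_CMD(msg=False))

print(pulp.value(prob.objective))  # always 10.0
print(x.value(), y.value())        # the split between x and y may vary
```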

To automate the validation process, we provide a script, ```test_output.py```, that compares the results of the experiment execution with the expected results. The validation considers the sizing and the total carbon emissions. If the validation succeeds, the following messages are shown: *"Sizing results validated!! The value obtained is equal to the expected result."* and *"Total emissions results validated!! The value obtained is equal to the expected result."*. The reader may also validate the results by comparing the output with the tables and figures in the paper and with the folder ```expected_results```.

## **Experiment customization** ##

This section describes how other scenarios can be executed using this artifact, for example: other data center locations, other workloads, or other carbon footprint values for the PVs, batteries, or grid.

The main input file is a .json file located in the ```input``` folder. It uses a key-value approach, where the key is the name of a parameter and the value is the input value for that parameter.

In what follows is an example of the input file structure with the parameters that the LP requires (a complete assembled example is shown after the list):

- ``` "DCs": ["dc_a", "dc_b"] ```: a list of type string that contains the name of the considered data centers. **Important**: do not use spaces or other special characters in the name of the DC to avoid breaking some of the scripts.
- ```"PUE": { "dc_a":1.1, "dc_b":1.2}``` a hash structure with the name of the DC as key and the Power Usage Effectiveness (PUE) of the data center as value
- ```"pIdleDC": {"dc_a":10000.0, "dc_b":1000.0}``` a hash structure with the name of the DC as key and the Idle power consumption of all the servers of the data center as value (in Watts), that is, the static power consumption of the DC
- ```"C": {"dc_a":200000, "dc_b":200000}``` a hash structure with the name of the DC as key and the total number of cores within the data center (sum of all servers) as value
- ```"pNetIntraDC":   { "dc_a":150000.0, "dc_b":150000.0}``` a hash structure with the name of the DC as key and the total power consumption of the network equipment that interconnects the servers inside the DC as value (in Watts) 
- ```"length_k": 100```  number of timeslots
- ```"workload_file":  "input/workload.csv"```  a string with the CSV file path containing the workload data.
- ```"solar_irradiance_dc_file_path":{   "dc_a":"input/solar_irradiation_data/location_dc_a/2021.csv", "dc_b":"input/solar_irradiation_data/location_dc_b/2021.csv"} ``` a hash structure with the name of the DC as key and the value is a string with the path to the CSV file that contains the solar irradiation data for that data center. In this experiment, we used the data from the MERRA-2 project, with a granularity of 1 time slot per hour.
- ``` "grid_co2" :  {"dc_a":123.4, "dc_b":456.7 } ``` a hash structure with the name of the DC as key and the value is a number with the carbon emissions of the local electricity grid at the location of the DC (in $gCO_2 eq/kWh$)
- ``` "pv_co2":   {"dc_a":56.78, "dc_b":78.90}``` a hash structure with the name of the DC as key and the value is a number with the carbon emissions of using power produced from PVs installed at the location of the DC (in $gCO_2 eq/kWh$)
 - ``` "bat_co2" : 1000.0``` the total carbon emissions generated from manufacturing the battery (in $gCO_2 eq$). This value needs to be adequate to the number of time slots.
 - ```"eta_ch": 0.80 ```   efficiency of the battery charging process
 - ``` "eta_dch": 1.25 ``` efficiency of the battery discharging process
 - ``` "eta_pv" : 0.15 ``` PV panel efficiency of converting solar irradiation into electricity 
 - ``` "pCore":10.0 ``` dynamic power consumption of using one CPU core  (in Watts) 

There are two special cases regarding the input. The first is for the solar irradiation data. In the .json file, it is necessary to pass a path to a .csv file containing the irradiation data for each DC. 

In this experiment, we extracted the data from the MERRA-2 project, with a granularity of one time slot per hour. The solar irradiation input must be a CSV file in which each row represents the irradiation recorded in a given time slot, with two columns: the first is the timestamp of the measurement, and the second is the solar irradiation value (in W/m²). The file ```script/extract_data_from_merra2_csv.py``` is an example of a script that preprocesses data from a data source (in this case, MERRA-2) and generates CSV files with the structure expected by the LP. It can serve as inspiration for users who want to work with other data sources.
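For illustration, a file with the expected structure could look like the following (these rows and the timestamp format are made-up examples; use whatever your preprocessing script produces):

```
2021-01-01 00:00,0.0
2021-01-01 01:00,0.0
2021-01-01 12:00,512.3
```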

The other particular case is the workload, whose value must also be a path to a .csv file. In this file, each row represents one time slot and has two columns: the first is the time slot number, and the second is the total CPU core demand for that time slot.
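An illustrative example (the demand values are made up):

```
0,1500
1,1740
2,1620
```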

For the $CO_2$ emissions data of using the local electricity grid, we used values from [electricityMap](https://app.electricitymaps.com/map) and [climate-transparency.org](https://climate-transparency.org/).

Values for the carbon emissions from manufacturing PVs and batteries can be found in the literature. Efficiency values for PV panels and batteries can also be found in the literature or in data sheets from the companies that manufacture these devices.


After creating your input file (for example, a file named example.json), make sure that it is located inside the ```input``` directory. Finally, to execute your custom experiment, run: ```bash scripts/run_custom_scenario_nix.sh example.json``` if you are using Nix or ```bash scripts/run_custom_scenario.sh example.json``` otherwise.

After the execution finishes, all the results will be in the respective result directory, in this example, a directory named ```results/example```.

If the user has a license for Gurobi and wants to use it as the solver, a few additional steps are necessary.
[Here](https://coin-or.github.io/pulp/guides/how_to_configure_solvers.html) the reader can find more details on the configuration needed to integrate Pulp with Gurobi. Finally, it is necessary to pass the flag ```--use_gurobi``` when executing the program. For example:

```python3 script/low_carbon_cloud.py --input_file input/example.json --use_gurobi```. This modification needs to be made in the files ```run_all_sequential.sh```, ```run_all_parallel.sh```, and ```run_custom_scenario.sh```.
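
For reference, here is a minimal sketch of how such a flag can switch solvers in Pulp. This is an assumption about the program's internals, not a copy of ```low_carbon_cloud.py```:

```python
# Hedged sketch: selecting the solver in Pulp based on a --use_gurobi flag.
# Illustrative only; the actual flag handling is in script/low_carbon_cloud.py.
import argparse
import pulp

parser = argparse.ArgumentParser()
parser.add_argument("--input_file", required=True)
parser.add_argument("--use_gurobi", action="store_true")
args = parser.parse_args()

# Gurobi requires a working license; CBC is Pulp's bundled default solver.
solver = pulp.GUROBI_CMD() if args.use_gurobi else pulp.PULP_CBC_CMD()
# ... build the LP as `prob` from args.input_file, then:
# prob.solve(solver)
```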