Raw File
Tip revision: 9985e6d930e8b98eb03330a964c2c3fc8788630c authored by estherjulien on 01 August 2022, 11:54:59 UTC
HybridCode deleted from
Tip revision: 9985e6d
# Code for the paper:
> **Reconstructing Phylogenetic Networks via Cherry Picking and Machine Learning**  
> *Giulia Bernardini, Leo van Iersel, Esther Julien and Leen Stougie*

To run the code, the following packages have to be installed: `numpy`, `pandas`, `scikit-learn`, `networkx`, and 

### Cherry picking heuristic (CPH)
The CPH is implemented in ``. To experiment with this file,

RUN in terminal:
python <instance num.> <ML model name> <leaf number> <bool (0/1) for exact input> <option>
if `exact input` = 0:
    `option` = reticulation number,
    `option` = forest size

python 0 N10_maxL100_random_balanced 20 0 50
In the file ``, tree sets used for testing can be generated.

### Generate training data
The code for initializing and updating the features can be found in ``.

For training the random forest, we first have to generate training data. 
The code for this is given in ``. 
RUN in terminal:
python <number of networks> <maxL>
`maxL`: maximum number of leaves per network
python 10 20

### Train random forest
In the folder `LearningCherries` you can find the code for training a random forest. 
In the folder `LearningCherries/TrainedRFModels`, there are some trained random forests in `joblib` format.

### Preprocessed data
The `Data` folder consists of:
- `Results` of CPH instances. The instances are divided in 4 categories: FTS (Full Tree Set), Real, RealSmall, and RTS (Restricted Tree Set).
  - `FINAL_heuristic_scores_ML[<random forest model used>]_TEST[<test instance type used>]_<number of instances>.pickle`
- `Test` for different test instances
- `Train` for generated train data
back to top