Skip to main content
  • Home
  • Development
  • Documentation
  • Donate
  • Operational login
  • Browse the archive

swh logo
SoftwareHeritage
Software
Heritage
Archive
Features
  • Search

  • Downloads

  • Save code now

  • Add forge now

  • Help

Revision 09787369109a391eccf11783fae8eb1c402f1e89 authored by Wesley Tansey on 08 September 2015, 21:42:48 UTC, committed by Wesley Tansey on 08 September 2015, 21:42:48 UTC
Added initial FDR-L algorithm implementation
1 parent 0e6b846
  • Files
  • Changes
  • 1bc911b
  • /
  • README.md
Raw File Download
Permalinks

To reference or cite the objects present in the Software Heritage archive, permalinks based on SoftWare Hash IDentifiers (SWHIDs) must be used.
Select below a type of object currently browsed in order to display its associated SWHID and permalink.

  • revision
  • directory
  • content
revision badge
swh:1:rev:09787369109a391eccf11783fae8eb1c402f1e89
directory badge Iframe embedding
swh:1:dir:1bc911b8a5e8f10ae4533a1fce342d9956528873
content badge Iframe embedding
swh:1:cnt:0e890ecb1e1b0dfa4e77516846fa688b83ad0234
Citations

This interface enables to generate software citations, provided that the root directory of browsed objects contains a citation.cff or codemeta.json file.
Select below a type of object currently browsed in order to generate citations for them.

  • revision
  • directory
  • content
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
README.md
False Discovery Rate Smoothing (smoothfdr)
------------------------------------------

The `smoothfdr` package provides an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. FDR smoothing automatically finds spatially localized regions of significant test statistics. It then relaxes the threshold of statistical significance within these regions, and tightens it elsewhere, in a manner that controls the overall false-discovery rate at a given level. This results in increased power and cleaner spatial separation of signals from noise. It tends to detect patterns that are much more biologically plausible than those detected by existing FDR-controlling methods.

For a detailed description of how FDR smoothing works, see the [paper on arXiv](http://arxiv.org/abs/1411.6144).

Installation
============
To install the Python version:

```
pip install smoothfdr
```

You can then run the tool directly from the terminal by typing the `smoothfdr` command. If you want to integrate it into your code, you can simply `import smoothfdr`.

The R source and package will be on CRAN and available publicly soon.


Running an example
==================

There are lots of parameters that you can play with if you so choose, but one of the biggest benefit of FDR smoothing is that you don't have to worry about it in most cases.

To run a simple example, we can generate our own synthetic data:

```
smoothfdr --signals_file test.signals --generate_data --signal_dist_name alt1 --estimate_signal --solution_path --data_file test.data \
 --plot_data test_data.pdf --plot_signal test_signal.pdf --plot_true_signal --plot_path test_path.pdf --plot_results test_results.pdf \
 --verbose 1 \
 2d
```

This will run the algorithm on a synthetic dataset that is generated on-the-fly. The algorithm will auto-tune its parameters by following a solution path approach where it tries multiple values. The result will be several plots:

### Visualizations of the true priors and raw data

![Visualizations of the true priors and raw data](https://raw.githubusercontent.com/tansey/smoothfdr/master/data/test_data.png)

### Density plots of the true and estimated signal distributions

![Density plots of the true and estimated signal distributions](https://raw.githubusercontent.com/tansey/smoothfdr/master/data/test_signal.png)

### Solution path diagnostics

![Solution path diagnostics](https://raw.githubusercontent.com/tansey/smoothfdr/master/data/test_path.png)

### Resulting plateaus detected

![Resulting plateaus detected](https://raw.githubusercontent.com/tansey/smoothfdr/master/data/test_results.png)

For a detailed list of commands, just run `smoothfdr -h`.

Running an example on an arbitrary graph
========================================

If your problem is not structured simply according to a 1d, 2d, or 3d grid, then you will want to run using the __graph__ type. This uses the `pygfl` package to solve the underlying graph-fused lasso problem. You'll need a CSV file containing the list of edges in the format:

```
0,1
1,2
2,10
5,15
3,0
...
```

where the first and second numbers correspond to the index of the node in the graph. Assuming this file is `test/edges.csv`, the first step is to generate the setup files for `pygfl`:

```
maketrails file --infile test/edges.csv --savet test/trails.csv
```

You can then run FDR smoothing using the `graph` setting, applied to some z-score file called `test/data.csv`:

```
smoothfdr --data_file test/data.csv \
--empirical_null --estimate_signal --solution_path --dual_solver graph --fdr_level 0.1 \
--save_weights test/weights.csv --save_posteriors test/posteriors.csv --save_signal test/signal.csv --save_plateaus test/plateaus.csv \
--plot_path test/plot_path.pdf --plot_signal test/plot_signal.pdf \
graph --trails test/trails.csv
```

The first line simply specifies the data file containing the vector z-scores. The second line specifies that the null and alternative (signal) distributions should be estimated from the data, that a solution path approach should be used to auto-tune the hyperparameters, and that we're using a false discovery rate of 10%. The third and fourth lines specify where to save the various types of output from the algorithm-- note that the weights (AKA priors) and posteriors are what you are most likely interested in here. Finally, the last line specifies we're using the graph-fused lasso solver and provides the setup file we generated previously.



































The diff you're trying to view is too large. Only the first 1000 changed files have been loaded.
Showing with 0 additions and 0 deletions (0 / 0 diffs computed)
swh spinner

Computing file changes ...

Software Heritage — Copyright (C) 2015–2025, The Software Heritage developers. License: GNU AGPLv3+.
The source code of Software Heritage itself is available on our development forge.
The source code files archived by Software Heritage are available under their own copyright and licenses.
Terms of use: Archive access, API— Contact— JavaScript license information— Web API

back to top