https://github.com/gabe-dubose/emus
27 October 2023, 08:23:24 UTC
Revision 7e564642a8534324a40203783498dd8d6501e030 authored by Gabe DuBose on 15 May 2022, 13:05:17 UTC, committed by Gabe DuBose on 15 May 2022, 13:05:17 UTC
Updated simulation script
1 parent 5305461
SoftWare Hash IDentifiers (SWHIDs) for the archived objects:
  • revision: swh:1:rev:7e564642a8534324a40203783498dd8d6501e030
  • directory: swh:1:dir:27a089d6a1ee0b2918edefba57b84e90f8fbf402
  • content: swh:1:cnt:fbd534092b29e5a82e2866a0cdf9ac3e006c8543
  • snapshot: swh:1:snp:2bd7ed1081ff598939e948409013744f1715835b

compare_variants.py
#!/usr/bin/env python3

import argparse
from collections import Counter
from random import sample
import pandas as pd
import re

#define command line arguments
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input', metavar="<Input Annotations>", required=True,
    help = "List of annotated variants in tab-separated (.tsv) format, where mutation classes are present in the first column")
#the comparison file is split on tabs below, so it is described as tab-separated here
parser.add_argument('-c', '--comparison', metavar="<Comparison Annotations>", required=True,
    help = "List of annotated variants in tab-separated format (mutation classes in the first column) for the input variants to be compared to")
parser.add_argument('-b', '--bootstraps', metavar = "<Number of bootstraps>", type=int, required=True,
    help = "Number of times the comparison dataset is to be subsampled (number of bootstraps)")
parser.add_argument('-o', '--output', metavar = "<Output file name>", required=True,
    help = "Handle (file name prefix) for results files")
args = parser.parse_args()

class CompareVariants:
    #the methods below use no instance state and are called directly on the class
    def __init__(self, input, comparison):
        self.input = input
        self.comparison = comparison

    #function to retrieve all annotations
    @staticmethod
    def get_annotations(input, comparison):
        #map every mutation class seen in either file to an empty list
        mutations_annots = {}
        for path in (comparison, input):
            with open(path) as handle:
                for line in handle:
                    #skip header lines, as the other readers do
                    if line.startswith('#'):
                        continue
                    mutation_name = line.split('\t')[0].strip()
                    if mutation_name not in mutations_annots:
                        mutations_annots[mutation_name] = list()
        return mutations_annots
    
    #function to read in comparisons
    @staticmethod
    def read_comparisons(comparison):
        #collect the first-column mutation class of every non-header line
        comparison_mutations = []
        with open(comparison) as handle:
            for line in handle:
                #lines starting with '#' are headers
                if not line.startswith('#'):
                    mutation = line.split('\t')[0].strip()
                    comparison_mutations.append(mutation)
        return comparison_mutations

    #function to read input
    @staticmethod
    def get_input_annotations(input_data_set):
        #collect the first-column mutation class of every non-header line
        input_data = []
        with open(input_data_set) as handle:
            for line in handle:
                if not line.startswith('#'):
                    mutation_name = line.split('\t')[0].strip()
                    input_data.append(mutation_name)
        return input_data

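    #function to get bootstraps
    @staticmethod
    def get_bootstraps(bootstraps, sample_size, comparison_mutations, comparison_annotations, input_annotations):
        #start from an empty list of counts for every mutation class in either file
        boot_annotations = CompareVariants.get_annotations(input_annotations, comparison_annotations)
        for i in range(int(bootstraps)):
            #subsample without replacement; sample() raises ValueError if
            #sample_size exceeds the number of comparison mutations
            random_sample = sample(comparison_mutations, sample_size)
            random_sample_counts = Counter(random_sample)
            #record one count per class per subsample; a class absent from this
            #subsample counts as 0, so every list has one entry per bootstrap
            for mutation in boot_annotations:
                boot_annotations[mutation].append(random_sample_counts[mutation])
        return boot_annotations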
    
    #function to get probabilities
    @staticmethod
    def get_probabilities(frequencies, bootstrap_counts, bootstraps):
        #frequencies: observed count per mutation class
        #bootstrap_counts: list of subsampled counts per mutation class

        #initialize dictionaries to store counts for probabilities
        prob_positive = {}
        prob_negative = {}
        for mutation in frequencies:
            prob_positive[mutation] = 0
            prob_negative[mutation] = 0

        for mutation in frequencies:
            comparison = bootstrap_counts[mutation]
            obs = frequencies[mutation]
            for comp in comparison:
                #input is greater than or equal to comparison
                if obs >= comp:
                    prob_positive[mutation] += 1
                #input is less than or equal to comparison
                if obs <= comp:
                    prob_negative[mutation] += 1

        #convert counts to proportions of the bootstraps
        for num in prob_positive:
            prob_positive[num] = prob_positive[num]/int(bootstraps)
        for num in prob_negative:
            prob_negative[num] = prob_negative[num]/int(bootstraps)

        return (prob_positive, prob_negative)

#read input
input_data = CompareVariants.get_input_annotations(args.input)

#read comparison
comparison_mutations = CompareVariants.read_comparisons(args.comparison)

#get bootstraps
input_size = len(input_data)
bootstrap_comparisons = CompareVariants.get_bootstraps(bootstraps=args.bootstraps, sample_size=input_size, comparison_mutations=comparison_mutations, 
    comparison_annotations=args.comparison, input_annotations=args.input)

#define comparison data and initial frequencies
comparison_df = pd.DataFrame(list(bootstrap_comparisons.values()), index=bootstrap_comparisons.keys()).transpose()
input_frequencies = Counter(input_data)

#get probability results
probs = CompareVariants.get_probabilities(input_frequencies, bootstrap_comparisons, args.bootstraps)
probs_greater = probs[0]
probs_less = probs[1]

#write results
output_file = f"{args.output}.tsv"
#open in write mode so a rerun does not append a second header and result set
with open(output_file, 'w') as output:
    output.write('Mutation\tP(Obs>=Sim)\tP(Obs<=Sim)\n')
    for result in probs_greater:
        output.write(f"{result}\t{probs_greater[result]}\t{probs_less[result]}\n")

    #output.write(f'P(Observed >= Simulation)\n')
    #for result in probs_greater:
        #output.write(f"{result}\t{probs_greater[result]}\n")
    #output.write(f'\n')
    #output.write(f'P(Observed <= Simulation)\n')
    #for res in probs_less:
        #output.write(f"{res}\t{probs_less[res]}\n")

#write datafile for plotting
with open(f'{args.output}.bootstraps.tsv', 'w') as datafile:
    datafile.write('#Mutation\tObserved\tBootstraps\n')
    for mutation in input_frequencies:
        #comma-separated subsample counts for this mutation class, no trailing comma
        bootstraps_out = ','.join(str(boot) for boot in bootstrap_comparisons[mutation])
        observed = input_frequencies[mutation]
        datafile.write(f'{mutation}\t{observed}\t{bootstraps_out}\n')
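
The comparison performed by the script can be illustrated on in-memory data. The block below is a minimal sketch, not the author's implementation: the function name `empirical_tail_probabilities`, the `seed` parameter, and the toy mutation lists are all hypothetical. It assumes that a class missing from a subsample counts as 0 and that subsamples are drawn without replacement, as `random.sample` does.

```python
from collections import Counter
import random


def empirical_tail_probabilities(observed, population, n_resamples, seed=0):
    """Return {class: (P(obs >= sim), P(obs <= sim))} from repeated subsampling.

    Hypothetical helper for illustration; it is not part of compare_variants.py.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed_counts = Counter(observed)
    ge = Counter()  # number of subsamples with obs >= sim, per class
    le = Counter()  # number of subsamples with obs <= sim, per class
    for _ in range(n_resamples):
        # Draw a subsample the same size as the observed set, without replacement.
        sim_counts = Counter(rng.sample(population, len(observed)))
        for name, obs in observed_counts.items():
            sim = sim_counts[name]  # Counter yields 0 for absent classes
            ge[name] += obs >= sim
            le[name] += obs <= sim
    return {name: (ge[name] / n_resamples, le[name] / n_resamples)
            for name in observed_counts}


# Toy data (hypothetical): 3 observed variants and a 100-variant comparison set.
observed = ["missense", "missense", "synonymous"]
population = ["missense"] * 10 + ["synonymous"] * 90
result = empirical_tail_probabilities(observed, population, n_resamples=200)
for name, (p_ge, p_le) in result.items():
    print(f"{name}\t{p_ge}\t{p_le}")
```

Because ties are counted in both directions, the two proportions for a class can sum to more than 1. Neither value is a two-sided p-value.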