https://github.com/jvivian/one_off_scripts

Revision Message Commit Date
3ad04be Refactor SRA pipeline to use faster method than fastq-dump Pull SRA data from FTP and convert locally Run cutadapt directly skipping unnecessary pre-processing step in rna-seq pipeline 02 February 2017, 01:31:18 UTC
7843cb5 Add additional fastq-dump parameters 20 January 2017, 09:40:19 UTC
cd867db Add SRA manifest 02 January 2017, 00:01:43 UTC
4f2f094 Add gz flag to config attributes 02 January 2017, 00:01:21 UTC
5a3ea3d Fix path for globbed fastqs 02 January 2017, 00:01:06 UTC
cf154da Make output dir for failed samples: "failed-samples" 02 January 2017, 00:00:42 UTC
854cac8 Add cores attribute 02 January 2017, 00:00:08 UTC
42b429f Manifest partitions 01 January 2017, 23:59:14 UTC
69fec21 Example config 01 January 2017, 04:45:34 UTC
27204ff Initial pipeline commit 01 January 2017, 04:45:20 UTC
a34c861 Initial commit for SRA-CGL-RNASeq pipeline 01 January 2017, 03:15:45 UTC
2ea61ea Add encryption to upload 20 December 2016, 20:33:55 UTC
a9997da Short script for packaging / transferring beatAML data 20 December 2016, 19:13:48 UTC
30c440f serial gzip of fastqs 13 December 2016, 13:39:34 UTC
955c281 Replaced start due to module loading issue Confirm still an issue to issue 1000 jobs from one child? 20000? 13 December 2016, 11:20:43 UTC
4877c9e Handy collection of files for creating test inputs 13 December 2016, 08:34:26 UTC
c2ef039 Committing to get out of my git history 13 December 2016, 08:34:02 UTC
2dcd1f5 Correct comment typo 13 December 2016, 08:32:32 UTC
5db70a9 Cython example code 13 December 2016, 08:32:21 UTC
8284631 Finish process and upload step PEP 13 December 2016, 08:31:49 UTC
131c513 initial commit 13 December 2016, 07:53:22 UTC
14fb2d2 Fix bad split, skip existing files. 25 May 2016, 22:47:25 UTC
aabf387 Clarified key path 25 May 2016, 21:50:52 UTC
4f22cc0 Python hello world 25 May 2016, 21:50:37 UTC
338e8db For re-encrypting data using per-file keys derived from a master 25 May 2016, 21:50:19 UTC
fd51b68 Generate signed URL for SSEC downloads 25 May 2016, 21:48:49 UTC
4522d02 Convert back to boto2 25 May 2016, 21:48:13 UTC
447195b Initial idea for jenkins.py for toil-scripts 25 May 2016, 21:47:41 UTC
3951b22 Script for uploading to Ceph If boto credentials not setup appropriately 25 May 2016, 21:47:00 UTC
cb3a4cb Defuckifing the results of your paper before submitting it is a good idea! 25 May 2016, 21:46:24 UTC
306a9ec Delete SDB artifacts 23 March 2016, 01:34:26 UTC
8cc97c2 Functionalized id retrieval instead of slicing 23 March 2016, 00:55:05 UTC
5d0a92f Made start_time and end_time optional 23 March 2016, 00:54:40 UTC
ca412e6 Compacted get_instance_ids 23 March 2016, 00:54:26 UTC
b6b1fe8 gitignore for pyc 23 March 2016, 00:53:54 UTC
3918676 fixed ridiculous os.path.join bug 22 March 2016, 23:50:38 UTC
7a2c68e bug fixes 22 March 2016, 20:54:15 UTC
ab98362 My version of the upload directory to s3 script 22 March 2016, 20:43:47 UTC
478a570 Modified documentation 02 March 2016, 07:01:51 UTC
56c1ce0 Fixed invocation of pipeline for restart, wiggle, and save_bams 28 February 2016, 17:22:56 UTC
b2e97b3 Renamed as no longer for scaling tests Script with sub parsers for: - Creating config for scaling tests - `create-config` - Launching a cluster (with cgcloud) - `launch-cluster` - Launching a pipeline - `launch-pipeline` - Real time metric collection `launch-metrics` 25 February 2016, 23:00:21 UTC
5d8ad00 Script to generate metric plots and estimate cost Given a directory of metrics produced from `launch-metrics` of the automated_scaling_tests script, produce a plot of metrics and estimate costs. 25 February 2016, 22:57:29 UTC
08fce62 Added `--share` to create-config options 25 February 2016, 22:51:42 UTC
b577f7b Stupid typo 23 February 2016, 20:08:42 UTC
5581b70 Made saving wiggle and bams optional via cmd line arguments 23 February 2016, 20:07:35 UTC
f792666 Added "Zone" as option to launch-cluster CGCloud now requires a `--zone`. 21 February 2016, 17:16:17 UTC
24cc3b3 Merge pull request #6 from arkal/master No longer have parallely running instances of s3am 17 February 2016, 23:41:38 UTC
ed276d1 No longer have parallelly running instances of s3am. 17 February 2016, 22:36:13 UTC
d0f7a4b Metric collection and instance termination now more aggressive 14 February 2016, 19:18:41 UTC
a8d60de Peppy 11 February 2016, 22:53:36 UTC
91b89ac Modularized "Uber script" Added timestamps to logging Added sub parsers to each "part" of the program Pipeline now launched via a screen 11 February 2016, 22:52:23 UTC
a816bf5 Merge pull request #5 from arkal/master encrypt_files_in_dir_to_s3.py now universally uses --sse-key-base64 10 February 2016, 02:48:42 UTC
5e5c807 Added timestamp to logging. Fixed critical error where a comprehension was run before checking if it contained anything. Don't worry, definitely didn't fail right at the start of the large recompute project our lab spent a month preparing for. Removed call to modify launch script since it was deprecated. 10 February 2016, 02:46:08 UTC
6d8007a YAMR: Yet Another Massive Refactor Certain variables pulled out as top-level vars for development Removed launch script editing, directly call pipeline. Pipeline is now run a second time with `--restart` if exits with non-zero status code. Remove alarms, instances now directly terminated via boto. Datapoints are stored as named tuples, now raw dumped to a file with no processing. 09 February 2016, 00:45:51 UTC
e3361ab Used function 'id' in place of str 'instance_id' 03 February 2016, 17:19:34 UTC
b0b21d8 Complete refactor Metrics now collected and raw dumped to a file in real time (1 hour intervals) Workers are now killed during metric collection when idle. Removed alarm application in lieu of `boto.terminate_instances()` Replaced cost calculations with "Max" costs to simulate if entire cluster were running. Will have to develop a new method to analyze costs given an average hourly cost and the generated metrics array that represents time 03 February 2016, 09:45:09 UTC
75ff824 ensure state is running 03 February 2016, 05:58:42 UTC
35858e9 Added try/except block in case of failure collecting cost values. 01 February 2016, 20:09:30 UTC
a215da6 Simplified output Added try/except block for pipeline launch. 01 February 2016, 17:23:16 UTC
3283aaa Added backoff for metric collection 31 January 2016, 00:12:56 UTC
9eb0676 Improved blocking. Added backoff for alarm application Made run_report more robust. Fixed type error. Output log.txt on leader 31 January 2016, 00:11:55 UTC
c4572ac Added TypeError in block_workers() for bizarre boto auth failure Collect metrics before killing workers in case metric collection takes too long. 28 January 2016, 08:28:16 UTC
8802f9b Removed all plotting / pruning. Added 'Paging' for long running instances collect_metrics accepts time.time() floats for start and stop fixed collection period to be 5 min 27 January 2016, 06:16:39 UTC
b20dc35 Refactored to handle metric collection refactor 27 January 2016, 05:51:40 UTC
f19ee2b Added precautions to avoid preemptive shutdown 25 January 2016, 19:38:01 UTC
d07c7de Added date to folder where run_report.txt is written. 23 January 2016, 18:38:02 UTC
0e2109c 50 char limit on s3 buckets (and no underscores). 22 January 2016, 22:15:23 UTC
5b6a62d cleanup 22 January 2016, 17:56:21 UTC
57c2603 Added date to s3_dir, add try/except block on blocking function in case instance goes down. 22 January 2016, 17:20:40 UTC
c344aa4 Added doctoring 22 January 2016, 17:19:45 UTC
c84f498 Added generalized function for applying alarms to an instance 22 January 2016, 17:03:11 UTC
cfbf988 Moved buffer time back to 15 minutes. 22 January 2016, 03:31:47 UTC
f61a60c Add uuid for consistency with automation pipeline 22 January 2016, 03:30:13 UTC
665a57c Kill leader, standard UUID, log output. 21 January 2016, 18:21:08 UTC
6943441 structure change for automated scaling tests 21 January 2016, 18:20:18 UTC
df9c438 Added avail zone to boto_lib 21 January 2016, 18:19:41 UTC
15ed4b0 Changed alarm and termination mechanism Instances are periodically checked for low CPU usage. Once all instances at <1 CPU for 15 minutes, apply "insta-kill" alarm that terminates all workers. 19 January 2016, 17:08:21 UTC
98900d0 Removed parallelization (AWS doesn't support) added try/except block for instances that don't return metrics. These are subsequently removed from the instances pool. 19 January 2016, 15:41:14 UTC
4748f11 Automated pipeline for scalings tests for Toil recompute 18 January 2016, 07:30:10 UTC
ae5acf8 Made primary plotting function generic to any metric 18 January 2016, 07:29:37 UTC
f57b31c Improved plots for aggregate metric data 11 January 2016, 22:23:06 UTC
577b289 library of boto functions 11 January 2016, 22:22:36 UTC
1592519 vertical plot of cpu, disk, and networking 07 January 2016, 04:52:02 UTC
1e7ba3b Code refactor for more accurate avg pricing and total pricing. 30 December 2015, 21:03:19 UTC
563c9bd Now returns an answer if instance is actively running 24 December 2015, 06:42:59 UTC
f4e6f00 Help menu improvements 15 December 2015, 22:36:53 UTC
dfe3ec5 args fix 15 December 2015, 20:27:33 UTC
0de4c2e Master key is now an optional argument. Running without master key will transfer to S3 BUCKET without encryption. 08 December 2015, 22:21:24 UTC
4f49bc5 Minor adjustments 08 December 2015, 18:29:29 UTC
b951828 pack static values, fix spacing in main() docstring 08 December 2015, 18:28:22 UTC
58a6d3b PEP8 compliance 08 December 2015, 18:17:45 UTC
bc2c043 Actually works now! Hoorah 08 December 2015, 18:04:10 UTC
109e904 Calculates the ec2 spot instance cost given instanceID and instanceType Needs availability zone specification 08 December 2015, 08:19:26 UTC
622f148 encrypt_files_in_dir_to_s3.py now universally uses --sse-key-base64 as a s3am argument. 17 November 2015, 23:07:44 UTC
44bcdb7 Merge pull request #4 from arkal/master Fixed to use the correct remote s3 url if -R is provided 17 November 2015, 23:01:32 UTC
42c5174 Fixed to use the correct remote s3 url if -R is provided 17 November 2015, 22:19:58 UTC
c1e7a4f Merge pull request #3 from arkal/master Added clause to handle quotes in the key. 17 November 2015, 19:19:20 UTC
f850f95 Added clause to handle quotes in the key. Such keys will be passed as files since they will corrupt the list of strings passed to popen otherwise. 17 November 2015, 18:20:42 UTC
5b8cd66 Merge pull request #2 from arkal/master Refactored encrypt_files_in_dir_to_s3.py 06 November 2015, 23:52:52 UTC
79e1841 Refactored encrypt_files_in_dir_to_s3.py to have a main function, arguments parsed through argparse, and now the script accepts multiple files, a folder of files, and even a folder with subfolders. encrypt_files_in_dir_to_s3.py also now attempts to pass the key itself to s3am instead of writing it out somewhere. However, if the key starts with a - character, the key is written to a temp directory and the key file is passed to s3am. The directory is deleted on exit. Pylinted for PEP8 compliance. 06 November 2015, 22:46:32 UTC