Content - 9647cb8675c27f5c3884bdcefc8de06f22f3fbd3 - 4024694/FinalReport.md

Tip revision: 7d9b4fde9bc98d834dc11cfc0acd2380e6676f0e authored by Richard Elkins on 25 May 2022, 21:06:03 UTC
Merge pull request #317 from texadactyl/master
### Summary
The goal of this project was to document turboSETI and fix any problems that stood out along the way. To do this, I ran a debugger starting from seti_event.py. Because that is the entry point where arguments get parsed, I was able to follow the whole pipeline of the code. I deviated from this path here and there to cover files that are not used in a strictly sequential order, such as files with helper functions and file writers. All files in turboSETI now have documentation, though there are a couple of areas that could be expanded on. I worked exclusively on files in the find_doppler folder, since the rest of the code already had basic documentation.

### Changes and Suggested Changes
One change I made was a performance improvement that can yield up to a 90% speedup on certain files. In the load_data function in data_handler.py, there was a section of code that adds rows of zeros to a numpy matrix so that its length becomes a power of 2 (this is needed because the base-2 logarithm of this length is later taken to assign drift indices). These rows of zeros were being added one at a time in a loop that ran thousands of times, which was a bottleneck for certain files. I removed this loop and replaced it with a single numpy call that adds all the necessary rows at once. Note that this section does not seem to be a bottleneck for every file. From what I could tell, the maximum speedup occurs on files that do not have hits. This is because the forward and reverse doppler correction takes much longer than this section, and it only occurs when hits are found.
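
The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the code in data_handler.py: the function names are hypothetical, and `np.pad` is an assumption, since the report does not name the numpy function that was used.

```python
import numpy as np


def next_power_of_two(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def pad_rows_loop(data, target_rows):
    """Slow pattern: append one row of zeros per iteration.

    Each np.append call allocates and copies the whole array again.
    """
    zero_row = np.zeros((1, data.shape[1]), dtype=data.dtype)
    while data.shape[0] < target_rows:
        data = np.append(data, zero_row, axis=0)
    return data


def pad_rows_once(data, target_rows):
    """Fast pattern: add all missing rows of zeros in a single call."""
    missing = target_rows - data.shape[0]
    # Pad only at the end of axis 0; leave axis 1 untouched.
    return np.pad(data, ((0, missing), (0, 0)), mode="constant")


# Hypothetical input: 5 rows that need to grow to 8 rows.
spectra = np.arange(15.0).reshape(5, 3)
target = next_power_of_two(spectra.shape[0])
padded = pad_rows_once(spectra, target)
```

Both functions return the same array. The loop version copies the data once per added row, which is why the cost grows quickly when thousands of rows are missing.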

Another thing I noticed was that a few functions are unused or only partially implemented. I spent some time looking closely at bitrev and its variants, since there were three of them which superficially seemed to serve the same purpose. I found that bitrev and bitrev2 do in fact have the same behavior but are implemented differently, and bitrev2 is slightly slower. Bitrev3 is the outlier of the three. The bitrev functions take in a number, read it as a binary number, and reverse a certain number of bits. While bitrev and bitrev2 take that length as an input, allowing the user to reverse only the `nbits` least significant bits, bitrev3 always assumes that it is dealing with a 32-bit number, so it is more limited than the other two.

I also found through testing that bitrev behaves inconsistently in its base case. For all inputs of `nbits` > 1, if `nbits` is less than the full length of the number, then any bits past that length are not only left unreversed but also truncated from the result. For example, decimal 10 = binary 1010 with `nbits` of 2 returns decimal 1 = binary 0001 instead of decimal 9 = binary 1001. Bitrev2 behaves this way for all `nbits` > 0, but bitrev does so only for `nbits` > 1. With `nbits` = 1, bitrev behaves differently: it does not truncate. This is because the input is caught by a base-case if-statement that detects `nbits` of 1 and returns the original number unchanged. If we want bitrev to behave the way it does with the rest of its inputs, then the first line of the function, 
```if nbits <= 1:```
should be changed to 
```if nbits < 1:```
This allows the code to continue past the base case and truncate as it does with all other inputs. Note, however, that this change is likely unnecessary: as long as the correct length is passed as `nbits`, this will never be a problem. For more on this, see the tests I wrote to illustrate this inconsistency, which are in the tests folder of the testing branch of this fork. That branch still contains bitrev2 and bitrev3, but they have since been removed from turboSETI’s master branch because they are not used.
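
To make the behavior described above concrete, here is a minimal sketch. It is not the turboSETI implementation: both function names are hypothetical, and the loop is just one straightforward way to reverse the low `nbits` bits while discarding the rest.

```python
def reverse_low_bits(value, nbits):
    """Reverse the lowest `nbits` bits of `value`.

    Bits above position `nbits` are dropped (truncated), which matches
    the behavior the report describes for nbits > 1.
    """
    result = 0
    for _ in range(nbits):
        # Shift the result left and bring in the current lowest bit.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_low_bits_with_base_case(value, nbits):
    """Same as above, but with an early return like `if nbits <= 1`.

    The early return hands back the input untouched, so for nbits == 1
    the higher bits are NOT truncated. This is the inconsistency.
    """
    if nbits <= 1:
        return value
    return reverse_low_bits(value, nbits)


# Decimal 10 is binary 1010.
print(reverse_low_bits(10, 4))                 # whole number reversed: 0101
print(reverse_low_bits(10, 2))                 # low two bits reversed, rest dropped
print(reverse_low_bits(10, 1))                 # consistent truncation
print(reverse_low_bits_with_base_case(10, 1))  # base case returns the input
```

With `nbits` = 1, the consistent version keeps only the lowest bit, while the version with the early return gives back the full original number.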

### Problems and Next Steps
The largest issue I ran into while doing this project was my lack of radio astronomy knowledge. There are a few fairly complicated functions, like search_data in find_doppler.py, which I was not able to fully understand. While I could figure out what the code was doing line by line, I was unable to work out what was going on at a more abstract level. Because of this, some functions have a blank param tag or two where I was unable to come up with a description for an argument, and some docstrings are less descriptive than others.

My recommendation for future interns, or anyone else interested in adding to the documentation or otherwise improving this repository, would be to work in a group of two, where one person specializes in computer science and the other in radio astronomy. Understanding what the code is supposed to do at a higher level would make it easier for the programmer to see where the code is likely to bottleneck, which outputs may be erroneous, and which parts of the code may be doing the wrong thing. While one programmer can look things up and read papers on radio astronomy, this is much slower and more difficult than working alongside someone who already knows the field.

Besides that, I strongly recommend that anyone who plans to work on this use a debugger, since you learn a lot more about what the code does by stepping through it than by just reading and running it. It may be a bit difficult to get a debugger working at first, but it is worth it. I used IntelliJ with a Python plug-in because that’s the program I have the most experience with, but there are plenty of others. One issue I had with my debugger was that the relative imports in some files caused it to raise errors. To fix this, I removed the `.` in front of the relative import statements in file_writers.py, seti_event.py, and find_doppler.py. I just had to remember to change them back before making any pull requests. As I said, though, there are plenty of other debuggers that are better suited to Python, so spend some time finding what works for you.
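
A possible alternative to editing the imports, offered here as a suggestion that has not been verified against turboSETI, is to run the file as a module inside its package rather than as a standalone script. The sketch below builds a throwaway package (the names `demo_pkg`, `helper.py`, and `entry.py` are hypothetical) to show why a relative import fails in the first case and works in the second.

```python
import os
import runpy
import sys
import tempfile


def run_both_ways():
    """Run the same file as a script and as a module; report what happens."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")
        os.mkdir(pkg)
        files = {
            "__init__.py": "",
            "helper.py": "VALUE = 42\n",
            "entry.py": "from .helper import VALUE\n",  # relative import
        }
        for name, body in files.items():
            with open(os.path.join(pkg, name), "w") as handle:
                handle.write(body)

        # Case 1: run as a plain script. There is no parent package,
        # so the relative import raises ImportError.
        try:
            runpy.run_path(os.path.join(pkg, "entry.py"), run_name="__main__")
            script_ok = True
        except ImportError:
            script_ok = False

        # Case 2: run as a module (what `python -m demo_pkg.entry` does).
        # The package is known, so the relative import resolves.
        sys.path.insert(0, root)
        try:
            module_globals = runpy.run_module("demo_pkg.entry", run_name="__main__")
        finally:
            sys.path.remove(root)

        return script_ok, module_globals["VALUE"]


script_ok, value = run_both_ways()
```

Many debuggers can be configured to launch a module name instead of a script path. Whether that works for a particular setup is something to check before relying on it.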

For people who want to continue working on turboSETI, the remaining tasks include writing tests, expanding the documentation further (including the spots I was unable to fill in), and optimizing. I began writing unit tests for some of the files, though they are in no way exhaustive. I mainly used them to examine behavior with certain inputs. These can be found in the tests folder of the testing branch of this fork. Note also that some of these tests are for functions which no longer exist in turboSETI, as some functions which had no usages were removed from the master branch. Those functions still exist in the testing branch, however.

https://github.com/UCBerkeleySETI/turbo_seti