https://gitlab.com/makhlaghi/maneage-paper.git

Revision Author Date Message Commit Date
54e4eb2 Updated SoftwareHeritage IDs (SWHIDs) to previous commit Until now, the SWHIDs mentioned in the paper were for the published version of this paper, but updates have come since then, so to be complete in the final arXiv release, it is important that they point to the most recent status before it. With this commit, the relevant SWHIDs of this paper (and Maneage itself) now point to the most recent commit (last commit). 09 May 2022, 22:25:46 UTC
f0a9b31 Updated Zenodo DOI for third arXiv release Until now, the Zenodo identifier for the project was for the second arXiv release (after the first referee reports). However, since the paper has been published, it hasn't been updated on arXiv and it's necessary to make a "final" arXiv publication. With this commit, a new Zenodo DOI has been reserved for the third release and is now being used. 09 May 2022, 22:02:37 UTC
9fdeeba Imported recent updates in Maneage, conflicts fixed Until now, Maneage had undergone some updates. With this commit, those updates have been imported and the conflicts that resulted were fixed. They were all cosmetic and had no effect on the analysis. The most significant one was about the change in the format of 'INPUTS.conf'. In the process, I also noticed that the IEEEtran LaTeX package is now called 'ieeetran' (the 'tlmgr' of TeXLive 2022 was failing). 09 May 2022, 21:52:29 UTC
f51b5e2 ./project: make clean removes extra tex files in top source directory Until now, the './project make clean' command would only clean (remove) the PDF file from the top source directory. However, if a user ran LaTeX outside of Maneage, many extra LaTeX output files such as *.aux, *.log, *.synctex, etc. would be produced in the top source directory. These files can interfere with './project make'. With this commit, when './project make clean' is run, any possibly existing LaTeX temporary files will also be deleted from the top source directory. This problem was first reported by Matin Torkian. 08 May 2022, 10:00:07 UTC
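The cleanup described in f51b5e2 can be pictured with a minimal Python sketch. This is not the recipe Maneage uses (that lives in its Makefiles); the function name and the suffix tuple are hypothetical, and only the three suffixes named in the commit message are listed.

```python
from pathlib import Path

# Only the suffixes named in the commit message; the real clean rule may
# cover more (assumption: this tuple is illustrative, not Maneage's list).
LATEX_TEMP_SUFFIXES = (".aux", ".log", ".synctex")


def clean_latex_temporaries(top_dir, suffixes=LATEX_TEMP_SUFFIXES):
    """Delete LaTeX by-products directly inside top_dir (non-recursive).

    Returns the sorted names of the files that were removed.
    """
    removed = []
    for path in Path(top_dir).iterdir():
        # Only regular files whose final suffix is in the list are removed.
        if path.is_file() and path.suffix in suffixes:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Limiting the search to the top directory (no recursion) mirrors the commit's wording, which only talks about files produced in the top source directory.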
597d1df Updated Git, Coreutils and Emacs, new script to prepare tarballs

Until now, one had to follow the instructions from [1] to prepare a standard software tarball before merging with the low-level tarballs-software repository [2]. The script only worked for the '.tar.gz' suffix and was only available as a comment on Savannah (in [1]).

With this commit, the script has been imported into Maneage as 'reproduce/software/shell/tarball-prepare.sh' to simplify future software updates. It works with all supported '.tar.*' suffixes (of the upstream tarball repository) and will convert the tarballs to Maneage's standard format. Also, this script has a minimal argument parser and can skip the tarballs that are already unpacked, allowing faster tests.

This script was used to update the versions of:

  Coreutils 9.0  --> 9.1
  Git       2.34 --> 2.36
  Emacs     27.2 --> 28.1

The main motive behind this update was Git, which announced a vulnerability issue [3] and suggested an update to the latest version as soon as possible. More detail is described in this GitHub blog [4], but in summary, it was a security issue on multi-user systems that has been found and fixed by Git developers. Since Maneage is often installed on such shared systems, it was important to make this update. GNU Coreutils and GNU Emacs were also updated because they are also commonly used.

The following improvements have also been made with this commit:
- .gitignore: ignore emacs auto-save files (that end with a '#')
- README-hacking.md: In the checklist for updating the Maneage branch, the no-longer-necessary '--decorate' option of Git was removed from the command to check the general branch history.

[1] https://savannah.nongnu.org/task/?15699
[2] https://git.maneage.org/tarballs-software.git/
[3] https://lore.kernel.org/git/xmqqv8veb5i6.fsf@gitster.g/
[4] https://github.blog/2022-04-12-git-security-vulnerability-announced/
20 April 2022, 08:21:07 UTC
7726397 ./project: new --refresh-bib to force-build bibliography Until now, the bibliography was only re-built when 'tex/src/references.tex' was modified. This is useful in many regular cases because building the bibliography can slow down the build and it is inefficient to build it on every edit of the text of the paper. However, it can be inconvenient when a change in the paper's bibliography is necessary, without actually editing 'references.tex' (for example when you are removing a citation from the text). This happens because Make is only sensitive to file modification time. In this case, Make does not see the need to create a new 'bib' file because 'tex/src/references.tex' is not changed, and only 'paper.tex' is changed. Make is totally 'blind' to the new 'citation' defined in 'paper.tex'. As a workaround, until now users were forced to manually change the 'tex/src/references.tex' file modification date: either by altering the content, or using the 'touch' command. With this commit, the '--refresh-bib' option is added to the './project' arguments to address this issue. It will just 'touch' the 'tex/src/references.tex' file before calling Make. In effect, this will 'force' Make to create the bibliography file, even if 'tex/src/references.tex' hasn't been updated. 15 April 2022, 03:37:21 UTC
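The reason 'touch' is enough for 7726397 is that Make compares modification times only. The following Python sketch imitates that comparison; it is a simplified model of Make's rule, not code from Maneage, and the function names are hypothetical.

```python
import os


def is_out_of_date(target, prerequisite):
    """Make-style check: rebuild when the target is missing or older
    than its prerequisite (file contents are never inspected)."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(prerequisite) > os.path.getmtime(target)


def refresh(prerequisite):
    """Equivalent of 'touch' on an existing file: set its access and
    modification times to the current time, leaving the content as is."""
    os.utime(prerequisite, None)
```

In this model, calling `refresh()` on the stand-in for 'tex/src/references.tex' makes `is_out_of_date()` return true for the bibliography target on the next check, which is the effect the commit attributes to '--refresh-bib'.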
91799fe IMPORTANT: more generic, robust and secure INPUTS.conf and download.mk

SUMMARY: it is necessary to update your 'INPUTS.conf' and 'download.mk'.

Until now, adding an input file involved several steps that needed manual (and inconvenient!) intervention: for every file, you needed to define four variables in 'INPUTS.conf', and in 'reproduce/analysis/make/download.mk' you had to use a shell 'if/elif/else' condition (complex for a large number of files) to link the names of the input files to those variables. Besides inconvenience, this could cause bugs (typos!). Furthermore, a basic MD5 checksum was used for verifying the files.

With this commit, a new structure has been defined for 'INPUTS.conf' that (thanks to some pretty useful GNU Make features) removes the need for users to manually edit 'reproduce/analysis/make/download.mk', and reduces the number of variables necessary for each file to three (from four). Furthermore, we now use the SHA256 checksum for input data validation.

Regarding the trick used in 'INPUTS.conf' (from the newly added description in 'download.mk'): In GNU Make, '.VARIABLES' "... expands to a list of the names of all global variables defined so far" (from the "Other Special Variables" section of the GNU Make manual). Assuming that the pattern 'INPUT-%-sha256' is only used for input files, we find all the variables that contain the input file names (the '%' is the filename). Finally, using the pattern-substitution function ('patsubst'), we remove the fixed string at the start and end of the variable name.

Steps you need to take:
- INPUTS.conf: translate your old format to the new format (after carefully reading the description in the comments at the start of the file). After applying the new standards, you don't need to use the variables of 'INPUTS.conf' directly in your Makefiles! For example if one of your input datasets is called 'abc.fits', the checksum variable will be 'INPUT-abc.fits-sha256' and in your high-level Makefiles, you can simply set '$(indir)/abc.fits' as a prerequisite (like you probably did already).
- reproduce/analysis/make/download.mk: for the definition and rule of 'inputdatasets', simply use the Maneage branch, and remove anything you had added in your project.

In the process, I also noticed that 'README-hacking.md' still referred to 'master' as the main project branch, while we have used 'main' in the paper (which is also the common convention with Git).
15 April 2022, 03:22:19 UTC
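The name-extraction trick and the SHA256 validation in 91799fe can be imitated outside of Make. The Python sketch below is only an analogy. The list passed to `input_file_names()` stands in for GNU Make's '.VARIABLES', and the slicing stands in for 'patsubst'. Both function names are hypothetical and nothing here is taken from 'download.mk'.

```python
import hashlib


def input_file_names(variable_names, prefix="INPUT-", suffix="-sha256"):
    """Return the '%' part of every name shaped like INPUT-%-sha256."""
    names = []
    for name in variable_names:
        # Strip the fixed strings at the start and end of the name.
        stem = name[len(prefix):len(name) - len(suffix)]
        # Skipping an empty stem is a choice made for this sketch.
        if name.startswith(prefix) and name.endswith(suffix) and stem:
            names.append(stem)
    return names


def sha256_matches(path, expected_hex):
    """Compare a file's SHA256 digest with the expected hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large input datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

With the example from the commit message, a variable named 'INPUT-abc.fits-sha256' yields the file name 'abc.fits', and the value of that variable would be the string passed as `expected_hex`.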
c5d7f2a Bug fix: wrong definition of the prepare directory is corrected Until now, the definition of the prepare directory was wrong (not in the 'analysis' directory of the build directory). I noticed this after an update of the Maneage branch of one project that requires the prepare step. With this commit, this problem has been fixed. 10 March 2022, 11:12:23 UTC
470803e paper.tex: fix double dash that was not showing up in output pdf Until now, the 'double dash' (i.e. \texttt{--}) in the default 'paper.tex' would only print one (longer) dash in the output pdf. With this commit, the double dashes are replaced with '-{}-' in the LaTeX source as a workaround suggested by Stefan Kottwitz in [1]. [1] https://latex.org/forum/viewtopic.php?f=44&t=4670&start=0 07 March 2022, 13:50:20 UTC
8463df9 IMPORTANT: Updates to almost all software

This commit primarily affects the configuration step of Maneage'd projects, and in particular, updated versions of many of the software packages (see P.S.). So it shouldn't affect your high-level analysis other than the version bumps of the software you use (and the software's possibly improved/changed behavior).

The following software (and thus their dependencies) couldn't be updated as described below:
- Cryptography: isn't building because it depends on a new setuptools-rust package that has problems (https://savannah.nongnu.org/bugs/index.php?61731), so it has been commented in 'versions.conf'.
- SecretStorage: because it depends on Cryptography.
- Keyring: because it depends on SecretStorage.
- Astroquery: because it depends on Keyring.

This is a "squashed" commit after rebasing a development branch of 60 commits corresponding to a roughly two-month time interval. The following people contributed to this branch.
- Boudewijn Roukema added all the R software infrastructure and the R packages, as well as greatly helping in fixing many bugs during the update.
- Raul Infante-Sainz helped in testing and debugging the build.
- Pedram Ashofteh Ardakani found and fixed a bug.
- Zahra Sharbaf helped in testing and found several bugs.

Below a description of the most noteworthy points is given.
- Software tarballs: all updated software now have a unified format tarball (ustar; if not possible, pax) and unified compression (Lzip) in Maneage's software repository in Zenodo (https://doi.org/10.5281/zenodo.3883409). For more on this see https://savannah.nongnu.org/task/?15699 . This won't affect any extra software you would like to add; you can use any format recognized by GNU Tar, and all common compression algorithms. This new requirement is only for software that get merged to the core Maneage branch.
- Metastore (and thus libbsd and libmd) moved to highlevel: Metastore (and the packages it depends on) is a high-level product that is only relevant during the project development (like Emacs!): when the user wants the file meta data (like dates) to be unchanged after checking out branches. So it should be considered a high-level software, not basic. Metastore also usually causes many more headaches and error messages, so personally, I have stopped using it! Instead I simply merge my branches in a separate clone, then pull the merge commit: in this way, the files of my project aren't re-written during the checkout phase and therefore their dates are untouched (which can conflict with Make's dates on configuration files).
- The un-official cloned version of Flex (2.6.4-91 until this commit) was causing problems in the building of Netpbm, so with this commit, it has been moved back to version 2.6.4.
- Netpbm's official page had version 10.73.38 as the latest stable tarball that was just released in late 2021. But I couldn't find our previously-used version 10.86.99 anywhere (to see when it was released and why we used it! It's at least more than one year old!). So the official stable version is being used now.
- Improved instructions in 'README.md' for building software environment in a Docker container (while having project source and output data products on the local system; including the usage of the host's '/dev/shm' to speed up temporary operations).
- Until now, the convention in Maneage was to put eight SPACE characters before the comment lines within recipes. This was done because by default GNU Emacs (also many other editors) show a TAB as eight characters. However, in other text editors, online browsers, or even the Git diff, a TAB can correspond to a different number of characters. In such cases, the Maneage recipes wouldn't look too interesting (the comments and the recipe commands would show a different indentation!). With this commit, all the comment lines in the Makefiles within the core Maneage branch have a hash ('#') as their first character and a TAB as the second. This allows the comment lines in recipes to have the same indentation as code, making the code much easier to read in a general scenario including a 'git diff' (editor agnostic!).

P.S. List of updated software with their old and new versions
- Software with no version update are not mentioned.
- The old version of newly added software are shown with '--'.

Name (Basic)              Old version     New version
------------              -----------     -----------
Bzip2                     1.0.6           1.0.8
CURL                      7.71.1          7.79.1
Dash                      0.5.10.2        0.5.11.5
File                      5.39            5.41
Flock                     0.2.3           0.4.0
GNU Bash                  5.0.18          5.1.8
GNU Binutils              2.35            2.37
GNU Coreutils             8.32            9.0
GNU GCC                   10.2.0          11.2.0
GNU M4                    1.4.18          1.4.19
GNU Readline              8.0             8.1.1
GNU Tar                   1.32            1.34
GNU Texinfo               6.7             6.8
GNU diffutils             3.7             3.8
GNU findutils             4.7.0           4.8.0
GNU gmp                   6.2.0           6.2.1
GNU grep                  3.4             3.7
GNU gzip                  1.10            1.11
GNU libunistring          0.9.10          1.0
GNU mpc                   1.1.0           1.2.1
GNU mpfr                  4.0.2           4.1.0
GNU nano                  5.2             6.0
GNU ncurses               6.2             6.3
GNU wget                  1.20.3          1.21.2
Git                       2.28.0          2.34.0
Less                      563             590
Libxml2                   2.9.9           2.9.12
Lzip                      1.22-rc2        1.22
OpenSSL                   1.1.1a          3.0.0
Patchelf                  0.10            0.13
Perl                      5.32.0          5.34.0
Podlators                 --              4.14

Name (Highlevel)          Old version     New version
----------------          -----------     -----------
Apachelog4cxx             0.10.0-603      0.12.1
Astrometry.net            0.80            0.85
Boost                     1.73.0          1.77.0
CFITSIO                   3.48            4.0.0
Cmake                     3.18.1          3.21.4
Eigen                     3.3.7           3.4.0
Expat                     2.2.9           2.4.1
FFTW                      3.3.8           3.3.10
Flex                      2.6.4-91        2.6.4
Fontconfig                2.13.1          2.13.94
Freetype                  2.10.2          2.11.0
GNU Astronomy Utilities   0.12            0.16.1-e0f1
GNU Autoconf              2.69.200-babc   2.71
GNU Automake              1.16.2          1.16.5
GNU Bison                 3.7             3.8.2
GNU Emacs                 27.1            27.2
GNU GDB                   9.2             11.1
GNU GSL                   2.6             2.7
GNU Help2man              1.47.11         1.48.5
Ghostscript               9.52            9.55.0
ICU                       --              70.1
ImageMagick               7.0.8-67        7.1.0-13
Libbsd                    0.10.0          0.11.3
Libffi                    3.2.1           3.4.2
Libgit2                   1.0.1           1.3.0
Libidn                    1.36            1.38
Libjpeg                   9b              9d
Libmd                     --              1.0.4
Libtiff                   4.0.10          4.3.0
Libx11                    1.6.9           1.7.2
Libxt                     1.2.0           1.2.1
Netpbm                    10.86.99        10.73.38
OpenBLAS                  0.3.10          0.3.18
OpenMPI                   4.0.4           4.1.1
Pixman                    0.38.0          0.40.0
Python                    3.8.5           3.10.0
R                         4.0.2           4.1.2
SWIG                      3.0.12          4.0.2
Util-linux                2.35            2.37.2
Util-macros               1.19.2          1.19.3
Valgrind                  3.15.0          3.18.1
WCSLIB                    7.3             7.7
Xcb-proto                 1.14            1.14.1
Xorgproto                 2020.1          2021.5

Name (Python)             Old version     New version
-------------             -----------     -----------
Astropy                   4.0             5.0
Beautifulsoup4            4.7.1           4.10.0
Beniget                   --              0.4.1
Cffi                      1.12.2          1.15.0
Cryptography              2.6.1           36.0.1
Cycler                    0.10.0          0.11.0
Cython                    0.29.21         0.29.24
Esutil                    0.6.4           0.6.9
Extension-helpers         --              0.1
Galsim                    2.2.1           2.3.3
Gast                      --              0.5.3
Jinja2                    --              3.0.3
MPI4py                    3.0.3           3.1.3
Markupsafe                --              2.0.1
Numpy                     1.19.1          1.21.3
Packaging                 --              21.3
Pillow                    --              8.4.0
Ply                       --              3.11
Pyerfa                    --              2.0.0.1
Pyparsing                 2.3.1           3.0.4
Pythran                   --              0.11.0
Scipy                     1.5.2           1.7.3
Setuptools                41.6.0          58.3.0
Six                       1.12.0          1.16.0
Uncertainties             3.1.2           3.1.6
Wheel                     --              0.37.0

Name (R)                  Old version     New version
--------                  -----------     -----------
Cli                       --              2.5.0
Colorspace                --              2.0-1
Cowplot                   --              1.1.1
Crayon                    --              1.4.1
Digest                    --              0.6.27
Ellipsis                  --              0.3.2
Fansi                     --              0.5.0
Farver                    --              2.1.0
Ggplot2                   --              3.3.4
Glue                      --              1.4.2
GridExtra                 --              2.3
Gtable                    --              0.3.0
Isoband                   --              0.2.4
Labeling                  --              0.4.2
Lifecycle                 --              1.0.0
Magrittr                  --              2.0.1
MASS                      --              7.3-54
Mgcv                      --              1.8-36
Munsell                   --              0.5.0
Pillar                    --              1.6.1
R-Pkgconfig               --              2.0.3
R6                        --              2.5.0
RColorBrewer              --              1.1-2
Rlang                     --              0.4.11
Scales                    --              1.1.1
Tibble                    --              3.1.2
Utf8                      --              1.2.1
Vctrs                     --              0.3.8
ViridisLite               --              0.4.0
Withr                     --              2.4.2
21 January 2022, 00:15:24 UTC
480184b Fixed faulty spell check correction in software name As part of Commit 87b510bc, an Emacs spell check was run on the paper. However, during the process, the Jupyter add-on name 'nbextensions' was mistakenly "corrected" to "extension's"! With this commit, it has been corrected to its correct name. The commit message was edited to add more clarity/context, also Florian's name has been added in the acknowledgments by Mohammad. 22 November 2021, 21:58:20 UTC
775fc03 Configuration: GCC not linking to system libunwind (crashed GCC's build) This commit provides a hack/correction to the unwrapped GCC source files that sym-links the generic file 'libgcc/unwind-generic.h' to the two directories in which a file includes "unwind.h" or <unwind.h>. The aim is that the gcc compilation system uses this header file from the internal gcc source files instead of searching for a system-level file 'unwind.h'. This commit also unaliases two 'ls' commands in some build recipes of 'basic.mk' in case the host system (normally at user level) has aliased the command to something like 'ls -F'. In the situation that sometimes occurs of library files being given executable status, the '-F' decorative option could lead to an asterisk being included in a string that is not expected to contain asterisks. If the system shell does not contain the 'alias' command at all, then a fallback of 'true' should provide safe behaviour. The notation of the 'sed' command is also clarified. This solves bug #61240: https://savannah.nongnu.org/bugs/index.php?61240 01 October 2021, 14:15:41 UTC
3a1b967 Configuration: fixed bugs in building of OpenSSL and Gettext Until now, the 'RPATH' variable (specifying where to look for shared libraries) wasn't being set in the 'libcrypto' library of OpenSSL (it was only set for the 'libssl' library). Also, Gettext used the host Emacs for some operations during installation that could cause the following crash (because we are giving priority to local libraries, which the host Emacs doesn't recognize): emacs: /BDIR/libcrypto.so.1.1: version `OPENSSL_1_1_1b' not found (required by /lib64/libk5crypto.so.3) With this commit both these bugs have been fixed: 1) Patchelf is run on the 'libcrypto' library also and 2) we pass the '--without-emacs' configuration option to the configure script of Gettext. These bugs were found by Elham Saremi. 12 July 2021, 17:14:46 UTC
ae5fb4d Copyedits in appendices, suggested by Antonio Díaz Díaz Antonio kindly proposed these corrections (mostly in Appendix A, but one also at the start of Appendix B). They are fixed with this commit. 02 July 2021, 19:57:18 UTC
a3358bb Affiliations in body: France added after Lyon for Mohammad While looking at the affiliations, I noticed that "France" was missing in my Lyon affiliation! Also, for both Boud and myself it was necessary to put a '.' after 'Univ' because it's short for University and not a full word. 25 June 2021, 18:53:42 UTC
016d938 Configuration: New check to see if /dev/shm allows execution On systems that allow it (like GNU/Linux systems), Maneage will build the necessary software in shared memory (a directory that is actually in the RAM, not on an SSD/HDD; on GNU/Linux systems, it is '/dev/shm'). This allows Maneage to operate faster and not harm the HDD/SSD with all the temporary writing of many small files. Until now, we would only check that this directory exists and that it has enough space. However, some systems also set the 'noexec' flag on shared memory for security reasons [1]. This causes Maneage to crash upon building of the software in later phases. With this commit, at the very start of the configuration step, and after all other shared-memory checks are done, a dummy executable script file is created there and its execution is tested. If it doesn't work, shared memory will not be used at all. In the process, the steps dealing with the software building directory in the configure script have been brought into one place and comments were added to further clarify every step. This commit was initially done by Boud Roukema and later edited by Mohammad Akhlaghi. [1] https://web.archive.org/web/20210624192819/https://serverfault.com/questions/72356/how-useful-is-mounting-tmp-noexec 25 June 2021, 18:20:02 UTC
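The execution test described in 016d938 is done in Maneage's shell configure script. The Python sketch below shows the same idea under stated assumptions: the function name is hypothetical, the dummy script is a two-line '/bin/sh' script, and any operating-system error while running it is treated as "execution not allowed".

```python
import os
import stat
import subprocess
import tempfile


def directory_allows_execution(directory):
    """Return True only if a freshly written script can be run from
    'directory' (for example a shared-memory mount such as /dev/shm)."""
    if not os.path.isdir(directory):
        return False
    try:
        fd, script = tempfile.mkstemp(prefix="exec-test-", dir=directory)
    except OSError:
        # The directory is not writable, so it cannot be used anyway.
        return False
    try:
        # Write and close the dummy script before trying to run it.
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        os.chmod(script, stat.S_IRWXU)
        result = subprocess.run([script], check=False)
        return result.returncode == 0
    except OSError:
        # On a 'noexec' mount the attempt to run the script fails here.
        return False
    finally:
        os.remove(script)
```

A caller could fall back to a build directory on disk whenever `directory_allows_execution("/dev/shm")` is false, which corresponds to the commit's "shared memory will not be used at all".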
c61d44e Appendix A: minor edits to clarify text While having a fast glance at Appendix A, I noticed two small parts that could be improved by adding a 'from' and using 'Maneage' instead of "the proposed solution". They are corrected with this commit. 24 June 2021, 16:17:52 UTC
70d22be Paper title: towards --> toward to conform with CiSE version I just(!) noticed that in the CiSE version of the paper, they replaced the "Towards" (first word in the title) with "Toward" (removing the 's'). According to the thorough history provided by the Merriam-Webster dictionary [1], the difference is mainly because of US/British English. Also, they have slightly changed the capitalization of the "long-term" phrase, from "Long-term" that we had initially used to "Long-Term". I have no particular opinion on this and accept their judgement. To keep things in line with the published paper, I am correcting both these issues in our version of the paper also (that will later go in arXiv). [1] https://www.merriam-webster.com/words-at-play/toward-towards-usage 22 June 2021, 00:30:14 UTC
c07ce07 Copyedit (main body): fixed sentence on importance of history This commit changes the rather confused sentence ending "is, thus, not any the less valuable as itself" to "often as valuable as the result itself". This clarifies the intended meaning. The error was unfortunately missed by the proofreaders of our article. 19 June 2021, 21:07:21 UTC
1cfb83e Main body: corrected mistakenly written "bottom" --> "right" In the old versions of this paper, the two components of Figure 1 were under each other, so we referred to them as "top" and "bottom"! However, we later put them beside each other (by shrinking the data graph), so they became "left" and "right". I just noticed that within the main body of the text, in one place, we were still mistakenly saying "bottom"! So with this commit, it has been changed to "right". Unfortunately this has gone into the final publication on CiSE, but it is important to fix such minor issues anyway (the good thing with having a Git history!); we also haven't yet put the final upload on arXiv. 15 June 2021, 14:40:06 UTC
ccce0fb Further copyediting on the appendices In the discussion on criteria that Popper lacks, the last mentioned criterion "including the narrative" is written in such a way that can confuse readers into thinking that only a single criterion is lacking. Hyphenating ('including-the-narrative') has been applied to make the sentence less likely to be misunderstood. The ending of the first paragraph in the "Generational gaps" item in Appendix A.G ("... every few years is not practically possible.") sounds like "not almost possible", so it can cause confusion. Endings that are much clearer include: * is impractical. * is not possible in practice. * is not practical. * is not possible practically. [meaning 2. is less likely in this case] I've selected the first option, also replacing "they" by "scientists" to avoid the misinterpretation that "programming languages ... have their own science field to focus on". This commit and the previous one were "amended" by Mohammad (compared to the original commits that Boud had sent). 14 June 2021, 23:56:20 UTC
d3ee579 Copyediting of appendices This commit does several small copyediting fixes in the body of the appendices which should improve their readability. 14 June 2021, 22:39:03 UTC
ba7792f Replace archive.today with archive.org in footnote URL Based on a reasonable suggestion on ethical reasoning [1], this commit replaces the git.sdf.org + archive.today pair of URLs in the footnote on Github's unethical aspects, with a single archive.org URL, which contains the original URL, making this sufficient for readers wishing to check either the live or archived versions. [1] https://social.privacytools.io/@resist1984/106403926114506533 https://social.privacytools.io/@resist1984/106403932399114639 13 June 2021, 17:05:53 UTC
a77bba6 Add GHTorrent, some https, notabug This commit adds a few sentences in relation to the first known attempt to store and make available git repository hosting ephemera (GHTorrent, introduced to us by Roberto Di Cosmo). Since one of the two sponsors of GHTorrent is Microsoft, both the ethics and practical aspects of this in the context of reproducibility and scientific ethics as expressed by the international scientific community are rather unclear, so a link to one of the well-known lists of practical and ethical issues with Github is included. A minor fix is made in 'tex/src/appendix-existing-solutions.tex', since the word 'data' is plural (singular is 'datum'). 13 June 2021, 14:09:19 UTC
313db0b Published version in CiSE This is the version of the project that will be published in Computing in Science and Engineering (CiSE), Volume 23, Issue 3, Pages 82--91. 11 June 2021, 20:48:10 UTC
a0a5176 Minor edits and updated first-page Software Heritage ID After going through Boud's corrections and edits in the previous commit, I thought some minor clarifications would be necessary, and they are implemented in this commit. Also, in preparation for submission to the journal, the top-level software heritage ID has been corrected to the latest commit on Software Heritage. 08 June 2021, 18:11:11 UTC
6f7f00f Several minor edits, removed exact value of arXiv's size-limit This commit makes several copyediting changes to the appendices and to the supplement.tex introduction to the appendices. arXiv's unofficially increased upload limit of 50 Mb comes from a tweet: https://nitter.fdn.fr/arxiv/status/1286381643893268483 (archive: https://archive.today/PdxhT) but is not listed on official arXiv pages. So it seems safer not to quote a value. The very old value was 0.5 Mb, out of respect to people with low bandwidth, especially scientists in poor countries. Tweets are generally not acceptable as "reliable sources" in en.Wikipedia. 08 June 2021, 17:33:57 UTC
c1bc4eb Minor edits suggested by David and updating of Zenodo DOI David suggested some minor edits that are now implemented (most importantly that he would not like to be associated with an ORCID ID). I also "saved" a new Zenodo DOI for the final submission of this paper to Zenodo, but "after" obtaining the page number information and other minor things. 08 June 2021, 17:22:26 UTC
54d994d Improved appendix on archival Until now the appendix only touched upon the archival aspects of scholarly research products (data, code, narrative). To help in clarity, the context of this section has been improved, giving more explanations and examples. 08 June 2021, 01:21:35 UTC
f88b104 Clarifications added to ReproZip in the appendix After Boud posted a notice about Maneage in an online forum [1], Rémi Rampin and Vicky Rampin (from the ReproZip project) replied with some notes about our review of ReproZip in Appendix B. We are very grateful to both Rémi and Vicky for looking into it and for their comments; their contribution has been gratefully acknowledged with this commit. The relevant comments are listed below and have been addressed in this commit (see the 'diff' of this commit). - [Rémi Rampin] ReproZip can capture the build step if you want it to, it's just another command. So if you want to trace "make" and "pip install" etc before tracing your actual experiment, you will have all that build information. - [Rémi Rampin] Bundle size is easily fixed by not putting terabyte-sized data in the bundle, which is done by editing a simple configuration file. - [Vicky Rampin] Not all the files in the bundle are compiled/binary files [in relation to the old sentence "ReproZip just copies the binary/compiled files used in a project"]. [1] https://framapiaf.org/@boud/106296894758145705 07 June 2021, 22:40:20 UTC
b97c1ff Update publications: Peper+Roukema published This commit updates some of the publication data in README-hacking.md : Peper+Roukema (2021) is now published in MNRAS and Akhlaghi+ (2021) is published online and very close to getting a conventional volume and page number. :) See task https://savannah.nongnu.org/task/?15736 for ideas of how to make a more systematic publication list instead of one managed by prose text. There are already too many non-automated places for publication lists where we have to copy/paste our publication data again and again and again and ... This commit also adds the Software Heritage ID that we have in the content of Akhlaghi+2021 (without the extra context, because as a URL that's very long). There are plenty of arguments to be made each way for different versions of the SWHIDs. One advantage of the 'rev' ID is that the hash is the original (full) git hash, which is what I've done for the elaphrocentre and subpoisson papers. 03 June 2021, 02:12:43 UTC
ef9fe47 Configuration: improved warning when TeX Live couldn't be installed Once a year, the texlive update system becomes incompatible with the version from the previous year. Since a texlive install failure is considered non-fatal by 'high-level.mk', until now the user could miss the printed message and mistakenly believe that the configure is valid. This commit explicitly adds a 10-second delay that should be enough for a user who does the 'configure --existing-conf' step alone to notice that there is a TeX Live problem. It also adds the explicit instruction of how to allow an update from an earlier year's texlive installer to the warning message (by deleting '.build/software/tarballs/install-tl-unx.tar.gz'). I had to rediscover this a few times for old Maneage installs. Also, a few lines in 'reproduce/software/shell/configure.sh' were indented with a TAB (that is not recommended because TAB is displayed with different widths on different browsers). So while doing this commit, those TABs were also converted to spaces. 03 June 2021, 01:02:49 UTC
e429b04 ReproZip, Popper: minor fixes This commit contains minor fixes in Appendix B. ReproZip: As Vicky Rampin points out [1], ReproZip typically also includes non-binary files, so I removed "just" and improved the wording. Popper: the Popper URL that we gave is obsolete; at Wayback Machine it redirects to getpopper.io [2], so I've updated this; and I've fixed up the wording ('off of' only exists in US English). [1] https://octodon.social/@VickyRampin/106298214313216228 [2] https://web.archive.org/web/20210425223605/http://falsifiable.us/ 26 May 2021, 00:08:09 UTC
ff4b8b8 Fix typo: s/empemera/ephemera/ 25 May 2021, 16:46:52 UTC
6d83f32 Brief notes on archiving as Appendix A.D This commit adds a few extremely brief and incomplete paragraphs on archiving, including URLs, as what is now subsection D of Appendix A. 25 May 2021, 16:35:30 UTC
6ab5696 Implemented changes of first proof by CiSE A few days ago, CiSE gave us a proof of the edited text and formatted PDF. After comparing the edited text with our text, I noticed some minor editorial issues that have been corrected in this commit. The parts that were wrong (or could be improved in the proof) have been listed and will be submitted to the journal. In particular, following the recommendation from the editor, the biographies were extended with a full listing of each author's affiliation; I also added our ORCID IDs in the biographies. 12 May 2021, 01:38:15 UTC
6fad78e Minor edit to footnote introducing resolvers for SWHID Until now, the paragraph implied that the 'n2t.net' link is the only way to access SWHIDs. Also, context/content duality wasn't too clear in the end where I had mentioned to click on the digital format SWHID. With this commit, I tried to edit it and avoid these two sources of confusion. 29 April 2021, 00:52:04 UTC
e69c640 Software Heritage resolver info in first footnote The most basic way to resolve a Software Heritage identifier (SWHID) is to prefix it with 'https://archive.softwareheritage.org'. However, Roberto Di Cosmo informed me that SWHIDs are also resolved by 'n2t.net' and 'identifiers.org'. With this commit, on the first occurrence of an SWHID, I added some explanation of how to resolve it by adding 'http://n2t.net' (since it was the shorter option). Some further minor edits were made: - In the manuscript submission information, instead of "published on IEEE", I wrote "first published online". The journal name is available on the top of every page and doesn't include "IEEE", so this hopefully avoids some confusion for people who don't know CiSE is published by IEEE. - The URL with the link to Ubuntu images was moved to footnotes to help the readability and better type-setting of the paragraph. A minor edit was then made in that paragraph to shrink the paragraph by two words that had occupied a whole line in its end. - The first comment line in the second listing (Git commands to start a new branch from Maneage) was slightly edited to avoid the term 'main' (which could be confused with the branch name after 'git checkout -b main'). - In the acknowledgements, the paragraph on Maneage commit/branch information was moved at the top so the people and institutions are acknowledged immediately after each other. - Some minor edits were made in the Spanish acknowledgements to fit with new project names. 29 April 2021, 00:30:04 UTC
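The prefixing described in e69c640 amounts to simple string concatenation. The Python sketch below builds the three resolver URLs named in the commit message. The function name is hypothetical, and the "/" separator and the "https" scheme for 'n2t.net' and 'identifiers.org' are assumptions of this sketch (the commit itself writes an 'http://' prefix for n2t).

```python
# Resolver hosts named in the commit message; the URL schemes below are
# an assumption of this sketch.
RESOLVERS = {
    "softwareheritage": "https://archive.softwareheritage.org",
    "n2t": "https://n2t.net",
    "identifiers.org": "https://identifiers.org",
}


def swhid_urls(swhid):
    """Return one URL per resolver for a core SWHID such as 'swh:1:dir:...'."""
    if not swhid.startswith("swh:1:"):
        raise ValueError("not a version-1 SWHID: " + swhid)
    # Each resolver takes the identifier directly after the host name.
    return {name: base + "/" + swhid for name, base in RESOLVERS.items()}
```

For a placeholder identifier such as `"swh:1:dir:" + "0" * 40` (not a real object), the 'n2t' entry is the shortest of the three URLs, which matches the reason the commit gives for choosing it in the footnote.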
d67debd Software Heritage IDs (SWHIDs) now printed in PDF Until now, the SWHIDs were not accessible in the print version of the paper; they were only hidden as hyper-links within the PDF for readers to click on. This is not a robust way to use the fruits of Software Heritage and was kindly highlighted by Roberto Di Cosmo (principal investigator of Software Heritage) after a first look at the paper. With this commit, following the recommendation of Roberto, all the URLs are corrected to print the raw SWHID as a footnote (for example 'swh:1:dir:...', for directories, or 'swh:1:cnt:...', for contents/files). The click-able link of the SWHID also contains the context (for example "origin", etc.). In the process I noticed that the paper submission/acceptance info was not filled and was also a footnote (which would not be seen if not cited). So this information (received, accepted and published on IEEE) is now placed just under the author list in the first page heading. 28 April 2021, 00:36:22 UTC
5a6e6e6 README.md: edited steps to only build software env in Docker image Until now, while the series of steps mentioned in 'README.md' was complete, it left some steps implicit, which made it a little hard to run as a checklist (the commands to do some basic things weren't included). Also, it recommended running a long 'docker run ...' command, which wasn't too user friendly. With this commit, the series of steps is now a complete checklist, containing every step. Also, the checklist now recommends putting the long 'docker run' command inside a script called 'docker-run' that will also do a 'sudo' internally (thus making things very easy for a first-time user). Also, since the 'docker-run' script contains host OS-specific directory names, it should not be under version control, so it has been added to the '.gitignore' file in case users decide to keep this same name (which is recommended). 25 April 2021, 22:59:41 UTC
b858c60 DOI added to README and paper's header The DOI of the paper has been minted by IEEE, so as a step to finalize this paper, it has been added to the README.md and the header of all PDF pages. Along with the DOI in the header, the arXiv and Zenodo links are also added to the header (they are small, and won't disturb reading). 25 April 2021, 18:15:54 UTC
d11725a Imported recent work in Maneage, minor conflicts fixed Some minor conflicts (all expected from the commit messages in the Maneage branch) occurred but were easily fixed. 17 April 2021, 03:55:58 UTC
6e4ec9a IMPORTANT: print-general-metadata new name for print-copyright Summary: - Use the new name of this variable in your Makefiles. - In 'metadata.conf', remove fixed URL prefixes for DOIs ('https://doi.org/') or arXiv ('https://arxiv.org/abs'). Until now, the Make variable that would print the general metadata (of whole project) into each to-be-published dataset was called 'print-copyright'! But it now does much more than simply printing the copyright, it will also print a lot of metadata like arXiv ID, Zenodo DOI and etc into plain-text outputs. The out-dated name could thus be misleading and cause confusions. With this commit, the variable is therefore called 'print-general-metadata'. After merging your project with the Maneage branch, please replace any usage of 'print-copyright' to 'print-general-metadata'. Also with this commit, 'README-hacking.md' mentions 'metadata.conf' and 'print-general-metadata' in the "Publication checklist" section and reminds you to keep the first up to date, and use the second in your to-be-published datasets. 17 April 2021, 03:31:31 UTC
30bf462 Finally published journal DOI added In the project's 'metadata.conf', we also have an option to store the journal DOI of the project (that will later be printed in the output file products). So now that the paper's DOI has been set by the journal, it was time to add it in the project too. While looking at the usage of the metadata, I noticed that the "Publication checklist" of 'README-hacking.md' didn't talk about it. In fact, the part about putting metadata went into a lot of detail without even mentioning the generic 'print-general-metadata' variable (previously called 'print-copyright') that is created in 'initialize.mk'. So I removed those extra points and just recommended using this variable for plain-text files and putting similar info in other formats. Some other minor changes were made: - The metadata now doesn't need the fixed 'https://doi.org/' prefix (to make it consistent with the arXiv identifier). Inside 'initialize.mk', there are now two variables called 'doi-prefix-url' and 'arxiv-prefix-url' that contain the fixed prefix. - The 'print-copyright' name was clearly outdated for all the extra metadata that this variable created (including the copyright). So its name was changed to 'print-general-metadata'. The generic Maneage changes will be taken into Maneage after this (they were tested here). 17 April 2021, 02:35:49 UTC
566190a Added final review result In the previous commit, I had forgotten to pass '-f' to 'git add'! This is necessary because '.txt' files are set to be ignored in Git by default (they are marked in '.gitignore'). With this commit, this file is now added into the project history. 17 April 2021, 01:37:18 UTC
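The behaviour described in this commit can be reproduced in a throwaway repository. The sketch below is only an illustration under assumed names: the temporary directory and the file 'review.txt' are hypothetical and are not the files used in this project.

```shell
#!/bin/sh
# Hypothetical demo in a temporary repository (not the project's own files).
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Ignore all '.txt' files, as described in the commit message.
echo '*.txt' > .gitignore
echo 'final review text' > review.txt

# A plain 'git add' refuses to stage an ignored file (non-zero exit status).
if git add review.txt 2>/dev/null; then
    plain_add=staged
else
    plain_add=refused
fi

# With '-f' ('--force'), the ignored file is staged anyway.
git add -f review.txt
staged=$(git ls-files)
echo "plain add: $plain_add; staged with -f: $staged"
```

Once a file has been force-added, Git keeps tracking it even though it still matches a pattern in '.gitignore'.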
925091e Implemented EiC (Lorena Barba) comments, and added final review The email notice of the final acceptance of this paper in CiSE has been included in the project and the stylistic points that were raised by the editor in chief (EiC) have also been implemented. The most important points were: - Including citations within the text structure (as if they would be footnotes), so things like "see \cite{...}" should have been changed. - Hyperlinks should be printed as footnotes (because the journal gets actually printed). Also, to avoid the second listing breaking between pages, it has been moved to after the next paragraph. 09 April 2021, 16:31:39 UTC
4ccf014 Minor corrections on previous copyedit Being immutable doesn't necessarily mean that something is always present, so an "always present" was also added for the reason we recommend a Git hash. The end of the sentence was also slightly summarized to allow the extra few words. The re-wording of the conclusion of Active papers was great! I just changed the "likely" to "possible", because as Konrad mentioned in Commit a63900bc5a8, he is now using Guix. 09 April 2021, 13:09:30 UTC
832aca8 Minor copyedits These are minor last minute copyedits for recently added text, e.g. a git hash is not literally a timestamp. 09 April 2021, 12:35:04 UTC
1140619 Corrected Roberto's affiliation and email Roberto has recently moved to a new position as professor in the Universidad Internacional de La Rioja. With this commit, his short bio and email address have thus been updated in the main paper to reflect this. 09 April 2021, 10:58:29 UTC
a3efaff Changed all gitlab.com URLs to git.maneage.org Until now, we were primarily linking people to the Gitlab fork of this paper. However, since this paper is part of Maneage, its main repository is on Maneage's own server at http://git.maneage.org/paper-concept.git With this commit, therefore, all the gitlab.com URLs have been corrected to our own Git server. While looking into Git-related points, I also noticed that in the demo code listing showing how to clone Maneage and start a new project, we were using Git's old/deprecated 'master' name. Git (and almost all common repositories) now use 'main' as the default branch name, so this has also been corrected here. 09 April 2021, 10:44:14 UTC
f6904b0 Acknowledged Peter Wittenburg I attended one of Peter Wittenburg's talks in the context of RDA on the Canonical Workflow Frameworks for Research (CWFR). Afterwards I got in touch with him about Maneage and this paper. He kindly read the paper and was very supportive of it, with positive/encouraging feedback. It was thanks to that discussion that I added CWFR in the discussion (in the previous commit). But since that commit was focused on IAA's suggestions, I am acknowledging Peter here. 09 April 2021, 03:34:26 UTC
e8de7ed Comments by IAA's AMIGA team implemented The AMIGA team at the Instituto Astrofísica Andalucía (IAA) are very active proponents of reproducibility. They had already provided very constructive comments after my visit there and many subsequent interactions. So until now, the whole team's contributions were acknowledged. Since the last submission, several of the team members were able to kindly invest the time in reading the paper and providing very useful comments which are now being implemented. As a result, I was able to specifically thank them in the paper's acknowledgments (Thanks a lot AMIGA!). Below, I am listing the points in the order that is shown in 'git log -p -1' for this commit. - Javier Moldón: "PM is not defined. First appearance in the first page". Thanks for noticing this Javier, it has been corrected. - Javier Moldón: "In Section III. PROPOSED CRITERIA FOR LONGEVITY and Appendix B, you mention the FAIR principles as desirable properties of research projects and solutions, respectively which is good, but may bring confusion. Although they are general enough, FAIR principles are specifically for scientific data, not scientific software. Currently, there is an initiative promoted by the Research Data Alliance (RDA), among others, to create FAIR principles adapted to research software, and it is called FAIR4RS (FAIR for Research Software). More information here: https://www.rd-alliance.org/groups/fair-4-research-software-fair4rs-wg. In 2020 there was a kick-off meeting to divide the work in 4 WG. There is some more information in this talk: https://sorse.github.io/programme/workshops/event-016/. I have been following the work of WG1, and they are about the finish the first document describing how to adapt the FAIR principles to software. Even if all this is still work in progress, I think the paper would benefit from mentioning the existence of this effort and noticing the diferences between Data and Software FAIR definitions." 
Thanks for highlighting this Javier, a footnote has been added for this (hopefully faithfully summarizing it into one sentence due to space limitations). - Sebastian Luna Valero: "Would it be a good idea to define long-term as a period of time; for example, 5 years is a lot in the field of computer science (i.e. in terms of hardware and software aging), but maybe that is not the case in other domains (e.g. Astronomy)." Thanks Sebastian, in section 2, we do give longevity of the various "tools" in rough units of years (this was also a suggestion by a referee). But of course the discussion there is very generic, so going into finer detail would probably be too subjective and bore the reader. - Sebastian Luna Valero: "Why do you use git commit eeff5de instead of git tags or releases for Maneage? Shown for example in the abstract of the paper: "This paper is itself written with Maneage (project commit eeff5de)." Thanks for raising this important point, a sentence has been added to explain why hashes are objective and immutable for a given history, while tags can easily be removed or changed, or not cloned/pushed at all. - Susana Sanchez Exposito: "We think interoperability with other research projects would be important, do you have any plans to make maneage interoperable with, for example, the Common Workflow Language (CWL)?". Thanks a lot for raising this point Susana. Indeed, in the future I really do hope we can invest enough resources on this. In the discussion, I had already touched upon research objects as one method for interoperability, there was also a discussion on such generic standards in Appendix A.D.10. But to further clarify this point (given its importance), I mentioned CWL (and also the even more generic CWFR) in the discussion. - Sebastian Luna Valero: "Regarding Apache Taverna, please see:" https://github.com/apache/incubator-taverna-engine/blob/master/README.md Thanks a lot for this note Sebastian! I didn't know this! 
I wrote this section (and visited their webpage) before their "vote"! It was a surprise to see that their page had changed. I have modified the explanation of Taverna to mention that it has been "retired" and use the Github link instead. - Sebastian Luna Valero: "Page 21: 'logevity' should be 'longevity'." Thanks a lot for noticing this! It has been corrected :-). - Javier Moldón: "There is a nice diagram in Johannes Köster's article on data processing with snakemake that I find very interesting to show some key aspects of data workflows: see Fig 1 in https://www.authorea.com/users/165354/articles/441233-sustainable-data-analysis-with-snakemake " This is indeed a nice diagram! I tried to cite it, but as of today, this link is not a complete paper (with no abstract and many empty section titles). If it was complete, I would certainly have cited it in Snakemake's discussion. - Javier Moldón: "Regarding the problem mentioned in the introduction about PM not precisely identified all software versions, I would like to mention that with Snakemake, even if the analysis are usually constructed using other package managers such as conda, or containers, you don't need to depend on online servers or poorly-documented software versions, as you can now encapsulate an analysis in a tarball containing all the software needed. You still have long-term dependency problems (as you will need to install snakemake itself, and a particular OS), but at least you can keep the exact software versions for a particular platform." Thanks for highlighting this Javier. This is indeed better than nothing, but we have already discussed the dangers of this "black box" approach of archiving binaries in many contexts, and many package managers have it. So while I really appreciate the point (I didn't know this), to avoid lengthening the paper, I think it's fine to not mention it in the paper. 09 April 2021, 02:58:46 UTC
a63900b Comments by Konrad Hinsen implemented Konrad had kindly gone through the paper and the appendices with very good feedback that is now being addressed in the paper (thanks a lot Konrad!): - IPOL recently also allows Python code. So the respective parts of the description of IPOL have been updated. To address the dependency issue, I also added a sentence that only certain dependencies (with certain versions) are acceptable. - On Active Papers (AP: which is written by Konrad) corrections were made based on the following parts of his comments: - "The fundamental issue with ActivePapers is its platform dependence on either Java or Python, neither of which is attractive." - "The one point which is overemphasized, in my opinion, is the necessity to download large data files if some analysis script refers to it. That is true in the current implementation (which I consider a research prototype), but not a fundamental feature of the approach. Implementing an on-demand download strategy is not particularly complicated, it just needs to be done, and it wasn't a priority for my own use cases." - "A historical anecdote: you mention that HDF View requires registering for download. This is true today, but wasn't when I started ActivePapers. Otherwise I'd never have built on HDF5. What happened is that the HDF Group, formerly part of NCSA and thus a public research infrastructure, was turned into a semi-commercial entity. They have committed to keeping the core HDF5 library Open Source, but not any of the tooling around it. Many users have moved away from HDF5 as a consequence. The larger lesson is that Richard Stallman was right: if software isn't GPLed, then you never know what will happen to it in the future." - On Guix, some further clarification was added to address Konrad's quote below (with a link to the blog-post mentioned there). In short, I clarified that I mean storing the Guix commit hash with any respective high-level analysis change is the extra step. 
- "I also looked at the discussion of Nix and Guix, which is what I am mainly using today. It is mostly correct as well, the one exception being the claim that 'it is up to the user to ensure that their created environment is recorded properly for reproducibility in the future'. The environment is *recorded* in all detail, automatically. What requires some effort is extracting a human-readable description of that environment. For Guix, I have described how to do this in a blog post (https://guix.gnu.org/en/blog/2020/reproducible-computations-with-guix/), and in less detail in a recent CiSE paper (https://hal.archives-ouvertes.fr/hal-02877319). There should definitely be a better user interface for this, but it's no more than a user interface issue. What is pretty nice in Guix by now is the user interface for re-creating an environment, using the "guix time-machine" subcommand." - The sentence on Software Heritage being based on Git was reworded to fit this comment of Konrad: "The plural sounds quite optimistic. As far as I know, SWH is the only archive of its kind, and in view of the enormous resources and long-time commitments it requires, I don't expect to see a second one." - When introducing hashes, Konrad suggested the following useful paper that shows how they are used in content-based storage: DOI:10.1109/MCSE.2019.2949441 - On Snakemake, Konrad had the following comment: "[A system call in Python is] No slower than from bash, or even from any C code. Meaning no slower than Make. It's the creation of a new process that takes most of the time." So the point was just shifted to the many quotations necessary for calling external programs and how it is best suited for a Python-based project. In addition some minor typos that I found during the process are also fixed. 09 April 2021, 01:00:18 UTC
20b6273 Configuration: corrected check of group name When built in 'group' mode, the write permissions of all created files will be activated for a certain group of users in the host operating system. The user specifies the name of the group with the '--group' option at configure time. At the very start, the './project' script checks to see if the given group name actually exists or not (to avoid hard-to-debug errors popping up later). Until now, the checking 'sg' command (that was used to build the project with group-writable permissions) would always fail due to the excessive number of redirections. Therefore, it would always print the error message and abort. With this commit, the output of 'sg' is no longer re-directed (which also helps users in debugging). If the group does actually exist, it will just print a small statement saying so, and if it fails, the error message is printed. This fixed the problem, allowing Maneage to be built in group-mode. I also noticed that the variable name keeping the group name ('reproducible_paper_group_name') used the old name for the project (which was "Reproducible paper template")! So it has been changed/corrected to 'maneage_group_name'. 28 March 2021, 11:55:03 UTC
611c2f1 Initialization: removed other Gnuastro-specific features In the previous commit, some Gnuastro-specific initializations were removed but a few more cases remained that are removed with this commit. 26 March 2021, 20:39:48 UTC
ac8890d ./project: unused --minmapsize option is removed Until now, the './project' script included a '--minmapsize' option, which is an option to one of the original programs that was used in Maneage (Gnuastro). Such an option doesn't exist in many other programs, so it is not a suitable option for the generic Maneage project (and can just cause confusion). It was also not used in any part of Maneage any more! With this commit, this option is removed from the core Maneage './project' script and if any project uses it, they can implement it in their own branch. 26 March 2021, 18:19:22 UTC
66453b4 Maneage installation: removed TCL as a dependency of SWIG Until now the SWIG software would use the host operating system's packages to find the TCL configuration (which we don't install yet in Maneage). In particular, you can see the error during its configuration here: .... checking for pkg-config... pkg-config checking for Tcl configuration... found /usr/lib/tclConfig.sh /usr/lib/tclConfig.sh: line 2: dpkg-architecture: command not found /usr/lib//tcl8.6/tclConfig.sh: line 2: dpkg-architecture: com. not found With this commit, TCL has been disabled when building SWIG with the '--without-tcl' option. Later, when we add TCL in Maneage, we can remove this option. 24 March 2021, 21:01:35 UTC
a981196 Configuration: nullability-completeness warnings suppressed With a recent update of macOS systems (macOS Big Sur 11.2.3 and Xcode 12.4), there are many warnings when building C programs (for example the simple program we compile to check the compiler, or some of the software like `gzip'). It prints hundreds of warning lines for every source file that are irrelevant for our builds, but really clutters the output. With this commit, these warnings are disabled by adding `-Wno-nullability-completeness' to the 'CPPFLAGS' environment variable. This has also been added to the very first check of the C compiler in the configure step. 20 March 2021, 01:08:21 UTC
c3e82b1 Configuration: --debug option available in this phase also Until now, each time there was a problem in the configuration of Maneage'd projects and debugging was necessary, we had to make the following changes: - Run the configuration on a single thread ('-j1') to see the building of only the problematic software. - Disable the Zenodo check manually by commenting those parts of 'reproduce/software/shell/configure.sh'. Because the internet connection wastes a few seconds and is thus very annoying during repeated runs! - Manually remove the '-k' option that was passed to Make (when building the software). With the '-k', Make keeps going with the execution of other targets if something crashes and this usually causes confusion during the debugging. Doing the manual changes within the code was both very annoying and prone to errors (forgetting to correct it!). With this commit, the existing '--debug' option has been generalized to the software configuration phase of Maneage also. Until now, it was only available in the analysis phase (and would directly be passed to the 'make' command that would run the analysis). When this option is used, and the project is in the software configuration phase, the Zenodo check won't be done, it will use one single thread ('-j1'), and it will stop the execution as soon as an error occurs (Make is not run with '-k'). 20 March 2021, 00:53:37 UTC
1524213 Installation: minor correction in links to system libraries Until now when making a link to the system's 'dl' and 'pthread' libraries we were simply linking the installed location on the system (in '/usr/lib'). However, in some systems, these may themselves be links to other locations and this could cause linking problems. With this commit, we now use 'realpath' to extract the absolute address of the final file that the libraries may link to, and directly link to them. A minor cosmetic correction was also made in the build rule for CFITSIO: the long line was broken into two! 12 February 2021, 23:49:16 UTC
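The idea in this commit (resolve a possible chain of links first, then link to the final file) can be sketched as follows. This is a minimal illustration, not the rule used in Maneage: the temporary directories and the 'libdemo' file names are hypothetical stand-ins for the system's 'dl' and 'pthread' libraries.

```shell
#!/bin/sh
# Hypothetical demo: 'sys' plays the role of the host's library directory,
# 'install' the role of the project's own installation directory.
set -e
demo=$(mktemp -d)
mkdir "$demo/sys" "$demo/install"

# A "real" library file and a system-style symbolic link pointing to it.
echo 'library' > "$demo/sys/libdemo.so.2.0"
ln -s libdemo.so.2.0 "$demo/sys/libdemo.so"

# Resolve the link to the absolute path of the final file it points to...
target=$(realpath "$demo/sys/libdemo.so")

# ...and link directly to that file instead of linking to the link.
ln -s "$target" "$demo/install/libdemo.so"
resolved=$(readlink "$demo/install/libdemo.so")
echo "$resolved"
```

The new link then has a single hop to the real file, so it does not depend on the intermediate link staying in place.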
ecbaadc Default LaTeX preamble: some packages moved to preamble-project.tex Until now, important LaTeX packages like 'caption' (for managing figure captions), 'hyperref' (for managing links) and 'xcolor' (for managing colors) were being loaded inside the optional 'tex/src/preamble-maneage-default-style.tex' file. We recommend removing this file from loading when you use custom journal styles. However, these packages will often be necessary after loading special journal styles also. With this commit, these packages are now loaded into LaTeX as part of the 'tex/src/preamble-project.tex' file. This file is in charge of LaTeX settings that are custom to the project and independent of its style. Several other small corrections are made with this commit: - I noticed that './project make texclean' crashes if no PDF exists in the working directory! So a '-f' was added to the 'rm' command of the 'texclean' rule. - As part of the LaTeX Hyperref, we can set general metadata or properties for the PDF (that aren't written into the printable PDF, but into the file metadata). They can be viewed in many PDF viewers as PDF properties. Until now, we were only using the '\projecttitle' macro here to write the paper's title. However, thanks to the recently added 'reproduce/analysis/config/metadata.conf', we now have a lot of useful information that can also go here. So the 'metadata-copyright-owner' is now used to define the PDF author, and the project's 'metadata-git-repository' and commit hash are written into the PDF subject. But to import these, it was necessary to define them as LaTeX macros, hence the addition of these macros in 'initialize.mk'. - Some extra packages that aren't necessary to build the default PDF were removed in 'preamble-project.tex'. 12 January 2021, 16:17:18 UTC
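The 'texclean' fix mentioned in this commit relies on the standard behaviour of 'rm -f' with a missing file. A minimal sketch, using a hypothetical file name in a temporary directory rather than the actual 'texclean' rule:

```shell
#!/bin/sh
# Hypothetical demo: 'paper.pdf' does not exist in this empty directory.
demo=$(mktemp -d)

# Without '-f', removing a missing file is an error (non-zero exit status),
# which is enough to make a Make recipe stop.
rm "$demo/paper.pdf" 2>/dev/null
plain_status=$?

# With '-f', a missing file is silently ignored and the status is zero.
rm -f "$demo/paper.pdf"
forced_status=$?

echo "rm: $plain_status, rm -f: $forced_status"
```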
cbe3030 make dist: removing temp files moved after project-specific files Until now, when you ran './project make dist', first it would delete the temporary files (like files ending in '~' or '.swp' created by some editors), then it had a place to add project-specific operations for the distribution. However, in the process of cleaning the temporary files, it would 'cd' into the directory that would later be packaged. So project-specific operations would first have to 'cd' back into the top source directory. This was prone to hard-to-find bugs. With this commit, to avoid the problem the project-specific operations are now placed before the cleaning phase. This is also technically good because in the project-specific operations there may also be temporary files that shouldn't go into the distribution tarball. 10 January 2021, 03:32:54 UTC
55d6570 Imported recent changes in Maneage, minor single conflict fixed There was a single conflict in the comments of one part of 'configure.sh' that has been fixed. There was also a single place that needed to convert 'BDIR' to 'badir' in this project (so after the merge, it also built easily). 09 January 2021, 23:44:32 UTC
d9a6855 IMPORTANT: analysis outputs written in BDIR/analysis Until now, the build directory contained a 'software/' directory (that hosted all the built software), a 'tex/' subdirectory for the final building of the paper, and many other directories containing intermediate/final data of the specific project. But this mixing of built software and data is against our modularity and minimal complexity principles: built software and built data are separate things and keeping them separate will enable many optimizations. With this commit, the build directory of the core Maneage branch will only contain two sub-directories: 'software/' and 'analysis/'. The 'software/' directory has the same contents as before and is not touched in this commit. However, the 'analysis/' directory is new and everything created in the './project make' phase of the project will be created inside of this directory. To facilitate easy access to these top-level built directories, two new variables are defined at the top of 'initialize.mk': 'badir', which is short for "built-analysis directory" and 'bsdir', which is short for "built-software directory". HOW TO IMPLEMENT THIS CHANGE IN YOUR PROJECT. It is easy: simply replace all occurrences of '$(BDIR)' in your project's subMakefiles (except the ones below) with '$(badir)'. To confirm if everything is fine before building your project from scratch after merging, you can run the following command to see where 'BDIR' is used and confirm the only remaining cases. $ grep -r BDIR reproduce/analysis/* --> make/verify.mk: innobdir=$$(echo $$infile | sed -e's|$(BDIR)/||g'); \ --> make/initialize.mk:badir=$(BDIR)/analysis --> make/initialize.mk:bsdir=$(BDIR)/software --> make/initialize.mk: $$sys_rm -rf $(BDIR) --> make/top-prepare.mk:all: $(BDIR)/software/preparation-done.mk 'BDIR' should only be present in lines of the files above. If you see '$(BDIR)' used anywhere else, simply change it to '$(badir)'. 
Of course, if your project assumes BDIR in other contexts, feel free to keep it, it will not conflict. If anything unexpected happens, please post a comment on the link below (you need to be registered on Savannah to post a comment): https://savannah.nongnu.org/task/?15855 One consequence of this change is that the 'analysis/' subdirectory can be optionally mounted on a separate partition. The need for this actually came up for some new users of Maneage in a Docker image. Docker can fix portability problems on systems that we haven't yet supported (even Windows!), or had a chance to fix low-level issues on. However, Docker doesn't have a GUI. So to see the built PDF or intermediate data, it was necessary to copy the built data to the host system after every change, which is annoying while working on a project. It would also need two copies of the source: one in the host, one in the container. All these frustrations can be fixed with this new feature. To describe this scenario, README.md now has a new section titled "Only software environment in the Docker image". It explains step-by-step how you can make a Docker image to only host the built software environment, while your project's source, software tarballs and 'BDIR/analysis' directories are on your host operating system. It has been tested before this commit and works very nicely. 09 January 2021, 03:00:15 UTC
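The replacement described in this commit can be sketched with 'sed'. This is only an illustration under assumptions: it runs on a temporary mock of 'reproduce/analysis/make/', the sub-Makefile 'demo.mk' and its contents are hypothetical, and the three skipped file names are the ones listed in the 'grep' output above. In a real project, review the result with 'git diff' before committing.

```shell
#!/bin/sh
# Hypothetical demo on a mock directory tree (not a real Maneage project).
set -e
demo=$(mktemp -d)
mkdir -p "$demo/reproduce/analysis/make"
cd "$demo"

# A project-specific sub-Makefile that still uses the old variable.
printf 'out = $(BDIR)/demo/result.txt\n' > reproduce/analysis/make/demo.mk
# One of the core files in which 'BDIR' must remain.
printf 'badir=$(BDIR)/analysis\n' > reproduce/analysis/make/initialize.mk

for f in reproduce/analysis/make/*.mk; do
    # Skip the files where 'BDIR' is expected to stay.
    case "$(basename "$f")" in
        verify.mk|initialize.mk|top-prepare.mk) continue ;;
    esac
    # Replace every literal '$(BDIR)' with '$(badir)'.
    sed -e 's|\$(BDIR)|$(badir)|g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

# Confirm the remaining uses, as suggested in the commit message.
remaining=$(grep -r BDIR reproduce/analysis/*)
echo "$remaining"
```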
e3f4be6 Removed all \new highlights after submission of review With the submission of the revision (which highlighted all the relevant parts to the points the referees raised in the submitted PDF) it is no longer necessary to highlight these parts. If we get another revision request, we can add new '\new' parts for highlighting. 07 January 2021, 17:36:18 UTC
e52cbf5 Minor copyedits in appendices, e.g. parentheses This commit makes some minor fixes following the hardwired non-numerical solution to the cross-referencing issue between the main article and the supplement, such as fixing "lineage like lineage" and missing closing parentheses. From Mohammad: while re-basing the commit over the 'master' branch, I also added Boud's name at the top of the copyright holders of the appendices. 07 January 2021, 16:57:59 UTC
b91af98 Configuration: GNU Binutils linking bug on some systems fixed Until now, when building GNU Binutils on GNU Linux operating systems, we would simply put a link to the host's core C library components (the '*crt*' files). However, the symbolic link wasn't "forced"! So if it already existed in the build directory, it would crash. With this commit a '-f' option has been added to the 'ln' command and this fixed the problem. This bug was reported by Zahra Sharbaf. 05 January 2021, 18:01:19 UTC
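The effect of the added '-f' can be seen in a short sketch. The names below ('crt1.o', 'link.o' in a temporary directory) are hypothetical and only stand in for the '*crt*' files and the build directory mentioned in the commit.

```shell
#!/bin/sh
# Hypothetical demo: create the same symbolic link twice.
demo=$(mktemp -d)
touch "$demo/crt1.o"

# The first link is created without problems.
ln -s "$demo/crt1.o" "$demo/link.o"

# Repeating the same command fails because the link already exists.
ln -s "$demo/crt1.o" "$demo/link.o" 2>/dev/null
second_status=$?

# With '-f', the existing link is replaced and the command succeeds.
ln -sf "$demo/crt1.o" "$demo/link.o"
forced_status=$?

echo "ln -s (repeat): $second_status, ln -sf: $forced_status"
```

This matters for build rules that may run again on an existing build directory, which is the situation described in the commit.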
eeff5de appendix.bbl is now included in make dist tarball Since the addition of the appendix bibliography we hadn't checked the 'make dist' command, as a result the PDF couldn't be built. With this commit, in the 'dist' rule, we are now also copying 'appendix.bbl' and the created tarball could build the PDF properly. Also the 'peer-review' directory is now also included in the tarball created by './project make dist'. I also found a small typo in the description of Occam (an 'a' was missing) and fixed it. 05 January 2021, 03:22:34 UTC
e4a5566 Polished main paper and appendices after a full re-read In preparation for the submission of the revised manuscript, I went through the full paper and appendices one last time. The second appendix (reviewing existing reproducible solutions) in particular needed some attention because some of the tools weren't properly compared with the criteria. In the paper, I was also able to remove about 30 words, and bring our own count (which is an over-estimation already) to below 6250. 05 January 2021, 01:19:07 UTC
962128c Edits in the answers to the referee report Given the new appendix/supplement structure, it was necessary to go through the answers and correct them. I also generally edited them and added a top-level letter to the editors (to directly copy-paste into the webpage). 04 January 2021, 05:32:16 UTC
402070b Imported recent updates in Maneage, no conflicts There weren't any conflicts in this merge; either technical conflicts that can be found by Git, or logical conflicts (that will cause a crash in the project). 04 January 2021, 03:47:07 UTC
a1a966a Building of Less program now uses patchelf to ensure good linking After correctly setting Less to depend on 'ncurses', I noticed it's still not linking to Maneage's 'ncurses', but pointing to my host system's 'ncurses' (that happens to have the same version! So it would crash on a system with a different version). This shows that like some other software, we need to manually correct the RPATH inside Less. With this commit, the necessary call to 'patchelf' has been added and with it, the installed 'less' command is properly linked to Maneage's internal build of 'ncurses'. 04 January 2021, 03:32:38 UTC
dc4aa8c README-hacking.md: edits and improvements to publication checklist After going through the publication checklist, some edits were made to make things more clear. Also, an item was added to remind the project author that the commit hashes on the uploaded data files should be the same. 04 January 2021, 03:21:03 UTC
02e53b9 README.md: summary Dockerfile with all necessary lines in one step Until now, the description in 'README.md' of how to build the Dockerfile had one item per line, thoroughly describing the reason behind that line. But in many cases, the user is already familiar with Docker (or has already read through the items) and just wants to have the Dockerfile ready fast. In these cases, all those extra explanations are annoying. With this commit, an item '0' has been added at the start of the item list as a summary. It only contains the necessary Dockerfile contents with no extra explanation. 04 January 2021, 02:58:05 UTC
31f4ea3 Building of less software depends on ncurses Until now, the 'less' software package (used to view large files easily on the command-line and used by Git for things like 'git diff' or 'git log') only depended on 'patchelf' (which is a very low-level software). However, as Boud reported in bug #59811 [1], building less would crash with an error saying "Cannot find terminal libraries" in some systems (including the proposed Docker image of 'README.md' which I confirmed afterwards). Looking into the 'configure' script of 'less', I noticed that 'less' is actually just checking for some functions provided by the ncurses library! With this commit, 'less' depends on 'ncurses'. I was able to confirm that with this change, 'less' successfully builds within the Docker image. [1] https://savannah.nongnu.org/bugs/?59811 04 January 2021, 01:52:25 UTC
624ccc1 Edits on points raised by Raul After his previous two commits, we discussed some of the points and I am making these edits following those. In particular the last statement about Madagascar "could have been more useful..." was changed to simply mention that mixing workflow with analysis is against the modularity principle. We should not judge its usefulness to the community (which is beyond our scope and would need an official survey). A few other minor edits were done here and there to clarify some of the points. 04 January 2021, 00:42:02 UTC
0ac1e3c Very minor corrections to the necessity appendix With this commit, I have corrected some minor typos of this appendix. They are very minor corrections. 04 January 2021, 00:10:37 UTC
20cd841 Minor corrections to the existing solutions appendix With this commit, I have corrected some minor typos of this appendix. In addition to that, I also put empty lines to separate subsections and subsubsections appropriately. 04 January 2021, 00:10:34 UTC
87b510b Spell check on main body and appendices I ran a simple Emacs spell check over the main body and the two appendices. All discovered typos have been fixed. 03 January 2021, 23:45:27 UTC
2ddfb42 Minor corrections to the existing tools appendix With this commit, I have corrected some minor typos of this appendix. In addition to that, I also put empty lines to separate subsections and subsubsections appropriately (5 lines and 1 line, respectively). 03 January 2021, 20:12:45 UTC
3c93f4c Minor corrections to the main body text With this commit, I had a look at the paper and corrected some minor typos. When possible, I tried to simplify some phrases to use fewer words. To do that, I added some hyphens when I considered it could be necessary/possible. 03 January 2021, 19:38:59 UTC
44e5ba4 Updated copyrights of project-specific copyrights Having entered 2021, it was necessary to update the years of all the copyright statements. 03 January 2021, 15:50:32 UTC
daeb41b Imported recent updates in Maneage, minor conflicts fixed There were only three very small conflicts that have been fixed. 03 January 2021, 15:24:08 UTC
281ad23 No links to main body in the appendices in --supplement mode Until now, in the appendices we were simply using '\ref' to refer to different parts of the published paper. However, when built in '--supplement' mode, the main body of the paper is a separate PDF and having links to a separate PDF is not impossible, but far too complicated. However, having the links adds to the richness of the text and helps point readers to specific parts of the paper. With this commit, there is a LaTeX conditional anywhere in the appendices that we want to refer the reader to sections/figures in the main body. When building a separate PDF, the respective section/figure is cited in a descriptive mode (like "Section discussing longevity of tools"). However, when the appendices go into the same PDF as the main body, the '\ref's remain. 03 January 2021, 14:46:25 UTC
fc076b2 Added Boud as copyright holder of supplement.tex Having added/modified text in the supplements, Boud is now a copyright holder of this file too. I also added 2021 to the copyright years of paper.tex and supplement.tex. 03 January 2021, 14:15:41 UTC
f6c0499 Minor copyediting This commit does some minor copyediting, especially of the introduction to the supplement. There's no point complaining to the reader about the word limit of the journal: s/he is not interested in that. This is not the right place for discussing journal policy. The need for summarising content and focussing on key elements of a cohesive argument is fundamental in a world of information overload. A&A/MNRAS/ApJ/PRD letters are generally much worse than normal articles in terms of reproducibility because they have to omit so many details that the reader has to read the full articles to really know what is done. But the reality is that letters get read a lot, because they're short and snappy. 03 January 2021, 10:23:32 UTC
68ac28e Cleaned abstract and Section II to fit word limit In the abstract the repeated benefits of Maneage (which are also mentioned in the criteria) were removed to fit into CiSE's online submission guidelines. In Section II (Longevity of existing tools), the paragraph that itemized the following paragraphs as a numbered list has been removed with the sentence that repeatedly states the importance of reproducibility in the sciences and some branches of the industry. With these changes our approximate automatic count has 6277 words. This is still very slightly larger than the 6250 word limit of the journal. However, this count is a definite over-estimation (including many things like page titles and page numberings from the raw PDF to text conversion). So the actual count for the journal publication should be less than this. A few other tiny corrections were made: - The year of the paper and copyright in 'README.md' was set to 2021. The copyright of the rest of the files will be set to 2021 after the next merge with Maneage soon (the years of core infrastructure copyrights have already been corrected there). - Mohammadreza's name was added in 'README.md'. - The line to import the "necessity" appendix has been commented in the version to have the full paper in one PDF (to be uploaded to arXiv or Zenodo). - The supplement PDF now starts with '\appendices' so the sections have the same labels as the single-PDF version. 03 January 2021, 02:38:45 UTC
e5627db Added abstract for supplement Until now the supplement had no introduction for a random reader to see the purpose of this "Web extra" supplement. With this commit, an abstract has been added. 03 January 2021, 01:34:52 UTC
e4f6154 Supplement (containing appendices) optionally built separately Until now, the build strategy of the paper was to have a single output PDF that either contains (1) the full paper with appendices in the same paper or (2) only the main body of the paper with no appendices. But the editor in chief of CiSE recently recommended publishing the appendices as a supplement that is a separate PDF (on its webpage). So with this commit, the project can make either (1) a single PDF (containing both the main body and the appendices) that will be published on arXiv and will be the default output (this is the same as before), or (2) two PDFs: one that is only the main body of the paper and another that is only the appendices. Since the appendices will be printed as a PDF in any case now, the old '--no-appendix' option has been replaced by '--supplement'. Also, the internal shell/TeX variable 'noappendix' has been renamed to 'separatesupplement'. 02 January 2021, 17:30:15 UTC
b1bd282 ./project make: new texclean target Until now there was only a 'clean' (to delete all files created during the 'make' phase) and the 'distclean' (to delete all files created during configuration and make). But sometimes we don't want to delete all the files created during the full 'make' phase, we only want to delete the files that were created by LaTeX for building the paper. With this commit, a new target has been added for this job. You can now run the following command for this job: ./project make texclean Only the files in '$(BDIR)/tex/build' will be deleted (and the 'tikz' directory under that location is recreated, ready for a future build). 02 January 2021, 15:59:49 UTC
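The effect described in b1bd282 can be sketched in plain shell. This is only an illustration under stated assumptions, not the actual Make rule: the 'texclean' function, the scratch directory standing in for BDIR, and the 'tex/macros/project.tex' file used as an example of analysis output are all hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the project's build directory (BDIR).
bdir=$(mktemp -d)
mkdir -p "$bdir/tex/build/tikz" "$bdir/tex/macros"
touch "$bdir/tex/build/paper.aux" "$bdir/tex/build/tikz/fig.pdf"
touch "$bdir/tex/macros/project.tex"   # assumed analysis output, must survive

# Hypothetical 'texclean'-like step: delete only the LaTeX build
# products, then recreate the empty 'tikz' directory for the next build.
texclean() {
    rm -rf "$1/tex/build"
    mkdir -p "$1/tex/build/tikz"
}
texclean "$bdir"

# Record what is left before removing the scratch directory.
aux_left=no;    if [ -e "$bdir/tex/build/paper.aux" ];   then aux_left=yes;    fi
tikz_ready=no;  if [ -d "$bdir/tex/build/tikz" ];        then tikz_ready=yes;  fi
macros_kept=no; if [ -f "$bdir/tex/macros/project.tex" ]; then macros_kept=yes; fi

echo "aux_left=$aux_left tikz_ready=$tikz_ready macros_kept=$macros_kept"
rm -rf "$bdir"
```

Keeping everything outside 'tex/build' untouched is what distinguishes this from 'clean': the analysis results stay in place and only the paper has to be rebuilt.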
ff43476 Copyright year updated in all source files Having entered 2021, it was necessary to update the copyright years at the top of the source files. We recommend that you do this for all your project-specific source files also. 02 January 2021, 15:52:31 UTC
e7bfc66 Minor edits in the acknowledgement and biographies Since we have a long list of Copyright statements at the top, I thought it's easier to just move the copyright notice to the top of 'paper.tex' also. In the acknowledgments, the paragraph on Maneage was slightly summarized to save a few words and still be clear. Also, the long name of the Japanese Ministry of Education, Culture, Sports, Science, and Technology, was summarized to Japanese MEXT. In the biographies, the '-at' (replacing '@' in the emails) was changed to '-AT' to be more clear to the eye that it's just a placeholder. 01 January 2021, 20:18:06 UTC
5ff8e27 Each appendix moved to a separate .tex file As recommended by Lorena Barba (editor in chief of CiSE), we should prepare the appendices as a separate "Supplement" for the journal. But we also want them to be appendices within the paper when built for arXiv. As a first step, with this commit, each appendix has been put in a separate 'tex/src/appendix-*.tex' file and '\input' into the paper. We will then be able to conditionally include them in the PDF or not. Also, as recommended by Lorena, the general "necessity for reproducible research" appendix isn't included (possibly going into the webpage later). 30 December 2020, 19:05:58 UTC
5345c14 Added Mohammadreza's copyright notice on paper.tex After adding Mohammadreza as an author of the paper, we forgot to add him as a copyright holder at the start of the paper. 29 December 2020, 10:59:47 UTC
10a4690 Copyedit on Appendix A This commit makes many small wording fixes, mainly to Appendix A. It also inserts "quotes" around some of the title fields in 'tex/src/references.tex', since otherwise capitalisation is lost (DNA becomes Dna; 'of Reinhart and Rogoff' becomes 'of reinhart and rogoff'; and so on). I didn't do this for all titles, because some Have All Words Capitalised, which blocks the .bib file from choosing a consistent style. 29 December 2020, 10:46:54 UTC
a695e71 Mohammadreza Khellat added as an author Mohammadreza has made significant contributions to the text of the paper and also the source. However his contributions to the text came after the initial submission, so until now, he was not added as an author. The reason we waited for this was that no responses were given by CiSE editors, on the inquiry of the possibility of adding a new author at this phase. With this commit, following approval from the editors, Mohammadreza's information has been added to the manuscript as an author to avoid delays in submitting the manuscript revision. While merging with the 'master' branch, Mohammad also made some minor edits to the other biographies to follow a similar format. 29 December 2020, 10:25:09 UTC
33f4c1e Minor edits, updated citation to published Menke+20 paper Some minor edits were made to the paper to shorten it. In particular the example of IPOL was removed from the main body of the paper, and we'll just rely on the more extensive review of IPOL in the appendix. I also updated the referee report to account for the new Appendix A that is just an extended introduction. Also, I noticed that the Menke+20 paper that we replicate here has recently been published in the iScience journal. So its bibliography was updated from the bioRxiv information to the journal information. Also, the number of words (after removing abstract and captions and accounting for figures) is now only printed when the project is built with '--no-appendix'. This was done because this information is extra/annoying/unnecessary for the case where there is an appendix. 28 December 2020, 16:16:17 UTC
326adfb The old/long introduction is now an appendix on necessity In the first/long draft of this work, we had a good introduction on the necessity of reproducibility. But we were forced to remove it because of word-count limits. Having moved a major portion of the previous work into the appendices, I thought it would be good to put that introduction as a first appendix also, focused on the necessity for reproducible research. 28 December 2020, 02:37:54 UTC
5a2a4c3 Edits to snapshot size argument, minor edits here and there Following Boud's point in the previous commit, I tried to clarify the point in the text that we are only talking about hand-written source files: in short, in this part of the paper, we are not talking about the version/snapshot for arXiv which needs figures and many extra automatically built files. We are just talking about the raw, hand-written files. Trying to convince people how good it is to keep the raw files separate from automatically generated files ;-). Also, while looking around in other parts of the main body of the paper, I tried to edit/clarify a few points and summarize/shorten others. 27 December 2020, 19:49:46 UTC
afc7c57 Fix typos; snapshot size This commit fixes 'automaticly', 'mega byte', 'terra byte'. It also changes 'will be far less than a mega byte' to 'should be less than a megabyte'. The reason for 'should' is that in some cases, providing a small data set in the package is useful, as in [1]. Of course, [1] would be only 0.9 Mb in size, including the data sets, instead of 1.3 Mb, if the author, whoever that may happen to be, had excluded the useless (produced) file 'paper-tmp.eps'. :P Case [2] is 0.4 Mb. These two tar archives are for ArXiv, so they also contain produced .eps files. So maybe in principle 'far less than' is right. However, on neither [3] nor [4], trying to follow the recommendations :), are any of the "useful" versions of single file archives smaller than the ArXiv version. The git bundles are bigger because of the git history, and the 'software' archives are 0.5 to 0.6 Gb because they include almost everything. However, stating something that is possible in principle but not done in practice would be misleading. So I would not include 'far less'. [1] https://zenodo.org/record/3951152/files/subpoisson-252cf1c-arXiv.tar.gz [2] https://zenodo.org/record/4062461/files/elaphrocentre-724a7c8-arXiv.tar.gz [3] https://zenodo.org/record/3951152 [4] https://zenodo.org/record/4062461 27 December 2020, 18:14:28 UTC