From: W. Trevor King
Date: Fri, 31 May 2013 18:45:05 +0000 (-0400)
Subject: posts:package_management: Add post following SWC discussion
X-Git-Url: http://git.tremily.us/gitweb.cgi?a=commitdiff_plain;ds=sidebyside;h=081345b6207d3b31abd39b41d10baaf7df366196;p=blog.git

posts:package_management: Add post following SWC discussion
---

diff --git a/posts/Package_management.mdwn b/posts/Package_management.mdwn
new file mode 100644
index 0000000..da8f427
--- /dev/null
+++ b/posts/Package_management.mdwn
@@ -0,0 +1,136 @@
+Lex Nederbragt posted [a question about version control and
+provenance][LN] on the [Software Carpentry][SWC] [discussion
+list][discuss]. I responded with my [Portage][]-based
+[workflow][WTK], but C. Titus Brown [pointed out a number of reasons
+why this approach isn't more widely used][CTB], which seem to boil
+down to “that sounds like more trouble than it's worth”. Because
+recording the state of a system is important for [reproducible
+research][RR], it is worth doing *something* to clean up the current
+seat-of-the-pants approach.
+
+Figuring out what software you have installed on your system is
+actually a (mostly) solved problem. There is a long history in the
+Linux ecosystem of [package management systems][PMS] that track
+installed packages and install new software (and any dependencies)
+automatically. Unfortunately, there is not a consensus package
+manager across distributions, with [Debian][]-based distributions
+using [apt][], [Fedora][]-based distributions using [yum][], …. If
+you are not the system administrator for your computer, you can either
+talk your sysadmin into installing the packages you need, or use one
+of a number of guest package managers ([Gentoo Prefix][prefix],
+[homebrew][], …). The guest package managers also work if you're
+committed to an OS that doesn't have an existing native package
+manager.
+
+Despite the existence of many high-quality package managers, I know
+many people who continue to install significant amounts of software by
+hand. While this is sustainable for a handful of packages, I see no
+reason to struggle through manual installations (subsequent upgrades,
+dependencies, …) when existing tools can automate the procedure. A
+stopgap solution is to use language-specific package managers ([pip][]
+for [[Python]], [gem][] for [Ruby][], …). This works fairly well, but
+once you reach a certain level of complexity (e.g. integrating
+[Fortran][] and [[C]] extensions with [[Python]] in [SciPy][]), things
+[get difficult][SciPy-no-pip]. While language-specific packaging
+standards ease automation, they are not a substitute for a
+language-agnostic package manager.
+
+Many distributions ship pre-compiled, binary packages, which
+give fast, stable installs without the need to have a full build
+system on your local machine. When the package you need is in the
+official repository (or a third-party repository), this approach works
+quite well. There's no need to go through the time or effort of
+compiling [Firefox][], [[LaTeX]], [LibreOffice][], or other software
+that I interact with as a general user. However, my own packages
+(or actively developed libraries that I use from my own software) are
+rarely available as pre-compiled binaries. If you find yourself in
+this situation, it is useful to use a package manager that makes it
+easy to write source-based packages ([Gentoo][]'s [Portage][],
+[Exherbo][]'s [Paludis][], [Arch][]'s [pacman][packman], …).
+
+With source-based packaging systems, packaging an existing Python
+package is usually a matter of [listing a bit of
+metadata][ipythonblocks]. With [layman][], integrating your [[local
+packages|Gentoo_overlay]] into your [Portage][] tree is extremely
+simple. Does your package depend on some other package in another
+oddball language? Some wonky build tool? No problem!
Just list the
+new dependency in your [ebuild][PMS-doc] (it probably [already
+exists][packages]). Source-based package managers also make it easy
+to stay up to date with ongoing development. [Portage][] supports
+live ebuilds that build fresh checkouts from a project's version
+control repository (use [[Git]]!). There is no need to dig out your
+old installation notes or reread the project's installation
+instructions.
+
+Getting back to the goals of reproducible research, I think that
+existing package managers are an excellent solution for tracking the
+software used to perform experiments or run simulations and analysis.
+The main stumbling block is the lack of market penetration ;).
+Building a lightweight package manager that can work easily at both
+the system-wide and per-user levels across a range of host OSes is
+hard work. With the current fractured packaging ecosystem, I doubt
+that rolling a new package manager from scratch would be an effective
+approach. Existing package managers have mostly satisfied their
+users, and the fundamental properties haven't changed much in over a
+decade. You are unlikely to write a system appealing enough to drag
+these satisfied users over to your new system.
+
+[Portage][] (and [Gentoo Prefix][prefix]) get you most of the way
+there, with the help of well-written [specifications][PMS-doc] and
+[documentation][devguide]. However, compatibility and testing in the
+prefix configuration still need some polishing, as does [robust binary
+packaging support][binary]. These issues are less interesting to most
+[Portage][] developers, as they usually run Portage as the native
+package manager and avoid binary packages. If the broader scientific
+community is interested in sustainable software, I think effort
+channeled into polishing these use-cases would be time well spent.
+
+If you are less interested in adopting a full-fledged package manager,
+you should at least make *some* effort to package your software.
I
+have used software that didn't even have a `README` with build
+instructions, and compiling it was awful. If you're publishing your
+software in the hopes that others will find it, use it, and cite you
+in their subsequent paper, it behooves you to make the installation as
+easy as possible. Until your community coalesces around a single
+package management framework, picking a standard build system
+([Autotools][], [Distutils][], …) will at least make it easier for
+folks to install your software by hand.
+
+[LN]: http://lists.software-carpentry.org/pipermail/discuss-software-carpentry.org/2013-May/000529.html
+[SWC]: http://software-carpentry.org/
+[discuss]: http://lists.software-carpentry.org/listinfo.cgi/discuss-software-carpentry.org
+[Portage]: http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=2&chap=1
+[WTK]: http://lists.software-carpentry.org/pipermail/discuss-software-carpentry.org/2013-May/000533.html
+[CTB]: http://lists.software-carpentry.org/pipermail/discuss-software-carpentry.org/2013-May/000534.html
+[RR]: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=4720211
+[PMS]: http://en.wikipedia.org/wiki/Package_management_system
+[Debian]: http://www.debian.org/
+[apt]: http://wiki.debian.org/Apt
+[Fedora]: http://fedoraproject.org/
+[yum]: http://yum.baseurl.org/
+[prefix]: http://www.gentoo.org/proj/en/gentoo-alt/prefix/
+[homebrew]: http://mxcl.github.io/homebrew/
+[pip]: http://www.pip-installer.org/en/latest/
+[gem]: http://rubygems.org/
+[Ruby]: http://www.ruby-lang.org/
+[Fortran]: http://en.wikipedia.org/wiki/Fortran
+[SciPy]: http://www.scipy.org/
+[SciPy-no-pip]: http://www.scipy.org/Installing_SciPy/BuildingGeneral#head-fe5a03a1d9ad7414becca62672c1316dc91b5f88
+[Firefox]: http://www.mozilla.org/en-US/firefox/new/
+[LibreOffice]: http://www.libreoffice.org/
+[Gentoo]: http://www.gentoo.org/
+[Exherbo]: http://www.exherbo.org/
+[Paludis]: http://paludis.exherbo.org/
+[Arch]: http://www.archlinux.org/
+[packman]:
https://www.archlinux.org/pacman/
+[ipythonblocks]: http://git.tremily.us/?p=wtk-overlay.git;a=blob;f=dev-python/ipythonblocks/ipythonblocks-9999.ebuild;hb=HEAD
+[layman]: http://layman.sourceforge.net/
+[PMS-doc]: http://www.gentoo.org/proj/en/qa/pms.xml
+[packages]: http://gpo.zugaina.org/
+[devguide]: http://devmanual.gentoo.org/quickstart/
+[binary]: http://article.gmane.org/gmane.linux.gentoo.devel/84964/
+[Autotools]: http://www.gnu.org/savannah-checkouts/gnu/automake/manual/html_node/Autotools-Introduction.html
+[Distutils]: http://docs.python.org/3/distutils/
+
+[[!tag tags/tools]]
+[[!tag tags/programming]]