package-cache.git
8 years ago  package_cache: Bump to version 0.2  (master, v0.2)
W. Trevor King [Sat, 22 Feb 2014 03:00:47 +0000 (19:00 -0800)]
package_cache: Bump to version 0.2

Changes since v0.1:
* Added a fallback MIME type to fix server errors on unknown
  extensions.
* Documented a transparent proxy iptables setup.

8 years ago  server: Add a fallback MIME type (application/octet-stream)
W. Trevor King [Fri, 21 Feb 2014 20:16:06 +0000 (12:16 -0800)]
server: Add a fallback MIME type (application/octet-stream)

Avoid:

  Traceback (most recent call last):
    File "/.../wsgiref/handlers.py", line 137, in run
      self.result = application(self.environ, self.start_response)
    File "/.../site-packages/package_cache/server.py", line 50, in __call__
      environ=environ, start_response=start_response)
    File "/.../site-packages/package_cache/server.py", line 69, in _serve_request
      path=cache_path, environ=environ, start_response=start_response)
    File "/.../site-packages/package_cache/server.py", line 124, in _serve_file
      start_response('200 OK', list(headers.items()))
    File "/.../wsgiref/handlers.py", line 226, in start_response
      self.headers = self.headers_class(headers)
    File "/.../wsgiref/headers.py", line 39, in __init__
      self._convert_string_type(v)
    File "/.../wsgiref/headers.py", line 46, in _convert_string_type
      " of type str (got {0})".format(repr(value)))
  AssertionError: Header names/values must be of type str (got None)

for portage-20140220.tar.xz.md5sum.
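
A minimal sketch of the fallback idea, assuming the Content-Type is guessed
with the standard mimetypes module (the log does not show how server.py does
it). The helper and constant names are hypothetical; only the fallback value
comes from the commit subject.

```python
import mimetypes

# Hypothetical names; the fallback value is the one named in the commit.
FALLBACK_CONTENT_TYPE = 'application/octet-stream'


def guess_content_type(path):
    """Return a Content-Type string for path, never None."""
    # guess_type() returns (None, None) for extensions it does not know.
    # A None header value is what triggers the wsgiref AssertionError.
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or FALLBACK_CONTENT_TYPE
```

With this in place, every header value handed to start_response is a str,
whatever the file extension.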

8 years ago  README: Document transparent proxy setup
W. Trevor King [Fri, 21 Feb 2014 19:08:59 +0000 (11:08 -0800)]
README: Document transparent proxy setup

8 years ago  package_cache: Bump to version 0.1  (v0.1)
W. Trevor King [Fri, 21 Feb 2014 06:07:55 +0000 (22:07 -0800)]
package_cache: Bump to version 0.1

8 years ago  README: Document Gentoo / OpenRC usage for distfiles caching
W. Trevor King [Fri, 21 Feb 2014 05:41:42 +0000 (21:41 -0800)]
README: Document Gentoo / OpenRC usage for distfiles caching

This is what I'm using this project for ;).

8 years ago  contrib/openrc/init.d/package-cache: Don't include 'distfiles' in source
W. Trevor King [Fri, 21 Feb 2014 05:36:38 +0000 (21:36 -0800)]
contrib/openrc/init.d/package-cache: Don't include 'distfiles' in source

Portage will request ${MIRROR}/distfiles/${FILENAME}, so we don't want
the 'distfiles' part in the source directory.  With the shift, we can
also use a single cache for both source (distfiles) and binary
packages (PKGDIR, which defaults to /usr/portage/packages) if there
are any binary packages on the upstream mirror.
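
To illustrate the layout, here is a rough sketch of how a request path could
be mapped onto the upstream source. The helper name and the example mirror
URL are assumptions made for illustration; neither is taken from the init
script.

```python
def source_url(source_base, path_info):
    """Join an upstream base URL with the request path (hypothetical helper)."""
    # Avoid a doubled slash, then append the request path unchanged.
    return source_base.rstrip('/') + '/' + path_info.lstrip('/')


# Portage supplies the 'distfiles/' part itself, so the configured source
# should stop one level above it.
url = source_url('http://mirror.example/gentoo',
                 '/distfiles/portage-20140220.tar.xz')
```

Because the request path carries the 'distfiles' component, a binary-package
request under a different top-level directory could share the same cache.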

8 years ago  main: Add the logger name and process ID to the syslog formatter
W. Trevor King [Fri, 21 Feb 2014 02:35:08 +0000 (18:35 -0800)]
main: Add the logger name and process ID to the syslog formatter

When logging to stderr, there's no need to differentiate the logging
process.  When everything's landing in the same system log, there is.
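
A sketch of such a formatter using the standard logging module. The exact
format string is an assumption; the log only says that the logger name and
process ID were added.

```python
import logging

# Assumed format string; the project's actual format is not shown here.
SYSLOG_FORMAT = '%(name)s[%(process)d]: %(levelname)s %(message)s'


def syslog_formatter():
    """Build a formatter that tags each message with logger name and PID."""
    return logging.Formatter(SYSLOG_FORMAT)
```

The formatter would be attached to whichever handler sends records to the
system log, for example a logging.handlers.SysLogHandler.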

8 years ago  contrib/openrc/init.d/package-cache: Add PC_OPTS
W. Trevor King [Fri, 21 Feb 2014 02:29:58 +0000 (18:29 -0800)]
contrib/openrc/init.d/package-cache: Add PC_OPTS

You can use this to tweak the logging:

  $ cat /etc/conf.d/package-cache
  PC_OPTS="-vvv"
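
For context, a sketch of how a repeated -v flag might translate into a log
level. The mapping (start at ERROR, one level per -v) is an assumption, not
necessarily what package-cache does.

```python
import argparse
import logging


def log_level(verbose):
    """Map a -v count to a logging level (assumed mapping)."""
    # Start at ERROR and drop one level (10) per -v, stopping at DEBUG.
    return max(logging.DEBUG, logging.ERROR - 10 * verbose)


parser = argparse.ArgumentParser(prog='package-cache')
parser.add_argument('-v', '--verbose', action='count', default=0)

# The arguments that PC_OPTS="-vvv" would add to the command line.
args = parser.parse_args(['-vvv'])
level = log_level(args.verbose)
```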

8 years ago  contrib/openrc/init.d/package-cache: Nest more deeply
W. Trevor King [Fri, 21 Feb 2014 01:04:47 +0000 (17:04 -0800)]
contrib/openrc/init.d/package-cache: Nest more deeply

Gentoo's doinitd can't handle renaming the files it installs, so
create a deeper tree where the init script can be called
'package-cache'.  This layout leaves room for a future
contrib/openrc/conf.d/package-cache if we want to supply one.

8 years ago  package-cache: Remove .py extension
W. Trevor King [Thu, 20 Feb 2014 23:44:46 +0000 (15:44 -0800)]
package-cache: Remove .py extension

Users shouldn't care what the implementation language is.

8 years ago  README.rst: Add symlink for GitHub rendering
W. Trevor King [Thu, 20 Feb 2014 23:20:50 +0000 (15:20 -0800)]
README.rst: Add symlink for GitHub rendering

8 years ago  README: Convert from Markdown to reStructuredText for PyPI
W. Trevor King [Thu, 20 Feb 2014 23:20:11 +0000 (15:20 -0800)]
README: Convert from Markdown to reStructuredText for PyPI

8 years ago  contrib/openrc-init: Add an OpenRC init script
W. Trevor King [Thu, 20 Feb 2014 23:06:13 +0000 (15:06 -0800)]
contrib/openrc-init: Add an OpenRC init script

References:
http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=2&chap=4#doc_chap4
https://wiki.gentoo.org/wiki/OpenRC

8 years ago  main: Teach package-cache the --syslog option
W. Trevor King [Thu, 20 Feb 2014 22:31:53 +0000 (14:31 -0800)]
main: Teach package-cache the --syslog option

8 years ago  main: Teach package-cache the --verbose option
W. Trevor King [Thu, 20 Feb 2014 22:20:42 +0000 (14:20 -0800)]
main: Teach package-cache the --verbose option

For adjusting the verbosity of the package-level logger.

Also add a simple LoggingRequestHandler class so WSGI-side logging is
routed through our loggers instead of being written directly to
stderr.
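
A sketch of what such a handler could look like on top of wsgiref. The class
name matches the commit message, but the body and the logger name are
assumptions.

```python
import logging
from wsgiref.simple_server import WSGIRequestHandler

LOG = logging.getLogger('package_cache.server')  # assumed logger name


class LoggingRequestHandler(WSGIRequestHandler):
    """Send per-request log lines to LOG instead of stderr."""

    def log_message(self, format, *args):
        # The base class writes this line to sys.stderr; log it instead.
        LOG.info('%s %s', self.address_string(), format % args)
```

A handler like this is selected by passing
handler_class=LoggingRequestHandler to wsgiref.simple_server.make_server.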

8 years ago  package_cache: Add a package-level logger
W. Trevor King [Thu, 20 Feb 2014 22:19:50 +0000 (14:19 -0800)]
package_cache: Add a package-level logger

This gives us a single location for configuring verbosity, handlers,
etc., for submodule loggers.
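
A short sketch of the mechanism, assuming the loggers are named after the
package and its modules (the usual logging convention; the actual names are
not shown in this log).

```python
import logging

# One package-level logger; submodule loggers are its children by name.
LOG = logging.getLogger('package_cache')
SERVER_LOG = logging.getLogger('package_cache.server')

# Configure verbosity and handlers once, on the package logger.
LOG.setLevel(logging.DEBUG)
LOG.addHandler(logging.StreamHandler())

# The child sets no level or handler of its own, so it inherits both.
SERVER_LOG.debug('served from cache')
```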

8 years ago  server: Log source-requests and errors
W. Trevor King [Thu, 20 Feb 2014 22:19:22 +0000 (14:19 -0800)]
server: Log source-requests and errors

8 years ago  README.md: Explain what this is all about
W. Trevor King [Thu, 20 Feb 2014 21:59:45 +0000 (13:59 -0800)]
README.md: Explain what this is all about

8 years ago  Run update-copyright.py
W. Trevor King [Thu, 20 Feb 2014 21:57:54 +0000 (13:57 -0800)]
Run update-copyright.py

8 years ago  package-cache.py: Add a '# Copyright' stub for update-copyright.py
W. Trevor King [Thu, 20 Feb 2014 21:56:54 +0000 (13:56 -0800)]
package-cache.py: Add a '# Copyright' stub for update-copyright.py

8 years ago  .update-copyright.conf: Add copyright configuration
W. Trevor King [Thu, 20 Feb 2014 21:55:28 +0000 (13:55 -0800)]
.update-copyright.conf: Add copyright configuration

Use my external update-copyright package to maintain copyright blurbs.

http://pypi.python.org/pypi/update-copyright/

8 years ago  .gitignore: Ignore Python-3 side effects
W. Trevor King [Thu, 20 Feb 2014 21:54:56 +0000 (13:54 -0800)]
.gitignore: Ignore Python-3 side effects

8 years ago  setup.py: Package package-cache with distutils
W. Trevor King [Thu, 20 Feb 2014 21:54:15 +0000 (13:54 -0800)]
setup.py: Package package-cache with distutils

The AUTHORS file doesn't exist yet, but we'll have it soon.

8 years ago  server: Use the Last-Modified header to set last-modified time (mtime)
W. Trevor King [Thu, 20 Feb 2014 21:42:19 +0000 (13:42 -0800)]
server: Use the Last-Modified header to set last-modified time (mtime)

This also sets the access time to the same value, but we're only
calling _get_file if we're about to serve the file to a client, which
will clobber any value of atime set here.
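
A sketch of the idea using only the standard library. The helper names are
hypothetical, and the real _get_file may parse the header differently.

```python
import email.utils
import os


def last_modified_timestamp(header_value):
    """Convert an HTTP Last-Modified value to seconds since the epoch."""
    parsed = email.utils.parsedate_tz(header_value)
    if parsed is None:
        raise ValueError('unparseable date: {!r}'.format(header_value))
    # mktime_tz() honours the zone in the header and returns a UTC timestamp.
    return email.utils.mktime_tz(parsed)


def apply_last_modified(path, header_value):
    """Set both atime and mtime of path from the header (hypothetical helper)."""
    timestamp = last_modified_timestamp(header_value)
    # os.utime() takes an (atime, mtime) pair, hence the shared value.
    os.utime(path, (timestamp, timestamp))
```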

8 years ago  server: Check for relative paths to invalid directories
W. Trevor King [Thu, 20 Feb 2014 21:10:50 +0000 (13:10 -0800)]
server: Check for relative paths to invalid directories

Avoid leaking information to requests like:

  http://localhost:4000/../../etc/passwd

PEP 333 isn't clear on what values are allowed for PATH_INFO, but it
does mention them as "CGI-style" [1].  RFC 3875, defining CGI 1.1,
says about PATH_INFO [2]:

  The server MAY impose restrictions and limitations on what values it
  permits for PATH_INFO, and MAY reject the request with an error if
  it encounters any values considered objectionable.

I can't actually exploit this with Python's reference WSGI
implementation.  When I tried to fetch /../../etc/passwd with Wget, I
got '/etc/passwd' as PATH_INFO, but this seems like an
important-enough risk that a little extra checking would not be wrong
;).

Also drop the urlparse call, because PATH_INFO is already the parsed
path portion of the URL.

[1]: http://legacy.python.org/dev/peps/pep-0333/#specification-details
[2]: http://tools.ietf.org/search/rfc3875#section-4.1.5
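
One way to express this kind of check, as a sketch. The function name and
the use of os.path.commonpath (Python 3.5+) are assumptions; the commit does
not show how server.py implements it.

```python
import os.path


def resolve_cache_path(cache_root, path_info):
    """Map PATH_INFO to a path under cache_root, or raise ValueError."""
    root = os.path.abspath(cache_root)
    # abspath() collapses '..' lexically; it does not follow symlinks.
    candidate = os.path.abspath(os.path.join(root, path_info.lstrip('/')))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError('path escapes the cache: {!r}'.format(path_info))
    return candidate
```

A request for /../../etc/passwd then raises ValueError, which the server
could turn into an error response instead of reading outside the cache.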

8 years ago  server: Create file paths as needed
W. Trevor King [Thu, 20 Feb 2014 20:48:22 +0000 (12:48 -0800)]
server: Create file paths as needed

Add support for non-flat source file layouts (e.g. relative paths that
contain directory parts).

Instead of creating the cache directory and possible per-file
subdirectories separately, just create per-file directories on the
fly.  This simplifies the code, but it also means that a server
without permission to create these directories won't fail until the
first request.
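
A sketch of creating per-file directories on demand. The helper name is
hypothetical, and exist_ok=True is one possible way to tolerate directories
that already exist.

```python
import os
import tempfile


def ensure_parent_directory(path):
    """Create the directory that will hold path, if it is missing."""
    directory = os.path.dirname(path)
    if directory:
        # Positional argument; exist_ok avoids an error on repeat requests.
        os.makedirs(directory, exist_ok=True)


# Demonstration in a throwaway directory standing in for the cache.
with tempfile.TemporaryDirectory() as cache_root:
    target = os.path.join(cache_root, 'distfiles', 'foo.tar.xz')
    ensure_parent_directory(target)
    ensure_parent_directory(target)  # second call is a no-op
    created = os.path.isdir(os.path.dirname(target))
```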

8 years ago  server: Implement Server._get_file
W. Trevor King [Thu, 20 Feb 2014 20:16:47 +0000 (12:16 -0800)]
server: Implement Server._get_file

It would be nice to use sendfile to copy between the HTTPResponse
object [1] and the cache file.  Linux supports arbitrary files (not
just sockets) for out_fd since 2.6.33, so the "to the cache file" side
works.  However, from sendfile(2) [2]:

  The in_fd argument must correspond to a file which supports
  mmap(2)-like operations (i.e., it cannot be a socket).

So reading from the HTTPResponse is not going to happen (yet).  Once
Linux gains support for socket in_fd, we could use something like:

    _os.sendfile(
        f.fileno(), response.fileno(), offset=None, count=content_length)

[1]: http://docs.python.org/3/library/http.client.html#httpresponse-objects
[2]: http://man7.org/linux/man-pages/man2/sendfile.2.html
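
Until then, a userspace copy is the portable route. The following is a
sketch with shutil.copyfileobj, not necessarily what _get_file does; the
function name and chunk size are assumptions, and io.BytesIO stands in for
the HTTPResponse and the cache file.

```python
import io
import shutil


def copy_response_to_cache(response, cache_file, chunk_size=64 * 1024):
    """Stream a file-like HTTP response into an open cache file."""
    # Plain read/write loop in userspace, so the source may be a socket.
    shutil.copyfileobj(response, cache_file, chunk_size)


fake_response = io.BytesIO(b'package payload')
fake_cache = io.BytesIO()
copy_response_to_cache(fake_response, fake_cache)
```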

8 years ago  server: Don't use a keyword for the response_headers argument to start_response
W. Trevor King [Thu, 20 Feb 2014 19:19:22 +0000 (11:19 -0800)]
server: Don't use a keyword for the response_headers argument to start_response

Despite being documented as response_headers [1], using a keyword
argument raises a TypeError:

  TypeError: start_response() got an unexpected keyword argument 'response_headers'

[1]: http://legacy.python.org/dev/peps/pep-0333/#the-start-response-callable
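
A minimal WSGI application showing the positional call. This is an
illustration only; it is not code from server.py.

```python
def application(environ, start_response):
    """Minimal WSGI application (illustrative only)."""
    body = b'hello\n'
    headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ]
    # Pass status and headers positionally; a server may name these
    # parameters differently from the PEP.
    start_response('200 OK', headers)
    return [body]
```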

8 years ago  server: Don't use a keyword for the path argument to getmtime
W. Trevor King [Thu, 20 Feb 2014 19:16:21 +0000 (11:16 -0800)]
server: Don't use a keyword for the path argument to getmtime

Despite being documented as path [1], using a keyword argument
raises a TypeError:

  TypeError: getmtime() got an unexpected keyword argument 'path'

[1]: http://docs.python.org/3/library/os.path.html#os.path.getmtime

8 years ago  server: Don't use a keyword for the path argument to getsize
W. Trevor King [Thu, 20 Feb 2014 19:14:54 +0000 (11:14 -0800)]
server: Don't use a keyword for the path argument to getsize

Despite being documented as path [1], using a keyword argument raises
a TypeError:

  TypeError: getsize() got an unexpected keyword argument 'path'

[1]: http://docs.python.org/3/library/os.path.html#os.path.getsize

8 years ago  server: Don't use a keyword for the urlstring argument to urlparse
W. Trevor King [Thu, 20 Feb 2014 19:11:24 +0000 (11:11 -0800)]
server: Don't use a keyword for the urlstring argument to urlparse

Despite being documented as urlstring [1], using a keyword argument
raises a TypeError:

  TypeError: urlparse() got an unexpected keyword argument 'urlstring'

[1]: http://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse

8 years ago  server: Don't use a keyword for the path argument to makedirs
W. Trevor King [Thu, 20 Feb 2014 19:05:34 +0000 (11:05 -0800)]
server: Don't use a keyword for the path argument to makedirs

Despite being documented as path [1], using a keyword argument raises
a TypeError:

  TypeError: makedirs() got an unexpected keyword argument 'path'

[1]: http://docs.python.org/3/library/os.html#os.makedirs
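
The positional form used after these four fixes can be sketched as follows.
Passing the first argument positionally sidesteps any mismatch between the
documented name and the name in the function's actual signature. The file
names are made up for the example.

```python
import os
import tempfile
from urllib.parse import urlparse

with tempfile.TemporaryDirectory() as root:
    cache_dir = os.path.join(root, 'cache', 'distfiles')
    os.makedirs(cache_dir)              # positional, not path=...
    path = os.path.join(cache_dir, 'example.txt')
    with open(path, 'wb') as f:
        f.write(b'12345')
    size = os.path.getsize(path)        # positional, not path=...
    mtime = os.path.getmtime(path)      # positional, not path=...

# Positional here too, rather than urlstring=...
parsed = urlparse('http://localhost:4000/distfiles/example.txt')
```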

8 years ago  server: Create the cache directory if it doesn't already exist
W. Trevor King [Thu, 20 Feb 2014 19:02:29 +0000 (11:02 -0800)]
server: Create the cache directory if it doesn't already exist

8 years ago  main: Add an argparse-based command line interface
W. Trevor King [Thu, 20 Feb 2014 19:00:16 +0000 (11:00 -0800)]
main: Add an argparse-based command line interface

And a package-cache.py wrapper script to call it.

8 years ago  server: Stub out a WSGI server
W. Trevor King [Thu, 20 Feb 2014 18:50:41 +0000 (10:50 -0800)]
server: Stub out a WSGI server

This still needs source-fetching and Content-Range support, but it
should handle serving from the cache well enough.

8 years ago  package_cache: Create a Python package with a version
W. Trevor King [Thu, 20 Feb 2014 18:50:17 +0000 (10:50 -0800)]
package_cache: Create a Python package with a version

8 years ago  COPYING: Use the GPLv3
W. Trevor King [Thu, 20 Feb 2014 17:21:07 +0000 (09:21 -0800)]
COPYING: Use the GPLv3

Fresh download from http://www.gnu.org/licenses/gpl-3.0.txt.