= Security analysis of irker =

This is an analysis of security and DoS vulnerabilities associated
with irker, exploring and explaining certain design choices.  Much of
it derives from a code audit and report by Daniel Franke.

== Assumptions and Goals ==

We begin by stating some assumptions about how irker will be deployed,
and articulating a set of security goals.

Communication flow in an irker deployment will look like this:

-----------------------------------------------------------------------------
             Committers
                 |
                 |
        Version-control repositories
                 |
                 |
            irkerhook.py
                 |
                 |
               irkerd
                 |
                 |
             IRC servers
-----------------------------------------------------------------------------

Here are our assumptions:

1. The repositories are hosted on public forge sites such as
SourceForge, GitHub, Gitorious, Savannah, or Gna and must be
accessible to untrusted users.

2. Repository project owners can set properties on their repositories
(including but not limited to irker.*), and may be able to set custom
post-commit hooks which can execute arbitrary code on the repository
server. In particular, these people may be able to modify the local
copy of irkerhook.py.

3. The machine which hosts irkerd has the same owner as the machine which
hosts the repo; these machines are possibly but not necessarily
one and the same.

4. The network is protected by a perimeter firewall, and only a
trusted group is able to emit arbitrary packets from inside the
perimeter; committers are not necessarily part of this group.

5. irkerd communicates with IRC servers over the open internet,
and an IRC server's administrator is assumed to hold no position of
trust with any other party.

We can, accordingly, identify the following groups of security
principals:

A. irker administrators.
B. Project committers.
C. Project owners.
D. IRC server administrators.
E. Other people on irker's internal network.
F. irkerd-IRC men-in-the-middle (i.e. people who control the network path
   between irkerd and the IRC server).
G. Random people on the internet.

Our security goals for irker can be enumerated as follows:

* Control: We don't want anyone outside group A gaining control of
  the machines which host irkerd or the git repos.

* Availability: Only group A should be able to deny or degrade
  irkerd's ability to receive commit messages and relay them to the
  IRC server. We recognize and accept as inevitable that MITMs (groups
  E and F) can do this too (by ARP spoofing, cable-cutting, etc.).
  But, in particular, we would like irker-mediated services to be
  resilient against DoS (denial of service) attacks.

* Authentication/integrity: Notifications should be truthful, i.e.,
  commit messages sent to IRC channels should actually reflect that a
  corresponding commit has taken place. We accept that groups A, C,
  D, and E can violate this property.

* Secrecy: irker shouldn't aid spammers (group G) in harvesting
  committers' email addresses.

* Auditability: If people abuse irkerd, we want to be able to identify
  the abusive account or IP address.

== Control Issues ==

We have audited the irker and irkerhook.py code for exploitable
vulnerabilities.  We have not found any in the code itself, and the
use of Python gives us confidence in the absence of large classes of errors
(such as buffer overruns) that afflict C programs.

However, the fact that irkerhook.py relies on external binaries to
mine data out of its repository opens up a well-known set of
vulnerabilities if a malicious user is able to insert binaries in a
carelessly-set execution path.  Normal precautions against this should
be taken.
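
As one illustration of such precautions, a hook could resolve each
helper binary against a fixed list of trusted directories and run it by
absolute path with a minimal environment.  The sketch below is not taken
from irkerhook.py; `TRUSTED_PATH`, `resolve_tool`, and `run_tool` are
hypothetical names, and the directory list is an assumption about the
repository host.

```python
import os
import shutil
import subprocess

# Assumption: these directories are writable only by root on the repo host.
TRUSTED_PATH = "/usr/local/bin:/usr/bin:/bin"


def resolve_tool(name, search_path=TRUSTED_PATH):
    """Return the absolute path of `name`, searching only the trusted path."""
    found = shutil.which(name, path=search_path)
    if found is None:
        raise FileNotFoundError("%s not found on trusted path" % name)
    return os.path.abspath(found)


def run_tool(name, *args):
    """Run a helper by absolute path with a small, fixed environment."""
    exe = resolve_tool(name)
    # The caller's PATH is deliberately not inherited.
    env = {"PATH": TRUSTED_PATH, "LANG": "C"}
    return subprocess.check_output([exe] + list(args), env=env)
```

A hook written this way ignores whatever PATH the invoking user or
post-commit machinery supplies, so a binary dropped into a
world-writable directory is never picked up by name.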

== Availability ==

=== Solved problems ===

When the original implementation of irkerd saw a nick collision it
generated new nicks in a predictable sequence. A malicious IRC user
could have continuously changed his own nick to the next one that
irkerd was going to try. Some randomness has been added to nick
generation to prevent this.
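
The idea can be sketched as follows.  This is a minimal illustration,
not irkerd's actual nick-generation code; `next_nick` is a hypothetical
helper, and the 9-character default reflects the nickname length limit
in RFC 1459, which individual servers may relax.

```python
import random


def next_nick(base, max_len=9, rng=None):
    """Return `base` plus an unpredictable numeric suffix.

    The base is truncated so that the result fits in `max_len` characters.
    """
    rng = rng or random.SystemRandom()
    suffix = str(rng.randint(1, 9999))
    return base[: max_len - len(suffix)] + suffix
```

Because the suffix is drawn at random rather than incremented, an
attacker cannot know in advance which nick to squat on.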

=== Unsolved problems ===

DoS attacks on any networked application can never be completely
prevented, only mitigated by forcing attackers to invest more
resources.  Here we consider the easiest attack paths against irker,
and possible countermeasures.

irker handles each connection to a particular IRC server in a separate
thread - actually, due to server limits on open channels per
connection, there may be multiple sessions per server. This may not
scale well, especially on 32-bit architectures.

Thread instance overhead, combined with the lack of any restriction on
how many URLs can appear in the 'to' list, is a DoS vulnerability. If
a repository's properties specify that notifications should go to more
than about 500 unique hostnames, then on 32-bit architectures we'll
hit the 4GB cap on virtual memory (even while the resident set size
remains small).
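
The figure of about 500 can be checked with back-of-envelope
arithmetic.  The sketch below assumes each thread reserves an 8 MiB
stack, a common Linux default that the text above does not state; the
constant and function names are illustrative.

```python
# Assumption: every thread reserves the common Linux default stack of 8 MiB.
STACK_BYTES = 8 * 1024 * 1024

# A 32-bit process can address at most 4 GiB of virtual memory.
ADDRESS_SPACE_32BIT = 4 * 1024 ** 3


def max_threads(address_space=ADDRESS_SPACE_32BIT, stack=STACK_BYTES):
    """Upper bound on threads if stacks were the only consumer of address space."""
    return address_space // stack


print(max_threads())  # 512
```

Under that assumption the ceiling is 512 threads, and the practical
limit is lower because the rest of the process also needs address
space.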

Another ceiling to watch out for is the ulimit on file descriptors,
which defaults to 1024 on many Linux systems but can safely be set
much larger. Each connection instance costs a file descriptor.
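
A process can inspect its own descriptor limit with Python's standard
`resource` module (Unix only).  The sketch below is illustrative;
`connection_budget` and its `reserved` allowance are hypothetical and
not part of irkerd.

```python
import resource


def connection_budget(reserved=32):
    """Estimate how many IRC connections fit under the soft fd limit.

    `reserved` is a guessed allowance for listening sockets, log files,
    and the standard streams.  Returns None when there is no limit.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return max(0, soft - reserved)
```

With the common soft limit of 1024 this gives a budget of 992
connections.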

We consider some possible ways of addressing the problem:

1. Limit the number of URLs in a request.  Pretty painless - it will
be very rare that anyone wants to specify a larger set than a project
channel plus freenode #commits - but also ineffective.  A malicious
hook could achieve DoS simply by spamming lots of requests.

2. Limit the total number of requests that can be queued. Completely
ineffective - just sets a target for the DoS attack.

3. Limit the number of requests that can be queued by source IP address.
This might be worth doing; it would stymie a single-source DoS attack through
a publicly-exposed irkerd, though not a DDoS by a botnet.  But there isn't
a lot of win here for a properly installed irker (e.g. behind a firewall),
which is typically going to get all its requests from a single repo host
anyway.

4. Rate-limit requests by source IP address - that is, after any request
discard additional ones during some timeout period.  Again, good for
stopping a single-source DoS against an exposed irker, won't stop a
DDoS.  The real problem, though, is that any such rate limit might interfere
with legitimate high-volume use by a very active repo site.

After this we appear to have run out of easy options, as source IP address
is the only thing irkerd can see that an attacker can't spoof.
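
Option 4 can be sketched as follows.  This is a minimal illustration of
the discard-during-timeout idea, not code from irkerd; the class name,
the one-second default, and the injectable clock are all assumptions
made for the example.

```python
import time


class SourceRateLimiter:
    """After accepting a request from an address, discard further
    requests from that address until `timeout` seconds have passed."""

    def __init__(self, timeout=1.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_accepted = {}  # source IP -> time of last accepted request

    def allow(self, source_ip):
        now = self.clock()
        last = self.last_accepted.get(source_ip)
        if last is not None and now - last < self.timeout:
            return False  # still inside the timeout window: discard
        self.last_accepted[source_ip] = now
        return True
```

Note that the table of addresses is itself unbounded here, so a real
implementation would need to prune it; and, as observed above, any
fixed timeout risks dropping legitimate bursts from a busy repo host.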

We mitigate some availability risks by reaping old sessions when we're
near resource limits.  An ordinary DoS attack would then be prevented
from completely blocking all message traffic; the cost would be a
whole lot of join/leave spam due to connection churn.
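
The reaping strategy can be pictured as a least-recently-used pool.
The sketch below is a simplified model, not irkerd's implementation;
`SessionPool`, `open_session`, and `close_session` are hypothetical
names, and a single fixed `limit` stands in for whatever resource
thresholds the daemon actually tracks.

```python
from collections import OrderedDict


class SessionPool:
    """Keep at most `limit` sessions, closing the least recently used."""

    def __init__(self, limit, open_session, close_session):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.open_session = open_session
        self.close_session = close_session
        self.sessions = OrderedDict()  # server -> session, oldest first

    def get(self, server):
        if server in self.sessions:
            self.sessions.move_to_end(server)  # mark as recently used
            return self.sessions[server]
        while len(self.sessions) >= self.limit:
            _server, stale = self.sessions.popitem(last=False)
            self.close_session(stale)  # this is the join/leave churn
        session = self.open_session(server)
        self.sessions[server] = session
        return session
```

A flood of requests for new servers evicts idle sessions instead of
exhausting threads or descriptors, at the price of the connection
churn described above.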

We also use greenlets (Python coroutines imitating system threads)
when they are available.  This reduces memory overhead due to
threading substantially, making a thread-flooding DoS more difficult.

== Authentication/Integrity ==

One way to help prevent DoS attacks would be in-band authentication -
requiring irkerd submitters to present a credential along with each
message submission.  In principle this, if it existed, could also be used
to verify that a submitter is authorized to issue notifications with
respect to a given project.

We rejected this approach. The design goal for irker was to make
submissions fast, cheap, and stateless; baking an authentication
system directly into the irkerd codebase would have conflicted with
these objectives, not to mention probably becoming the camel's nose
for a godawful amount of code bloat.

The deployment advice in the installation instructions assumes that
irkerd submitters are "authenticated" by being inside a firewall - that is,
messages are issued from an intranet and it can be trusted that anyone
issuing messages from within a given intranet is authorized to do so.
This fits the assumption that irker instances will run on forge sites
receiving requests from instances of irkerhook.py.

If this is *not* the case (e.g. the network between a hook and irkerd
has to be considered hostile) we could hide irkerd behind an instance
of spiped <http://www.tarsnap.com/spiped.html> or an instance of
stunnel <http://www.stunnel.org>. These would be far superior to
in-band authentication in that they would leave the job to specialist
code not in any way coupled to irkerd's internals, minimizing
global complexity and failure modes.

One larger issue (not unique to irker) is that because of the
insecure nature of IRC it is essentially impossible to secure
#commits against commit notifications that are either garbled by
software errors and misconfigurations or maliciously crafted to
confuse anyone attempting to gather statistics from that channel.  The
lesson here is that IRC monitoring isn't a good method for that
purpose; going direct to the repositories via a toolkit such as Ohloh
is a far better idea.

=== Future directions ===

There is presently no direct support for spiped or stunnel in
irkerhook.py.  We'd take patches for this.

== Secrecy ==

irkerd has no inherent secrecy risks.

The distributed version of irkerhook.py removes the host part of
author addresses specifically in order to prevent address harvesting
from the notifications.
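
The transformation amounts to keeping only the local part of the
address.  The helper below is an illustrative sketch rather than the
code in irkerhook.py; `strip_host` is a hypothetical name, and it
assumes the author field is a bare address with no display name.

```python
def strip_host(author):
    """Drop everything from the first '@' onward.

    An address without '@' is returned unchanged.
    """
    return author.split("@", 1)[0]
```

A notification would then show `jrandom` rather than
`jrandom@example.com`, which gives a harvester nothing deliverable.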

== Auditability ==

We previously noted that source IP address is the only thing irker can
see that an attacker can't spoof.  This makes auditability difficult
unless we impose conventions on the notifications passing through it.

The irkerhook.py that we ship inherits an auditability property from
the CIA service it was designed to replace: the first field of every
notification (terminated by a colon) is the name of the issuing
project.  The only other competitor to replace CIA known to us
(kgb_bot) shares this property.
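
An auditor watching a channel could rely on this convention roughly as
follows.  This is a sketch under the assumption that the notification
text is plain (any IRC color codes already stripped); the function name
and the sample message are made up for illustration.

```python
def issuing_project(notification):
    """Return the text before the first colon, or None if there is no colon."""
    head, sep, _rest = notification.partition(":")
    return head.strip() if sep else None
```

For a line such as `irker: jrandom master * fix typo` the function
returns `irker`; a message lacking the project field returns None and
can be flagged as not following the convention.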

In the general case we cannot guarantee this property against
groups A and F.

== Why there is no support for passworded channels ==

We've had support for password authentication to IRC requested, but it
would be a rather bad fit for irkerd's usage pattern. The problem
isn't that credentials would be difficult to pass to irkerd - an
optional password field could easily be added to the JSON.

No, the problem is that once irkerd were to acquire such a credential,
it would have to do source-address IP checking to know (at a minimum)
whether the source host of any given notification request is the same
as one that has presented the password.

It seems best not to go there; the potential for IRC access controls
becoming leaky seems too high.

== Risks relative to centralized services ==

irker and irkerhook.py were written as a replacement for the
now-defunct CIA notification service.  The author has written
a critique of that service: "CIA and the perils of overengineering"
at <http://esr.ibiblio.org/?p=4540>.  It is thus worth considering how
a risk assessment of CIA compares to this one.

The principal advantages of CIA from a security point of view were (a)
it provided a single point at which spam filtering and source blocking
could be done with benefit to all projects using the service, and (b)
since it had to have a database anyway for routing messages to project
channels, the incremental overhead for an authentication feature would
have been relatively low.

As a matter of fact rather than theory, CIA never fully exploited
either possibility.  Anyone could create a CIA project entry with
fanout to any desired set of IRC channels.  Notifications were not
authenticated, so anyone could masquerade as a member of any project.
The only check on abuse was human intervention to source-block
spammers, and this was by no means completely effective - spam shipped
via CIA was occasionally seen on the freenode #commits channel.

The principal security disadvantage of CIA was that it meant the
entire notification system was subject to single-point failure due
to software or hosting failures on cia.vc, or to DoS attacks
against the server.  While there is no evidence that the site
was ever deliberately DoSed, failures were sufficiently common
that a half-hearted DoS attack might not even have been noticed.

Despite the absence of authentication, irker instances on
properly firewalled intranets do not obviously pose additional
spamming risks beyond those incurred by the CIA service.  The
overall robustness of the notification system as a whole should
be greatly improved.

== Conclusions ==

The security and DoS issues irker has are not readily addressable by
changing the irker codebase itself, short of a complete (much more
complex and heavyweight) redesign.  They are largely implicit risks of
its operating environment and must be managed by properly controlling
access to irker instances.
