The RFC2047 unquoting, used to parse email addresses in From and Cc
headers, is broken in several ways:
* It erroneously substitutes ' ' for '_' in *the whole* header, even
outside the quoted field. [Noticed by Christoph.]
* It is too liberal in its matching, and happily matches the start
of one quoted chunk against the end of another, or even just
something that looks like such an end. [Noticed by Junio.]
* It fundamentally cannot cope with encodings that are not a
superset of ASCII, nor several (incompatible) encodings in the
same header.
This patch fixes the first two by doing a more careful decoding of
the outer quoting (e.g. "=AB" to represent an octet whose value is
0xAB). Fixing the fundamental issues is left for a future, more
intrusive, patch.
Noticed-by: Christoph Miebach <christoph.miebach@web.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
sub unquote_rfc2047 {
local ($_) = @_;
my $encoding;
- if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) {
+ s{=\?([^?]+)\?q\?(.*?)\?=}{
$encoding = $1;
- s/_/ /g;
- s/=([0-9A-F]{2})/chr(hex($1))/eg;
- }
+ my $e = $2;
+ $e =~ s/_/ /g;
+ $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
+ $e;
+ }eg;
return wantarray ? ($_, $encoding) : $_;
}
grep "^Subject: =?UTF-8?q?utf8-s=C3=BCbj=C3=ABct?=" msgtxt1
'
+test_expect_success $PREREQ 'utf8 author is correctly passed on' '
+ clean_fake_sendmail &&
+ test_commit weird_author &&
+ test_when_finished "git reset --hard HEAD^" &&
+ git commit --amend --author "Füñný Nâmé <odd_?=mail@example.com>" &&
+ git format-patch --stdout -1 >funny_name.patch &&
+ git send-email --from="Example <nobody@example.com>" \
+ --to=nobody@example.com \
+ --smtp-server="$(pwd)/fake.sendmail" \
+ funny_name.patch &&
+ grep "^From: Füñný Nâmé <odd_?=mail@example.com>" msgtxt1
+'
+
test_expect_success $PREREQ 'detects ambiguous reference/file conflict' '
echo master > master &&
git add master &&