git.tremily.us Git - cython.git/commit

author	Stefan Behnel <scoder@users.berlios.de>
	Fri, 15 Aug 2008 02:41:09 +0000 (04:41 +0200)
committer	Stefan Behnel <scoder@users.berlios.de>
	Fri, 15 Aug 2008 02:41:09 +0000 (04:41 +0200)
commit	2e8a0084bb1aa86b27877bfb117cbdad023bbdd9
tree	d6ef76b572a01e46e27c27ff190747c2c19830ad
parent	7bc8549aa00f6969240d49fa775f7062bfd12ca5

Rewrite of the string literal handling code

String literals pass through the compiler as follows:
- unicode string literals are stored as unicode strings and encoded to UTF-8 on the way out
- byte string literals are stored as correctly encoded byte strings by unescaping the source string literal into the corresponding byte sequence. No further encoding is done later on!
- char literals are stored as byte strings of length 1. This can be verified by the parser now, e.g. a non-ASCII char literal in UTF-8 source code will result in an error, as it would end up as two or more bytes in the C code, which can no longer be represented as a C char.
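
The length check on char literals can be illustrated with a minimal Python sketch. The helper name `char_literal_to_byte` and its `source_encoding` parameter are hypothetical and are not taken from this commit. The sketch only shows why a non-ASCII character in UTF-8 source cannot become a single C char.

```python
def char_literal_to_byte(char_text, source_encoding="utf-8"):
    # Hypothetical helper, not the code from this commit: encode the
    # character the way it appears in the source file and insist on
    # exactly one byte, since a C char cannot hold more.
    encoded = char_text.encode(source_encoding)
    if len(encoded) != 1:
        raise ValueError(
            "char literal encodes to %d bytes, expected 1" % len(encoded))
    return encoded


# An ASCII character fits into one byte.
ascii_char = char_literal_to_byte("a")

# U+00E9 takes two bytes in UTF-8, so it is rejected.
try:
    char_literal_to_byte("\u00e9")
    non_ascii_rejected = False
except ValueError:
    non_ascii_rejected = True
```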

Storing byte strings is necessary because we would otherwise lose the ability to encode byte string literals on the way out. They do not necessarily contain only bytes that fit into the source code encoding, as the source can use escape sequences to represent them. Previously, ASCII encoded source code could not contain byte string literals with properly escaped non-ASCII bytes.
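
A rough Python sketch of the unescaping idea follows, assuming a hypothetical helper `unescape_byte_literal` that handles only two-digit hex escapes. The actual code in Cython/Compiler/StringEncoding.py is not reproduced here and may work differently. The sketch shows how ASCII-only source text can still yield non-ASCII bytes once the escapes are resolved.

```python
import re

# Matches a backslash-x escape with exactly two hex digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def unescape_byte_literal(body, source_encoding="ascii"):
    # Hypothetical sketch: plain text between escapes is encoded with the
    # source encoding, each hex escape becomes the byte value it names.
    # Other escapes (octal, \n, escaped backslashes) are not handled.
    out = bytearray()
    pos = 0
    for match in _HEX_ESCAPE.finditer(body):
        out += body[pos:match.start()].encode(source_encoding)
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += body[pos:].encode(source_encoding)
    return bytes(out)


# The literal body as it appears in ASCII source: 12 ASCII characters.
stored = unescape_byte_literal(r"abc\xff\x00z")
```

The result is stored as bytes and needs no further encoding step, which is the point the paragraph above makes.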

Another bug that was fixed: in Python, escape sequences behave differently in unicode strings (where they represent the character code) and byte strings (where they represent a byte value). Previously, they resulted in the same byte value in Cython code. This is only a problem for non-ASCII escapes, since the character code and the byte value of ASCII characters are identical.
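
The difference can be seen in plain Python, shown here in Python 3 syntax as an illustration rather than as a copy of tests/run/charencoding.pyx:

```python
# The same escape in a unicode string and in a byte string.
text = u"\xe9"   # one character, code point U+00E9
data = b"\xe9"   # one byte, value 0xE9

# Encoding the unicode literal to UTF-8 on the way out gives two bytes,
# whereas the byte literal has to stay a single byte.
text_utf8 = text.encode("utf-8")

# For ASCII escapes both forms end up as the same byte.
ascii_text_utf8 = u"\x41".encode("utf-8")
ascii_data = b"\x41"
```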

Cython/Compiler/Buffer.py
Cython/Compiler/ExprNodes.py
Cython/Compiler/Main.py
Cython/Compiler/ModuleNode.py
Cython/Compiler/Nodes.py
Cython/Compiler/ParseTreeTransforms.py
Cython/Compiler/Parsing.py
Cython/Compiler/PyrexTypes.py
Cython/Compiler/Scanning.py
Cython/Compiler/StringEncoding.py	[new file with mode: 0644]
Cython/Compiler/Symtab.py
Cython/Compiler/TypeSlots.py
Cython/Compiler/Visitor.py
Cython/Utils.py
tests/run/charencoding.pyx	[new file with mode: 0644]