From: Minh Van Nguyen Date: Fri, 2 Oct 2009 15:30:33 +0000 (+1000) Subject: move reference guide and SciPy tutorial to separate dirs under sr/ X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=1e14663a463136b4e6f9c10ced38f8afe56fada1;p=cython.git move reference guide and SciPy tutorial to separate dirs under sr/ --- diff --git a/index.rst b/index.rst index c5186638..a67db73d 100644 --- a/index.rst +++ b/index.rst @@ -1,27 +1,11 @@ -Welcome to Cython's Reference Guide -=================================== +Welcome to Cython's Documentation +================================= Contents: - -.. toctree:: - :maxdepth: 2 - - src/overview - src/language_basics - src/compilation - src/extension_types - src/interfacing_with_other_code - src/special_mention - src/limitations - -Indices and tables -================== - -* :ref:`genindex` -* :ref:`modindex` -* :ref:`search` - .. toctree:: + :maxdepth: 1 + src/reference/index + src/tutorial/index diff --git a/src/reference/Makefile b/src/reference/Makefile new file mode 100644 index 00000000..70a45cba --- /dev/null +++ b/src/reference/Makefile @@ -0,0 +1,68 @@ +# Makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +PAPER = + +# Internal variables. +PAPEROPT_a4 = -D latex_paper_size=a4 +PAPEROPT_letter = -D latex_paper_size=letter +ALLSPHINXOPTS = -d build/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . + +.PHONY: help clean html web htmlhelp latex changes linkcheck + +help: + @echo "Please use \`make ' where is one of" + @echo " html to make standalone HTML files" + @echo " web to make files usable by Sphinx.web" + @echo " htmlhelp to make HTML files and a HTML help project" + @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" + @echo " changes to make an overview over all changed/added/deprecated items" + @echo " linkcheck to check all external links for integrity" + +clean: + -rm -rf build/* + +html: + mkdir -p build/html build/doctrees + $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) build/html + @echo + @echo "Build finished. The HTML pages are in build/html." + +web: + mkdir -p build/web build/doctrees + $(SPHINXBUILD) -b web $(ALLSPHINXOPTS) build/web + @echo + @echo "Build finished; now you can run" + @echo " python -m sphinx.web build/web" + @echo "to start the server." + +htmlhelp: + mkdir -p build/htmlhelp build/doctrees + $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) build/htmlhelp + @echo + @echo "Build finished; now you can run HTML Help Workshop with the" \ + ".hhp project file in build/htmlhelp." + +latex: + mkdir -p build/latex build/doctrees + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) build/latex + @echo + @echo "Build finished; the LaTeX files are in build/latex." + @echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \ + "run these through (pdf)latex." + +changes: + mkdir -p build/changes build/doctrees + $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) build/changes + @echo + @echo "The overview file is in build/changes." + +linkcheck: + mkdir -p build/linkcheck build/doctrees + $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) build/linkcheck + @echo + @echo "Link check complete; look for any errors in the above output " \ + "or in build/linkcheck/output.txt." diff --git a/src/compilation.rst b/src/reference/compilation.rst similarity index 100% rename from src/compilation.rst rename to src/reference/compilation.rst diff --git a/src/extension_types.rst b/src/reference/extension_types.rst similarity index 100% rename from src/extension_types.rst rename to src/reference/extension_types.rst diff --git a/src/reference/index.rst b/src/reference/index.rst new file mode 100644 index 00000000..5943f509 --- /dev/null +++ b/src/reference/index.rst @@ -0,0 +1,22 @@ +Reference Guide +=============== + +Contents: + +.. toctree:: + :maxdepth: 2 + + overview + language_basics + compilation + extension_types + interfacing_with_other_code + special_mention + limitations + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` diff --git a/src/interfacing_with_other_code.rst b/src/reference/interfacing_with_other_code.rst similarity index 100% rename from src/interfacing_with_other_code.rst rename to src/reference/interfacing_with_other_code.rst diff --git a/src/language_basics.rst b/src/reference/language_basics.rst similarity index 100% rename from src/language_basics.rst rename to src/reference/language_basics.rst diff --git a/src/limitations.rst b/src/reference/limitations.rst similarity index 100% rename from src/limitations.rst rename to src/reference/limitations.rst diff --git a/src/overview.rst b/src/reference/overview.rst similarity index 100% rename from src/overview.rst rename to src/reference/overview.rst diff --git a/src/special_mention.rst b/src/reference/special_mention.rst similarity index 100% rename from src/special_mention.rst rename to src/reference/special_mention.rst diff --git a/src/tutorial/appendix.rst b/src/tutorial/appendix.rst new file mode 100644 index 00000000..f8650170 --- /dev/null +++ b/src/tutorial/appendix.rst @@ -0,0 +1,29 @@ +Appendix: Installing MinGW on Windows +===================================== + + 1. Download the MinGW installer from http://mingw.org. (As of this + writing, the download link is a bit difficult to find; it's under + "About" in the menu on the left-hand side). You want the file + entitled "Automated MinGW Installer" (currently version 5.1.4). + 2. Run it and install MinGW. Only the basic package is strictly + needed for Cython, although you might want to grab at least the + C++ compiler as well. + 3. You need to set up Windows' "PATH" environment variable so that + includes e.g. "c:\\mingw\\bin" (if you installed MinGW to + "c:\\mingw"). The following web-page describes the procedure + in Windows XP (the Vista procedure is similar): + http://support.microsoft.com/kb/310519 + 4. Finally, tell Python to use MinGW as the default compiler + (otherwise it will try for Visual C). If Python is installed to + "c:\\Python26", create a file named + "c:\\Python26\\Lib\\distutils\\distutils.cfg" containing:: + + [build] + compiler = mingw32 + +The [WinInst]_ wiki page contains updated information about this +procedure. Any contributions towards making the Windows install +process smoother is welcomed; it is an unfortunate fact that none of +the regular Cython developers have convenient access to Windows. + +.. [WinInst] http://wiki.cython.org/InstallingOnWindows diff --git a/src/tutorial/build.rst b/src/tutorial/build.rst new file mode 100644 index 00000000..783d2987 --- /dev/null +++ b/src/tutorial/build.rst @@ -0,0 +1,58 @@ +Building Cython code +==================== + +Cython code must, unlike Python, be compiled. This happens in two stages: + + - A ``.pyx`` file is compiled by Cython to a ``.c`` file, containing + the code of a Python extension module + - The ``.c`` file is compiled by a C compiler to + a ``.so`` file (or ``.pyd`` on Windows) which can be + ``import``-ed directly into a Python session. + +There are several ways to build Cython code: + + - Write a distutils ``setup.py``. + - Use ``pyximport``, importing Cython ``.pyx`` files as if they + were ``.py`` files (using distutils to compile and build the background). + - Run the ``cython`` command-line utility manually to produce the ``.c`` file + from the ``.pyx`` file, then manually compiling the ``.c`` file into a shared + object library or ``.dll`` suitable for import from Python. + (This is mostly for debugging and experimentation.) + - Use the [Sage]_ notebook which allows Cython code inline. + +Currently, distutils is the most common way Cython files are built and distributed. + +Building a Cython module using distutils +---------------------------------------- + +Imagine a simple "hello world" script in a file ``hello.pyx``:: + + def say_hello_to(name): + print(Hello %s!" % name) + +The following could be a corresponding ``setup.py`` script:: + + from distutils.core import setup + from distutils.extension import Extension + from Cython.Distutils import build_ext + + ext_modules = [Extension("hello", ["hello.pyx"])] + + setup( + name = 'Hello world app', + cmdclass = {'build_ext': build_ext}, + ext_modules = ext_modules + ) + +To build, run ``python setup.py build_ext --inplace``. Then simply +start a Python session and do ``from hello import say_hello_to`` and +use the imported function as you see fit. + +.. figure:: sage.png + + The Sage notebook allows transparently editing and compiling Cython + code simply by typing ``%cython`` at the top of a cell and evaluate + it. Variables and functions defined in a Cython cell imported into + the running session. + +.. [Sage] W. Stein et al., Sage Mathematics Software, http://sagemath.org diff --git a/src/tutorial/caveats.rst b/src/tutorial/caveats.rst new file mode 100644 index 00000000..5895c80d --- /dev/null +++ b/src/tutorial/caveats.rst @@ -0,0 +1,18 @@ +Caveats +======= + +Since Cython mixes C and Python semantics, some things may be a bit +surprising or unintuitive. Work always goes on to make Cython more natural +for Python users, so this list may change in the future. + + - ``10**-2 == 0``, instead of ``0.01`` like in Python. + - Given two typed ``int`` variables ``a`` and ``b``, ``a % b`` has the + same sign as the first argument (following C semantics) rather then + having the same sign as the second (as in Python). This will change + in Cython 0.12. + - Care is needed with unsigned types. ``cdef unsigned n = 10; + print(range(-n, n))`` will print an empty list, since ``-n`` wraps + around to a large positive integer prior to being passed to the + ``range`` function. + - Python's ``float`` type actually wraps C ``double`` values, and + Python's ``int`` type wraps C ``long`` values. diff --git a/src/tutorial/cdef_classes.rst b/src/tutorial/cdef_classes.rst new file mode 100644 index 00000000..5a4aff1c --- /dev/null +++ b/src/tutorial/cdef_classes.rst @@ -0,0 +1,141 @@ +Extension types (aka. cdef classes) +=================================== + +To support object-oriented programming, Cython supports writing normal +Python classes exactly as in Python:: + + class MathFunction(object): + def __init__(self, name, operator): + self.name = name + self.operator = operator + + def __call__(self, *operands): + return self.operator(*operands) + +Based on what Python calls a "built-in type", however, Cython supports +a second kind of class: *extension types*, sometimes referred to as +"cdef classes" due to the keywords used for their declaration. They +are somewhat restricted compared to Python classes, but are generally +more memory efficient and faster than generic Python classes. The +main difference is that they use a C struct to store their fields and methods +instead of a Python dict. This allows them to store arbitrary C types +in their fields without requiring a Python wrapper for them, and to +access fields and methods directly at the C level without passing +through a Python dictionary lookup. + +Normal Python classes can inherit from cdef classes, but not the other +way around. Cython requires to know the complete inheritance +hierarchy in order to lay out their C structs, and restricts it to +single inheritance. Normal Python classes, on the other hand, can +inherit from any number of Python classes and extension types, both in +Cython code and pure Python code. + +So far our integration example has not been very useful as it only +integrates a single hard-coded function. In order to remedy this, +without sacrificing speed, we will use a cdef class to represent a +function on floating point numbers:: + + cdef class Function: + cpdef double evaluate(self, double x) except *: + return 0 + +Like before, cpdef makes two versions of the method available; one +fast for use from Cython and one slower for use from Python. Then:: + + cdef class SinOfSquareFunction(Function): + cpdef double evaluate(self, double x) except *: + return sin(x**2) + +Using this, we can now change our integration example:: + + def integrate(Function f, double a, double b, int N): + cdef int i + cdef double s, dx + if f is None: + raise ValueError("f cannot be None") + s = 0 + dx = (b-a)/N + for i in range(N): + s += f.evaluate(a+i*dx) + return s * dx + + print(integrate(SinOfSquareFunction(), 0, 1, 10000)) + +This is almost as fast as the previous code, however it is much more flexible +as the function to integrate can be changed. It is even possible to pass +in a new function defined in Python-space:: + + >>> import integrate + >>> class MyPolynomial(integrate.Function): + ... def evaluate(self, x): + ... return 2*x*x + 3*x - 10 + ... + >>> integrate(MyPolynomial(), 0, 1, 10000) + -7.8335833300000077 + +This is about 20 times slower, but still about 10 times faster than +the original Python-only integration code. This shows how large the +speed-ups can easily be when whole loops are moved from Python code +into a Cython module. + +Some notes on our new implementation of ``evaluate``: + + - The fast method dispatch here only works because ``evaluate`` was + declared in ``Function``. Had ``evaluate`` been introduced in + ``SinOfSquareFunction``, the code would still work, but Cython + would have used the slower Python method dispatch mechanism + instead. + + - In the same way, had the argument ``f`` not been typed, but only + been passed as a Python object, the slower Python dispatch would + be used. + + - Since the argument is typed, we need to check whether it is + ``None``. In Python, this would have resulted in an ``AttributeError`` + when the ``evaluate`` method was looked up, but Cython would instead + try to access the (incompatible) internal structure of ``None`` as if + it were a ``Function``, leading to a crash or data corruption. + +There is a *compiler directive* ``nonecheck`` which turns on checks +for this, at the cost of decreased speed. Here's how compiler directives +are used to dynamically switch on or off ``nonecheck``:: + + #cython: nonecheck=True + # ^^^ Turns on nonecheck globally + + import cython + + # Turn off nonecheck locally for the function + @cython.nonecheck(False) + def func(): + cdef MyClass obj = None + try: + # Turn nonecheck on again for a block + with cython.nonecheck(True): + print obj.myfunc() # Raises exception + except AttributeError: + pass + print obj.myfunc() # Hope for a crash! + + + +Attributes in cdef classes behave differently from attributes in regular classes: + + - All attributes must be pre-declared at compile-time + - Attributes are by default only accessible from Cython (typed access) + - Properties can be declared to expose dynamic attributes to Python-space + +:: + + cdef class WaveFunction(Function): + # Not available in Python-space: + cdef double offset + # Available in Python-space: + cdef public double freq + # Available in Python-space: + property period: + def __get__(self): + return 1.0 / self. freq + def __set__(self, value): + self. freq = 1.0 / value + <...> diff --git a/src/tutorial/cdef_functions.rst b/src/tutorial/cdef_functions.rst new file mode 100644 index 00000000..41b0bea3 --- /dev/null +++ b/src/tutorial/cdef_functions.rst @@ -0,0 +1,41 @@ +cdef functions +============== + +Python function calls can be expensive--in Cython doubly so because +one might need to convert to and from Python objects to do the call. +In our example above, the argument is assumed to be a C double both inside f() +and in the call to it, yet a Python ``float`` object must be constructed around the +argument in order to pass it. + +Therefore Cython provides a syntax for declaring a C-style function, +the cdef keyword:: + + cdef double f(double) except *: + return sin(x**2) + +Some form of except-modifier should usually be added, otherwise Cython +will not be able to propagate exceptions raised in the function (or a +function it calls). Above ``except *`` is used which is always +safe. An except clause can be left out if the function returns a Python +object or if it is guaranteed that an exception will not be raised +within the function call. + +A side-effect of cdef is that the function is no longer available from +Python-space, as Python wouldn't know how to call it. Using the +``cpdef`` keyword instead of cdef, a Python wrapper is also created, +so that the function is available both from Cython (fast, passing +typed values directly) and from Python (wrapping values in Python +objects). + +Note also that it is no longer possible to change ``f`` at runtime. + +Speedup: 45 times over pure Python. + +.. figure:: htmlreport.png + + Using the ``-a`` switch to the ``cython`` command line program (or + following a link from the Sage notebook) results in an HTML report + of Cython code interleaved with the generated C code. Lines are + colored according to the level of "typedness" -- white lines + translates to pure C without any Python API calls. This report + is invaluable when optimizing a function for speed. diff --git a/src/tutorial/clibraries.rst b/src/tutorial/clibraries.rst new file mode 100644 index 00000000..4dae583f --- /dev/null +++ b/src/tutorial/clibraries.rst @@ -0,0 +1,370 @@ +Using C libraries +================= + +Apart from writing fast code, one of the main use cases of Cython is +to call external C libraries from Python code. As seen for the C +string decoding functions above, it is actually trivial to call C +functions directly in the code. The following describes what needs to +be done to use an external C library in Cython code. + +Imagine you need an efficient way to store integer values in a FIFO +queue. Since memory really matters, and the values are actually +coming from C code, you cannot afford to create and store Python +``int`` objects in a list or deque. So you look out for a queue +implementation in C. + +After some web search, you find the C-algorithms library [CAlg]_ and +decide to use its double ended queue implementation. To make the +handling easier, however, you decide to wrap it in a Python extension +type that can encapsulate all memory management. + +The C API of the queue implementation, which is defined in the header +file ``libcalg/queue.h``, essentially looks like this:: + + typedef struct _Queue Queue; + typedef void *QueueValue; + + Queue *queue_new(void); + void queue_free(Queue *queue); + + int queue_push_head(Queue *queue, QueueValue data); + QueueValue queue_pop_head(Queue *queue); + QueueValue queue_peek_head(Queue *queue); + + int queue_push_tail(Queue *queue, QueueValue data); + QueueValue queue_pop_tail(Queue *queue); + QueueValue queue_peek_tail(Queue *queue); + + int queue_is_empty(Queue *queue); + +To get started, the first step is to redefine the C API in a ``.pxd`` +file, say, ``cqueue.pxd``:: + + cdef extern from "libcalg/queue.h": + ctypedef struct Queue: + pass + ctypedef void* QueueValue + + Queue* new_queue() + void queue_free(Queue* queue) + + int queue_push_head(Queue* queue, QueueValue data) + QueueValue queue_pop_head(Queue* queue) + QueueValue queue_peek_head(Queue* queue) + + int queue_push_tail(Queue* queue, QueueValue data) + QueueValue queue_pop_tail(Queue* queue) + QueueValue queue_peek_tail(Queue* queue) + + bint queue_is_empty(Queue* queue) + +Note how these declarations are almost identical to the header file +declarations, so you can often just copy them over. One exception is +the last line. The return value of the ``queue_is_empty`` method is +actually a C boolean value, i.e. it is either zero or non-zero, +indicating if the queue is empty or not. This is best expressed by +Cython's ``bint`` type, which is a normal ``int`` type when used in C +but maps to Python's boolean values ``True`` and ``False`` when +converted to a Python object. Another difference is the first +line. ``Queue`` is in this case used as an *opaque handle*; only the +library that is called know what is actually inside. Since no Cython code +needs to know the contents of the struct, we do not need to declare its contents, +so we simply provide an empty definition (as we do not want to declare +the ``_Queue`` type which is referenced in the C header) [#]_. + +.. [#] There's a subtle difference between ``cdef struct Queue: pass`` + and ``ctypedef struct Queue: pass``. The former declares a type + which is referenced in C code as ``struct Queue``, while the + latter is referenced in C as ``Queue``. This is a C language + quirk that Cython is not able to hide. Most modern C libraries + use the ``ctypedef`` kind of struct. + +Next, we need to design the Queue class that should wrap the C queue. +Here is a first start for the Queue class:: + + cimport cqueue + cimport python_exc + + cdef class Queue: + cdef cqueue.Queue _c_queue + def __cinit__(self): + self._c_queue = cqueue.new_queue() + +Note that it says ``__cinit__`` rather than ``__init__``. While +``__init__`` is available as well, it is not guaranteed to be run (for +instance, one could create a subclass and forget to call the ancestor +constructor). Because not initializing C pointers often leads to +crashing the Python interpreter without leaving as much as a stack +trace, Cython provides ``__cinit__`` which is *always* called on +construction. However, as ``__cinit__`` is called during object +construction, ``self`` is not fully +constructed yet, and one must avoid doing anything with ``self`` but +assigning to ``cdef`` fields. + +Note also that the above method takes no parameters, although subtypes +may want to accept some. Although it is guaranteed to get called, the +no-arguments ``__cinit__()`` method is a special case here as it does +not prevent subclasses from adding parameters as they see fit. If +parameters are added they must match those of any declared ``__init__`` +method. + +Before we continue implementing the other methods, it is important to +understand that the above implementation is not safe. In case +anything goes wrong in the call to ``new_queue()``, this code will +simply swallow the error, so we will likely run into a crash later on. +According to the documentation of the ``new_queue()`` function, the +only reason why the above can fail is due to insufficient memory. In +that case, it will return ``NULL``, whereas it would normally return a +pointer to the new queue. + +The normal way to get out of this is to raise an exception, but +allocating a new exception instance may actually fail when we are +running out of memory. Luckily, CPython provides a function +``PyErr_NoMemory()`` that raises the right exception for us. We can +thus change the init function as follows:: + + def __cinit__(self): + self._c_queue = cqueue.new_queue() + if self._c_queue is NULL: + python_exc.PyErr_NoMemory() + +The next thing to do is to clean up when the Queue is no longer used. +To this end, CPython provides a callback that Cython makes available +as a special method ``__dealloc__()``. In our case, all we have to do +is to free the Queue, but only if we succeeded in initialising it in +the init method:: + + def __dealloc__(self): + if self._c_queue is not NULL: + cqueue.queue_free(self._c_queue) + +At this point, we have a compilable Cython module that we can test. +To compile it, we need to configure a ``setup.py`` script for +distutils. Based on the example presented earlier on, we can extend +the script to include the necessary setup for building against the +external C library. Assuming it's installed in the normal places +(e.g. under ``/usr/lib`` and ``/usr/include`` on a Unix-like system), +we could simply change the extension setup from + +:: + + ext_modules = [Extension("hello", ["hello.pyx"])] + +to + +:: + + ext_modules = [ + Extension("hello", ["hello.pyx"], + libraries=["calg"]) + ] + +If it is not installed in a 'normal' location, users can provide the +required parameters externally by passing appropriate C compiler +flags, such as:: + + CFLAGS="-I/usr/local/otherdir/calg/include" \ + LDFLAGS="-L/usr/local/otherdir/calg/lib" \ + python setup.py build_ext -i + +Once we have compiled the module for the first time, we can try to +import it:: + + PYTHONPATH=. python -c 'import queue.Queue as Q; Q()' + +However, our class doesn't do much yet so far, so +let's make it more usable. + +Before implementing the public interface of this class, it is good +practice to look at what interfaces Python offers, e.g. in its +``list`` or ``collections.deque`` classes. Since we only need a FIFO +queue, it's enough to provide the methods ``append()``, ``peek()`` and +``pop()``, and additionally an ``extend()`` method to add multiple +values at once. Also, since we already know that all values will be +coming from C, it's better to provide only ``cdef`` methods for now, +and to give them a straight C interface. + +In C, it is common for data structures to store data as a ``void*`` to +whatever data item type. Since we only want to store ``int`` values, +which usually fit into the size of a pointer type, we can avoid +additional memory allocations through a trick: we cast our ``int`` values +to ``void*`` and vice versa, and store the value directly as the +pointer value. + +Here is a simple implementation for the ``append()`` method:: + + cdef append(self, int value): + cqueue.queue_push_tail(self._c_queue, value) + +Again, the same error handling considerations as for the +``__cinit__()`` method apply, so that we end up with this +implementation:: + + cdef append(self, int value): + if not cqueue.queue_push_tail(self._c_queue, + value): + python_exc.PyErr_NoMemory() + +Adding an ``extend()`` method should now be straight forward:: + + cdef extend(self, int* values, Py_ssize_t count): + """Append all ints to the queue. + """ + cdef Py_ssize_t i + for i in range(count): + if not cqueue.queue_push_tail( + self._c_queue, values[i]): + python_exc.PyErr_NoMemory() + +This becomes handy when reading values from a NumPy array, for +example. + +So far, we can only add data to the queue. The next step is to write +the two methods to get the first element: ``peek()`` and ``pop()``, +which provide read-only and destructive read access respectively:: + + cdef int peek(self): + return cqueue.queue_peek_head(self._c_queue) + + cdef int pop(self): + return cqueue.queue_pop_head(self._c_queue) + +Simple enough. Now, what happens when the queue is empty? According +to the documentation, the functions return a ``NULL`` pointer, which is +typically not a valid value. Since we are simply casting to and from +ints, we cannot distinguish anymore if the +return value was ``NULL`` because the queue was empty or because the +value stored in the queue was ``0``. However, in Cython code, we +would expect the first case to raise an exception, whereas the second +case should simply return ``0``. To deal with this, we need to +special case this value, and check if the queue really is empty or +not:: + + cdef int peek(self) except? 0: + cdef int value = \ + cqueue.queue_peek_head(self._c_queue) + if value == 0: + # this may mean that the queue is empty, or + # that it happens to contain a 0 value + if cqueue.queue_is_empty(self._c_queue): + raise IndexError("Queue is empty") + return value + +The ``except? 0`` declaration is worth explaining. If the function +was a Python function returning a Python object value, CPython would +simply return ``NULL`` instead of a Python object to indicate a raised +exception, which would immediately be propagated by the surrounding +code. The problem is that any ``int`` value is a valid queue item +value, so there is no way to explicitly indicate an error to the +calling code. + +The only way CPython (and Cython) can deal with this situation is to +call ``PyErr_Occurred()`` when returning from a function to check if +an exception was raised, and if so, propagate the exception. This +obviously has a performance penalty. Cython therefore allows you to +indicate which value is explicitly returned in the case of an +exception, so that the surrounding code only needs to check for an +exception when receiving this special value. All other values will be +accepted almost without a penalty. + +Now that the ``peek()`` method is implemented, the ``pop()`` method is +almost identical. It only calls a different C function:: + + cdef int pop(self) except? 0: + cdef int value = \ + cqueue.queue_pop_head(self._c_queue) + if value == 0: + # this may mean that the queue is empty, or + # that it happens to contain a 0 value + if cqueue.queue_is_empty(self._c_queue): + raise IndexError("Queue is empty") + return value + +Lastly, we can provide the Queue with an emptiness indicator in the +normal Python way:: + + def __nonzero__(self): + return not cqueue.queue_is_empty(self._c_queue) + +Note that this method returns either ``True`` or ``False`` as the +return value of the ``queue_is_empty`` function is declared as a +``bint``. + +Now that the implementation is complete, you may want to write some +tests for it to make sure it works correctly. Especially doctests are +very nice for this purpose, as they provide some documentation at the +same time. To enable doctests, however, you need a Python API that +you can call. C methods are not visible from Python code, and thus +not callable from doctests. + +A quick way to provide a Python API for the class is to change the +methods from ``cdef`` to ``cpdef``. This will let Cython generate two +entry points, one that is callable from normal Python code using the +Python call semantics and Python objects as arguments, and one that is +callable from C code with fast C semantics and without requiring +intermediate argument conversion from or to Python types. + +The following listing shows the complete implementation that uses +``cpdef`` methods where possible. This feature is obviously not +available for the ``extend()`` method, as the method signature is +incompatible with Python argument types. + +:: + + cimport cqueue + cimport python_exc + + cdef class Queue: + cdef cqueue.Queue* _c_queue + def __cinit__(self): + self._c_queue = cqueue.queue_new() + if self._c_queue is NULL: + python_exc.PyErr_NoMemory() + + def __dealloc__(self): + if self._c_queue is not NULL: + cqueue.queue_free(self._c_queue) + + cpdef append(self, int value): + if not cqueue.queue_push_tail(self._c_queue, + value): + python_exc.PyErr_NoMemory() + + cdef extend(self, int* values, Py_ssize_t count): + cdef Py_ssize_t i + for i in range(count): + if not cqueue.queue_push_tail( + self._c_queue, values[i]): + python_exc.PyErr_NoMemory() + + cpdef int peek(self) except? 0: + cdef int value = \ + cqueue.queue_peek_head(self._c_queue) + if value == 0: + # this may mean that the queue is empty, + # or that it happens to contain a 0 value + if cqueue.queue_is_empty(self._c_queue): + raise IndexError("Queue is empty") + return value + + cpdef int pop(self) except? 0: + cdef int value = \ + cqueue.queue_pop_head(self._c_queue) + if value == 0: + # this may mean that the queue is empty, + # or that it happens to contain a 0 value + if cqueue.queue_is_empty(self._c_queue): + raise IndexError("Queue is empty") + return value + + def __nonzero__(self): + return not cqueue.queue_is_empty(self._c_queue) + +As a quick test with numbers from 0 to 9999 indicates, using this +Queue from Cython code with C ``int`` values is about five times as +fast as using it from Cython code with Python values, almost eight +times faster than using it from Python code in a Python loop, and +still more than twice as fast as using Python's highly optimised +``collections.deque`` type from Cython code with Python integers. + +.. [CAlg] Simon Howard, C Algorithms library, http://c-algorithms.sourceforge.net/ diff --git a/src/tutorial/cythonize.rst b/src/tutorial/cythonize.rst new file mode 100644 index 00000000..f77abddf --- /dev/null +++ b/src/tutorial/cythonize.rst @@ -0,0 +1,44 @@ +Faster code by adding types +=========================== + +Consider the following pure Python code:: + + from math import sin + + def f(x): + return sin(x**2) + + def integrate_f(a, b, N): + s = 0 + dx = (b-a)/N + for i in range(N): + s += f(a+i*dx) + return s * dx + +Simply compiling this in Cython merely gives a 5% speedup. This is +better than nothing, but adding some static types can make a much larger +difference. + +With additional type declarations, this might look like:: + + from math import sin + + def f(double x): + return sin(x**2) + + def integrate_f(double a, double b, int N): + cdef int i + cdef double s, dx + s = 0 + dx = (b-a)/N + for i in range(N): + s += f(a+i*dx) + return s * dx + +Since the iterator variable ``i`` is typed with C semantics, the for-loop will be compiled +to pure C code. Typing ``a``, ``s`` and ``dx`` is important as they are involved +in arithmetic withing the for-loop; typing ``b`` and ``N`` makes less of a +difference, but in this case it is not much extra work to be +consistent and type the entire function. + +This results in a 24 times speedup over the pure Python version. diff --git a/src/tutorial/data.py b/src/tutorial/data.py new file mode 100644 index 00000000..f0027629 --- /dev/null +++ b/src/tutorial/data.py @@ -0,0 +1,40 @@ +{ + 'title': 'Cython Tutorial', + 'paper_abstract': ''' +Cython is a programming language based on Python with extra +syntax to provide static type declarations. This takes advantage of the +benefits of Python while allowing one to achieve the speed of C. +In this paper we describe the Cython language and show how it can +be used both to write optimized code and to interface with external +C libraries. +''', + 'authors': [ + {'first_names': 'Stefan', + 'surname': 'Behnel', + 'address': '', + 'country': 'Germany', + 'email_address': 'stefan\_ml@behnel.de', + 'institution': ''}, + {'first_names': 'Robert W.', + 'surname': 'Bradshaw', + 'address': '', + 'country': 'USA', + 'email_address': 'robertwb@math.washington.edu', + 'institution': '''University of Washington\\footnote{ + Department of Mathematics, University of Washington, Seattle, WA, USA + }'''}, + {'first_names': 'Dag Sverre', + 'surname': 'Seljebotn', + 'address': '', + 'country': 'Norway', + 'email_address': 'dagss@student.matnat.uio.no', + # I need three institutions w/ full address... leave it + # all here until we get to editing stage + 'institution': '''University of Oslo\\footnote{Institute of Theoretical Astrophysics, + University of Oslo, P.O. Box 1029 Blindern, N-0315 Oslo, Norway}\\footnote{Department + of Mathematics, University of Oslo, P.O. Box 1053 Blindern, + N-0316 Oslo, Norway}\\footnote{Centre of Mathematics for + Applications, University of Oslo, P.O. Box 1053 Blindern, N-0316 + Oslo, Norway}'''} + ], +} diff --git a/src/tutorial/datatypes.rst b/src/tutorial/datatypes.rst new file mode 100644 index 00000000..b94206a3 --- /dev/null +++ b/src/tutorial/datatypes.rst @@ -0,0 +1,24 @@ +Data types in Cython +==================== + +Cython is a Python compiler. This means that it can compile normal +Python code without changes (with a few obvious exceptions of some as-yet +unsupported language features). However, for performance critical +code, it is often helpful to add static type declarations, as they +will allow Cython to step out of the dynamic nature of the Python code +and generate simpler and faster C code - sometimes faster by orders of +magnitude. + +It must be noted, however, that type declarations can make the source +code more verbose and thus less readable. It is therefore discouraged +to use them without good reason, such as where benchmarks prove +that they really make the code substantially faster in a performance +critical section. Typically a few types in the right spots go a long way. + +All C types are available for type declarations: integer and floating +point types, complex numbers, structs, unions and pointer types. +Cython can automatically and correctly convert between the types on +assignment. This also includes Python's arbitrary size integer types, +where value overflows on conversion to a C type will raise a Python +``OverflowError`` at runtime. The generated C code will handle the +platform dependent sizes of C types correctly and safely in this case. diff --git a/src/tutorial/external.rst b/src/tutorial/external.rst new file mode 100644 index 00000000..dc40480e --- /dev/null +++ b/src/tutorial/external.rst @@ -0,0 +1,52 @@ +Calling external C functions +============================ + +It is perfectly OK do ``from math import sin`` to use Python's +``sin()`` function. However, calling C's own ``sin()`` function is +substantially faster, especially in tight loops. It can be declared +and used in Cython as follows:: + + cdef extern from "math.h": + double sin(double) + + cdef double f(double x): + return sin(x*x) + +At this point there are no longer any Python wrapper objects around +our values inside of the main for loop, and so we get an impressive +speedup to 219 times the speed of Python. + +Note that the above code re-declares the function from ``math.h`` to +make it available to Cython code. The C compiler will see the +original declaration in ``math.h`` at compile time, but Cython +does not parse "math.h" and requires a separate definition. + +When calling C functions, one must take care to link in the appropriate +libraries. This can be platform-specific; the below example works on Linux +and Mac OS X:: + + from distutils.core import setup + from distutils.extension import Extension + from Cython.Distutils import build_ext + + ext_modules=[ + Extension("demo", + ["demo.pyx"], + libraries=["m"]) # Unix-like specific + ] + + setup( + name = "Demos", + cmdclass = {"build_ext": build_ext}, + ext_modules = ext_modules + ) + +If one uses the Sage notebook to compile Cython code, one can use a special +comment to tell Sage to link in libraries:: + + #clib: m + +Just like the ``sin()`` function from the math library, it is possible +to declare and call into any C library as long as the module that +Cython generates is properly linked against the shared or static +library. diff --git a/src/tutorial/htmlreport.png b/src/tutorial/htmlreport.png new file mode 100644 index 00000000..a07a53df Binary files /dev/null and b/src/tutorial/htmlreport.png differ diff --git a/src/tutorial/index.rst b/src/tutorial/index.rst new file mode 100644 index 00000000..f9219148 --- /dev/null +++ b/src/tutorial/index.rst @@ -0,0 +1,33 @@ +Cython Tutorial +=============== + +Contents: + +.. toctree:: + :maxdepth: 2 + + overview + install + build + datatypes + cythonize + cdef_functions + external + cdef_classes + pxd_files + caveats + numpy + strings + clibraries + readings + related_work + appendix + references + + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` diff --git a/src/tutorial/install.rst b/src/tutorial/install.rst new file mode 100644 index 00000000..626b4347 --- /dev/null +++ b/src/tutorial/install.rst @@ -0,0 +1,52 @@ +Installing Cython +================= + +Many scientific Python distributions, such as the Enthought Python +Distribution [EPD]_, Python(x,y) [Pythonxy]_, and Sage [Sage]_, bundle +Cython and no setup is needed. Note however that if your distribution +ships a version of Cython which is too old you can still use the +instructions below to update Cython. Everything in this tutorial +should work with Cython 0.11.2 and newer, unless a footnote says +otherwise. + +Unlike most Python software, Cython requires a C compiler to be +present on the system. The details of getting a C compiler varies +according to the system used: + + - **Linux** The GNU C Compiler (gcc) is usually present, or easily + available through the package system. On Ubuntu or Debian, for + instance, the command ``sudo apt-get install build-essential`` will + fetch everything you need. + + - **Mac OS X** To retrieve gcc, one option is to install Apple's + XCode, which can be retrieved from the Mac OS X's install DVDs or + from http://developer.apple.com. + + - **Windows** A popular option is to use the open source MinGW (a + Windows distribution of gcc). See the appendix for instructions for + setting up MinGW manually. EPD and Python(x,y) bundle MinGW, but + some of the configuration steps in the appendix might still be + necessary. Another option is to use Microsoft's Visual C. One must + then use the same version which the installed Python was compiled + with. + +.. dagss tried other forms of ReST lists and they didn't look nice +.. with rst2latex. + +The newest Cython release can always be downloaded from +http://cython.org. Unpack the tarball or zip file, enter the +directory, and then run:: + + python setup.py install + +If you have Python setuptools set up on your system, you should be +able to fetch Cython from PyPI and install it using:: + + easy_install cython + +For Windows there is also an executable installer available for +download. + +.. [EPD] http://www.enthought.com/products/epd.php +.. [Pythonxy] http://www.pythonxy.com/ +.. [Sage] W. Stein et al., Sage Mathematics Software, http://sagemath.org diff --git a/src/tutorial/numpy.rst b/src/tutorial/numpy.rst new file mode 100644 index 00000000..9551bd00 --- /dev/null +++ b/src/tutorial/numpy.rst @@ -0,0 +1,106 @@ +Using Cython with NumPy +======================= + +Cython has support for fast access to NumPy arrays. To optimize code +using such arrays one must ``cimport`` the NumPy pxd file (which ships +with Cython), and declare any arrays as having the ``ndarray`` +type. The data type and number of dimensions should be fixed at +compile-time and passed. For instance:: + + import numpy as np + cimport numpy as np + def myfunc(np.ndarray[np.float64_t, ndim=2] A): + <...> + +``myfunc`` can now only be passed two-dimensional arrays containing +double precision floats, but array indexing operation is much, much faster, +making it suitable for numerical loops. Expect speed increases well +over 100 times over a pure Python loop; in some cases the speed +increase can be as high as 700 times or more. [Seljebotn09]_ +contains detailed examples and benchmarks. + +Fast array declarations can currently only be used with function +local variables and arguments to ``def``-style functions (not with +arguments to ``cpdef`` or ``cdef``, and neither with fields in cdef +classes or as global variables). These limitations are considered +known defects and we hope to remove them eventually. In most +circumstances it is possible to work around these limitations rather +easily and without a significant speed penalty, as all NumPy arrays +can also be passed as untyped objects. + +Array indexing is only optimized if exactly as many indices are +provided as the number of array dimensions. Furthermore, all indices +must have a native integer type. Slices and NumPy "fancy indexing" is +not optimized. Examples:: + + def myfunc(np.ndarray[np.float64_t, ndim=1] A): + cdef Py_ssize_t i, j + for i in range(A.shape[0]): + print A[i, 0] # fast + j = 2*i + print A[i, j] # fast + k = 2*i + print A[i, k] # slow, k is not typed + print A[i][j] # slow + print A[i,:] # slow + +``Py_ssize_t`` is a signed integer type provided by Python which +covers the same range of values as is supported as NumPy array +indices. It is the preferred type to use for loops over arrays. + +Any Cython primitive type (float, complex float and integer types) can +be passed as the array data type. For each valid dtype in the ``numpy`` +module (such as ``np.uint8``, ``np.complex128``) there is a +corresponding Cython compile-time definition in the cimport-ed NumPy +pxd file with a ``_t`` suffix [#]_. Cython structs are also allowed +and corresponds to NumPy record arrays. Examples:: + + cdef packed struct Point: + np.float64_t x, y + + def f(): + cdef np.ndarray[np.complex128_t, ndim=3] a = \ + np.zeros((3,3,3), dtype=np.complex128) + cdef np.ndarray[Point] b = np.zeros(10, + dtype=np.dtype([('x', np.float64), + ('y', np.float64)])) + <...> + +Note that ``ndim`` defaults to 1. Also note that NumPy record arrays +are by default unaligned, meaning data is packed as tightly as +possible without considering the alignment preferences of the +CPU. Such unaligned record arrays corresponds to a Cython ``packed`` +struct. If one uses an aligned dtype, by passing ``align=True`` to the +``dtype`` constructor, one must drop the ``packed`` keyword on the +struct definition. + +Some data types are not yet supported, like boolean arrays and string +arrays. Also data types describing data which is not in the native +endian will likely never be supported. It is however possible to +access such arrays on a lower level by casting the arrays:: + + cdef np.ndarray[np.uint8, cast=True] boolarr = (x < y) + cdef np.ndarray[np.uint32, cast=True] values = \ + np.arange(10, dtype='>i4') + +Assuming one is on a little-endian system, the ``values`` array +can still access the raw bit content of the array (which must then +be reinterpreted to yield valid results on a little-endian system). + +Finally, note that typed NumPy array variables in some respects behave +a little differently from untyped arrays. ``arr.shape`` is no longer a +tuple. ``arr.shape[0]`` is valid but to e.g. print the shape one must +do ``print (arr).shape`` in order to "untype" the variable +first. The same is true for ``arr.data`` (which in typed mode is a C +data pointer). + +There are many more options for optimizations to consider for Cython +and NumPy arrays. We again refer the interested reader to [Seljebotn09]_. + +.. [#] In Cython 0.11.2, ``np.complex64_t`` and ``np.complex128_t`` + does not work and one must write ``complex`` or + ``double complex`` instead. This is fixed in 0.11.3. Cython + 0.11.1 and earlier does not support complex numbers. + +.. [Seljebotn09] D. S. Seljebotn, Fast numerical computations with Cython, + Proceedings of the 8th Python in Science Conference, 2009. diff --git a/src/tutorial/overview.rst b/src/tutorial/overview.rst new file mode 100644 index 00000000..c0807759 --- /dev/null +++ b/src/tutorial/overview.rst @@ -0,0 +1,54 @@ +Cython - an overview +==================== + +[Cython]_ is a programming language based on Python, with extra syntax +allowing for optional static type declarations. It aims to become a superset +of the [Python]_ language which gives it high-level, object-oriented, +functional, and dynamic programming. The source code gets translated +into optimized C/C++ code and compiled as Python extension modules. +This allows for both very fast program execution and tight integration +with external C libraries, while keeping up the high programmer +productivity for which the Python language is well known. + +The primary Python execution environment is commonly referred to as +CPython, as it is written in C. Other major implementations use Java +(Jython [Jython]_), C# (IronPython [IronPython]_) and Python itself +(PyPy [PyPy]_). Written in C, CPython has been conducive to wrapping +many external libraries that interface through the C language. It +has, however, remained non trivial to write the necessary glue code in +C, especially for programmers who are more fluent in a high-level +language like Python than in a do-it-yourself language like C. + +Originally based on the well-known Pyrex [Pyrex]_, the Cython project has +approached this problem by means of a source code compiler that +translates Python code to equivalent C code. This code is executed +within the CPython runtime environment, but at the speed of compiled C +and with the ability to call directly into C libraries. +At the same time, it keeps the original interface of the Python +source code, which makes it directly usable from Python code. These +two-fold characteristics enable Cython's two major use cases: +extending the CPython interpreter with fast binary modules, and +interfacing Python code with external C libraries. + +While Cython can compile (most) regular Python code, the generated C +code usually gains major (and sometime impressive) speed improvements +from optional static type declarations for both Python and C types. +These allow Cython to assign C semantics to parts of the code, and to +translate them into very efficient C code. Type declarations can +therefore be used for two purposes: for moving code sections from +dynamic Python semantics into static-and-fast C semantics, but also +for directly manipulating types defined in external libraries. Cython +thus merges the two worlds into a very broadly applicable programming +language. + +.. [Cython] G. Ewing, R. W. Bradshaw, S. Behnel, D. S. Seljebotn et al., + The Cython compiler, http://cython.org. +.. [IronPython] Jim Hugunin et al., http://www.codeplex.com/IronPython. +.. [Jython] J. Huginin, B. Warsaw, F. Bock, et al., + Jython: Python for the Java platform, http://www.jython.org/ +.. [PyPy] The PyPy Group, PyPy: a Python implementation written in Python, + http://codespeak.net/pypy. +.. [Pyrex] G. Ewing, Pyrex: C-Extensions for Python, + http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/ +.. [Python] G. van Rossum et al., The Python programming language, + http://python.org. diff --git a/src/tutorial/pxd_files.rst b/src/tutorial/pxd_files.rst new file mode 100644 index 00000000..c120c8b9 --- /dev/null +++ b/src/tutorial/pxd_files.rst @@ -0,0 +1,41 @@ +pxd files +========= + +In addition to the ``.pyx`` source files, Cython uses ``.pxd`` files +which work like C header files--they contain Cython declarations +(and sometimes code sections) which are only meant for inclusion by +Cython modules. A ``pxd`` file is imported into a ``pyx`` module by +using the ``cimport`` keyword. + +``pxd`` files have many use-cases: + + 1. They can be used for sharing external C declarations. + 2. They can contain functions which are well suited for inlining by + the C compiler. Such functions should be marked ``inline``, example: + :: + + cdef inline int int_min(int a, int b): + return b if b < a else a + + 3. When accompanying an equally named ``pyx`` file, they + provide a Cython interface to the Cython module so that other + Cython modules can communicate with it using a more efficient + protocol than the Python one. + +In our integration example, we might break it up into ``pxd`` files like this: + + 1. Add a ``cmath.pxd`` function which defines the C functions available from + the C ``math.h`` header file, like ``sin``. Then one would simply do + ``from cmath import sin`` in ``integrate.pyx``. + 2. Add a ``integrate.pxd`` so that other modules written in Cython + can define fast custom functions to integrate. + :: + + cdef class Function: + cpdef evaluate(self, double x) + cpdef integrate(Function f, double a, + double b, int N) + + Note that if you have a cdef class with attributes, the attributes must + be declared in the class declaration ``pxd`` file (if you use one), not + the ``pyx`` file. The compiler will tell you about this. diff --git a/src/tutorial/queue_example/cqueue.pxd b/src/tutorial/queue_example/cqueue.pxd new file mode 100644 index 00000000..a6c5d575 --- /dev/null +++ b/src/tutorial/queue_example/cqueue.pxd @@ -0,0 +1,17 @@ +cdef extern from "libcalg/queue.h": + ctypedef struct Queue: + pass + ctypedef void* QueueValue + + Queue* queue_new() + void queue_free(Queue* queue) + + int queue_push_head(Queue* queue, QueueValue data) + QueueValue queue_pop_head(Queue* queue) + QueueValue queue_peek_head(Queue* queue) + + int queue_push_tail(Queue* queue, QueueValue data) + QueueValue queue_pop_tail(Queue* queue) + QueueValue queue_peek_tail(Queue* queue) + + int queue_is_empty(Queue* queue) diff --git a/src/tutorial/queue_example/queue.pyx b/src/tutorial/queue_example/queue.pyx new file mode 100644 index 00000000..fd8addf7 --- /dev/null +++ b/src/tutorial/queue_example/queue.pyx @@ -0,0 +1,95 @@ +cimport cqueue +cimport python_exc + +cdef class Queue: + cdef cqueue.Queue* _c_queue + def __cinit__(self): + self._c_queue = cqueue.queue_new() + if self._c_queue is NULL: + python_exc.PyErr_NoMemory() + + def __dealloc__(self): + if self._c_queue is not NULL: + cqueue.queue_free(self._c_queue) + + cpdef int append(self, int value) except -1: + if not cqueue.queue_push_tail(self._c_queue, value): + python_exc.PyErr_NoMemory() + return 0 + + cdef int extend(self, int* values, Py_ssize_t count) except -1: + cdef Py_ssize_t i + for i in xrange(count): + if not cqueue.queue_push_tail(self._c_queue, values[i]): + python_exc.PyErr_NoMemory() + return 0 + + cpdef int peek(self) except? 0: + cdef int value = cqueue.queue_peek_head(self._c_queue) + if value == 0: + # this may mean that the queue is empty, or that it + # happens to contain a 0 value + if cqueue.queue_is_empty(self._c_queue): + raise IndexError("Queue is empty") + return value + + cpdef int pop(self) except? 0: + cdef int value = cqueue.queue_pop_head(self._c_queue) + if value == 0: + # this may mean that the queue is empty, or that it + # happens to contain a 0 value + if cqueue.queue_is_empty(self._c_queue): + raise IndexError("Queue is empty") + return value + + def __nonzero__(self): + return not cqueue.queue_is_empty(self._c_queue) + +DEF repeat_count=10000 + +def test_cy(): + cdef int i + cdef Queue q = Queue() + for i in xrange(repeat_count): + q.append(i) + for i in xrange(repeat_count): + q.peek() + while q: + q.pop() + +def test_py(): + cdef int i + q = Queue() + for i in xrange(repeat_count): + q.append(i) + for i in xrange(repeat_count): + q.peek() + while q: + q.pop() + +from collections import deque + +def test_deque(): + cdef int i + q = deque() + for i in xrange(repeat_count): + q.appendleft(i) + for i in xrange(repeat_count): + q[-1] + while q: + q.pop() + +repeat = range(repeat_count) + +def test_py_exec(): + q = Queue() + d = dict(q=q, repeat=repeat) + + exec u"""\ +for i in repeat: + q.append(9) +for i in repeat: + q.peek() +while q: + q.pop() +""" in d diff --git a/src/tutorial/readings.rst b/src/tutorial/readings.rst new file mode 100644 index 00000000..135ad563 --- /dev/null +++ b/src/tutorial/readings.rst @@ -0,0 +1,26 @@ +Further reading +=============== + +The main documentation is located at http://docs.cython.org/. Some +recent features might not have documentation written yet, in such +cases some notes can usually be found in the form of a Cython +Enhancement Proposal (CEP) on http://wiki.cython.org/enhancements. + +[Seljebotn09]_ contains more information about Cython and NumPy +arrays. If you intend to use Cython code in a multi-threaded setting, +it is essential to read up on Cython's features for managing the +Global Interpreter Lock (the GIL). The same paper contains an +explanation of the GIL, and the main documentation explains the Cython +features for managing it. + +Finally, don't hesitate to ask questions (or post reports on +successes!) on the Cython users mailing list [UserList]_. The Cython +developer mailing list, [DevList]_, is also open to everybody. Feel +free to use it to report a bug, ask for guidance, if you have time to +spare to develop Cython, or if you have suggestions for future +development. + +.. [DevList] Cython developer mailing list: http://codespeak.net/mailman/listinfo/cython-dev. +.. [Seljebotn09] D. S. Seljebotn, Fast numerical computations with Cython, + Proceedings of the 8th Python in Science Conference, 2009. +.. [UserList] Cython users mailing list: http://groups.google.com/group/cython-users diff --git a/src/tutorial/references.rst b/src/tutorial/references.rst new file mode 100644 index 00000000..10d738ec --- /dev/null +++ b/src/tutorial/references.rst @@ -0,0 +1,32 @@ +References +========== + +.. [Cython] G. Ewing, R. W. Bradshaw, S. Behnel, D. S. Seljebotn et al., + The Cython compiler, http://cython.org. +.. [Python] G. van Rossum et al., The Python programming language, + http://python.org. +.. [Sage] W. Stein et al., Sage Mathematics Software, http://sagemath.org +.. [EPD] http://www.enthought.com/products/epd.php +.. [Pythonxy] http://www.pythonxy.com/ +.. [Jython] J. Huginin, B. Warsaw, F. Bock, et al., + Jython: Python for the Java platform, http://www.jython.org/ +.. [Seljebotn09] D. S. Seljebotn, Fast numerical computations with Cython, + Proceedings of the 8th Python in Science Conference, 2009. +.. [NumPy] T. Oliphant et al., NumPy, http://numpy.scipy.org/ +.. [CAlg] Simon Howard, C Algorithms library, http://c-algorithms.sourceforge.net/ +.. [Pyrex] G. Ewing, Pyrex: C-Extensions for Python, + http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/ +.. [ShedSkin] M. Dufour, J. Coughlan, ShedSkin, + http://code.google.com/p/shedskin/ +.. [PyPy] The PyPy Group, PyPy: a Python implementation written in Python, + http://codespeak.net/pypy. +.. [IronPython] Jim Hugunin et al., http://www.codeplex.com/IronPython. +.. [SWIG] David M. Beazley et al., + SWIG: An Easy to Use Tool for Integrating Scripting Languages with C and C++, + http://www.swig.org. +.. [WinInst] http://wiki.cython.org/InstallingOnWindows +.. [ctypes] http://docs.python.org/library/ctypes.html. +.. there's also the original ctypes home page: http://python.net/crew/theller/ctypes/ +.. [UserList] Cython users mailing list: http://groups.google.com/group/cython-users +.. [DevList] Cython developer mailing list: http://codespeak.net/mailman/listinfo/cython-dev. + diff --git a/src/tutorial/related_work.rst b/src/tutorial/related_work.rst new file mode 100644 index 00000000..6234e8ae --- /dev/null +++ b/src/tutorial/related_work.rst @@ -0,0 +1,49 @@ +Related work +============ + +Pyrex [Pyrex]_ is the compiler project that Cython was originally based on. +Many features and the major design decisions of the Cython language +were developed by Greg Ewing as part of that project. Today, Cython +supersedes the capabilities of Pyrex by providing a higher +compatibility with Python code and Python semantics, as well as +superior optimisations and better integration with scientific Python +extensions like NumPy. + +ctypes [ctypes]_ is a foreign function interface (FFI) for Python. It +provides C compatible data types, and allows calling functions in DLLs +or shared libraries. It can be used to wrap these libraries in pure +Python code. Compared to Cython, it has the major advantage of being +in the standard library and being usable directly from Python code, +without any additional dependencies. The major drawback is its +performance, which suffers from the Python call overhead as all +operations must pass through Python code first. Cython, being a +compiled language, can avoid much of this overhead by moving more +functionality and long-running loops into fast C code. + +SWIG [SWIG]_ is a wrapper code generator. It makes it very easy to parse +large API definitions in C/C++ header files, and to generate straight +forward wrapper code for a large set of programming languages. As +opposed to Cython, however, it is not a programming language itself. +Thin wrappers are easy to generate, but the more functionality a +wrapper needs to provide, the harder it gets to implement it with +SWIG. Cython, on the other hand, makes it very easy to write very +elaborate wrapper code specifically for the Python language. Also, +there exists third party code for parsing C header files and using it +to generate Cython definitions and module skeletons. + +ShedSkin [ShedSkin]_ is an experimental Python-to-C++ compiler. It +uses profiling information and very powerful type inference engine +to generate a C++ program from (restricted) Python source code. +The main drawback is has no support for calling the Python/C API for +operations it does not support natively, and supports very few of the +standard Python modules. + +.. [ctypes] http://docs.python.org/library/ctypes.html. +.. there's also the original ctypes home page: http://python.net/crew/theller/ctypes/ +.. [Pyrex] G. Ewing, Pyrex: C-Extensions for Python, + http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/ +.. [ShedSkin] M. Dufour, J. Coughlan, ShedSkin, + http://code.google.com/p/shedskin/ +.. [SWIG] David M. Beazley et al., + SWIG: An Easy to Use Tool for Integrating Scripting Languages with C and C++, + http://www.swig.org. diff --git a/src/tutorial/sage.png b/src/tutorial/sage.png new file mode 100644 index 00000000..b19df05d Binary files /dev/null and b/src/tutorial/sage.png differ diff --git a/src/tutorial/strings.rst b/src/tutorial/strings.rst new file mode 100644 index 00000000..e1c91b04 --- /dev/null +++ b/src/tutorial/strings.rst @@ -0,0 +1,125 @@ +Unicode and passing strings +=========================== + +Similar to the string semantics in Python 3, Cython also strictly +separates byte strings and unicode strings. Above all, this means +that there is no automatic conversion between byte strings and unicode +strings (except for what Python 2 does in string operations). All +encoding and decoding must pass through an explicit encoding/decoding +step. + +It is, however, very easy to pass byte strings between C code and Python. +When receiving a byte string from a C library, you can let Cython +convert it into a Python byte string by simply assigning it to a +Python variable:: + + cdef char* c_string = c_call_returning_a_c_string() + py_string = c_string + +This creates a Python byte string object that holds a copy of the +original C string. It can be safely passed around in Python code, and +will be garbage collected when the last reference to it goes out of +scope. + +To convert the byte string back into a C ``char*``, use the opposite +assignment:: + + cdef char* other_c_string = py_string + +This is a very fast operation after which ``other_c_string`` points to +the byte string buffer of the Python string itself. It is tied to the +life time of the Python string. When the Python string is garbage +collected, the pointer becomes invalid. It is therefore important to +keep a reference to the Python string as long as the ``char*`` is in +use. Often enough, this only spans the call to a C function that +receives the pointer as parameter. Special care must be taken, +however, when the C function stores the pointer for later use. Apart +from keeping a Python reference to the string, no manual memory +management is required. + +The above way of passing and receiving C strings is as simple that +that, as long as we only deal with binary data in the strings. When +we deal with encoded text, however, it is best practice to decode the C byte +strings to Python Unicode strings on reception, and to encode Python +Unicode strings to C byte strings on the way out. + +With a Python byte string object, you would normally just call the +``.decode()`` method to decode it into a Unicode string:: + + ustring = byte_string.decode('UTF-8') + +You can do the same in Cython for a C string, but the generated code +is rather inefficient for small strings. +While Cython could potentially call the Python +C-API function for decoding a C string from UTF-8 to Unicode +(``PyUnicode_DecodeUTF8()``), the problem is that this requires +passing the length of the C string, which Cython cannot know at +compile time nor runtime. So it would have to call ``strlen()`` +first, although the user code will already know the length of the +string in almost all cases. Also, the encoded byte string might +actually contain null bytes, so this isn't even a safe solution. It +is therefore currently recommended to call the API functions directly:: + + # .pxd file that comes with Cython + cimport python_unicode + + cdef char* c_string = NULL + cdef Py_ssize_t length = 0 + + # get pointer and length from a C function + get_a_c_string(&c_string, &length) + + # decode the string to Unicode + ustring = python_unicode.PyUnicode_DecodeUTF8( + c_string, length, 'strict') + +It is common practice to wrap this in a dedicated function, as this +needs to be done whenever receiving text from C. This could look as +follows:: + + cimport python_unicode + cimport stdlib + cdef extern from "string.h": + size_t strlen(char *s) + + cdef unicode tounicode(char* s): + return python_unicode.PyUnicode_DecodeUTF8( + s, strlen(s), 'strict') + + cdef unicode tounicode_with_length( + char* s, size_t length): + return python_unicode.PyUnicode_DecodeUTF8( + s, length, 'strict') + + cdef unicode tounicode_with_length_and_free( + char* s, size_t length): + try: + return python_unicode.PyUnicode_DecodeUTF8( + s, length, 'strict') + finally: + stdlib.free(s) + +Most likely, you will prefer shorter function names in your code based +on the kind of string being handled. Different types of content often +imply different ways of handling them on reception. To make the code +more readable and to anticipate future changes, it is good practice to +use separate conversion functions for different types of strings. + +The reverse way, converting a Python unicode string to a C ``char*``, +is pretty efficient by itself, assuming that what you actually want is +a memory managed byte string:: + + py_byte_string = py_unicode_string.encode('UTF-8') + cdef char* c_string = py_byte_string + +As noted above, this takes the pointer to the byte buffer of the +Python byte string. Trying to do the same without keeping a reference +to the intermediate byte string will fail with a compile error:: + + # this will not compile ! + cdef char* c_string = py_unicode_string.encode('UTF-8') + +Here, the Cython compiler notices that the code takes a pointer to a +temporary string result that will be garbage collected after the +assignment. Later access to the invalidated pointer will most likely +result in a crash. Cython will therefore refuse to compile this code.