From 72310bb654fd3f518123de4759c535711a5819e3 Mon Sep 17 00:00:00 2001 From: Francesc Alted Date: Sat, 2 Apr 2011 12:37:38 +0200 Subject: [PATCH] NumPy tutorial moved from user's guide to tutorials section. Removed the original (and pretty sort) tutorial on the tutorials section. --- docs/src/tutorial/numpy.rst | 422 ++++++++++++++++------ docs/src/userguide/index.rst | 1 - docs/src/userguide/numpy_tutorial.rst | 496 -------------------------- 3 files changed, 319 insertions(+), 600 deletions(-) delete mode 100644 docs/src/userguide/numpy_tutorial.rst diff --git a/docs/src/tutorial/numpy.rst b/docs/src/tutorial/numpy.rst index 9551bd00..a7d90f03 100644 --- a/docs/src/tutorial/numpy.rst +++ b/docs/src/tutorial/numpy.rst @@ -1,106 +1,322 @@ Using Cython with NumPy ======================= -Cython has support for fast access to NumPy arrays. To optimize code -using such arrays one must ``cimport`` the NumPy pxd file (which ships -with Cython), and declare any arrays as having the ``ndarray`` -type. The data type and number of dimensions should be fixed at -compile-time and passed. For instance:: - - import numpy as np - cimport numpy as np - def myfunc(np.ndarray[np.float64_t, ndim=2] A): - <...> - -``myfunc`` can now only be passed two-dimensional arrays containing -double precision floats, but array indexing operation is much, much faster, -making it suitable for numerical loops. Expect speed increases well -over 100 times over a pure Python loop; in some cases the speed -increase can be as high as 700 times or more. [Seljebotn09]_ -contains detailed examples and benchmarks. - -Fast array declarations can currently only be used with function -local variables and arguments to ``def``-style functions (not with -arguments to ``cpdef`` or ``cdef``, and neither with fields in cdef -classes or as global variables). These limitations are considered -known defects and we hope to remove them eventually. In most -circumstances it is possible to work around these limitations rather -easily and without a significant speed penalty, as all NumPy arrays -can also be passed as untyped objects. - -Array indexing is only optimized if exactly as many indices are -provided as the number of array dimensions. Furthermore, all indices -must have a native integer type. Slices and NumPy "fancy indexing" is -not optimized. Examples:: - - def myfunc(np.ndarray[np.float64_t, ndim=1] A): - cdef Py_ssize_t i, j - for i in range(A.shape[0]): - print A[i, 0] # fast - j = 2*i - print A[i, j] # fast - k = 2*i - print A[i, k] # slow, k is not typed - print A[i][j] # slow - print A[i,:] # slow - -``Py_ssize_t`` is a signed integer type provided by Python which -covers the same range of values as is supported as NumPy array -indices. It is the preferred type to use for loops over arrays. - -Any Cython primitive type (float, complex float and integer types) can -be passed as the array data type. For each valid dtype in the ``numpy`` -module (such as ``np.uint8``, ``np.complex128``) there is a -corresponding Cython compile-time definition in the cimport-ed NumPy -pxd file with a ``_t`` suffix [#]_. Cython structs are also allowed -and corresponds to NumPy record arrays. Examples:: - - cdef packed struct Point: - np.float64_t x, y - - def f(): - cdef np.ndarray[np.complex128_t, ndim=3] a = \ - np.zeros((3,3,3), dtype=np.complex128) - cdef np.ndarray[Point] b = np.zeros(10, - dtype=np.dtype([('x', np.float64), - ('y', np.float64)])) - <...> - -Note that ``ndim`` defaults to 1. Also note that NumPy record arrays -are by default unaligned, meaning data is packed as tightly as -possible without considering the alignment preferences of the -CPU. Such unaligned record arrays corresponds to a Cython ``packed`` -struct. If one uses an aligned dtype, by passing ``align=True`` to the -``dtype`` constructor, one must drop the ``packed`` keyword on the -struct definition. - -Some data types are not yet supported, like boolean arrays and string -arrays. Also data types describing data which is not in the native -endian will likely never be supported. It is however possible to -access such arrays on a lower level by casting the arrays:: - - cdef np.ndarray[np.uint8, cast=True] boolarr = (x < y) - cdef np.ndarray[np.uint32, cast=True] values = \ - np.arange(10, dtype='>i4') - -Assuming one is on a little-endian system, the ``values`` array -can still access the raw bit content of the array (which must then -be reinterpreted to yield valid results on a little-endian system). - -Finally, note that typed NumPy array variables in some respects behave -a little differently from untyped arrays. ``arr.shape`` is no longer a -tuple. ``arr.shape[0]`` is valid but to e.g. print the shape one must -do ``print (arr).shape`` in order to "untype" the variable -first. The same is true for ``arr.data`` (which in typed mode is a C -data pointer). - -There are many more options for optimizations to consider for Cython -and NumPy arrays. We again refer the interested reader to [Seljebotn09]_. - -.. [#] In Cython 0.11.2, ``np.complex64_t`` and ``np.complex128_t`` - does not work and one must write ``complex`` or - ``double complex`` instead. This is fixed in 0.11.3. Cython - 0.11.1 and earlier does not support complex numbers. - -.. [Seljebotn09] D. S. Seljebotn, Fast numerical computations with Cython, - Proceedings of the 8th Python in Science Conference, 2009. +Cython has support for fast access to NumPy arrays. Let's see how this +works with a simple example. + +The code below does 2D discrete convolution of an image with a filter (and I'm +sure you can do better!, let it serve for demonstration purposes). It is both +valid Python and valid Cython code. I'll refer to it as both +:file:`convolve_py.py` for the Python version and :file:`convolve1.pyx` for the +Cython version -- Cython uses ".pyx" as its file suffix. + +.. code-block:: python + + from __future__ import division + import numpy as np + def naive_convolve(f, g): + # f is an image and is indexed by (v, w) + # g is a filter kernel and is indexed by (s, t), + # it needs odd dimensions + # h is the output image and is indexed by (x, y), + # it is not cropped + if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1: + raise ValueError("Only odd dimensions on filter supported") + # smid and tmid are number of pixels between the center pixel + # and the edge, ie for a 5x5 filter they will be 2. + # + # The output size is calculated by adding smid, tmid to each + # side of the dimensions of the input image. + vmax = f.shape[0] + wmax = f.shape[1] + smax = g.shape[0] + tmax = g.shape[1] + smid = smax // 2 + tmid = tmax // 2 + xmax = vmax + 2*smid + ymax = wmax + 2*tmid + # Allocate result image. + h = np.zeros([xmax, ymax], dtype=f.dtype) + # Do convolution + for x in range(xmax): + for y in range(ymax): + # Calculate pixel value for h at (x,y). Sum one component + # for each pixel (s, t) of the filter g. + s_from = max(smid - x, -smid) + s_to = min((xmax - x) - smid, smid + 1) + t_from = max(tmid - y, -tmid) + t_to = min((ymax - y) - tmid, tmid + 1) + value = 0 + for s in range(s_from, s_to): + for t in range(t_from, t_to): + v = x - smid + s + w = y - tmid + t + value += g[smid - s, tmid - t] * f[v, w] + h[x, y] = value + return h + +This should be compiled to produce :file:`yourmod.so` (for Linux systems). We +run a Python session to test both the Python version (imported from +``.py``-file) and the compiled Cython module. + +.. sourcecode:: ipython + + In [1]: import numpy as np + In [2]: import convolve_py + In [3]: convolve_py.naive_convolve(np.array([[1, 1, 1]], dtype=np.int), + ... np.array([[1],[2],[1]], dtype=np.int)) + Out [3]: + array([[1, 1, 1], + [2, 2, 2], + [1, 1, 1]]) + In [4]: import convolve1 + In [4]: convolve1.naive_convolve(np.array([[1, 1, 1]], dtype=np.int), + ... np.array([[1],[2],[1]], dtype=np.int)) + Out [4]: + array([[1, 1, 1], + [2, 2, 2], + [1, 1, 1]]) + In [11]: N = 100 + In [12]: f = np.arange(N*N, dtype=np.int).reshape((N,N)) + In [13]: g = np.arange(81, dtype=np.int).reshape((9, 9)) + In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g) + 2 loops, best of 3: 1.86 s per loop + In [20]: %timeit -n2 -r3 convolve1.naive_convolve(f, g) + 2 loops, best of 3: 1.41 s per loop + +There's not such a huge difference yet; because the C code still does exactly +what the Python interpreter does (meaning, for instance, that a new object is +allocated for each number used). Look at the generated html file and see what +is needed for even the simplest statements you get the point quickly. We need +to give Cython more information; we need to add types. + +Adding types +============= + +To add types we use custom Cython syntax, so we are now breaking Python source +compatibility. Here's :file:`convolve2.pyx`. *Read the comments!* :: + + from __future__ import division + import numpy as np + # "cimport" is used to import special compile-time information + # about the numpy module (this is stored in a file numpy.pxd which is + # currently part of the Cython distribution). + cimport numpy as np + # We now need to fix a datatype for our arrays. I've used the variable + # DTYPE for this, which is assigned to the usual NumPy runtime + # type info object. + DTYPE = np.int + # "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For + # every type in the numpy module there's a corresponding compile-time + # type with a _t-suffix. + ctypedef np.int_t DTYPE_t + # The builtin min and max functions works with Python objects, and are + # so very slow. So we create our own. + # - "cdef" declares a function which has much less overhead than a normal + # def function (but it is not Python-callable) + # - "inline" is passed on to the C compiler which may inline the functions + # - The C type "int" is chosen as return type and argument types + # - Cython allows some newer Python constructs like "a if x else b", but + # the resulting C file compiles with Python 2.3 through to Python 3.0 beta. + cdef inline int int_max(int a, int b): return a if a >= b else b + cdef inline int int_min(int a, int b): return a if a <= b else b + # "def" can type its arguments but not have a return type. The type of the + # arguments for a "def" function is checked at run-time when entering the + # function. + # + # The arrays f, g and h is typed as "np.ndarray" instances. The only effect + # this has is to a) insert checks that the function arguments really are + # NumPy arrays, and b) make some attribute access like f.shape[0] much + # more efficient. (In this example this doesn't matter though.) + def naive_convolve(np.ndarray f, np.ndarray g): + if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1: + raise ValueError("Only odd dimensions on filter supported") + assert f.dtype == DTYPE and g.dtype == DTYPE + # The "cdef" keyword is also used within functions to type variables. It + # can only be used at the top indendation level (there are non-trivial + # problems with allowing them in other places, though we'd love to see + # good and thought out proposals for it). + # + # For the indices, the "int" type is used. This corresponds to a C int, + # other C types (like "unsigned int") could have been used instead. + # Purists could use "Py_ssize_t" which is the proper Python type for + # array indices. + cdef int vmax = f.shape[0] + cdef int wmax = f.shape[1] + cdef int smax = g.shape[0] + cdef int tmax = g.shape[1] + cdef int smid = smax // 2 + cdef int tmid = tmax // 2 + cdef int xmax = vmax + 2*smid + cdef int ymax = wmax + 2*tmid + cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE) + cdef int x, y, s, t, v, w + # It is very important to type ALL your variables. You do not get any + # warnings if not, only much slower code (they are implicitly typed as + # Python objects). + cdef int s_from, s_to, t_from, t_to + # For the value variable, we want to use the same data type as is + # stored in the array, so we use "DTYPE_t" as defined above. + # NB! An important side-effect of this is that if "value" overflows its + # datatype size, it will simply wrap around like in C, rather than raise + # an error like in Python. + cdef DTYPE_t value + for x in range(xmax): + for y in range(ymax): + s_from = int_max(smid - x, -smid) + s_to = int_min((xmax - x) - smid, smid + 1) + t_from = int_max(tmid - y, -tmid) + t_to = int_min((ymax - y) - tmid, tmid + 1) + value = 0 + for s in range(s_from, s_to): + for t in range(t_from, t_to): + v = x - smid + s + w = y - tmid + t + value += g[smid - s, tmid - t] * f[v, w] + h[x, y] = value + return h + +After building this and continuing my (very informal) benchmarks, I get: + +.. sourcecode:: ipython + + In [21]: import convolve2 + In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g) + 2 loops, best of 3: 828 ms per loop + +Efficient indexing +==================== + +There's still a bottleneck killing performance, and that is the array lookups +and assignments. The ``[]``-operator still uses full Python operations -- +what we would like to do instead is to access the data buffer directly at C +speed. + +What we need to do then is to type the contents of the :obj:`ndarray` objects. +We do this with a special "buffer" syntax which must be told the datatype +(first argument) and number of dimensions ("ndim" keyword-only argument, if +not provided then one-dimensional is assumed). + +More information on this syntax [:enhancements/buffer:can be found here]. + +Showing the changes needed to produce :file:`convolve3.pyx` only:: + + ... + def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g): + ... + cdef np.ndarray[DTYPE_t, ndim=2] h = ... + +Usage: + +.. sourcecode:: ipython + + In [18]: import convolve3 + In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g) + 3 loops, best of 100: 11.6 ms per loop + +Note the importance of this change. + +*Gotcha*: This efficient indexing only affects certain index operations, +namely those with exactly ``ndim`` number of typed integer indices. So if +``v`` for instance isn't typed, then the lookup ``f[v, w]`` isn't +optimized. On the other hand this means that you can continue using Python +objects for sophisticated dynamic slicing etc. just as when the array is not +typed. + +Tuning indexing further +======================== + +The array lookups are still slowed down by two factors: + +1. Bounds checking is performed. +2. Negative indices are checked for and handled correctly. The code above is + explicitly coded so that it doesn't use negative indices, and it + (hopefully) always access within bounds. We can add a decorator to disable + bounds checking:: + + ... + cimport cython + @cython.boundscheck(False) # turn of bounds-checking for entire function + def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g): + ... + +Now bounds checking is not performed (and, as a side-effect, if you ''do'' +happen to access out of bounds you will in the best case crash your program +and in the worst case corrupt data). It is possible to switch bounds-checking +mode in many ways, see [:docs/compilerdirectives:compiler directives] for more +information. + +Negative indices are dealt with by ensuring Cython that the indices will be +positive, by casting the variables to unsigned integer types (if you do have +negative values, then this casting will create a very large positive value +instead and you will attempt to access out-of-bounds values). Casting is done +with a special ``<>``-syntax. The code below is changed to use either +unsigned ints or casting as appropriate:: + + ... + cdef int s, t # changed + cdef unsigned int x, y, v, w # changed + cdef int s_from, s_to, t_from, t_to + cdef DTYPE_t value + for x in range(xmax): + for y in range(ymax): + s_from = max(smid - x, -smid) + s_to = min((xmax - x) - smid, smid + 1) + t_from = max(tmid - y, -tmid) + t_to = min((ymax - y) - tmid, tmid + 1) + value = 0 + for s in range(s_from, s_to): + for t in range(t_from, t_to): + v = (x - smid + s) # changed + w = (y - tmid + t) # changed + value += g[(smid - s), (tmid - t)] * f[v, w] # changed + h[x, y] = value + ... + +The function call overhead now starts to play a role, so we compare the latter +two examples with larger N: + +.. sourcecode:: ipython + + In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g) + 3 loops, best of 100: 5.97 ms per loop + In [12]: N = 1000 + In [13]: f = np.arange(N*N, dtype=np.int).reshape((N,N)) + In [14]: g = np.arange(81, dtype=np.int).reshape((9, 9)) + In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g) + 1 loops, best of 10: 1.16 s per loop + In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g) + 1 loops, best of 10: 597 ms per loop + +(Also this is a mixed benchmark as the result array is allocated within the +function call.) + +.. Warning:: + + Speed comes with some cost. Especially it can be dangerous to set typed + objects (like ``f``, ``g`` and ``h`` in our sample code) to + :keyword:`None`. Setting such objects to :keyword:`None` is entirely + legal, but all you can do with them is check whether they are None. All + other use (attribute lookup or indexing) can potentially segfault or + corrupt data (rather than raising exceptions as they would in Python). + + The actual rules are a bit more complicated but the main message is clear: + Do not use typed objects without knowing that they are not set to None. + +More generic code +================== + +It would be possible to do:: + + def naive_convolve(object[DTYPE_t, ndim=2] f, ...): + +i.e. use :obj:`object` rather than :obj:`np.ndarray`. Under Python 3.0 this +can allow your algorithm to work with any libraries supporting the buffer +interface; and support for e.g. the Python Imaging Library may easily be added +if someone is interested also under Python 2.x. + +There is some speed penalty to this though (as one makes more assumptions +compile-time if the type is set to :obj:`np.ndarray`, specifically it is +assumed that the data is stored in pure strided more and not in indirect +mode). + diff --git a/docs/src/userguide/index.rst b/docs/src/userguide/index.rst index 4a08e7c9..32193c5a 100644 --- a/docs/src/userguide/index.rst +++ b/docs/src/userguide/index.rst @@ -15,7 +15,6 @@ Contents: external_C_code source_files_and_compilation wrapping_CPlusPlus - numpy_tutorial limitations pyrex_differences early_binding_for_speed diff --git a/docs/src/userguide/numpy_tutorial.rst b/docs/src/userguide/numpy_tutorial.rst deleted file mode 100644 index aafeea70..00000000 --- a/docs/src/userguide/numpy_tutorial.rst +++ /dev/null @@ -1,496 +0,0 @@ -.. highlight:: cython - -.. _numpy_tutorial: - -************************** -Cython for NumPy users -************************** - -This tutorial is aimed at NumPy users who have no experience with Cython at -all. If you have some knowledge of Cython you may want to skip to the -''Efficient indexing'' section which explains the new improvements made in -summer 2008. - -The main scenario considered is NumPy end-use rather than NumPy/SciPy -development. The reason is that Cython is not (yet) able to support functions -that are generic with respect to datatype and the number of dimensions in a -high-level fashion. This restriction is much more severe for SciPy development -than more specific, "end-user" functions. See the last section for more -information on this. - -The style of this tutorial will not fit everybody, so you can also consider: - -* Robert Bradshaw's `slides on cython for SciPy2008 - `_ - (a higher-level and quicker introduction) -* Basic Cython documentation (see `Cython front page `_). -* ``[:enhancements/buffer:Spec for the efficient indexing]`` - -.. Note:: - The fast array access documented below is a completely new feature, and - there may be bugs waiting to be discovered. It might be a good idea to do - a manual sanity check on the C code Cython generates before using this for - serious purposes, at least until some months have passed. - -Cython at a glance -==================== - -Cython is a compiler which compiles Python-like code files to C code. Still, -''Cython is not a Python to C translator''. That is, it doesn't take your full -program and "turns it into C" -- rather, the result makes full use of the -Python runtime environment. A way of looking at it may be that your code is -still Python in that it runs within the Python runtime environment, but rather -than compiling to interpreted Python bytecode one compiles to native machine -code (but with the addition of extra syntax for easy embedding of faster -C-like code). - -This has two important consequences: - -* Speed. How much depends very much on the program involved though. Typical Python numerical programs would tend to gain very little as most time is spent in lower-level C that is used in a high-level fashion. However for-loop-style programs can gain many orders of magnitude, when typing information is added (and is so made possible as a realistic alternative). -* Easy calling into C code. One of Cython's purposes is to allow easy wrapping - of C libraries. When writing code in Cython you can call into C code as - easily as into Python code. - -Some Python constructs are not yet supported, though making Cython compile all -Python code is a stated goal (among the more important omissions are inner -functions and generator functions). - -Your Cython environment -======================== - -Using Cython consists of these steps: - -1. Write a :file:`.pyx` source file -2. Run the Cython compiler to generate a C file -3. Run a C compiler to generate a compiled library -4. Run the Python interpreter and ask it to import the module - -However there are several options to automate these steps: - -1. The `SAGE `_ mathematics software system provides - excellent support for using Cython and NumPy from an interactive command - line (like IPython) or through a notebook interface (like - Maple/Mathematica). See `this documentation - `_. -2. A version of `pyximport `_ is shipped - with Cython, so that you can import pyx-files dynamically into Python and - have them compiled automatically (See :ref:`pyximport`). -3. Cython supports distutils so that you can very easily create build scripts - which automate the process, this is the preferred method for full programs. -4. Manual compilation (see below) - -.. Note:: - If using another interactive command line environment than SAGE, like - IPython or Python itself, it is important that you restart the process - when you recompile the module. It is not enough to issue an "import" - statement again. - -Installation -============= - -Unless you are used to some other automatic method: -`download Cython `_ (0.9.8.1.1 or later), unpack it, -and run the usual ```python setup.py install``. This will install a -``cython`` executable on your system. It is also possible to use Cython from -the source directory without installing (simply launch :file:`cython.py` in the -root directory). - -As of this writing SAGE comes with an older release of Cython than required -for this tutorial. So if using SAGE you should download the newest Cython and -then execute :: - - $ cd path/to/cython-distro - $ path-to-sage/sage -python setup.py install - -This will install the newest Cython into SAGE. - -Manual compilation -==================== - -As it is always important to know what is going on, I'll describe the manual -method here. First Cython is run:: - - $ cython yourmod.pyx - -This creates :file:`yourmod.c` which is the C source for a Python extension -module. A useful additional switch is ``-a`` which will generate a document -:file:`yourmod.html`) that shows which Cython code translates to which C code -line by line. - -Then we compile the C file. This may vary according to your system, but the C -file should be built like Python was built. Python documentation for writing -extensions should have some details. On Linux this often means something -like:: - - $ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o yourmod.so yourmod.c - -``gcc`` should have access to the NumPy C header files so if they are not -installed at :file:`/usr/include/numpy` or similar you may need to pass another -option for those. - -This creates :file:`yourmod.so` in the same directory, which is importable by -Python by using a normal ``import yourmod`` statement. - -The first Cython program -========================== - -The code below does 2D discrete convolution of an image with a filter (and I'm -sure you can do better!, let it serve for demonstration purposes). It is both -valid Python and valid Cython code. I'll refer to it as both -:file:`convolve_py.py` for the Python version and :file:`convolve1.pyx` for the -Cython version -- Cython uses ".pyx" as its file suffix. - -.. code-block:: python - - from __future__ import division - import numpy as np - def naive_convolve(f, g): - # f is an image and is indexed by (v, w) - # g is a filter kernel and is indexed by (s, t), - # it needs odd dimensions - # h is the output image and is indexed by (x, y), - # it is not cropped - if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1: - raise ValueError("Only odd dimensions on filter supported") - # smid and tmid are number of pixels between the center pixel - # and the edge, ie for a 5x5 filter they will be 2. - # - # The output size is calculated by adding smid, tmid to each - # side of the dimensions of the input image. - vmax = f.shape[0] - wmax = f.shape[1] - smax = g.shape[0] - tmax = g.shape[1] - smid = smax // 2 - tmid = tmax // 2 - xmax = vmax + 2*smid - ymax = wmax + 2*tmid - # Allocate result image. - h = np.zeros([xmax, ymax], dtype=f.dtype) - # Do convolution - for x in range(xmax): - for y in range(ymax): - # Calculate pixel value for h at (x,y). Sum one component - # for each pixel (s, t) of the filter g. - s_from = max(smid - x, -smid) - s_to = min((xmax - x) - smid, smid + 1) - t_from = max(tmid - y, -tmid) - t_to = min((ymax - y) - tmid, tmid + 1) - value = 0 - for s in range(s_from, s_to): - for t in range(t_from, t_to): - v = x - smid + s - w = y - tmid + t - value += g[smid - s, tmid - t] * f[v, w] - h[x, y] = value - return h - -This should be compiled to produce :file:`yourmod.so` (for Linux systems). We -run a Python session to test both the Python version (imported from -``.py``-file) and the compiled Cython module. - -.. sourcecode:: ipython - - In [1]: import numpy as np - In [2]: import convolve_py - In [3]: convolve_py.naive_convolve(np.array([[1, 1, 1]], dtype=np.int), - ... np.array([[1],[2],[1]], dtype=np.int)) - Out [3]: - array([[1, 1, 1], - [2, 2, 2], - [1, 1, 1]]) - In [4]: import convolve1 - In [4]: convolve1.naive_convolve(np.array([[1, 1, 1]], dtype=np.int), - ... np.array([[1],[2],[1]], dtype=np.int)) - Out [4]: - array([[1, 1, 1], - [2, 2, 2], - [1, 1, 1]]) - In [11]: N = 100 - In [12]: f = np.arange(N*N, dtype=np.int).reshape((N,N)) - In [13]: g = np.arange(81, dtype=np.int).reshape((9, 9)) - In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g) - 2 loops, best of 3: 1.86 s per loop - In [20]: %timeit -n2 -r3 convolve1.naive_convolve(f, g) - 2 loops, best of 3: 1.41 s per loop - -There's not such a huge difference yet; because the C code still does exactly -what the Python interpreter does (meaning, for instance, that a new object is -allocated for each number used). Look at the generated html file and see what -is needed for even the simplest statements you get the point quickly. We need -to give Cython more information; we need to add types. - -Adding types -============= - -To add types we use custom Cython syntax, so we are now breaking Python source -compatibility. Here's :file:`convolve2.pyx`. *Read the comments!* :: - - from __future__ import division - import numpy as np - # "cimport" is used to import special compile-time information - # about the numpy module (this is stored in a file numpy.pxd which is - # currently part of the Cython distribution). - cimport numpy as np - # We now need to fix a datatype for our arrays. I've used the variable - # DTYPE for this, which is assigned to the usual NumPy runtime - # type info object. - DTYPE = np.int - # "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For - # every type in the numpy module there's a corresponding compile-time - # type with a _t-suffix. - ctypedef np.int_t DTYPE_t - # The builtin min and max functions works with Python objects, and are - # so very slow. So we create our own. - # - "cdef" declares a function which has much less overhead than a normal - # def function (but it is not Python-callable) - # - "inline" is passed on to the C compiler which may inline the functions - # - The C type "int" is chosen as return type and argument types - # - Cython allows some newer Python constructs like "a if x else b", but - # the resulting C file compiles with Python 2.3 through to Python 3.0 beta. - cdef inline int int_max(int a, int b): return a if a >= b else b - cdef inline int int_min(int a, int b): return a if a <= b else b - # "def" can type its arguments but not have a return type. The type of the - # arguments for a "def" function is checked at run-time when entering the - # function. - # - # The arrays f, g and h is typed as "np.ndarray" instances. The only effect - # this has is to a) insert checks that the function arguments really are - # NumPy arrays, and b) make some attribute access like f.shape[0] much - # more efficient. (In this example this doesn't matter though.) - def naive_convolve(np.ndarray f, np.ndarray g): - if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1: - raise ValueError("Only odd dimensions on filter supported") - assert f.dtype == DTYPE and g.dtype == DTYPE - # The "cdef" keyword is also used within functions to type variables. It - # can only be used at the top indendation level (there are non-trivial - # problems with allowing them in other places, though we'd love to see - # good and thought out proposals for it). - # - # For the indices, the "int" type is used. This corresponds to a C int, - # other C types (like "unsigned int") could have been used instead. - # Purists could use "Py_ssize_t" which is the proper Python type for - # array indices. - cdef int vmax = f.shape[0] - cdef int wmax = f.shape[1] - cdef int smax = g.shape[0] - cdef int tmax = g.shape[1] - cdef int smid = smax // 2 - cdef int tmid = tmax // 2 - cdef int xmax = vmax + 2*smid - cdef int ymax = wmax + 2*tmid - cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE) - cdef int x, y, s, t, v, w - # It is very important to type ALL your variables. You do not get any - # warnings if not, only much slower code (they are implicitly typed as - # Python objects). - cdef int s_from, s_to, t_from, t_to - # For the value variable, we want to use the same data type as is - # stored in the array, so we use "DTYPE_t" as defined above. - # NB! An important side-effect of this is that if "value" overflows its - # datatype size, it will simply wrap around like in C, rather than raise - # an error like in Python. - cdef DTYPE_t value - for x in range(xmax): - for y in range(ymax): - s_from = int_max(smid - x, -smid) - s_to = int_min((xmax - x) - smid, smid + 1) - t_from = int_max(tmid - y, -tmid) - t_to = int_min((ymax - y) - tmid, tmid + 1) - value = 0 - for s in range(s_from, s_to): - for t in range(t_from, t_to): - v = x - smid + s - w = y - tmid + t - value += g[smid - s, tmid - t] * f[v, w] - h[x, y] = value - return h - -At this point, have a look at the generated C code for :file:`convolve1.pyx` and -:file:`convolve2.pyx`. Click on the lines to expand them and see corresponding C. -(Note that this code annotation is currently experimental and especially -"trailing" cleanup code for a block may stick to the last expression in the -block and make it look worse than it is -- use some common sense). - -* .. literalinclude: convolve1.html -* .. literalinclude: convolve2.html - -Especially have a look at the for loops: In :file:`convolve1.c`, these are ~20 lines -of C code to set up while in :file:`convolve2.c` a normal C for loop is used. - -After building this and continuing my (very informal) benchmarks, I get: - -.. sourcecode:: ipython - - In [21]: import convolve2 - In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g) - 2 loops, best of 3: 828 ms per loop - -Efficient indexing -==================== - -There's still a bottleneck killing performance, and that is the array lookups -and assignments. The ``[]``-operator still uses full Python operations -- -what we would like to do instead is to access the data buffer directly at C -speed. - -What we need to do then is to type the contents of the :obj:`ndarray` objects. -We do this with a special "buffer" syntax which must be told the datatype -(first argument) and number of dimensions ("ndim" keyword-only argument, if -not provided then one-dimensional is assumed). - -More information on this syntax [:enhancements/buffer:can be found here]. - -Showing the changes needed to produce :file:`convolve3.pyx` only:: - - ... - def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g): - ... - cdef np.ndarray[DTYPE_t, ndim=2] h = ... - -Usage: - -.. sourcecode:: ipython - - In [18]: import convolve3 - In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g) - 3 loops, best of 100: 11.6 ms per loop - -Note the importance of this change. - -*Gotcha*: This efficient indexing only affects certain index operations, -namely those with exactly ``ndim`` number of typed integer indices. So if -``v`` for instance isn't typed, then the lookup ``f[v, w]`` isn't -optimized. On the other hand this means that you can continue using Python -objects for sophisticated dynamic slicing etc. just as when the array is not -typed. - -Tuning indexing further -======================== - -The array lookups are still slowed down by two factors: - -1. Bounds checking is performed. -2. Negative indices are checked for and handled correctly. The code above is - explicitly coded so that it doesn't use negative indices, and it - (hopefully) always access within bounds. We can add a decorator to disable - bounds checking:: - - ... - cimport cython - @cython.boundscheck(False) # turn of bounds-checking for entire function - def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g): - ... - -Now bounds checking is not performed (and, as a side-effect, if you ''do'' -happen to access out of bounds you will in the best case crash your program -and in the worst case corrupt data). It is possible to switch bounds-checking -mode in many ways, see [:docs/compilerdirectives:compiler directives] for more -information. - -Negative indices are dealt with by ensuring Cython that the indices will be -positive, by casting the variables to unsigned integer types (if you do have -negative values, then this casting will create a very large positive value -instead and you will attempt to access out-of-bounds values). Casting is done -with a special ``<>``-syntax. The code below is changed to use either -unsigned ints or casting as appropriate:: - - ... - cdef int s, t # changed - cdef unsigned int x, y, v, w # changed - cdef int s_from, s_to, t_from, t_to - cdef DTYPE_t value - for x in range(xmax): - for y in range(ymax): - s_from = max(smid - x, -smid) - s_to = min((xmax - x) - smid, smid + 1) - t_from = max(tmid - y, -tmid) - t_to = min((ymax - y) - tmid, tmid + 1) - value = 0 - for s in range(s_from, s_to): - for t in range(t_from, t_to): - v = (x - smid + s) # changed - w = (y - tmid + t) # changed - value += g[(smid - s), (tmid - t)] * f[v, w] # changed - h[x, y] = value - ... - -(In the next Cython release we will likely add a compiler directive or -argument to the ``np.ndarray[]``-type specifier to disable negative indexing -so that casting so much isn't necessary; feedback on this is welcome.) - -The function call overhead now starts to play a role, so we compare the latter -two examples with larger N: - -.. sourcecode:: ipython - - In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g) - 3 loops, best of 100: 5.97 ms per loop - In [12]: N = 1000 - In [13]: f = np.arange(N*N, dtype=np.int).reshape((N,N)) - In [14]: g = np.arange(81, dtype=np.int).reshape((9, 9)) - In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g) - 1 loops, best of 10: 1.16 s per loop - In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g) - 1 loops, best of 10: 597 ms per loop - -(Also this is a mixed benchmark as the result array is allocated within the -function call.) - -.. Warning:: - - Speed comes with some cost. Especially it can be dangerous to set typed - objects (like ``f``, ``g`` and ``h`` in our sample code) to :keyword:`None`. - Setting such objects to :keyword:`None` is entirely legal, but all you can do with them - is check whether they are None. All other use (attribute lookup or indexing) - can potentially segfault or corrupt data (rather than raising exceptions as - they would in Python). - - The actual rules are a bit more complicated but the main message is clear: Do - not use typed objects without knowing that they are not set to None. - -More generic code -================== - -It would be possible to do:: - - def naive_convolve(object[DTYPE_t, ndim=2] f, ...): - -i.e. use :obj:`object` rather than :obj:`np.ndarray`. Under Python 3.0 this -can allow your algorithm to work with any libraries supporting the buffer -interface; and support for e.g. the Python Imaging Library may easily be added -if someone is interested also under Python 2.x. - -There is some speed penalty to this though (as one makes more assumptions -compile-time if the type is set to :obj:`np.ndarray`, specifically it is -assumed that the data is stored in pure strided more and not in indirect -mode). - -[:enhancements/buffer:More information] - -The future -============ - -These are some points to consider for further development. All points listed -here has gone through a lot of thinking and planning already; still they may -or may not happen depending on available developer time and resources for -Cython. - -1. Support for efficient access to structs/records stored in arrays; currently - only primitive types are allowed. -2. Support for efficient access to complex floating point types in arrays. The - main obstacle here is getting support for efficient complex datatypes in - Cython. -3. Calling NumPy/SciPy functions currently has a Python call overhead; it - would be possible to take a short-cut from Cython directly to C. (This does - however require some isolated and incremental changes to those libraries; - mail the Cython mailing list for details). -4. Efficient code that is generic with respect to the number of dimensions. - This can probably be done today by calling the NumPy C multi-dimensional - iterator API directly; however it would be nice to have for-loops over - :func:`enumerate` and :func:`ndenumerate` on NumPy arrays create efficient - code. -5. A high-level construct for writing type-generic code, so that one can write - functions that work simultaneously with many datatypes. Note however that a - macro preprocessor language can help with doing this for now. - -- 2.26.2