From: Robert Bradshaw Date: Thu, 12 Nov 2009 07:56:21 +0000 (-0800) Subject: Users guide. X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=e245838728956d367cff8e1a2da511468cf90bf9;p=cython.git Users guide. --HG-- rename : docs/early_binding_for_speed.rst => src/userguide/early_binding_for_speed.rst rename : docs/extension_types.rst => src/userguide/extension_types.rst rename : docs/external_C_code.rst => src/userguide/external_C_code.rst rename : index.rst => src/userguide/index.rst rename : docs/language_basics.rst => src/userguide/language_basics.rst rename : docs/limitations.rst => src/userguide/limitations.rst rename : docs/numpy_tutorial.rst => src/userguide/numpy_tutorial.rst rename : docs/overview.rst => src/userguide/overview.rst rename : docs/profiling_tutorial.rst => src/userguide/profiling_tutorial.rst rename : docs/pxd_package.rst => src/userguide/pxd_package.rst rename : docs/pyrex_differences.rst => src/userguide/pyrex_differences.rst rename : docs/sharing_declarations.rst => src/userguide/sharing_declarations.rst rename : docs/source_files_and_compilation.rst => src/userguide/source_files_and_compilation.rst rename : docs/special_methods.rst => src/userguide/special_methods.rst rename : docs/tutorial.rst => src/userguide/tutorial.rst rename : docs/wrapping_CPlusPlus.rst => src/userguide/wrapping_CPlusPlus.rst --- diff --git a/src/userguide/early_binding_for_speed.rst b/src/userguide/early_binding_for_speed.rst new file mode 100644 index 00000000..07e0047a --- /dev/null +++ b/src/userguide/early_binding_for_speed.rst @@ -0,0 +1,131 @@ +.. highlight:: cython + +.. _early-binding-for-speed: + +************************** +Early Binding for Speed +************************** + +As a dynamic language, Python encourages a programming style of considering +classes and objects in terms of their methods and attributes, more than where +they fit into the class hierarchy. 
+ +This can make Python a very relaxed and comfortable language for rapid +development, but with a price - the 'red tape' of managing data types is +dumped onto the interpreter. At run time, the interpreter does a lot of work +searching namespaces, fetching attributes and parsing argument and keyword +tuples. This run-time 'late binding' is a major cause of Python's relative +slowness compared to 'early binding' languages such as C++. + +However with Cython it is possible to gain significant speed-ups through the +use of 'early binding' programming techniques. + +For example, consider the following (silly) code example: + +.. sourcecode:: cython + + cdef class Rectangle: + cdef int x0, y0 + cdef int x1, y1 + def __init__(self, int x0, int y0, int x1, int y1): + self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1 + def area(self): + area = (self.x1 - self.x0) * (self.y1 - self.y0) + if area < 0: + area = -area + return area + + def rectArea(x0, y0, x1, y1): + rect = Rectangle(x0, y0, x1, y1) + return rect.area() + +In the :func:`rectArea` method, the call to :meth:`rect.area` and the +:meth:`.area` method contain a lot of Python overhead. + +However, in Cython, it is possible to eliminate a lot of this overhead in cases +where calls occur within Cython code. For example: + +.. 
sourcecode:: cython
+
+    cdef class Rectangle:
+        cdef int x0, y0
+        cdef int x1, y1
+        def __init__(self, int x0, int y0, int x1, int y1):
+            self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
+        cdef int _area(self):
+            cdef int area
+            area = (self.x1 - self.x0) * (self.y1 - self.y0)
+            if area < 0:
+                area = -area
+            return area
+        def area(self):
+            return self._area()
+
+    def rectArea(x0, y0, x1, y1):
+        cdef Rectangle rect
+        rect = Rectangle(x0, y0, x1, y1)
+        return rect._area()
+
+Here, in the Rectangle extension class, we have defined two different area
+calculation methods, the efficient :meth:`_area` C method, and the
+Python-callable :meth:`area` method which serves as a thin wrapper around
+:meth:`_area`. Note also in the function :func:`rectArea` how we 'early bind'
+by declaring the local variable ``rect`` which is explicitly given the type
+Rectangle. By using this declaration, instead of just dynamically assigning to
+``rect``, we gain the ability to access the much more efficient C-callable
+:meth:`_area` method.
+
+But Cython offers us more simplicity again, by allowing us to declare
+dual-access methods - methods that can be efficiently called at C level, but
+can also be accessed from pure Python code at the cost of the Python access
+overheads. Consider this code:
+
+.. sourcecode:: cython
+
+    cdef class Rectangle:
+        cdef int x0, y0
+        cdef int x1, y1
+        def __init__(self, int x0, int y0, int x1, int y1):
+            self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
+        cpdef int area(self):
+            cdef int area
+            area = (self.x1 - self.x0) * (self.y1 - self.y0)
+            if area < 0:
+                area = -area
+            return area
+
+    def rectArea(x0, y0, x1, y1):
+        cdef Rectangle rect
+        rect = Rectangle(x0, y0, x1, y1)
+        return rect.area()
+
+.. note::
+
+    In earlier versions of Cython, the :keyword:`cpdef` keyword was spelled
+    :keyword:`rdef`, but it has the same effect.
+
+Here, we just have a single area method, declared as :keyword:`cpdef` to make it
+efficiently callable as a C function, but still accessible from pure Python
+(or late-binding Cython) code.
+
+If within Cython code, we have a variable already 'early-bound' (i.e. declared
+explicitly as type Rectangle, or cast to type Rectangle), then invoking its
+area method will use the efficient C code path and skip the Python overhead.
+But if in Pyrex or regular Python code we have a regular object variable
+storing a Rectangle object, then invoking the area method will require:
+
+* an attribute lookup for the area method
+* packing a tuple for arguments and a dict for keywords (both empty in this case)
+* using the Python API to call the method
+
+and within the area method itself:
+
+* parsing the tuple and keywords
+* executing the calculation code
+* converting the result to a Python object and returning it
+
+So within Cython, it is possible to achieve massive optimisations by
+using strong typing in declaration and casting of variables. For tight loops
+which use method calls, and where these methods are pure C, the difference can
+be huge.
+
diff --git a/src/userguide/extension_types.rst b/src/userguide/extension_types.rst
new file mode 100644
index 00000000..a8ed564c
--- /dev/null
+++ b/src/userguide/extension_types.rst
@@ -0,0 +1,557 @@
+.. highlight:: cython
+
+.. _extension-types:
+
+******************
+Extension Types
+******************
+
+Introduction
+==============
+
+As well as creating normal user-defined classes with the Python class
+statement, Cython also lets you create new built-in Python types, known as
+extension types. You define an extension type using the :keyword:`cdef` class
+statement. Here's an example::
+
+    cdef class Shrubbery:
+
+        cdef int width, height
+
+        def __init__(self, w, h):
+            self.width = w
+            self.height = h
+
+        def describe(self):
+            print "This shrubbery is", self.width, \
+                "by", self.height, "cubits."
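For comparison, a plain Python class with the same behaviour might look like the sketch below. It is not part of the original example: it is written for Python 3, and ``describe`` returns its text rather than printing it, purely so the result is easy to inspect. Unlike the extension type, this class accepts new attributes at run time.

```python
class Shrubbery:
    """Plain-Python counterpart of the extension type above (illustrative)."""

    def __init__(self, w, h):
        # Ordinary instance attributes, created by assignment.
        self.width = w
        self.height = h

    def describe(self):
        # Returns the text instead of printing it (an assumption made here
        # so that the result can be inspected).
        return "This shrubbery is %s by %s cubits." % (self.width, self.height)


shrub = Shrubbery(3, 4)
shrub.colour = "green"  # allowed: a Python class instance accepts new attributes
print(shrub.describe())
```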
+ +As you can see, a Cython extension type definition looks a lot like a Python +class definition. Within it, you use the def statement to define methods that +can be called from Python code. You can even define many of the special +methods such as :meth:`__init__` as you would in Python. + +The main difference is that you can use the :keyword:`cdef` statement to define +attributes. The attributes may be Python objects (either generic or of a +particular extension type), or they may be of any C data type. So you can use +extension types to wrap arbitrary C data structures and provide a Python-like +interface to them. + +Attributes +============ + +Attributes of an extension type are stored directly in the object's C struct. +The set of attributes is fixed at compile time; you can't add attributes to an +extension type instance at run time simply by assigning to them, as you could +with a Python class instance. (You can subclass the extension type in Python +and add attributes to instances of the subclass, however.) + +There are two ways that attributes of an extension type can be accessed: by +Python attribute lookup, or by direct access to the C struct from Cython code. +Python code is only able to access attributes of an extension type by the +first method, but Cython code can use either method. + +By default, extension type attributes are only accessible by direct access, +not Python access, which means that they are not accessible from Python code. +To make them accessible from Python code, you need to declare them as +:keyword:`public` or :keyword:`readonly`. For example,:: + + cdef class Shrubbery: + cdef public int width, height + cdef readonly float depth + +makes the width and height attributes readable and writable from Python code, +and the depth attribute readable but not writable. + +.. note:: + + You can only expose simple C types, such as ints, floats, and + strings, for Python access. You can also expose Python-valued attributes. + +.. 
note::
+
+    Also the :keyword:`public` and :keyword:`readonly` options apply only to
+    Python access, not direct access. All the attributes of an extension type
+    are always readable and writable by C-level access.
+
+Type declarations
+===================
+
+Before you can directly access the attributes of an extension type, the Cython
+compiler must know that you have an instance of that type, and not just a
+generic Python object. It knows this already in the case of the ``self``
+parameter of the methods of that type, but in other cases you will have to use
+a type declaration.
+
+For example, in the following function,::
+
+    cdef widen_shrubbery(sh, extra_width): # BAD
+        sh.width = sh.width + extra_width
+
+because the ``sh`` parameter hasn't been given a type, the width attribute
+will be accessed by a Python attribute lookup. If the attribute has been
+declared :keyword:`public` or :keyword:`readonly` then this will work, but it
+will be very inefficient. If the attribute is private, it will not work at all
+-- the code will compile, but an attribute error will be raised at run time.
+
+The solution is to declare ``sh`` as being of type :class:`Shrubbery`, as
+follows::
+
+    cdef widen_shrubbery(Shrubbery sh, extra_width):
+        sh.width = sh.width + extra_width
+
+Now the Cython compiler knows that ``sh`` has a C attribute called
+:attr:`width` and will generate code to access it directly and efficiently.
+The same consideration applies to local variables, for example,::
+
+    cdef Shrubbery another_shrubbery(Shrubbery sh1):
+        cdef Shrubbery sh2
+        sh2 = Shrubbery()
+        sh2.width = sh1.width
+        sh2.height = sh1.height
+        return sh2
+
+
+Type Testing and Casting
+------------------------
+
+Suppose I have a method :meth:`quest` which returns an object of type :class:`Shrubbery`.
+To access its width I could write::
+
+    cdef Shrubbery sh = quest()
+    print sh.width
+
+which requires the use of a local variable and performs a type test on assignment.
+If you *know* the return value of :meth:`quest` will be of type :class:`Shrubbery`
+you can use a cast to write::
+
+    print (<Shrubbery>quest()).width
+
+This may be dangerous if the value returned by :meth:`quest` is not actually a
+:class:`Shrubbery`, as it will try to access width as a C struct member which may
+not exist. At the C level, rather than raising an :class:`AttributeError`, either a
+nonsensical result will be returned (interpreting whatever data is at that address
+as an int) or a segfault may result from trying to access invalid memory. Instead,
+one can write::
+
+    print (<Shrubbery?>quest()).width
+
+which performs a type check (possibly raising a :class:`TypeError`) before making the
+cast and allowing the code to proceed.
+
+To explicitly test the type of an object, use the :meth:`isinstance` method. By default,
+in Python, the :meth:`isinstance` method checks the :class:`__class__` attribute of the
+first argument to determine if it is of the required type. However, this is potentially
+unsafe as the :class:`__class__` attribute can be spoofed or changed, but the C structure
+of an extension type must be correct to access its :keyword:`cdef` attributes and call its :keyword:`cdef` methods. Cython detects if the second argument is a known extension
+type and does a type check instead, analogous to Pyrex's :meth:`typecheck`.
+The old behavior is always available by passing a tuple as the second parameter::
+
+    print isinstance(sh, Shrubbery)     # Check the type of sh
+    print isinstance(sh, (Shrubbery,))  # Check sh.__class__
+
+
+Extension types and None
+=========================
+
+When you declare a parameter or C variable as being of an extension type,
+Cython will allow it to take on the value ``None`` as well as values of its
+declared type. This is analogous to the way a C pointer can take on the value
+``NULL``, and you need to exercise the same caution because of it.
There is no +problem as long as you are performing Python operations on it, because full +dynamic type checking will be applied. However, when you access C attributes +of an extension type (as in the widen_shrubbery function above), it's up to +you to make sure the reference you're using is not ``None`` -- in the +interests of efficiency, Cython does not check this. + +You need to be particularly careful when exposing Python functions which take +extension types as arguments. If we wanted to make :func:`widen_shrubbery` a +Python function, for example, if we simply wrote:: + + def widen_shrubbery(Shrubbery sh, extra_width): # This is + sh.width = sh.width + extra_width # dangerous! + +then users of our module could crash it by passing ``None`` for the ``sh`` +parameter. + +One way to fix this would be:: + + def widen_shrubbery(Shrubbery sh, extra_width): + if sh is None: + raise TypeError + sh.width = sh.width + extra_width + +but since this is anticipated to be such a frequent requirement, Cython +provides a more convenient way. Parameters of a Python function declared as an +extension type can have a ``not None`` clause:: + + def widen_shrubbery(Shrubbery sh not None, extra_width): + sh.width = sh.width + extra_width + +Now the function will automatically check that ``sh`` is ``not None`` along +with checking that it has the right type. + +.. note:: + + ``not None`` clause can only be used in Python functions (defined with + :keyword:`def`) and not C functions (defined with :keyword:`cdef`). If + you need to check whether a parameter to a C function is None, you will + need to do it yourself. + +.. note:: + + Some more things: + + * The self parameter of a method of an extension type is guaranteed never to + be ``None``. 
+    * When comparing a value with ``None``, keep in mind that, if ``x`` is a Python
+      object, ``x is None`` and ``x is not None`` are very efficient because they
+      translate directly to C pointer comparisons, whereas ``x == None`` and
+      ``x != None``, or simply using ``x`` as a boolean value (as in ``if x: ...``)
+      will invoke Python operations and therefore be much slower.
+
+Special methods
+================
+
+Although the principles are similar, there are substantial differences between
+many of the :meth:`__xxx__` special methods of extension types and their Python
+counterparts. There is a :ref:`separate page <special-methods>` devoted to this subject, and you should
+read it carefully before attempting to use any special methods in your
+extension types.
+
+Properties
+============
+
+There is a special syntax for defining properties in an extension class::
+
+    cdef class Spam:
+
+        property cheese:
+
+            "A doc string can go here."
+
+            def __get__(self):
+                # This is called when the property is read.
+                ...
+
+            def __set__(self, value):
+                # This is called when the property is written.
+                ...
+
+            def __del__(self):
+                # This is called when the property is deleted.
+                ...
+
+
+The :meth:`__get__`, :meth:`__set__` and :meth:`__del__` methods are all
+optional; if they are omitted, an exception will be raised when the
+corresponding operation is attempted.
+
+Here's a complete example.
It defines a property which adds to a list each +time it is written to, returns the list when it is read, and empties the list +when it is deleted.:: + + # cheesy.pyx + cdef class CheeseShop: + + cdef object cheeses + + def __cinit__(self): + self.cheeses = [] + + property cheese: + + def __get__(self): + return "We don't have: %s" % self.cheeses + + def __set__(self, value): + self.cheeses.append(value) + + def __del__(self): + del self.cheeses[:] + + # Test input + from cheesy import CheeseShop + + shop = CheeseShop() + print shop.cheese + + shop.cheese = "camembert" + print shop.cheese + + shop.cheese = "cheddar" + print shop.cheese + + del shop.cheese + print shop.cheese + +.. sourcecode:: text + + # Test output + We don't have: [] + We don't have: ['camembert'] + We don't have: ['camembert', 'cheddar'] + We don't have: [] + +Subclassing +============= + +An extension type may inherit from a built-in type or another extension type:: + + cdef class Parrot: + ... + + cdef class Norwegian(Parrot): + ... + + +A complete definition of the base type must be available to Cython, so if the +base type is a built-in type, it must have been previously declared as an +extern extension type. If the base type is defined in another Cython module, it +must either be declared as an extern extension type or imported using the +:keyword:`cimport` statement. + +An extension type can only have one base class (no multiple inheritance). + +Cython extension types can also be subclassed in Python. A Python class can +inherit from multiple extension types provided that the usual Python rules for +multiple inheritance are followed (i.e. the C layouts of all the base classes +must be compatible). + +C methods +========= +Extension types can have C methods as well as Python methods. Like C +functions, C methods are declared using :keyword:`cdef` or :keyword:`cpdef` instead of +:keyword:`def`. 
C methods are "virtual", and may be overridden in derived +extension types.:: + + # pets.pyx + cdef class Parrot: + + cdef void describe(self): + print "This parrot is resting." + + cdef class Norwegian(Parrot): + + cdef void describe(self): + Parrot.describe(self) + print "Lovely plumage!" + + + cdef Parrot p1, p2 + p1 = Parrot() + p2 = Norwegian() + print "p1:" + p1.describe() + print "p2:" + p2.describe() + +.. sourcecode:: text + + # Output + p1: + This parrot is resting. + p2: + This parrot is resting. + Lovely plumage! + +The above example also illustrates that a C method can call an inherited C +method using the usual Python technique, i.e.:: + + Parrot.describe(self) + +Forward-declaring extension types +=================================== + +Extension types can be forward-declared, like :keyword:`struct` and +:keyword:`union` types. This will be necessary if you have two extension types +that need to refer to each other, e.g.:: + + cdef class Shrubbery # forward declaration + + cdef class Shrubber: + cdef Shrubbery work_in_progress + + cdef class Shrubbery: + cdef Shrubber creator + +If you are forward-declaring an extension type that has a base class, you must +specify the base class in both the forward declaration and its subsequent +definition, for example,:: + + cdef class A(B) + + ... + + cdef class A(B): + # attributes and methods + +Making extension types weak-referenceable +========================================== + +By default, extension types do not support having weak references made to +them. You can enable weak referencing by declaring a C attribute of type +object called :attr:`__weakref__`. For example,:: + + cdef class ExplodingAnimal: + """This animal will self-destruct when it is + no longer strongly referenced.""" + + cdef object __weakref__ + +Public and external extension types +==================================== + +Extension types can be declared extern or public. 
An extern extension type
+declaration makes an extension type defined in external C code available to a
+Cython module. A public extension type declaration makes an extension type
+defined in a Cython module available to external C code.
+
+External extension types
+------------------------
+
+An extern extension type allows you to gain access to the internals of Python
+objects defined in the Python core or in a non-Cython extension module.
+
+.. note::
+
+    In previous versions of Pyrex, extern extension types were also used to
+    reference extension types defined in another Pyrex module. While you can still
+    do that, Cython provides a better mechanism for this. See
+    :ref:`sharing-declarations`.
+
+Here is an example which will let you get at the C-level members of the
+built-in complex object::
+
+    cdef extern from "complexobject.h":
+
+        struct Py_complex:
+            double real
+            double imag
+
+        ctypedef class __builtin__.complex [object PyComplexObject]:
+            cdef Py_complex cval
+
+    # A function which uses the above type
+    def spam(complex c):
+        print "Real:", c.cval.real
+        print "Imag:", c.cval.imag
+
+.. note::
+
+    Some important things:
+
+    1. In this example, :keyword:`ctypedef` class has been used. This is
+       because, in the Python header files, the ``PyComplexObject`` struct is
+       declared with:
+
+       .. sourcecode:: c
+
+           typedef struct {
+               ...
+           } PyComplexObject;
+
+    2. As well as the name of the extension type, the module in which its type
+       object can be found is also specified. See the implicit importing section
+       below.
+
+    3. When declaring an external extension type, you don't declare any
+       methods. Declaration of methods is not required in order to call them,
+       because the calls are Python method calls. Also, as with
+       :keyword:`structs` and :keyword:`unions`, if your extension class
+       declaration is inside a :keyword:`cdef` extern from block, you only need to
+       declare those C members which you wish to access.
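For comparison, here is a rough Python-level view of the same object. It is only a sketch, written for Python 3 and reusing the name ``spam`` from the example above. Plain Python reaches the two parts through the ``real`` and ``imag`` attribute lookups and calls methods such as ``conjugate`` in the usual way, which is why the extern declaration does not need to list any methods.

```python
def spam(c):
    """Describe a complex number using ordinary Python attribute access."""
    # c.real and c.imag are Python attribute lookups that return floats;
    # the Cython version above reads the same values from c.cval instead.
    return "Real: %s Imag: %s" % (c.real, c.imag)


z = complex(1.5, -2.0)
print(spam(z))
# Methods need no declaration: this is an ordinary Python method call.
print(z.conjugate())
```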
+ +Name specification clause +------------------------- + +The part of the class declaration in square brackets is a special feature only +available for extern or public extension types. The full form of this clause +is:: + + [object object_struct_name, type type_object_name ] + +where ``object_struct_name`` is the name to assume for the type's C struct, +and type_object_name is the name to assume for the type's statically declared +type object. (The object and type clauses can be written in either order.) + +If the extension type declaration is inside a :keyword:`cdef` extern from +block, the object clause is required, because Cython must be able to generate +code that is compatible with the declarations in the header file. Otherwise, +for extern extension types, the object clause is optional. + +For public extension types, the object and type clauses are both required, +because Cython must be able to generate code that is compatible with external C +code. + +Implicit importing +------------------ + +Cython requires you to include a module name in an extern extension class +declaration, for example,:: + + cdef extern class MyModule.Spam: + ... + +The type object will be implicitly imported from the specified module and +bound to the corresponding name in this module. In other words, in this +example an implicit:: + + from MyModule import Spam + +statement will be executed at module load time. + +The module name can be a dotted name to refer to a module inside a package +hierarchy, for example,:: + + cdef extern class My.Nested.Package.Spam: + ... + +You can also specify an alternative name under which to import the type using +an as clause, for example,:: + + cdef extern class My.Nested.Package.Spam as Yummy: + ... + +which corresponds to the implicit import statement:: + + from My.Nested.Package import Spam as Yummy + +Type names vs. 
constructor names
+--------------------------------
+
+Inside a Cython module, the name of an extension type serves two distinct
+purposes. When used in an expression, it refers to a module-level global
+variable holding the type's constructor (i.e. its type-object). However, it
+can also be used as a C type name to declare variables, arguments and return
+values of that type.
+
+When you declare::
+
+    cdef extern class MyModule.Spam:
+        ...
+
+the name Spam serves both these roles. There may be other names by which you
+can refer to the constructor, but only Spam can be used as a type name. For
+example, if you were to explicitly import MyModule, you could use
+``MyModule.Spam()`` to create a Spam instance, but you wouldn't be able to use
+:class:`MyModule.Spam` as a type name.
+
+When an as clause is used, the name specified in the as clause also takes over
+both roles. So if you declare::
+
+    cdef extern class MyModule.Spam as Yummy:
+        ...
+
+then Yummy becomes both the type name and a name for the constructor. Again,
+there are other ways that you could get hold of the constructor, but only
+Yummy is usable as a type name.
+
+Public extension types
+======================
+
+An extension type can be declared public, in which case a ``.h`` file is
+generated containing declarations for its object struct and type object. By
+including the ``.h`` file in external C code that you write, that code can
+access the attributes of the extension type.
+
+
+
diff --git a/src/userguide/external_C_code.rst b/src/userguide/external_C_code.rst
new file mode 100644
index 00000000..14910c28
--- /dev/null
+++ b/src/userguide/external_C_code.rst
@@ -0,0 +1,465 @@
+.. highlight:: cython
+
+.. _external-C-code:
+
+**********************************
+Interfacing with External C Code
+**********************************
+
+One of the main uses of Cython is wrapping existing libraries of C code.
This +is achieved by using external declarations to declare the C functions and +variables from the library that you want to use. + +You can also use public declarations to make C functions and variables defined +in a Cython module available to external C code. The need for this is expected +to be less frequent, but you might want to do it, for example, if you are +`embedding Python`_ in another application as a scripting language. Just as a +Cython module can be used as a bridge to allow Python code to call C code, it +can also be used to allow C code to call Python code. + +.. _embedding Python: http://www.freenet.org.nz/python/embeddingpyrex/ + +External declarations +======================= + +By default, C functions and variables declared at the module level are local +to the module (i.e. they have the C static storage class). They can also be +declared extern to specify that they are defined elsewhere, for example:: + + cdef extern int spam_counter + + cdef extern void order_spam(int tons) + +Referencing C header files +--------------------------- + +When you use an extern definition on its own as in the examples above, Cython +includes a declaration for it in the generated C file. This can cause problems +if the declaration doesn't exactly match the declaration that will be seen by +other C code. If you're wrapping an existing C library, for example, it's +important that the generated C code is compiled with exactly the same +declarations as the rest of the library. + +To achieve this, you can tell Cython that the declarations are to be found in a +C header file, like this:: + + cdef extern from "spam.h": + + int spam_counter + + void order_spam(int tons) + +The ``cdef extern`` from clause does three things: + +1. It directs Cython to place a ``#include`` statement for the named header file in + the generated C code. +2. It prevents Cython from generating any C code + for the declarations found in the associated block. +3. 
It treats all declarations within the block as though they started with
+   ``cdef extern``.
+
+It's important to understand that Cython does not itself read the C header
+file, so you still need to provide Cython versions of any declarations from it
+that you use. However, the Cython declarations don't always have to exactly
+match the C ones, and in some cases they shouldn't or can't. In particular:
+
+1. Don't use ``const``. Cython doesn't know anything about ``const``, so just
+   leave it out. Most of the time this shouldn't cause any problem, although
+   on rare occasions you might have to use a cast. You can also explicitly
+   declare something like::
+
+       ctypedef char* const_char_ptr "const char*"
+
+   though in most cases this will not be needed.
+
+   .. warning::
+
+       A problem with const could arise if you have something like::
+
+           cdef extern from "grail.h":
+               char *nun
+
+       where grail.h actually contains::
+
+           extern const char *nun;
+
+       and you do::
+
+           cdef void languissement(char *s):
+               #something that doesn't change s
+
+           ...
+
+           languissement(nun)
+
+       which will cause the C compiler to complain. You can work around it by
+       casting away the constness::
+
+           languissement(<char *>nun)
+
+2. Leave out any platform-specific extensions to C declarations such as
+   ``__declspec()``.
+
+3. If the header file declares a big struct and you only want to use a few
+   members, you only need to declare the members you're interested in. Leaving
+   the rest out doesn't do any harm, because the C compiler will use the full
+   definition from the header file.
+
+   In some cases, you might not need any of the struct's members, in which
+   case you can just put pass in the body of the struct declaration, e.g.::
+
+       cdef extern from "foo.h":
+           struct spam:
+               pass
+
+   .. note::
+
+       You can only do this inside a ``cdef extern from`` block; struct
+       declarations anywhere else must be non-empty.
+
+
+
+4.
If the header file uses ``typedef`` names such as :ctype:`word` to refer + to platform-dependent flavours of numeric types, you will need a + corresponding :keyword:`ctypedef` statement, but you don't need to match + the type exactly, just use something of the right general kind (int, float, + etc). For example,:: + + ctypedef int word + + will work okay whatever the actual size of a :ctype:`word ` is (provided the header + file defines it correctly). Conversion to and from Python types, if any, will also + be used for this new type. + +5. If the header file uses macros to define constants, translate them into a + dummy ``enum`` declaration. + +6. If the header file defines a function using a macro, declare it as though + it were an ordinary function, with appropriate argument and result types. + +7. For archaic reasons C uses the keyword :keyword:`void` to declare a function + taking no parameters. In Cython as in Python, simply declare such functions + as :meth:`foo()`. + +A few more tricks and tips: + +* If you want to include a C header because it's needed by another header, but + don't want to use any declarations from it, put pass in the extern-from + block:: + + cdef extern from "spam.h": + pass + +* If you want to include some external declarations, but don't want to specify + a header file (because it's included by some other header that you've + already included) you can put ``*`` in place of the header file name:: + + cdef extern from *: + ... + +Styles of struct, union and enum declaration +---------------------------------------------- + +There are two main ways that structs, unions and enums can be declared in C +header files: using a tag name, or using a typedef. There are also some +variations based on various combinations of these. + +It's important to make the Cython declarations match the style used in the +header file, so that Cython can emit the right sort of references to the type +in the code it generates. 
To make this possible, Cython provides two different +syntaxes for declaring a struct, union or enum type. The style introduced +above corresponds to the use of a tag name. To get the other style, you prefix +the declaration with :keyword:`ctypedef`, as illustrated below. + +The following table shows the various possible styles that can be found in a +header file, and the corresponding Cython declaration that you should put in +the ``cdef extern`` from block. Struct declarations are used as an example; the +same applies equally to union and enum declarations. + ++-------------------------+---------------------------------------------+-----------------------------------------------------------------------+ +| C code | Possibilities for corresponding Cython Code | Comments | ++=========================+=============================================+=======================================================================+ +| .. sourcecode:: c | :: | Cython will refer to the as ``struct Foo`` in the generated C code. | +| | | | +| struct Foo { | cdef struct Foo: | | +| ... | ... | | +| }; | | | ++-------------------------+---------------------------------------------+-----------------------------------------------------------------------+ +| .. sourcecode:: c | :: | Cython will refer to the type simply as ``Foo`` in | +| | | the generated C code. | +| typedef struct { | ctypedef struct Foo: | | +| ... | ... | | +| } Foo; | | | ++-------------------------+---------------------------------------------+-----------------------------------------------------------------------+ +| .. sourcecode:: c | :: | If the C header uses both a tag and a typedef with *different* | +| | | names, you can use either form of declaration in Cython | +| typedef struct foo { | cdef struct foo: | (although if you need to forward reference the type, | +| ... | ... | you'll have to use the first form). 
| +| } Foo; | ctypedef foo Foo #optional | | +| | | | +| | or:: | | +| | | | +| | ctypedef struct Foo: | | +| | ... | | ++-------------------------+---------------------------------------------+-----------------------------------------------------------------------+ +| .. sourcecode:: c | :: | If the header uses the *same* name for the tag and typedef, you | +| | | won't be able to include a :keyword:`ctypedef` for it -- but then, | +| typedef struct Foo { | cdef struct Foo: | it's not necessary. | +| ... | ... | | +| } Foo; | | | ++-------------------------+---------------------------------------------+-----------------------------------------------------------------------+ + +Note that in all the cases above, you refer to the type in Cython code simply +as :ctype:`Foo`, not ``struct Foo``. + +Accessing Python/C API routines +--------------------------------- + +One particular use of the ``cdef extern from`` statement is for gaining access to +routines in the Python/C API. For example,:: + + cdef extern from "Python.h": + + object PyString_FromStringAndSize(char *s, Py_ssize_t len) + +will allow you to create Python strings containing null bytes. + +Special Types +-------------- + +Cython predefines the name ``Py_ssize_t`` for use with Python/C API routines. To +make your extensions compatible with 64-bit systems, you should always use +this type where it is specified in the documentation of Python/C API routines. + +Windows Calling Conventions +---------------------------- + +The ``__stdcall`` and ``__cdecl`` calling convention specifiers can be used in +Cython, with the same syntax as used by C compilers on Windows, for example,:: + + cdef extern int __stdcall FrobnicateWindow(long handle) + + cdef void (__stdcall *callback)(void *) + +If ``__stdcall`` is used, the function is only considered compatible with +other ``__stdcall`` functions of the same signature. 
+ +Resolving naming conflicts - C name specifications +---------------------------------------------------- + +Each Cython module has a single module-level namespace for both Python and C +names. This can be inconvenient if you want to wrap some external C functions +and provide the Python user with Python functions of the same names. + +Cython provides a couple of different ways of solving this problem. The +best way, especially if you have many C functions to wrap, is probably to put +the extern C function declarations into a different namespace using the +facilities described in the section on sharing declarations between Cython +modules. + +The other way is to use a C name specification to give different Cython and C +names to the C function. Suppose, for example, that you want to wrap an +external function called :func:`eject_tomato`. If you declare it as:: + + cdef extern void c_eject_tomato "eject_tomato" (float speed) + +then its name inside the Cython module will be ``c_eject_tomato``, whereas its name +in C will be ``eject_tomato``. You can then wrap it with:: + + def eject_tomato(speed): + c_eject_tomato(speed) + +so that users of your module can refer to it as ``eject_tomato``. + +Another use for this feature is referring to external names that happen to be +Cython keywords. For example, if you want to call an external function called +print, you can rename it to something else in your Cython module. + +As well as functions, C names can be specified for variables, structs, unions, +enums, struct and union members, and enum values. 
For example,:: + + cdef extern int one "ein", two "zwei" + cdef extern float three "drei" + + cdef struct spam "SPAM": + int i "eye" + + cdef enum surprise "inquisition": + first "alpha" + second "beta" = 3 + +Using Cython Declarations from C +================================== + +Cython provides two methods for making C declarations from a Cython module +available for use by external C code: public declarations and C API +declarations. + +.. note:: + + You do not need to use either of these to make declarations from one + Cython module available to another Cython module – you should use the + :keyword:`cimport` statement for that. See :ref:`sharing-declarations`. + +Public Declarations +--------------------- + +You can make C types, variables and functions defined in a Cython module +accessible to C code that is linked with the module, by declaring them with +the :keyword:`public` keyword:: + + cdef public struct Bunny: # public type declaration + int vorpalness + + cdef public int spam # public variable declaration + + cdef public void grail(Bunny *): # public function declaration + ... + +If there are any public declarations in a Cython module, a header file called +:file:`modulename.h` is generated containing equivalent C declarations for +inclusion in other C code. + +Any C code wanting to make use of these declarations will need to be linked, +either statically or dynamically, with the extension module. + +If the Cython module resides within a package, then the name of the ``.h`` +file consists of the full dotted name of the module, e.g. a module called +:mod:`foo.spam` would have a header file called :file:`foo.spam.h`. + +C API Declarations +------------------- + +The other way of making declarations available to C code is to declare them +with the :keyword:`api` keyword. You can use this keyword with C functions and +extension types. 
A header file called :file:`modulename_api.h` is produced +containing declarations of the functions and extension types, and a function +called :func:`import_modulename`. + +C code wanting to use these functions or extension types needs to include the +header and call the :func:`import_modulename` function. The other functions +can then be called and the extension types used as usual. + +Any public C type or extension type declarations in the Cython module are also +made available when you include :file:`modulename_api.h`.:: + + # delorean.pyx + cdef public struct Vehicle: + int speed + float power + + cdef api void activate(Vehicle *v): + if v.speed >= 88 and v.power >= 1.21: + print "Time travel achieved" + +.. sourcecode:: c + + /* marty.c */ + #include <stdlib.h> /* atoi, atof */ + #include "delorean_api.h" + + Vehicle car; + + int main(int argc, char *argv[]) { + Py_Initialize(); /* the import below needs a running interpreter */ + import_delorean(); + car.speed = atoi(argv[1]); + car.power = atof(argv[2]); + activate(&car); + Py_Finalize(); + return 0; + } + +.. note:: + + Any types defined in the Cython module that are used as argument or + return types of the exported functions will need to be declared public, + otherwise they won't be included in the generated header file, and you will + get errors when you try to compile a C file that uses the header. + +Using the :keyword:`api` method does not require the C code using the +declarations to be linked with the extension module in any way, as the Python +import machinery is used to make the connection dynamically. However, only +functions can be accessed this way, not variables. + +You can use both :keyword:`public` and :keyword:`api` on the same function to +make it available by both methods, e.g.:: + + cdef public api void belt_and_braces(): + ... + +However, note that you should include either :file:`modulename.h` or +:file:`modulename_api.h` in a given C file, not both, otherwise you may get +conflicting dual definitions. 
+ +If the Cython module resides within a package, then: + +* The name of the header file consists of the full dotted name of the module. +* The name of the importing function contains the full name with dots replaced + by double underscores. + +E.g. a module called :mod:`foo.spam` would have an API header file called +:file:`foo.spam_api.h` and an importing function called +:func:`import_foo__spam`. + +Multiple public and API declarations +-------------------------------------- + +You can declare a whole group of items as :keyword:`public` and/or +:keyword:`api` all at once by enclosing them in a :keyword:`cdef` block, for +example,:: + + cdef public api: + void order_spam(int tons) + char *get_lunch(float tomato_size) + +This can be a useful thing to do in a ``.pxd`` file (see +:ref:`sharing-declarations`) to make the module's public interface +available by all three methods. + +Acquiring and Releasing the GIL +--------------------------------- + +Cython provides facilities for releasing the Global Interpreter Lock (GIL) +before calling C code, and for acquiring the GIL in functions that are to be +called back from C code that is executed without the GIL. + +Releasing the GIL +^^^^^^^^^^^^^^^^^ + +You can release the GIL around a section of code using the +:keyword:`with nogil` statement:: + + with nogil: + ... + +Code in the body of the statement must not manipulate Python objects in any +way, and must not call anything that manipulates Python objects without first +re-acquiring the GIL. Cython currently does not check this. + +Acquiring the GIL +^^^^^^^^^^^^^^^^^ + +A C function that is to be used as a callback from C code that is executed +without the GIL needs to acquire the GIL before it can manipulate Python +objects. This can be done by specifying :keyword:`with gil` in the function +header:: + + cdef void my_callback(void *data) with gil: + ... 
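The two halves of this section are usually used together: Python code releases the GIL, calls into C, and the C code calls back into a function that re-acquires it. The following is a minimal sketch of that pattern, not a definitive implementation. The header ``worker.h``, the C function ``run_worker`` and the names ``on_done`` and ``start`` are all hypothetical and exist only for illustration. The ``nogil`` marker on the extern declaration is covered in the section that follows. The sketch uses the syntax described in this guide; newer Cython releases are stricter about exception specifications on callbacks and may need an additional marker on ``on_done``.

```cython
# gil_sketch.pyx -- hypothetical module; "worker.h" and run_worker() are
# assumed to come from some external C library and are not part of Cython.

cdef extern from "worker.h":
    # Assumed C function: does its work without touching Python objects and
    # calls cb(data) when it has finished. Declared nogil so that it may be
    # called while the GIL is released.
    void run_worker(void (*cb)(void *), void *data) nogil

cdef void on_done(void *data) with gil:
    # "with gil" acquires the GIL on entry, so Python objects are safe here.
    results = <object>data
    results.append("done")

def start(results):
    cdef void *data
    # Borrowed pointer: the caller's reference keeps 'results' alive
    # for the duration of the call.
    data = <void *>results
    with nogil:
        # No Python objects may be touched inside this block.
        run_worker(on_done, data)
```

Taking the ``void *`` before entering the ``with nogil`` block keeps every operation on a Python object outside the GIL-free region, which matters because Cython does not check the body of the block.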
+ +Declaring a function as callable without the GIL +-------------------------------------------------- + +You can specify :keyword:`nogil` in a C function header or function type to +declare that it is safe to call without the GIL.:: + + cdef void my_gil_free_func(int spam) nogil: + ... + +If you are implementing such a function in Cython, it cannot have any Python +arguments, Python local variables, or Python return type, and cannot +manipulate Python objects in any way or call any function that does so without +acquiring the GIL first. Some of these restrictions are currently checked by +Cython, but not all. It is possible that more stringent checking will be +performed in the future. + +Declaring a function with :keyword:`gil` also implicitly makes its signature +:keyword:`nogil`. + diff --git a/src/userguide/index.rst b/src/userguide/index.rst new file mode 100644 index 00000000..1d24a140 --- /dev/null +++ b/src/userguide/index.rst @@ -0,0 +1,33 @@ + +Welcome to Cython's Users Guide +================================= + +Contents: + +.. toctree:: + :maxdepth: 2 + + overview + tutorial + language_basics + extension_types + special_methods + sharing_declarations + external_C_code + source_files_and_compilation + wrapping_CPlusPlus + numpy_tutorial + profiling_tutorial + limitations + pyrex_differences + early_binding_for_speed + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` + +.. toctree:: + diff --git a/src/userguide/language_basics.rst b/src/userguide/language_basics.rst new file mode 100644 index 00000000..3cbbcc04 --- /dev/null +++ b/src/userguide/language_basics.rst @@ -0,0 +1,567 @@ +.. highlight:: cython + +.. 
_language-basics: + +***************** +Language Basics +***************** + +C variable and type definitions +=============================== + +The :keyword:`cdef` statement is used to declare C variables, either local or +module-level:: + + cdef int i, j, k + cdef float f, g[42], *h + +and C :keyword:`struct`, :keyword:`union` or :keyword:`enum` types:: + + cdef struct Grail: + int age + float volume + + cdef union Food: + char *spam + float *eggs + + cdef enum CheeseType: + cheddar, edam, + camembert + + cdef enum CheeseState: + hard = 1 + soft = 2 + runny = 3 + +There is currently no special syntax for defining a constant, but you can use +an anonymous :keyword:`enum` declaration for this purpose, for example,:: + + cdef enum: + tons_of_spam = 3 + +.. note:: + the words ``struct``, ``union`` and ``enum`` are used only when + defining a type, not when referring to it. For example, to declare a variable + pointing to a ``Grail`` you would write:: + + cdef Grail *gp + + and not:: + + cdef struct Grail *gp # WRONG + + There is also a ``ctypedef`` statement for giving names to types, e.g.:: + + ctypedef unsigned long ULong + + ctypedef int *IntPtr + +Grouping multiple C declarations +-------------------------------- + +If you have a series of declarations that all begin with :keyword:`cdef`, you +can group them into a :keyword:`cdef` block like this:: + + cdef: + struct Spam: + int tons + + int i + float f + Spam *p + + void f(Spam *s): + print s.tons, "Tons of spam" + + +Python functions vs. C functions +================================== + +There are two kinds of function definition in Cython: + +Python functions are defined using the def statement, as in Python. They take +Python objects as parameters and return Python objects. + +C functions are defined using the new :keyword:`cdef` statement. They take +either Python objects or C values as parameters, and can return either Python +objects or C values. 
+ +Within a Cython module, Python functions and C functions can call each other +freely, but only Python functions can be called from outside the module by +interpreted Python code. So, any functions that you want to "export" from your +Cython module must be declared as Python functions using def. +There is also a hybrid function, called :keyword:`cpdef`. A :keyword:`cpdef` +can be called from anywhere, but uses the faster C calling conventions +when being called from other Cython code. + +Parameters of either type of function can be declared to have C data types, +using normal C declaration syntax. For example,:: + + def spam(int i, char *s): + ... + + cdef int eggs(unsigned long l, float f): + ... + +When a parameter of a Python function is declared to have a C data type, it is +passed in as a Python object and automatically converted to a C value, if +possible. Automatic conversion is currently only possible for numeric types +and string types; attempting to use any other type for the parameter of a +Python function will result in a compile-time error. + +C functions, on the other hand, can have parameters of any type, since they're +passed in directly using a normal C function call. + +A more complete comparison of the pros and cons of these different method +types can be found at :ref:`early-binding-for-speed`. + +Python objects as parameters and return values +---------------------------------------------- + +If no type is specified for a parameter or return value, it is assumed to be a +Python object. (Note that this is different from the C convention, where it +would default to int.) For example, the following defines a C function that +takes two Python objects as parameters and returns a Python object:: + + cdef spamobjs(x, y): + ... + +Reference counting for these objects is performed automatically according to +the standard Python/C API rules (i.e. borrowed references are taken as +parameters and a new reference is returned). 
+ +The name object can also be used to explicitly declare something as a Python +object. This can be useful if the name being declared would otherwise be taken +as the name of a type, for example,:: + + cdef ftang(object int): + ... + +declares a parameter called int which is a Python object. You can also use +object as the explicit return type of a function, e.g.:: + + cdef object ftang(object int): + ... + +In the interests of clarity, it is probably a good idea to always be explicit +about object parameters in C functions. + + +Error return values +------------------- + +If you don't do anything special, a function declared with :keyword:`cdef` that +does not return a Python object has no way of reporting Python exceptions to +its caller. If an exception is detected in such a function, a warning message +is printed and the exception is ignored. + +If you want a C function that does not return a Python object to be able to +propagate exceptions to its caller, you need to declare an exception value for +it. Here is an example:: + + cdef int spam() except -1: + ... + +With this declaration, whenever an exception occurs inside spam, it will +immediately return with the value ``-1``. Furthermore, whenever a call to spam +returns ``-1``, an exception will be assumed to have occurred and will be +propagated. + +When you declare an exception value for a function, you should never +explicitly return that value. If all possible return values are legal and you +can't reserve one entirely for signalling errors, you can use an alternative +form of exception value declaration:: + + cdef int spam() except? -1: + ... + +The "?" indicates that the value ``-1`` only indicates a possible error. In this +case, Cython generates a call to :cfunc:`PyErr_Occurred` if the exception value is +returned, to make sure it really is an error. + +There is also a third form of exception value declaration:: + + cdef int spam() except *: + ... 
+ +This form causes Cython to generate a call to :cfunc:`PyErr_Occurred` after +every call to spam, regardless of what value it returns. If you have a +function returning void that needs to propagate errors, you will have to use +this form, since there isn't any return value to test. +Otherwise there is little use for this form. + +An external C++ function that may raise an exception can be declared with:: + + cdef int spam() except + + +See :ref:`wrapping-cplusplus` for more details. + +Some things to note: + +* Exception values can only be declared for functions returning an integer, enum, + float or pointer type, and the value must be a constant expression. + Void functions can only use the ``except *`` form. +* The exception value specification is part of the signature of the function. + If you're passing a pointer to a function as a parameter or assigning it + to a variable, the declared type of the parameter or variable must have + the same exception value specification (or lack thereof). Here is an + example of a pointer-to-function declaration with an exception + value:: + + int (*grail)(int, char *) except -1 + +* You don't need to (and shouldn't) declare exception values for functions + which return Python objects. Remember that a function with no declared + return type implicitly returns a Python object. (Exceptions on such functions + are implicitly propagated by returning NULL.) + +Checking return values of non-Cython functions +---------------------------------------------- + +It's important to understand that the except clause does not cause an error to +be raised when the specified value is returned. For example, you can't write +something like:: + + cdef extern FILE *fopen(char *filename, char *mode) except NULL # WRONG! + +and expect an exception to be automatically raised if a call to :func:`fopen` +returns ``NULL``. 
The except clause doesn't work that way; its only purpose is +for propagating Python exceptions that have already been raised, either by a Cython +function or a C function that calls Python/C API routines. To get an exception +from a non-Python-aware function such as :func:`fopen`, you will have to check the +return value and raise it yourself, for example,:: + + cdef FILE *p + p = fopen("spam.txt", "r") + if p == NULL: + raise SpamError("Couldn't open the spam file") + + +Automatic type conversions +========================== + +In most situations, automatic conversions will be performed for the basic +numeric and string types when a Python object is used in a context requiring a +C value, or vice versa. The following table summarises the conversion +possibilities. + ++----------------------------+--------------------+------------------+ +| C types | From Python types | To Python types | ++============================+====================+==================+ +| [unsigned] char | int, long | int | +| [unsigned] short | | | +| int, long | | | ++----------------------------+--------------------+------------------+ +| unsigned int | int, long | long | +| unsigned long | | | +| [unsigned] long long | | | ++----------------------------+--------------------+------------------+ +| float, double, long double | int, long, float | float | ++----------------------------+--------------------+------------------+ +| char * | str/bytes | str/bytes [#]_ | ++----------------------------+--------------------+------------------+ +| struct | | dict | ++----------------------------+--------------------+------------------+ + +.. [#] The conversion is to/from str for Python 2.x, and bytes for Python 3.x. + +Caveats when using a Python string in a C context +------------------------------------------------- + +You need to be careful when using a Python string in a context expecting a +``char *``. 
In this situation, a pointer to the contents of the Python string is +used, which is only valid as long as the Python string exists. So you need to +make sure that a reference to the original Python string is held for as long +as the C string is needed. If you can't guarantee that the Python string will +live long enough, you will need to copy the C string. + +Cython detects and prevents some mistakes of this kind. For instance, if you +attempt something like:: + + cdef char *s + s = pystring1 + pystring2 + +then Cython will produce the error message ``Obtaining char * from temporary +Python value``. The reason is that concatenating the two Python strings +produces a new Python string object that is referenced only by a temporary +internal variable that Cython generates. As soon as the statement has finished, +the temporary variable will be decrefed and the Python string deallocated, +leaving ``s`` dangling. Since this code could not possibly work, Cython refuses to +compile it. + +The solution is to assign the result of the concatenation to a Python +variable, and then obtain the ``char *`` from that, i.e.:: + + cdef char *s + p = pystring1 + pystring2 + s = p + +It is then your responsibility to hold the reference p for as long as +necessary. + +Keep in mind that the rules used to detect such errors are only heuristics. +Sometimes Cython will complain unnecessarily, and sometimes it will fail to +detect a problem that exists. Ultimately, you need to understand the issue and +be careful what you do. + +Statements and expressions +========================== + +Control structures and expressions follow Python syntax for the most part. +When applied to Python objects, they have the same semantics as in Python +(unless otherwise noted). Most of the Python operators can also be applied to +C values, with the obvious semantics. 
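As a small illustration of operators applied to C values, and of the automatic conversions described earlier, the sketch below converts each Python object to a C ``double`` once and then does the arithmetic in C. The module name, the function ``mean`` and its variable names are made up for this example; treat it as a sketch under those assumptions rather than code taken from Cython itself.

```cython
# mean.pyx -- hypothetical example module

def mean(values):
    # C variables: arithmetic between them compiles to plain C operations.
    cdef double total, x
    cdef int count
    total = 0.0
    count = 0
    for v in values:          # v is an ordinary Python object
        x = v                 # Python object converted to a C double here
        total = total + x     # C addition on two doubles
        count = count + 1     # C integer arithmetic
    if count == 0:
        raise ValueError("mean() of an empty sequence")
    return total / count      # C division; result converted to a Python float
```

Assigning ``v`` to ``x`` first makes the conversion explicit; an element that cannot be converted to a ``double`` raises the usual Python exception at that line.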
+ +If Python objects and C values are mixed in an expression, conversions are +performed automatically between Python objects and C numeric or string types. + +Reference counts are maintained automatically for all Python objects, and all +Python operations are automatically checked for errors, with appropriate +action taken. + +Differences between C and Cython expressions +-------------------------------------------- + +There are some differences in syntax and semantics between C expressions and +Cython expressions, particularly in the area of C constructs which have no +direct equivalent in Python. + +* An integer literal is treated as a C constant, and will + be truncated to whatever size your C compiler thinks appropriate. + To get a Python integer (of arbitrary precision), cast it immediately to + an object (e.g. ``<object>100000000000000000000``). The ``L``, ``LL``, + and ``U`` suffixes have the same meaning as in C. +* There is no ``->`` operator in Cython. Instead of ``p->x``, use ``p.x`` +* There is no unary ``*`` operator in Cython. Instead of ``*p``, use ``p[0]`` +* There is an ``&`` operator, with the same semantics as in C. +* The null C pointer is called ``NULL``, not ``0`` (and ``NULL`` is a reserved word). +* Type casts are written ``<type>value``, for example:: + + cdef char *p + cdef float *q + p = <char*>q + +Scope rules +----------- + +Cython determines whether a variable belongs to a local scope, the module +scope, or the built-in scope completely statically. As with Python, assigning +to a variable which is not otherwise declared implicitly declares it to be a +Python variable residing in the scope where it is assigned. + +.. note:: + A consequence of these rules is that the module-level scope behaves the + same way as a Python local scope if you refer to a variable before assigning + to it. 
In particular, tricks such as the following will not work in Cython:: + + try: + x = True + except NameError: + True = 1 + + because, due to the assignment, the True will always be looked up in the + module-level scope. You would have to do something like this instead:: + + import __builtin__ + try: + True = __builtin__.True + except AttributeError: + True = 1 + + +Built-in Functions +------------------ + +Cython compiles calls to the following built-in functions into direct calls to +the corresponding Python/C API routines, making them particularly fast. + ++------------------------------+-------------+----------------------------+ +| Function and arguments | Return type | Python/C API Equivalent | ++==============================+=============+============================+ +| abs(obj) | object | PyNumber_Absolute | ++------------------------------+-------------+----------------------------+ +| delattr(obj, name) | int | PyObject_DelAttr | ++------------------------------+-------------+----------------------------+ +| dir(obj) | object | PyObject_Dir | +| getattr(obj, name) (Note 1) | | | +| getattr3(obj, name, default) | | | ++------------------------------+-------------+----------------------------+ +| hasattr(obj, name) | int | PyObject_HasAttr | ++------------------------------+-------------+----------------------------+ +| hash(obj) | int | PyObject_Hash | ++------------------------------+-------------+----------------------------+ +| intern(obj) | object | PyObject_InternFromString | ++------------------------------+-------------+----------------------------+ +| isinstance(obj, type) | int | PyObject_IsInstance | ++------------------------------+-------------+----------------------------+ +| issubclass(obj, type) | int | PyObject_IsSubclass | ++------------------------------+-------------+----------------------------+ +| iter(obj) | object | PyObject_GetIter | ++------------------------------+-------------+----------------------------+ +| len(obj) | 
Py_ssize_t | PyObject_Length | ++------------------------------+-------------+----------------------------+ +| pow(x, y, z) (Note 2) | object | PyNumber_Power | ++------------------------------+-------------+----------------------------+ +| reload(obj) | object | PyImport_ReloadModule | ++------------------------------+-------------+----------------------------+ +| repr(obj) | object | PyObject_Repr | ++------------------------------+-------------+----------------------------+ +| setattr(obj, name) | void | PyObject_SetAttr | ++------------------------------+-------------+----------------------------+ + +Note 1: There are two different functions corresponding to the Python +:func:`getattr` depending on whether a third argument is used. In a Python +context, they both evaluate to the Python :func:`getattr` function. + +Note 2: Only the three-argument form of :func:`pow` is supported. Use the +``**`` operator otherwise. + +Only direct function calls using these names are optimised. If you do +something else with one of these names that assumes it's a Python object, such +as assign it to a Python variable, and later call it, the call will be made as +a Python function call. + + +Operator Precedence +------------------- + +Keep in mind that there are some differences in operator precedence between +Python and C, and that Cython uses the Python precedences, not the C ones. + +Integer for-loops +------------------ + +Cython recognises the usual Python for-in-range integer loop pattern:: + + for i in range(n): + ... + +If ``i`` is declared as a :keyword:`cdef` integer type, it will +optimise this into a pure C loop. This restriction is required as +otherwise the generated code wouldn't be correct due to potential +integer overflows on the target architecture. If you are worried that +the loop is not being converted correctly, use the annotate feature of +the cython commandline (``-a``) to easily see the generated C code. 
+See :ref:`automatic-range-conversion` + +For backwards compatibility to Pyrex, Cython also supports another +form of for-loop:: + + for i from 0 <= i < n: + ... + +or:: + + for i from 0 <= i < n by s: + ... + +where ``s`` is some integer step size. + +Some things to note about the for-from loop: + +* The target expression must be a variable name. +* The name between the lower and upper bounds must be the same as the target + name. +* The direction of iteration is determined by the relations. If they are both + from the set {``<``, ``<=``} then it is upwards; if they are both from the set + {``>``, ``>=``} then it is downwards. (Any other combination is disallowed.) + +Like other Python looping statements, break and continue may be used in the +body, and the loop may have an else clause. + + +The include statement +===================== + +.. warning:: + Historically the ``include`` statement was used for sharing declarations. + Use :ref:`sharing-declarations` instead. + +A Cython source file can include material from other files using the include +statement, for example:: + + include "spamstuff.pxi" + +The contents of the named file are textually included at that point. The +included file can contain any complete statements or declarations that are +valid in the context where the include statement appears, including other +include statements. The contents of the included file should begin at an +indentation level of zero, and will be treated as though they were indented to +the level of the include statement that is including the file. + +.. note:: + + There are other mechanisms available for splitting Cython code into + separate parts that may be more appropriate in many cases. See + :ref:`sharing-declarations`. + + +Conditional Compilation +======================= + +Some features are available for conditional compilation and compile-time +constants within a Cython source file. 
+ +Compile-Time Definitions +------------------------ + +A compile-time constant can be defined using the DEF statement:: + + DEF FavouriteFood = "spam" + DEF ArraySize = 42 + DEF OtherArraySize = 2 * ArraySize + 17 + +The right-hand side of the ``DEF`` must be a valid compile-time expression. +Such expressions are made up of literal values and names defined using ``DEF`` +statements, combined using any of the Python expression syntax. + +The following compile-time names are predefined, corresponding to the values +returned by :func:`os.uname`. + + UNAME_SYSNAME, UNAME_NODENAME, UNAME_RELEASE, + UNAME_VERSION, UNAME_MACHINE + +The following selection of builtin constants and functions are also available: + + None, True, False, + abs, bool, chr, cmp, complex, dict, divmod, enumerate, + float, hash, hex, int, len, list, long, map, max, min, + oct, ord, pow, range, reduce, repr, round, slice, str, + sum, tuple, xrange, zip + +A name defined using ``DEF`` can be used anywhere an identifier can appear, +and it is replaced with its compile-time value as though it were written into +the source at that point as a literal. For this to work, the compile-time +expression must evaluate to a Python value of type ``int``, ``long``, +``float`` or ``str``.:: + + cdef int a1[ArraySize] + cdef int a2[OtherArraySize] + print "I like", FavouriteFood + +Conditional Statements +---------------------- + +The ``IF`` statement can be used to conditionally include or exclude sections +of code at compile time. It works in a similar way to the ``#if`` preprocessor +directive in C.:: + + IF UNAME_SYSNAME == "Windows": + include "icky_definitions.pxi" + ELIF UNAME_SYSNAME == "Darwin": + include "nice_definitions.pxi" + ELIF UNAME_SYSNAME == "Linux": + include "penguin_definitions.pxi" + ELSE: + include "other_definitions.pxi" + +The ``ELIF`` and ``ELSE`` clauses are optional. 
An ``IF`` statement can appear +anywhere that a normal statement or declaration can appear, and it can contain +any statements or declarations that would be valid in that context, including +``DEF`` statements and other ``IF`` statements. + +The expressions in the ``IF`` and ``ELIF`` clauses must be valid compile-time +expressions as for the ``DEF`` statement, although they can evaluate to any +Python value, and the truth of the result is determined in the usual Python +way. + diff --git a/src/userguide/limitations.rst b/src/userguide/limitations.rst new file mode 100644 index 00000000..66961594 --- /dev/null +++ b/src/userguide/limitations.rst @@ -0,0 +1,101 @@ +.. highlight:: cython + +.. _cython-limitations: + +************* +Limitations +************* + +Unsupported Python Features +============================ + +One of our goals is to make Cython as compatible as possible with standard +Python. This page lists the things that work in Python but not in Cython. + +.. TODO: this limitation seems to be removed +.. :: + +.. from module import * + +.. This relies on at-runtime insertion of objects into the current namespace and +.. probably will be one of the few features never implemented (as any +.. implementation would be very slow). However, there is the --pre-import option +.. with treats all un-declared names as coming from the specified module, which +.. has the same effect as putting "from module import *" at the top-level of the +.. code. Note: the one difference is that builtins cannot be overriden in this +.. way, as the 'pre-import' scope is even higher than the builtin scope. + +Nested def statements +---------------------- +Function definitions (whether using ``def`` or ``cdef``) cannot be nested within +other function definitions. :: + + def make_func(): + def f(x): + return x*x + return f + +(work in progress) This relies on functional closures + +Generators +----------- + +Using the yield keywords. 
(work in progress) This relies on functional closures. + + +.. TODO Not really a limitation, rather an enhancement proposal + +.. Support for builtin types +.. -------------------------- + +.. Support for statically declaring types such as list and dict and sequence +.. should be provided, and optimized code produced. + +.. This needs to be well thought-out, and I think Pyrex has some plans along +.. these lines as well. + + +Other Current Limitations +========================== + +* The :func:`globals` and :func:`locals` functions cannot be used. +* Class and function definitions cannot be placed inside control structures. + +Semantic differences between Python and Cython +---------------------------------------------- + +Behaviour of class scopes +^^^^^^^^^^^^^^^^^^^^^^^^^ + +In Python, referring to a method of a class inside the class definition, i.e. +while the class is being defined, yields a plain function object, but in +Cython it yields an unbound method [#]_. A consequence of this is that the +usual idiom for using the :func:`classmethod` and :func:`staticmethod` functions, +e.g.:: + + class Spam: + + def method(cls): + ... + + method = classmethod(method) + +will not work in Cython. This can be worked around by defining the function +outside the class, and then assigning the result of ``classmethod`` or +``staticmethod`` inside the class, i.e.:: + + def Spam_method(cls): + ... + + class Spam: + + method = classmethod(Spam_method) + +.. rubric:: Footnotes + +.. [#] The reason for the different behaviour of class scopes is that + Cython-defined Python functions are ``PyCFunction`` objects, not + ``PyFunction`` objects, and are not recognised by the machinery that creates a + bound or unbound method when a function is extracted from a class. To get + around this, Cython wraps each method in an unbound method object itself + before storing it in the class's dictionary.
diff --git a/src/userguide/numpy_tutorial.rst b/src/userguide/numpy_tutorial.rst new file mode 100644 index 00000000..aafeea70 --- /dev/null +++ b/src/userguide/numpy_tutorial.rst @@ -0,0 +1,496 @@ +.. highlight:: cython + +.. _numpy_tutorial: + +************************** +Cython for NumPy users +************************** + +This tutorial is aimed at NumPy users who have no experience with Cython at +all. If you have some knowledge of Cython you may want to skip to the +*Efficient indexing* section which explains the new improvements made in +summer 2008. + +The main scenario considered is NumPy end-use rather than NumPy/SciPy +development. The reason is that Cython is not (yet) able to support functions +that are generic with respect to datatype and the number of dimensions in a +high-level fashion. This restriction is much more severe for SciPy development +than for more specific, "end-user" functions. See the last section for more +information on this. + +The style of this tutorial will not fit everybody, so you can also consider: + +* Robert Bradshaw's `slides on Cython for SciPy2008 + `_ + (a higher-level and quicker introduction) +* Basic Cython documentation (see `Cython front page `_). +* ``[:enhancements/buffer:Spec for the efficient indexing]`` + +.. Note:: + The fast array access documented below is a completely new feature, and + there may be bugs waiting to be discovered. It might be a good idea to do + a manual sanity check on the C code Cython generates before using this for + serious purposes, at least until some months have passed. + +Cython at a glance +==================== + +Cython is a compiler which compiles Python-like code files to C code. Still, +*Cython is not a Python to C translator*. That is, it doesn't take your full +program and "turn it into C" -- rather, the result makes full use of the +Python runtime environment.
A way of looking at it may be that your code is +still Python in that it runs within the Python runtime environment, but rather +than compiling to interpreted Python bytecode one compiles to native machine +code (but with the addition of extra syntax for easy embedding of faster +C-like code). + +This has two important consequences: + +* Speed. How much depends very much on the program involved though. Typical Python numerical programs would tend to gain very little as most time is spent in lower-level C that is used in a high-level fashion. However for-loop-style programs can gain many orders of magnitude, when typing information is added (and is so made possible as a realistic alternative). +* Easy calling into C code. One of Cython's purposes is to allow easy wrapping + of C libraries. When writing code in Cython you can call into C code as + easily as into Python code. + +Some Python constructs are not yet supported, though making Cython compile all +Python code is a stated goal (among the more important omissions are inner +functions and generator functions). + +Your Cython environment +======================== + +Using Cython consists of these steps: + +1. Write a :file:`.pyx` source file +2. Run the Cython compiler to generate a C file +3. Run a C compiler to generate a compiled library +4. Run the Python interpreter and ask it to import the module + +However there are several options to automate these steps: + +1. The `SAGE `_ mathematics software system provides + excellent support for using Cython and NumPy from an interactive command + line (like IPython) or through a notebook interface (like + Maple/Mathematica). See `this documentation + `_. +2. A version of `pyximport `_ is shipped + with Cython, so that you can import pyx-files dynamically into Python and + have them compiled automatically (See :ref:`pyximport`). +3. 
Cython supports distutils so that you can very easily create build scripts + which automate the process; this is the preferred method for full programs. +4. Manual compilation (see below) + +.. Note:: + If you use an interactive command line environment other than SAGE, like + IPython or Python itself, it is important that you restart the process + when you recompile the module. It is not enough to issue an "import" + statement again. + +Installation +============= + +Unless you are used to some other automatic method: +`download Cython `_ (0.9.8.1.1 or later), unpack it, +and run the usual ``python setup.py install``. This will install a +``cython`` executable on your system. It is also possible to use Cython from +the source directory without installing (simply launch :file:`cython.py` in the +root directory). + +As of this writing SAGE comes with an older release of Cython than required +for this tutorial. So if using SAGE you should download the newest Cython and +then execute :: + + $ cd path/to/cython-distro + $ path-to-sage/sage -python setup.py install + +This will install the newest Cython into SAGE. + +Manual compilation +==================== + +As it is always important to know what is going on, I'll describe the manual +method here. First Cython is run:: + + $ cython yourmod.pyx + +This creates :file:`yourmod.c` which is the C source for a Python extension +module. A useful additional switch is ``-a`` which will generate a document +(:file:`yourmod.html`) that shows which Cython code translates to which C code +line by line. + +Then we compile the C file. This may vary according to your system, but the C +file should be built like Python was built. The Python documentation for writing +extensions should have some details.
On Linux this often means something +like:: + + $ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o yourmod.so yourmod.c + +``gcc`` should have access to the NumPy C header files so if they are not +installed at :file:`/usr/include/numpy` or similar you may need to pass another +option for those. + +This creates :file:`yourmod.so` in the same directory, which is importable by +Python by using a normal ``import yourmod`` statement. + +The first Cython program +========================== + +The code below does 2D discrete convolution of an image with a filter (and I'm +sure you can do better!, let it serve for demonstration purposes). It is both +valid Python and valid Cython code. I'll refer to it as both +:file:`convolve_py.py` for the Python version and :file:`convolve1.pyx` for the +Cython version -- Cython uses ".pyx" as its file suffix. + +.. code-block:: python + + from __future__ import division + import numpy as np + def naive_convolve(f, g): + # f is an image and is indexed by (v, w) + # g is a filter kernel and is indexed by (s, t), + # it needs odd dimensions + # h is the output image and is indexed by (x, y), + # it is not cropped + if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1: + raise ValueError("Only odd dimensions on filter supported") + # smid and tmid are number of pixels between the center pixel + # and the edge, ie for a 5x5 filter they will be 2. + # + # The output size is calculated by adding smid, tmid to each + # side of the dimensions of the input image. + vmax = f.shape[0] + wmax = f.shape[1] + smax = g.shape[0] + tmax = g.shape[1] + smid = smax // 2 + tmid = tmax // 2 + xmax = vmax + 2*smid + ymax = wmax + 2*tmid + # Allocate result image. + h = np.zeros([xmax, ymax], dtype=f.dtype) + # Do convolution + for x in range(xmax): + for y in range(ymax): + # Calculate pixel value for h at (x,y). Sum one component + # for each pixel (s, t) of the filter g. 
+ s_from = max(smid - x, -smid) + s_to = min((xmax - x) - smid, smid + 1) + t_from = max(tmid - y, -tmid) + t_to = min((ymax - y) - tmid, tmid + 1) + value = 0 + for s in range(s_from, s_to): + for t in range(t_from, t_to): + v = x - smid + s + w = y - tmid + t + value += g[smid - s, tmid - t] * f[v, w] + h[x, y] = value + return h + +This should be compiled to produce :file:`convolve1.so` (for Linux systems). We +run a Python session to test both the Python version (imported from the +``.py`` file) and the compiled Cython module. + +.. sourcecode:: ipython + + In [1]: import numpy as np + In [2]: import convolve_py + In [3]: convolve_py.naive_convolve(np.array([[1, 1, 1]], dtype=np.int), + ... np.array([[1],[2],[1]], dtype=np.int)) + Out [3]: + array([[1, 1, 1], + [2, 2, 2], + [1, 1, 1]]) + In [4]: import convolve1 + In [4]: convolve1.naive_convolve(np.array([[1, 1, 1]], dtype=np.int), + ... np.array([[1],[2],[1]], dtype=np.int)) + Out [4]: + array([[1, 1, 1], + [2, 2, 2], + [1, 1, 1]]) + In [11]: N = 100 + In [12]: f = np.arange(N*N, dtype=np.int).reshape((N,N)) + In [13]: g = np.arange(81, dtype=np.int).reshape((9, 9)) + In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g) + 2 loops, best of 3: 1.86 s per loop + In [20]: %timeit -n2 -r3 convolve1.naive_convolve(f, g) + 2 loops, best of 3: 1.41 s per loop + +There's not such a huge difference yet, because the C code still does exactly +what the Python interpreter does (meaning, for instance, that a new object is +allocated for each number used). Look at the generated html file and see what +is needed for even the simplest statements; you get the point quickly. We need +to give Cython more information; we need to add types. + +Adding types +============= + +To add types we use custom Cython syntax, so we are now breaking Python source +compatibility. Here's :file:`convolve2.pyx`.
*Read the comments!* :: + + from __future__ import division + import numpy as np + # "cimport" is used to import special compile-time information + # about the numpy module (this is stored in a file numpy.pxd which is + # currently part of the Cython distribution). + cimport numpy as np + # We now need to fix a datatype for our arrays. I've used the variable + # DTYPE for this, which is assigned to the usual NumPy runtime + # type info object. + DTYPE = np.int + # "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For + # every type in the numpy module there's a corresponding compile-time + # type with a _t-suffix. + ctypedef np.int_t DTYPE_t + # The builtin min and max functions work with Python objects, and so are + # very slow. So we create our own. + # - "cdef" declares a function which has much less overhead than a normal + # def function (but it is not Python-callable) + # - "inline" is passed on to the C compiler which may inline the functions + # - The C type "int" is chosen as return type and argument types + # - Cython allows some newer Python constructs like "a if x else b", but + # the resulting C file compiles with Python 2.3 through to Python 3.0 beta. + cdef inline int int_max(int a, int b): return a if a >= b else b + cdef inline int int_min(int a, int b): return a if a <= b else b + # "def" can type its arguments but not have a return type. The type of the + # arguments for a "def" function is checked at run-time when entering the + # function. + # + # The arrays f, g and h are typed as "np.ndarray" instances. The only effect + # this has is to a) insert checks that the function arguments really are + # NumPy arrays, and b) make some attribute access like f.shape[0] much + # more efficient. (In this example this doesn't matter though.)
+ def naive_convolve(np.ndarray f, np.ndarray g): + if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1: + raise ValueError("Only odd dimensions on filter supported") + assert f.dtype == DTYPE and g.dtype == DTYPE + # The "cdef" keyword is also used within functions to type variables. It + # can only be used at the top indentation level (there are non-trivial + # problems with allowing them in other places, though we'd love to see + # good and thought out proposals for it). + # + # For the indices, the "int" type is used. This corresponds to a C int, + # other C types (like "unsigned int") could have been used instead. + # Purists could use "Py_ssize_t" which is the proper Python type for + # array indices. + cdef int vmax = f.shape[0] + cdef int wmax = f.shape[1] + cdef int smax = g.shape[0] + cdef int tmax = g.shape[1] + cdef int smid = smax // 2 + cdef int tmid = tmax // 2 + cdef int xmax = vmax + 2*smid + cdef int ymax = wmax + 2*tmid + cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE) + cdef int x, y, s, t, v, w + # It is very important to type ALL your variables. You do not get any + # warnings if not, only much slower code (they are implicitly typed as + # Python objects). + cdef int s_from, s_to, t_from, t_to + # For the value variable, we want to use the same data type as is + # stored in the array, so we use "DTYPE_t" as defined above. + # NB! An important side-effect of this is that if "value" overflows its + # datatype size, it will simply wrap around like in C, rather than raise + # an error like in Python.
+ cdef DTYPE_t value + for x in range(xmax): + for y in range(ymax): + s_from = int_max(smid - x, -smid) + s_to = int_min((xmax - x) - smid, smid + 1) + t_from = int_max(tmid - y, -tmid) + t_to = int_min((ymax - y) - tmid, tmid + 1) + value = 0 + for s in range(s_from, s_to): + for t in range(t_from, t_to): + v = x - smid + s + w = y - tmid + t + value += g[smid - s, tmid - t] * f[v, w] + h[x, y] = value + return h + +At this point, have a look at the generated C code for :file:`convolve1.pyx` and +:file:`convolve2.pyx`. Click on the lines to expand them and see corresponding C. +(Note that this code annotation is currently experimental and especially +"trailing" cleanup code for a block may stick to the last expression in the +block and make it look worse than it is -- use some common sense). + +* .. literalinclude: convolve1.html +* .. literalinclude: convolve2.html + +Especially have a look at the for loops: In :file:`convolve1.c`, these are ~20 lines +of C code to set up while in :file:`convolve2.c` a normal C for loop is used. + +After building this and continuing my (very informal) benchmarks, I get: + +.. sourcecode:: ipython + + In [21]: import convolve2 + In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g) + 2 loops, best of 3: 828 ms per loop + +Efficient indexing +==================== + +There's still a bottleneck killing performance, and that is the array lookups +and assignments. The ``[]``-operator still uses full Python operations -- +what we would like to do instead is to access the data buffer directly at C +speed. + +What we need to do then is to type the contents of the :obj:`ndarray` objects. +We do this with a special "buffer" syntax which must be told the datatype +(first argument) and number of dimensions ("ndim" keyword-only argument, if +not provided then one-dimensional is assumed). + +More information on this syntax [:enhancements/buffer:can be found here]. + +Showing the changes needed to produce :file:`convolve3.pyx` only:: + + ... 
+ def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g): + ... + cdef np.ndarray[DTYPE_t, ndim=2] h = ... + +Usage: + +.. sourcecode:: ipython + + In [18]: import convolve3 + In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g) + 3 loops, best of 100: 11.6 ms per loop + +Note the importance of this change: the run time drops from 828 ms to 11.6 ms, +roughly a 70-fold speed-up. + +*Gotcha*: This efficient indexing only affects certain index operations, +namely those with exactly ``ndim`` number of typed integer indices. So if +``v`` for instance isn't typed, then the lookup ``f[v, w]`` isn't +optimized. On the other hand this means that you can continue using Python +objects for sophisticated dynamic slicing etc. just as when the array is not +typed. + +Tuning indexing further +======================== + +The array lookups are still slowed down by two factors: + +1. Bounds checking is performed. +2. Negative indices are checked for and handled correctly. The code above is + explicitly coded so that it doesn't use negative indices, and it + (hopefully) always accesses within bounds. We can add a decorator to disable + bounds checking:: + + ... + cimport cython + @cython.boundscheck(False) # turn off bounds-checking for entire function + def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g): + ... + +Now bounds checking is not performed (and, as a side-effect, if you *do* +happen to access out of bounds you will in the best case crash your program +and in the worst case corrupt data). It is possible to switch bounds-checking +mode in many ways, see [:docs/compilerdirectives:compiler directives] for more +information. + +Negative indices are dealt with by assuring Cython that the indices will be +positive, by casting the variables to unsigned integer types (if you do have +negative values, then this casting will create a very large positive value +instead and you will attempt to access out-of-bounds values). Casting is done +with a special ``<>``-syntax.
The code below is changed to use either +unsigned ints or casting as appropriate:: + + ... + cdef int s, t # changed + cdef unsigned int x, y, v, w # changed + cdef int s_from, s_to, t_from, t_to + cdef DTYPE_t value + for x in range(xmax): + for y in range(ymax): + s_from = max(smid - x, -smid) + s_to = min((xmax - x) - smid, smid + 1) + t_from = max(tmid - y, -tmid) + t_to = min((ymax - y) - tmid, tmid + 1) + value = 0 + for s in range(s_from, s_to): + for t in range(t_from, t_to): + v = <unsigned int>(x - smid + s) # changed + w = <unsigned int>(y - tmid + t) # changed + value += g[<unsigned int>(smid - s), <unsigned int>(tmid - t)] * f[v, w] # changed + h[x, y] = value + ... + +(In the next Cython release we will likely add a compiler directive or +argument to the ``np.ndarray[]``-type specifier to disable negative indexing +so that casting so much isn't necessary; feedback on this is welcome.) + +The function call overhead now starts to play a role, so we compare the latter +two examples with larger N: + +.. sourcecode:: ipython + + In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g) + 3 loops, best of 100: 5.97 ms per loop + In [12]: N = 1000 + In [13]: f = np.arange(N*N, dtype=np.int).reshape((N,N)) + In [14]: g = np.arange(81, dtype=np.int).reshape((9, 9)) + In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g) + 1 loops, best of 10: 1.16 s per loop + In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g) + 1 loops, best of 10: 597 ms per loop + +(Also this is a mixed benchmark as the result array is allocated within the +function call.) + +.. Warning:: + + Speed comes with some cost. In particular, it can be dangerous to set typed + objects (like ``f``, ``g`` and ``h`` in our sample code) to :keyword:`None`. + Setting such objects to :keyword:`None` is entirely legal, but all you can do with them + is check whether they are None. All other use (attribute lookup or indexing) + can potentially segfault or corrupt data (rather than raising exceptions as + they would in Python).
+ + The actual rules are a bit more complicated but the main message is clear: Do + not use typed objects without knowing that they are not set to None. + +More generic code +================== + +It would be possible to do:: + + def naive_convolve(object[DTYPE_t, ndim=2] f, ...): + +i.e. use :obj:`object` rather than :obj:`np.ndarray`. Under Python 3.0 this +can allow your algorithm to work with any libraries supporting the buffer +interface; and support for e.g. the Python Imaging Library may easily be added +if someone is interested also under Python 2.x. + +There is some speed penalty to this though (as one makes more assumptions +compile-time if the type is set to :obj:`np.ndarray`, specifically it is +assumed that the data is stored in pure strided mode and not in indirect +mode). + +[:enhancements/buffer:More information] + +The future +============ + +These are some points to consider for further development. All points listed +here have gone through a lot of thinking and planning already; still they may +or may not happen depending on available developer time and resources for +Cython. + +1. Support for efficient access to structs/records stored in arrays; currently + only primitive types are allowed. +2. Support for efficient access to complex floating point types in arrays. The + main obstacle here is getting support for efficient complex datatypes in + Cython. +3. Calling NumPy/SciPy functions currently has a Python call overhead; it + would be possible to take a short-cut from Cython directly to C. (This does + however require some isolated and incremental changes to those libraries; + mail the Cython mailing list for details). +4. Efficient code that is generic with respect to the number of dimensions. + This can probably be done today by calling the NumPy C multi-dimensional + iterator API directly; however it would be nice to have for-loops over + :func:`enumerate` and :func:`ndenumerate` on NumPy arrays create efficient + code. +5.
A high-level construct for writing type-generic code, so that one can write + functions that work simultaneously with many datatypes. Note however that a + macro preprocessor language can help with doing this for now. + diff --git a/src/userguide/overview.rst b/src/userguide/overview.rst new file mode 100644 index 00000000..6e2a6dbf --- /dev/null +++ b/src/userguide/overview.rst @@ -0,0 +1,32 @@ +.. highlight:: cython + +.. _overview: + +******** +Overview +******** + +About Cython +============== + +Cython is a language that makes writing C extensions for the Python language +as easy as Python itself. Cython is based on the well-known `Pyrex +`_ language by Greg Ewing, +but supports more cutting edge functionality and optimizations [#]_. +The Cython language is very close to the Python language, but Cython +additionally supports calling C functions and declaring C types on variables +and class attributes. This allows the compiler to generate very efficient C +code from Cython code. + +This makes Cython the ideal language for wrapping external C libraries, +and for fast C modules that speed up the execution of Python code. + +Future Plans +============ +Cython is not finished. Substantial tasks remain. See +:ref:`cython-limitations` for a current list. + +.. rubric:: Footnotes + +.. [#] For differences with Pyrex see :ref:`pyrex-differences`. + diff --git a/src/userguide/profiling_tutorial.rst b/src/userguide/profiling_tutorial.rst new file mode 100644 index 00000000..244ba2fd --- /dev/null +++ b/src/userguide/profiling_tutorial.rst @@ -0,0 +1,300 @@ +.. highlight:: cython + +.. _profiling: + +********* +Profiling +********* + +This part describes the profiling abilities of Cython. If you are familiar +with profiling pure Python code, you need only read the first section +(:ref:`profiling_basics`). If you are not familiar with Python profiling you +should also read the tutorial (:ref:`profiling_tutorial`) which takes you +through a complete example step by step.
+ +.. _profiling_basics: + +Cython Profiling Basics +======================= + +Profiling in Cython is controlled by a compiler directive. +It can be set either for an entire file or on a per-function basis +via a Cython decorator. + +Enable profiling for a complete source file +------------------------------------------- + +Profiling is enabled for a complete source file via a global directive to the +Cython compiler at the top of a file:: + + # cython: profile=True + +Note that profiling adds a slight overhead to each function call, therefore making +your program a little slower (or a lot, if you call some small functions very +often). + +Once enabled, your Cython code will behave just like Python code when called +from the cProfile module. This means you can just profile your Cython code +together with your Python code using the same tools as for Python code alone. + +Disabling profiling function wise +------------------------------------------ + +If your profiling is messed up because of the call overhead to some small +functions that you would rather not see in your profile - either because +you plan to inline them anyway or because you are sure that you can't make them +any faster - you can use a special decorator to disable profiling for one +function only:: + + cimport cython + + @cython.profile(False) + def my_often_called_function(): + pass + + +.. _profiling_tutorial: + +Profiling Tutorial +================== + +This will be a complete tutorial, start to finish, of profiling Python code, +turning it into Cython code and continuing to profile until it is fast enough. + +As a toy example, we would like to evaluate the summation of the reciprocals of +squares up to a certain integer :math:`n` for evaluating :math:`\pi`. The +relation we want to use was proven by Euler in 1735 and is known as the +`Basel problem `_. + + +..
math:: + \pi^2 = 6 \sum_{k=1}^{\infty} \frac{1}{k^2} = + 6 \lim_{k \to \infty} \big( \frac{1}{1^2} + + \frac{1}{2^2} + \dots + \frac{1}{k^2} \big) \approx + 6 \big( \frac{1}{1^2} + \frac{1}{2^2} + \dots + \frac{1}{n^2} \big) + +A simple python code for evaluating the truncated sum looks like this:: + + #!/usr/bin/env python + # encoding: utf-8 + # filename: calc_pi.py + + def recip_square(i): + return 1./i**2 + + def approx_pi(n=10000000): + val = 0. + for k in range(1,n+1): + val += recip_square(k) + return (6 * val)**.5 + +On my box, this needs approximately 4 seconds to run the function with the +default n. The higher we choose n, the better will be the approximation for +:math:`\pi`. An experienced python programmer will already see plenty of +places to optimize this code. But remember the golden rule of optimization: +Never optimize without having profiled. Let me repeat this: **Never** optimize +without having profiled your code. Your thoughts about which part of your +code takes too much time are wrong. At least, mine are always wrong. So let's +write a short script to profile our code:: + + #!/usr/bin/env python + # encoding: utf-8 + # filename: profile.py + + import pstats, cProfile + + import calc_pi + + cProfile.runctx("calc_pi.approx_pi()", globals(), locals(), "Profile.prof") + + s = pstats.Stats("Profile.prof") + s.strip_dirs().sort_stats("time").print_stats() + +Running this on my box gives the following output:: + + TODO: how to display this not as code but verbatimly? 
+ + Sat Nov 7 17:40:54 2009 Profile.prof + + 10000004 function calls in 6.211 CPU seconds + + Ordered by: internal time + + ncalls tottime percall cumtime percall filename:lineno(function) + 1 3.243 3.243 6.211 6.211 calc_pi.py:7(approx_pi) + 10000000 2.526 0.000 2.526 0.000 calc_pi.py:4(recip_square) + 1 0.442 0.442 0.442 0.442 {range} + 1 0.000 0.000 6.211 6.211 <string>:1(<module>) + 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} + +This contains the information that the code runs in 6.2 CPU seconds. Note that +the code got slower by 2 seconds because it ran inside the cProfile module. The +table contains the really valuable information. You might want to check the +python `profiling documentation `_ +for the nitty gritty details. The most important columns here are tottime (total +time spent in this function **not** counting functions that were called by this +function) and cumtime (total time spent in this function **also** counting the +functions called by this function). Looking at the tottime column, we see that +approximately half the time is spent in approx_pi and the other half is spent +in recip_square. Also half a second is spent in range ... of course we should +have used xrange for such a big iteration. And in fact, just changing range to +xrange makes the code run in 5.8 seconds. + +We could optimize a lot in the pure python version, but since we are interested +in Cython, let's move forward and bring this module to Cython. We would do this +anyway at some time to get the loop to run faster. Here is our first Cython version:: + + # encoding: utf-8 + # cython: profile=True + # filename: calc_pi.pyx + + def recip_square(int i): + return 1./i**2 + + def approx_pi(int n=10000000): + cdef double val = 0. + cdef int k + for k in xrange(1,n+1): + val += recip_square(k) + return (6 * val)**.5 + +Note the second line: We have to tell Cython that profiling should be enabled.
+This makes the Cython code slightly slower, but without this we would not get +meaningful output from the cProfile module. The rest of the code is mostly +unchanged, I only typed some variables which will likely speed things up a bit. + +We also need to modify our profiling script to import the Cython module directly. +Here is the complete version adding the import of the pyximport module:: + + #!/usr/bin/env python + # encoding: utf-8 + # filename: profile.py + + import pstats, cProfile + + import pyximport + pyximport.install() + + import calc_pi + + cProfile.runctx("calc_pi.approx_pi()", globals(), locals(), "Profile.prof") + + s = pstats.Stats("Profile.prof") + s.strip_dirs().sort_stats("time").print_stats() + +We only added two lines, the rest stays completely the same. Alternatively, we could also +manually compile our code into an extension; we wouldn't need to change the +profile script then at all. The script now outputs the following:: + + Sat Nov 7 18:02:33 2009 Profile.prof + + 10000004 function calls in 4.406 CPU seconds + + Ordered by: internal time + + ncalls tottime percall cumtime percall filename:lineno(function) + 1 3.305 3.305 4.406 4.406 calc_pi.pyx:7(approx_pi) + 10000000 1.101 0.000 1.101 0.000 calc_pi.pyx:4(recip_square) + 1 0.000 0.000 4.406 4.406 {calc_pi.approx_pi} + 1 0.000 0.000 4.406 4.406 <string>:1(<module>) + 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} + +We gained 1.8 seconds. Not too shabby. Comparing the output to the previous, we +see that the recip_square function got faster while the approx_pi function has not +changed a lot. Let's concentrate on the recip_square function a bit more. First +note that this function is not to be called from code outside of our module; +so it would be wise to turn it into a cdef to reduce call overhead. We should +also get rid of the power operator: it is turned into a pow(i,2) function call by +Cython, but we could instead just write i*i which could be faster.
The +whole function is also a good candidate for inlining. Let's look at the +necessary changes for these ideas:: + + # encoding: utf-8 + # cython: profile=True + # filename: calc_pi.pyx + + cdef inline double recip_square(int i): + return 1./(i*i) + + def approx_pi(int n=10000000): + cdef double val = 0. + cdef int k + for k in xrange(1,n+1): + val += recip_square(k) + return (6 * val)**.5 + +Now running the profile script yields:: + + Sat Nov 7 18:10:11 2009 Profile.prof + + 10000004 function calls in 2.622 CPU seconds + + Ordered by: internal time + + ncalls tottime percall cumtime percall filename:lineno(function) + 1 1.782 1.782 2.622 2.622 calc_pi.pyx:7(approx_pi) + 10000000 0.840 0.000 0.840 0.000 calc_pi.pyx:4(recip_square) + 1 0.000 0.000 2.622 2.622 {calc_pi.approx_pi} + 1 0.000 0.000 2.622 2.622 <string>:1(<module>) + 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} + +That bought us another 1.8 seconds. Not the dramatic change we could have +expected. And why is recip_square still in this table; it is supposed to be +inlined, isn't it? The reason for this is that Cython still generates profiling code +even if the function call is eliminated. Let's tell it to not +profile recip_square any more; we couldn't get the function to be much faster anyway:: + + # encoding: utf-8 + # cython: profile=True + # filename: calc_pi.pyx + + cimport cython + + @cython.profile(False) + cdef inline double recip_square(int i): + return 1./(i*i) + + def approx_pi(int n=10000000): + cdef double val = 0.
+ cdef int k + for k in xrange(1,n+1): + val += recip_square(k) + return (6 * val)**.5 + +Running this shows an interesting result:: + + Sat Nov 7 18:15:02 2009 Profile.prof + + 4 function calls in 0.089 CPU seconds + + Ordered by: internal time + + ncalls tottime percall cumtime percall filename:lineno(function) + 1 0.089 0.089 0.089 0.089 calc_pi.pyx:10(approx_pi) + 1 0.000 0.000 0.089 0.089 {calc_pi.approx_pi} + 1 0.000 0.000 0.089 0.089 <string>:1(<module>) + 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} + +First note the tremendous speed gain: this version only takes 1/50 of the time +of our first Cython version. Also note that recip_square has vanished from the +table like we wanted. But the most peculiar and important change is that +approx_pi also got much faster. This is a problem with all profiling: calling a +function in a profile run adds a certain overhead to the function call. This +overhead is **not** added to the time spent in the called function, but to the +time spent in the **calling** function. In this example, approx_pi didn't need 2.622 +seconds in the last run; but it called recip_square 10000000 times, each time taking a +little to set up profiling for it. This adds up to the massive time loss of +around 2.6 seconds. Having disabled profiling for the often called function now +reveals realistic timings for approx_pi; we could continue optimizing it now if +needed. + +This concludes this profiling tutorial. There is still some room for +improvement in this code. We could try to replace the power operator in +approx_pi with a call to sqrt from the C stdlib; but this is not necessarily +faster than calling pow(x,0.5). + +Even so, the result we achieved here is quite satisfactory: we came up with a +solution that is much faster than our original Python version while retaining +functionality and readability. 
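For comparison, the pure-Python baseline can be profiled in much the same way. The following is only a sketch, assuming Python 3 (hence ``range`` rather than ``xrange``). The helper name ``profile_approx_pi`` is made up for illustration and is not part of the tutorial's files. It keeps the statistics in memory through ``cProfile.Profile.runcall`` instead of writing ``Profile.prof`` to disk:

```python
import cProfile
import io
import math
import pstats


def recip_square(i):
    # Pure-Python counterpart of the cdef function used in the tutorial.
    return 1.0 / (i * i)


def approx_pi(n=100000):
    # Sum 1/k**2 for k = 1..n; the sum converges to pi**2 / 6.
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** 0.5


def profile_approx_pi(n=100000):
    """Hypothetical helper: profile approx_pi(n), return (result, report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(approx_pi, n)

    # Render the statistics into a string instead of printing to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats("time").print_stats()
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_approx_pi()
    print("approximation:", value, "math.pi:", math.pi)
    print(report)
```

The report lists both ``approx_pi`` and ``recip_square``. As discussed above, part of the time attributed to ``approx_pi`` is really profiler overhead for each call to ``recip_square``, so the numbers are best read relative to each other.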
+ + diff --git a/src/userguide/pxd_package.rst b/src/userguide/pxd_package.rst new file mode 100644 index 00000000..7f424b65 --- /dev/null +++ b/src/userguide/pxd_package.rst @@ -0,0 +1,49 @@ +I think this is a result of a recent change to Pyrex that +has been merged into Cython. + +If a directory contains an :file:`__init__.py` or :file:`__init__.pyx` file, +it's now assumed to be a package directory. So, for example, +if you have a directory structure:: + + foo/ + __init__.py + shrubbing.pxd + shrubbing.pyx + +then the shrubbing module is assumed to belong to a package +called 'foo', and its fully qualified module name is +'foo.shrubbing'. + +So when Pyrex wants to find out whether there is a ``.pxd`` file for shrubbing, +it looks for one corresponding to a module called :mod:`foo.shrubbing`. It +does this by searching the include path for a top-level package directory +called 'foo' containing a file called 'shrubbing.pxd'. + +However, if foo is the current directory you're running +the compiler from, and you haven't added foo to the +include path using a -I option, then it won't be on +the include path, and the ``.pxd`` won't be found. + +What to do about this depends on whether you really +intend the module to reside in a package. + +If you intend shrubbing to be a top-level module, you +will have to move it somewhere else where there is +no :file:`__init__.*` file. + +If you do intend it to reside in a package, then there +are two alternatives: + +1. cd to the directory containing foo and compile + from there:: + + cd ..; cython foo/shrubbing.pyx + +2. arrange for the directory containing foo to be + passed as a -I option, e.g.:: + + cython -I .. shrubbing.pyx + +Arguably this behaviour is not very desirable, and I'll +see if I can do something about it. 
+ diff --git a/src/userguide/pyrex_differences.rst b/src/userguide/pyrex_differences.rst new file mode 100644 index 00000000..bf7cd03d --- /dev/null +++ b/src/userguide/pyrex_differences.rst @@ -0,0 +1,342 @@ +.. highlight:: cython + +.. _pyrex-differences: + +************************************** +Differences between Cython and Pyrex +************************************** + +.. warning:: + Both Cython and Pyrex are moving targets. It has come to the point + that an explicit list of all the differences between the two + projects would be laborious to list and track, but hopefully + this high-level list gives an idea of the differences that + are present. It should be noted that both projects make an effort + at mutual compatibility, but Cython's goal is to be as close to, + and as complete as, Python as is reasonable. + + +Python 3.0 Support +================== + +Cython creates ``.c`` files that can be built and used with both +Python 2.x and Python 3.x. In fact, compiling your module with +Cython may very well be the easiest way to port code to Python 3.0. +We are also working to make the compiler run in both Python 2.x and 3.0. + +Many Python 3 constructs are already supported by Cython. + +List/Set/Dict Comprehensions +---------------------------- + +Cython supports the different comprehensions defined by Python 3.0 for +lists, sets and dicts:: + + [expr(x) for x in A] # list + {expr(x) for x in A} # set + {key(x) : value(x) for x in A} # dict + +Looping is optimized if ``A`` is a list, tuple or dict. You can use +the :keyword:`for` ... :keyword:`from` syntax, too, but it is +generally preferred to use the usual :keyword:`for` ... :keyword:`in` +``range(...)`` syntax with a C run variable (e.g. ``cdef int i``). + +.. note:: see :ref:`automatic-range-conversion` + +Note that Cython also supports set literals starting from Python 2.3. 
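As a small illustration of the three comprehension forms listed above, here is a plain Python 3 sketch. The input list ``A`` and the result names are arbitrary choices for this example, and nothing in it is specific to Cython:

```python
# Arbitrary sample input containing a duplicate value.
A = [3, 1, 2, 3]

# List comprehension: keeps order and duplicates.
squares_list = [x * x for x in A]

# Set comprehension: duplicates collapse into one element.
squares_set = {x * x for x in A}

# Dict comprehension: one entry per distinct key.
squares_dict = {x: x * x for x in A}

print(squares_list)
print(sorted(squares_set))
print(squares_dict)
```

The same expressions are accepted in a ``.pyx`` file, where the loop optimisations described above may additionally apply.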
+ +Keyword-only arguments +---------------------- + +Python functions can have keyword-only arguments listed after the ``*`` +parameter and before the ``**`` parameter if any, e.g.:: + + def f(a, b, *args, c, d = 42, e, **kwds): + ... + +Here ``c``, ``d`` and ``e`` cannot be passed as positional arguments and must be +passed as keyword arguments. Furthermore, ``c`` and ``e`` are required keyword +arguments, since they do not have a default value. + +If the parameter name after the ``*`` is omitted, the function will not accept any +extra positional arguments, e.g.:: + + def g(a, b, *, c, d): + ... + +takes exactly two positional parameters and has two required keyword parameters. + + + +Conditional expressions "x if b else y" (python 2.5) +===================================================== + +Conditional expressions as described in +http://www.python.org/dev/peps/pep-0308/:: + + X if C else Y + +Only one of ``X`` and ``Y`` is evaluated (depending on the value of ``C``). + +cdef inline +============= + +Module level functions can now be declared inline, with the :keyword:`inline` +keyword passed on to the C compiler. These can be as fast as macros:: + + cdef inline int something_fast(int a, int b): + return a*a + b + +Note that class-level :keyword:`cdef` functions are handled via a virtual +function table, so the compiler won't be able to inline them in almost all +cases. + +Assignment on declaration (e.g. "cdef int spam = 5") +====================================================== + +In Pyrex, one must write:: + + cdef int i, j, k + i = 2 + j = 5 + k = 7 + +Now, with Cython, one can write:: + + cdef int i = 2, j = 5, k = 7 + +The expression on the right hand side can be arbitrarily complicated, e.g.:: + + cdef int n = python_call(foo(x,y), a + b + c) - 32 + + +'by' expression in for loop (e.g. 
"for i from 0 <= i < 10 by 2") +================================================================== + +:: + + for i from 0 <= i < 10 by 2: + print i + + +yields:: + + 0 + 2 + 4 + 6 + 8 + +.. note:: see :ref:`automatic-range-conversion` + + +Boolean int type (e.g. it acts like a c int, but coerces to/from python as a boolean) +====================================================================================== + +In C, ints are used for truth values. In Python, any object can be used as a +truth value (using the :meth:`__nonzero__` method), but the canonical choices +are the two boolean objects ``True`` and ``False``. The :keyword:`bint` (or +"boolean int") type is compiled to a C int, but gets coerced to and from +Python as a boolean. The return type of comparisons and several builtins is a +:ctype:`bint` as well. This allows one to avoid having to wrap things in +:func:`bool()`. For example, one can write:: + + def is_equal(x, y): + return x == y + +which would return ``1`` or ``0`` in Pyrex, but returns ``True`` or ``False`` in +Cython. One can declare variables and return values for functions to be of the +:ctype:`bint` type. For example:: + + cdef int i = x + cdef bint b = x + +The first conversion would happen via ``x.__int__()`` whereas the second would +happen via ``x.__nonzero__()``. (Actually, if ``x`` is the Python object +``True`` or ``False`` then no method call is made.) + +Executable class bodies +======================= + +Including a working :func:`classmethod`:: + + cdef class Blah: + def some_method(self): + print self + some_method = classmethod(some_method) + a = 2*3 + print "hi", a + +cpdef functions +================= + +Cython adds a third function type on top of the usual :keyword:`def` and +:keyword:`cdef`. If a function is declared :keyword:`cpdef` it can be called +from and overridden by both extension and normal Python subclasses. You can +essentially think of a :keyword:`cpdef` method as a :keyword:`cdef` method + +some extras. 
(That's how it's implemented at least.) First, it creates a +:keyword:`def` method that does nothing but call the underlying +:keyword:`cdef` method (and does argument unpacking/coercion if needed). At +the top of the :keyword:`cdef` method a little bit of code is added to check +to see if it's overridden. Specifically, in pseudocode:: + + if type(self) has a __dict__: + foo = self.getattr('foo') + if foo is not wrapper_foo: + return foo(args) + [cdef method body] + +To detect whether or not a type has a dictionary, it just checks the +tp_dictoffset slot, which is ``NULL`` (by default) for extension types, but +non-null for instance classes. If the dictionary exists, it does a single +attribute lookup and can tell (by comparing pointers) whether or not the +returned result is actually a new function. If, and only if, it is a new +function, then the arguments are packed into a tuple and the method is called. This +is all very fast. A flag is set so this lookup does not occur if one calls the +method on the class directly, e.g.:: + + cdef class A: + cpdef foo(self): + pass + + x = A() + x.foo() # will check to see if overridden + A.foo(x) # will call A's implementation whether overridden or not + +See :ref:`early-binding-for-speed` for explanation and usage tips. + +.. _automatic-range-conversion: + +Automatic range conversion +============================ + +This will convert statements of the form ``for i in range(...)`` to ``for i +from ...`` when ``i`` is any cdef'd integer type, and the direction (i.e. sign +of step) can be determined. + +.. warning:: + + This may change the semantics if the range causes + assignment to ``i`` to overflow. Specifically, if this option is set, an error + will be raised before the loop is entered, whereas without this option the loop + will execute until an overflowing value is encountered. If this affects you, + change ``Cython/Compiler/Options.py`` (eventually there will be a better + way to set this). 
+ +More friendly type casting +=========================== + +In Pyrex, if one types ``<int>x`` where ``x`` is a Python object, one will get +the memory address of ``x``. Likewise, if one types ``<object>i`` where ``i`` +is a C int, one will get an "object" at location ``i`` in memory. This leads +to confusing results and segfaults. + +In Cython ``<type>x`` will try and do a coercion (as would happen on assignment of +``x`` to a variable of type type) if exactly one of the types is a Python object. +It does not stop one from casting where there is no conversion (though it will +emit a warning). If one really wants the address, cast to a ``void *`` first. + +As in Pyrex ``<MyExtensionType>x`` will cast ``x`` to type :ctype:`MyExtensionType` without any +type checking. Cython supports the syntax ``<MyExtensionType?>`` to do the cast +with type checking (i.e. it will throw an error if ``x`` is not a (subclass of) +:ctype:`MyExtensionType`). + +Optional arguments in cdef/cpdef functions +============================================ + +Cython now supports optional arguments for :keyword:`cdef` and +:keyword:`cpdef` functions. + +The syntax in the ``.pyx`` file remains as in Python, but one declares such +functions in the ``.pxd`` file by writing ``cdef foo(x=*)``. The number of +arguments may increase on subclassing, but the argument types and order must +remain the same. There is a slight performance penalty in some cases when a +cdef/cpdef function without any optional arguments is overridden with one that does have +default argument values. + +For example, one can have the ``.pxd`` file:: + + cdef class A: + cdef foo(self) + cdef class B(A): + cdef foo(self, x=*) + cdef class C(B): + cpdef foo(self, x=*, int k=*) + +with corresponding ``.pyx`` file:: + + cdef class A: + cdef foo(self): + print "A" + cdef class B(A): + cdef foo(self, x=None): + print "B", x + cdef class C(B): + cpdef foo(self, x=True, int k=3): + print "C", x, k + +.. 
note:: + + This also demonstrates how :keyword:`cpdef` functions can override + :keyword:`cdef` functions. + +Function pointers in structs +============================= + +Functions declared in :keyword:`structs` are automatically converted to +function pointers for convenience. + +C++ Exception handling +========================= + +:keyword:`cdef` functions can now be declared as:: + + cdef int foo(...) except + + cdef int foo(...) except +TypeError + cdef int foo(...) except +python_error_raising_function + +in which case a Python exception will be raised when a C++ error is caught. +See :ref:`wrapping-cplusplus` for more details. + +Synonyms +========= + +``cdef import from`` means the same thing as ``cdef extern from``. + +Source code encoding +====================== + +.. TODO: add the links to the relevant PEPs + +Cython supports PEP 3120 and PEP 263, i.e. you can start your Cython source +file with an encoding comment and generally write your source code in UTF-8. +This impacts the encoding of byte strings and the conversion of unicode string +literals like ``u'abcd'`` to unicode objects. + +Automatic ``typecheck`` +======================== + +Rather than introducing a new keyword :keyword:`typecheck` as explained in the +`Pyrex docs +`_, +Cython emits a (non-spoofable and faster) typecheck whenever +:func:`isinstance` is used with an extension type as the second parameter. + +From __future__ directives +========================== + +Cython supports several ``from __future__`` directives, namely ``unicode_literals`` and ``division``. + +With statements are always enabled. + +Pure Python mode +================ + +Cython has support for compiling ``.py`` files, and +accepting type annotations using decorators and other +valid Python syntax. This allows the same source to +be interpreted as straight Python, or compiled for +optimized results. +See http://wiki.cython.org/pure +for more details. 
+ diff --git a/src/userguide/sharing_declarations.rst b/src/userguide/sharing_declarations.rst new file mode 100644 index 00000000..ac01815b --- /dev/null +++ b/src/userguide/sharing_declarations.rst @@ -0,0 +1,241 @@ +.. highlight:: cython + +.. _sharing-declarations: + +******************************************** +Sharing Declarations Between Cython Modules +******************************************** + +This section describes a new set of facilities for making C declarations, +functions and extension types in one Cython module available for use in +another Cython module. These facilities are closely modelled on the Python +import mechanism, and can be thought of as a compile-time version of it. + +Definition and Implementation files +==================================== + +A Cython module can be split into two parts: a definition file with a ``.pxd`` +suffix, containing C declarations that are to be available to other Cython +modules, and an implementation file with a ``.pyx`` suffix, containing +everything else. When a module wants to use something declared in another +module's definition file, it imports it using the :keyword:`cimport` +statement. + +A ``.pxd`` file that consists solely of extern declarations does not need +to correspond to an actual ``.pyx`` file or Python module. This can make it a +convenient place to put common declarations, for example declarations of +functions from an :ref:`external library ` that one wants to use in several modules. + +What a Definition File contains +================================ + +A definition file can contain: + +* Any kind of C type declaration. +* extern C function or variable declarations. +* Declarations of C functions defined in the module. +* The definition part of an extension type (see below). + +It cannot contain any non-extern C variable declarations. + +It cannot contain the implementations of any C or Python functions, or any +Python class definitions, or any executable statements. 
It is needed when one +wants to access :keyword:`cdef` attributes and methods, or to inherit from +:keyword:`cdef` classes defined in this module. + +.. note:: + + You don't need to (and shouldn't) declare anything in a definition file + public in order to make it available to other Cython modules; its mere + presence in a definition file does that. You only need a public + declaration if you want to make something available to external C code. + +What an Implementation File contains +====================================== + +An implementation file can contain any kind of Cython statement, although there +are some restrictions on the implementation part of an extension type if the +corresponding definition file also defines that type (see below). +If one doesn't need to :keyword:`cimport` anything from this module, then this +is the only file one needs. + +The cimport statement +======================= + +The :keyword:`cimport` statement is used in a definition or +implementation file to gain access to names declared in another definition +file. Its syntax exactly parallels that of the normal Python import +statement:: + + cimport module [, module...] + + from module cimport name [as name] [, name [as name] ...] + +Here is an example. The first file below is a definition file which exports a +C data type. The second file is an implementation file which imports and +uses it. + +:file:`dishes.pxd`:: + + cdef enum otherstuff: + sausage, eggs, lettuce + + cdef struct spamdish: + int oz_of_spam + otherstuff filler + +:file:`restaurant.pyx`:: + + cimport dishes + from dishes cimport spamdish + + cdef void prepare(spamdish *d): + d.oz_of_spam = 42 + d.filler = dishes.sausage + + def serve(): + cdef spamdish d + prepare(&d) + print "%d oz spam, filler no. %d" % (d.oz_of_spam, d.filler) + +It is important to understand that the :keyword:`cimport` statement can only +be used to import C data types, C functions and variables, and extension +types. 
It cannot be used to import any Python objects, and (with one +exception) it doesn't imply any Python import at run time. If you want to +refer to any Python names from a module that you have cimported, you will have +to include a regular import statement for it as well. + +The exception is that when you use :keyword:`cimport` to import an extension type, its +type object is imported at run time and made available by the name under which +you imported it. Using :keyword:`cimport` to import extension types is covered in more +detail below. + +If a ``.pxd`` file changes, any modules that :keyword:`cimport` from it may need to be +recompiled. + +Search paths for definition files +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When you :keyword:`cimport` a module called ``modulename``, the Cython +compiler searches for a file called :file:`modulename.pxd` along the search +path for include files, as specified by ``-I`` command line options. + +Also, whenever you compile a file :file:`modulename.pyx`, the corresponding +definition file :file:`modulename.pxd` is first searched for along the same +path, and if found, it is processed before processing the ``.pyx`` file. + +Using cimport to resolve naming conflicts +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The :keyword:`cimport` mechanism provides a clean and simple way to solve the +problem of wrapping external C functions with Python functions of the same +name. All you need to do is put the extern C declarations into a ``.pxd`` file +for an imaginary module, and :keyword:`cimport` that module. You can then +refer to the C functions by qualifying them with the name of the module. +Here's an example: + +:file:`c_lunch.pxd` :: + + cdef extern from "lunch.h": + void eject_tomato(float) + +:file:`lunch.pyx` :: + + cimport c_lunch + + def eject_tomato(float speed): + c_lunch.eject_tomato(speed) + +You don't need any :file:`c_lunch.pyx` file, because the only things defined +in :file:`c_lunch.pxd` are extern C entities. 
There won't be any actual +``c_lunch`` module at run time, but that doesn't matter; the +:file:`c_lunch.pxd` file has done its job of providing an additional namespace +at compile time. + +Sharing C Functions +=================== + +C functions defined at the top level of a module can be made available via +:keyword:`cimport` by putting headers for them in the ``.pxd`` file, for +example: + +:file:`volume.pxd`:: + + cdef float cube(float) + +:file:`spammery.pyx`:: + + from volume cimport cube + + def menu(description, size): + print description, ":", cube(size), \ + "cubic metres of spam" + + menu("Entree", 1) + menu("Main course", 3) + menu("Dessert", 2) + +:file:`volume.pyx`:: + + cdef float cube(float x): + return x * x * x + +.. note:: + + When a module exports a C function in this way, an object appears in the + module dictionary under the function's name. However, you can't make use of + this object from Python, nor can you use it from Cython using a normal import + statement; you have to use :keyword:`cimport`. + +Sharing Extension Types +======================= + +An extension type can be made available via :keyword:`cimport` by splitting +its definition into two parts, one in a definition file and the other in the +corresponding implementation file. + +The definition part of the extension type can only declare C attributes and C +methods, not Python methods, and it must declare all of that type's C +attributes and C methods. + +The implementation part must implement all of the C methods declared in the +definition part, and may not add any further C attributes. It may also define +Python methods. 
+ +Here is an example of a module which defines and exports an extension type, +and another module which uses it:: + + # Shrubbing.pxd + cdef class Shrubbery: + cdef int width + cdef int length + + # Shrubbing.pyx + cdef class Shrubbery: + def __cinit__(self, int w, int l): + self.width = w + self.length = l + + def standard_shrubbery(): + return Shrubbery(3, 7) + + + # Landscaping.pyx + cimport Shrubbing + import Shrubbing + + cdef Shrubbing.Shrubbery sh + sh = Shrubbing.standard_shrubbery() + print "Shrubbery size is %d x %d" % (sh.width, sh.length) + +Some things to note about this example: + +* There is a :keyword:`cdef` class Shrubbery declaration in both + :file:`Shrubbing.pxd` and :file:`Shrubbing.pyx`. When the Shrubbing module + is compiled, these two declarations are combined into one. +* In Landscaping.pyx, the :keyword:`cimport` Shrubbing declaration allows us + to refer to the Shrubbery type as :class:`Shrubbing.Shrubbery`. But it + doesn't bind the name Shrubbing in Landscaping's module namespace at run + time, so to access :func:`Shrubbing.standard_shrubbery` we also need to + ``import Shrubbing``. + diff --git a/src/userguide/source_files_and_compilation.rst b/src/userguide/source_files_and_compilation.rst new file mode 100644 index 00000000..32b7df63 --- /dev/null +++ b/src/userguide/source_files_and_compilation.rst @@ -0,0 +1,198 @@ +.. highlight:: cython + +.. _compilation: + +**************************** +Source Files and Compilation +**************************** + +Cython source file names consist of the name of the module followed by a +``.pyx`` extension, for example a module called primes would have a source +file named :file:`primes.pyx`. + +Once you have written your ``.pyx`` file, there are a couple of ways of turning it +into an extension module. One way is to compile it manually with the Cython +compiler, e.g.: + +.. 
sourcecode:: text + + $ cython primes.pyx + +This will produce a file called :file:`primes.c`, which then needs to be +compiled with the C compiler using whatever options are appropriate on your +platform for generating an extension module. For these options look at the +official Python documentation. + +The other, and probably better, way is to use the :mod:`distutils` extension +provided with Cython. The benefit of this method is that it will give the +platform-specific compilation options, acting like a stripped down autotools. + +Basic setup.py +=============== +The distutils extension provided with Cython allows you to pass ``.pyx`` files +directly to the ``Extension`` constructor in your setup file. + +If you have a single Cython file that you want to turn into a compiled +extension, say with filename :file:`example.pyx`, the associated :file:`setup.py` +would be:: + + from distutils.core import setup + from distutils.extension import Extension + from Cython.Distutils import build_ext + + setup( + cmdclass = {'build_ext': build_ext}, + ext_modules = [Extension("example", ["example.pyx"])] + ) + +To understand the :file:`setup.py` more fully look at the official +:mod:`distutils` documentation. To compile the extension for use in the +current directory use: + +.. 
sourcecode:: text + + $ python setup.py build_ext --inplace + +Cython Files Depending on C Files +=================================== + +When you have some C files that have been wrapped with Cython and you want to +compile them into your extension the basic :file:`setup.py` file to do this +would be:: + + from distutils.core import setup + from distutils.extension import Extension + from Cython.Distutils import build_ext + + sourcefiles = ['example.pyx', 'helper.c', 'another_helper.c'] + + setup( + cmdclass = {'build_ext': build_ext}, + ext_modules = [Extension("example", sourcefiles)] + ) + +Notice that the files have been given a name; this is not necessary, but it +makes the file easier to format if the list gets long. + +The :class:`Extension` class takes many options, and a fuller explanation can +be found in the `distutils documentation`_. Some useful options to know about +are ``include_dirs``, ``libraries``, and ``library_dirs`` which specify where +to find the ``.h`` and library files when linking to external libraries. + +.. _distutils documentation: http://docs.python.org/extending/building.html + + + +Multiple Cython Files in a Package +=================================== + +TODO + +Distributing Cython modules +============================ +It is strongly recommended that you distribute the generated ``.c`` files as well +as your Cython sources, so that users can install your module without needing +to have Cython available. + +It is also recommended that Cython compilation not be enabled by default in the +version you distribute. Even if the user has Cython installed, they probably +don't want to use it just to install your module. Also, the version they have +may not be the same one you used, and may not compile your sources correctly. 
+ +This simply means that the :file:`setup.py` file that you ship will just +be a normal distutils file operating on the generated ``.c`` files; for the basic example +we would have instead:: + + from distutils.core import setup + from distutils.extension import Extension + + setup( + ext_modules = [Extension("example", ["example.c"])] + ) + + +.. _pyximport: + +Pyximport +=========== + +.. TODO add some text about how this is Paul Prescod's code. Also change the + tone to be more universal (i.e. remove all the I statements) + +Cython is a compiler. Therefore it is natural that people tend to go +through an edit/compile/test cycle with Cython modules. But my personal +opinion is that one of the deep insights in Python's implementation is +that a language can be compiled (Python modules are compiled to ``.pyc`` +files) and hide that compilation process from the end-user so that they +do not have to worry about it. Pyximport does this for Cython modules. +For instance if you write a Cython module called :file:`foo.pyx`, with +Pyximport you can import it in a regular Python module like this:: + + + import pyximport; pyximport.install() + import foo + +Doing so will result in the compilation of :file:`foo.pyx` (with appropriate +exceptions if it has an error in it). + +If you would always like to import Cython files without building them +specially, you can also add the first line above to your :file:`sitecustomize.py`. +That will install the hook every time you run Python. Then you can use +Cython modules just with simple import statements. I like to test my +Cython modules like this: + +.. sourcecode:: text + + $ python -c "import foo" + +Dependency Handling +-------------------- + +In Pyximport 1.1 it is possible to declare that your module depends on +multiple files (likely ``.h`` and ``.pxd`` files). If your Cython module is +named ``foo`` and thus has the filename :file:`foo.pyx` then you should make +another file in the same directory called :file:`foo.pyxdep`. 
The +:file:`modname.pyxdep` file can be a list of filenames or "globs" (like +``*.pxd`` or ``include/*.h``). Each filename or glob must be on a separate +line. Pyximport will check the file date for each of those files before +deciding whether to rebuild the module. In order to keep track of the +fact that the dependency has been handled, Pyximport updates the +modification time of your ``.pyx`` source file. Future versions may do +something more sophisticated like informing distutils of the +dependencies directly. + +Limitations +------------ + +Pyximport does not give you any control over how your Cython file is +compiled. Usually the defaults are fine. You might run into problems if +you wanted to write your program in half-C, half-Cython and build them +into a single library. Pyximport 1.2 will probably do this. + +Pyximport does not hide the Distutils/GCC warnings and errors generated +by the import process. Arguably this will give you better feedback if +something went wrong and why. And if nothing went wrong it will give you +the warm fuzzy that pyximport really did rebuild your module as it was +supposed to. + +For further thought and discussion +------------------------------------ + +I don't think that Python's :func:`reload` will do anything for changed +``.so``'s on some (all?) platforms. It would require some (easy) +experimentation that I haven't gotten around to. But reload is rarely used in +applications outside of the Python interactive interpreter and certainly not +used much for C extension modules. Info about Windows +``_ + +``setup.py install`` does not modify :file:`sitecustomize.py` for you. Should it? +Modifying Python's "standard interpreter" behaviour may be more than +most people expect of a package they install. + +Pyximport puts your ``.c`` file beside your ``.pyx`` file (analogous to +``.pyc`` beside ``.py``). But it puts the platform-specific binary in a +build directory as per normal for Distutils. 
If I could wave a magic +wand and get Cython or distutils or whoever to put the build directory I +might do it but not necessarily: having it at the top level is *VERY* +*HELPFUL* for debugging Cython problems. + diff --git a/src/userguide/special_methods.rst b/src/userguide/special_methods.rst new file mode 100644 index 00000000..2186ad5d --- /dev/null +++ b/src/userguide/special_methods.rst @@ -0,0 +1,361 @@ +.. _special-methods: + +Special Methods of Extension Types +=================================== + +This page describes the special methods currently supported by Cython extension +types. A complete list of all the special methods appears in the table at the +bottom. Some of these methods behave differently from their Python +counterparts or have no direct Python counterparts, and require special +mention. + +.. note:: Everything said on this page applies only to extension types, defined + with the :keyword:`cdef class` statement. It doesn't apply to classes defined with the + Python :keyword:`class` statement, where the normal Python rules apply. + +Declaration +------------ +Special methods of extension types must be declared with :keyword:`def`, not +:keyword:`cdef`. This does not impact their performance--Python uses different +calling conventions to invoke these special methods. + +Docstrings +----------- + +Currently, docstrings are not fully supported in some special methods of extension +types. You can place a docstring in the source to serve as a comment, but it +won't show up in the corresponding :attr:`__doc__` attribute at run time. (This +seems to be a Python limitation -- there's nowhere in the `PyTypeObject` +data structure to put such docstrings.) + +Initialisation methods: :meth:`__cinit__` and :meth:`__init__` +--------------------------------------------------------------- +There are two methods concerned with initialising the object. 
+ +The :meth:`__cinit__` method is where you should perform basic C-level +initialisation of the object, including allocation of any C data structures +that your object will own. You need to be careful what you do in the +:meth:`__cinit__` method, because the object may not yet be a fully valid Python +object when it is called. Therefore, you should be careful invoking any Python +operations which might touch the object; in particular, its methods. + +By the time your :meth:`__cinit__` method is called, memory has been allocated for the +object and any C attributes it has have been initialised to 0 or null. (Any +Python attributes have also been initialised to None, but you probably +shouldn't rely on that.) Your :meth:`__cinit__` method is guaranteed to be called +exactly once. + +If your extension type has a base type, the :meth:`__cinit__` method of the base type +is automatically called before your :meth:`__cinit__` method is called; you cannot +explicitly call the inherited :meth:`__cinit__` method. If you need to pass a modified +argument list to the base type, you will have to do the relevant part of the +initialisation in the :meth:`__init__` method instead (where the normal rules for +calling inherited methods apply). + +Any initialisation which cannot safely be done in the :meth:`__cinit__` method should +be done in the :meth:`__init__` method. By the time :meth:`__init__` is called, the object is +a fully valid Python object and all operations are safe. Under some +circumstances it is possible for :meth:`__init__` to be called more than once or not +to be called at all, so your other methods should be designed to be robust in +such situations. + +Any arguments passed to the constructor will be passed to both the +:meth:`__cinit__` method and the :meth:`__init__` method. 
If you anticipate +subclassing your extension type in Python, you may find it useful to give the +:meth:`__cinit__` method `*` and `**` arguments so that it can accept and +ignore extra arguments. Otherwise, any Python subclass which has an +:meth:`__init__` with a different signature will have to override +:meth:`__new__` [#]_ as well as :meth:`__init__`, which the writer of a Python +class wouldn't expect to have to do. Alternatively, as a convenience, if you declare +your :meth:`__cinit__` method to take no arguments (other than self) it +will simply ignore any extra arguments passed to the constructor without +complaining about the signature mismatch. + +.. note:: Older Cython files may use :meth:`__new__` rather than :meth:`__cinit__`. The two are synonyms. + The name change from :meth:`__new__` to :meth:`__cinit__` was to avoid + confusion with Python :meth:`__new__` (which is an entirely different + concept) and eventually the use of :meth:`__new__` in Cython will be + disallowed to pave the way for supporting Python-style :meth:`__new__`. + +.. [#] http://docs.python.org/reference/datamodel.html#object.__new__ + +Finalization method: :meth:`__dealloc__` +---------------------------------------- + +The counterpart to the :meth:`__cinit__` method is the :meth:`__dealloc__` +method, which should perform the inverse of the :meth:`__cinit__` method. Any +C data that you explicitly allocated (e.g. via malloc) in your +:meth:`__cinit__` method should be freed in your :meth:`__dealloc__` method. + +You need to be careful what you do in a :meth:`__dealloc__` method. By the time your +:meth:`__dealloc__` method is called, the object may already have been partially +destroyed and may not be in a valid state as far as Python is concerned, so +you should avoid invoking any Python operations which might touch the object. +In particular, don't call any other methods of the object or do anything which +might cause the object to be resurrected. 
It's best if you stick to just +deallocating C data. + +You don't need to worry about deallocating Python attributes of your object, +because that will be done for you by Cython after your :meth:`__dealloc__` method +returns. + +.. note:: There is no :meth:`__del__` method for extension types. + +Arithmetic methods +------------------- + +Arithmetic operator methods, such as :meth:`__add__`, behave differently from their +Python counterparts. There are no separate "reversed" versions of these +methods (:meth:`__radd__`, etc.). Instead, if the first operand cannot perform the +operation, the same method of the second operand is called, with the operands +in the same order. + +This means that you can't rely on the first parameter of these methods being +"self" or being the right type, and you should test the types of both operands +before deciding what to do. If you can't handle the combination of types you've +been given, you should return `NotImplemented`. + +This also applies to the in-place arithmetic method :meth:`__ipow__`. It doesn't apply +to any of the other in-place methods (:meth:`__iadd__`, etc.), which always +take `self` as the first argument. + +Rich comparisons +----------------- + +There are no separate methods for the individual rich comparison operations +(:meth:`__eq__`, :meth:`__le__`, etc.). Instead there is a single method +:meth:`__richcmp__` which takes an integer indicating which operation is to be +performed, as follows: + ++-----+-----+ +| < | 0 | ++-----+-----+ +| == | 2 | ++-----+-----+ +| > | 4 | ++-----+-----+ +| <= | 1 | ++-----+-----+ +| != | 3 | ++-----+-----+ +| >= | 5 | ++-----+-----+ + +The :meth:`__next__` method +---------------------------- + +Extension types wishing to implement the iterator interface should define a +method called :meth:`__next__`, not next. The Python system will automatically +supply a next method which calls your :meth:`__next__`. 
Do *NOT* explicitly +give your type a :meth:`next` method, or bad things could happen. + +Special Method Table +--------------------- + +This table lists all of the special methods together with their parameter and +return types. In the table below, a parameter name of self is used to indicate +that the parameter has the type that the method belongs to. Other parameters +with no type specified in the table are generic Python objects. + +You don't have to declare your method as taking these parameter types. If you +declare different types, conversions will be performed as necessary. + +General +^^^^^^^ + ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| Name | Parameters | Return type | Description | ++=======================+=======================================+=============+=====================================================+ +| __cinit__ |self, ... | | Basic initialisation (no direct Python equivalent) | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __init__ |self, ... 
| | Further initialisation | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __dealloc__ |self | | Basic deallocation (no direct Python equivalent) | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __cmp__ |x, y | int | 3-way comparison | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __richcmp__ |x, y, int op | object | Rich comparison (no direct Python equivalent) | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __str__ |self | object | str(self) | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __repr__ |self | object | repr(self) | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __hash__ |self | int | Hash function | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __call__ |self, ... | object | self(...) 
| ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __iter__ |self | object | Return iterator for sequence | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __getattr__ |self, name | object | Get attribute | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __setattr__ |self, name, val | | Set attribute | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __delattr__ |self, name | | Delete attribute | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ + +Arithmetic operators +^^^^^^^^^^^^^^^^^^^^ + ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| Name | Parameters | Return type | Description | ++=======================+=======================================+=============+=====================================================+ +| __add__ | x, y | object | binary `+` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __sub__ | x, y | object | binary `-` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __mul__ | x, y | object | `*` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __div__ | x, y | object | `/` operator for old-style division | 
++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __floordiv__ | x, y | object | `//` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __truediv__ | x, y | object | `/` operator for new-style division | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __mod__ | x, y | object | `%` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __divmod__ | x, y | object | combined div and mod | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __pow__ | x, y, z | object | `**` operator or pow(x, y, z) | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __neg__ | self | object | unary `-` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __pos__ | self | object | unary `+` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __abs__ | self | object | absolute value | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __nonzero__ | self | int | convert to boolean | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __invert__ | self | object | `~` operator | 
++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __lshift__ | x, y | object | `<<` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __rshift__ | x, y | object | `>>` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __and__ | x, y | object | `&` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __or__ | x, y | object | `|` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __xor__ | x, y | object | `^` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ + +Numeric conversions +^^^^^^^^^^^^^^^^^^^ + ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| Name | Parameters | Return type | Description | ++=======================+=======================================+=============+=====================================================+ +| __int__ | self | object | Convert to integer | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __long__ | self | object | Convert to long integer | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __float__ | self | object | Convert to float | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __oct__ | self | object | 
Convert to octal | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __hex__ | self | object | Convert to hexadecimal | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __index__ (2.5+ only) | self | object | Convert to sequence index | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ + +In-place arithmetic operators +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| Name | Parameters | Return type | Description | ++=======================+=======================================+=============+=====================================================+ +| __iadd__ | self, x | object | `+=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __isub__ | self, x | object | `-=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __imul__ | self, x | object | `*=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __idiv__ | self, x | object | `/=` operator for old-style division | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __ifloordiv__ | self, x | object | `//=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __itruediv__ | self, x | object | `/=` operator for new-style division | 
++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __imod__ | self, x | object | `%=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __ipow__ | x, y, z | object | `**=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __ilshift__ | self, x | object | `<<=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __irshift__ | self, x | object | `>>=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __iand__ | self, x | object | `&=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __ior__ | self, x | object | `|=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __ixor__ | self, x | object | `^=` operator | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ + +Sequences and mappings +^^^^^^^^^^^^^^^^^^^^^^ + ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| Name | Parameters | Return type | Description | ++=======================+=======================================+=============+=====================================================+ +| __len__ | self | int | len(self) | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __getitem__ | self, 
x | object | self[x] | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __setitem__ | self, x, y | | self[x] = y | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __delitem__ | self, x | | del self[x] | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __getslice__ | self, Py_ssize_t i, Py_ssize_t j | object | self[i:j] | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __setslice__ | self, Py_ssize_t i, Py_ssize_t j, x | | self[i:j] = x | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __delslice__ | self, Py_ssize_t i, Py_ssize_t j | | del self[i:j] | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __contains__ | self, x | int | x in self | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ + +Iterators +^^^^^^^^^ + ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| Name | Parameters | Return type | Description | ++=======================+=======================================+=============+=====================================================+ +| __next__ | self | object | Get next item (called next in Python) | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ + +Buffer interface (no Python equivalents - see note 1) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + 
++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| Name | Parameters | Return type | Description | ++=======================+=======================================+=============+=====================================================+ +| __getreadbuffer__ | self, int i, void `**p` | | | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __getwritebuffer__ | self, int i, void `**p` | | | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __getsegcount__ | self, int `*p` | | | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __getcharbuffer__ | self, int i, char `**p` | | | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ + +Descriptor objects (see note 2) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| Name | Parameters | Return type | Description | ++=======================+=======================================+=============+=====================================================+ +| __get__ | self, instance, class | object | Get value of attribute | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __set__ | self, instance, value | | Set value of attribute | ++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ +| __delete__ | self, instance | | Delete attribute | 
++-----------------------+---------------------------------------+-------------+-----------------------------------------------------+ + +.. note:: (1) The buffer interface is intended for use by C code and is not directly + accessible from Python. It is described in the Python/C API Reference Manual + under sections 6.6 and 10.6. + +.. note:: (2) Descriptor objects are part of the support mechanism for new-style + Python classes. See the discussion of descriptors in the Python documentation. + See also PEP 252, "Making Types Look More Like Classes", and PEP 253, + "Subtyping Built-In Types". + diff --git a/src/userguide/tutorial.rst b/src/userguide/tutorial.rst new file mode 100644 index 00000000..47766e2b --- /dev/null +++ b/src/userguide/tutorial.rst @@ -0,0 +1,171 @@ +.. highlight:: cython + +.. _tutorial: + +********* +Tutorial +********* + +The Basics of Cython +==================== + +The fundamental nature of Cython can be summed up as follows: Cython is Python +with C data types. + +Cython is Python: Almost any piece of Python code is also valid Cython code. +(There are a few :ref:`cython-limitations`, but this approximation will +serve for now.) The Cython compiler will convert it into C code which makes +equivalent calls to the Python/C API. + +But Cython is much more than that, because parameters and variables can be +declared to have C data types. Code which manipulates Python values and C +values can be freely intermixed, with conversions occurring automatically +wherever possible. Reference count maintenance and error checking of Python +operations is also automatic, and the full power of Python's exception +handling facilities, including the try-except and try-finally statements, is +available to you -- even in the midst of manipulating C data. + + + +Cython Hello World +=================== + +As Cython can accept almost any valid python source file, one of the hardest +things in getting started is just figuring out how to compile your extension. 
+ +So let's start with the canonical Python hello world:: + + print "Hello World" + +So the first thing to do is rename the file to :file:`helloworld.pyx`. Now we +need to make the :file:`setup.py`, which is like a Python Makefile (for more +information see :ref:`compilation`). Your :file:`setup.py` should look like:: + + from distutils.core import setup + from distutils.extension import Extension + from Cython.Distutils import build_ext + + setup( + cmdclass = {'build_ext': build_ext}, + ext_modules = [Extension("helloworld", ["helloworld.pyx"])] + ) + +To use this to build your Cython file, run the following command: + +.. sourcecode:: text + + $ python setup.py build_ext --inplace + +This will leave a file in your local directory called :file:`helloworld.so` on Unix +or :file:`helloworld.pyd` on Windows. Now to use this file: start the Python +interpreter and simply import it as if it were a regular Python module:: + + >>> import helloworld + Hello World + +Congratulations! You now know how to build a Cython extension. But so far +this example doesn't really give a feeling why one would ever want to use Cython, so +let's create a more realistic example. + +:mod:`pyximport`: Cython Compilation the Easy Way +================================================== + +If your module doesn't require any extra C libraries or a special +build setup, then you can use the pyximport module by Paul Prescod and +Stefan Behnel to load .pyx files directly on import, without having to +write a :file:`setup.py` file. It is shipped and installed with +Cython and can be used like this:: + + >>> import pyximport; pyximport.install() + >>> import helloworld + Hello World + +Since Cython 0.11, the :mod:`pyximport` module also has experimental +compilation support for normal Python modules. This allows you to +automatically run Cython on every .pyx and .py module that Python +imports, including the standard library and installed packages. 
+Cython will still fail to compile a lot of Python modules, in which +case the import mechanism will fall back to loading the Python source +modules instead. The .py import mechanism is installed like this:: + + >>> pyximport.install(pyimport = True) + +Fibonacci Fun +============== + +From the official Python tutorial a simple Fibonacci function is defined as: + +.. literalinclude:: ../examples/tutorial/fib1/fib.pyx + +Now following the steps for the Hello World example we first rename the file +to have a `.pyx` extension, let's say :file:`fib.pyx`, then we create the +:file:`setup.py` file. Using the file created for the Hello World example, all +that you need to change is the name of the Cython file and the resulting +module name. Doing this, we have: + +.. literalinclude:: ../examples/tutorial/fib1/setup.py + +Build the extension with the same command used for the helloworld.pyx: + +.. sourcecode:: text + + $ python setup.py build_ext --inplace + +And use the new extension with:: + + >>> import fib + >>> fib.fib(2000) + 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 + +Primes +======= + +Here's a small example showing some of what can be done. It's a routine for +finding prime numbers. You tell it how many primes you want, and it returns +them as a Python list. + +:file:`primes.pyx`: + +.. literalinclude:: ../examples/tutorial/primes/primes.pyx + :linenos: + +You'll see that it starts out just like a normal Python function definition, +except that the parameter ``kmax`` is declared to be of type ``int``. This +means that the object passed will be converted to a C integer (or a +``TypeError`` will be raised if it can't be). + +Lines 2 and 3 use the ``cdef`` statement to define some local C variables. +Line 4 creates a Python list which will be used to return the result. You'll +notice that this is done exactly the same way it would be in Python. Because +the variable result hasn't been given a type, it is assumed to hold a Python +object. 
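
Because the listing is pulled in from a separate file, it may help to see the algorithm this walkthrough describes as a plain-Python sketch. This is only a reading of the prose, not the contents of :file:`primes.pyx`: it has no ``cdef`` declarations, a Python list stands in for the C array ``p``, the loop index ``i`` is an illustrative name, and its line numbers do not correspond to the ones quoted in this section.

```python
def primes(kmax):
    """Return the first kmax primes as a list (plain-Python sketch)."""
    p = []        # primes found so far; stands in for the C array ``p``
    result = []   # the Python list that is returned to the caller
    n = 2         # current candidate number
    while len(p) < kmax:
        i = 0
        # Try dividing the candidate by every prime found so far.
        while i < len(p) and n % p[i] != 0:
            i += 1
        if i == len(p):
            # No earlier prime divides n, so n is itself prime.
            p.append(n)
            result.append(n)
        n += 1
    return result
```

In the Cython version the same loops operate on C integers, which is why the testing loop can be translated entirely into C.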
+ +Lines 7-9 set up for a loop which will test candidate numbers for primeness +until the required number of primes has been found. Lines 11-12, which try +dividing a candidate by all the primes found so far, are of particular +interest. Because no Python objects are referred to, the loop is translated +entirely into C code, and thus runs very fast. + +When a prime is found, lines 14-15 add it to the p array for fast access by +the testing loop, and line 16 adds it to the result list. Again, you'll notice +that line 16 looks very much like a Python statement, and in fact it is, with +the twist that the C parameter ``n`` is automatically converted to a Python +object before being passed to the append method. Finally, at line 18, a normal +Python return statement returns the result list. + +Compiling primes.pyx with the Cython compiler produces an extension module +which we can try out in the interactive interpreter as follows:: + + >>> import primes + >>> primes.primes(10) + [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] + +See, it works! And if you're curious about how much work Cython has saved you, +take a look at the C code generated for this module. + +Language Details +================ + +For more about the Cython language, see :ref:`language-basics`. +To dive right in to using Cython in a numerical computation context, see :ref:`numpy_tutorial`. + diff --git a/src/userguide/wrapping_CPlusPlus.rst b/src/userguide/wrapping_CPlusPlus.rst new file mode 100644 index 00000000..99db8567 --- /dev/null +++ b/src/userguide/wrapping_CPlusPlus.rst @@ -0,0 +1,318 @@ +.. highlight:: cython + +.. _wrapping-cplusplus: + +******************************** +Wrapping C++ Classes in Cython +******************************** + +Overview +========= + +This page aims to get you quickly up to speed so you can wrap C++ interfaces +with a minimum of pain and 'surprises'. + +In the past, Pyrex only supported wrapping of C APIs, and not C++. 
To wrap +C++, one had to write a pure-C shim, containing functions for +constructors/destructors and method invocations. Object pointers were passed +around as opaque void pointers, and cast to/from object pointers as needed. +This approach did work, but it got awfully messy and error-prone when trying +to wrap APIs with large class hierarchies and lots of inheritance. + +These days, though, Pyrex offers an adequate bare minimum of C++ support, +which Cython has inherited. The approach described in this document will help +you wrap a lot of C++ code with only moderate effort. There are some +limitations, which we will discuss at the end of the document. + +Procedure Overview +==================== + +* Specify C++ language in :file:`setup.py` script +* Create ``cdef extern from`` blocks and declare classes as + ``ctypedef struct`` blocks +* Create constructors and destructors +* Add class methods as function pointers +* Create Cython wrapper class + +An example C++ API +=================== + +Here is a tiny C++ API which we will use as an example throughout this +document. Let's assume it will be in a header file called +:file:`Rectangle.h`: + +.. sourcecode:: c++ + + class Rectangle { + public: + int x0, y0, x1, y1; + Rectangle(int x0, int y0, int x1, int y1); + ~Rectangle(); + int getLength(); + int getHeight(); + int getArea(); + void move(int dx, int dy); + }; + +This is pretty dumb, but should suffice to demonstrate the steps involved. + +Specify C++ language in setup.py +================================= + +In Cython :file:`setup.py` scripts, one normally instantiates an Extension +object. 
To make Cython generate and compile a C++ source, you just need +to add a keyword to your Extension construction statement, as in:: + + ext = Extension( + "rectangle", # name of extension + ["rectangle.pyx"], # filename of our Cython source + language="c++", # this causes Cython to create C++ source + include_dirs=[...], # usual stuff + libraries=[...], # ditto + extra_link_args=[...], # if needed + ) + +With the language="c++" keyword, Cython distutils will generate a C++ file. +The ``cmdclass = {'build_ext': build_ext}`` setting is an argument to +:func:`setup` itself, as in the tutorial, not to ``Extension``. + +Create cdef extern from block +============================== + +The procedure for wrapping a C++ class is quite similar to that for wrapping +normal C structs, with a couple of additions. Let's start here by creating the +basic ``cdef extern from`` block:: + + cdef extern from "Rectangle.h": + +This will make the C++ class def for Rectangle available. + +Declare class as a ctypedef struct +----------------------------------- + +Now, let's add the Rectangle class to this extern from block -- just copy the +class def from :file:`Rectangle.h` and adjust for Cython syntax, so now it +becomes:: + + cdef extern from "Rectangle.h": + # known in Cython namespace as 'c_Rectangle' but in C++ as 'Rectangle' + ctypedef struct c_Rectangle "Rectangle": + int x0, y0, x1, y1 + +We don't yet have any way of accessing the constructor/destructor or methods, but +we'll cover this now. + +Add constructors and destructors +---------------------------------- + +We now need to expose a constructor and destructor into the Cython +namespace. Again, we'll be using C name specifications:: + + cdef extern from "Rectangle.h": + ctypedef struct c_Rectangle "Rectangle": + int x0, y0, x1, y1 + c_Rectangle *new_Rectangle "new Rectangle" (int x0, int y0, int x1, int y1) + void del_Rectangle "delete" (c_Rectangle *rect) + +Add class methods +------------------- + +Now, let's add the class methods. You can circumvent Cython syntax +limitations by declaring these as function pointers. 
Recall that in the C++
+class we have:
+
+.. sourcecode:: c++
+
+    int getLength();
+    int getHeight();
+    int getArea();
+    void move(int dx, int dy);
+
+So if we convert each of these to function pointers and stick them in our
+extern block, we now get::
+
+    cdef extern from "Rectangle.h":
+        ctypedef struct c_Rectangle "Rectangle":
+            int x0, y0, x1, y1
+            int getLength()
+            int getHeight()
+            int getArea()
+            void move(int dx, int dy)
+        c_Rectangle *new_Rectangle "new Rectangle" (int x0, int y0, int x1, int y1)
+        void del_Rectangle "delete" (c_Rectangle *rect)
+
+This will fool Cython into generating C++ method calls even though
+Cython is mostly oblivious to C++.
+
+In Pyrex you must explicitly declare these as function pointers, i.e.
+``int (*getArea)()``.
+
+Create Cython wrapper class
+=============================
+
+At this point, we have exposed into our pyx file's namespace a struct which
+gives us access to the interface of a C++ Rectangle type. Now, we need to make
+this accessible from external Python code (which is our whole point).
+
+Common programming practice is to create a Cython extension type which
+holds a C++ instance pointer as an attribute ``thisptr``, and create a bunch of
+forwarding methods. So we can implement the Python extension type as::
+
+    cdef class Rectangle:
+        cdef c_Rectangle *thisptr      # hold a C++ instance which we're wrapping
+        def __cinit__(self, int x0, int y0, int x1, int y1):
+            self.thisptr = new_Rectangle(x0, y0, x1, y1)
+        def __dealloc__(self):
+            del_Rectangle(self.thisptr)
+        def getLength(self):
+            return self.thisptr.getLength()
+        def getHeight(self):
+            return self.thisptr.getHeight()
+        def getArea(self):
+            return self.thisptr.getArea()
+        def move(self, dx, dy):
+            self.thisptr.move(dx, dy)
+
+And there we have it. From a Python perspective, this extension type will look
+and feel just like a natively defined Rectangle class.
If you want to give
+attribute access, you could just implement some properties::
+
+    property x0:
+        def __get__(self): return self.thisptr.x0
+        def __set__(self, x0): self.thisptr.x0 = x0
+    ...
+
+Caveats and Limitations
+========================
+
+In this document, we have discussed a relatively straightforward way of
+wrapping C++ classes with Cython. However, there are some limitations in
+this approach, some of which could be overcome with clever workarounds (anyone
+here want to share some?), but some of which will require new features in
+Cython.
+
+The major limitations I'm most immediately aware of (and there will be many
+more) include:
+
+Overloading
+------------
+
+Presently, it's not easy to overload methods or constructors, but there may be
+a workaround if you try some creative C name specifications.
+
+Access to C-only functions
+---------------------------
+
+Whenever generating C++ code, Cython generates declarations of and calls
+to functions assuming these functions are C++ (i.e., not declared as
+``extern "C" {...}``). This is ok if the C functions have C++ entry points,
+but if they're C only, you will hit a roadblock. If you have a C++ Cython
+module needing to make calls to pure-C functions, you will need to write a
+small C++ shim module which:
+
+* includes the needed C headers in an extern "C" block
+* contains minimal forwarding functions in C++, each of which calls the
+  respective pure-C function
+
+Inherited C++ methods
+----------------------
+
+If you have a class ``Foo`` with a child class ``Bar``, and ``Foo`` has a
+method :meth:`fred`, then you'll have to cast to access this method from
+``Bar`` objects.
+For example::
+
+    cdef class MyClass:
+        cdef Bar *b
+        ...
+        def myfunc(self):
+            ...
+            self.b.fred()   # wrong, won't work
+            (<Foo *>(self.b)).fred() # should work, Cython now thinks it's a 'Foo'
+
+It might take some experimenting by others (you?) to find the most elegant
+ways of handling this issue.
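For the inherited-method case, one possible workaround is to perform the upcast on the C++ side rather than in the ``.pyx`` file. The sketch below is only an illustration under stated assumptions: the bodies of ``Foo`` and ``Bar``, the ``int`` return type of ``fred``, and the helper names ``bar_as_foo`` and ``check_upcast`` are all hypothetical and do not come from this guide.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Foo/Bar classes named in the text.
class Foo {
public:
    int fred() { return 1; }   // return value chosen only for illustration
};

class Bar : public Foo {
};

// Hypothetical helper: does the Bar* -> Foo* upcast in C++, which is the
// same conversion the Cython cast in the example above asks for.
inline Foo *bar_as_foo(Bar *b) {
    return static_cast<Foo *>(b);
}

// Small self-check: the inherited method is reachable through the helper.
inline void check_upcast() {
    Bar b;
    assert(bar_as_foo(&b)->fred() == 1);
}
```

A helper like this could then be declared in a ``cdef extern from`` block like any other function, so the wrapper calls ``bar_as_foo(self.b)`` instead of writing the cast inline. Whether that reads better than the inline cast is a matter of taste.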
+
+Advanced C++ features
+----------------------
+
+Exceptions
+^^^^^^^^^^^
+
+Cython cannot throw C++ exceptions, or catch them with a try-except statement,
+but it is possible to declare a function as potentially raising a C++
+exception and have it converted into a Python exception. For example, ::
+
+    cdef extern from "some_file.h":
+        cdef int foo() except +
+
+This will try to translate the C++ error into an appropriate Python exception
+(currently an IndexError on std::out_of_range and a RuntimeError otherwise,
+preserving the what() message). ::
+
+    cdef int bar() except +MemoryError
+
+This will catch any C++ error and raise a Python MemoryError in its place.
+(Any Python exception is valid here.) ::
+
+    cdef int raise_py_error()
+    cdef int something_dangerous() except +raise_py_error
+
+If something_dangerous raises a C++ exception then raise_py_error will be
+called, which allows one to do custom C++ to Python error "translations." If
+raise_py_error does not actually raise an exception, a RuntimeError will be
+raised.
+
+Templates
+^^^^^^^^^^
+
+Cython does not natively understand C++ templates but we can put them to use
+in some way. As an example consider an STL vector of C ints::
+
+    cdef extern from "some .h file which includes <vector>":
+        ctypedef struct intvec "std::vector<int>":
+            void (* push_back)(int elem)
+        intvec intvec_factory "std::vector<int>"(int len)
+
+Now we can use the vector like this::
+
+    cdef intvec v = intvec_factory(2)
+    v.push_back(2)
+
+Overloading
+^^^^^^^^^^^^
+
+To support function overloading, simply add a different alias to each
+signature, so if you have e.g.
+
+.. sourcecode:: c++
+
+    int foo(int a);
+    int foo(int a, int b);
+
+in your C++ header, then interface it in Cython like this::
+
+    int fooi "foo"(int)
+    int fooii "foo"(int, int)
+
+Operators
+^^^^^^^^^^
+
+Some operators (e.g. +,-,...)
can be accessed from Cython like this::
+
+    ctypedef struct c_Rectangle "Rectangle":
+        c_Rectangle add "operator+"(c_Rectangle right)
+
+Declaring/Using References
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Question: How do you declare and call a function that takes a reference as an argument?
+
+Conclusion
+============
+
+A great many existing C++ classes can be wrapped using these techniques, in a
+way much easier than writing a large messy C shim module. There's a bit of
+manual work involved, and an annoying maintenance burden if the C++ library
+you're wrapping is frequently changing, but this recipe should hopefully keep
+the discomfort to a minimum.
+