From: Greg Hudson Date: Mon, 13 Feb 2012 22:41:25 +0000 (+0000) Subject: Add explanatory README for ASN.1 infrastructure X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=995acf62fc1eaeb96bdcb0b49130676d2cc5a5e4;p=krb5.git Add explanatory README for ASN.1 infrastructure Since we're not yet at the point of having an ASN.1 compiler for our ASN.1 encoder, create a document explaining how to write macro invocations for type descriptors from an ASN.1 module. git-svn-id: svn://anonsvn.mit.edu/krb5/trunk@25699 dc483132-0cff-0310-8789-dd5450dbe970 --- diff --git a/src/lib/krb5/asn.1/README.asn1 b/src/lib/krb5/asn.1/README.asn1 new file mode 100644 index 000000000..2a033f8c4 --- /dev/null +++ b/src/lib/krb5/asn.1/README.asn1 @@ -0,0 +1,560 @@ +These notes attempt to explain how to use the ASN.1 infrastructure to +add new ASN.1 types. ASN.1 is complicated and easy to get wrong, so +it's best to verify your results against another tool (such as asn1c) +if at all possible. These notes are up to date as of 2012-02-13. + +If you are trying to debug a problem which shows up in the ASN.1 +encoder or decoder, skip to the last section. + +General +------- + +For the moment, a developer must hand-translate the ASN.1 module into +macro invocations which generate data structures used by the encoder +and decoder. Ideally we would have a tool to compile an ASN.1 module +(and probably some additional information about C identifier mappings) +and generate the macro invocations. + +Currently the ASN.1 infrastructure is not visible to applications or +plugins. For plugin modules shipped as part of the krb5 tree, the +types can be added to asn1_k_encode.c and exported from libkrb5. +Plugin modules built separately from the krb5 tree must use another +tool (such as asn1c) for now if they need to do ASN.1 encoding or +decoding. + +Tags +---- + +Before you start writing macro invocations, it's important to +understand a little bit about ASN.1 tags. You will most commonly see +tag notation in a sequence definition, like: + + TypeName ::= SEQUENCE { + field-name [0] IMPLICIT OCTET STRING OPTIONAL + } + +Contrary to intuition, the tag notation "[0] IMPLICIT" is not a +property of the sequence field; instead, it specifies a type which is +wraps the type named to the right (OCTET STRING). The right way to +think about the above definition is: + + TypeName is defined as a sequence type + which has an optional field named field-name + whose type is a tagged type + the tag's class is context-specific (by default) + the tag's number is 0 + it is an implicit tag + the tagged type wraps OCTET STRING + +The other case you are likely to see tag notation is something like: + + AS-REQ ::= [APPLICATION 10] KDC-REQ + +This example defines AS-REQ to be a tagged type whose class is +application, whose tag number is 10, and whose base type is KDC-REQ. +The tag may be implicit or explicit depending on the module's tag +environment, which we'll get to in a moment. + +Tags can have one of four classes: universal, application, private, +and context-specific. Universal tags are used for built-in ASN.1 +types. Application and context-specific tags are the most common to +see in ASN.1 modules; private is rarely used. If no tag class is +specified, the default is context-specific. + +Tags can be explicit or implicit, and the distinction is important to +the wire encoding. If a tag's closing bracket is followed by word +IMPLICIT or EXPLICIT, then it's clear which kind of tag it is, but +usually there will be no such annotation. If not, the default depends +on the header of the ASN.1 module. Look at the top of the module for +the word DEFINITIONS. It may be followed by one of three phrases: + +* EXPLICIT TAGS -- in this case, tags default to explicit +* IMPLICIT TAGS -- in this case, tags default to implicit (usually) +* AUTOMATIC TAGS -- tags default to implicit (usually) and are also + automatically added to sequence fields (usually) + +If none of those phrases appear, the default is explicit tags. + +Even if a module defaults to implicit tags, a tag defaults to explicit +if its base type is a choice type or ANY type (or the information +object equivalent of an ANY type). + +If the module's default is AUTOMATIC TAGS, sequence and set fields +should have ascending context-specific tags wrapped around the field +types, starting from 0, unless one of the fields of the sequence or +set is already a tagged type. See ITU X.680 section 24.2 for details, +particularly if COMPONENTS OF is used in the sequence definition. + +Basic types +----------- + +In our infrastructure, a type descriptor specifies a mapping between +an ASN.1 type and a C type. The first step is to ensure that type +descriptors are defined for the basic types used by your ASN.1 module, +as mapped to the C types used in your structures, in asn1_k_encode.c. +If not, you'll need to create it. For a BOOLEAN or INTEGER ASN.1 +type, you'll use one of these macros: + + DEFBOOLTYPE(descname, ctype) + DEFINTTYPE(descname, ctype) + DEFUINTTYPE(descname, ctype) + +where "descname" is an identifier you make up and "ctype" is the +integer type of the C object you want to map the ASN.1 value to. For +integers, use DEFINTTYPE if the C type is a signed integer type and +DEFUINTTYPE if it is an unsigned type. (For booleans, the distinction +is unimportant since all integer types can hold the values 0 and 1.) +We don't generally define integer mappings for every typedef name of +an integer type. For example, we use the type descriptor int32, which +maps an ASN.1 INTEGER to a krb5_int32, for krb5_enctype values. + +String types are a little more complicated. Our practice is to store +strings in a krb5_data structure (rather than a zero-terminated C +string), so our infrastructure currently assumes that all strings are +represented as "counted types", meaning the C representation is a +combination of a pointer and an integer type. So, first you must +declare a counted type descriptor (we'll describe those in more detail +later) with something like: + + DEFCOUNTEDSTRINGTYPE(generalstring, char *, unsigned int, + k5_asn1_encode_bytestring, k5_asn1_decode_bytestring, + ASN1_GENERALSTRING); + +The first parameter is an identifier you make up. The second and +third parameters are the C types of the pointer and integer holding +the string; for a krb5_data object, those should be the types in the +example. The pointer type must be char * or unsigned char *. The +fourth and fifth parameters reference primitive encoder and decoder +functions; these should almost always be the ones in the example, +unless the ASN.1 type is BIT STRING. The sixth parameter is the +universal tag number of the ASN.1 type, as defined in krbasn1.h. + +Once you have defined the counted type, you can define a normal type +descriptor to wrap it in a krb5_data structure with something like: + + DEFCOUNTEDTYPE(gstring_data, krb5_data, data, length, generalstring); + +Sequences +--------- + +In our infrastructure, we model ASN.1 sequences using an array of +normal type descriptors. Each type descriptor is applied in turn to +the C object to generate (or consume) an encoding of an ASN.1 value. + +Of course, each value needs to be stored in a different place within +the C object, or they would just overwrite each other. To address +this, you must create an offset type wrapper for each sequence field: + + DEFOFFSETTYPE(descname, structuretype, fieldname, basedesc) + +where "descname" is an identifier you make up, "structuretype" and +"fieldtype" are used to compute the offset and type-check the +structure field, and "basedesc" is the type of the ASN.1 object to be +stored at that offset. + +If your C structure contains a pointer to another C object, you'll +need to first define a pointer wrapper, which is very simple: + + DEFPTRTYPE(descname, basedesc) + +Then wrap the defined pointer type in an offset type as described +above. Once a pointer descriptor is defined for a base descriptor, it +can be reused many times, so pointer descriptors are usually defined +right after the types they wrap. When decoding, pointer wrappers +cause a pointer to be allocated with a block of memory equal to the +size of the C type corresponding to the base type. (For offset types, +the corresponding C type is the structure type inside which the offset +is computed.) It is okay for several fields of a sequence to +reference the same pointer field within a structure, as long as the +pointer types all wrap base types with the same corresponding C type. + +If the sequence field has a context tag attached to its type, you'll +also need to create a tag wrapper for it: + + DEFCTAGGEDTYPE(descname, tagnum, basedesc) + DEFCTAGGEDTYPE_IMPLICIT(descname, tagnum, basedesc) + +Use the first macro for explicit context tags and the second for +implicit context tags. "tagnum" is the number of the context-specific +tag, and "basedesc" is the name you chose for the offset type above. + +You don't actually need to separately write out DEFOFFSETTYPE and +DEFCTAGGEDTYPE for each field. The combination of offset and context +tag is so common that we have a macro to combine them: + + DEFFIELD(descname, structuretype, fieldname, tagnum, basedesc) + DEFFIELD_IMPLICIT(descname, structuretype, fieldname, tagnum, basedesc) + +Once you have defined tag and offset wrappers for each sequence field, +combine them together in an array and use the DEFSEQTYPE macro to +define the sequence type descriptor: + + static const struct atype_info *my_sequence_fields[] = { + &k5_atype_my_sequence_0, &k5_atype_my_sequence_1, + }; + DEFSEQTYPE(my_sequence, structuretype, my_sequence_fields) + +Each field name must by prefixed by "&k5_atype_" to get a pointer to +the actual variable used to hold the type descriptor. + +ASN.1 sequence types may or may not be defined to be extensible, and +may group extensions together in blocks which must appear together. +Our model does not distinguish these cases. Our decoder treats all +sequence types as extensible. Extension blocks must be modeled by +making all of the extension fields optional, and the decoder will not +enforce that they appear together. + +If your ASN.1 sequence contains optional fields, keep reading. + +Optional sequence fields +------------------------ + +ASN.1 sequence fields can be annotated with OPTIONAL or, less +commonly, with DEFAULT VALUE. (Be aware that if DEFAULT VALUE is +specified for a sequence field, DER mandates that fields with that +value not be encoded within the sequence. Most standards in the +Kerberos ecosystem avoid the use of DEFAULT VALUE for this reason.) +Although optionality is a property of sequence or set fields, not +types, we still model optional sequence fields using type wrappers. +Optional type wrappers must only be used as members of a sequence, +although they can be nested in offset or pointer wrappers first. + +The simplest way to represent an optional value in a C structure is +with a pointer which takes the value NULL if the field is not present. +In this case, you can just use DEFOPTIONALZEROTYPE to wrap the pointer +type: + + DEFPTRTYPE(ptr_basetype, basetype); + DEFOPTIONALZEROTYPE(opt_ptr_basetype, ptr_basetype); + +and then use opt_ptr_basetype in the DEFFIELD invocation for the +sequence field. DEFOPTIONALZEROTYPE can also be used for integer +types, if it's okay for the value 0 to represent that the +corresponding ASN.1 value is omitted. Optional-zero wrappers, like +pointer wrappers, are usually defined just after the types they wrap. + +For null-terminated sequences, you can use a wrapper like this: + + DEFOPTIONALEMPTYTYPE(opt_seqof_basetype, seqof_basetype) + +to omit the sequence if it is either NULL or of zero length. + +A more general way to wrap optional types is: + + DEFOPTIONALTYPE(descname, predicatefn, initfn, basedesc); + +where "predicatefn" has the signature "int (*fn)(const void *p)" and +is used by the encoder to test whether the ASN.1 value is present in +the C object. "initfn" has the signature "void (*fn)(void *p)" and is +used by the decoder to initialize the C object field if the +corresponding ASN.1 value is omitted in the wire encoding. "initfn" +can be NULL, in which case the C object will simply be left alone. +All C objects are initialized to zero-filled memory when they are +allocated by the decoder. + +An optional string type, represented in a krb5_data structure, can be +wrapped using the nonempty_data function already defined in +asn1_k_encode.c, like so: + + DEFOPTIONALTYPE(opt_ostring_data, nonempty_data, NULL, ostring_data); + +Sequence-of types +----------------- + +ASN.1 sequence-of types can be represented as C types in two ways. +The simplest is to use an array of pointers terminated in a null +pointer. A descriptor for a sequence-of represented this way is +defined in three steps: + + DEFPTRTYPE(ptr_basetype, basetype); + DEFNULLTERMSEQOFTYPE(seqof_basetype, ptr_basetype); + DEFPTRTYPE(ptr_seqof_basetype, seqof_basetype); + +If the C type corresponding to basetype is "ctype", then the C type +corresponding to ptr_seqof_basetype will be "ctype **". The middle +type sort of corresponds to "ctype *", but not exactly, as it +describes an object of variable size. + +You can also use DEFNONEMPTYNULLTERMSEQOFTYPE in the second step. In +this case, the encoder will throw an error if the sequence is empty. +For historical reasons, the decoder will *not* throw an error if the +sequence is empty, so the calling code must check before assuming a +first element is present. + +The other way of representing sequences is through a combination of +pointer and count. This pattern is most often used for compactness +when the base type is an integer type. A descriptor for a sequence-of +represented this way is defined using a counted type descriptor: + + DEFCOUNTEDSEQOFTYPE(descname, lentype, basedesc) + +where "lentype" is the C type of the length and "basedesc" is a +pointer wrapper for the sequence element type (*not* the element type +itself). For example, an array of 32-bit signed integers is defined +as: + + DEFINTTYPE(int32, krb5_int32); + DEFPTRTYPE(int32_ptr, int32); + DEFCOUNTEDSEQOFTYPE(cseqof_int32, krb5_int32, int32_ptr); + +To use a counted sequence-of type in a sequence, you use DEFCOUNTEDTYPE: + + DEFCOUNTEDTYPE(descname, structuretype, ptrfield, lenfield, cdesc) + +where "structuretype", "ptrfield", and "lenfield" are used to compute +the field offsets and type-check the structure fields, and "cdesc" is +the name of the counted type descriptor. + +The combination of DEFCOUNTEDTYPE and DEFCTAGGEDTYPE can be +abbreviated using DEFCNFIELD: + + DEFCNFIELD(descname, structuretype, ptrfield, lenfield, tagnum, cdesc) + +Tag wrappers +------------ + +We've previously covered DEFCTAGGEDTYPE and DEFCTAGGEDTYPE_IMPLICIT, +which are used to define context-specific tag wrappers. There are +two other macros for creating tag wrappers. The first is: + + DEFAPPTAGGEDTYPE(descname, tagnum, basedesc) + +Use this macro to model an "[APPLICATION tagnum]" tag wrapper in an +ASN.1 module. + +There is also a general tag wrapper macro: + + DEFTAGGEDTYPE(descname, class, construction, tag, implicit, basedesc) + +where "class" is one of UNIVERSAL, APPLICATION, CONTEXT_SPECIFIC, or +PRIVATE, "construction" is one of PRIMITIVE or CONSTRUCTED, "tag" is +the tag number, "implicit" is 1 for an implicit tag and 0 for an +explicit tag, and "basedesc" is the wrapped type. Note that that +primitive vs. constructed is not a concept within the abstract ASN.1 +type model, but is instead a concept used in DER. In general, all +explicit tags should be constructed (but see the section on "Dirty +tricks" below). The construction parameter is ignored for implicit +tags. + +Choice types +------------ + +ASN.1 CHOICE types are represented in C using a signed integer +distinguisher and a union. Modeling a choice type happens in three +steps: + +1. Define type descriptors for each alternative of the choice, +typically using DEFCTAGGEDTYPE to create a tag wrapper for an existing +type. There is no need to create offset type wrappers, as union +fields always have an offset of 0. For example: + + DEFCTAGGEDTYPE(my_choice_0, 0, firstbasedesc); + DEFCTAGGEDTYPE(my_choice_1, 1, secondbasedesc); + +2. Assemble them into an array, similar to how you would for a +sequence, and use DEFCHOICETYPE to create a counted type descriptor: + + static const struct atype_info *my_choice_alternatives[] = { + &k5_atype_my_choice_0, &k5_atype_my_choice_1 + }; + DEFCHOICETYPE(my_choice, union my_choice_choices, enum my_choice_selector, + my_choice_alternatives); + +The second and third parameters to DEFCHOICETYPE are the C types of +the union and distinguisher fields. + +3. Wrap the counted type descriptor in a type descriptor for the +structure containing the distinguisher and union: + + DEFCOUNTEDTYPE_SIGNED(descname, structuretype, u, choice, my_choice); + +The third and fourth parameters to DEFCOUNTEDTYPE_SIGNED are the field +names of the union and distinguisher fields within structuretype. + +ASN.1 choice types may be defined to be extensible, or may not be. +Our model does not distinguish between the two cases. Our decoder +treats all choice types as extensible. + +Our encoder will throw an error if the distinguisher is not within the +range of valid offsets of the alternatives array. Our decoder will +set the distinguisher to -1 if the tag of the ASN.1 value is not +matched by any of the alternatives, and will leave the union +zero-filled in that case. + +Counted type descriptors +------------------------ + +Several times in earlier sections we've referred to the notion of +"counted type descriptors" without defining what they are. Counted +type descriptors live in a separate namespace from normal type +descriptors, and specify a mapping between an ASN.1 type and two C +objects, one of them having integer type. There are four kinds of +counted type descriptors, defined using the following macros: + + DEFCOUNTEDSTRINGTYPE(descname, ptrtype, lentype, encfn, decfn, tagnum) + DEFCOUNTEDDERTYPE(descname, ptrtype, lentype) + DEFCOUNTEDSEQOFTYPE(descname, lentype, baseptrdesc) + DEFCHOICETYPE(descname, uniontype, distinguishertype, fields) + +DEFDERTYPE is described in the "Dirty tricks" section below. The +other three kinds of counted types have been covered previously. + +Counted types are always used by wrapping them in a normal type +descriptor with one of these macros: + + DEFCOUNTEDTYPE(descname, structuretype, datafield, countfield, cdesc) + DEFCOUNTEDTYPE_SIGNED(descname, structuretype, datafield, countfield, cdesc) + +These macros are similar in concept to an offset type, only with two +offsets. Use DEFCOUNTEDTYPE if the count field is unsigned or +DEFCOUNTEDTYPE_SIGNED if it is signed. + +Defining encoder and decoder functions +-------------------------------------- + +After you have created a type descriptor for your types, you need to +create encoder or decoder functions for the ones you want calling code +to be able to process. Do this with one of the following macros: + + MAKE_ENCODER(funcname, desc) + MAKE_DECODER(funcname, desc) + MAKE_CODEC(typename, desc) + +MAKE_ENCODER and MAKE_DECODER allow you to choose function names. +MAKE_CODEC defines encoder and decoder functions with the names +"encode_typename" and "decode_typename". + +If you are defining functions for a null-terminated sequence, use the +descriptor created with DEFNULLTERMSEQOFTYPE or +DEFNONEMPTYNULLTERMSEQOFTYPE, rather than the pointer to it. This is +because encoder and decoder functions implicitly traffic in pointers +to the C object being encoded or decoded. + +Encoder and decoder functions must be prototyped separately, either in +k5-int.h or in a subsidiary included by it. Encoder functions have +the prototype: + + krb5_error_code encode_typename(const ctype *rep, krb5_data **code_out); + +where "ctype" is the C type corresponding to desc. Decoder functions +have the prototype: + + krb5_error_code decode_typename(const krb5_data *code, ctype **rep_out); + +Decoder functions allocate a container for the C type of the object +being decoded and return a pointer to it in *rep_out. + +Writing test cases +------------------ + +New ASN.1 types in libkrb5 will typically only be accepted with test +cases. Our current test framework lives in src/tests/asn.1. Adding +new types to this framework involves the following steps: + +1. Define an initializer for a sample value of the type in ktest.c, +named ktest_make_sample_typename(). Also define a contents-destructor +for it, named ktest_empty_typename(). Prototype these functions in +ktest.h. + +2. Define an equality test for the type in ktest_equal.c. Prototype +this in ktest_equal.h. (This step is not necessary if the type has no +decoder.) + +3. Add a test case to krb5_encode_test.c, following the examples of +existing test cases there. Update reference_encode.out and +trval_reference.out to contain the output generated by your test case. + +4. Add a test case to krb5_decode_test.c, following the examples of +existing test cases there, and using the output generated by your +encode test. + +5. Add a test case to krb5_decode_leak.c, following the examples of +existing test cases there. + +Following these steps will not ensure the correctness of your +translation of the ASN.1 module to macro invocations; it only lets us +detect unintentional changes to the encodings after they are defined. +For that, you should use a different tool such as asn1c. There is +currently no blueprint for doing this; we should create one. + +Dirty tricks +------------ + +In rare cases you may want to represent the raw DER encoding of a +value in the C structure. If so, you can use DEFCOUNTEDDERTYPE (or +more likely, the existing der_data type descriptor). The encoder and +decoder will throw errors if the wire encoding doesn't have a valid +outermost tag, so be sure to use valid DER encodings in your test +cases (see ktest_make_sample_algorithm_identifier for an example). + +Conversely, the ASN.1 module may define an OCTET STRING wrapper around +a DER encoding which you want to represent as the decoded value. (The +existing example of this is in PKINIT hash agility, where the +PartyUInfo and PartyVInfo fields of OtherInfo are defined as octet +strings which contain the DER encodings of KRB5PrincipalName values.) +In this case you can use a DEFTAGGEDTYPE wrapper like so: + + DEFTAGGEDTYPE(descname, UNIVERSAL, PRIMITIVE, ASN1_OCTETSTRING, 0, + basedesc) + +Limitations +----------- + +We cannot currently encode or decode SET or SET OF types. + +We cannot model self-referential types (like "MATHSET ::= SET OF +MATHSET"). + +If a sequence uses an optional field which is a choice field (without +a context tag wrapper), or an optional field which uses a stored DER +encoding (again, without a context tag wrapper), our decoder may +assign a value to the choice or stored-DER field when the correct +behavior is to skip that field and assign the value to a subsequent +field. It should be very rare for ASN.1 modules to use choice or open +types this way. + +For historical interoperability reasons, our decoder the indefinite +length form for constructed tags, which is allowed by BER but not DER. +We still require the primitive forms of basic scalar types, however, +so we do not accept all BER encodings of ASN.1 values. + +Debugging +--------- + +If you're looking at a stack trace with a bunch of ASN.1 encoder or +decoder calls at the top, here are some things which might help with +debugging: + +1. You may have noticed that the entry point into the encoder is +defined by a macro like MAKE_CODEC. Don't worry about this; those +macros just define thin wrappers around k5_asn1_full_encode and +k5_asn1_full_decode. + +2. If you're in the encoder, look for stack frames in +encode_sequence(), and print the value of i within those stack frames. +You should be able to subtract 1 from those values and match them up +with the sequence field offsets in asn1_k_encode.c for the type being +encoded. For example, if an as-req is being encoded and the i values +(starting with the one closest to encode_krb5_as_req) are 4, 2, and 2, +you could match those up as following: + +* as_req_encode wraps untagged_as_req, whose field at offset 3 is the + descriptor for kdc_req_4, which wraps kdc_req_body. + +* kdc_req_body is a function wrapper around kdc_req_hack, whose field + at offset 1 is the descriptor for req_body_1, which wraps + opt_principal. + +* opt_principal wraps principal, which wraps principal_data, whose + field at offset 1 is the descriptor for princname_1. + +* princname_1 is a sequence of general strings represented in the data + and length fields of the krb5_principal_data structure. + +So the problem would likely be in the data components of the client +principal in the kdc_req structure. + +3. If you're in the decoder, look for stacks frames in +decode_sequence(), and again print the values of i. You can match +these up just as above, except without subtracting 1 from the i +values.