pgc 1.0b Reference Manual
by Julian Graham

Introduction
************

pgc, the Protein Geometry Calculator, is an interpreter for a simple, expression-based language, resembling GNU bc in its interface, and intended for use in performing calculations related to molecular geometry. While it does not, in itself, provide facilities for performing such calculations, in its current form it provides interfaces for software packages that do; its structure allows data from these packages to be imported, manipulated, and exported for the use of other packages.

This manual documents the syntax of the language, the library of functions available to users, and methods for extending this library.

Table of contents
*****************

* Building and installing pgc
* Using pgc
  * Invocation from the command prompt
  * Format of the configuration file
* Language basics
  * Data types
    * Nil
    * Numerical types: int, float
    * Records
    * Strings
    * Vectors
  * Arithmetic operators
  * Logical operators
  * Loops and conditionals
  * Variable bindings
    * Variables
    * The special 'result' variable
    * Subscripts and record fields
  * Functions
  * Errors
* pgc built-in functions
  * describe
  * float
  * float_to_string
  * int
  * int_to_string
  * print
  * record_type
  * vector_sum
  * vector_type
  * version
* Current pgc library interfaces
  * libproteingeometry
  * LSQRMS
* Inside pgc
  * Garbage collection
  * Hash tables
* Extending pgc
  * Adding new functions
  * Linking with external libraries

Building and installing pgc
***************************

The system requirements for pgc are as follows:

- GNU make
- A working C compiler (preferably GNU gcc)
- An sh-compatible shell
- Zero or more of pgc's supported external packages:
  - libproteingeometry 2.0
  - lsqrms 4.0.3

pgc is intended to be installed using the included autoconf build system: If you're installing on one of autoconf's supported systems (and, chances are, you are -- it supports most UNIX-like platforms and even some not-very-much-like-UNIX platforms), all you have to do is
run the "configure" script in the top level of the source code tree. For each software package supported by pgc, "configure" will determine whether or not it is already installed; if it is, the script will configure the build process to include pgc's interface for it. Following this, run "make", which will compile and link pgc, and then run "make install", which will install pgc in the standard location for your system (on most Unices, this will be something like /usr/local/bin). Depending on your system type, you may need administrative privileges to perform that last step.

Using pgc
*********

Invocation from the command prompt
==================================

The format of command line arguments to pgc is defined as follows:

    pgc [-c (--config) conf_filename] [-h (--help)] [source_filename] ...

The -c option allows the user to specify an alternate configuration file -- the default configuration file is pgc.conf. Only one configuration file may be specified, so each successive -c option overwrites the name of the configuration file.

The -h option causes pgc to print out a brief help message and exit -- no further processing will be done.

Command line arguments not preceded by -c or -h will be interpreted as the names of files containing stored pgc expression lists. pgc builds a list of these files while parsing the command line arguments. Before pgc begins accepting input from stdin, the contents of each file in the list will be evaluated (including having variable bindings added to the environment).

Format of the configuration file
================================

The pgc configuration file allows a user to define a set of constants to be used by functions that make use of complicated and rarely-changed parameters. For example, the packing_eff function (available via the optional support for the libproteingeometry package) needs to know the locations of files containing definitions of atoms, residues, and standard volumes.
Rather than hard-coding the locations of these files into pgc itself, a user can leave them in the configuration file, where they will be accessible to any function that looks up the default values in the configuration table.

The configuration file is a series of newline-terminated strings in the following format:

    key = value

where key is a series of characters that does not contain unescaped whitespace, '=' characters, or '\' characters (any of these can be escaped by preceding it with a '\'), and value is a series of characters of any length with no restrictions on content, except that it is terminated by a newline. Any leading whitespace is stripped from value before it is stored in the configuration table. Each successive occurrence of a given key in the configuration file causes the previous value to be overwritten.

Language basics
***************

The language pgc interprets is a loosely functional language. As such, a 'program' written for pgc is an expression, and has a value. The kinds of expressions pgc knows how to evaluate are described below.

Expressions may be strung together into a list, each separated by a semicolon (in this way, the general syntax of pgc resembles that of various widely-used programming languages, such as C and Perl). The value of a list of semicolon-separated expressions is the value of the final expression in the list.

    1; 2; 3; 4 /* This expression has the value 4 */

Expressions may also be nested within parentheses -- evaluation follows a PEMDAS-style order-of-operations schema, so parenthesized sub-expressions will be evaluated first.

    4 * (1 + 2) /* Naturally, the value is 12, not 6 */

Comments
========

pgc supports a nested comment system with syntax similar to C. Comments begin with the character sequence '/*'; following this sequence, pgc will ignore everything up to and including the character sequence '*/'. Additionally, comments may be nested inside other comments.
pgc will keep track of the level of nesting, so that the closing sequence will only cause pgc to resume evaluation if it is closing the top level of nested comments. For example:

    4 * /* Comment 1 /* Comment 2 Close 2 */ Close 1 */ (1 + 2)
                                          ^          ^
                                          |          |
              Closes the second level of comments    |
                                     Closes the top level of comments

Data types
==========

There are six data types available to the user. They are:

nil: a type for representing nothing, or a null value. Typically, nil is used as the result type for functions useful purely for their side-effects, such as print ().

int: a type for representing signed integers; the size of this type is determined by the architecture and the C library under which pgc is compiled. (Typically, it is 32 bits.) ints may be written, as one might expect, as a string of the digits from 0 to 9.

float: a type for representing floating point numbers. The precision and size of this type are determined by the implementation of double-precision floating point numbers under the architecture and C library of the target platform.

string: a type for representing null-terminated sequences of characters. The kinds of characters that can be represented in a string are determined, to some extent, by the platform's architecture and C library implementation, but, at the very least, will include the low ASCII characters (i.e., those numbering from 0 to 127). You can denote a string by enclosing the desired character sequence in double-quote characters -- "hello", for example, is the string containing the sequence of characters 'h', 'e', 'l', 'l', and 'o'. To include the '"' character in your string, you must escape (prefix) it with the '\' (back-slash) character. To include a newline, use the sequence '\n'. To include the escape character '\' itself, use the sequence '\\'.

record: a type for representing amalgams of these six core pgc data types; records are pgc's answer to structs in C.
Every record has a record type (a string identifier describing its contents); a list of named, read-only variables called "public fields," each with a type from the list of core pgc data types; and a hidden value accessible only by pgc function code. Records cannot be declared explicitly by the user -- they must be generated and have their public fields initialized by pgc function code.

vector: a type for representing lists of other data types. The positions of data elements within a vector are fixed: That is, consecutive reads of the same position in a vector will always return the same value. Vectors also have a fixed size that is determined at the time of their creation. The elements in a vector must all have the same type -- though you should note that a vector's size does not have anything to do with its type, so it is perfectly acceptable to define a vector containing vectors of different sizes; in addition, the record type of a record does not have anything to do with pgc's recognizing it as a record, so it is also perfectly acceptable to define a vector of records that have different record types. The syntax for defining a vector is as follows:

    [item1, item2, item3, ...]

where item1, item2, etc. are the values with which to initialize the vector. The vector's type is initialized to the basic type of the first element.

Arithmetic operators
====================

The following binary arithmetic operators are available:

'+': Gives the sum of the number on the left with the number on the right. This operator is overloaded for both ints and floats, and for combinations of the two. If either number is a float, the result will be a float. The '+' operator is also overloaded for strings -- the result of applying this operator to a pair of strings will be the concatenation of the string on the right to the end of the string on the left.

'-': Gives the difference between the number on the left and the number on the right.
This operator is overloaded for both ints and floats, and for combinations of the two. If either number is a float, the result will be a float.

'*': Gives the product of the number on the left with the number on the right. This operator is overloaded for both ints and floats, and for combinations of the two. If either number is a float, the result will be a float.

'/': Gives the quotient of the division of the number on the left by the number on the right. If the number on the right is zero, then this operator will issue a division-by-zero error. Otherwise, the result will be a float.

'%': Gives the modulus of the number on the left by the number on the right. Both numbers must be ints or pgc will issue a type error.

'^': Gives the result of taking the number on the left to a power given by the number on the right. This operator is overloaded for both ints and floats, and for combinations of the two. If the number on the left is negative, and the number on the right is a non-integral value, pgc will issue an arithmetic error.

Logical operators
=================

The following binary logical operators are available:

'==': If the value on the left is equal to the value on the right, then the result of this operator is one (1). Otherwise, it is zero (0). Equality is determined as follows: Two strings are equal if they are equal by character-by-character comparison. Two

'!=': If the value on the left is not equal to the value on the right, then the result of this operator is one (1). Otherwise, it is zero (0).

'<': If the number on the left is less than the number on the right, then the result of this operator is one (1). Otherwise, it is zero (0).

'>': If the number on the left is greater than the number on the right, then the result of this operator is one (1). Otherwise, it is zero (0).

'<=': If the number on the left is less than or equal to the number on the right, then the result of this operator is one (1). Otherwise, it is zero (0).
'>=': If the number on the left is greater than or equal to the number on the right, then the result of this operator is one (1). Otherwise, it is zero (0).

Loops and conditionals
======================

Loops
-----

pgc provides two different types of loops: the do-while (or while-do) and the foreach. The syntax for do-while (and while-do) loops is as follows:

    do (expression_body) while (conditional_expression)

    while (conditional_expression) do (expression_body)

pgc cycles through the loop, evaluating expression_body until conditional_expression evaluates to zero (0). Naturally, if the type of conditional_expression is anything besides int, pgc will give a type error.

The syntax for a foreach loop is as follows:

    foreach var in (vector_expression) (expression_body)

To determine the value of a foreach loop expression, pgc iterates over the contents of vector_expression, binding each element in turn to var -- creating, in the process, a new environment -- and then evaluates expression_body within the context of this new environment. The type of vector_expression must be vector, or pgc will give a type error.

pgc uses the environment generated by evaluating the expression body of a loop in its next iteration through the loop, so it is possible to have loops generate cumulative values, such as sums over the contents of vectors (though this operation is already provided by the vector_sum function), like so:

    j = 0;
    k = [1, 2, 3];
    l = 0;
    while (l < 3) do (j = j + k[l]; l = l + 1)

Conditionals
------------

pgc supports two methods for conditional evaluation of expressions, the if construct and the if-else construct. The if construct has the following syntax:

    if (conditional_expression) (result_expression)

where conditional_expression is a value of type int. If conditional_expression evaluates to a non-zero value, the value of the if-expression is the value of result_expression (if conditional_expression is not of type int, pgc will issue a type error).
Otherwise, the value of the if-expression is nil.

The if-else construct has the following syntax:

    if (conditional_expression) (true_expression) else (false_expression)

where the behavior is the same as that of if, with the following exceptions: if the value of conditional_expression is zero, the value of the if-else construct is equal to the value of false_expression; and true_expression and false_expression must have the same type, or pgc will issue a type error.

Variable bindings
=================

Variables
---------

It is often convenient, if not necessary, to associate a name with a particular piece of data. pgc provides this functionality in the form of variable bindings.

If you are at all familiar with the C language, you may be accustomed to declaring your variables before you use them; in pgc, a variable is declared the first time you assign a value to it. If you use a variable in an expression before you've assigned a value to it, pgc will complain.

You may also be used to variables having a fixed type -- that is, that you must declare a variable to hold a certain type of data and that any values you assign to this variable must be of that type. In pgc, the types of variables may change: A variable's type is recorded the first time you assign a value to it; assigning a value of a different type will change the variable's type:

    a = 2;   /* a holds the value 2 and has type int */
    a = "a"  /* a now holds the string "a" and has type string */

All these changing types might be a problem if pgc weren't a pass-by-value language -- this means that the variables in a pgc expression are evaluated to retrieve their values before the expression itself is evaluated. So tricks like this:

    a = 1; b = [a, 2, 3]; a = 0;

do not change the values of the vector stored in b.

As mentioned above, once you have declared a variable, you can use it in an expression, but you can also use it as an expression:

    a = 3.141; a

is a perfectly valid expression. It has the value 3.141.
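
For readers who think in C, the two rules above (a variable's type follows its most recent assignment, and a vector receives copies of values rather than references to variables) can be mimicked with a small tagged structure. This is only an illustrative sketch: the names sketch_type, sketch_binding, sketch_bind_int, sketch_bind_string, and sketch_make_vector are hypothetical and are not taken from pgc's source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only; pgc's real definitions
   (see pgc.h) may look quite different. */
enum sketch_type { SKETCH_INT, SKETCH_STRING };

struct sketch_binding {
    enum sketch_type type;  /* type of the value currently bound */
    int int_value;          /* meaningful when type == SKETCH_INT */
    char *string_value;     /* owned copy when type == SKETCH_STRING, else NULL */
};

/* Bind an int: any previously bound string is released and the
   binding's type becomes SKETCH_INT. */
void sketch_bind_int (struct sketch_binding *b, int value)
{
    assert (b != NULL);
    free (b->string_value);
    b->string_value = NULL;
    b->type = SKETCH_INT;
    b->int_value = value;
}

/* Bind a string by copying it. Returns 0 on success, or -1 if memory
   runs out (in which case the old binding is left untouched). */
int sketch_bind_string (struct sketch_binding *b, const char *value)
{
    char *copy;

    assert (b != NULL && value != NULL);
    copy = malloc (strlen (value) + 1);
    if (copy == NULL)
        return -1;
    strcpy (copy, value);
    free (b->string_value);
    b->string_value = copy;
    b->type = SKETCH_STRING;
    return 0;
}

/* Mimic "b = [a, 2, 3]": the current value of a is copied into the
   vector, so later assignments to a do not reach it. */
void sketch_make_vector (const struct sketch_binding *a, int out[3])
{
    assert (a != NULL && a->type == SKETCH_INT);
    out[0] = a->int_value;
    out[1] = 2;
    out[2] = 3;
}
```

Binding 1 to a, building the vector, and then binding 0 to a leaves the vector's first element at 1, which is the behavior the pgc example above describes.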
The special 'result' variable
-----------------------------

Every pgc expression has a value. At certain times during evaluation, pgc will bind this value to a special variable named 'result'. pgc performs this binding when it encounters a semicolon, indicating the next expression in a sequence of expressions. So, for the expression:

    "hello"; "goodbye"

upon finding the semicolon between "hello" and "goodbye," pgc would bind the value of the previous expression in the sequence ("hello") to the result variable.

Using the result variable allows you to perform computations a line at a time from the keyboard, without having to worry about binding your results to variables. For example, you might type:

    4 + 4

(which has the int value 8) and then type:

    result + 2

(which will have the int value 10).

The result variable is also unique in that it is read-only. Attempting to explicitly assign a value to it will result in an error.

Subscripts and public fields
----------------------------

When complex values, such as vectors or records, are bound to variables, it is still necessary to access their simply-typed components: In other words, we need a way to get at a particular element of a vector or a record. To this end, pgc supports subscripts and public fields, methods for retrieving values from, respectively, vectors and records. The syntax for subscripts is as follows:

    x[n]

where x is a vector and n is an integer between 0 and one less than the length of the vector, inclusive. For a vector of size 4, for example, the integers 0 through 3 are valid subscripts: 0 refers to the first element, 3 to the fourth.

Subscripts (and field references) can only be applied to vectors that have been bound to variables. So:

    ([1,2,3])[0]

does not evaluate to 1 -- it's an error. Instead, write:

    a = [1,2,3]; a[0]

or even

    [1,2,3]; result[0]

Keep in mind, however, that

    [1,2,3]; result[0] = 4

is illegal, because of result's read-only property.
(And you'd never want to write that anyway, since the result of "result[0] = 4" would be nil, and you'd have no way of ever seeing your changes to the vector -- or the vector itself.)

The syntax for a public field reference is as follows:

    x.fieldname

where x is a record and fieldname is the name of a field listed in the record's type definition in the record table -- pgc will complain if it can't find a definition for the record's record type, or if fieldname is not one of the record's named public fields.

The rule for vectors mentioned above also holds for records. That is:

    (record_function ()).fieldname

is illegal, even if the function record_function () returns a record value with a record type containing a field named fieldname. Instead, write:

    a = record_function (); a.fieldname

Since a record is a read-only data type, you may not re-assign the values of its public fields.

Functions
=========

Much as in popular languages like C, functions in pgc are blocks of pre-written code that compute a value based on a list of zero or more argument values. Every pgc function has an entry in a table of function definitions. This entry consists of a name, the type of value the function returns, and a list of the types of the arguments the function requires. For a single function, these details are fixed -- unlike functions in C, pgc functions may not take a variable-length list of arguments, and the basic return type is always the same (though the vector type or record type of, respectively, a returned vector or record may be different between two calls to the same function, depending on how the function is written).

Syntax for function call expressions is as follows:

    fun (arg1, arg2, arg3, ...)

where fun is the name of the function and arg1, arg2, arg3, etc. are arguments of the type and number specified in the function definition.
Refer to the sections "pgc built-in functions" and "Current pgc library interfaces" for information on the functions available to users.

Note that the namespaces used by functions, variables, and record types are all distinct, so it is perfectly all right to declare a variable that has the same name as a function or record type -- pgc will figure out which one you mean based on the way you use it.

Errors
======

When pgc detects an error in an expression during evaluation, it responds by displaying an error message and setting the value of the current expression to nil. Note that this behavior may trigger further errors, as in the following example:

    4 + (5 + "2")

pgc will report two type errors during the evaluation of this expression -- first when it tries to compute the sum of an int (5) with a string ("2"), then when it tries to compute the sum of an int (4) with nil, the result of the error in the previous step.

A list of the errors generated by pgc and a brief description of their meaning follows:

* lexical error: During initial lexical analysis, pgc encountered a character or combination of characters that it didn't recognize

* unexpected end of input: During lexical analysis, pgc encountered an unclosed comment or string -- that is, the input ended before pgc read a closing '*/' or '"'

* syntax error: pgc couldn't parse an expression into an acceptable form -- usually caused by mismatched parentheses or a missing operator

* type error: pgc encountered an expression containing the wrong type of data for a particular operator or construct.
You tried to use the '+' operator to sum two nils, for example, or an int and a vector

* operator cannot compare this type: pgc found an expression requesting a comparison of a type that it does not know how to compare -- the use of the '>' operator to compare two strings, for example

* divide by zero: Self-explanatory -- the right-hand side of a division expression evaluated to 0 (or 0.0)

* modulo zero: Self-explanatory -- the right-hand side of a modulus expression evaluated to 0

* result is imaginary (NaN): The result of evaluating an expression containing an exponent resulted in the floating point value "nan," or Not a Number. This occurs most often when you raise a negative value to a non-integral power

* wrong type for vector: pgc noticed an attempt to assign a value to a position in a vector (during declaration or afterwards) in which the type of the value and the vector type of the vector were not the same

* subscript out of range for vector: pgc noticed an attempt, by way of a subscript, to refer to a value in a vector outside of the range (either less than zero or too large) of the vector length

* subscript expression for vector must be of type int: Self-explanatory -- the type of the expression in the subscript field did not evaluate to int

* record fields are read-only: pgc noticed an attempt to assign a value to one of the public fields in a record

* no field in definition for record type: Either pgc could not find the field you referred to in the record type definition or it could not even find the record type definition of the parent record in the record table

* unbound identifier: You referred to a variable without first assigning a value to it

* wrong argument type for function: One of the arguments passed to a function as part of a function expression was not consistent with the prototype of the function in the function table

* wrong number of arguments for function: You passed the wrong number of arguments, either too few or too many, to a
function as part of a function expression

* undefined function: pgc could not find the name of the function you referred to in the function table

* cannot assign to result: pgc detected an attempt to assign a value to the special, read-only result variable

* unknown error: A rare beastie -- one of pgc's internal pieces reported an error, but it wasn't one that the error handler understood. Probably cause for a bug report...

pgc built-in functions
**********************

describe
========

Return type: nil
Argument format: describe (string record_type_name)
Description: Given a string corresponding to the name of a record type present in the record table, describe () prints a list of the public fields defined for that record type. If the record type has no public fields, or if the record type is not defined, the message printed by describe () will say so.

float
=====

Return type: float
Argument format: float (int int_value)
Description: float () converts the integer value given in int_value to a pgc floating point value.

float_to_string
===============

Return type: string
Argument format: float_to_string (float float_value)
Description: float_to_string () returns a string corresponding to the floating point value given by float_value. The conversion from floating point value to string is accomplished via the printf function in the C standard library, so the result may be "inf" or "nan" or any of the possible conversion results specified for that function.

int
===

Return type: int
Argument format: int (float float_value)
Description: int () converts the floating point value given in float_value to a pgc integer value.

int_to_string
=============

Return type: string
Argument format: int_to_string (int int_value)
Description: int_to_string () returns a string corresponding to the integer value given by int_value.
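
The printf-based conversion described for float_to_string () can be sketched in C as follows. This is a minimal illustration, not pgc's actual code: the function names are hypothetical, and the "%f" and "%d" conversion specifiers are assumptions, since this manual does not say which format pgc passes to printf.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format a double into freshly allocated memory.
   The "%f" specifier is an assumption, not taken from pgc. */
char *sketch_float_to_string (double value)
{
    int length;
    char *buffer;

    /* With a NULL buffer and size 0, snprintf only reports the
       number of characters the conversion needs. */
    length = snprintf (NULL, 0, "%f", value);
    if (length < 0)
        return NULL;
    buffer = malloc ((size_t) length + 1);
    if (buffer == NULL)
        return NULL;
    snprintf (buffer, (size_t) length + 1, "%f", value);
    assert (strlen (buffer) == (size_t) length);
    return buffer;
}

/* Hypothetical helper: the same approach for ints, using "%d". */
char *sketch_int_to_string (int value)
{
    int length;
    char *buffer;

    length = snprintf (NULL, 0, "%d", value);
    if (length < 0)
        return NULL;
    buffer = malloc ((size_t) length + 1);
    if (buffer == NULL)
        return NULL;
    snprintf (buffer, (size_t) length + 1, "%d", value);
    assert (strlen (buffer) == (size_t) length);
    return buffer;
}
```

Because the formatting is delegated to the C library, an infinite input comes back as a string beginning with "inf" and a NaN as some spelling of "nan"; the exact text depends on the platform's printf.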
print
=====

Return type: nil
Argument format: print (string print_string)
Description: print () displays the contents of the string given by print_string on stdout. Note that print () appends a newline to print_string before printing it; this is so stdout may be flushed properly before the next expression is evaluated.

record_type
===========

Return type: string
Argument format: record_type (record input_record)
Description: record_type () returns a string corresponding to the record type of the record given by input_record. Note that just because a record has a type (and all records *should* have a type), it doesn't mean that this type has a definition in the record table. See the "Inside pgc" section for more information.

vector_sum
==========

Return type: float
Argument format: vector_sum (vector num_vector)
Description: If the vector type of the vector num_vector is int or float, vector_sum () returns a floating point value corresponding to the sum of the values in the vector. If the vector type is not a numerical type, vector_sum () prints an error message and returns the floating point value "NaN" (Not a Number).

vector_type
===========

Return type: string
Argument format: vector_type (vector input_vector)
Description: vector_type () returns a string corresponding to the vector type of the vector given by input_vector.

version
=======

Return type: string
Argument format: version ()
Description: version () returns a string corresponding to the version number of the currently executing pgc interpreter.

Current pgc library interfaces
******************************

libproteingeometry
==================

pgc provides interfaces for the libproteingeometry package written by Dr. Mark Gerstein at Yale University. These functions perform various calculations related to macromolecular motion and geometry. See http://www.molmovdb.org/geometry/ for more information on this package.
read_pdb
--------

Return type: record
Argument format: read_pdb (string pdb_filename)
Description: read_pdb () reads the pdb file named by pdb_filename. If successful, read_pdb () returns a record of type "pdb_record" containing the data from the pdb file. If the file could not be opened or read, or was not in the correct format, read_pdb () returns a record of type "null".

volume
------

Return type: vector
Argument format: volume (record input_pdb)
Description: volume () uses the Voronoi method to calculate the volumes of the set of atoms described by the record input_pdb -- input_pdb must have the record type "pdb_record," or volume () will complain. On success, it returns a vector whose vector type is vector -- each sub-vector of the return value corresponds to a residue from the pdb file represented by the record, in the same order that they appear in the file -- the first vector in the return value corresponds to the residue in the pdb file with residue number 1, and the first element in this vector is the Voronoi volume of the first atom in this residue. On error, the function returns a zero-element vector.

surface
-------

Return type: vector
Argument format: surface (record input_pdb)
Description: surface () returns a vector of sub-vectors in the same format as the volume () function above, in which each element represents the surface area, in square angstroms, of an atom from the pdb file read into input_pdb. input_pdb must have record type "pdb_record" or surface () will print an error and return a zero-element vector.

packing_eff
-----------

Return type: vector
Argument format: packing_eff (record input_pdb)
Description: Like the two functions above, packing_eff () returns a vector of vectors, each corresponding to a residue in the pdb file represented by input_pdb. Each element in one of these sub-vectors will be the ratio of atom volume computed by the volume () function to the reference volume.
packing_eff () requires that the keys "types-file", "residues-file", and "stdvols-file" be assigned, in the configuration file, values corresponding, respectively, to the full paths for an atom types definition file, a residue type definition file, and a standard volumes file. See the documentation for libproteingeometry for more information. input_pdb must have record type "pdb_record" or packing_eff () will print an error and return a zero-element vector.

packing_eff_advanced
--------------------

Return type: vector
Argument format: packing_eff_advanced (string atom_defs_filename, string residue_defs_filename, string standard_volumes_filename, record input_pdb)
Description: packing_eff_advanced () is identical to the packing_eff () function described above, except that it allows the user to specify filenames, in the form of string values, for, respectively, the atom types definition file, the residue type definition file, and the standard volumes file.

LSQRMS
======

pgc provides interfaces for the LSQRMS package written by Vadim Alexandrov. The following interfaces are provided:

lsqrms
------

Return type: float
Argument format: lsqrms (string query_filename, string target_filename)
Description: lsqrms () returns a floating point value corresponding to the minimum distance between the structures specified in the pdb file indicated by query_filename and those in the file indicated by target_filename, under least-squares moving structure fitting.

Inside pgc
**********

Garbage collection
==================

pgc allocates relatively large quantities of memory, especially during its type-checking / evaluation phase. To make sure its memory resources are used efficiently, pgc implements a simple garbage collection scheme based on reference counting. Every time pgc generates a new environment, as the result of evaluating an expression, the variable bindings in the new environment are initialized as pointers to the bindings from the old environment.
The actual data structure used for storing a variable binding includes a reference count that is incremented whenever the binding is copied and decremented whenever a copy goes out of scope -- this can happen when the variable is bound to a new value or the environment is deleted.

Hash tables
===========

Several internal data structures in pgc are maintained as hash tables. The hash function used to index these tables was taken from:

[JENK97] Bob Jenkins. "Algorithm Alley: Hash Functions". Dr. Dobb's Journal. September 1997.

Extending pgc
*************

pgc was written with the expectation that its users would want to add functionality to the code.

Adding new functions
====================

To add a new function to pgc, you're going to have to modify a few files. Here are the ones that most need your attention:

pgc_function.c: To the body of the function init_function_table (), add a call to the function add_function (), which has the following prototype:

    void add_function (char *function_name, void *function_pointer, int return_type, int arg_count, ...)

Where:

    function_name is the name of your new function,
    function_pointer is a pointer to your function,
    return_type is a member of the enumeration pgc_types (defined in pgc.h),
    arg_count is the number of arguments your function takes.

Following arg_count, you must pass add_function a list of types for the arguments to your function. These types must be specified as enumeration values (again, from pgc_types in pgc.h). This adds a prototype for your function to the function table and makes it available to pgc. For example:

    pgc_add_function ("print", pgc_internal_print, PGC_NIL, 1, PGC_STRING);

creates a prototype for a function named "print", whose interface function is pgc_internal_print (). print () returns nil and takes a single argument, a string.
For more examples, see the init_function_table () code in pgc_function.c.

pgc_function_interface.h: If you have added a file containing prototypes for your new functions, you must #include it here. Your function must be prototyped in the following way:

    void *pgc_interface_mylibrary_myfunction (void **args);

(The naming scheme pgc_interface_LIBRARYNAME_FUNCTIONNAME is suggested to avoid namespace collisions.)

pgc's function-handling code guarantees that, if your function actually gets called, it will be passed the correct number of arguments and that the arguments will be of the correct type. You will have to cast them into the types you are expecting, like so:

    void * -> int *                 for PGC_INT
    void * -> double *              for PGC_FLOAT
    void * -> struct pgc_record *   for PGC_RECORD
    void * -> char *                for PGC_STRING
    void * -> struct pgc_vector *   for PGC_VECTOR

PGC_NIL is passed as a void pointer to NULL.

In return for this guarantee, you must in turn guarantee that your function returns a void * to heap memory -- that means you have to malloc the memory you use to store the result of your function. So constructs like "return 4" are out of the question. Instead, you have to do this:

    int *result = malloc (sizeof (int));
    *result = 4;
    return (void *) result;

pgc will free the memory used by this result when the value goes out of scope, so memory leaks are not a worry.

Some notes about special pgc data types:

PGC_RECORD: The public fields available in a record live in the array of void pointers named "public_fields." To accurately dereference these, you must either look up the record definition in the record_table to find out the types of these public fields or simply know what the types of these fields are. In addition to the read-only public fields provided by a record, pgc provides a special field in the pgc_record struct that can be used to store any type of data; it is called "struct_data."
pgc makes no guarantees about this field; functions can trash it with impunity, and the fact that a record has a particular type doesn't mean the struct_data can be safely dereferenced to any particular type. In addition, when a record is copied, this field of the struct is copied using memcpy () in C -- if it contains complicated pointer relationships, they may get broken in the process. However, you can use this field of pgc_record to store any actual struct data that your functions need to work with, so it is somewhat useful, especially for storing data too complicated to be represented by pgc's basic types.

PGC_VECTOR: Within the struct pgc_vector, the field vector_type holds the type (via the pgc_types enumeration) of every element in the vector. The field vector_length holds the number of elements in the vector. Finally, the array of void pointers named vector holds the actual values in the vector. pgc guarantees that all of these fields will be consistent: that is, the vector field actually does contain vector_length elements, and each one of these may be safely dereferenced as the type given by vector_type.

Two caveats: Firstly, for vectors with vector type PGC_RECORD, pgc only promises that all the elements in the vector have type PGC_RECORD, not that the record type is the same for all elements. Secondly, for vectors with vector type PGC_VECTOR, pgc only promises that all the elements are vectors; it makes no guarantees about the types or lengths of these sub-vectors.

Linking with external libraries
===============================

In many situations, you will want pgc to provide support for programs or libraries that may not be installed on other users' systems. If you plan on distributing your changes to pgc, you will probably want to provide some facility for helping pgc figure out whether or not a particular piece of software is installed on a target system.
pgc uses the GNU packages autoconf and automake for these determinations; a complete description of this software is beyond the scope of this manual, but the cut-and-paste solutions below may be of use to you.

To test for the existence of a library, use the following code snippets:

To configure.ac, add:

    AC_CHECK_LIB([libraryname], [libraryfunction],
                 [AC_DEFINE([HAVE_LIBRARY], [1], [library description])
                  AM_CONDITIONAL([LIBRARY], [true])],
                 [AM_CONDITIONAL([LIBRARY], [false])])

where libraryname is the actual filename of your library -- remove any file extension, the directory prefix (typically "/usr/lib/"), and the "lib" prefix. So, for the C math library, /usr/lib/libm.so, libraryname would be "m"; libraryfunction is the name of a unique public function in your library, so the configure script can verify that the library in question is the correct one; LIBRARY is a variable name of your own choosing to be used by automake for doing conditional compilation; and library description is a description of the library, to be included for human-readable purposes in the config.h file generated by the "configure" script.

To src/Makefile.am, add:

    if LIBRARY
    pgc_SOURCES += pgc_function_interface_YOURSOURCECODENAME.c
    pgc_LDADD += -llibraryname
    endif

where YOURSOURCECODENAME is the suffix of the new source code file (if any) to be included if the library is found by the configuration script -- it is recommended that you use the "pgc_function_interface_" prefix for the sake of organizational clarity.

Testing for the existence of a program is much the same, even though it looks more complicated.
Use the following code snippets:

To configure.ac, add:

    AC_PATH_PROG([program_path_variable], [program_filename], [no])
    if test "$program_path_variable" != no; then
      AC_DEFINE_UNQUOTED([PGC_PROGRAM_FULL_PATH], ["$program_path_variable"], [full path to program])
      AM_CONDITIONAL([PROGRAM], [true])
      AC_DEFINE([HAVE_PROGRAM], [1], [lsqrms])
    else
      AM_CONDITIONAL([PROGRAM], [false])
    fi

where program_path_variable is a variable name of your own choosing to be used to store the full path to the program; program_filename is the simple filename of the program, with any directory prefix (e.g., "/usr/local/bin/") removed; PGC_PROGRAM_FULL_PATH is a variable name of your own choosing that will be #defined for you in the config.h file generated by the "configure" script -- you can use it for making exec () calls to the program within your new functions; PROGRAM is a variable name of your own choosing to be used by automake for doing conditional compilation; and HAVE_PROGRAM is another variable of your own choosing that will be #defined in config.h if the program is found successfully.

To src/Makefile.am, add:

    if PROGRAM
    pgc_SOURCES += pgc_function_interface_YOURSOURCECODENAME.c
    endif
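
On the C side, the macros defined by the configure script can guard the new interface code. The following is a minimal sketch, assuming the placeholder macro names HAVE_PROGRAM and PGC_PROGRAM_FULL_PATH from the snippet above. The function name pgc_interface_program_command and its behaviour (building a command line from two string arguments) are hypothetical, as is returning NULL when allocation fails; the fallback #defines exist only so that the sketch compiles without a generated config.h.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* config.h, generated by configure, normally supplies these macros.
   The fallback values are placeholders so the sketch compiles alone. */
#ifndef PGC_PROGRAM_FULL_PATH
#define PGC_PROGRAM_FULL_PATH "/usr/local/bin/program_filename"
#endif
#ifndef HAVE_PROGRAM
#define HAVE_PROGRAM 1
#endif

#ifdef HAVE_PROGRAM
/* Hypothetical interface function taking two PGC_STRING arguments.
   It returns a heap-allocated string holding the command line that a
   later exec () call could use; the result is malloc'd because interface
   functions must return heap memory. */
void *
pgc_interface_program_command (void **args)
{
  const char *query = (const char *) args[0];
  const char *target = (const char *) args[1];
  size_t size;
  char *command;

  assert (query != NULL && target != NULL);

  /* Path, two arguments, two separating spaces, and the terminating NUL. */
  size = strlen (PGC_PROGRAM_FULL_PATH) + strlen (query) + strlen (target) + 3;
  command = malloc (size);
  if (command == NULL)
    return NULL; /* assumption: how pgc treats a NULL result is not documented here */

  snprintf (command, size, "%s %s %s", PGC_PROGRAM_FULL_PATH, query, target);
  return command;
}
#endif /* HAVE_PROGRAM */
```

Because the whole definition sits inside #ifdef HAVE_PROGRAM, a build on a system without the program simply omits the function, which matches the conditional pgc_SOURCES entry in src/Makefile.am.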