pgc 1.0b Reference Manual                                      by Julian Graham

Introduction
************

	pgc, the Protein Geometry Calculator, is an interpreter for a simple,
expression-based language, resembling GNU bc in its interface, and intended for
use in performing calculations related to molecular geometry.  While it does
not, in itself, provide facilities for performing such calculations, in its
current form it provides interfaces for software packages that do; its 
structure allows data from these packages to be imported, manipulated, and 
exported for the use of other packages.  This manual documents the syntax of 
the language, the library of functions available to users, and methods for 
extending this library.


Table of contents
*****************

	* Building and installing pgc
	* Using pgc
		* Invocation from the command prompt
		* Format of the configuration file
	* Language basics
		* Comments
		* Data types
			* Nil
			* Numerical types: int, float
			* Records
			* Strings
			* Vectors
		* Arithmetic operators
		* Logical operators
		* Loops and conditionals
		* Variable bindings
			* Variables
			* The special 'result' variable
			* Subscripts and public fields
		* Functions
		* Errors
	* pgc built-in functions
		* describe
		* float
		* float_to_string
		* int
		* int_to_string
		* print
		* record_type
		* vector_sum
		* vector_type
		* version
	* Current pgc library interfaces
		* libproteingeometry
		* LSQRMS
	* Inside pgc
		* Garbage collection
		* Hash tables
	* Extending pgc
		* Adding new functions
		* Linking with external libraries


Building and installing pgc
***************************

	The system requirements for pgc are as follows:
	
	- GNU make
	- A working C compiler (preferably GNU gcc)
	- An sh-compatible shell
	- Zero or more of pgc's supported external packages:
		- libproteingeometry 2.0
		- lsqrms 4.0.3

	pgc is intended to be installed using the included autoconf build 
system: If you're installing on one of autoconf's supported systems (and,
chances are, you are -- it supports most UNIX-like platforms and even some
not-very-much-like-UNIX platforms), all you have to do is run the "configure"
script in the top level of the source code tree.  For each software package
supported by pgc, "configure" will determine whether or not it is already
installed; if it is, the script will properly configure the build process to
include pgc's interface for it.  Following this, run "make", which will compile
and link pgc, and then run "make install", which will install pgc in the 
standard location for your system (on most Unices, this will be something like
/usr/local/bin).  Depending on your system type, you may need to have
administrative privileges to perform that last step.


Using pgc
*********

Invocation from the command prompt
==================================

	The format of command line arguments to pgc is defined as follows:

	pgc [-c (--config) conf_filename] [-h (--help)] [source_filename] ...

The -c option allows the user to specify an alternate configuration file -- the
default configuration file is pgc.conf.  Only one configuration file may be 
specified, so each successive -c option overwrites the name of the 
configuration file.  The -h option causes pgc to print out a brief help
message and exit -- no further processing will be done.  Command line arguments
not preceded by -c or -h will be interpreted as the names of files containing
stored pgc expression lists.  A list of these files will be compiled during
command line argument parsing.  Before pgc begins accepting input from stdin, 
the contents of each file in the list will be evaluated (including having 
variable bindings added to the environment).
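
	The argument rules above (the last -c wins, -h stops processing,
everything else is a source file kept in command-line order) can be sketched in
C as follows.  This is a hypothetical illustration, not pgc's actual parser: the
options struct, the parse_args name, and the 64-file limit are assumptions.  The
sketch only records that -h was seen; a caller would then print the help
message and exit before evaluating anything.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option record; pgc's real one is not documented here. */
typedef struct {
    const char *config;      /* configuration file; the last -c wins */
    int help;                /* nonzero if -h or --help was given */
    const char *files[64];   /* source files, in command-line order */
    size_t nfiles;
} options;

/* Returns 0 on success, -1 if -c lacks a filename or too many files. */
int parse_args(int argc, char **argv, options *o)
{
    int i;

    o->config = "pgc.conf";  /* documented default */
    o->help = 0;
    o->nfiles = 0;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-c") == 0 || strcmp(argv[i], "--config") == 0) {
            if (++i >= argc)
                return -1;           /* -c with no filename after it */
            o->config = argv[i];     /* overwrite any earlier -c */
        } else if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
            o->help = 1;             /* caller prints help and exits */
        } else {
            if (o->nfiles == sizeof o->files / sizeof o->files[0])
                return -1;
            o->files[o->nfiles++] = argv[i];  /* evaluated before stdin */
        }
    }
    return 0;
}
```

For example, "pgc -c a.conf x.pgc --config b.conf y.pgc" would leave b.conf as
the configuration file and queue x.pgc and y.pgc, in that order.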


Format of the configuration file
================================

	The pgc configuration file allows a user to define a set of constants
to be used by functions that make use of complicated and rarely-changed 
parameters.  For example, the packing_eff function (available via the optional
support for the libproteingeometry package) needs to know the locations of
files containing definitions of atoms, residues, and standard volumes.  Rather
than hard-coding the locations of these files into pgc itself, a user can
leave them in the configuration file, where they will be accessible to any
function that looks up the default values in the configuration table.
	The configuration file is a series of new-line terminated strings in
the following format:

	key = value

where key is a sequence of characters containing no unescaped whitespace, '='
or '\' characters (any of these may be included in a key by preceding it with a
'\'), and value is a sequence of characters of any length and content,
terminated by a newline.  Any leading whitespace is stripped from value before
it is stored in the configuration table.  If a key appears more than once in
the configuration file, each occurrence overwrites the previous value.
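
	As an illustration of the key and value rules above, here is a minimal
C sketch that splits a single configuration line.  It is not taken from pgc's
source: the function name parse_config_line, the fixed-size output buffers, and
the decision to reject a line with no '=' are assumptions.  Storing the result
so that a repeated key overwrites the earlier one is left to the caller.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not pgc's actual parser): split one configuration
 * line of the form "key = value".  key and value must each hold cap bytes.
 * Returns 1 on success, 0 if the line is malformed or too long. */
int parse_config_line(const char *line, char *key, char *value, size_t cap)
{
    const char *p = line;
    size_t k = 0, len;

    /* Key: ends at unescaped whitespace or '='; a backslash escapes the
     * character that follows it. */
    while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p)) {
        if (*p == '\\') {
            p++;                          /* keep the escaped character */
            if (*p == '\0' || *p == '\n')
                return 0;                 /* backslash at end of line */
        }
        if (k + 1 >= cap)
            return 0;                     /* key too long for buffer */
        key[k++] = *p++;
    }
    if (k == 0)
        return 0;                         /* empty key */
    key[k] = '\0';

    /* Skip blanks before the '=' separator. */
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p != '=')
        return 0;
    p++;

    /* Leading whitespace is stripped from the value. */
    while (*p == ' ' || *p == '\t')
        p++;

    /* Value: everything up to (not including) the terminating newline. */
    len = strcspn(p, "\n");
    if (len >= cap)
        return 0;
    memcpy(value, p, len);
    value[len] = '\0';
    return 1;
}
```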


Language basics
***************

	The language pgc interprets is a loosely functional language.  As such,
a 'program' written for pgc is an expression, and has a value.  The kinds of
expressions pgc knows how to evaluate are described below.  Expressions may be
strung together into a list, each separated by a semicolon (in this way, the
general syntax of pgc resembles that of various widely-used programming
languages, such as C and Perl).  The value of a list of semicolon-separated
expressions is the value of the final expression in the list.  

	1; 2; 3; 4	/* This expression has the value 4 */

Expressions may also be nested within parentheses -- evaluation follows a 
PEMDAS-style order-of-operations schema, so parenthesized sub-expressions will 
be evaluated first.

	4 * (1 + 2)	/* Naturally, the value is 12, not 6 */


Comments
========

	pgc supports a nested comment system with syntax similar to C.  
Comments begin with the character sequence '/*'; following this sequence, pgc 
will ignore everything up to and including the character sequence '*/'. 
Additionally, comments may be nested inside other comments.  pgc will keep
track of the level of nesting, so that the closing sequence will only cause
pgc to resume evaluation if it is closing the top level of nested comments.
For example:

	4 * /* Comment 1 /* Comment 2 Close 2 */ Close 1 */ (1 + 2)
                                               ^          ^
                                               |          |
                                 Closes the second level of comments
                                                          |
                                          Closes the top level of comments
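
	The nesting rule can be implemented with a single depth counter.  The C
sketch below is illustrative only (the function name skip_nested_comment is
hypothetical, and pgc's lexer may work differently): it is called just after an
opening '/*' has been read, and returns NULL when the input ends first, which
corresponds to pgc's "unexpected end of input" error.

```c
#include <stddef.h>

/* Hypothetical lexer helper.  p points just past an opening comment
 * delimiter (slash, star).  Returns a pointer just past the matching
 * top-level closing delimiter, or NULL if the input ends first. */
const char *skip_nested_comment(const char *p)
{
    int depth = 1;                  /* already inside one comment */

    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '*') {
            depth++;                /* a nested comment opens */
            p += 2;
        } else if (p[0] == '*' && p[1] == '/') {
            depth--;                /* the innermost open comment closes */
            p += 2;
            if (depth == 0)
                return p;           /* top level closed: resume lexing here */
        } else {
            p++;
        }
    }
    return NULL;                    /* unclosed comment */
}
```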


Data types
==========

There are six data types available to the user.  They are:

nil: a type for representing nothing, or a null value.  Typically, nil is used
  as the result type for functions useful purely for their side-effects, such 
  as print ().

int: a type for representing signed integers; the size of this type is
  determined by the architecture and the C library under which pgc is compiled.
  (Typically, it is 32 bits.)  ints may be written, as one might expect, as a
  sequence of the decimal digits 0 through 9.

float: a type for representing floating point numbers.  The precision and size
  of this type are determined by the implementation of double-precision
  floating point numbers under the architecture and C library of the target
  platform.

string: a type for representing null-terminated sequences of characters.  The
  kinds of characters that can be represented in a string are determined, to
  some extent, by the platform's architecture and C library implementation,
  but they will at least include the low ASCII characters (i.e., those
  numbered from 0 to 127).  You can denote a string by enclosing the desired
  character sequence in double-quote characters: "hello", for example, is the
  string containing the sequence of characters 'h', 'e', 'l', 'l', and 'o'.  To
  include the '"' character in your string, you must escape (prefix) it with
  the '\' (back-slash) character.  To include a newline, use the sequence '\n'.
  To include the escape character '\' itself, use the sequence '\\'.

record: a type for representing amalgams of these six core pgc data types;
  records are pgc's answer to structs in C.  Every record has a record type, a
  string identifier describing its contents; a list of named,
  read-only variables called "public fields," each with a type from the list
  of core pgc data types; and a hidden value accessible only by pgc function
  code.  Records cannot be declared explicitly by the user -- they must be
  generated and have their public fields initialized by pgc function code.  

vector: a type for representing lists of other data types.  The positions of
  data elements within a vector are fixed: That is, consecutive reads of the
  same position in a vector will return the same value unless that position
  has been assigned to in between.  Vectors also have a fixed size that is
  determined at the time of their creation.  The elements in a vector must
  all have the same type, though you should note that vector size has nothing
  to do with type, so it is perfectly acceptable to define a vector containing
  vectors of different sizes.  Likewise, a record's record type has nothing to
  do with pgc's recognizing it as a record, so it is also perfectly acceptable
  to define a vector of records that have different record types.  The syntax
  for defining a vector is as follows:

	[item1, item2, item3, ...]

  where item1, item2, etc. are the values with which to initialize the vector.
  The vector's type is initialized to the basic type of the first element.
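
	One plausible way to represent these types in C is a tagged union, with
each vector recording its element type when it is created.  The sketch below
is hypothetical (pgc's real structures are not described in this manual, and
the record payload is omitted); it shows that only the basic type of each
element is checked, so vectors of different lengths can share a parent vector.
Treating an empty vector's type as nil is an assumption.

```c
#include <stdlib.h>

/* Hypothetical value layout; pgc's real structures are not documented here. */
typedef enum { T_NIL, T_INT, T_FLOAT, T_STRING, T_RECORD, T_VECTOR } pgc_type;

typedef struct value value;

typedef struct {
    pgc_type elem_type;   /* the "vector type": basic type of element 0 */
    size_t length;        /* fixed when the vector is created */
    value *items;
} vector;

struct value {
    pgc_type type;
    union {
        long i;
        double f;
        const char *s;
        vector *v;        /* record payload omitted from this sketch */
    } u;
};

/* Build a vector from n initializers.  Returns NULL (pgc's "wrong type for
 * vector" case) if any element's basic type differs from the first one's.
 * Lengths of nested vectors are never compared. */
vector *make_vector(const value *init, size_t n)
{
    vector *vec;
    size_t i;

    for (i = 1; i < n; i++)
        if (init[i].type != init[0].type)
            return NULL;

    vec = malloc(sizeof *vec);
    if (vec == NULL)
        return NULL;
    vec->elem_type = n > 0 ? init[0].type : T_NIL;   /* empty: assumed nil */
    vec->length = n;
    vec->items = n > 0 ? malloc(n * sizeof *vec->items) : NULL;
    if (n > 0 && vec->items == NULL) {
        free(vec);
        return NULL;
    }
    for (i = 0; i < n; i++)
        vec->items[i] = init[i];    /* shallow copy of each element */
    return vec;
}
```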


Arithmetic operators
====================

The following binary arithmetic operators are available:

'+': Gives the sum of the number on the left with the number on the right.
  This operator is overloaded for both ints and floats, and for combinations
  of the two.  If either number is a float, the result will be a float.  The
  '+' operator is also overloaded for strings -- the result of applying this
  operator to a pair of strings will be the concatenation of the string on the
  right to the end of the string on the left.

'-': Gives the difference between the number on the left and the number on the
  right.  This operator is overloaded for both ints and floats, and for 
  combinations of the two.  If either number is a float, the result will be a 
  float.

'*': Gives the product of the number on the left with the number on the right.
  This operator is overloaded for both ints and floats, and for combinations
  of the two.  If either number is a float, the result will be a float.

'/': Gives the quotient of the division of the number on the left by the number
  on the right.  If the number on the right is zero, then this operator will
  issue a division-by-zero error.  Otherwise, the result will be a float.

'%': Gives the modulus of the number on the left by the number on the right.
  Both numbers must be ints or pgc will issue a type error.  If the number on
  the right is zero, pgc will issue a modulo-zero error.

'^': Gives the result of taking the number on the left to a power given by the
  number on the right.  This operator is overloaded for both ints and floats, 
  and for combinations of the two.  If the number on the left is negative, and
  the number on the right is a non-integral value, pgc will issue a "result is
  imaginary (NaN)" error.
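
	The type rules for the operators above can be summarized in a short C
sketch.  It is a simplified, hypothetical model (the num struct, the arith
function, and the pgc_error_count counter are illustrative names; strings and
'^' are left out) that follows the stated rules: int with int stays int for
'+', '-' and '*', any float operand gives a float, '/' always gives a float,
and '%' accepts only ints.  Each error yields nil, so a nested error is
reported twice, as described in the "Errors" section.

```c
#include <stdio.h>

/* Simplified stand-in for pgc values: only nil, int, and float. */
typedef enum { V_NIL, V_INT, V_FLOAT } num_tag;
typedef struct { num_tag tag; long i; double f; } num;

int pgc_error_count = 0;   /* counts reported errors, for illustration */

static num report(const char *msg)
{
    num nil = { V_NIL, 0, 0.0 };
    fprintf(stderr, "error: %s\n", msg);
    pgc_error_count++;
    return nil;            /* an erroneous expression evaluates to nil */
}

static double as_double(num x) { return x.tag == V_INT ? (double)x.i : x.f; }

/* Apply one of '+', '-', '*', '/', '%' ('^' is left out of this sketch). */
num arith(char op, num a, num b)
{
    num r = { V_NIL, 0, 0.0 };

    if (a.tag == V_NIL || b.tag == V_NIL)
        return report("type error");

    switch (op) {
    case '+': case '-': case '*':
        if (a.tag == V_INT && b.tag == V_INT) {      /* int op int -> int */
            r.tag = V_INT;
            r.i = op == '+' ? a.i + b.i : op == '-' ? a.i - b.i : a.i * b.i;
        } else {                                     /* any float -> float */
            double x = as_double(a), y = as_double(b);
            r.tag = V_FLOAT;
            r.f = op == '+' ? x + y : op == '-' ? x - y : x * y;
        }
        return r;
    case '/':                                        /* always a float */
        if (as_double(b) == 0.0)
            return report("divide by zero");
        r.tag = V_FLOAT;
        r.f = as_double(a) / as_double(b);
        return r;
    case '%':                                        /* ints only */
        if (a.tag != V_INT || b.tag != V_INT)
            return report("type error");
        if (b.i == 0)
            return report("modulo zero");
        r.tag = V_INT;
        r.i = a.i % b.i;
        return r;
    default:
        return report("unknown error");
    }
}
```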


Logical operators
=================

The following binary logical operators are available:

'==': If the value on the left is equal to the value on the right, then the
  result of this operator is one (1).  Otherwise, it is zero (0).  Equality is
  determined as follows:

	Two strings are equal if they are equal by character-by-character
	  comparison.
	Two 

'!=': If the value on the left is not equal to the value on the right, then the
  result of this operator is one (1).  Otherwise, it is zero (0).

'<': If the number on the left is less than the number on the right, then the
  result of this operator is one (1).  Otherwise, it is zero (0).

'>': If the number on the left is greater than the number on the right, then
  the result of this operator is one (1).  Otherwise, it is zero (0).

'<=': If the number on the left is less than or equal to the number on the 
  right, then the result of this operator is one (1).  Otherwise, it is zero 
  (0).

'>=': If the number on the left is greater than or equal to the number on the 
  right, then the result of this operator is one (1).  Otherwise, it is zero 
  (0).


Loops and conditionals
======================

Loops
-----

pgc provides two different types of loops: The do-while (or while-do) and the
foreach.  The syntax for do-while (and while-do) loops is as follows:

	do (expression_body) while (conditional_expression)
	while (conditional_expression) do (expression_body)

pgc cycles through the loop, evaluating expression_body until 
conditional_expression evaluates to zero (0).  Naturally, if the type of
conditional_expression is anything besides int, pgc will give a type error.

The syntax for a foreach loop is as follows:

	foreach var in (vector_expression) (expression_body)

To determine the value of a foreach loop expression, pgc iterates over the
contents of vector_expression, binding each element in turn to var -- creating,
in the process, a new environment -- and then evaluates expression_body within
the context of this new environment.  The type of vector_expression must be
vector, or pgc will give a type error.

pgc uses the environment generated by evaluating the expression body of a loop
in its next iteration through the loop, so it is possible to have loops
generate cumulative values, such as sums over the contents of vectors (though
this operation is already provided in the vector_sum function), like so:

	j = 0;
	k = [1, 2, 3];
	l = 0;
	while (l < 3) do (j = j + k[l]; l = l + 1)


Conditionals
------------

pgc supports two methods for conditional evaluation of expressions, the if
construct and the if-else construct.  The if construct has the following
syntax:

	if (conditional_expression) (result_expression)

where conditional_expression is a value of type int.  If conditional_expression
evaluates to a non-zero value, the value of the if-expression is the value of
result_expression (if conditional_expression is not of type int, pgc will issue
a type error).  Otherwise, the value of the if-expression is nil.  The if-else
construct has the following syntax:

	if (conditional_expression) (true_expression) else (false_expression)

where the behavior is the same as that of if, with the following exceptions:
if the value of conditional_expression is zero, the value of the if-else
construct is equal to the value of false_expression; true_expression and
false_expression must have the same type, or pgc will complain (type error).


Variable bindings
=================

Variables
---------

	It is often convenient, if not necessary, to associate a name with a
particular piece of data.  pgc provides this functionality in the form of
variable bindings.  If you are at all familiar with the C language, you may be
accustomed to declaring your variables before you first use them; in pgc, a
variable is declared the first time you assign a value to it.  If you use a
variable in an expression before you've assigned a value to it, pgc will
complain.  You may also be used to variables having a fixed type -- that is,
that you must declare a variable to hold a certain type of data and that any
values you assign to this variable must be of that type.  In pgc, the types of
variables may change: A variable's type is recorded the first time you assign a
value to it; assigning a value of a different type will change the variable's
type:

	a = 2;		/* a holds the value 2 and has type int */
	a = "a"		/* a now holds the string "a" and has type string */

All these changing types might be a problem if pgc weren't a pass-by-value
language -- this means that the variables in pgc expressions are evaluated to
retrieve their values before the expression is evaluated.  So tricks like this:

	a = 1;
	b = [a, 2, 3];
	a = 0;

do not change the contents of the vector stored in b: b[0] is still 1.
	As mentioned above, once you have declared a variable, you can use it
in an expression, but you can also use it as an expression:

	a = 3.141; a

is a perfectly valid expression.  It has the value 3.141. 


The special 'result' variable
-----------------------------

	Every pgc expression has a value.  At certain times during evaluation,
pgc will bind this value to a special variable named 'result'.  pgc performs
this binding when it encounters a semicolon, indicating the next expression in 
a sequence of expressions.  So, for the expression:

	"hello"; "goodbye"

upon finding the semicolon between "hello" and "goodbye," pgc would bind the
value of the previous expression in the sequence ("hello") to the result
variable.  Using the result variable allows you to perform computations a line
at a time from the keyboard, without having to worry about binding your results
to variables.  For example, you might type:

	4 + 4

(which has the int value 8) and then type:

	result + 2

(which will have the int value 10).  
	The result variable is also unique in that it is read-only.  Attempting
to explicitly assign a value to it will result in an error.


Subscripts and public fields
----------------------------

	When complex values, such as vectors or records, are bound to 
variables, it is still necessary to access their simply-typed components: In
other words, we need a way to get at a particular element of a vector or a
record.  To this end, pgc supports subscripts and public fields, methods for
retrieving values from, respectively, vectors and records.  The syntax for
subscripts is as follows:

	x[n]

where x is a vector and n is an integer between 0 and one less than the length
of the vector, inclusive.  For a vector of size 4, for example, the integers
0 through 3 are valid subscripts: 0 refers to the first element, 3 to the
fourth.  Subscripts (and field references) can only be applied to values that
have been bound to variables.  So:

	([1,2,3])[0]

does not evaluate to 1 -- it's an error.  Instead, write:

	a = [1,2,3]; a[0]

or even

	[1,2,3]; result[0]

Keep in mind, however, that

	[1,2,3]; result[0] = 4

is illegal, because of result's read-only property.  (And you'd never want to
write that anyway, since the value of "result[0] = 4" would be nil, and you'd
have no way of ever seeing your changes to the vector, or the vector itself.)
	The syntax for a public field reference is as follows:

	x.fieldname

where x is a record and fieldname is the name of a field listed in the record's
type definition in the record table -- pgc will complain if it can't find a
definition for the record's record type, or if fieldname is not one of the
record's named public fields.  The rule for vectors mentioned above also holds
for records.  That is:

	(record_function ()).fieldname

is illegal, even if the function record_function () returns a record value with
a record type containing a field named fieldname.  Instead, write:

	a = record_function (); a.fieldname

Since a record is a read-only data type, you may not re-assign the values of
its public fields.


Functions
=========

	Much as in popular languages like C, functions in pgc are blocks of
pre-written code that compute a value based on a list of zero or more argument
values.  Every pgc function has an entry in a table of function definitions.  
This entry consists of a name, the type of value the function returns, and a 
list of the types of the arguments the function requires.  For a single 
function, these details are fixed -- unlike functions in C, pgc functions may 
not take a variable-length list of arguments, and the basic return type is 
always the same (though the vector type or record type of, respectively, a 
returned vector or record may be different between two calls to the same 
function, depending on how the function is written).  Syntax for function call
expressions is as follows:

	fun (arg1, arg2, arg3, ...)

where fun is the name of the function and arg1, arg2, arg3, etc. are arguments
of the type and number specified in the function definition.  Refer to the
sections "pgc built-in functions" and "Current pgc library interfaces" for
information on the functions available to users.
	Note that the namespaces used by functions, variables, and record types
are all distinct, so it is perfectly all right to declare a variable that has the
same name as a function or record type -- pgc will figure out which one you 
mean based on the way you use it.


Errors
======

	When pgc detects an error in an expression during evaluation, it
responds by displaying an error message and setting the value of the current
expression to nil.  Note that this behavior may trigger further errors, as in
the following example:

	4 + (5 + "2")

pgc will report two type errors during the evaluation of this expression --
first when it tries to compute the sum of an int (5) with a string ("2"), then 
when it tries to compute the sum of an int (4) with nil, the result of the
error in the previous step.  A list of the errors generated by pgc and a brief
description of their meaning follows:

* lexical error: During initial lexical analysis, pgc encountered a character
    or combination of characters that it didn't recognize
* unexpected end of input: During lexical analysis, pgc encountered an unclosed
    comment or string -- that is, the input ended before pgc read a closing
    '*/' or '"'
* syntax error: pgc couldn't parse an expression into an acceptable form --
    usually caused by mismatched parentheses or a missing operator
* type error: pgc encountered an expression containing the wrong type of data
    for a particular operator or construct.  You tried to use the '+' operator
    to sum two nils, for example, or an int and a vector 
* operator cannot compare this type: pgc found an expression requesting a
    comparison of a type that it does not know how to compare -- the use of the
    '>' operator to compare two strings, for example
* divide by zero: Self-explanatory -- the right-hand side of a division 
    expression evaluated to 0 (or 0.0)
* modulo zero: Self-explanatory -- the right-hand side of a modulus expression
    evaluated to 0
* result is imaginary (NaN): The result of evaluating an expression containing
    an exponent resulted in the floating point value "nan," or Not a Number.
    This occurs most often when you raise a negative value to a non-integral
    power
* wrong type for vector: pgc noticed an attempt to assign a value to a position
    in a vector (during declaration or afterwards) in which the type of the
    value and the vector type of the vector were not the same
* subscript out of range for vector: pgc noticed an attempt, by way of a
    subscript, to refer to a value in a vector outside of the range (either 
    less than zero or too large) of the vector length
* subscript expression for vector must be of type int: Self-explanatory -- the
    type of the expression in the subscript field did not evaluate to int
* record fields are read-only: pgc noticed an attempt to assign a value to one
    of the public fields in a record
* no field in definition for record type: Either pgc could not find the field
    you referred to in the record type definition or it could not even find the
    record type definition of the parent record in the record table
* unbound identifier: You referred to a variable without first assigning a
    value to it
* wrong argument type for function: One of the arguments passed to a function
    as part of a function expression was not consistent with the prototype of
    the function in the function table
* wrong number of arguments for function: You passed the wrong number of
    arguments, either too few or too many, to a function as part of a function
    expression
* undefined function: pgc could not find the name of the function you referred
    to in the function table
* cannot assign to result: pgc detected an attempt to assign a value to the
    special, read-only result variable
* unknown error: A rare beastie -- one of pgc's internal pieces reported an
    error, but it wasn't one that the error handler understood.  Probably cause
    for a bug report...


pgc built-in functions
**********************

describe
========

  Return type: nil
  Argument format: describe (string record_type_name)

  Description: Given a string corresponding to the name of a record type
    present in the record table, describe () prints a list of the public
    fields defined for that record type.  If the record type has no public
    fields, or if the record type is not defined, the message printed by
    describe () will say so.


float
=====

  Return type: float
  Argument format: float (int int_value)

  Description: float () converts the integer value given in int_value to a
    pgc floating point value.


float_to_string
===============

  Return type: string
  Argument format: float_to_string (float float_value)

  Description: float_to_string () returns a string corresponding to the floating
    point value given by float_value.  The conversion from floating point
    value to string is accomplished via the printf function in the C standard 
    library, so the result may be "inf" or "nan" or any of the possible
    conversion results specified for that function.
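
    A minimal sketch of such a printf-based conversion in C follows, using
    snprintf so the output buffer cannot overflow.  The manual does not say
    which conversion specifier pgc uses; "%g" is an assumption, as is the
    function name float_to_string_sketch.  With "%g", infinities and NaNs
    come out as "inf" and "nan" (possibly with a sign).

```c
#include <stdio.h>

/* Sketch only: "%g" is an assumed specifier, not necessarily pgc's.
 * Returns 1 if the whole result fit in buf, 0 if it was truncated or
 * snprintf failed. */
int float_to_string_sketch(double value, char *buf, size_t cap)
{
    int n = snprintf(buf, cap, "%g", value);
    return n >= 0 && (size_t)n < cap;
}
```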


int
===

  Return type: int
  Argument format: int (float float_value)

  Description: int () converts the floating point value given in float_value to
    a pgc integer value.


int_to_string
=============

  Return type: string
  Argument format: int_to_string (int int_value)

  Description: int_to_string () returns a string corresponding to the integer
    value given by int_value.


print
=====

  Return type: nil
  Argument format: print (string print_string)

  Description: print () displays the contents of the string given by 
    print_string on stdout.  Note that print appends a newline to the
    print_string before printing it; this is so stdout may be flushed properly
    before the next expression is evaluated.


record_type
===========

  Return type: string
  Argument format: record_type (record input_record)
 
  Description: record_type () returns a string corresponding to the record type
    of the record given by input_record.  Note that just because a record has a
    type (and all records *should* have a type), it doesn't mean that this type
    has a definition in the record table.  See the section "Inside pgc" for
    more information.


vector_sum
==========

  Return type: float
  Argument format: vector_sum (vector num_vector)

  Description: If the vector type of the vector num_vector is int or float,
    vector_sum () returns a floating point value corresponding to the sum of
    the values in the vector.  If the vector type is not a numerical type,
    vector_sum () prints an error message and returns the floating point value
    "NaN" (Not a Number).

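    The contract above (numeric vectors summed into a float, anything else
    reported as an error and returned as NaN) can be sketched in C.  This is
    not pgc's implementation: passing the element type and a raw array
    separately, and the elem_type enumeration, are simplifications made for
    the example.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { ELEM_INT, ELEM_FLOAT, ELEM_OTHER } elem_type;

/* Ints and floats are summed as double; any other vector type prints an
 * error and yields NaN, mirroring vector_sum's documented behaviour. */
double vector_sum_sketch(elem_type type, const void *items, size_t n)
{
    double total = 0.0;
    size_t i;

    if (type == ELEM_INT) {
        const long *xs = items;
        for (i = 0; i < n; i++)
            total += (double)xs[i];     /* ints are widened to float */
    } else if (type == ELEM_FLOAT) {
        const double *xs = items;
        for (i = 0; i < n; i++)
            total += xs[i];
    } else {
        fprintf(stderr, "vector_sum: vector type is not numerical\n");
        return NAN;
    }
    return total;
}
```
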

vector_type
===========

  Return type: string
  Argument format: vector_type (vector input_vector)

  Description: vector_type () returns a string corresponding to the vector type
    of the vector given by input_vector.


version
=======

  Return type: string
  Argument format: version ()

  Description: version () returns a string corresponding to the version number
    of the currently executing pgc interpreter.


Current pgc library interfaces
******************************

libproteingeometry
==================

	pgc provides interfaces for the libproteingeometry package written by
Dr. Mark Gerstein at Yale University.  These functions perform various
calculations related to macromolecular motion and geometry.  See
http://www.molmovdb.org/geometry/ for more information on this package.


read_pdb
--------

  Return type: record
  Argument format: read_pdb (string pdb_filename)

  Description: read_pdb () reads the pdb file named by pdb_filename.  If
    successful, read_pdb () returns a record of type "pdb_record" containing
    the data from the pdb file.  If the file could not be opened or read, or
    was not in the correct format, read_pdb () returns a record of type "null".

volume
------

  Return type: vector
  Argument format: volume (record input_pdb)

  Description: volume () uses the Voronoi method to calculate the volumes of 
    the set of atoms described by the record input_pdb -- input_pdb must have 
    the record type "pdb_record," or volume () will complain.  On success, it
    returns a vector of type vector -- each sub-vector of the return value 
    corresponds to a residue from the pdb file represented by the record, in
    the same order that they appear in the file -- the first vector in the 
    return value corresponds to the residue in the pdb file with residue 
    number 1, and the first element in this vector is the Voronoi volume of the
    first atom in this residue.  On error, the function returns a zero-element
    vector.


surface
-------

  Return type: vector
  Argument format: surface (record input_pdb)

  Description: surface () returns a vector of sub-vectors in the same format
    as the volume () function above, in which each element represents the
    surface area, in square angstroms, of an atom from the pdb file read into
    input_pdb.  input_pdb must have record type "pdb_record" or surface ()
    will print an error and return a zero-element vector.


packing_eff
-----------

  Return type: vector
  Argument format: packing_eff (record input_pdb)

  Description: Like the two functions above, packing_eff () returns a vector of
    vectors, each corresponding to a residue in the pdb file represented by
    input_pdb.  Each element in one of these sub-vectors will be the ratio of
    atom volume computed by the volume () function to the reference volume.
    packing_eff () requires that the keys "types-file", "residues-file", and 
    "stdvols-file" be assigned, in the configuration file, values 
    corresponding, respectively, to the full paths for an atom types definition
    file, a residue type definition file, and a standard volumes file.  See the
    documentation for libproteingeometry for more information.  input_pdb must
    have record type "pdb_record" or packing_eff () will print an error and
    return a zero-element vector.


packing_eff_advanced
--------------------

  Return type: vector
  Argument format: packing_eff_advanced (string atom_defs_filename,
                                         string residue_defs_filename,
                                         string standard_volumes_filename,
                                         record input_pdb)

  Description: packing_eff_advanced () is identical to the packing_eff ()
    function described above, except that it allows the user to specify 
    filenames, in the form of string values, for, respectively, the atom types 
    definition file, the residue type definition file, and the standard volumes
    file.


LSQRMS
======

	pgc provides interfaces for the LSQRMS package written by Vadim
Alexandrov.  The following interfaces are provided:

lsqrms
------

  Return type: float
  Argument format: lsqrms (string query_filename, string target_filename)

  Description: lsqrms () returns a floating point value corresponding to the
    minimum distance between the structures specified in the pdb file indicated
    by query_filename and those in the file indicated by target_filename, under
    least-squares moving structure fitting.


Inside pgc
**********

Garbage collection
==================

	pgc allocates relatively large quantities of memory, especially during 
its type-checking / evaluation phase.  To make sure its memory resources are 
used efficiently, pgc implements a simple garbage collection scheme based on
reference counting.  Every time pgc generates a new environment as the result
of evaluating an expression, the variable bindings in the new environment are
initialized as pointers to the bindings from the old environment.  The actual
data structure used for storing a variable binding includes a reference count
that is incremented whenever the binding is copied and decremented whenever a
copy goes out of scope -- this can happen when the variable is bound to a new
value or the environment is deleted.


Hash tables
===========

	Several internal data structures in pgc are maintained as hash tables.
The hash function used to index these tables was taken from:

	[JENK97] Bob Jenkins. "Algorithm Alley: Hash Functions". 
	Dr. Dobb's Journal. September 1997.


Extending pgc
*************

	pgc was written with the expectation that its users would want to add
functionality to the code.


Adding new functions
====================

	To add a new function to pgc, you're going to have to modify a few
files.  Here are the ones that most need your attention:

pgc_function.c: To the body of the function init_function_table (), add a call 
  to the function add_function (), which has the following prototype:

	void add_function (char *function_name, void *function_pointer,
	                   int return_type, int arg_count, ...)

  where function_name is the name of your new function; function_pointer is a
  pointer to your function; return_type is a member of the enumeration
  pgc_types (defined in pgc.h); and arg_count is the number of arguments your
  function takes.  Following arg_count, you must pass add_function a list of
  types for the arguments to your function.  These types must be specified as
  enumeration values (again, from pgc_types in pgc.h).  This adds a prototype
  for your function to the function table and makes it available to pgc.  For
  example:

	pgc_add_function ("print", pgc_internal_print, PGC_NIL, 1, PGC_STRING);
 
  creates a prototype for a function named "print", whose interface function is
  pgc_internal_print ().  print () returns nil and takes a single argument, a
  string.  For more examples, see the init_function_table () code in
  pgc_function.c.

pgc_function_interface.h: If you have added a file containing prototypes for
  your new functions, you must #include it here.

Your function must be prototyped in the following way:

	void *pgc_interface_mylibrary_myfunction (void **args);

(The naming scheme pgc_interface_LIBRARYNAME_FUNCTIONNAME is suggested to avoid
namespace collisions.)  pgc's function-handling code guarantees that, if your
function actually gets called, it will be passed the correct number of
arguments and that the arguments will be of the correct type.  You will have to
cast the elements of args to the types you are expecting, like so:

	void * -> int *				for PGC_INT
	void * -> double *			for PGC_FLOAT
	void * -> struct pgc_record *		for PGC_RECORD
	void * -> char *			for PGC_STRING
	void * -> struct pgc_vector *		for PGC_VECTOR

A PGC_NIL argument is passed as a NULL void pointer.  In return for this
guarantee, you must in turn guarantee that your function returns a void * to
heap memory; that is, you have to malloc () the memory you use to store the
result of your function.  So constructs like "return 4" are out of the
question.  Instead, you have to do this:

	int *result = malloc (sizeof (int));
	*result = 4;
	return (void *) result;

pgc will free the memory used by this result when the value goes out of scope,
so you must not free it yourself (any other memory your function allocates for
its own use is still your responsibility).  Some notes about special pgc data
types:

  PGC_RECORD: The public fields available in a record live in the array of
    void pointers named "public_fields."  To accurately dereference these, you 
    must either look up the record definition in the record_table to find out 
    the types of these public fields or simply know what the types of these 
    fields are.  In addition to the read-only public fields provided by a 
    record, pgc provides a special field in the pgc_record struct that can be
    used to store any type of data; it is called "struct_data."  pgc makes no
    guarantees about this field; functions can trash it with impunity, and the 
    fact that a record has a particular type doesn't mean the struct_data can 
    be safely dereferenced to any particular type.  Plus, when a record is
    copied, this field of the struct is copied using memcpy () in C -- if it
    contains complicated pointer relationships, they may get broken in the
    process.  However, you can use this field of pgc_record to store any actual
    struct data that your functions need to work with, so it is somewhat
    useful, especially for storing data too complicated to be represented by
    pgc's basic types.

  PGC_VECTOR: Within the struct pgc_vector, the field vector_type holds the
    type (via the pgc_types enumeration) of every element in the vector.  The
    field vector_length holds the number of elements in the vector.  Finally,
    the array of void pointers vector holds the actual values in the vector.
    pgc guarantees that all of these fields will be consistent: that is,
    the vector field actually does contain vector_length number of elements,
    and each one of these may be safely dereferenced as the type given by
    vector_type.  Two caveats: Firstly, for vectors with vector type 
    PGC_RECORD, pgc only promises that all the elements in the vector have type
    PGC_RECORD, not that the record type is the same for all elements. 
    Secondly, for vectors with vector type PGC_VECTOR, pgc only promises that
    all the elements are vectors; it makes no guarantees about the types or
    lengths of these sub-vectors.


Linking with external libraries
===============================

	In many situations, you will want pgc to provide support for programs
or libraries that may not be installed on other users' systems.  If you plan on
distributing your changes to pgc, you will probably want to provide some
facility for helping pgc figure out whether or not a particular piece of
software is installed on a target system.  pgc uses the GNU packages autoconf
and automake for these determinations; a complete description of this software
is beyond the scope of this manual, but the cut-and-paste solutions below may
be of use to you.  To test for the existence of a library, use the following
code snippets:

To configure.ac, add:

AC_CHECK_LIB([libraryname], [libraryfunction],
[AC_DEFINE([HAVE_LIBRARY], [1], [library description])
 AM_CONDITIONAL([LIBRARY], [true])],
[AM_CONDITIONAL([LIBRARY], [false])])

where libraryname is the actual filename of your library -- remove any file 
extension, the directory prefix (typically "/usr/lib/"), and the "lib" prefix.
So, for the C math library, /usr/lib/libm.so, libraryname would be "m"; 
libraryfunction is the name of a unique public function in your library, so the
configure script can verify that the library in question is the correct one; 
LIBRARY is a variable name of your own choosing to be used by automake for 
doing conditional compilation; and library description is a description of the 
library, to be included for human-readable purposes in the config.h file 
generated by the "configure" script.

To src/Makefile.am, add:

if LIBRARY
pgc_SOURCES += pgc_function_interface_YOURSOURCECODENAME.c
pgc_LDADD += -llibraryname
endif

where YOURSOURCECODENAME is the suffix of the new source code file (if any) to 
be included if the library is found by the configuration script -- it is
recommended that you use the "pgc_function_interface_" prefix for the sake of
organizational clarity.

	Testing for the existence of a program is much the same, even though it
looks more complicated.  Use the following code snippets:

To configure.ac, add:

AC_PATH_PROG([program_path_variable], [program_filename], [no])
if test "$program_path_variable" != no; then
  AC_DEFINE_UNQUOTED([PGC_PROGRAM_FULL_PATH], ["$program_path_variable"], 
  [full path to program])
  AM_CONDITIONAL([PROGRAM], [true])
  AC_DEFINE([HAVE_PROGRAM], [1], [program description])
else
  AM_CONDITIONAL([PROGRAM], [false])
fi

where program_path_variable is a variable name of your own choosing to be used
to store the full path to the program; program_filename is the simple filename
of the program, with any directory prefix (e.g., "/usr/local/bin/") removed;
PGC_PROGRAM_FULL_PATH is a macro name of your own choosing that will be
#defined for you in the config.h file generated by the "configure" script, and
which you can use for making exec () calls to the program within your new
functions; PROGRAM is a variable name of your own choosing to be used by
automake for doing conditional compilation; and HAVE_PROGRAM is another macro
name of your own choosing that will be #defined in config.h if the program is
found successfully.

To src/Makefile.am, add:

if PROGRAM
pgc_SOURCES += pgc_function_interface_YOURSOURCECODENAME.c
endif