PyShottab

Developer Documentation:


This page is intended to give an overview of some of the more interesting components of the
PyShottab utility, most of which are reusable in many different types of Python applications.

TODO: rename this to "Interesting Python Modules".


Interesting Modules:

The rest of this page is intended for Python programmers who wish to use some of the
reusable components of the PyShottab package.  The discussion is more technical and may
not suit beginning Python programmers, but these modules have simple interfaces that
should be easy to use.

FileIO:

    This module provides (among other things) a class called LazyLineReader.  It uses the
    mmap module and some simple caching logic to provide a sequence-like object
    whose index (e.g. a[10]) is the line of the file to be read.  Each line is converted
    using a user-supplied regular expression, giving a list of converted items for that line.

    Essentially, one can treat a structured text file as a two-dimensional array of values.

    So line[0] is actually a sequence.  If three ints were converted through the regular
    expression, line[0] could be (123, 456, 789).  The regular expression is a verbose
    regular expression, which allows comments and multiple lines.  It is also expanded with
    the string interpolation library, so that "$foo(123)" is called and its output is
    substituted into the string.
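
    The idea of a verbose, helper-expanded expression that converts a line can be sketched
    as follows.  This is only an illustration and not the LazyLineReader code: the names
    HELPERS, compile_pattern and convert_line are hypothetical, and string.Template with
    "$int" placeholders stands in for the string interpolation library and its "$foo(123)"
    call syntax.

```python
import re
from string import Template

# Hypothetical helper fragments (an assumption for this sketch).
HELPERS = {"int": r"[-+]?\d+"}

# A verbose pattern: whitespace and "#" comments inside it are ignored.
PATTERN_TEXT = r"""
    \s*($int)   # first value
    \s+($int)   # second value
    \s+($int)   # third value
"""


def compile_pattern(text, helpers=HELPERS):
    """Expand $name placeholders, then compile the result as a verbose regex."""
    return re.compile(Template(text).substitute(helpers), re.VERBOSE)


def convert_line(pattern, line, converter=int):
    """Return a tuple of converted groups, or None if the line does not match."""
    match = pattern.match(line)
    if match is None:
        return None
    return tuple(converter(group) for group in match.groups())


pattern = compile_pattern(PATTERN_TEXT)
row = convert_line(pattern, "123 456 789")
print(row)
```

    A line that does not match yields None here; how the real class reports a
    non-matching line is not covered by this sketch.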

    Please see TextDataDialog below for a wxWindows based preview/editor of this
    expression.

    For sequential access, exact line indexing is assured.

    For files in which every line has exactly the same length, this module provides extremely
    fast access to any line in the file.  If the line length varies but is approximately the
    same for most lines, this class provides approximate line seeking: if the data is sorted,
    the caller can (for example) use a binary search to get O(lg n) seek time to anywhere in
    the file.  By contrast, the usual way to skip to line N of a text file is to read N-1 lines
    and then read the target line, which is an O(n) strategy: the processing time scales
    linearly with the size of the input data.

    NOTE: this seeking is approximate for non-uniform line lengths.  This means file[100000]
    might not be the 100000th line (but will be close), so the calling application must
    be aware of this and capable of handling it (for example, with an index tag at the start of
    each line).  Also note that for sequential access the line number IS exact (but this gives
    no real advantage over readline, except that it uses mmap and handles large files).

    With a binary search and the approximate seek strategy, one can seek to a target
    position in a suitably formatted, extremely large text file in logarithmic
    time, O(lg n), which is far less than the linear time of the line-skipping approach.
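
    A minimal sketch of this kind of search is given below.  It is not the LazyLineReader
    implementation: it bisects on byte offsets in an mmap'ed file rather than on estimated
    line numbers, and it assumes every line starts with a sorted integer key and that there
    are no blank lines.  The function names and the sample data are made up for the example.

```python
import mmap
import os
import tempfile


def line_starting_at_or_after(mm, offset):
    """Return (start, line) for the first full line beginning at or after offset."""
    if offset > 0:
        # Look from offset - 1 so an offset already on a line start is kept.
        newline = mm.find(b"\n", offset - 1)
        if newline == -1:
            return None
        offset = newline + 1
    if offset >= len(mm):
        return None
    end = mm.find(b"\n", offset)
    if end == -1:
        end = len(mm)
    return offset, mm[offset:end]


def find_key(mm, target):
    """Binary search by byte offset for the line whose leading integer is target."""
    lo, hi = 0, len(mm)
    while lo < hi:
        mid = (lo + hi) // 2
        found = line_starting_at_or_after(mm, mid)
        if found is None:
            hi = mid            # no line starts at or after mid
            continue
        start, line = found
        key = int(line.split()[0])
        if key < target:
            lo = start + len(line) + 1   # continue after this line
        elif key > target:
            hi = mid
        else:
            return line
    return None


def demo():
    """Build a small sorted file with uneven line lengths and search it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            for key in range(0, 3000, 3):
                handle.write(("%d %s\n" % (key, "x" * (key % 7))).encode("ascii"))
        with open(path, "rb") as handle:
            mm = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                return find_key(mm, 1500), find_key(mm, 1501)
            finally:
                mm.close()
    finally:
        os.remove(path)


hit, miss = demo()
print(hit, miss)
```

    Each probe touches only the bytes around one offset, so the number of probes grows with
    the logarithm of the file size rather than with the number of lines.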

    One must be aware that this module is used differently from ordinary file reading: the
    maximum line number is approximated, not exact, so the caller should be capable of
    handling (and perhaps recovering from) the LazyEOF exception being thrown.

    In a comparison on a 2 gigabyte text file, this module was able to seek to the proper position
    in well under half a second, while the line-skipping technique took over 20 minutes (I stopped it
    rather than wait).  Testing seems to indicate an extremely favorable scalability rate.

    Please see shotCreate.py - specifically the TimeReader subclass for an example of how this
    is used.

Options:

    This module builds upon the ConfigParser object to allow one to load and save "options" - that is,
    key/object pairs - to a .ini style file.  In this module, one simply defines the option names and types
    as class variables, and these are loaded from the config file when an Options object is created.

    NOTE: currently the PyShottab-specific logic is written inline in this class, but it can easily be
    removed or customized by subclassing.

    Usage is as simple as (given that sampleOption is a class variable of Options):
        options = Options("test.ini")
        print(options.sampleOption)

        options.sampleOption = 123
        options.save("new.ini")

    Note also that if sampleOption is an int, and a string is saved in the file, an exception is generated
    upon construction of the object (this is a good thing).
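
    The class-variable approach can be sketched roughly as follows.  This is a hypothetical
    stand-in and not the PyShottab Options class: the name SketchOptions, the "options"
    section name and the rule that the type of the default value converts the stored text
    are all assumptions made for the example.

```python
import configparser
import os
import tempfile


class SketchOptions:
    """Class variables declare the option names, types and defaults."""

    _SECTION = "options"   # assumed section name
    sampleOption = 0
    title = "untitled"

    def __init__(self, path=None):
        parser = configparser.ConfigParser()
        parser.optionxform = str   # keep option names case-sensitive
        if path:
            parser.read(path)
        for name, default in self._declared().items():
            if parser.has_option(self._SECTION, name):
                # The default's type converts the text; int("abc") raises ValueError.
                text = parser.get(self._SECTION, name)
                setattr(self, name, type(default)(text))

    @classmethod
    def _declared(cls):
        """Return the public, non-callable class variables."""
        return {name: value for name, value in vars(cls).items()
                if not name.startswith("_") and not callable(value)}

    def save(self, path):
        parser = configparser.ConfigParser()
        parser.optionxform = str
        parser[self._SECTION] = {name: str(getattr(self, name))
                                 for name in self._declared()}
        with open(path, "w") as handle:
            parser.write(handle)


def demo():
    """Save and reload an option, then try a file holding a badly typed value."""
    folder = tempfile.mkdtemp()
    good = os.path.join(folder, "new.ini")
    bad = os.path.join(folder, "bad.ini")
    try:
        options = SketchOptions()
        options.sampleOption = 123
        options.save(good)
        reloaded = SketchOptions(good).sampleOption
        with open(bad, "w") as handle:
            handle.write("[options]\nsampleOption = not-a-number\n")
        try:
            SketchOptions(bad)
            rejected = False
        except ValueError:
            rejected = True
        return reloaded, rejected
    finally:
        for path in (good, bad):
            if os.path.exists(path):
                os.remove(path)
        os.rmdir(folder)


reloaded, rejected = demo()
print(reloaded, rejected)
```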


The following are some useful GUI modules when using wxWindows - see guiComponents.py:

NumValidator:

    This is similar to the various validator classes of Qt - it allows one to constrain the input of the user
    such that only valid numeric data can be entered.  If an invalid character is entered, the system beeps.

    It is generic in the sense that the input type and allowed characters are specified in the constructor -
    to validate an int, one would do:
       NumValidator(int, string.digits + "+-")
   
    Since using this validator requires more work with the wxTextCtrl, I provide two sample controls
    for ints and floats that perform all the work (see below).
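
    The checking logic behind such a validator can be sketched without any GUI toolkit.
    The class below is a hypothetical stand-in that only mirrors the constructor arguments
    described above (a type and a string of allowed characters); the method names are
    assumptions, and the real NumValidator is a wxWindows validator that is not shown here.

```python
import string


class SketchNumValidator:
    """Toolkit-free illustration of character filtering plus type conversion."""

    def __init__(self, number_type, allowed_chars):
        self.number_type = number_type
        self.allowed_chars = allowed_chars

    def accepts_char(self, char):
        """Per-keystroke check; a GUI would beep and drop the key on False."""
        return char in self.allowed_chars

    def validate(self, text):
        """Whole-field check: every character is allowed and the text converts."""
        if not all(self.accepts_char(char) for char in text):
            return False
        try:
            self.number_type(text)
        except ValueError:
            return False
        return True


int_validator = SketchNumValidator(int, string.digits + "+-")
print(int_validator.validate("-42"), int_validator.validate("4-2"))
```

    The whole-field check is needed in addition to the per-character filter because a
    string such as "4-2" contains only allowed characters but is still not a valid int.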

IntEditCtrl:

    This simply subclasses wxTextCtrl and uses NumValidator to provide an elegant integer-only text input.

FloatEditCtrl:

    This simply subclasses wxTextCtrl and uses NumValidator to provide an elegant float-only text input.

Other modules (within guiComponents.py) of interest include Lat/Lon editors, Date editors (day hour
minute second format), and a very interesting dialog which allows one to describe (in very generic terms)
the data format of a text file (see below).

TextDataDialog:

    This class allows the definition of the format of a wide class of text files that store data in a line by line
    format.  This definition is provided by a special kind of regular expression - a verbose regular expression   
    (see Python docs) with "String Interpolation" which allows the use of predefined "helper" functions
    which expand into complex regular expressions.  This functionality is provided by the LazyLineReader
    class (see above).

    Also provided is a window to test the input on the first 100 lines of the file - in this way the user can
    tailor the expression to match the input properly.
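
    The preview idea (try the expression on the first lines of the file and show what
    matched) can be sketched in a few lines.  This is an illustration only: the preview
    function, its limit argument and the sample data are hypothetical, and the sketch uses
    a plain verbose regular expression without the helper expansion.

```python
import io
import itertools
import re


def preview(lines, pattern_text, limit=100):
    """Yield (line_number, groups or None) for the first `limit` lines."""
    pattern = re.compile(pattern_text, re.VERBOSE)
    for number, line in enumerate(itertools.islice(lines, limit), start=1):
        match = pattern.match(line)
        yield number, (match.groups() if match else None)


sample = io.StringIO("10 20\nbad line\n30 40\n")
report = list(preview(sample, r"(\d+) \s+ (\d+)  # two integer columns"))
for number, groups in report:
    print(number, groups)
```

    Lines reported with None are the ones the expression fails to match, which is the
    feedback a user needs when adjusting it.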

    Please see PyShottab screenshots - specifically the Navigation Data Dialog - this is an instance of the
    TextDataDialog.

    This class has a dependency on the LazyLineReader, which handles the conversion and loading of data.

    In this fashion, a structured text file of almost any format can be converted on the fly (generically and
    transparently).  See shotGui.py for an example of how this is used.

--

On behalf of the Dalhousie Department of Oceanography

Dave LeBlanc
dleblanc (at) cs.dal.ca
February, 2003