Developer Documentation:
This page is intended to give an overview of some of the more interesting
components of the
PyShottab utility, most of which are reusable in many different types of
Python applications.
TODO: rename this to "Interesting Python Modules".
Interesting Modules:
The rest of this page is intended for Python programmers wishing
to use some of the
reusable components of the PyShottab package. This is a more technical
discussion, and perhaps
not appropriate for beginning Python programmers, but these modules have
simple interfaces which should be easy to handle.
FileIO:
This module provides (among other things) a class called
LazyLineReader. It uses the
mmap module and some simple caching logic to provide
a sequence-like object
in which indexing (e.g. a[10]) reads that line of the
file. The line is converted
using a user-supplied regular expression, giving a list
of converted items for that line.
Essentially, one can treat a structured text file as a
two-dimensional array of values.
So line[0] is actually a sequence. If three ints were
converted through the regular
expression, line[0] could be (123, 456, 789). The
regular expression is actually a verbose
regular expression, allowing comments and multiple lines,
and it is also expanded with
the string interpolation library, which allows a helper such
as "$foo(123)" to be called and its output
substituted into the string.
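To make the idea concrete, here is a minimal sketch of a sequence-like
reader built on mmap and a verbose regular expression. It is not the
LazyLineReader source: the class name RegexLineReader, its constructor
arguments, the PATTERN constant and the ASCII decoding are all assumptions
made for illustration, and the string interpolation step is left out.

```python
import mmap
import re

# Verbose pattern: whitespace is ignored and comments are allowed.
PATTERN = r"""
    (\d+) \s+   # first integer column
    (\d+) \s+   # second integer column
    (\d+)       # third integer column
"""


class RegexLineReader:
    """Hypothetical stand-in for LazyLineReader: reader[i] -> tuple of fields."""

    def __init__(self, path, pattern, converters):
        self._file = open(path, "rb")
        # Map the whole file read-only; the OS pages data in on demand.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._regex = re.compile(pattern, re.VERBOSE)
        self._converters = converters
        self._offsets = [0]  # cache of line-start byte offsets found so far

    def __getitem__(self, index):
        if index < 0:
            raise IndexError(index)
        # Extend the cache of line starts until line `index` is known.
        while len(self._offsets) <= index:
            newline = self._map.find(b"\n", self._offsets[-1])
            if newline == -1:
                raise IndexError(index)
            self._offsets.append(newline + 1)
        start = self._offsets[index]
        if start >= len(self._map):
            raise IndexError(index)
        end = self._map.find(b"\n", start)
        if end == -1:
            end = len(self._map)
        line = self._map[start:end].decode("ascii")  # assumes ASCII data
        match = self._regex.match(line)
        if match is None:
            raise ValueError("line %d does not match the pattern" % index)
        # Apply one converter per captured group.
        return tuple(conv(text)
                     for conv, text in zip(self._converters, match.groups()))

    def close(self):
        self._map.close()
        self._file.close()
```

With a file whose first line is "123 456 789", creating
RegexLineReader(path, PATTERN, (int, int, int)) and reading reader[0]
gives (123, 456, 789), so reader[0][1] is 456.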
Please see TextDataDialog below for a wxWindows based
preview/editor of this
expression.
For sequential access, exact line indexing is assured.
For files with exactly the same line length, this module
provides extremely fast access to
any line in the file. If the file has varying line
length that is approximately the same for
most lines, this class provides approximate line seeking
- if the data is sorted, the caller
can (for example) make use of a binary search to give
O(lg n) seek time to anywhere in the
file. By contrast, the usual way to skip to a line number
in a text file is to
read in N-1 lines and then read the target line, which
is of course an O(n) strategy, meaning that
the processing time scales linearly with the size of the
input data.
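For the fixed-length case, the seek reduces to one multiplication. The
helper below is a sketch of that arithmetic only; the function name and
the convention that line_length counts the trailing newline are
assumptions, not part of the FileIO module.

```python
def read_fixed_width_line(mapped, index, line_length):
    """Return line `index` when every line is exactly `line_length` bytes.

    `mapped` may be an mmap.mmap object or plain bytes; `line_length`
    includes the newline character(s).
    """
    start = index * line_length  # constant-time offset computation
    if index < 0 or start >= len(mapped):
        raise IndexError(index)
    # Slice out the line and drop the line terminator.
    return mapped[start:start + line_length].rstrip(b"\r\n")
```

No earlier line is read, so the cost does not depend on how far into the
file the requested line lies.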
NOTE: this seeking is approximate for non-uniform
line length - this means file[100000]
might not be the 100000th line (but will be close), so
the calling application must
be aware of this fact and capable of handling it (for
example by relying on an index tag at the start of each line).
Also note that for sequential access, the line number
IS exact (but this gives no real advantage
over readline except that it uses mmap and handles large
files).
Combining a binary search with the approximate seek strategy,
one can seek to a given
position in a suitably formatted (sorted) text file of extremely
large size in logarithmic
running time, O(lg n), which is far less than the linear
time for the line-skipping approach.
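As an illustration of the logarithmic seek, the sketch below
binary-searches over byte offsets in a sorted file and resynchronizes to
the start of the line at each probe. This is a swapped-in variant of the
technique, not the LazyLineReader algorithm: the name find_line and the
key_of callback are hypothetical, and the data is assumed to be sorted by
key with "\n" line endings.

```python
def find_line(data, target, key_of):
    """Return the first line whose key is >= target, or None.

    `data` is bytes or an mmap.mmap of a file sorted by key;
    `key_of` extracts the sort key from one line (bytes).
    """
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        # Resynchronize: find the boundaries of the line containing `mid`.
        start = data.rfind(b"\n", 0, mid) + 1
        end = data.find(b"\n", start)
        if end == -1:
            end = len(data)
        if key_of(data[start:end]) < target:
            lo = min(end + 1, len(data))  # answer lies after this line
        else:
            hi = start                    # this line or an earlier one
    if lo >= len(data):
        return None
    end = data.find(b"\n", lo)
    if end == -1:
        end = len(data)
    return data[lo:end]
```

Each probe halves the byte range still under consideration, so the number
of lines examined grows with the logarithm of the file size rather than
with the line number.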
One must be aware of the different way in which this module
is used: the maximum line number is
approximated, not exact, so one should ensure the
caller is capable of handling (and
perhaps recovering from) the LazyEOF exception being thrown.
In a comparison on a 2 gigabyte text file, this module was
able to seek to the proper position
in well under half a second, while the line-skipping technique
took over 20 minutes (I stopped it
rather than wait). Testing seems to indicate an
extremely favorable scalability rate.
Please see shotCreate.py - specifically the TimeReader
subclass for an example of how this
is used.
Options:
This module builds upon the ConfigParser object to allow
one to load and save "options" - that is,
key/object pairs - to a .ini style file. In this module,
one simply defines the option names and types
as class variables, and these are loaded from the config
file when an Options object is created.
NOTE: currently the PyShottab-specific logic is written inline in
this class, but it can easily be removed or customized
by subclassing.
Usage is as simple as (given that sampleOption is a class
variable of Options):
options = Options("test.ini")
print(options.sampleOption)
options.sampleOption = 123
options.save("new.ini")
Note also that if sampleOption is an int, and a string
is saved in the file, an exception is generated
upon construction of the object (this is a good thing).
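The pattern of declaring options as class variables can be sketched as
follows. This is an illustrative reimplementation, not the PyShottab
Options class: it uses the Python 3 configparser module, and the
"options" section name and the _declared helper are assumptions. The type
of each default value is used to convert the string read from the file.

```python
import configparser


class Options:
    """Hypothetical sketch: class variables declare names, types, defaults."""

    SECTION = "options"  # assumed section name in the .ini file
    sampleOption = 0     # an int option; the default also fixes the type

    def __init__(self, path=None):
        parser = configparser.ConfigParser()
        parser.optionxform = str  # keep option names case-sensitive
        if path is not None:
            parser.read(path)
        for name, default in self._declared().items():
            if parser.has_option(self.SECTION, name):
                raw = parser.get(self.SECTION, name)
                # Converting with the declared type raises ValueError
                # if, say, an int option holds non-numeric text.
                setattr(self, name, type(default)(raw))

    @classmethod
    def _declared(cls):
        # Public, non-uppercase class variables of a simple type.
        return {name: value for name, value in vars(cls).items()
                if isinstance(value, (int, float, str))
                and not name.startswith("_") and not name.isupper()}

    def save(self, path):
        parser = configparser.ConfigParser()
        parser.optionxform = str
        parser[self.SECTION] = {name: str(getattr(self, name))
                                for name in self._declared()}
        with open(path, "w") as handle:
            parser.write(handle)
```

Failing at construction time means a bad config file is reported once, at
load, rather than later wherever the option happens to be used.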
The following are some useful GUI modules when using wxWindows - see guiComponents.py:
NumValidator:
This is similar to the various validator classes of Qt
- it allows one to constrain the input of the user
such that only valid numeric data can be entered. If
an invalid character is entered, the system beeps.
It is generic in the sense that the input type and allowed
characters are specified in the constructor -
to validate an int, one would do:
NumValidator(int, string.digits + "+-")
Since using this validator requires more work with the
wxTextCtrl, I provide two sample controls for
ints and floats that perform all the work (see below).
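The two checks such a validator performs can be shown without any GUI
code. The sketch below is toolkit-independent and is not the NumValidator
source: the class name NumericFilter and its methods are hypothetical,
and the wxWindows event wiring (and the beep) is omitted.

```python
import string


class NumericFilter:
    """Hypothetical, GUI-free sketch of a numeric input validator."""

    def __init__(self, convert, allowed_chars):
        self.convert = convert                   # e.g. int or float
        self.allowed_chars = set(allowed_chars)  # characters let through

    def accepts_key(self, char):
        """Per-keystroke check: may this character be typed at all?"""
        return char in self.allowed_chars

    def is_valid(self, text):
        """Whole-field check: does the text convert cleanly?"""
        try:
            self.convert(text)
        except ValueError:
            return False
        return True


int_filter = NumericFilter(int, string.digits + "+-")
float_filter = NumericFilter(float, string.digits + "+-.eE")
```

The keystroke filter alone is not sufficient: "1-2" uses only allowed
characters but is not an int, which is why the whole-field conversion
check is kept as a second step.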
IntEditCtrl:
This simply subclasses wxTextCtrl and uses NumValidator
to provide an elegant Integer only text input.
FloatEditCtrl:
This simply subclasses wxTextCtrl and uses NumValidator
to provide an elegant float-only text input.
Other modules (within guiComponents.py) of interest include Lat/Lon editors,
Date editors (day hour
minute second format), and a very interesting dialog which allows one to
describe (in very generic terms)
the data format of a text file (see below).
TextDataDialog:
This class allows the definition of the format of a wide
class of text files that store data in a line by line
format. This definition is provided by a special
kind of regular expression - a verbose regular expression
(see Python docs) with "String Interpolation" which allows
the use of predefined "helper" functions
which expand into complex regular expressions. This
functionality is provided by the LazyLineReader
class (see above).
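A rough sketch of how "$name(args)" helpers might be expanded into a
verbose regular expression is given below. It does not use the string
interpolation library referred to above: the expand function, the
HELPERS table and the int_field helper are all hypothetical names
introduced for this example.

```python
import re


def int_field(width):
    """Hypothetical helper: a signed integer of at most `width` digits."""
    return r"([+-]?\d{1,%d})" % int(width)


HELPERS = {"int_field": int_field}

# Matches calls of the form $name(arg1, arg2, ...).
_CALL = re.compile(r"\$(\w+)\(([^)]*)\)")


def expand(pattern, helpers=HELPERS):
    """Replace each $name(args) call with the helper's regex fragment."""
    def replace(match):
        name, raw_args = match.group(1), match.group(2)
        args = [arg.strip() for arg in raw_args.split(",") if arg.strip()]
        return helpers[name](*args)
    return _CALL.sub(replace, pattern)


TEMPLATE = r"""
    $int_field(4)   # shot number, up to four digits
    \s+
    $int_field(6)   # time stamp, up to six digits
"""

regex = re.compile(expand(TEMPLATE), re.VERBOSE)
```

Matching "12 345678" against regex yields the groups ("12", "345678").
Because the template stays verbose, the comments survive expansion and
the expression remains readable even when the helpers produce long
fragments.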
Also provided is a window to test the input on the first
100 lines of the file - in this way the user can
tailor the expression to match the input properly.
Please see the PyShottab screenshots,
specifically the Navigation Data Dialog, which is an instance of the
TextDataDialog.
This class has a dependency on the LazyLineReader, which
handles the conversion and loading of data.
In this fashion, a structured text file of almost any
format can be converted on the fly (generically and
transparently). See shotGui.py for an example of
how this is used.
--
On behalf of the Dalhousie
Department of Oceanography
Dave LeBlanc
dleblanc (at) cs.dal.ca
February, 2003