Module dbtables

This module provides an implementation of the Table element that uses a database engine for storage. It also re-implements a number of the tables from the lsctables module so that their methods work against the SQL database.


Version: git id 8cbd1b7187ce3ed9a825d6ed11cc432f3cfde9a5

Date: 2017-12-05 15:29:36 +0000

Author: Kipp Cannon <kipp.cannon@ligo.org>

Classes
  DBTable
A special version of the Table class using an SQL database for storage.
  ProcessParamsTable
  TimeSlideTable
Functions
 
build_indexes(connection, verbose=False)
Using the how_to_index annotations in the table class definitions, construct a set of indexes for the database at the given connection.
 
connection_db_type(connection)
A totally broken attempt to determine what type of database a connection object is attached to.
 
discard_connection_filename(filename, working_filename, verbose=False)
Like put_connection_filename(), but the working copy is simply deleted instead of being copied back to its original location.
 
get_column_info(connection, table_name)
Return an in-order list of (name, type) tuples describing the columns in the given table.
 
get_connection_filename(filename, tmp_path=None, replace_file=False, verbose=False)
Utility code for moving database files to a (presumably local) working location for improved performance and reduced fileserver load.
 
get_table_names(connection)
Return a list of the table names in the database.
 
get_xml(connection, table_names=None)
Construct an XML document tree wrapping around the contents of the database.
 
idmap_create(connection)
Create the _idmap_ table.
 
idmap_get_max_id(connection, id_class)
Given an ilwd:char ID class, return the highest ID in the table whose IDs belong to that class.
 
idmap_get_new(connection, old, tbl)
From the old ID string, obtain a replacement ID string by either grabbing it from the _idmap_ table if one has already been assigned to the old ID, or by using the current value of the Table instance's next_id class attribute.
 
idmap_reset(connection)
Erase the contents of the _idmap_ table, but leave the table in place.
 
idmap_sync(connection)
Iterate over the tables in the database, ensure that there exists a custom DBTable class for each, and synchronize that table's ID generator to the ID values in the database.
 
install_signal_trap(signums=(15, 20), retval=1)
Installs a signal handler to erase temporary scratch files when a signal is received.
 
put_connection_filename(filename, working_filename, verbose=False)
This function reverses the effect of a previous call to get_connection_filename(), restoring the working copy to its original location if the two are different.
 
set_temp_store_directory(connection, temp_store_directory, verbose=False)
Sets the temp_store_directory parameter in sqlite.
 
uninstall_signal_trap(signums=None)
Undo the effects of install_signal_trap().
 
use_in(ContentHandler)
Modify ContentHandler, a sub-class of glue.ligolw.LIGOLWContentHandler, to cause it to use the DBTable class defined in this module when parsing XML documents.
Variables
  TableByName = {'process_params': <class 'glue.ligolw.dbtables....
  __package__ = 'glue.ligolw'
  _sql_coldef_pattern = re.compile(r'\s*(?P<name>\w+)\s+(?P<type...
  _sql_create_table_pattern = re.compile(r'(?i)CREATE\s+TABLE\s+...
  origactions = {}
  temporary_files = {}
  temporary_files_lock = <thread.lock object at 0x7f390525fad0>
Function Details

connection_db_type(connection)

A totally broken attempt to determine what type of database a connection object is attached to. Don't use this.

The input is a DB API 2.0 compliant connection object; the return value is one of the strings "sqlite3" or "mysql". Raises TypeError when the database type cannot be determined.

discard_connection_filename(filename, working_filename, verbose=False)

Like put_connection_filename(), but the working copy is simply deleted instead of being copied back to its original location. This is a useful performance boost if it is known that no modifications were made to the file, for example if queries were performed but no updates.

Note that the file is not deleted if the working copy and original file are the same, so it is always safe to call this function after a call to get_connection_filename() even if a separate working copy is not created.

get_xml(connection, table_names=None)

Construct an XML document tree wrapping around the contents of the database. On success the return value is a ligolw.LIGO_LW element containing the tables as children. Arguments are a connection to a database, and an optional list of table names to dump. If table_names is not provided, the set is obtained from get_table_names().

idmap_create(connection)

Create the _idmap_ table. This table has columns "old" and "new" containing text strings mapping old IDs to new IDs. The old column is a primary key (is indexed and must contain unique entries). The table is created as a temporary table, so it will be automatically dropped when the database connection is closed.

This function is for internal use; it forms part of the code used to re-map row IDs when merging multiple documents.
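
The table described above can be sketched with Python's standard sqlite3 module against an in-memory database. This is only an illustration: the SQL statement is an assumption reconstructed from the description (text columns "old" and "new", "old" as primary key, temporary table), not the statement the module issues, and create_idmap_sketch is a hypothetical name.

```python
import sqlite3

def create_idmap_sketch(connection):
    # Hypothetical stand-in for idmap_create(); the SQL is assumed from
    # the description, not copied from the module.
    connection.execute(
        "CREATE TEMPORARY TABLE _idmap_ (old TEXT PRIMARY KEY, new TEXT)"
    )

connection = sqlite3.connect(":memory:")
create_idmap_sketch(connection)
connection.execute(
    "INSERT INTO _idmap_ VALUES (?, ?)",
    ("sngl_inspiral:event_id:0", "sngl_inspiral:event_id:1055"),
)

# The primary key on "old" rejects a second mapping for the same old ID.
try:
    connection.execute(
        "INSERT INTO _idmap_ VALUES (?, ?)",
        ("sngl_inspiral:event_id:0", "sngl_inspiral:event_id:1056"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# SQLite lists temporary tables in sqlite_temp_master, not sqlite_master.
temp_tables = [
    name
    for (name,) in connection.execute(
        "SELECT name FROM sqlite_temp_master WHERE type = 'table'"
    )
]
```

Because the table is temporary, it is visible only to the connection that created it and disappears when that connection is closed.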

idmap_get_max_id(connection, id_class)

Given an ilwd:char ID class, return the highest ID in the table whose IDs belong to that class.

Example:

>>> event_id = ilwd.ilwdchar("sngl_inspiral:event_id:0")
>>> print(event_id)
sngl_inspiral:event_id:0
>>> max_id = idmap_get_max_id(connection, type(event_id))
>>> print(max_id)
sngl_inspiral:event_id:1054

idmap_get_new(connection, old, tbl)

From the old ID string, obtain a replacement ID string by either grabbing it from the _idmap_ table if one has already been assigned to the old ID, or by using the current value of the Table instance's next_id class attribute. In the latter case, the new ID is recorded in the _idmap_ table, and the class attribute incremented by 1.

This function is for internal use; it forms part of the code used to re-map row IDs when merging multiple documents.
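
The look-up-or-assign logic described above might look roughly like the following. This is a sketch under stated assumptions, not the module's code: NextID is a hypothetical counter standing in for the table's next_id class attribute, get_new_sketch is a hypothetical name, and the _idmap_ schema is assumed from the idmap_create() description.

```python
import sqlite3

class NextID:
    """Hypothetical stand-in for a table's next_id attribute."""

    def __init__(self, prefix, start):
        self.prefix = prefix
        self.n = start

    def take(self):
        # Return the current ID string, then increment the counter by 1.
        new = "%s:%d" % (self.prefix, self.n)
        self.n += 1
        return new

def get_new_sketch(connection, old, next_id):
    # Reuse the mapping if the old ID has already been assigned a new ID.
    row = connection.execute(
        "SELECT new FROM _idmap_ WHERE old = ?", (old,)
    ).fetchone()
    if row is not None:
        return row[0]
    # Otherwise take the next free ID and record the mapping.
    new = next_id.take()
    connection.execute("INSERT INTO _idmap_ VALUES (?, ?)", (old, new))
    return new

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TEMPORARY TABLE _idmap_ (old TEXT PRIMARY KEY, new TEXT)"
)
next_id = NextID("sngl_inspiral:event_id", 1055)

first = get_new_sketch(connection, "sngl_inspiral:event_id:0", next_id)
again = get_new_sketch(connection, "sngl_inspiral:event_id:0", next_id)
second = get_new_sketch(connection, "sngl_inspiral:event_id:1", next_id)
```

Asking twice for the same old ID yields the same replacement, so every reference to a given row is rewritten consistently.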

idmap_reset(connection)

Erase the contents of the _idmap_ table, but leave the table in place.

This function is for internal use; it forms part of the code used to re-map row IDs when merging multiple documents.

install_signal_trap(signums=(15, 20), retval=1)

Installs a signal handler to erase temporary scratch files when a signal is received. This can be used to help ensure scratch files are erased when jobs are evicted by Condor. signums is a sequence of the signals to trap; the default value is the signals used by Condor to kill and/or evict jobs.

The logic is as follows. If the current signal handler is signal.SIG_IGN, i.e. the signal is being ignored, then the signal handler is not modified since the reception of that signal would not normally cause a scratch file to be leaked. Otherwise a signal handler is installed that erases the scratch files. If the original signal handler was a Python callable, then after the scratch files are erased the original signal handler will be invoked. If program control returns from that handler, i.e. that handler does not cause the interpreter to exit, then sys.exit() is invoked and retval is returned to the shell as the exit code.

Note: by invoking sys.exit(), the signal handler causes the Python interpreter to do a normal shutdown. That means it invokes atexit() handlers, and does other garbage collection tasks that it normally would not do when killed by a signal.

Note: this function will not replace a signal handler more than once. If it has already been used to set a handler on a signal, calling it again for that signal is a no-op until uninstall_signal_trap() is used to remove the handler from that signal.

Note: this function is called by get_connection_filename() whenever it creates a scratch file.
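
The handler logic described above can be sketched with the standard signal module. Everything here is illustrative and assumed rather than taken from the module: scratch_files, original_handlers, make_trap and install_trap_sketch are hypothetical names, and the default traps only SIGTERM instead of the module's default of (15, 20).

```python
import os
import signal
import sys

scratch_files = set()   # hypothetical registry of scratch file paths
original_handlers = {}  # signum -> handler that was replaced

def make_trap(signum, retval):
    def trap(received_signum, frame):
        # Erase the scratch files first.
        for path in list(scratch_files):
            if os.path.exists(path):
                os.remove(path)
            scratch_files.discard(path)
        # Chain to the original handler if it was a Python callable.
        original = original_handlers.get(signum)
        if callable(original):
            original(received_signum, frame)
        # Control returned, so do a normal interpreter shutdown.
        sys.exit(retval)
    return trap

def install_trap_sketch(signums=(signal.SIGTERM,), retval=1):
    for signum in signums:
        if signum in original_handlers:
            continue  # already trapped: calling again is a no-op
        current = signal.getsignal(signum)
        if current == signal.SIG_IGN:
            continue  # an ignored signal cannot leak a scratch file
        original_handlers[signum] = current
        signal.signal(signum, make_trap(signum, retval))
```

Remembering the replaced handlers in a dictionary is what makes it possible both to chain to them and to restore them later.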

put_connection_filename(filename, working_filename, verbose=False)

This function reverses the effect of a previous call to get_connection_filename(), restoring the working copy to its original location if the two are different. This function should always be called after calling get_connection_filename() when the file is no longer in use.

During the move operation, this function traps the signals used by Condor to evict jobs. This reduces the risk of corrupting a document by the job terminating part-way through the restoration of the file to its original location. When the move operation is concluded, the original signal handlers are restored and if any signals were trapped they are resent to the current process in order. Typically this will result in the signal handlers installed by the install_signal_trap() function being invoked, meaning any other scratch files that might be in use get deleted and the current process is terminated.
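
The working-copy round trip can be illustrated with the standard library alone. This is a simplified sketch, not the module's implementation: the three helper names are hypothetical, and the sketch omits the signal trapping and deferred re-delivery described above.

```python
import os
import shutil
import sqlite3
import tempfile

def get_working_copy_sketch(filename, tmp_path):
    # Copy the database to a scratch location and return the scratch path.
    fd, working = tempfile.mkstemp(suffix=".sqlite", dir=tmp_path)
    os.close(fd)
    if os.path.exists(filename):
        shutil.copy2(filename, working)
    return working

def put_working_copy_sketch(filename, working):
    # Move the scratch copy back over the original if the two differ.
    if os.path.abspath(working) != os.path.abspath(filename):
        shutil.move(working, filename)

def discard_working_copy_sketch(filename, working):
    # Delete the scratch copy, never the original.
    if os.path.abspath(working) != os.path.abspath(filename):
        os.remove(working)

with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "events.sqlite")
    scratch_dir = os.path.join(root, "scratch")
    os.mkdir(scratch_dir)

    # Create the original database file.
    connection = sqlite3.connect(original)
    connection.execute("CREATE TABLE t (x INTEGER)")
    connection.commit()
    connection.close()

    # Modify the working copy, then restore it to the original location.
    working = get_working_copy_sketch(original, scratch_dir)
    connection = sqlite3.connect(working)
    connection.execute("INSERT INTO t VALUES (1)")
    connection.commit()
    connection.close()
    put_working_copy_sketch(original, working)

    connection = sqlite3.connect(original)
    rows = connection.execute("SELECT x FROM t").fetchall()
    connection.close()
    scratch_left = os.listdir(scratch_dir)
```

For a read-only session, the discard helper would be called in place of the put helper, which avoids copying an unmodified file back.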

uninstall_signal_trap(signums=None)

Undo the effects of install_signal_trap(). Restores the original signal handlers. If signums is a sequence of signal numbers only the signal handlers for those signals will be restored (KeyError will be raised if one of them is not one that install_signal_trap() installed a handler for, in which case some undefined number of handlers will have been restored). If signums is None (the default) then all signals that have been modified by previous calls to install_signal_trap() are restored.

Note: this function is called by put_connection_filename() and discard_connection_filename() whenever they remove a scratch file and there are then no more scratch files in use.

use_in(ContentHandler)

Modify ContentHandler, a sub-class of glue.ligolw.LIGOLWContentHandler, to cause it to use the DBTable class defined in this module when parsing XML documents. Instances of the class must provide a connection attribute. When a document is parsed, the value of this attribute will be passed to the DBTable class' .__init__() method as each table object is created, and thus sets the database connection for all table objects in the document.

Example:

>>> import sqlite3
>>> from glue.ligolw import ligolw
>>> class MyContentHandler(ligolw.LIGOLWContentHandler):
...     def __init__(self, *args):
...         super(MyContentHandler, self).__init__(*args)
...         # an in-memory database is used here for illustration
...         self.connection = sqlite3.connect(":memory:")
...
>>> use_in(MyContentHandler)

Multiple database files can be in use at once by creating a content handler class for each one.


Variables Details

TableByName

Value:
{'process_params': <class 'glue.ligolw.dbtables.ProcessParamsTable'>,
 'time_slide': <class 'glue.ligolw.dbtables.TimeSlideTable'>}

_sql_coldef_pattern

Value:
re.compile(r'\s*(?P<name>\w+)\s+(?P<type>\w+)[^,]*')

_sql_create_table_pattern

Value:
re.compile(r'(?i)CREATE\s+TABLE\s+(?P<name>\w+)\s*\((?P<coldefs>.*)\)')
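
As an illustration of what the two patterns above match, the following sketch applies them to a CREATE TABLE statement read back from an SQLite database. How the module itself uses the patterns is not shown on this page, so the query against sqlite_master and the example table definition are assumptions.

```python
import re
import sqlite3

# The two patterns, copied from the values listed above.
create_table_pattern = re.compile(
    r'(?i)CREATE\s+TABLE\s+(?P<name>\w+)\s*\((?P<coldefs>.*)\)'
)
coldef_pattern = re.compile(r'\s*(?P<name>\w+)\s+(?P<type>\w+)[^,]*')

connection = sqlite3.connect(":memory:")
# Example table definition (an assumption, chosen for illustration).
connection.execute(
    "CREATE TABLE process_params (process_id TEXT, param TEXT, value TEXT)"
)

# SQLite keeps the text of each CREATE TABLE statement in sqlite_master.
(statement,) = connection.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
    ("process_params",),
).fetchone()

match = create_table_pattern.match(statement)
table_name = match.group("name")
columns = [
    (m.group("name"), m.group("type"))
    for m in coldef_pattern.finditer(match.group("coldefs"))
]
```

The trailing [^,]* in the column pattern skips any constraints after the type, so a definition such as "old TEXT PRIMARY KEY" still yields the pair ("old", "TEXT").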