Package pylal :: Module ligolw_sqlutils
[hide private]
[frames] | no frames]

Module ligolw_sqlutils

source code


Author: Collin Capano <cdcapano@physics.syr.edu>

Classes [hide private]
  aggregate_concatenate
This class builds on the concatenate method to allow string concatenation across multiple columns and rows.
  parse_param_ranges
  parse_coinc_options
  Summaries
This class stores information about the foreground and background in a database for making calculation of uncombined fars and combined fars quick and efficient.
  rank_stats
Class to return a rank for stats.
  sim_tag_proc_id_mapper
Class to map sim_proc_ids in the experiment summary table to simulation names and vice-versa.
  get_mapped_tables
Convenience class to retrieve all the tables mapped to a coinc_event_id.
  segdict_from_segment
Class to a build a segmentlist dict out of the entries in the segment and segment_definer table in the sqlite database.
Functions [hide private]
 
concatenate(*args)
SQLite doesn't have a tuple type built-in.
source code
 
validate_option(option, lower=True)
Strips and checks that there are no newlines, tabs, spaces, or semi-colons in the given option.
source code
 
del_rows_from_table(connection, del_table, del_table_id, join_conditions, del_filters=None, save_filters=None, verbose=False)
Deletes triggers from any specified table in the del_table option.
source code
 
get_tables_in_database(connection)
Gets the names of tables that are in the database.
source code
 
vacuum_database(connection, vacuum=None, verbose=None)
Remove empty space and defragment the database.
source code
 
get_column_names_from_table(connection, table_name)
Gets the column names from a table and returns them as a list.
source code
 
get_user_created_indices(connection, table_names)
Get all index names and associated SQL CREATE statements associated with tables in table_names.
source code
 
convert_duration(duration, convert_to)
Converts durations stored in the experiment_summary_table from seconds to other units of time.
source code
 
get_next_id(connection, table, id_column)
Gets the next available id in the specified id_column in the specified table.
source code
 
increment_id(last_id, nsteps=1)
Increments the given id by nsteps.
source code
 
end_time_in_ns(end_time, end_time_ns) source code
 
get_col_type(table_name, col_name, default='lstring')
Attempts to get column type from lsctables.py for the given table name and column name.
source code
 
create_column(connection, table_name, column_name)
Creates a column in the given table if it doesn't exist.
source code
 
clean_metadata(connection, key_tables, verbose=False)
Cleans metadata from tables that don't have process_ids in any of the tables listed in the key_tables list.
source code
 
clean_metadata_using_end_time(connection, key_table, key_column, verbose=False)
An alternate to clean_metadata, this cleans metadata from tables who's start/end_times don't encompass the end_times in the given table.
source code
 
get_process_info(connection, verbose=False, debug=False)
Create a map between process_ids so duplicate entries in numerous ligolw_xml tables can be removed without a loss of information.
source code
 
get_pids_to_update(cursor, programs)
Make a tuple of the process_ids from the temp table '_pidmap_' that need to be updated for a given list of programs.
source code
 
simplify_summ_tbls(connection, verbose=False, debug=False)
Delete duplicate information in summary tables according to the process_ids.
source code
 
update_pid_in_snglstbls(connection, verbose=False, debug=False)
Update process_ids for events in the sngl_inspiral &/or sngl_ringdown table.
source code
 
simplify_proc_tbls(connection, verbose=False, debug=False)
Delete duplicate rows in the process & process params table.
source code
 
join_experiment_tables_to_coinc_table(table)
Writes JOIN string to join the experiment, experiment_summary, and experiment_map tables to the specified table.
source code
 
join_experiment_tables_to_sngl_table(table)
Writes JOIN string to join the experiment, experiment_summary, and experiment_map tables to the specified table.
source code
 
clean_experiment_tables(connection, verbose=False)
Removes entries from the experiment, experiment_summary, and time_slide tables that have no events in them, i.e., that have no mapping to any coinc_event_ids via the experiment_map table.
source code
 
simplify_expr_tbl(connection, verbose=False, debug=False)
Cleaning up the experiment table by removing duplicate rows and remapping events to the appropriate experiment.
source code
 
simplify_exprsumm_tbl(connection, verbose=False, debug=False)
Cleaning up the experiment_summary and the experiment_map tables by removing duplicate rows and remapping events to the appropriate experiment.
source code
 
update_experiment_summ_nevents(connection, verbose=False)
Updates the number of events in the num_events column of the experiment_summary table.
source code
 
clean_using_coinc_table(connection, table_name, verbose=False, clean_experiment_map=True, clean_coinc_event_table=True, clean_coinc_definer=True, clean_coinc_event_map=True, clean_mapped_tables=True, selected_tables=[])
Clears experiment_map, coinc_event, coinc_event_map, and all tables pointing to the coinc_event_map of triggers that are no longer in the specified table.
source code
 
apply_inclusion_rules_to_coinc_table(connection, coinc_table, exclude_coincs=None, include_coincs=None, param_filters=None, verbose=False)
Clears the given table of coinc triggers falling outside of the desired ranges, as specified by parse_param_ranges and parse_coinc_opts.
source code
 
write_newstyle_coinc_def_entry(connection, description, search=None, search_coinc_type=None)
Adds a new entry to the coinc_definer_table.
source code
 
get_map_labels(connection)
Retrieves values in the description column of the coinc_definer table.
source code
 
get_coinc_types(connection)
Retrieves all of the distinct map-label and coinc-type pairs in the database.
source code
 
delete_map(connection, map_label)
Deletes all mappings that have the given map_label in the description column of the coinc_definer table.
source code
 
simplify_coincdef_tbl(connection, verbose=False, debug=False)
Remove duplicate entries in the coinc_definer table and update the coinc_event table with new coinc_def_ids.
source code
 
delete_coinc_type(connection, map_label, coincTables)
Deletes all mappings that have the given coinc type.
source code
 
add_coinc_event_entries(connection, process_id, coinc_def_id, time_slide_id, num_new_entries=1)
Writes N new entries in the coinc_event table, where N is given by num_new_entries.
source code
 
update_coinctab_nevents(connection)
Updates the nevents column based on what's in the coinc_event_map table.
source code
 
get_cem_table_names(connection)
Retrieves the all of the table names present in the coinc_event_map table.
source code
 
get_matching_tables(connection, coinc_event_ids)
Gets all the tables that are directly mapped to a list of coinc_event_ids.
source code
 
create_mapped_tables_func(connection, nargs=1)
Creates a function in the database called get_mapped_tables that allows one to quickly get all the mapped tables to a coinc_event_id.
source code
 
clean_mapped_event_tables(connection, tableList, raise_err_on_missing_evid=False, verbose=False)
Cleans tables given in tableList of events whose event_ids aren't in the coinc_event_map table.
source code
 
join_experiment_tables_to_coinc_inspiral()
Writes JOIN string to join the experiment, experiment_summary, and experiment_map table to the coinc_inspiral table.
source code
 
apply_inclusion_rules_to_coinc_inspiral(connection, exclude_coincs=None, include_coincs=None, param_filters=None, verbose=False)
Clears coinc_inspiral table of coinc triggers falling outside of the desired ranges, as specified by parse_param_ranges and parse_coinc_opts.
source code
 
clean_inspiral_tables(connection, verbose=False)
Clears experiment_map, coinc_event, coinc_event_map, and all tables pointing to the coinc_event_map of triggers that are no longer in the coinc_inspiral table.
source code
 
create_sim_rec_map_table(connection, simulation_table, recovery_table, map_label, ranking_stat=None)
Creates a temporary table in the sqlite database called sim_rec_map.
source code
 
simplify_sim_tbls(connection, verbose=False, debug=False)
Remove duplicates from simulation tables (sim_inspiral & sim_ringdown) if those tables exist in the database.
source code
 
simplify_segments_tbls(connection, verbose=False, debug=False)
Removing duplicate row in the segment, segment_definer and segment_summary tables as well as the associated entries in the process & process_params tables
source code
 
get_zero_lag_time_slide_ids(connection)
Gets zero-lag time_slide_id's from the time_slide_table.
source code
 
get_zero_lag_instrument_sets(connection)
Gets instrument sets from time slide table by using the ids of the zero-lag time-slides (Assumption is there is a zero-lag row in the time-slide table).
source code
 
get_instrument_sets_and_time_slide_ids(connection)
Gets all instrument sets available in the time slide table and gets all time-slide ids associated with that instrument set.
source code
 
simplify_timeslide_tbl(connection, verbose=False, debug=False)
Remove duplicate entries in the time_slide table and update entries in the the time_slide_id column of both the experiment_summary and coinc_event tables.
source code
 
simplify_vetodef_tbl(connection, verbose=False, debug=False)
Cleaning up the veto_definer table as well as the associated entries in the process & process_params tables
source code
Variables [hide private]
  __version__ = git_version.verbose_msg
A collection of utilities to assist in carrying out operations on a SQLite database containing lsctables.
Function Details [hide private]

concatenate(*args)

source code 

SQLite doesn't have a tuple type built-in. This can be frustrating if one
needs to compare values from multiple columns when doing queries. For example,
if one wanted to do something like:

connection.cursor().execute('''
    SELECT *
    FROM a 
    WHERE (a.val1, a.val2) IN (
        SELECT (b.val1, b.val2) 
        FROM b)
    ''')

an error would be raised.

This function tries to alleiviate the problem by giving the ability to concatenate
results from multiple columns into a single colon-seperated string. These strings can then be
compared directly. So, in the above example, one would do:

from pylal import ligolw_sqlutils as sqlutils
connection.create_function("concatenate", 2, sqlutils.concatenate)
connection.cursor().execute('''
    SELECT *
    FROM a 
    WHERE concatenate(a.val1, a.val2) IN (
        SELECT concatenate(b.val1, b.val2) 
        FROM b)
''')

Note that the create_function method must be called first with the number of 
values that will be passed to concatenate before using it in any query.

validate_option(option, lower=True)

source code 

Strips and checks that there are no newlines, tabs, spaces, or semi-colons in the given option. This should be used for options that will be plugged into sqlite statements to protect against injection attacks. If lower is set on, will also make all letters lower-case in the option.

get_user_created_indices(connection, table_names)

source code 

Get all index names and associated SQL CREATE statements associated with tables in table_names. Return a list of (idx_name, sql) tuples.

convert_duration(duration, convert_to)

source code 

Converts durations stored in the experiment_summary_table from seconds 
to other units of time.

@duration: duration to convert; assumed to be a float or long in seconds
@convert_to: the unit to convert to. Options are:
    's': to seconds - will just divide by 1.
        This can be useful if need to convert
        the duration from a long int to a float.
    'min': to minutes - will divide by 60.
    'hr': to hours - will divide by 3600.
    'days': to days - will divide by 86400.
    'yr': to years - will divide by 31557600. 
        This is the Julian year, which is the
        accepted astronomical year

get_col_type(table_name, col_name, default='lstring')

source code 

Attempts to get column type from lsctables.py for the given table name and column name. If the table doesn't exist in lsctables or the column doesn't exist in the lsctables definition of the table, returns the default type.

create_column(connection, table_name, column_name)

source code 

Creates a column in the given table if it doesn't exist. Note that table_name and column_name must be all lower-case.

clean_metadata(connection, key_tables, verbose=False)

source code 

Cleans metadata from tables that don't have process_ids in any of the tables
listed in the key_tables list.

@connection: connection to a sqlite database
@key_tables: list of tuples that must have the following order:
    (table, column, filter)
 where:
    table is the name of a table to get a save process id from,
    column is the name of the process_id column in that table
    (this doesn't have to be 'process_id', but it should be a
    process_id type),
    filter is a filter to apply to the table when selecting process_ids

get_process_info(connection, verbose=False, debug=False)

source code 

Create a map between process_ids so duplicate entries in numerous ligolw_xml tables can be removed without a loss of information. The tables the _pidmap_ is used to simplify are:

-- process, process_params -- filter, summ_value, search_summary, search_summvars -- sim_inspiral, sim_ringdown -- time_slide -- update the sim_proc_ids in the experiment_summary table

simplify_summ_tbls(connection, verbose=False, debug=False)

source code 

Delete duplicate information in summary tables according to the process_ids. tbl_name: filter, summ_value, search_summary, search_summvars

update_pid_in_snglstbls(connection, verbose=False, debug=False)

source code 

Update process_ids for events in the sngl_inspiral &/or sngl_ringdown table. This function does *not* remove duplicate events that have different event_ids.

simplify_proc_tbls(connection, verbose=False, debug=False)

source code 

Delete duplicate rows in the process & process params table. The temp table _pidmap_ created by the get_process_info function is dropped.

join_experiment_tables_to_coinc_table(table)

source code 

Writes JOIN string to join the experiment, experiment_summary, and experiment_map tables to the specified table. This allows querying across any of these tables.

join_experiment_tables_to_sngl_table(table)

source code 

Writes JOIN string to join the experiment, experiment_summary, and experiment_map tables to the specified table. This allows querying across any of these tables.

clean_experiment_tables(connection, verbose=False)

source code 

Removes entries from the experiment, experiment_summary, and time_slide tables that have no events in them, i.e., that have no mapping to any coinc_event_ids via the experiment_map table. Entries are only removed if none of the experiment_summ_ids associated with an experiment_id have coinc_events. In other words, Even if only one of the experiment_summ_ids associated with an experiment_id has an event, all of the experiment_summ_ids and experiment_ids associated with that event are saved. This perserves the background time and slide set associated with an experiment.

WARNING: This should only be used for purposes of scaling down a temporary database in prep. for xml extraction. In general, all experiment and time_slide entries should be left in the experiment tables even if they don't have events in them.

update_experiment_summ_nevents(connection, verbose=False)

source code 

Updates the number of events in the num_events column of the experiment_summary table. This should be used whenever coincs are deleted from the experiment_map table or when new files are added to a database.

clean_using_coinc_table(connection, table_name, verbose=False, clean_experiment_map=True, clean_coinc_event_table=True, clean_coinc_definer=True, clean_coinc_event_map=True, clean_mapped_tables=True, selected_tables=[])

source code 

Clears experiment_map, coinc_event, coinc_event_map, and all tables pointing to the coinc_event_map of triggers that are no longer in the specified table. Note that the experiment_summary, experiment, and time_slide_tables are left alone. This is because even if no events are left in an experiment, we still want info. about the experiment that was performed.

@connection to a sqlite database

apply_inclusion_rules_to_coinc_table(connection, coinc_table, exclude_coincs=None, include_coincs=None, param_filters=None, verbose=False)

source code 

Clears the given table of coinc triggers falling outside of the
desired ranges, as specified by parse_param_ranges and parse_coinc_opts.

@connection: connection to a SQLite database with lsctables
@coinc_table: name of the coinc_table to delete the triggers from.
 Can be any table with a coinc_event_id.
@param_filters: output of parse_param_ranges(...).get_param_filters()
@include_coincs: output of parse_coinc_opts(...).get_coinc_filters().
   The coincs that are specified in this list will be SAVED.
@exclude_coincs: output of parse_coinc_opts(...).get_coinc_filters().
    The coincs that are specified in this list will be DELETED.
Note: exclude_coincs is applied first, so anything falling in it will 
be deleted, regardless of wether or not the same falls in include_coincs.
To avoid confusion, it is best to only specify one or the other, not both.

write_newstyle_coinc_def_entry(connection, description, search=None, search_coinc_type=None)

source code 

Adds a new entry to the coinc_definer_table. The only thing used to discriminate different coinc_definer entries is the description column. Search and search_coinc_type can also be optionally specified.

get_coinc_types(connection)

source code 

Retrieves all of the distinct map-label and coinc-type pairs in the database. A dictionary is returned in which they keys are the map-labels and the values are lists of tuples. Each tuple gives all of the tables mapped to a coinc_event_id.

get_matching_tables(connection, coinc_event_ids)

source code 

Gets all the tables that are directly mapped to a list of coinc_event_ids. Returns a dictionary mapping the tables their matching coinc_event_ids.

join_experiment_tables_to_coinc_inspiral()

source code 

Writes JOIN string to join the experiment, experiment_summary, and experiment_map table to the coinc_inspiral table. NOTE: Should only use when querying the coinc_inspiral table (i.e., the only table listed in the FROM statement is the coinc_inspiral).

apply_inclusion_rules_to_coinc_inspiral(connection, exclude_coincs=None, include_coincs=None, param_filters=None, verbose=False)

source code 

Clears coinc_inspiral table of coinc triggers falling outside of the desired ranges, as specified by parse_param_ranges and parse_coinc_opts.

See apply_inclusion_rules_to_coinc_table for more info.

clean_inspiral_tables(connection, verbose=False)

source code 

Clears experiment_map, coinc_event, coinc_event_map, and all tables pointing to the coinc_event_map of triggers that are no longer in the coinc_inspiral table. Note that the experiment_summary, experiment, and time_slide_tables are left alone. This is because even if no events are left in an experiment, we still want info. about the experiment that was performed.

create_sim_rec_map_table(connection, simulation_table, recovery_table, map_label, ranking_stat=None)

source code 

Creates a temporary table in the sqlite database called sim_rec_map.
This table creates a direct mapping between simulation_ids in the simulation table
and coinc_event_ids from the recovery_table, along with a ranking stat from the 
recovery_table.
The columns in the sim_rec_map table are:
    * rec_id: coinc_event_ids of matching events from the recovery table
    * sim_id: the simulation_id from the sim_inspiral table
    * ranking_stat: any stat from the recovery table by which to rank
In addition, indices on the sim and rec ids are put on the table.

Note that because this is a temporary table, as soon as the connection is
closed, it will be deleted.

@connection: connection to a sqlite database
@simulation_table: any lsctable with a simulation_id column; e.g., sim_inspiral
@recovery_table: any lsctable with a coinc_event_id column; e.g., coinc_inspiral
@map_label: the label applied to the mapping between the injections and recovered
@ranking_stat: the name of the ranking stat in the recovery table to use.
 If set to None, ranking_stat column won't be populated.

simplify_sim_tbls(connection, verbose=False, debug=False)

source code 

Remove duplicates from simulation tables (sim_inspiral & sim_ringdown) if those tables exist in the database. Also update the sim_proc_id column in the experiment_summary table.

get_instrument_sets_and_time_slide_ids(connection)

source code 

Gets all instrument sets available in the time slide table and gets all time-slide ids associated with that instrument set. Since this only uses the time-slide table, will get everything even if there were no coincident events during a time-slide.