
Source Code for Module glue.ligolw.ilwd

# Copyright (C) 2006,2012,2013,2016  Kipp Cannon
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 3 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.


#
# =============================================================================
#
#                                    ILWDs
#
# =============================================================================
#


"""
The ilwd:char type is used to store ID strings for objects within LIGO
Light-Weight XML files.  This module and its associated C extension module
_ilwd provide a class for memory-efficient storage of ilwd:char strings.

LIGO Light-Weight XML "ilwd:char" IDs are strings of the form
"table:column:integer", for example "process:process_id:10".  Large complex
documents can have many millions of these strings, and their storage
represents a significant RAM burden.  However, while there can be millions
of ID strings in a document, there might be only a small number (e.g., 10
or fewer) of unique ID prefixes (the table name and column name part).  The
amount of RAM required to load a document can be significantly reduced if
the small number of unique string prefixes is stored separately and reused.
This module provides the machinery used to do this.

The ilwdchar class in this module converts a string or unicode object
containing an ilwd:char ID into a more memory-efficient representation.

Example:

>>> x = ilwdchar("process:process_id:10")
>>> print(x)
process:process_id:10

Like strings, the object resulting from this is immutable.  It provides two
read-only attributes, "table_name" and "column_name", that can be used to
access the table and column parts of the original ID string.  The integer
suffix can be retrieved by converting the object to an integer.

Example:

>>> x.table_name
u'process'
>>> int(x)
10

The object also provides the read-only attribute "index_offset", giving the
length of the string preceding the integer suffix.

Example:

>>> x.index_offset
19

The objects support some arithmetic operations.

Example:

>>> y = x + 5
>>> str(y)
'process:process_id:15'
>>> int(y - x)
5

The objects are pickle-able.

Example:

>>> import pickle
>>> x == pickle.loads(pickle.dumps(x))
True

To simplify interaction with documents that do not contain fully-populated
columns, None is allowed as an input value and is not converted.

Example:

>>> print(ilwdchar(None))
None


Implementation details
======================

Memory is reduced by storing the table_name, column_name, and index_offset
values as class attributes, so only one copy is present in memory and is
shared across all instances of the class.  This means that each unique
table_name and column_name pair requires its own class.  These classes are
created on the fly as new IDs are processed, and get added to this module's
name space.  They are all subclasses of _ilwd.ilwdchar, which implements
the low-level machinery.  After a new class is created it can be accessed
as a symbol in this module, but each of those symbols does not exist until
at least one corresponding ID string has been processed.

Example:

>>> from glue.ligolw import ilwd
>>> "foo_bar_class" in ilwd.__dict__
False
>>> x = ilwd.ilwdchar("foo:bar:0")
>>> type(x)
<class 'glue.ligolw.ilwd.foo_bar_class'>
>>> "foo_bar_class" in ilwd.__dict__
True
>>> print(ilwd.foo_bar_class(10))
foo:bar:10

The ilwdchar class itself is never instantiated; its .__new__() method
parses the ID string parameter and creates an instance of the appropriate
subclass of _ilwd.ilwdchar, creating a new subclass before doing so if
necessary.
"""


import six.moves.copyreg


from glue import git_version
from . import _ilwd
import six


__author__ = "Kipp Cannon <kipp.cannon@ligo.org>"
__version__ = "git id %s" % git_version.id
__date__ = git_version.date


#
# =============================================================================
#
#                                Cached Classes
#
# =============================================================================
#


#
# Function for retrieving ilwdchar subclasses.
#


def get_ilwdchar_class(tbl_name, col_name, namespace = globals()):
    """
    Searches this module's namespace for a subclass of _ilwd.ilwdchar
    whose table_name and column_name attributes match those provided.
    If a matching subclass is found it is returned; otherwise a new
    class is defined, added to this module's namespace, and returned.

    Example:

    >>> process_id = get_ilwdchar_class("process", "process_id")
    >>> x = process_id(10)
    >>> str(type(x))
    "<class 'glue.ligolw.ilwd.process_process_id_class'>"
    >>> str(x)
    'process:process_id:10'

    Retrieving and storing the class provides a convenient mechanism
    for quickly constructing new ID objects.

    Example:

    >>> for i in range(10):
    ...     print(str(process_id(i)))
    ...
    process:process_id:0
    process:process_id:1
    process:process_id:2
    process:process_id:3
    process:process_id:4
    process:process_id:5
    process:process_id:6
    process:process_id:7
    process:process_id:8
    process:process_id:9
    """
    #
    # if the class already exists, retrieve and return it
    #

    key = six.text_type(tbl_name), six.text_type(col_name)
    cls_name = str("%s_%s_class" % key)
    assert cls_name != "get_ilwdchar_class"
    try:
        return namespace[cls_name]
    except KeyError:
        pass

    #
    # otherwise define a new class, and add it to the cache
    #

    class new_class(_ilwd.ilwdchar):
        __slots__ = ()
        table_name, column_name = key
        index_offset = len(u"%s:%s:" % key)

    new_class.__name__ = cls_name

    namespace[cls_name] = new_class

    #
    # pickle support
    #

    six.moves.copyreg.pickle(new_class, lambda x: (ilwdchar, (six.text_type(x),)))

    #
    # return the new class
    #

    return new_class


#
# Metaclass to redirect instantiation to the correct subclass for
# _ilwd.ilwdchar
#


class ilwdchar(object):
    """
    Metaclass wrapper of glue.ligolw._ilwd.ilwdchar class.
    Instantiating this class constructs and returns an instance of a
    subclass of glue.ligolw._ilwd.ilwdchar.
    """
    def __new__(cls, s):
        """
        Convert an ilwd:char-formatted string into an instance of
        the matching subclass of _ilwd.ilwdchar.  If the input is
        None then the return value is None.

        Example:

        >>> x = ilwdchar(u"process:process_id:10")
        >>> str(x)
        'process:process_id:10'
        >>> x.table_name
        u'process'
        >>> x.column_name
        u'process_id'
        >>> int(x)
        10
        >>> x.index_offset
        19
        >>> str(x)[x.index_offset:]
        '10'
        >>> print(ilwdchar(None))
        None
        """
        #
        # None is no-op
        #

        if s is None:
            return None

        #
        # try parsing the string as an ilwd:char formatted string
        #

        try:
            table_name, column_name, i = s.strip().split(u":")
        except (ValueError, AttributeError):
            raise ValueError("invalid ilwd:char '%s'" % repr(s))

        #
        # retrieve the matching class from the ID class cache, and
        # return an instance initialized to the desired value
        #

        return get_ilwdchar_class(table_name, column_name)(int(i))
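
The class-per-prefix caching used by the module can be tried without the C extension. The block below is a minimal pure-Python sketch of the technique, not the module's actual implementation: IDBase is a hypothetical stand-in for _ilwd.ilwdchar, and get_id_class, parse_id and _class_cache are illustrative names that do not exist in glue. It also assumes a private dictionary as the cache rather than the module namespace, and it omits arithmetic and pickle support.

```python
# Hypothetical stand-in for the C base class _ilwd.ilwdchar.  Each instance
# stores only the integer suffix; the prefix lives on the class.
class IDBase(object):
    __slots__ = ("i",)
    table_name = None
    column_name = None
    index_offset = 0

    def __init__(self, i):
        self.i = int(i)

    def __int__(self):
        return self.i

    def __str__(self):
        return "%s:%s:%d" % (self.table_name, self.column_name, self.i)


# Cache of generated classes, keyed by (table_name, column_name).  The real
# module uses its own namespace for this; a dict is assumed here.
_class_cache = {}


def get_id_class(table_name, column_name):
    """Return the cached class for this prefix, creating it on first use."""
    key = (table_name, column_name)
    try:
        return _class_cache[key]
    except KeyError:
        pass
    cls = type(
        "%s_%s_class" % key,
        (IDBase,),
        {
            "__slots__": (),  # no per-instance __dict__
            "table_name": table_name,
            "column_name": column_name,
            "index_offset": len("%s:%s:" % key),
        },
    )
    _class_cache[key] = cls
    return cls


def parse_id(s):
    """Parse "table:column:integer" into an instance; None passes through."""
    if s is None:
        return None
    try:
        table_name, column_name, i = s.strip().split(":")
    except (ValueError, AttributeError):
        raise ValueError("invalid ilwd:char %r" % (s,))
    return get_id_class(table_name, column_name)(i)


x = parse_id("process:process_id:10")
print(x, int(x), x.index_offset)
```

Because the subclasses declare empty __slots__, every instance carries a single slot for the integer, while table_name, column_name and index_offset are stored once per class and shared by all IDs with the same prefix.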