Package glue :: Package ligolw :: Module table :: Class InterningRowBuilder

Class InterningRowBuilder

              object --+
                       |
tokenizer.RowBuilder --+
                       |
                      InterningRowBuilder

This subclass of the tokenizer.RowBuilder class respects the "interning" hints provided by table definitions, and attempts to replace the values of row attributes associated with interned columns with references to shared instances of those values. This results in a reduction in memory use which is small for most documents, but can be substantial when dealing with tables containing large volumes of repeated information.


>>> class Row(object):
...     pass
>>> # 3rd arg is optional list of attributes to intern
>>> rows = InterningRowBuilder(Row, ["name", "age"], ("name",))
>>> l = list(rows.append(["Dick", 20., "Jane", 75., "Dick", 22.]))
>>> l[0].name
'Dick'
>>> l[2].name
'Dick'
>>> l[2].name is l[0].name
True

Note that Python naturally interns short strings such as these, so the final comparison in this example would return True even without this class; the example is intended only to demonstrate the use of the class.
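The replacement of equal values by one shared instance can be illustrated without the library. The following is a minimal sketch, not the class's actual implementation: the names `shared_strings` and `intern_value` are hypothetical, and the two strings are assembled at run time so that CPython does not merge them on its own.

```python
# Hypothetical stand-in for a shared interning table; not the library's code.
shared_strings = {}


def intern_value(value):
    """Return the first object seen that compares equal to value."""
    # setdefault() stores value the first time a key is seen and returns
    # the already-stored object on every later call with an equal key.
    return shared_strings.setdefault(value, value)


# Build two equal strings at run time; in CPython these are distinct objects.
first = "".join(["Di", "ck"])
second = "".join(["Di", "ck"])
distinct_before = first is not second

# After interning, both names refer to the same stored object.
first = intern_value(first)
second = intern_value(second)
shared_after = first is second
```

Only one copy of the string remains referenced after interning, which is where the memory saving comes from when a column repeats the same value many times.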

The values are stored in a dictionary that is shared between all instances of this class, and which persists for the lifetime of the process. Nothing is ever "uninterned" automatically, so the string dictionary grows without bound as more documents are processed. This can be a problem in some use cases, and the work-around is to run

>>> InterningRowBuilder.strings.clear()

to reset the dictionary at appropriate points in the application. Typically this would be done immediately after each document is loaded.
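The reset-after-each-document pattern might look like the sketch below. It does not use the library: `SketchBuilder` and `load_document` are hypothetical stand-ins, and only the class-level `strings` dictionary mirrors the class variable documented for InterningRowBuilder.

```python
class SketchBuilder(object):
    # Hypothetical stand-in: one dictionary shared by every instance,
    # like the `strings = {}` class variable of InterningRowBuilder.
    strings = {}

    def intern(self, value):
        # Instances have no `strings` of their own, so this updates
        # the class-level dictionary.
        return self.strings.setdefault(value, value)


def load_document(values):
    """Hypothetical loader: interns every value and returns the results."""
    builder = SketchBuilder()
    return [builder.intern(v) for v in values]


sizes = []
for document in (["H1", "L1", "H1"], ["V1", "H1"]):
    rows = load_document(document)
    sizes.append(len(SketchBuilder.strings))
    # Reset the shared dictionary immediately after the document is loaded.
    SketchBuilder.strings.clear()
```

Clearing the dictionary does not invalidate values already loaded, because the rows keep their own references; it only means that later documents start with an empty table.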

Instance Methods
append(self, tokens)
Append a sequence of tokens to the row builder, returning an iterator for generating a sequence of new row instances.

Inherited from tokenizer.RowBuilder: __init__, __iter__, __new__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  strings = {}
Properties

Inherited from tokenizer.RowBuilder: attributes, i, interns, row, rowtype

Inherited from object: __class__

Method Details

append(self, tokens)

Append a sequence of tokens to the row builder, returning an iterator for generating a sequence of new row instances. The tokens argument should be an iterable, producing a sequence of token objects. If fewer tokens are yielded from the iterable than are required to construct a complete row, then the row is stored in its partially-populated state and its construction will continue upon the next invocation. Note that it is possible that a call to this method will yield no new rows at all.
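The partial-row behaviour described above can be sketched with a simplified builder. This is an illustration under stated assumptions, not tokenizer.RowBuilder itself: `SketchRowBuilder` is hypothetical, its attribute names (`rowtype`, `attributes`, `row`, `i`) are borrowed from the inherited properties listed on this page, and the `"time"` column is invented for the example.

```python
class Row(object):
    pass


class SketchRowBuilder(object):
    """Hypothetical, simplified builder; not the library's implementation."""

    def __init__(self, rowtype, attributes):
        self.rowtype = rowtype
        self.attributes = tuple(attributes)
        self.row = rowtype()  # row currently under construction
        self.i = 0            # index of the next attribute to fill

    def append(self, tokens):
        for value in tokens:
            setattr(self.row, self.attributes[self.i], value)
            self.i += 1
            if self.i == len(self.attributes):
                # Row complete: hand it out and start a fresh one.
                finished, self.row, self.i = self.row, self.rowtype(), 0
                yield finished


builder = SketchRowBuilder(Row, ["time", "snr"])
first_call = list(builder.append([10, 6.8, 15]))  # one full row, one partial
second_call = list(builder.append([29.1]))        # completes the partial row
third_call = list(builder.append([]))             # yields no rows at all
```

Because `append` is written as a generator here, the builder's state only advances as the returned iterator is consumed, which is why each call is wrapped in `list()`.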


>>> # rows is assumed here to have been built with an "snr" attribute
>>> for row in rows.append([10, 6.8, 15, 29.1]):
...     print(row.snr)

Overrides: tokenizer.RowBuilder.append
(inherited documentation)