graviti.utility.lazy

Lazy list related classes.

Module Contents

Classes

LazyList

LazyList is a lazily loaded list that follows the Sequence protocol.

MutableLazyList

MutableLazyList is a lazily loaded list that follows the Sequence protocol and supports extend.

LazyPage

LazyPage is a placeholder for a page of the lazy list that has not been loaded yet.

LazyFactory

LazyFactory is a factory for creating lazy lists.

class graviti.utility.lazy.LazyList(total_count, limit, fetcher, extractor, *, dtype=None)

Bases: Sequence[_T], graviti.utility.repr.ReprMixin

LazyList is a lazily loaded list that follows the Sequence protocol.

Parameters
  • total_count (int) – The total number of elements in the lazy list.

  • limit (int) – The number of elements in each lazily loaded page.

  • fetcher (Callable[[int], None]) – A callable that takes a page number, fetches the data, and loads it into the lazy list.

  • extractor (Callable[[Any], Iterable[Any]]) – A callable that converts the source data into an iterable.

  • dtype (Optional[pyarrow.DataType]) – The pyarrow data type of the elements in the lazy list.

pages

A list of pyarrow arrays that hold the data of the lazy list.

update(self, pos, data)

Update one page with the given data.

Parameters
  • pos (int) – The page number.

  • data (Any) – The source data to pass to the extractor.

Return type

None
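
The interplay of limit, fetcher, extractor, and update can be sketched with a small pure-Python stand-in. This is not graviti's implementation: ToyLazyList is a hypothetical class, it stores plain Python lists instead of pyarrow arrays, and it assumes the fetcher responds by calling update(pos, data) on the list.

```python
from collections.abc import Sequence
from typing import Any, Callable, Iterable, List, Optional


class ToyLazyList(Sequence):
    """Hypothetical, simplified paged list (illustration only)."""

    def __init__(
        self,
        total_count: int,
        limit: int,
        fetcher: Callable[[int], None],
        extractor: Callable[[Any], Iterable[Any]],
    ) -> None:
        self._total_count = total_count
        self._limit = limit
        self._fetcher = fetcher
        self._extractor = extractor
        # One slot per page; None marks a page that is not loaded yet.
        page_count = -(-total_count // limit)  # ceiling division
        self.pages: List[Optional[List[Any]]] = [None] * page_count

    def __len__(self) -> int:
        return self._total_count

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += self._total_count
        if not 0 <= index < self._total_count:
            raise IndexError("list index out of range")
        # Map the element index to a page number and an offset in that page.
        pos, offset = divmod(index, self._limit)
        if self.pages[pos] is None:
            # Assumption: the fetcher calls update(pos, data) on this list.
            self._fetcher(pos)
        return self.pages[pos][offset]

    def update(self, pos: int, data: Any) -> None:
        # Run the source data through the extractor and store the page.
        self.pages[pos] = list(self._extractor(data))


fetched = []  # records which pages were requested


def fetcher(pos: int) -> None:
    fetched.append(pos)
    start = pos * 4
    squares.update(pos, {"data": list(range(start, min(start + 4, 10)))})


squares = ToyLazyList(10, 4, fetcher, lambda data: (x * x for x in data["data"]))
print(squares[9])  # 81
print(fetched)  # [2]
```

Only the page that contains index 9 is requested; the other two pages stay unloaded until one of their elements is accessed.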

class graviti.utility.lazy.MutableLazyList(total_count, limit, fetcher, extractor, *, dtype=None)

Bases: LazyList[_T]

MutableLazyList is a lazily loaded list that follows the Sequence protocol.

It supports the extend method for adding items to the list.

Parameters
  • total_count (int) – The total number of elements in the lazy list.

  • limit (int) – The number of elements in each lazily loaded page.

  • fetcher (Callable[[int], None]) – A callable that takes a page number, fetches the data, and loads it into the lazy list.

  • extractor (Callable[[Any], Iterable[Any]]) – A callable that converts the source data into an iterable.

  • dtype (Optional[pyarrow.DataType]) – The pyarrow data type of the elements in the lazy list.

pages

A list of pyarrow arrays that hold the data of the lazy list.

extend(self, values)

Extend the mutable sequence by appending elements from the iterable.

Parameters

values (Iterable[_T]) – Elements to be appended to the mutable sequence.

Return type

None

class graviti.utility.lazy.LazyPage(pos, fetcher, parent)

Bases: Generic[_T]

LazyPage is a placeholder for a page of the lazy list that has not been loaded yet.

Parameters
  • pos (int) – The page number.

  • fetcher (Callable[[int], None]) – A callable that takes a page number, fetches the data, and loads it into the lazy list.

  • parent (LazyList[_T]) – The parent lazy list.
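
The placeholder idea can be illustrated as follows. This is a sketch under assumptions, not the library's code: ToyPage and ToyParent are hypothetical names, and the fetcher is assumed to overwrite the placeholder in the parent's pages with the loaded data.

```python
from typing import Any, Callable, List


class ToyPage:
    """Hypothetical placeholder for a page that is not loaded yet."""

    def __init__(self, pos: int, fetcher: Callable[[int], None], parent: "ToyParent") -> None:
        self._pos = pos
        self._fetcher = fetcher
        self._parent = parent

    def __getitem__(self, offset: int) -> Any:
        # Trigger the load. Assumption: the fetcher replaces
        # parent.pages[self._pos] with the real data, so the lookup
        # below reads from the loaded page rather than this placeholder.
        self._fetcher(self._pos)
        return self._parent.pages[self._pos][offset]


class ToyParent:
    """Hypothetical parent list that starts with placeholders only."""

    def __init__(self, page_count: int, fetcher: Callable[[int], None]) -> None:
        self.pages: List[Any] = [ToyPage(pos, fetcher, self) for pos in range(page_count)]


def fetcher(pos: int) -> None:
    # Stand-in for a remote request: swap the placeholder for real data.
    parent.pages[pos] = [f"item-{pos}-{i}" for i in range(3)]


parent = ToyParent(2, fetcher)
print(parent.pages[1][0])  # item-1-0
print(type(parent.pages[0]).__name__)  # ToyPage
print(type(parent.pages[1]).__name__)  # list
```

Because the placeholder supports the same indexing as a loaded page, the parent can index into pages without checking whether a page has been loaded.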

class graviti.utility.lazy.LazyFactory(total_count, limit, getter)

LazyFactory is a factory for creating lazy lists.

Parameters
  • total_count (int) – The total number of elements in each lazy list.

  • limit (int) – The number of elements in each lazily loaded page.

  • getter (Callable[[int, int], Any]) – A callable that takes an offset and a limit and returns the source data.

Examples

>>> from typing import Any, Dict
>>> from graviti.utility.lazy import LazyFactory
>>> TOTAL_COUNT = 1000
>>> def getter(offset: int, limit: int) -> Dict[str, Any]:
...     stop = min(offset + limit, TOTAL_COUNT)
...     data = [
...         {
...             "remotePath": f"{i:06}.jpg",
...             "label": {"CLASSIFICATION": {"category": "cat" if i % 2 else "dog"}},
...         }
...         for i in range(offset, stop)
...     ]
...
...     return {
...         "data": data,
...         "offset": offset,
...         "recordSize": len(data),
...         "totalCount": TOTAL_COUNT,
...     }
>>> import pyarrow as pa
>>> factory = LazyFactory(TOTAL_COUNT, 128, getter)
>>> paths = factory.create_list(
...     lambda data: (item["remotePath"] for item in data["data"]), dtype=pa.string()
... )
>>> categories = factory.create_list(
...     lambda data: (item["label"]["CLASSIFICATION"]["category"] for item in data["data"]),
...     dtype=pa.string(),
... )
>>> paths
LazyList [
  '000000.jpg',
  '000001.jpg',
  '000002.jpg',
  '000003.jpg',
  '000004.jpg',
  '000005.jpg',
  '000006.jpg',
  '000007.jpg',
  '000008.jpg',
  '000009.jpg',
  '000010.jpg',
  '000011.jpg',
  '000012.jpg',
  '000013.jpg',
  ... (985 items are folded),
  '000999.jpg'
]
>>> categories
LazyList [
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  ... (985 items are folded),
  'cat'
]

create_list(self, extractor, dtype=None)

Create a lazy list from the factory.

Parameters
  • extractor (Callable[[Any], Iterable[Any]]) – A callable that converts the source data into an iterable.

  • dtype (Optional[pyarrow.DataType]) – The pyarrow data type of the elements in the lazy list.

Returns

A lazy list created by the given extractor and dtype.

Return type

LazyList[Any]

create_mutable_list(self, extractor, dtype=None)

Create a mutable lazy list from the factory.

Parameters
  • extractor (Callable[[Any], Iterable[Any]]) – A callable that converts the source data into an iterable.

  • dtype (Optional[pyarrow.DataType]) – The pyarrow data type of the elements in the lazy list.

Returns

A mutable lazy list created by the given extractor and dtype.

Return type

LazyList[Any]

fetch(self, pos)

Fetch the source data and load it into all lazy lists created by the factory.

Parameters

pos (int) – The page number.

Return type

None
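
The effect of fetch on several lists at once can be sketched as below. ToyFactory and ToyList are hypothetical stand-ins, not graviti classes, and the mapping from page number to offset (pos * limit) is an assumption that matches the getter(offset, limit) signature used in the example above.

```python
from typing import Any, Callable, Iterable, List, Optional


class ToyList:
    """Hypothetical list that only stores pages filled in by the factory."""

    def __init__(self, page_count: int, extractor: Callable[[Any], Iterable[Any]]) -> None:
        self.extractor = extractor
        self.pages: List[Optional[List[Any]]] = [None] * page_count


class ToyFactory:
    """Hypothetical factory sharing one getter across all created lists."""

    def __init__(self, total_count: int, limit: int, getter: Callable[[int, int], Any]) -> None:
        self._limit = limit
        self._getter = getter
        self._page_count = -(-total_count // limit)  # ceiling division
        self._lists: List[ToyList] = []

    def create_list(self, extractor: Callable[[Any], Iterable[Any]]) -> ToyList:
        lazy_list = ToyList(self._page_count, extractor)
        self._lists.append(lazy_list)
        return lazy_list

    def fetch(self, pos: int) -> None:
        # Assumed mapping from page number to offset.
        data = self._getter(pos * self._limit, self._limit)
        # One request fills the same page in every list from the factory.
        for lazy_list in self._lists:
            lazy_list.pages[pos] = list(lazy_list.extractor(data))


calls = []  # records every (offset, limit) request


def getter(offset: int, limit: int) -> Any:
    calls.append((offset, limit))
    stop = min(offset + limit, 10)
    return {"data": [{"name": f"{i:02}", "even": i % 2 == 0} for i in range(offset, stop)]}


factory = ToyFactory(10, 4, getter)
names = factory.create_list(lambda data: (item["name"] for item in data["data"]))
flags = factory.create_list(lambda data: (item["even"] for item in data["data"]))
factory.fetch(2)
print(names.pages[2])  # ['08', '09']
print(flags.pages[2])  # [True, False]
print(calls)  # [(8, 4)]
```

A single call to the getter serves both lists, which is the benefit of creating related columns from one factory rather than giving each list its own data source.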