graviti.utility.lazy
Lazy list related class.
Module Contents
Classes
LazyList is a lazily loaded list that follows the Sequence protocol.
MutableLazyList is a lazily loaded list that follows the Sequence protocol.
LazyPage is a placeholder for a page in the lazy list that has not been loaded yet.
LazyFactory is a factory for creating lazy lists.
- class graviti.utility.lazy.LazyList(total_count, limit, fetcher, extractor, *, dtype=None)
Bases: Sequence[_T], graviti.utility.repr.ReprMixin
LazyList is a lazily loaded list that follows the Sequence protocol.
- Parameters
total_count (int) – The total count of the elements in the lazy list.
limit (int) – The size of each lazy load page.
fetcher (Callable[[int], None]) – A callable that fetches the data and loads it into the lazy list.
extractor (Callable[[Any], Iterable[Any]]) – A callable that converts the source data into an iterable of elements.
dtype (Optional[pyarrow.DataType]) – The pyarrow data type of the elements in the lazy list.
- pages
A list of pyarrow arrays that contains the data in the lazy list.
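The paging behaviour described above can be pictured with a small, self-contained sketch. It is not graviti's implementation: `SketchLazyList` and `load_page` are hypothetical names, the pages are plain Python lists rather than pyarrow arrays, and the loader returns the page instead of loading it in place as the documented `fetcher` does.

```python
from collections.abc import Sequence
from typing import Any, Callable, List, Optional


class SketchLazyList(Sequence):
    """Hypothetical paged lazy list; an illustration, not graviti's LazyList."""

    def __init__(
        self, total_count: int, limit: int, load_page: Callable[[int], List[Any]]
    ) -> None:
        self._total_count = total_count
        self._limit = limit
        self._load_page = load_page
        # One slot per page; None marks a page that has not been loaded yet.
        page_count = -(-total_count // limit)  # ceiling division
        self.pages: List[Optional[List[Any]]] = [None] * page_count

    def __len__(self) -> int:
        # The length is known up front, so no page has to be fetched.
        return self._total_count

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += self._total_count
        if not 0 <= index < self._total_count:
            raise IndexError("list index out of range")
        # Map the element index to (page number, offset inside the page).
        page_number, offset = divmod(index, self._limit)
        page = self.pages[page_number]
        if page is None:
            page = self._load_page(page_number)
            self.pages[page_number] = page
        return page[offset]


TOTAL_COUNT = 1000
LIMIT = 128
fetched: List[int] = []  # records which pages were actually loaded


def load_page(page_number: int) -> List[str]:
    fetched.append(page_number)
    start = page_number * LIMIT
    stop = min(start + LIMIT, TOTAL_COUNT)
    return [f"{i:06}.jpg" for i in range(start, stop)]


paths = SketchLazyList(TOTAL_COUNT, LIMIT, load_page)
print(len(paths))   # 1000
print(paths[130])   # 000130.jpg
print(fetched)      # [1]
```

Only the page that holds the requested element is loaded; asking for the length touches no page at all.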
- class graviti.utility.lazy.MutableLazyList(total_count, limit, fetcher, extractor, *, dtype=None)
Bases: LazyList[_T]
MutableLazyList is a lazily loaded list that follows the Sequence protocol.
It supports the extend method for adding items to the list.
- Parameters
total_count (int) – The total count of the elements in the lazy list.
limit (int) – The size of each lazy load page.
fetcher (Callable[[int], None]) – A callable that fetches the data and loads it into the lazy list.
extractor (Callable[[Any], Iterable[Any]]) – A callable that converts the source data into an iterable of elements.
dtype (Optional[pyarrow.DataType]) – The pyarrow data type of the elements in the lazy list.
- pages
A list of pyarrow arrays that contains the data in the lazy list.
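As a rough illustration of what extend can mean for a page-based list, the sketch below appends each batch of new items as an extra page and locates elements through cumulative page ends. This is an assumption made for illustration only: `SketchMutablePagedList` is a hypothetical class, and how MutableLazyList stores extended items is not stated here.

```python
from bisect import bisect_right
from typing import Any, Iterable, List


class SketchMutablePagedList:
    """Hypothetical paged list with extend; not graviti's MutableLazyList."""

    def __init__(self) -> None:
        self.pages: List[List[Any]] = []
        # _ends[i] is the number of elements stored in pages[0] .. pages[i].
        self._ends: List[int] = []

    def __len__(self) -> int:
        return self._ends[-1] if self._ends else 0

    def extend(self, values: Iterable[Any]) -> None:
        page = list(values)
        if not page:
            return
        # Each extend call becomes one new page of arbitrary size.
        self._ends.append(len(self) + len(page))
        self.pages.append(page)

    def __getitem__(self, index: int) -> Any:
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("list index out of range")
        # Pages may differ in size, so find the page by binary search.
        page_number = bisect_right(self._ends, index)
        start = self._ends[page_number - 1] if page_number else 0
        return self.pages[page_number][index - start]


items = SketchMutablePagedList()
items.extend(["a", "b", "c"])
items.extend(["d", "e"])
print(len(items))  # 5
print(items[3])    # d
print(items[-1])   # e
```

Because pages no longer share one fixed size after an extend, the sketch uses binary search over cumulative ends instead of dividing the index by the page size.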
- class graviti.utility.lazy.LazyPage(pos, fetcher, parent)
Bases: Generic[_T]
LazyPage is a placeholder for a page in the lazy list that has not been loaded yet.
- Parameters
pos (int) – The page number.
fetcher (Callable[[int], None]) – A callable that fetches the data and loads it into the lazy list.
parent (LazyList[_T]) – The parent lazy list.
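The placeholder idea can be sketched as follows, under stated assumptions: `SketchParent`, `SketchPage` and `get_item` are hypothetical names, and the sketch assumes that the fetcher, which returns None, replaces the placeholder in the parent's pages with the loaded data. The real LazyPage may work differently.

```python
from typing import Any, Callable, List


class SketchParent:
    """Hypothetical parent that only holds a list of pages."""

    def __init__(self, page_count: int) -> None:
        self.pages: List[Any] = [None] * page_count


class SketchPage:
    """Hypothetical placeholder standing in for a page that is not loaded yet."""

    def __init__(
        self, pos: int, fetcher: Callable[[int], None], parent: SketchParent
    ) -> None:
        self._pos = pos
        self._fetcher = fetcher
        self._parent = parent

    def get_item(self, offset: int) -> Any:
        # The fetcher returns nothing; it is assumed to overwrite this
        # placeholder in parent.pages with the loaded data.
        self._fetcher(self._pos)
        return self._parent.pages[self._pos][offset]


LIMIT = 4
parent = SketchParent(page_count=3)


def fetcher(pos: int) -> None:
    # Load the page at `pos` straight into the parent, replacing the placeholder.
    start = pos * LIMIT
    parent.pages[pos] = list(range(start, start + LIMIT))


parent.pages = [SketchPage(pos, fetcher, parent) for pos in range(3)]

placeholder = parent.pages[1]
value = placeholder.get_item(2)
print(value)                           # 6
print(parent.pages[1])                 # [4, 5, 6, 7]
print(type(parent.pages[0]).__name__)  # SketchPage
```

After the first access, the slot for page 1 holds real data while the untouched slots still hold placeholders.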
- class graviti.utility.lazy.LazyFactory(total_count, limit, getter)
LazyFactory is a factory for creating lazy lists.
- Parameters
total_count (int) – The total count of the elements in the lazy lists.
limit (int) – The size of each lazy load page.
getter (Callable[[int, int], Any]) – A callable object to get the source data.
Examples
>>> from typing import Any, Dict
>>> from graviti.utility.lazy import LazyFactory
>>> TOTAL_COUNT = 1000
>>> def getter(offset: int, limit: int) -> Dict[str, Any]:
...     stop = min(offset + limit, TOTAL_COUNT)
...     data = [
...         {
...             "remotePath": f"{i:06}.jpg",
...             "label": {"CLASSIFICATION": {"category": "cat" if i % 2 else "dog"}},
...         }
...         for i in range(offset, stop)
...     ]
...
...     return {
...         "data": data,
...         "offset": offset,
...         "recordSize": len(data),
...         "totalCount": TOTAL_COUNT,
...     }
>>> factory = LazyFactory(TOTAL_COUNT, 128, getter)
>>> paths = factory.create_list(
...     lambda data: (item["remotePath"] for item in data["data"]), dtype="<U10"
... )
>>> categories = factory.create_list(
...     lambda data: (item["label"]["CLASSIFICATION"]["category"] for item in data["data"]),
...     dtype="<U3",
... )
>>> paths
LazyList [
  '000000.jpg',
  '000001.jpg',
  '000002.jpg',
  '000003.jpg',
  '000004.jpg',
  '000005.jpg',
  '000006.jpg',
  '000007.jpg',
  '000008.jpg',
  '000009.jpg',
  '000010.jpg',
  '000011.jpg',
  '000012.jpg',
  '000013.jpg',
  ... (985 items are folded),
  '000999.jpg'
]
>>> categories
LazyList [
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  'dog',
  'cat',
  ... (985 items are folded),
  'cat'
]
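To see how the getter and the extractors in the example relate, the sketch below calls the same getter directly for one page and applies both extractors to the single response. It runs without graviti. The names `extract_paths` and `extract_categories` are introduced here for illustration, and whether LazyFactory shares one response between its lists internally is an assumption, not something stated above.

```python
from typing import Any, Dict, Iterator

TOTAL_COUNT = 1000


def getter(offset: int, limit: int) -> Dict[str, Any]:
    """Same shape of source data as in the example above."""
    stop = min(offset + limit, TOTAL_COUNT)
    data = [
        {
            "remotePath": f"{i:06}.jpg",
            "label": {"CLASSIFICATION": {"category": "cat" if i % 2 else "dog"}},
        }
        for i in range(offset, stop)
    ]
    return {
        "data": data,
        "offset": offset,
        "recordSize": len(data),
        "totalCount": TOTAL_COUNT,
    }


def extract_paths(data: Dict[str, Any]) -> Iterator[str]:
    # Hypothetical named version of the first lambda in the example.
    return (item["remotePath"] for item in data["data"])


def extract_categories(data: Dict[str, Any]) -> Iterator[str]:
    # Hypothetical named version of the second lambda in the example.
    return (item["label"]["CLASSIFICATION"]["category"] for item in data["data"])


# One response for the second page (offset 128, page size 128) ...
response = getter(128, 128)
# ... yields one page of elements per extractor.
paths_page = list(extract_paths(response))
categories_page = list(extract_categories(response))

print(paths_page[0], paths_page[-1])    # 000128.jpg 000255.jpg
print(categories_page[:2])              # ['dog', 'cat']
print(getter(896, 128)["recordSize"])   # 104
```

The last call shows that the final page is shorter than the page size, because 1000 is not a multiple of 128.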
- create_list(self, extractor, dtype=None)
Create a lazy list from the factory.
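- Parameters
extractor (Callable[[Any], Iterable[Any]]) – A callable that converts the source data into an iterable of elements.
dtype (Optional[pyarrow.DataType]) – The pyarrow data type of the elements in the lazy list.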
- Returns
A lazy list created by the given extractor and dtype.
- Return type