graviti.paging.factory

Paging list related classes.

Module Contents

Classes

LazyFactoryBase

LazyFactoryBase is the base class of the lazy factories.

LazyFactory

LazyFactory is a factory for requesting source data and creating paging lists.

LazySubFactory

LazySubFactory is a factory for creating paging lists.

LazyLowerCaseFactory

LazyLowerCaseFactory is a factory to handle the case-insensitive data from the Graviti back end.

LazyLowerCaseSubFactory

LazyLowerCaseSubFactory is a sub-factory to handle the case-insensitive data.

class graviti.paging.factory.LazyFactoryBase

LazyFactoryBase is the base class of the lazy factories.

abstract create_list(self, mapper)

Create a paging list from the factory.

Parameters

mapper (Callable[[Any], _T]) – A callable object to convert every item in the pyarrow array.

Raises

NotImplementedError – The method of the base class should not be called.

Return type

graviti.paging.lists.PagingList[_T]

abstract create_mapped_list(self, mapper)

Create a mapped paging list from the factory.

Parameters

mapper (Callable[[Any], _T]) – A callable object to convert every item in the pyarrow array.

Raises

NotImplementedError – The method of the base class should not be called.

Return type

graviti.paging.lists.MappedPagingList[_T]

abstract create_pyarrow_list(self)

Create a pyarrow paging list from the factory.

Raises

NotImplementedError – The method of the base class should not be called.

Return type

graviti.paging.lists.PyArrowPagingList[Any]
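
The base-class contract described above can be illustrated with a minimal sketch. FactoryBase and InMemoryFactory are hypothetical stand-ins, not graviti classes, and a plain list takes the place of a paging list; the sketch only shows that the base method raises NotImplementedError while a subclass supplies the behaviour.

```python
from typing import Any, Callable, List


class FactoryBase:
    """Hypothetical stand-in for a base factory; not graviti's class."""

    def create_list(self, mapper: Callable[[Any], Any]) -> List[Any]:
        # The base class only defines the interface.
        raise NotImplementedError


class InMemoryFactory(FactoryBase):
    """Hypothetical subclass that builds a plain list from in-memory items."""

    def __init__(self, items: List[Any]) -> None:
        self._items = items

    def create_list(self, mapper: Callable[[Any], Any]) -> List[Any]:
        # Apply the mapper to every item, as the docstring describes.
        return [mapper(item) for item in self._items]


factory = InMemoryFactory([1, 2, 3])
doubled = factory.create_list(lambda x: x * 2)

try:
    FactoryBase().create_list(str)
    base_raised = False
except NotImplementedError:
    base_raised = True
```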

class graviti.paging.factory.LazyFactory(total_count, limit, getter, patype)

Bases: LazyFactoryBase

LazyFactory is a factory for requesting source data and creating paging lists.

Parameters
  • total_count (int) – The total count of the elements in the paging lists.

  • limit (int) – The size of each lazy load page.

  • getter (Callable[[int, int], Any]) – A callable object to get the source data.

  • patype (pyarrow.DataType) – The pyarrow DataType of the data in the factory.

Examples

>>> from typing import Any, Dict, List
>>> import pyarrow as pa
>>> from graviti.paging.factory import LazyFactory
>>> patype = pa.struct(
...     {
...         "remotePath": pa.string(),
...         "label": pa.struct({"CLASSIFICATION": pa.struct({"category": pa.string()})}),
...     }
... )
>>> TOTAL_COUNT = 1000
>>> def getter(offset: int, limit: int) -> List[Dict[str, Any]]:
...     stop = min(offset + limit, TOTAL_COUNT)
...     return [
...         {
...             "remotePath": f"{i:06}.jpg",
...             "label": {"CLASSIFICATION": {"category": "cat" if i % 2 else "dog"}},
...         }
...         for i in range(offset, stop)
...     ]
...
>>> factory = LazyFactory(TOTAL_COUNT, 128, getter, patype)
>>> paths = factory["remotePath"].create_pyarrow_list()
>>> categories = factory["label"]["CLASSIFICATION"]["category"].create_pyarrow_list()
>>> len(paths)
1000
>>> list(paths)
[<pyarrow.StringScalar: '000000.jpg'>,
 <pyarrow.StringScalar: '000001.jpg'>,
 <pyarrow.StringScalar: '000002.jpg'>,
 <pyarrow.StringScalar: '000003.jpg'>,
 <pyarrow.StringScalar: '000004.jpg'>,
 <pyarrow.StringScalar: '000005.jpg'>,
 ...
 ...
 <pyarrow.StringScalar: '000999.jpg'>]
>>> len(categories)
1000
>>> list(categories)
[<pyarrow.StringScalar: 'dog'>,
 <pyarrow.StringScalar: 'cat'>,
 <pyarrow.StringScalar: 'dog'>,
 <pyarrow.StringScalar: 'cat'>,
 <pyarrow.StringScalar: 'dog'>,
 ...
 ...
 <pyarrow.StringScalar: 'cat'>]

get_array(self, pos, keys)

Get the array from the factory.

Parameters
  • pos (int) – The page number.

  • keys (Tuple[str, Ellipsis]) – The keys to access the array from factory.

Returns

The requested pyarrow array.

Return type

pyarrow.Array
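
How a page number might translate into a getter call can be sketched as follows. This assumes page pos starts at element pos * limit, which the documentation above does not state; fetch_page is a hypothetical helper, and the getter is a reduced version of the one in the LazyFactory example.

```python
from typing import Any, Callable, Dict, List

TOTAL_COUNT = 1000


def getter(offset: int, limit: int) -> List[Dict[str, Any]]:
    # Return at most ``limit`` rows starting at ``offset``.
    stop = min(offset + limit, TOTAL_COUNT)
    return [{"remotePath": f"{i:06}.jpg"} for i in range(offset, stop)]


def fetch_page(
    source: Callable[[int, int], List[Dict[str, Any]]],
    pos: int,
    limit: int,
) -> List[Dict[str, Any]]:
    """Fetch page number ``pos`` (hypothetical helper, not part of graviti)."""
    # Assumption: page ``pos`` begins at element ``pos * limit``.
    return source(pos * limit, limit)


last_page = fetch_page(getter, 7, 128)
```

With 1000 elements and a limit of 128, page 7 is the final, shorter page under this assumption.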

create_list(self, mapper)

Create a paging list from the factory.

Parameters

mapper (Callable[[Any], _T]) – A callable object to convert every item in the pyarrow array.

Returns

A paging list created from the factory.

Return type

graviti.paging.lists.PagingList[_T]
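
The role of the mapper argument can be illustrated with a small sketch. map_page is a hypothetical helper and plain Python strings stand in for the items of a pyarrow array; the form in which graviti passes items to the mapper is not specified here.

```python
from typing import Any, Callable, List, TypeVar

_T = TypeVar("_T")


def map_page(page: List[Any], mapper: Callable[[Any], _T]) -> List[_T]:
    """Apply ``mapper`` to every item of one page (hypothetical helper)."""
    return [mapper(item) for item in page]


page = ["000000.jpg", "000001.jpg", "000002.jpg"]

# Convert each path into its stem by dropping the file extension.
stems = map_page(page, lambda path: path.rsplit(".", 1)[0])
```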

create_mapped_list(self, mapper)

Create a mapped paging list from the factory.

Parameters

mapper (Callable[[Any], _T]) – A callable object to convert every item in the pyarrow array.

Returns

A paging list created from the factory.

Return type

graviti.paging.lists.MappedPagingList[_T]

create_pyarrow_list(self)

Create a pyarrow paging list from the factory.

Returns

A paging list created from the factory.

Return type

graviti.paging.lists.PyArrowPagingList[Any]

get_page_lengths(self)

A generator that yields the length of each page in the factory.

Yields

The page lengths.

Return type

Iterator[int]
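
A minimal sketch of how page lengths could follow from total_count and limit. page_lengths is a hypothetical stand-in, and it assumes every page is full except possibly the last.

```python
from typing import Iterator


def page_lengths(total_count: int, limit: int) -> Iterator[int]:
    """Yield the length of each page (hypothetical stand-in)."""
    full_pages, remainder = divmod(total_count, limit)
    # Every full page holds exactly ``limit`` elements.
    for _ in range(full_pages):
        yield limit
    # A final, shorter page holds whatever is left over.
    if remainder:
        yield remainder


lengths = list(page_lengths(1000, 128))
```

For the values used in the LazyFactory example (1000 elements, limit 128), this gives seven pages of 128 elements followed by one page of 104.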

get_offsets(self)

Get the Offsets instance created by the total_count and limit of this factory.

Returns

The Offsets instance created by the total_count and limit of this factory.

Return type

graviti.paging.offset.Offsets
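
The Offsets class is not described on this page. As an illustration only, the sketch below assumes that total_count and limit are used to translate a global index into a page number and a position within that page; locate is a hypothetical function, not the graviti API.

```python
from typing import Tuple


def locate(index: int, total_count: int, limit: int) -> Tuple[int, int]:
    """Return (page number, index within the page) for a global index."""
    if not 0 <= index < total_count:
        raise IndexError("index out of range")
    # Pages are assumed to have a fixed size of ``limit`` elements.
    return divmod(index, limit)


page_number, index_in_page = locate(900, 1000, 128)
```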

class graviti.paging.factory.LazySubFactory(factory, keys, patype)

Bases: LazyFactoryBase

LazySubFactory is a factory for creating paging lists.

Parameters
  • factory (LazyFactory) – The source LazyFactory instance.

  • keys (Tuple[str, Ellipsis]) – The keys to access the array from the source LazyFactory.

  • patype (pyarrow.DataType) – The pyarrow DataType of the data in the sub-factory.
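
The keys parameter can be read as a path into nested source data. The sketch below walks such a path over plain dictionaries; select_column is a hypothetical helper, and the real sub-factory works on pyarrow arrays rather than Python dicts.

```python
from typing import Any, Dict, List, Tuple


def select_column(rows: List[Dict[str, Any]], keys: Tuple[str, ...]) -> List[Any]:
    """Follow ``keys`` into every row and collect the values found there."""
    column = []
    for row in rows:
        value: Any = row
        # Descend one nesting level per key.
        for key in keys:
            value = value[key]
        column.append(value)
    return column


rows = [
    {"remotePath": "000000.jpg", "label": {"CLASSIFICATION": {"category": "dog"}}},
    {"remotePath": "000001.jpg", "label": {"CLASSIFICATION": {"category": "cat"}}},
]
categories = select_column(rows, ("label", "CLASSIFICATION", "category"))
```

This mirrors the chained access factory["label"]["CLASSIFICATION"]["category"] in the LazyFactory example, where each subscript adds one key to the path.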

create_list(self, mapper)

Create a paging list from the factory.

Parameters

mapper (Callable[[Any], _T]) – A callable object to convert every item in the pyarrow array.

Returns

A paging list created from the factory.

Return type

graviti.paging.lists.PagingList[_T]

create_mapped_list(self, mapper)

Create a mapped paging list from the factory.

Parameters

mapper (Callable[[Any], _T]) – A callable object to convert every item in the pyarrow array.

Returns

A paging list created from the factory.

Return type

graviti.paging.lists.MappedPagingList[_T]

create_pyarrow_list(self)

Create a pyarrow paging list from the factory.

Returns

A paging list created from the factory.

Return type

graviti.paging.lists.PyArrowPagingList[Any]

class graviti.paging.factory.LazyLowerCaseFactory(total_count, limit, getter, patype)

Bases: LazyFactory

LazyLowerCaseFactory is a factory to handle the case-insensitive data from the Graviti back end.

Parameters
  • total_count (int) – The total count of the elements in the paging lists.

  • limit (int) – The size of each lazy load page.

  • getter (Callable[[int, int], Any]) – A callable object to get the source data.

  • patype (pyarrow.DataType) – The pyarrow DataType of the data in the factory.

get_array(self, pos, keys)

Get the array from the factory.

Parameters
  • pos (int) – The page number.

  • keys (Tuple[str, Ellipsis]) – The keys to access the array from factory.

Returns

The requested pyarrow array.

Return type

pyarrow.Array
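
This page does not say how the case-insensitive data is handled. One plausible reading, shown here purely as an assumption, is that field names are normalized to lower case before lookup; lower_keys is a hypothetical helper and not part of graviti.

```python
from typing import Any


def lower_keys(value: Any) -> Any:
    """Recursively lower-case all dictionary keys (hypothetical helper)."""
    if isinstance(value, dict):
        return {key.lower(): lower_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [lower_keys(item) for item in value]
    # Leaf values such as strings and numbers are returned unchanged.
    return value


row = {"remotePath": "000000.jpg", "label": {"CLASSIFICATION": {"category": "dog"}}}
normalized = lower_keys(row)
```

Only the keys are changed in this sketch; values such as "000000.jpg" and "dog" keep their original case.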

class graviti.paging.factory.LazyLowerCaseSubFactory(factory, keys, patype)

Bases: LazySubFactory

LazyLowerCaseSubFactory is a sub-factory to handle the case-insensitive data.

Parameters
  • factory (LazyFactory) – The source LazyFactory instance.

  • keys (Tuple[str, Ellipsis]) – The keys to access the array from the source LazyFactory.

  • patype (pyarrow.DataType) – The pyarrow DataType of the data in the sub-factory.