abacusai.api_client_utils

Attributes

INVALID_PANDAS_COLUMN_NAME_CHARACTERS

Classes

StreamingHandler

str(object='') -> str

StreamType

Generic enumeration.

DocstoreUtils

Utility class for loading docstore data.

Functions

clean_column_name(column)

avro_to_pandas_dtype(avro_type)

get_non_nullable_type(types)

get_object_from_context(client, context, ...)

load_as_pandas_from_avro_fd(fd)

load_as_pandas_from_avro_files(files, download_method)

try_abacus_internal_copy(src_suffix, dst_local[, ...])

Returns True if the file was copied, False otherwise

Module Contents

abacusai.api_client_utils.INVALID_PANDAS_COLUMN_NAME_CHARACTERS = '[^A-Za-z0-9_]'
abacusai.api_client_utils.clean_column_name(column)
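A plausible sketch of clean_column_name based on the INVALID_PANDAS_COLUMN_NAME_CHARACTERS pattern above; the real implementation may apply further normalization (for example, handling names that start with a digit):

```python
import re

INVALID_PANDAS_COLUMN_NAME_CHARACTERS = '[^A-Za-z0-9_]'

def clean_column_name(column):
    # Replace every character outside [A-Za-z0-9_] with an underscore
    # so the result is a safe pandas column name.
    return re.sub(INVALID_PANDAS_COLUMN_NAME_CHARACTERS, '_', column)

print(clean_column_name('price ($)'))  # price____
```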
abacusai.api_client_utils.avro_to_pandas_dtype(avro_type)
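The function presumably maps Avro primitive type names to pandas dtype strings. The table below is an assumption for illustration, not the library's actual mapping:

```python
# Hypothetical Avro-primitive -> pandas dtype table; the real
# function's mapping (and its handling of logical types) may differ.
_AVRO_TO_PANDAS = {
    'boolean': 'bool',
    'int': 'int32',
    'long': 'int64',
    'float': 'float32',
    'double': 'float64',
    'string': 'object',
    'bytes': 'object',
}

def avro_to_pandas_dtype(avro_type):
    # Fall back to 'object' for complex or unrecognized types.
    return _AVRO_TO_PANDAS.get(avro_type, 'object')

print(avro_to_pandas_dtype('double'))  # float64
```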
abacusai.api_client_utils.get_non_nullable_type(types)
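In Avro, a nullable field is expressed as a union such as ['null', 'string']; this helper presumably extracts the non-null member. A minimal sketch, assuming that behavior:

```python
def get_non_nullable_type(types):
    # From an Avro union like ['null', 'string'], return the first
    # member that is not 'null'; None if every member is null.
    non_null = [t for t in types if t != 'null']
    return non_null[0] if non_null else None

print(get_non_nullable_type(['null', 'string']))  # string
```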
class abacusai.api_client_utils.StreamingHandler

Bases: str

str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.

classmethod process_streaming_data(value, context, section_key, data_type)
abacusai.api_client_utils.get_object_from_context(client, context, variable_name, return_type)
abacusai.api_client_utils.load_as_pandas_from_avro_fd(fd)
Parameters:

fd (IO)

abacusai.api_client_utils.load_as_pandas_from_avro_files(files, download_method, max_workers=10)
Parameters:
  • files (List[str])

  • download_method (Callable)

  • max_workers (int)
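The files are presumably fetched concurrently and their contents concatenated in input order. A dependency-free sketch of that fan-out pattern (the real function parses each stream as Avro, e.g. via load_as_pandas_from_avro_fd, and returns a single pandas DataFrame):

```python
from concurrent.futures import ThreadPoolExecutor

def load_all(files, download_method, max_workers=10):
    # download_method fetches and parses one file; executor.map
    # preserves the input order regardless of completion order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(download_method, files)
        return [row for part in parts for row in part]

# Stub download_method, standing in for a real fetch-and-parse call.
rows = load_all(['a', 'b'], lambda name: [f'{name}1', f'{name}2'])
print(rows)  # ['a1', 'a2', 'b1', 'b2']
```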

class abacusai.api_client_utils.StreamType

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

MESSAGE = 'message'
SECTION_OUTPUT = 'section_output'
SEGMENT = 'segment'
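Because each member carries a string value, a raw value from a stream can be resolved back to its member with the standard Enum value lookup. A self-contained illustration using the members listed above:

```python
from enum import Enum

class StreamType(Enum):
    MESSAGE = 'message'
    SECTION_OUTPUT = 'section_output'
    SEGMENT = 'segment'

# Resolve a raw wire value to its enum member.
assert StreamType('segment') is StreamType.SEGMENT
print(StreamType.MESSAGE.value)  # message
```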
class abacusai.api_client_utils.DocstoreUtils

Utility class for loading docstore data. Needs to be updated if docstore formats change.

DOC_ID = 'doc_id'
PREDICTION_PREFIX = 'prediction'
FIRST_PAGE = 'first_page'
LAST_PAGE = 'last_page'
PAGE_TEXT = 'page_text'
PAGES = 'pages'
CONTENT = 'content'
TOKENS = 'tokens'
PAGES_ZIP_METADATA = 'pages_zip_metadata'
PAGE_DATA = 'page_data'
HEIGHT = 'height'
WIDTH = 'width'
METADATA = 'metadata'
PAGE = 'page'
BLOCK = 'block'
LINE = 'line'
EXTRACTED_TEXT = 'extracted_text'
EMBEDDED_TEXT = 'embedded_text'
PAGE_MARKDOWN = 'page_markdown'
PAGE_LLM_OCR = 'page_llm_ocr'
PAGE_TABLE_TEXT = 'page_table_text'
MARKDOWN_FEATURES = 'markdown_features'
DOCUMENT_PROCESSING_CONFIG = 'document_processing_config'
DOCUMENT_PROCESSING_VERSION = 'document_processing_version'
static get_archive_id(doc_id)
Parameters:

doc_id (str)

static get_page_id(doc_id, page)
Parameters:
  • doc_id (str)

  • page (int)

static get_content_hash(doc_id)
Parameters:

doc_id (str)

classmethod get_pandas_pages_df(df, feature_group_version, doc_id_column, document_column, get_docstore_resource_bytes, get_document_processing_result_infos, max_workers=10)
Parameters:
  • df (pd.DataFrame)

  • feature_group_version (str)

  • doc_id_column (str)

  • document_column (str)

  • get_docstore_resource_bytes (Callable[Ellipsis, bytes])

  • get_document_processing_result_infos (Callable)

  • max_workers (int)
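get_pandas_pages_df presumably fans each document record out into one row per page, keyed by the constants defined above. A simplified sketch of that explode step with plain dicts (field layout assumed for illustration; the real method also downloads page payloads concurrently, governed by max_workers):

```python
# Mirrors DocstoreUtils.PAGES / PAGE / DOC_ID from the class above.
PAGES, PAGE, DOC_ID = 'pages', 'page', 'doc_id'

def explode_pages(documents):
    # Emit one record per page, tagged with its doc id and page index.
    rows = []
    for doc in documents:
        for page_no, page_data in enumerate(doc[PAGES]):
            rows.append({DOC_ID: doc[DOC_ID], PAGE: page_no, **page_data})
    return rows

docs = [{'doc_id': 'd1',
         'pages': [{'page_text': 'hello'}, {'page_text': 'world'}]}]
print(explode_pages(docs))
```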

classmethod get_pandas_documents_df(df, feature_group_version, doc_id_column, document_column, get_docstore_resource_bytes, get_document_processing_result_infos, max_workers=10)
Parameters:
  • df (pd.DataFrame)

  • feature_group_version (str)

  • doc_id_column (str)

  • document_column (str)

  • get_docstore_resource_bytes (Callable)

  • get_document_processing_result_infos (Callable)

  • max_workers (int)

abacusai.api_client_utils.try_abacus_internal_copy(src_suffix, dst_local, raise_exception=True)

Returns True if the file was copied, False otherwise
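The copy-with-fallback contract can be sketched as follows; the internal Abacus storage mechanism is not public, so shutil stands in for the actual copy, and raise_exception mirrors the documented default:

```python
import shutil

def try_copy(src, dst, raise_exception=True):
    # Returns True if the file was copied, False otherwise.
    try:
        shutil.copy(src, dst)
        return True
    except OSError:
        if raise_exception:
            raise
        return False
```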