hats.io.size_estimates

General utilities for estimating the size of input and output data.

Functions

estimate_dir_size([path, divisor])

Estimate the disk usage of a directory and all of its contents, recursively.

get_mem_size_per_row(data)

Given a 2D chunk of tabular data, return a list with the memory size of each row in the chunk.

Module Contents

estimate_dir_size(path: str | pathlib.Path | upath.UPath | None = None, *, divisor=1)

Estimate the disk usage of a directory and all of its contents, recursively.

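When divisor == 1 (the default), the size is returned in bytes; other values of divisor scale the byte total down (for example, divisor=1024 for kibibytes).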
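A minimal sketch of the idea, for illustration only: sum the sizes of all regular files below a directory, then divide by divisor. The name estimate_dir_size_sketch is hypothetical, it handles only local paths (not upath.UPath remote storage), and returning 0 for path=None and using floor division for divisor are assumptions rather than documented hats behavior.

```python
from __future__ import annotations

import os
from pathlib import Path


def estimate_dir_size_sketch(path: str | os.PathLike | None = None, *, divisor: int = 1) -> int:
    """Hypothetical local-filesystem stand-in for estimate_dir_size.

    Sums the sizes of all regular files under ``path`` (at any depth) and
    floor-divides the total by ``divisor``; divisor=1 gives bytes.
    """
    if path is None:
        # Assumption: no path means there is nothing to measure.
        return 0
    if divisor < 1:
        raise ValueError("divisor must be a positive integer")
    root = Path(path)
    total_bytes = 0
    # rglob("*") yields every file and subdirectory below root, recursively.
    for entry in root.rglob("*"):
        if entry.is_file():
            total_bytes += entry.stat().st_size
    # Assumption: floor division, so the result stays an integer.
    return total_bytes // divisor
```

For example, a directory holding a 100-byte file and a 2048-byte file in a subdirectory measures 2148 with the default divisor and 2 with divisor=1024.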

get_mem_size_per_row(data)

Given a 2D chunk of tabular data, return a list with the memory size of each row in the chunk.

Args:

data (pd.DataFrame or pa.Table): the data chunk to measure

Returns:

list[int]: list of memory sizes for each row in the chunk
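One way such a per-row estimate could work is sketched below, assuming a per-cell sizing rule: NumPy array cells count their buffer size (nbytes), strings count their UTF-8 encoded length, and anything else falls back to sys.getsizeof. This rule and the name mem_size_per_row_sketch are hypothetical and not taken from the hats source. A pa.Table is converted with Table.to_pandas() first, which requires pyarrow to be installed.

```python
import sys

import numpy as np
import pandas as pd


def _value_nbytes(value) -> int:
    """Approximate the in-memory size of one cell value, in bytes."""
    if isinstance(value, np.ndarray):
        # Array-valued cells (e.g. list columns): count the raw buffer size.
        return value.nbytes
    if isinstance(value, str):
        # Strings: count UTF-8 encoded bytes, not Python object overhead.
        return len(value.encode("utf8"))
    # Anything else: fall back to the Python object size.
    return sys.getsizeof(value)


def mem_size_per_row_sketch(data) -> list[int]:
    """Hypothetical stand-in for get_mem_size_per_row; not the hats implementation."""
    if not isinstance(data, pd.DataFrame):
        # Assumption: a pyarrow Table is converted to pandas first.
        to_pandas = getattr(data, "to_pandas", None)
        if to_pandas is None:
            raise TypeError(f"unsupported data type: {type(data).__name__}")
        data = to_pandas()
    # itertuples(name=None) yields one plain tuple of cell values per row.
    return [
        sum(_value_nbytes(value) for value in row)
        for row in data.itertuples(index=False, name=None)
    ]
```

For a single column holding float64 arrays of lengths 3 and 5, this sketch returns [24, 40], one entry per row in row order.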