hats.catalog.healpix_dataset.healpix_dataset

Classes

HealpixDataset

A HATS dataset partitioned with a HEALPix partitioning structure.

Module Contents

class HealpixDataset(catalog_info: hats.catalog.dataset.table_properties.TableProperties, pixels: hats.catalog.partition_info.PartitionInfo | hats.pixel_tree.pixel_tree.PixelTree | list[hats.pixel_math.HealpixPixel], catalog_path: str | pathlib.Path | upath.UPath | None = None, moc: mocpy.MOC | None = None, schema: pyarrow.Schema | None = None, snapshot: hats.catalog.catalog_snapshot.CatalogSnapshot | None = None, generate_snapshot: bool = False)

Bases: hats.catalog.dataset.Dataset

A HATS dataset partitioned with a HEALPix partitioning structure.

Catalogs of this type are partitioned based on the ra and dec of the points, with each partition containing the points that fall within a given HEALPix pixel. The files are in the form:

Norder=/Dir=/Npix=.parquet
partition_info
pixel_tree
moc = None
get_healpix_pixels() → list[hats.pixel_math.HealpixPixel]

Get healpix pixel objects for all pixels contained in the catalog.

Returns:
list[HealpixPixel]

List of HealpixPixel

__len__()

The number of rows in the catalog.

Returns:
int

The number of rows in the catalog, as specified in its metadata. Once the catalog has been modified (for example, filtered), this value is undetermined, so an error is raised instead.
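The behaviour described above can be sketched as follows. This is a minimal illustration, not the hats implementation: it assumes that an undetermined row count is stored as `None` and that the error raised is a `ValueError`. The class `RowCountedDataset` is hypothetical.

```python
from typing import Optional


class RowCountedDataset:
    """Hypothetical dataset whose row count comes from metadata."""

    def __init__(self, total_rows: Optional[int]):
        # None marks a count that is no longer known (e.g. after filtering)
        self.total_rows = total_rows

    def __len__(self) -> int:
        if self.total_rows is None:
            raise ValueError("The number of rows is undetermined because the dataset was modified.")
        return self.total_rows


print(len(RowCountedDataset(1000)))  # 1000
```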

get_max_coverage_order(default_order: int = 3) → int

Gets the maximum HEALPix order for which the coverage of the catalog is known, from the pixel tree and the MOC (if it exists).

Parameters:
default_order : int

The order to return if the dataset has no pixels. (Default value = 3)

Returns:
int

maximum HEALPix order
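Here is a minimal sketch of how the maximum coverage order could be derived. It assumes that pixels are given as plain order integers and that the MOC contributes its own maximum order. The function `max_coverage_order` and its arguments are hypothetical stand-ins for `get_max_coverage_order`.

```python
from typing import Optional


def max_coverage_order(
    pixel_orders: list[int],
    moc_max_order: Optional[int] = None,
    default_order: int = 3,
) -> int:
    """Hypothetical stand-in for HealpixDataset.get_max_coverage_order."""
    # No pixels at all: fall back to the caller-supplied default
    if not pixel_orders:
        return default_order
    max_order = max(pixel_orders)
    # A MOC may describe coverage at a finer order than the partitions do
    if moc_max_order is not None:
        max_order = max(max_order, moc_max_order)
    return max_order


print(max_coverage_order([0, 1, 2], moc_max_order=5))  # 5
```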

filter_from_pixel_list(pixels: list[hats.pixel_math.HealpixPixel]) → Self

Filter the pixels in the catalog to only include any that overlap with the requested pixels.

Parameters:
pixels : list[HealpixPixel]

the pixels to include

Returns:
HealpixDataset

A new catalog with only the pixels that overlap with the given pixels. Note that we reset the total_rows to None, as updating would require a scan over the new pixel sizes.
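In the HEALPix nested scheme, each pixel at order k splits into 4 children at order k + 1. As a result, the ancestor of a pixel d orders up is found by dropping 2 * d bits from its index. The sketch below uses that property to decide whether pixels of different orders overlap. It models pixels as hypothetical `(order, pixel)` tuples rather than `HealpixPixel` objects and is not the hats implementation.

```python
def pixels_overlap(order_a: int, pixel_a: int, order_b: int, pixel_b: int) -> bool:
    """True if two nested-scheme HEALPix pixels share any area."""
    # Put the coarser (lower-order) pixel first
    if order_a > order_b:
        order_a, pixel_a, order_b, pixel_b = order_b, pixel_b, order_a, pixel_a
    # Each order step splits a pixel into 4 children, i.e. 2 bits of the index;
    # shifting the finer pixel up to the coarser order yields its ancestor
    return (pixel_b >> (2 * (order_b - order_a))) == pixel_a


def filter_pixel_tuples(catalog_pixels, requested):
    """Hypothetical helper: keep catalog pixels overlapping any requested pixel."""
    return [
        p for p in catalog_pixels
        if any(pixels_overlap(*p, *r) for r in requested)
    ]


print(filter_pixel_tuples([(0, 4), (1, 20), (2, 70)], [(1, 17)]))
# [(0, 4), (2, 70)]
```

The coarser pixel (0, 4) is kept because it contains (1, 17), and (2, 70) is kept because it lies inside (1, 17).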

filter_by_cone(ra: float, dec: float, radius_arcsec: float) → Self

Filter the pixels in the catalog to only include the pixels that overlap with a cone.

Parameters:
ra : float

Right ascension of the center of the cone, in degrees

dec : float

Declination of the center of the cone, in degrees

radius_arcsec : float

Radius of the cone, in arcseconds

Returns:
HealpixDataset

A new catalog with only the pixels that overlap with the specified cone
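The method keeps whole pixels that overlap the cone. The sketch below shows only the underlying point-level geometry: a great-circle separation computed with the haversine formula, compared against the radius after converting arcseconds to degrees (3600 arcsec per degree). The function names are hypothetical, and this is not how hats performs the pixel search.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance between two sky positions, all in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard asin against floating-point overshoot
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(ra: float, dec: float, center_ra: float, center_dec: float,
            radius_arcsec: float) -> bool:
    # radius_arcsec is in arcseconds, separations are in degrees
    return angular_separation_deg(ra, dec, center_ra, center_dec) <= radius_arcsec / 3600.0


print(in_cone(10.0, 0.5, 10.0, 0.0, 3600.0))  # True
```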

filter_by_box(ra: tuple[float, float], dec: tuple[float, float]) → Self

Filter the pixels in the catalog to only include the pixels that overlap with a zone, defined by right ascension and declination ranges. The right ascension edges follow great circles and the declination edges follow small circles.

Parameters:
ra : tuple[float, float]

Right ascension range, in degrees

dec : tuple[float, float]

Declination range, in degrees

Returns:
HealpixDataset

A new catalog with only the pixels that overlap with the specified region
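Lines of constant right ascension (meridians) are great circles, and lines of constant declination (parallels) are small circles. A plain coordinate comparison therefore traces the zone described above. The point-level sketch below assumes that both RA bounds lie in [0, 360] and that a first bound larger than the second means the range wraps through RA = 0°. That wrapping convention is an assumption here, and the function names are hypothetical.

```python
def ra_in_range(ra: float, ra_range: tuple[float, float]) -> bool:
    lo, hi = ra_range
    ra = ra % 360.0  # normalize the point's RA into [0, 360)
    if lo <= hi:
        return lo <= ra <= hi
    # lo > hi: assume the range wraps through RA = 0 deg, e.g. (350, 10)
    return ra >= lo or ra <= hi


def in_box(ra: float, dec: float,
           ra_range: tuple[float, float], dec_range: tuple[float, float]) -> bool:
    dec_lo, dec_hi = dec_range
    return ra_in_range(ra, ra_range) and dec_lo <= dec <= dec_hi


print(in_box(5.0, 0.0, (350.0, 10.0), (-5.0, 5.0)))  # True
```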

filter_by_polygon(vertices: list[tuple[float, float]]) → Self

Filter the pixels in the catalog to only include the pixels that overlap with a polygonal sky region.

Parameters:
vertices : list[tuple[float, float]]

The list of vertex coordinates for the polygon, (ra, dec), in degrees.

Returns:
HealpixDataset

A new catalog with only the pixels that overlap with the specified polygon.

filter_by_moc(moc: mocpy.MOC) → Self

Filter the pixels in the catalog to only include the pixels that overlap with the moc provided.

Parameters:
moc : mocpy.MOC

The MOC to filter by

Returns:
HealpixDataset

A new catalog with only the pixels that overlap with the moc. Note that we reset the total_rows to 0, as updating would require a scan over the new pixel sizes.
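A common way to reason about a MOC is as a set of index ranges at a single deep order. Each nested pixel at order k covers a contiguous block of 4^(depth - k) cells at that depth. The sketch below tests pixel/MOC overlap as half-open interval intersection at an assumed depth of 29. It models the MOC as a hypothetical list of `(order, pixel)` cells instead of a `mocpy.MOC` object, and it is not the hats implementation.

```python
MAX_DEPTH = 29  # assumed finest order used to express cells as index ranges


def cell_range(order: int, pixel: int, depth: int = MAX_DEPTH) -> tuple[int, int]:
    """Half-open [start, end) range of depth-level cells covered by a nested pixel."""
    shift = 2 * (depth - order)
    return pixel << shift, (pixel + 1) << shift


def overlaps_moc(order: int, pixel: int, moc_cells: list[tuple[int, int]]) -> bool:
    start, end = cell_range(order, pixel)
    for m_order, m_pixel in moc_cells:
        m_start, m_end = cell_range(m_order, m_pixel)
        # Two half-open intervals intersect iff each starts before the other ends
        if start < m_end and m_start < end:
            return True
    return False


moc = [(0, 4), (2, 100)]
catalog = [(1, 17), (1, 20), (1, 25), (2, 70)]
print([p for p in catalog if overlaps_moc(*p, moc)])
# [(1, 17), (1, 25), (2, 70)]
```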

align(other_cat: Self, alignment_type: hats.pixel_tree.PixelAlignmentType = PixelAlignmentType.INNER) → hats.pixel_tree.PixelAlignment

Performs an alignment to another catalog, using the pixel trees and MOCs if available.

An alignment compares the pixel structures of the two catalogs, checking which pixels overlap. The alignment includes the mapping of all pairs of pixels in each tree that overlap with each other, and the aligned tree which consists of the overlapping pixels in the two input catalogs, using the higher order pixels where there is overlap with differing orders.

For more information, see this document: https://docs.google.com/document/d/1gqb8qb3HiEhLGNav55LKKFlNjuusBIsDW7FdTkc5mJU/edit?usp=sharing

Parameters:
other_cat : Catalog

The catalog to align to

alignment_type : PixelAlignmentType

The type of alignment describing how to handle nodes which exist in one tree but not the other. Mirrors the ‘how’ argument of a pandas/sql join. Options are:

  • “inner” - only use pixels that appear in both catalogs

  • “left” - use all pixels that appear in the left catalog and any overlapping from the right

  • “right” - use all pixels that appear in the right catalog and any overlapping from the left

  • “outer” - use all pixels from both catalogs

Returns:
PixelAlignment

A PixelAlignment object with the alignment from the two catalogs
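The sketch below covers only the "inner" case described above. For every overlapping pair of pixels, it records the pair and keeps the higher-order pixel in the aligned result. Pixels are hypothetical `(order, pixel)` tuples in the nested scheme. The "left", "right", and "outer" types also need pixels that appear in only one catalog, and this sketch omits them. The pairwise loop is quadratic, which is fine for illustration but not for large trees. This is not the hats `PixelAlignment` implementation.

```python
def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Nested-scheme overlap test for (order, pixel) tuples."""
    # Sorting puts the lower-order (coarser) pixel first
    (coarse_order, coarse_pix), (fine_order, fine_pix) = sorted([a, b])
    return (fine_pix >> (2 * (fine_order - coarse_order))) == coarse_pix


def inner_align(left, right):
    """Hypothetical inner alignment: returns (mapping, aligned_pixels)."""
    mapping = []  # (left_pixel, right_pixel, aligned_pixel) triples
    aligned = set()
    for lp in left:
        for rp in right:
            if overlaps(lp, rp):
                # Keep the higher-order (smaller) pixel of each overlapping pair
                finer = lp if lp[0] >= rp[0] else rp
                mapping.append((lp, rp, finer))
                aligned.add(finer)
    return mapping, sorted(aligned)


mapping, aligned = inner_align([(0, 4), (0, 5)], [(1, 16), (1, 17), (1, 30)])
print(aligned)  # [(1, 16), (1, 17)]
```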

plot_pixels(**kwargs)

Create a visual map of the pixel density of the catalog.

Parameters:
**kwargs

Additional args to pass to hats.inspection.visualize_catalog.plot_healpix_map

plot_moc(**kwargs)

Create a visual map of the coverage of the catalog.

Parameters:
**kwargs

Additional args to pass to hats.inspection.visualize_catalog.plot_moc

aggregate_column_statistics(exclude_hats_columns: bool = True, exclude_columns: list[str] | None = None, include_columns: list[str] | None = None, only_numeric_columns: bool = False, include_pixels: list[hats.pixel_math.HealpixPixel] | None = None)

Read the footer statistics in the Parquet metadata, and report global min/max values.

Parameters:
exclude_hats_columns : bool

exclude HATS spatial and partitioning fields from the statistics. Defaults to True.

exclude_columns : list[str] | None

additional columns to exclude from the statistics.

include_columns : list[str] | None

if specified, only return statistics for the column names provided. Defaults to None, and returns all non-hats columns.

only_numeric_columns : bool

only include columns that are numeric (integer or floating point) in the statistics. If True, the entire frame should be numeric. (Default value = False)

include_pixels : list[HealpixPixel] | None

if specified, only aggregate statistics over the pixels indicated. (Default value = None)

Returns:
DataFrame

aggregated statistics
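The aggregation step amounts to folding per-file minima and maxima into one global range per column. The sketch below assumes that the footer statistics have already been read into hypothetical `{column: (min, max)}` dictionaries, one per file. It does not read real Parquet metadata and is not the hats implementation.

```python
from typing import Any, Optional


def aggregate_min_max(
    per_file_stats: list[dict[str, tuple[Any, Any]]],
    exclude_columns: Optional[list[str]] = None,
) -> dict[str, tuple[Any, Any]]:
    """Fold per-file {column: (min, max)} footer stats into a global (min, max)."""
    excluded = set(exclude_columns or [])
    result: dict[str, tuple[Any, Any]] = {}
    for stats in per_file_stats:
        for column, (col_min, col_max) in stats.items():
            if column in excluded:
                continue
            if column in result:
                cur_min, cur_max = result[column]
                result[column] = (min(cur_min, col_min), max(cur_max, col_max))
            else:
                result[column] = (col_min, col_max)
    return result


stats = [
    {"ra": (1.0, 5.0), "_healpix_29": (0, 10)},
    {"ra": (0.5, 3.0), "mag": (10.2, 20.1)},
]
print(aggregate_min_max(stats, exclude_columns=["_healpix_29"]))
# {'ra': (0.5, 5.0), 'mag': (10.2, 20.1)}
```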

per_pixel_statistics(*, exclude_hats_columns: bool = True, exclude_columns: list[str] | None = None, include_columns: list[str] | None = None, only_numeric_columns: bool = False, include_stats: list[str] | None = None, multi_index=False, include_pixels: list[hats.pixel_math.HealpixPixel] | None = None, per_row_group: bool = False)

Read footer statistics in parquet metadata, and report on statistics about each pixel partition.

per_partition_statistics(*, exclude_hats_columns: bool = True, exclude_columns: list[str] | None = None, include_columns: list[str] | None = None, only_numeric_columns: bool = False, include_stats: list[str] | None = None, multi_index=False, include_pixels: list[hats.pixel_math.HealpixPixel] | None = None, per_row_group: bool = False)

Read footer statistics in parquet metadata, and report on statistics about each pixel partition.

Parameters:
exclude_hats_columns : bool

exclude HATS spatial and partitioning fields from the statistics. Defaults to True.

exclude_columns : list[str] | None

additional columns to exclude from the statistics.

include_columns : list[str] | None

if specified, only return statistics for the column names provided. Defaults to None, and returns all non-hats columns.

include_stats : list[str] | None

if specified, only return the kinds of values in the list (min_value, max_value, null_count, row_count). Defaults to None, and returns all values.

multi_index : bool

whether to create the returned frame with a multi-index, first on pixel, then on column name. (Default value = False)

include_pixels : list[HealpixPixel] | None

if specified, only return statistics for the pixels indicated. Defaults to None, and returns all pixels.

Returns:
DataFrame

granular statistics
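To show how `include_stats` and `include_pixels` narrow the result, the sketch below flattens hypothetical nested per-pixel statistics into `(pixel, column)` keys. This mirrors the `multi_index` layout "first on pixel, then on column name". The stat names come from the parameter description above. The string pixel labels and the function `select_stats` are hypothetical, and no Parquet metadata is read.

```python
from typing import Any, Optional

# pixel label -> column name -> stat name -> value
Stats = dict[str, dict[str, dict[str, Any]]]


def select_stats(
    per_pixel: Stats,
    include_stats: Optional[list[str]] = None,
    include_pixels: Optional[list[str]] = None,
) -> dict[tuple[str, str], dict[str, Any]]:
    """Flatten to {(pixel, column): {stat: value}}, keeping only requested entries."""
    rows: dict[tuple[str, str], dict[str, Any]] = {}
    for pixel, columns in per_pixel.items():
        if include_pixels is not None and pixel not in include_pixels:
            continue
        for column, stats in columns.items():
            rows[(pixel, column)] = {
                name: value
                for name, value in stats.items()
                if include_stats is None or name in include_stats
            }
    return rows


data = {
    "order1_pix17": {"ra": {"min_value": 0.0, "max_value": 10.0, "null_count": 0}},
    "order1_pix18": {"ra": {"min_value": 5.0, "max_value": 8.0, "null_count": 1}},
}
print(select_stats(data, include_stats=["min_value"], include_pixels=["order1_pix18"]))
# {('order1_pix18', 'ra'): {'min_value': 5.0}}
```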

has_healpix_column()

Does this catalog’s schema contain a healpix spatial index column?

This is True if either:

  • there is a value for the hats_col_healpix property, and that string exists as a column name in the pyarrow schema

  • there is a _healpix_29 column in the pyarrow schema

Returns:
bool

True if the dataset's schema contains a HEALPix spatial index column, by either of the rules above
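The two rules above translate directly into a small check. This sketch takes a plain list of column names in place of a `pyarrow.Schema` (with a real schema, `schema.names` would supply that list) and an optional value standing in for the `hats_col_healpix` property. The function is a hypothetical illustration, not the hats method.

```python
from typing import Optional


def has_healpix_column(column_names: list[str], hats_col_healpix: Optional[str] = None) -> bool:
    # Rule 1: the hats_col_healpix property names a column present in the schema
    if hats_col_healpix is not None and hats_col_healpix in column_names:
        return True
    # Rule 2: the conventional _healpix_29 column is present
    return "_healpix_29" in column_names


print(has_healpix_column(["ra", "dec", "_healpix_29"]))  # True
```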

get_pixel_paths()

Generate paths to all pixel files.

Pixels will be traversed in “breadth-first” healpix order. If any spatial filters have been applied to this catalog, only those pixels that remain will be included.

Yields:
UPath

A Universal Pathlib UPath pointing to either an Npix directory or a single pixel partition data file.

read_pixel_to_pandas(pixel: hats.pixel_math.HealpixPixel, **kwargs) → nested_pandas.NestedFrame

Read the parquet file(s) for this pixel into a pandas dataframe.

Parameters:
pixel : HealpixPixel

desired data partition, by HEALPix pixel

**kwargs

Additional arguments to pass to the pandas read_parquet method

Returns:
NestedFrame

Pandas DataFrame with the data from the parquet file(s)