hats.catalog.healpix_dataset.healpix_dataset
Classes
HealpixDataset | A HATS dataset partitioned with a HEALPix partitioning structure.
Module Contents
- class HealpixDataset(catalog_info: hats.catalog.dataset.table_properties.TableProperties, pixels: hats.catalog.partition_info.PartitionInfo | hats.pixel_tree.pixel_tree.PixelTree | list[hats.pixel_math.HealpixPixel], catalog_path: str | pathlib.Path | upath.UPath | None = None, moc: mocpy.MOC | None = None, schema: pyarrow.Schema | None = None, snapshot: hats.catalog.catalog_snapshot.CatalogSnapshot | None = None, generate_snapshot: bool = False)
Bases: hats.catalog.dataset.Dataset
A HATS dataset partitioned with a HEALPix partitioning structure.
Catalogs of this type are partitioned by the ra and dec of their points, with each partition containing the points that fall within a given HEALPix pixel. The files are in the form:
Norder=/Dir=/Npix=.parquet
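As a rough illustration of this layout, the sketch below builds a relative path for a single pixel. The helper name `partition_path` is hypothetical, and the rule that `Dir` groups pixel numbers into blocks of 10,000 is an assumption about the HATS convention rather than something stated on this page.

```python
from pathlib import PurePosixPath


def partition_path(order: int, pixel: int, dir_block: int = 10_000) -> PurePosixPath:
    """Build a relative Norder=/Dir=/Npix= path for one HEALPix pixel.

    Assumption: Dir is the pixel number rounded down to a multiple of dir_block.
    """
    directory = (pixel // dir_block) * dir_block
    return PurePosixPath(f"Norder={order}") / f"Dir={directory}" / f"Npix={pixel}.parquet"


print(partition_path(1, 44))  # Norder=1/Dir=0/Npix=44.parquet
```

In practice, paths are better obtained from the library itself (for example via `get_pixel_paths()`), since a hand-built path may not match the catalog on disk.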
- get_healpix_pixels() -> list[hats.pixel_math.HealpixPixel]
Get healpix pixel objects for all pixels contained in the catalog.
- Returns:
- list[HealpixPixel]
List of HealpixPixel
- __len__()
The number of rows in the catalog.
- Returns:
- int
The number of rows in the catalog, as specified in its metadata. Once the catalog has been modified, this value is undetermined and an error is raised instead.
- get_max_coverage_order(default_order: int = 3) -> int
Get the maximum HEALPix order for which the coverage of the catalog is known, based on the pixel tree and on the moc, if one exists.
- Parameters:
- default_order: int
The order to return if the dataset has no pixels. (Default value = 3)
- Returns:
- int
maximum HEALPix order
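The behavior described above can be sketched in plain Python as follows. This is a simplified illustration, not the library's code: the function name and its inputs (a list of pixel orders and an optional MOC order) are hypothetical stand-ins for the pixel tree and the `mocpy.MOC` object.

```python
from __future__ import annotations


def max_coverage_order(
    pixel_orders: list[int],
    moc_max_order: int | None = None,
    default_order: int = 3,
) -> int:
    """Return the highest known order, or default_order when nothing is known."""
    candidates = list(pixel_orders)
    if moc_max_order is not None:
        # A MOC, when present, may describe coverage at a finer order than the pixels.
        candidates.append(moc_max_order)
    return max(candidates) if candidates else default_order


print(max_coverage_order([1, 2, 5]))                   # 5
print(max_coverage_order([1, 2], moc_max_order=7))     # 7
print(max_coverage_order([]))                          # 3
```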
- filter_from_pixel_list(pixels: list[hats.pixel_math.HealpixPixel]) -> Self
Filter the pixels in the catalog to only include any that overlap with the requested pixels.
- Parameters:
- pixels: list[HealpixPixel]
The pixels to include
- Returns:
- HealpixDataset
A new catalog with only the pixels that overlap with the given pixels. Note that we reset the total_rows to None, as updating would require a scan over the new pixel sizes.
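To make "overlap" concrete, here is a minimal sketch of how two pixels at different orders can be compared. It assumes the HEALPix NESTED numbering scheme, in which pixel `p` at order `k` contains pixels `4p` through `4p + 3` at order `k + 1`. The `Pixel` tuple and both helper functions are hypothetical and only mimic the idea; the library works on its pixel tree instead of a pairwise loop.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    order: int
    pixel: int


def pixels_overlap(a: Pixel, b: Pixel) -> bool:
    """True if one pixel contains (or equals) the other, assuming NESTED numbering."""
    low, high = (a, b) if a.order <= b.order else (b, a)
    # Each order step adds two bits, so shifting recovers the ancestor at the lower order.
    return high.pixel >> (2 * (high.order - low.order)) == low.pixel


def filter_pixels(catalog_pixels: list[Pixel], requested: list[Pixel]) -> list[Pixel]:
    """Keep the catalog pixels that overlap at least one requested pixel."""
    return [p for p in catalog_pixels if any(pixels_overlap(p, r) for r in requested)]


catalog = [Pixel(0, 4), Pixel(1, 44), Pixel(1, 45)]
print(filter_pixels(catalog, [Pixel(2, 177)]))
```

Here only the order-1 pixel 44 survives, because 177 at order 2 is one of its four children.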
- filter_by_cone(ra: float, dec: float, radius_arcsec: float) -> Self
Filter the pixels in the catalog to only include the pixels that overlap with a cone.
- Parameters:
- ra: float
Right ascension of the center of the cone, in degrees
- dec: float
Declination of the center of the cone, in degrees
- radius_arcsec: float
Radius of the cone, in arcseconds
- Returns:
- HealpixDataset
A new catalog with only the pixels that overlap with the specified cone
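Note that the cone center is given in degrees while the radius is given in arcseconds. The sketch below shows that unit handling for a single point, using the haversine formula for angular separation. It is an illustration of the geometry only: both function names are hypothetical, and the method itself selects whole pixels, so a returned partition can still contain points outside the cone.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation between two sky positions, all values in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(point_ra: float, point_dec: float, ra: float, dec: float, radius_arcsec: float) -> bool:
    """True if the point lies within radius_arcsec of the cone center (ra, dec)."""
    radius_deg = radius_arcsec / 3600.0  # 3600 arcseconds per degree
    return angular_separation_deg(point_ra, point_dec, ra, dec) <= radius_deg


print(in_cone(10.0, 20.5, 10.0, 20.0, 3600.0))  # half a degree away, radius one degree
```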
- filter_by_box(ra: tuple[float, float], dec: tuple[float, float]) -> Self
Filter the pixels in the catalog to only include the pixels that overlap with a zone, defined by right ascension and declination ranges. The right ascension edges follow great circle arcs and the declination edges follow small circle arcs.
- Parameters:
- ra: tuple[float, float]
Right ascension range, in degrees
- dec: tuple[float, float]
Declination range, in degrees
- Returns:
- HealpixDataset
A new catalog with only the pixels that overlap with the specified region
- filter_by_polygon(vertices: list[tuple[float, float]]) -> Self
Filter the pixels in the catalog to only include the pixels that overlap with a polygonal sky region.
- Parameters:
- vertices: list[tuple[float, float]]
The list of vertex coordinates for the polygon, (ra, dec), in degrees.
- Returns:
- HealpixDataset
A new catalog with only the pixels that overlap with the specified polygon.
- filter_by_moc(moc: mocpy.MOC) -> Self
Filter the pixels in the catalog to only include the pixels that overlap with the moc provided.
- Parameters:
- moc: mocpy.MOC
The moc to filter by
- Returns:
- HealpixDataset
A new catalog with only the pixels that overlap with the moc. Note that we reset the total_rows to None, as updating would require a scan over the new pixel sizes.
- align(other_cat: Self, alignment_type: hats.pixel_tree.PixelAlignmentType = PixelAlignmentType.INNER) -> hats.pixel_tree.PixelAlignment
Performs an alignment to another catalog, using the pixel tree and mocs if available.
An alignment compares the pixel structures of the two catalogs, checking which pixels overlap. The alignment includes the mapping of all pairs of pixels in each tree that overlap with each other, and the aligned tree which consists of the overlapping pixels in the two input catalogs, using the higher order pixels where there is overlap with differing orders.
For more information, see this document: https://docs.google.com/document/d/1gqb8qb3HiEhLGNav55LKKFlNjuusBIsDW7FdTkc5mJU/edit?usp=sharing
- Parameters:
- other_cat: Catalog
The catalog to align to
- alignment_type: PixelAlignmentType
The type of alignment describing how to handle nodes which exist in one tree but not the other. Mirrors the ‘how’ argument of a pandas/sql join. Options are:
“inner” - only use pixels that appear in both catalogs
“left” - use all pixels that appear in the left catalog and any overlapping from the right
“right” - use all pixels that appear in the right catalog and any overlapping from the left
“outer” - use all pixels from both catalogs
- Returns:
- PixelAlignment
A PixelAlignment object with the alignment from the two catalogs
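The "inner" case described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: NESTED HEALPix numbering, a hypothetical `Pixel` tuple, and a quadratic pairwise loop in place of the library's pixel tree. It does not cover the "left", "right", or "outer" options, and it ignores mocs.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    order: int
    pixel: int


def overlaps(a: Pixel, b: Pixel) -> bool:
    """True if one pixel contains (or equals) the other, assuming NESTED numbering."""
    low, high = (a, b) if a.order <= b.order else (b, a)
    return high.pixel >> (2 * (high.order - low.order)) == low.pixel


def inner_align(left: list[Pixel], right: list[Pixel]) -> list[tuple[Pixel, Pixel, Pixel]]:
    """Return (left pixel, right pixel, aligned pixel) for every overlapping pair.

    Where the two orders differ, the aligned pixel is the higher-order (smaller) one.
    """
    mapping = []
    for left_pixel in left:
        for right_pixel in right:
            if overlaps(left_pixel, right_pixel):
                aligned = left_pixel if left_pixel.order >= right_pixel.order else right_pixel
                mapping.append((left_pixel, right_pixel, aligned))
    return mapping


left = [Pixel(0, 11)]
right = [Pixel(1, 0), Pixel(1, 44), Pixel(1, 45)]
for row in inner_align(left, right):
    print(row)
```

In this example the order-0 pixel 11 pairs with the order-1 pixels 44 and 45, and each pair aligns to the order-1 pixel.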
- plot_pixels(**kwargs)
Create a visual map of the pixel density of the catalog.
- Parameters:
- **kwargs
Additional args to pass to hats.inspection.visualize_catalog.plot_healpix_map
- plot_moc(**kwargs)
Create a visual map of the coverage of the catalog.
- Parameters:
- **kwargs
Additional args to pass to hats.inspection.visualize_catalog.plot_moc
- aggregate_column_statistics(exclude_hats_columns: bool = True, exclude_columns: list[str] | None = None, include_columns: list[str] | None = None, only_numeric_columns: bool = False, include_pixels: list[hats.pixel_math.HealpixPixel] | None = None)
Read footer statistics in parquet metadata, and report on global min/max values.
- Parameters:
- exclude_hats_columns: bool
Exclude HATS spatial and partitioning fields from the statistics. Defaults to True.
- exclude_columns: list[str] | None
Additional columns to exclude from the statistics.
- include_columns: list[str] | None
If specified, only return statistics for the column names provided. Defaults to None, and returns all non-hats columns.
- only_numeric_columns: bool
Only include columns that are numeric (integer or floating point) in the statistics. If True, the entire frame should be numeric. (Default value = False)
- include_pixels: list[HealpixPixel] | None
If specified, only use statistics from the pixels indicated. Defaults to None, and uses all pixels. (Default value = None)
- Returns:
- Dataframe
aggregated statistics
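The idea of combining per-partition footer statistics into global values can be sketched as below. The input data is made up for illustration, the function name is hypothetical, and the statistic names (`min_value`, `max_value`, `null_count`, `row_count`) are assumed to match those listed for the per-partition method. The real method reads the parquet metadata and returns a DataFrame, which this sketch does not attempt.

```python
def aggregate_stats(per_partition: list[dict[str, dict[str, float]]]) -> dict[str, dict[str, float]]:
    """Combine per-partition column statistics into one global entry per column."""
    result: dict[str, dict[str, float]] = {}
    for partition in per_partition:
        for column, stats in partition.items():
            if column not in result:
                result[column] = dict(stats)  # copy so the input is not mutated
                continue
            agg = result[column]
            agg["min_value"] = min(agg["min_value"], stats["min_value"])
            agg["max_value"] = max(agg["max_value"], stats["max_value"])
            agg["null_count"] += stats["null_count"]
            agg["row_count"] += stats["row_count"]
    return result


# Hypothetical statistics for two partitions of a catalog with one column.
partitions = [
    {"mag": {"min_value": 14.2, "max_value": 21.0, "null_count": 3, "row_count": 100}},
    {"mag": {"min_value": 12.9, "max_value": 19.5, "null_count": 0, "row_count": 50}},
]
print(aggregate_stats(partitions))
```

Because only footer statistics are combined, no row data needs to be read, which is what makes this kind of summary cheap on large catalogs.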
- per_pixel_statistics(*, exclude_hats_columns: bool = True, exclude_columns: list[str] | None = None, include_columns: list[str] | None = None, only_numeric_columns: bool = False, include_stats: list[str] | None = None, multi_index=False, include_pixels: list[hats.pixel_math.HealpixPixel] | None = None, per_row_group: bool = False)
Read footer statistics in parquet metadata, and report on statistics about each pixel partition.
- per_partition_statistics(*, exclude_hats_columns: bool = True, exclude_columns: list[str] | None = None, include_columns: list[str] | None = None, only_numeric_columns: bool = False, include_stats: list[str] | None = None, multi_index=False, include_pixels: list[hats.pixel_math.HealpixPixel] | None = None, per_row_group: bool = False)
Read footer statistics in parquet metadata, and report on statistics about each pixel partition.
- Parameters:
- exclude_hats_columns: bool
Exclude HATS spatial and partitioning fields from the statistics. Defaults to True.
- exclude_columns: list[str] | None
Additional columns to exclude from the statistics.
- include_columns: list[str] | None
If specified, only return statistics for the column names provided. Defaults to None, and returns all non-hats columns.
- include_stats: list[str] | None
If specified, only return the kinds of values from list (min_value, max_value, null_count, row_count). Defaults to None, and returns all values.
- multi_index: bool
Should the returned frame be created with a multi-index, first on pixel, then on column name? (Default value = False)
- include_pixels: list[HealpixPixel] | None
If specified, only return statistics for the pixels indicated. Defaults to None, and returns all pixels.
- Returns:
- Dataframe
granular statistics
- has_healpix_column()
Does this catalog’s schema contain a healpix spatial index column?
This is True if either:
- there is a value for the hats_col_healpix property, and that string exists as a column name in the pyarrow schema
- there is a _healpix_29 column in the pyarrow schema
- Returns:
- bool
if the dataset has a healpix column in the properties
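The two conditions above amount to a short check, sketched here on plain Python values. The function name and its arguments are hypothetical: `column_names` stands in for the names in the pyarrow schema, and `healpix_column_property` for the value of the `hats_col_healpix` property.

```python
from __future__ import annotations


def schema_has_healpix_column(
    column_names: list[str],
    healpix_column_property: str | None = None,
    default_column: str = "_healpix_29",
) -> bool:
    """True if the schema holds the configured healpix column or the default one."""
    if healpix_column_property and healpix_column_property in column_names:
        return True
    return default_column in column_names


print(schema_has_healpix_column(["ra", "dec", "_healpix_29"]))         # True
print(schema_has_healpix_column(["ra", "dec", "hpx"], "hpx"))          # True
print(schema_has_healpix_column(["ra", "dec"], "hpx"))                 # False
```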
- get_pixel_paths()
Generate paths to all pixel files.
Pixels will be traversed in “breadth-first” healpix order. If any spatial filters have been applied to this catalog, only those pixels that remain will be included.
- Yields:
- UPath
Universal Pathlib pointing to either an npix directory, or to a single pixel partition data file.
- read_pixel_to_pandas(pixel: hats.pixel_math.HealpixPixel, **kwargs) -> nested_pandas.NestedFrame
Read the parquet file(s) for this pixel into a pandas dataframe.
- Parameters:
- pixel: HealpixPixel
Desired data partition, by healpix pixel
- **kwargs
Additional arguments to pass to pandas read_parquet method
- Returns:
- NestedFrame
Pandas DataFrame with the data from the parquet file(s)