hats.pixel_math.partition_stats

Utilities for generating and manipulating object count histograms

Functions

empty_histogram(highest_order)

Use numpy to create a histogram array with the right shape, filled with zeros.

generate_histogram(data, highest_order[, ra_column, ...])

Generate a histogram of counts for objects found in data

generate_alignment(histogram[, highest_order, ...])

Generate alignment from high order pixels to those of equal or lower order

_get_alignment(nested_sums, highest_order, ...)

Method to aggregate pixels up to the threshold.

_get_alignment_dropping_siblings(nested_sums, ...)

Method to aggregate pixels up to the threshold, collapsing away completely empty pixels.

Module Contents

empty_histogram(highest_order)

Use numpy to create a histogram array with the right shape, filled with zeros.

Parameters:

highest_order (int) – the highest healpix order (e.g. 0-10)

Returns:

one-dimensional numpy array of long integers with one entry per pixel of a healpix map at highest_order (12 * 4**highest_order entries), all set to 0.
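The shape follows from HEALPix geometry: order k has nside = 2**k and therefore 12 * 4**k pixels. A minimal sketch, assuming "long integers" means `np.int64`; `empty_histogram_sketch` is a hypothetical name, not the library's implementation:

```python
import numpy as np


def empty_histogram_sketch(highest_order: int) -> np.ndarray:
    """Zero-filled count array with one bin per HEALPix pixel at highest_order."""
    # A HEALPix map at order k has nside = 2**k and 12 * nside**2 = 12 * 4**k pixels.
    npix = 12 * 4**highest_order
    # int64 ("long") leaves headroom for per-pixel counts from very large catalogs.
    return np.zeros(npix, dtype=np.int64)


print(empty_histogram_sketch(2).shape)  # (192,)
```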

generate_histogram(data: pandas.DataFrame, highest_order, ra_column='ra', dec_column='dec')

Generate a histogram of counts for objects found in data

Parameters:
  • data (pd.DataFrame) – tabular object data

  • highest_order (int) – the highest healpix order (e.g. 0-10)

  • ra_column (str) – name of the column in data that holds the celestial coordinate right ascension

  • dec_column (str) – name of the column in data that holds the celestial coordinate declination

Returns:

one-dimensional numpy array of long integers where the value at each index is the number of objects found in the healpix pixel with that index at highest_order.

Raises:

ValueError – if the ra_column or dec_column cannot be found in the input data.
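Once each row's pixel index at highest_order is known, the histogram is a per-index tally. The sketch below assumes the ra/dec to pixel conversion has already been done, for example with healpy's `ang2pix(nside, ra, dec, nest=True, lonlat=True)` in the NESTED scheme. `histogram_from_pixels` and `require_columns` are hypothetical helpers, not part of hats:

```python
import numpy as np


def histogram_from_pixels(pixels, highest_order: int) -> np.ndarray:
    """Count objects per HEALPix pixel, given each object's pixel index at highest_order."""
    npix = 12 * 4**highest_order
    pixels = np.asarray(pixels, dtype=np.int64)
    if pixels.size and (pixels.min() < 0 or pixels.max() >= npix):
        raise ValueError("pixel index out of range for this order")
    # bincount tallies how often each index occurs; minlength pads empty pixels with 0.
    return np.bincount(pixels, minlength=npix).astype(np.int64)


def require_columns(columns, ra_column="ra", dec_column="dec"):
    """Raise ValueError if either coordinate column is missing (pass data.columns)."""
    missing = [c for c in (ra_column, dec_column) if c not in columns]
    if missing:
        raise ValueError(f"missing coordinate column(s): {missing}")


print(histogram_from_pixels([0, 0, 5, 11], highest_order=0))  # [2 0 0 0 0 1 0 0 0 0 0 1]
```

`np.bincount` keeps the counting step vectorized, so the cost is dominated by the coordinate-to-pixel conversion rather than by the tally.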

generate_alignment(histogram, highest_order=10, lowest_order=0, threshold=1000000, drop_empty_siblings=False)

Generate alignment from high order pixels to those of equal or lower order

We may initially find healpix pixels at order 10, but after aggregating up to the pixel threshold, some final pixels may be at order 4 or 7. This method provides a map from pixels at order 10 to their destination pixel. This may be used as an input into later partitioning map-reduce steps.
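Assuming NESTED pixel numbering, finding which coarser pixel contains an order-10 pixel is a bit shift, because each order step splits a pixel into 4 children and so adds two bits to the index. A hypothetical helper, `destination_pixel`, sketching that lookup:

```python
def destination_pixel(pixel: int, from_order: int, to_order: int) -> int:
    """Index of the pixel at to_order that contains `pixel` at from_order (NESTED scheme)."""
    if to_order > from_order:
        raise ValueError("to_order must be less than or equal to from_order")
    # Each order step appends 2 bits to a NESTED index, so dropping
    # 2 bits per step walks back up to the ancestor pixel.
    return pixel >> (2 * (from_order - to_order))


print(destination_pixel(1234, from_order=10, to_order=7))  # 19
```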

Parameters:
  • histogram (np.array) – one-dimensional numpy array of long integers where the value at each index corresponds to the number of objects found at the healpix pixel.

  • highest_order (int) – the highest healpix order (e.g. 5-10)

  • lowest_order (int) – the lowest healpix order (e.g. 1-5). Specifying a lowest order constrains the partitioning to prevent spatially large pixels.

  • threshold (int) – the maximum number of objects allowed in a single pixel

  • drop_empty_siblings (bool) – if 3 of 4 sibling pixels are empty, keep only the non-empty pixel at its higher order instead of merging the siblings into their parent

Returns:

one-dimensional numpy array of integer 3-tuples, where the value at each index corresponds to the destination pixel at order less than or equal to the highest_order.

The tuple contains three integers:

  • order of the destination pixel

  • pixel number at the above order

  • the number of objects in the pixel

Raises:

ValueError – if the histogram is the wrong size, or some initial histogram bins exceed threshold.
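The helpers below take `nested_sums`, presumably the histogram summed up to every coarser order. A sketch of how that list could be built and validated, assuming NESTED ordering and a list indexed by order; `nested_sums_sketch` is hypothetical and its error messages are illustrative:

```python
import numpy as np


def nested_sums_sketch(histogram, highest_order: int, threshold: int = 1_000_000):
    """Return a list whose entry k holds the NESTED counts at order k (0..highest_order)."""
    histogram = np.asarray(histogram, dtype=np.int64)
    # Mirror the two documented ValueError cases.
    if len(histogram) != 12 * 4**highest_order:
        raise ValueError("histogram is not the right size for highest_order")
    if histogram.max() > threshold:
        raise ValueError("an initial histogram bin exceeds the threshold")
    sums = [histogram]
    # In NESTED ordering the children of pixel p are 4p..4p+3 one order up,
    # so summing consecutive groups of four gives the counts one order lower.
    for _ in range(highest_order):
        sums.append(sums[-1].reshape(-1, 4).sum(axis=1))
    return sums[::-1]  # reorder so that list index == order
```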

_get_alignment(nested_sums, highest_order, lowest_order, threshold)

Method to aggregate pixels up to the threshold.

Checks from low order (large areas), drilling down into higher orders (smaller areas) to find the appropriate order for an area of sky.
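A sketch of that top-down search, assuming `nested_sums[k]` holds the NESTED counts at order k. `get_alignment_top_down` is a hypothetical name; it returns a plain Python list rather than a numpy array, and representing areas with no objects as `None` is an assumption:

```python
import numpy as np


def get_alignment_top_down(nested_sums, highest_order, lowest_order, threshold):
    """Map each highest_order pixel to (order, pixel, count) of its coarsest ancestor under threshold."""
    alignment = [None] * len(nested_sums[highest_order])
    for order in range(lowest_order, highest_order + 1):
        # Index bits separating `order` from highest_order in NESTED numbering.
        shift = 2 * (highest_order - order)
        counts = nested_sums[order]
        for pixel in np.flatnonzero(counts):  # empty pixels need no destination
            pixel, count = int(pixel), int(counts[pixel])
            if count > threshold:
                continue  # too many objects: drill down to the children at order + 1
            # The descendants of `pixel` at highest_order form one contiguous index range.
            start, stop = pixel << shift, (pixel + 1) << shift
            if alignment[start] is None:  # skip areas already claimed by a coarser ancestor
                alignment[start:stop] = [(order, pixel, count)] * (stop - start)
    return alignment
```

Because a claimed ancestor fills its entire descendant range, checking a single index is enough to tell whether a finer pixel has already been placed.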

_get_alignment_dropping_siblings(nested_sums, highest_order, lowest_order, threshold)

Method to aggregate pixels up to the threshold, collapsing away completely empty pixels.

Checks from higher order (smaller areas) out to lower order (large areas). In this way, we are able to keep spatially isolated areas in pixels of higher order.

This method can be slower than the above _get_alignment method, and so should only be used when the smaller area pixels are desired.

This uses a form of hierarchical agglomeration (building a tree bottom-up). For each cell at order n, we look at the counts in all 4 subcells at order (n+1). Two easily computed values describe each cell:

  • quad_sum: the total number of counts in this cell

  • quad_max: the largest count within the 4 subcells

We collapse the 4 subcells into their parent cell only when BOTH agglomeration criteria are met:

  • total number in cell is at most the global threshold (quad_sum <= threshold)

  • more than one subcell contains counts (quad_sum != quad_max); if exactly one subcell contains counts, the whole quad_sum comes from that single subcell, so quad_sum == quad_max

Conversely, we will NOT collapse when EITHER is true:

  • total number in cell is greater than the threshold

  • only one subcell contains values
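The two criteria can be checked for every cell of an order at once with vectorized `quad_sum` / `quad_max` arrays. A sketch, assuming `nested_sums[k]` holds the NESTED counts at order k; `get_alignment_bottom_up` is hypothetical, applies only the two criteria listed above, and lets a coarser merge overwrite finer destinations. That overwrite rule is our reading of "bottom-up", not necessarily what hats does:

```python
import numpy as np


def get_alignment_bottom_up(nested_sums, highest_order, lowest_order, threshold):
    """Bottom-up agglomeration that keeps isolated non-empty pixels at high order."""
    finest = nested_sums[highest_order]
    # Start with every non-empty highest_order pixel as its own destination.
    alignment = [None] * len(finest)
    for pixel in np.flatnonzero(finest):
        alignment[int(pixel)] = (highest_order, int(pixel), int(finest[pixel]))

    for order in range(highest_order - 1, lowest_order - 1, -1):
        quads = nested_sums[order + 1].reshape(-1, 4)  # row p = the 4 subcells of cell p
        quad_sum = quads.sum(axis=1)
        quad_max = quads.max(axis=1)
        # Collapse only if under the threshold AND more than one subcell has counts.
        collapse = (quad_sum <= threshold) & (quad_sum != quad_max)
        shift = 2 * (highest_order - order)
        for pixel in np.flatnonzero(collapse):
            pixel = int(pixel)
            start, stop = pixel << shift, (pixel + 1) << shift
            # A coarser merge overwrites destinations chosen at finer orders.
            alignment[start:stop] = [(order, pixel, int(quad_sum[pixel]))] * (stop - start)
    return alignment
```

With a threshold of 100, an order-0 cell whose subcells hold counts 3 and 2 collapses to a single order-0 destination, while a cell whose only non-empty subcell holds 7 objects keeps that subcell at order 1.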