hats.pixel_math.partition_stats#
Utilities for generating and manipulating object count histograms
Functions#
| empty_histogram | Use numpy to create a histogram array with the right shape, filled with zeros. |
| generate_histogram | Generate a histogram of counts for objects found in data |
| generate_alignment | Generate alignment from high order pixels to those of equal or lower order |
| _get_alignment | Method to aggregate pixels up to the threshold. |
| _get_alignment_dropping_siblings | Method to aggregate pixels up to the threshold that collapses completely empty pixels away. |
Module Contents#
- empty_histogram(highest_order)
Use numpy to create a histogram array with the right shape, filled with zeros.
- Parameters:
highest_order (int) – the highest healpix order (e.g. 0-10)
- Returns:
one-dimensional numpy array of long integers, where the length is equal to the number of pixels in a healpix map of target order, and all values are set to 0.
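The length of the returned array follows from the HEALPix pixel count, 12 * 4**order. The sketch below is not the library's implementation: it uses a plain Python list in place of a numpy array, and the helper names `healpix_pixel_count` and `empty_counts` are hypothetical. With numpy, the equivalent would be something like `np.zeros(n, dtype=np.int64)`.

```python
def healpix_pixel_count(order: int) -> int:
    """Number of pixels in a HEALPix map at the given order (12 * 4**order)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return 12 * 4**order


def empty_counts(highest_order: int) -> list:
    """Plain-list stand-in for a zero-filled histogram of the right length."""
    return [0] * healpix_pixel_count(highest_order)


counts = empty_counts(2)
print(len(counts))  # 192
```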
- generate_histogram(data: pandas.DataFrame, highest_order, ra_column='ra', dec_column='dec')
Generate a histogram of counts for objects found in data
- Parameters:
data (pd.DataFrame) – tabular object data
highest_order (int) – the highest healpix order (e.g. 0-10)
ra_column (str) – where in the input to find the celestial coordinate, right ascension
dec_column (str) – where in the input to find the celestial coordinate, declination
- Returns:
one-dimensional numpy array of long integers where the value at each index corresponds to the number of objects found at the healpix pixel.
- Raises:
ValueError – if the ra_column or dec_column cannot be found in the input data.
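To illustrate what the returned histogram holds, here is a minimal sketch of the counting step only. It assumes each object's pixel index at `highest_order` has already been computed from its right ascension and declination by a HEALPix library, which is not shown. The helper `histogram_from_pixels` is hypothetical and uses a plain list rather than a numpy array.

```python
def histogram_from_pixels(pixel_ids, highest_order):
    """Count objects per pixel, given precomputed pixel indices (hypothetical helper)."""
    n_pixels = 12 * 4**highest_order
    counts = [0] * n_pixels
    for pixel in pixel_ids:
        if not 0 <= pixel < n_pixels:
            raise ValueError(f"pixel {pixel} is out of range for order {highest_order}")
        counts[pixel] += 1
    return counts


# Five objects that fall in three distinct order-0 pixels.
counts = histogram_from_pixels([3, 3, 7, 11, 3], highest_order=0)
print(counts[3], counts[7], counts[11])  # 3 1 1
```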
- generate_alignment(histogram, highest_order=10, lowest_order=0, threshold=1000000, drop_empty_siblings=False)
Generate alignment from high order pixels to those of equal or lower order
We may initially find healpix pixels at order 10, but after aggregating up to the pixel threshold, some final pixels are order 4 or 7. This method provides a map from pixels at order 10 to their destination pixel. This may be used as an input into later partitioning map reduce steps.
- Parameters:
histogram (np.array) – one-dimensional numpy array of long integers where the value at each index corresponds to the number of objects found at the healpix pixel.
highest_order (int) – the highest healpix order (e.g. 5-10)
lowest_order (int) – the lowest healpix order (e.g. 1-5). Specifying a lowest order constrains the partitioning to prevent spatially large pixels.
threshold (int) – the maximum number of objects allowed in a single pixel
drop_empty_siblings (bool) – if 3 of 4 sibling pixels are empty, keep only the non-empty pixel
- Returns:
one-dimensional numpy array of integer 3-tuples, where the value at each index corresponds to the destination pixel at order less than or equal to the highest_order.
The tuple contains three integers:
order of the destination pixel
pixel number at the above order
the number of objects in the pixel
- Raises:
ValueError – if the histogram is the wrong size, or some initial histogram bins exceed threshold.
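Mapping a high-order pixel to a lower-order destination relies on how pixels nest. The sketch below assumes the NESTED numbering scheme, in which the four children of pixel p at order n are pixels 4p to 4p+3 at order n+1. The helpers `parent_pixel` and `sums_by_order` are hypothetical illustrations, not part of the module.

```python
def parent_pixel(pixel, from_order, to_order):
    """Index of the ancestor at to_order of a NESTED pixel at from_order."""
    if to_order > from_order:
        raise ValueError("to_order must not exceed from_order")
    # Each step down in order drops two bits of the nested index.
    return pixel >> (2 * (from_order - to_order))


def sums_by_order(histogram, highest_order, lowest_order=0):
    """Object counts at every order, built by summing groups of 4 children."""
    if len(histogram) != 12 * 4**highest_order:
        raise ValueError("histogram is not the right size")
    sums = {highest_order: list(histogram)}
    for order in range(highest_order - 1, lowest_order - 1, -1):
        finer = sums[order + 1]
        sums[order] = [sum(finer[i : i + 4]) for i in range(0, len(finer), 4)]
    return sums


histogram = [0] * 48  # order 1 has 12 * 4 = 48 pixels
histogram[4:8] = [1, 2, 0, 0]
sums = sums_by_order(histogram, highest_order=1)
print(sums[0][1])  # 3
print(parent_pixel(37, from_order=2, to_order=0))  # 2
```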
- _get_alignment(nested_sums, highest_order, lowest_order, threshold)
Method to aggregate pixels up to the threshold.
Checks from low order (large areas), drilling down into higher orders (smaller areas) to find the appropriate order for an area of sky.
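The top-down search described here can be sketched as follows. This is an illustration under stated assumptions, not the module's code: it takes the raw histogram rather than precomputed `nested_sums`, assumes NESTED numbering, and leaves `None` for high-order pixels in completely empty regions. The function name `align_top_down` is hypothetical.

```python
def align_top_down(histogram, highest_order, lowest_order, threshold):
    """Map each highest-order pixel to an (order, pixel, count) destination."""
    alignment = [None] * len(histogram)

    def leaf_range(order, pixel):
        # Highest-order pixels covered by `pixel` at `order` (NESTED scheme).
        shift = 2 * (highest_order - order)
        return range(pixel << shift, (pixel + 1) << shift)

    def visit(order, pixel):
        leaves = leaf_range(order, pixel)
        total = sum(histogram[i] for i in leaves)
        if total == 0:
            return  # assumption: empty areas get no destination
        if total <= threshold:
            for i in leaves:
                alignment[i] = (order, pixel, total)
            return
        if order == highest_order:
            raise ValueError("a bin at the highest order exceeds the threshold")
        for child in range(4 * pixel, 4 * pixel + 4):
            visit(order + 1, child)  # drill down into smaller areas

    for pixel in range(12 * 4**lowest_order):
        visit(lowest_order, pixel)
    return alignment


histogram = [0] * 48
histogram[0:4] = [3, 3, 3, 3]  # 12 objects: too many for one order-0 pixel
histogram[4:8] = [1, 2, 0, 0]  # 3 objects: fits in one order-0 pixel
result = align_top_down(histogram, highest_order=1, lowest_order=0, threshold=10)
print(result[0], result[4])  # (1, 0, 3) (0, 1, 3)
```

Because the search starts from the largest areas, a region is assigned to the lowest order at which its total count fits under the threshold.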
- _get_alignment_dropping_siblings(nested_sums, highest_order, lowest_order, threshold)
Method to aggregate pixels up to the threshold that collapses completely empty pixels away.
Checks from higher order (smaller areas) out to lower order (large areas). In this way, we are able to keep spatially isolated areas in pixels of higher order.
This method can be slower than the above _get_alignment method, and so should only be used when the smaller area pixels are desired.
This uses a form of hierarchical agglomeration (building a tree bottom-up). For each cell at order n, we look at the counts in all 4 subcells at order (n+1). Two easily computed values summarize each cell:
quad_sum: the total number of counts in this cell
quad_max: the largest count within the 4 subcells
Our agglomeration criteria (the conditions under which we collapse) must BOTH be met:
total number in cell is at most the global threshold (quad_sum <= threshold)
more than one subcell contains values (quad_sum != quad_max) (if exactly 1 subcell contains counts, then all of the quad_sum will come from that single quad_max)
Inversely, we will NOT collapse when EITHER is true:
total number in cell is greater than the threshold (quad_sum > threshold)
only one subcell contains values
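The collapse test above can be written as a small predicate. This is a minimal sketch of the stated criteria only, not the full bottom-up pass; the function name `should_collapse` is hypothetical.

```python
def should_collapse(subcell_counts, threshold):
    """Return True when the 4 subcells should merge into their parent cell."""
    quad_sum = sum(subcell_counts)  # total number of counts in the cell
    quad_max = max(subcell_counts)  # largest count within the 4 subcells
    # Both conditions must hold: the cell fits under the threshold, and
    # the counts are spread over more than one subcell.
    return quad_sum <= threshold and quad_sum != quad_max


print(should_collapse([1, 2, 0, 0], threshold=10))  # True
print(should_collapse([0, 5, 0, 0], threshold=10))  # False
print(should_collapse([6, 6, 0, 0], threshold=10))  # False
```

A cell whose four subcells are all empty has quad_sum equal to quad_max (both 0), so this predicate does not collapse it either.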