hpmcm.match module

class hpmcm.match.Match(**kwargs)[source]

Bases: object

Class to do N-way matching

Uses a provided WCS to define a Skymap that covers the full region being matched.

Uses that WCS to assign pixel locations to all sources in the input catalogs.

Iterates over cells and does source clustering in each cell using Footprint detection on a Skymap of source counts per pixel.

Assigns each input source to a cluster.

At that stage the clusters are not the final product, as they can include more than one source from a given catalog.

Loops over clusters and processes each cluster to resolve confusion.

If there is not a unique source per catalog, redo the clustering with half-size pixels to try to split the cluster (down to the minimum pixel scale).
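The cluster-then-split logic described above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses a simple flood fill over occupied pixels in place of Footprint detection, and the names cluster_sources, is_confused, resolve and min_pix_size are hypothetical, not part of hpmcm.

```python
from collections import Counter


def cluster_sources(sources, pix_size):
    """Group sources whose occupied pixels touch (8-connectivity).

    sources: list of (catalog_id, x, y) tuples in local coordinates.
    A flood fill stands in for Footprint detection here (assumption).
    """
    pixels = {}
    for src in sources:
        _, x, y = src
        key = (int(x // pix_size), int(y // pix_size))
        pixels.setdefault(key, []).append(src)

    clusters, seen = [], set()
    for start in pixels:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            px, py = stack.pop()
            members.extend(pixels[(px, py)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (px + dx, py + dy)
                    if nb in pixels and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(members)
    return clusters


def is_confused(cluster):
    """True if any catalog contributes more than one source."""
    counts = Counter(cat for cat, _, _ in cluster)
    return any(n > 1 for n in counts.values())


def resolve(sources, pix_size, min_pix_size):
    """Re-cluster confused clusters with half-size pixels, down to min_pix_size."""
    out = []
    for cluster in cluster_sources(sources, pix_size):
        if is_confused(cluster) and pix_size / 2 >= min_pix_size:
            out.extend(resolve(cluster, pix_size / 2, min_pix_size))
        else:
            out.append(cluster)
    return out


# Two sources from catalog 0 land in adjacent pixels at pix_size=1.0
# and are separated once the pixel size is halved.
sources = [(0, 0.2, 0.2), (1, 0.3, 0.3), (0, 1.8, 0.2)]
result = resolve(sources, pix_size=1.0, min_pix_size=0.25)
```

In the class itself the recursion depth appears to be bounded by max_sub_division rather than by an explicit minimum pixel size; the sketch uses a size floor only to keep the example short.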

Parameters:

kwargs (Any)

pix_size

Pixel size in arcseconds

Type:

float

n_pix_side

Number of pixels in the match region

Type:

int

cell_size

Number of pixels in a Cell

Type:

int

cell_buffer

Number of overlapping pixels in a Cell

Type:

int

cell_max_object

Max number of objects in a cell, used to make unique IDs

Type:

int

max_sub_division

Maximum number of cell sub-divisions

Type:

int

pixel_r2_cut

Distance cut for Object membership, in pixels**2

Type:

float

n_cell

Number of cells in match region

Type:

np.ndarray

full_data

Full input DataFrames

Type:

list[DataFrame]

red_data

Reduced DataFrames with only the columns needed for matching

Type:

list[DataFrame]

cell_dict

Dictionary providing access to cell data

Type:

OrderedDict[int, CellData]
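Because pixel_r2_cut is expressed in pixels**2, a membership test can compare squared distances directly and avoid a square root. The sketch below assumes the cut is applied to the distance from some reference position (for example a cluster centroid) and that the comparison is inclusive; both are assumptions, and within_r2_cut is a hypothetical helper.

```python
def within_r2_cut(x_pix, y_pix, x_ref, y_ref, pixel_r2_cut):
    """Return True if (x_pix, y_pix) lies within the squared-distance cut.

    x_ref, y_ref: reference position in pixels (assumed to be a centroid).
    pixel_r2_cut: cut on the squared distance, in pixels**2.
    """
    dx = x_pix - x_ref
    dy = y_pix - y_ref
    return dx * dx + dy * dy <= pixel_r2_cut
```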

Notes

This expects a list of parquet files with pandas DataFrames. The expected columns depend on which sub-class of Match is being used.

Four output tables are produced:

Key               Class

_cluster_assoc    hpmcm.output_tables.ClusterAssocTable

_cluster_stats    hpmcm.output_tables.ClusterStatsTable

_object_assoc     hpmcm.output_tables.ObjectAssocTable

_object_stats     hpmcm.output_tables.ObjectStatsTable

analysisLoop(x_range=None, y_range=None)[source]

Does matching for all cells.

This stores the results, but does not write or return them.

Return type:

None

Parameters:
  • x_range (Iterable | None) – Range of cells to analyze in X. None -> Entire range.

  • y_range (Iterable | None) – Range of cells to analyze in Y. None -> Entire range.
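The None -> Entire range convention can be sketched as follows. The helper iter_cells and the (n_x, n_y) layout of n_cell are assumptions made for illustration, as is the X-outer, Y-inner iteration order.

```python
def iter_cells(n_cell, x_range=None, y_range=None):
    """Yield (ix, iy) cell indices, defaulting to the full grid.

    n_cell: (n_x, n_y) number of cells along each axis (assumed layout).
    """
    n_x, n_y = n_cell
    xs = range(n_x) if x_range is None else x_range
    # Materialize the Y range so a one-shot iterable can be reused per X.
    ys = list(range(n_y)) if y_range is None else list(y_range)
    for ix in xs:
        for iy in ys:
            yield ix, iy
```

Each yielded pair would then be handed to a per-cell step such as analyzeCell(ix, iy).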

analyzeCell(ix, iy, full_data=False)[source]

Analyze a single cell

Return type:

dict | None

Parameters:
  • ix (int) – Cell index in x-coord

  • iy (int) – Cell index in y-coord

  • full_data (bool)

Returns:

Output of cell analysis

Notes

cell_data : CellData : The analysis data for the Cell

image : np.ndarray : Image of cell source counts map

countsMap : np.ndarray : Numpy array with cell source counts

clusters : FootprintSet : Clusters as detected by finding FootprintSet on source counts map

clusterKey : np.ndarray : Map of cell with pixels filled with index of associated Footprints

Notes

If full_data is False, only cell_data will be returned

extraCols: list[str] = []
extractStats()[source]

Extracts cluster statistics

Return type:

list[DataFrame]

Returns:

getCellIdx(ix, iy)[source]

Get the Index to use for a given cell

Return type:

int

Parameters:
  • ix (int)

  • iy (int)

getCluster(i_k)[source]

Get a particular cluster

Return type:

ClusterData

Parameters:

i_k (tuple[int, int]) – CellId, ClusterId

Returns:

Requested cluster

getIdOffset(ix, iy)[source]

Get the ID offset to use for a given cell

Return type:

int

Parameters:
  • ix (int)

  • iy (int)
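One plausible way a per-cell index and ID offset could produce unique IDs is shown below. The row-major formula and the multiplication by cell_max_object are assumptions based only on the description of cell_max_object ("used to make unique IDs"); the actual hpmcm formulas may differ, and both function names are hypothetical.

```python
def cell_index(ix, iy, n_cell_y):
    """Flatten (ix, iy) into a single integer (assumed row-major scheme)."""
    return ix * n_cell_y + iy


def id_offset(ix, iy, n_cell_y, cell_max_object):
    """Reserve a block of cell_max_object IDs for each cell (assumption)."""
    return cell_index(ix, iy, n_cell_y) * cell_max_object


# IDs from neighbouring cells cannot collide as long as each cell
# holds fewer than cell_max_object objects.
first = id_offset(2, 3, n_cell_y=10, cell_max_object=100000)
second = id_offset(2, 4, n_cell_y=10, cell_max_object=100000)
```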

getObject(i_k)[source]

Get a particular object

Return type:

ObjectData

Parameters:

i_k (tuple[int, int]) – CellId, ObjectId

Returns:

Requested object

inputTableClass

alias of SourceTable

pixToArcsec()[source]

Convert pixel size (in degrees) to arcseconds

Return type:

float
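The conversion itself is a fixed scale factor, since one degree is 3600 arcseconds. A minimal sketch, with deg_to_arcsec as a hypothetical stand-alone helper rather than the method's actual body:

```python
ARCSEC_PER_DEGREE = 3600.0


def deg_to_arcsec(pix_size_deg):
    """Convert a pixel size from degrees to arcseconds."""
    return pix_size_deg * ARCSEC_PER_DEGREE
```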

pixToWorld(x_pix, y_pix)[source]

Convert local coords in pixels to world coordinates (RA, DEC)

Return type:

tuple[ndarray, ndarray]

Parameters:
  • x_pix (ndarray)

  • y_pix (ndarray)
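For intuition, the sketch below deprojects pixel coordinates to (RA, DEC) assuming a tangent-plane (gnomonic) projection with square pixels and RA increasing toward decreasing x. The real method works from the provided WCS and takes arrays; the projection type, the orientation, the scalar inputs, and the names pix_to_world_tan, crpix and crval_deg are all assumptions made here.

```python
import math


def pix_to_world_tan(x_pix, y_pix, crpix, crval_deg, pix_size_deg):
    """Gnomonic deprojection of one pixel position to (RA, DEC) in degrees.

    crpix: (x, y) reference pixel; crval_deg: (RA, DEC) at that pixel.
    """
    ra0 = math.radians(crval_deg[0])
    dec0 = math.radians(crval_deg[1])
    scale = math.radians(pix_size_deg)

    # Standard coordinates on the tangent plane, in radians.
    xi = -(x_pix - crpix[0]) * scale  # RA grows toward smaller x (assumed)
    eta = (y_pix - crpix[1]) * scale

    denom = math.cos(dec0) - eta * math.sin(dec0)
    ra = ra0 + math.atan2(xi, denom)
    dec = math.atan2(math.sin(dec0) + eta * math.cos(dec0),
                     math.hypot(xi, denom))
    return math.degrees(ra) % 360.0, math.degrees(dec)


# The reference pixel maps back to the reference coordinate.
ra_ref, dec_ref = pix_to_world_tan(50, 50, (50, 50), (150.0, 2.0), 1.0 / 3600.0)
```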

reduceData(input_files, catalog_id)[source]

Read input files and filter out only the columns we need

Each input file should have an associated catalog_id. This is used to test whether we have more than one source per input catalog.

If the input files have a pre-defined ID associated with them, that can be used. Otherwise it is fine just to give a range from 0 to nInputs.

Return type:

None

Parameters:
  • input_files (list[str])

  • catalog_id (list[int])
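When the inputs carry no pre-defined IDs, the catalog_id list can simply be built from the number of files. The file names below are hypothetical, and the commented call assumes an already-constructed Match instance named matcher.

```python
# Hypothetical input files; any list of parquet paths would do.
input_files = ["catalog_a.parquet", "catalog_b.parquet", "catalog_c.parquet"]

# One ID per input file, running from 0 to nInputs - 1.
catalog_id = list(range(len(input_files)))

# IDs must be unique so that "more than one source per catalog" is testable.
assert len(set(catalog_id)) == len(input_files)

# matcher.reduceData(input_files, catalog_id)  # matcher: a Match instance (assumed)
```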

reduceDataFrame(df)[source]

Reduce a single input DataFrame

Return type:

DataFrame

Parameters:

df (DataFrame) – Input data frame

Returns:

Reduced DataFrame