hpmcm.match module

class hpmcm.match.Match(**kwargs)[source]

Bases: object

Class to do N-way matching

Uses a provided WCS to define a Skymap that covers the full region being matched.

Uses that WCS to assign pixel locations to all sources in the input catalogs

Iterates over cells and does source clustering in each cell using Footprint detection on a Skymap of source counts per pixel.

Assigns each input source to a cluster.

At that stage the clusters are not the final product, as they can include more than one source from a given catalog.

Loops over clusters and processes each cluster to resolve confusion.

If there is not a unique source per catalog, redo the clustering with half-size pixels to try to split the cluster (down to the minimum pixel scale).
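The resolution step described above can be pictured with a minimal sketch. It is not the hpmcm implementation: the helper names `has_unique_source_per_catalog` and `pixel_scales` are hypothetical, and the sketch assumes that each source in a cluster carries the integer ID of its catalog and that `max_sub_division` counts the number of halvings allowed.

```python
from collections import Counter


def has_unique_source_per_catalog(catalog_ids):
    """Return True if no catalog contributes more than one source.

    catalog_ids holds one catalog ID per source in the cluster
    (hypothetical representation, chosen for illustration).
    """
    counts = Counter(catalog_ids)
    return all(n == 1 for n in counts.values())


def pixel_scales(start_pix_size, max_sub_division):
    """List the starting pixel size followed by successive halvings.

    Assumes max_sub_division is the number of halvings permitted.
    """
    return [start_pix_size / 2**i for i in range(max_sub_division + 1)]


# A cluster with two sources from catalog 1 is "confused" and would be
# a candidate for re-clustering at the next, finer pixel scale.
confused = not has_unique_source_per_catalog([0, 1, 1, 2])
scales = pixel_scales(1.0, 2)
```

Here `confused` flags the cluster for another pass, and `scales` lists the pixel sizes that such a pass could step through before giving up.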

Parameters:: kwargs (Any)

pix_size

Pixel size in arcseconds

Type:: float

n_pix_side

Number of pixels in the match region

Type:: int

cell_size

Number of pixels in a Cell

Type:: int

cell_buffer

Number of overlapping pixels in a Cell

Type:: int

cell_max_object

Max number of objects in a cell, used to make unique IDs

Type:: int

max_sub_division

Maximum number of cell sub-divisions

Type:: int

pixel_r2_cut

Distance cut for Object membership, in pixels**2

Type:: float

n_cell

Number of cells in match region

Type:: np.ndarray

full_data

Full input DataFrames

Type:: list[DataFrame]

red_data

Reduced DataFrames with only the columns needed for matching

Type:: list[DataFrame]

cell_dict

Dictionary providing access to cell data

Type:: OrderedDict[int, CellData]
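Because `pixel_r2_cut` is expressed in pixels**2, a membership test can compare squared distances directly and avoid a square root. The following is a minimal sketch under assumptions that are not taken from hpmcm: the function name is hypothetical, the cut is assumed to apply to the offset between a source and a reference position (such as a centroid), and the comparison is assumed to be inclusive.

```python
def passes_r2_cut(x_pix, y_pix, x_ref, y_ref, pixel_r2_cut):
    """Return True if (x_pix, y_pix) lies within the squared-distance cut.

    All positions are in pixels; pixel_r2_cut is in pixels**2.
    The reference point and the inclusive "<=" are assumptions.
    """
    dx = x_pix - x_ref
    dy = y_pix - y_ref
    return dx * dx + dy * dy <= pixel_r2_cut
```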

Notes

This expects a list of Parquet files containing pandas DataFrames. The expected columns depend on which sub-class of Match is being used.

Four output tables are produced:

Key	Class
_cluster_assoc	`hpmcm.output_tables.ClusterAssocTable`
_cluster_stats	`hpmcm.output_tables.ClusterStatsTable`
_object_assoc	`hpmcm.output_tables.ObjectAssocTable`
_object_stats	`hpmcm.output_tables.ObjectStatsTable`

analysisLoop(x_range=None, y_range=None)[source]

Does matching for all cells.

This stores the results, but does not write or return them.

Return type:

None

Parameters:

x_range (Iterable | None) – Range of cells to analyze in X. None -> Entire range.
y_range (Iterable | None) – Range of cells to analyze in Y. None -> Entire range.

analyzeCell(ix, iy, full_data=False)[source]

Analyze a single cell

Return type:

dict | None

Parameters:

ix (int) – Cell index in x-coord
iy (int) – Cell index in y-coord
full_data (bool)

Returns:

Output of cell analysis

Notes

cell_data : CellData : The analysis data for the Cell

image : np.ndarray : Image of cell source counts map

countsMap : np.ndarray : Numpy array with cell source counts

clusters : FootprintSet : Clusters as detected by finding FootprintSet on source counts map

clusterKey : np.ndarray : Map of cell with pixels filled with index of associated Footprints

Notes

If full_data is False, only cell_data will be returned
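To make the idea of a source counts map concrete, here is a minimal, standard-library sketch of counting sources per pixel. It is illustrative only: the function name `counts_map_sketch` is hypothetical, the grid is assumed to be square, and sources that fall outside the grid are assumed to be dropped. The actual `countsMap` is a NumPy array built by the library.

```python
import math


def counts_map_sketch(x_pix, y_pix, n_pix):
    """Count sources per pixel on an n_pix by n_pix grid.

    x_pix and y_pix are source positions in pixel coordinates.
    The result is indexed as counts[iy][ix] (an assumed convention).
    """
    counts = [[0] * n_pix for _ in range(n_pix)]
    for x, y in zip(x_pix, y_pix):
        ix, iy = math.floor(x), math.floor(y)
        # Ignore sources that land outside the grid.
        if 0 <= ix < n_pix and 0 <= iy < n_pix:
            counts[iy][ix] += 1
    return counts


# Two sources share pixel (0, 0), one sits in pixel (ix=2, iy=1),
# and one falls off the grid.
demo_counts = counts_map_sketch([0.5, 0.7, 2.1, -0.5], [0.5, 0.2, 1.9, 1.0], 3)
```

Pixels holding more than one source, such as `demo_counts[0][0]` here, are the ones that clustering on the counts map would group together.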

extraCols: list[str] = []

extractStats()[source]

Extracts cluster statistics

Return type:

list[DataFrame]

Returns:

DataFrames with matching info:
hpmcm.output_tables.ClusterAssocTable,
hpmcm.output_tables.ObjectAssocTable,
hpmcm.output_tables.ClusterStatsTable,
hpmcm.output_tables.ObjectStatsTable.

getCellIdx(ix, iy)[source]

Get the Index to use for a given cell

Return type:

int

Parameters:

ix (int)
iy (int)

getCluster(i_k)[source]

Get a particular cluster

Return type:: ClusterData
Parameters:: i_k (tuple[int, int]) – CellId, ClusterId
Returns:: Requested cluster

getIdOffset(ix, iy)[source]

Get the ID offset to use for a given cell

Return type:

int

Parameters:

ix (int)
iy (int)
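The documentation does not state how the cell index or the ID offset is computed. One common scheme, shown here purely as an assumption, flattens the 2D cell index into a single integer and multiplies it by `cell_max_object`, so that each cell owns a disjoint block of IDs. The function names and the `n_cell_y` argument are hypothetical and are not part of hpmcm.

```python
def cell_index_sketch(ix, iy, n_cell_y):
    """Flatten a 2D cell index, with ix as the slow index (assumed layout)."""
    return ix * n_cell_y + iy


def id_offset_sketch(ix, iy, n_cell_y, cell_max_object):
    """Return the first ID of the block reserved for cell (ix, iy).

    Assumes each cell reserves cell_max_object consecutive IDs, so IDs
    stay unique as long as no cell holds more objects than that.
    """
    return cell_index_sketch(ix, iy, n_cell_y) * cell_max_object
```

Under this scheme an object with local index `k` in a cell would receive the global ID `id_offset_sketch(...) + k`, which is why a per-cell maximum object count is needed.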

getObject(i_k)[source]

Get a particular object

Return type:: ObjectData
Parameters:: i_k (tuple[int, int]) – CellId, ObjectId
Returns:: Requested object

inputTableClass: alias of SourceTable

pixToArcsec()[source]

Convert pixel size (in degrees) to arcseconds

Return type:: float
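The unit conversion itself is the standard factor of 3600 arcseconds per degree. A minimal standalone sketch follows; the function name is hypothetical, and it takes the pixel size as an argument rather than reading it from the WCS as the method presumably does.

```python
ARCSEC_PER_DEGREE = 3600.0


def degrees_to_arcsec(pix_size_deg):
    """Convert a pixel size from degrees to arcseconds."""
    return pix_size_deg * ARCSEC_PER_DEGREE
```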

pixToWorld(x_pix, y_pix)[source]

Convert local coords in pixels to world coordinates (RA, DEC)

Return type:

tuple[ndarray, ndarray]

Parameters:

x_pix (ndarray)
y_pix (ndarray)

reduceData(input_files, catalog_id)[source]

Read input files and keep only the columns we need

Each input file should have an associated catalog_id. This is used to test if we have more than one source per input catalog.

If the input files have a pre-defined ID associated with them, that can be used. Otherwise it is fine just to give a range from 0 to nInputs.

Return type:

None

Parameters:

input_files (list[str])
catalog_id (list[int])
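When the input files carry no pre-defined IDs, the two argument lists can be built as follows. This is a sketch: the file names are hypothetical placeholders, and the final call is left as a comment because constructing a Match sub-class requires keyword arguments not covered here.

```python
# Hypothetical input files, one per catalog.
input_files = ["catalog_a.parquet", "catalog_b.parquet", "catalog_c.parquet"]

# One integer ID per file, running from 0 to nInputs - 1.
catalog_id = list(range(len(input_files)))

# Each file needs its own ID so that duplicate sources from the same
# catalog can be recognized later.
assert len(set(catalog_id)) == len(input_files)

# With an already-constructed matcher (an instance of a Match sub-class):
# matcher.reduceData(input_files, catalog_id)
```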

reduceDataFrame(df)[source]

Reduce a single input DataFrame

Return type:: DataFrame
Parameters:: df (DataFrame) – Input data frame
Returns:: Reduced DataFrame