Run cell-based matching
This takes as set of input metadetect catalogs and runs matching using the cell-based ShearMatch matcher.
Standard imports
[1]:
import hpmcm
import glob
import os
import numpy as np
import matplotlib.pyplot as plt
Set up the configuration
[2]:
DATADIR = "test_data" # Input data directory
shear_st = "0p01" # Applied shear as a string
shear = 0.01 # Decimal version of applied shear
shear_type = "wmom" # which object characterization to use
tract = 10463 # which tract to study
SOURCE_TABLEFILES = sorted(glob.glob(os.path.join(DATADIR, f"shear_{shear_type}_{shear_st}_uncleaned_{tract}_*.pq")))
SOURCE_TABLEFILES.reverse()
VISIT_IDS = np.arange(len(SOURCE_TABLEFILES))
PIXEL_R2CUT = 4. # Cut at distance**2 = 4 pixels
PIXEL_MATCH_SCALE = 1 # Use pixel scale to do matching
Make the matcher, reduce the data
[3]:
matcher = hpmcm.ShearMatch.createShearMatch(pixelR2Cut=PIXEL_R2CUT, pixelMatchScale=PIXEL_MATCH_SCALE, deshear=-1*shear)
[4]:
matcher.reduceData(SOURCE_TABLEFILES, VISIT_IDS)
This should have made 200 x 200 cells
[5]:
matcher.n_cell
[5]:
array([200., 200.])
Run the data
Note the option to run all the cells. By default we only run a small subset for testing
[6]:
do_partial = True
if do_partial:
x_range = range(50, 70)
y_range = range(170, 190)
#xRange = [55]
#yRange = [170]
matcher.analysisLoop(x_range, y_range)
else:
matcher.analysisLoop()
50!
......... 60!
......... Done!
Show the source counts map for a single cell
The x and y axes here are the in the cell frame. The color scale shows the number of sources per/pixel. The analysis looks for clusters of adjacent pixels with counts.
[7]:
cell = matcher.cell_dict[matcher.getCellIdx(50, 170)]
od = cell.analyze(None, 4)
_ = plt.imshow(od['counts_map'], origin='lower')
_ = plt.colorbar(label="n sources / pixel")
_ = plt.xlabel(r"$x_{\rm cell}$ [pixels]")
_ = plt.ylabel(r"$y_{\rm cell}$ [pixels]")
Show a single cluster
The x and y axes here are the in the cluster frame for a single cluster. The color scale shows the number of sources per/pixel.
The x markers are the original source postions. The o makters are the deshear positions.
[8]:
cluster = list(cell.cluster_dict.values())[0]
fig = hpmcm.viz_utils.showCluster(od['image'], cluster, cell)
_ = fig.axes[0].set_xlim(-1, 1)
_ = fig.axes[0].set_ylim(0, 2)
_ = fig.axes[0].set_xlabel(r"$x_{\rm cluster}$ [pixels]")
_ = fig.axes[0].set_ylabel(r"$y_{\rm cluster}$ [pixels]")
Extract the output of the matching
There are a few empty cells to play around with the output data.
stats and shear_stats are both tuples of pandas.DataFrame
[9]:
stats = matcher.extractStats()
shear_stats = matcher.extractShearStats()
obj_shear = shear_stats[1]
[10]:
stats[0]
[10]:
| cluster_id | source_id | source_idx | catalog_id | distance | cell_idx | |
|---|---|---|---|---|---|---|
| 0 | 10170004 | 165811 | 0 | 0 | 0.008754 | 10170 |
| 1 | 10170004 | 165615 | 0 | 1 | 0.052315 | 10170 |
| 2 | 10170004 | 165601 | 0 | 2 | 0.051875 | 10170 |
| 3 | 10170004 | 165706 | 1 | 3 | 0.044073 | 10170 |
| 4 | 10170004 | 165350 | 0 | 4 | 0.038651 | 10170 |
| ... | ... | ... | ... | ... | ... | ... |
| 33 | 13989002 | 203175 | 7 | 0 | 0.002944 | 13989 |
| 34 | 13989002 | 202967 | 7 | 1 | 0.031898 | 13989 |
| 35 | 13989002 | 202882 | 6 | 2 | 0.029662 | 13989 |
| 36 | 13989002 | 203019 | 7 | 3 | 0.018704 | 13989 |
| 37 | 13989002 | 202591 | 6 | 4 | 0.017916 | 13989 |
12932 rows × 6 columns
Get the offsets between the cluster centroid and the sources
This is to check that the deshearing is correctly applied
[11]:
def get_offsets(matcher):
n = 0
dd = {
0:dict(dx=[], dy=[], x=[], y=[]),
1:dict(dx=[], dy=[], x=[], y=[]),
2:dict(dx=[], dy=[], x=[], y=[]),
3:dict(dx=[], dy=[], x=[], y=[]),
4:dict(dx=[], dy=[], x=[], y=[]),
}
for cellData in matcher.cell_dict.values():
n += len(cellData.data[0])
for obj in cellData.object_dict.values():
if not obj.n_unique == 5 and obj.n_src == 5:
continue
for iCat in range(5):
mask = obj.catalog_id == iCat
if mask.sum() == 0:
continue
for dx, dy in zip((obj.x_pix[mask] - obj.x_cent), (obj.y_pix[mask] - obj.y_cent)):
dd[iCat]["dx"].append(dx)
dd[iCat]["dy"].append(dy)
dd[iCat]["x"].append(float(obj.data[mask].iloc[0].x_cell))
dd[iCat]["y"].append(float(obj.data[mask].iloc[0].y_cell))
for i in range(5):
dd[i]['dx'] = np.array(dd[i]['dx'])
dd[i]['dy'] = np.array(dd[i]['dy'])
dd[i]['x'] = np.array(dd[i]['x'])
dd[i]['y'] = np.array(dd[i]['y'])
print(n)
return dd
[12]:
dd = get_offsets(matcher)
2598
Plots the residuals, they should be flat
[13]:
_ = plt.scatter(dd[4]['x'], dd[4]['dx'])
Look at how the sources lie within the cells
[14]:
_ = plt.hist(matcher.full_data[0].x_cell_coadd, bins=np.linspace(-100, 100, 201))
[15]:
_ = plt.hist(matcher.full_data[0].y_cell_coadd, bins=np.linspace(-100, 100, 201))
Classify the objects by match type
This looks at the characteristics of the matched objects and categorizes them.
[16]:
obj_lists = hpmcm.classify.classifyObjects(matcher, snr_cut=10.)
hpmcm.classify.printObjectTypes(obj_lists)
All Objects: 2767
cut 1 374
cut 2 38
Used: 2355
good (n source from n catalogs): 1031
good faint 1133
faint (< n sources, snr < cut): 181
mixed (n source from < n catalogs): 0
edge_mixed (mixed near edge of cell): 0
edge_missing (< n sources, near edge of cell): 0
edge_extra (> n sources, near edge of cell): 0
faint (< n sources, snr < cut): 181
orphan (split off from larger cluster 7
one missing (n-1 sources, not near edge): 0
two missing (n-2 sources, not near edge): 0
many missing (< n-2 sources, not near edge): 3
extra (> n sources, not near edge): 0
Measure the matching efficiency for objects above the SNRCut
[17]:
n_good = len(obj_lists['ideal'])
bad_list = ['edge_mixed', 'edge_missing', 'edge_extra', 'orphan', 'missing', 'two_missing', 'many_missing', 'extra', 'caught']
n_bad = np.sum([len(obj_lists[x]) for x in bad_list])
effic = n_good/(n_good+n_bad)
effic_err = np.sqrt(effic*(1-effic)/(n_good+n_bad))
print(f"Effic: {effic:.5} +- {effic_err:.5f}")
Effic: 0.99039 +- 0.00302
Classify the clusters by match type
This looks at the characteristics of the matched cluster and categorizes them.
[18]:
cluster_lists = hpmcm.classify.classifyClusters(matcher, snr_cut=10.)
hpmcm.classify.printClusterTypes(cluster_lists)
All Clusters: 2745
cut 1 372
cut 2 38
Used: 2335
good (n source from n catalogs): 1031
good faint 1135
faint (< n sources, snr < cut): 165
mixed (n source from < n catalogs): 0
edge_mixed (mixed near edge of cell): 0
edge_missing (< n sources, near edge of cell): 0
edge_extra (> n sources, near edge of cell): 0
faint (< n sources, snr < cut): 165
one missing (n-1 sources, not near edge): 0
two missing (n-2 sources, not near edge): 0
many missing (< n-2 sources, not near edge): 3
extra (> n sources, not near edge): 1
Display a few objects
The various markers show the sources from different shear catalogs: ns=., 1m = <, 1p = >, 2m = ^, 2p = v.
[19]:
_ = hpmcm.viz_utils.showShearObjs(matcher, cluster_lists['ideal'][5])
[20]:
_ = hpmcm.viz_utils.showShearObj(matcher, obj_lists['many_missing'][0])
[ ]:
[ ]:
[ ]: