{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "efcfc806-08c7-4ee5-894c-92b6235ebe91",
   "metadata": {},
   "source": [
    "## Run cell-based matching\n",
    "\n",
    "This notebook takes a set of input metadetect catalogs and matches them using the cell-based `ShearMatch` matcher."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b9fb907-ce60-43d5-a06a-8d7d75291ec2",
   "metadata": {},
   "source": [
    "#### Standard imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3fe89f9-cf35-486c-b15b-0bd9566a64d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hpmcm\n",
    "import glob\n",
    "import os\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed7edf2e-e25e-4b28-b5ea-8dd0e806566c",
   "metadata": {},
   "source": [
    "#### Set up the configuration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0ceddec-40d2-42d4-98a3-88b88dcdbd81",
   "metadata": {},
   "outputs": [],
   "source": [
    "DATADIR = \"test_data\"   # Input data directory\n",
    "shear_st = \"0p01\"       # Applied shear as a string\n",
    "shear = 0.01            # Decimal version of applied shear\n",
    "shear_type = \"wmom\"     # Which object characterization to use\n",
    "tract = 10463           # Which tract to study\n",
    "\n",
    "SOURCE_TABLEFILES = sorted(glob.glob(os.path.join(DATADIR, f\"shear_{shear_type}_{shear_st}_uncleaned_{tract}_*.pq\")))\n",
    "SOURCE_TABLEFILES.reverse()\n",
    "VISIT_IDS = np.arange(len(SOURCE_TABLEFILES))\n",
    "\n",
    "PIXEL_R2CUT = 4.         # Cut at distance**2 = 4 pixels**2 (distance = 2 pixels)\n",
    "PIXEL_MATCH_SCALE = 1    # Use pixel scale to do matching"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcecff53-e644-4e0a-b568-3061cd81cecd",
   "metadata": {},
   "source": [
    "#### Make the matcher, reduce the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5662f9fc-40b9-4c75-b1a6-a66dec54ff43",
   "metadata": {},
   "outputs": [],
   "source": [
    "matcher = hpmcm.ShearMatch.createShearMatch(pixelR2Cut=PIXEL_R2CUT, pixelMatchScale=PIXEL_MATCH_SCALE, deshear=-1*shear)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f2b49c5-36cd-47a6-9480-c207fadfe552",
   "metadata": {},
   "outputs": [],
   "source": [
    "matcher.reduceData(SOURCE_TABLEFILES, VISIT_IDS)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22d7092b-b4a2-4f43-bd46-5c8c04c708e3",
   "metadata": {},
   "source": [
    "#### This should have made 200 x 200 cells"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4508b9e-ee8e-4d44-94c5-21dd594b49a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "matcher.n_cell"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3eb110bf-913a-4292-9c90-d5a8191aa01b",
   "metadata": {},
   "source": [
    "#### Run the data\n",
    "\n",
    "Note the option to run all the cells. By default we run only a small subset for testing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70f180a5-a400-49c7-97c3-f5bac3204da5",
   "metadata": {},
   "outputs": [],
   "source": [
    "do_partial = True\n",
    "if do_partial:\n",
    "    x_range = range(50, 70)\n",
    "    y_range = range(170, 190)\n",
    "    # To run a single cell instead, use:\n",
    "    # x_range = [55]\n",
    "    # y_range = [170]\n",
    "    matcher.analysisLoop(x_range, y_range)\n",
    "else:\n",
    "    matcher.analysisLoop()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46e95402-fbda-423e-83ec-c7d2adb60100",
   "metadata": {},
   "source": [
    "#### Show the source counts map for a single cell\n",
    "\n",
    "The x and y axes here are in the cell frame.\n",
    "The color scale shows the number of sources per pixel.\n",
    "The analysis looks for clusters of adjacent pixels with nonzero counts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "791146c8-ec2d-493e-b18a-3907565138d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "cell = matcher.cell_dict[matcher.getCellIdx(50, 170)]\n",
    "od = cell.analyze(None, 4)\n",
    "_ = plt.imshow(od['counts_map'], origin='lower')\n",
    "_ = plt.colorbar(label=\"n sources / pixel\")\n",
    "_ = plt.xlabel(r\"$x_{\\rm cell}$ [pixels]\")\n",
    "_ = plt.ylabel(r\"$y_{\\rm cell}$ [pixels]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8dab7b77-96f0-4fe2-96fb-0284ce8f0b07",
   "metadata": {},
   "source": [
    "#### Show a single cluster\n",
    "\n",
    "The x and y axes here are in the cluster frame for a single cluster.\n",
    "The color scale shows the number of sources per pixel.\n",
    "\n",
    "The `x` markers are the original source positions. The `o` markers are the desheared positions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7bb87ecf-0aa7-4006-8e8b-8eaafcf3a8db",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster = list(cell.cluster_dict.values())[0]\n",
    "fig = hpmcm.viz_utils.showCluster(od['image'], cluster, cell)\n",
    "_ = fig.axes[0].set_xlim(-1, 1)\n",
    "_ = fig.axes[0].set_ylim(0, 2)\n",
    "_ = fig.axes[0].set_xlabel(r\"$x_{\\rm cluster}$ [pixels]\")\n",
    "_ = fig.axes[0].set_ylabel(r\"$y_{\\rm cluster}$ [pixels]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66548e0b-89a1-4b54-9ade-4bf7401e2079",
   "metadata": {},
   "source": [
    "#### Extract the output of the matching\n",
    "\n",
    "There are a few empty cells at the end of the notebook for exploring the output data.\n",
    "\n",
    "`stats` and `shear_stats` are both tuples of `pandas.DataFrame` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22dbca58-4278-43c0-b3bf-751f47a093ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "stats = matcher.extractStats()\n",
    "shear_stats = matcher.extractShearStats()\n",
    "obj_shear = shear_stats[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ddabe9fb-427f-4899-b5de-816abacbea45",
   "metadata": {},
   "outputs": [],
   "source": [
    "stats[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf3ec9b9-f3b5-4aed-acb5-a2465510d32e",
   "metadata": {},
   "source": [
    "#### Get the offsets between the cluster centroid and the sources\n",
    "\n",
    "This is to check that the deshearing is correctly applied"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b9466f5-a16f-401d-b91f-2096ba841655",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_offsets(matcher):\n",
    "    \"\"\"Collect, per catalog, the offsets of sources from their object centroid.\"\"\"\n",
    "    n = 0\n",
    "    # One set of lists per shear catalog (indices 0 through 4)\n",
    "    dd = {i: dict(dx=[], dy=[], x=[], y=[]) for i in range(5)}\n",
    "    for cellData in matcher.cell_dict.values():\n",
    "        n += len(cellData.data[0])\n",
    "        for obj in cellData.object_dict.values():\n",
    "            # Keep only objects with 5 sources from 5 unique catalogs\n",
    "            if not (obj.n_unique == 5 and obj.n_src == 5):\n",
    "                continue\n",
    "            for iCat in range(5):\n",
    "                mask = obj.catalog_id == iCat\n",
    "                if mask.sum() == 0:\n",
    "                    continue\n",
    "                for dx, dy in zip((obj.x_pix[mask] - obj.x_cent), (obj.y_pix[mask] - obj.y_cent)):\n",
    "                    dd[iCat][\"dx\"].append(dx)\n",
    "                    dd[iCat][\"dy\"].append(dy)\n",
    "                    dd[iCat][\"x\"].append(float(obj.data[mask].iloc[0].x_cell))\n",
    "                    dd[iCat][\"y\"].append(float(obj.data[mask].iloc[0].y_cell))\n",
    "\n",
    "    # Convert the accumulated lists to numpy arrays\n",
    "    for i in range(5):\n",
    "        dd[i]['dx'] = np.array(dd[i]['dx'])\n",
    "        dd[i]['dy'] = np.array(dd[i]['dy'])\n",
    "        dd[i]['x'] = np.array(dd[i]['x'])\n",
    "        dd[i]['y'] = np.array(dd[i]['y'])\n",
    "    print(n)\n",
    "    return dd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0fe094c7-efeb-49d0-b08d-0b1823bcedad",
   "metadata": {},
   "outputs": [],
   "source": [
    "dd = get_offsets(matcher)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4c85519-e236-49b9-b068-ebf8aa1571b9",
   "metadata": {},
   "source": [
    "#### Plot the residuals; they should be flat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dadfaaa4-11ac-4418-b802-5cd0c5320c0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "_ = plt.scatter(dd[4]['x'], dd[4]['dx'])"
   ]
  },
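  {
   "cell_type": "markdown",
   "id": "3f0c9a1e-5b7d-4c2a-9e61-0a4d2b8c7f15",
   "metadata": {},
   "source": [
    "#### Quantify the flatness (illustrative sketch)\n",
    "\n",
    "One way to put a number on \"flat\" is to fit a straight line to the residuals and check that the slope is consistent with zero.\n",
    "The helper `residual_slope` below is hypothetical (it is not part of `hpmcm`) and is exercised here on synthetic data.\n",
    "With the notebook data, one could pass `dd[4]['x']` and `dd[4]['dx']` instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b82d6e47-1c3f-4a95-8d20-6e7f9a0c5b31",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def residual_slope(x, dx):\n",
    "    \"\"\"Fit dx = slope * x + intercept and return (slope, slope_err).\"\"\"\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    dx = np.asarray(dx, dtype=float)\n",
    "    # cov=True also returns the covariance matrix of the fitted coefficients\n",
    "    coeffs, cov = np.polyfit(x, dx, 1, cov=True)\n",
    "    return coeffs[0], np.sqrt(cov[0, 0])\n",
    "\n",
    "\n",
    "# Synthetic stand-in for the notebook data: scatter with no trend in x\n",
    "rng = np.random.default_rng(0)\n",
    "x_demo = rng.uniform(-75., 75., 1000)\n",
    "dx_demo = rng.normal(0., 0.05, 1000)\n",
    "slope, slope_err = residual_slope(x_demo, dx_demo)\n",
    "print(f\"slope = {slope:.2e} +- {slope_err:.2e}\")"
   ]
  },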
  {
   "cell_type": "markdown",
   "id": "552fbfde-e913-48ba-b81e-160e22571e9b",
   "metadata": {},
   "source": [
    "#### Look at how the sources lie within the cells"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3b637c4-290d-4ef4-a592-68ad8146a4d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "_ = plt.hist(matcher.full_data[0].x_cell_coadd, bins=np.linspace(-100, 100, 201))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53ecf906-9d0b-403a-8c8e-240b3e3ff1d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "_ = plt.hist(matcher.full_data[0].y_cell_coadd, bins=np.linspace(-100, 100, 201))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92c67824-dc58-42fe-b787-01048fa4a7d0",
   "metadata": {},
   "source": [
    "#### Classify the objects by match type\n",
    "\n",
    "This looks at the characteristics of the matched objects and categorizes them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79c084b7-923e-41e3-813a-411a53ecdcad",
   "metadata": {},
   "outputs": [],
   "source": [
    "obj_lists = hpmcm.classify.classifyObjects(matcher, snr_cut=10.)\n",
    "hpmcm.classify.printObjectTypes(obj_lists)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fa170d6-b454-4e1e-929b-daf63b773daf",
   "metadata": {},
   "source": [
    "#### Measure the matching efficiency for objects above the SNR cut"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24bdd872-fad4-467a-950e-68e6a0998175",
   "metadata": {},
   "outputs": [],
   "source": [
    "n_good = len(obj_lists['ideal'])\n",
    "bad_list = ['edge_mixed', 'edge_missing', 'edge_extra', 'orphan', 'missing', 'two_missing', 'many_missing', 'extra', 'caught']\n",
    "n_bad = np.sum([len(obj_lists[x]) for x in bad_list])\n",
    "effic = n_good/(n_good+n_bad)\n",
    "effic_err = np.sqrt(effic*(1-effic)/(n_good+n_bad))  # Binomial error on the efficiency\n",
    "print(f\"Effic: {effic:.5f} +- {effic_err:.5f}\")"
   ]
  },
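  {
   "cell_type": "markdown",
   "id": "d4a91c0b-7e52-4f38-a6c3-9b1e0f2d8a64",
   "metadata": {},
   "source": [
    "#### Efficiency uncertainty as a helper (illustrative sketch)\n",
    "\n",
    "The uncertainty above is the normal-approximation binomial error, $\\sqrt{\\epsilon (1 - \\epsilon) / N}$.\n",
    "The hypothetical helper `binomial_efficiency` below (not part of `hpmcm`) wraps the same formula and guards against an empty sample.\n",
    "The counts in the example are made up for illustration.\n",
    "Note that this estimate collapses to zero when the efficiency is exactly 0 or 1, so it is least informative in those cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6f03b2a-94d1-4c7e-b5a8-2c9d1e0f3a57",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def binomial_efficiency(n_pass, n_fail):\n",
    "    \"\"\"Return (efficiency, normal-approximation binomial error).\"\"\"\n",
    "    n_tot = n_pass + n_fail\n",
    "    if n_tot == 0:\n",
    "        # No objects at all: the efficiency is undefined\n",
    "        return np.nan, np.nan\n",
    "    eff = n_pass / n_tot\n",
    "    return eff, np.sqrt(eff * (1 - eff) / n_tot)\n",
    "\n",
    "\n",
    "# Made-up counts, for illustration only\n",
    "effic_demo, effic_err_demo = binomial_efficiency(900, 100)\n",
    "print(f\"Effic: {effic_demo:.5f} +- {effic_err_demo:.5f}\")"
   ]
  },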
  {
   "cell_type": "markdown",
   "id": "65c36b4b-901a-4e21-9196-d90f5645a055",
   "metadata": {},
   "source": [
    "#### Classify the clusters by match type\n",
    "\n",
    "This looks at the characteristics of the matched clusters and categorizes them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee22a960-3193-4d5f-bb76-0a6aa193784b",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster_lists = hpmcm.classify.classifyClusters(matcher, snr_cut=10.)\n",
    "hpmcm.classify.printClusterTypes(cluster_lists)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1552d538-dcc1-49ff-a865-685283a9fccf",
   "metadata": {},
   "source": [
    "#### Display a few objects\n",
    "\n",
    "The various markers show the sources from different shear catalogs:  `ns=.`, `1m = <`, `1p = >`, `2m = ^`, `2p = v`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96301ea4-49bc-4109-8f19-726f9ccd3e27",
   "metadata": {},
   "outputs": [],
   "source": [
    "_ = hpmcm.viz_utils.showShearObjs(matcher, cluster_lists['ideal'][5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4cb7f95b-122f-43d4-918c-0de1f5a620ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "_ = hpmcm.viz_utils.showShearObj(matcher, obj_lists['many_missing'][0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e6b438f-8256-4173-9b0a-70ff89335084",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a480019-b559-46eb-bb58-a357c550361b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fcf73b8-8152-45c4-b022-8906883d4755",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}