EPIC XMM-Newton Observation Data (EXOD) Documentation

This is the documentation for the EPIC XMM-Newton Observation Data (EXOD) project.

EXOD is designed for the detection of rapid transients in archival XMM-Newton data.

Contact: Norman Khan & Erwan Quintin (2024)

Norman.Khan@irap.omp.eu

Erwan.Quintin@irap.omp.eu

Installation

git clone https://github.com/nx1/EXOD2
cd EXOD
pip install -e .
Then set the 'EXOD' enviroment in your .bashrc to point to this repo i.e.
export EXOD='/home/{username}/EXOD2'

Running

The main script for the pipeline is found in EXOD2/exod/main.py This will run over all the obsids specified in EXOD2/data/observations.txt and perform the transient search, the output is then saved in /data/results/obsid/

Full Module Listing

`post_processing`

`cluster_regions`

Cluster regions in a DataFrame of regions by clustering sources within a certain radius using a K-D tree.

An example of how this works is provided below:

Input Regions (cartesian coords) xyz = [[0,0,0], # 0 [0,0,1], # 1 [0,1,0], # 2 [5,0,0], # 3 [4,0,0], # 4 [0,2,0], # 5 [0,3,0], # 6

cluster[0] = [0,1,3] --> cluster_num = 0 | mean position = 0,0,0 + 0,0,1 + 0,1,0 = 0.00, 0.33, 0.33 cluster[1] = [0,1,3] --> cluster_num = 0 | mean position = 0,0,0 + 0,0,1 + 0,1,0 = 0.00, 0.33, 0.33 cluster[2] = [1,3,6] --> cluster_num = 1 | mean position = 0,0,1 + 5,0,0 + 0,3,0 = 1.66, 1.00, 0.33 cluster[3] = [1,3,6] --> cluster_num = 1 | mean position = 0,0,1 + 5,0,0 + 0,3,0 = 1.66, 1.00, 0.33

Row 1 is both in cluster_num=1 and cluster_num=2, which one should we label it to? The closest mean position of course!

xyz[1] = [0,0,1]

distance.euclidean([0,0,1], [0.00, 0.33, 0.33]) = 0.75 (lowest!) distance.euclidean([0,0,1], 1.66, 1.00, 0.33) = 2.05

Row 1 therefore gets associated with cluster num 0

unique_regions = OrderedDict([((0,) : 0), ((1, 7) : 1), ((2, 8) : 2), ((3, 9) : 3), ((4, 10, 13, 16, 18) : 4), ((5, 11, 14, 17, 19, 27) : 5), ((6,) : 6), ((12,) : 7), ((15,) : 8), ((20, 23) : 9),

region_to_clusters = defaultdict(list, {0: [[0]], 1: [[1, 7], [1, 7]], 7: [[1, 7], [1, 7]], 2: [[2, 8], [2, 8]], 8: [[2, 8], [2, 8]], 3: [[3, 9], [3, 9]], 9: [[3, 9], [3, 9]], 4: [[4, 10, 13, 16, 18], [4, 10, 13, 16, 18], [4, 10, 13, 16, 18], [4, 10, 13, 16, 18], [4, 10, 13, 16, 18]], ... }

cluster_xyz_means = {(0,) : array([ 0.71092614, 0.12281395, -0.69245994]), (1, 7) : array([ 0.71165566, 0.12214711, -0.69182823]), (2, 8) : array([ 0.71478684, 0.1221836 , -0.68858618]), (3, 9) : array([ 0.71486105, 0.11933079, -0.68900932]), (4, 10, 13, 16, 18) : array([ 0.71399846, 0.11878942, -0.68999657]),

cluster_labels = [0, 1, 2, 3, 4, 5, 6, 1, 2, ...]

`ClusterRegions`

Cluster regions by clustering sources within a certain radius using a K-D tree.

Some definitions

Region Number: The index of the region in the DataFrame. Cluster: A collection of regions, e.g (0,) or (1,7) Cluster Number: The index of the cluster in the list of clusters.

Attributes:

Name	Type	Description
`df_regions`	`DataFrame`	DataFrame containing ra_deg and dec_deg columns.
`clustering_radius`	`Quantity`	Clustering radius in arcseconds.
`xyz`	`ndarray`	Cartesian coordinates of the regions.
`clusters`	`list`	List of lists containing the regions in each cluster.
`cluster_xyz_means`	`dict`	Maps the cluster number to the mean cartesian position of the cluster on the unit sphere.
`region_to_clusters`	`dict`	Maps the region number to the containing the regions associated with each cluster.
`cluster_num_to_cluster`	`dict`	Maps the cluster number to the cluster.
`cluster_to_cluster_num`	`dict`	Maps the cluster to the cluster number. (reverse of cluster_num_to_cluster)
`region_num_to_cluster_num`	`dict`	Maps the region number to the cluster number (unique region number).
`df_regions_unique`	`DataFrame`	DataFrame containing the unique regions.
`n_clusters`	`int`	Number of unique regions.
`save`	`bool`	If True, save the clustered regions to a file.

Source code in exod/post_processing/cluster_regions.py

class ClusterRegions:
    """
    Cluster regions by clustering sources within a certain radius using a K-D tree.

    Some definitions:
        Region Number: The index of the region in the DataFrame.
        Cluster: A collection of regions, e.g (0,) or (1,7)
        Cluster Number: The index of the cluster in the list of clusters.

    Attributes:
        df_regions (pd.DataFrame): DataFrame containing ra_deg and dec_deg columns.
        clustering_radius (astropy.units.Quantity): Clustering radius in arcseconds.
        xyz (np.ndarray): Cartesian coordinates of the regions.
        clusters (list): List of lists containing the regions in each cluster.
        cluster_xyz_means (dict): Maps the cluster number to the mean cartesian position of the cluster on the unit sphere.
        region_to_clusters (dict): Maps the region number to the containing the regions associated with each cluster.
        cluster_num_to_cluster (dict): Maps the cluster number to the cluster.
        cluster_to_cluster_num (dict): Maps the cluster to the cluster number. (reverse of cluster_num_to_cluster)
        region_num_to_cluster_num (dict): Maps the region number to the cluster number (unique region number).
        df_regions_unique (pd.DataFrame): DataFrame containing the unique regions.
        n_clusters (int): Number of unique regions.
        save (bool): If True, save the clustered regions to a file.
    """
    def __init__(self, df_regions, clustering_radius=20 * u.arcsec, save=False):
        self.df_regions = df_regions
        self.clustering_radius = clustering_radius

        self.xyz = []                           # Cartesian coordinates of the regions
        self.clusters = []                      # List of lists containing the regions in each cluster
        self.cluster_xyz_means = []             # Maps the cluster number to the mean cartesian position of the cluster on the unit sphere
        self.region_to_clusters = {}            # Maps the region number to the containing the regions associated with each cluster
        self.cluster_num_to_cluster = {}        # Maps the cluster number to the cluster
        self.cluster_to_cluster_num = {}        # Maps the cluster to the cluster number (reverse of cluster_num_to_cluster)
        self.region_num_to_cluster_num = {}     # Maps the region number to the cluster number (unique region number)
        self.df_regions_unique = pd.DataFrame() # DataFrame containing the unique regions
        self.n_clusters = 0                     # Number of unique regions
        self.save = save

        self.run()

    def number_clusters(self):
        """Create dictionary that maps each unique cluster to its unique cluster number."""
        cluster_to_cluster_num = OrderedDict()
        i = 0
        for j, cluster in enumerate(self.clusters):
            c_t = tuple(cluster)
            if c_t not in cluster_to_cluster_num:
                cluster_to_cluster_num[c_t] = i
                i += 1
        logger.info(f'Initially found {len(cluster_to_cluster_num)} unique region clusters...')
        self.cluster_to_cluster_num = cluster_to_cluster_num
        # Create reverse mapping (cluster number to cluster)
        self.cluster_num_to_cluster = dict((v, k) for k, v in self.cluster_to_cluster_num.items())

    def map_region_num_to_clusters(self):
        """Create dictionary that maps the region number to the cluster."""
        self.region_to_clusters = defaultdict(list)
        for c in self.clusters:
            for region_num in c:
                self.region_to_clusters[region_num].append(c)

    def calc_cluster_xyz_means(self):
        """ Calculate the mean cartesian positions of each cluster."""
        self.cluster_xyz_means = {tuple(cluster): np.mean([self.xyz[region] for region in cluster], axis=0) for cluster in self.clusters}

    def cluster_regions(self):
        """Find unique regions by clustering sources within a certain radius using a K-D tree."""
        self.xyz = ra_dec_to_xyz(self.df_regions['ra_deg'].values * u.deg, self.df_regions['dec_deg'].values * u.deg)
        kd_tree = KDTree(data=self.xyz, leafsize=10, compact_nodes=True, copy_data=False, balanced_tree=True, boxsize=None)
        clustering_radius_rad = self.clustering_radius.to('rad').value
        self.clusters = kd_tree.query_ball_tree(other=kd_tree, r=clustering_radius_rad, p=2.0, eps=0)

    def label_region_num_to_cluster_num(self):
        """Loop over all regions and label them to the closest cluster."""
        cluster_xyz_means = self.cluster_xyz_means
        region_to_clusters = self.region_to_clusters
        cluster_to_cluster_num = self.cluster_to_cluster_num
        xyz = self.xyz

        cluster_labels = []
        for i in range(len(xyz)):
            clusters = region_to_clusters[i]
            if len(clusters) == 1:
                c_t = tuple(clusters[0])
                unique_region_num = cluster_to_cluster_num[c_t]
                cluster_labels.append(unique_region_num)

            elif len(clusters) > 1: # More than one cluster associated with region i
                cluster_distances = {}
                for cluster in region_to_clusters[i]:
                    c_t = tuple(cluster)
                    d = distance.euclidean(xyz[i], cluster_xyz_means[c_t])
                    cluster_distances[c_t] = d

                # Find the closest cluster
                closest_cluster = min(cluster_distances, key=cluster_distances.get)
                cluster_labels.append(cluster_to_cluster_num[closest_cluster])

        self.cluster_labels = cluster_labels
        self.df_regions['cluster_label'] = cluster_labels
        self.n_clusters = len(set(self.cluster_labels))
        logger.info(f'Final number of unique regions = {self.n_clusters}')

    def calc_unique_regions_table(self):
        df_regions_unique = self.df_regions.groupby(['cluster_label'])[['ra_deg', 'dec_deg']].agg('mean')
        df_regions_unique['idxs'] = [list(self.cluster_num_to_cluster[c_num]) for c_num in df_regions_unique.index]
        self.df_regions_unique = df_regions_unique

    def renumber_clusters(self):
        """Renumber the clusters so that they go from 0 to n_clusters."""
        logger.info('Renumbering clusters...')
        old2new = {old:new for new, old in zip(range(self.n_clusters), self.df_regions_unique.index)}
        new2old = {new:old for old, new in old2new.items()}

        self.df_regions_unique.reset_index(drop=True, inplace=True)
        self.df_regions_unique.index.name = 'cluster_label'
        self.df_regions['cluster_label'] = self.df_regions['cluster_label'].map(old2new)
        self.cluster_labels = self.df_regions['cluster_label'].values
        self.cluster_num_to_cluster = {c_num : self.cluster_num_to_cluster[new2old[c_num]] for c_num in range(self.n_clusters)}
        self.cluster_to_cluster_num = {v:k for k,v in self.cluster_num_to_cluster.items()}
        self.region_num_to_cluster_num = {reg:cluster for reg, cluster in zip(self.df_regions.index, self.cluster_labels)}

    def save_unique_regions_table(self):
        if self.save:
            logger.info(f'Saving unique regions table to {savepaths_combined["regions_unique"]}')
            tab_regions_unique = Table.from_pandas(self.df_regions_unique[['ra_deg', 'dec_deg']])
            tab_regions_unique.write(savepaths_combined['regions_unique'], overwrite=True)

    def run(self):
        self.cluster_regions()
        self.number_clusters()
        self.map_region_num_to_clusters()
        self.calc_cluster_xyz_means()
        self.label_region_num_to_cluster_num()
        self.calc_unique_regions_table()
        self.renumber_clusters()
        self.save_unique_regions_table()

`calc_cluster_xyz_means()`

Calculate the mean cartesian positions of each cluster.

Source code in exod/post_processing/cluster_regions.py

def calc_cluster_xyz_means(self):
    """ Calculate the mean cartesian positions of each cluster."""
    self.cluster_xyz_means = {tuple(cluster): np.mean([self.xyz[region] for region in cluster], axis=0) for cluster in self.clusters}

`cluster_regions()`

Find unique regions by clustering sources within a certain radius using a K-D tree.

Source code in exod/post_processing/cluster_regions.py

def cluster_regions(self):
    """Find unique regions by clustering sources within a certain radius using a K-D tree."""
    self.xyz = ra_dec_to_xyz(self.df_regions['ra_deg'].values * u.deg, self.df_regions['dec_deg'].values * u.deg)
    kd_tree = KDTree(data=self.xyz, leafsize=10, compact_nodes=True, copy_data=False, balanced_tree=True, boxsize=None)
    clustering_radius_rad = self.clustering_radius.to('rad').value
    self.clusters = kd_tree.query_ball_tree(other=kd_tree, r=clustering_radius_rad, p=2.0, eps=0)

`label_region_num_to_cluster_num()`

Loop over all regions and label them to the closest cluster.

Source code in exod/post_processing/cluster_regions.py

def label_region_num_to_cluster_num(self):
    """Loop over all regions and label them to the closest cluster."""
    cluster_xyz_means = self.cluster_xyz_means
    region_to_clusters = self.region_to_clusters
    cluster_to_cluster_num = self.cluster_to_cluster_num
    xyz = self.xyz

    cluster_labels = []
    for i in range(len(xyz)):
        clusters = region_to_clusters[i]
        if len(clusters) == 1:
            c_t = tuple(clusters[0])
            unique_region_num = cluster_to_cluster_num[c_t]
            cluster_labels.append(unique_region_num)

        elif len(clusters) > 1: # More than one cluster associated with region i
            cluster_distances = {}
            for cluster in region_to_clusters[i]:
                c_t = tuple(cluster)
                d = distance.euclidean(xyz[i], cluster_xyz_means[c_t])
                cluster_distances[c_t] = d

            # Find the closest cluster
            closest_cluster = min(cluster_distances, key=cluster_distances.get)
            cluster_labels.append(cluster_to_cluster_num[closest_cluster])

    self.cluster_labels = cluster_labels
    self.df_regions['cluster_label'] = cluster_labels
    self.n_clusters = len(set(self.cluster_labels))
    logger.info(f'Final number of unique regions = {self.n_clusters}')

`map_region_num_to_clusters()`

Create dictionary that maps the region number to the cluster.

Source code in exod/post_processing/cluster_regions.py

def map_region_num_to_clusters(self):
    """Create dictionary that maps the region number to the cluster."""
    self.region_to_clusters = defaultdict(list)
    for c in self.clusters:
        for region_num in c:
            self.region_to_clusters[region_num].append(c)

`number_clusters()`

Create dictionary that maps each unique cluster to its unique cluster number.

Source code in exod/post_processing/cluster_regions.py

def number_clusters(self):
    """Create dictionary that maps each unique cluster to its unique cluster number."""
    cluster_to_cluster_num = OrderedDict()
    i = 0
    for j, cluster in enumerate(self.clusters):
        c_t = tuple(cluster)
        if c_t not in cluster_to_cluster_num:
            cluster_to_cluster_num[c_t] = i
            i += 1
    logger.info(f'Initially found {len(cluster_to_cluster_num)} unique region clusters...')
    self.cluster_to_cluster_num = cluster_to_cluster_num
    # Create reverse mapping (cluster number to cluster)
    self.cluster_num_to_cluster = dict((v, k) for k, v in self.cluster_to_cluster_num.items())

`renumber_clusters()`

Renumber the clusters so that they go from 0 to n_clusters.

Source code in exod/post_processing/cluster_regions.py

def renumber_clusters(self):
    """Renumber the clusters so that they go from 0 to n_clusters."""
    logger.info('Renumbering clusters...')
    old2new = {old:new for new, old in zip(range(self.n_clusters), self.df_regions_unique.index)}
    new2old = {new:old for old, new in old2new.items()}

    self.df_regions_unique.reset_index(drop=True, inplace=True)
    self.df_regions_unique.index.name = 'cluster_label'
    self.df_regions['cluster_label'] = self.df_regions['cluster_label'].map(old2new)
    self.cluster_labels = self.df_regions['cluster_label'].values
    self.cluster_num_to_cluster = {c_num : self.cluster_num_to_cluster[new2old[c_num]] for c_num in range(self.n_clusters)}
    self.cluster_to_cluster_num = {v:k for k,v in self.cluster_num_to_cluster.items()}
    self.region_num_to_cluster_num = {reg:cluster for reg, cluster in zip(self.df_regions.index, self.cluster_labels)}

`crossmatch`

This module contains code for crossmatching the regions with various catalogues.

`crossmatch_astropy_table_with_regions(tab, df_region, ra_col, dec_col)`

Crossmatch with an arbitrary Fits Table.

Parameters:

Name	Type	Description	Default
`tab`	`Table`	Astropy Table	required
`df_region`	`DataFrame`	DataFrame containing the regions to crossmatch (must have ra_deg and dec_deg columns)	required
`ra_col`	`str`	Column name for the RA values in degrees	required
`dec_col`	`str`	Column name for the DEC values in in degrees.	required

Returns:

Name	Type	Description
`tab_fits_cmatch`	`Table`	Table containing the crossmatched data.

Source code in exod/post_processing/crossmatch.py

def crossmatch_astropy_table_with_regions(tab, df_region, ra_col, dec_col):
    """
    Crossmatch with an arbitrary Fits Table.

    Parameters:
        tab (astropy.Table): Astropy Table
        df_region (pd.DataFrame): DataFrame containing the regions to crossmatch (must have ra_deg and dec_deg columns)
        ra_col (str): Column name for the RA values in degrees
        dec_col (str): Column name for the DEC values in in degrees.

    Returns:
        tab_fits_cmatch (astropy.Table): Table containing the crossmatched data.
    """
    sc1 = SkyCoord(ra=tab[ra_col], dec=tab[dec_col], unit=u.deg, frame='fk5', equinox='J2000')
    sc2 = SkyCoord(ra=df_region['ra_deg'].values, dec=df_region['dec_deg'].values, unit='deg', frame='fk5', equinox='J2000')

    cmatch = sc2.match_to_catalog_sky(sc1)

    tab_cmatch = Table(cmatch)
    tab_cmatch.rename_columns(names=tab_cmatch.colnames, new_names=['label', 'sep2d', 'dist3d'])
    tab_cmatch['sep2d_arcsec'] = tab_cmatch['sep2d'].to(u.arcsec)
    tab_cmatch['idx_orig']     = np.arange(len(tab_cmatch))
    df_region_matched = df_region.iloc[tab_cmatch['idx_orig']]

    tab_fits_cmatch = tab[tab_cmatch['label']]
    tab_fits_cmatch['SEP_ARCSEC']   = tab_cmatch['sep2d_arcsec']
    tab_fits_cmatch['RA_OFFSET']    = calc_ra_offset(ra_deg1=df_region_matched['ra_deg'], ra_deg2=tab_fits_cmatch[ra_col], dec_deg1=df_region_matched['dec_deg'])
    tab_fits_cmatch['DEC_OFFSET']   = calc_dec_offset(dec_deg1=df_region_matched['dec_deg'], dec_deg2=tab_fits_cmatch[dec_col])
    tab_fits_cmatch['IDX_ORIGINAL'] = tab_cmatch['idx_orig']
    return tab_fits_cmatch

`crossmatch_cds(df, max_sep_arcsec=20, catalogue='simbad', ra_col='ra_deg', dec_col='dec_deg')`

Crossmatch using CDS X-Match Service.

Source code in exod/post_processing/crossmatch.py

def crossmatch_cds(df, max_sep_arcsec=20, catalogue='simbad', ra_col='ra_deg', dec_col='dec_deg'):
    """Crossmatch using CDS X-Match Service."""
    print(f'Crossmatching {len(df)} rows with {catalogue} using CDS Xmatch max_sep={max_sep_arcsec}"')
    r = requests.post(
        url='http://cdsxmatch.u-strasbg.fr/xmatch/api/v1/sync',
        data={'request'        : 'xmatch',
              'distMaxArcsec'  : max_sep_arcsec,
              'RESPONSEFORMAT' : 'csv',
              'cat2'           : catalogue,
              'colRA1'         : ra_col,
              'colDec1'        : dec_col},
        files={'cat1'          : df.to_csv()})
    if r.status_code != 200:
        raise Exception(f'Failed to crossmatch with CDS Xmatch. Status code: {r.status_code}')
    df_res = pd.read_csv(io.StringIO(r.text), low_memory=False)
    print(f'Returned {len(df_res)} rows')
    return df_res

`crossmatch_dr14_slim(df_region, clobber=True)`

Crossmatch regions with the 4XMM DR14 slim catalogue.

Source code in exod/post_processing/crossmatch.py

def crossmatch_dr14_slim(df_region, clobber=True):
    """Crossmatch regions with the 4XMM DR14 slim catalogue."""
    logger.info('Crossmatching with 4XMM DR14 slim catalogue')
    if not clobber and savepaths_combined['cmatch_dr14'].exists():
        logger.info(f"{savepaths_combined['cmatch_dr14']} already exists and clobber=False, loading from files")
        return Table.read(savepaths_combined['cmatch_dr14'])
    else:
        logger.info('Some crossmatch files are missing. Recreating...')

    fits_path = savepaths_util['4xmm_dr14_slim']
    tab_xmm_cmatch = crossmatch_fits_table(fits_path, df_region, ra_col='SC_RA', dec_col='SC_DEC')
    logger.info(f"Saving XMM DR14 crossmatch to {savepaths_combined['cmatch_dr14']}")
    tab_xmm_cmatch.write(savepaths_combined['cmatch_dr14'], format='csv', overwrite=True)
    return tab_xmm_cmatch

`crossmatch_fits_table(fits_path, df_region, ra_col, dec_col)`

Crossmatch with an arbitrary Fits Table.

Parameters:

Name	Type	Description	Default
`fits_path`	`Path`	Path to the FITS file.	required
`df_region`	`DataFrame`	DataFrame containing the regions to crossmatch.	required
`ra_col`	`str`	Column name for the RA values in FITS file.	required
`dec_col`	`str`	Column name for the DEC values in FITS file.	required

Returns:

Name	Type	Description
`tab_fits_cmatch`	`Table`	Table containing the crossmatched data.

Source code in exod/post_processing/crossmatch.py

def crossmatch_fits_table(fits_path, df_region, ra_col, dec_col):
    """
    Crossmatch with an arbitrary Fits Table.

    Parameters:
        fits_path (Path): Path to the FITS file.
        df_region (pd.DataFrame): DataFrame containing the regions to crossmatch.
        ra_col (str): Column name for the RA values in FITS file.
        dec_col (str): Column name for the DEC values in FITS file.

    Returns:
        tab_fits_cmatch (astropy.Table): Table containing the crossmatched data.
    """
    tab_fits = Table.read(fits_path)

    tab_fits_cmatch = crossmatch_astropy_table_with_regions(tab_fits, df_region, ra_col, dec_col)
    return tab_fits_cmatch

`crossmatch_glade(df_region, clobber=True)`

Crossmatch regions with GLADE+ catalogue, (converted to fits using topcat)

Source code in exod/post_processing/crossmatch.py

def crossmatch_glade(df_region, clobber=True):
    """Crossmatch regions with GLADE+ catalogue, (converted to fits using topcat)"""
    logger.info("Crossmatching with GLADEP slim catalogue (can take a second)")
    if not clobber and savepaths_combined['cmatch_glade'].exists():
        logger.info(f"{savepaths_combined['cmatch_glade']} already exists and clobber=False, loading from files")
        return Table.read(savepaths_combined['cmatch_glade'])
    else:
        logger.info('Some crossmatch files are missing. Recreating...')

    fits_path = savepaths_util['GLADE+']
    tab_glade_cmatch = crossmatch_fits_table(fits_path, df_region, ra_col='RA', dec_col='Dec')
    logger.info(f"Saving GLADE crossmatch to {savepaths_combined['cmatch_glade']}")
    tab_glade_cmatch.write(savepaths_combined['cmatch_glade'], format='csv', overwrite=True)
    return tab_glade_cmatch

`crossmatch_tranin_dr12(df_region)`

Crossmatch the regions with the CLAXON Hugo Tranin DR12 catalogue.

Source code in exod/post_processing/crossmatch.py

def crossmatch_tranin_dr12(df_region):
    """Crossmatch the regions with the CLAXON Hugo Tranin DR12 catalogue."""
    logger.info('Crossmatching with CLAXON Hugo Tranin DR12 catalogue')
    fits_path = data_util / 'tranin/classification_DR12_with_input.fits'
    tab_xmm_cmatch = crossmatch_fits_table(fits_path, df_region, ra_col='RA', dec_col='DEC')
    return tab_xmm_cmatch

`crossmatch_runs`

This module is used for various the simulation subsets with each other In order to determine the crossmatch fraction and various other metrics.

`calc_subset_cmatch_fraction(dfs_subset_crossmatch)`

Calculate the fraction of successfully crossmatched regions.

Parameters:

Name	Type	Description	Default
`dfs_subset_crossmatch`	`dict`	A dictionary containing the crossmatch results for each subset.	required

Returns:

Name	Type	Description
`df_subset_cmatch_fraction`	`DataFrame`	A DataFrame containing the fraction of successfully crossmatched regions.

Source code in exod/post_processing/crossmatch_runs.py

def calc_subset_cmatch_fraction(dfs_subset_crossmatch):
    """
    Calculate the fraction of successfully crossmatched regions.

    Parameters:
        dfs_subset_crossmatch (dict): A dictionary containing the crossmatch results for each subset.

    Returns:
        df_subset_cmatch_fraction (pd.DataFrame): A DataFrame containing the fraction of successfully crossmatched regions.
    """
    all_res = []
    for k, df in dfs_subset_crossmatch.items():
        res = {}
        for col in df.columns:
            count = (df[col] > -1).sum()
            perc = count / len(df)
            res[col] = perc
        all_res.append(res)
    df_subset_cmatch_fraction = pd.DataFrame(all_res)
    df_subset_cmatch_fraction.index = dfs_subset_crossmatch.keys()
    return df_subset_cmatch_fraction

`calc_subset_n_regions(dfs_subset_crossmatch)`

Calculate the number of regions in each subset.

Source code in exod/post_processing/crossmatch_runs.py

def calc_subset_n_regions(dfs_subset_crossmatch):
    """Calculate the number of regions in each subset."""
    n_regions_sim = {}
    for k, df in dfs_subset_crossmatch.items():
        n_regions_sim[k] = len(df)
    return n_regions_sim

`crossmatch_simulation_subsets(dfs_subsets)`

Crossmatch the subsets of simulations to each other.

Parameters:

Name	Type	Description	Default
`dfs_subsets`	`dict`	A dictionary containing the DataFrames for each subset.	required

Returns:

Name	Type	Description
`dfs_subset_crossmatch`	`dict`	A dictionary containing the crossmatch results for each subset. Each DataFrame contains the indices of the crossmatched sources in the other subsets. A placeholder value of -1 is denoted for sources without a crossmatch.

Source code in exod/post_processing/crossmatch_runs.py

def crossmatch_simulation_subsets(dfs_subsets):
    """
    Crossmatch the subsets of simulations to each other.

    Parameters:
        dfs_subsets (dict): A dictionary containing the DataFrames for each subset.

    Returns:
        dfs_subset_crossmatch (dict): A dictionary containing the crossmatch results for each subset.
            Each DataFrame contains the indices of the crossmatched sources in the other subsets.
            A placeholder value of -1 is denoted for sources without a crossmatch.
    """
    max_sep = 15 * u.arcsec  # Maximum separation for considering a crossmatch
    no_cmatch_id = -1  # Placeholder value for sources without a crossmatch
    dfs_subset_crossmatch = {}
    for k1, df1 in dfs_subsets.items():
        sc1 = SkyCoord(ra=df1['ra_deg'], dec=df1['dec_deg'], unit='deg', frame='fk5', equinox='J2000')  #
        res = {}
        res[k1] = np.arange(len(sc1))
        for k2, df2 in dfs_subsets.items():
            if k1 == k2:
                continue

            logger.info(f'Matching {k1} ({len(df1)}) With {k2:<12} ({len(df2)})')
            sc2 = SkyCoord(ra=df2['ra_deg'], dec=df2['dec_deg'], unit='deg', frame='fk5', equinox='J2000')
            cmatch = sc1.match_to_catalog_sky(sc2)

            tab_cmatch = Table(cmatch)
            tab_cmatch.rename_columns(names=tab_cmatch.colnames, new_names=['label', 'sep2d', 'dist3d'])
            tab_cmatch['sep2d'] = tab_cmatch['sep2d'].to(u.arcsec)

            is_match = np.where(tab_cmatch['sep2d'] < max_sep, tab_cmatch['label'], no_cmatch_id)  # Replace
            res[k2] = is_match

        df = pd.DataFrame(res)
        dfs_subset_crossmatch[k1] = df
    return dfs_subset_crossmatch

`get_run_subset_keys()`

Get the keys for the simulation subsets.

Source code in exod/post_processing/crossmatch_runs.py

def get_run_subset_keys():
    """Get the keys for the simulation subsets."""
    subsets = ['5_0.2_2.0',
               '5_2.0_12.0',
               '5_0.2_12.0',
               '50_0.2_2.0',
               '50_2.0_12.0',
               '50_0.2_12.0',
               '200_0.2_2.0',
               '200_2.0_12.0',
               '200_0.2_12.0']
    return subsets

`plot_crossmatch_confusion_matrix(df_subset_cmatch_fraction, n_regions_sim)`

Plot the confusion matrix for the crossmatch fractions.

Source code in exod/post_processing/crossmatch_runs.py

def plot_crossmatch_confusion_matrix(df_subset_cmatch_fraction, n_regions_sim):
    """Plot the confusion matrix for the crossmatch fractions."""
    labels_readable = {'5_0.2_2.0'    : r'$t_{\mathrm{bin}} = 5  ;E=0.2-2.0$',
                       '5_2.0_12.0'   : r'$t_{\mathrm{bin}} = 5  ;E=2.0-12.0$',
                       '5_0.2_12.0'   : r'$t_{\mathrm{bin}} = 5  ;E=0.2-12.0$',
                       '50_0.2_2.0'   : r'$t_{\mathrm{bin}} = 50 ;E=0.2-2.0$',
                       '50_2.0_12.0'  : r'$t_{\mathrm{bin}} = 50 ;E=2.0-12.0$',
                       '50_0.2_12.0'  : r'$t_{\mathrm{bin}} = 50 ;E=0.2-12.0$',
                       '200_0.2_2.0'  : r'$t_{\mathrm{bin}} = 200;E=0.2-2.0$',
                       '200_2.0_12.0' : r'$t_{\mathrm{bin}} = 200;E=2.0-12.0$',
                       '200_0.2_12.0' : r'$t_{\mathrm{bin}} = 200;E=0.2-12.0$'}
    lab1 = [labels_readable[i] for i in df_subset_cmatch_fraction.index]
    lab2 = [fr'$N_{{\mathrm{{reg}}}}$ = {n}' for n in n_regions_sim.values()]

    X = df_subset_cmatch_fraction.values
    fig, ax = plt.subplots(figsize=(10, 9.2))
    ax.set_title('Fraction of successfully crossmatched Regions')
    ax.imshow(X, cmap='Blues', interpolation='none')
    ax.set_xticks(range(X.shape[0]), lab1, rotation=90)
    ax.set_yticks(range(X.shape[0]), lab1, rotation=0)
    ax2 = ax.twinx()
    ax2.set_yticks(range(X.shape[0]), lab2, rotation=0)
    ax2.set_ylim(ax.get_ylim())
    for (j, i), label in np.ndenumerate(X):
        label = f'{label:.2f}'
        ax.text(i, j, label, ha='center', va='center')
    ax.set_xlabel('Crossmatch A')
    ax.set_ylabel('Crossmatch B')
    plt.tight_layout()
    logger.info(f'Saving Crossmatch Confusion Matrix to: crossmatch_confusion_matrix.png...')
    plt.savefig(data_plots / 'crossmatch_confusion_matrix.png')
    plt.savefig(data_plots / 'crossmatch_confusion_matrix.pdf')
    plt.show()

`plot_mean_count_histogram(dfs_subsets)`

Plot the mean count histograms for each subset.

Source code in exod/post_processing/crossmatch_runs.py

def plot_mean_count_histogram(dfs_subsets):
    """Plot the mean count histograms for each subset."""
    linestyles = {'5_': 'solid',
                  '50_': 'dashed',
                  '200_': 'dotted'}

    colors = {'0.2_2.0': 'red',
              '2.0_12.0': 'blue',
              '0.2_12.0': 'black'}

    fig, ax = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
    ax[0].set_title('Mean Count Histograms')
    for k, df in dfs_subsets.items():
        ls = [linestyles[key] for key in linestyles.keys() if key in k]
        c = [colors[key] for key in colors.keys() if key in k]

        hist_kwargs = {'bins'     : np.linspace(start=0, stop=500, num=50),
                       'histtype' : 'step',
                       'lw'       : 1.5,
                       'ls'       : ls[0],
                       'color'    : c[0],
                       'label'    : k}

        ax[0].hist(df['intensity_mean'], **hist_kwargs)
        ax[1].hist(df['intensity_mean'], density=True, **hist_kwargs)
    ax[0].set_ylabel('Number')
    ax[1].set_ylabel('Fraction')
    ax[1].set_xlabel('Intensity Mean (Counts)')
    for a in ax:
        a.legend()
        a.grid()
    plt.subplots_adjust(hspace=0)
    plt.savefig(data_plots / 'mean_count_histogram.png')
    plt.savefig(data_plots / 'mean_count_histogram.pdf')
    plt.show()

`split_subsets(df_regions)`

Split the regions into subsets based on the t_bin, E_lo, E_hi values in the runid.

Parameters:

Name	Type	Description	Default
`df_regions`	`DataFrame`	The DataFrame containing the region information.	required

Returns:

Name	Type	Description
`dfs_subsets`	`dict`	A dictionary containing the DataFrames for each subset.

Source code in exod/post_processing/crossmatch_runs.py

def split_subsets(df_regions):
    """
    Split the regions into subsets based on the t_bin, E_lo, E_hi values in the runid.

    Parameters:
        df_regions (pd.DataFrame): The DataFrame containing the region information.

    Returns:
        dfs_subsets (dict): A dictionary containing the DataFrames for each subset.
    """
    logger.info('Splitting df_regions into simulations subsets...')
    subsets = get_run_subset_keys()
    dfs_subsets = {}
    for s in subsets:
        dfs_subsets[s] = df_regions[df_regions['runid'].str.contains(s)]
    return dfs_subsets

`ds9`

View images from an XMM observation in DS9.

`estimate_variability_properties`

`calc_sigma_cube(cube_n, cube_mu)`

Calculate the sigma equivalent over a data cube. cube_n : observed data cube cube_mu : Expected data cube

Source code in exod/post_processing/estimate_variability_properties.py

def calc_sigma_cube(cube_n, cube_mu):
    """
    Calculate the sigma equivalent over a data cube.
    cube_n  : observed data cube
    cube_mu : Expected data cube
    """
    result_cube = np.zeros_like(cube_n)
    for i in range(cube_n.shape[0]):
        print(f'Calculating sigma cube... {i}/{cube_n.shape[0]}')
        for j in range(cube_n.shape[1]):
            for k in range(cube_n.shape[2]):
                n = cube_n[i, j, k]
                mu = cube_mu[i, j, k]
                result_cube[i, j, k] = sigma_equivalent(n, mu)
    return result_cube

`clean_up_eclipses(data_cube, eclipses)`

Removes eclipses in bright sources coming from merging of partial exposures. This is done by checking if the flux change is the same in the source and the rest of the frame

Source code in exod/post_processing/estimate_variability_properties.py

def clean_up_eclipses(data_cube, eclipses):
    """Removes eclipses in bright sources coming from merging of partial exposures. This is done by checking if the flux
    change is the same in the source and the rest of the frame"""
    new_eclipses = np.copy(eclipses)
    eclipse_x, eclipse_y, eclipse_t = np.where(eclipses == True)
    cube = data_cube.data
    for (x, y, t) in zip(eclipse_x, eclipse_y, eclipse_t):
        non_source_eclipseframe = np.nansum(cube[:, :, t]) - np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t])
        non_source_around_eclipseframe = (np.nansum(cube[:, :, t - 5:t + 6]) - np.nansum(
            cube[x - 1:x + 2, y - 1:y + 2, t - 5:t + 6]) - non_source_eclipseframe) / 10
        source_eclipseframe = np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t])
        source_around_eclipseframe = (np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t - 5:t + 6]) - source_eclipseframe) / 10
        # We reject eclipses for which the rest of the frame has had an eclipse as well, and this background "eclipse"
        # is at most 25% larger in relative amplitude than the source eclipse.
        if ((non_source_eclipseframe < non_source_around_eclipseframe) and
                ((((non_source_around_eclipseframe - non_source_eclipseframe) / non_source_around_eclipseframe) -
                  ((source_around_eclipseframe - source_eclipseframe) / source_around_eclipseframe)) < 0.25)):
            new_eclipses[x, y, t] = False
            # data_cube.data[:,:,t]=np.full(data_cube.shape[:2],np.nan)
    return new_eclipses

`clean_up_peaks(data_cube, peaks)`

Removes peaks coming from flaring CCD edges

Source code in exod/post_processing/estimate_variability_properties.py

def clean_up_peaks(data_cube, peaks):
    """Removes peaks coming from flaring CCD edges"""
    new_peaks = np.copy(peaks)
    peak_x, peak_y, peak_t = np.where(peaks == True)
    cube = data_cube.data
    for (x, y, t) in zip(peak_x, peak_y, peak_t):
        non_source_peakframe = np.nansum(cube[:, :, t]) - np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t])
        non_source_around_peakframe = (np.nansum(cube[:, :, t - 5:t + 6]) - np.nansum(
            cube[x - 1:x + 2, y - 1:y + 2, t - 5:t + 6]) - non_source_peakframe) / 10
        source_peakframe = np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t])
        source_around_peakframe = (np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t - 5:t + 6]) - source_peakframe) / 10
        # We reject peaks for which the rest of the frame has a peak as well, and the source peak
        # is at most 25% larger in relative amplitude than the background peak.
        background_peak_amplitude = ((non_source_peakframe - non_source_around_peakframe) / non_source_around_peakframe)
        source_peak_amplitude = ((source_peakframe - source_around_peakframe) / source_around_peakframe)
        if (background_peak_amplitude > 0) and ((source_peak_amplitude - background_peak_amplitude) < 0.25):
            new_peaks[x, y, t] = False
            # data_cube.data[:,:,t]=np.full(data_cube.shape[:2],np.nan)
    return new_peaks

`convert_count_to_flux(count, position, data_cube)`

Used to convert count rates to fluxes, using vignetting / EEF / ECF

Source code in exod/post_processing/estimate_variability_properties.py

def convert_count_to_flux(count, position, data_cube):
    # TODO: find the vignetting functions depending on energy / submode / filters. Maybe build the exposure map for
    # each observation  (https://xmm-tools.cosmos.esa.int/external/sas/current/doc/eexpmap.pdf)
    # Find the EEF (Encircled Energy Frac.), maybe with ARFGEN if we manage to convert data_cube regions to XY regions
    # Think about which Energy Conversion Factor (ECF) to use. Maybe depends on spectral search (hard or soft).
    """Used to convert count rates to fluxes, using vignetting / EEF / ECF"""

    """
    ds9 --> square region
    from this create an ARF for the region. http://xmm-tools.cosmos.esa.int/external/sas/current/doc/arfgen/
    This will allow you to convert a count to a count rate.
    from a count rate, you can get the flux by using an ECF.

    A spectral should be assumed that will likely depend on the band you're looking at:
    0.2 - 2.0 --> bbody
    2.0 - 12.0 --> something else
    this will change the flux by a factor of 2-3x

    this will give us an L for a given distance. :)
    """

    return count / data_cube.time_interval

`count_distant_peaks(peaks_or_eclipses, bins_min_between_peaks)`

Counts the individual number of times the lightcurve went above the threshold for variability, and there was a given number of bins since the last time it was above this threshold last

Source code in exod/post_processing/estimate_variability_properties.py

def count_distant_peaks(peaks_or_eclipses, bins_min_between_peaks):
    """Counts the individual number of times the lightcurve went above the threshold for variability, and there was
    a given number of bins since the last time it was above this threshold last"""
    if np.sum(peaks_or_eclipses)==0.:
        return 0
    indices_peaksoreclipses = np.where(peaks_or_eclipses)[0]
    number_last_peak_end = np.searchsorted(indices_peaksoreclipses, np.arange(len(peaks_or_eclipses)), side='left')-1
    indices_last_peak_end = np.take(indices_peaksoreclipses,number_last_peak_end)
    distance_since_last_peak_end = (np.arange(len(peaks_or_eclipses))-indices_last_peak_end)
    peaks_distant_enough = np.array(peaks_or_eclipses, dtype=bool)&(distance_since_last_peak_end>bins_min_between_peaks)
    nbr_of_variability_events = np.nansum(peaks_distant_enough)+1
    return nbr_of_variability_events

`count_peaks(peaks_or_eclipses)`

Counts the individual number of times the lightcurve went above the threshold for variability

Source code in exod/post_processing/estimate_variability_properties.py

def count_peaks(peaks_or_eclipses):
    """Counts the individual number of times the lightcurve went above the threshold for variability"""
    nbr_of_variability_events = np.nansum(np.abs(np.diff(peaks_or_eclipses, axis=2)), axis=2) / 2
    return nbr_of_variability_events

`eclipse_count_estimate(fraction, N, mu)`

Estimate the upper limit on the count of the eclipse, given an data_cube and observed counts, and a confidence fraction

Source code in exod/post_processing/estimate_variability_properties.py

def eclipse_count_estimate(fraction, N, mu):
    """Estimate the upper limit on the count of the eclipse, given an data_cube and observed counts,
     and a confidence fraction"""
    return mu - gammaincinv(N + 1, gammainc(N + 1, mu) - fraction * gammainc(N + 1, mu))

`peak_count_estimate(fraction, N, mu)`

Estimate the upper limit on the count of the peak, given an data_cube and observed counts, and a confidence fraction

Source code in exod/post_processing/estimate_variability_properties.py

def peak_count_estimate(fraction, N, mu):
    """Estimate the upper limit on the count of the peak, given an data_cube and observed counts,
     and a confidence fraction"""
    return gammaincinv(N + 1, fraction * gammaincc(N + 1, mu) + gammainc(N + 1, mu)) - mu

`plot_lightcurve_alerts_with_background(cube, cube_background, cube_background_withsource, tab_boundingboxes)`

This function creates the multi-panel lightcurve of the transient object. It will retrieve the background, and use it to compare the (source+background) to the background, and compute the likelihood in each frame :param cube: full data cube :param cube_background: data cube of de-sourced background estimate :param cube_background_withsource: data cube of de-sourced background estimate + constant contribution from the sources (i.e. we assume they are constant, take their stacked flux and distribute it over all frames) :param tab_boundingboxes: bounding boxes of variable objects, as obtained from variability.py :return: nothing, but saves the lightcurve of each (source+background) and its background, lightcurve of background-subtracted source, and lightcurve of likelihood of creating the signal with the background

Source code in exod/post_processing/estimate_variability_properties.py

def plot_lightcurve_alerts_with_background(cube, cube_background, cube_background_withsource, tab_boundingboxes):
    """
    This function creates the multi-panel lightcurve of the transient object. It will retrieve the background,
    and use it to compare the (source+background) to the background, and compute the likelihood in each frame
    :param cube: full data cube
    :param cube_background: data cube of de-sourced background estimate
    :param cube_background_withsource: data cube of de-sourced background estimate + constant contribution from the
    sources (i.e. we assume they are constant, take their stacked flux and distribute it over all frames)
    :param tab_boundingboxes: bounding boxes of variable objects, as obtained from variability.py
    :return: nothing, but saves the lightcurve of each (source+background) and its background, lightcurve of
    background-subtracted source, and lightcurve of likelihood of creating the signal with the background
    """
    color = cmr.take_cmap_colors('cmr.ocean', N=1, cmap_range=(0.3, 0.3))[0]
    for ind, source in enumerate(tab_boundingboxes):
        legend_plots_ax1 = []
        legend_labels_ax1 = []
        legend_plots_ax2 = []
        legend_labels_ax2 = []
        legend_plots_ax3 = []
        legend_labels_ax3 = []

        # First step: we generate the lightcurves and a number of Poisson realisations of it
        lc = np.sum(cube[source[0]:source[2], source[1]:source[3]], axis=(0, 1))
        lc_generated = np.random.poisson(lc, (5000, len(lc)))
        lc_percentiles = np.nanpercentile(lc_generated, (16, 84), axis=0)
        lc_back = np.sum(cube_background[source[0]:source[2], source[1]:source[3]], axis=(0, 1))
        lc_back_generated = np.random.poisson(lc_back, (5000, len(lc)))
        lc_back_percentiles = np.nanpercentile(lc_back_generated, (16, 84), axis=0)
        lc_back_withsource = np.sum(cube_background_withsource[source[0]:source[2], source[1]:source[3]], axis=(0, 1))
        lc_back_generated_withsource = np.random.poisson(lc_back_withsource, (5000, len(lc)))

        fig, (ax1, ax2, ax3) = plt.subplots(3, 1)

        # First panel: Source + background lightcurve
        p1 = ax1.step(range(len(lc)), lc, c=color, where="mid")
        p2 = ax1.fill(np.NaN, np.NaN, facecolor=color, alpha=0.4)
        ax1.fill_between(range(len(lc)), lc_percentiles[0], lc_percentiles[1], alpha=0.4, facecolor=color, step="mid")
        legend_plots_ax1.append((p1[0], p2[0]))
        legend_labels_ax1.append("Source+background")
        p1 = ax1.step(range(len(lc_back)), lc_back, c="gray", where="mid")
        p2 = ax1.fill(np.NaN, np.NaN, facecolor="gray", alpha=0.4)
        ax1.fill_between(range(len(lc_back)), lc_back_percentiles[0], lc_back_percentiles[1],
                         alpha=0.4, facecolor="gray", step="mid")
        legend_plots_ax1.append((p1[0], p2[0]))
        legend_labels_ax1.append("Background")
        ax1.legend(legend_plots_ax1, legend_labels_ax1)

        # Second panel: Background subtracted lightcurve
        p1 = ax2.step(range(len(lc)), lc - lc_back, c=color, where='mid')
        p2 = ax2.fill(np.NaN, np.NaN, facecolor=color, alpha=0.4)
        lc_diff_percentiles = np.nanpercentile(lc_generated - lc_back_generated, (16, 84), axis=0)
        ax2.fill_between(range(len(lc)), lc_diff_percentiles[0], lc_diff_percentiles[1],
                         alpha=0.4, facecolor=color, step='mid')
        legend_plots_ax2.append((p1[0], p2[0]))
        legend_labels_ax2.append("Source-Background")
        ax2.legend(legend_plots_ax2, legend_labels_ax2)

        # Third panel: Poisson likelihood
        p1 = ax3.step(range(len(lc)), -poisson.logpmf(lc, lc_back_withsource), c=color, where='mid')
        p2 = ax3.fill(np.NaN, np.NaN, facecolor=color, alpha=0.4)
        likelihood_percentiles = np.nanpercentile(-poisson.logpmf(lc_generated, lc_back_generated_withsource), (16, 84),
                                                  axis=0)
        ax3.fill_between(range(len(lc)), likelihood_percentiles[0], likelihood_percentiles[1],
                         alpha=0.4, facecolor=color, step='mid')
        legend_plots_ax3.append((p1[0], p2[0]))
        legend_labels_ax3.append("Poisson likelihood")
        ax3.legend(legend_plots_ax2, legend_labels_ax2)

        plt.savefig(os.path.join(data_processed, '0831790701', f'lightcurve_transient_{ind}.png'))

`extract_lc_features`

`calc_df_lc_feat_filter_flags(df_lc_feat)`

Calculate the quality flags for the lightcurves in the sample.

Source code in exod/post_processing/extract_lc_features.py

def calc_df_lc_feat_filter_flags(df_lc_feat):
    """Calculate the quality flags for the lightcurves in the sample."""
    print('Calculating Light Curve Feature Filter Flags...')
    # Filter flag for regions that have less than 5 counts maximum in 5 second binning
    df_lc_feat['filt_tbin_5_n_l_5'] = (df_lc_feat['runid'].str.contains('_5_')) & (df_lc_feat['n_max'] < 5)

    # Filter flag for runids with more than 20 detected regions
    vc_runid = df_lc_feat['runid'].value_counts()
    df_lc_feat['filt_g_20_detections'] = df_lc_feat['runid'].isin(vc_runid.index[vc_runid > 20])

    # Filter flag for 5 sigma detections
    df_lc_feat['filt_5sig'] = (df_lc_feat['B_peak_log_max'] > 13.2) | (df_lc_feat['B_eclipse_log_max'] > 12.38)

    # Filter flag for excluded obsids
    df_lc_feat['obsid'] = df_lc_feat['runid'].str.extract(r'(\d{10})')
    df_lc_feat['filt_exclude_obsid'] = df_lc_feat['obsid'].isin(obsids_to_exclude)
    return df_lc_feat

`calc_features(df_lc, key)`

Calculate Features for a single lightcurve.

Source code in exod/post_processing/extract_lc_features.py

def calc_features(df_lc, key):
    """Calculate Features for a single lightcurve."""
    parts    = key.strip("()").split(", ")
    parts    = [part.strip("'") for part in parts]
    runid    = parts[0]
    label    = parts[1]
    runid_sp = runid.split("_")
    obsid    = runid_sp[0]
    subset   = int(runid_sp[1])
    t_bin    = int(runid_sp[2])
    E_low    = float(runid_sp[3])
    E_hi     = float(runid_sp[4])

    length = len(df_lc)
    n_bccd = df_lc['bccd'].sum()
    n_bti = df_lc['bti'].sum()
    ratio_bccd = n_bccd / length
    ratio_bti = n_bti / length

    ks = ks_2samp(df_lc['n'], df_lc['mu'])

    peaks_or_eclipses = (df_lc['B_peak_log'] > 6.4) | (df_lc['B_eclipse_log'] > 5.5)
    t_bin_bin_spacing = {5 : 1000, 50 : 100, 200 : 5}
    bins_min_between_peaks = t_bin_bin_spacing[t_bin]
    n_peaks = count_distant_peaks(peaks_or_eclipses, bins_min_between_peaks)
    sig_bins = count_significant_bins(df_lc)

    num_B_peak_above_6_4    = count_recurring_peaks(df_lc['B_peak_log'].values, threshold=6.4)
    num_B_eclipse_above_5_5 = count_recurring_peaks(df_lc['B_eclipse_log'].values, threshold=5.5)

    n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare = largest_peak_info(df_lc)

    res = {'label'                     : key,
           'runid'                   : runid,
           'label'                   : label,
           'obsid'                   : obsid,
           'subset'                  : subset,
           't_bin'                   : t_bin,
           'E_low'                   : E_low,
           'E_hi'                    : E_hi,
           'len'                     : length,
           'n_bccd'                  : df_lc['bccd'].sum(),
           'n_bti'                   : df_lc['bti'].sum(),
           'ratio_bccd'              : ratio_bccd,
           'ratio_bti'               : ratio_bti,
           'ks_stat'                 : ks.statistic,
           'ks_pval'                 : ks.pvalue,
           'n_min'                   : df_lc['n'].min(),
           'n_max'                   : df_lc['n'].max(),
           'n_mean'                  : df_lc['n'].mean(),
           'n_std'                   : df_lc['n'].std(),
           'n_sum'                   : df_lc['n'].sum(),
           'n_skew'                  : skew(df_lc['n']),
           'n_kurt'                  : kurtosis(df_lc['n']),
           'n_max_idx'               : n_max_idx,
           'n_max_isolated_flare'    : n_max_isolated_flare,
           'n_max_first_bin'         : n_max_first_bin,
           'n_max_last_bin'          : n_max_last_bin,
           'mu_min'                  : df_lc['n'].min(),
           'mu_max'                  : df_lc['n'].max(),
           'mu_mean'                 : df_lc['n'].mean(),
           'mu_std'                  : df_lc['n'].std(),
           'mu_skew'                 : skew(df_lc['n']),
           'mu_kurt'                 : kurtosis(df_lc['n']),
           'B_peak_log_max'          : df_lc['B_peak_log'].max(),
           'B_eclipse_log_max'       : df_lc['B_eclipse_log'].max(),
           'sigma_max_B_peak'        : sigma_equivalent_B_peak(df_lc['B_peak_log'].max()),
           'sigma_max_B_eclipse'     : sigma_equivalent_B_eclipse(df_lc['B_eclipse_log'].max()),
           'num_B_peak_above_6_4'    : num_B_peak_above_6_4,
           'num_B_eclipse_above_5_5' : num_B_eclipse_above_5_5,
           'bin_min_between_peaks'   : bins_min_between_peaks,
           'n_peaks'                 : n_peaks}
    for k, v in sig_bins.items():
        res[k] = v

    return res

`count_significant_bins(df_lc)`

Count the total number of significant bins across all the lightcurves.

Source code in exod/post_processing/extract_lc_features.py

def count_significant_bins(df_lc):
    """Count the total number of significant bins across all the lightcurves."""
    # The following are obtained from bayesian_computations import get_bayes_thresholds
    B_peak_3_sig    = 5.94096891658419
    B_peak_5_sig    = 13.278830271385576
    B_eclipse_3_sig = 5.708917345972507
    B_eclipse_5_sig = 12.380488516107011

    res = {'n_3_sig_peak_bins'        : (df_lc['B_peak_log'] > B_peak_3_sig).sum(),
           'n_3_sig_peak_bins_bti'    : (df_lc[df_lc['bti'] == 1]['B_peak_log'] > B_peak_3_sig).sum(),
           'n_3_sig_peak_bins_gti'    : (df_lc[df_lc['bti'] == 0]['B_peak_log'] > B_peak_3_sig).sum(),
           'n_5_sig_peak_bins'        : (df_lc['B_peak_log'] > B_peak_5_sig).sum(),
           'n_5_sig_peak_bins_bti'    : (df_lc[df_lc['bti'] == 1]['B_peak_log'] > B_peak_5_sig).sum(),
           'n_5_sig_peak_bins_gti'    : (df_lc[df_lc['bti'] == 0]['B_peak_log'] > B_peak_5_sig).sum(),
           'n_3_sig_eclipse_bins'     : (df_lc['B_eclipse_log'] > B_eclipse_3_sig).sum(),
           'n_3_sig_eclipse_bins_bti' : (df_lc[df_lc['bti'] == 1]['B_eclipse_log'] > B_eclipse_3_sig).sum(),
           'n_3_sig_eclipse_bins_gti' : (df_lc[df_lc['bti'] == 0]['B_eclipse_log'] > B_eclipse_3_sig).sum(),
           'n_5_sig_eclipse_bins'     : (df_lc['B_eclipse_log'] > B_eclipse_5_sig).sum(),
           'n_5_sig_eclipse_bins_bti' : (df_lc[df_lc['bti'] == 1]['B_eclipse_log'] > B_eclipse_5_sig).sum(),
           'n_5_sig_eclipse_bins_gti' : (df_lc[df_lc['bti'] == 0]['B_eclipse_log'] > B_eclipse_5_sig).sum()}
    return res

`extract_lc_features(clobber=True)`

Loop over all lightcurves and extract features.

Source code in exod/post_processing/extract_lc_features.py

def extract_lc_features(clobber=True):
    """Loop over all lightcurves and extract features."""
    if not clobber:
        if savepaths_combined['lc_features'].exists():
            logger.info(f'Light curve features already exist at {savepaths_combined["lc_features"]}. Skipping...')
            df_lc_features = pd.read_csv(savepaths_combined['lc_features'])
            return df_lc_features

    df_lc_indexs = pd.read_csv(savepaths_combined['lc_idx'], index_col='Unnamed: 0')
    all_res = []
    for i, r in tqdm(df_lc_indexs.iterrows(), desc="Extracting Lightcurve Features."):
        df_lc = pd.read_hdf(savepaths_combined['lc'], start=r['start'], stop=r['stop'])
        res = calc_features(df_lc, key=i)
        all_res.append(res)
    df_lc_features = pd.DataFrame(all_res)
    logger.info(f'Saving light curve features to {savepaths_combined["lc_features"]}')
    df_lc_features.to_csv(savepaths_combined['lc_features'], index=False)
    return df_lc_features

`largest_peak_info(df_lc)`

Find the largest peak in the light curve and return some information about it.

Parameters:

Name	Type	Description	Default
`df_lc`	`DataFrame`	DataFrame containing the light curve data.	required

Returns:

Name	Type	Description
`n_max_idx`	`int`	Index of the largest peak.
`n_max_last_bin`	`bool`	True if the largest peak is in the last bin.
`n_max_first_bin`	`bool`	True if the largest peak is in the first bin.
`n_max_isolated_flare`	`bool`	True if the largest peak is surrounded by zeros.

Source code in exod/post_processing/extract_lc_features.py

def largest_peak_info(df_lc):
    """
    Find the largest peak in the light curve and return some information about it.

    Parameters:
        df_lc (pd.DataFrame): DataFrame containing the light curve data.

    Returns:
        n_max_idx (int): Index of the largest peak.
        n_max_last_bin (bool): True if the largest peak is in the last bin.
        n_max_first_bin (bool): True if the largest peak is in the first bin.
        n_max_isolated_flare (bool): True if the largest peak is surrounded by zeros.
    """
    n_max_last_bin       = False
    n_max_first_bin      = False
    n_max_isolated_flare = False

    n = df_lc['n'].values
    n_max_idx = np.argmax(n)
    if n_max_idx + 1 == len(n):
        n_max_last_bin = True
        return n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare
    elif n_max_idx + 1 == -1:
        n_max_first_bin = True
        return n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare

    val_before = n[n_max_idx - 1]
    val_after = n[n_max_idx + 1]
    if (val_before == 0) & (val_after == 0):
        n_max_isolated_flare = True
        return n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare
    return n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare

`print_df_lc_feat_filter_flag_stats(df_lc_feat)`

Print a summary of the lightcurve quality flags.

Source code in exod/post_processing/extract_lc_features.py

def print_df_lc_feat_filter_flag_stats(df_lc_feat):
    """Print a summary of the lightcurve quality flags."""
    print('Lightcurve Feature Flags Statistics:')
    print('------------------------------------')
    flag_cols = ['n_max_isolated_flare', 'n_max_first_bin', 'n_max_last_bin', 'filt_tbin_5_n_l_5', 'filt_5sig', 'filt_exclude_obsid', 'filt_g_20_detections']
    df_lc_feat_5_sig = df_lc_feat[df_lc_feat['filt_5sig']]
    N = len(df_lc_feat)
    for col in flag_cols:
        num = len(df_lc_feat[df_lc_feat[col] == True])
        percentage = num / N * 100
        count_5sig = len(df_lc_feat_5_sig[df_lc_feat_5_sig[col] == True])
        total_5sig = len(df_lc_feat_5_sig)
        percentage_5sig = count_5sig / total_5sig * 100

        col_formatted = f'{col:<20}'
        num_formatted = f'{num:<5} / {N} ({percentage:.2f}%)'
        five_sig_formatted = f'5sig: {count_5sig:<5} / {total_5sig} ({percentage_5sig:.2f}%)'
        print(f'{col_formatted} : {num_formatted:<22} | {five_sig_formatted}')

    print(f'n_exluded_obsids     : {len(obsids_to_exclude):,}')
    print('------------------------------------')

`filter`

This module contains classes for filtering regions and lightcurves.

`FilterBase`

Bases: ABC

Base class for filters. All filters should inherit from this class.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the filter.	required

Methods:

Name	Description
`apply`	Apply the filter to the input dataframe.
`get_parameters`	Return a dictionary of filter-specific parameters.
`info`	Return a dictionary with information about the filter.

Source code in exod/post_processing/filter.py

class FilterBase(ABC):
    """
    Base class for filters. All filters should inherit from this class.

    Parameters:
        name (str): Name of the filter.

    Methods:
        apply(df): Apply the filter to the input dataframe.
        get_parameters(): Return a dictionary of filter-specific parameters.
        info(): Return a dictionary with information about the filter.
    """
    def __init__(self, name):
        self.name = name
        self.df = None
        self.df_filtered = None
        self.df_removed = None

    @abstractmethod
    def apply(self, df):
        """This should set self.df_regions, self.df_filtered and self.removed."""

    @abstractmethod
    def get_parameters(self):
        """This should return a dictionary of filter-specific parameters."""

    def info(self):
        info = {}
        info['name'] = self.name
        info['parameters'] = self.get_parameters()
        info['n_original'] = len(self.df)
        info['n_filtered'] = len(self.df_filtered)
        info['n_removed'] = len(self.df_removed)
        return info

    def __repr__(self):
        return f'Filter(name="{self.name}")'

`apply(df)` `abstractmethod`

This should set self.df_regions, self.df_filtered and self.removed.

Source code in exod/post_processing/filter.py

@abstractmethod
def apply(self, df):
    """This should set self.df_regions, self.df_filtered and self.removed."""

`get_parameters()` `abstractmethod`

This should return a dictionary of filter-specific parameters.

Source code in exod/post_processing/filter.py

@abstractmethod
def get_parameters(self):
    """This should return a dictionary of filter-specific parameters."""

`generate_combinations_with_one_or_none(filters)`

Generates all combinations of zero or one filter from the given list.

This function creates an iterable that yields combinations of filters, where each combination includes either no filter or exactly one filter from the provided list. The function returns an iterator that produces these combinations on demand.

Parameters:

Name	Type	Description	Default
`filters`	`list`	A list of Filters() to generate combinations from.	required

Returns:

Type	Description
	itertools.chain: An iterator yielding tuples that contain either
	no filters or one filter from the provided list. Each tuple is a
	valid combination.

Example

filters = ['filter1', 'filter2'] result = generate_combinations_with_one_or_none(filters) print(list(result))

Output: [(), ('filter1',), ('filter2',)]

Source code in exod/post_processing/filter.py

def generate_combinations_with_one_or_none(filters):
    """
    Generates all combinations of zero or one filter from the given list.

    This function creates an iterable that yields combinations of filters, 
    where each combination includes either no filter or exactly one filter 
    from the provided list. The function returns an iterator that produces 
    these combinations on demand.

    Parameters:
        filters (list): A list of Filters() to generate combinations from.

    Returns:
        itertools.chain: An iterator yielding tuples that contain either 
        no filters or one filter from the provided list. Each tuple is a 
        valid combination.

    Example:
        filters = ['filter1', 'filter2']
        result = generate_combinations_with_one_or_none(filters)
        print(list(result)) 
        # Output: [(), ('filter1',), ('filter2',)]
    """
    return chain.from_iterable(combinations(filters, r) for r in range(2))

`generate_valid_combinations(*filter_lists)`

Generates all valid combinations of filters from multiple lists, allowing at most one filter from each list.

This function takes multiple lists of filters as input and generates all possible combinations where each combination contains zero or one filter from each list. The Cartesian product of the combinations from each list is computed to create all valid combinations across multiple filter categories.

Parameters:

Name	Type	Description	Default
`*filter_lists`	`list of lists`	Variable number of filter lists. Each argument is a	`()`

Returns:

Name	Type	Description
`list`		A list of valid filter combinations. Each combination is represented as a list of filters,
		where the filters are chosen from the input lists. The result includes combinations with zero
		or one filter from each list.

Example

filters_energy = ['E1', 'E2'] filters_time = ['T1', 'T2'] result = generate_valid_combinations(filters_energy, filters_time) print(result)

Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]

Source code in exod/post_processing/filter.py

def generate_valid_combinations(*filter_lists):
    """
    Generates all valid combinations of filters from multiple lists, allowing at most one filter from each list.

    This function takes multiple lists of filters as input and generates all possible combinations 
    where each combination contains zero or one filter from each list. The Cartesian product 
    of the combinations from each list is computed to create all valid combinations across 
    multiple filter categories.

    Parameters:
        *filter_lists (list of lists): Variable number of filter lists. Each argument is a 
        list of filters representing a different filter category. The function will combine 
        filters from each list, with each combination containing at most one filter from 
        each category.

    Returns:
        list: A list of valid filter combinations. Each combination is represented as a list of filters, 
        where the filters are chosen from the input lists. The result includes combinations with zero 
        or one filter from each list.

    Example:
        filters_energy = ['E1', 'E2']
        filters_time = ['T1', 'T2']
        result = generate_valid_combinations(filters_energy, filters_time)
        print(result)
        # Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]
    """
    combinations_lists = [generate_combinations_with_one_or_none(filters) for filters in filter_lists]
    valid_combinations = product(*combinations_lists)
    valid_combinations = [list(chain.from_iterable(combo)) for combo in valid_combinations]
    return valid_combinations

`filter_subsets`

`FilterBase`

Bases: ABC

Base class for filters. All filters should inherit from this class.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the filter.	required

Methods:

Name	Description
`apply`	Apply the filter to the input dataframe.
`get_parameters`	Return a dictionary of filter-specific parameters.
`info`	Return a dictionary with information about the filter.

Source code in exod/post_processing/filter.py

class FilterBase(ABC):
    """
    Base class for filters. All filters should inherit from this class.

    Parameters:
        name (str): Name of the filter.

    Methods:
        apply(df): Apply the filter to the input dataframe.
        get_parameters(): Return a dictionary of filter-specific parameters.
        info(): Return a dictionary with information about the filter.
    """
    def __init__(self, name):
        self.name = name
        self.df = None
        self.df_filtered = None
        self.df_removed = None

    @abstractmethod
    def apply(self, df):
        """This should set self.df_regions, self.df_filtered and self.removed."""

    @abstractmethod
    def get_parameters(self):
        """This should return a dictionary of filter-specific parameters."""

    def info(self):
        info = {}
        info['name'] = self.name
        info['parameters'] = self.get_parameters()
        info['n_original'] = len(self.df)
        info['n_filtered'] = len(self.df_filtered)
        info['n_removed'] = len(self.df_removed)
        return info

    def __repr__(self):
        return f'Filter(name="{self.name}")'

`apply(df)` `abstractmethod`

This should set self.df_regions, self.df_filtered and self.removed.

Source code in exod/post_processing/filter.py

@abstractmethod
def apply(self, df):
    """This should set self.df_regions, self.df_filtered and self.removed."""

`get_parameters()` `abstractmethod`

This should return a dictionary of filter-specific parameters.

Source code in exod/post_processing/filter.py

@abstractmethod
def get_parameters(self):
    """This should return a dictionary of filter-specific parameters."""

`generate_combinations_with_one_or_none(filters)`

Generates all combinations of zero or one filter from the given list.

This function creates an iterable that yields combinations of filters, where each combination includes either no filter or exactly one filter from the provided list. The function returns an iterator that produces these combinations on demand.

Parameters:

Name	Type	Description	Default
`filters`	`list`	A list of Filters() to generate combinations from.	required

Returns:

Type	Description
	itertools.chain: An iterator yielding tuples that contain either
	no filters or one filter from the provided list. Each tuple is a
	valid combination.

Example

filters = ['filter1', 'filter2'] result = generate_combinations_with_one_or_none(filters) print(list(result))

Output: [(), ('filter1',), ('filter2',)]

Source code in exod/post_processing/filter.py

def generate_combinations_with_one_or_none(filters):
    """
    Generates all combinations of zero or one filter from the given list.

    This function creates an iterable that yields combinations of filters, 
    where each combination includes either no filter or exactly one filter 
    from the provided list. The function returns an iterator that produces 
    these combinations on demand.

    Parameters:
        filters (list): A list of Filters() to generate combinations from.

    Returns:
        itertools.chain: An iterator yielding tuples that contain either 
        no filters or one filter from the provided list. Each tuple is a 
        valid combination.

    Example:
        filters = ['filter1', 'filter2']
        result = generate_combinations_with_one_or_none(filters)
        print(list(result)) 
        # Output: [(), ('filter1',), ('filter2',)]
    """
    return chain.from_iterable(combinations(filters, r) for r in range(2))

`generate_valid_combinations(*filter_lists)`

Generates all valid combinations of filters from multiple lists, allowing at most one filter from each list.

This function takes multiple lists of filters as input and generates all possible combinations where each combination contains zero or one filter from each list. The Cartesian product of the combinations from each list is computed to create all valid combinations across multiple filter categories.

Parameters:

Name	Type	Description	Default
`*filter_lists`	`list of lists`	Variable number of filter lists. Each argument is a	`()`

Returns:

Name	Type	Description
`list`		A list of valid filter combinations. Each combination is represented as a list of filters,
		where the filters are chosen from the input lists. The result includes combinations with zero
		or one filter from each list.

Example

filters_energy = ['E1', 'E2'] filters_time = ['T1', 'T2'] result = generate_valid_combinations(filters_energy, filters_time) print(result)

Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]

Source code in exod/post_processing/filter.py

def generate_valid_combinations(*filter_lists):
    """
    Generates all valid combinations of filters from multiple lists, allowing at most one filter from each list.

    This function takes multiple lists of filters as input and generates all possible combinations 
    where each combination contains zero or one filter from each list. The Cartesian product 
    of the combinations from each list is computed to create all valid combinations across 
    multiple filter categories.

    Parameters:
        *filter_lists (list of lists): Variable number of filter lists. Each argument is a 
        list of filters representing a different filter category. The function will combine 
        filters from each list, with each combination containing at most one filter from 
        each category.

    Returns:
        list: A list of valid filter combinations. Each combination is represented as a list of filters, 
        where the filters are chosen from the input lists. The result includes combinations with zero 
        or one filter from each list.

    Example:
        filters_energy = ['E1', 'E2']
        filters_time = ['T1', 'T2']
        result = generate_valid_combinations(filters_energy, filters_time)
        print(result)
        # Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]
    """
    combinations_lists = [generate_combinations_with_one_or_none(filters) for filters in filter_lists]
    valid_combinations = product(*combinations_lists)
    valid_combinations = [list(chain.from_iterable(combo)) for combo in valid_combinations]
    return valid_combinations

`hot_regions`

`calc_hot_region_flags(df_regions, df_regions_rotated, hot_regions)`

Calculate the hot pixel flags for use on df_regions.

Source code in exod/post_processing/hot_regions.py

def calc_hot_region_flags(df_regions, df_regions_rotated, hot_regions):
    """Calculate the hot pixel flags for use on df_regions."""
    all_masks = []
    for k, v in hot_regions.items():
        # Get the mask for the rows in the table that correspond to the time binning.
        mask_t_bin = df_regions_rotated['runid'].str.contains(k)

        # Loop over each hot region and calculate the mask on df_regions_rotated that corresponds
        # to that specific hot region, append these all together in a list.
        for i, (x,y,w,h) in enumerate(v):
            mask = ((df_regions_rotated['X_EPIC'] < x+w) & (df_regions_rotated['X_EPIC'] > x)
                  & (df_regions_rotated['Y_EPIC'] < y+h) & (df_regions_rotated['Y_EPIC'] > y)
                  & mask_t_bin)

            all_masks.append(mask)
    mask_hot_pixels = np.logical_or.reduce(all_masks)

    filt_hot_pixel  = np.isin(df_regions['detid'], df_regions_rotated[mask_hot_pixels]['detid'])

    return filt_hot_pixel

`get_pointing_angle(obsid, tab_xmm_obslist)`

Get the pointing angle for a given observation ID, reads from the xmm_obslist table.

Source code in exod/post_processing/hot_regions.py

def get_pointing_angle(obsid, tab_xmm_obslist):
    """
    Get the pointing angle for a given observation ID, reads from the xmm_obslist table.
    """
    angle = tab_xmm_obslist[tab_xmm_obslist['OBS_ID'] == obsid]['PA_PNT'].value[0]
    return angle

`plot_detector_coords_soft_and_hard(df_regions_rotated)`

Plot the detector coords plots for soft and hard energies.

Source code in exod/post_processing/hot_regions.py

def plot_detector_coords_soft_and_hard(df_regions_rotated):
    """Plot the detector coords plots for soft and hard energies."""
    energy_bands = ['_2.0_12.0', '_0.2_2.0']
    for e in energy_bands:
        sub = df_regions_rotated[df_regions_rotated['runid'].str.contains(e)]
        fig, ax = plt.subplots(figsize=(7, 7))
        ax.set_title(f'Energy Range: {e} keV')
        ax.scatter(sub['X_EPIC'], sub['Y_EPIC'], s=1.0, marker='.', color='black', alpha=0.2)
        ax.set_xlim(-17500, 17500)
        ax.set_ylim(-17500, 17500)
        plt.savefig(data_plots / f'spatial_dist{e}.png')
        plt.savefig(data_plots / f'spatial_dist{e}.pdf')
        print(f'Saving to: spatial_dist{e}.png')

`plot_hot_regions(df_regions_rotated, hot_regions)`

Plot the hot regions and label them.

Source code in exod/post_processing/hot_regions.py

def plot_hot_regions(df_regions_rotated, hot_regions):
    """
    Plot the hot regions and label them.
    """
    # Plot the hot regions.
    for k, v in hot_regions.items():
        sub = df_regions_rotated[df_regions_rotated['runid'].str.contains(k)]

        fig, ax = plt.subplots(figsize=(7, 7))
        ax.scatter(sub['X_EPIC'], sub['Y_EPIC'], s=1.0, marker='.', color='black', alpha=0.2)
        for i, (x, y, w, h) in enumerate(v):
            mask = (sub['X_EPIC'] < x + w) & (sub['X_EPIC'] > x) & (sub['Y_EPIC'] < y + h) & (sub['Y_EPIC'] > y)
            sub2 = sub[mask]
            n_reg = len(sub2)
            n_obs = len(sub2['obsid'].unique())

            ax.scatter(sub2['X_EPIC'], sub2['Y_EPIC'], s=2, alpha=1.0, color='red', label=f'{i} : $N_{{reg}}$={n_reg} $N_{{obs}}$={n_obs}')
            offset = 1500
            ax.text(x - offset, y + offset, s=f'{i}', color='black', fontsize=12,
                    bbox=dict(facecolor='white', edgecolor='black', boxstyle='round,pad=0.1'))
        ax.legend(ncol=3, bbox_to_anchor=(1, 0, 0, 1))
        ax.set_xlim(-17500, 17500)
        ax.set_ylim(-17500, 17500)
        plt.savefig(data_plots / f'hot_regions{k}s.png')
        plt.savefig(data_plots / f'hot_regions{k}s.pdf')
        print(f'Saving to: hot_regions{k}s.png')

`main`

`check_results_shape()`

Check if the length of the results are consistent.

Source code in exod/post_processing/main.py

def check_results_shape():
    """Check if the length of the results are consistent."""
    print('Checking results shape...')
    print('-------------------------')
    shapes = {}
    for name, path in savepaths_combined.items():
        if name == 'lc':
            continue
        if 'fits' in path.name:
            continue
        if 'alerts' in path.name:
            continue
        print(name, path)
        df = pd.read_csv(path)
        shape = df.shape
        shapes[name] = shape
        print(f'{name:<15}: {shape}')

    assert shapes['regions_unique'][0] == shapes['cmatch_simbad'][0]
    assert shapes['regions_unique'][0] == shapes['cmatch_gaia'][0]
    assert shapes['regions_unique'][0] == shapes['cmatch_om'][0]
    assert shapes['regions_unique'][0] == shapes['cmatch_dr14'][0]
    print('-------------------------\n\n')

`make_exod_cat`

`read_and_print_table_and_header(savepath)`

Used to checking the file at the end.

Source code in exod/post_processing/make_exod_cat.py

def read_and_print_table_and_header(savepath):
    """Used to checking the file at the end."""
    with fits.open(savepath) as hdul:
        hdul.info()
        hdr = hdul[1].header
        print(repr(hdr))
    tab = Table.read(savepath)
    tab.pprint(max_width=-1)

`results_manager`

`create_iau_srcids(ra_deg, dec_deg, ra_precision=1, dec_precision=0)`

Create Source Identifiers from coordinates in degrees (IAU name).

Source code in exod/post_processing/results_manager.py

def create_iau_srcids(ra_deg, dec_deg, ra_precision=1, dec_precision=0):
    """Create Source Identifiers from coordinates in degrees (IAU name)."""
    sc = SkyCoord(ra=ra_deg, dec=dec_deg, unit='deg')
    srcids = []
    for s in sc:
        ra   = s.ra.to_string(unit=u.hour, sep='', precision=ra_precision, pad=True)
        dec  = s.dec.to_string(sep='', precision=dec_precision, alwayssign=True, pad=True)
        name = f"EXOD J{ra}{dec}"
        srcids.append(name)
    return srcids

`util`

`calc_detid_column(df_regions)`

Calculate the detection id, defined as {runid}_{label}

Source code in exod/post_processing/util.py

def calc_detid_column(df_regions):
    """Calculate the detection id, defined as {runid}_{label}"""
    df_regions['detid'] = df_regions['runid'] + '_' + df_regions['label'].astype(str)
    return df_regions

`get_lc(key, df_lc_idx)`

label : ('0761070101_0_5_0.2_12.0', '0')

Source code in exod/post_processing/util.py

def get_lc(key, df_lc_idx):
    """ label : ('0761070101_0_5_0.2_12.0', '0')"""
    key = str(key)
    start, stop = df_lc_idx.loc[key]
    df_lc = pd.read_hdf(savepaths_combined['lc'], start=start, stop=stop)
    df_lc['t0'] = df_lc['time'] - df_lc['time'].min()
    return df_lc

`pre_processing`

`download_observations`

`download_observation_events(obsid, clobber=False)`

Download the post-processed event lists for PN, M1 and M2.

Source code in exod/pre_processing/download_observations.py

def download_observation_events(obsid, clobber=False):
    """
    Download the post-processed event lists for PN, M1 and M2.
    """
    logger.info(f'Downloading observations for obsid={obsid}')
    url_PN = f'http://nxsa.esac.esa.int/nxsa-sl/servlet/data-action-aio?obsno={obsid}&instname=PN&level=PPS&name=PIEVLI'
    url_M1 = f'http://nxsa.esac.esa.int/nxsa-sl/servlet/data-action-aio?obsno={obsid}&instname=M1&level=PPS&name=MIEVLI'
    url_M2 = f'http://nxsa.esac.esa.int/nxsa-sl/servlet/data-action-aio?obsno={obsid}&instname=M2&level=PPS&name=MIEVLI'
    url_src= f'http://nxsa.esac.esa.int/nxsa-sl/servlet/data-action-aio?obsno={obsid}&name=OBSMLI&level=PPS'

    urls = {'PN' : url_PN,
            'M1' : url_M1,
            'M2' : url_M2,
            'src': url_src}

    globs = {'PN' : '*PN*FTZ',
             'M1' : '*M1*FTZ',
             'M2' : '*M2*FTZ',
             'src': '*EP*OBSMLI*FTZ'}

    for inst, download_url in urls.items():
        # Create the folder to save to
        obs_path = path.data_raw / f'{obsid}'

        os.makedirs(obs_path, exist_ok=True)
        if not clobber and len(list(obs_path.glob(globs[inst]))) > 0: # Dirty check for fits files
            logger.info(f'{globs[inst]} found, skipping...')
            continue

        response = requests.get(download_url)
        logger.info(response)
        if response.status_code == 200:
            # Get the filename from the response header
            cd = response.headers.get('content-disposition')
            filename = cd.split('filename=')[1].strip('";')
            file_path = obs_path / f'{filename}'
            if file_path.is_file() and not clobber:
                logger.info(f'Skipping {file_path}. File already exists.')
            else:
                logger.info(f'Response 200, downloading to {file_path}')
                with open(file_path, 'wb') as file:
                    file.write(response.content)
                logger.info(f'Downloaded: {file_path}')

                # Deal with GUEST.tar files (these show up if you have multiple eventlists in an observation)
                if 'GUEST' in filename:
                    logger.info(f'GUEST tar file found! Extracting to current dir!')
                    cmd = f'tar -xvf {file_path} -C {obs_path} --strip-components=2'
                    logger.info(f'Executing: {cmd}')
                    subprocess.run(cmd, shell=True)
        else:
            logger.warning(f'Failed to download event files for: {obsid} {inst}')

`event_filtering`

Functions to filter events files and create images from the raw data.

Requires having pre-run the sas task 'setsas' and the enviroment variable 'export CCFPATH=' in the terminal.

`espfilt(eventfile)`

https://xmm-tools.cosmos.esa.int/external/sas/current/doc/espfilt/espfilt.html https://xmm-tools.cosmos.esa.int/external/sas/current/doc/espfilt/node9.html

Source code in exod/pre_processing/event_filtering.py

def espfilt(eventfile):
    """
    https://xmm-tools.cosmos.esa.int/external/sas/current/doc/espfilt/espfilt.html
    https://xmm-tools.cosmos.esa.int/external/sas/current/doc/espfilt/node9.html
    """
    cmd = f'espfilt eventfile={eventfile}'
    run_cmd(cmd)

`processing`

`background_inference`

`calc_background_template(image_sub, image_mask_source)`

Calculate the background template for a given image_sub.

This is done by the following

Remove the sources from the image.
Calculate the number of counts in the background.
Inpaint the holes where the sources were.
Divide the image by the total counts in the background.

We then blur the image and inpaint again, this is to deal with issues where if large sources were removed we can fill them in, we should probably only do this if we need to.

Parameters:

Name	Type	Description	Default
`image_sub`	`ndarray`	Summed image of BTI or GTI frames.	required
`image_mask_source`	`ndarray`	Mask of the sources.	required

Returns:

Name	Type	Description
`image_sub_background_template`	`ndarray`	The background template for the image subset.
`count_sub_outside_sources`	`float`	The number of counts outside the sources (total counts in background)

Source code in exod/processing/background_inference.py

def calc_background_template(image_sub, image_mask_source):
    """
    Calculate the background template for a given image_sub.

    This is done by the following:
        1. Remove the sources from the image.
        2. Calculate the number of counts in the background.
        3. Inpaint the holes where the sources were.
        4. Divide the image by the total counts in the background.

    We then blur the image and inpaint again, this is to deal with issues where if large
    sources were removed we can fill them in, we should probably only do this if we need to.

    Parameters:
        image_sub (np.ndarray): Summed image of BTI or GTI frames.
        image_mask_source (np.ndarray): Mask of the sources.

    Returns:
        image_sub_background_template (np.ndarray): The background template for the image subset.
        count_sub_outside_sources (float): The number of counts outside the sources (total counts in background)
    """
    sigma_blurring = 0.5
    inpaint_method = INPAINT_NS # INPAINT_TELEA

    image_sub_no_source                       = inpaint(image_sub.astype(np.float32), image_mask_source.astype(np.uint8), inpaintRadius=2, flags=inpaint_method)
    count_sub_outside_sources                 = np.nansum(image_sub[~image_mask_source])
    image_sub_no_source_template              = image_sub_no_source / count_sub_outside_sources
    image_sub_no_source_template_blur         = convolve(image_sub_no_source_template, Gaussian2DKernel(sigma_blurring))
    image_mask_missing_pixels                 = (image_sub > 0) & np.isnan(image_sub_no_source_template_blur)
    image_sub_no_source_template_blur_inpaint = inpaint(image_sub_no_source_template_blur.astype(np.float32), image_mask_missing_pixels.astype(np.uint8), inpaintRadius=2, flags=inpaint_method)
    image_sub_background_template             = np.where(image_sub > 0, image_sub_no_source_template_blur_inpaint, np.nan)

    compare_images(images=[image_sub, image_sub_no_source],
                   titles=['image_sub', 'image_sub_no_source'],
                   log=False)

    compare_images(images=[image_sub_no_source_template,
                           image_sub_no_source_template_blur,
                           image_sub_no_source_template_blur_inpaint,
                           image_sub_background_template],
                   titles=['image_sub_no_source_template',
                           'image_sub_no_source_template_blur',
                           'image_sub_no_source_template_blur_inpaint',
                           'image_sub_background_template'],
                   log=False)
    return image_sub_background_template, count_sub_outside_sources

`calc_cube_mu(data_cube, wcs)`

Calculates an expectation (mu) data cube.

Any departure from this cube corresponds to variability.

The background is dealt with by assuming that all GTIs and BTIs follow respective templates (i.e., once each frame is divided by its total counts, they all look the same).

The sources are dealt with assuming they are constant. We take their net emission, and distribute it evenly across all frames.

Parameters:

Name	Type	Description	Default
`data_cube`	`DataCube`	DataCube object.	required
`wcs`	`WCS`	wcs object.	required

Returns: cube_mu (np.ndarray): The expectation (mu) data cube.

Source code in exod/processing/background_inference.py

def calc_cube_mu(data_cube, wcs):
    """
    Calculates an expectation (mu) data cube.

    Any departure from this cube corresponds to variability.

    The background is dealt with by assuming that all GTIs and BTIs follow
    respective templates (i.e., once each frame is divided by its total counts,
    they all look the same).

    The sources are dealt with assuming they are constant.
    We take their net emission, and distribute it evenly across all frames.

    Parameters:
        data_cube (DataCube): DataCube object.
        wcs (astropy.wcs.WCS): wcs object.
    Returns:
        cube_mu (np.ndarray): The expectation (mu) data cube.
    """
    cube_n = data_cube.data
    bti_bin_idx = data_cube.bti_bin_idx
    gti_bin_idx = data_cube.gti_bin_idx
    n_gti_bin = data_cube.n_gti_bin
    n_bti_bin = data_cube.n_bti_bin
    cube_gti = cube_n[:, :, gti_bin_idx]
    cube_bti = cube_n[:, :, bti_bin_idx]

    # Get the summed Images.
    image_total = np.nansum(cube_n, axis=2)
    image_gti = np.nansum(cube_gti, axis=2)

    # Create the source masks.
    # Two source masks are calculated then combined, the first comes from the pipeline detected
    # sources in the OBSMLI file. The second mask takes the remaining image and masks pixels that are
    # above 3 x 75% of the total image. These two masks are then combined using an OR statement.
    # This works well in most cases but often struggles when there are extended sources.
    image_mask_source_list = mask_known_sources(data_cube, wcs=wcs)  # Image mask from OBSMLI file.
    # source_threshold = np.nanpercentile(image_gti.flatten(), 99) # This or from detected sources
    source_threshold = 3 * np.nanpercentile(image_gti[~image_mask_source_list], q=75)
    image_mask_source_percentile = image_gti > source_threshold
    image_mask_source = image_mask_source_list | image_mask_source_percentile

    if n_gti_bin:
        logger.info('Calculating gti template...')
        image_gti = np.nansum(cube_gti, axis=2)
        image_gti_background_template, count_gti_outside_sources = calc_background_template(image_sub=image_gti, image_mask_source=image_mask_source)

    if n_bti_bin:
        logger.info('Calculating bti template...')
        image_bti = np.nansum(cube_bti, axis=2) # Image of all btis combined, with sources
        image_bti_background_template, count_bti_outside_sources = calc_background_template(image_sub=image_bti, image_mask_source=image_mask_source)

    # image_bti_no_source_template_blur=image_bti_no_source_template

    # Obtain the Image with the mean source contribution (zero everywhere that is not a source)
    if n_gti_bin > n_bti_bin:
        logger.info('Calculating source contribution using gtis')
        image_source_template = calc_source_template(image_gti, image_gti_background_template, image_mask_source,
                                                     data_cube, count_gti_outside_sources, gti_bin_idx)
    else:
        logger.info('Calculating source contribution using btis')
        image_source_template = calc_source_template(image_bti, image_bti_background_template, image_mask_source,
                                                     data_cube, count_bti_outside_sources, bti_bin_idx)

    # Get data cube outside the sources to obtain the background light curve.
    cube_mask_source       = np.repeat(image_mask_source[:, :, np.newaxis], cube_n.shape[2], axis=2)
    cube_n_outside_sources = np.where(cube_mask_source, np.nan, cube_n)
    lc_outside_sources     = np.nansum(cube_n_outside_sources, axis=(0, 1)) # Essentially the background light curve.


    logger.info('Creating expected (mu) cube...')
    # Create the expectation cube by combining the source and background templates.
    cube_mu = np.empty(cube_n.shape)
    if n_gti_bin:
        cube_mu[:,:,gti_bin_idx] = image_gti_background_template[:,:,np.newaxis] * lc_outside_sources[gti_bin_idx]
    if n_bti_bin:
        cube_mu[:,:,bti_bin_idx] = image_bti_background_template[:,:,np.newaxis] * lc_outside_sources[bti_bin_idx]
    cube_mu += image_source_template[:,:,np.newaxis] * data_cube.relative_frame_exposures
    cube_mu = np.where(np.nansum(cube_n, axis=(0,1)) > 0, cube_mu, np.nan)
    logger.info('Expected cube created!')
    return cube_mu

`calc_source_template(image_sub, image_sub_background_template, image_mask_source, data_cube, count_sub_outside_sources, subset_bin_idx)`

Calculate the average contribution from the sources in the field of the observation.

This is done by masking out the background and dividing by the effective exposed frames. This essentially gives the image of the average contribution of the sources in each frame.

Parameters:

Name	Type	Description	Default
`image_sub`	`ndarray`	image of GTI or BTIs obtained by summing the frames.	required
`image_sub_background_template`	`ndarray`	The template for the BTI or GTI obtained via calc_background_template.	required
`image_mask_source`	`ndarray`	Mask of the sources.	required
`data_cube`	`DataCube`	DataCube object.	required
`count_sub_outside_sources`	`float`	The number of counts outside the sources (total counts in background).	required
`subset_bin_idx`	`ndarray`	The bins corresponding to the subset (either the GTI or BTIs).	required

Returns:

Name	Type	Description
`image_source_template`	`ndarray`	The average contribution of the sources in each frame.

Source code in exod/processing/background_inference.py

def calc_source_template(image_sub, image_sub_background_template, image_mask_source, data_cube,
                         count_sub_outside_sources, subset_bin_idx):
    """
    Calculate the average contribution from the sources in the field of the observation.

    This is done by masking out the background and dividing by the effective exposed frames.
    This essentially gives the image of the average contribution of the sources in each frame.

    Parameters:
        image_sub (np.ndarray): image of GTI or BTIs obtained by summing the frames.
        image_sub_background_template (np.ndarray): The template for the BTI or GTI obtained via calc_background_template.
        image_mask_source (np.ndarray): Mask of the sources.
        data_cube (DataCube): DataCube object.
        count_sub_outside_sources (float): The number of counts outside the sources (total counts in background).
        subset_bin_idx (np.ndarray): The bins corresponding to the subset (either the GTI or BTIs).

    Returns:
        image_source_template (np.ndarray): The average contribution of the sources in each frame.
    """
    image_sub_source_only1   = image_sub - image_sub_background_template * count_sub_outside_sources
    image_sub_source_only2   = np.where(image_mask_source, image_sub_source_only1, 0)           # Replace everything that is not a source with 0
    image_sub_source_only3   = np.where(image_sub_source_only2 > 0, image_sub_source_only2, 0)  # Replace negative values with 0
    effective_exposed_frames = np.sum(data_cube.relative_frame_exposures[subset_bin_idx])       # Analagous to n_gti_bin if relative exposures = 1
    image_source_template    = image_sub_source_only3 / effective_exposed_frames                # Average value of the source contribution per frame.

    compare_images(images=[image_sub_source_only1, image_sub_source_only2, image_sub_source_only3, image_source_template],
                   titles=['image_sub_source_only1', 'image_sub_source_only2', 'image_sub_source_only3', 'image_source_template'],
                   log=False)
    return image_source_template

`mask_known_sources(data_cube, wcs)`

Mask known sources from the data cube.

Parameters:

Name	Type	Description	Default
`data_cube`	`DataCube`	DataCube object.	required
`wcs`	`WCS`	wcs object.	required

Returns:

Name	Type	Description
`image_mask`	`ndarray`	The mask of the sources.

Source code in exod/processing/background_inference.py

def mask_known_sources(data_cube, wcs):
    """
    Mask known sources from the data cube.

    Parameters:
        data_cube (DataCube): DataCube object.
        wcs (astropy.wcs.WCS): wcs object.

    Returns:
        image_mask (np.ndarray): The mask of the sources.
    """
    def cropping_radius_counts(epic_total_rate, size_arcsec):
        """
        Get cropping radius for a given EPIC count rate.
        0.0 < RATE < 0.1 = 20
        0.1 < RATE < 1.0 = 40
        RATE > 1.0 = 80
        """
        if epic_total_rate > 1:
            return 80
        elif epic_total_rate > 0.1:
            return 40
        else:
            return size_arcsec

    observation = Observation(data_cube.event_list.obsid)
    observation.get_source_list()
    obsmli_file_path = observation.source_list
    logger.info(f'OBSMLI file: {obsmli_file_path}')
    tab_src = Table(fits.open(obsmli_file_path)[1].data)

    # We include bright (i.e. large DET_ML) point sources & extended sources
    tab_src_point    = tab_src[(tab_src['EP_DET_ML'] > 8) & (tab_src['EP_EXTENT'] == 0)]
    tab_src_extended = tab_src[(tab_src['EP_DET_ML'] > 8) & (tab_src['EP_EXTENT'] > 0)]

    radius_point_sources    = [cropping_radius_counts(counts, data_cube.size_arcsec) for counts in tab_src_point['EP_TOT']]
    radius_extended_sources = tab_src_extended['EP_EXTENT']

    logger.info(f'Total rows: {len(tab_src)} Point sources: {len(tab_src_point)} Extended Sources: {len(tab_src_extended)}')

    image_mask = np.full(data_cube.shape[:2], fill_value=False)

    for tab_data, tab_radius, color in zip((tab_src_point, tab_src_extended),
                                           (radius_point_sources, radius_extended_sources),
                                           ('r', 'y')):
        if len(tab_data) == 0:
            logger.info('No rows in tab_data, skipping...')
            continue

        # Create skycoord
        sc = SkyCoord(ra=tab_data['RA'], dec=tab_data['DEC'], unit='deg')
        x_img, y_img = wcs.world_to_pixel(sc)

        # Convert to Sky Coordinates
        X = x_img * 80
        Y = y_img * 80

        # Remove values outside the cube
        xcube_max, xcube_min = data_cube.bin_x[-1], data_cube.bin_x[0]
        ycube_max, ycube_min = data_cube.bin_y[-1], data_cube.bin_y[0]
        XY = np.array([[x, y] for x, y in zip(X, Y) if ((x < xcube_max) and
                                                        (y < ycube_max) and
                                                        (x > xcube_min) and
                                                        (y > ycube_min))]).T
        X = XY[0]
        Y = XY[1]

        # Interpolate to Cube coordinates
        interp_x_cube = interp1d(x=data_cube.bin_x, y=range(data_cube.shape[0]))
        interp_y_cube = interp1d(x=data_cube.bin_y, y=range(data_cube.shape[1]))
        x_cube = interp_x_cube(X)
        y_cube = interp_y_cube(Y)

        for x, y, rad in zip(x_cube, y_cube, tab_radius):
            radius_image = rad / data_cube.size_arcsec
            rr, cc = disk(center=(x, y), radius=radius_image)
            kept_pixels = (rr>0) & (rr<data_cube.shape[0]-1) & (cc>0) & (cc<data_cube.shape[1]-1) # Keep mask pixels only if in image
            rr, cc = rr[kept_pixels], cc[kept_pixels]
            image_mask[rr, cc] = True

    #         im[rr,cc] = np.nan
    #         circle = plt.Circle((x, y), radius_image, color=color, fill=False)
    #         plt.gca().add_patch(circle)

    # plt.imshow(im.T, origin='lower', norm=LogNorm(vmax=1e3), interpolation='none')
    # plt.show()
    # for x, y in zip(x_cube, y_cube):
    #     im[x - 1:x + 1, y - 1:y + 1] = 0

    # plt.figure(figsize=(10, 10))
    # plt.imshow(im.T, origin='lower', norm=LogNorm(), interpolation='none')
    return image_mask

`bayesian_computations`

`PrecomputeBayesLimits`

Source code in exod/processing/bayesian_computations.py

class PrecomputeBayesLimits:
    _instances = {}

    def __new__(cls, threshold_sigma):
        """
        Creates or retrieves an instance of PrecomputeBayesLimits with a specific threshold.

        This method implements a multiton pattern, where only one instance of
        PrecomputeBayesLimits exists per unique value of `threshold_sigma`.

        When a new threshold value is requested, a new instance is created and stored in
        the `_instances` dictionary, keyed by `threshold_sigma`. If an instance with the same `threshold_sigma`
        already exists, it is returned instead of creating a new one. https://en.wikipedia.org/wiki/Multiton_pattern

        Args:
            threshold_sigma (int): The significance threshold for the instance.

        Returns:
            PrecomputeBayesLimits: The existing or newly created instance with  the specified `threshold_sigma`.
        """
        if threshold_sigma not in cls._instances:
            instance = super(PrecomputeBayesLimits, cls).__new__(cls)
            instance.threshold_sigma = threshold_sigma
            logger.warning(f'Creating new PrecomputeBayesLimits() instance threshold_sigma={threshold_sigma}.')
            cls._instances[threshold_sigma] = instance
        return cls._instances[threshold_sigma]

    def __init__(self, threshold_sigma):
        self.threshold_sigma = threshold_sigma
        self.get_savepath()
        self.range_mu = None
        self.n_peak_threshold = None
        self.n_eclipse_threshold = None
        self.is_loaded = False
        self.load()

    def __repr__(self):
        return f'PrecomputeBayesLimits(threshold_sigma={self.threshold_sigma})'

    def get_savepath(self):
        self.savepath = path.data_util / f'bayesfactorlimits_{self.threshold_sigma}.txt'
        if not self.savepath.exists():
            logger.info(f'{self.savepath} does not exist. Precomputing Bayes Factors...')
            precompute_bayes_limits(threshold_sigma=self.threshold_sigma)
        return self.savepath

    def load(self):
        if self.is_loaded:
            return None
        range_mu, n_peak_threshold, n_eclipse_threshold = load_precomputed_bayes_limits(threshold_sigma=self.threshold_sigma)
        self.range_mu = range_mu
        self.n_peak_threshold = n_peak_threshold
        self.n_eclipse_threshold = n_eclipse_threshold
        self.is_loaded = True

    def get_cube_masks_peak_and_eclipse(self, cube_n, cube_mu):
        cube_mu = np.where(cube_mu > self.range_mu[0], cube_mu, np.nan)  # Remove small expectation values outside of interpolation range
        cube_mask_peaks   = cube_n > self.n_peak_threshold(cube_mu)
        cube_mask_eclipse = cube_n < self.n_eclipse_threshold(cube_mu)
        return cube_mask_peaks, cube_mask_eclipse

`new(threshold_sigma)`

Creates or retrieves an instance of PrecomputeBayesLimits with a specific threshold.

This method implements a multiton pattern, where only one instance of PrecomputeBayesLimits exists per unique value of threshold_sigma.

When a new threshold value is requested, a new instance is created and stored in the _instances dictionary, keyed by threshold_sigma. If an instance with the same threshold_sigma already exists, it is returned instead of creating a new one. https://en.wikipedia.org/wiki/Multiton_pattern

Parameters:

Name	Type	Description	Default
`threshold_sigma`	`int`	The significance threshold for the instance.	required

Returns:

Name	Type	Description
`PrecomputeBayesLimits`		The existing or newly created instance with the specified `threshold_sigma`.

Source code in exod/processing/bayesian_computations.py

def __new__(cls, threshold_sigma):
    """
    Creates or retrieves an instance of PrecomputeBayesLimits with a specific threshold.

    This method implements a multiton pattern, where only one instance of
    PrecomputeBayesLimits exists per unique value of `threshold_sigma`.

    When a new threshold value is requested, a new instance is created and stored in
    the `_instances` dictionary, keyed by `threshold_sigma`. If an instance with the same `threshold_sigma`
    already exists, it is returned instead of creating a new one. https://en.wikipedia.org/wiki/Multiton_pattern

    Args:
        threshold_sigma (int): The significance threshold for the instance.

    Returns:
        PrecomputeBayesLimits: The existing or newly created instance with  the specified `threshold_sigma`.
    """
    if threshold_sigma not in cls._instances:
        instance = super(PrecomputeBayesLimits, cls).__new__(cls)
        instance.threshold_sigma = threshold_sigma
        logger.warning(f'Creating new PrecomputeBayesLimits() instance threshold_sigma={threshold_sigma}.')
        cls._instances[threshold_sigma] = instance
    return cls._instances[threshold_sigma]

`B_eclipse(n, mu)`

Computes the Bayes factors of the presence of an eclipse, given the data_cube (mu) and observed (n) counts

Source code in exod/processing/bayesian_computations.py

def B_eclipse(n, mu):
    """Computes the Bayes factors of the presence of an eclipse, given the data_cube (mu) and observed (n) counts"""
    return gammainc(n + 1, mu) / poisson.pmf(n, mu)

`B_eclipse_log(n, mu)`

Computes the Bayes factors of the presence of an eclipse, given the data_cube (mu) and observed (n) counts

Source code in exod/processing/bayesian_computations.py

def B_eclipse_log(n, mu):
    """Computes the Bayes factors of the presence of an eclipse, given the data_cube (mu) and observed (n) counts"""
    return np.log10(gammainc(n + 1, mu)) - np.log10(poisson.pmf(n, mu))

`B_peak(n, mu)`

Computes the Bayes factors of the presence of a peak, given the data_cube (mu) and observed (n) counts.

Source code in exod/processing/bayesian_computations.py

def B_peak(n, mu):
    """Computes the Bayes factors of the presence of a peak, given the data_cube (mu) and observed (n) counts."""
    return gammaincc(n + 1, mu) / poisson.pmf(n, mu)

`B_peak_log(n, mu)`

Computes the Bayes factors of the presence of a peak, given the data_cube (mu) and observed (n) counts.

Source code in exod/processing/bayesian_computations.py

def B_peak_log(n, mu):
    """Computes the Bayes factors of the presence of a peak, given the data_cube (mu) and observed (n) counts."""
    return np.log10(gammaincc(n + 1, mu)) - np.log10(poisson.pmf(n, mu))

`get_bayes_thresholds(threshold_sigma)`

The thresholds for B_peak and B_eclipse are calculated here for 3 and 5 sigma.

This is sort of a hack, and is done by finding the value of B(n,mu) that is equal to a given significance (sigma) under the Gaussian assumption at mu=1000.

For example, we want a significance level of sigma = 3. We first find what observed (n) value we need to get a 3 sigma peak Gaussian assumption. n_peak_large_mu(mu=1000, sigma=3) = 1139

Next, we find the value of B_peak that an observed (n) value of 1139 would give us. B_peak(n=1139, mu=1000) = 872908 (5.94 in log10)

We treat this value of B as being "Equivalent" to a 3 sigma detection and subsequently can specify: B > B_threshold_peak for a peak detection B < B_eclipse_threshold for an eclipse detection

Source code in exod/processing/bayesian_computations.py

def get_bayes_thresholds(threshold_sigma):
    """
    The thresholds for B_peak and B_eclipse are calculated here for 3 and 5 sigma.

    This is sort of a hack, and is done by finding the value of B(n,mu) that is equal to
    a given significance (sigma) under the Gaussian assumption at mu=1000.

    For example, we want a significance level of sigma = 3.
    We first find what observed (n) value we need to get a 3 sigma peak Gaussian assumption.
        n_peak_large_mu(mu=1000, sigma=3) = 1139

    Next, we find the value of B_peak that an observed (n) value of 1139 would give us.
        B_peak(n=1139, mu=1000) = 872908  (5.94 in log10)

    We treat this value of B as being "Equivalent" to a 3 sigma detection and subsequently can specify:
        B > B_threshold_peak for a peak detection
        B < B_eclipse_threshold for an eclipse detection
    """
    B_peak_threshold    = B_peak_log(n=n_peak_large_mu(mu=1000, sigma=threshold_sigma), mu=1000)
    B_eclipse_threshold = B_eclipse_log(n=n_eclipse_large_mu(mu=1000, sigma=threshold_sigma), mu=1000)
    return B_peak_threshold, B_eclipse_threshold

`get_cube_masks_peak_and_eclipse(cube_n, cube_mu, threshold_sigma)`

Returns two cubes with booleans where the rate correspond to a peak or an eclipse.

Source code in exod/processing/bayesian_computations.py

def get_cube_masks_peak_and_eclipse(cube_n, cube_mu, threshold_sigma):
    """Returns two cubes with booleans where the rate correspond to a peak or an eclipse."""
    range_mu, n_peak_threshold, n_eclipse_threshold = load_precomputed_bayes_limits(threshold_sigma=threshold_sigma)
    cube_mu = np.where(cube_mu > range_mu[0], cube_mu, np.nan) # Remove small expectation values outside of interpolation range
    cube_mask_peaks   = cube_n > n_peak_threshold(cube_mu)
    cube_mask_eclipse = cube_n < n_eclipse_threshold(cube_mu)
    return cube_mask_peaks, cube_mask_eclipse

`load_precomputed_bayes1000()`

Loads & interpolates the precomputed values of Bayes factors at mu=1000

Source code in exod/processing/bayesian_computations.py

def load_precomputed_bayes1000():
    """Loads & interpolates the precomputed values of Bayes factors at mu=1000"""
    data              = np.loadtxt(path.data_util / f'bayesfactor_mu1000.txt')
    range_N           = data[0]
    B_values_peaks    = interp1d(range_N, data[1])
    B_values_eclipses = interp1d(range_N, data[2])
    return range_N, B_values_peaks, B_values_eclipses

`load_precomputed_bayes_limits(threshold_sigma)`

Loads the precomputed Bayes factor limit numbers, for a chosen threshold.

Source code in exod/processing/bayesian_computations.py

def load_precomputed_bayes_limits(threshold_sigma):
    """Loads the precomputed Bayes factor limit numbers, for a chosen threshold."""
    data = np.loadtxt(path.data_util / f'bayesfactorlimits_{threshold_sigma}.txt')
    range_mu            = data[0]
    n_peak_threshold    = interp1d(range_mu, data[1])
    n_eclipse_threshold = interp1d(range_mu, data[2])
    return range_mu, n_peak_threshold, n_eclipse_threshold

`n_eclipse_large_mu(mu, sigma)`

Calculate the observed (n) value required for an eclipse at a specific expecation (mu) and significance (sigma) in the Gaussian regime.

Source code in exod/processing/bayesian_computations.py

def n_eclipse_large_mu(mu, sigma):
    """Calculate the observed (n) value required for an eclipse at a specific expecation (mu) and significance (sigma) in the Gaussian regime."""
    return np.floor((2*mu+sigma**2-np.sqrt(8*mu*(sigma**2)+sigma**4))/2)

`n_peak_large_mu(mu, sigma)`

Calculate the observed (n) value required for a peak at a specific expectation (mu) and significance (sigma) in the Gaussian regime.

Source code in exod/processing/bayesian_computations.py

def n_peak_large_mu(mu, sigma):
    """Calculate the observed (n) value required for a peak at a specific expectation (mu) and significance (sigma) in the Gaussian regime."""
    return np.ceil((2*mu+sigma**2+np.sqrt(8*mu*(sigma**2)+sigma**4))/2)

`precompute_bayes_1000()`

Precomputes the Bayes factor at mu=1000 for a bunch of values of N. Will be interpolated to estimate the sigma

Source code in exod/processing/bayesian_computations.py

def precompute_bayes_1000():
    """Precomputes the Bayes factor at mu=1000 for a bunch of values of N. Will be interpolated to estimate the sigma"""
    range_N    = np.arange(10000)
    B_peaks    = B_peak_log(n=range_N, mu=1000)
    B_eclipses = B_eclipse_log(n=range_N, mu=1000)
    data = np.array([range_N, B_peaks, B_eclipses])
    savepath = path.data_util / f'bayesfactor_mu1000.txt'
    logger.info(f'Saving to {savepath}')
    np.savetxt(savepath, data)

`precompute_bayes_limits(threshold_sigma)`

Compute the minimum and maximum number of observed counts (n) required for an eclipse or peak for a given confidence threshold (threshold_sigma) and expectation (mu).

For counts > 1000, we use a Gaussian approximation. sigma = (N-mu) / (N+mu)^0.5

Solving for N gives: N = 2mu + sigma^2 + (8 mu sigma^2 + sigma^4)^0.5 N = 2mu + sigma^2 - (8 mu sigma^2 + sigma^4)^0.5

The resulting table looks like

i mu n_peak n_eclipse

46000 158.556483 219.0 106.0 46001 158.629519 219.0 107.0 46002 158.702589 219.0 107.0 46003 158.775693 219.0 107.0

Each data cell in the observed and data_cube cube is then compared to the values pre-calculated here to determine if is it a peak or an eclipse.

An example for 1 frame is given here.

(n_cube) (mu_cube) is_3_sig_peak? 0 2 1 0.03 1.98 0.87 F F F 0 5 0 1.01 1.10 1.00 F T F 3 2 1 2.30 2.14 0.98 F F F

The evaluation of each data cell of the cube against a threshold is made faster by precomputing the counts here.

Source code in exod/processing/bayesian_computations.py

def precompute_bayes_limits(threshold_sigma):
    """
    Compute the minimum and maximum number of observed
    counts (n) required for an eclipse or peak for a given
    confidence threshold (threshold_sigma) and expectation (mu).

    For counts > 1000, we use a Gaussian approximation.
    sigma = (N-mu) / (N+mu)^0.5

    Solving for N gives:
    N = 2mu + sigma^2 + (8 mu sigma^2 + sigma^4)^0.5
    N = 2mu + sigma^2 - (8 mu sigma^2 + sigma^4)^0.5

    The resulting table looks like:
        i          mu  n_peak  n_eclipse
    46000  158.556483   219.0      106.0
    46001  158.629519   219.0      107.0
    46002  158.702589   219.0      107.0
    46003  158.775693   219.0      107.0


    Each data cell in the observed and data_cube cube
    is then compared to the values pre-calculated here
    to determine if is it a peak or an eclipse.

    An example for 1 frame is given here.

     (n_cube)    (mu_cube)       is_3_sig_peak?
      0 2 1    0.03 1.98 0.87       F F F
      0 5 0    1.01 1.10 1.00       F T F
      3 2 1    2.30 2.14 0.98       F F F

    The evaluation of each data cell of the cube against
    a threshold is made faster by precomputing the counts here.
    """
    B_peak_threshold, B_eclipse_threshold = get_bayes_thresholds(threshold_sigma=threshold_sigma)

    range_mu       = np.geomspace(start=1e-7, stop=1e3, num=50000)
    range_mu_large = np.geomspace(start=1e3, stop=1e6, num=1000)   # Above 1000

    tab_npeak, tab_neclipse = [], []
    for mu in tqdm(range_mu):
        # Find the smallest observed (n) value for a peak
        range_n_peak = np.arange(max(10 * int(mu), 100))
        B_factors = B_peak_log(n=range_n_peak, mu=mu)
        n_peak_min = range_n_peak[B_factors > B_peak_threshold][0]
        tab_npeak.append(n_peak_min)

        # Get the largest observed (n) value for an eclipse
        range_n_eclipse = np.arange(2 * int(mu) + 1)
        B_factors = B_eclipse_log(n=range_n_eclipse, mu=mu)
        n_eclipse_max = range_n_eclipse[B_factors < B_eclipse_threshold][0]
        tab_neclipse.append(n_eclipse_max)

    tab_npeak    += list(n_peak_large_mu(range_mu_large, threshold_sigma))
    tab_neclipse += list(n_eclipse_large_mu(range_mu_large, threshold_sigma))

    range_mu = np.concatenate((range_mu, range_mu_large))

    data = np.array([range_mu, tab_npeak, tab_neclipse])
    savepath = path.data_util / f'bayesfactorlimits_{threshold_sigma}.txt'
    logger.info(f'Saving to {savepath}')
    np.savetxt(savepath, data)

`sigma_equivalent(n, mu)`

Find the equivalent sigma for a given observed (n) and expectation (mu).

For large counts (mu=1000), the required n to obtain a given sigma can be calculated assuming Gaussian statistics. This can be done using the functions: n_peak_large_mu(mu, sigma) and n_eclipse_large_mu(mu, sigma)

For small counts, we cannot use these functions, as we are in the Poisson regime, but we can calculate the Bayes factors for peaks and eclipses, using B_peak_log(n, mu) and B_eclipse_log(n, mu).

We can however for a given value of B, find out what the equivalent value of sigma would be. To do this we need to find the sigma that gives: B_peak(n, mu=1000) = B_peak(n=n_peak_large_mu(mu=1000, sigma), mu=1000)

This is done by finding the root of the function

B_peak(n, mu=1000) - B_peak(n=n_peak_large_mu(mu=1000, sigma), mu=1000) = 0

Source code in exod/processing/bayesian_computations.py

def sigma_equivalent(n, mu):
    """
    Find the equivalent sigma for a given observed (n) and expectation (mu).

    For large counts (mu=1000), the required n to obtain a given sigma can be calculated assuming Gaussian statistics.
    This can be done using the functions: n_peak_large_mu(mu, sigma) and n_eclipse_large_mu(mu, sigma)

    For small counts, we cannot use these functions, as we are in the Poisson regime, but we can calculate the
    Bayes factors for peaks and eclipses, using B_peak_log(n, mu) and B_eclipse_log(n, mu).

    We can however for a given value of B, find out what the equivalent value of sigma would be.
    To do this we need to find the sigma that gives:
        B_peak(n, mu=1000) = B_peak(n=n_peak_large_mu(mu=1000, sigma), mu=1000)

    This is done by finding the root of the function:
        B_peak(n, mu=1000) - B_peak(n=n_peak_large_mu(mu=1000, sigma), mu=1000) = 0
    """
    if n > mu:  # Peak
        b = B_peak_log(n, mu)
        function_to_invert = lambda sigma: b - B_peak_log(n_peak_large_mu(mu=1000, sigma=sigma), mu=1000)
    else:  # Eclipse
        b = B_eclipse_log(n, mu)
        function_to_invert = lambda sigma: b - B_eclipse_log(n_eclipse_large_mu(mu=1000, sigma=sigma), mu=1000)

    if function_to_invert(10) > 0:
        return 10
    elif function_to_invert(1) < 0:
        return 0
    else:
        return root_scalar(function_to_invert, bracket=(1, 10)).root

`sigma_equivalent_B_eclipse(B_eclipse)`

Find the equivalent sigma for a given Bayes factor for an eclipse. B_eclipse must be in log!

1.6 < B_eclipse < 42

0.00 < sigma < 9.94

Source code in exod/processing/bayesian_computations.py

def sigma_equivalent_B_eclipse(B_eclipse):
    """
    Find the equivalent sigma for a given Bayes factor for an eclipse. B_eclipse must be in log!

    Range: 1.6  < B_eclipse < 42
           0.00 < sigma     < 9.94
    """
    if B_eclipse < 1.6:
        return 0
    elif B_eclipse > 42:
        return 10
    f = lambda sigma: B_eclipse - B_eclipse_log(n_eclipse_large_mu(mu=1000, sigma=sigma), mu=1000)
    return root_scalar(f, bracket=(0, 10)).root

`sigma_equivalent_B_peak(B_peak)`

Find the equivalent sigma for a given Bayes factor for a peak. B_Peak must be in log!

1.61 < B_peak < 48.976

0.00 < sigma < 9.98

Source code in exod/processing/bayesian_computations.py

def sigma_equivalent_B_peak(B_peak):
    """
    Find the equivalent sigma for a given Bayes factor for a peak. B_Peak must be in log!

    Range: 1.61 < B_peak < 48.976
           0.00 < sigma  < 9.98
    """
    if B_peak< 1.61:
        return 0
    elif B_peak > 48.976:
        return 10
    f = lambda sigma: B_peak - B_peak_log(n_peak_large_mu(mu=1000, sigma=sigma), mu=1000)
    return root_scalar(f, bracket=(0, 10)).root

`bayesian_tests`

This module contains functions to test the Bayesian computations in the exod package.

`check_estimate_success()`

If I remember correctly, it is to assess if the count-rate estimation worked, independently of the value of the quiescent state mu. So, for a range of mu, I add a peak of three different amplitudes, I generate poisson realisations of mu+peak (the crosses), and see the value we can estimate for the peak (the envelopes). You have the residuals at the bottom It might be an earlier version, and not work anymore. It was just to check the success of the rate estimate function

Source code in exod/processing/bayesian_tests.py

def check_estimate_success():
    """
    If I remember correctly, it is to assess if the count-rate estimation
    worked, independently of the value of the quiescent state mu. So, for a
    range of mu, I add a peak of three different amplitudes, I generate poisson
    realisations of mu+peak (the crosses), and see the value we can estimate
    for the peak (the envelopes). You have the residuals at the bottom It might
    be an earlier version, and not work anymore. It was just to check the
    success of the rate estimate function
    """
    fig, ax = plt.subplots(2,1, sharex=True, gridspec_kw={'height_ratios' : [3, 1]})
    colors = cmr.take_cmap_colors(cmap='cmr.ocean', N=3, cmap_range=(0, 0.7))
    all_res = []

    for peak, color in zip((0.5, 2.5, 10), colors):
        tab_up, tab_mid, tab_low = [], [], []
        tab_N = []

        tab_mu = np.geomspace(start=1e-3, stop=1e2, num=100)
        for mu in tab_mu:
            N = np.random.poisson(mu+peak)
            tab_N.append(N)
            for fraction, tab in zip((0.01, 0.5, 0.99), (tab_low, tab_mid, tab_up)):
                peak_est = peak_count_estimate(fraction, N, mu)
                tab.append(peak_est)

                res = {}
                res['peak'] = peak
                res['mu'] = mu
                res['N'] = N
                res['fraction'] = fraction
                res['peak_count_estimate'] = peak_est
                all_res.append(res)

        tab_mid = np.array(tab_mid)
        tab_low = np.array(tab_low)
        tab_up  = np.array(tab_up)
        residual = np.where(tab_mid > peak,
                            (tab_mid - peak) / (tab_mid - tab_low),
                            (peak - tab_mid) / (tab_up - tab_mid))

        ax[0].fill_between(tab_mu, tab_low, tab_up, alpha=0.4, facecolor=color)
        ax[0].scatter(tab_mu, tab_N, color=color, s=10, marker='x', label=f'Peak amplitude={peak}')
        ax[0].plot(tab_mu, tab_mid, color=color)
        ax[0].axhline(y=peak, color=color, ls='--')

        ax[1].errorbar(tab_mu, residual, yerr=1, fmt='.', color=color, capsize=1.0, lw=1.0)
        ax[1].axhline(0, ls='--', color='black', lw=1.0)

        ax[0].set_xlim(0)
        ax[0].set_ylim(0)
        ax[0].set_xscale('log')
        ax[0].set_ylabel('Observed (n)')
        ax[1].set_xscale("log")
        ax[1].set_xlabel(r'Expectation ($\mu$)')
        ax[1].set_ylabel('Residual')

    plt.subplots_adjust(hspace=0)
    ax[0].legend(loc='upper left')
    plt.savefig(data_plots / 'check_estimate_success1.png')
    plt.savefig(data_plots / 'check_estimate_success1.pdf')
    df_res = pd.DataFrame(all_res)
    print(df_res)



    plt.figure(figsize=(6,6))
    for mu, color in zip((0.01, 1, 10), colors):
        tab_mid, tab_err, tab_errneg, tab_errpos=[],[],[],[]
        tab_peak = np.geomspace(1e-3,1e2,100)
        for peak in tab_peak:
        #     tabN = np.random.poisson(mu+peak, 50)
        #     tab_rates = [peak_rate_estimate(0.5,mu, N) for N in tabN]
        #     tab_mid.append(np.median(tab_rates))
        #     tab_err.append(np.std(tab_rates))
        # plt.errorbar(tab_peak,tab_mid,yerr=tab_err, color=color, fmt='o', label=f"$\mu={mu}$")
            N = np.random.poisson(mu+peak)
            tab_rates = peak_count_estimate(np.array((0.16, 0.5, 0.84)), N, mu)
            tab_mid.append(tab_rates[1])
            tab_errneg.append(tab_rates[1]-tab_rates[0])
            tab_errpos.append(tab_rates[2] - tab_rates[1])
        plt.errorbar(tab_peak, tab_mid, yerr=[tab_errneg,tab_errpos],
                     color=color, label=fr"$\mu={mu}$", lw=1.0, capsize=1.0, fmt='.', markersize=10)
    plt.loglog()
    plt.xlabel("Peak amplitude (n)")
    plt.ylabel("Estimated peak amplitude (n)")
    plt.plot(tab_peak, tab_peak, c='k')
    plt.xlim(min(tab_peak), max(tab_peak))
    plt.ylim(min(tab_peak), max(tab_peak))
    plt.legend()
    plt.savefig(data_plots / 'check_estimate_success2.png')
    plt.savefig(data_plots / 'check_estimate_success2.pdf')
    plt.show()

`plot_B_eclipse()`

Plot the eclipse Bayes factor for different observed (n) counts as a function of expecation (mu). Also plot the 3 and 5 sigma threshold values.

Source code in exod/processing/bayesian_tests.py

def plot_B_eclipse():
    """
    Plot the eclipse Bayes factor for different observed (n) counts as a function of expecation (mu).
    Also plot the 3 and 5 sigma threshold values.
    """
    n_lines_to_plot = 20 # 1 line is drawn for each value of n from 0 to n-1
    colors = plt.cm.winter(np.linspace(0, 1, n_lines_to_plot))

    B_peak_3sig, B_eclipse_3sig = get_bayes_thresholds(3)
    B_peak_5sig, B_eclipse_5sig = get_bayes_thresholds(5)

    mu_lo, mu_hi = 1e-3, 50
    mus = np.geomspace(mu_lo, mu_hi, 1000)

    plt.figure(figsize=(3.5, 3.5))
    for n in range(n_lines_to_plot):
        label = None
        if (n == 0) or (n == n_lines_to_plot-1):  # Label first and last line
            label = f'n={n}'
        plt.plot(mus, B_eclipse_log(n=n, mu=mus), color=colors[n], label=label)

    plt.axhline(B_eclipse_3sig, color='red', label=rf'3 $\sigma$ (B={B_peak_3sig:.2f})')
    plt.axhline(B_eclipse_5sig, color='black', label=rf'5 $\sigma$ (B={B_peak_5sig:.2f})')
    plt.title(f'Eclipse Bayes factor for n=0-{n_lines_to_plot}')
    plt.xlabel(r'Expected Value $\mu$')
    plt.ylabel(r'$log_{10}$($B_{eclipse}$)')
    # plt.xscale('log')
    plt.tight_layout()
    plt.xlim(mu_lo, mu_hi)
    plt.legend()
    plt.savefig(data_plots / 'B_eclipse.png')
    plt.savefig(data_plots / 'B_eclipse.pdf')
    plt.show()

`plot_B_peak()`

Plot the peak Bayes factor for different observed (n) counts as a function of expectation (mu). Also plot the 3 and 5 sigma threshold values.

Source code in exod/processing/bayesian_tests.py

def plot_B_peak():
    """
    Plot the peak Bayes factor for different observed (n) counts as a function of expectation (mu).
    Also plot the 3 and 5 sigma threshold values.
    """
    n_lines_to_plot = 20  # 1 line is drawn for each value of n from 0 to n-1
    colors = plt.cm.winter(np.linspace(start=0, stop=1, num=n_lines_to_plot))

    B_peak_3sig, B_eclipse_3sig = get_bayes_thresholds(3)
    B_peak_5sig, B_eclipse_5sig = get_bayes_thresholds(5)

    mu_lo, mu_hi = 1e-3, 50
    mus = np.geomspace(mu_lo, mu_hi, 1000)

    plt.figure(figsize=(3.5, 3.5))
    for n in range(n_lines_to_plot):
        label = None
        if (n == 0) or (n == n_lines_to_plot-1):  # Label first and last line
            label = f'n={n}'
        plt.plot(mus, B_peak_log(n=n, mu=mus), color=colors[n], label=label)

    plt.axhline(B_peak_3sig, color='red', label=rf'3 $\sigma$ (B={B_peak_3sig:.2f})')
    plt.axhline(B_peak_5sig, color='black', label=rf'5 $\sigma$ (B={B_peak_5sig:.2f})')
    plt.title(f'Peak Bayes factor for n=0-{n_lines_to_plot}')
    plt.xlabel(r'Expected Value $\mu$')
    plt.ylabel(r'$log_{10}$($B_{peak}$)')
    plt.xscale('log')
    plt.tight_layout()
    plt.ylim(0)
    plt.xlim(mu_lo, mu_hi)
    plt.legend()
    plt.savefig(data_plots / 'B_peak.png')
    plt.savefig(data_plots / 'B_peak.pdf')
    plt.show()

`bti`

Functions to calculate bad time intervals (BTI) for a given lightcurve.

BTIs are defined as time intervals where average count rate the 10.0-12.0 keV energy band exceeds a certain threshold. This threshold is usually set to 0.5 ct/s. However, when combining multiple lightcurves, the threshold can be set to 1.5 ct/s.

`get_bti(time, data, threshold)`

Get the bad time intervals for a given lightcurve.

Parameters:

Name	Type	Description	Default
`time`	`array`	Time Array.	required
`data`	`array`	Data Array.	required
`threshold`	`float`	Threshold for bad time intervals.	required

Returns:

Name	Type	Description
`bti`	`list`	[{'START':500, 'STOP':600}, {'START':700, 'STOP':800}, ...]

Source code in exod/processing/bti.py

def get_bti(time, data, threshold):
    """
    Get the bad time intervals for a given lightcurve.

    Parameters:
        time (array): Time Array.
        data (array): Data Array.
        threshold (float): Threshold for bad time intervals.

    Returns:
        bti (list): [{'START':500, 'STOP':600}, {'START':700, 'STOP':800}, ...]
    """
    mask = data > threshold  # you can flip this to get the gti instead (it works)
    if mask.all():
        logger.info('All values above threshold! Entire observation is bad :(')
        return [{'START': time[0], 'STOP': time[-1]}]
    elif (~mask).all():
        logger.info('All values below Threshold! Entire observation is good :)')
        return []

    int_mask    = mask.astype(int)
    diff        = np.diff(int_mask)
    idx_starts  = np.where(diff == 1)[0]
    idx_ends    = np.where(diff == -1)[0]
    time_starts = time[idx_starts]
    time_ends   = time[idx_ends]

    first_crossing = diff[diff != 0][0]
    last_crossing  = diff[diff != 0][-1]

    if (first_crossing, last_crossing) == (-1, 1):
        logger.info('Curve Started and Ended above threshold!')
        time_starts = np.append(time[0], time_starts)
        time_ends = np.append(time_ends, time[-1])

    elif len(idx_starts) < len(idx_ends):
        logger.info('Curve Started Above threshold! (but did not end above it)')
        time_starts = np.append(time[0], time_starts)

    elif len(idx_starts) > len(idx_ends):
        logger.info('Curve Ended Above threshold! (but did not start above it)')
        time_ends = np.append(time_ends, time[-1])

    else:
        logger.info('Curve started and ended below threshold, nothing to do.')

    assert len(time_starts) == len(time_ends)
    bti = [{'START': time_starts[i], 'STOP': time_ends[i]} for i in range(len(time_ends))]
    return bti

`get_bti_bin_idx(bti, bin_t)`

Get the indexs corresponding to the bad time intervals for an array of time windows given a set of bad time intervals.

Parameters:

Name	Type	Description	Default
`bti`	`list`	[{'START':300, 'STOP':500}, {'START':600, 'STOP':800}, ...]	required
`bin_t`	`array`	Array of time bins.	required

Returns:

Name	Type	Description
`bti_bin_idx`	`array`	array of indexs corresponding to BTIs.

Source code in exod/processing/bti.py

def get_bti_bin_idx(bti, bin_t):
    """
    Get the indexs corresponding to the bad time intervals for an array of time windows given
    a set of bad time intervals.

    Parameters:
        bti (list): [{'START':300, 'STOP':500}, {'START':600, 'STOP':800}, ...]
        bin_t (array): Array of time bins.

    Returns:
        bti_bin_idx (array): array of indexs corresponding to BTIs.
    """
    t_starts = [b['START'] for b in bti]
    t_stops  = [b['STOP'] for b in bti]
    idx_starts = np.searchsorted(bin_t, t_starts)
    idx_stops  = np.searchsorted(bin_t, t_stops) - 1

    bti_bin_idx = np.array([])
    for i in range(len(idx_starts)):
        idxs = np.arange(idx_starts[i], idx_stops[i], 1)
        bti_bin_idx = np.append(bti_bin_idx, idxs)
    bti_bin_idx = bti_bin_idx.astype(int)
    return bti_bin_idx

`get_bti_bin_idx_bool(bti_bin_idx, bin_t)`

Get the boolean array mask corresponding to if a time window was a bad time interval or not.

Parameters:

Name	Type	Description	Default
`bti_bin_idx`	`array`	[1,4,6]	required
`bin_t`	`array`	[0, 1.5, 2.0, 3.5, 5.0, 6.5, 8.0]	required

Returns:

Name	Type	Description
`bti_bin_idx_bool`	`array`	[F,T,F,F,T,F,T,F]

Source code in exod/processing/bti.py

def get_bti_bin_idx_bool(bti_bin_idx, bin_t):
    """
    Get the boolean array mask corresponding to if a time window was a bad time interval or not.

    Parameters:
        bti_bin_idx (array): [1,4,6]
        bin_t (array): [0, 1.5, 2.0, 3.5, 5.0, 6.5, 8.0]

    Returns:
        bti_bin_idx_bool (array): [F,T,F,F,T,F,T,F]
    """
    arr = np.arange(len(bin_t))
    bti_bin_idx_bool = np.isin(arr, bti_bin_idx)
    return bti_bin_idx_bool

`coordinates`

`calc_df_regions(image, image_mask)`

Return the region dataframe for a given image and mask.

Source code in exod/processing/coordinates.py

def calc_df_regions(image, image_mask):
    """Return the region dataframe for a given image and mask."""
    properties_ = ('label', 'bbox', 'centroid', 'weighted_centroid', 'intensity_mean', 'equivalent_diameter_area', 'area_bbox')
    region_dict = regionprops_table(label_image=label(image_mask), intensity_image=image, properties=properties_)
    df_region = pd.DataFrame(region_dict)
    df_region['label'] = df_region['label'] - 1 # Start labels at 0
    return df_region

`correct_sky_position(skycoord, data_cube)`

Correct the systematic offset due to the gridding.

The change we need needs to be positive in dec and negative in RA, which results in diagonal (UL/NW) position angle correction, corresponding to 360-45=315 degrees.

The angular separation is the diagonal distance from the corner to the center of a pixel thus

(2*(size_arcsec/2)^2)^0.5

Source code in exod/processing/coordinates.py

def correct_sky_position(skycoord, data_cube):
    """
    Correct the systematic offset due to the gridding.

    The change we need needs to be positive in dec and negative in RA, which results in diagonal (UL/NW)
    position angle correction, corresponding to 360-45=315 degrees.

    The angular separation is the diagonal distance from the corner to the center of a pixel thus:
        (2*(size_arcsec/2)^2)^0.5
    """
    position_angle = 315 * u.deg
    separation = np.sqrt(2 * (data_cube.size_arcsec / 2) ** 2) * u.arcsec
    sc = skycoord.directional_offset_by(position_angle=position_angle, separation=separation)
    return sc

`get_regions_sky_position(data_cube, df_regions, wcs)`

Calculate the sky position of the detected regions.

Test coords: observation : 0803990501 1313 X-1 : 03 18 20.00 -66 29 10.9 1313 X-2 : 03 18 22.00 -66 36 04.3 SN 1978K : 03 17 38.62 -66 33 03.4 NGC1313 XMM4 : 03 18 18.46 -66 30 00.2 (lil guy next to x-1)

To calculate the EPIC X and Y coordinates of the variable sources, we use the final coordinates in the variability map, which are not integers. To know to which X and Y correspond to this, we interpolate the values of X and Y on the final coordinates. We divide by 80 because the WCS from the image is binned by x80 compared to X and Y values

Source code in exod/processing/coordinates.py

def get_regions_sky_position(data_cube, df_regions, wcs):
    """
    Calculate the sky position of the detected regions.

    Test coords: observation : 0803990501
    1313 X-1     : 03 18 20.00 -66 29 10.9
    1313 X-2     : 03 18 22.00 -66 36 04.3
    SN 1978K     : 03 17 38.62 -66 33 03.4
    NGC1313 XMM4 : 03 18 18.46 -66 30 00.2 (lil guy next to x-1)

    To calculate the EPIC X and Y coordinates of the variable sources, we use the final coordinates
    in the variability map, which are not integers. To know to which X and Y correspond to this, we interpolate the
    values of X and Y on the final coordinates. We divide by 80 because the WCS from the image is binned by x80
    compared to X and Y values
    """
    img_bin_size = 80  # This is the binning of the wcs image when it was created using evselect.

    interpX = interp1d(range(len(data_cube.bin_x)), data_cube.bin_x)
    interpY = interp1d(range(len(data_cube.bin_y)), data_cube.bin_y)

    all_res = []
    for i, row in df_regions.iterrows():
        X = interpX(row['weighted_centroid-0'])
        Y = interpY(row['weighted_centroid-1'])

        x_img = X / img_bin_size
        y_img = Y / img_bin_size
        skycoord = wcs.pixel_to_world(x_img, y_img)
        skycoord = correct_sky_position(skycoord, data_cube)

        res = {'x_img'    : x_img,  # x_position in the binned fits image
               'y_img'    : y_img,  # y_position in the binned fits image
               'X'        : X,      # X in the event_list (sky coordinates)
               'Y'        : Y,      # Y in the event-list (sky coordinates)
               'ra'       : skycoord.ra.to_string(unit=u.hourangle, precision=2),
               'dec'      : skycoord.dec.to_string(unit=u.deg, precision=2),
               'ra_deg'   : skycoord.ra.value,
               'dec_deg'  : skycoord.dec.value}
        all_res.append(res)

    df_sky = pd.DataFrame(all_res)
    logger.info(f'df_sky:\n{df_sky}')
    return df_sky

`ra_dec_to_xyz(ra_deg, dec_deg)`

Convert RA and DEC to points on the unit sphere.

Parameters:

Name	Type	Description	Default
`dec_deg`	`Quantity`	Declination in degrees.	required
`ra_deg`	`Quantity`	Right Ascension in degrees.	required

Returns:

Name	Type	Description
`xyz`	`ndarray`	The xyz positions of the points as a (N, 3) array.

Source code in exod/processing/coordinates.py

def ra_dec_to_xyz(ra_deg, dec_deg):
    """
    Convert RA and DEC to points on the unit sphere.

    Parameters:
        dec_deg (astropy.units.Quantity): Declination in degrees.
        ra_deg (astropy.units.Quantity): Right Ascension in degrees.

    Returns:
        xyz (np.ndarray): The xyz positions of the points as a (N, 3) array.
    """
    ra_rad  = ra_deg.to(u.rad).value
    dec_rad = dec_deg.to(u.rad).value

    x = np.cos(dec_rad) * np.cos(ra_rad)
    y = np.cos(dec_rad) * np.sin(ra_rad)
    z = np.sin(dec_rad)
    xyz = np.vstack((x, y, z)).T
    return xyz

`rotate_XY(X, Y, angle, pivotXY=(25719, 25719))`

Rotates the positions following the pointing angle of the observation, so that the coordinates are in an EPIC frame.

Parameters:

Name	Type	Description	Default
`X`	`array`	The X sky coordinate from event list.	required
`Y`	`array`	The Y sky coordinate from event list.	required
`angle`	`float`	The pointing angle of the observation (PA_PNT).	required
`pivotXY`	`tuple`	The pivot point for the rotation.	`(25719, 25719)`

Returns:

Name	Type	Description
`X_EPIC`	`array`	The X sky coordinate in the EPIC frame.
`Y_EPIC`	`array`	The Y sky coordinate in the EPIC frame.

Source code in exod/processing/coordinates.py

def rotate_XY(X, Y, angle, pivotXY=(25719, 25719)):
    """
    Rotates the positions following the pointing angle of the observation, so that the coordinates are in an EPIC frame.

    Parameters:
        X (array): The X sky coordinate from event list.
        Y (array): The Y sky coordinate from event list.
        angle (float): The pointing angle of the observation (PA_PNT).
        pivotXY (tuple): The pivot point for the rotation.

    Returns:
        X_EPIC (array): The X sky coordinate in the EPIC frame.
        Y_EPIC (array): The Y sky coordinate in the EPIC frame.
    """
    X_EPIC = (X - pivotXY[0]) * np.cos(-angle * np.pi / 180) - (Y - pivotXY[1]) * np.sin(-angle * np.pi / 180)
    Y_EPIC = (X - pivotXY[0]) * np.sin(-angle * np.pi / 180) + (Y - pivotXY[1]) * np.cos(-angle * np.pi / 180)
    return X_EPIC, Y_EPIC

`data_cube`

`DataCube`

Class to represent a 3D data cube from XMM data.

Attributes:

Name	Type	Description
`event_list`	`EventList`	EventList object.
`size_arcsec`	`float`	size of one side of a cell in the cube in arcseconds.
`time_interval`	`float`	time interval for each frame in seconds.
`extent`	`int`	Initial Cube Size.
`pixel_size`	`float`	size of one side of a cell in the cube in pixels.
`n_bins`	`int`	Spatial Bins (x,y).
`bin_x`	`array`	Bins for the x-axis.
`bin_y`	`array`	Bins for the y-axis.
`bin_t`	`array`	Bins for the t-axis.
`n_t_bins`	`int`	Number of time bins.
`bti_bin_idx`	`array`	Indexs of the bad time intervals (BTIs)
`bti_bin_idx_bool`	`array`	Boolean Mask of the bad time intervals that can be applied to bin_t
`n_bti_bin`	`int`	Number of bad time intervals bins.
`bti_frac`	`float`	Fraction of bad time intervals.
`gti_bin_idx`	`array`	Indexs of the good time intervals (GTIs)
`gti_bin_idx_bool`	`array`	Boolean Mask of the good time intervals that can be applied to bin_t
`n_gti_bin`	`int`	Number of good time intervals bins.
`gti_frac`	`float`	Fraction of good time intervals.
`bccd_bin_idx`	`array`	Indexs of frames marked as having irregular (bad) CCD exposures.
`bccd_frac`	`float`	Fraction of time frames marked as having bad CCD exposures.
`data`	`array`	3-D numpy array containing the data.
`bbox_img`	`tuple`	Bounding box of the image (single time slice).

Source code in exod/processing/data_cube.py

class DataCube:
    """
    Class to represent a 3D data cube from XMM data.

    Attributes:
        event_list (EventList): EventList object.
        size_arcsec (float): size of one side of a cell in the cube in arcseconds.
        time_interval (float): time interval for each frame in seconds.
        extent (int): Initial Cube Size.
        pixel_size (float): size of one side of a cell in the cube in pixels.
        n_bins (int): Spatial Bins (x,y).
        bin_x (np.array): Bins for the x-axis.
        bin_y (np.array): Bins for the y-axis.
        bin_t (np.array): Bins for the t-axis.
        n_t_bins (int): Number of time bins.
        bti_bin_idx (np.array): Indexs of the bad time intervals (BTIs)
        bti_bin_idx_bool (np.array): Boolean Mask of the bad time intervals that can be applied to bin_t
        n_bti_bin (int): Number of bad time intervals bins.
        bti_frac (float): Fraction of bad time intervals.
        gti_bin_idx (np.array): Indexs of the good time intervals (GTIs)
        gti_bin_idx_bool (np.array): Boolean Mask of the good time intervals that can be applied to bin_t
        n_gti_bin (int): Number of good time intervals bins.
        gti_frac (float): Fraction of good time intervals.
        bccd_bin_idx (np.array): Indexs of frames marked as having irregular (bad) CCD exposures.
        bccd_frac (float): Fraction of time frames marked as having bad CCD exposures.
        data (np.array): 3-D numpy array containing the data.
        bbox_img (tuple): Bounding box of the image (single time slice).
    """
    def __init__(self, event_list, size_arcsec, time_interval):
        self.event_list = event_list
        self.size_arcsec = size_arcsec
        self.time_interval = time_interval
        self.extent = 51840  # Initial Cube Size
        self.pixel_size = size_arcsec / 0.05
        self.n_bins = int(self.extent / self.pixel_size) # Spatial Bins (x,y)
        self.bin_x = np.linspace(0, self.extent, self.n_bins + 1)
        self.bin_y = np.linspace(0, self.extent, self.n_bins + 1)
        self.bin_t = self.calc_time_bins()
        self.n_t_bins = len(self.bin_t) - 1

        self.bti_bin_idx = []      # index of bad time interval bins e.g. [2,4,6]
        self.bti_bin_idx_bool = [] # Mask of the bad time interval bins e.g. [False, False, True, False, True, ...]
        self.n_bti_bin = None
        self.bti_frac = None

        self.gti_bin_idx = []
        self.gti_bin_idx_bool = []
        self.n_gti_bin = None
        self.gti_frac = None

        # Used for keeping track of frames (time bins) with uneven ccd exposures.
        self.bccd_bin_idx = []
        self.bccd_bin_idx_bool = []
        self.n_bccd_bin = None
        self.bccd_frac = None

        self.data = self.bin_event_list()
        self.bbox_img = self.get_cube_bbox()
        self.crop_data_cube()
        self.relative_frame_exposures = np.ones(self.data.shape[2])
        self.shape = self.data.shape
        self.memory_mb = self.data.nbytes / (1024 ** 2)


    def __repr__(self):
        return (f"DataCube(shape={self.shape}, "
                f"total_values={np.prod(self.shape)}, "
                f"memory={self.memory_mb:.2f} MB)")

    def copy(self):
        """Return a copy of the datacube."""
        return copy.deepcopy(self)

    def calc_time_bins(self):
        t_lo = self.event_list.time_min
        t_hi = self.event_list.time_max
        t_i = self.time_interval
        n_time_bins = int((t_hi - t_lo) / t_i)
        time_stop = t_lo + n_time_bins * t_i
        time_bins = np.arange(t_lo, time_stop + 1, t_i)
        return time_bins

    def bin_event_list(self):
        data = self.event_list.data
        sample = data['X'], data['Y'], data['TIME']
        bins = [self.bin_x, self.bin_y, self.bin_t]
        cube, bin_edges, bin_number = binned_statistic_dd(sample=sample, values=None, statistic='count', bins=bins)
        cube = cube.astype(np.int32)
        return cube

    def crop_data_cube(self):
        """Crop the surrounding areas of the data_cube that are empty."""
        bbox_img = self.bbox_img

        logger.info(f'Cropping data cube between bbox_img: {bbox_img}')
        self.data = self.data[bbox_img[0]:bbox_img[1], bbox_img[2]:bbox_img[3]]

        # Calculate the new bins
        self.bin_x = self.bin_x[bbox_img[0]:bbox_img[1]]
        self.bin_y = self.bin_y[bbox_img[2]:bbox_img[3]]

    def get_cube_bbox(self):
        """Get the Bounding Box corresponding to the cube's image plane."""
        idx_nonempty = np.where(np.sum(self.data, axis=2) > 0)
        bbox_img = (np.min(idx_nonempty[0]), np.max(idx_nonempty[0]) + 1,
                    np.min(idx_nonempty[1]), np.max(idx_nonempty[1]) + 1)
        return bbox_img

    def calc_gti_bti_bins(self, bti):
        """
        Calculate the good and bad time interval indexes & masks.

        Parameters:
            bti (dict): {['START':344.2, 'STOP':454.2], ...}
        """
        self.bti_bin_idx      = get_bti_bin_idx(bti=bti, bin_t=self.bin_t)
        self.bti_bin_idx_bool = get_bti_bin_idx_bool(bti_bin_idx=self.bti_bin_idx, bin_t=self.bin_t)
        self.gti_bin_idx_bool = ~self.bti_bin_idx_bool
        self.gti_bin_idx      = np.where(self.gti_bin_idx_bool)[0][:-1]
        self.n_gti_bin = len(self.gti_bin_idx)
        self.n_bti_bin = len(self.bti_bin_idx)
        self.bti_frac = self.n_bti_bin / self.n_t_bins
        self.gti_frac = self.n_gti_bin / self.n_t_bins
        logger.info(f'n_gti = {self.n_gti_bin:<4} / {self.n_t_bins} ({self.gti_frac:.2f})')
        logger.info(f'n_bti = {self.n_bti_bin:<4} / {self.n_t_bins} ({self.bti_frac:.2f})')

    def mask_bti(self):
        logger.info('Masking bad frames from Data Cube (setting to nan)')
        img_shape = (self.shape[0], self.shape[1], 1)
        img_nan = np.full(shape=img_shape, fill_value=np.nan, dtype=np.float32)
        self.data[:, :, self.bti_bin_idx] = img_nan

    def remove_bti_frames(self):
        """Return the cube without the masked nan frames."""
        data_non_nan = self.data[:, :, ~self.bti_bin_idx_bool[:-1]]
        return data_non_nan

    def mask_frames_with_partial_ccd_exposure(self, mask_frames=True, plot=False):
        """
        Remove the frames with irregular exposures between CCDs.

        https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/pnchipgeom.html
        https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/moschipgeom.html

        Test Cases: 0165560101, 0765080801, 0116700301, 0765080801, 0872390901, 0116700301
        0201900201, 0724840301, 0743700201

        Args:
            mask_frames (bool): If True mask the frames.
            plot (bool): If True plot diagnostics.
        """
        if not mask_frames:
            self.bccd_bin_idx_bool = np.full(self.n_t_bins, fill_value=False)
            return

        all_masks = {}
        for e in self.event_list.event_lists:
            bad_ccd_mask = self.get_bad_ccd_mask(event_list=e, plot=plot)
            all_masks[e.instrument] = bad_ccd_mask

        if all_masks:
            # Combine the frame masks from all the individual event lists.
            self.bccd_bin_idx_bool = np.any(list(all_masks.values()), axis=0)
            self.bccd_bin_idx      = np.where(self.bccd_bin_idx_bool)[0]

            # Mask from the data cube and update the relative frame exposures.
            self.data = np.where(self.bccd_bin_idx_bool, np.empty(self.data.shape) * np.nan, self.data)
            self.relative_frame_exposures = np.where(self.bccd_bin_idx_bool, 0, self.relative_frame_exposures)

            # Calculate the number and fraction of bins
            self.n_bccd_bin = len(self.bccd_bin_idx)
            self.bccd_frac = self.n_bccd_bin / self.n_t_bins
            logger.info(f'n_bccd = {self.n_bccd_bin:<4} / {self.n_t_bins} ({self.bccd_frac:.2f})')

    def get_bad_ccd_mask(self, event_list, plot=False):
        """
        Get the mask for the frames corresponding with irregular CCD exposures.

        Parameters:
            event_list (EventList): EventList object.
            plot (bool): if True, then plot.

        Returns:
            bccd_bin_idx_bool (np.array): Array of bools indicating the frames with irregular CCD exposures.
        """
        ccd_bins = event_list.get_ccd_bins()

        # Get the lightcurves for each CCD.
        lcs_ccd, bin_edges, bin_number = binned_statistic_dd(sample=(event_list.data['CCDNR'], event_list.data['TIME']),
                                                             values=None,
                                                             statistic='count',
                                                             bins=[ccd_bins, self.bin_t])

        if event_list.instrument == 'EPN':
            quadrant_split         = np.split(lcs_ccd, indices_or_sections=(3, 6, 9))
            lcs_ccd                = np.sum(quadrant_split, axis=1)
            lcs_median_quadrant    = np.median(quadrant_split, axis=1)
            lc_median_quadrant_max = np.max(lcs_median_quadrant, axis=0)
            lc_median_quadrant_min = np.min(lcs_median_quadrant, axis=0)
            ccd_bins = [1, 4, 7, 10, 13]

        lcs_ccd_max = np.max(lcs_ccd, axis=0)
        # lcs_ccd_min = np.min(lcs_ccd, axis=0)
        count_active_ccd = np.sum(lcs_ccd > 0, axis=0)  # Nbr of CCDs active in each frame

        m0 = count_active_ccd == 0  # Frame is entirely empty

        if event_list.instrument == 'EPN':
            lc_min_counts = 10
            lc_max_norm_diff = 0.5
            logger.info(f'Removing PN frames lc_min_counts > {lc_min_counts} and lc_max_norm_diff > {lc_max_norm_diff}')
            lc_median_norm_diff = (lc_median_quadrant_max - lc_median_quadrant_min) / (lc_median_quadrant_max + lc_median_quadrant_min)
            m1 = lcs_ccd_max > lc_min_counts             # Frame is more than 10 counts in the brightest CCD
            m2 = self.bti_bin_idx_bool[:-1]              # Frame is a bad time index
            m3 = count_active_ccd < len(ccd_bins) - 1    # Frame is not running all CCDs
            m4 = lc_median_norm_diff > lc_max_norm_diff  # When this is >0.5 it is equivalent to max/min > 3
            m5 = m1 & (m3 | (m2 & m4))
            bccd_bin_idx_bool = m0 | m5
            # Remove the frames that are 1 left of a True frame that are False
            bccd_bin_idx_bool = bccd_bin_idx_bool | np.roll(bccd_bin_idx_bool, shift=1)

            masks = [m0, m2, m1, m3, m4, m5, bccd_bin_idx_bool]
            labels = ['empty', 'is_bti', 'max > 10', 'active_ccds <  #_ccds', 'relative_diff > 0.5', 'combined', 'to remove']
            plot_frame_masks(instrum=event_list.instrument, masks=masks, labels=labels, plot=plot)
        else:
            bccd_bin_idx_bool = m0

        bccd_bin_idx = np.where(bccd_bin_idx_bool)[0]
        logger.info(f'Removing {np.sum(bccd_bin_idx_bool)} / {len(bccd_bin_idx_bool)} incomplete frames from {event_list.instrument}')

        # Plot the Lightcurves for PN
        if plot and event_list.instrument == 'EPN':
            fig, ax = plt.subplots(nrows=4, ncols=1, figsize=(10, 10), sharex=True)
            for i, lc in enumerate(lcs_median_quadrant):
                ax[0].plot(lcs_ccd[i], label=f'lc ccd={i+1}')
                ax[1].plot(lcs_median_quadrant[i], label=f'median quadrant={i+1}')

            ax[-2].plot(lc_median_quadrant_min, label='ccd min')
            ax[-2].plot(lc_median_quadrant_max, label='ccd max')
            ax[-1].plot(lc_median_norm_diff, label='max-min/max+min')
            ax[-1].axhline(0.5, color='red', lw=1.0)
            for a in ax:
                a.legend(loc='right')
                # Plot the Masked frames.
                for j in range(len(bccd_bin_idx)):
                    a.axvspan(bccd_bin_idx[j], bccd_bin_idx[j]+1, color='red', alpha=0.2)
            plt.tight_layout()
            plt.show()

        return bccd_bin_idx_bool

    def multiply_time_interval(self, n_factor):
        """Used to increase the time_interval by a factor of n_factor, in order to quickly scan different timescales."""
        self.time_interval = n_factor*self.time_interval
        self.bin_t = self.calc_time_bins()

        # allows to cut X in chunks of size N (plus the remaining bit)
        # np.split(X, np.arange(N, len(X), N))

        # Splits in groups of n_factor along the time axis
        datacube_twoframegroups = np.split(self.data, np.arange(n_factor, self.shape[2], n_factor), axis=2)

        # Nansum each group along the time axis, and makes it into a cube again
        self.data = np.transpose([np.nansum(frame_grp, axis=2) for frame_grp in datacube_twoframegroups], (1,2,0))

        # Update the relative exposures of each frame
        # Splits in groups of n_factor along the time axis
        frame_exposures_twoframegroups = np.split(self.relative_frame_exposures, np.arange(n_factor, self.shape[2], n_factor))

        # Update relative frame exposures and shape.
        self.relative_frame_exposures = np.array([np.sum(exp_frame_grp) for exp_frame_grp in frame_exposures_twoframegroups])
        self.shape = self.data.shape

    def video(self, savepath=None):
        """Play each frame of the datacube sequentially in a matplotlib figure."""
        if np.isnan(self.data[:, :, 0]).all():
            logger.info('first frame all nan =no plot')
            return None

        fig, ax = plt.subplots()
        img = ax.imshow(self.data[:, :, 0].T, cmap=cmap_image(), animated=True, interpolation='none',
                        origin='lower')
        colorbar = fig.colorbar(img, ax=ax)

        def update(frame):
            ax.set_title(f'{self}\n{frame}/{num_frames}')
            img.set_array(self.data[:, :, frame].T)
            return img,

        num_frames = self.shape[2]
        ani = FuncAnimation(fig, update, frames=num_frames, interval=20)
        if savepath:
            logger.info(f'Saving {self} to {savepath}')
            ani.save(savepath)
        plt.show()


    @property
    def info(self):
        info = {'event_list'   : self.event_list.filename,
                'size_arcsec'  : self.size_arcsec,
                'time_interval': self.time_interval,
                'n_t_bins'     : self.n_t_bins,
                'n_bti_bin'    : self.n_bti_bin,
                'n_gti_bin'    : self.n_gti_bin,
                'gti_frac'     : self.gti_frac,
                'bti_frac'     : self.bti_frac,
                'n_bccd_bin'   : self.n_bccd_bin,
                'bccd_frac'    : self.bccd_frac,
                'bbox_img'     : self.bbox_img,
                'shape'        : self.shape,
                'total_values' : np.prod(self.shape),
                'memory_mb'    : self.memory_mb}
        for k, v in info.items():
            logger.info(f'{k:>13} : {v}')
        return info

`calc_gti_bti_bins(bti)`

Calculate the good and bad time interval indexes & masks.

Parameters:

Name	Type	Description	Default
`bti`	`dict`	{['START':344.2, 'STOP':454.2], ...}	required

Source code in exod/processing/data_cube.py

def calc_gti_bti_bins(self, bti):
    """
    Calculate the good and bad time interval indexes & masks.

    Parameters:
        bti (dict): {['START':344.2, 'STOP':454.2], ...}
    """
    self.bti_bin_idx      = get_bti_bin_idx(bti=bti, bin_t=self.bin_t)
    self.bti_bin_idx_bool = get_bti_bin_idx_bool(bti_bin_idx=self.bti_bin_idx, bin_t=self.bin_t)
    self.gti_bin_idx_bool = ~self.bti_bin_idx_bool
    self.gti_bin_idx      = np.where(self.gti_bin_idx_bool)[0][:-1]
    self.n_gti_bin = len(self.gti_bin_idx)
    self.n_bti_bin = len(self.bti_bin_idx)
    self.bti_frac = self.n_bti_bin / self.n_t_bins
    self.gti_frac = self.n_gti_bin / self.n_t_bins
    logger.info(f'n_gti = {self.n_gti_bin:<4} / {self.n_t_bins} ({self.gti_frac:.2f})')
    logger.info(f'n_bti = {self.n_bti_bin:<4} / {self.n_t_bins} ({self.bti_frac:.2f})')

`copy()`

Return a copy of the datacube.

Source code in exod/processing/data_cube.py

def copy(self):
    """Return a copy of the datacube."""
    return copy.deepcopy(self)

`crop_data_cube()`

Crop the surrounding areas of the data_cube that are empty.

Source code in exod/processing/data_cube.py

def crop_data_cube(self):
    """Crop the surrounding areas of the data_cube that are empty."""
    bbox_img = self.bbox_img

    logger.info(f'Cropping data cube between bbox_img: {bbox_img}')
    self.data = self.data[bbox_img[0]:bbox_img[1], bbox_img[2]:bbox_img[3]]

    # Calculate the new bins
    self.bin_x = self.bin_x[bbox_img[0]:bbox_img[1]]
    self.bin_y = self.bin_y[bbox_img[2]:bbox_img[3]]

`get_bad_ccd_mask(event_list, plot=False)`

Get the mask for the frames corresponding with irregular CCD exposures.

Parameters:

Name	Type	Description	Default
`event_list`	`EventList`	EventList object.	required
`plot`	`bool`	if True, then plot.	`False`

Returns:

Name	Type	Description
`bccd_bin_idx_bool`	`array`	Array of bools indicating the frames with irregular CCD exposures.

Source code in exod/processing/data_cube.py

def get_bad_ccd_mask(self, event_list, plot=False):
    """
    Get the mask for the frames corresponding with irregular CCD exposures.

    Parameters:
        event_list (EventList): EventList object.
        plot (bool): if True, then plot.

    Returns:
        bccd_bin_idx_bool (np.array): Array of bools indicating the frames with irregular CCD exposures.
    """
    ccd_bins = event_list.get_ccd_bins()

    # Get the lightcurves for each CCD.
    lcs_ccd, bin_edges, bin_number = binned_statistic_dd(sample=(event_list.data['CCDNR'], event_list.data['TIME']),
                                                         values=None,
                                                         statistic='count',
                                                         bins=[ccd_bins, self.bin_t])

    if event_list.instrument == 'EPN':
        quadrant_split         = np.split(lcs_ccd, indices_or_sections=(3, 6, 9))
        lcs_ccd                = np.sum(quadrant_split, axis=1)
        lcs_median_quadrant    = np.median(quadrant_split, axis=1)
        lc_median_quadrant_max = np.max(lcs_median_quadrant, axis=0)
        lc_median_quadrant_min = np.min(lcs_median_quadrant, axis=0)
        ccd_bins = [1, 4, 7, 10, 13]

    lcs_ccd_max = np.max(lcs_ccd, axis=0)
    # lcs_ccd_min = np.min(lcs_ccd, axis=0)
    count_active_ccd = np.sum(lcs_ccd > 0, axis=0)  # Nbr of CCDs active in each frame

    m0 = count_active_ccd == 0  # Frame is entirely empty

    if event_list.instrument == 'EPN':
        lc_min_counts = 10
        lc_max_norm_diff = 0.5
        logger.info(f'Removing PN frames lc_min_counts > {lc_min_counts} and lc_max_norm_diff > {lc_max_norm_diff}')
        lc_median_norm_diff = (lc_median_quadrant_max - lc_median_quadrant_min) / (lc_median_quadrant_max + lc_median_quadrant_min)
        m1 = lcs_ccd_max > lc_min_counts             # Frame is more than 10 counts in the brightest CCD
        m2 = self.bti_bin_idx_bool[:-1]              # Frame is a bad time index
        m3 = count_active_ccd < len(ccd_bins) - 1    # Frame is not running all CCDs
        m4 = lc_median_norm_diff > lc_max_norm_diff  # When this is >0.5 it is equivalent to max/min > 3
        m5 = m1 & (m3 | (m2 & m4))
        bccd_bin_idx_bool = m0 | m5
        # Remove the frames that are 1 left of a True frame that are False
        bccd_bin_idx_bool = bccd_bin_idx_bool | np.roll(bccd_bin_idx_bool, shift=1)

        masks = [m0, m2, m1, m3, m4, m5, bccd_bin_idx_bool]
        labels = ['empty', 'is_bti', 'max > 10', 'active_ccds <  #_ccds', 'relative_diff > 0.5', 'combined', 'to remove']
        plot_frame_masks(instrum=event_list.instrument, masks=masks, labels=labels, plot=plot)
    else:
        bccd_bin_idx_bool = m0

    bccd_bin_idx = np.where(bccd_bin_idx_bool)[0]
    logger.info(f'Removing {np.sum(bccd_bin_idx_bool)} / {len(bccd_bin_idx_bool)} incomplete frames from {event_list.instrument}')

    # Plot the Lightcurves for PN
    if plot and event_list.instrument == 'EPN':
        fig, ax = plt.subplots(nrows=4, ncols=1, figsize=(10, 10), sharex=True)
        for i, lc in enumerate(lcs_median_quadrant):
            ax[0].plot(lcs_ccd[i], label=f'lc ccd={i+1}')
            ax[1].plot(lcs_median_quadrant[i], label=f'median quadrant={i+1}')

        ax[-2].plot(lc_median_quadrant_min, label='ccd min')
        ax[-2].plot(lc_median_quadrant_max, label='ccd max')
        ax[-1].plot(lc_median_norm_diff, label='max-min/max+min')
        ax[-1].axhline(0.5, color='red', lw=1.0)
        for a in ax:
            a.legend(loc='right')
            # Plot the Masked frames.
            for j in range(len(bccd_bin_idx)):
                a.axvspan(bccd_bin_idx[j], bccd_bin_idx[j]+1, color='red', alpha=0.2)
        plt.tight_layout()
        plt.show()

    return bccd_bin_idx_bool

`get_cube_bbox()`

Get the Bounding Box corresponding to the cube's image plane.

Source code in exod/processing/data_cube.py

def get_cube_bbox(self):
    """Get the Bounding Box corresponding to the cube's image plane."""
    idx_nonempty = np.where(np.sum(self.data, axis=2) > 0)
    bbox_img = (np.min(idx_nonempty[0]), np.max(idx_nonempty[0]) + 1,
                np.min(idx_nonempty[1]), np.max(idx_nonempty[1]) + 1)
    return bbox_img

`mask_frames_with_partial_ccd_exposure(mask_frames=True, plot=False)`

Remove the frames with irregular exposures between CCDs.

https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/pnchipgeom.html https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/moschipgeom.html

Test Cases: 0165560101, 0765080801, 0116700301, 0765080801, 0872390901, 0116700301 0201900201, 0724840301, 0743700201

Parameters:

Name	Type	Description	Default
`mask_frames`	`bool`	If True mask the frames.	`True`
`plot`	`bool`	If True plot diagnostics.	`False`

Source code in exod/processing/data_cube.py

def mask_frames_with_partial_ccd_exposure(self, mask_frames=True, plot=False):
    """
    Remove the frames with irregular exposures between CCDs.

    https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/pnchipgeom.html
    https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/moschipgeom.html

    Test Cases: 0165560101, 0765080801, 0116700301, 0765080801, 0872390901, 0116700301
    0201900201, 0724840301, 0743700201

    Args:
        mask_frames (bool): If True mask the frames.
        plot (bool): If True plot diagnostics.
    """
    if not mask_frames:
        self.bccd_bin_idx_bool = np.full(self.n_t_bins, fill_value=False)
        return

    all_masks = {}
    for e in self.event_list.event_lists:
        bad_ccd_mask = self.get_bad_ccd_mask(event_list=e, plot=plot)
        all_masks[e.instrument] = bad_ccd_mask

    if all_masks:
        # Combine the frame masks from all the individual event lists.
        self.bccd_bin_idx_bool = np.any(list(all_masks.values()), axis=0)
        self.bccd_bin_idx      = np.where(self.bccd_bin_idx_bool)[0]

        # Mask from the data cube and update the relative frame exposures.
        self.data = np.where(self.bccd_bin_idx_bool, np.empty(self.data.shape) * np.nan, self.data)
        self.relative_frame_exposures = np.where(self.bccd_bin_idx_bool, 0, self.relative_frame_exposures)

        # Calculate the number and fraction of bins
        self.n_bccd_bin = len(self.bccd_bin_idx)
        self.bccd_frac = self.n_bccd_bin / self.n_t_bins
        logger.info(f'n_bccd = {self.n_bccd_bin:<4} / {self.n_t_bins} ({self.bccd_frac:.2f})')

`multiply_time_interval(n_factor)`

Used to increase the time_interval by a factor of n_factor, in order to quickly scan different timescales.

Source code in exod/processing/data_cube.py

def multiply_time_interval(self, n_factor):
    """Used to increase the time_interval by a factor of n_factor, in order to quickly scan different timescales."""
    self.time_interval = n_factor*self.time_interval
    self.bin_t = self.calc_time_bins()

    # allows to cut X in chunks of size N (plus the remaining bit)
    # np.split(X, np.arange(N, len(X), N))

    # Splits in groups of n_factor along the time axis
    datacube_twoframegroups = np.split(self.data, np.arange(n_factor, self.shape[2], n_factor), axis=2)

    # Nansum each group along the time axis, and makes it into a cube again
    self.data = np.transpose([np.nansum(frame_grp, axis=2) for frame_grp in datacube_twoframegroups], (1,2,0))

    # Update the relative exposures of each frame
    # Splits in groups of n_factor along the time axis
    frame_exposures_twoframegroups = np.split(self.relative_frame_exposures, np.arange(n_factor, self.shape[2], n_factor))

    # Update relative frame exposures and shape.
    self.relative_frame_exposures = np.array([np.sum(exp_frame_grp) for exp_frame_grp in frame_exposures_twoframegroups])
    self.shape = self.data.shape

`remove_bti_frames()`

Return the cube without the masked nan frames.

Source code in exod/processing/data_cube.py

def remove_bti_frames(self):
    """Return the cube without the masked nan frames."""
    data_non_nan = self.data[:, :, ~self.bti_bin_idx_bool[:-1]]
    return data_non_nan

`video(savepath=None)`

Play each frame of the datacube sequentially in a matplotlib figure.

Source code in exod/processing/data_cube.py

def video(self, savepath=None):
    """Play each frame of the datacube sequentially in a matplotlib figure."""
    if np.isnan(self.data[:, :, 0]).all():
        logger.info('first frame all nan =no plot')
        return None

    fig, ax = plt.subplots()
    img = ax.imshow(self.data[:, :, 0].T, cmap=cmap_image(), animated=True, interpolation='none',
                    origin='lower')
    colorbar = fig.colorbar(img, ax=ax)

    def update(frame):
        ax.set_title(f'{self}\n{frame}/{num_frames}')
        img.set_array(self.data[:, :, frame].T)
        return img,

    num_frames = self.shape[2]
    ani = FuncAnimation(fig, update, frames=num_frames, interval=20)
    if savepath:
        logger.info(f'Saving {self} to {savepath}')
        ani.save(savepath)
    plt.show()

`extract_lc(data_cube, xhi, xlo, yhi, ylo, dtype=np.int32)`

Extract a lightcurve from a data cube by summing through a bounding box.

Source code in exod/processing/data_cube.py

def extract_lc(data_cube, xhi, xlo, yhi, ylo, dtype=np.int32):
    """Extract a lightcurve from a data cube by summing through a bounding box."""
    data = data_cube[xlo:xhi, ylo:yhi]
    lc = np.nansum(data, axis=(0, 1), dtype=dtype)
    return lc

`detector`

`Detector`

Obsolete detector class that re-implements the method used in Inés Pastor-Marazuela (2020) See: https://arxiv.org/abs/2005.08673

Source code in exod/processing/detector.py

class Detector:
    """
    Obsolete detector class that re-implements the method used in Inés Pastor-Marazuela (2020)
    See: https://arxiv.org/abs/2005.08673
    """
    def __init__(self, data_cube, wcs, sigma=4):
        self.data_cube = data_cube
        self.wcs = wcs
        self.sigma = sigma

        self.max_area_bbox = 6**2
        self.n_sources_max = 15  # Number of sources above which to optimise sigma
        self.n_target_regions = 10 # Number of Sources to aim for when optimising.

        self.df_regions = None
        self.df_sky = None

    def __repr__(self):
        return f'Detector({self.data_cube})'

    def run(self):
        self.image_var  = self.calc_image_var()
        self.image_var_mean = np.mean(self.image_var)
        self.image_var_std  = np.std(self.image_var)
        # self.image_var = self.conv_var_img(box_size=3)
        self.df_regions = self.extract_var_regions(sigma=self.sigma)

        if self.n_sources > self.n_sources_max:
            logger.info('More than 15 sources found! Optimising sigma.')
            self.sigma = self.optimise_sigma(self.image_var, n_target_regions=self.n_target_regions)
            self.threshold = self.calc_extraction_threshold(self.sigma)
            self.df_regions = self.extract_var_regions(sigma=self.sigma)

        self.df_sky = get_regions_sky_position(self.data_cube, self.df_regions, self.wcs)
        self.df_regions = pd.concat([self.df_regions, self.df_sky], axis=1)

        self.df_regions = self.filter_df_regions()
        self.df_lcs     = self.get_region_lightcurves()

    def calc_image_var(self):
        """
        Calculate the variability image from a data cube.
        """
        logger.info('Computing variability...')
        image_max = np.nanmax(self.data_cube.data, axis=2)
        image_std = np.nanstd(self.data_cube.data, axis=2)
        image_mean = np.nanmean(self.data_cube.data, axis=2)
        image_var = (image_max - image_mean) * image_std
        return image_var

    def conv_var_img(self, box_size=3):
        """
        Convolve the variability image with a box kernel.
        """
        logger.info('Convolving variability...')
        kernel = np.ones((box_size, box_size)) / box_size ** 2
        image_var_conv = convolve(self.image_var, kernel)

        # New version
        # convolved = gaussian_filter(image_var, 1)
        # convolved = np.where(image_var>0, convolved, 0)
        return image_var_conv

    def calc_extraction_threshold(self, sigma):
        """Calculate the extraction threshold for a given sigma value."""
        self.image_var_threshold = self.image_var_mean + sigma * self.image_var_std
        return self.image_var_threshold

    def extract_var_regions(self, sigma=5):
        """
        Use skimage to obtain the contiguous regions in the variability image.

        We currently used the weighted centroid which calculates a centroid based on
        both the size of the bounding box and the values of the pixels themselves.

        https://scikit-image.org/docs/stable/auto_examples/segmentation/index.html

        Returns
        -------
        df_regions : DataFrame of Detected Regions
        """
        logger.info('Extracting variable Regions')
        image_var_mask = self.image_var > self.calc_extraction_threshold(sigma)

        logger.info(f'threshold: {self.calc_extraction_threshold(sigma)} sigma: {sigma}')
        # self.plot_threshold_level(self.calc_extraction_threshold(sigma))

        # Obtain the region properties for the detected regions.
        df_regions = calc_df_regions(image=self.image_var, image_mask=image_var_mask)


        # Sort by Most Variable and reset in the label column
        df_regions = df_regions.sort_values(by='intensity_mean', ascending=False).reset_index(drop=True)
        df_regions['label'] = df_regions.index
        return df_regions

    def objective_function(self, sigma, image, n_target_regions):
        """Function used for optimising sigma detection threshold."""
        threshold = self.calc_extraction_threshold(sigma)
        image_mask = image > threshold
        labeled_image = label(image_mask)
        regions = regionprops(labeled_image)
        n_regions = len(regions)
        # logger.info(f'sigma={sigma:.4f} threshold={threshold:.4f} n_regions={n_regions} target={n_target_regions}')
        return np.abs(n_regions - n_target_regions)

    def optimise_sigma(self, image, n_target_regions, sigma_bounds=(0.1, 20.0)):
        """Find the optimal value of sigma to get n_target regions."""
        result = minimize_scalar(self.objective_function, args=(image, n_target_regions), bounds=sigma_bounds, method='bounded')
        logger.info(f'Optimisation results:\n{result}')
        return result.x

    def plot_threshold_level(self, threshold):
        plt.figure(figsize=(10, 3))
        plt.title('Thresholding')
        plt.plot(self.image_var.flatten(), label='Variability')
        plt.axhline(threshold, color='red', label=fr'threshold={threshold:.2f} ({self.sigma:.2f}$\sigma$)')
        plt.legend()
        plt.ylabel('V score')
        # plt.show()

    def filter_df_regions(self):
        logger.info('Removing regions with area_bbox > 12')
        df_regions = self.df_regions[self.df_regions['area_bbox'] <= self.max_area_bbox]
        return df_regions

    def get_region_lightcurves(self):
        """
        Extract the lightcurves from the variable regions found.
        Returns
        -------
        df_lc : DataFrame of Lightcurves
        """
        df_lcs = get_region_lcs(data_cube=self.data_cube, df_regions=self.df_regions)
        return df_lcs

    def plot_region_lightcurves(self, savedir=None, max_sources=15):
        if self.n_sources > max_sources:
            logger.info(f'{self.n_sources} > {max_sources} Not plotting lightcurves')
            return None

        logger.info(f'Plotting regions lightcurves')
        savepath = None
        for i, row in self.df_regions.iterrows():
            if savedir:
                savepath = savedir / f'lc_reg_{i}.png'
            plot_region_lightcurve(self.df_lcs, i, savepath=savepath)

    @property
    def n_sources(self):
        return len(self.df_regions)

    @property
    def info(self):
        info = {'data_cube'           : repr(self.data_cube),
                'sigma'               : self.sigma,
                'max_area_bbox'       : self.max_area_bbox,
                'n_sources_max'       : self.n_sources_max,
                'n_target_regions'    : self.n_target_regions,
                'image_var_mean'      : self.image_var_mean,
                'image_var_std'       : self.image_var_std,
                'image_var_threshold' : self.image_var_threshold,
                'n_sources'           : self.n_sources}

        for k, v in info.items():
            logger.info(f'{k:>20} : {v}')
        return info

`calc_extraction_threshold(sigma)`

Calculate the extraction threshold for a given sigma value.

Source code in exod/processing/detector.py

def calc_extraction_threshold(self, sigma):
    """Calculate the extraction threshold for a given sigma value."""
    self.image_var_threshold = self.image_var_mean + sigma * self.image_var_std
    return self.image_var_threshold

`calc_image_var()`

Calculate the variability image from a data cube.

Source code in exod/processing/detector.py

def calc_image_var(self):
    """
    Calculate the variability image from a data cube.
    """
    logger.info('Computing variability...')
    image_max = np.nanmax(self.data_cube.data, axis=2)
    image_std = np.nanstd(self.data_cube.data, axis=2)
    image_mean = np.nanmean(self.data_cube.data, axis=2)
    image_var = (image_max - image_mean) * image_std
    return image_var

`conv_var_img(box_size=3)`

Convolve the variability image with a box kernel.

Source code in exod/processing/detector.py

def conv_var_img(self, box_size=3):
    """
    Convolve the variability image with a box kernel.
    """
    logger.info('Convolving variability...')
    kernel = np.ones((box_size, box_size)) / box_size ** 2
    image_var_conv = convolve(self.image_var, kernel)

    # New version
    # convolved = gaussian_filter(image_var, 1)
    # convolved = np.where(image_var>0, convolved, 0)
    return image_var_conv

`extract_var_regions(sigma=5)`

Use skimage to obtain the contiguous regions in the variability image.

We currently used the weighted centroid which calculates a centroid based on both the size of the bounding box and the values of the pixels themselves.

https://scikit-image.org/docs/stable/auto_examples/segmentation/index.html

Returns

df_regions : DataFrame of Detected Regions

Source code in exod/processing/detector.py

def extract_var_regions(self, sigma=5):
    """
    Use skimage to obtain the contiguous regions in the variability image.

    We currently used the weighted centroid which calculates a centroid based on
    both the size of the bounding box and the values of the pixels themselves.

    https://scikit-image.org/docs/stable/auto_examples/segmentation/index.html

    Returns
    -------
    df_regions : DataFrame of Detected Regions
    """
    logger.info('Extracting variable Regions')
    image_var_mask = self.image_var > self.calc_extraction_threshold(sigma)

    logger.info(f'threshold: {self.calc_extraction_threshold(sigma)} sigma: {sigma}')
    # self.plot_threshold_level(self.calc_extraction_threshold(sigma))

    # Obtain the region properties for the detected regions.
    df_regions = calc_df_regions(image=self.image_var, image_mask=image_var_mask)


    # Sort by Most Variable and reset in the label column
    df_regions = df_regions.sort_values(by='intensity_mean', ascending=False).reset_index(drop=True)
    df_regions['label'] = df_regions.index
    return df_regions

`get_region_lightcurves()`

Extract the lightcurves from the variable regions found. Returns

df_lc : DataFrame of Lightcurves

Source code in exod/processing/detector.py

def get_region_lightcurves(self):
    """
    Extract the lightcurves from the variable regions found.
    Returns
    -------
    df_lc : DataFrame of Lightcurves
    """
    df_lcs = get_region_lcs(data_cube=self.data_cube, df_regions=self.df_regions)
    return df_lcs

`objective_function(sigma, image, n_target_regions)`

Function used for optimising sigma detection threshold.

Source code in exod/processing/detector.py

def objective_function(self, sigma, image, n_target_regions):
    """Function used for optimising sigma detection threshold."""
    threshold = self.calc_extraction_threshold(sigma)
    image_mask = image > threshold
    labeled_image = label(image_mask)
    regions = regionprops(labeled_image)
    n_regions = len(regions)
    # logger.info(f'sigma={sigma:.4f} threshold={threshold:.4f} n_regions={n_regions} target={n_target_regions}')
    return np.abs(n_regions - n_target_regions)

`optimise_sigma(image, n_target_regions, sigma_bounds=(0.1, 20.0))`

Find the optimal value of sigma to get n_target regions.

Source code in exod/processing/detector.py

def optimise_sigma(self, image, n_target_regions, sigma_bounds=(0.1, 20.0)):
    """Find the optimal value of sigma to get n_target regions."""
    result = minimize_scalar(self.objective_function, args=(image, n_target_regions), bounds=sigma_bounds, method='bounded')
    logger.info(f'Optimisation results:\n{result}')
    return result.x

`calc_ks_poission(lc)`

Calculate the KS Probability assuming the lightcurve was created from a Poisson distribution with the mean of the lightcurve.

Source code in exod/processing/detector.py

def calc_ks_poission(lc):
    """
    Calculate the KS Probability assuming the lightcurve was
    created from a Poisson distribution with the mean of the lightcurve.
    """
    #logger.info("Calculating KS Probability, assuming a Poission Distribution")
    lc_mean = mean_of_poisson = np.nanmean(lc)
    N_data = len(lc)
    result = kstest(lc, [lc_mean] * N_data)
    expected_distribution = poisson(mean_of_poisson)
    ks_res = kstest(lc, expected_distribution.cdf)
    logger.debug(f'KS_prob: lc_mean = {lc_mean}, N_data = {N_data}\nks_res = {ks_res}')
    return ks_res

`plot_image_with_regions(image, df_regions, cbar_label=None, savepath=None)`

Plot the variability image with the bounding boxes of the detected regions.

Parameters

image : np.ndarray : Variability Image (or Likelihood Image) df_regions : pd.DataFrame : from extract_variability_regions cbar_label : string : Label for colorbar savepath : str : Path to save the figure to

Source code in exod/processing/detector.py

def plot_image_with_regions(image, df_regions, cbar_label=None, savepath=None):
    """
    Plot the variability image with the bounding boxes of the detected regions.

    Parameters
    ----------
    image       : np.ndarray : Variability Image  (or Likelihood Image)
    df_regions  : pd.DataFrame : from extract_variability_regions
    cbar_label  : string : Label for colorbar
    savepath    : str : Path to save the figure to
    """
    logger.info('Plotting Image with source regions')

    fig, ax = plt.subplots(figsize=(8,8))
    ax.set_title(f'Detected Regions : {len(df_regions)}')

    cmap = cmap_image()

    norm = ImageNormalize(stretch=SqrtStretch()) #LogNorm()

    m1 = ax.imshow(image.T, norm=norm, interpolation='none', origin='lower', cmap=cmap)
    cbar = plt.colorbar(mappable=m1, ax=ax, shrink=0.75)
    cbar.set_label(cbar_label)

    src_color = 'lime'
    ax.scatter(df_regions['weighted_centroid-0'], df_regions['weighted_centroid-1'], marker='+', s=10, color='white')
    ax.scatter(df_regions['centroid-0'], df_regions['centroid-1'], marker='.', s=10, color=src_color)

    for i, row in df_regions.iterrows():
        ind   = row['label']
        x_cen = row['centroid-0']
        y_cen = row['centroid-1']

        width  = row['bbox-2'] - row['bbox-0']
        height = row['bbox-3'] - row['bbox-1']

        x_pos = x_cen - width / 2
        y_pos = y_cen - height / 2

        rect = patches.Rectangle(xy=(x_pos, y_pos),
                                 width=width,
                                 height=height,
                                 linewidth=1,
                                 edgecolor=src_color,
                                 facecolor='none')

        plt.text(x_pos+width, y_pos+height, str(ind), c=src_color)
        ax.add_patch(rect)
    plt.tight_layout()
    if savepath:
        logger.info(f'Saving Image to: {savepath}')
        plt.savefig(savepath)
    plt.show()

`plot_region_lightcurve(df_lcs, i, savepath=None)`

Plot the ith region lightcurve.

Source code in exod/processing/detector.py

def plot_region_lightcurve(df_lcs, i, savepath=None):
    """Plot the ith region lightcurve."""
    lc = df_lcs[f'src_{i}']
    fig, ax = plt.subplots(figsize=(10, 4))
    ax2 = ax.twiny()
    ax.step(df_lcs['time'], lc, where='post', color='black', lw=1.0)
    ax2.step(range(len(lc)), lc, where='post', color='none', lw=1.0)

    ax.set_title(f'Source #{i}')
    ax.set_ylabel('Counts (N)')
    ax.set_xlabel('Time (s)')
    ax.set_xlim(df_lcs['time'].min(), df_lcs['time'].max())
    ax2.set_xlabel('Window/Frame Number')
    plt.tight_layout()

    if savepath:
        logger.info(f'Saving lightcurve plot to: {savepath}')
        plt.savefig(savepath)

`pipeline`

`Pipeline`

Pipeline to process a single XMM-Newton observation and find variable sources.

Attributes:
obsid (str): Observation ID.
size_arcsec (float): Size of the binned pixels in arcseconds.
time_interval (int): Time interval in seconds.
gti_only (bool): Use only Good Time Intervals.
remove_partial_ccd_frames (bool): Remove partial CCD frames.
min_energy (float): Minimum energy in keV.
max_energy (float): Maximum energy in keV.
clobber (bool): Overwrite existing files.
threshold_sigma (float): Sigma threshold for variability detection.
precomputed_bayes_limit (PrecomputeBayesLimits): Precomputed Bayes Limits object.
observation (Observation): Observation object.

runid (str): Unique identifier for the run.
event_list (EventList): EventList() object.
data_cube (XMMDataCube): XMMDataCube() object containing the number of photons in each cell.
subset_number (int): The subset number.
total_subsets (int): Total number of subsets.
gti_threshold (float): Threshold for Good Time Intervals.
n_regions (int): Number of regions detected.

savedir (Path): Directory to save results to.

Source code in exod/processing/pipeline.py

class Pipeline:
    """
    Pipeline to process a single XMM-Newton observation and find variable sources.

        Attributes:
        obsid (str): Observation ID.
        size_arcsec (float): Size of the binned pixels in arcseconds.
        time_interval (int): Time interval in seconds.
        gti_only (bool): Use only Good Time Intervals.
        remove_partial_ccd_frames (bool): Remove partial CCD frames.
        min_energy (float): Minimum energy in keV.
        max_energy (float): Maximum energy in keV.
        clobber (bool): Overwrite existing files.
        threshold_sigma (float): Sigma threshold for variability detection.
        precomputed_bayes_limit (PrecomputeBayesLimits): Precomputed Bayes Limits object.
        observation (Observation): Observation object.

        runid (str): Unique identifier for the run.
        event_list (EventList): EventList() object.
        data_cube (XMMDataCube): XMMDataCube() object containing the number of photons in each cell.
        subset_number (int): The subset number.
        total_subsets (int): Total number of subsets.
        gti_threshold (float): Threshold for Good Time Intervals.
        n_regions (int): Number of regions detected.

        savedir (Path): Directory to save results to.
    """
    def __init__(self, obsid='0724210501', size_arcsec=20.0, time_interval=50, gti_only=False, remove_partial_ccd_frames=True,
                 min_energy=0.2, max_energy=10.0, clobber=False, threshold_sigma=3, **kwargs):
        self.obsid = obsid
        self.size_arcsec = size_arcsec
        self.time_interval = time_interval
        self.gti_only = gti_only
        self.remove_partial_ccd_frames = remove_partial_ccd_frames
        self.min_energy = min_energy
        self.max_energy = max_energy
        self.clobber = clobber
        self.threshold_sigma = threshold_sigma
        self.precomputed_bayes_limit = PrecomputeBayesLimits(threshold_sigma)
        self.observation = Observation(obsid)

        self.runid = None
        self.event_list = None
        self.data_cube = None
        self.subset_number = None
        self.total_subsets = None
        self.gti_threshold = None
        self.n_regions = None

        self.savedir = None

    def generate_runid(self):
        """Generate a unique runid of the form 0765080801_0_50_0.2_12.0"""
        runid = f'{self.obsid}_{self.subset_number}_{self.time_interval}_{self.min_energy}_{self.max_energy}'
        self.runid = runid
        return runid

    def set_savedir(self):
        """Create the directory to save results to."""
        self.savedir = self.observation.path_results / self.generate_runid()
        self.savedir.mkdir(parents=True, exist_ok=True)
        logger.info(f'savedir set to: {self.savedir}')
        return self.savedir

    def pre_process(self):
        """Pre-process the data."""
        self.observation.download_events()
        self.observation.filter_events(clobber=self.clobber)
        self.observation.create_images(clobber=self.clobber)
        self.observation.get_files()
        self.observation.get_events_overlapping_subsets()
        self.subsets       = self.observation.get_events_overlapping_subsets()
        self.total_subsets = self.observation.get_number_of_overlapping_subsets()

    def run(self):
        self.pre_process()
        for i, s in enumerate(self.subsets):
            self.run_subset(subset_number=i, subset_overlapping_exposures=s)

    def run_subset(self, subset_number, subset_overlapping_exposures):
        self.subset_number = subset_number
        self.set_savedir()
        self.generate_runid()

        self.event_list = EventList.from_event_lists(subset_overlapping_exposures)
        self.calculate_bti()
        self.event_list.filter_by_energy(self.min_energy, self.max_energy)

        self.data_cube = DataCube(self.event_list, self.size_arcsec, self.time_interval)
        self.data_cube.calc_gti_bti_bins(bti=self.bti)
        self.data_cube.mask_frames_with_partial_ccd_exposure(mask_frames=self.remove_partial_ccd_frames)
        if self.gti_only:
            self.data_cube.mask_bti()
        cube_n = self.data_cube

        if sas_is_installed():
            img = self.observation.images[0]
            img.read(wcs_only=True)
            wcs = img.wcs
        else:
            wcs = wcs_from_eventlist_file(self.observation.events_raw[0].path)

        # Calculate Expectation Cube
        cube_mu = calc_cube_mu(cube_n, wcs=wcs)
        cube_mask_peaks, cube_mask_eclipses = self.precomputed_bayes_limit.get_cube_masks_peak_and_eclipse(cube_n=cube_n.data, cube_mu=cube_mu)

        # Extract information for each significant data cell (3D pixel)
        df_significant_cells = self.get_significant_cells(cube_mask_peaks, cube_mask_eclipses, cube_n)
        self.filter_events_from_significant_cells(df_significant_cells, self.event_list)
        unique_xy = self.get_significant_pixels_from_cells(df_significant_cells)

        # Plot Lightcurves for each cell.
        for x_cube, y_cube in unique_xy:
            plot_pixel_lc(cube_mu, cube_n, x_cube, y_cube, plot=True)

        # Get Images.
        image_n       = np.nansum(cube_n.data, axis=2)                # Total Counts.
        image_peak    = np.nansum(cube_mask_peaks, axis=2)            # Each Pixel is the number of peaks in cube_n
        image_eclipse = np.nansum(cube_mask_eclipses, axis=2)         # Each Pixel is the number of eclipses in cube_n
        image_mask_combined = (image_peak > 0) | (image_eclipse > 0)  # Combine peaks and eclipses

        df_reg = calc_df_regions(image=image_n, image_mask=image_mask_combined)
        df_reg['detid'] = self.runid + '_' + df_reg['label'].astype('str') # Create detection id from runid_label
        df_sky = get_regions_sky_position(data_cube=cube_n, df_regions=df_reg, wcs=wcs)
        df_regions = pd.concat([df_reg, df_sky], axis=1).reset_index(drop=True)
        self.n_regions = len(df_regions)

        dfs_lcs = get_region_lightcurves(df_regions, cube_n, cube_mu)

        # Plot Lightcurves for each region
        if dfs_lcs:
            for df_lc in dfs_lcs:
                plot_df_lc(df_lc=df_lc, savedir=self.savedir)

        # Plot Image
        plot_detection_image(df_regions, image_eclipse, image_n, image_peak, savepath=self.savedir / 'detection_img.png')

        # Save Results
        self.save_results(dc_info=cube_n.info, df_alerts=df_significant_cells, df_regions=df_regions, dfs_lcs=dfs_lcs,
                          df_bti=self.df_bti, evt_info=self.event_list.info)
        # plt.show()
        # plt.close('all')
        # plt.clf()

    def calculate_bti(self):
        """Calculate the bad time intervals (BTI) based upon the lightcurve from the eventlist and the threshold."""
        self.gti_threshold = get_gti_threshold(self.event_list.N_event_lists)
        t_bin_he, lc_he = self.event_list.get_high_energy_lc(self.time_interval)
        self.bti = get_bti(time=t_bin_he, data=lc_he, threshold=self.gti_threshold)
        self.df_bti = pd.DataFrame(self.bti)

    def multiply_time_interval(self, n_factor):
        logger.info(f'Rebinning the cube with longer timebins by factor {n_factor}...')
        self.data_cube.multiply_time_interval(n_factor)
        self.time_interval = self.data_cube.time_interval
        self.calculate_bti()
        self.data_cube.calc_gti_bti_bins(bti=self.bti)

    def get_significant_pixels_from_cells(self, df_significant_cells):
        """Extract the unique x,y values (2D Pixels) from the significant (3D) cells."""
        if len(df_significant_cells) == 0:
            return []
        unique_xy = list(df_significant_cells.groupby(['x_cube', 'y_cube']).groups.keys())
        return unique_xy

    def filter_events_from_significant_cells(self, df_alerts, event_list):
        """Extract the events in the events list for each significant cell."""
        for i, r in df_alerts.iterrows():
            evt_subset = event_list.get_events_within_bounds(X_lo=r['X_lo'], X_hi=r['X_hi'],
                                                             Y_lo=r['Y_lo'], Y_hi=r['Y_hi'],
                                                             TIME_lo=r['TIME_lo'], TIME_hi=r['TIME_hi'])
            logger.info(r)
            logger.info(evt_subset)
            logger.info(f'N_events = {len(evt_subset)}')

            """
            # Create an image of the significant cell.
            for instrument in np.unique(evt_subset['INSTRUMENT']):
                sub = evt_subset[evt_subset['INSTRUMENT'] == instrument]
                rawx = sub['RAWX']
                rawy = sub['RAWY']
                xbins = np.arange(rawx.min()-1, rawx.max()+1, 1)
                ybins = np.arange(rawy.min()-1, rawy.max()+1, 1)

                try:
                    hist, xedges, yedges = np.histogram2d(rawx, rawy, bins=(xbins, ybins))

                    n = int(hist.max()) + 1
                    cmap = plt.get_cmap('hot', n)
                    boundaries = np.arange(-0.5, n + 0.5, 1)
                    norm = colors.BoundaryNorm(boundaries, ncolors=n)

                    plt.figure()
                    plt.title(f'{instrument} ({r['x_cube']},{r['y_cube']},{r['t_cube']})')
                    plt.imshow(hist.T, origin="lower", aspect="equal", cmap=cmap, norm=norm,
                               extent=(xedges[0], xedges[-1], yedges[0], yedges[-1]), interpolation='none')
                    plt.colorbar(shrink=0.5)
                    plt.xlabel("RAWX")
                    plt.ylabel("RAWY")
                    plt.subplots_adjust(left=0, right=0, top=0, bottom=0)
                except Exception as e:
                    print(f'Could not do {i} {e}')
            """

    def get_significant_cells(self, cube_mask_peaks, cube_mask_eclipses, cube_n):
        """Extract information for each pixel in the datacube that is associated with an alert."""
        logger.info('Extracting significant 3D pixels from data cube.')
        # Get the positions (x,y,t) of the bursts and eclipses
        peak_positions    = np.argwhere(cube_mask_peaks == 1)
        eclipse_positions = np.argwhere(cube_mask_eclipses == 1)

        alerts = []
        for pos in peak_positions:
            pixel_alert_info = self.extract_pixel_alert_info(cube_n, pos)
            pixel_alert_info['type'] = 'peak'
            alerts.append(pixel_alert_info)

        for pos in eclipse_positions:
            pixel_alert_info = self.extract_pixel_alert_info(cube_n, pos)
            pixel_alert_info['type'] = 'eclipse'
            alerts.append(pixel_alert_info)

        df_alerts = pd.DataFrame(alerts)
        logger.info('df_alerts:')
        logger.info(df_alerts)
        return df_alerts

    def extract_pixel_alert_info(self, cube_n, pixel_xyt_pos):
        x_cube, y_cube, t_cube = pixel_xyt_pos
        X    = cube_n.bin_x[x_cube]
        Y    = cube_n.bin_y[y_cube]
        TIME = cube_n.bin_t[t_cube]

        X_size, Y_size, TIME_size = cube_n.pixel_size, cube_n.pixel_size, self.time_interval

        t_depth   = 2
        TIME_size = t_depth * TIME_size

        alert = {'x_cube'    : x_cube,
                 'y_cube'    : y_cube,
                 't_cube'    : t_cube,
                 'X'         : X,
                 'Y'         : Y,
                 'TIME'      : TIME,
                 'X_size'    : X_size,
                 'Y_size'    : Y_size,
                 'TIME_size' : TIME_size,
                 'X_lo'      : (X - X_size),
                 'X_hi'      : (X + X_size),
                 'Y_lo'      : (Y - Y_size),
                 'Y_hi'      : (Y + Y_size),
                 'TIME_lo'   : (TIME - TIME_size),
                 'TIME_hi'   : (TIME + TIME_size)}
        return alert

    def save_results(self, dc_info, df_alerts, df_regions, dfs_lcs, df_bti, evt_info):
        results = {}
        # Collect DataFrames
        if dfs_lcs:
            for i, df_lc in enumerate(dfs_lcs):
                results[f'lc_{i}'] = df_lc

        results['bti']     = df_bti
        results['regions'] = df_regions
        results['alerts']  = df_alerts

        # Collect info
        results['obs_info'] = self.observation.info
        results['evt_info'] = evt_info
        results['dc_info']  = dc_info
        results['run_info'] = self.info
        for k, v in results.items():
            save_result(key=k, value=v, runid=self.runid, savedir=self.savedir)

    def load_subset_results(self, subset_number):
        """Load results for a single observation subset. returns a dictionary."""
        self.subset_number = subset_number
        self.set_savedir()
        results = {}
        csv_files = list(self.savedir.glob('*.csv'))
        n_csv_files = len(csv_files)
        if n_csv_files == 0:
            logger.info(f'No csv files found in {self.savedir}!')
            return None
        for f in csv_files:
            if 'info' in f.stem:
                results[f.stem] = load_info(loadpath=f)
            else:
                results[f.stem] = load_df(loadpath=f)
        return results

    def load_results(self):
        """Load results for all observation subsets, returns a list of dictionaries."""
        all_results = [self.load_subset_results(i) for i in range(self.total_subsets)]
        return all_results

    @property
    def info(self):
        info = {
            'runid'                     : self.runid,
            'obsid'                     : self.obsid,
            'subset_number'             : self.subset_number,
            'total_subsets'             : self.total_subsets,
            'size_arcsec'               : self.size_arcsec,
            'time_interval'             : self.time_interval,
            'gti_only'                  : self.gti_only,
            'gti_threshold'             : self.gti_threshold,
            'remove_partial_ccd_frames' : self.remove_partial_ccd_frames,
            'min_energy'                : self.min_energy,
            'max_energy'                : self.max_energy,
            'clobber'                   : self.clobber,
            'threshold_sigma'           : self.threshold_sigma,
            'savedir'                   : self.savedir,
            'n_regions'                 : self.n_regions,
        }
        for k, v in info.items():
            logger.info(f'{k:>25} : {v}')
        return info

`calculate_bti()`

Calculate the bad time intervals (BTI) based upon the lightcurve from the eventlist and the threshold.

Source code in exod/processing/pipeline.py

def calculate_bti(self):
    """Calculate the bad time intervals (BTI) based upon the lightcurve from the eventlist and the threshold."""
    self.gti_threshold = get_gti_threshold(self.event_list.N_event_lists)
    t_bin_he, lc_he = self.event_list.get_high_energy_lc(self.time_interval)
    self.bti = get_bti(time=t_bin_he, data=lc_he, threshold=self.gti_threshold)
    self.df_bti = pd.DataFrame(self.bti)

`filter_events_from_significant_cells(df_alerts, event_list)`

Extract the events in the events list for each significant cell.

Source code in exod/processing/pipeline.py

def filter_events_from_significant_cells(self, df_alerts, event_list):
    """Extract the events in the events list for each significant cell."""
    for i, r in df_alerts.iterrows():
        evt_subset = event_list.get_events_within_bounds(X_lo=r['X_lo'], X_hi=r['X_hi'],
                                                         Y_lo=r['Y_lo'], Y_hi=r['Y_hi'],
                                                         TIME_lo=r['TIME_lo'], TIME_hi=r['TIME_hi'])
        logger.info(r)
        logger.info(evt_subset)
        logger.info(f'N_events = {len(evt_subset)}')

        """
        # Create an image of the significant cell.
        for instrument in np.unique(evt_subset['INSTRUMENT']):
            sub = evt_subset[evt_subset['INSTRUMENT'] == instrument]
            rawx = sub['RAWX']
            rawy = sub['RAWY']
            xbins = np.arange(rawx.min()-1, rawx.max()+1, 1)
            ybins = np.arange(rawy.min()-1, rawy.max()+1, 1)

            try:
                hist, xedges, yedges = np.histogram2d(rawx, rawy, bins=(xbins, ybins))

                n = int(hist.max()) + 1
                cmap = plt.get_cmap('hot', n)
                boundaries = np.arange(-0.5, n + 0.5, 1)
                norm = colors.BoundaryNorm(boundaries, ncolors=n)

                plt.figure()
                plt.title(f'{instrument} ({r['x_cube']},{r['y_cube']},{r['t_cube']})')
                plt.imshow(hist.T, origin="lower", aspect="equal", cmap=cmap, norm=norm,
                           extent=(xedges[0], xedges[-1], yedges[0], yedges[-1]), interpolation='none')
                plt.colorbar(shrink=0.5)
                plt.xlabel("RAWX")
                plt.ylabel("RAWY")
                plt.subplots_adjust(left=0, right=0, top=0, bottom=0)
            except Exception as e:
                print(f'Could not do {i} {e}')
        """

`generate_runid()`

Generate a unique runid of the form 0765080801_0_50_0.2_12.0

Source code in exod/processing/pipeline.py

def generate_runid(self):
    """Generate a unique runid of the form 0765080801_0_50_0.2_12.0"""
    runid = f'{self.obsid}_{self.subset_number}_{self.time_interval}_{self.min_energy}_{self.max_energy}'
    self.runid = runid
    return runid

`get_significant_cells(cube_mask_peaks, cube_mask_eclipses, cube_n)`

Extract information for each pixel in the datacube that is associated with an alert.

Source code in exod/processing/pipeline.py

def get_significant_cells(self, cube_mask_peaks, cube_mask_eclipses, cube_n):
    """Extract information for each pixel in the datacube that is associated with an alert."""
    logger.info('Extracting significant 3D pixels from data cube.')
    # Get the positions (x,y,t) of the bursts and eclipses
    peak_positions    = np.argwhere(cube_mask_peaks == 1)
    eclipse_positions = np.argwhere(cube_mask_eclipses == 1)

    alerts = []
    for pos in peak_positions:
        pixel_alert_info = self.extract_pixel_alert_info(cube_n, pos)
        pixel_alert_info['type'] = 'peak'
        alerts.append(pixel_alert_info)

    for pos in eclipse_positions:
        pixel_alert_info = self.extract_pixel_alert_info(cube_n, pos)
        pixel_alert_info['type'] = 'eclipse'
        alerts.append(pixel_alert_info)

    df_alerts = pd.DataFrame(alerts)
    logger.info('df_alerts:')
    logger.info(df_alerts)
    return df_alerts

`get_significant_pixels_from_cells(df_significant_cells)`

Extract the unique x,y values (2D Pixels) from the significant (3D) cells.

Source code in exod/processing/pipeline.py

def get_significant_pixels_from_cells(self, df_significant_cells):
    """Extract the unique x,y values (2D Pixels) from the significant (3D) cells."""
    if len(df_significant_cells) == 0:
        return []
    unique_xy = list(df_significant_cells.groupby(['x_cube', 'y_cube']).groups.keys())
    return unique_xy

`load_results()`

Load results for all observation subsets, returns a list of dictionaries.

Source code in exod/processing/pipeline.py

def load_results(self):
    """Load results for all observation subsets, returns a list of dictionaries."""
    all_results = [self.load_subset_results(i) for i in range(self.total_subsets)]
    return all_results

`load_subset_results(subset_number)`

Load results for a single observation subset. returns a dictionary.

Source code in exod/processing/pipeline.py

def load_subset_results(self, subset_number):
    """Load results for a single observation subset. returns a dictionary."""
    self.subset_number = subset_number
    self.set_savedir()
    results = {}
    csv_files = list(self.savedir.glob('*.csv'))
    n_csv_files = len(csv_files)
    if n_csv_files == 0:
        logger.info(f'No csv files found in {self.savedir}!')
        return None
    for f in csv_files:
        if 'info' in f.stem:
            results[f.stem] = load_info(loadpath=f)
        else:
            results[f.stem] = load_df(loadpath=f)
    return results

`pre_process()`

Pre-process the data.

Source code in exod/processing/pipeline.py

def pre_process(self):
    """Pre-process the data."""
    self.observation.download_events()
    self.observation.filter_events(clobber=self.clobber)
    self.observation.create_images(clobber=self.clobber)
    self.observation.get_files()
    self.observation.get_events_overlapping_subsets()
    self.subsets       = self.observation.get_events_overlapping_subsets()
    self.total_subsets = self.observation.get_number_of_overlapping_subsets()

`set_savedir()`

Create the directory to save results to.

Source code in exod/processing/pipeline.py

def set_savedir(self):
    """Create the directory to save results to."""
    self.savedir = self.observation.path_results / self.generate_runid()
    self.savedir.mkdir(parents=True, exist_ok=True)
    logger.info(f'savedir set to: {self.savedir}')
    return self.savedir

`combine_results(obsids)`

Get the results for each of the observations ids and combine them into joint dataframes and save them into their respective paths in exod.utils.path.data_combined.

Will raise an error if there are already files data_combined, to avoid accidentally deleting data.

Parameters:

Name	Type	Description	Default
`obsids`	`list`	list of observation IDs	required

Source code in exod/processing/pipeline.py

def combine_results(obsids):
    """
    Get the results for each of the observations ids and combine them into joint dataframes and save them into
    their respective paths in exod.utils.path.data_combined.

    Will raise an error if there are already files data_combined, to avoid accidentally deleting data.

    Parameters:
        obsids (list): list of observation IDs
    """
    if any([p.exists() for p in savepaths_combined.values()]):
        raise FileExistsError(f'There are already some combined files in the combined data path! '
                              f'Be careful! You dont want to accidentally overwrite these!')

    all_results = []
    for params in parameter_grid(obsids=obsids):
        p = Pipeline(**params)
        results_list = p.load_results()
        if results_list:
            for r in results_list:
                all_results.append(r)

    logger.info('Combining all DataFrames')
    df_bti      = pd.concat([r.get('bti') for r in all_results], axis=0)
    df_regions  = pd.concat([r.get('regions') for r in all_results], axis=0)
    df_alerts   = pd.concat([r.get('alerts') for r in all_results], axis=0)
    df_lc       = pd.concat([df for r in all_results for key, df in r.items() if key.startswith('lc_')])
    df_run_info = pd.DataFrame([r['run_info'] for r in all_results])
    df_obs_info = pd.DataFrame([r['obs_info'] for r in all_results])
    df_dc_info  = pd.DataFrame([r['dc_info'] for r in all_results])
    df_evt_info = pd.DataFrame([r['evt_info'] for r in all_results])

    make_df_lc_idx(df_lc)

    logger.info('Saving all DataFrames...')
    df_bti.to_csv(savepaths_combined['bti'], index=False)
    df_regions.to_csv(savepaths_combined['regions'], index=False)
    df_alerts.to_csv(savepaths_combined['alerts'], index=False)
    df_lc.to_hdf(savepaths_combined['lc'], key='df_lc', index=False, mode='w', format='table')
    df_run_info.to_csv(savepaths_combined['run_info'], index=False)
    df_obs_info.to_csv(savepaths_combined['obs_info'], index=False)
    df_dc_info.to_csv(savepaths_combined['dc_info'], index=False)
    df_evt_info.to_csv(savepaths_combined['evt_info'], index=False)

    logger.info(df_bti)
    logger.info(df_regions)
    logger.info(df_alerts)
    logger.info(df_lc)
    logger.info(df_run_info)
    logger.info(df_obs_info)
    logger.info(df_dc_info)
    logger.info(df_evt_info)

`make_df_lc_idx(df_lc)`

Calculate the start and end indexs of each of the lightcurves. This is done so that a specific runid + label combination can be quickly accessed.

Source code in exod/processing/pipeline.py

def make_df_lc_idx(df_lc):
    """
    Calculate the start and end indexs of each of the lightcurves.
    This is done so that a specific runid + label combination can be quickly accessed.
    """
    df_lc = df_lc.reset_index(drop=True)
    tot = 0
    combinations = {}
    for (label, runid), i in df_lc.groupby(['label', 'runid']).groups.items():
        combinations[(label, runid)] = (i[0], i[-1])

    df_start_stop = pd.DataFrame.from_dict(combinations, orient='index', columns=['start', 'stop'])
    df_start_stop = df_start_stop.sort_values('start')
    df_start_stop.to_csv(savepaths_combined['lc_idx'])
    logger.info(f'Saved df_lc_idx to {savepaths_combined["lc_idx"]}')

`parameter_grid(obsids)`

usage: for params in parameter_grid(['0792180301', '0792180302', ...]: print(params) pipeline = Pipeline(**params)

Parameters:

Name	Type	Description	Default
`obsids`	`list`	list of obsids	required

Returns params (dict): parameters for a run.

Source code in exod/processing/pipeline.py

def parameter_grid(obsids):
    """
    usage:
    for params in parameter_grid(['0792180301', '0792180302', ...]:
        print(params)
        pipeline = Pipeline(**params)

    Parameters:
        obsids (list): list of obsids

    Returns
        params (dict): parameters for a run.
    """
    parameter_grid = {
        'obsid'                     : obsids,
        'size_arcsec'               : [20.0],
        'time_interval'             : [300, 600, 1000],
        'gti_only'                  : [False],
        'remove_partial_ccd_frames' : [True],
        'energy_ranges'             : [[0.2, 2.0], [2.0, 12.0], [0.2, 12.0]],
        'clobber'                   : [False],
        'threshold_sigma'           : [3],
    }
    parameter_combinations = list(itertools.product(*parameter_grid.values()))
    logger.info(f'{len(parameter_combinations)} parameter combinations\n'
                f'predicted size ~{(500*len(parameter_combinations)) / 1e6} Gb\n'
                f'predicted runtime ~{1*len(parameter_combinations)/60/24:.2f} days (@ 1min/obs)')

    for combination in parameter_combinations:
        params = dict(zip(parameter_grid.keys(), combination))
        params['min_energy'] = params['energy_ranges'][0]
        params['max_energy'] = params['energy_ranges'][1]
        del(params['energy_ranges'])
        yield params

`plot_pixel_lc(cube_mu, cube_n, x, y, plot=True)`

Plot the lightcurve for a specific pixel (x,y), shows the three and five sigma error regions.

Source code in exod/processing/pipeline.py

def plot_pixel_lc(cube_mu, cube_n, x, y, plot=True):
    """Plot the lightcurve for a specific pixel (x,y), shows the three and five sigma error regions."""
    if not plot:
        return None
    cube_mu_xy = cube_mu[x, y]
    cube_data_xy = cube_n.data[x, y]
    time_interval = cube_n.time_interval

    # Plot lightcurves
    lw = 1.0
    frame_axis = np.arange(cube_mu.shape[2])  # Frame Number
    time_axis  = frame_axis * time_interval   # Zero'ed Time
    time_axis2 = cube_n.bin_t[:-1]            # Observation Time

    pbl_3_sig = PrecomputeBayesLimits(threshold_sigma=3)
    pbl_5_sig = PrecomputeBayesLimits(threshold_sigma=5)

    fig, ax = plt.subplots(2, 1, figsize=(15, 5), gridspec_kw={'height_ratios': [10, 1]}, sharex=True)
    ax[0].fill_between(time_axis, pbl_5_sig.n_eclipse_threshold(cube_mu_xy), pbl_5_sig.n_peak_threshold(cube_mu_xy), alpha=0.3, facecolor='steelblue', label=r'5 $\sigma$')
    ax[0].fill_between(time_axis, pbl_3_sig.n_eclipse_threshold(cube_mu_xy), pbl_3_sig.n_peak_threshold(cube_mu_xy), alpha=0.5, facecolor='steelblue', label=r'3 $\sigma$')
    ax[0].step(time_axis, cube_data_xy, color='black', where='mid', lw=lw, label=r'Observed ($n$)')
    ax[0].step(time_axis, cube_mu_xy, color='red', where='mid', lw=lw, label=r'Expected ($\mu$)')
    ax[1].step(time_axis, cube_mu_xy, color='red', where='mid', lw=lw, label=r'Expected ($\mu$)')
    ax2 = ax[0].twiny()
    ax2.plot(frame_axis, cube_data_xy, color='none')
    ax2.set_xlabel("Frame #")
    ax[0].legend()
    ax[1].set_xlabel("Time (s)")
    ax[0].set_ylabel("Counts")
    ax[1].set_ylabel("bkg")
    ax[0].set_ylim(0)
    ax[0].set_xlim(np.min(time_axis), np.max(time_axis))
    plt.subplots_adjust(hspace=0)
    plt.suptitle(f'Lightcurve for pixel x={x} y={y}')

`utils`

`path`

`check_file_exists(file_path, clobber=True)`

Check if a file exists and raise FileExistsError if clobber is False.

Parameters: - file_path (str or Path): The path to the file. - clobber (bool): If True, overwrite the file if it exists.

Raises: - FileExistsError: If the file exists and clobber is False.

Source code in exod/utils/path.py

def check_file_exists(file_path, clobber=True):
    """
    Check if a file exists and raise FileExistsError if clobber is False.

    Parameters:
    - file_path (str or Path): The path to the file.
    - clobber (bool): If True, overwrite the file if it exists.

    Raises:
    - FileExistsError: If the file exists and clobber is False.
    """
    if not clobber and Path(file_path).exists():
        raise FileExistsError(f'File {file_path} exists and clobber={clobber}!')

`create_all_paths()`

Create all paths if they don't exist.

Source code in exod/utils/path.py

def create_all_paths():
    """Create all paths if they don't exist."""
    for name, path in all_paths.items():
        os.makedirs(path, exist_ok=True)

`read_observation_ids(file_path)`

Read observation IDs from file. Each line should be a single observation.

Source code in exod/utils/path.py

def read_observation_ids(file_path):
    """
    Read observation IDs from file.
    Each line should be a single observation.
    """
    with open(file_path, 'r') as file:
        obs_ids = [line.strip() for line in file.readlines()]
    return obs_ids

`plotting`

`cmap_image()`

Colormap used in production of images.

Source code in exod/utils/plotting.py

def cmap_image():
    """Colormap used in production of images."""
    cmap = matplotlib.colormaps['hot']
    cmap.set_bad('black')
    return cmap

`compare_images(images, titles, log=False, plot=False)`

Creates a slideshow of multiple 2D numpy arrays of the same size. Args: images (list of np.ndarray): list of numpy arrays titles (list): titles of images log (bool): use Log normalisation for image plots. plot (bool): switch to disable or enable plotting.

Source code in exod/utils/plotting.py

def compare_images(images, titles, log=False, plot=False):
    """
    Creates a slideshow of multiple 2D numpy arrays of the same size.
    Args:
        images (list of np.ndarray): list of numpy arrays
        titles (list): titles of images
        log (bool): use Log normalisation for image plots.
        plot (bool): switch to disable or enable plotting.
    """
    if not plot:
        return None

    N_images = len(images)

    fig, ax = plt.subplots(figsize=(7, 7))
    ax.set_title(f'images[0]={titles[0]}')
    norm = None
    if log:
        norm = LogNorm()

    im = ax.imshow(images[0], norm=norm, cmap=cmap_image(), interpolation='none')
    plt.colorbar(im, ax=ax, shrink=0.75)
    plt.tight_layout()

    def update(frame):
        i = frame % N_images
        im.set_array(images[i])
        ax.set_title(f'images[{i}]={titles[i]}')

    ani = FuncAnimation(fig, update, frames=range(N_images), interval=1000)  # Adjust frames and interval as needed
    plt.show()

`fig2data_url(fig)`

Convert a matplotlib Figure() to a base64 png data url (for use in a html tag)

Source code in exod/utils/plotting.py

def fig2data_url(fig):
    """Convert a matplotlib Figure() to a base64 png data url (for use in a html <img> tag)"""
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    buf.seek(0)
    data_url = base64.b64encode(buf.read()).decode('ascii')
    return data_url

`get_image_limits(image)`

Get the x and y limits of the non-zero pixels in an image.

Source code in exod/utils/plotting.py

def get_image_limits(image):
    """Get the x and y limits of the non-zero pixels in an image."""
    # Find the non-zero rows and columns
    non_zero_rows = np.any(image != 0, axis=1)
    non_zero_cols = np.any(image != 0, axis=0)

    # Get the min and max of the non-zero rows and columns
    ymin, ymax = np.where(non_zero_rows)[0][[0, -1]]
    xmin, xmax = np.where(non_zero_cols)[0][[0, -1]]

    return (xmin, xmax), (ymin, ymax)

`interactive_scatter_with_rectangles(x_data, y_data)`

Plot x and y data, allowing the user to interactively select rectangles. Prints the (x, y, width, height) of the selected rectangle in the terminal.

Parameters:

Name	Type	Description	Default
`x_data`	`array`	the x coordinates of the scatter data.	required
`y_data`	`array`	the y coordinates of the scatter data.	required

Source code in exod/utils/plotting.py

def interactive_scatter_with_rectangles(x_data, y_data):
    """
    Plot x and y data, allowing the user to interactively select rectangles.
    Prints the (x, y, width, height) of the selected rectangle in the terminal.

    Args:
        x_data (array): the x coordinates of the scatter data.
        y_data (array): the y coordinates of the scatter data.
    """
    fig, ax = plt.subplots()
    ax.scatter(x_data, y_data, color='black', alpha=0.5, s=1)

    def onselect(eclick, erelease):
        x1, y1 = eclick.xdata, eclick.ydata
        x2, y2 = erelease.xdata, erelease.ydata
        width  = abs(x2 - x1)
        height = abs(y2 - y1)

        print(f"({x1:.2f}, {y1:.2f}, {width:.2f}, {height:.2f}) (x,y,w,h)")
        rect = plt.Rectangle((min(x1, x2), min(y1, y2)), width, height, linewidth=2, edgecolor='red', facecolor='none')
        ax.add_patch(rect)
        last_rect = rect
        plt.draw()

    rect_selector = RectangleSelector(ax, onselect, useblit=True, button=[1], minspanx=0.1, minspany=0.1, spancoords='data', interactive=True)
    plt.show()

`plot_3d_image(image)`

Plot an image as a 3D surface

Source code in exod/utils/plotting.py

def plot_3d_image(image):
    """Plot an image as a 3D surface"""
    xx, yy = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    fig = plt.figure(figsize=(15, 15))
    ax = fig.add_subplot(projection='3d')
    ax.plot_surface(xx, yy, image, rstride=1, cstride=1, cmap='plasma', linewidth=0)  # , antialiased=False

    ax.grid(False)
    ax.set_xticks([])
    ax.set_yticks([])
    # ax.set_zticks([])
    # ax.set_zlim(0,100)ax.set_zticks([])
    plt.tight_layout()
    plt.show()

`plot_aitoff(ra_deg, dec_deg, savepath=None, color='grey', title=None)`

Plot ra and dec coordinates on an aitoff plot.

Parameters:

Name	Type	Description	Default
`ra_deg`	`list`	ras in degrees.	required
`dec_deg`	`list`	decs in degrees.	required
`savepath`	`str or Path`	savepath for plot.	`None`
`color`	`str`	color for the marker points.	`'grey'`
`title`	`str`	Title for the plot.	`None`

Source code in exod/utils/plotting.py

def plot_aitoff(ra_deg, dec_deg, savepath=None, color='grey', title=None):
    """
    Plot ra and dec coordinates on an aitoff plot.

    Args:
        ra_deg (list): ras in degrees.
        dec_deg (list): decs in degrees.
        savepath (str or Path): savepath for plot.
        color (str): color for the marker points.
        title (str): Title for the plot.
    """
    sky_coords = SkyCoord(ra=ra_deg, dec=dec_deg, unit='deg', frame='fk5', equinox='J2000')
    gal_coords = sky_coords.galactic

    l = -gal_coords.l.wrap_at(180 * u.deg).radian
    b = gal_coords.b.radian

    plt.figure()

    plt.subplot(111, projection='aitoff')
    plt.scatter(l, b, marker='.', s=1, color=color)
    plt.tight_layout()
    if title:
        plt.title(title)
    if savepath:
        logger.info(f'Saving Aitoff plot to {savepath}')
        plt.savefig(savepath)

`plot_aitoff_density(ra_deg, dec_deg, savepath=None)`

Plot ra and dec coordinates on an aitoff plot and color them by spatial density.

Parameters:

Name	Type	Description	Default
`ra_deg`	`list`	ras in degrees.	required
`dec_deg`	`list`	decs in degrees.	required
`savepath`	`str or Path`	savepath for plot.	`None`

Source code in exod/utils/plotting.py

def plot_aitoff_density(ra_deg, dec_deg, savepath=None):
    """
    Plot ra and dec coordinates on an aitoff plot and color them by spatial density.

    Args:
        ra_deg (list): ras in degrees.
        dec_deg (list): decs in degrees.
        savepath (str or Path): savepath for plot.
    """
    sky_coords = SkyCoord(ra=ra_deg, dec=dec_deg, unit='deg', frame='fk5', equinox='J2000')
    gal_coords = sky_coords.galactic

    l = -gal_coords.l.wrap_at(180 * u.deg).radian
    b = gal_coords.b.radian

    # Calculate the density of points (this is kinda slow)
    xy = np.vstack([l, b])
    z = gaussian_kde(xy)(xy)

    fig = plt.figure(figsize=(5, 2.9))

    plt.subplot(111, projection='aitoff')
    # plt.grid(linewidth=0.5)
    # plt.scatter(l, b, marker='.', s=1, color='cyan', rasterized=True)
    plt.scatter(l, b, marker='.', s=1, c=z, cmap=cmasher.cosmic, rasterized=True)
    plt.tight_layout()

    xticks = ['210°', '240°', '270°', '300°', '330°', '0°', '30°', '60°', '90°', '120°', '150°']
    plt.xticks(fig.get_axes()[0].get_xticks(), xticks)
    # plt.xlabel('Galactic longitude (l)')
    # plt.ylabel('Galactic latitude (b)')
    if savepath:
        logger.info(f'Saving to {savepath}')
        plt.savefig(savepath, dpi=300)

`plot_cube_statistics(data)`

Plot various first order functions for a 3-dimensional data cube.

Source code in exod/utils/plotting.py

def plot_cube_statistics(data):
    """Plot various first order functions for a 3-dimensional data cube."""
    cube = data
    logger.info('Calculating and plotting data cube statistics...')
    image_max = np.nanmax(cube, axis=2)
    image_min = np.nanmin(cube, axis=2)  # The Minimum and median are basically junk
    image_median = np.nanmedian(cube, axis=2)
    image_mean = np.nanmean(cube, axis=2)
    image_std = np.nanstd(cube, axis=2)
    image_sum = np.nansum(cube, axis=2)

    fig, ax = plt.subplots(2, 3, figsize=(15, 10))
    # Plotting images
    cmap = cmap_image()
    im_max = ax[0, 0].imshow(image_max.T, interpolation='none', origin='lower', cmap=cmap)
    im_min = ax[0, 1].imshow(image_min.T, interpolation='none', origin='lower', cmap=cmap)
    im_mean = ax[1, 0].imshow(image_mean.T, interpolation='none', origin='lower', cmap=cmap)
    im_median = ax[1, 1].imshow(image_median.T, interpolation='none', origin='lower', cmap=cmap)
    im_std = ax[1, 2].imshow(image_std.T, interpolation='none', origin='lower', cmap=cmap)
    im_sum = ax[0, 2].imshow(image_sum.T, interpolation='none', origin='lower', cmap=cmap)

    # Adding colorbars
    shrink = 0.55
    cbar_max = fig.colorbar(im_max, ax=ax[0, 0], shrink=shrink)
    cbar_min = fig.colorbar(im_min, ax=ax[0, 1], shrink=shrink)
    cbar_mean = fig.colorbar(im_mean, ax=ax[1, 0], shrink=shrink)
    cbar_median = fig.colorbar(im_median, ax=ax[1, 1], shrink=shrink)
    cbar_std = fig.colorbar(im_std, ax=ax[1, 2], shrink=shrink)
    cbar_sum = fig.colorbar(im_sum, ax=ax[0, 2], shrink=shrink)

    # Setting titles
    ax[0, 0].set_title('max')
    ax[0, 1].set_title('min')
    ax[1, 0].set_title('mean')
    ax[1, 1].set_title('median')
    ax[1, 2].set_title('std')
    ax[0, 2].set_title('sum')
    plt.tight_layout()

    plt.show()

`plot_event_list_ccds(table)`

Plot the images for each CCD for a given eventlist.

Parameters:

Name	Type	Description	Default
`table`	`Table`	Event List Table.	required

Returns:

Name	Type	Description
`fig`	`Figure`	Figure with each subplots containing the image for the specific CCD.

Source code in exod/utils/plotting.py

def plot_event_list_ccds(table):
    """
    Plot the images for each CCD for a given eventlist.

    Parameters:
        table (astropy.table.Table): Event List Table.

    Returns:
        fig (matplotlib.figure.Figure): Figure with each subplots containing the image for the specific CCD.
    """
    ccdnrs = np.unique(table['CCDNR'])
    fig, ax = plt.subplots((len(ccdnrs)+1)//2, 2 , figsize=(20,20))
    ax = ax.flatten()
    for i, ccdnr in enumerate(ccdnrs):
        sub = table[table['CCDNR'] == ccdnr]
        xbins = np.arange(sub['RAWX'].min(), sub['RAWX'].max(), 1)  
        ybins = np.arange(sub['RAWY'].min(), sub['RAWY'].max(), 1) 
        im, _, _ = np.histogram2d(x=sub['RAWX'], y=sub['RAWY'], bins=[xbins,ybins])
        ax[i].imshow(im, cmap='hot', origin='lower', interpolation='none', aspect='equal', norm=LogNorm())
        ax[i].text(5, 5, s=ccdnr, color='white', bbox=dict(facecolor='black', edgecolor='white'))
    plt.show()
    return fig

`plot_frame_masks(instrum, masks, labels, plot=False)`

Helper function to view which frames were masked during processing making of the data cube.

Parameters:

Name	Type	Description	Default
`instrum`	`str`	Name of the instruement.	required
`masks`	`list of np.arrays`	list of masks (true/false arrays) corresponding	required
`labels(list`	`of str`	labels for each mask.	required
`plot`	`bool`	switch for turn on/off plotting	`False`

Source code in exod/utils/plotting.py

def plot_frame_masks(instrum, masks, labels, plot=False):
    """
    Helper function to view which frames were masked during processing making of the data cube.

    Args:
        instrum (str): Name of the instruement.
        masks (list of np.arrays): list of masks (true/false arrays) corresponding
        labels(list of str): labels for each mask.
        plot (bool): switch for turn on/off plotting
    """
    if not plot:
        return None
    cmap = ListedColormap([[1, 0, 0], [0, 1, 0]])
    mask_stack = np.vstack(masks)

    fig, ax = plt.subplots(figsize=(13, 5))
    ax.set_title(f'{instrum} | Frame Masks | Green=True, Red=False')
    ax.imshow(mask_stack, cmap=cmap, aspect='auto', interpolation='none')
    ax.set_yticks(np.arange(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel('Frame Number')
    plt.tight_layout()
    plt.show()

`plot_image(image_arr, title, log=False)`

Plot a 2D numpy array as an image.

Source code in exod/utils/plotting.py

def plot_image(image_arr, title, log=False):
    """Plot a 2D numpy array as an image."""
    plt.figure(figsize=(7,7))
    plt.title(title)
    norm = None
    if log:
        norm = LogNorm()

    plt.imshow(image_arr, norm=norm, cmap=cmap_image(), interpolation='none')
    plt.colorbar(shrink=0.75)
    plt.tight_layout()
    plt.show()

`set_latex_font()`

Set matplotlib global font to STIX (same as latex).

Source code in exod/utils/plotting.py

def set_latex_font():
    """Set matplotlib global font to STIX (same as latex)."""
    matplotlib.rcParams['mathtext.fontset'] = 'stix'
    matplotlib.rcParams['font.family'] = 'STIXGeneral'

`use_scienceplots()`

Use the scienceplots module for matplotlib plots.

Source code in exod/utils/plotting.py

def use_scienceplots():
    """Use the scienceplots module for matplotlib plots."""
    try:
        import scienceplots
        plt.style.use('science')
    except ModuleNotFoundError:
        pass

`synthetic_data`

`check_synthetic_peak_counts_diff(obsid)`

Evaluate differences in peak counts for synthetic bursts added to XMM-Newton data cubes.

Parameters:

Name	Type	Description	Default
`obsid`	`str`	XMM Observation ID	required

Source code in exod/utils/synthetic_data.py

def check_synthetic_peak_counts_diff(obsid):
    """
    Evaluate differences in peak counts for synthetic bursts added to XMM-Newton data cubes.

    Parameters:
        obsid (str): XMM Observation ID
    """
    # Load observation and event data and WCS
    observation = Observation(obsid)
    observation.get_files()
    observation.get_events_overlapping_subsets()
    event_list = EventList.from_event_lists(observation.events_overlapping_subsets[0])
    event_list.read()

    # Parameters for the synthetic test
    size_arcsec = 20
    amplitudes = [0.5, 2, 5, 10]
    time_bins = [500, 1000, 2000]
    num_trials = 40

    all_res = []
    for time_bin in tqdm(time_bins):
        data_cube = DataCube(event_list=event_list, size_arcsec=size_arcsec, time_interval=time_bin)

        for amplitude in tqdm(amplitudes):
            count_differences = []

            for i in range(num_trials):
                # Random position and time fraction for the synthetic burst
                x_pos = np.random.randint(low=5, high=data_cube.data.shape[0] - 5)
                y_pos = np.random.randint(low=5, high=data_cube.data.shape[1] - 5)
                time_fraction = np.random.random()

                # Add synthetic burst to the data cube
                data_cube_with_burst_only = create_fake_onebin_burst(
                    data_cube=data_cube,
                    x_pos=x_pos,
                    y_pos=y_pos,
                    time_peak_fraction=time_fraction,
                    amplitude=amplitude * time_bin
                )

                data_cube_with_burst_added = data_cube.data + data_cube_with_burst_only
                count_difference = np.sum(data_cube_with_burst_added) - np.sum(data_cube.data)
                count_difference = int(count_difference)

                res = {'amplitude'        : amplitude,
                       'time_bin'         : time_bin,
                       'trial_num'        : i,
                       'count difference' : count_difference}
                all_res.append(res)
                count_differences.append(count_difference)

            plt.figure()
            plt.title(f"Burst Amplitude={amplitude} Time Bin={time_bin}\ntrials={num_trials}")
            plt.hist(count_differences, bins=20, label=r"$C_{data+burst} - C_{data}$")
            plt.axvline(x=amplitude * time_bin, color='red', linestyle='--', label="Amplitude*time_bin (Expected)")
            plt.axvline(x=amplitude * time_bin * 1.5, color='green', linestyle='--', label="1.5x Amplitude*time_bin")
            plt.xlabel(r"Difference in Counts")
            plt.ylabel("Frequency")
            plt.legend()
            plt.show()
    df_res = pd.DataFrame(all_res)
    return df_res

`create_fake_Nbins_burst(data_cube, x_pos, y_pos, time_peak_fractions, amplitude)`

Create Many fake bursts in the data cube

Parameters:

Name	Type	Description	Default
`data_cube`	`DataCube`	DataCube() Object.	required
`x_pos`	`int`	x position of the burst.	required
`y_pos`	`int`	y position of the burst.	required
`time_peak_fractions`	`list`	list of time fractions where the bursts should be placed.	required
`amplitude`	`int`	Height of burst.	required

Returns:

Name	Type	Description
`peak_data`	`array`	Same size as data_cube with burst data.

Source code in exod/utils/synthetic_data.py

def create_fake_Nbins_burst(data_cube, x_pos, y_pos, time_peak_fractions, amplitude):
    """
    Create Many fake bursts in the data cube

    Parameters:
        data_cube (DataCube): DataCube() Object.
        x_pos (int): x position of the burst.
        y_pos (int): y position of the burst.
        time_peak_fractions (list): list of time fractions where the bursts should be placed.
        amplitude (int): Height of burst.

    Returns:
        peak_data (np.array): Same size as data_cube with burst data.
    """
    peaks_data = np.zeros(data_cube.shape, dtype=int)
    for time_fraction in time_peak_fractions:
        peaks_data += create_fake_onebin_burst(data_cube, x_pos, y_pos, time_fraction, amplitude)
    return peaks_data

`create_fake_burst(data_cube, x_pos, y_pos, time_peak_fraction, width_time, amplitude)`

Create a fake burst in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

Parameters:

Name	Type	Description	Default
`data_cube`	`DataCube`	DataCube() Object.	required
`x_pos`	`int`	x position of the burst.	required
`y_pos`	`int`	y position of the burst.	required
`time_peak_fraction`	`float`	0.0 - 1.0.	required
`width_time`	`float`	how long the burst lasts in seconds.	required
`amplitude`	`int`	Height of burst.	required

Returns:

Name	Type	Description
`peak_data`	`array`	Same size as data_cube with burst data.

Source code in exod/utils/synthetic_data.py

def create_fake_burst(data_cube, x_pos, y_pos, time_peak_fraction, width_time, amplitude):
    """
    Create a fake burst in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

    Parameters:
        data_cube (DataCube): DataCube() Object.
        x_pos (int): x position of the burst.
        y_pos (int): y position of the burst.
        time_peak_fraction (float): 0.0 - 1.0.
        width_time (float): how long the burst lasts in seconds.
        amplitude (int): Height of burst.

    Returns:
        peak_data (np.array): Same size as data_cube with burst data.
    """

    time       = np.arange(0, data_cube.shape[2])
    peak_data  = np.zeros(data_cube.shape, dtype=int)
    time_peak  = time_peak_fraction * len(time)
    width_bins = width_time / data_cube.time_interval

    #Poissonian PSF
    # 6.6" FWHM, 4.1" per pixel, so (6.6/4.1) pixel FWHM, 2.355 to convert FWHM in sigma but seems too much
    sigma_2d = (6.6 / data_cube.size_arcsec) # * 2.355

    # Pixels to iterate over
    r = 10 # Size of box to calculate values
    x_lo, x_hi = max(0,int(x_pos-r*sigma_2d)), min(int(x_pos+r*sigma_2d), peak_data.shape[0]-1)
    y_lo, y_hi = max(0,int(y_pos-r*sigma_2d)), min(int(y_pos+r*sigma_2d), peak_data.shape[1]-1)

    for x in range(x_lo, x_hi):
        for y in range(y_lo, y_hi):
            dist_sq    = (x - x_pos)**2 + (y - y_pos)**2 # Distance^2 from burst center
            psf        = 1 / (2*np.pi*(sigma_2d**2)) * np.exp(-(dist_sq) / (2*(sigma_2d**2)))
            bin_means  = psf * amplitude * data_cube.time_interval * np.exp(-(time-time_peak)**2 / (2*(width_bins**2)))
            bin_values = np.random.poisson(lam=bin_means)
            # print(x, y, dist_sq, psf)
            # print(bin_means, bin_values)
            peak_data[x,y] += bin_values
    #peak_data = convolve(peak_data, np.ones((3,3,1), dtype=np.int64),mode='constant', cval=0.0)
    return peak_data

`create_fake_eclipse(data_cube, x_pos, y_pos, time_peak_fraction, width_time, amplitude, constant_level)`

Create a fake eclipse in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

Parameters:

Name	Type	Description	Default
`data_cube`	`DataCube`	DataCube() Object.	required
`x_pos`	`int`	x position of the burst.	required
`y_pos`	`int`	y position of the burst.	required
`time_peak_fraction`	`float`	0.0 - 1.0.	required
`width_time`	`float`	how long the Eclispe lasts in seconds.	required
`amplitude`	`int`	Amplitude of Eclipse	required

Returns:

Name	Type	Description
`peak_data`	`array`	Same size as data_cube with Eclipse data.

Source code in exod/utils/synthetic_data.py

def create_fake_eclipse(data_cube, x_pos, y_pos, time_peak_fraction, width_time, amplitude, constant_level):
    """
    Create a fake eclipse in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

    Parameters:
        data_cube (DataCube): DataCube() Object.
        x_pos (int): x position of the burst.
        y_pos (int): y position of the burst.
        time_peak_fraction (float): 0.0 - 1.0.
        width_time (float): how long the Eclispe lasts in seconds.
        amplitude (int): Amplitude of Eclipse

    Returns:
        peak_data (np.array): Same size as data_cube with Eclipse data.
    """

    time         = np.arange(0, data_cube.shape[2])
    eclipse_data = np.zeros(data_cube.shape, dtype=int)
    time_peak    = time_peak_fraction * len(time)
    width_bins   = width_time / data_cube.time_interval

    #Poissonian PSF
    # 6.6" FWHM, 4.1" per pixel, so (6.6/4.1) pixel FWHM, 2.355 to convert FWHM in sigma but seems too much
    sigma_2d = (6.6 / data_cube.size_arcsec) * 2.355

    # Pixels to iterate over
    r = 10 # Size of box to calculate values
    x_lo, x_hi = max(0, int(x_pos-r*sigma_2d)), min(int(x_pos+r*sigma_2d), eclipse_data.shape[0]-1)
    y_lo, y_hi = max(0, int(y_pos-r*sigma_2d)), min(int(y_pos+r*sigma_2d), eclipse_data.shape[1]-1)

    for x in range(x_lo, x_hi):
        for y in range(y_lo, y_hi):
            dist_sq    = (x - x_pos)**2 + (y - y_pos)**2 # Distance^2 from burst center
            psf        = 1 / (2*np.pi*np.sqrt(sigma_2d)) * np.exp(-(dist_sq) / (2*(sigma_2d**2)))
            bin_means  = psf * data_cube.time_interval * (constant_level - amplitude * np.exp(-(time-time_peak)**2 / (2*(width_bins**2))))
            bin_values = np.random.poisson(lam=bin_means)
            # print(x, y, dist_sq, psf)
            # print(bin_means, bin_values)
            eclipse_data[x,y] += bin_values
    #peak_data = convolve(peak_data, np.ones((3,3,1), dtype=np.int64),mode='constant', cval=0.0)
    return eclipse_data

`create_fake_onebin_burst(data_cube, x_pos, y_pos, time_peak_fraction, amplitude)`

Create a fake burst in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

Parameters:

Name	Type	Description	Default
`data_cube`	`DataCube`	DataCube() Object.	required
`x_pos`	`int`	x position of the burst.	required
`y_pos`	`int`	y position of the burst.	required
`time_peak_fraction`	`float`	0.0 - 1.0.	required
`amplitude`	`int`	Height of burst.	required

Returns:

Name	Type	Description
`peak_data`	`array`	Same size as data_cube with burst data.

Source code in exod/utils/synthetic_data.py

def create_fake_onebin_burst(data_cube, x_pos, y_pos, time_peak_fraction, amplitude):
    """
    Create a fake burst in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

    Parameters:
        data_cube (DataCube): DataCube() Object.
        x_pos (int): x position of the burst.
        y_pos (int): y position of the burst.
        time_peak_fraction (float): 0.0 - 1.0.
        amplitude (int): Height of burst.

    Returns:
        peak_data (np.array): Same size as data_cube with burst data.
    """
    time        = np.arange(0, data_cube.shape[2])
    peak_data   = np.zeros(data_cube.shape, dtype=int)
    time_index  = int(time_peak_fraction * len(time))

    # Poissonian PSF
    # 6.6" FWHM, 4.1" per pixel, so (6.6/4.1) pixel FWHM, 2.355 to convert FWHM in sigma but seems too much
    sigma_2d = (6.6 / data_cube.size_arcsec) # * 2.355

    # Pixels to iterate over
    r = 10 # Size of box to calculate values
    x_lo, x_hi = max(0, int(x_pos-r*sigma_2d)), min(int(x_pos+r*sigma_2d), peak_data.shape[0]-1)
    y_lo, y_hi = max(0, int(y_pos-r*sigma_2d)), min(int(y_pos+r*sigma_2d), peak_data.shape[1]-1)


    for x in range(x_lo, x_hi):
        for y in range(y_lo, y_hi):
            dist_sq    = (x - x_pos)**2 + (y - y_pos)**2 # Distance^2 from burst center
            psf        = (1 / (2*np.pi*(sigma_2d**2))) * np.exp(-(dist_sq) / (2*(sigma_2d**2)))
            bin_means  = psf * amplitude
            bin_values = np.random.poisson(lam=bin_means)
            # print(x, y, dist_sq, psf)
            # print(bin_means, bin_values)
            peak_data[x,y,time_index] += bin_values
    #peak_data = convolve(peak_data, np.ones((3,3,1), dtype=np.int64),mode='constant', cval=0.0)
    return peak_data

`create_multiple_fake_eclipses(data_cube, tab_x_pos, tab_y_pos, tab_time_peak_fraction, tab_width_time, tab_amplitude, tab_constant_level)`

Create Many fake eclipses in the data cube

Parameters:

Name	Type	Description	Default
`data_cube`	`DataCube`	DataCube() Object.	required
`tab_x_pos`	`list`	list of x positions of the Eclipses.	required
`tab_y_pos`	`list`	list of y positions of the Eclipses.	required
`tab_time_peak_fraction`	`list`	list of time fractions where the Eclipses should be placed.	required
`tab_width_time`	`list`	list of how long the Eclipses lasts in seconds.	required
`tab_amplitude`	`list`	list of amplitudes of Eclipses	required

Returns:

Name	Type	Description
`peak_data`	`array`	Same size as data_cube with Eclipse data.

Source code in exod/utils/synthetic_data.py

def create_multiple_fake_eclipses(data_cube, tab_x_pos, tab_y_pos, tab_time_peak_fraction, tab_width_time, tab_amplitude, tab_constant_level):
    """
    Create Many fake eclipses in the data cube

    Parameters:
        data_cube (DataCube): DataCube() Object.
        tab_x_pos (list): list of x positions of the Eclipses.
        tab_y_pos (list): list of y positions of the Eclipses.
        tab_time_peak_fraction (list): list of time fractions where the Eclipses should be placed.
        tab_width_time (list): list of how long the Eclipses lasts in seconds.
        tab_amplitude (list): list of amplitudes of Eclipses

    Returns:
        peak_data (np.array): Same size as data_cube with Eclipse data.
    """

    time          = np.arange(0, data_cube.shape[2])
    eclipse_data  = np.zeros(data_cube.shape, dtype=int)

    for time_peak_fraction, width_time, x_pos, y_pos, constant_level, amplitude in\
        zip(tab_time_peak_fraction,tab_width_time,tab_x_pos,tab_y_pos,tab_constant_level,tab_amplitude):
        time_peak  = time_peak_fraction * (len(time)-1)
        width_bins = width_time / data_cube.time_interval

        #Poissonian PSF
        # 6.6" FWHM, 4.1" per pixel, so (6.6/4.1) pixel FWHM, 2.355 to convert FWHM in sigma but seems too much
        sigma_2d = (6.6 / data_cube.size_arcsec) * 1.4 # * 2.355

        # Pixels to iterate over
        r = 10 # Size of box to calculate values
        x_lo, x_hi = max(0, int(x_pos - r * sigma_2d)), min(int(x_pos + r * sigma_2d), eclipse_data.shape[0] - 1)
        y_lo, y_hi = max(0, int(y_pos - r * sigma_2d)), min(int(y_pos + r * sigma_2d), eclipse_data.shape[1] - 1)
        if np.nansum(data_cube.data[x_pos,y_pos]) > 0:
            for x in range(x_lo, x_hi):
                for y in range(y_lo, y_hi):
                    dist_sq    = (x - x_pos)**2 + (y - y_pos)**2 # Distance^2 from burst center
                    psf        = 1 / (2*np.pi*np.sqrt(sigma_2d)) * np.exp(-(dist_sq) / (2*(sigma_2d**2)))
                    bin_means  = psf * data_cube.time_interval * (constant_level - amplitude * np.exp(-(time-time_peak)**2 / (2*(width_bins**2))))
                    bin_values = np.random.poisson(lam=bin_means)
                    # print(x, y, dist_sq, psf)
                    # print(bin_means, bin_values)
                    eclipse_data[x,y] += bin_values
    #peak_data = convolve(peak_data, np.ones((3,3,1), dtype=np.int64),mode='constant', cval=0.0)
    return eclipse_data

`util`

`save_result(key, value, runid, savedir)`

Save a label/value pair to a .csv file.

Source code in exod/utils/util.py

def save_result(key, value, runid, savedir):
    """Save a label/value pair to a .csv file."""
    if isinstance(value, pd.DataFrame):
        # Append the runid to the dataframe if it is not there
        if 'runid' not in value.columns:
            value['runid'] = runid
        save_df(df=value, savepath=savedir / f'{key}.csv')
    elif isinstance(value, dict):
        # append the runid to the dictionary if it is not there
        value['runid'] = runid
        save_info(dictionary=value, savepath=savedir / f'{key}.csv')
    else:
        logger.warning(f'{key} {value} is not a dict or df!!')

`xmm`

`bad_obs`

These observations are those that contained the most alerts using EXOD.

I have gone through them all and manually selected those that are a result of bad data, in a lot of cases, many spurious alerts are generated by the presence of extended sources, such as galaxy clusters or supernova remnants.

`event_list`

`EventList`

Class for handling EventList objects.

Attributes:

Name	Type	Description
`path`	`Path`	Path to the event list file.
`filename`	`str`	Name of the event list file.
`data`	`Table`	Data from the event list file.
`is_read`	`bool`	Flag indicating if the event list file has been read.
`is_merged`	`bool`	Flag indicating if the event list is a merged one.

Methods:

Name	Description
`read`	Reads the event list file and extracts necessary information.
`from_event_lists`	Creates a merged EventList from a list of existing ones.
`filter_by_energy`	Filters the event list data by energy range.
`check_submode`	Checks if the submode of the event list is supported.
`is_supported_submode`	Returns if the submode of the event list is supported.
`remove_MOS_central_ccd`	Removes the central CCD of MOS if not in PrimeFullWindow.
`remove_bad_rows`	Removes bad PN rows as per Struder et al. 2001b.
`remove_borders`	Removes borders from the event list data.
`get_ccd_bins`	Gets the CCD bins for the instrument.
`unload_data`	Unloads the data from the event list to save memory.
`info`	Returns a dictionary containing information about the event list.

Source code in exod/xmm/event_list.py

class EventList:
    """
    Class for handling EventList objects.

    Attributes:
        path (Path): Path to the event list file.
        filename (str): Name of the event list file.
        data (Table): Data from the event list file.
        is_read (bool): Flag indicating if the event list file has been read.
        is_merged (bool): Flag indicating if the event list is a merged one.

    Methods:
        read: Reads the event list file and extracts necessary information.
        from_event_lists: Creates a merged EventList from a list of existing ones.
        filter_by_energy: Filters the event list data by energy range.
        check_submode: Checks if the submode of the event list is supported.
        is_supported_submode: Returns if the submode of the event list is supported.
        remove_MOS_central_ccd: Removes the central CCD of MOS if not in PrimeFullWindow.
        remove_bad_rows: Removes bad PN rows as per Struder et al. 2001b.
        remove_borders: Removes borders from the event list data.
        get_ccd_bins: Gets the CCD bins for the instrument.
        unload_data: Unloads the data from the event list to save memory.
        info: Returns a dictionary containing information about the event list.
    """
    def __init__(self, path):
        self.path      = Path(path)
        self.filename  = self.path.name
        self.data      = None
        self.is_read   = False
        self.is_merged = False

        self.N_event_lists = 1

    def __repr__(self):
        return f'EventList({self.path})'

    def read(self, remove_bad_rows=True, remove_borders=True, remove_MOS_central_ccd=True, remove_hot_pixels=True):
        if self.is_read:
            return None
        self.hdul = fits.open(self.path)
        self.header = self.hdul[1].header
        self.data = Table(self.hdul[1].data)

        self.obsid      = self.header['OBS_ID']
        self.instrument = self.header['INSTRUME']
        self.submode    = self.header['SUBMODE']
        self.date       = self.header['DATE-OBS']
        self.revolution = self.header['REVOLUT']
        self.object     = self.header['OBJECT']
        self.exposure   = self.header['TELAPSE']
        self.pnt_angle  = self.header['PA_PNT']
        # self.N_events   = self.header['NAXIS2']
        # self.mean_rate  = self.N_events / self.exposure
        self.time_min   = np.min(self.data['TIME'])
        self.time_max   = np.max(self.data['TIME'])

        self.data['INSTRUMENT'] = self.instrument

        # self.check_submode()
        if remove_bad_rows:
            self.remove_bad_rows()
        if remove_borders:
            self.remove_borders()
        if remove_MOS_central_ccd:
            self.remove_MOS_central_ccd()
        if remove_hot_pixels:
            self.remove_hot_pixels()
        self.is_read = True

    @classmethod
    def from_event_lists(cls, event_lists):
        """
        Create a merged event list from a list of existing ones.
        Args:
            event_lists (list): list of EventList() objects.
        Returns:
            event_list (EventList): New merged eventlist object.
        """
        # EventList Object to return
        event_list = cls.__new__(cls) # Create the object without calling .__init__()

        for e in event_lists:
            e.read()

        # Remove unsupported EventLists
        event_lists = [e for e in event_lists if e.is_supported_submode()]
        if not event_lists:
            raise NotImplementedError(f'None of the supplied event lists are in a supported submode!')

        # Store the starts and ends of all event lists
        starts = [e.time_min for e in event_lists]
        stops = [e.time_max for e in event_lists]
        latest_start = max(starts)
        earliest_stop = min(stops)

        # Combine the data into a single table
        data_stacked = vstack([e.data for e in event_lists])

        # Crop when instruments are not all online
        # data_stacked = data_stacked[(data_stacked['TIME']>latest_start)&(data_stacked['TIME']<earliest_stop)]

        # Unload the data from the constituent event lists to save memory
        # for e in event_lists:
        #     e.unload_data()

        # Write Attributes
        event_list.path     = 'merged'
        event_list.filename = str([e.filename for e in event_lists])
        event_list.is_read  = True
        event_list.is_merged = True

        event_list.hdul   = None
        event_list.header = None
        event_list.data   = data_stacked

        event_list.event_lists   = event_lists
        event_list.N_event_lists = len(event_lists)
        event_list.obsid         = event_lists[0].obsid
        event_list.instrument    = [e.instrument for e in event_lists]
        event_list.submode       = [e.submode for e in event_lists]
        event_list.date          = event_lists[0].date
        event_list.revolution    = event_lists[0].revolution
        event_list.object        = event_lists[0].object
        event_list.time_min      = latest_start
        event_list.time_max      = earliest_stop
        event_list.exposure      = event_list.time_max - event_list.time_min
        event_list.pnt_angle     = event_lists[0].pnt_angle
        # event_list.N_events      = len(data_stacked)
        # event_list.mean_rate     = event_list.N_events / event_list.exposure
        return event_list

    def filter_by_energy(self, min_energy, max_energy):
        logger.info(f'Filtering Events list by energy min_energy={min_energy} max_energy={max_energy}')
        self.data = self.data[(min_energy * 1000 < self.data['PI']) & (self.data['PI'] < max_energy * 1000)]

    def check_submode(self):
        if not ALL_SUBMODES[self.submode]:
            raise NotImplementedError(f"The submode {self.submode} is not supported")

    def is_supported_submode(self):
        return ALL_SUBMODES[self.submode]

    def remove_MOS_central_ccd(self):
        if (self.instrument in ('EMOS1','EMOS2')) and (self.submode!='PrimeFullWindow'):
            logger.info('Removing central CCD of MOS because NOT in PrimeFullWindow')
            self.data = self.data[~(self.data['CCDNR'] == 1)]
            if len(self.data) == 0:
                raise ValueError(f'No data in EventList after removing central MOS events!')

    def remove_bad_rows(self):
        if self.instrument == 'EPN':
            logger.info('Removing bad PN rows Struder et al. 2001b')
            self.data = self.data[~((self.data['CCDNR'] == 4) & (self.data['RAWX'] == 12)) &
                                  ~((self.data['CCDNR'] == 5) & (self.data['RAWX'] == 11)) &
                                  ~((self.data['CCDNR'] == 10) & (self.data['RAWX'] == 28))]

    def remove_hot_pixels(self):
        """Remove hot pixels from the event list data."""
        hot_pixels = {'EPN': data_util / 'hotpix_PN.csv',
                      'EMOS1': data_util / 'hotpix_M1.csv',
                      'EMOS2': data_util / 'hotpix_M2.csv'}
        pre = self.N_events
        tab_hotpix = Table.read(hot_pixels[self.instrument], format='csv')
        tab_hotpix = tab_hotpix[(self.revolution > tab_hotpix['REV1']) & (self.revolution < tab_hotpix['REV2'])]
        if len(tab_hotpix) == 0:
            logger.info(f'No Hot Pixels found for {self.instrument}')
            return None
        self.data = setdiff(table1=self.data, table2=tab_hotpix, keys=['CCDNR', 'RAWX', 'RAWY'])
        logger.info(f'Removed Hot Pixels {self.instrument:<5} | Pre: {pre:,} Post: {self.N_events:,} (-{pre - self.N_events:,})')


    def remove_borders(self):
        """
        For PN the RAWY is the long axis. The removal of 1px from each side gets rid of the weird hot-spot
        that appears between two of the CCDs.

        PrimeFullWindow             PrimeLargeWindow   PrimeSmallWindow
        & PrimeFullWindowExtended
        RAWY MAX: 200               RAWY MAX: 200      RAWY MAX: 200
        RAWY MIN: 13                RAWY MIN: 102      RAWY MIN: 137
        RAWX MAX: 64                RAWX MAX: 64       RAWX MAX: 64
        RAWX MIN: 1                 RAWX MIN: 1        RAWX MIN: 1
        """
        margin = 3
        if self.instrument == 'EPN' and (self.submode == 'PrimeFullWindow' or self.submode == 'PrimeFullWindowExtended'):
            logger.info(f'Removing Borders: {self.instrument} {self.submode}')
            self.data = self.data[self.data['RAWY'] > 20+margin]
            self.data = self.data[self.data['RAWY'] < 200-margin]
            self.data = self.data[self.data['RAWX'] < 64-margin]
            self.data = self.data[self.data['RAWX'] > 1+margin]

        if self.instrument == 'EPN' and self.submode == 'PrimeLargeWindow':
            logger.info(f'Removing Borders: {self.instrument} {self.submode}')
            self.data = self.data[self.data['RAWY'] < 200-margin]
            self.data = self.data[self.data['RAWX'] < 64-margin]
            self.data = self.data[self.data['RAWX'] > 1+margin]

    def get_ccd_bins(self):
        """Get the CCD bins for the instrument. This is used in calculating the bad CCD bins."""
        ccd_bins = list(set(self.data['CCDNR']))
        ccd_bins.append(ccd_bins[-1] + 1)  # Just to get a right edge for the final bin
        return ccd_bins

    def get_events_within_bounds(self, X_lo, X_hi, Y_lo, Y_hi, TIME_lo, TIME_hi):
        evt_subset = self.data[(self.data['X'] > X_lo)       & (self.data['X'] < X_hi) &
                               (self.data['Y'] > Y_lo)       & (self.data['Y'] < Y_hi) &
                               (self.data['TIME'] > TIME_lo) & (self.data['TIME'] < TIME_hi)]
        return evt_subset

    def get_high_energy_lc(self, time_interval):
        min_energy_he = 10.0     # minimum extraction energy for High Energy Background events
        max_energy_he = 12.0     # maximum extraction energy for High Energy Background events
        time_interval_gti = min(time_interval, 100) #100  # Window Size to use for GTI extraction
        data = self.data
        time_min = self.time_min
        time_max = self.time_max
        logger.info(f'min_energy_he = {min_energy_he} max_energy_he = {max_energy_he} time_interval_gti = {time_interval_gti}')
        data_he = np.array(data['TIME'][(data['PI'] > min_energy_he * 1000) & (data['PI'] < max_energy_he * 1000)])

        t_bin_he = np.arange(time_min, time_max, time_interval_gti)
        lc_he = np.histogram(data_he, bins=t_bin_he)[0] / time_interval_gti  # Divide by the bin size to get in ct/s
        return t_bin_he, lc_he

    def unload_data(self):
        del(self.data)
        self.is_read = False

    @property
    def N_events(self):
        return len(self.data)

    @property
    def mean_rate(self):
        return self.N_events / self.exposure

    @property
    def info(self):
        info = {
            'filename'      : self.filename,
            'obsid'         : self.obsid,
            'N_event_lists' : self.N_event_lists,
            'instrument'    : self.instrument,
            'submode'       : self.submode,
            'date'          : self.date,
            'revolution'    : self.revolution,
            'object'        : self.object,
            'exposure'      : self.exposure,
            'pnt_angle'     : self.pnt_angle,
            'N_events'      : self.N_events,
            'mean_rate'     : self.mean_rate
            }
        for k, v in info.items():
            logger.info(f'{k:>14} : {v}')
        return info

`from_event_lists(event_lists)` `classmethod`

Create a merged event list from a list of existing ones. Args: event_lists (list): list of EventList() objects. Returns: event_list (EventList): New merged eventlist object.

Source code in exod/xmm/event_list.py

@classmethod
def from_event_lists(cls, event_lists):
    """
    Create a merged event list from a list of existing ones.
    Args:
        event_lists (list): list of EventList() objects.
    Returns:
        event_list (EventList): New merged eventlist object.
    """
    # EventList Object to return
    event_list = cls.__new__(cls) # Create the object without calling .__init__()

    for e in event_lists:
        e.read()

    # Remove unsupported EventLists
    event_lists = [e for e in event_lists if e.is_supported_submode()]
    if not event_lists:
        raise NotImplementedError(f'None of the supplied event lists are in a supported submode!')

    # Store the starts and ends of all event lists
    starts = [e.time_min for e in event_lists]
    stops = [e.time_max for e in event_lists]
    latest_start = max(starts)
    earliest_stop = min(stops)

    # Combine the data into a single table
    data_stacked = vstack([e.data for e in event_lists])

    # Crop when instruments are not all online
    # data_stacked = data_stacked[(data_stacked['TIME']>latest_start)&(data_stacked['TIME']<earliest_stop)]

    # Unload the data from the constituent event lists to save memory
    # for e in event_lists:
    #     e.unload_data()

    # Write Attributes
    event_list.path     = 'merged'
    event_list.filename = str([e.filename for e in event_lists])
    event_list.is_read  = True
    event_list.is_merged = True

    event_list.hdul   = None
    event_list.header = None
    event_list.data   = data_stacked

    event_list.event_lists   = event_lists
    event_list.N_event_lists = len(event_lists)
    event_list.obsid         = event_lists[0].obsid
    event_list.instrument    = [e.instrument for e in event_lists]
    event_list.submode       = [e.submode for e in event_lists]
    event_list.date          = event_lists[0].date
    event_list.revolution    = event_lists[0].revolution
    event_list.object        = event_lists[0].object
    event_list.time_min      = latest_start
    event_list.time_max      = earliest_stop
    event_list.exposure      = event_list.time_max - event_list.time_min
    event_list.pnt_angle     = event_lists[0].pnt_angle
    # event_list.N_events      = len(data_stacked)
    # event_list.mean_rate     = event_list.N_events / event_list.exposure
    return event_list

`get_ccd_bins()`

Get the CCD bins for the instrument. This is used in calculating the bad CCD bins.

Source code in exod/xmm/event_list.py

def get_ccd_bins(self):
    """Get the CCD bins for the instrument. This is used in calculating the bad CCD bins."""
    ccd_bins = list(set(self.data['CCDNR']))
    ccd_bins.append(ccd_bins[-1] + 1)  # Just to get a right edge for the final bin
    return ccd_bins

`remove_borders()`

For PN the RAWY is the long axis. The removal of 1px from each side gets rid of the weird hot-spot that appears between two of the CCDs.

PrimeFullWindow PrimeLargeWindow PrimeSmallWindow & PrimeFullWindowExtended RAWY MAX: 200 RAWY MAX: 200 RAWY MAX: 200 RAWY MIN: 13 RAWY MIN: 102 RAWY MIN: 137 RAWX MAX: 64 RAWX MAX: 64 RAWX MAX: 64 RAWX MIN: 1 RAWX MIN: 1 RAWX MIN: 1

Source code in exod/xmm/event_list.py

def remove_borders(self):
    """
    For PN the RAWY is the long axis. The removal of 1px from each side gets rid of the weird hot-spot
    that appears between two of the CCDs.

    PrimeFullWindow             PrimeLargeWindow   PrimeSmallWindow
    & PrimeFullWindowExtended
    RAWY MAX: 200               RAWY MAX: 200      RAWY MAX: 200
    RAWY MIN: 13                RAWY MIN: 102      RAWY MIN: 137
    RAWX MAX: 64                RAWX MAX: 64       RAWX MAX: 64
    RAWX MIN: 1                 RAWX MIN: 1        RAWX MIN: 1
    """
    margin = 3
    if self.instrument == 'EPN' and (self.submode == 'PrimeFullWindow' or self.submode == 'PrimeFullWindowExtended'):
        logger.info(f'Removing Borders: {self.instrument} {self.submode}')
        self.data = self.data[self.data['RAWY'] > 20+margin]
        self.data = self.data[self.data['RAWY'] < 200-margin]
        self.data = self.data[self.data['RAWX'] < 64-margin]
        self.data = self.data[self.data['RAWX'] > 1+margin]

    if self.instrument == 'EPN' and self.submode == 'PrimeLargeWindow':
        logger.info(f'Removing Borders: {self.instrument} {self.submode}')
        self.data = self.data[self.data['RAWY'] < 200-margin]
        self.data = self.data[self.data['RAWX'] < 64-margin]
        self.data = self.data[self.data['RAWX'] > 1+margin]

`remove_hot_pixels()`

Remove hot pixels from the event list data.

Source code in exod/xmm/event_list.py

def remove_hot_pixels(self):
    """Remove hot pixels from the event list data."""
    hot_pixels = {'EPN': data_util / 'hotpix_PN.csv',
                  'EMOS1': data_util / 'hotpix_M1.csv',
                  'EMOS2': data_util / 'hotpix_M2.csv'}
    pre = self.N_events
    tab_hotpix = Table.read(hot_pixels[self.instrument], format='csv')
    tab_hotpix = tab_hotpix[(self.revolution > tab_hotpix['REV1']) & (self.revolution < tab_hotpix['REV2'])]
    if len(tab_hotpix) == 0:
        logger.info(f'No Hot Pixels found for {self.instrument}')
        return None
    self.data = setdiff(table1=self.data, table2=tab_hotpix, keys=['CCDNR', 'RAWX', 'RAWY'])
    logger.info(f'Removed Hot Pixels {self.instrument:<5} | Pre: {pre:,} Post: {self.N_events:,} (-{pre - self.N_events:,})')

`observation`

`Observation`

Class for handling XMM-Newton observations.

Attributes:

Name	Type	Description
`obsid`	`str`	The observation ID.
`path_raw`	`Path`	Path to the raw data directory (unprocessed).
`path_processed`	`Path`	Path to the processed data directory (filtered).
`path_results`	`Path`	Path to the results directory.
`events_raw`	`list`	List of raw event lists.
`events_processed`	`list`	List of processed event lists.
`events_processed_pn`	`list`	List of processed PN event lists.
`events_processed_mos1`	`list`	List of processed MOS1 event lists.
`events_processed_mos2`	`list`	List of processed MOS2 event lists.
`events_overlapping_subsets`	`list`	List of overlapping event list subsets.
`images`	`list`	List of images.
`source_list`	`list`	List of source lists (OMSMLI files).

Source code in exod/xmm/observation.py

class Observation:
    """
    Class for handling XMM-Newton observations.

    Attributes:
        obsid (str): The observation ID.
        path_raw (Path): Path to the raw data directory (unprocessed).
        path_processed (Path): Path to the processed data directory (filtered).
        path_results (Path): Path to the results directory.
        events_raw (list): List of raw event lists.
        events_processed (list): List of processed event lists.
        events_processed_pn (list): List of processed PN event lists.
        events_processed_mos1 (list): List of processed MOS1 event lists.
        events_processed_mos2 (list): List of processed MOS2 event lists.
        events_overlapping_subsets (list): List of overlapping event list subsets.
        images (list): List of images.
        source_list (list): List of source lists (OMSMLI files).
    """
    def __init__(self, obsid):
        self.obsid = obsid
        self.path_raw = data_raw / obsid
        self.path_processed = data_processed / obsid
        self.path_results = data_results / obsid
        self.make_dirs()

        self.events_raw = []
        self.events_processed = []
        self.events_processed_pn = []
        self.events_processed_mos1 = []
        self.events_processed_mos2 = []
        self.events_overlapping_subsets = []

        self.images = []

        self.source_list = []

    def __repr__(self):
        return f"Observation('{self.obsid}')"

    def make_dirs(self):
        os.makedirs(self.path_raw, exist_ok=True)
        os.makedirs(self.path_processed, exist_ok=True)
        os.makedirs(self.path_results, exist_ok=True)

    def download_events(self):
        download_observation_events(self.obsid)

    def filter_events(self, min_energy=0.2, max_energy=12.0, clobber=False):
        filter_obsid_events(observation=self, min_energy=min_energy, max_energy=max_energy, clobber=clobber)

    def create_images(self, ximagebinsize=80, yimagebinsize=80, clobber=False):
        create_obsid_images(observation=self, ximagebinsize=ximagebinsize, yimagebinsize=yimagebinsize, clobber=clobber)

    def get_event_lists_raw(self):
        evt_raw = list(self.path_raw.glob('*EVLI*FTZ'))
        self.events_raw = [EventList(e) for e in evt_raw]

    def get_event_lists_processed(self):
        evt_processed = list(self.path_processed.glob('*FILT.fits'))
        self.events_processed      = [EventList(e) for e in evt_processed]
        self.events_processed_pn   = [e for e in self.events_processed if 'PI' in e.filename]
        self.events_processed_mos1 = [e for e in self.events_processed if 'M1' in e.filename]
        self.events_processed_mos2 = [e for e in self.events_processed if 'M2' in e.filename]

    def get_source_list(self):
        try:
            self.source_list = list(self.path_raw.glob('*EP*OBSMLI*FTZ'))[0]
        except IndexError:
            raise NotImplementedError(f'No EPIC OBSMLI file found!')

    def get_images(self):
        img_processed = list(self.path_processed.glob('*IMG.fits'))
        self.images = [Image(i) for i in img_processed]

    def get_files(self):
        logger.info(f'Getting {self} Files...')
        self.get_event_lists_raw()
        self.get_event_lists_processed()
        self.get_images()
        self.get_source_list()

    def get_events_overlapping_subsets(self):
        """
        Get the overlapping eventlists for a given observation.

        Returns the subsets as a list of lists
        [[001PI.fits, 001M1.fits, 001M2.fits], [002PI.fits, 002M1.fits, 002M2.fits]]
        """
        self.get_event_lists_processed()

        if len(self.events_processed) == 0:
            raise KeyError(f'No eventlists found for {self.obsid}')

        subsets = get_overlapping_eventlist_subsets(self.events_processed)
        self.events_overlapping_subsets = subsets
        return self.events_overlapping_subsets


    def get_number_of_overlapping_subsets(self):
        return len(self.events_overlapping_subsets)


    @property
    def info(self):
        info = {'obsid' : self.obsid}

        for i, evt in enumerate(self.events_raw):
            info[f'evt_raw_{i}'] = evt.filename

        for i, evt in enumerate(self.events_processed):
            info[f'evt_filt_{i}'] = evt.filename

        for i, img in enumerate(self.images):
            info[f'img_{i}'] = img.filename

        info[f'source_list'] = self.source_list.name

        for k, v in info.items():
            logger.info(f'{k:>10} : {v}')
        return info

`get_events_overlapping_subsets()`

Get the overlapping eventlists for a given observation.

Returns the subsets as a list of lists [[001PI.fits, 001M1.fits, 001M2.fits], [002PI.fits, 002M1.fits, 002M2.fits]]

Source code in exod/xmm/observation.py

def get_events_overlapping_subsets(self):
    """
    Get the overlapping eventlists for a given observation.

    Returns the subsets as a list of lists
    [[001PI.fits, 001M1.fits, 001M2.fits], [002PI.fits, 002M1.fits, 002M2.fits]]
    """
    self.get_event_lists_processed()

    if len(self.events_processed) == 0:
        raise KeyError(f'No eventlists found for {self.obsid}')

    subsets = get_overlapping_eventlist_subsets(self.events_processed)
    self.events_overlapping_subsets = subsets
    return self.events_overlapping_subsets

`get_overlapping_eventlist_subsets(event_lists)`

Return the overlapping eventlists for a given observation. Parameters: event_lists (list): list of EventList() objects that have been .read()

In most cases this will just return a list of 1 list like [[M1.fits, M2.fits, PN.fits'] with the event files. However, for some observations it will return multiple entires if there have been multiple seperate observations. This function will return all the overlapping subsets. e.g. for 2 subsets this will return: [[],[]]

Source code in exod/xmm/observation.py

def get_overlapping_eventlist_subsets(event_lists):
    """
    Return the overlapping eventlists for a given observation.
    Parameters:
        event_lists (list): list of EventList() objects that have been .read()

    In most cases this will just return a list of 1 list like
    [[M1.fits, M2.fits, PN.fits'] with the event files. However,
    for some observations it will return multiple entires if there
    have been multiple seperate observations. This function will return
    all the overlapping subsets. e.g. for 2 subsets this will return: [[],[]]
    """
    def intervals_overlap(I1, I2):
        return I1[0] <= I2[1] and I1[1] >= I2[0]

    for e in event_lists:
        e.read()

    disjoint_set = DisjointSet(event_lists)
    file_intervals = {e: [e.time_min, e.time_max] for e in event_lists}

    for d1, d2 in combinations(file_intervals.items(), r=2):
        f1, I1 = d1[0], d1[1]
        f2, I2 = d2[0], d2[1]

        if intervals_overlap(I1, I2):
            disjoint_set.merge(f1, f2)
            logger.debug(f'f1: {f1.filename} : {I1} OVERLAP!')
            logger.debug(f'f2: {f2.filename} : {I2} OVERLAP!')
        else:
            logger.debug(f'f1: {f1.filename} : {I1} NO OVERLAP!')
            logger.debug(f'f2: {f2.filename} : {I2} NO OVERLAP!')
        logger.debug('=============')

    subsets = disjoint_set.subsets()
    logger.info(f'Found {len(subsets)} overlapping subsets.')

    subsets_to_return = []
    for s in subsets:
        if len(s) > 3:
            logger.info(f'Overlapping subset has more than 3 event files ({len(s)})')
            logger.info(f'Just going to take the longest three')
            m1_evt = [e for e in s if 'M1' in e.filename]
            m2_evt = [e for e in s if 'M2' in e.filename]
            pn_evt = [e for e in s if 'PN' in e.filename]

            m1_longest = max(m1_evt, key=lambda x: x.exposure)
            m2_longest = max(m2_evt, key=lambda x: x.exposure)
            pn_longest = max(pn_evt, key=lambda x: x.exposure)
            subset = [m1_longest, m2_longest, pn_longest]
            subsets_to_return.append(subset)
        else:
            subsets_to_return.append(list(s))

    # Sort the inner lists and outer lists
    subsets_to_return = [sorted(subset, key=lambda x: x.filename) for subset in subsets_to_return]
    subsets_to_return = sorted(subsets_to_return, key=lambda subset: subset[0].filename)
    return subsets_to_return

EPIC XMM-Newton Observation Data (EXOD) Documentation

Installation

Running

Full Module Listing

post_processing

cluster_regions

Row 1 therefore gets associated with cluster num 0

ClusterRegions

calc_cluster_xyz_means()

cluster_regions()

label_region_num_to_cluster_num()

map_region_num_to_clusters()

number_clusters()

renumber_clusters()

crossmatch

crossmatch_astropy_table_with_regions(tab, df_region, ra_col, dec_col)

crossmatch_cds(df, max_sep_arcsec=20, catalogue='simbad', ra_col='ra_deg', dec_col='dec_deg')

crossmatch_dr14_slim(df_region, clobber=True)

crossmatch_fits_table(fits_path, df_region, ra_col, dec_col)

crossmatch_glade(df_region, clobber=True)

crossmatch_tranin_dr12(df_region)

crossmatch_runs

calc_subset_cmatch_fraction(dfs_subset_crossmatch)

calc_subset_n_regions(dfs_subset_crossmatch)

crossmatch_simulation_subsets(dfs_subsets)

get_run_subset_keys()

plot_crossmatch_confusion_matrix(df_subset_cmatch_fraction, n_regions_sim)

plot_mean_count_histogram(dfs_subsets)

split_subsets(df_regions)

ds9

estimate_variability_properties

calc_sigma_cube(cube_n, cube_mu)

clean_up_eclipses(data_cube, eclipses)

clean_up_peaks(data_cube, peaks)

convert_count_to_flux(count, position, data_cube)

count_distant_peaks(peaks_or_eclipses, bins_min_between_peaks)

count_peaks(peaks_or_eclipses)

eclipse_count_estimate(fraction, N, mu)

peak_count_estimate(fraction, N, mu)

plot_lightcurve_alerts_with_background(cube, cube_background, cube_background_withsource, tab_boundingboxes)

extract_lc_features

calc_df_lc_feat_filter_flags(df_lc_feat)

calc_features(df_lc, key)

count_significant_bins(df_lc)

extract_lc_features(clobber=True)

largest_peak_info(df_lc)

print_df_lc_feat_filter_flag_stats(df_lc_feat)

filter

FilterBase

apply(df) abstractmethod

get_parameters() abstractmethod

generate_combinations_with_one_or_none(filters)

Output: [(), ('filter1',), ('filter2',)]

generate_valid_combinations(*filter_lists)

Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]

filter_subsets

FilterBase

apply(df) abstractmethod

get_parameters() abstractmethod

generate_combinations_with_one_or_none(filters)

Output: [(), ('filter1',), ('filter2',)]

generate_valid_combinations(*filter_lists)

Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]

hot_regions

calc_hot_region_flags(df_regions, df_regions_rotated, hot_regions)

get_pointing_angle(obsid, tab_xmm_obslist)

plot_detector_coords_soft_and_hard(df_regions_rotated)

plot_hot_regions(df_regions_rotated, hot_regions)

main

check_results_shape()

make_exod_cat

read_and_print_table_and_header(savepath)

results_manager

create_iau_srcids(ra_deg, dec_deg, ra_precision=1, dec_precision=0)

util

calc_detid_column(df_regions)

get_lc(key, df_lc_idx)

pre_processing

download_observations

download_observation_events(obsid, clobber=False)

`post_processing`

`cluster_regions`

`ClusterRegions`

`calc_cluster_xyz_means()`

`cluster_regions()`

`label_region_num_to_cluster_num()`

`map_region_num_to_clusters()`

`number_clusters()`

`renumber_clusters()`

`crossmatch`

`crossmatch_astropy_table_with_regions(tab, df_region, ra_col, dec_col)`

`crossmatch_cds(df, max_sep_arcsec=20, catalogue='simbad', ra_col='ra_deg', dec_col='dec_deg')`

`crossmatch_dr14_slim(df_region, clobber=True)`

`crossmatch_fits_table(fits_path, df_region, ra_col, dec_col)`

`crossmatch_glade(df_region, clobber=True)`

`crossmatch_tranin_dr12(df_region)`

`crossmatch_runs`

`calc_subset_cmatch_fraction(dfs_subset_crossmatch)`

`calc_subset_n_regions(dfs_subset_crossmatch)`

`crossmatch_simulation_subsets(dfs_subsets)`

`get_run_subset_keys()`

`plot_crossmatch_confusion_matrix(df_subset_cmatch_fraction, n_regions_sim)`

`plot_mean_count_histogram(dfs_subsets)`

`split_subsets(df_regions)`

`ds9`

`estimate_variability_properties`

`calc_sigma_cube(cube_n, cube_mu)`

`clean_up_eclipses(data_cube, eclipses)`

`clean_up_peaks(data_cube, peaks)`

`convert_count_to_flux(count, position, data_cube)`

`count_distant_peaks(peaks_or_eclipses, bins_min_between_peaks)`

`count_peaks(peaks_or_eclipses)`

`eclipse_count_estimate(fraction, N, mu)`

`peak_count_estimate(fraction, N, mu)`

`plot_lightcurve_alerts_with_background(cube, cube_background, cube_background_withsource, tab_boundingboxes)`

`extract_lc_features`

`calc_df_lc_feat_filter_flags(df_lc_feat)`

`calc_features(df_lc, key)`

`count_significant_bins(df_lc)`

`extract_lc_features(clobber=True)`

`largest_peak_info(df_lc)`

`print_df_lc_feat_filter_flag_stats(df_lc_feat)`

`filter`

`FilterBase`

`apply(df)` `abstractmethod`

`get_parameters()` `abstractmethod`

`generate_combinations_with_one_or_none(filters)`

`generate_valid_combinations(*filter_lists)`

`filter_subsets`

`FilterBase`

`apply(df)` `abstractmethod`

`get_parameters()` `abstractmethod`

`generate_combinations_with_one_or_none(filters)`

`generate_valid_combinations(*filter_lists)`

`hot_regions`

`calc_hot_region_flags(df_regions, df_regions_rotated, hot_regions)`

`get_pointing_angle(obsid, tab_xmm_obslist)`

`plot_detector_coords_soft_and_hard(df_regions_rotated)`

`plot_hot_regions(df_regions_rotated, hot_regions)`

`main`

`check_results_shape()`

`make_exod_cat`

`read_and_print_table_and_header(savepath)`

`results_manager`

`create_iau_srcids(ra_deg, dec_deg, ra_precision=1, dec_precision=0)`

`util`

`calc_detid_column(df_regions)`

`get_lc(key, df_lc_idx)`

`pre_processing`

`download_observations`

`download_observation_events(obsid, clobber=False)`

`event_filtering`

`espfilt(eventfile)`

`processing`

`background_inference`

`calc_background_template(image_sub, image_mask_source)`

`calc_cube_mu(data_cube, wcs)`

`calc_source_template(image_sub, image_sub_background_template, image_mask_source, data_cube, count_sub_outside_sources, subset_bin_idx)`

`mask_known_sources(data_cube, wcs)`

`bayesian_computations`