Skip to content

EPIC XMM-Newton Observation Data (EXOD) Documentation

This is the documentation for the EPIC XMM-Newton Observation Data (EXOD) project.

EXOD is designed for the detection of rapid transients in archival XMM-Newton data.

Contact: Norman Khan & Erwan Quintin (2024)

Norman.Khan@irap.omp.eu

Erwan.Quintin@irap.omp.eu

Installation

git clone https://github.com/nx1/EXOD2
cd EXOD
pip install -e .
Then set the 'EXOD' enviroment in your .bashrc to point to this repo i.e.
export EXOD='/home/{username}/EXOD2'

Running

The main script for the pipeline is found in EXOD2/exod/main.py This will run over all the obsids specified in EXOD2/data/observations.txt and perform the transient search, the output is then saved in /data/results/obsid/

Full Module Listing

post_processing

cluster_regions

Cluster regions in a DataFrame of regions by clustering sources within a certain radius using a K-D tree.

An example of how this works is provided below:

Input Regions (cartesian coords) xyz = [[0,0,0], # 0 [0,0,1], # 1 [0,1,0], # 2 [5,0,0], # 3 [4,0,0], # 4 [0,2,0], # 5 [0,3,0], # 6

cluster[0] = [0,1,3] --> cluster_num = 0 | mean position = 0,0,0 + 0,0,1 + 0,1,0 = 0.00, 0.33, 0.33 cluster[1] = [0,1,3] --> cluster_num = 0 | mean position = 0,0,0 + 0,0,1 + 0,1,0 = 0.00, 0.33, 0.33 cluster[2] = [1,3,6] --> cluster_num = 1 | mean position = 0,0,1 + 5,0,0 + 0,3,0 = 1.66, 1.00, 0.33 cluster[3] = [1,3,6] --> cluster_num = 1 | mean position = 0,0,1 + 5,0,0 + 0,3,0 = 1.66, 1.00, 0.33

Row 1 is both in cluster_num=1 and cluster_num=2, which one should we label it to? The closest mean position of course!

xyz[1] = [0,0,1]

distance.euclidean([0,0,1], [0.00, 0.33, 0.33]) = 0.75 (lowest!) distance.euclidean([0,0,1], 1.66, 1.00, 0.33) = 2.05

Row 1 therefore gets associated with cluster num 0

unique_regions = OrderedDict([((0,) : 0), ((1, 7) : 1), ((2, 8) : 2), ((3, 9) : 3), ((4, 10, 13, 16, 18) : 4), ((5, 11, 14, 17, 19, 27) : 5), ((6,) : 6), ((12,) : 7), ((15,) : 8), ((20, 23) : 9),

region_to_clusters = defaultdict(list, {0: [[0]], 1: [[1, 7], [1, 7]], 7: [[1, 7], [1, 7]], 2: [[2, 8], [2, 8]], 8: [[2, 8], [2, 8]], 3: [[3, 9], [3, 9]], 9: [[3, 9], [3, 9]], 4: [[4, 10, 13, 16, 18], [4, 10, 13, 16, 18], [4, 10, 13, 16, 18], [4, 10, 13, 16, 18], [4, 10, 13, 16, 18]], ... }

cluster_xyz_means = {(0,) : array([ 0.71092614, 0.12281395, -0.69245994]), (1, 7) : array([ 0.71165566, 0.12214711, -0.69182823]), (2, 8) : array([ 0.71478684, 0.1221836 , -0.68858618]), (3, 9) : array([ 0.71486105, 0.11933079, -0.68900932]), (4, 10, 13, 16, 18) : array([ 0.71399846, 0.11878942, -0.68999657]),

cluster_labels = [0, 1, 2, 3, 4, 5, 6, 1, 2, ...]

ClusterRegions

Cluster regions by clustering sources within a certain radius using a K-D tree.

Some definitions

Region Number: The index of the region in the DataFrame. Cluster: A collection of regions, e.g (0,) or (1,7) Cluster Number: The index of the cluster in the list of clusters.

Attributes:

Name Type Description
df_regions DataFrame

DataFrame containing ra_deg and dec_deg columns.

clustering_radius Quantity

Clustering radius in arcseconds.

xyz ndarray

Cartesian coordinates of the regions.

clusters list

List of lists containing the regions in each cluster.

cluster_xyz_means dict

Maps the cluster number to the mean cartesian position of the cluster on the unit sphere.

region_to_clusters dict

Maps the region number to the containing the regions associated with each cluster.

cluster_num_to_cluster dict

Maps the cluster number to the cluster.

cluster_to_cluster_num dict

Maps the cluster to the cluster number. (reverse of cluster_num_to_cluster)

region_num_to_cluster_num dict

Maps the region number to the cluster number (unique region number).

df_regions_unique DataFrame

DataFrame containing the unique regions.

n_clusters int

Number of unique regions.

save bool

If True, save the clustered regions to a file.

Source code in exod/post_processing/cluster_regions.py
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
class ClusterRegions:
    """
    Cluster regions by clustering sources within a certain radius using a K-D tree.

    Some definitions:
        Region Number: The index of the region in the DataFrame.
        Cluster: A collection of regions, e.g (0,) or (1,7)
        Cluster Number: The index of the cluster in the list of clusters.

    Attributes:
        df_regions (pd.DataFrame): DataFrame containing ra_deg and dec_deg columns.
        clustering_radius (astropy.units.Quantity): Clustering radius in arcseconds.
        xyz (np.ndarray): Cartesian coordinates of the regions.
        clusters (list): List of lists containing the regions in each cluster.
        cluster_xyz_means (dict): Maps the cluster number to the mean cartesian position of the cluster on the unit sphere.
        region_to_clusters (dict): Maps the region number to the containing the regions associated with each cluster.
        cluster_num_to_cluster (dict): Maps the cluster number to the cluster.
        cluster_to_cluster_num (dict): Maps the cluster to the cluster number. (reverse of cluster_num_to_cluster)
        region_num_to_cluster_num (dict): Maps the region number to the cluster number (unique region number).
        df_regions_unique (pd.DataFrame): DataFrame containing the unique regions.
        n_clusters (int): Number of unique regions.
        save (bool): If True, save the clustered regions to a file.
    """
    def __init__(self, df_regions, clustering_radius=20 * u.arcsec, save=False):
        self.df_regions = df_regions
        self.clustering_radius = clustering_radius

        self.xyz = []                           # Cartesian coordinates of the regions
        self.clusters = []                      # List of lists containing the regions in each cluster
        self.cluster_xyz_means = []             # Maps the cluster number to the mean cartesian position of the cluster on the unit sphere
        self.region_to_clusters = {}            # Maps the region number to the containing the regions associated with each cluster
        self.cluster_num_to_cluster = {}        # Maps the cluster number to the cluster
        self.cluster_to_cluster_num = {}        # Maps the cluster to the cluster number (reverse of cluster_num_to_cluster)
        self.region_num_to_cluster_num = {}     # Maps the region number to the cluster number (unique region number)
        self.df_regions_unique = pd.DataFrame() # DataFrame containing the unique regions
        self.n_clusters = 0                     # Number of unique regions
        self.save = save

        self.run()

    def number_clusters(self):
        """Create dictionary that maps each unique cluster to its unique cluster number."""
        cluster_to_cluster_num = OrderedDict()
        i = 0
        for j, cluster in enumerate(self.clusters):
            c_t = tuple(cluster)
            if c_t not in cluster_to_cluster_num:
                cluster_to_cluster_num[c_t] = i
                i += 1
        logger.info(f'Initially found {len(cluster_to_cluster_num)} unique region clusters...')
        self.cluster_to_cluster_num = cluster_to_cluster_num
        # Create reverse mapping (cluster number to cluster)
        self.cluster_num_to_cluster = dict((v, k) for k, v in self.cluster_to_cluster_num.items())

    def map_region_num_to_clusters(self):
        """Create dictionary that maps the region number to the cluster."""
        self.region_to_clusters = defaultdict(list)
        for c in self.clusters:
            for region_num in c:
                self.region_to_clusters[region_num].append(c)

    def calc_cluster_xyz_means(self):
        """ Calculate the mean cartesian positions of each cluster."""
        self.cluster_xyz_means = {tuple(cluster): np.mean([self.xyz[region] for region in cluster], axis=0) for cluster in self.clusters}

    def cluster_regions(self):
        """Find unique regions by clustering sources within a certain radius using a K-D tree."""
        self.xyz = ra_dec_to_xyz(self.df_regions['ra_deg'].values * u.deg, self.df_regions['dec_deg'].values * u.deg)
        kd_tree = KDTree(data=self.xyz, leafsize=10, compact_nodes=True, copy_data=False, balanced_tree=True, boxsize=None)
        clustering_radius_rad = self.clustering_radius.to('rad').value
        self.clusters = kd_tree.query_ball_tree(other=kd_tree, r=clustering_radius_rad, p=2.0, eps=0)

    def label_region_num_to_cluster_num(self):
        """Loop over all regions and label them to the closest cluster."""
        cluster_xyz_means = self.cluster_xyz_means
        region_to_clusters = self.region_to_clusters
        cluster_to_cluster_num = self.cluster_to_cluster_num
        xyz = self.xyz

        cluster_labels = []
        for i in range(len(xyz)):
            clusters = region_to_clusters[i]
            if len(clusters) == 1:
                c_t = tuple(clusters[0])
                unique_region_num = cluster_to_cluster_num[c_t]
                cluster_labels.append(unique_region_num)

            elif len(clusters) > 1: # More than one cluster associated with region i
                cluster_distances = {}
                for cluster in region_to_clusters[i]:
                    c_t = tuple(cluster)
                    d = distance.euclidean(xyz[i], cluster_xyz_means[c_t])
                    cluster_distances[c_t] = d

                # Find the closest cluster
                closest_cluster = min(cluster_distances, key=cluster_distances.get)
                cluster_labels.append(cluster_to_cluster_num[closest_cluster])

        self.cluster_labels = cluster_labels
        self.df_regions['cluster_label'] = cluster_labels
        self.n_clusters = len(set(self.cluster_labels))
        logger.info(f'Final number of unique regions = {self.n_clusters}')

    def calc_unique_regions_table(self):
        df_regions_unique = self.df_regions.groupby(['cluster_label'])[['ra_deg', 'dec_deg']].agg('mean')
        df_regions_unique['idxs'] = [list(self.cluster_num_to_cluster[c_num]) for c_num in df_regions_unique.index]
        self.df_regions_unique = df_regions_unique

    def renumber_clusters(self):
        """Renumber the clusters so that they go from 0 to n_clusters."""
        logger.info('Renumbering clusters...')
        old2new = {old:new for new, old in zip(range(self.n_clusters), self.df_regions_unique.index)}
        new2old = {new:old for old, new in old2new.items()}

        self.df_regions_unique.reset_index(drop=True, inplace=True)
        self.df_regions_unique.index.name = 'cluster_label'
        self.df_regions['cluster_label'] = self.df_regions['cluster_label'].map(old2new)
        self.cluster_labels = self.df_regions['cluster_label'].values
        self.cluster_num_to_cluster = {c_num : self.cluster_num_to_cluster[new2old[c_num]] for c_num in range(self.n_clusters)}
        self.cluster_to_cluster_num = {v:k for k,v in self.cluster_num_to_cluster.items()}
        self.region_num_to_cluster_num = {reg:cluster for reg, cluster in zip(self.df_regions.index, self.cluster_labels)}

    def save_unique_regions_table(self):
        if self.save:
            logger.info(f'Saving unique regions table to {savepaths_combined["regions_unique"]}')
            tab_regions_unique = Table.from_pandas(self.df_regions_unique[['ra_deg', 'dec_deg']])
            tab_regions_unique.write(savepaths_combined['regions_unique'], overwrite=True)

    def run(self):
        self.cluster_regions()
        self.number_clusters()
        self.map_region_num_to_clusters()
        self.calc_cluster_xyz_means()
        self.label_region_num_to_cluster_num()
        self.calc_unique_regions_table()
        self.renumber_clusters()
        self.save_unique_regions_table()
calc_cluster_xyz_means()

Calculate the mean cartesian positions of each cluster.

Source code in exod/post_processing/cluster_regions.py
138
139
140
def calc_cluster_xyz_means(self):
    """ Calculate the mean cartesian positions of each cluster."""
    self.cluster_xyz_means = {tuple(cluster): np.mean([self.xyz[region] for region in cluster], axis=0) for cluster in self.clusters}
cluster_regions()

Find unique regions by clustering sources within a certain radius using a K-D tree.

Source code in exod/post_processing/cluster_regions.py
142
143
144
145
146
147
def cluster_regions(self):
    """Find unique regions by clustering sources within a certain radius using a K-D tree."""
    self.xyz = ra_dec_to_xyz(self.df_regions['ra_deg'].values * u.deg, self.df_regions['dec_deg'].values * u.deg)
    kd_tree = KDTree(data=self.xyz, leafsize=10, compact_nodes=True, copy_data=False, balanced_tree=True, boxsize=None)
    clustering_radius_rad = self.clustering_radius.to('rad').value
    self.clusters = kd_tree.query_ball_tree(other=kd_tree, r=clustering_radius_rad, p=2.0, eps=0)
label_region_num_to_cluster_num()

Loop over all regions and label them to the closest cluster.

Source code in exod/post_processing/cluster_regions.py
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
def label_region_num_to_cluster_num(self):
    """Loop over all regions and label them to the closest cluster."""
    cluster_xyz_means = self.cluster_xyz_means
    region_to_clusters = self.region_to_clusters
    cluster_to_cluster_num = self.cluster_to_cluster_num
    xyz = self.xyz

    cluster_labels = []
    for i in range(len(xyz)):
        clusters = region_to_clusters[i]
        if len(clusters) == 1:
            c_t = tuple(clusters[0])
            unique_region_num = cluster_to_cluster_num[c_t]
            cluster_labels.append(unique_region_num)

        elif len(clusters) > 1: # More than one cluster associated with region i
            cluster_distances = {}
            for cluster in region_to_clusters[i]:
                c_t = tuple(cluster)
                d = distance.euclidean(xyz[i], cluster_xyz_means[c_t])
                cluster_distances[c_t] = d

            # Find the closest cluster
            closest_cluster = min(cluster_distances, key=cluster_distances.get)
            cluster_labels.append(cluster_to_cluster_num[closest_cluster])

    self.cluster_labels = cluster_labels
    self.df_regions['cluster_label'] = cluster_labels
    self.n_clusters = len(set(self.cluster_labels))
    logger.info(f'Final number of unique regions = {self.n_clusters}')
map_region_num_to_clusters()

Create dictionary that maps the region number to the cluster.

Source code in exod/post_processing/cluster_regions.py
131
132
133
134
135
136
def map_region_num_to_clusters(self):
    """Create dictionary that maps the region number to the cluster."""
    self.region_to_clusters = defaultdict(list)
    for c in self.clusters:
        for region_num in c:
            self.region_to_clusters[region_num].append(c)
number_clusters()

Create dictionary that maps each unique cluster to its unique cluster number.

Source code in exod/post_processing/cluster_regions.py
117
118
119
120
121
122
123
124
125
126
127
128
129
def number_clusters(self):
    """Create dictionary that maps each unique cluster to its unique cluster number."""
    cluster_to_cluster_num = OrderedDict()
    i = 0
    for j, cluster in enumerate(self.clusters):
        c_t = tuple(cluster)
        if c_t not in cluster_to_cluster_num:
            cluster_to_cluster_num[c_t] = i
            i += 1
    logger.info(f'Initially found {len(cluster_to_cluster_num)} unique region clusters...')
    self.cluster_to_cluster_num = cluster_to_cluster_num
    # Create reverse mapping (cluster number to cluster)
    self.cluster_num_to_cluster = dict((v, k) for k, v in self.cluster_to_cluster_num.items())
renumber_clusters()

Renumber the clusters so that they go from 0 to n_clusters.

Source code in exod/post_processing/cluster_regions.py
185
186
187
188
189
190
191
192
193
194
195
196
197
def renumber_clusters(self):
    """Renumber the clusters so that they go from 0 to n_clusters."""
    logger.info('Renumbering clusters...')
    old2new = {old:new for new, old in zip(range(self.n_clusters), self.df_regions_unique.index)}
    new2old = {new:old for old, new in old2new.items()}

    self.df_regions_unique.reset_index(drop=True, inplace=True)
    self.df_regions_unique.index.name = 'cluster_label'
    self.df_regions['cluster_label'] = self.df_regions['cluster_label'].map(old2new)
    self.cluster_labels = self.df_regions['cluster_label'].values
    self.cluster_num_to_cluster = {c_num : self.cluster_num_to_cluster[new2old[c_num]] for c_num in range(self.n_clusters)}
    self.cluster_to_cluster_num = {v:k for k,v in self.cluster_num_to_cluster.items()}
    self.region_num_to_cluster_num = {reg:cluster for reg, cluster in zip(self.df_regions.index, self.cluster_labels)}

crossmatch

This module contains code for crossmatching the regions with various catalogues.

crossmatch_astropy_table_with_regions(tab, df_region, ra_col, dec_col)

Crossmatch with an arbitrary Fits Table.

Parameters:

Name Type Description Default
tab Table

Astropy Table

required
df_region DataFrame

DataFrame containing the regions to crossmatch (must have ra_deg and dec_deg columns)

required
ra_col str

Column name for the RA values in degrees

required
dec_col str

Column name for the DEC values in in degrees.

required

Returns:

Name Type Description
tab_fits_cmatch Table

Table containing the crossmatched data.

Source code in exod/post_processing/crossmatch.py
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
def crossmatch_astropy_table_with_regions(tab, df_region, ra_col, dec_col):
    """
    Crossmatch with an arbitrary Fits Table.

    Parameters:
        tab (astropy.Table): Astropy Table
        df_region (pd.DataFrame): DataFrame containing the regions to crossmatch (must have ra_deg and dec_deg columns)
        ra_col (str): Column name for the RA values in degrees
        dec_col (str): Column name for the DEC values in in degrees.

    Returns:
        tab_fits_cmatch (astropy.Table): Table containing the crossmatched data.
    """
    sc1 = SkyCoord(ra=tab[ra_col], dec=tab[dec_col], unit=u.deg, frame='fk5', equinox='J2000')
    sc2 = SkyCoord(ra=df_region['ra_deg'].values, dec=df_region['dec_deg'].values, unit='deg', frame='fk5', equinox='J2000')

    cmatch = sc2.match_to_catalog_sky(sc1)

    tab_cmatch = Table(cmatch)
    tab_cmatch.rename_columns(names=tab_cmatch.colnames, new_names=['label', 'sep2d', 'dist3d'])
    tab_cmatch['sep2d_arcsec'] = tab_cmatch['sep2d'].to(u.arcsec)
    tab_cmatch['idx_orig']     = np.arange(len(tab_cmatch))
    df_region_matched = df_region.iloc[tab_cmatch['idx_orig']]

    tab_fits_cmatch = tab[tab_cmatch['label']]
    tab_fits_cmatch['SEP_ARCSEC']   = tab_cmatch['sep2d_arcsec']
    tab_fits_cmatch['RA_OFFSET']    = calc_ra_offset(ra_deg1=df_region_matched['ra_deg'], ra_deg2=tab_fits_cmatch[ra_col], dec_deg1=df_region_matched['dec_deg'])
    tab_fits_cmatch['DEC_OFFSET']   = calc_dec_offset(dec_deg1=df_region_matched['dec_deg'], dec_deg2=tab_fits_cmatch[dec_col])
    tab_fits_cmatch['IDX_ORIGINAL'] = tab_cmatch['idx_orig']
    return tab_fits_cmatch

crossmatch_cds(df, max_sep_arcsec=20, catalogue='simbad', ra_col='ra_deg', dec_col='dec_deg')

Crossmatch using CDS X-Match Service.

Source code in exod/post_processing/crossmatch.py
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
def crossmatch_cds(df, max_sep_arcsec=20, catalogue='simbad', ra_col='ra_deg', dec_col='dec_deg'):
    """Crossmatch using CDS X-Match Service."""
    print(f'Crossmatching {len(df)} rows with {catalogue} using CDS Xmatch max_sep={max_sep_arcsec}"')
    r = requests.post(
        url='http://cdsxmatch.u-strasbg.fr/xmatch/api/v1/sync',
        data={'request'        : 'xmatch',
              'distMaxArcsec'  : max_sep_arcsec,
              'RESPONSEFORMAT' : 'csv',
              'cat2'           : catalogue,
              'colRA1'         : ra_col,
              'colDec1'        : dec_col},
        files={'cat1'          : df.to_csv()})
    if r.status_code != 200:
        raise Exception(f'Failed to crossmatch with CDS Xmatch. Status code: {r.status_code}')
    df_res = pd.read_csv(io.StringIO(r.text), low_memory=False)
    print(f'Returned {len(df_res)} rows')
    return df_res

crossmatch_dr14_slim(df_region, clobber=True)

Crossmatch regions with the 4XMM DR14 slim catalogue.

Source code in exod/post_processing/crossmatch.py
75
76
77
78
79
80
81
82
83
84
85
86
87
88
def crossmatch_dr14_slim(df_region, clobber=True):
    """Crossmatch regions with the 4XMM DR14 slim catalogue."""
    logger.info('Crossmatching with 4XMM DR14 slim catalogue')
    if not clobber and savepaths_combined['cmatch_dr14'].exists():
        logger.info(f"{savepaths_combined['cmatch_dr14']} already exists and clobber=False, loading from files")
        return Table.read(savepaths_combined['cmatch_dr14'])
    else:
        logger.info('Some crossmatch files are missing. Recreating...')

    fits_path = savepaths_util['4xmm_dr14_slim']
    tab_xmm_cmatch = crossmatch_fits_table(fits_path, df_region, ra_col='SC_RA', dec_col='SC_DEC')
    logger.info(f"Saving XMM DR14 crossmatch to {savepaths_combined['cmatch_dr14']}")
    tab_xmm_cmatch.write(savepaths_combined['cmatch_dr14'], format='csv', overwrite=True)
    return tab_xmm_cmatch

crossmatch_fits_table(fits_path, df_region, ra_col, dec_col)

Crossmatch with an arbitrary Fits Table.

Parameters:

Name Type Description Default
fits_path Path

Path to the FITS file.

required
df_region DataFrame

DataFrame containing the regions to crossmatch.

required
ra_col str

Column name for the RA values in FITS file.

required
dec_col str

Column name for the DEC values in FITS file.

required

Returns:

Name Type Description
tab_fits_cmatch Table

Table containing the crossmatched data.

Source code in exod/post_processing/crossmatch.py
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
def crossmatch_fits_table(fits_path, df_region, ra_col, dec_col):
    """
    Crossmatch with an arbitrary Fits Table.

    Parameters:
        fits_path (Path): Path to the FITS file.
        df_region (pd.DataFrame): DataFrame containing the regions to crossmatch.
        ra_col (str): Column name for the RA values in FITS file.
        dec_col (str): Column name for the DEC values in FITS file.

    Returns:
        tab_fits_cmatch (astropy.Table): Table containing the crossmatched data.
    """
    tab_fits = Table.read(fits_path)

    tab_fits_cmatch = crossmatch_astropy_table_with_regions(tab_fits, df_region, ra_col, dec_col)
    return tab_fits_cmatch

crossmatch_glade(df_region, clobber=True)

Crossmatch regions with GLADE+ catalogue, (converted to fits using topcat)

Source code in exod/post_processing/crossmatch.py
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
def crossmatch_glade(df_region, clobber=True):
    """Crossmatch regions with GLADE+ catalogue, (converted to fits using topcat)"""
    logger.info("Crossmatching with GLADEP slim catalogue (can take a second)")
    if not clobber and savepaths_combined['cmatch_glade'].exists():
        logger.info(f"{savepaths_combined['cmatch_glade']} already exists and clobber=False, loading from files")
        return Table.read(savepaths_combined['cmatch_glade'])
    else:
        logger.info('Some crossmatch files are missing. Recreating...')

    fits_path = savepaths_util['GLADE+']
    tab_glade_cmatch = crossmatch_fits_table(fits_path, df_region, ra_col='RA', dec_col='Dec')
    logger.info(f"Saving GLADE crossmatch to {savepaths_combined['cmatch_glade']}")
    tab_glade_cmatch.write(savepaths_combined['cmatch_glade'], format='csv', overwrite=True)
    return tab_glade_cmatch

crossmatch_tranin_dr12(df_region)

Crossmatch the regions with the CLAXON Hugo Tranin DR12 catalogue.

Source code in exod/post_processing/crossmatch.py
106
107
108
109
110
111
def crossmatch_tranin_dr12(df_region):
    """Crossmatch the regions with the CLAXON Hugo Tranin DR12 catalogue."""
    logger.info('Crossmatching with CLAXON Hugo Tranin DR12 catalogue')
    fits_path = data_util / 'tranin/classification_DR12_with_input.fits'
    tab_xmm_cmatch = crossmatch_fits_table(fits_path, df_region, ra_col='RA', dec_col='DEC')
    return tab_xmm_cmatch

crossmatch_runs

This module is used for various the simulation subsets with each other In order to determine the crossmatch fraction and various other metrics.

calc_subset_cmatch_fraction(dfs_subset_crossmatch)

Calculate the fraction of successfully crossmatched regions.

Parameters:

Name Type Description Default
dfs_subset_crossmatch dict

A dictionary containing the crossmatch results for each subset.

required

Returns:

Name Type Description
df_subset_cmatch_fraction DataFrame

A DataFrame containing the fraction of successfully crossmatched regions.

Source code in exod/post_processing/crossmatch_runs.py
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
def calc_subset_cmatch_fraction(dfs_subset_crossmatch):
    """
    Calculate the fraction of successfully crossmatched regions.

    Parameters:
        dfs_subset_crossmatch (dict): A dictionary containing the crossmatch results for each subset.

    Returns:
        df_subset_cmatch_fraction (pd.DataFrame): A DataFrame containing the fraction of successfully crossmatched regions.
    """
    all_res = []
    for k, df in dfs_subset_crossmatch.items():
        res = {}
        for col in df.columns:
            count = (df[col] > -1).sum()
            perc = count / len(df)
            res[col] = perc
        all_res.append(res)
    df_subset_cmatch_fraction = pd.DataFrame(all_res)
    df_subset_cmatch_fraction.index = dfs_subset_crossmatch.keys()
    return df_subset_cmatch_fraction

calc_subset_n_regions(dfs_subset_crossmatch)

Calculate the number of regions in each subset.

Source code in exod/post_processing/crossmatch_runs.py
107
108
109
110
111
112
def calc_subset_n_regions(dfs_subset_crossmatch):
    """Calculate the number of regions in each subset."""
    n_regions_sim = {}
    for k, df in dfs_subset_crossmatch.items():
        n_regions_sim[k] = len(df)
    return n_regions_sim

crossmatch_simulation_subsets(dfs_subsets)

Crossmatch the subsets of simulations to each other.

Parameters:

Name Type Description Default
dfs_subsets dict

A dictionary containing the DataFrames for each subset.

required

Returns:

Name Type Description
dfs_subset_crossmatch dict

A dictionary containing the crossmatch results for each subset. Each DataFrame contains the indices of the crossmatched sources in the other subsets. A placeholder value of -1 is denoted for sources without a crossmatch.

Source code in exod/post_processing/crossmatch_runs.py
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
def crossmatch_simulation_subsets(dfs_subsets):
    """
    Crossmatch the subsets of simulations to each other.

    Parameters:
        dfs_subsets (dict): A dictionary containing the DataFrames for each subset.

    Returns:
        dfs_subset_crossmatch (dict): A dictionary containing the crossmatch results for each subset.
            Each DataFrame contains the indices of the crossmatched sources in the other subsets.
            A placeholder value of -1 is denoted for sources without a crossmatch.
    """
    max_sep = 15 * u.arcsec  # Maximum separation for considering a crossmatch
    no_cmatch_id = -1  # Placeholder value for sources without a crossmatch
    dfs_subset_crossmatch = {}
    for k1, df1 in dfs_subsets.items():
        sc1 = SkyCoord(ra=df1['ra_deg'], dec=df1['dec_deg'], unit='deg', frame='fk5', equinox='J2000')  #
        res = {}
        res[k1] = np.arange(len(sc1))
        for k2, df2 in dfs_subsets.items():
            if k1 == k2:
                continue

            logger.info(f'Matching {k1} ({len(df1)}) With {k2:<12} ({len(df2)})')
            sc2 = SkyCoord(ra=df2['ra_deg'], dec=df2['dec_deg'], unit='deg', frame='fk5', equinox='J2000')
            cmatch = sc1.match_to_catalog_sky(sc2)

            tab_cmatch = Table(cmatch)
            tab_cmatch.rename_columns(names=tab_cmatch.colnames, new_names=['label', 'sep2d', 'dist3d'])
            tab_cmatch['sep2d'] = tab_cmatch['sep2d'].to(u.arcsec)

            is_match = np.where(tab_cmatch['sep2d'] < max_sep, tab_cmatch['label'], no_cmatch_id)  # Replace
            res[k2] = is_match

        df = pd.DataFrame(res)
        dfs_subset_crossmatch[k1] = df
    return dfs_subset_crossmatch

get_run_subset_keys()

Get the keys for the simulation subsets.

Source code in exod/post_processing/crossmatch_runs.py
16
17
18
19
20
21
22
23
24
25
26
27
def get_run_subset_keys():
    """Get the keys for the simulation subsets."""
    subsets = ['5_0.2_2.0',
               '5_2.0_12.0',
               '5_0.2_12.0',
               '50_0.2_2.0',
               '50_2.0_12.0',
               '50_0.2_12.0',
               '200_0.2_2.0',
               '200_2.0_12.0',
               '200_0.2_12.0']
    return subsets

plot_crossmatch_confusion_matrix(df_subset_cmatch_fraction, n_regions_sim)

Plot the confusion matrix for the crossmatch fractions.

Source code in exod/post_processing/crossmatch_runs.py
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
def plot_crossmatch_confusion_matrix(df_subset_cmatch_fraction, n_regions_sim):
    """Plot the confusion matrix for the crossmatch fractions."""
    labels_readable = {'5_0.2_2.0'    : r'$t_{\mathrm{bin}} = 5  ;E=0.2-2.0$',
                       '5_2.0_12.0'   : r'$t_{\mathrm{bin}} = 5  ;E=2.0-12.0$',
                       '5_0.2_12.0'   : r'$t_{\mathrm{bin}} = 5  ;E=0.2-12.0$',
                       '50_0.2_2.0'   : r'$t_{\mathrm{bin}} = 50 ;E=0.2-2.0$',
                       '50_2.0_12.0'  : r'$t_{\mathrm{bin}} = 50 ;E=2.0-12.0$',
                       '50_0.2_12.0'  : r'$t_{\mathrm{bin}} = 50 ;E=0.2-12.0$',
                       '200_0.2_2.0'  : r'$t_{\mathrm{bin}} = 200;E=0.2-2.0$',
                       '200_2.0_12.0' : r'$t_{\mathrm{bin}} = 200;E=2.0-12.0$',
                       '200_0.2_12.0' : r'$t_{\mathrm{bin}} = 200;E=0.2-12.0$'}
    lab1 = [labels_readable[i] for i in df_subset_cmatch_fraction.index]
    lab2 = [fr'$N_{{\mathrm{{reg}}}}$ = {n}' for n in n_regions_sim.values()]

    X = df_subset_cmatch_fraction.values
    fig, ax = plt.subplots(figsize=(10, 9.2))
    ax.set_title('Fraction of successfully crossmatched Regions')
    ax.imshow(X, cmap='Blues', interpolation='none')
    ax.set_xticks(range(X.shape[0]), lab1, rotation=90)
    ax.set_yticks(range(X.shape[0]), lab1, rotation=0)
    ax2 = ax.twinx()
    ax2.set_yticks(range(X.shape[0]), lab2, rotation=0)
    ax2.set_ylim(ax.get_ylim())
    for (j, i), label in np.ndenumerate(X):
        label = f'{label:.2f}'
        ax.text(i, j, label, ha='center', va='center')
    ax.set_xlabel('Crossmatch A')
    ax.set_ylabel('Crossmatch B')
    plt.tight_layout()
    logger.info(f'Saving Crossmatch Confusion Matrix to: crossmatch_confusion_matrix.png...')
    plt.savefig(data_plots / 'crossmatch_confusion_matrix.png')
    plt.savefig(data_plots / 'crossmatch_confusion_matrix.pdf')
    plt.show()

plot_mean_count_histogram(dfs_subsets)

Plot the mean count histograms for each subset.

Source code in exod/post_processing/crossmatch_runs.py
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
def plot_mean_count_histogram(dfs_subsets):
    """Plot the mean count histograms for each subset."""
    linestyles = {'5_': 'solid',
                  '50_': 'dashed',
                  '200_': 'dotted'}

    colors = {'0.2_2.0': 'red',
              '2.0_12.0': 'blue',
              '0.2_12.0': 'black'}

    fig, ax = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
    ax[0].set_title('Mean Count Histograms')
    for k, df in dfs_subsets.items():
        ls = [linestyles[key] for key in linestyles.keys() if key in k]
        c = [colors[key] for key in colors.keys() if key in k]

        hist_kwargs = {'bins'     : np.linspace(start=0, stop=500, num=50),
                       'histtype' : 'step',
                       'lw'       : 1.5,
                       'ls'       : ls[0],
                       'color'    : c[0],
                       'label'    : k}

        ax[0].hist(df['intensity_mean'], **hist_kwargs)
        ax[1].hist(df['intensity_mean'], density=True, **hist_kwargs)
    ax[0].set_ylabel('Number')
    ax[1].set_ylabel('Fraction')
    ax[1].set_xlabel('Intensity Mean (Counts)')
    for a in ax:
        a.legend()
        a.grid()
    plt.subplots_adjust(hspace=0)
    plt.savefig(data_plots / 'mean_count_histogram.png')
    plt.savefig(data_plots / 'mean_count_histogram.pdf')
    plt.show()

split_subsets(df_regions)

Split the regions into subsets based on the t_bin, E_lo, E_hi values in the runid.

Parameters:

Name Type Description Default
df_regions DataFrame

The DataFrame containing the region information.

required

Returns:

Name Type Description
dfs_subsets dict

A dictionary containing the DataFrames for each subset.

Source code in exod/post_processing/crossmatch_runs.py
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
def split_subsets(df_regions):
    """
    Split the regions into subsets based on the t_bin, E_lo, E_hi values in the runid.

    Parameters:
        df_regions (pd.DataFrame): The DataFrame containing the region information.

    Returns:
        dfs_subsets (dict): A dictionary containing the DataFrames for each subset.
    """
    logger.info('Splitting df_regions into simulations subsets...')
    subsets = get_run_subset_keys()
    dfs_subsets = {}
    for s in subsets:
        dfs_subsets[s] = df_regions[df_regions['runid'].str.contains(s)]
    return dfs_subsets

ds9

View images from an XMM observation in DS9.

estimate_variability_properties

calc_sigma_cube(cube_n, cube_mu)

Calculate the sigma equivalent over a data cube. cube_n : observed data cube cube_mu : Expected data cube

Source code in exod/post_processing/estimate_variability_properties.py
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
def calc_sigma_cube(cube_n, cube_mu):
    """
    Calculate the sigma equivalent over a data cube.
    cube_n  : observed data cube
    cube_mu : Expected data cube
    """
    result_cube = np.zeros_like(cube_n)
    for i in range(cube_n.shape[0]):
        print(f'Calculating sigma cube... {i}/{cube_n.shape[0]}')
        for j in range(cube_n.shape[1]):
            for k in range(cube_n.shape[2]):
                n = cube_n[i, j, k]
                mu = cube_mu[i, j, k]
                result_cube[i, j, k] = sigma_equivalent(n, mu)
    return result_cube

clean_up_eclipses(data_cube, eclipses)

Removes eclipses in bright sources coming from merging of partial exposures. This is done by checking if the flux change is the same in the source and the rest of the frame

Source code in exod/post_processing/estimate_variability_properties.py
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
def clean_up_eclipses(data_cube, eclipses):
    """Removes eclipses in bright sources coming from merging of partial exposures. This is done by checking if the flux
    change is the same in the source and the rest of the frame"""
    new_eclipses = np.copy(eclipses)
    eclipse_x, eclipse_y, eclipse_t = np.where(eclipses == True)
    cube = data_cube.data
    for (x, y, t) in zip(eclipse_x, eclipse_y, eclipse_t):
        non_source_eclipseframe = np.nansum(cube[:, :, t]) - np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t])
        non_source_around_eclipseframe = (np.nansum(cube[:, :, t - 5:t + 6]) - np.nansum(
            cube[x - 1:x + 2, y - 1:y + 2, t - 5:t + 6]) - non_source_eclipseframe) / 10
        source_eclipseframe = np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t])
        source_around_eclipseframe = (np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t - 5:t + 6]) - source_eclipseframe) / 10
        # We reject eclipses for which the rest of the frame has had an eclipse as well, and this background "eclipse"
        # is at most 25% larger in relative amplitude than the source eclipse.
        if ((non_source_eclipseframe < non_source_around_eclipseframe) and
                ((((non_source_around_eclipseframe - non_source_eclipseframe) / non_source_around_eclipseframe) -
                  ((source_around_eclipseframe - source_eclipseframe) / source_around_eclipseframe)) < 0.25)):
            new_eclipses[x, y, t] = False
            # data_cube.data[:,:,t]=np.full(data_cube.shape[:2],np.nan)
    return new_eclipses

clean_up_peaks(data_cube, peaks)

Removes peaks coming from flaring CCD edges

Source code in exod/post_processing/estimate_variability_properties.py
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
def clean_up_peaks(data_cube, peaks):
    """Removes peaks coming from flaring CCD edges"""
    new_peaks = np.copy(peaks)
    peak_x, peak_y, peak_t = np.where(peaks == True)
    cube = data_cube.data
    for (x, y, t) in zip(peak_x, peak_y, peak_t):
        non_source_peakframe = np.nansum(cube[:, :, t]) - np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t])
        non_source_around_peakframe = (np.nansum(cube[:, :, t - 5:t + 6]) - np.nansum(
            cube[x - 1:x + 2, y - 1:y + 2, t - 5:t + 6]) - non_source_peakframe) / 10
        source_peakframe = np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t])
        source_around_peakframe = (np.nansum(cube[x - 1:x + 2, y - 1:y + 2, t - 5:t + 6]) - source_peakframe) / 10
        # We reject peaks for which the rest of the frame has a peak as well, and the source peak
        # is at most 25% larger in relative amplitude than the background peak.
        background_peak_amplitude = ((non_source_peakframe - non_source_around_peakframe) / non_source_around_peakframe)
        source_peak_amplitude = ((source_peakframe - source_around_peakframe) / source_around_peakframe)
        if (background_peak_amplitude > 0) and ((source_peak_amplitude - background_peak_amplitude) < 0.25):
            new_peaks[x, y, t] = False
            # data_cube.data[:,:,t]=np.full(data_cube.shape[:2],np.nan)
    return new_peaks

convert_count_to_flux(count, position, data_cube)

Used to convert count rates to fluxes, using vignetting / EEF / ECF

Source code in exod/post_processing/estimate_variability_properties.py
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
def convert_count_to_flux(count, position, data_cube):
    # TODO: find the vignetting functions depending on energy / submode / filters. Maybe build the exposure map for
    # each observation  (https://xmm-tools.cosmos.esa.int/external/sas/current/doc/eexpmap.pdf)
    # Find the EEF (Encircled Energy Frac.), maybe with ARFGEN if we manage to convert data_cube regions to XY regions
    # Think about which Energy Conversion Factor (ECF) to use. Maybe depends on spectral search (hard or soft).
    """Used to convert count rates to fluxes, using vignetting / EEF / ECF"""

    """
    ds9 --> square region
    from this create an ARF for the region. http://xmm-tools.cosmos.esa.int/external/sas/current/doc/arfgen/
    This will allow you to convert a count to a count rate.
    from a count rate, you can get the flux by using an ECF.

    A spectral should be assumed that will likely depend on the band you're looking at:
    0.2 - 2.0 --> bbody
    2.0 - 12.0 --> something else
    this will change the flux by a factor of 2-3x

    this will give us an L for a given distance. :)
    """

    return count / data_cube.time_interval

count_distant_peaks(peaks_or_eclipses, bins_min_between_peaks)

Counts the individual number of times the lightcurve went above the threshold for variability, and there was a given number of bins since the last time it was above this threshold last

Source code in exod/post_processing/estimate_variability_properties.py
135
136
137
138
139
140
141
142
143
144
145
146
def count_distant_peaks(peaks_or_eclipses, bins_min_between_peaks):
    """Counts the individual number of times the lightcurve went above the threshold for variability, and there was
    a given number of bins since the last time it was above this threshold last"""
    if np.sum(peaks_or_eclipses)==0.:
        return 0
    indices_peaksoreclipses = np.where(peaks_or_eclipses)[0]
    number_last_peak_end = np.searchsorted(indices_peaksoreclipses, np.arange(len(peaks_or_eclipses)), side='left')-1
    indices_last_peak_end = np.take(indices_peaksoreclipses,number_last_peak_end)
    distance_since_last_peak_end = (np.arange(len(peaks_or_eclipses))-indices_last_peak_end)
    peaks_distant_enough = np.array(peaks_or_eclipses, dtype=bool)&(distance_since_last_peak_end>bins_min_between_peaks)
    nbr_of_variability_events = np.nansum(peaks_distant_enough)+1
    return nbr_of_variability_events

count_peaks(peaks_or_eclipses)

Counts the individual number of times the lightcurve went above the threshold for variability

Source code in exod/post_processing/estimate_variability_properties.py
130
131
132
133
def count_peaks(peaks_or_eclipses):
    """Counts the individual number of times the lightcurve went above the threshold for variability"""
    nbr_of_variability_events = np.nansum(np.abs(np.diff(peaks_or_eclipses, axis=2)), axis=2) / 2
    return nbr_of_variability_events

eclipse_count_estimate(fraction, N, mu)

Estimate the upper limit on the count of the eclipse, given an data_cube and observed counts, and a confidence fraction

Source code in exod/post_processing/estimate_variability_properties.py
155
156
157
158
def eclipse_count_estimate(fraction, N, mu):
    """Estimate the upper limit on the count of the eclipse, given an data_cube and observed counts,
     and a confidence fraction"""
    return mu - gammaincinv(N + 1, gammainc(N + 1, mu) - fraction * gammainc(N + 1, mu))

peak_count_estimate(fraction, N, mu)

Estimate the upper limit on the count of the peak, given an data_cube and observed counts, and a confidence fraction

Source code in exod/post_processing/estimate_variability_properties.py
149
150
151
152
def peak_count_estimate(fraction, N, mu):
    """Estimate the upper limit on the count of the peak, given an data_cube and observed counts,
     and a confidence fraction"""
    return gammaincinv(N + 1, fraction * gammaincc(N + 1, mu) + gammainc(N + 1, mu)) - mu

plot_lightcurve_alerts_with_background(cube, cube_background, cube_background_withsource, tab_boundingboxes)

This function creates the multi-panel lightcurve of the transient object. It will retrieve the background, and use it to compare the (source+background) to the background, and compute the likelihood in each frame :param cube: full data cube :param cube_background: data cube of de-sourced background estimate :param cube_background_withsource: data cube of de-sourced background estimate + constant contribution from the sources (i.e. we assume they are constant, take their stacked flux and distribute it over all frames) :param tab_boundingboxes: bounding boxes of variable objects, as obtained from variability.py :return: nothing, but saves the lightcurve of each (source+background) and its background, lightcurve of background-subtracted source, and lightcurve of likelihood of creating the signal with the background

Source code in exod/post_processing/estimate_variability_properties.py
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
def plot_lightcurve_alerts_with_background(cube, cube_background, cube_background_withsource, tab_boundingboxes):
    """
    This function creates the multi-panel lightcurve of the transient object. It will retrieve the background,
    and use it to compare the (source+background) to the background, and compute the likelihood in each frame
    :param cube: full data cube
    :param cube_background: data cube of de-sourced background estimate
    :param cube_background_withsource: data cube of de-sourced background estimate + constant contribution from the
    sources (i.e. we assume they are constant, take their stacked flux and distribute it over all frames)
    :param tab_boundingboxes: bounding boxes of variable objects, as obtained from variability.py
    :return: nothing, but saves the lightcurve of each (source+background) and its background, lightcurve of
    background-subtracted source, and lightcurve of likelihood of creating the signal with the background
    """
    color = cmr.take_cmap_colors('cmr.ocean', N=1, cmap_range=(0.3, 0.3))[0]
    for ind, source in enumerate(tab_boundingboxes):
        legend_plots_ax1 = []
        legend_labels_ax1 = []
        legend_plots_ax2 = []
        legend_labels_ax2 = []
        legend_plots_ax3 = []
        legend_labels_ax3 = []

        # First step: we generate the lightcurves and a number of Poisson realisations of it
        lc = np.sum(cube[source[0]:source[2], source[1]:source[3]], axis=(0, 1))
        lc_generated = np.random.poisson(lc, (5000, len(lc)))
        lc_percentiles = np.nanpercentile(lc_generated, (16, 84), axis=0)
        lc_back = np.sum(cube_background[source[0]:source[2], source[1]:source[3]], axis=(0, 1))
        lc_back_generated = np.random.poisson(lc_back, (5000, len(lc)))
        lc_back_percentiles = np.nanpercentile(lc_back_generated, (16, 84), axis=0)
        lc_back_withsource = np.sum(cube_background_withsource[source[0]:source[2], source[1]:source[3]], axis=(0, 1))
        lc_back_generated_withsource = np.random.poisson(lc_back_withsource, (5000, len(lc)))

        fig, (ax1, ax2, ax3) = plt.subplots(3, 1)

        # First panel: Source + background lightcurve
        p1 = ax1.step(range(len(lc)), lc, c=color, where="mid")
        p2 = ax1.fill(np.NaN, np.NaN, facecolor=color, alpha=0.4)
        ax1.fill_between(range(len(lc)), lc_percentiles[0], lc_percentiles[1], alpha=0.4, facecolor=color, step="mid")
        legend_plots_ax1.append((p1[0], p2[0]))
        legend_labels_ax1.append("Source+background")
        p1 = ax1.step(range(len(lc_back)), lc_back, c="gray", where="mid")
        p2 = ax1.fill(np.NaN, np.NaN, facecolor="gray", alpha=0.4)
        ax1.fill_between(range(len(lc_back)), lc_back_percentiles[0], lc_back_percentiles[1],
                         alpha=0.4, facecolor="gray", step="mid")
        legend_plots_ax1.append((p1[0], p2[0]))
        legend_labels_ax1.append("Background")
        ax1.legend(legend_plots_ax1, legend_labels_ax1)

        # Second panel: Background subtracted lightcurve
        p1 = ax2.step(range(len(lc)), lc - lc_back, c=color, where='mid')
        p2 = ax2.fill(np.NaN, np.NaN, facecolor=color, alpha=0.4)
        lc_diff_percentiles = np.nanpercentile(lc_generated - lc_back_generated, (16, 84), axis=0)
        ax2.fill_between(range(len(lc)), lc_diff_percentiles[0], lc_diff_percentiles[1],
                         alpha=0.4, facecolor=color, step='mid')
        legend_plots_ax2.append((p1[0], p2[0]))
        legend_labels_ax2.append("Source-Background")
        ax2.legend(legend_plots_ax2, legend_labels_ax2)

        # Third panel: Poisson likelihood
        p1 = ax3.step(range(len(lc)), -poisson.logpmf(lc, lc_back_withsource), c=color, where='mid')
        p2 = ax3.fill(np.NaN, np.NaN, facecolor=color, alpha=0.4)
        likelihood_percentiles = np.nanpercentile(-poisson.logpmf(lc_generated, lc_back_generated_withsource), (16, 84),
                                                  axis=0)
        ax3.fill_between(range(len(lc)), likelihood_percentiles[0], likelihood_percentiles[1],
                         alpha=0.4, facecolor=color, step='mid')
        legend_plots_ax3.append((p1[0], p2[0]))
        legend_labels_ax3.append("Poisson likelihood")
        ax3.legend(legend_plots_ax2, legend_labels_ax2)

        plt.savefig(os.path.join(data_processed, '0831790701', f'lightcurve_transient_{ind}.png'))

extract_lc_features

calc_df_lc_feat_filter_flags(df_lc_feat)

Calculate the quality flags for the lightcurves in the sample.

Source code in exod/post_processing/extract_lc_features.py
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
def calc_df_lc_feat_filter_flags(df_lc_feat):
    """Calculate the quality flags for the lightcurves in the sample."""
    print('Calculating Light Curve Feature Filter Flags...')
    # Filter flag for regions that have less than 5 counts maximum in 5 second binning
    df_lc_feat['filt_tbin_5_n_l_5'] = (df_lc_feat['runid'].str.contains('_5_')) & (df_lc_feat['n_max'] < 5)

    # Filter flag for runids with more than 20 detected regions
    vc_runid = df_lc_feat['runid'].value_counts()
    df_lc_feat['filt_g_20_detections'] = df_lc_feat['runid'].isin(vc_runid.index[vc_runid > 20])

    # Filter flag for 5 sigma detections
    df_lc_feat['filt_5sig'] = (df_lc_feat['B_peak_log_max'] > 13.2) | (df_lc_feat['B_eclipse_log_max'] > 12.38)

    # Filter flag for excluded obsids
    df_lc_feat['obsid'] = df_lc_feat['runid'].str.extract(r'(\d{10})')
    df_lc_feat['filt_exclude_obsid'] = df_lc_feat['obsid'].isin(obsids_to_exclude)
    return df_lc_feat

calc_features(df_lc, key)

Calculate Features for a single lightcurve.

Source code in exod/post_processing/extract_lc_features.py
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
def calc_features(df_lc, key):
    """Calculate Features for a single lightcurve."""
    parts    = key.strip("()").split(", ")
    parts    = [part.strip("'") for part in parts]
    runid    = parts[0]
    label    = parts[1]
    runid_sp = runid.split("_")
    obsid    = runid_sp[0]
    subset   = int(runid_sp[1])
    t_bin    = int(runid_sp[2])
    E_low    = float(runid_sp[3])
    E_hi     = float(runid_sp[4])

    length = len(df_lc)
    n_bccd = df_lc['bccd'].sum()
    n_bti = df_lc['bti'].sum()
    ratio_bccd = n_bccd / length
    ratio_bti = n_bti / length

    ks = ks_2samp(df_lc['n'], df_lc['mu'])

    peaks_or_eclipses = (df_lc['B_peak_log'] > 6.4) | (df_lc['B_eclipse_log'] > 5.5)
    t_bin_bin_spacing = {5 : 1000, 50 : 100, 200 : 5}
    bins_min_between_peaks = t_bin_bin_spacing[t_bin]
    n_peaks = count_distant_peaks(peaks_or_eclipses, bins_min_between_peaks)
    sig_bins = count_significant_bins(df_lc)

    num_B_peak_above_6_4    = count_recurring_peaks(df_lc['B_peak_log'].values, threshold=6.4)
    num_B_eclipse_above_5_5 = count_recurring_peaks(df_lc['B_eclipse_log'].values, threshold=5.5)

    n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare = largest_peak_info(df_lc)

    res = {'label'                     : key,
           'runid'                   : runid,
           'label'                   : label,
           'obsid'                   : obsid,
           'subset'                  : subset,
           't_bin'                   : t_bin,
           'E_low'                   : E_low,
           'E_hi'                    : E_hi,
           'len'                     : length,
           'n_bccd'                  : df_lc['bccd'].sum(),
           'n_bti'                   : df_lc['bti'].sum(),
           'ratio_bccd'              : ratio_bccd,
           'ratio_bti'               : ratio_bti,
           'ks_stat'                 : ks.statistic,
           'ks_pval'                 : ks.pvalue,
           'n_min'                   : df_lc['n'].min(),
           'n_max'                   : df_lc['n'].max(),
           'n_mean'                  : df_lc['n'].mean(),
           'n_std'                   : df_lc['n'].std(),
           'n_sum'                   : df_lc['n'].sum(),
           'n_skew'                  : skew(df_lc['n']),
           'n_kurt'                  : kurtosis(df_lc['n']),
           'n_max_idx'               : n_max_idx,
           'n_max_isolated_flare'    : n_max_isolated_flare,
           'n_max_first_bin'         : n_max_first_bin,
           'n_max_last_bin'          : n_max_last_bin,
           'mu_min'                  : df_lc['n'].min(),
           'mu_max'                  : df_lc['n'].max(),
           'mu_mean'                 : df_lc['n'].mean(),
           'mu_std'                  : df_lc['n'].std(),
           'mu_skew'                 : skew(df_lc['n']),
           'mu_kurt'                 : kurtosis(df_lc['n']),
           'B_peak_log_max'          : df_lc['B_peak_log'].max(),
           'B_eclipse_log_max'       : df_lc['B_eclipse_log'].max(),
           'sigma_max_B_peak'        : sigma_equivalent_B_peak(df_lc['B_peak_log'].max()),
           'sigma_max_B_eclipse'     : sigma_equivalent_B_eclipse(df_lc['B_eclipse_log'].max()),
           'num_B_peak_above_6_4'    : num_B_peak_above_6_4,
           'num_B_eclipse_above_5_5' : num_B_eclipse_above_5_5,
           'bin_min_between_peaks'   : bins_min_between_peaks,
           'n_peaks'                 : n_peaks}
    for k, v in sig_bins.items():
        res[k] = v

    return res

count_significant_bins(df_lc)

Count the total number of significant bins across all the lightcurves.

Source code in exod/post_processing/extract_lc_features.py
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
def count_significant_bins(df_lc):
    """Count the total number of significant bins across all the lightcurves."""
    # The following are obtained from bayesian_computations import get_bayes_thresholds
    B_peak_3_sig    = 5.94096891658419
    B_peak_5_sig    = 13.278830271385576
    B_eclipse_3_sig = 5.708917345972507
    B_eclipse_5_sig = 12.380488516107011

    res = {'n_3_sig_peak_bins'        : (df_lc['B_peak_log'] > B_peak_3_sig).sum(),
           'n_3_sig_peak_bins_bti'    : (df_lc[df_lc['bti'] == 1]['B_peak_log'] > B_peak_3_sig).sum(),
           'n_3_sig_peak_bins_gti'    : (df_lc[df_lc['bti'] == 0]['B_peak_log'] > B_peak_3_sig).sum(),
           'n_5_sig_peak_bins'        : (df_lc['B_peak_log'] > B_peak_5_sig).sum(),
           'n_5_sig_peak_bins_bti'    : (df_lc[df_lc['bti'] == 1]['B_peak_log'] > B_peak_5_sig).sum(),
           'n_5_sig_peak_bins_gti'    : (df_lc[df_lc['bti'] == 0]['B_peak_log'] > B_peak_5_sig).sum(),
           'n_3_sig_eclipse_bins'     : (df_lc['B_eclipse_log'] > B_eclipse_3_sig).sum(),
           'n_3_sig_eclipse_bins_bti' : (df_lc[df_lc['bti'] == 1]['B_eclipse_log'] > B_eclipse_3_sig).sum(),
           'n_3_sig_eclipse_bins_gti' : (df_lc[df_lc['bti'] == 0]['B_eclipse_log'] > B_eclipse_3_sig).sum(),
           'n_5_sig_eclipse_bins'     : (df_lc['B_eclipse_log'] > B_eclipse_5_sig).sum(),
           'n_5_sig_eclipse_bins_bti' : (df_lc[df_lc['bti'] == 1]['B_eclipse_log'] > B_eclipse_5_sig).sum(),
           'n_5_sig_eclipse_bins_gti' : (df_lc[df_lc['bti'] == 0]['B_eclipse_log'] > B_eclipse_5_sig).sum()}
    return res

extract_lc_features(clobber=True)

Loop over all lightcurves and extract features.

Source code in exod/post_processing/extract_lc_features.py
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
def extract_lc_features(clobber=True):
    """Loop over all lightcurves and extract features."""
    if not clobber:
        if savepaths_combined['lc_features'].exists():
            logger.info(f'Light curve features already exist at {savepaths_combined["lc_features"]}. Skipping...')
            df_lc_features = pd.read_csv(savepaths_combined['lc_features'])
            return df_lc_features

    df_lc_indexs = pd.read_csv(savepaths_combined['lc_idx'], index_col='Unnamed: 0')
    all_res = []
    for i, r in tqdm(df_lc_indexs.iterrows(), desc="Extracting Lightcurve Features."):
        df_lc = pd.read_hdf(savepaths_combined['lc'], start=r['start'], stop=r['stop'])
        res = calc_features(df_lc, key=i)
        all_res.append(res)
    df_lc_features = pd.DataFrame(all_res)
    logger.info(f'Saving light curve features to {savepaths_combined["lc_features"]}')
    df_lc_features.to_csv(savepaths_combined['lc_features'], index=False)
    return df_lc_features

largest_peak_info(df_lc)

Find the largest peak in the light curve and return some information about it.

Parameters:

Name Type Description Default
df_lc DataFrame

DataFrame containing the light curve data.

required

Returns:

Name Type Description
n_max_idx int

Index of the largest peak.

n_max_last_bin bool

True if the largest peak is in the last bin.

n_max_first_bin bool

True if the largest peak is in the first bin.

n_max_isolated_flare bool

True if the largest peak is surrounded by zeros.

Source code in exod/post_processing/extract_lc_features.py
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def largest_peak_info(df_lc):
    """
    Find the largest peak in the light curve and return some information about it.

    Parameters:
        df_lc (pd.DataFrame): DataFrame containing the light curve data.

    Returns:
        n_max_idx (int): Index of the largest peak.
        n_max_last_bin (bool): True if the largest peak is in the last bin.
        n_max_first_bin (bool): True if the largest peak is in the first bin.
        n_max_isolated_flare (bool): True if the largest peak is surrounded by zeros.
    """
    n_max_last_bin       = False
    n_max_first_bin      = False
    n_max_isolated_flare = False

    n = df_lc['n'].values
    n_max_idx = np.argmax(n)
    if n_max_idx + 1 == len(n):
        n_max_last_bin = True
        return n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare
    elif n_max_idx + 1 == -1:
        n_max_first_bin = True
        return n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare

    val_before = n[n_max_idx - 1]
    val_after = n[n_max_idx + 1]
    if (val_before == 0) & (val_after == 0):
        n_max_isolated_flare = True
        return n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare
    return n_max_idx, n_max_last_bin, n_max_first_bin, n_max_isolated_flare

print_df_lc_feat_filter_flag_stats(df_lc_feat)

Print a summary of the lightcurve quality flags.

Source code in exod/post_processing/extract_lc_features.py
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
def print_df_lc_feat_filter_flag_stats(df_lc_feat):
    """Print a summary of the lightcurve quality flags."""
    print('Lightcurve Feature Flags Statistics:')
    print('------------------------------------')
    flag_cols = ['n_max_isolated_flare', 'n_max_first_bin', 'n_max_last_bin', 'filt_tbin_5_n_l_5', 'filt_5sig', 'filt_exclude_obsid', 'filt_g_20_detections']
    df_lc_feat_5_sig = df_lc_feat[df_lc_feat['filt_5sig']]
    N = len(df_lc_feat)
    for col in flag_cols:
        num = len(df_lc_feat[df_lc_feat[col] == True])
        percentage = num / N * 100
        count_5sig = len(df_lc_feat_5_sig[df_lc_feat_5_sig[col] == True])
        total_5sig = len(df_lc_feat_5_sig)
        percentage_5sig = count_5sig / total_5sig * 100

        col_formatted = f'{col:<20}'
        num_formatted = f'{num:<5} / {N} ({percentage:.2f}%)'
        five_sig_formatted = f'5sig: {count_5sig:<5} / {total_5sig} ({percentage_5sig:.2f}%)'
        print(f'{col_formatted} : {num_formatted:<22} | {five_sig_formatted}')

    print(f'n_exluded_obsids     : {len(obsids_to_exclude):,}')
    print('------------------------------------')

filter

This module contains classes for filtering regions and lightcurves.

FilterBase

Bases: ABC

Base class for filters. All filters should inherit from this class.

Parameters:

Name Type Description Default
name str

Name of the filter.

required

Methods:

Name Description
apply

Apply the filter to the input dataframe.

get_parameters

Return a dictionary of filter-specific parameters.

info

Return a dictionary with information about the filter.

Source code in exod/post_processing/filter.py
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
class FilterBase(ABC):
    """
    Base class for filters. All filters should inherit from this class.

    Parameters:
        name (str): Name of the filter.

    Methods:
        apply(df): Apply the filter to the input dataframe.
        get_parameters(): Return a dictionary of filter-specific parameters.
        info(): Return a dictionary with information about the filter.
    """
    def __init__(self, name):
        self.name = name
        self.df = None
        self.df_filtered = None
        self.df_removed = None

    @abstractmethod
    def apply(self, df):
        """This should set self.df_regions, self.df_filtered and self.removed."""

    @abstractmethod
    def get_parameters(self):
        """This should return a dictionary of filter-specific parameters."""

    def info(self):
        info = {}
        info['name'] = self.name
        info['parameters'] = self.get_parameters()
        info['n_original'] = len(self.df)
        info['n_filtered'] = len(self.df_filtered)
        info['n_removed'] = len(self.df_removed)
        return info

    def __repr__(self):
        return f'Filter(name="{self.name}")'
apply(df) abstractmethod

This should set self.df_regions, self.df_filtered and self.removed.

Source code in exod/post_processing/filter.py
26
27
28
@abstractmethod
def apply(self, df):
    """This should set self.df_regions, self.df_filtered and self.removed."""
get_parameters() abstractmethod

This should return a dictionary of filter-specific parameters.

Source code in exod/post_processing/filter.py
30
31
32
@abstractmethod
def get_parameters(self):
    """This should return a dictionary of filter-specific parameters."""

generate_combinations_with_one_or_none(filters)

Generates all combinations of zero or one filter from the given list.

This function creates an iterable that yields combinations of filters, where each combination includes either no filter or exactly one filter from the provided list. The function returns an iterator that produces these combinations on demand.

Parameters:

Name Type Description Default
filters list

A list of Filters() to generate combinations from.

required

Returns:

Type Description

itertools.chain: An iterator yielding tuples that contain either

no filters or one filter from the provided list. Each tuple is a

valid combination.

Example

filters = ['filter1', 'filter2'] result = generate_combinations_with_one_or_none(filters) print(list(result))

Output: [(), ('filter1',), ('filter2',)]
Source code in exod/post_processing/filter.py
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
def generate_combinations_with_one_or_none(filters):
    """
    Generates all combinations of zero or one filter from the given list.

    This function creates an iterable that yields combinations of filters, 
    where each combination includes either no filter or exactly one filter 
    from the provided list. The function returns an iterator that produces 
    these combinations on demand.

    Parameters:
        filters (list): A list of Filters() to generate combinations from.

    Returns:
        itertools.chain: An iterator yielding tuples that contain either 
        no filters or one filter from the provided list. Each tuple is a 
        valid combination.

    Example:
        filters = ['filter1', 'filter2']
        result = generate_combinations_with_one_or_none(filters)
        print(list(result)) 
        # Output: [(), ('filter1',), ('filter2',)]
    """
    return chain.from_iterable(combinations(filters, r) for r in range(2))

generate_valid_combinations(*filter_lists)

Generates all valid combinations of filters from multiple lists, allowing at most one filter from each list.

This function takes multiple lists of filters as input and generates all possible combinations where each combination contains zero or one filter from each list. The Cartesian product of the combinations from each list is computed to create all valid combinations across multiple filter categories.

Parameters:

Name Type Description Default
*filter_lists list of lists

Variable number of filter lists. Each argument is a

()

Returns:

Name Type Description
list

A list of valid filter combinations. Each combination is represented as a list of filters,

where the filters are chosen from the input lists. The result includes combinations with zero

or one filter from each list.

Example

filters_energy = ['E1', 'E2'] filters_time = ['T1', 'T2'] result = generate_valid_combinations(filters_energy, filters_time) print(result)

Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]
Source code in exod/post_processing/filter.py
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
def generate_valid_combinations(*filter_lists):
    """
    Generates all valid combinations of filters from multiple lists, allowing at most one filter from each list.

    This function takes multiple lists of filters as input and generates all possible combinations 
    where each combination contains zero or one filter from each list. The Cartesian product 
    of the combinations from each list is computed to create all valid combinations across 
    multiple filter categories.

    Parameters:
        *filter_lists (list of lists): Variable number of filter lists. Each argument is a 
        list of filters representing a different filter category. The function will combine 
        filters from each list, with each combination containing at most one filter from 
        each category.

    Returns:
        list: A list of valid filter combinations. Each combination is represented as a list of filters, 
        where the filters are chosen from the input lists. The result includes combinations with zero 
        or one filter from each list.

    Example:
        filters_energy = ['E1', 'E2']
        filters_time = ['T1', 'T2']
        result = generate_valid_combinations(filters_energy, filters_time)
        print(result)
        # Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]
    """
    combinations_lists = [generate_combinations_with_one_or_none(filters) for filters in filter_lists]
    valid_combinations = product(*combinations_lists)
    valid_combinations = [list(chain.from_iterable(combo)) for combo in valid_combinations]
    return valid_combinations

filter_subsets

FilterBase

Bases: ABC

Base class for filters. All filters should inherit from this class.

Parameters:

Name Type Description Default
name str

Name of the filter.

required

Methods:

Name Description
apply

Apply the filter to the input dataframe.

get_parameters

Return a dictionary of filter-specific parameters.

info

Return a dictionary with information about the filter.

Source code in exod/post_processing/filter.py
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
class FilterBase(ABC):
    """
    Base class for filters. All filters should inherit from this class.

    Parameters:
        name (str): Name of the filter.

    Methods:
        apply(df): Apply the filter to the input dataframe.
        get_parameters(): Return a dictionary of filter-specific parameters.
        info(): Return a dictionary with information about the filter.
    """
    def __init__(self, name):
        self.name = name
        self.df = None
        self.df_filtered = None
        self.df_removed = None

    @abstractmethod
    def apply(self, df):
        """This should set self.df_regions, self.df_filtered and self.removed."""

    @abstractmethod
    def get_parameters(self):
        """This should return a dictionary of filter-specific parameters."""

    def info(self):
        info = {}
        info['name'] = self.name
        info['parameters'] = self.get_parameters()
        info['n_original'] = len(self.df)
        info['n_filtered'] = len(self.df_filtered)
        info['n_removed'] = len(self.df_removed)
        return info

    def __repr__(self):
        return f'Filter(name="{self.name}")'
apply(df) abstractmethod

This should set self.df_regions, self.df_filtered and self.removed.

Source code in exod/post_processing/filter.py
26
27
28
@abstractmethod
def apply(self, df):
    """This should set self.df_regions, self.df_filtered and self.removed."""
get_parameters() abstractmethod

This should return a dictionary of filter-specific parameters.

Source code in exod/post_processing/filter.py
30
31
32
@abstractmethod
def get_parameters(self):
    """This should return a dictionary of filter-specific parameters."""

generate_combinations_with_one_or_none(filters)

Generates all combinations of zero or one filter from the given list.

This function creates an iterable that yields combinations of filters, where each combination includes either no filter or exactly one filter from the provided list. The function returns an iterator that produces these combinations on demand.

Parameters:

Name Type Description Default
filters list

A list of Filters() to generate combinations from.

required

Returns:

Type Description

itertools.chain: An iterator yielding tuples that contain either

no filters or one filter from the provided list. Each tuple is a

valid combination.

Example

filters = ['filter1', 'filter2'] result = generate_combinations_with_one_or_none(filters) print(list(result))

Output: [(), ('filter1',), ('filter2',)]
Source code in exod/post_processing/filter.py
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
def generate_combinations_with_one_or_none(filters):
    """
    Generates all combinations of zero or one filter from the given list.

    This function creates an iterable that yields combinations of filters, 
    where each combination includes either no filter or exactly one filter 
    from the provided list. The function returns an iterator that produces 
    these combinations on demand.

    Parameters:
        filters (list): A list of Filters() to generate combinations from.

    Returns:
        itertools.chain: An iterator yielding tuples that contain either 
        no filters or one filter from the provided list. Each tuple is a 
        valid combination.

    Example:
        filters = ['filter1', 'filter2']
        result = generate_combinations_with_one_or_none(filters)
        print(list(result)) 
        # Output: [(), ('filter1',), ('filter2',)]
    """
    return chain.from_iterable(combinations(filters, r) for r in range(2))

generate_valid_combinations(*filter_lists)

Generates all valid combinations of filters from multiple lists, allowing at most one filter from each list.

This function takes multiple lists of filters as input and generates all possible combinations where each combination contains zero or one filter from each list. The Cartesian product of the combinations from each list is computed to create all valid combinations across multiple filter categories.

Parameters:

Name Type Description Default
*filter_lists list of lists

Variable number of filter lists. Each argument is a

()

Returns:

Name Type Description
list

A list of valid filter combinations. Each combination is represented as a list of filters,

where the filters are chosen from the input lists. The result includes combinations with zero

or one filter from each list.

Example

filters_energy = ['E1', 'E2'] filters_time = ['T1', 'T2'] result = generate_valid_combinations(filters_energy, filters_time) print(result)

Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]
Source code in exod/post_processing/filter.py
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
def generate_valid_combinations(*filter_lists):
    """
    Generates all valid combinations of filters from multiple lists, allowing at most one filter from each list.

    This function takes multiple lists of filters as input and generates all possible combinations 
    where each combination contains zero or one filter from each list. The Cartesian product 
    of the combinations from each list is computed to create all valid combinations across 
    multiple filter categories.

    Parameters:
        *filter_lists (list of lists): Variable number of filter lists. Each argument is a 
        list of filters representing a different filter category. The function will combine 
        filters from each list, with each combination containing at most one filter from 
        each category.

    Returns:
        list: A list of valid filter combinations. Each combination is represented as a list of filters, 
        where the filters are chosen from the input lists. The result includes combinations with zero 
        or one filter from each list.

    Example:
        filters_energy = ['E1', 'E2']
        filters_time = ['T1', 'T2']
        result = generate_valid_combinations(filters_energy, filters_time)
        print(result)
        # Output: [[], ['E1'], ['E2'], ['T1'], ['T2'], ['E1', 'T1'], ['E1', 'T2'], ['E2', 'T1'], ['E2', 'T2']]
    """
    combinations_lists = [generate_combinations_with_one_or_none(filters) for filters in filter_lists]
    valid_combinations = product(*combinations_lists)
    valid_combinations = [list(chain.from_iterable(combo)) for combo in valid_combinations]
    return valid_combinations

hot_regions

calc_hot_region_flags(df_regions, df_regions_rotated, hot_regions)

Calculate the hot pixel flags for use on df_regions.

Source code in exod/post_processing/hot_regions.py
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
def calc_hot_region_flags(df_regions, df_regions_rotated, hot_regions):
    """Calculate the hot pixel flags for use on df_regions."""
    all_masks = []
    for k, v in hot_regions.items():
        # Get the mask for the rows in the table that correspond to the time binning.
        mask_t_bin = df_regions_rotated['runid'].str.contains(k)

        # Loop over each hot region and calculate the mask on df_regions_rotated that corresponds
        # to that specific hot region, append these all together in a list.
        for i, (x,y,w,h) in enumerate(v):
            mask = ((df_regions_rotated['X_EPIC'] < x+w) & (df_regions_rotated['X_EPIC'] > x)
                  & (df_regions_rotated['Y_EPIC'] < y+h) & (df_regions_rotated['Y_EPIC'] > y)
                  & mask_t_bin)

            all_masks.append(mask)
    mask_hot_pixels = np.logical_or.reduce(all_masks)

    filt_hot_pixel  = np.isin(df_regions['detid'], df_regions_rotated[mask_hot_pixels]['detid'])

    return filt_hot_pixel

get_pointing_angle(obsid, tab_xmm_obslist)

Get the pointing angle for a given observation ID, reads from the xmm_obslist table.

Source code in exod/post_processing/hot_regions.py
68
69
70
71
72
73
def get_pointing_angle(obsid, tab_xmm_obslist):
    """
    Get the pointing angle for a given observation ID, reads from the xmm_obslist table.
    """
    angle = tab_xmm_obslist[tab_xmm_obslist['OBS_ID'] == obsid]['PA_PNT'].value[0]
    return angle

plot_detector_coords_soft_and_hard(df_regions_rotated)

Plot the detector coords plots for soft and hard energies.

Source code in exod/post_processing/hot_regions.py
169
170
171
172
173
174
175
176
177
178
179
180
181
def plot_detector_coords_soft_and_hard(df_regions_rotated):
    """Plot the detector coords plots for soft and hard energies."""
    energy_bands = ['_2.0_12.0', '_0.2_2.0']
    for e in energy_bands:
        sub = df_regions_rotated[df_regions_rotated['runid'].str.contains(e)]
        fig, ax = plt.subplots(figsize=(7, 7))
        ax.set_title(f'Energy Range: {e} keV')
        ax.scatter(sub['X_EPIC'], sub['Y_EPIC'], s=1.0, marker='.', color='black', alpha=0.2)
        ax.set_xlim(-17500, 17500)
        ax.set_ylim(-17500, 17500)
        plt.savefig(data_plots / f'spatial_dist{e}.png')
        plt.savefig(data_plots / f'spatial_dist{e}.pdf')
        print(f'Saving to: spatial_dist{e}.png')

plot_hot_regions(df_regions_rotated, hot_regions)

Plot the hot regions and label them.

Source code in exod/post_processing/hot_regions.py
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
def plot_hot_regions(df_regions_rotated, hot_regions):
    """
    Plot the hot regions and label them.
    """
    # Plot the hot regions.
    for k, v in hot_regions.items():
        sub = df_regions_rotated[df_regions_rotated['runid'].str.contains(k)]

        fig, ax = plt.subplots(figsize=(7, 7))
        ax.scatter(sub['X_EPIC'], sub['Y_EPIC'], s=1.0, marker='.', color='black', alpha=0.2)
        for i, (x, y, w, h) in enumerate(v):
            mask = (sub['X_EPIC'] < x + w) & (sub['X_EPIC'] > x) & (sub['Y_EPIC'] < y + h) & (sub['Y_EPIC'] > y)
            sub2 = sub[mask]
            n_reg = len(sub2)
            n_obs = len(sub2['obsid'].unique())

            ax.scatter(sub2['X_EPIC'], sub2['Y_EPIC'], s=2, alpha=1.0, color='red', label=f'{i} : $N_{{reg}}$={n_reg} $N_{{obs}}$={n_obs}')
            offset = 1500
            ax.text(x - offset, y + offset, s=f'{i}', color='black', fontsize=12,
                    bbox=dict(facecolor='white', edgecolor='black', boxstyle='round,pad=0.1'))
        ax.legend(ncol=3, bbox_to_anchor=(1, 0, 0, 1))
        ax.set_xlim(-17500, 17500)
        ax.set_ylim(-17500, 17500)
        plt.savefig(data_plots / f'hot_regions{k}s.png')
        plt.savefig(data_plots / f'hot_regions{k}s.pdf')
        print(f'Saving to: hot_regions{k}s.png')

main

check_results_shape()

Check if the length of the results are consistent.

Source code in exod/post_processing/main.py
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
def check_results_shape():
    """Check if the length of the results are consistent."""
    print('Checking results shape...')
    print('-------------------------')
    shapes = {}
    for name, path in savepaths_combined.items():
        if name == 'lc':
            continue
        if 'fits' in path.name:
            continue
        if 'alerts' in path.name:
            continue
        print(name, path)
        df = pd.read_csv(path)
        shape = df.shape
        shapes[name] = shape
        print(f'{name:<15}: {shape}')

    assert shapes['regions_unique'][0] == shapes['cmatch_simbad'][0]
    assert shapes['regions_unique'][0] == shapes['cmatch_gaia'][0]
    assert shapes['regions_unique'][0] == shapes['cmatch_om'][0]
    assert shapes['regions_unique'][0] == shapes['cmatch_dr14'][0]
    print('-------------------------\n\n')

make_exod_cat

read_and_print_table_and_header(savepath)

Used to checking the file at the end.

Source code in exod/post_processing/make_exod_cat.py
15
16
17
18
19
20
21
22
def read_and_print_table_and_header(savepath):
    """Used to checking the file at the end."""
    with fits.open(savepath) as hdul:
        hdul.info()
        hdr = hdul[1].header
        print(repr(hdr))
    tab = Table.read(savepath)
    tab.pprint(max_width=-1)

results_manager

create_iau_srcids(ra_deg, dec_deg, ra_precision=1, dec_precision=0)

Create Source Identifiers from coordinates in degrees (IAU name).

Source code in exod/post_processing/results_manager.py
32
33
34
35
36
37
38
39
40
41
def create_iau_srcids(ra_deg, dec_deg, ra_precision=1, dec_precision=0):
    """Create Source Identifiers from coordinates in degrees (IAU name)."""
    sc = SkyCoord(ra=ra_deg, dec=dec_deg, unit='deg')
    srcids = []
    for s in sc:
        ra   = s.ra.to_string(unit=u.hour, sep='', precision=ra_precision, pad=True)
        dec  = s.dec.to_string(sep='', precision=dec_precision, alwayssign=True, pad=True)
        name = f"EXOD J{ra}{dec}"
        srcids.append(name)
    return srcids

util

calc_detid_column(df_regions)

Calculate the detection id, defined as {runid}_{label}

Source code in exod/post_processing/util.py
13
14
15
16
def calc_detid_column(df_regions):
    """Calculate the detection id, defined as {runid}_{label}"""
    df_regions['detid'] = df_regions['runid'] + '_' + df_regions['label'].astype(str)
    return df_regions

get_lc(key, df_lc_idx)

label : ('0761070101_0_5_0.2_12.0', '0')

Source code in exod/post_processing/util.py
 4
 5
 6
 7
 8
 9
10
def get_lc(key, df_lc_idx):
    """ label : ('0761070101_0_5_0.2_12.0', '0')"""
    key = str(key)
    start, stop = df_lc_idx.loc[key]
    df_lc = pd.read_hdf(savepaths_combined['lc'], start=start, stop=stop)
    df_lc['t0'] = df_lc['time'] - df_lc['time'].min()
    return df_lc

pre_processing

download_observations

download_observation_events(obsid, clobber=False)

Download the post-processed event lists for PN, M1 and M2.

Source code in exod/pre_processing/download_observations.py
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
def download_observation_events(obsid, clobber=False):
    """
    Download the post-processed event lists for PN, M1 and M2.
    """
    logger.info(f'Downloading observations for obsid={obsid}')
    url_PN = f'http://nxsa.esac.esa.int/nxsa-sl/servlet/data-action-aio?obsno={obsid}&instname=PN&level=PPS&name=PIEVLI'
    url_M1 = f'http://nxsa.esac.esa.int/nxsa-sl/servlet/data-action-aio?obsno={obsid}&instname=M1&level=PPS&name=MIEVLI'
    url_M2 = f'http://nxsa.esac.esa.int/nxsa-sl/servlet/data-action-aio?obsno={obsid}&instname=M2&level=PPS&name=MIEVLI'
    url_src= f'http://nxsa.esac.esa.int/nxsa-sl/servlet/data-action-aio?obsno={obsid}&name=OBSMLI&level=PPS'

    urls = {'PN' : url_PN,
            'M1' : url_M1,
            'M2' : url_M2,
            'src': url_src}

    globs = {'PN' : '*PN*FTZ',
             'M1' : '*M1*FTZ',
             'M2' : '*M2*FTZ',
             'src': '*EP*OBSMLI*FTZ'}

    for inst, download_url in urls.items():
        # Create the folder to save to
        obs_path = path.data_raw / f'{obsid}'

        os.makedirs(obs_path, exist_ok=True)
        if not clobber and len(list(obs_path.glob(globs[inst]))) > 0: # Dirty check for fits files
            logger.info(f'{globs[inst]} found, skipping...')
            continue

        response = requests.get(download_url)
        logger.info(response)
        if response.status_code == 200:
            # Get the filename from the response header
            cd = response.headers.get('content-disposition')
            filename = cd.split('filename=')[1].strip('";')
            file_path = obs_path / f'{filename}'
            if file_path.is_file() and not clobber:
                logger.info(f'Skipping {file_path}. File already exists.')
            else:
                logger.info(f'Response 200, downloading to {file_path}')
                with open(file_path, 'wb') as file:
                    file.write(response.content)
                logger.info(f'Downloaded: {file_path}')

                # Deal with GUEST.tar files (these show up if you have multiple eventlists in an observation)
                if 'GUEST' in filename:
                    logger.info(f'GUEST tar file found! Extracting to current dir!')
                    cmd = f'tar -xvf {file_path} -C {obs_path} --strip-components=2'
                    logger.info(f'Executing: {cmd}')
                    subprocess.run(cmd, shell=True)
        else:
            logger.warning(f'Failed to download event files for: {obsid} {inst}')

event_filtering

Functions to filter events files and create images from the raw data.

Requires having pre-run the sas task 'setsas' and the enviroment variable 'export CCFPATH=' in the terminal.

espfilt(eventfile)

https://xmm-tools.cosmos.esa.int/external/sas/current/doc/espfilt/espfilt.html https://xmm-tools.cosmos.esa.int/external/sas/current/doc/espfilt/node9.html

Source code in exod/pre_processing/event_filtering.py
160
161
162
163
164
165
166
def espfilt(eventfile):
    """
    https://xmm-tools.cosmos.esa.int/external/sas/current/doc/espfilt/espfilt.html
    https://xmm-tools.cosmos.esa.int/external/sas/current/doc/espfilt/node9.html
    """
    cmd = f'espfilt eventfile={eventfile}'
    run_cmd(cmd)

processing

background_inference

calc_background_template(image_sub, image_mask_source)

Calculate the background template for a given image_sub.

This is done by the following
  1. Remove the sources from the image.
  2. Calculate the number of counts in the background.
  3. Inpaint the holes where the sources were.
  4. Divide the image by the total counts in the background.

We then blur the image and inpaint again, this is to deal with issues where if large sources were removed we can fill them in, we should probably only do this if we need to.

Parameters:

Name Type Description Default
image_sub ndarray

Summed image of BTI or GTI frames.

required
image_mask_source ndarray

Mask of the sources.

required

Returns:

Name Type Description
image_sub_background_template ndarray

The background template for the image subset.

count_sub_outside_sources float

The number of counts outside the sources (total counts in background)

Source code in exod/processing/background_inference.py
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
def calc_background_template(image_sub, image_mask_source):
    """
    Calculate the background template for a given image_sub.

    This is done by the following:
        1. Remove the sources from the image.
        2. Calculate the number of counts in the background.
        3. Inpaint the holes where the sources were.
        4. Divide the image by the total counts in the background.

    We then blur the image and inpaint again, this is to deal with issues where if large
    sources were removed we can fill them in, we should probably only do this if we need to.

    Parameters:
        image_sub (np.ndarray): Summed image of BTI or GTI frames.
        image_mask_source (np.ndarray): Mask of the sources.

    Returns:
        image_sub_background_template (np.ndarray): The background template for the image subset.
        count_sub_outside_sources (float): The number of counts outside the sources (total counts in background)
    """
    sigma_blurring = 0.5
    inpaint_method = INPAINT_NS # INPAINT_TELEA

    image_sub_no_source                       = inpaint(image_sub.astype(np.float32), image_mask_source.astype(np.uint8), inpaintRadius=2, flags=inpaint_method)
    count_sub_outside_sources                 = np.nansum(image_sub[~image_mask_source])
    image_sub_no_source_template              = image_sub_no_source / count_sub_outside_sources
    image_sub_no_source_template_blur         = convolve(image_sub_no_source_template, Gaussian2DKernel(sigma_blurring))
    image_mask_missing_pixels                 = (image_sub > 0) & np.isnan(image_sub_no_source_template_blur)
    image_sub_no_source_template_blur_inpaint = inpaint(image_sub_no_source_template_blur.astype(np.float32), image_mask_missing_pixels.astype(np.uint8), inpaintRadius=2, flags=inpaint_method)
    image_sub_background_template             = np.where(image_sub > 0, image_sub_no_source_template_blur_inpaint, np.nan)

    compare_images(images=[image_sub, image_sub_no_source],
                   titles=['image_sub', 'image_sub_no_source'],
                   log=False)

    compare_images(images=[image_sub_no_source_template,
                           image_sub_no_source_template_blur,
                           image_sub_no_source_template_blur_inpaint,
                           image_sub_background_template],
                   titles=['image_sub_no_source_template',
                           'image_sub_no_source_template_blur',
                           'image_sub_no_source_template_blur_inpaint',
                           'image_sub_background_template'],
                   log=False)
    return image_sub_background_template, count_sub_outside_sources

calc_cube_mu(data_cube, wcs)

Calculates an expectation (mu) data cube.

Any departure from this cube corresponds to variability.

The background is dealt with by assuming that all GTIs and BTIs follow respective templates (i.e., once each frame is divided by its total counts, they all look the same).

The sources are dealt with assuming they are constant. We take their net emission, and distribute it evenly across all frames.

Parameters:

Name Type Description Default
data_cube DataCube

DataCube object.

required
wcs WCS

wcs object.

required

Returns: cube_mu (np.ndarray): The expectation (mu) data cube.

Source code in exod/processing/background_inference.py
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
def calc_cube_mu(data_cube, wcs):
    """
    Calculates an expectation (mu) data cube.

    Any departure from this cube corresponds to variability.

    The background is dealt with by assuming that all GTIs and BTIs follow
    respective templates (i.e., once each frame is divided by its total counts,
    they all look the same).

    The sources are dealt with assuming they are constant.
    We take their net emission, and distribute it evenly across all frames.

    Parameters:
        data_cube (DataCube): DataCube object.
        wcs (astropy.wcs.WCS): wcs object.
    Returns:
        cube_mu (np.ndarray): The expectation (mu) data cube.
    """
    cube_n = data_cube.data
    bti_bin_idx = data_cube.bti_bin_idx
    gti_bin_idx = data_cube.gti_bin_idx
    n_gti_bin = data_cube.n_gti_bin
    n_bti_bin = data_cube.n_bti_bin
    cube_gti = cube_n[:, :, gti_bin_idx]
    cube_bti = cube_n[:, :, bti_bin_idx]

    # Get the summed Images.
    image_total = np.nansum(cube_n, axis=2)
    image_gti = np.nansum(cube_gti, axis=2)

    # Create the source masks.
    # Two source masks are calculated then combined, the first comes from the pipeline detected
    # sources in the OBSMLI file. The second mask takes the remaining image and masks pixels that are
    # above 3 x 75% of the total image. These two masks are then combined using an OR statement.
    # This works well in most cases but often struggles when there are extended sources.
    image_mask_source_list = mask_known_sources(data_cube, wcs=wcs)  # Image mask from OBSMLI file.
    # source_threshold = np.nanpercentile(image_gti.flatten(), 99) # This or from detected sources
    source_threshold = 3 * np.nanpercentile(image_gti[~image_mask_source_list], q=75)
    image_mask_source_percentile = image_gti > source_threshold
    image_mask_source = image_mask_source_list | image_mask_source_percentile

    if n_gti_bin:
        logger.info('Calculating gti template...')
        image_gti = np.nansum(cube_gti, axis=2)
        image_gti_background_template, count_gti_outside_sources = calc_background_template(image_sub=image_gti, image_mask_source=image_mask_source)

    if n_bti_bin:
        logger.info('Calculating bti template...')
        image_bti = np.nansum(cube_bti, axis=2) # Image of all btis combined, with sources
        image_bti_background_template, count_bti_outside_sources = calc_background_template(image_sub=image_bti, image_mask_source=image_mask_source)

    # image_bti_no_source_template_blur=image_bti_no_source_template

    # Obtain the Image with the mean source contribution (zero everywhere that is not a source)
    if n_gti_bin > n_bti_bin:
        logger.info('Calculating source contribution using gtis')
        image_source_template = calc_source_template(image_gti, image_gti_background_template, image_mask_source,
                                                     data_cube, count_gti_outside_sources, gti_bin_idx)
    else:
        logger.info('Calculating source contribution using btis')
        image_source_template = calc_source_template(image_bti, image_bti_background_template, image_mask_source,
                                                     data_cube, count_bti_outside_sources, bti_bin_idx)

    # Get data cube outside the sources to obtain the background light curve.
    cube_mask_source       = np.repeat(image_mask_source[:, :, np.newaxis], cube_n.shape[2], axis=2)
    cube_n_outside_sources = np.where(cube_mask_source, np.nan, cube_n)
    lc_outside_sources     = np.nansum(cube_n_outside_sources, axis=(0, 1)) # Essentially the background light curve.


    logger.info('Creating expected (mu) cube...')
    # Create the expectation cube by combining the source and background templates.
    cube_mu = np.empty(cube_n.shape)
    if n_gti_bin:
        cube_mu[:,:,gti_bin_idx] = image_gti_background_template[:,:,np.newaxis] * lc_outside_sources[gti_bin_idx]
    if n_bti_bin:
        cube_mu[:,:,bti_bin_idx] = image_bti_background_template[:,:,np.newaxis] * lc_outside_sources[bti_bin_idx]
    cube_mu += image_source_template[:,:,np.newaxis] * data_cube.relative_frame_exposures
    cube_mu = np.where(np.nansum(cube_n, axis=(0,1)) > 0, cube_mu, np.nan)
    logger.info('Expected cube created!')
    return cube_mu

calc_source_template(image_sub, image_sub_background_template, image_mask_source, data_cube, count_sub_outside_sources, subset_bin_idx)

Calculate the average contribution from the sources in the field of the observation.

This is done by masking out the background and dividing by the effective exposed frames. This essentially gives the image of the average contribution of the sources in each frame.

Parameters:

Name Type Description Default
image_sub ndarray

image of GTI or BTIs obtained by summing the frames.

required
image_sub_background_template ndarray

The template for the BTI or GTI obtained via calc_background_template.

required
image_mask_source ndarray

Mask of the sources.

required
data_cube DataCube

DataCube object.

required
count_sub_outside_sources float

The number of counts outside the sources (total counts in background).

required
subset_bin_idx ndarray

The bins corresponding to the subset (either the GTI or BTIs).

required

Returns:

Name Type Description
image_source_template ndarray

The average contribution of the sources in each frame.

Source code in exod/processing/background_inference.py
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
def calc_source_template(image_sub, image_sub_background_template, image_mask_source, data_cube,
                         count_sub_outside_sources, subset_bin_idx):
    """
    Calculate the average contribution from the sources in the field of the observation.

    This is done by masking out the background and dividing by the effective exposed frames.
    This essentially gives the image of the average contribution of the sources in each frame.

    Parameters:
        image_sub (np.ndarray): image of GTI or BTIs obtained by summing the frames.
        image_sub_background_template (np.ndarray): The template for the BTI or GTI obtained via calc_background_template.
        image_mask_source (np.ndarray): Mask of the sources.
        data_cube (DataCube): DataCube object.
        count_sub_outside_sources (float): The number of counts outside the sources (total counts in background).
        subset_bin_idx (np.ndarray): The bins corresponding to the subset (either the GTI or BTIs).

    Returns:
        image_source_template (np.ndarray): The average contribution of the sources in each frame.
    """
    image_sub_source_only1   = image_sub - image_sub_background_template * count_sub_outside_sources
    image_sub_source_only2   = np.where(image_mask_source, image_sub_source_only1, 0)           # Replace everything that is not a source with 0
    image_sub_source_only3   = np.where(image_sub_source_only2 > 0, image_sub_source_only2, 0)  # Replace negative values with 0
    effective_exposed_frames = np.sum(data_cube.relative_frame_exposures[subset_bin_idx])       # Analagous to n_gti_bin if relative exposures = 1
    image_source_template    = image_sub_source_only3 / effective_exposed_frames                # Average value of the source contribution per frame.

    compare_images(images=[image_sub_source_only1, image_sub_source_only2, image_sub_source_only3, image_source_template],
                   titles=['image_sub_source_only1', 'image_sub_source_only2', 'image_sub_source_only3', 'image_source_template'],
                   log=False)
    return image_source_template

mask_known_sources(data_cube, wcs)

Mask known sources from the data cube.

Parameters:

Name Type Description Default
data_cube DataCube

DataCube object.

required
wcs WCS

wcs object.

required

Returns:

Name Type Description
image_mask ndarray

The mask of the sources.

Source code in exod/processing/background_inference.py
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
def mask_known_sources(data_cube, wcs):
    """
    Mask known sources from the data cube.

    Parameters:
        data_cube (DataCube): DataCube object.
        wcs (astropy.wcs.WCS): wcs object.

    Returns:
        image_mask (np.ndarray): The mask of the sources.
    """
    def cropping_radius_counts(epic_total_rate, size_arcsec):
        """
        Get cropping radius for a given EPIC count rate.
        0.0 < RATE < 0.1 = 20
        0.1 < RATE < 1.0 = 40
        RATE > 1.0 = 80
        """
        if epic_total_rate > 1:
            return 80
        elif epic_total_rate > 0.1:
            return 40
        else:
            return size_arcsec

    observation = Observation(data_cube.event_list.obsid)
    observation.get_source_list()
    obsmli_file_path = observation.source_list
    logger.info(f'OBSMLI file: {obsmli_file_path}')
    tab_src = Table(fits.open(obsmli_file_path)[1].data)

    # We include bright (i.e. large DET_ML) point sources & extended sources
    tab_src_point    = tab_src[(tab_src['EP_DET_ML'] > 8) & (tab_src['EP_EXTENT'] == 0)]
    tab_src_extended = tab_src[(tab_src['EP_DET_ML'] > 8) & (tab_src['EP_EXTENT'] > 0)]

    radius_point_sources    = [cropping_radius_counts(counts, data_cube.size_arcsec) for counts in tab_src_point['EP_TOT']]
    radius_extended_sources = tab_src_extended['EP_EXTENT']

    logger.info(f'Total rows: {len(tab_src)} Point sources: {len(tab_src_point)} Extended Sources: {len(tab_src_extended)}')

    image_mask = np.full(data_cube.shape[:2], fill_value=False)

    for tab_data, tab_radius, color in zip((tab_src_point, tab_src_extended),
                                           (radius_point_sources, radius_extended_sources),
                                           ('r', 'y')):
        if len(tab_data) == 0:
            logger.info('No rows in tab_data, skipping...')
            continue

        # Create skycoord
        sc = SkyCoord(ra=tab_data['RA'], dec=tab_data['DEC'], unit='deg')
        x_img, y_img = wcs.world_to_pixel(sc)

        # Convert to Sky Coordinates
        X = x_img * 80
        Y = y_img * 80

        # Remove values outside the cube
        xcube_max, xcube_min = data_cube.bin_x[-1], data_cube.bin_x[0]
        ycube_max, ycube_min = data_cube.bin_y[-1], data_cube.bin_y[0]
        XY = np.array([[x, y] for x, y in zip(X, Y) if ((x < xcube_max) and
                                                        (y < ycube_max) and
                                                        (x > xcube_min) and
                                                        (y > ycube_min))]).T
        X = XY[0]
        Y = XY[1]

        # Interpolate to Cube coordinates
        interp_x_cube = interp1d(x=data_cube.bin_x, y=range(data_cube.shape[0]))
        interp_y_cube = interp1d(x=data_cube.bin_y, y=range(data_cube.shape[1]))
        x_cube = interp_x_cube(X)
        y_cube = interp_y_cube(Y)

        for x, y, rad in zip(x_cube, y_cube, tab_radius):
            radius_image = rad / data_cube.size_arcsec
            rr, cc = disk(center=(x, y), radius=radius_image)
            kept_pixels = (rr>0) & (rr<data_cube.shape[0]-1) & (cc>0) & (cc<data_cube.shape[1]-1) # Keep mask pixels only if in image
            rr, cc = rr[kept_pixels], cc[kept_pixels]
            image_mask[rr, cc] = True

    #         im[rr,cc] = np.nan
    #         circle = plt.Circle((x, y), radius_image, color=color, fill=False)
    #         plt.gca().add_patch(circle)

    # plt.imshow(im.T, origin='lower', norm=LogNorm(vmax=1e3), interpolation='none')
    # plt.show()
    # for x, y in zip(x_cube, y_cube):
    #     im[x - 1:x + 1, y - 1:y + 1] = 0

    # plt.figure(figsize=(10, 10))
    # plt.imshow(im.T, origin='lower', norm=LogNorm(), interpolation='none')
    return image_mask

bayesian_computations

PrecomputeBayesLimits

Source code in exod/processing/bayesian_computations.py
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
class PrecomputeBayesLimits:
    _instances = {}

    def __new__(cls, threshold_sigma):
        """
        Creates or retrieves an instance of PrecomputeBayesLimits with a specific threshold.

        This method implements a multiton pattern, where only one instance of
        PrecomputeBayesLimits exists per unique value of `threshold_sigma`.

        When a new threshold value is requested, a new instance is created and stored in
        the `_instances` dictionary, keyed by `threshold_sigma`. If an instance with the same `threshold_sigma`
        already exists, it is returned instead of creating a new one. https://en.wikipedia.org/wiki/Multiton_pattern

        Args:
            threshold_sigma (int): The significance threshold for the instance.

        Returns:
            PrecomputeBayesLimits: The existing or newly created instance with  the specified `threshold_sigma`.
        """
        if threshold_sigma not in cls._instances:
            instance = super(PrecomputeBayesLimits, cls).__new__(cls)
            instance.threshold_sigma = threshold_sigma
            logger.warning(f'Creating new PrecomputeBayesLimits() instance threshold_sigma={threshold_sigma}.')
            cls._instances[threshold_sigma] = instance
        return cls._instances[threshold_sigma]

    def __init__(self, threshold_sigma):
        self.threshold_sigma = threshold_sigma
        self.get_savepath()
        self.range_mu = None
        self.n_peak_threshold = None
        self.n_eclipse_threshold = None
        self.is_loaded = False
        self.load()

    def __repr__(self):
        return f'PrecomputeBayesLimits(threshold_sigma={self.threshold_sigma})'

    def get_savepath(self):
        self.savepath = path.data_util / f'bayesfactorlimits_{self.threshold_sigma}.txt'
        if not self.savepath.exists():
            logger.info(f'{self.savepath} does not exist. Precomputing Bayes Factors...')
            precompute_bayes_limits(threshold_sigma=self.threshold_sigma)
        return self.savepath

    def load(self):
        if self.is_loaded:
            return None
        range_mu, n_peak_threshold, n_eclipse_threshold = load_precomputed_bayes_limits(threshold_sigma=self.threshold_sigma)
        self.range_mu = range_mu
        self.n_peak_threshold = n_peak_threshold
        self.n_eclipse_threshold = n_eclipse_threshold
        self.is_loaded = True

    def get_cube_masks_peak_and_eclipse(self, cube_n, cube_mu):
        cube_mu = np.where(cube_mu > self.range_mu[0], cube_mu, np.nan)  # Remove small expectation values outside of interpolation range
        cube_mask_peaks   = cube_n > self.n_peak_threshold(cube_mu)
        cube_mask_eclipse = cube_n < self.n_eclipse_threshold(cube_mu)
        return cube_mask_peaks, cube_mask_eclipse
__new__(threshold_sigma)

Creates or retrieves an instance of PrecomputeBayesLimits with a specific threshold.

This method implements a multiton pattern, where only one instance of PrecomputeBayesLimits exists per unique value of threshold_sigma.

When a new threshold value is requested, a new instance is created and stored in the _instances dictionary, keyed by threshold_sigma. If an instance with the same threshold_sigma already exists, it is returned instead of creating a new one. https://en.wikipedia.org/wiki/Multiton_pattern

Parameters:

Name Type Description Default
threshold_sigma int

The significance threshold for the instance.

required

Returns:

Name Type Description
PrecomputeBayesLimits

The existing or newly created instance with the specified threshold_sigma.

Source code in exod/processing/bayesian_computations.py
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
def __new__(cls, threshold_sigma):
    """
    Creates or retrieves an instance of PrecomputeBayesLimits with a specific threshold.

    This method implements a multiton pattern, where only one instance of
    PrecomputeBayesLimits exists per unique value of `threshold_sigma`.

    When a new threshold value is requested, a new instance is created and stored in
    the `_instances` dictionary, keyed by `threshold_sigma`. If an instance with the same `threshold_sigma`
    already exists, it is returned instead of creating a new one. https://en.wikipedia.org/wiki/Multiton_pattern

    Args:
        threshold_sigma (int): The significance threshold for the instance.

    Returns:
        PrecomputeBayesLimits: The existing or newly created instance with  the specified `threshold_sigma`.
    """
    if threshold_sigma not in cls._instances:
        instance = super(PrecomputeBayesLimits, cls).__new__(cls)
        instance.threshold_sigma = threshold_sigma
        logger.warning(f'Creating new PrecomputeBayesLimits() instance threshold_sigma={threshold_sigma}.')
        cls._instances[threshold_sigma] = instance
    return cls._instances[threshold_sigma]

B_eclipse(n, mu)

Computes the Bayes factors of the presence of an eclipse, given the data_cube (mu) and observed (n) counts

Source code in exod/processing/bayesian_computations.py
18
19
20
def B_eclipse(n, mu):
    """Computes the Bayes factors of the presence of an eclipse, given the data_cube (mu) and observed (n) counts"""
    return gammainc(n + 1, mu) / poisson.pmf(n, mu)

B_eclipse_log(n, mu)

Computes the Bayes factors of the presence of an eclipse, given the data_cube (mu) and observed (n) counts

Source code in exod/processing/bayesian_computations.py
28
29
30
def B_eclipse_log(n, mu):
    """Computes the Bayes factors of the presence of an eclipse, given the data_cube (mu) and observed (n) counts"""
    return np.log10(gammainc(n + 1, mu)) - np.log10(poisson.pmf(n, mu))

B_peak(n, mu)

Computes the Bayes factors of the presence of a peak, given the data_cube (mu) and observed (n) counts.

Source code in exod/processing/bayesian_computations.py
13
14
15
def B_peak(n, mu):
    """Computes the Bayes factors of the presence of a peak, given the data_cube (mu) and observed (n) counts."""
    return gammaincc(n + 1, mu) / poisson.pmf(n, mu)

B_peak_log(n, mu)

Computes the Bayes factors of the presence of a peak, given the data_cube (mu) and observed (n) counts.

Source code in exod/processing/bayesian_computations.py
23
24
25
def B_peak_log(n, mu):
    """Computes the Bayes factors of the presence of a peak, given the data_cube (mu) and observed (n) counts."""
    return np.log10(gammaincc(n + 1, mu)) - np.log10(poisson.pmf(n, mu))

get_bayes_thresholds(threshold_sigma)

The thresholds for B_peak and B_eclipse are calculated here for 3 and 5 sigma.

This is sort of a hack, and is done by finding the value of B(n,mu) that is equal to a given significance (sigma) under the Gaussian assumption at mu=1000.

For example, we want a significance level of sigma = 3. We first find what observed (n) value we need to get a 3 sigma peak Gaussian assumption. n_peak_large_mu(mu=1000, sigma=3) = 1139

Next, we find the value of B_peak that an observed (n) value of 1139 would give us. B_peak(n=1139, mu=1000) = 872908 (5.94 in log10)

We treat this value of B as being "Equivalent" to a 3 sigma detection and subsequently can specify: B > B_threshold_peak for a peak detection B < B_eclipse_threshold for an eclipse detection

Source code in exod/processing/bayesian_computations.py
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
def get_bayes_thresholds(threshold_sigma):
    """
    The thresholds for B_peak and B_eclipse are calculated here for 3 and 5 sigma.

    This is sort of a hack, and is done by finding the value of B(n,mu) that is equal to
    a given significance (sigma) under the Gaussian assumption at mu=1000.

    For example, we want a significance level of sigma = 3.
    We first find what observed (n) value we need to get a 3 sigma peak Gaussian assumption.
        n_peak_large_mu(mu=1000, sigma=3) = 1139

    Next, we find the value of B_peak that an observed (n) value of 1139 would give us.
        B_peak(n=1139, mu=1000) = 872908  (5.94 in log10)

    We treat this value of B as being "Equivalent" to a 3 sigma detection and subsequently can specify:
        B > B_threshold_peak for a peak detection
        B < B_eclipse_threshold for an eclipse detection
    """
    B_peak_threshold    = B_peak_log(n=n_peak_large_mu(mu=1000, sigma=threshold_sigma), mu=1000)
    B_eclipse_threshold = B_eclipse_log(n=n_eclipse_large_mu(mu=1000, sigma=threshold_sigma), mu=1000)
    return B_peak_threshold, B_eclipse_threshold

get_cube_masks_peak_and_eclipse(cube_n, cube_mu, threshold_sigma)

Returns two cubes with booleans where the rate correspond to a peak or an eclipse.

Source code in exod/processing/bayesian_computations.py
108
109
110
111
112
113
114
def get_cube_masks_peak_and_eclipse(cube_n, cube_mu, threshold_sigma):
    """Returns two cubes with booleans where the rate correspond to a peak or an eclipse."""
    range_mu, n_peak_threshold, n_eclipse_threshold = load_precomputed_bayes_limits(threshold_sigma=threshold_sigma)
    cube_mu = np.where(cube_mu > range_mu[0], cube_mu, np.nan) # Remove small expectation values outside of interpolation range
    cube_mask_peaks   = cube_n > n_peak_threshold(cube_mu)
    cube_mask_eclipse = cube_n < n_eclipse_threshold(cube_mu)
    return cube_mask_peaks, cube_mask_eclipse

load_precomputed_bayes1000()

Loads & interpolates the precomputed values of Bayes factors at mu=1000

Source code in exod/processing/bayesian_computations.py
127
128
129
130
131
132
133
def load_precomputed_bayes1000():
    """Loads & interpolates the precomputed values of Bayes factors at mu=1000"""
    data              = np.loadtxt(path.data_util / f'bayesfactor_mu1000.txt')
    range_N           = data[0]
    B_values_peaks    = interp1d(range_N, data[1])
    B_values_eclipses = interp1d(range_N, data[2])
    return range_N, B_values_peaks, B_values_eclipses

load_precomputed_bayes_limits(threshold_sigma)

Loads the precomputed Bayes factor limit numbers, for a chosen threshold.

Source code in exod/processing/bayesian_computations.py
136
137
138
139
140
141
142
def load_precomputed_bayes_limits(threshold_sigma):
    """Loads the precomputed Bayes factor limit numbers, for a chosen threshold."""
    data = np.loadtxt(path.data_util / f'bayesfactorlimits_{threshold_sigma}.txt')
    range_mu            = data[0]
    n_peak_threshold    = interp1d(range_mu, data[1])
    n_eclipse_threshold = interp1d(range_mu, data[2])
    return range_mu, n_peak_threshold, n_eclipse_threshold

n_eclipse_large_mu(mu, sigma)

Calculate the observed (n) value required for an eclipse at a specific expecation (mu) and significance (sigma) in the Gaussian regime.

Source code in exod/processing/bayesian_computations.py
38
39
40
def n_eclipse_large_mu(mu, sigma):
    """Calculate the observed (n) value required for an eclipse at a specific expecation (mu) and significance (sigma) in the Gaussian regime."""
    return np.floor((2*mu+sigma**2-np.sqrt(8*mu*(sigma**2)+sigma**4))/2)

n_peak_large_mu(mu, sigma)

Calculate the observed (n) value required for a peak at a specific expectation (mu) and significance (sigma) in the Gaussian regime.

Source code in exod/processing/bayesian_computations.py
33
34
35
def n_peak_large_mu(mu, sigma):
    """Calculate the observed (n) value required for a peak at a specific expectation (mu) and significance (sigma) in the Gaussian regime."""
    return np.ceil((2*mu+sigma**2+np.sqrt(8*mu*(sigma**2)+sigma**4))/2)

precompute_bayes_1000()

Precomputes the Bayes factor at mu=1000 for a bunch of values of N. Will be interpolated to estimate the sigma

Source code in exod/processing/bayesian_computations.py
116
117
118
119
120
121
122
123
124
def precompute_bayes_1000():
    """Precomputes the Bayes factor at mu=1000 for a bunch of values of N. Will be interpolated to estimate the sigma"""
    range_N    = np.arange(10000)
    B_peaks    = B_peak_log(n=range_N, mu=1000)
    B_eclipses = B_eclipse_log(n=range_N, mu=1000)
    data = np.array([range_N, B_peaks, B_eclipses])
    savepath = path.data_util / f'bayesfactor_mu1000.txt'
    logger.info(f'Saving to {savepath}')
    np.savetxt(savepath, data)

precompute_bayes_limits(threshold_sigma)

Compute the minimum and maximum number of observed counts (n) required for an eclipse or peak for a given confidence threshold (threshold_sigma) and expectation (mu).

For counts > 1000, we use a Gaussian approximation. sigma = (N-mu) / (N+mu)^0.5

Solving for N gives: N = 2mu + sigma^2 + (8 mu sigma^2 + sigma^4)^0.5 N = 2mu + sigma^2 - (8 mu sigma^2 + sigma^4)^0.5

The resulting table looks like

i mu n_peak n_eclipse

46000 158.556483 219.0 106.0 46001 158.629519 219.0 107.0 46002 158.702589 219.0 107.0 46003 158.775693 219.0 107.0

Each data cell in the observed and data_cube cube is then compared to the values pre-calculated here to determine if is it a peak or an eclipse.

An example for 1 frame is given here.

(n_cube) (mu_cube) is_3_sig_peak? 0 2 1 0.03 1.98 0.87 F F F 0 5 0 1.01 1.10 1.00 F T F 3 2 1 2.30 2.14 0.98 F F F

The evaluation of each data cell of the cube against a threshold is made faster by precomputing the counts here.

Source code in exod/processing/bayesian_computations.py
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
def precompute_bayes_limits(threshold_sigma):
    """
    Compute the minimum and maximum number of observed
    counts (n) required for an eclipse or peak for a given
    confidence threshold (threshold_sigma) and expectation (mu).

    For counts > 1000, we use a Gaussian approximation.
    sigma = (N-mu) / (N+mu)^0.5

    Solving for N gives:
    N = 2mu + sigma^2 + (8 mu sigma^2 + sigma^4)^0.5
    N = 2mu + sigma^2 - (8 mu sigma^2 + sigma^4)^0.5

    The resulting table looks like:
        i          mu  n_peak  n_eclipse
    46000  158.556483   219.0      106.0
    46001  158.629519   219.0      107.0
    46002  158.702589   219.0      107.0
    46003  158.775693   219.0      107.0


    Each data cell in the observed and data_cube cube
    is then compared to the values pre-calculated here
    to determine if is it a peak or an eclipse.

    An example for 1 frame is given here.

     (n_cube)    (mu_cube)       is_3_sig_peak?
      0 2 1    0.03 1.98 0.87       F F F
      0 5 0    1.01 1.10 1.00       F T F
      3 2 1    2.30 2.14 0.98       F F F

    The evaluation of each data cell of the cube against
    a threshold is made faster by precomputing the counts here.
    """
    B_peak_threshold, B_eclipse_threshold = get_bayes_thresholds(threshold_sigma=threshold_sigma)

    range_mu       = np.geomspace(start=1e-7, stop=1e3, num=50000)
    range_mu_large = np.geomspace(start=1e3, stop=1e6, num=1000)   # Above 1000

    tab_npeak, tab_neclipse = [], []
    for mu in tqdm(range_mu):
        # Find the smallest observed (n) value for a peak
        range_n_peak = np.arange(max(10 * int(mu), 100))
        B_factors = B_peak_log(n=range_n_peak, mu=mu)
        n_peak_min = range_n_peak[B_factors > B_peak_threshold][0]
        tab_npeak.append(n_peak_min)

        # Get the largest observed (n) value for an eclipse
        range_n_eclipse = np.arange(2 * int(mu) + 1)
        B_factors = B_eclipse_log(n=range_n_eclipse, mu=mu)
        n_eclipse_max = range_n_eclipse[B_factors < B_eclipse_threshold][0]
        tab_neclipse.append(n_eclipse_max)

    tab_npeak    += list(n_peak_large_mu(range_mu_large, threshold_sigma))
    tab_neclipse += list(n_eclipse_large_mu(range_mu_large, threshold_sigma))

    range_mu = np.concatenate((range_mu, range_mu_large))

    data = np.array([range_mu, tab_npeak, tab_neclipse])
    savepath = path.data_util / f'bayesfactorlimits_{threshold_sigma}.txt'
    logger.info(f'Saving to {savepath}')
    np.savetxt(savepath, data)

sigma_equivalent(n, mu)

Find the equivalent sigma for a given observed (n) and expectation (mu).

For large counts (mu=1000), the required n to obtain a given sigma can be calculated assuming Gaussian statistics. This can be done using the functions: n_peak_large_mu(mu, sigma) and n_eclipse_large_mu(mu, sigma)

For small counts, we cannot use these functions, as we are in the Poisson regime, but we can calculate the Bayes factors for peaks and eclipses, using B_peak_log(n, mu) and B_eclipse_log(n, mu).

We can however for a given value of B, find out what the equivalent value of sigma would be. To do this we need to find the sigma that gives: B_peak(n, mu=1000) = B_peak(n=n_peak_large_mu(mu=1000, sigma), mu=1000)

This is done by finding the root of the function

B_peak(n, mu=1000) - B_peak(n=n_peak_large_mu(mu=1000, sigma), mu=1000) = 0

Source code in exod/processing/bayesian_computations.py
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
def sigma_equivalent(n, mu):
    """
    Find the equivalent sigma for a given observed (n) and expectation (mu).

    For large counts (mu=1000), the required n to obtain a given sigma can be calculated assuming Gaussian statistics.
    This can be done using the functions: n_peak_large_mu(mu, sigma) and n_eclipse_large_mu(mu, sigma)

    For small counts, we cannot use these functions, as we are in the Poisson regime, but we can calculate the
    Bayes factors for peaks and eclipses, using B_peak_log(n, mu) and B_eclipse_log(n, mu).

    We can however for a given value of B, find out what the equivalent value of sigma would be.
    To do this we need to find the sigma that gives:
        B_peak(n, mu=1000) = B_peak(n=n_peak_large_mu(mu=1000, sigma), mu=1000)

    This is done by finding the root of the function:
        B_peak(n, mu=1000) - B_peak(n=n_peak_large_mu(mu=1000, sigma), mu=1000) = 0
    """
    if n > mu:  # Peak
        b = B_peak_log(n, mu)
        function_to_invert = lambda sigma: b - B_peak_log(n_peak_large_mu(mu=1000, sigma=sigma), mu=1000)
    else:  # Eclipse
        b = B_eclipse_log(n, mu)
        function_to_invert = lambda sigma: b - B_eclipse_log(n_eclipse_large_mu(mu=1000, sigma=sigma), mu=1000)

    if function_to_invert(10) > 0:
        return 10
    elif function_to_invert(1) < 0:
        return 0
    else:
        return root_scalar(function_to_invert, bracket=(1, 10)).root

sigma_equivalent_B_eclipse(B_eclipse)

Find the equivalent sigma for a given Bayes factor for an eclipse. B_eclipse must be in log!

1.6 < B_eclipse < 42

0.00 < sigma < 9.94

Source code in exod/processing/bayesian_computations.py
215
216
217
218
219
220
221
222
223
224
225
226
227
def sigma_equivalent_B_eclipse(B_eclipse):
    """
    Find the equivalent sigma for a given Bayes factor for an eclipse. B_eclipse must be in log!

    Range: 1.6  < B_eclipse < 42
           0.00 < sigma     < 9.94
    """
    if B_eclipse < 1.6:
        return 0
    elif B_eclipse > 42:
        return 10
    f = lambda sigma: B_eclipse - B_eclipse_log(n_eclipse_large_mu(mu=1000, sigma=sigma), mu=1000)
    return root_scalar(f, bracket=(0, 10)).root

sigma_equivalent_B_peak(B_peak)

Find the equivalent sigma for a given Bayes factor for a peak. B_Peak must be in log!

1.61 < B_peak < 48.976

0.00 < sigma < 9.98

Source code in exod/processing/bayesian_computations.py
200
201
202
203
204
205
206
207
208
209
210
211
212
def sigma_equivalent_B_peak(B_peak):
    """
    Find the equivalent sigma for a given Bayes factor for a peak. B_Peak must be in log!

    Range: 1.61 < B_peak < 48.976
           0.00 < sigma  < 9.98
    """
    if B_peak< 1.61:
        return 0
    elif B_peak > 48.976:
        return 10
    f = lambda sigma: B_peak - B_peak_log(n_peak_large_mu(mu=1000, sigma=sigma), mu=1000)
    return root_scalar(f, bracket=(0, 10)).root

bayesian_tests

This module contains functions to test the Bayesian computations in the exod package.

check_estimate_success()

If I remember correctly, it is to assess if the count-rate estimation worked, independently of the value of the quiescent state mu. So, for a range of mu, I add a peak of three different amplitudes, I generate poisson realisations of mu+peak (the crosses), and see the value we can estimate for the peak (the envelopes). You have the residuals at the bottom It might be an earlier version, and not work anymore. It was just to check the success of the rate estimate function

Source code in exod/processing/bayesian_tests.py
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
def check_estimate_success():
    """
    If I remember correctly, it is to assess if the count-rate estimation
    worked, independently of the value of the quiescent state mu. So, for a
    range of mu, I add a peak of three different amplitudes, I generate poisson
    realisations of mu+peak (the crosses), and see the value we can estimate
    for the peak (the envelopes). You have the residuals at the bottom It might
    be an earlier version, and not work anymore. It was just to check the
    success of the rate estimate function
    """
    fig, ax = plt.subplots(2,1, sharex=True, gridspec_kw={'height_ratios' : [3, 1]})
    colors = cmr.take_cmap_colors(cmap='cmr.ocean', N=3, cmap_range=(0, 0.7))
    all_res = []

    for peak, color in zip((0.5, 2.5, 10), colors):
        tab_up, tab_mid, tab_low = [], [], []
        tab_N = []

        tab_mu = np.geomspace(start=1e-3, stop=1e2, num=100)
        for mu in tab_mu:
            N = np.random.poisson(mu+peak)
            tab_N.append(N)
            for fraction, tab in zip((0.01, 0.5, 0.99), (tab_low, tab_mid, tab_up)):
                peak_est = peak_count_estimate(fraction, N, mu)
                tab.append(peak_est)

                res = {}
                res['peak'] = peak
                res['mu'] = mu
                res['N'] = N
                res['fraction'] = fraction
                res['peak_count_estimate'] = peak_est
                all_res.append(res)

        tab_mid = np.array(tab_mid)
        tab_low = np.array(tab_low)
        tab_up  = np.array(tab_up)
        residual = np.where(tab_mid > peak,
                            (tab_mid - peak) / (tab_mid - tab_low),
                            (peak - tab_mid) / (tab_up - tab_mid))

        ax[0].fill_between(tab_mu, tab_low, tab_up, alpha=0.4, facecolor=color)
        ax[0].scatter(tab_mu, tab_N, color=color, s=10, marker='x', label=f'Peak amplitude={peak}')
        ax[0].plot(tab_mu, tab_mid, color=color)
        ax[0].axhline(y=peak, color=color, ls='--')

        ax[1].errorbar(tab_mu, residual, yerr=1, fmt='.', color=color, capsize=1.0, lw=1.0)
        ax[1].axhline(0, ls='--', color='black', lw=1.0)

        ax[0].set_xlim(0)
        ax[0].set_ylim(0)
        ax[0].set_xscale('log')
        ax[0].set_ylabel('Observed (n)')
        ax[1].set_xscale("log")
        ax[1].set_xlabel(r'Expectation ($\mu$)')
        ax[1].set_ylabel('Residual')

    plt.subplots_adjust(hspace=0)
    ax[0].legend(loc='upper left')
    plt.savefig(data_plots / 'check_estimate_success1.png')
    plt.savefig(data_plots / 'check_estimate_success1.pdf')
    df_res = pd.DataFrame(all_res)
    print(df_res)



    plt.figure(figsize=(6,6))
    for mu, color in zip((0.01, 1, 10), colors):
        tab_mid, tab_err, tab_errneg, tab_errpos=[],[],[],[]
        tab_peak = np.geomspace(1e-3,1e2,100)
        for peak in tab_peak:
        #     tabN = np.random.poisson(mu+peak, 50)
        #     tab_rates = [peak_rate_estimate(0.5,mu, N) for N in tabN]
        #     tab_mid.append(np.median(tab_rates))
        #     tab_err.append(np.std(tab_rates))
        # plt.errorbar(tab_peak,tab_mid,yerr=tab_err, color=color, fmt='o', label=f"$\mu={mu}$")
            N = np.random.poisson(mu+peak)
            tab_rates = peak_count_estimate(np.array((0.16, 0.5, 0.84)), N, mu)
            tab_mid.append(tab_rates[1])
            tab_errneg.append(tab_rates[1]-tab_rates[0])
            tab_errpos.append(tab_rates[2] - tab_rates[1])
        plt.errorbar(tab_peak, tab_mid, yerr=[tab_errneg,tab_errpos],
                     color=color, label=fr"$\mu={mu}$", lw=1.0, capsize=1.0, fmt='.', markersize=10)
    plt.loglog()
    plt.xlabel("Peak amplitude (n)")
    plt.ylabel("Estimated peak amplitude (n)")
    plt.plot(tab_peak, tab_peak, c='k')
    plt.xlim(min(tab_peak), max(tab_peak))
    plt.ylim(min(tab_peak), max(tab_peak))
    plt.legend()
    plt.savefig(data_plots / 'check_estimate_success2.png')
    plt.savefig(data_plots / 'check_estimate_success2.pdf')
    plt.show()

plot_B_eclipse()

Plot the eclipse Bayes factor for different observed (n) counts as a function of expecation (mu). Also plot the 3 and 5 sigma threshold values.

Source code in exod/processing/bayesian_tests.py
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
def plot_B_eclipse():
    """
    Plot the eclipse Bayes factor for different observed (n) counts as a function of expecation (mu).
    Also plot the 3 and 5 sigma threshold values.
    """
    n_lines_to_plot = 20 # 1 line is drawn for each value of n from 0 to n-1
    colors = plt.cm.winter(np.linspace(0, 1, n_lines_to_plot))

    B_peak_3sig, B_eclipse_3sig = get_bayes_thresholds(3)
    B_peak_5sig, B_eclipse_5sig = get_bayes_thresholds(5)

    mu_lo, mu_hi = 1e-3, 50
    mus = np.geomspace(mu_lo, mu_hi, 1000)

    plt.figure(figsize=(3.5, 3.5))
    for n in range(n_lines_to_plot):
        label = None
        if (n == 0) or (n == n_lines_to_plot-1):  # Label first and last line
            label = f'n={n}'
        plt.plot(mus, B_eclipse_log(n=n, mu=mus), color=colors[n], label=label)

    plt.axhline(B_eclipse_3sig, color='red', label=rf'3 $\sigma$ (B={B_peak_3sig:.2f})')
    plt.axhline(B_eclipse_5sig, color='black', label=rf'5 $\sigma$ (B={B_peak_5sig:.2f})')
    plt.title(f'Eclipse Bayes factor for n=0-{n_lines_to_plot}')
    plt.xlabel(r'Expected Value $\mu$')
    plt.ylabel(r'$log_{10}$($B_{eclipse}$)')
    # plt.xscale('log')
    plt.tight_layout()
    plt.xlim(mu_lo, mu_hi)
    plt.legend()
    plt.savefig(data_plots / 'B_eclipse.png')
    plt.savefig(data_plots / 'B_eclipse.pdf')
    plt.show()

plot_B_peak()

Plot the peak Bayes factor for different observed (n) counts as a function of expectation (mu). Also plot the 3 and 5 sigma threshold values.

Source code in exod/processing/bayesian_tests.py
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
def plot_B_peak():
    """
    Plot the peak Bayes factor for different observed (n) counts as a function of expectation (mu).
    Also plot the 3 and 5 sigma threshold values.
    """
    n_lines_to_plot = 20  # 1 line is drawn for each value of n from 0 to n-1
    colors = plt.cm.winter(np.linspace(start=0, stop=1, num=n_lines_to_plot))

    B_peak_3sig, B_eclipse_3sig = get_bayes_thresholds(3)
    B_peak_5sig, B_eclipse_5sig = get_bayes_thresholds(5)

    mu_lo, mu_hi = 1e-3, 50
    mus = np.geomspace(mu_lo, mu_hi, 1000)

    plt.figure(figsize=(3.5, 3.5))
    for n in range(n_lines_to_plot):
        label = None
        if (n == 0) or (n == n_lines_to_plot-1):  # Label first and last line
            label = f'n={n}'
        plt.plot(mus, B_peak_log(n=n, mu=mus), color=colors[n], label=label)

    plt.axhline(B_peak_3sig, color='red', label=rf'3 $\sigma$ (B={B_peak_3sig:.2f})')
    plt.axhline(B_peak_5sig, color='black', label=rf'5 $\sigma$ (B={B_peak_5sig:.2f})')
    plt.title(f'Peak Bayes factor for n=0-{n_lines_to_plot}')
    plt.xlabel(r'Expected Value $\mu$')
    plt.ylabel(r'$log_{10}$($B_{peak}$)')
    plt.xscale('log')
    plt.tight_layout()
    plt.ylim(0)
    plt.xlim(mu_lo, mu_hi)
    plt.legend()
    plt.savefig(data_plots / 'B_peak.png')
    plt.savefig(data_plots / 'B_peak.pdf')
    plt.show()

bti

Functions to calculate bad time intervals (BTI) for a given lightcurve.

BTIs are defined as time intervals where average count rate the 10.0-12.0 keV energy band exceeds a certain threshold. This threshold is usually set to 0.5 ct/s. However, when combining multiple lightcurves, the threshold can be set to 1.5 ct/s.

get_bti(time, data, threshold)

Get the bad time intervals for a given lightcurve.

Parameters:

Name Type Description Default
time array

Time Array.

required
data array

Data Array.

required
threshold float

Threshold for bad time intervals.

required

Returns:

Name Type Description
bti list

[{'START':500, 'STOP':600}, {'START':700, 'STOP':800}, ...]

Source code in exod/processing/bti.py
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
def get_bti(time, data, threshold):
    """
    Get the bad time intervals for a given lightcurve.

    Parameters:
        time (array): Time Array.
        data (array): Data Array.
        threshold (float): Threshold for bad time intervals.

    Returns:
        bti (list): [{'START':500, 'STOP':600}, {'START':700, 'STOP':800}, ...]
    """
    mask = data > threshold  # you can flip this to get the gti instead (it works)
    if mask.all():
        logger.info('All values above threshold! Entire observation is bad :(')
        return [{'START': time[0], 'STOP': time[-1]}]
    elif (~mask).all():
        logger.info('All values below Threshold! Entire observation is good :)')
        return []

    int_mask    = mask.astype(int)
    diff        = np.diff(int_mask)
    idx_starts  = np.where(diff == 1)[0]
    idx_ends    = np.where(diff == -1)[0]
    time_starts = time[idx_starts]
    time_ends   = time[idx_ends]

    first_crossing = diff[diff != 0][0]
    last_crossing  = diff[diff != 0][-1]

    if (first_crossing, last_crossing) == (-1, 1):
        logger.info('Curve Started and Ended above threshold!')
        time_starts = np.append(time[0], time_starts)
        time_ends = np.append(time_ends, time[-1])

    elif len(idx_starts) < len(idx_ends):
        logger.info('Curve Started Above threshold! (but did not end above it)')
        time_starts = np.append(time[0], time_starts)

    elif len(idx_starts) > len(idx_ends):
        logger.info('Curve Ended Above threshold! (but did not start above it)')
        time_ends = np.append(time_ends, time[-1])

    else:
        logger.info('Curve started and ended below threshold, nothing to do.')

    assert len(time_starts) == len(time_ends)
    bti = [{'START': time_starts[i], 'STOP': time_ends[i]} for i in range(len(time_ends))]
    return bti

get_bti_bin_idx(bti, bin_t)

Get the indexs corresponding to the bad time intervals for an array of time windows given a set of bad time intervals.

Parameters:

Name Type Description Default
bti list

[{'START':300, 'STOP':500}, {'START':600, 'STOP':800}, ...]

required
bin_t array

Array of time bins.

required

Returns:

Name Type Description
bti_bin_idx array

array of indexs corresponding to BTIs.

Source code in exod/processing/bti.py
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
def get_bti_bin_idx(bti, bin_t):
    """
    Get the indexs corresponding to the bad time intervals for an array of time windows given
    a set of bad time intervals.

    Parameters:
        bti (list): [{'START':300, 'STOP':500}, {'START':600, 'STOP':800}, ...]
        bin_t (array): Array of time bins.

    Returns:
        bti_bin_idx (array): array of indexs corresponding to BTIs.
    """
    t_starts = [b['START'] for b in bti]
    t_stops  = [b['STOP'] for b in bti]
    idx_starts = np.searchsorted(bin_t, t_starts)
    idx_stops  = np.searchsorted(bin_t, t_stops) - 1

    bti_bin_idx = np.array([])
    for i in range(len(idx_starts)):
        idxs = np.arange(idx_starts[i], idx_stops[i], 1)
        bti_bin_idx = np.append(bti_bin_idx, idxs)
    bti_bin_idx = bti_bin_idx.astype(int)
    return bti_bin_idx

get_bti_bin_idx_bool(bti_bin_idx, bin_t)

Get the boolean array mask corresponding to if a time window was a bad time interval or not.

Parameters:

Name Type Description Default
bti_bin_idx array

[1,4,6]

required
bin_t array

[0, 1.5, 2.0, 3.5, 5.0, 6.5, 8.0]

required

Returns:

Name Type Description
bti_bin_idx_bool array

[F,T,F,F,T,F,T,F]

Source code in exod/processing/bti.py
111
112
113
114
115
116
117
118
119
120
121
122
123
124
def get_bti_bin_idx_bool(bti_bin_idx, bin_t):
    """
    Get the boolean array mask corresponding to if a time window was a bad time interval or not.

    Parameters:
        bti_bin_idx (array): [1,4,6]
        bin_t (array): [0, 1.5, 2.0, 3.5, 5.0, 6.5, 8.0]

    Returns:
        bti_bin_idx_bool (array): [F,T,F,F,T,F,T,F]
    """
    arr = np.arange(len(bin_t))
    bti_bin_idx_bool = np.isin(arr, bti_bin_idx)
    return bti_bin_idx_bool

coordinates

calc_df_regions(image, image_mask)

Return the region dataframe for a given image and mask.

Source code in exod/processing/coordinates.py
73
74
75
76
77
78
79
def calc_df_regions(image, image_mask):
    """Return the region dataframe for a given image and mask."""
    properties_ = ('label', 'bbox', 'centroid', 'weighted_centroid', 'intensity_mean', 'equivalent_diameter_area', 'area_bbox')
    region_dict = regionprops_table(label_image=label(image_mask), intensity_image=image, properties=properties_)
    df_region = pd.DataFrame(region_dict)
    df_region['label'] = df_region['label'] - 1 # Start labels at 0
    return df_region

correct_sky_position(skycoord, data_cube)

Correct the systematic offset due to the gridding.

The change we need needs to be positive in dec and negative in RA, which results in diagonal (UL/NW) position angle correction, corresponding to 360-45=315 degrees.

The angular separation is the diagonal distance from the corner to the center of a pixel thus

(2*(size_arcsec/2)^2)^0.5

Source code in exod/processing/coordinates.py
57
58
59
60
61
62
63
64
65
66
67
68
69
70
def correct_sky_position(skycoord, data_cube):
    """
    Correct the systematic offset due to the gridding.

    The change we need needs to be positive in dec and negative in RA, which results in diagonal (UL/NW)
    position angle correction, corresponding to 360-45=315 degrees.

    The angular separation is the diagonal distance from the corner to the center of a pixel thus:
        (2*(size_arcsec/2)^2)^0.5
    """
    position_angle = 315 * u.deg
    separation = np.sqrt(2 * (data_cube.size_arcsec / 2) ** 2) * u.arcsec
    sc = skycoord.directional_offset_by(position_angle=position_angle, separation=separation)
    return sc

get_regions_sky_position(data_cube, df_regions, wcs)

Calculate the sky position of the detected regions.

Test coords: observation : 0803990501 1313 X-1 : 03 18 20.00 -66 29 10.9 1313 X-2 : 03 18 22.00 -66 36 04.3 SN 1978K : 03 17 38.62 -66 33 03.4 NGC1313 XMM4 : 03 18 18.46 -66 30 00.2 (lil guy next to x-1)

To calculate the EPIC X and Y coordinates of the variable sources, we use the final coordinates in the variability map, which are not integers. To know to which X and Y correspond to this, we interpolate the values of X and Y on the final coordinates. We divide by 80 because the WCS from the image is binned by x80 compared to X and Y values

Source code in exod/processing/coordinates.py
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def get_regions_sky_position(data_cube, df_regions, wcs):
    """
    Calculate the sky position of the detected regions.

    Test coords: observation : 0803990501
    1313 X-1     : 03 18 20.00 -66 29 10.9
    1313 X-2     : 03 18 22.00 -66 36 04.3
    SN 1978K     : 03 17 38.62 -66 33 03.4
    NGC1313 XMM4 : 03 18 18.46 -66 30 00.2 (lil guy next to x-1)

    To calculate the EPIC X and Y coordinates of the variable sources, we use the final coordinates
    in the variability map, which are not integers. To know to which X and Y correspond to this, we interpolate the
    values of X and Y on the final coordinates. We divide by 80 because the WCS from the image is binned by x80
    compared to X and Y values
    """
    img_bin_size = 80  # This is the binning of the wcs image when it was created using evselect.

    interpX = interp1d(range(len(data_cube.bin_x)), data_cube.bin_x)
    interpY = interp1d(range(len(data_cube.bin_y)), data_cube.bin_y)

    all_res = []
    for i, row in df_regions.iterrows():
        X = interpX(row['weighted_centroid-0'])
        Y = interpY(row['weighted_centroid-1'])

        x_img = X / img_bin_size
        y_img = Y / img_bin_size
        skycoord = wcs.pixel_to_world(x_img, y_img)
        skycoord = correct_sky_position(skycoord, data_cube)

        res = {'x_img'    : x_img,  # x_position in the binned fits image
               'y_img'    : y_img,  # y_position in the binned fits image
               'X'        : X,      # X in the event_list (sky coordinates)
               'Y'        : Y,      # Y in the event-list (sky coordinates)
               'ra'       : skycoord.ra.to_string(unit=u.hourangle, precision=2),
               'dec'      : skycoord.dec.to_string(unit=u.deg, precision=2),
               'ra_deg'   : skycoord.ra.value,
               'dec_deg'  : skycoord.dec.value}
        all_res.append(res)

    df_sky = pd.DataFrame(all_res)
    logger.info(f'df_sky:\n{df_sky}')
    return df_sky

ra_dec_to_xyz(ra_deg, dec_deg)

Convert RA and DEC to points on the unit sphere.

Parameters:

Name Type Description Default
dec_deg Quantity

Declination in degrees.

required
ra_deg Quantity

Right Ascension in degrees.

required

Returns:

Name Type Description
xyz ndarray

The xyz positions of the points as a (N, 3) array.

Source code in exod/processing/coordinates.py
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
def ra_dec_to_xyz(ra_deg, dec_deg):
    """
    Convert RA and DEC to points on the unit sphere.

    Parameters:
        dec_deg (astropy.units.Quantity): Declination in degrees.
        ra_deg (astropy.units.Quantity): Right Ascension in degrees.

    Returns:
        xyz (np.ndarray): The xyz positions of the points as a (N, 3) array.
    """
    ra_rad  = ra_deg.to(u.rad).value
    dec_rad = dec_deg.to(u.rad).value

    x = np.cos(dec_rad) * np.cos(ra_rad)
    y = np.cos(dec_rad) * np.sin(ra_rad)
    z = np.sin(dec_rad)
    xyz = np.vstack((x, y, z)).T
    return xyz

rotate_XY(X, Y, angle, pivotXY=(25719, 25719))

Rotates the positions following the pointing angle of the observation, so that the coordinates are in an EPIC frame.

Parameters:

Name Type Description Default
X array

The X sky coordinate from event list.

required
Y array

The Y sky coordinate from event list.

required
angle float

The pointing angle of the observation (PA_PNT).

required
pivotXY tuple

The pivot point for the rotation.

(25719, 25719)

Returns:

Name Type Description
X_EPIC array

The X sky coordinate in the EPIC frame.

Y_EPIC array

The Y sky coordinate in the EPIC frame.

Source code in exod/processing/coordinates.py
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
def rotate_XY(X, Y, angle, pivotXY=(25719, 25719)):
    """
    Rotates the positions following the pointing angle of the observation, so that the coordinates are in an EPIC frame.

    Parameters:
        X (array): The X sky coordinate from event list.
        Y (array): The Y sky coordinate from event list.
        angle (float): The pointing angle of the observation (PA_PNT).
        pivotXY (tuple): The pivot point for the rotation.

    Returns:
        X_EPIC (array): The X sky coordinate in the EPIC frame.
        Y_EPIC (array): The Y sky coordinate in the EPIC frame.
    """
    X_EPIC = (X - pivotXY[0]) * np.cos(-angle * np.pi / 180) - (Y - pivotXY[1]) * np.sin(-angle * np.pi / 180)
    Y_EPIC = (X - pivotXY[0]) * np.sin(-angle * np.pi / 180) + (Y - pivotXY[1]) * np.cos(-angle * np.pi / 180)
    return X_EPIC, Y_EPIC

data_cube

DataCube

Class to represent a 3D data cube from XMM data.

Attributes:

Name Type Description
event_list EventList

EventList object.

size_arcsec float

size of one side of a cell in the cube in arcseconds.

time_interval float

time interval for each frame in seconds.

extent int

Initial Cube Size.

pixel_size float

size of one side of a cell in the cube in pixels.

n_bins int

Spatial Bins (x,y).

bin_x array

Bins for the x-axis.

bin_y array

Bins for the y-axis.

bin_t array

Bins for the t-axis.

n_t_bins int

Number of time bins.

bti_bin_idx array

Indexs of the bad time intervals (BTIs)

bti_bin_idx_bool array

Boolean Mask of the bad time intervals that can be applied to bin_t

n_bti_bin int

Number of bad time intervals bins.

bti_frac float

Fraction of bad time intervals.

gti_bin_idx array

Indexs of the good time intervals (GTIs)

gti_bin_idx_bool array

Boolean Mask of the good time intervals that can be applied to bin_t

n_gti_bin int

Number of good time intervals bins.

gti_frac float

Fraction of good time intervals.

bccd_bin_idx array

Indexs of frames marked as having irregular (bad) CCD exposures.

bccd_frac float

Fraction of time frames marked as having bad CCD exposures.

data array

3-D numpy array containing the data.

bbox_img tuple

Bounding box of the image (single time slice).

Source code in exod/processing/data_cube.py
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
class DataCube:
    """
    Class to represent a 3D data cube from XMM data.

    Attributes:
        event_list (EventList): EventList object.
        size_arcsec (float): size of one side of a cell in the cube in arcseconds.
        time_interval (float): time interval for each frame in seconds.
        extent (int): Initial Cube Size.
        pixel_size (float): size of one side of a cell in the cube in pixels.
        n_bins (int): Spatial Bins (x,y).
        bin_x (np.array): Bins for the x-axis.
        bin_y (np.array): Bins for the y-axis.
        bin_t (np.array): Bins for the t-axis.
        n_t_bins (int): Number of time bins.
        bti_bin_idx (np.array): Indexs of the bad time intervals (BTIs)
        bti_bin_idx_bool (np.array): Boolean Mask of the bad time intervals that can be applied to bin_t
        n_bti_bin (int): Number of bad time intervals bins.
        bti_frac (float): Fraction of bad time intervals.
        gti_bin_idx (np.array): Indexs of the good time intervals (GTIs)
        gti_bin_idx_bool (np.array): Boolean Mask of the good time intervals that can be applied to bin_t
        n_gti_bin (int): Number of good time intervals bins.
        gti_frac (float): Fraction of good time intervals.
        bccd_bin_idx (np.array): Indexs of frames marked as having irregular (bad) CCD exposures.
        bccd_frac (float): Fraction of time frames marked as having bad CCD exposures.
        data (np.array): 3-D numpy array containing the data.
        bbox_img (tuple): Bounding box of the image (single time slice).
    """
    def __init__(self, event_list, size_arcsec, time_interval):
        self.event_list = event_list
        self.size_arcsec = size_arcsec
        self.time_interval = time_interval
        self.extent = 51840  # Initial Cube Size
        self.pixel_size = size_arcsec / 0.05
        self.n_bins = int(self.extent / self.pixel_size) # Spatial Bins (x,y)
        self.bin_x = np.linspace(0, self.extent, self.n_bins + 1)
        self.bin_y = np.linspace(0, self.extent, self.n_bins + 1)
        self.bin_t = self.calc_time_bins()
        self.n_t_bins = len(self.bin_t) - 1

        self.bti_bin_idx = []      # index of bad time interval bins e.g. [2,4,6]
        self.bti_bin_idx_bool = [] # Mask of the bad time interval bins e.g. [False, False, True, False, True, ...]
        self.n_bti_bin = None
        self.bti_frac = None

        self.gti_bin_idx = []
        self.gti_bin_idx_bool = []
        self.n_gti_bin = None
        self.gti_frac = None

        # Used for keeping track of frames (time bins) with uneven ccd exposures.
        self.bccd_bin_idx = []
        self.bccd_bin_idx_bool = []
        self.n_bccd_bin = None
        self.bccd_frac = None

        self.data = self.bin_event_list()
        self.bbox_img = self.get_cube_bbox()
        self.crop_data_cube()
        self.relative_frame_exposures = np.ones(self.data.shape[2])
        self.shape = self.data.shape
        self.memory_mb = self.data.nbytes / (1024 ** 2)


    def __repr__(self):
        return (f"DataCube(shape={self.shape}, "
                f"total_values={np.prod(self.shape)}, "
                f"memory={self.memory_mb:.2f} MB)")

    def copy(self):
        """Return a copy of the datacube."""
        return copy.deepcopy(self)

    def calc_time_bins(self):
        t_lo = self.event_list.time_min
        t_hi = self.event_list.time_max
        t_i = self.time_interval
        n_time_bins = int((t_hi - t_lo) / t_i)
        time_stop = t_lo + n_time_bins * t_i
        time_bins = np.arange(t_lo, time_stop + 1, t_i)
        return time_bins

    def bin_event_list(self):
        data = self.event_list.data
        sample = data['X'], data['Y'], data['TIME']
        bins = [self.bin_x, self.bin_y, self.bin_t]
        cube, bin_edges, bin_number = binned_statistic_dd(sample=sample, values=None, statistic='count', bins=bins)
        cube = cube.astype(np.int32)
        return cube

    def crop_data_cube(self):
        """Crop the surrounding areas of the data_cube that are empty."""
        bbox_img = self.bbox_img

        logger.info(f'Cropping data cube between bbox_img: {bbox_img}')
        self.data = self.data[bbox_img[0]:bbox_img[1], bbox_img[2]:bbox_img[3]]

        # Calculate the new bins
        self.bin_x = self.bin_x[bbox_img[0]:bbox_img[1]]
        self.bin_y = self.bin_y[bbox_img[2]:bbox_img[3]]

    def get_cube_bbox(self):
        """Get the Bounding Box corresponding to the cube's image plane."""
        idx_nonempty = np.where(np.sum(self.data, axis=2) > 0)
        bbox_img = (np.min(idx_nonempty[0]), np.max(idx_nonempty[0]) + 1,
                    np.min(idx_nonempty[1]), np.max(idx_nonempty[1]) + 1)
        return bbox_img

    def calc_gti_bti_bins(self, bti):
        """
        Calculate the good and bad time interval indexes & masks.

        Parameters:
            bti (dict): {['START':344.2, 'STOP':454.2], ...}
        """
        self.bti_bin_idx      = get_bti_bin_idx(bti=bti, bin_t=self.bin_t)
        self.bti_bin_idx_bool = get_bti_bin_idx_bool(bti_bin_idx=self.bti_bin_idx, bin_t=self.bin_t)
        self.gti_bin_idx_bool = ~self.bti_bin_idx_bool
        self.gti_bin_idx      = np.where(self.gti_bin_idx_bool)[0][:-1]
        self.n_gti_bin = len(self.gti_bin_idx)
        self.n_bti_bin = len(self.bti_bin_idx)
        self.bti_frac = self.n_bti_bin / self.n_t_bins
        self.gti_frac = self.n_gti_bin / self.n_t_bins
        logger.info(f'n_gti = {self.n_gti_bin:<4} / {self.n_t_bins} ({self.gti_frac:.2f})')
        logger.info(f'n_bti = {self.n_bti_bin:<4} / {self.n_t_bins} ({self.bti_frac:.2f})')

    def mask_bti(self):
        logger.info('Masking bad frames from Data Cube (setting to nan)')
        img_shape = (self.shape[0], self.shape[1], 1)
        img_nan = np.full(shape=img_shape, fill_value=np.nan, dtype=np.float32)
        self.data[:, :, self.bti_bin_idx] = img_nan

    def remove_bti_frames(self):
        """Return the cube without the masked nan frames."""
        data_non_nan = self.data[:, :, ~self.bti_bin_idx_bool[:-1]]
        return data_non_nan

    def mask_frames_with_partial_ccd_exposure(self, mask_frames=True, plot=False):
        """
        Remove the frames with irregular exposures between CCDs.

        https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/pnchipgeom.html
        https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/moschipgeom.html

        Test Cases: 0165560101, 0765080801, 0116700301, 0765080801, 0872390901, 0116700301
        0201900201, 0724840301, 0743700201

        Args:
            mask_frames (bool): If True mask the frames.
            plot (bool): If True plot diagnostics.
        """
        if not mask_frames:
            self.bccd_bin_idx_bool = np.full(self.n_t_bins, fill_value=False)
            return

        all_masks = {}
        for e in self.event_list.event_lists:
            bad_ccd_mask = self.get_bad_ccd_mask(event_list=e, plot=plot)
            all_masks[e.instrument] = bad_ccd_mask

        if all_masks:
            # Combine the frame masks from all the individual event lists.
            self.bccd_bin_idx_bool = np.any(list(all_masks.values()), axis=0)
            self.bccd_bin_idx      = np.where(self.bccd_bin_idx_bool)[0]

            # Mask from the data cube and update the relative frame exposures.
            self.data = np.where(self.bccd_bin_idx_bool, np.empty(self.data.shape) * np.nan, self.data)
            self.relative_frame_exposures = np.where(self.bccd_bin_idx_bool, 0, self.relative_frame_exposures)

            # Calculate the number and fraction of bins
            self.n_bccd_bin = len(self.bccd_bin_idx)
            self.bccd_frac = self.n_bccd_bin / self.n_t_bins
            logger.info(f'n_bccd = {self.n_bccd_bin:<4} / {self.n_t_bins} ({self.bccd_frac:.2f})')

    def get_bad_ccd_mask(self, event_list, plot=False):
        """
        Get the mask for the frames corresponding with irregular CCD exposures.

        Parameters:
            event_list (EventList): EventList object.
            plot (bool): if True, then plot.

        Returns:
            bccd_bin_idx_bool (np.array): Array of bools indicating the frames with irregular CCD exposures.
        """
        ccd_bins = event_list.get_ccd_bins()

        # Get the lightcurves for each CCD.
        lcs_ccd, bin_edges, bin_number = binned_statistic_dd(sample=(event_list.data['CCDNR'], event_list.data['TIME']),
                                                             values=None,
                                                             statistic='count',
                                                             bins=[ccd_bins, self.bin_t])

        if event_list.instrument == 'EPN':
            quadrant_split         = np.split(lcs_ccd, indices_or_sections=(3, 6, 9))
            lcs_ccd                = np.sum(quadrant_split, axis=1)
            lcs_median_quadrant    = np.median(quadrant_split, axis=1)
            lc_median_quadrant_max = np.max(lcs_median_quadrant, axis=0)
            lc_median_quadrant_min = np.min(lcs_median_quadrant, axis=0)
            ccd_bins = [1, 4, 7, 10, 13]

        lcs_ccd_max = np.max(lcs_ccd, axis=0)
        # lcs_ccd_min = np.min(lcs_ccd, axis=0)
        count_active_ccd = np.sum(lcs_ccd > 0, axis=0)  # Nbr of CCDs active in each frame

        m0 = count_active_ccd == 0  # Frame is entirely empty

        if event_list.instrument == 'EPN':
            lc_min_counts = 10
            lc_max_norm_diff = 0.5
            logger.info(f'Removing PN frames lc_min_counts > {lc_min_counts} and lc_max_norm_diff > {lc_max_norm_diff}')
            lc_median_norm_diff = (lc_median_quadrant_max - lc_median_quadrant_min) / (lc_median_quadrant_max + lc_median_quadrant_min)
            m1 = lcs_ccd_max > lc_min_counts             # Frame is more than 10 counts in the brightest CCD
            m2 = self.bti_bin_idx_bool[:-1]              # Frame is a bad time index
            m3 = count_active_ccd < len(ccd_bins) - 1    # Frame is not running all CCDs
            m4 = lc_median_norm_diff > lc_max_norm_diff  # When this is >0.5 it is equivalent to max/min > 3
            m5 = m1 & (m3 | (m2 & m4))
            bccd_bin_idx_bool = m0 | m5
            # Remove the frames that are 1 left of a True frame that are False
            bccd_bin_idx_bool = bccd_bin_idx_bool | np.roll(bccd_bin_idx_bool, shift=1)

            masks = [m0, m2, m1, m3, m4, m5, bccd_bin_idx_bool]
            labels = ['empty', 'is_bti', 'max > 10', 'active_ccds <  #_ccds', 'relative_diff > 0.5', 'combined', 'to remove']
            plot_frame_masks(instrum=event_list.instrument, masks=masks, labels=labels, plot=plot)
        else:
            bccd_bin_idx_bool = m0

        bccd_bin_idx = np.where(bccd_bin_idx_bool)[0]
        logger.info(f'Removing {np.sum(bccd_bin_idx_bool)} / {len(bccd_bin_idx_bool)} incomplete frames from {event_list.instrument}')

        # Plot the Lightcurves for PN
        if plot and event_list.instrument == 'EPN':
            fig, ax = plt.subplots(nrows=4, ncols=1, figsize=(10, 10), sharex=True)
            for i, lc in enumerate(lcs_median_quadrant):
                ax[0].plot(lcs_ccd[i], label=f'lc ccd={i+1}')
                ax[1].plot(lcs_median_quadrant[i], label=f'median quadrant={i+1}')

            ax[-2].plot(lc_median_quadrant_min, label='ccd min')
            ax[-2].plot(lc_median_quadrant_max, label='ccd max')
            ax[-1].plot(lc_median_norm_diff, label='max-min/max+min')
            ax[-1].axhline(0.5, color='red', lw=1.0)
            for a in ax:
                a.legend(loc='right')
                # Plot the Masked frames.
                for j in range(len(bccd_bin_idx)):
                    a.axvspan(bccd_bin_idx[j], bccd_bin_idx[j]+1, color='red', alpha=0.2)
            plt.tight_layout()
            plt.show()

        return bccd_bin_idx_bool

    def multiply_time_interval(self, n_factor):
        """Used to increase the time_interval by a factor of n_factor, in order to quickly scan different timescales."""
        self.time_interval = n_factor*self.time_interval
        self.bin_t = self.calc_time_bins()

        # allows to cut X in chunks of size N (plus the remaining bit)
        # np.split(X, np.arange(N, len(X), N))

        # Splits in groups of n_factor along the time axis
        datacube_twoframegroups = np.split(self.data, np.arange(n_factor, self.shape[2], n_factor), axis=2)

        # Nansum each group along the time axis, and makes it into a cube again
        self.data = np.transpose([np.nansum(frame_grp, axis=2) for frame_grp in datacube_twoframegroups], (1,2,0))

        # Update the relative exposures of each frame
        # Splits in groups of n_factor along the time axis
        frame_exposures_twoframegroups = np.split(self.relative_frame_exposures, np.arange(n_factor, self.shape[2], n_factor))

        # Update relative frame exposures and shape.
        self.relative_frame_exposures = np.array([np.sum(exp_frame_grp) for exp_frame_grp in frame_exposures_twoframegroups])
        self.shape = self.data.shape

    def video(self, savepath=None):
        """Play each frame of the datacube sequentially in a matplotlib figure."""
        if np.isnan(self.data[:, :, 0]).all():
            logger.info('first frame all nan =no plot')
            return None

        fig, ax = plt.subplots()
        img = ax.imshow(self.data[:, :, 0].T, cmap=cmap_image(), animated=True, interpolation='none',
                        origin='lower')
        colorbar = fig.colorbar(img, ax=ax)

        def update(frame):
            ax.set_title(f'{self}\n{frame}/{num_frames}')
            img.set_array(self.data[:, :, frame].T)
            return img,

        num_frames = self.shape[2]
        ani = FuncAnimation(fig, update, frames=num_frames, interval=20)
        if savepath:
            logger.info(f'Saving {self} to {savepath}')
            ani.save(savepath)
        plt.show()


    @property
    def info(self):
        info = {'event_list'   : self.event_list.filename,
                'size_arcsec'  : self.size_arcsec,
                'time_interval': self.time_interval,
                'n_t_bins'     : self.n_t_bins,
                'n_bti_bin'    : self.n_bti_bin,
                'n_gti_bin'    : self.n_gti_bin,
                'gti_frac'     : self.gti_frac,
                'bti_frac'     : self.bti_frac,
                'n_bccd_bin'   : self.n_bccd_bin,
                'bccd_frac'    : self.bccd_frac,
                'bbox_img'     : self.bbox_img,
                'shape'        : self.shape,
                'total_values' : np.prod(self.shape),
                'memory_mb'    : self.memory_mb}
        for k, v in info.items():
            logger.info(f'{k:>13} : {v}')
        return info
calc_gti_bti_bins(bti)

Calculate the good and bad time interval indexes & masks.

Parameters:

Name Type Description Default
bti dict

{['START':344.2, 'STOP':454.2], ...}

required
Source code in exod/processing/data_cube.py
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
def calc_gti_bti_bins(self, bti):
    """
    Calculate the good and bad time interval indexes & masks.

    Parameters:
        bti (dict): {['START':344.2, 'STOP':454.2], ...}
    """
    self.bti_bin_idx      = get_bti_bin_idx(bti=bti, bin_t=self.bin_t)
    self.bti_bin_idx_bool = get_bti_bin_idx_bool(bti_bin_idx=self.bti_bin_idx, bin_t=self.bin_t)
    self.gti_bin_idx_bool = ~self.bti_bin_idx_bool
    self.gti_bin_idx      = np.where(self.gti_bin_idx_bool)[0][:-1]
    self.n_gti_bin = len(self.gti_bin_idx)
    self.n_bti_bin = len(self.bti_bin_idx)
    self.bti_frac = self.n_bti_bin / self.n_t_bins
    self.gti_frac = self.n_gti_bin / self.n_t_bins
    logger.info(f'n_gti = {self.n_gti_bin:<4} / {self.n_t_bins} ({self.gti_frac:.2f})')
    logger.info(f'n_bti = {self.n_bti_bin:<4} / {self.n_t_bins} ({self.bti_frac:.2f})')
copy()

Return a copy of the datacube.

Source code in exod/processing/data_cube.py
80
81
82
def copy(self):
    """Return a copy of the datacube."""
    return copy.deepcopy(self)
crop_data_cube()

Crop the surrounding areas of the data_cube that are empty.

Source code in exod/processing/data_cube.py
101
102
103
104
105
106
107
108
109
110
def crop_data_cube(self):
    """Crop the surrounding areas of the data_cube that are empty."""
    bbox_img = self.bbox_img

    logger.info(f'Cropping data cube between bbox_img: {bbox_img}')
    self.data = self.data[bbox_img[0]:bbox_img[1], bbox_img[2]:bbox_img[3]]

    # Calculate the new bins
    self.bin_x = self.bin_x[bbox_img[0]:bbox_img[1]]
    self.bin_y = self.bin_y[bbox_img[2]:bbox_img[3]]
get_bad_ccd_mask(event_list, plot=False)

Get the mask for the frames corresponding with irregular CCD exposures.

Parameters:

Name Type Description Default
event_list EventList

EventList object.

required
plot bool

if True, then plot.

False

Returns:

Name Type Description
bccd_bin_idx_bool array

Array of bools indicating the frames with irregular CCD exposures.

Source code in exod/processing/data_cube.py
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
def get_bad_ccd_mask(self, event_list, plot=False):
    """
    Get the mask for the frames corresponding with irregular CCD exposures.

    Parameters:
        event_list (EventList): EventList object.
        plot (bool): if True, then plot.

    Returns:
        bccd_bin_idx_bool (np.array): Array of bools indicating the frames with irregular CCD exposures.
    """
    ccd_bins = event_list.get_ccd_bins()

    # Get the lightcurves for each CCD.
    lcs_ccd, bin_edges, bin_number = binned_statistic_dd(sample=(event_list.data['CCDNR'], event_list.data['TIME']),
                                                         values=None,
                                                         statistic='count',
                                                         bins=[ccd_bins, self.bin_t])

    if event_list.instrument == 'EPN':
        quadrant_split         = np.split(lcs_ccd, indices_or_sections=(3, 6, 9))
        lcs_ccd                = np.sum(quadrant_split, axis=1)
        lcs_median_quadrant    = np.median(quadrant_split, axis=1)
        lc_median_quadrant_max = np.max(lcs_median_quadrant, axis=0)
        lc_median_quadrant_min = np.min(lcs_median_quadrant, axis=0)
        ccd_bins = [1, 4, 7, 10, 13]

    lcs_ccd_max = np.max(lcs_ccd, axis=0)
    # lcs_ccd_min = np.min(lcs_ccd, axis=0)
    count_active_ccd = np.sum(lcs_ccd > 0, axis=0)  # Nbr of CCDs active in each frame

    m0 = count_active_ccd == 0  # Frame is entirely empty

    if event_list.instrument == 'EPN':
        lc_min_counts = 10
        lc_max_norm_diff = 0.5
        logger.info(f'Removing PN frames lc_min_counts > {lc_min_counts} and lc_max_norm_diff > {lc_max_norm_diff}')
        lc_median_norm_diff = (lc_median_quadrant_max - lc_median_quadrant_min) / (lc_median_quadrant_max + lc_median_quadrant_min)
        m1 = lcs_ccd_max > lc_min_counts             # Frame is more than 10 counts in the brightest CCD
        m2 = self.bti_bin_idx_bool[:-1]              # Frame is a bad time index
        m3 = count_active_ccd < len(ccd_bins) - 1    # Frame is not running all CCDs
        m4 = lc_median_norm_diff > lc_max_norm_diff  # When this is >0.5 it is equivalent to max/min > 3
        m5 = m1 & (m3 | (m2 & m4))
        bccd_bin_idx_bool = m0 | m5
        # Remove the frames that are 1 left of a True frame that are False
        bccd_bin_idx_bool = bccd_bin_idx_bool | np.roll(bccd_bin_idx_bool, shift=1)

        masks = [m0, m2, m1, m3, m4, m5, bccd_bin_idx_bool]
        labels = ['empty', 'is_bti', 'max > 10', 'active_ccds <  #_ccds', 'relative_diff > 0.5', 'combined', 'to remove']
        plot_frame_masks(instrum=event_list.instrument, masks=masks, labels=labels, plot=plot)
    else:
        bccd_bin_idx_bool = m0

    bccd_bin_idx = np.where(bccd_bin_idx_bool)[0]
    logger.info(f'Removing {np.sum(bccd_bin_idx_bool)} / {len(bccd_bin_idx_bool)} incomplete frames from {event_list.instrument}')

    # Plot the Lightcurves for PN
    if plot and event_list.instrument == 'EPN':
        fig, ax = plt.subplots(nrows=4, ncols=1, figsize=(10, 10), sharex=True)
        for i, lc in enumerate(lcs_median_quadrant):
            ax[0].plot(lcs_ccd[i], label=f'lc ccd={i+1}')
            ax[1].plot(lcs_median_quadrant[i], label=f'median quadrant={i+1}')

        ax[-2].plot(lc_median_quadrant_min, label='ccd min')
        ax[-2].plot(lc_median_quadrant_max, label='ccd max')
        ax[-1].plot(lc_median_norm_diff, label='max-min/max+min')
        ax[-1].axhline(0.5, color='red', lw=1.0)
        for a in ax:
            a.legend(loc='right')
            # Plot the Masked frames.
            for j in range(len(bccd_bin_idx)):
                a.axvspan(bccd_bin_idx[j], bccd_bin_idx[j]+1, color='red', alpha=0.2)
        plt.tight_layout()
        plt.show()

    return bccd_bin_idx_bool
get_cube_bbox()

Get the Bounding Box corresponding to the cube's image plane.

Source code in exod/processing/data_cube.py
112
113
114
115
116
117
def get_cube_bbox(self):
    """Get the Bounding Box corresponding to the cube's image plane."""
    idx_nonempty = np.where(np.sum(self.data, axis=2) > 0)
    bbox_img = (np.min(idx_nonempty[0]), np.max(idx_nonempty[0]) + 1,
                np.min(idx_nonempty[1]), np.max(idx_nonempty[1]) + 1)
    return bbox_img
mask_frames_with_partial_ccd_exposure(mask_frames=True, plot=False)

Remove the frames with irregular exposures between CCDs.

https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/pnchipgeom.html https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/moschipgeom.html

Test Cases: 0165560101, 0765080801, 0116700301, 0765080801, 0872390901, 0116700301 0201900201, 0724840301, 0743700201

Parameters:

Name Type Description Default
mask_frames bool

If True mask the frames.

True
plot bool

If True plot diagnostics.

False
Source code in exod/processing/data_cube.py
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
def mask_frames_with_partial_ccd_exposure(self, mask_frames=True, plot=False):
    """
    Remove the frames with irregular exposures between CCDs.

    https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/pnchipgeom.html
    https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/uhb/moschipgeom.html

    Test Cases: 0165560101, 0765080801, 0116700301, 0765080801, 0872390901, 0116700301
    0201900201, 0724840301, 0743700201

    Args:
        mask_frames (bool): If True mask the frames.
        plot (bool): If True plot diagnostics.
    """
    if not mask_frames:
        self.bccd_bin_idx_bool = np.full(self.n_t_bins, fill_value=False)
        return

    all_masks = {}
    for e in self.event_list.event_lists:
        bad_ccd_mask = self.get_bad_ccd_mask(event_list=e, plot=plot)
        all_masks[e.instrument] = bad_ccd_mask

    if all_masks:
        # Combine the frame masks from all the individual event lists.
        self.bccd_bin_idx_bool = np.any(list(all_masks.values()), axis=0)
        self.bccd_bin_idx      = np.where(self.bccd_bin_idx_bool)[0]

        # Mask from the data cube and update the relative frame exposures.
        self.data = np.where(self.bccd_bin_idx_bool, np.empty(self.data.shape) * np.nan, self.data)
        self.relative_frame_exposures = np.where(self.bccd_bin_idx_bool, 0, self.relative_frame_exposures)

        # Calculate the number and fraction of bins
        self.n_bccd_bin = len(self.bccd_bin_idx)
        self.bccd_frac = self.n_bccd_bin / self.n_t_bins
        logger.info(f'n_bccd = {self.n_bccd_bin:<4} / {self.n_t_bins} ({self.bccd_frac:.2f})')
multiply_time_interval(n_factor)

Used to increase the time_interval by a factor of n_factor, in order to quickly scan different timescales.

Source code in exod/processing/data_cube.py
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
def multiply_time_interval(self, n_factor):
    """Used to increase the time_interval by a factor of n_factor, in order to quickly scan different timescales."""
    self.time_interval = n_factor*self.time_interval
    self.bin_t = self.calc_time_bins()

    # allows to cut X in chunks of size N (plus the remaining bit)
    # np.split(X, np.arange(N, len(X), N))

    # Splits in groups of n_factor along the time axis
    datacube_twoframegroups = np.split(self.data, np.arange(n_factor, self.shape[2], n_factor), axis=2)

    # Nansum each group along the time axis, and makes it into a cube again
    self.data = np.transpose([np.nansum(frame_grp, axis=2) for frame_grp in datacube_twoframegroups], (1,2,0))

    # Update the relative exposures of each frame
    # Splits in groups of n_factor along the time axis
    frame_exposures_twoframegroups = np.split(self.relative_frame_exposures, np.arange(n_factor, self.shape[2], n_factor))

    # Update relative frame exposures and shape.
    self.relative_frame_exposures = np.array([np.sum(exp_frame_grp) for exp_frame_grp in frame_exposures_twoframegroups])
    self.shape = self.data.shape
remove_bti_frames()

Return the cube without the masked nan frames.

Source code in exod/processing/data_cube.py
143
144
145
146
def remove_bti_frames(self):
    """Return the cube without the masked nan frames."""
    data_non_nan = self.data[:, :, ~self.bti_bin_idx_bool[:-1]]
    return data_non_nan
video(savepath=None)

Play each frame of the datacube sequentially in a matplotlib figure.

Source code in exod/processing/data_cube.py
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
def video(self, savepath=None):
    """Play each frame of the datacube sequentially in a matplotlib figure."""
    if np.isnan(self.data[:, :, 0]).all():
        logger.info('first frame all nan =no plot')
        return None

    fig, ax = plt.subplots()
    img = ax.imshow(self.data[:, :, 0].T, cmap=cmap_image(), animated=True, interpolation='none',
                    origin='lower')
    colorbar = fig.colorbar(img, ax=ax)

    def update(frame):
        ax.set_title(f'{self}\n{frame}/{num_frames}')
        img.set_array(self.data[:, :, frame].T)
        return img,

    num_frames = self.shape[2]
    ani = FuncAnimation(fig, update, frames=num_frames, interval=20)
    if savepath:
        logger.info(f'Saving {self} to {savepath}')
        ani.save(savepath)
    plt.show()

extract_lc(data_cube, xhi, xlo, yhi, ylo, dtype=np.int32)

Extract a lightcurve from a data cube by summing through a bounding box.

Source code in exod/processing/data_cube.py
329
330
331
332
333
def extract_lc(data_cube, xhi, xlo, yhi, ylo, dtype=np.int32):
    """Extract a lightcurve from a data cube by summing through a bounding box."""
    data = data_cube[xlo:xhi, ylo:yhi]
    lc = np.nansum(data, axis=(0, 1), dtype=dtype)
    return lc

detector

Detector

Obsolete detector class that re-implements the method used in Inés Pastor-Marazuela (2020) See: https://arxiv.org/abs/2005.08673

Source code in exod/processing/detector.py
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
class Detector:
    """
    Obsolete detector class that re-implements the method used in Inés Pastor-Marazuela (2020)
    See: https://arxiv.org/abs/2005.08673
    """
    def __init__(self, data_cube, wcs, sigma=4):
        self.data_cube = data_cube
        self.wcs = wcs
        self.sigma = sigma

        self.max_area_bbox = 6**2
        self.n_sources_max = 15  # Number of sources above which to optimise sigma
        self.n_target_regions = 10 # Number of Sources to aim for when optimising.

        self.df_regions = None
        self.df_sky = None

    def __repr__(self):
        return f'Detector({self.data_cube})'

    def run(self):
        self.image_var  = self.calc_image_var()
        self.image_var_mean = np.mean(self.image_var)
        self.image_var_std  = np.std(self.image_var)
        # self.image_var = self.conv_var_img(box_size=3)
        self.df_regions = self.extract_var_regions(sigma=self.sigma)

        if self.n_sources > self.n_sources_max:
            logger.info('More than 15 sources found! Optimising sigma.')
            self.sigma = self.optimise_sigma(self.image_var, n_target_regions=self.n_target_regions)
            self.threshold = self.calc_extraction_threshold(self.sigma)
            self.df_regions = self.extract_var_regions(sigma=self.sigma)

        self.df_sky = get_regions_sky_position(self.data_cube, self.df_regions, self.wcs)
        self.df_regions = pd.concat([self.df_regions, self.df_sky], axis=1)

        self.df_regions = self.filter_df_regions()
        self.df_lcs     = self.get_region_lightcurves()

    def calc_image_var(self):
        """
        Calculate the variability image from a data cube.
        """
        logger.info('Computing variability...')
        image_max = np.nanmax(self.data_cube.data, axis=2)
        image_std = np.nanstd(self.data_cube.data, axis=2)
        image_mean = np.nanmean(self.data_cube.data, axis=2)
        image_var = (image_max - image_mean) * image_std
        return image_var

    def conv_var_img(self, box_size=3):
        """
        Convolve the variability image with a box kernel.
        """
        logger.info('Convolving variability...')
        kernel = np.ones((box_size, box_size)) / box_size ** 2
        image_var_conv = convolve(self.image_var, kernel)

        # New version
        # convolved = gaussian_filter(image_var, 1)
        # convolved = np.where(image_var>0, convolved, 0)
        return image_var_conv

    def calc_extraction_threshold(self, sigma):
        """Calculate the extraction threshold for a given sigma value."""
        self.image_var_threshold = self.image_var_mean + sigma * self.image_var_std
        return self.image_var_threshold

    def extract_var_regions(self, sigma=5):
        """
        Use skimage to obtain the contiguous regions in the variability image.

        We currently used the weighted centroid which calculates a centroid based on
        both the size of the bounding box and the values of the pixels themselves.

        https://scikit-image.org/docs/stable/auto_examples/segmentation/index.html

        Returns
        -------
        df_regions : DataFrame of Detected Regions
        """
        logger.info('Extracting variable Regions')
        image_var_mask = self.image_var > self.calc_extraction_threshold(sigma)

        logger.info(f'threshold: {self.calc_extraction_threshold(sigma)} sigma: {sigma}')
        # self.plot_threshold_level(self.calc_extraction_threshold(sigma))

        # Obtain the region properties for the detected regions.
        df_regions = calc_df_regions(image=self.image_var, image_mask=image_var_mask)


        # Sort by Most Variable and reset in the label column
        df_regions = df_regions.sort_values(by='intensity_mean', ascending=False).reset_index(drop=True)
        df_regions['label'] = df_regions.index
        return df_regions

    def objective_function(self, sigma, image, n_target_regions):
        """Function used for optimising sigma detection threshold."""
        threshold = self.calc_extraction_threshold(sigma)
        image_mask = image > threshold
        labeled_image = label(image_mask)
        regions = regionprops(labeled_image)
        n_regions = len(regions)
        # logger.info(f'sigma={sigma:.4f} threshold={threshold:.4f} n_regions={n_regions} target={n_target_regions}')
        return np.abs(n_regions - n_target_regions)

    def optimise_sigma(self, image, n_target_regions, sigma_bounds=(0.1, 20.0)):
        """Find the optimal value of sigma to get n_target regions."""
        result = minimize_scalar(self.objective_function, args=(image, n_target_regions), bounds=sigma_bounds, method='bounded')
        logger.info(f'Optimisation results:\n{result}')
        return result.x

    def plot_threshold_level(self, threshold):
        plt.figure(figsize=(10, 3))
        plt.title('Thresholding')
        plt.plot(self.image_var.flatten(), label='Variability')
        plt.axhline(threshold, color='red', label=fr'threshold={threshold:.2f} ({self.sigma:.2f}$\sigma$)')
        plt.legend()
        plt.ylabel('V score')
        # plt.show()

    def filter_df_regions(self):
        logger.info('Removing regions with area_bbox > 12')
        df_regions = self.df_regions[self.df_regions['area_bbox'] <= self.max_area_bbox]
        return df_regions

    def get_region_lightcurves(self):
        """
        Extract the lightcurves from the variable regions found.
        Returns
        -------
        df_lc : DataFrame of Lightcurves
        """
        df_lcs = get_region_lcs(data_cube=self.data_cube, df_regions=self.df_regions)
        return df_lcs

    def plot_region_lightcurves(self, savedir=None, max_sources=15):
        if self.n_sources > max_sources:
            logger.info(f'{self.n_sources} > {max_sources} Not plotting lightcurves')
            return None

        logger.info(f'Plotting regions lightcurves')
        savepath = None
        for i, row in self.df_regions.iterrows():
            if savedir:
                savepath = savedir / f'lc_reg_{i}.png'
            plot_region_lightcurve(self.df_lcs, i, savepath=savepath)

    @property
    def n_sources(self):
        return len(self.df_regions)

    @property
    def info(self):
        info = {'data_cube'           : repr(self.data_cube),
                'sigma'               : self.sigma,
                'max_area_bbox'       : self.max_area_bbox,
                'n_sources_max'       : self.n_sources_max,
                'n_target_regions'    : self.n_target_regions,
                'image_var_mean'      : self.image_var_mean,
                'image_var_std'       : self.image_var_std,
                'image_var_threshold' : self.image_var_threshold,
                'n_sources'           : self.n_sources}

        for k, v in info.items():
            logger.info(f'{k:>20} : {v}')
        return info
calc_extraction_threshold(sigma)

Calculate the extraction threshold for a given sigma value.

Source code in exod/processing/detector.py
85
86
87
88
def calc_extraction_threshold(self, sigma):
    """Calculate the extraction threshold for a given sigma value."""
    self.image_var_threshold = self.image_var_mean + sigma * self.image_var_std
    return self.image_var_threshold
calc_image_var()

Calculate the variability image from a data cube.

Source code in exod/processing/detector.py
61
62
63
64
65
66
67
68
69
70
def calc_image_var(self):
    """
    Calculate the variability image from a data cube.
    """
    logger.info('Computing variability...')
    image_max = np.nanmax(self.data_cube.data, axis=2)
    image_std = np.nanstd(self.data_cube.data, axis=2)
    image_mean = np.nanmean(self.data_cube.data, axis=2)
    image_var = (image_max - image_mean) * image_std
    return image_var
conv_var_img(box_size=3)

Convolve the variability image with a box kernel.

Source code in exod/processing/detector.py
72
73
74
75
76
77
78
79
80
81
82
83
def conv_var_img(self, box_size=3):
    """
    Convolve the variability image with a box kernel.
    """
    logger.info('Convolving variability...')
    kernel = np.ones((box_size, box_size)) / box_size ** 2
    image_var_conv = convolve(self.image_var, kernel)

    # New version
    # convolved = gaussian_filter(image_var, 1)
    # convolved = np.where(image_var>0, convolved, 0)
    return image_var_conv
extract_var_regions(sigma=5)

Use skimage to obtain the contiguous regions in the variability image.

We currently used the weighted centroid which calculates a centroid based on both the size of the bounding box and the values of the pixels themselves.

https://scikit-image.org/docs/stable/auto_examples/segmentation/index.html

Returns

df_regions : DataFrame of Detected Regions

Source code in exod/processing/detector.py
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
def extract_var_regions(self, sigma=5):
    """
    Use skimage to obtain the contiguous regions in the variability image.

    We currently used the weighted centroid which calculates a centroid based on
    both the size of the bounding box and the values of the pixels themselves.

    https://scikit-image.org/docs/stable/auto_examples/segmentation/index.html

    Returns
    -------
    df_regions : DataFrame of Detected Regions
    """
    logger.info('Extracting variable Regions')
    image_var_mask = self.image_var > self.calc_extraction_threshold(sigma)

    logger.info(f'threshold: {self.calc_extraction_threshold(sigma)} sigma: {sigma}')
    # self.plot_threshold_level(self.calc_extraction_threshold(sigma))

    # Obtain the region properties for the detected regions.
    df_regions = calc_df_regions(image=self.image_var, image_mask=image_var_mask)


    # Sort by Most Variable and reset in the label column
    df_regions = df_regions.sort_values(by='intensity_mean', ascending=False).reset_index(drop=True)
    df_regions['label'] = df_regions.index
    return df_regions
get_region_lightcurves()

Extract the lightcurves from the variable regions found. Returns


df_lc : DataFrame of Lightcurves

Source code in exod/processing/detector.py
148
149
150
151
152
153
154
155
156
def get_region_lightcurves(self):
    """
    Extract the lightcurves from the variable regions found.
    Returns
    -------
    df_lc : DataFrame of Lightcurves
    """
    df_lcs = get_region_lcs(data_cube=self.data_cube, df_regions=self.df_regions)
    return df_lcs
objective_function(sigma, image, n_target_regions)

Function used for optimising sigma detection threshold.

Source code in exod/processing/detector.py
118
119
120
121
122
123
124
125
126
def objective_function(self, sigma, image, n_target_regions):
    """Function used for optimising sigma detection threshold."""
    threshold = self.calc_extraction_threshold(sigma)
    image_mask = image > threshold
    labeled_image = label(image_mask)
    regions = regionprops(labeled_image)
    n_regions = len(regions)
    # logger.info(f'sigma={sigma:.4f} threshold={threshold:.4f} n_regions={n_regions} target={n_target_regions}')
    return np.abs(n_regions - n_target_regions)
optimise_sigma(image, n_target_regions, sigma_bounds=(0.1, 20.0))

Find the optimal value of sigma to get n_target regions.

Source code in exod/processing/detector.py
128
129
130
131
132
def optimise_sigma(self, image, n_target_regions, sigma_bounds=(0.1, 20.0)):
    """Find the optimal value of sigma to get n_target regions."""
    result = minimize_scalar(self.objective_function, args=(image, n_target_regions), bounds=sigma_bounds, method='bounded')
    logger.info(f'Optimisation results:\n{result}')
    return result.x

calc_ks_poission(lc)

Calculate the KS Probability assuming the lightcurve was created from a Poisson distribution with the mean of the lightcurve.

Source code in exod/processing/detector.py
281
282
283
284
285
286
287
288
289
290
291
292
293
def calc_ks_poission(lc):
    """
    Calculate the KS Probability assuming the lightcurve was
    created from a Poisson distribution with the mean of the lightcurve.
    """
    #logger.info("Calculating KS Probability, assuming a Poission Distribution")
    lc_mean = mean_of_poisson = np.nanmean(lc)
    N_data = len(lc)
    result = kstest(lc, [lc_mean] * N_data)
    expected_distribution = poisson(mean_of_poisson)
    ks_res = kstest(lc, expected_distribution.cdf)
    logger.debug(f'KS_prob: lc_mean = {lc_mean}, N_data = {N_data}\nks_res = {ks_res}')
    return ks_res

plot_image_with_regions(image, df_regions, cbar_label=None, savepath=None)

Plot the variability image with the bounding boxes of the detected regions.

Parameters

image : np.ndarray : Variability Image (or Likelihood Image) df_regions : pd.DataFrame : from extract_variability_regions cbar_label : string : Label for colorbar savepath : str : Path to save the figure to

Source code in exod/processing/detector.py
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
def plot_image_with_regions(image, df_regions, cbar_label=None, savepath=None):
    """
    Plot the variability image with the bounding boxes of the detected regions.

    Parameters
    ----------
    image       : np.ndarray : Variability Image  (or Likelihood Image)
    df_regions  : pd.DataFrame : from extract_variability_regions
    cbar_label  : string : Label for colorbar
    savepath    : str : Path to save the figure to
    """
    logger.info('Plotting Image with source regions')

    fig, ax = plt.subplots(figsize=(8,8))
    ax.set_title(f'Detected Regions : {len(df_regions)}')

    cmap = cmap_image()

    norm = ImageNormalize(stretch=SqrtStretch()) #LogNorm()

    m1 = ax.imshow(image.T, norm=norm, interpolation='none', origin='lower', cmap=cmap)
    cbar = plt.colorbar(mappable=m1, ax=ax, shrink=0.75)
    cbar.set_label(cbar_label)

    src_color = 'lime'
    ax.scatter(df_regions['weighted_centroid-0'], df_regions['weighted_centroid-1'], marker='+', s=10, color='white')
    ax.scatter(df_regions['centroid-0'], df_regions['centroid-1'], marker='.', s=10, color=src_color)

    for i, row in df_regions.iterrows():
        ind   = row['label']
        x_cen = row['centroid-0']
        y_cen = row['centroid-1']

        width  = row['bbox-2'] - row['bbox-0']
        height = row['bbox-3'] - row['bbox-1']

        x_pos = x_cen - width / 2
        y_pos = y_cen - height / 2

        rect = patches.Rectangle(xy=(x_pos, y_pos),
                                 width=width,
                                 height=height,
                                 linewidth=1,
                                 edgecolor=src_color,
                                 facecolor='none')

        plt.text(x_pos+width, y_pos+height, str(ind), c=src_color)
        ax.add_patch(rect)
    plt.tight_layout()
    if savepath:
        logger.info(f'Saving Image to: {savepath}')
        plt.savefig(savepath)
    plt.show()

plot_region_lightcurve(df_lcs, i, savepath=None)

Plot the ith region lightcurve.

Source code in exod/processing/detector.py
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
def plot_region_lightcurve(df_lcs, i, savepath=None):
    """Plot the ith region lightcurve."""
    lc = df_lcs[f'src_{i}']
    fig, ax = plt.subplots(figsize=(10, 4))
    ax2 = ax.twiny()
    ax.step(df_lcs['time'], lc, where='post', color='black', lw=1.0)
    ax2.step(range(len(lc)), lc, where='post', color='none', lw=1.0)

    ax.set_title(f'Source #{i}')
    ax.set_ylabel('Counts (N)')
    ax.set_xlabel('Time (s)')
    ax.set_xlim(df_lcs['time'].min(), df_lcs['time'].max())
    ax2.set_xlabel('Window/Frame Number')
    plt.tight_layout()

    if savepath:
        logger.info(f'Saving lightcurve plot to: {savepath}')
        plt.savefig(savepath)

pipeline

Pipeline

Pipeline to process a single XMM-Newton observation and find variable sources.

Attributes:
obsid (str): Observation ID.
size_arcsec (float): Size of the binned pixels in arcseconds.
time_interval (int): Time interval in seconds.
gti_only (bool): Use only Good Time Intervals.
remove_partial_ccd_frames (bool): Remove partial CCD frames.
min_energy (float): Minimum energy in keV.
max_energy (float): Maximum energy in keV.
clobber (bool): Overwrite existing files.
threshold_sigma (float): Sigma threshold for variability detection.
precomputed_bayes_limit (PrecomputeBayesLimits): Precomputed Bayes Limits object.
observation (Observation): Observation object.

runid (str): Unique identifier for the run.
event_list (EventList): EventList() object.
data_cube (XMMDataCube): XMMDataCube() object containing the number of photons in each cell.
subset_number (int): The subset number.
total_subsets (int): Total number of subsets.
gti_threshold (float): Threshold for Good Time Intervals.
n_regions (int): Number of regions detected.

savedir (Path): Directory to save results to.
Source code in exod/processing/pipeline.py
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
class Pipeline:
    """
    Pipeline to process a single XMM-Newton observation and find variable sources.

        Attributes:
        obsid (str): Observation ID.
        size_arcsec (float): Size of the binned pixels in arcseconds.
        time_interval (int): Time interval in seconds.
        gti_only (bool): Use only Good Time Intervals.
        remove_partial_ccd_frames (bool): Remove partial CCD frames.
        min_energy (float): Minimum energy in keV.
        max_energy (float): Maximum energy in keV.
        clobber (bool): Overwrite existing files.
        threshold_sigma (float): Sigma threshold for variability detection.
        precomputed_bayes_limit (PrecomputeBayesLimits): Precomputed Bayes Limits object.
        observation (Observation): Observation object.

        runid (str): Unique identifier for the run.
        event_list (EventList): EventList() object.
        data_cube (XMMDataCube): XMMDataCube() object containing the number of photons in each cell.
        subset_number (int): The subset number.
        total_subsets (int): Total number of subsets.
        gti_threshold (float): Threshold for Good Time Intervals.
        n_regions (int): Number of regions detected.

        savedir (Path): Directory to save results to.
    """
    def __init__(self, obsid='0724210501', size_arcsec=20.0, time_interval=50, gti_only=False, remove_partial_ccd_frames=True,
                 min_energy=0.2, max_energy=10.0, clobber=False, threshold_sigma=3, **kwargs):
        self.obsid = obsid
        self.size_arcsec = size_arcsec
        self.time_interval = time_interval
        self.gti_only = gti_only
        self.remove_partial_ccd_frames = remove_partial_ccd_frames
        self.min_energy = min_energy
        self.max_energy = max_energy
        self.clobber = clobber
        self.threshold_sigma = threshold_sigma
        self.precomputed_bayes_limit = PrecomputeBayesLimits(threshold_sigma)
        self.observation = Observation(obsid)

        self.runid = None
        self.event_list = None
        self.data_cube = None
        self.subset_number = None
        self.total_subsets = None
        self.gti_threshold = None
        self.n_regions = None

        self.savedir = None

    def generate_runid(self):
        """Generate a unique runid of the form 0765080801_0_50_0.2_12.0"""
        runid = f'{self.obsid}_{self.subset_number}_{self.time_interval}_{self.min_energy}_{self.max_energy}'
        self.runid = runid
        return runid

    def set_savedir(self):
        """Create the directory to save results to."""
        self.savedir = self.observation.path_results / self.generate_runid()
        self.savedir.mkdir(parents=True, exist_ok=True)
        logger.info(f'savedir set to: {self.savedir}')
        return self.savedir

    def pre_process(self):
        """Pre-process the data."""
        self.observation.download_events()
        self.observation.filter_events(clobber=self.clobber)
        self.observation.create_images(clobber=self.clobber)
        self.observation.get_files()
        self.observation.get_events_overlapping_subsets()
        self.subsets       = self.observation.get_events_overlapping_subsets()
        self.total_subsets = self.observation.get_number_of_overlapping_subsets()

    def run(self):
        self.pre_process()
        for i, s in enumerate(self.subsets):
            self.run_subset(subset_number=i, subset_overlapping_exposures=s)

    def run_subset(self, subset_number, subset_overlapping_exposures):
        self.subset_number = subset_number
        self.set_savedir()
        self.generate_runid()

        self.event_list = EventList.from_event_lists(subset_overlapping_exposures)
        self.calculate_bti()
        self.event_list.filter_by_energy(self.min_energy, self.max_energy)

        self.data_cube = DataCube(self.event_list, self.size_arcsec, self.time_interval)
        self.data_cube.calc_gti_bti_bins(bti=self.bti)
        self.data_cube.mask_frames_with_partial_ccd_exposure(mask_frames=self.remove_partial_ccd_frames)
        if self.gti_only:
            self.data_cube.mask_bti()
        cube_n = self.data_cube

        if sas_is_installed():
            img = self.observation.images[0]
            img.read(wcs_only=True)
            wcs = img.wcs
        else:
            wcs = wcs_from_eventlist_file(self.observation.events_raw[0].path)

        # Calculate Expectation Cube
        cube_mu = calc_cube_mu(cube_n, wcs=wcs)
        cube_mask_peaks, cube_mask_eclipses = self.precomputed_bayes_limit.get_cube_masks_peak_and_eclipse(cube_n=cube_n.data, cube_mu=cube_mu)

        # Extract information for each significant data cell (3D pixel)
        df_significant_cells = self.get_significant_cells(cube_mask_peaks, cube_mask_eclipses, cube_n)
        self.filter_events_from_significant_cells(df_significant_cells, self.event_list)
        unique_xy = self.get_significant_pixels_from_cells(df_significant_cells)

        # Plot Lightcurves for each cell.
        for x_cube, y_cube in unique_xy:
            plot_pixel_lc(cube_mu, cube_n, x_cube, y_cube, plot=True)

        # Get Images.
        image_n       = np.nansum(cube_n.data, axis=2)                # Total Counts.
        image_peak    = np.nansum(cube_mask_peaks, axis=2)            # Each Pixel is the number of peaks in cube_n
        image_eclipse = np.nansum(cube_mask_eclipses, axis=2)         # Each Pixel is the number of eclipses in cube_n
        image_mask_combined = (image_peak > 0) | (image_eclipse > 0)  # Combine peaks and eclipses

        df_reg = calc_df_regions(image=image_n, image_mask=image_mask_combined)
        df_reg['detid'] = self.runid + '_' + df_reg['label'].astype('str') # Create detection id from runid_label
        df_sky = get_regions_sky_position(data_cube=cube_n, df_regions=df_reg, wcs=wcs)
        df_regions = pd.concat([df_reg, df_sky], axis=1).reset_index(drop=True)
        self.n_regions = len(df_regions)

        dfs_lcs = get_region_lightcurves(df_regions, cube_n, cube_mu)

        # Plot Lightcurves for each region
        if dfs_lcs:
            for df_lc in dfs_lcs:
                plot_df_lc(df_lc=df_lc, savedir=self.savedir)

        # Plot Image
        plot_detection_image(df_regions, image_eclipse, image_n, image_peak, savepath=self.savedir / 'detection_img.png')

        # Save Results
        self.save_results(dc_info=cube_n.info, df_alerts=df_significant_cells, df_regions=df_regions, dfs_lcs=dfs_lcs,
                          df_bti=self.df_bti, evt_info=self.event_list.info)
        # plt.show()
        # plt.close('all')
        # plt.clf()

    def calculate_bti(self):
        """Calculate the bad time intervals (BTI) based upon the lightcurve from the eventlist and the threshold."""
        self.gti_threshold = get_gti_threshold(self.event_list.N_event_lists)
        t_bin_he, lc_he = self.event_list.get_high_energy_lc(self.time_interval)
        self.bti = get_bti(time=t_bin_he, data=lc_he, threshold=self.gti_threshold)
        self.df_bti = pd.DataFrame(self.bti)

    def multiply_time_interval(self, n_factor):
        logger.info(f'Rebinning the cube with longer timebins by factor {n_factor}...')
        self.data_cube.multiply_time_interval(n_factor)
        self.time_interval = self.data_cube.time_interval
        self.calculate_bti()
        self.data_cube.calc_gti_bti_bins(bti=self.bti)

    def get_significant_pixels_from_cells(self, df_significant_cells):
        """Extract the unique x,y values (2D Pixels) from the significant (3D) cells."""
        if len(df_significant_cells) == 0:
            return []
        unique_xy = list(df_significant_cells.groupby(['x_cube', 'y_cube']).groups.keys())
        return unique_xy

    def filter_events_from_significant_cells(self, df_alerts, event_list):
        """Extract the events in the events list for each significant cell."""
        for i, r in df_alerts.iterrows():
            evt_subset = event_list.get_events_within_bounds(X_lo=r['X_lo'], X_hi=r['X_hi'],
                                                             Y_lo=r['Y_lo'], Y_hi=r['Y_hi'],
                                                             TIME_lo=r['TIME_lo'], TIME_hi=r['TIME_hi'])
            logger.info(r)
            logger.info(evt_subset)
            logger.info(f'N_events = {len(evt_subset)}')

            """
            # Create an image of the significant cell.
            for instrument in np.unique(evt_subset['INSTRUMENT']):
                sub = evt_subset[evt_subset['INSTRUMENT'] == instrument]
                rawx = sub['RAWX']
                rawy = sub['RAWY']
                xbins = np.arange(rawx.min()-1, rawx.max()+1, 1)
                ybins = np.arange(rawy.min()-1, rawy.max()+1, 1)

                try:
                    hist, xedges, yedges = np.histogram2d(rawx, rawy, bins=(xbins, ybins))

                    n = int(hist.max()) + 1
                    cmap = plt.get_cmap('hot', n)
                    boundaries = np.arange(-0.5, n + 0.5, 1)
                    norm = colors.BoundaryNorm(boundaries, ncolors=n)

                    plt.figure()
                    plt.title(f'{instrument} ({r['x_cube']},{r['y_cube']},{r['t_cube']})')
                    plt.imshow(hist.T, origin="lower", aspect="equal", cmap=cmap, norm=norm,
                               extent=(xedges[0], xedges[-1], yedges[0], yedges[-1]), interpolation='none')
                    plt.colorbar(shrink=0.5)
                    plt.xlabel("RAWX")
                    plt.ylabel("RAWY")
                    plt.subplots_adjust(left=0, right=0, top=0, bottom=0)
                except Exception as e:
                    print(f'Could not do {i} {e}')
            """

    def get_significant_cells(self, cube_mask_peaks, cube_mask_eclipses, cube_n):
        """Extract information for each pixel in the datacube that is associated with an alert."""
        logger.info('Extracting significant 3D pixels from data cube.')
        # Get the positions (x,y,t) of the bursts and eclipses
        peak_positions    = np.argwhere(cube_mask_peaks == 1)
        eclipse_positions = np.argwhere(cube_mask_eclipses == 1)

        alerts = []
        for pos in peak_positions:
            pixel_alert_info = self.extract_pixel_alert_info(cube_n, pos)
            pixel_alert_info['type'] = 'peak'
            alerts.append(pixel_alert_info)

        for pos in eclipse_positions:
            pixel_alert_info = self.extract_pixel_alert_info(cube_n, pos)
            pixel_alert_info['type'] = 'eclipse'
            alerts.append(pixel_alert_info)

        df_alerts = pd.DataFrame(alerts)
        logger.info('df_alerts:')
        logger.info(df_alerts)
        return df_alerts

    def extract_pixel_alert_info(self, cube_n, pixel_xyt_pos):
        x_cube, y_cube, t_cube = pixel_xyt_pos
        X    = cube_n.bin_x[x_cube]
        Y    = cube_n.bin_y[y_cube]
        TIME = cube_n.bin_t[t_cube]

        X_size, Y_size, TIME_size = cube_n.pixel_size, cube_n.pixel_size, self.time_interval

        t_depth   = 2
        TIME_size = t_depth * TIME_size

        alert = {'x_cube'    : x_cube,
                 'y_cube'    : y_cube,
                 't_cube'    : t_cube,
                 'X'         : X,
                 'Y'         : Y,
                 'TIME'      : TIME,
                 'X_size'    : X_size,
                 'Y_size'    : Y_size,
                 'TIME_size' : TIME_size,
                 'X_lo'      : (X - X_size),
                 'X_hi'      : (X + X_size),
                 'Y_lo'      : (Y - Y_size),
                 'Y_hi'      : (Y + Y_size),
                 'TIME_lo'   : (TIME - TIME_size),
                 'TIME_hi'   : (TIME + TIME_size)}
        return alert

    def save_results(self, dc_info, df_alerts, df_regions, dfs_lcs, df_bti, evt_info):
        results = {}
        # Collect DataFrames
        if dfs_lcs:
            for i, df_lc in enumerate(dfs_lcs):
                results[f'lc_{i}'] = df_lc

        results['bti']     = df_bti
        results['regions'] = df_regions
        results['alerts']  = df_alerts

        # Collect info
        results['obs_info'] = self.observation.info
        results['evt_info'] = evt_info
        results['dc_info']  = dc_info
        results['run_info'] = self.info
        for k, v in results.items():
            save_result(key=k, value=v, runid=self.runid, savedir=self.savedir)

    def load_subset_results(self, subset_number):
        """Load results for a single observation subset. returns a dictionary."""
        self.subset_number = subset_number
        self.set_savedir()
        results = {}
        csv_files = list(self.savedir.glob('*.csv'))
        n_csv_files = len(csv_files)
        if n_csv_files == 0:
            logger.info(f'No csv files found in {self.savedir}!')
            return None
        for f in csv_files:
            if 'info' in f.stem:
                results[f.stem] = load_info(loadpath=f)
            else:
                results[f.stem] = load_df(loadpath=f)
        return results

    def load_results(self):
        """Load results for all observation subsets, returns a list of dictionaries."""
        all_results = [self.load_subset_results(i) for i in range(self.total_subsets)]
        return all_results

    @property
    def info(self):
        info = {
            'runid'                     : self.runid,
            'obsid'                     : self.obsid,
            'subset_number'             : self.subset_number,
            'total_subsets'             : self.total_subsets,
            'size_arcsec'               : self.size_arcsec,
            'time_interval'             : self.time_interval,
            'gti_only'                  : self.gti_only,
            'gti_threshold'             : self.gti_threshold,
            'remove_partial_ccd_frames' : self.remove_partial_ccd_frames,
            'min_energy'                : self.min_energy,
            'max_energy'                : self.max_energy,
            'clobber'                   : self.clobber,
            'threshold_sigma'           : self.threshold_sigma,
            'savedir'                   : self.savedir,
            'n_regions'                 : self.n_regions,
        }
        for k, v in info.items():
            logger.info(f'{k:>25} : {v}')
        return info
calculate_bti()

Calculate the bad time intervals (BTI) based upon the lightcurve from the eventlist and the threshold.

Source code in exod/processing/pipeline.py
170
171
172
173
174
175
def calculate_bti(self):
    """Calculate the bad time intervals (BTI) based upon the lightcurve from the eventlist and the threshold."""
    self.gti_threshold = get_gti_threshold(self.event_list.N_event_lists)
    t_bin_he, lc_he = self.event_list.get_high_energy_lc(self.time_interval)
    self.bti = get_bti(time=t_bin_he, data=lc_he, threshold=self.gti_threshold)
    self.df_bti = pd.DataFrame(self.bti)
filter_events_from_significant_cells(df_alerts, event_list)

Extract the events in the events list for each significant cell.

Source code in exod/processing/pipeline.py
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
def filter_events_from_significant_cells(self, df_alerts, event_list):
    """Extract the events in the events list for each significant cell."""
    for i, r in df_alerts.iterrows():
        evt_subset = event_list.get_events_within_bounds(X_lo=r['X_lo'], X_hi=r['X_hi'],
                                                         Y_lo=r['Y_lo'], Y_hi=r['Y_hi'],
                                                         TIME_lo=r['TIME_lo'], TIME_hi=r['TIME_hi'])
        logger.info(r)
        logger.info(evt_subset)
        logger.info(f'N_events = {len(evt_subset)}')

        """
        # Create an image of the significant cell.
        for instrument in np.unique(evt_subset['INSTRUMENT']):
            sub = evt_subset[evt_subset['INSTRUMENT'] == instrument]
            rawx = sub['RAWX']
            rawy = sub['RAWY']
            xbins = np.arange(rawx.min()-1, rawx.max()+1, 1)
            ybins = np.arange(rawy.min()-1, rawy.max()+1, 1)

            try:
                hist, xedges, yedges = np.histogram2d(rawx, rawy, bins=(xbins, ybins))

                n = int(hist.max()) + 1
                cmap = plt.get_cmap('hot', n)
                boundaries = np.arange(-0.5, n + 0.5, 1)
                norm = colors.BoundaryNorm(boundaries, ncolors=n)

                plt.figure()
                plt.title(f'{instrument} ({r['x_cube']},{r['y_cube']},{r['t_cube']})')
                plt.imshow(hist.T, origin="lower", aspect="equal", cmap=cmap, norm=norm,
                           extent=(xedges[0], xedges[-1], yedges[0], yedges[-1]), interpolation='none')
                plt.colorbar(shrink=0.5)
                plt.xlabel("RAWX")
                plt.ylabel("RAWY")
                plt.subplots_adjust(left=0, right=0, top=0, bottom=0)
            except Exception as e:
                print(f'Could not do {i} {e}')
        """
generate_runid()

Generate a unique runid of the form 0765080801_0_50_0.2_12.0

Source code in exod/processing/pipeline.py
77
78
79
80
81
def generate_runid(self):
    """Generate a unique runid of the form 0765080801_0_50_0.2_12.0"""
    runid = f'{self.obsid}_{self.subset_number}_{self.time_interval}_{self.min_energy}_{self.max_energy}'
    self.runid = runid
    return runid
get_significant_cells(cube_mask_peaks, cube_mask_eclipses, cube_n)

Extract information for each pixel in the datacube that is associated with an alert.

Source code in exod/processing/pipeline.py
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
def get_significant_cells(self, cube_mask_peaks, cube_mask_eclipses, cube_n):
    """Extract information for each pixel in the datacube that is associated with an alert."""
    logger.info('Extracting significant 3D pixels from data cube.')
    # Get the positions (x,y,t) of the bursts and eclipses
    peak_positions    = np.argwhere(cube_mask_peaks == 1)
    eclipse_positions = np.argwhere(cube_mask_eclipses == 1)

    alerts = []
    for pos in peak_positions:
        pixel_alert_info = self.extract_pixel_alert_info(cube_n, pos)
        pixel_alert_info['type'] = 'peak'
        alerts.append(pixel_alert_info)

    for pos in eclipse_positions:
        pixel_alert_info = self.extract_pixel_alert_info(cube_n, pos)
        pixel_alert_info['type'] = 'eclipse'
        alerts.append(pixel_alert_info)

    df_alerts = pd.DataFrame(alerts)
    logger.info('df_alerts:')
    logger.info(df_alerts)
    return df_alerts
get_significant_pixels_from_cells(df_significant_cells)

Extract the unique x,y values (2D Pixels) from the significant (3D) cells.

Source code in exod/processing/pipeline.py
184
185
186
187
188
189
def get_significant_pixels_from_cells(self, df_significant_cells):
    """Extract the unique x,y values (2D Pixels) from the significant (3D) cells."""
    if len(df_significant_cells) == 0:
        return []
    unique_xy = list(df_significant_cells.groupby(['x_cube', 'y_cube']).groups.keys())
    return unique_xy
load_results()

Load results for all observation subsets, returns a list of dictionaries.

Source code in exod/processing/pipeline.py
317
318
319
320
def load_results(self):
    """Load results for all observation subsets, returns a list of dictionaries."""
    all_results = [self.load_subset_results(i) for i in range(self.total_subsets)]
    return all_results
load_subset_results(subset_number)

Load results for a single observation subset. returns a dictionary.

Source code in exod/processing/pipeline.py
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
def load_subset_results(self, subset_number):
    """Load results for a single observation subset. returns a dictionary."""
    self.subset_number = subset_number
    self.set_savedir()
    results = {}
    csv_files = list(self.savedir.glob('*.csv'))
    n_csv_files = len(csv_files)
    if n_csv_files == 0:
        logger.info(f'No csv files found in {self.savedir}!')
        return None
    for f in csv_files:
        if 'info' in f.stem:
            results[f.stem] = load_info(loadpath=f)
        else:
            results[f.stem] = load_df(loadpath=f)
    return results
pre_process()

Pre-process the data.

Source code in exod/processing/pipeline.py
90
91
92
93
94
95
96
97
98
def pre_process(self):
    """Pre-process the data."""
    self.observation.download_events()
    self.observation.filter_events(clobber=self.clobber)
    self.observation.create_images(clobber=self.clobber)
    self.observation.get_files()
    self.observation.get_events_overlapping_subsets()
    self.subsets       = self.observation.get_events_overlapping_subsets()
    self.total_subsets = self.observation.get_number_of_overlapping_subsets()
set_savedir()

Create the directory to save results to.

Source code in exod/processing/pipeline.py
83
84
85
86
87
88
def set_savedir(self):
    """Create the directory to save results to."""
    self.savedir = self.observation.path_results / self.generate_runid()
    self.savedir.mkdir(parents=True, exist_ok=True)
    logger.info(f'savedir set to: {self.savedir}')
    return self.savedir

combine_results(obsids)

Get the results for each of the observations ids and combine them into joint dataframes and save them into their respective paths in exod.utils.path.data_combined.

Will raise an error if there are already files data_combined, to avoid accidentally deleting data.

Parameters:

Name Type Description Default
obsids list

list of observation IDs

required
Source code in exod/processing/pipeline.py
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
def combine_results(obsids):
    """
    Get the results for each of the observations ids and combine them into joint dataframes and save them into
    their respective paths in exod.utils.path.data_combined.

    Will raise an error if there are already files data_combined, to avoid accidentally deleting data.

    Parameters:
        obsids (list): list of observation IDs
    """
    if any([p.exists() for p in savepaths_combined.values()]):
        raise FileExistsError(f'There are already some combined files in the combined data path! '
                              f'Be careful! You dont want to accidentally overwrite these!')

    all_results = []
    for params in parameter_grid(obsids=obsids):
        p = Pipeline(**params)
        results_list = p.load_results()
        if results_list:
            for r in results_list:
                all_results.append(r)

    logger.info('Combining all DataFrames')
    df_bti      = pd.concat([r.get('bti') for r in all_results], axis=0)
    df_regions  = pd.concat([r.get('regions') for r in all_results], axis=0)
    df_alerts   = pd.concat([r.get('alerts') for r in all_results], axis=0)
    df_lc       = pd.concat([df for r in all_results for key, df in r.items() if key.startswith('lc_')])
    df_run_info = pd.DataFrame([r['run_info'] for r in all_results])
    df_obs_info = pd.DataFrame([r['obs_info'] for r in all_results])
    df_dc_info  = pd.DataFrame([r['dc_info'] for r in all_results])
    df_evt_info = pd.DataFrame([r['evt_info'] for r in all_results])

    make_df_lc_idx(df_lc)

    logger.info('Saving all DataFrames...')
    df_bti.to_csv(savepaths_combined['bti'], index=False)
    df_regions.to_csv(savepaths_combined['regions'], index=False)
    df_alerts.to_csv(savepaths_combined['alerts'], index=False)
    df_lc.to_hdf(savepaths_combined['lc'], key='df_lc', index=False, mode='w', format='table')
    df_run_info.to_csv(savepaths_combined['run_info'], index=False)
    df_obs_info.to_csv(savepaths_combined['obs_info'], index=False)
    df_dc_info.to_csv(savepaths_combined['dc_info'], index=False)
    df_evt_info.to_csv(savepaths_combined['evt_info'], index=False)

    logger.info(df_bti)
    logger.info(df_regions)
    logger.info(df_alerts)
    logger.info(df_lc)
    logger.info(df_run_info)
    logger.info(df_obs_info)
    logger.info(df_dc_info)
    logger.info(df_evt_info)

make_df_lc_idx(df_lc)

Calculate the start and end indexs of each of the lightcurves. This is done so that a specific runid + label combination can be quickly accessed.

Source code in exod/processing/pipeline.py
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
def make_df_lc_idx(df_lc):
    """
    Calculate the start and end indexs of each of the lightcurves.
    This is done so that a specific runid + label combination can be quickly accessed.
    """
    df_lc = df_lc.reset_index(drop=True)
    tot = 0
    combinations = {}
    for (label, runid), i in df_lc.groupby(['label', 'runid']).groups.items():
        combinations[(label, runid)] = (i[0], i[-1])

    df_start_stop = pd.DataFrame.from_dict(combinations, orient='index', columns=['start', 'stop'])
    df_start_stop = df_start_stop.sort_values('start')
    df_start_stop.to_csv(savepaths_combined['lc_idx'])
    logger.info(f'Saved df_lc_idx to {savepaths_combined["lc_idx"]}')

parameter_grid(obsids)

usage: for params in parameter_grid(['0792180301', '0792180302', ...]: print(params) pipeline = Pipeline(**params)

Parameters:

Name Type Description Default
obsids list

list of obsids

required

Returns params (dict): parameters for a run.

Source code in exod/processing/pipeline.py
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
def parameter_grid(obsids):
    """
    usage:
    for params in parameter_grid(['0792180301', '0792180302', ...]:
        print(params)
        pipeline = Pipeline(**params)

    Parameters:
        obsids (list): list of obsids

    Returns
        params (dict): parameters for a run.
    """
    parameter_grid = {
        'obsid'                     : obsids,
        'size_arcsec'               : [20.0],
        'time_interval'             : [300, 600, 1000],
        'gti_only'                  : [False],
        'remove_partial_ccd_frames' : [True],
        'energy_ranges'             : [[0.2, 2.0], [2.0, 12.0], [0.2, 12.0]],
        'clobber'                   : [False],
        'threshold_sigma'           : [3],
    }
    parameter_combinations = list(itertools.product(*parameter_grid.values()))
    logger.info(f'{len(parameter_combinations)} parameter combinations\n'
                f'predicted size ~{(500*len(parameter_combinations)) / 1e6} Gb\n'
                f'predicted runtime ~{1*len(parameter_combinations)/60/24:.2f} days (@ 1min/obs)')

    for combination in parameter_combinations:
        params = dict(zip(parameter_grid.keys(), combination))
        params['min_energy'] = params['energy_ranges'][0]
        params['max_energy'] = params['energy_ranges'][1]
        del(params['energy_ranges'])
        yield params

plot_pixel_lc(cube_mu, cube_n, x, y, plot=True)

Plot the lightcurve for a specific pixel (x,y), shows the three and five sigma error regions.

Source code in exod/processing/pipeline.py
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
def plot_pixel_lc(cube_mu, cube_n, x, y, plot=True):
    """Plot the lightcurve for a specific pixel (x,y), shows the three and five sigma error regions."""
    if not plot:
        return None
    cube_mu_xy = cube_mu[x, y]
    cube_data_xy = cube_n.data[x, y]
    time_interval = cube_n.time_interval

    # Plot lightcurves
    lw = 1.0
    frame_axis = np.arange(cube_mu.shape[2])  # Frame Number
    time_axis  = frame_axis * time_interval   # Zero'ed Time
    time_axis2 = cube_n.bin_t[:-1]            # Observation Time

    pbl_3_sig = PrecomputeBayesLimits(threshold_sigma=3)
    pbl_5_sig = PrecomputeBayesLimits(threshold_sigma=5)

    fig, ax = plt.subplots(2, 1, figsize=(15, 5), gridspec_kw={'height_ratios': [10, 1]}, sharex=True)
    ax[0].fill_between(time_axis, pbl_5_sig.n_eclipse_threshold(cube_mu_xy), pbl_5_sig.n_peak_threshold(cube_mu_xy), alpha=0.3, facecolor='steelblue', label=r'5 $\sigma$')
    ax[0].fill_between(time_axis, pbl_3_sig.n_eclipse_threshold(cube_mu_xy), pbl_3_sig.n_peak_threshold(cube_mu_xy), alpha=0.5, facecolor='steelblue', label=r'3 $\sigma$')
    ax[0].step(time_axis, cube_data_xy, color='black', where='mid', lw=lw, label=r'Observed ($n$)')
    ax[0].step(time_axis, cube_mu_xy, color='red', where='mid', lw=lw, label=r'Expected ($\mu$)')
    ax[1].step(time_axis, cube_mu_xy, color='red', where='mid', lw=lw, label=r'Expected ($\mu$)')
    ax2 = ax[0].twiny()
    ax2.plot(frame_axis, cube_data_xy, color='none')
    ax2.set_xlabel("Frame #")
    ax[0].legend()
    ax[1].set_xlabel("Time (s)")
    ax[0].set_ylabel("Counts")
    ax[1].set_ylabel("bkg")
    ax[0].set_ylim(0)
    ax[0].set_xlim(np.min(time_axis), np.max(time_axis))
    plt.subplots_adjust(hspace=0)
    plt.suptitle(f'Lightcurve for pixel x={x} y={y}')

utils

path

check_file_exists(file_path, clobber=True)

Check if a file exists and raise FileExistsError if clobber is False.

Parameters: - file_path (str or Path): The path to the file. - clobber (bool): If True, overwrite the file if it exists.

Raises: - FileExistsError: If the file exists and clobber is False.

Source code in exod/utils/path.py
68
69
70
71
72
73
74
75
76
77
78
79
80
def check_file_exists(file_path, clobber=True):
    """
    Check if a file exists and raise FileExistsError if clobber is False.

    Parameters:
    - file_path (str or Path): The path to the file.
    - clobber (bool): If True, overwrite the file if it exists.

    Raises:
    - FileExistsError: If the file exists and clobber is False.
    """
    if not clobber and Path(file_path).exists():
        raise FileExistsError(f'File {file_path} exists and clobber={clobber}!')

create_all_paths()

Create all paths if they don't exist.

Source code in exod/utils/path.py
62
63
64
65
def create_all_paths():
    """Create all paths if they don't exist."""
    for name, path in all_paths.items():
        os.makedirs(path, exist_ok=True)

read_observation_ids(file_path)

Read observation IDs from file. Each line should be a single observation.

Source code in exod/utils/path.py
83
84
85
86
87
88
89
90
def read_observation_ids(file_path):
    """
    Read observation IDs from file.
    Each line should be a single observation.
    """
    with open(file_path, 'r') as file:
        obs_ids = [line.strip() for line in file.readlines()]
    return obs_ids

plotting

cmap_image()

Colormap used in production of images.

Source code in exod/utils/plotting.py
32
33
34
35
36
def cmap_image():
    """Colormap used in production of images."""
    cmap = matplotlib.colormaps['hot']
    cmap.set_bad('black')
    return cmap

compare_images(images, titles, log=False, plot=False)

Creates a slideshow of multiple 2D numpy arrays of the same size. Args: images (list of np.ndarray): list of numpy arrays titles (list): titles of images log (bool): use Log normalisation for image plots. plot (bool): switch to disable or enable plotting.

Source code in exod/utils/plotting.py
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
def compare_images(images, titles, log=False, plot=False):
    """
    Creates a slideshow of multiple 2D numpy arrays of the same size.
    Args:
        images (list of np.ndarray): list of numpy arrays
        titles (list): titles of images
        log (bool): use Log normalisation for image plots.
        plot (bool): switch to disable or enable plotting.
    """
    if not plot:
        return None

    N_images = len(images)

    fig, ax = plt.subplots(figsize=(7, 7))
    ax.set_title(f'images[0]={titles[0]}')
    norm = None
    if log:
        norm = LogNorm()

    im = ax.imshow(images[0], norm=norm, cmap=cmap_image(), interpolation='none')
    plt.colorbar(im, ax=ax, shrink=0.75)
    plt.tight_layout()

    def update(frame):
        i = frame % N_images
        im.set_array(images[i])
        ax.set_title(f'images[{i}]={titles[i]}')

    ani = FuncAnimation(fig, update, frames=range(N_images), interval=1000)  # Adjust frames and interval as needed
    plt.show()

fig2data_url(fig)

Convert a matplotlib Figure() to a base64 png data url (for use in a html tag)

Source code in exod/utils/plotting.py
301
302
303
304
305
306
307
def fig2data_url(fig):
    """Convert a matplotlib Figure() to a base64 png data url (for use in a html <img> tag)"""
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    buf.seek(0)
    data_url = base64.b64encode(buf.read()).decode('ascii')
    return data_url

get_image_limits(image)

Get the x and y limits of the non-zero pixels in an image.

Source code in exod/utils/plotting.py
265
266
267
268
269
270
271
272
273
274
275
def get_image_limits(image):
    """Get the x and y limits of the non-zero pixels in an image."""
    # Find the non-zero rows and columns
    non_zero_rows = np.any(image != 0, axis=1)
    non_zero_cols = np.any(image != 0, axis=0)

    # Get the min and max of the non-zero rows and columns
    ymin, ymax = np.where(non_zero_rows)[0][[0, -1]]
    xmin, xmax = np.where(non_zero_cols)[0][[0, -1]]

    return (xmin, xmax), (ymin, ymax)

interactive_scatter_with_rectangles(x_data, y_data)

Plot x and y data, allowing the user to interactively select rectangles. Prints the (x, y, width, height) of the selected rectangle in the terminal.

Parameters:

Name Type Description Default
x_data array

the x coordinates of the scatter data.

required
y_data array

the y coordinates of the scatter data.

required
Source code in exod/utils/plotting.py
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
def interactive_scatter_with_rectangles(x_data, y_data):
    """
    Plot x and y data, allowing the user to interactively select rectangles.
    Prints the (x, y, width, height) of the selected rectangle in the terminal.

    Args:
        x_data (array): the x coordinates of the scatter data.
        y_data (array): the y coordinates of the scatter data.
    """
    fig, ax = plt.subplots()
    ax.scatter(x_data, y_data, color='black', alpha=0.5, s=1)

    def onselect(eclick, erelease):
        x1, y1 = eclick.xdata, eclick.ydata
        x2, y2 = erelease.xdata, erelease.ydata
        width  = abs(x2 - x1)
        height = abs(y2 - y1)

        print(f"({x1:.2f}, {y1:.2f}, {width:.2f}, {height:.2f}) (x,y,w,h)")
        rect = plt.Rectangle((min(x1, x2), min(y1, y2)), width, height, linewidth=2, edgecolor='red', facecolor='none')
        ax.add_patch(rect)
        last_rect = rect
        plt.draw()

    rect_selector = RectangleSelector(ax, onselect, useblit=True, button=[1], minspanx=0.1, minspany=0.1, spancoords='data', interactive=True)
    plt.show()

plot_3d_image(image)

Plot an image as a 3D surface

Source code in exod/utils/plotting.py
111
112
113
114
115
116
117
118
119
120
121
122
123
124
def plot_3d_image(image):
    """Plot an image as a 3D surface"""
    xx, yy = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    fig = plt.figure(figsize=(15, 15))
    ax = fig.add_subplot(projection='3d')
    ax.plot_surface(xx, yy, image, rstride=1, cstride=1, cmap='plasma', linewidth=0)  # , antialiased=False

    ax.grid(False)
    ax.set_xticks([])
    ax.set_yticks([])
    # ax.set_zticks([])
    # ax.set_zlim(0,100)ax.set_zticks([])
    plt.tight_layout()
    plt.show()

plot_aitoff(ra_deg, dec_deg, savepath=None, color='grey', title=None)

Plot ra and dec coordinates on an aitoff plot.

Parameters:

Name Type Description Default
ra_deg list

ras in degrees.

required
dec_deg list

decs in degrees.

required
savepath str or Path

savepath for plot.

None
color str

color for the marker points.

'grey'
title str

Title for the plot.

None
Source code in exod/utils/plotting.py
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
def plot_aitoff(ra_deg, dec_deg, savepath=None, color='grey', title=None):
    """
    Plot ra and dec coordinates on an aitoff plot.

    Args:
        ra_deg (list): ras in degrees.
        dec_deg (list): decs in degrees.
        savepath (str or Path): savepath for plot.
        color (str): color for the marker points.
        title (str): Title for the plot.
    """
    sky_coords = SkyCoord(ra=ra_deg, dec=dec_deg, unit='deg', frame='fk5', equinox='J2000')
    gal_coords = sky_coords.galactic

    l = -gal_coords.l.wrap_at(180 * u.deg).radian
    b = gal_coords.b.radian

    plt.figure()

    plt.subplot(111, projection='aitoff')
    plt.scatter(l, b, marker='.', s=1, color=color)
    plt.tight_layout()
    if title:
        plt.title(title)
    if savepath:
        logger.info(f'Saving Aitoff plot to {savepath}')
        plt.savefig(savepath)

plot_aitoff_density(ra_deg, dec_deg, savepath=None)

Plot ra and dec coordinates on an aitoff plot and color them by spatial density.

Parameters:

Name Type Description Default
ra_deg list

ras in degrees.

required
dec_deg list

decs in degrees.

required
savepath str or Path

savepath for plot.

None
Source code in exod/utils/plotting.py
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
def plot_aitoff_density(ra_deg, dec_deg, savepath=None):
    """
    Plot ra and dec coordinates on an aitoff plot and color them by spatial density.

    Args:
        ra_deg (list): ras in degrees.
        dec_deg (list): decs in degrees.
        savepath (str or Path): savepath for plot.
    """
    sky_coords = SkyCoord(ra=ra_deg, dec=dec_deg, unit='deg', frame='fk5', equinox='J2000')
    gal_coords = sky_coords.galactic

    l = -gal_coords.l.wrap_at(180 * u.deg).radian
    b = gal_coords.b.radian

    # Calculate the density of points (this is kinda slow)
    xy = np.vstack([l, b])
    z = gaussian_kde(xy)(xy)

    fig = plt.figure(figsize=(5, 2.9))

    plt.subplot(111, projection='aitoff')
    # plt.grid(linewidth=0.5)
    # plt.scatter(l, b, marker='.', s=1, color='cyan', rasterized=True)
    plt.scatter(l, b, marker='.', s=1, c=z, cmap=cmasher.cosmic, rasterized=True)
    plt.tight_layout()

    xticks = ['210°', '240°', '270°', '300°', '330°', '0°', '30°', '60°', '90°', '120°', '150°']
    plt.xticks(fig.get_axes()[0].get_xticks(), xticks)
    # plt.xlabel('Galactic longitude (l)')
    # plt.ylabel('Galactic latitude (b)')
    if savepath:
        logger.info(f'Saving to {savepath}')
        plt.savefig(savepath, dpi=300)

plot_cube_statistics(data)

Plot various first order functions for a 3-dimensional data cube.

Source code in exod/utils/plotting.py
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
def plot_cube_statistics(data):
    """Plot various first order functions for a 3-dimensional data cube."""
    cube = data
    logger.info('Calculating and plotting data cube statistics...')
    image_max = np.nanmax(cube, axis=2)
    image_min = np.nanmin(cube, axis=2)  # The Minimum and median are basically junk
    image_median = np.nanmedian(cube, axis=2)
    image_mean = np.nanmean(cube, axis=2)
    image_std = np.nanstd(cube, axis=2)
    image_sum = np.nansum(cube, axis=2)

    fig, ax = plt.subplots(2, 3, figsize=(15, 10))
    # Plotting images
    cmap = cmap_image()
    im_max = ax[0, 0].imshow(image_max.T, interpolation='none', origin='lower', cmap=cmap)
    im_min = ax[0, 1].imshow(image_min.T, interpolation='none', origin='lower', cmap=cmap)
    im_mean = ax[1, 0].imshow(image_mean.T, interpolation='none', origin='lower', cmap=cmap)
    im_median = ax[1, 1].imshow(image_median.T, interpolation='none', origin='lower', cmap=cmap)
    im_std = ax[1, 2].imshow(image_std.T, interpolation='none', origin='lower', cmap=cmap)
    im_sum = ax[0, 2].imshow(image_sum.T, interpolation='none', origin='lower', cmap=cmap)

    # Adding colorbars
    shrink = 0.55
    cbar_max = fig.colorbar(im_max, ax=ax[0, 0], shrink=shrink)
    cbar_min = fig.colorbar(im_min, ax=ax[0, 1], shrink=shrink)
    cbar_mean = fig.colorbar(im_mean, ax=ax[1, 0], shrink=shrink)
    cbar_median = fig.colorbar(im_median, ax=ax[1, 1], shrink=shrink)
    cbar_std = fig.colorbar(im_std, ax=ax[1, 2], shrink=shrink)
    cbar_sum = fig.colorbar(im_sum, ax=ax[0, 2], shrink=shrink)

    # Setting titles
    ax[0, 0].set_title('max')
    ax[0, 1].set_title('min')
    ax[1, 0].set_title('mean')
    ax[1, 1].set_title('median')
    ax[1, 2].set_title('std')
    ax[0, 2].set_title('sum')
    plt.tight_layout()

    plt.show()

plot_event_list_ccds(table)

Plot the images for each CCD for a given eventlist.

Parameters:

Name Type Description Default
table Table

Event List Table.

required

Returns:

Name Type Description
fig Figure

Figure with each subplots containing the image for the specific CCD.

Source code in exod/utils/plotting.py
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
def plot_event_list_ccds(table):
    """
    Plot the images for each CCD for a given eventlist.

    Parameters:
        table (astropy.table.Table): Event List Table.

    Returns:
        fig (matplotlib.figure.Figure): Figure with each subplots containing the image for the specific CCD.
    """
    ccdnrs = np.unique(table['CCDNR'])
    fig, ax = plt.subplots((len(ccdnrs)+1)//2, 2 , figsize=(20,20))
    ax = ax.flatten()
    for i, ccdnr in enumerate(ccdnrs):
        sub = table[table['CCDNR'] == ccdnr]
        xbins = np.arange(sub['RAWX'].min(), sub['RAWX'].max(), 1)  
        ybins = np.arange(sub['RAWY'].min(), sub['RAWY'].max(), 1) 
        im, _, _ = np.histogram2d(x=sub['RAWX'], y=sub['RAWY'], bins=[xbins,ybins])
        ax[i].imshow(im, cmap='hot', origin='lower', interpolation='none', aspect='equal', norm=LogNorm())
        ax[i].text(5, 5, s=ccdnr, color='white', bbox=dict(facecolor='black', edgecolor='white'))
    plt.show()
    return fig

plot_frame_masks(instrum, masks, labels, plot=False)

Helper function to view which frames were masked during processing making of the data cube.

Parameters:

Name Type Description Default
instrum str

Name of the instruement.

required
masks list of np.arrays

list of masks (true/false arrays) corresponding

required
labels(list of str

labels for each mask.

required
plot bool

switch for turn on/off plotting

False
Source code in exod/utils/plotting.py
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
def plot_frame_masks(instrum, masks, labels, plot=False):
    """
    Helper function to view which frames were masked during processing making of the data cube.

    Args:
        instrum (str): Name of the instruement.
        masks (list of np.arrays): list of masks (true/false arrays) corresponding
        labels(list of str): labels for each mask.
        plot (bool): switch for turn on/off plotting
    """
    if not plot:
        return None
    cmap = ListedColormap([[1, 0, 0], [0, 1, 0]])
    mask_stack = np.vstack(masks)

    fig, ax = plt.subplots(figsize=(13, 5))
    ax.set_title(f'{instrum} | Frame Masks | Green=True, Red=False')
    ax.imshow(mask_stack, cmap=cmap, aspect='auto', interpolation='none')
    ax.set_yticks(np.arange(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel('Frame Number')
    plt.tight_layout()
    plt.show()

plot_image(image_arr, title, log=False)

Plot a 2D numpy array as an image.

Source code in exod/utils/plotting.py
39
40
41
42
43
44
45
46
47
48
49
50
def plot_image(image_arr, title, log=False):
    """Plot a 2D numpy array as an image."""
    plt.figure(figsize=(7,7))
    plt.title(title)
    norm = None
    if log:
        norm = LogNorm()

    plt.imshow(image_arr, norm=norm, cmap=cmap_image(), interpolation='none')
    plt.colorbar(shrink=0.75)
    plt.tight_layout()
    plt.show()

set_latex_font()

Set matplotlib global font to STIX (same as latex).

Source code in exod/utils/plotting.py
26
27
28
29
def set_latex_font():
    """Set matplotlib global font to STIX (same as latex)."""
    matplotlib.rcParams['mathtext.fontset'] = 'stix'
    matplotlib.rcParams['font.family'] = 'STIXGeneral'

use_scienceplots()

Use the scienceplots module for matplotlib plots.

Source code in exod/utils/plotting.py
18
19
20
21
22
23
24
def use_scienceplots():
    """Use the scienceplots module for matplotlib plots."""
    try:
        import scienceplots
        plt.style.use('science')
    except ModuleNotFoundError:
        pass

synthetic_data

check_synthetic_peak_counts_diff(obsid)

Evaluate differences in peak counts for synthetic bursts added to XMM-Newton data cubes.

Parameters:

Name Type Description Default
obsid str

XMM Observation ID

required
Source code in exod/utils/synthetic_data.py
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
def check_synthetic_peak_counts_diff(obsid):
    """
    Evaluate differences in peak counts for synthetic bursts added to XMM-Newton data cubes.

    Parameters:
        obsid (str): XMM Observation ID
    """
    # Load observation and event data and WCS
    observation = Observation(obsid)
    observation.get_files()
    observation.get_events_overlapping_subsets()
    event_list = EventList.from_event_lists(observation.events_overlapping_subsets[0])
    event_list.read()

    # Parameters for the synthetic test
    size_arcsec = 20
    amplitudes = [0.5, 2, 5, 10]
    time_bins = [500, 1000, 2000]
    num_trials = 40

    all_res = []
    for time_bin in tqdm(time_bins):
        data_cube = DataCube(event_list=event_list, size_arcsec=size_arcsec, time_interval=time_bin)

        for amplitude in tqdm(amplitudes):
            count_differences = []

            for i in range(num_trials):
                # Random position and time fraction for the synthetic burst
                x_pos = np.random.randint(low=5, high=data_cube.data.shape[0] - 5)
                y_pos = np.random.randint(low=5, high=data_cube.data.shape[1] - 5)
                time_fraction = np.random.random()

                # Add synthetic burst to the data cube
                data_cube_with_burst_only = create_fake_onebin_burst(
                    data_cube=data_cube,
                    x_pos=x_pos,
                    y_pos=y_pos,
                    time_peak_fraction=time_fraction,
                    amplitude=amplitude * time_bin
                )

                data_cube_with_burst_added = data_cube.data + data_cube_with_burst_only
                count_difference = np.sum(data_cube_with_burst_added) - np.sum(data_cube.data)
                count_difference = int(count_difference)

                res = {'amplitude'        : amplitude,
                       'time_bin'         : time_bin,
                       'trial_num'        : i,
                       'count difference' : count_difference}
                all_res.append(res)
                count_differences.append(count_difference)

            plt.figure()
            plt.title(f"Burst Amplitude={amplitude} Time Bin={time_bin}\ntrials={num_trials}")
            plt.hist(count_differences, bins=20, label=r"$C_{data+burst} - C_{data}$")
            plt.axvline(x=amplitude * time_bin, color='red', linestyle='--', label="Amplitude*time_bin (Expected)")
            plt.axvline(x=amplitude * time_bin * 1.5, color='green', linestyle='--', label="1.5x Amplitude*time_bin")
            plt.xlabel(r"Difference in Counts")
            plt.ylabel("Frequency")
            plt.legend()
            plt.show()
    df_res = pd.DataFrame(all_res)
    return df_res

create_fake_Nbins_burst(data_cube, x_pos, y_pos, time_peak_fractions, amplitude)

Create Many fake bursts in the data cube

Parameters:

Name Type Description Default
data_cube DataCube

DataCube() Object.

required
x_pos int

x position of the burst.

required
y_pos int

y position of the burst.

required
time_peak_fractions list

list of time fractions where the bursts should be placed.

required
amplitude int

Height of burst.

required

Returns:

Name Type Description
peak_data array

Same size as data_cube with burst data.

Source code in exod/utils/synthetic_data.py
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
def create_fake_Nbins_burst(data_cube, x_pos, y_pos, time_peak_fractions, amplitude):
    """
    Create Many fake bursts in the data cube

    Parameters:
        data_cube (DataCube): DataCube() Object.
        x_pos (int): x position of the burst.
        y_pos (int): y position of the burst.
        time_peak_fractions (list): list of time fractions where the bursts should be placed.
        amplitude (int): Height of burst.

    Returns:
        peak_data (np.array): Same size as data_cube with burst data.
    """
    peaks_data = np.zeros(data_cube.shape, dtype=int)
    for time_fraction in time_peak_fractions:
        peaks_data += create_fake_onebin_burst(data_cube, x_pos, y_pos, time_fraction, amplitude)
    return peaks_data

create_fake_burst(data_cube, x_pos, y_pos, time_peak_fraction, width_time, amplitude)

Create a fake burst in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

Parameters:

Name Type Description Default
data_cube DataCube

DataCube() Object.

required
x_pos int

x position of the burst.

required
y_pos int

y position of the burst.

required
time_peak_fraction float

0.0 - 1.0.

required
width_time float

how long the burst lasts in seconds.

required
amplitude int

Height of burst.

required

Returns:

Name Type Description
peak_data array

Same size as data_cube with burst data.

Source code in exod/utils/synthetic_data.py
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
def create_fake_burst(data_cube, x_pos, y_pos, time_peak_fraction, width_time, amplitude):
    """
    Create a fake burst in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

    Parameters:
        data_cube (DataCube): DataCube() Object.
        x_pos (int): x position of the burst.
        y_pos (int): y position of the burst.
        time_peak_fraction (float): 0.0 - 1.0.
        width_time (float): how long the burst lasts in seconds.
        amplitude (int): Height of burst.

    Returns:
        peak_data (np.array): Same size as data_cube with burst data.
    """

    time       = np.arange(0, data_cube.shape[2])
    peak_data  = np.zeros(data_cube.shape, dtype=int)
    time_peak  = time_peak_fraction * len(time)
    width_bins = width_time / data_cube.time_interval

    #Poissonian PSF
    # 6.6" FWHM, 4.1" per pixel, so (6.6/4.1) pixel FWHM, 2.355 to convert FWHM in sigma but seems too much
    sigma_2d = (6.6 / data_cube.size_arcsec) # * 2.355

    # Pixels to iterate over
    r = 10 # Size of box to calculate values
    x_lo, x_hi = max(0,int(x_pos-r*sigma_2d)), min(int(x_pos+r*sigma_2d), peak_data.shape[0]-1)
    y_lo, y_hi = max(0,int(y_pos-r*sigma_2d)), min(int(y_pos+r*sigma_2d), peak_data.shape[1]-1)

    for x in range(x_lo, x_hi):
        for y in range(y_lo, y_hi):
            dist_sq    = (x - x_pos)**2 + (y - y_pos)**2 # Distance^2 from burst center
            psf        = 1 / (2*np.pi*(sigma_2d**2)) * np.exp(-(dist_sq) / (2*(sigma_2d**2)))
            bin_means  = psf * amplitude * data_cube.time_interval * np.exp(-(time-time_peak)**2 / (2*(width_bins**2)))
            bin_values = np.random.poisson(lam=bin_means)
            # print(x, y, dist_sq, psf)
            # print(bin_means, bin_values)
            peak_data[x,y] += bin_values
    #peak_data = convolve(peak_data, np.ones((3,3,1), dtype=np.int64),mode='constant', cval=0.0)
    return peak_data

create_fake_eclipse(data_cube, x_pos, y_pos, time_peak_fraction, width_time, amplitude, constant_level)

Create a fake eclipse in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

Parameters:

Name Type Description Default
data_cube DataCube

DataCube() Object.

required
x_pos int

x position of the burst.

required
y_pos int

y position of the burst.

required
time_peak_fraction float

0.0 - 1.0.

required
width_time float

how long the Eclispe lasts in seconds.

required
amplitude int

Amplitude of Eclipse

required

Returns:

Name Type Description
peak_data array

Same size as data_cube with Eclipse data.

Source code in exod/utils/synthetic_data.py
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
def create_fake_eclipse(data_cube, x_pos, y_pos, time_peak_fraction, width_time, amplitude, constant_level):
    """
    Create a fake eclipse in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

    Parameters:
        data_cube (DataCube): DataCube() Object.
        x_pos (int): x position of the burst.
        y_pos (int): y position of the burst.
        time_peak_fraction (float): 0.0 - 1.0.
        width_time (float): how long the Eclispe lasts in seconds.
        amplitude (int): Amplitude of Eclipse

    Returns:
        peak_data (np.array): Same size as data_cube with Eclipse data.
    """

    time         = np.arange(0, data_cube.shape[2])
    eclipse_data = np.zeros(data_cube.shape, dtype=int)
    time_peak    = time_peak_fraction * len(time)
    width_bins   = width_time / data_cube.time_interval

    #Poissonian PSF
    # 6.6" FWHM, 4.1" per pixel, so (6.6/4.1) pixel FWHM, 2.355 to convert FWHM in sigma but seems too much
    sigma_2d = (6.6 / data_cube.size_arcsec) * 2.355

    # Pixels to iterate over
    r = 10 # Size of box to calculate values
    x_lo, x_hi = max(0, int(x_pos-r*sigma_2d)), min(int(x_pos+r*sigma_2d), eclipse_data.shape[0]-1)
    y_lo, y_hi = max(0, int(y_pos-r*sigma_2d)), min(int(y_pos+r*sigma_2d), eclipse_data.shape[1]-1)

    for x in range(x_lo, x_hi):
        for y in range(y_lo, y_hi):
            dist_sq    = (x - x_pos)**2 + (y - y_pos)**2 # Distance^2 from burst center
            psf        = 1 / (2*np.pi*np.sqrt(sigma_2d)) * np.exp(-(dist_sq) / (2*(sigma_2d**2)))
            bin_means  = psf * data_cube.time_interval * (constant_level - amplitude * np.exp(-(time-time_peak)**2 / (2*(width_bins**2))))
            bin_values = np.random.poisson(lam=bin_means)
            # print(x, y, dist_sq, psf)
            # print(bin_means, bin_values)
            eclipse_data[x,y] += bin_values
    #peak_data = convolve(peak_data, np.ones((3,3,1), dtype=np.int64),mode='constant', cval=0.0)
    return eclipse_data

create_fake_onebin_burst(data_cube, x_pos, y_pos, time_peak_fraction, amplitude)

Create a fake burst in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

Parameters:

Name Type Description Default
data_cube DataCube

DataCube() Object.

required
x_pos int

x position of the burst.

required
y_pos int

y position of the burst.

required
time_peak_fraction float

0.0 - 1.0.

required
amplitude int

Height of burst.

required

Returns:

Name Type Description
peak_data array

Same size as data_cube with burst data.

Source code in exod/utils/synthetic_data.py
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
def create_fake_onebin_burst(data_cube, x_pos, y_pos, time_peak_fraction, amplitude):
    """
    Create a fake burst in the data_cube at position x_pos, y_pos, at time_peak_fraction of the time axis.

    Parameters:
        data_cube (DataCube): DataCube() Object.
        x_pos (int): x position of the burst.
        y_pos (int): y position of the burst.
        time_peak_fraction (float): 0.0 - 1.0.
        amplitude (int): Height of burst.

    Returns:
        peak_data (np.array): Same size as data_cube with burst data.
    """
    time        = np.arange(0, data_cube.shape[2])
    peak_data   = np.zeros(data_cube.shape, dtype=int)
    time_index  = int(time_peak_fraction * len(time))

    # Poissonian PSF
    # 6.6" FWHM, 4.1" per pixel, so (6.6/4.1) pixel FWHM, 2.355 to convert FWHM in sigma but seems too much
    sigma_2d = (6.6 / data_cube.size_arcsec) # * 2.355

    # Pixels to iterate over
    r = 10 # Size of box to calculate values
    x_lo, x_hi = max(0, int(x_pos-r*sigma_2d)), min(int(x_pos+r*sigma_2d), peak_data.shape[0]-1)
    y_lo, y_hi = max(0, int(y_pos-r*sigma_2d)), min(int(y_pos+r*sigma_2d), peak_data.shape[1]-1)


    for x in range(x_lo, x_hi):
        for y in range(y_lo, y_hi):
            dist_sq    = (x - x_pos)**2 + (y - y_pos)**2 # Distance^2 from burst center
            psf        = (1 / (2*np.pi*(sigma_2d**2))) * np.exp(-(dist_sq) / (2*(sigma_2d**2)))
            bin_means  = psf * amplitude
            bin_values = np.random.poisson(lam=bin_means)
            # print(x, y, dist_sq, psf)
            # print(bin_means, bin_values)
            peak_data[x,y,time_index] += bin_values
    #peak_data = convolve(peak_data, np.ones((3,3,1), dtype=np.int64),mode='constant', cval=0.0)
    return peak_data

create_multiple_fake_eclipses(data_cube, tab_x_pos, tab_y_pos, tab_time_peak_fraction, tab_width_time, tab_amplitude, tab_constant_level)

Create Many fake eclipses in the data cube

Parameters:

Name Type Description Default
data_cube DataCube

DataCube() Object.

required
tab_x_pos list

list of x positions of the Eclipses.

required
tab_y_pos list

list of y positions of the Eclipses.

required
tab_time_peak_fraction list

list of time fractions where the Eclipses should be placed.

required
tab_width_time list

list of how long the Eclipses lasts in seconds.

required
tab_amplitude list

list of amplitudes of Eclipses

required

Returns:

Name Type Description
peak_data array

Same size as data_cube with Eclipse data.

Source code in exod/utils/synthetic_data.py
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
def create_multiple_fake_eclipses(data_cube, tab_x_pos, tab_y_pos, tab_time_peak_fraction, tab_width_time, tab_amplitude, tab_constant_level):
    """
    Create Many fake eclipses in the data cube

    Parameters:
        data_cube (DataCube): DataCube() Object.
        tab_x_pos (list): list of x positions of the Eclipses.
        tab_y_pos (list): list of y positions of the Eclipses.
        tab_time_peak_fraction (list): list of time fractions where the Eclipses should be placed.
        tab_width_time (list): list of how long the Eclipses lasts in seconds.
        tab_amplitude (list): list of amplitudes of Eclipses

    Returns:
        peak_data (np.array): Same size as data_cube with Eclipse data.
    """

    time          = np.arange(0, data_cube.shape[2])
    eclipse_data  = np.zeros(data_cube.shape, dtype=int)

    for time_peak_fraction, width_time, x_pos, y_pos, constant_level, amplitude in\
        zip(tab_time_peak_fraction,tab_width_time,tab_x_pos,tab_y_pos,tab_constant_level,tab_amplitude):
        time_peak  = time_peak_fraction * (len(time)-1)
        width_bins = width_time / data_cube.time_interval

        #Poissonian PSF
        # 6.6" FWHM, 4.1" per pixel, so (6.6/4.1) pixel FWHM, 2.355 to convert FWHM in sigma but seems too much
        sigma_2d = (6.6 / data_cube.size_arcsec) * 1.4 # * 2.355

        # Pixels to iterate over
        r = 10 # Size of box to calculate values
        x_lo, x_hi = max(0, int(x_pos - r * sigma_2d)), min(int(x_pos + r * sigma_2d), eclipse_data.shape[0] - 1)
        y_lo, y_hi = max(0, int(y_pos - r * sigma_2d)), min(int(y_pos + r * sigma_2d), eclipse_data.shape[1] - 1)
        if np.nansum(data_cube.data[x_pos,y_pos]) > 0:
            for x in range(x_lo, x_hi):
                for y in range(y_lo, y_hi):
                    dist_sq    = (x - x_pos)**2 + (y - y_pos)**2 # Distance^2 from burst center
                    psf        = 1 / (2*np.pi*np.sqrt(sigma_2d)) * np.exp(-(dist_sq) / (2*(sigma_2d**2)))
                    bin_means  = psf * data_cube.time_interval * (constant_level - amplitude * np.exp(-(time-time_peak)**2 / (2*(width_bins**2))))
                    bin_values = np.random.poisson(lam=bin_means)
                    # print(x, y, dist_sq, psf)
                    # print(bin_means, bin_values)
                    eclipse_data[x,y] += bin_values
    #peak_data = convolve(peak_data, np.ones((3,3,1), dtype=np.int64),mode='constant', cval=0.0)
    return eclipse_data

util

save_result(key, value, runid, savedir)

Save a label/value pair to a .csv file.

Source code in exod/utils/util.py
38
39
40
41
42
43
44
45
46
47
48
49
50
def save_result(key, value, runid, savedir):
    """Save a label/value pair to a .csv file."""
    if isinstance(value, pd.DataFrame):
        # Append the runid to the dataframe if it is not there
        if 'runid' not in value.columns:
            value['runid'] = runid
        save_df(df=value, savepath=savedir / f'{key}.csv')
    elif isinstance(value, dict):
        # append the runid to the dictionary if it is not there
        value['runid'] = runid
        save_info(dictionary=value, savepath=savedir / f'{key}.csv')
    else:
        logger.warning(f'{key} {value} is not a dict or df!!')

xmm

bad_obs

These observations are those that contained the most alerts using EXOD.

I have gone through them all and manually selected those that are a result of bad data, in a lot of cases, many spurious alerts are generated by the presence of extended sources, such as galaxy clusters or supernova remnants.

event_list

EventList

Class for handling EventList objects.

Attributes:

Name Type Description
path Path

Path to the event list file.

filename str

Name of the event list file.

data Table

Data from the event list file.

is_read bool

Flag indicating if the event list file has been read.

is_merged bool

Flag indicating if the event list is a merged one.

Methods:

Name Description
read

Reads the event list file and extracts necessary information.

from_event_lists

Creates a merged EventList from a list of existing ones.

filter_by_energy

Filters the event list data by energy range.

check_submode

Checks if the submode of the event list is supported.

is_supported_submode

Returns if the submode of the event list is supported.

remove_MOS_central_ccd

Removes the central CCD of MOS if not in PrimeFullWindow.

remove_bad_rows

Removes bad PN rows as per Struder et al. 2001b.

remove_borders

Removes borders from the event list data.

get_ccd_bins

Gets the CCD bins for the instrument.

unload_data

Unloads the data from the event list to save memory.

info

Returns a dictionary containing information about the event list.

Source code in exod/xmm/event_list.py
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
class EventList:
    """
    Class for handling EventList objects.

    Attributes:
        path (Path): Path to the event list file.
        filename (str): Name of the event list file.
        data (Table): Data from the event list file.
        is_read (bool): Flag indicating if the event list file has been read.
        is_merged (bool): Flag indicating if the event list is a merged one.

    Methods:
        read: Reads the event list file and extracts necessary information.
        from_event_lists: Creates a merged EventList from a list of existing ones.
        filter_by_energy: Filters the event list data by energy range.
        check_submode: Checks if the submode of the event list is supported.
        is_supported_submode: Returns if the submode of the event list is supported.
        remove_MOS_central_ccd: Removes the central CCD of MOS if not in PrimeFullWindow.
        remove_bad_rows: Removes bad PN rows as per Struder et al. 2001b.
        remove_borders: Removes borders from the event list data.
        get_ccd_bins: Gets the CCD bins for the instrument.
        unload_data: Unloads the data from the event list to save memory.
        info: Returns a dictionary containing information about the event list.
    """
    def __init__(self, path):
        self.path      = Path(path)
        self.filename  = self.path.name
        self.data      = None
        self.is_read   = False
        self.is_merged = False

        self.N_event_lists = 1

    def __repr__(self):
        return f'EventList({self.path})'

    def read(self, remove_bad_rows=True, remove_borders=True, remove_MOS_central_ccd=True, remove_hot_pixels=True):
        if self.is_read:
            return None
        self.hdul = fits.open(self.path)
        self.header = self.hdul[1].header
        self.data = Table(self.hdul[1].data)

        self.obsid      = self.header['OBS_ID']
        self.instrument = self.header['INSTRUME']
        self.submode    = self.header['SUBMODE']
        self.date       = self.header['DATE-OBS']
        self.revolution = self.header['REVOLUT']
        self.object     = self.header['OBJECT']
        self.exposure   = self.header['TELAPSE']
        self.pnt_angle  = self.header['PA_PNT']
        # self.N_events   = self.header['NAXIS2']
        # self.mean_rate  = self.N_events / self.exposure
        self.time_min   = np.min(self.data['TIME'])
        self.time_max   = np.max(self.data['TIME'])

        self.data['INSTRUMENT'] = self.instrument

        # self.check_submode()
        if remove_bad_rows:
            self.remove_bad_rows()
        if remove_borders:
            self.remove_borders()
        if remove_MOS_central_ccd:
            self.remove_MOS_central_ccd()
        if remove_hot_pixels:
            self.remove_hot_pixels()
        self.is_read = True

    @classmethod
    def from_event_lists(cls, event_lists):
        """
        Create a merged event list from a list of existing ones.
        Args:
            event_lists (list): list of EventList() objects.
        Returns:
            event_list (EventList): New merged eventlist object.
        """
        # EventList Object to return
        event_list = cls.__new__(cls) # Create the object without calling .__init__()

        for e in event_lists:
            e.read()

        # Remove unsupported EventLists
        event_lists = [e for e in event_lists if e.is_supported_submode()]
        if not event_lists:
            raise NotImplementedError(f'None of the supplied event lists are in a supported submode!')

        # Store the starts and ends of all event lists
        starts = [e.time_min for e in event_lists]
        stops = [e.time_max for e in event_lists]
        latest_start = max(starts)
        earliest_stop = min(stops)

        # Combine the data into a single table
        data_stacked = vstack([e.data for e in event_lists])

        # Crop when instruments are not all online
        # data_stacked = data_stacked[(data_stacked['TIME']>latest_start)&(data_stacked['TIME']<earliest_stop)]

        # Unload the data from the constituent event lists to save memory
        # for e in event_lists:
        #     e.unload_data()

        # Write Attributes
        event_list.path     = 'merged'
        event_list.filename = str([e.filename for e in event_lists])
        event_list.is_read  = True
        event_list.is_merged = True

        event_list.hdul   = None
        event_list.header = None
        event_list.data   = data_stacked

        event_list.event_lists   = event_lists
        event_list.N_event_lists = len(event_lists)
        event_list.obsid         = event_lists[0].obsid
        event_list.instrument    = [e.instrument for e in event_lists]
        event_list.submode       = [e.submode for e in event_lists]
        event_list.date          = event_lists[0].date
        event_list.revolution    = event_lists[0].revolution
        event_list.object        = event_lists[0].object
        event_list.time_min      = latest_start
        event_list.time_max      = earliest_stop
        event_list.exposure      = event_list.time_max - event_list.time_min
        event_list.pnt_angle     = event_lists[0].pnt_angle
        # event_list.N_events      = len(data_stacked)
        # event_list.mean_rate     = event_list.N_events / event_list.exposure
        return event_list

    def filter_by_energy(self, min_energy, max_energy):
        logger.info(f'Filtering Events list by energy min_energy={min_energy} max_energy={max_energy}')
        self.data = self.data[(min_energy * 1000 < self.data['PI']) & (self.data['PI'] < max_energy * 1000)]

    def check_submode(self):
        if not ALL_SUBMODES[self.submode]:
            raise NotImplementedError(f"The submode {self.submode} is not supported")

    def is_supported_submode(self):
        return ALL_SUBMODES[self.submode]

    def remove_MOS_central_ccd(self):
        if (self.instrument in ('EMOS1','EMOS2')) and (self.submode!='PrimeFullWindow'):
            logger.info('Removing central CCD of MOS because NOT in PrimeFullWindow')
            self.data = self.data[~(self.data['CCDNR'] == 1)]
            if len(self.data) == 0:
                raise ValueError(f'No data in EventList after removing central MOS events!')

    def remove_bad_rows(self):
        if self.instrument == 'EPN':
            logger.info('Removing bad PN rows Struder et al. 2001b')
            self.data = self.data[~((self.data['CCDNR'] == 4) & (self.data['RAWX'] == 12)) &
                                  ~((self.data['CCDNR'] == 5) & (self.data['RAWX'] == 11)) &
                                  ~((self.data['CCDNR'] == 10) & (self.data['RAWX'] == 28))]

    def remove_hot_pixels(self):
        """Remove hot pixels from the event list data."""
        hot_pixels = {'EPN': data_util / 'hotpix_PN.csv',
                      'EMOS1': data_util / 'hotpix_M1.csv',
                      'EMOS2': data_util / 'hotpix_M2.csv'}
        pre = self.N_events
        tab_hotpix = Table.read(hot_pixels[self.instrument], format='csv')
        tab_hotpix = tab_hotpix[(self.revolution > tab_hotpix['REV1']) & (self.revolution < tab_hotpix['REV2'])]
        if len(tab_hotpix) == 0:
            logger.info(f'No Hot Pixels found for {self.instrument}')
            return None
        self.data = setdiff(table1=self.data, table2=tab_hotpix, keys=['CCDNR', 'RAWX', 'RAWY'])
        logger.info(f'Removed Hot Pixels {self.instrument:<5} | Pre: {pre:,} Post: {self.N_events:,} (-{pre - self.N_events:,})')


    def remove_borders(self):
        """
        For PN the RAWY is the long axis. The removal of 1px from each side gets rid of the weird hot-spot
        that appears between two of the CCDs.

        PrimeFullWindow             PrimeLargeWindow   PrimeSmallWindow
        & PrimeFullWindowExtended
        RAWY MAX: 200               RAWY MAX: 200      RAWY MAX: 200
        RAWY MIN: 13                RAWY MIN: 102      RAWY MIN: 137
        RAWX MAX: 64                RAWX MAX: 64       RAWX MAX: 64
        RAWX MIN: 1                 RAWX MIN: 1        RAWX MIN: 1
        """
        margin = 3
        if self.instrument == 'EPN' and (self.submode == 'PrimeFullWindow' or self.submode == 'PrimeFullWindowExtended'):
            logger.info(f'Removing Borders: {self.instrument} {self.submode}')
            self.data = self.data[self.data['RAWY'] > 20+margin]
            self.data = self.data[self.data['RAWY'] < 200-margin]
            self.data = self.data[self.data['RAWX'] < 64-margin]
            self.data = self.data[self.data['RAWX'] > 1+margin]

        if self.instrument == 'EPN' and self.submode == 'PrimeLargeWindow':
            logger.info(f'Removing Borders: {self.instrument} {self.submode}')
            self.data = self.data[self.data['RAWY'] < 200-margin]
            self.data = self.data[self.data['RAWX'] < 64-margin]
            self.data = self.data[self.data['RAWX'] > 1+margin]

    def get_ccd_bins(self):
        """Get the CCD bins for the instrument. This is used in calculating the bad CCD bins."""
        ccd_bins = list(set(self.data['CCDNR']))
        ccd_bins.append(ccd_bins[-1] + 1)  # Just to get a right edge for the final bin
        return ccd_bins

    def get_events_within_bounds(self, X_lo, X_hi, Y_lo, Y_hi, TIME_lo, TIME_hi):
        evt_subset = self.data[(self.data['X'] > X_lo)       & (self.data['X'] < X_hi) &
                               (self.data['Y'] > Y_lo)       & (self.data['Y'] < Y_hi) &
                               (self.data['TIME'] > TIME_lo) & (self.data['TIME'] < TIME_hi)]
        return evt_subset

    def get_high_energy_lc(self, time_interval):
        min_energy_he = 10.0     # minimum extraction energy for High Energy Background events
        max_energy_he = 12.0     # maximum extraction energy for High Energy Background events
        time_interval_gti = min(time_interval, 100) #100  # Window Size to use for GTI extraction
        data = self.data
        time_min = self.time_min
        time_max = self.time_max
        logger.info(f'min_energy_he = {min_energy_he} max_energy_he = {max_energy_he} time_interval_gti = {time_interval_gti}')
        data_he = np.array(data['TIME'][(data['PI'] > min_energy_he * 1000) & (data['PI'] < max_energy_he * 1000)])

        t_bin_he = np.arange(time_min, time_max, time_interval_gti)
        lc_he = np.histogram(data_he, bins=t_bin_he)[0] / time_interval_gti  # Divide by the bin size to get in ct/s
        return t_bin_he, lc_he

    def unload_data(self):
        del(self.data)
        self.is_read = False

    @property
    def N_events(self):
        return len(self.data)

    @property
    def mean_rate(self):
        return self.N_events / self.exposure

    @property
    def info(self):
        info = {
            'filename'      : self.filename,
            'obsid'         : self.obsid,
            'N_event_lists' : self.N_event_lists,
            'instrument'    : self.instrument,
            'submode'       : self.submode,
            'date'          : self.date,
            'revolution'    : self.revolution,
            'object'        : self.object,
            'exposure'      : self.exposure,
            'pnt_angle'     : self.pnt_angle,
            'N_events'      : self.N_events,
            'mean_rate'     : self.mean_rate
            }
        for k, v in info.items():
            logger.info(f'{k:>14} : {v}')
        return info
from_event_lists(event_lists) classmethod

Create a merged event list from a list of existing ones. Args: event_lists (list): list of EventList() objects. Returns: event_list (EventList): New merged eventlist object.

Source code in exod/xmm/event_list.py
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
@classmethod
def from_event_lists(cls, event_lists):
    """
    Create a merged event list from a list of existing ones.
    Args:
        event_lists (list): list of EventList() objects.
    Returns:
        event_list (EventList): New merged eventlist object.
    """
    # EventList Object to return
    event_list = cls.__new__(cls) # Create the object without calling .__init__()

    for e in event_lists:
        e.read()

    # Remove unsupported EventLists
    event_lists = [e for e in event_lists if e.is_supported_submode()]
    if not event_lists:
        raise NotImplementedError(f'None of the supplied event lists are in a supported submode!')

    # Store the starts and ends of all event lists
    starts = [e.time_min for e in event_lists]
    stops = [e.time_max for e in event_lists]
    latest_start = max(starts)
    earliest_stop = min(stops)

    # Combine the data into a single table
    data_stacked = vstack([e.data for e in event_lists])

    # Crop when instruments are not all online
    # data_stacked = data_stacked[(data_stacked['TIME']>latest_start)&(data_stacked['TIME']<earliest_stop)]

    # Unload the data from the constituent event lists to save memory
    # for e in event_lists:
    #     e.unload_data()

    # Write Attributes
    event_list.path     = 'merged'
    event_list.filename = str([e.filename for e in event_lists])
    event_list.is_read  = True
    event_list.is_merged = True

    event_list.hdul   = None
    event_list.header = None
    event_list.data   = data_stacked

    event_list.event_lists   = event_lists
    event_list.N_event_lists = len(event_lists)
    event_list.obsid         = event_lists[0].obsid
    event_list.instrument    = [e.instrument for e in event_lists]
    event_list.submode       = [e.submode for e in event_lists]
    event_list.date          = event_lists[0].date
    event_list.revolution    = event_lists[0].revolution
    event_list.object        = event_lists[0].object
    event_list.time_min      = latest_start
    event_list.time_max      = earliest_stop
    event_list.exposure      = event_list.time_max - event_list.time_min
    event_list.pnt_angle     = event_lists[0].pnt_angle
    # event_list.N_events      = len(data_stacked)
    # event_list.mean_rate     = event_list.N_events / event_list.exposure
    return event_list
get_ccd_bins()

Get the CCD bins for the instrument. This is used in calculating the bad CCD bins.

Source code in exod/xmm/event_list.py
208
209
210
211
212
def get_ccd_bins(self):
    """Get the CCD bins for the instrument. This is used in calculating the bad CCD bins."""
    ccd_bins = list(set(self.data['CCDNR']))
    ccd_bins.append(ccd_bins[-1] + 1)  # Just to get a right edge for the final bin
    return ccd_bins
remove_borders()

For PN the RAWY is the long axis. The removal of 1px from each side gets rid of the weird hot-spot that appears between two of the CCDs.

PrimeFullWindow PrimeLargeWindow PrimeSmallWindow & PrimeFullWindowExtended RAWY MAX: 200 RAWY MAX: 200 RAWY MAX: 200 RAWY MIN: 13 RAWY MIN: 102 RAWY MIN: 137 RAWX MAX: 64 RAWX MAX: 64 RAWX MAX: 64 RAWX MIN: 1 RAWX MIN: 1 RAWX MIN: 1

Source code in exod/xmm/event_list.py
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
def remove_borders(self):
    """
    For PN the RAWY is the long axis. The removal of 1px from each side gets rid of the weird hot-spot
    that appears between two of the CCDs.

    PrimeFullWindow             PrimeLargeWindow   PrimeSmallWindow
    & PrimeFullWindowExtended
    RAWY MAX: 200               RAWY MAX: 200      RAWY MAX: 200
    RAWY MIN: 13                RAWY MIN: 102      RAWY MIN: 137
    RAWX MAX: 64                RAWX MAX: 64       RAWX MAX: 64
    RAWX MIN: 1                 RAWX MIN: 1        RAWX MIN: 1
    """
    margin = 3
    if self.instrument == 'EPN' and (self.submode == 'PrimeFullWindow' or self.submode == 'PrimeFullWindowExtended'):
        logger.info(f'Removing Borders: {self.instrument} {self.submode}')
        self.data = self.data[self.data['RAWY'] > 20+margin]
        self.data = self.data[self.data['RAWY'] < 200-margin]
        self.data = self.data[self.data['RAWX'] < 64-margin]
        self.data = self.data[self.data['RAWX'] > 1+margin]

    if self.instrument == 'EPN' and self.submode == 'PrimeLargeWindow':
        logger.info(f'Removing Borders: {self.instrument} {self.submode}')
        self.data = self.data[self.data['RAWY'] < 200-margin]
        self.data = self.data[self.data['RAWX'] < 64-margin]
        self.data = self.data[self.data['RAWX'] > 1+margin]
remove_hot_pixels()

Remove hot pixels from the event list data.

Source code in exod/xmm/event_list.py
167
168
169
170
171
172
173
174
175
176
177
178
179
def remove_hot_pixels(self):
    """Remove hot pixels from the event list data."""
    hot_pixels = {'EPN': data_util / 'hotpix_PN.csv',
                  'EMOS1': data_util / 'hotpix_M1.csv',
                  'EMOS2': data_util / 'hotpix_M2.csv'}
    pre = self.N_events
    tab_hotpix = Table.read(hot_pixels[self.instrument], format='csv')
    tab_hotpix = tab_hotpix[(self.revolution > tab_hotpix['REV1']) & (self.revolution < tab_hotpix['REV2'])]
    if len(tab_hotpix) == 0:
        logger.info(f'No Hot Pixels found for {self.instrument}')
        return None
    self.data = setdiff(table1=self.data, table2=tab_hotpix, keys=['CCDNR', 'RAWX', 'RAWY'])
    logger.info(f'Removed Hot Pixels {self.instrument:<5} | Pre: {pre:,} Post: {self.N_events:,} (-{pre - self.N_events:,})')

observation

Observation

Class for handling XMM-Newton observations.

Attributes:

Name Type Description
obsid str

The observation ID.

path_raw Path

Path to the raw data directory (unprocessed).

path_processed Path

Path to the processed data directory (filtered).

path_results Path

Path to the results directory.

events_raw list

List of raw event lists.

events_processed list

List of processed event lists.

events_processed_pn list

List of processed PN event lists.

events_processed_mos1 list

List of processed MOS1 event lists.

events_processed_mos2 list

List of processed MOS2 event lists.

events_overlapping_subsets list

List of overlapping event list subsets.

images list

List of images.

source_list list

List of source lists (OMSMLI files).

Source code in exod/xmm/observation.py
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
class Observation:
    """
    Class for handling XMM-Newton observations.

    Attributes:
        obsid (str): The observation ID.
        path_raw (Path): Path to the raw data directory (unprocessed).
        path_processed (Path): Path to the processed data directory (filtered).
        path_results (Path): Path to the results directory.
        events_raw (list): List of raw event lists.
        events_processed (list): List of processed event lists.
        events_processed_pn (list): List of processed PN event lists.
        events_processed_mos1 (list): List of processed MOS1 event lists.
        events_processed_mos2 (list): List of processed MOS2 event lists.
        events_overlapping_subsets (list): List of overlapping event list subsets.
        images (list): List of images.
        source_list (list): List of source lists (OMSMLI files).
    """
    def __init__(self, obsid):
        self.obsid = obsid
        self.path_raw = data_raw / obsid
        self.path_processed = data_processed / obsid
        self.path_results = data_results / obsid
        self.make_dirs()

        self.events_raw = []
        self.events_processed = []
        self.events_processed_pn = []
        self.events_processed_mos1 = []
        self.events_processed_mos2 = []
        self.events_overlapping_subsets = []

        self.images = []

        self.source_list = []

    def __repr__(self):
        return f"Observation('{self.obsid}')"

    def make_dirs(self):
        os.makedirs(self.path_raw, exist_ok=True)
        os.makedirs(self.path_processed, exist_ok=True)
        os.makedirs(self.path_results, exist_ok=True)

    def download_events(self):
        download_observation_events(self.obsid)

    def filter_events(self, min_energy=0.2, max_energy=12.0, clobber=False):
        filter_obsid_events(observation=self, min_energy=min_energy, max_energy=max_energy, clobber=clobber)

    def create_images(self, ximagebinsize=80, yimagebinsize=80, clobber=False):
        create_obsid_images(observation=self, ximagebinsize=ximagebinsize, yimagebinsize=yimagebinsize, clobber=clobber)

    def get_event_lists_raw(self):
        evt_raw = list(self.path_raw.glob('*EVLI*FTZ'))
        self.events_raw = [EventList(e) for e in evt_raw]

    def get_event_lists_processed(self):
        evt_processed = list(self.path_processed.glob('*FILT.fits'))
        self.events_processed      = [EventList(e) for e in evt_processed]
        self.events_processed_pn   = [e for e in self.events_processed if 'PI' in e.filename]
        self.events_processed_mos1 = [e for e in self.events_processed if 'M1' in e.filename]
        self.events_processed_mos2 = [e for e in self.events_processed if 'M2' in e.filename]

    def get_source_list(self):
        try:
            self.source_list = list(self.path_raw.glob('*EP*OBSMLI*FTZ'))[0]
        except IndexError:
            raise NotImplementedError(f'No EPIC OBSMLI file found!')

    def get_images(self):
        img_processed = list(self.path_processed.glob('*IMG.fits'))
        self.images = [Image(i) for i in img_processed]

    def get_files(self):
        logger.info(f'Getting {self} Files...')
        self.get_event_lists_raw()
        self.get_event_lists_processed()
        self.get_images()
        self.get_source_list()

    def get_events_overlapping_subsets(self):
        """
        Get the overlapping eventlists for a given observation.

        Returns the subsets as a list of lists
        [[001PI.fits, 001M1.fits, 001M2.fits], [002PI.fits, 002M1.fits, 002M2.fits]]
        """
        self.get_event_lists_processed()

        if len(self.events_processed) == 0:
            raise KeyError(f'No eventlists found for {self.obsid}')

        subsets = get_overlapping_eventlist_subsets(self.events_processed)
        self.events_overlapping_subsets = subsets
        return self.events_overlapping_subsets


    def get_number_of_overlapping_subsets(self):
        return len(self.events_overlapping_subsets)


    @property
    def info(self):
        info = {'obsid' : self.obsid}

        for i, evt in enumerate(self.events_raw):
            info[f'evt_raw_{i}'] = evt.filename

        for i, evt in enumerate(self.events_processed):
            info[f'evt_filt_{i}'] = evt.filename

        for i, img in enumerate(self.images):
            info[f'img_{i}'] = img.filename

        info[f'source_list'] = self.source_list.name

        for k, v in info.items():
            logger.info(f'{k:>10} : {v}')
        return info
get_events_overlapping_subsets()

Get the overlapping eventlists for a given observation.

Returns the subsets as a list of lists [[001PI.fits, 001M1.fits, 001M2.fits], [002PI.fits, 002M1.fits, 002M2.fits]]

Source code in exod/xmm/observation.py
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
def get_events_overlapping_subsets(self):
    """
    Get the overlapping eventlists for a given observation.

    Returns the subsets as a list of lists
    [[001PI.fits, 001M1.fits, 001M2.fits], [002PI.fits, 002M1.fits, 002M2.fits]]
    """
    self.get_event_lists_processed()

    if len(self.events_processed) == 0:
        raise KeyError(f'No eventlists found for {self.obsid}')

    subsets = get_overlapping_eventlist_subsets(self.events_processed)
    self.events_overlapping_subsets = subsets
    return self.events_overlapping_subsets

get_overlapping_eventlist_subsets(event_lists)

Return the overlapping eventlists for a given observation. Parameters: event_lists (list): list of EventList() objects that have been .read()

In most cases this will just return a list of 1 list like [[M1.fits, M2.fits, PN.fits'] with the event files. However, for some observations it will return multiple entires if there have been multiple seperate observations. This function will return all the overlapping subsets. e.g. for 2 subsets this will return: [[],[]]

Source code in exod/xmm/observation.py
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
def get_overlapping_eventlist_subsets(event_lists):
    """
    Return the overlapping eventlists for a given observation.
    Parameters:
        event_lists (list): list of EventList() objects that have been .read()

    In most cases this will just return a list of 1 list like
    [[M1.fits, M2.fits, PN.fits'] with the event files. However,
    for some observations it will return multiple entires if there
    have been multiple seperate observations. This function will return
    all the overlapping subsets. e.g. for 2 subsets this will return: [[],[]]
    """
    def intervals_overlap(I1, I2):
        return I1[0] <= I2[1] and I1[1] >= I2[0]

    for e in event_lists:
        e.read()

    disjoint_set = DisjointSet(event_lists)
    file_intervals = {e: [e.time_min, e.time_max] for e in event_lists}

    for d1, d2 in combinations(file_intervals.items(), r=2):
        f1, I1 = d1[0], d1[1]
        f2, I2 = d2[0], d2[1]

        if intervals_overlap(I1, I2):
            disjoint_set.merge(f1, f2)
            logger.debug(f'f1: {f1.filename} : {I1} OVERLAP!')
            logger.debug(f'f2: {f2.filename} : {I2} OVERLAP!')
        else:
            logger.debug(f'f1: {f1.filename} : {I1} NO OVERLAP!')
            logger.debug(f'f2: {f2.filename} : {I2} NO OVERLAP!')
        logger.debug('=============')

    subsets = disjoint_set.subsets()
    logger.info(f'Found {len(subsets)} overlapping subsets.')

    subsets_to_return = []
    for s in subsets:
        if len(s) > 3:
            logger.info(f'Overlapping subset has more than 3 event files ({len(s)})')
            logger.info(f'Just going to take the longest three')
            m1_evt = [e for e in s if 'M1' in e.filename]
            m2_evt = [e for e in s if 'M2' in e.filename]
            pn_evt = [e for e in s if 'PN' in e.filename]

            m1_longest = max(m1_evt, key=lambda x: x.exposure)
            m2_longest = max(m2_evt, key=lambda x: x.exposure)
            pn_longest = max(pn_evt, key=lambda x: x.exposure)
            subset = [m1_longest, m2_longest, pn_longest]
            subsets_to_return.append(subset)
        else:
            subsets_to_return.append(list(s))

    # Sort the inner lists and outer lists
    subsets_to_return = [sorted(subset, key=lambda x: x.filename) for subset in subsets_to_return]
    subsets_to_return = sorted(subsets_to_return, key=lambda subset: subset[0].filename)
    return subsets_to_return