Welcome to katdal’s documentation!¶
User guide¶
Introduction to katdal¶
Data access library for data sets in the MeerKAT Visibility Format (MVF)
Overview¶
This module serves as a data access library to interact with the chunk stores and HDF5 files produced by the MeerKAT radio telescope and its predecessors (KAT-7 and Fringe Finder). It uses memory carefully, allowing data sets to be inspected and partially loaded into memory. Data sets may be concatenated and split via a flexible selection mechanism. In addition, it provides a script to convert these data sets to CASA MeasurementSets.
Quick Tutorial¶
Open any data set through a single function to obtain a data set object:
import katdal
d = katdal.open('1234567890.h5')
This automatically determines the version and storage location of the data set. The versions roughly map to the various instruments:
- v1 : Fringe Finder (HDF5 file)
- v2 : KAT-7 (HDF5 file)
- v3 : MeerKAT (HDF5 file)
- v4 : MeerKAT (chunk store based on objects in Ceph)
Multiple data sets (even of different versions) may also be concatenated together (as long as they have the same dump rate):
d = katdal.open(['1234567890.h5', '1234567891.h5'])
Inspect the contents of the data set by printing the object:
print(d)
Here is a typical output:
===============================================================================
Name: 1313067732.h5 (version 2.0)
===============================================================================
Observer: someone Experiment ID: 2118d346-c41a-11e0-b2df-a4badb44fe9f
Description: 'Track on Hyd A,Vir A, 3C 286 and 3C 273'
Observed from 2011-08-11 15:02:14.072 SAST to 2011-08-11 15:19:47.810 SAST
Dump rate: 1.00025 Hz
Subarrays: 1
ID Antennas Inputs Corrprods
0 ant1,ant2,ant3,ant4,ant5,ant6,ant7 14 112
Spectral Windows: 1
ID CentreFreq(MHz) Bandwidth(MHz) Channels ChannelWidth(kHz)
0 1822.000 400.000 1024 390.625
-------------------------------------------------------------------------------
Data selected according to the following criteria:
subarray=0
ants=['ant1', 'ant2', 'ant3', 'ant4', 'ant5', 'ant6', 'ant7']
spw=0
-------------------------------------------------------------------------------
Shape: (1054 dumps, 1024 channels, 112 correlation products) => Size: 967.049 MB
Antennas: *ant1,ant2,ant3,ant4,ant5,ant6,ant7 Inputs: 14 Autocorr: yes Crosscorr: yes
Channels: 1024 (index 0 - 1023, 2021.805 MHz - 1622.195 MHz), each 390.625 kHz wide
Targets: 4 selected out of 4 in catalogue
ID Name Type RA(J2000) DEC(J2000) Tags Dumps ModelFlux(Jy)
0 Hyd A radec 9:18:05.28 -12:05:48.9 333 33.63
1 Vir A radec 12:30:49.42 12:23:28.0 251 166.50
2 3C 286 radec 13:31:08.29 30:30:33.0 230 12.97
3 3C 273 radec 12:29:06.70 2:03:08.6 240 39.96
Scans: 8 selected out of 8 total Compscans: 1 selected out of 1 total
Date Timerange(UTC) ScanState CompScanLabel Dumps Target
11-Aug-2011/13:02:14 - 13:04:26 0:slew 0: 133 0:Hyd A
13:04:27 - 13:07:46 1:track 0: 200 0:Hyd A
13:07:47 - 13:08:37 2:slew 0: 51 1:Vir A
13:08:38 - 13:11:57 3:track 0: 200 1:Vir A
13:11:58 - 13:12:27 4:slew 0: 30 2:3C 286
13:12:28 - 13:15:47 5:track 0: 200 2:3C 286
13:15:48 - 13:16:27 6:slew 0: 40 3:3C 273
13:16:28 - 13:19:47 7:track 0: 200 3:3C 273
The first segment of the printout displays the static information of the data set, including observer, dump rate and all the available subarrays and spectral windows in the data set. The second segment (between the dashed lines) highlights the active selection criteria. The last segment displays dynamic information that is influenced by the selection, including the overall visibility array shape, antennas, channel frequencies, targets and scan info.
The data set is built around the concept of a three-dimensional visibility array with dimensions of time, frequency and correlation product. This is reflected in the shape of the dataset:
d.shape
which returns (1054, 1024, 112), meaning 1054 dumps by 1024 channels by 112 correlation products.
Let’s select a subset of the data set:
d.select(scans='track', channels=slice(200,300), ants='ant4')
print(d)
This results in the following printout:
===============================================================================
Name: /Users/schwardt/Downloads/1313067732.h5 (version 2.0)
===============================================================================
Observer: siphelele Experiment ID: 2118d346-c41a-11e0-b2df-a4badb44fe9f
Description: 'track on Hyd A,Vir A, 3C 286 and 3C 273 for Lud'
Observed from 2011-08-11 15:02:14.072 SAST to 2011-08-11 15:19:47.810 SAST
Dump rate: 1.00025 Hz
Subarrays: 1
ID Antennas Inputs Corrprods
0 ant1,ant2,ant3,ant4,ant5,ant6,ant7 14 112
Spectral Windows: 1
ID CentreFreq(MHz) Bandwidth(MHz) Channels ChannelWidth(kHz)
0 1822.000 400.000 1024 390.625
-------------------------------------------------------------------------------
Data selected according to the following criteria:
channels=slice(200, 300, None)
subarray=0
scans='track'
ants='ant4'
spw=0
-------------------------------------------------------------------------------
Shape: (800 dumps, 100 channels, 4 correlation products) => Size: 2.560 MB
Antennas: ant4 Inputs: 2 Autocorr: yes Crosscorr: no
Channels: 100 (index 200 - 299, 1943.680 MHz - 1905.008 MHz), each 390.625 kHz wide
Targets: 4 selected out of 4 in catalogue
ID Name Type RA(J2000) DEC(J2000) Tags Dumps ModelFlux(Jy)
0 Hyd A radec 9:18:05.28 -12:05:48.9 200 31.83
1 Vir A radec 12:30:49.42 12:23:28.0 200 159.06
2 3C 286 radec 13:31:08.29 30:30:33.0 200 12.61
3 3C 273 radec 12:29:06.70 2:03:08.6 200 39.32
Scans: 4 selected out of 8 total Compscans: 1 selected out of 1 total
Date Timerange(UTC) ScanState CompScanLabel Dumps Target
11-Aug-2011/13:04:27 - 13:07:46 1:track 0: 200 0:Hyd A
13:08:38 - 13:11:57 3:track 0: 200 1:Vir A
13:12:28 - 13:15:47 5:track 0: 200 2:3C 286
13:16:28 - 13:19:47 7:track 0: 200 3:3C 273
Compared to the first printout, the static information has remained the same while the dynamic information now reflects the selected subset. There are many possible selection criteria, as illustrated below:
d.select(timerange=('2011-08-11 13:10:00', '2011-08-11 13:15:00'), targets=[1, 2])
d.select(spw=0, subarray=0)
d.select(ants='ant1,ant2', pol='H', scans=(0,1,2), freqrange=(1700e6, 1800e6))
See the docstring of DataSet.select()
for more detailed information (i.e.
do d.select? in IPython). Take note that only one subarray and one spectral
window can be selected at a time.
Once a subset of the data has been selected, you can access the data and timestamps on the data set object:
vis = d.vis[:]
timestamps = d.timestamps[:]
Note the [:] indexing, as the vis and timestamps properties are special
LazyIndexer
objects that only give you the actual data when you use
indexing, in order not to inadvertently load the entire array into memory.
For the example dataset and no selection the vis array will have a shape of (1054, 1024, 112). The time dimension is labelled by d.timestamps, the frequency dimension by d.channel_freqs and the correlation product dimension by d.corr_products.
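For example, the dimension labels can be lined up against the visibility shape as follows (the printed values are illustrative and depend on the data set):
print(d.shape)              # (1054, 1024, 112): dumps x channels x corrprods
print(d.timestamps[:3])     # UTC seconds since the Unix epoch, one per dump
print(d.channel_freqs[:3])  # centre frequency of each channel, in Hz
print(d.corr_products[:3])  # pairs of correlated input labels, e.g. ['ant1h', 'ant2h']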
Another key concept in the data set object is that of sensors. These are named time series of arbitrary data that are either loaded from the data set (actual sensors) or calculated on the fly (virtual sensors). Both variants are accessed through the sensor cache (available as d.sensor) and cached there after the first access. The data set object also provides convenient properties to expose commonly-used sensors, as shown in the plot example below:
import matplotlib.pyplot as plt
plt.plot(d.az, d.el, 'o')
plt.xlabel('Azimuth (degrees)')
plt.ylabel('Elevation (degrees)')
Other useful attributes include ra, dec, lst, mjd, u, v, w, target_x and target_y. These are all one-dimensional NumPy arrays that dynamically change length depending on the active selection.
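Named sensors can also be pulled directly from the sensor cache. The sensor name below is purely illustrative, as actual names differ between instruments (inspect d.sensor to see what is available):
# Hypothetical sensor name, for illustration only
azim = d.sensor['Antennas/ant1/pos_actual_scan_azim']
# The values are aligned with the dump timestamps of the current selection,
# so the result has one entry per selected dump
print(len(azim) == d.shape[0])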
As in katdal’s predecessor (scape) there is a DataSet.scans()
generator
that allows you to step through the scans in the data set. It returns the
scan index, scan state and target object on each iteration, and updates
the active selection on the data set to include only the current scan.
It is also possible to iterate through the compound scans with the
DataSet.compscans()
generator, which yields the compound scan index, label
and first target on each iteration for convenience. These two iterators may also
be used together to traverse the data set structure:
for compscan, label, target in d.compscans():
    plt.figure()
    for scan, state, target in d.scans():
        if state in ('scan', 'track'):
            plt.plot(d.ra, d.dec, 'o')
    plt.xlabel('Right ascension (J2000 degrees)')
    plt.ylabel('Declination (J2000 degrees)')
    plt.title(target.name)
Finally, all the targets (or fields) in the data set are stored in a catalogue available at d.catalogue, and the original HDF5 file is still accessible via a back door installed at d.file in the case of a single-file data set.
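For example (the file attribute only applies to single-file HDF5 data sets):
# List all targets in the catalogue and peek at the underlying HDF5 file
for target in d.catalogue.targets:
    print(target.name, target.tags)
print(d.file)   # underlying h5py.File object for v1-v3 data sets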
Tuning your application¶
It is possible to load data at high bandwidth using katdal: rates over 2.5 GB/s have been seen when loading from a local disk. However, it requires an understanding of the storage layout and choice of an appropriate access pattern.
This chapter is aimed at loading MVF version 4 (MeerKAT) data, as older versions typically contain far less data. Some of the advice is generic but some of the methods described here will not work on older data sets.
Chunking¶
The most important thing to understand is that the data is split into chunks, each of which is stored as a file on disk or an object in an S3 store. Retrieving any element of a chunk causes the entire chunk to be retrieved. Thus, aligning accesses to whole chunks will give the best performance, as no retrieved data is discarded.
As an illustration, consider an application that has an outer loop over the baselines, and loads data for one baseline at a time. Chunks typically span all baselines, so each time one baseline is loaded, katdal will actually load the entire data set. If the application can be redesigned to fetch data for a small time range for all baselines it will perform much better.
When using MVFv4, katdal uses dask to manage the chunking. After
opening a data set, you can determine the chunking for a particular
array by examining its dataset
member:
>>> d.vis.dataset
dask.array<1556179171-sdp, shape=(38, 4096, 40), dtype=complex64, chunksize=(32, 1024, 40)>
>>> d.vis.dataset.chunks
((32, 6), (1024, 1024, 1024, 1024), (40,))
For this data set, it will be optimal to load visibilities in 32 × 1024 × 40 element pieces.
Note that the chunking scheme may be different for visibilities, flags and weights.
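As a hedged sketch, a loop that walks the time axis in whole chunks (rather than baseline by baseline) might look like this; the file name is hypothetical:
import numpy as np
import katdal

d = katdal.open('1556179171_sdp_l0.full.rdb')
time_chunk = d.vis.dataset.chunks[0][0]   # e.g. 32 dumps per chunk

# Load whole time chunks, for all channels and baselines at once
for start in range(0, d.shape[0], time_chunk):
    stop = min(start + time_chunk, d.shape[0])
    vis = d.vis[start:stop]               # shape (dumps, channels, corrprods)
    print(start, stop, np.abs(vis).mean())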
Joint loading¶
The values returned by katdal are not the raw values stored in the chunks: there is processing involved, such as application of calibration solutions and flagging of missing data. Some of this processing is common between visibilities, flags and weights. It’s thus more efficient to load the visibilities, flags and weights as a single operation rather than as three separate operations.
This can be achieved using DaskLazyIndexer.get()
. For example,
replace
vis = d.vis[idx]
flags = d.flags[idx]
weights = d.weights[idx]
with
from katdal.lazy_indexer import DaskLazyIndexer

vis, flags, weights = DaskLazyIndexer.get([d.vis, d.flags, d.weights], idx)
Parallelism¶
Dask uses multiple worker threads. It defaults to one thread per CPU core, but for I/O-bound tasks this is often not enough to achieve maximum throughput. Refer to the dask scheduler documentation for details of how to configure the number of workers.
More workers only helps if there is enough parallel work to be performed, which means there need to be at least as many chunks loaded at a time as there are workers (and preferably many more). It’s thus advisable to load as much data at a time as possible without running out of memory.
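One possible way to raise the worker count (illustrative only; see the dask scheduler documentation for the full range of options) is to configure a larger thread pool before loading data:
from multiprocessing.pool import ThreadPool

import dask

# 16 worker threads is an arbitrary example value; tune it for your storage
dask.config.set(pool=ThreadPool(16))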
Selection¶
Using DataSet.select()
is relatively expensive. For the best
performance, it should only be used occasionally (for example, to filter
out unwanted data at the start), with array access notation or
DaskLazyIndexer.get()
used to break up large data sets into
manageable pieces.
Dask also performs better with selections that select contiguous data.
You might be able to get a little more performance by using
DataSet.scans()
(which will yield a series of contiguous
selections) rather than using select()
with
scans='track'
.
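A sketch of that approach, combining per-scan iteration with joint loading (assuming d has already been opened as above):
import numpy as np
from katdal.lazy_indexer import DaskLazyIndexer

for scan_index, state, target in d.scans():
    if state != 'track':
        continue
    # Each scan is a contiguous block of dumps, loaded in one joint operation
    vis, flags, weights = DaskLazyIndexer.get([d.vis, d.flags, d.weights], np.s_[:])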
When using MVF v4 one can also pass a preselect parameter to katdal.open()
which allows slicing a subset of the data (time and frequency). It is more
limited than DataSet.select()
(it can only select contiguous ranges, and
can only specify the selection in terms of channels and dumps), but if a script
is only interested in working on a subset of data, this method can be more
efficient and uses less memory.
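For example, something along these lines (the exact form of the preselect argument is described in the katdal.open() docstring; the file name and values here are illustrative):
import katdal

d = katdal.open('1556179171_sdp_l0.full.rdb',
                preselect=dict(dumps=slice(0, 1000), channels=slice(2000, 2500)))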
Network versus local disk¶
When loading data from the network, latency is typically higher, and so more workers will be needed to achieve peak throughput. Network access is also more sensitive to access patterns that are mis-aligned with chunks, because chunks are not cached in memory by the operating system and hence must be re-fetched over the network if they are accessed again.
Benchmarking¶
To assist with testing out the effects of changing these tuning
parameters, the katdal source code includes a script called
mvf_read_benchmark.py
that allows a data set to be loaded in
various ways and reports the average throughput. The command-line
options are somewhat limited so you may need to edit it yourself, for
example, to add a custom selection.
Sign conventions¶
Visibilities¶
For a wave with frequency \(\omega\) and wave number \(k\), the phasor is
Visibilities are then \(e_1 \overline{e_2}\).
In KAT-7, the opposite sign convention is used in the HDF5 files, but katdal conjugates the visibilities to match MeerKAT.
Baseline coordinates¶
The UVW coordinates for the baseline (A, B) are \((u, v, w)_A - (u, v, w)_B\). Combined with the above, this means that ideal visibilities (ignoring any effects apart from geometric delay) are
Polarisation¶
KAT-7 and MeerKAT are linear feed systems. On MeerKAT, if one points one’s right thumb in the direction of vertical polarisation and the right index finger in the direction of horizontal polarisation, then the right middle finger points from the antenna towards the source.
When exporting to a Measurement Set, katdal maps H to (IEEE) x and V to y, and introduces a 90° offset to the parallactic angle rotation.
KAT-7 has the opposite convention for polarisation (due to the lack of a sub-reflector). katdal does not make any effort to compensate for this. Measurement sets exported from KAT-7 data should thus not be used for polarimetry without further correction.
Data set format reference¶
In most cases users should not need to know the details of the data set formats, because katdal exists to hide these details and present a consistent, user-friendly view. It also contains workarounds for known issues in older data sets (which are not documented here). This is reference documentation useful to katdal developers and to power users who need to extract information not presented by the katdal interface.
MVF version 1 (Fringe Finder)¶
Last updated: 9 March 2010
This documents the HDF5 file format as written by augment version 4. Also indicated by
(**) are those parts of the data file used by the scape
package, which
handles single-dish or single-baseline back-end processing for the Fringe Finder.
HDF5 files contain a hierarchical structure inside, with groups connected as
nodes in a directed graph. Each group may contain datasets (multi-dimensional arrays) and
attributes (metadata such as strings and smaller arrays). More information on
these concepts can be found on the HDF5 website.
The format currently calls for a single HDF5 file per experiment, but this may
change in future.
The hierarchical structure of a typical data set, with levels ‘Experiment’ -> ‘Compound Scan’ -> ‘Scan’, is reflected in the HDF5 group structure. The root ‘/’ group of the file represents the ‘Experiment’ level and contains the groups ‘Antennas’, ‘Correlator’ and ‘Scans’. In addition, the root ‘/’ group contains the augment log dataset.
The ‘Antennas’ group contains a group for each physical antenna, named ‘Antenna%d’, where %d should be replaced by the one-based physical antenna number.
The ‘Scans’ group contains a group for each compound scan, named ‘CompoundScan%d’, where %d should be replaced by the zero-based compound scan index integer. Each ‘CompoundScan%d’ group contains a group for each scan, named ‘Scan%d’, where %d should be replaced by the zero-based scan index integer. The group structure of a typical dataset is shown below:
/
/Antennas
/Antennas/Antenna1
/Antennas/Antenna1/H
/Antennas/Antenna1/V
/Antennas/Antenna1/Sensors
/Antennas/Antenna2
/Antennas/Antenna2/H
/Antennas/Antenna2/V
/Antennas/Antenna2/Sensors
/Correlator
/Scans
/Scans/CompoundScan0
/Scans/CompoundScan0/Scan0
/Scans/CompoundScan0/Scan1
/Scans/CompoundScan0/Scan2
/Scans/CompoundScan1
/Scans/CompoundScan1/Scan0
/Scans/CompoundScan1/Scan1
/Scans/CompoundScan1/Scan2
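A quick way to explore this structure interactively is with h5py (the file name is illustrative):
import h5py

with h5py.File('1234567890.h5', 'r') as f:
    f.visit(print)             # print every group and dataset path
    print(dict(f['/'].attrs))  # attributes of the root '/' group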
The following sections list the data entries in each group. Each entry is described by several colon-separated fields: its name, D or A (indicating whether it is an HDF5 dataset or attribute) and its data type (with optionality where relevant), followed by a general description on the next line.
The root ‘/’ group¶
experiment_id : A : string
The ID for this experiment (usually a UUID generated by the system at observe time).
observer : A : string
The observer who ran the experiment. Data files are copied into the archive using the observer as part of the file hierarchy.
description : A : string
Description of the experiment as specified by the observer at observe time.
k7w_file_version : A : integer, optional
Version of the k7writer program used to generate the raw data file.
data_unit : A : string, one of {‘counts’, ‘K’, ‘Jy’}
Power unit recorded in the scans, typically ‘counts’ for uncalibrated correlator output or ‘K’ for Tsys-corrected data.
data_timestamps_at_sample_centers : A : bool
Boolean flag indicating whether data timestamps are aligned with the center (if True or 1) or the start (if False or 0) of each sample / integration period.
augment_version : A : string, optional
String added by augmenter, indicating the version of the augmented data format.
augment : A : string
String added by augmenter, indicating when the file was augmented. If this attribute is absent, the file contains unaugmented correlator data only and cannot be loaded by
scape
.
augment_log : D : record array, optional, of shape (N,2) with record:
{'section' : string, 'message' : string}
The augment program log output from the augmentation of this HDF5 file.
/Antennas¶
This has no additional attributes or datasets.
/Antennas/Antenna%d¶
description : A : string
Description string of antenna, used by the katpoint package to construct a katpoint.Antenna object via katpoint.construct_antenna(). The string includes antenna name, location, size, etc.
/Antennas/Antenna%d/H¶
dbe_input : A : string
DBE input mapping for the H channel.
delay_s : A : float64 (???)
Cable delay in seconds for the H channel.
coupler_nd_model : D : float64 array of shape (N,2) ???
Table containing N frequencies in Hz in the first column and measured temperatures in K in the second column.
pin_nd_model : D : float64 array of shape (N,2) ???
Table containing N frequencies in Hz in the first column and measured temperatures in K in the second column.
/Antennas/Antenna%d/V¶
dbe_input : A : string
DBE input mapping for the V channel.
delay_s : A : float64 (???)
Cable delay in seconds for the V channel.
coupler_nd_model : D : float64 array of shape (N,2) ???
Table containing N frequencies in Hz in the first column and measured temperatures in K in the second column.
pin_nd_model : D : float64 array of shape (N,2) ???
Table containing N frequencies in Hz in the first column and measured temperatures in K in the second column.
/Antennas/Antenna%d/Sensors¶
This comprises a set of record arrays, each corresponding to sensors on the telescope. Each record array has 4 attributes: name, description, units, type (??? indicate here which items are optional for scape):
name : A : string
The katcp name of the sensor.
description : A : string, optional
Human-understandable (hopefully) description of the sensor.
units : A : string, optional
The units for the sensor readings. Currently scape does not take any notice of these attributes and assumes ‘standard’ units for the sensors.
type : A : string, optional???
Data type for the sensor readings.
The set of record arrays (all optional for scape ???) is as below and may be missing if no sensor data was recorded during the collection of the correlator data (???). The ‘status’ field for each is the sensor status recorded along with the sensor data. This could be ‘nominal’, ‘warn’ (sensor in warning range), ‘error’ (sensor in error range), ‘failure’ (comms error to sensor), ‘unknown’ (no initial value yet). Currently, scape does not take any notice of these sensor statuses (???).
enviro_air_pressure : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Environmental measurements of air pressure at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The air pressure is assumed by scape to be in mbar. The status is the sensor status as described above.
enviro_air_relative_humidity : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Environmental measurements of air relative humidity at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The humidity is assumed by scape to be a percentage. The status is the sensor status as described above.
enviro_air_temperature : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Environmental measurements of air temperature at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The air temperature is assumed by scape to be in deg C. The status is the sensor status as described above.
enviro_wind_direction : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Environmental measurements of wind direction at arbitrary time instants, where N is the number of wind data records. The timestamps are UTC seconds since the Unix epoch. The wind direction is assumed by scape to be in degrees increasing clockwise from North. The status is the sensor status as described above.
enviro_wind_speed : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Environmental measurements of wind speed at arbitrary time instants, where N is the number of wind data records. The timestamps are UTC seconds since the Unix epoch. The wind speed is assumed by scape to be in metres per second. The status is the sensor status as described above.
pos_actual_pointm_azim : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Measurements of actual azimuth after pointing model at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_actual_pointm_elev : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Measurements of actual elevation after pointing model at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_actual_refrac_azim : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Measurements of actual azimuth after refraction correction at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_actual_refrac_elev : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Measurements of actual elevation after refraction correction at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_actual_scan_azim : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Measurements of actual azimuth after scan offset at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_actual_scan_elev : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Measurements of actual elevation after scan offset at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_request_pointm_azim : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Requested azimuth after pointing model at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_request_pointm_elev : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Requested elevation after pointing model at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_request_refrac_azim : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Requested azimuth after refraction correction at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_request_refrac_elev : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Requested elevation after refraction correction at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_request_scan_azim : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Requested azimuth after scan offset at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in deg. The status is the sensor status as described above.
pos_request_scan_elev : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : float32, 'status' : string}
Requested elevation after scan offset at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in deg. The status is the sensor status as described above.
rfe3_rfe15_noise_coupler_on : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : string, 'status' : string}
Coupler noise diode firing flag at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The values are boolean wrapped in strings (‘0’ for off or ‘1’ for on). The status is the sensor status as described above.
rfe3_rfe15_noise_pin_on : D : record array of shape (N,3), with record:
{'timestamp' : float64, 'value' : string, 'status' : string}
Pin noise diode firing flag at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The values are boolean wrapped in strings (‘0’ for off or ‘1’ for on). The status is the sensor status as described above.
/Correlator¶
instrument_type : A : integer, optional
Correlator instrument type as per DBE.
instance_id : A : integer, optional
Correlator instance id as per DBE.
channel_bandwidth_hz : A : uint64, optional???
The width of each frequency channel in the scan data in Hz.
adc_sample_rate : A : uint64, optional???
ADC sample rate in Hz.
accum_per_int : A : uint64, optional???
Number of FFT frames (polyphase filterbank output samples) per integration. This is the number of samples from the output of each PFB channel that are multiplied together and accumulated in the correlator to form a single visibility sample.
num_freq_channels : A : uint64, optional???
Number of frequency channels contained in the scan data.
dump_rate_hz : A : float64
Correlator dump rate, in Hz. This should satisfy:
dump_rate = adc_sample_rate / (2 * num_freq_channels * accum_per_int)
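For example (purely illustrative numbers), an ADC sample rate of 800e6 Hz with 1024 frequency channels and an accum_per_int of 390625 gives dump_rate = 800e6 / (2 * 1024 * 390625) = 1.0 Hz.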
center_frequency_hz : A : float32???, optional
Center frequency of the scan data in Hz.
channel_select : D : bool array of shape (F)
Array of boolean values, indicating which channels should be processed. F is the number of frequency channels.
input_map : D : record array of shape (N,2) with record:
{'correlator_product_id' : integer, 'dbe_inputs' : string}
This is used, combined with the /Antennas/Antenna%d/H/dbe_input or
/Antennas/Antenna%d/V/dbe_input attribute to map the physical antenna
H or V channel to the DBE input used then through to the correlator output.
*N* is the number of correlation products.
/Scans¶
This has no additional attributes or datasets.
/Scans/CompoundScan%d¶
label : A : string
A label for this compound scan.
target : A : string
Description string of target, used by the katpoint package to construct a katpoint.Target object via katpoint.construct_target().
pointing_model : D : float32 array of shape (22,)
Pointing model used during experiment.
CorrelatorConfig¶
center_freqs : D : float64 array of shape (F,), optional
Center frequency of each channel, in Hz, where F is the number of channels. This is the main specification for center frequencies. If this dataset is not present, the center frequencies are assumed to be regularly spaced and calculated from the center_frequency_hz, bandwidth_hz and num_freq_channels attributes of this group, which must be present in this case.
bandwidths : D : float64 array of shape (F,), optional
Bandwidth of each channel, in Hz, where F is the number of channels. This is the main specification for channel bandwidths. If this dataset is not present, the bandwidths are all set to the bandwidth_hz attribute, which must be present in this case.
center_frequency_hz : A : uint64, optional
Center frequency of entire spectral band encompassing all channels. This is used to calculate channel center frequencies in the absence of the center_freqs dataset.
channel_bandwidth_hz : A : uint64, optional
Bandwidth of each channel. This is used as channel bandwidths in the absence of the bandwidths dataset.
num_freq_channels : A : uint64, optional
Number of channels.
adc_sample_rate : A : uint64, optional
ADC sample rate, in Hz.
Scan%d¶
data : D : record array of shape (T, F), with record:
{'AxBx' : {'r' : float32, 'i' : float32}, 'AyBy' : {'r' : float32, 'i' : float32}, 'AxBy' : {'r' : float32, 'i' : float32}, 'AyBx' : {'r' : float32, 'i' : float32}}
The record structure is compatible with the NumPy dtype:
[('AxBx', complex64), ('AyBy', complex64), ('AxBy', complex64), ('AyBx', complex64)]
Correlation data for a single baseline between antennas A and B (specified by antenna and antenna2, respectively), involving the x and y DBE inputs for each antenna (typically mapping to the H and V feeds, respectively). If antennas A and B correspond to the DBE antennas 0 and 1, respectively, the term ‘AxBy’ would refer to the correlation between the 0x and 1y DBE input signals. For a single dish, antennas A and B refer to the same antenna. Each correlation datum consists of a real (‘r’) and imaginary (‘i’) part, specified as 32-bit floats. The array has T rows and F columns, where T is the number of time samples in the scan and F is the number of frequency channels in the data set. The data has been scaled by accum_per_int, the number of samples that have been integrated into a single visibility.
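A hedged sketch of reading this dataset with h5py and reinterpreting it using the compatible complex dtype above (the file and scan names are illustrative):
import h5py
import numpy as np

complex_dtype = np.dtype([('AxBx', np.complex64), ('AyBy', np.complex64),
                          ('AxBy', np.complex64), ('AyBx', np.complex64)])
with h5py.File('1234567890.h5', 'r') as f:
    data = f['/Scans/CompoundScan0/Scan0/data'][:]
    vis = data.view(complex_dtype)   # same bytes, now viewed as complex64 fields
    print(vis['AxBx'].shape)         # (T, F)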
timestamps : D : uint64 or float64 array of shape (T,)
Timestamps of the start of each sample in number of UTC milliseconds since the Unix epoch, which is the native format of the DBE. T is the number of time samples in the scan. If data_timestamps_at_sample_centers is True, the timestamps are aligned with the middle of each sample period instead.
pointing : D : record array of shape (T,), optional, with record:
{'az' : float32, 'el' : float32}
Pointing information, consisting of azimuth and elevation values per time sample, where T is the number of time samples in the scan. All angles are in degrees. The elevation should be between -90 and 90 degrees, while azimuth has no restrictions, but is nominally between -180 and 180 degrees. If this dataset is present, the original pointing information has been processed by selecting a specific sensor and interpolating its measured values to coincide with the correlator data timestamps (timestamps). For interferometric data, this is the pointing data of the first antenna.
requested_pointing : D : record array of shape (Tp,), optional, with record:
{'timestamp' : float64, 'request_scan_azim' : float32, 'request_scan_elev' : float32, 'request_refrac_azim' : float32, 'request_refrac_elev' : float32, 'request_pointm_azim' : float32, 'request_pointm_elev' : float32}
The requested / predicted / commanded target position at various points in the coordinate conversion chain, at arbitrary time instants. The timestamps are UTC seconds since the Unix epoch. The rest of the field names are the names of the corresponding Fringe Finder sensors. These fields contain azimuth and elevation values per time sample, where Tp is the number of pointing data records. All angles are in degrees. The elevation should be between -90 and 90 degrees, while azimuth has no restrictions, but is nominally between -180 and 180 degrees. The scan fields represent the highest level, obtained at the input of the refraction correction step, while the refrac fields are obtained at the output of this step. The pointm fields represent the lowest level, obtained at the output of the pointing model correction step. This dataset may be missing if no new pointing info was recorded during the collection of the correlator data. For interferometric data, this is the pointing data of the first antenna.
actual_pointing : D : record array of shape (Tp,), optional, with record:
{'timestamp' : float64, 'actual_scan_azim' : float32, 'actual_scan_elev' : float32, 'actual_refrac_azim' : float32, 'actual_refrac_elev' : float32, 'actual_pointm_azim' : float32, 'actual_pointm_elev' : float32}
The actual / measured target position at various points in the coordinate conversion chain, at arbitrary time instants. The timestamps are UTC seconds since the Unix epoch. The rest of the field names are the names of the corresponding Fringe Finder sensors. These fields contain azimuth and elevation values per time sample, where Tp is the number of pointing data records. All angles are in degrees. The elevation should be between -90 and 90 degrees, while azimuth has no restrictions, but is nominally between -180 and 180 degrees. The pointm fields represent the lowest level, obtained at the input of the reverse pointing model correction step, while the refrac fields are obtained at the output of this step. The scan fields represent the highest level, obtained at the output of the reverse refraction correction step. This dataset may be missing if no new pointing info was recorded during the collection of the correlator data. For interferometric data, this is the pointing data of the first antenna.
flags : D : record array of shape (T,), with record:
{'valid' : bool, 'nd_on' : bool}
Flags per time sample, where T is the number of time samples in the scan.
enviro_ambient : D : record array of shape (Ta,), optional, with record:
{'timestamp' : float64, 'temperature' : float32, 'pressure' : float32, 'humidity' : float32}
Slowly-varying (‘ambient’) environmental measurements at arbitrary time instants, where Ta is the number of ambient environment data records. The timestamps are UTC seconds since the Unix epoch. The ambient temperature is in degrees Celsius, the atmospheric pressure is in hPa and the relative humidity is a percentage. This dataset may be missing if no new ambient environment info was recorded during the collection of the correlator data.
enviro_wind : D : record array of shape (Tw,), optional, with record:
{'timestamp' : float64, 'wind_speed' : float32, 'wind_direction' : float32}
Environmental measurements of wind speed and direction (which typically vary faster than the ambient data) at arbitrary time instants, where Tw is the number of wind data records. The timestamps are UTC seconds since the Unix epoch. The wind speed is in metres per second, and the wind direction is in degrees increasing clockwise from North. This dataset may be missing if no new wind environment info was recorded during the collection of the correlator data.
label : A : string
String that can be used to identify the type of scan in the back-end scripts. Typical contents are ‘scan’ to indicate a normal scan, ‘slew’ to indicate the telescope moving to the start of the next scan, and ‘cal’ to indicate a noise diode firing. These suggestions are not enforced or checked by the scape package, however.
comment : A : string
Generic comment added to scan.
MVF version 2 (KAT-7)¶
Introduction¶
With the introduction of the KAT-7 correlator, we have taken the opportunity to revisit the correlator data storage format. This document describes this updated format.
Basic Concept¶
A single HDF5 file corresponds to a single observation (a contiguous telescope time segment for a specified subarray).
At the highest level, the file is split into Data and MetaData.
MetaData contains two distinct types:
- Configuration is known a priori and is static for the duration of the observation.
- Sensors contains dynamic information provided in the form of katcp sensors. Typically this is only fully known after the observation.
Flags and History are special-case objects that get populated at run time but not from sensors. These are also the only groups that could get updated post augmentation.
Some datasets such as the noise_diode flags are synthesised from sensor information post capture. These base sensors could then be removed if space is a concern.
A major/minor version number is included in the file. The major version indicates the overall structural philosophy (this document describes version 2.x). The minor version is used to identify the mandatory members of the MetaData and Markup groups included in the file. This allows addition of members (and modification of existing members) to the required list without wholesale changes to the file structure. The mandatory members are described in the following document: TBA.
If used to store voltage data then both correlator_data and timestamps are omitted as timing is synthesized on the fly.
Nut - number of correlator timeslots in this observation
Nt - number of time-averaged timeslots
Nuf - number of correlator frequency channels
Nf - number of averaged frequency channels
Nbl - number of baselines
Np - number of polarisation products
Na - number of antennas in a given subarray
AntennaK - first antenna in a given subarray
AntennaN - last antenna in a given subarray
HDF5 Format¶
The structural format is shown below.
Groups are named using CamelCase, datasets are all lower case with underscores. Attributes are indicated next to a group in {}:
/ {augment_ts}
{experiment_id}
{version}
/Data/ {ts_of_first_timeslot}
/correlator_data - (Nt,Nf,Nbl,2) array of float32 visibilities (real and imag components)
/timestamps - (Nt) array of float64 timestamps (UT seconds since Unix epoch)
/voltage_data - (optional) (Na, Nt, Nf) array of 8bit voltage samples
/MetaData/
/Configuration/
/Antennas/ {num_antennas, subarray_id}
/AntennaK..N/ {description, delays, diameter, location, etc...}
/ beam_pattern
/ h_coupler_noise_diode_model
/ h_pin_noise_diode_model
/ v_coupler_noise_diode_model
/ v_pin_noise_diode_model
/Correlator/ {num_channels, center_freq, channel_bw, etc...}
/Observation/ {type, pi, contact, sw_build_versions, etc...}
/PostProcessing/ {channel_averaging, rfi_threshold, etc...}
/time_averaging - TBD detail of baseline dep time avg
/Sensors/
/Antennas/ {num_antennas, subarray_id}
/AntennaK..N/
/... - dataset per antenna and pedestal sensor
/DBE/
/... - dataset per DBE sensor
/Enviro/
/... - dataset per enviro sensor
/Other/
/... - dataset per other sensor
/RFE/
/... - dataset per RFE sensor
/Source/
/phase_center
/antenna_target - array of target sensors for each antenna
/Markup/
/dropped_data - (optional) describes data dropped by receivers
/flags - (Nt,Nf,Nbl) post averaged uint8 flags - 1bit per flag, packed
/flags_description - (Nflags,3) index, name and description for each packed flag type
/flags_full - (optional) (Nut,Nuf,Nbl) pre-averaged uint8 flags - 1bit per flag, packed
/labels - (optional) descriptions of intent of each observational phase (e.g. scan, slew, cal, etc..)
/noise_diode - (Nt,Na) noise diode state during this averaged timeslot
/noise_diode_full - (optional) (Nut,Na) noise diode state per correlator timeslot
/weights - (Nt,Nf,Nbl,Nweights) weights for each sample
/History/
/augment_log - Log output of augmentation process
/script_log - Log output of observation script
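As a minimal sketch, the main arrays can be read directly with h5py (the file name is illustrative; katdal remains the recommended access path):
import h5py

with h5py.File('1313067732.h5', 'r') as f:
    raw = f['/Data/correlator_data'][:]    # (Nt, Nf, Nbl, 2) float32
    vis = raw[..., 0] + 1j * raw[..., 1]   # complex visibilities
    timestamps = f['/Data/timestamps'][:]  # (Nt,) UT seconds since Unix epoch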
MVF version 3 (early MeerKAT)¶
The version 3 format is an evolution of the v2 format, and continues to use HDF5 as the underlying format. It was used for early engineering and commissioning of MeerKAT, but replaced by v4 before science operations started.
At present there is no detailed documentation.
MVF version 4 (MeerKAT)¶
The version 4 format is the standard format for MeerKAT visibility data. Unlike previous versions, the data for an observation does not reside in a single HDF5 file, as such files would be unmanageably large. Instead, the data is split into chunks, each in its own file, which are loaded from disk or the network on demand. For this reason, the term “data set” is preferred over “file”.
Concepts¶
- Streams
- A stream is a collection of data and any associated metadata, whether multicast, queriable (e.g., sensors) or stored on disk. Every stream in a subarray product has a unique name and a type. A stream may consist of multiple items of related data e.g., visibilities, flags and weights may form a single stream.
- Subarray product
- A collection of streams in the MeerKAT Science Data Processor (SDP) forms a subarray product.
- Capture block
A capture block is a contiguous period over which data is captured from a specific subarray product. A subarray product can only capture one capture block at a time.
Each visibility belongs to a specific stream and capture block within a subarray product.
Capture block IDs are currently numbers representing the start time in the UNIX epoch, but they should be treated as opaque strings.
- Chunk store
- A location (such as a local disk or the MeerKAT archive) that stores the data from a capture block.
Metadata¶
The metadata for a data set is stored in a Redis dump file
(extension .rdb
), which is exported by
katsdptelstate. Refer to
katsdptelstate for details of how
attributes and sensors are encoded in the Redis database.
A single .rdb
file contains metadata for a single subarray but
potentially for multiple streams and capture blocks. The default capture
block and stream to access are stored in capture_block_id
and
stream_name
.
Keys are stored in one of the following namespaces:
- the global namespace
- stream_name (the “stream namespace”)
- capture_block_id (the “capture-block namespace”)
- capture_block_id.stream_name (the “capture-stream namespace”)
Here .
is used to indicate a sub-namespace, but the actual separator
is subject to change and one should always use
join()
to
construct compound namespace names.
Keys may move between these namespaces without notice. Readers should
search for keys from the most specific to the least specific appropriate
namespace (see for example
katdal.datasources.view_capture_stream()
).
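A minimal sketch of this lookup with katsdptelstate, assuming a local RDB file (the file name is hypothetical; katdal.datasources.view_capture_stream() implements this properly):
import katsdptelstate

telstate = katsdptelstate.TelescopeState()
telstate.load_from_file('1556179171_sdp_l0.full.rdb')
cbid = telstate['capture_block_id']
stream = telstate['stream_name']
# Views added later are searched first, so the capture-stream namespace wins
view = telstate.view(stream).view(cbid).view(telstate.join(cbid, stream))
print(view['n_chans'], view['int_time'])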
Where values contain strings, they might contain either raw bytes (which should be decoded as UTF-8) or Unicode text. Readers should be prepared to accept either. The goal is to eventually migrate all such fields to use text. katdal recursively converts all strings to the Python interpreter’s native string type.
katsdptelstate stores two types of values: immutable “attributes”, and “sensors” which are lists of timestamped values. In the documentation below, most keys contain attributes, and sensors are indicated.
Global metadata¶
A subset of the sensors in the MeerKAT system are stored in the file, in the global namespace. Documenting the MeerKAT sensors is beyond the scope of this documentation.
The following keys are also stored.
sdp_config (dict)
- The JSON object used to configure the SDP subarray product. It is not intended to be parsed (the relevant information is intended to be available via other means), but it contains a wealth of debugging information about the streams and connections between them.
sdp_capture_block_id (string) — sensor
- All capture block IDs seen so far. This should not be confused with capture_block_id, which indicates the default capture block ID that should be consulted when the file is opened without specifying a capture block ID.
sdp_image_tag (string)
- The Docker image tag for the Docker images forming the realtime SDP capture system. This is the closest thing to a “version number” for the implementation.
sdp_image_overrides (dict)
- Alternative Docker image tags for specific services within SDP, overriding sdp_image_tag. Overriding individual images is a debugging tool and it should always be empty for science observations.
config.* (dict)
- Command-line options passed to each of the services within SDP.
sdp_task_details (dict)
- Debug information about each of the services launched for the subarray product, including the host on which it ran and the Mesos TaskInfo structure.
Common stream metadata¶
The list of streams that can be accessed from the archive is available
in sdp_archived_streams
(in the global namespace). Within each
stream, the following keys may be defined (not all make sense for
every stream type).
Only stream_type
and src_streams
are guaranteed to be in the
stream namespace, i.e. independent of the capture block. The others may
appear either in the capture-stream namespace or the stream namespace.
inherit (string)
- If present, it indicates another stream from which this stream inherits properties. Any property that cannot be found in the namespace of the current stream should first be looked up in that stream’s namespace.
This is typically used where a single multicast stream is recorded in multiple places. Each copy inherits the majority of metadata from the original and overrides a few keys.
stream_type (string)
Valid values are:
sdp.vis
- Uncalibrated visibilities, flags and weights
sdp.flags
- Similar to sdp.vis, but containing only flags
sdp.cal
- Calibration solutions. Older files may contain a cal stream which omits the stream information and which does not appear in sdp_archived_streams, so that should be considered as a fallback.
sdp.continuum_image
- Continuum image (as a list of CLEAN components) and self-calibration solutions. FITS files will be stored in the MeerKAT archive but katdal does not currently support accessing them.
sdp.spectral_image
- Spectral-line image. FITS files will be stored in the MeerKAT archive but katdal does not currently support accessing them.
src_streams (list of string)
- The streams from which the current stream was computed. These are not necessarily listed in sdp_archived_streams, particularly if they were produced by the MeerKAT Correlator/Beamformer (CBF) rather than the SDP.
n_chans (int)
- Number of channels in a channelised product.
n_chans_per_substream (int)
- Number of channels in each SPEAD heap. Not relevant when loading archived data.
bandwidth (float, Hz)
- Bandwidth of the stream.
center_freq (float, Hz)
- Middle of the central channel. Note that if the number of channels is even, this is actually half a channel higher than the middle of the band.
channel_range (int, int)
- A half-open range of channels taken from the source stream. The length of this range might not equal n_chans due to channel averaging.
Visibility stream metadata¶
The following are relevant to sdp.vis
and sdp.flags
streams.
n_bls (int)
- Number of baselines. Note that a baseline is a correlation between two polarised inputs (a single entry in a Jones matrix).
bls_ordering (either a list of string pairs or a 2D array)
- An array of pairs of strings. Each pair names two antenna inputs that form a baseline. There will be n_bls rows. Note that this can be either a list of 2-element lists or a NumPy array.
sync_time, int_time, first_timestamp (float)
- Refer to Timestamps below.
excise (bool)
- True if RFI detected in the source stream is excised during time and channel averaging. If missing, assume it is true.
calibrations_applied (list of string)
- Names of sdp.cal streams whose corrections have been applied to the data.
need_weights_power_scale (bool)
- Refer to Weights below. If missing, assume it is false.
s3_endpoint_url (string), chunk_info
- Refer to Data below.
Calibration solutions¶
Streams of type sdp.cal
have the following keys.
antlist (list of string, length n_ants)
- List of antenna names. Arrays of calibration solutions use this order along the antenna axis.
pol_ordering (list of string, length n_pols)
- List of polarisations (from v and h). Arrays of calibration solutions use this order along the polarisation axis.
bls_ordering (either a list of string pairs or a 2D array)
- Same meaning as for sdp.vis streams, but describes the internal ordering used within the calibration pipeline and not of much use to users.
param_*
- Parameters used to configure the calibration.
refant (string)
- Name of the selected reference antenna (which will also appear in antlist). The reference antenna is only chosen when first needed in a capture block, so this key may be absent if there was no calibration yet. In older datasets this key contains the katpoint antenna description string instead of the name.
product_G (array of complex, shape (n_pols, n_ants)) — sensor
- Gain solutions (derived e.g. on a phase calibrator), indexed by polarisation and antenna. The complex values in the array apply to the entire band.
product_K (array of float, shape (n_pols, n_ants)) — sensor
- Delay solutions (in seconds), indexed by polarisation and antenna. To correct data at frequency \(\nu\), multiply it by \(e^{-2\pi i\cdot K\cdot \nu}\).
product_B_parts (int)
- Number of keys across which bandpass-like solutions are split.
product_BN (array of complex, shape (n_chans, n_pols, n_ants)) — sensor
- Bandpass solutions, indexed by channel, polarisation and antenna. For implementation reasons, the bandpass solutions are split across multiple keys. N is in the range [0, product_B_parts), and these pieces should be concatenated along the channel (first) axis to reconstruct the full solution. If some pieces are missing (which is rare but can occur), they should be assumed to have the same shape as the present pieces.
product_KCROSS_DIODE (array of float, shape (n_pols, n_ants)) — sensor
- Cross-hand delay solutions (in seconds), indexed by polarisation and antenna. Derived using noise diode firings. Data at a given frequency is corrected in the same manner as product_K. One polarisation will serve as the reference polarisation and have all zero solutions.
product_KCROSS (array of float, shape (n_pols, n_ants)) — sensor
- Cross-hand delay solutions (in seconds), indexed by polarisation and antenna. Solutions are similar to product_KCROSS_DIODE but solved for using a celestial source instead of a noise diode.
product_BCROSS_DIODEN (array of complex, shape (n_chans, n_pols, n_ants)) — sensor
- Cross-hand bandpass phase solutions, indexed by channel, polarisation and antenna. Amplitudes for these solutions should always be one. One polarisation will serve as the reference polarisation and have all zero phase solutions. As for product_BN, the cross-hand bandpass solutions are split across multiple keys indexed by N, where N is in the range [0, product_B_parts). The full solution should be reconstructed as for product_BN, by concatenating along the channel (first) axis.
shared_solve_N, last_dump_indexN
- These are used for internal communication between the calibration processes, and are not intended for external use.
Some common points to note that about the solutions:
- Solutions describe the systematic errors. To correct data, it must be divided by the solutions.
- The key will only be present if at least one solution was computed.
- The timestamp associated with each sensor value is the timestamp of the middle of the data that was used to compute the solution.
- Solutions may contain NaN values, which indicates that there was insufficient information to compute a solution (for example, because all the data was flagged).
- Solutions are only valid as long as the system gain controls are not altered. Re-using gains from one capture block to correct data from another capture block may yield incorrect results unless one takes extra steps to correct for changes in the system gains.
Image stream metadata¶
The following apply to sdp.continuum_image
and sdp.spectral_image
streams.
target_list (dict)
- This is only applicable for imaging streams. Each key is a katpoint target description and the value is the normalised target name, which is a string used to form target-specific sub-namespaces of the stream and capture-stream namespaces. A normalised target name looks similar to the target name but has a limited character set (suitable for forming filenames and telstate namespaces) and, where necessary, a sequence number appended to ensure uniqueness.
For each sdp.continuum_image
stream, there is a sub-namespace per target
(named with the normalised target name) with the following keys (keeping
in mind that .
is used to indicate whichever separator is in use by
katsdptelstate for this database):
target0.clean_components (dict)
Image of the target field as a set of point sources. The target0 sub-namespace is used to allow for possible alternative ways to run the continuum imager in which a single execution would image multiple fields, in which case there would be targetN sub-namespaces up to some N. This is not currently expected for MeerKAT science observations. The dictionary has two keys:
Each sub-namespace per target contains a further sub-sub-namespace called
selfcal
that contains the self-calibration solutions. It behaves like
an sdp.cal
stream namespace and has the following keys:
antlist (list of string, length n_ants)
- List of antenna names. Arrays of self-calibration solutions use this order along the antenna axis.
pol_ordering (list of string, length n_pols)
- List of polarisations (from v and h). Arrays of self-calibration solutions use this order along the polarisation axis.
n_chans (int)
- Number of channels in the self-calibration solutions, which corresponds to the number of “IFs” or sub-bands in the continuum imager.
bandwidth (float, Hz)
- Bandwidth of the self-calibration solutions.
center_freq (float, Hz)
- Middle of the central channel. Note that if the number of channels is even, this is actually half a channel higher than the middle of the band.
product_GPHASE (array of complex, shape (n_chans, n_pols, n_ants)) — sensor
- Phase-only self-calibration solutions, indexed by channel, polarisation and antenna. Amplitudes for these solutions will be very close to one (to within numerical precision).
product_GAMP_PHASE (array of complex, shape (n_chans, n_pols, n_ants)) — sensor
- Amplitude + phase self-calibration solutions, indexed by channel, polarisation and antenna.
Timestamps¶
Timestamps are not stored explicitly. Instead, the first timestamp and the interval between dumps are stored, from which timestamps can be synthesised. The ith dump has a central timestamp (in the UNIX epoch) of \(\text{sync_time} + \text{first_timestamp} + i \times \text{int_time}\). The split of the initial timestamp into two parts is for technical reasons.
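As a minimal sketch, the timestamps can be synthesised as follows (the numeric values are made up; obtaining sync_time, first_timestamp and int_time from telstate is not shown):

import numpy as np

sync_time = 1556000000.0       # UNIX epoch seconds (illustrative value)
first_timestamp = 3.5          # seconds since sync_time (illustrative value)
int_time = 7.9969              # dump period in seconds (illustrative value)
n_dumps = 10

# Central timestamp of the i'th dump: sync_time + first_timestamp + i * int_time
timestamps = sync_time + first_timestamp + np.arange(n_dumps) * int_time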
There is also first_timestamp_adc
, which is the same as
first_timestamp
but in units of the digitiser ADC counts. It is
stored only for internal implementation reasons and should not be relied
upon.
Light RDB files¶
The MeerKAT system also writes a “light” version of each RDB file, which contains only a subset of the keys. It is intended to contain enough information to read the uncalibrated visibilities and some high-level metadata about the observation itself. It does not contain information about antenna pointing, calibration, or CLEAN components.
Data¶
Visibilities, flags and weights are subdivided into small chunks. The
chunking model is based on dask. Visibilities are treated as a 3D
array, with axes for time, frequency and baseline. The data is divided
into pieces along each axis. Each piece is stored in a separate file
in the archive, in .npy format. The metadata necessary to reconstruct
the array is stored in the telescope state and documented in more detail
later. It is possible that some chunks will be missing, because they
were lost during the capture process. On load, katdal will replace such
chunks with default values and set the data_lost
flag for them.
Weights and flags are similarly treated.
Chunks are named type/AAAAA_BBBBB_CCCCC.npy, where type is one of correlator_data (visibilities), flags or weights, and AAAAA, BBBBB and CCCCC are the (zero-based) indices of the first element in the chunk along each axis, padded to a minimum of five digits. Additionally, there are chunks named weights_channel/AAAAA_BBBBB.npy, explained below.
Note that the chunking scheme typically differs between visibilities, flags and weights, so files with the same base name start at the same point but do not necessarily have the same extent.
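As a minimal sketch (the helper function is illustrative and not part of katdal), a chunk name can be formed from the chunk type and the start indices like this:

def chunk_name(chunk_type, starts):
    # Build e.g. 'correlator_data/00000_00512_00000.npy' from start indices.
    idx = '_'.join('{:05d}'.format(s) for s in starts)
    return '{}/{}.npy'.format(chunk_type, idx)

# Visibility chunk starting at dump 0, channel 512, baseline 0:
print(chunk_name('correlator_data', (0, 512, 0)))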
All the data for one stream is located in a single chunk store. If it is
in the MeerKAT archive, the URL to the base of this chunk store
(implementing the S3 protocol) is stored in s3_endpoint_url
.
Capture-stream specific information is stored in chunk_info
, a
two-level dictionary. The outer key is the type listed above, and the
inner key is one of the following (an illustrative example appears after this list):
prefix
(string)- A path prefix for the data. In the case of S3, this is the bucket name. For local storage, it is a directory name (the parent of the type directory).
dtype
(string)- Numpy dtype of the data, which is expected to match the dtype encoded in the individual chunk files.
shape
(tuple)- Shape of the virtual dask array obtained by joining together all the chunks.
chunks
(tuple of tuples)- Sizes of the chunks along each axis, in the format used by dask.
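For illustration, one entry of such a chunk_info dictionary might look as follows (all values are made up and purely indicative):

chunk_info = {
    'correlator_data': {
        'prefix': '1234567890-sdp-l0',             # S3 bucket or local directory
        'dtype': '<c8',                            # complex64, matching the .npy chunks
        'shape': (40, 4096, 40),                   # dumps x channels x baselines
        'chunks': ((1,) * 40, (1024,) * 4, (40,)), # dask-style chunk sizes per axis
    },
    # 'flags' and 'weights' entries follow the same pattern.
}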
Weights¶
To save space, the weights are represented in an indirect form that requires some calculation to reconstruct. The actual weight for a visibility is the product of three values (a sketch of the reconstruction appears after this list):
- The value in the weights chunk.
- A baseline-independent value in the weights_channel chunk.
- If the stream has a need_weights_power_scale key in telstate and the value is true, the inverse of the product of the autocorrelation power for the two inputs in the baseline.
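The following is a minimal sketch of that reconstruction, under stated assumptions. The arrays are hypothetical (katdal performs this combination for you when loading weights), and auto_power is assumed to already hold, per baseline, the product of the autocorrelation power of its two inputs:

import numpy as np

# weights:         (n_dumps, n_chans, n_baselines) values from the weights chunks
# weights_channel: (n_dumps, n_chans) baseline-independent values
# auto_power:      (n_dumps, n_chans, n_baselines) product of the autocorrelation
#                  power of the two inputs making up each baseline

def reconstruct_weights(weights, weights_channel, auto_power,
                        need_weights_power_scale=False):
    # Combine the factors described above into the actual weight (sketch only).
    w = weights * weights_channel[..., np.newaxis]
    if need_weights_power_scale:
        w = w / auto_power
    return w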
Flags¶
Each flag is a bitfield. The meaning of the individual bits is
documented in the katdal.flags
module. Note that it is possible
that a flag chunk is present but the corresponding visibility or weight
data is missing, in which case it is the reader’s responsibility to set
the data_lost
bit.
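A hedged sketch of working with such a bitfield follows; the bit position used here is assumed purely for illustration, and katdal.flags should be consulted for the authoritative bit assignments:

import numpy as np

flags = np.zeros((10, 4096, 40), dtype=np.uint8)    # time x freq x baseline bitfields
DATA_LOST_BIT = 3                                    # assumed bit index, for illustration only

# A reader would set the data_lost bit for samples whose chunks are missing:
flags[0:2, 0:1024, :] |= np.uint8(1 << DATA_LOST_BIT)

# Test which samples have the data_lost bit set:
data_lost = (flags & (1 << DATA_LOST_BIT)) != 0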
The MeerKAT Science Data Processor typically uses two levels of
flagging: a conservative first-pass flagger run directly on the
correlator output, and a more accurate flagger that operates on
data that has been averaged and (in some cases) calibrated. The latter
appears in a stream of type sdp.flags
, which contains only flags. It
can be linked to the corresponding visibilities and weights by checking
its source streams. The flags in this stream are a
superset of the flags in the originating stream and are guaranteed to
have the same timestamp and frequency metadata, so can be used in place
of the original flags. However, due to data loss it is possible that
the replacement flags will have slightly more or fewer dumps at the end,
which will need to be handled.
katdal¶
katdal package¶
Submodules¶
katdal.applycal module¶
Utilities for applying calibration solutions to visibilities and weights.
-
katdal.applycal.
complex_interp
(x, xi, yi, left=None, right=None)¶ Piecewise linear interpolation of magnitude and phase of complex values.
Given discrete data points (xi, yi), this returns a 1-D piecewise linear interpolation y evaluated at the x coordinates, similar to numpy.interp(x, xi, yi). While
numpy.interp()
interpolates the real and imaginary parts of yi separately, this function interpolates magnitude and (unwrapped) phase separately instead. This is useful when the phase of yi changes more rapidly than its magnitude, as in electronic gains.Parameters: - x (1-D sequence of float, length M) – The x-coordinates at which to evaluate the interpolated values
- xi (1-D sequence of float, length N) – The x-coordinates of the data points, must be sorted in ascending order
- yi (1-D sequence of complex, length N) – The y-coordinates of the data points, same length as xi
- left (complex, optional) – Value to return for x < xi[0], default is yi[0]
- right (complex, optional) – Value to return for x > xi[-1], default is yi[-1]
Returns: y – The evaluated y-coordinates, same length as x and same dtype as yi
Return type: array of complex, length M
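A brief usage sketch (with made-up values) showing how this differs from interpolating real and imaginary parts:

import numpy as np
from katdal.applycal import complex_interp

# Two gain samples of unit magnitude, 90 degrees apart in phase
xi = np.array([0.0, 1.0])
yi = np.array([1.0 + 0.0j, 0.0 + 1.0j])

# Halfway between them the interpolated gain keeps unit magnitude ...
y = complex_interp([0.5], xi, yi)
print(np.abs(y))                           # approximately [1.0]
# ... whereas interpolating real and imaginary parts shrinks it
print(np.abs(np.interp([0.5], xi, yi)))    # approximately [0.707]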
-
katdal.applycal.
get_cal_product
(cache, cal_stream, product_type)¶ Extract calibration solution from cache as a sensor.
Parameters: - cache (
SensorCache
object) – Sensor cache serving cal product sensors - cal_stream (string) – Name of calibration stream (e.g. “l1”)
- product_type (string) – Calibration product type (e.g. “G”)
- cache (
-
katdal.applycal.
calc_delay_correction
(sensor, index, data_freqs)¶ Calculate correction sensor from delay calibration solution sensor.
Given the delay calibration solution sensor, this extracts the delay time series of the input specified by index (in the form (pol, ant)) and builds a categorical sensor for the corresponding complex correction terms (channelised by data_freqs).
Invalid delays (NaNs) are replaced by zeros, since bandpass calibration still has a shot at fixing any residual delay.
-
katdal.applycal.
calc_bandpass_correction
(sensor, index, data_freqs, cal_freqs)¶ Calculate correction sensor from bandpass calibration solution sensor.
Given the bandpass calibration solution sensor, this extracts the time series of bandpasses (channelised by cal_freqs) for the input specified by index (in the form (pol, ant)) and builds a categorical sensor for the corresponding complex correction terms (channelised by data_freqs).
Invalid solutions (NaNs) are replaced by linear interpolations over frequency (separately for magnitude and phase), as long as some channels have valid solutions.
-
katdal.applycal.
calc_gain_correction
(sensor, index, targets=None)¶ Calculate correction sensor from gain calibration solution sensor.
Given the gain calibration solution sensor, this extracts the time series of gains for the input specified by index (in the form (pol, ant)) and interpolates them over time to get the corresponding complex correction terms. The optional targets parameter is a
CategoricalData
, i.e. a sensor indicating the target associated with each dump. The targets can be actual katpoint.Target
objects or indices, as long as they uniquely identify the target. If provided, interpolate solutions derived from one target only at dumps associated with that target, which is what you want for self-calibration solutions (but not for standard calibration based on gain calibrator sources). Invalid solutions (NaNs) are replaced by linear interpolations over time (separately for magnitude and phase), as long as some dumps have valid solutions on the appropriate target.
-
katdal.applycal.
calibrate_flux
(sensor, targets, gaincal_flux)¶ Apply flux scale to calibrator gains (aka flux calibration).
Given the gain calibration solution sensor, this identifies the target associated with each set of solutions by looking up the gain events in the targets sensor, and then scales the gains by the inverse square root of the relevant flux if a valid match is found in the gaincal_flux dict. This is equivalent to the final step of the AIPS GETJY and CASA fluxscale tasks.
-
katdal.applycal.
add_applycal_sensors
(cache, attrs, data_freqs, cal_stream, cal_substreams=None, gaincal_flux={})¶ Register virtual sensors for one calibration stream.
This operates on a single calibration stream called cal_stream (possibly an alias), which derives from one or more underlying cal streams listed in cal_substreams and has stream attributes in attrs.
The first set of virtual sensors maps all cal products into a unified namespace (template ‘Calibration/Products/cal_stream/{product_type}’). Map receptor inputs to the relevant indices in each calibration product based on the ants and pols found in attrs. Then register a virtual sensor per product type and per input in the SensorCache cache, with template ‘Calibration/Corrections/cal_stream/{product_type}/{inp}’. The virtual sensor function picks the appropriate correction calculator based on the cal product type, which also uses auxiliary info like the channel frequencies, data_freqs.
Parameters: - cache (
SensorCache
object) – Sensor cache serving cal product sensors and receiving correction sensors - attrs (dict-like) – Calibration stream attributes (e.g. a “cal” telstate view)
- data_freqs (array of float, shape (F,)) – Centre frequency of each frequency channel of visibilities, in Hz
- cal_stream (string) – Name of (possibly virtual) calibration stream (e.g. “l1”)
- cal_substreams (sequence of string, optional) – Names of actual underlying calibration streams (e.g. [“cal”]), defaults to [cal_stream] itself
- gaincal_flux (dict mapping string to float, optional) – Flux density (in Jy) per gaincal target name, used to flux calibrate the “G” product, overriding the measured flux stored in attrs (if available). A value of None disables flux calibration.
Returns: cal_freqs – Centre frequency of each frequency channel of calibration stream, in Hz (or None if no sensors were registered)
Return type: 1D array of float, or None
- cache (
-
class
katdal.applycal.
CorrectionParams
(inputs, input1_index, input2_index, corrections, channel_maps)¶ Bases:
object
Data needed to compute corrections in
calc_correction_per_corrprod()
.Once constructed, the data in this class must not be modified, as it will be baked into dask graphs.
Parameters: - inputs (list of str) – Names of inputs, in the same order as the input axis of products
- input2_index (input1_index,) – Indices into inputs of first and second items of correlation product
- corrections (dict) – A dictionary (indexed by cal product name) of lists (indexed by input) of sequences (indexed by dump) of numpy arrays, with corrections to apply.
- channel_maps (dict) – A dictionary (indexed by cal product name) of functions (signature g = channel_map(g, channels)) that map the frequency axis of the cal product g onto the frequency axis of the visibility data, where the vis frequency axis will be indexed by the slice channels.
-
katdal.applycal.
calc_correction_per_corrprod
(dump, channels, params)¶ Gain correction per channel per correlation product for a given dump.
This calculates an array of complex gain correction terms of shape (n_chans, n_corrprods) that can be directly applied to visibility data. This incorporates all requested calibration products at the specified dump and channels.
Parameters: - dump (int) – Dump index (applicable to full data set, i.e. absolute)
- channels (slice) – Channel indices (applicable to full data set, i.e. absolute)
- params (
CorrectionParams
) – Corrections per input, together with correlation product indices
Returns: gains – Gain corrections per channel per correlation product
Return type: array of complex64, shape (n_chans, n_corrprods)
Raises: KeyError
– If input and/or cal product has no associated correction
-
katdal.applycal.
calc_correction
(chunks, cache, corrprods, cal_products, data_freqs, all_cal_freqs, skip_missing_products=False)¶ Create a dask array containing applycal corrections.
Parameters: - chunks (tuple of tuple of int) – Chunking scheme of the resulting array, in normalized form (see
dask.array.core.normalize_chunks()
). - cache (
SensorCache
object) – Sensor cache, used to look up individual correction sensors - corrprods (sequence of (string, string)) – Selected correlation products as pairs of correlator input labels
- cal_products (sequence of string) – Calibration products that will contribute to corrections (e.g. [“l1.G”])
- data_freqs (array of float, shape (F,)) – Centre frequency of each frequency channel of visibilities, in Hz
- all_cal_freqs (dict) – Dictionary mapping cal stream name (e.g. “l1”) to array of associated frequencies
- skip_missing_products (bool) – If True, skip products with missing sensors instead of raising KeyError
Returns: - final_cal_products (list of string) – List of calibration products in the order that they will be applied (potentially a subset of cal_products if skipping missing products)
- corrections (
dask.array.Array
object, or None) – Dask array that produces corrections for entire vis array, or None if no calibration products were found (either cal_products is empty or all products had some missing sensors and skip_missing_products is True)
Raises: KeyError
– If a correction sensor for a given input and cal product is not found (and skip_missing_products is False)- chunks (tuple of tuple of int) – Chunking scheme of the resulting array, in normalized form (see
-
katdal.applycal.
apply_vis_correction
¶ Clean up and apply correction to visibility data in data.
-
katdal.applycal.
apply_weights_correction
¶ Clean up and apply correction to weight data in data.
-
katdal.applycal.
apply_flags_correction
¶ Set POSTPROC flag wherever correction is invalid.
katdal.averager module¶
-
katdal.averager.
average_visibilities
(vis, weight, flag, timestamps, channel_freqs, timeav=10, chanav=8, flagav=False)¶ Average visibilities, flags and weights.
Visibilities are weight-averaged using the weights in the weight array with flagged data set to weight zero. The averaged weights are the sum of the input weights for each average block. An average flag is retained if all of the data in an averaging block is flagged (the averaged visibility in this case is the unweighted average of the input visibilities). In cases where the averaging size in channel or time does not evenly divide the size of the input data, the remaining channels or timestamps at the end of the array after averaging are discarded. Channels are averaged first, followed by timestamps. The arrays of timestamps and of channel frequencies are also directly averaged and returned.
Parameters: - vis (array(numtimestamps,numchannels,numbaselines) of complex64.) – The input visibilities to be averaged.
- weight (array(numtimestamps,numchannels,numbaselines) of float32.) – The input weights (used for weighted averaging).
- flag (array(numtimestamps,numchannels,numbaselines) of boolean.) – Input flags (flagged data have weight zero before averaging).
- timestamps (array(numtimestamps) of int.) – The timestamps (in mjd seconds) corresponding to the input data.
- channel_freqs (array(numchannels) of int.) – The frequencies (in Hz) corresponding to the input channels.
- timeav (int.) – The desired averaging size in timestamps.
- chanav (int.) – The desired averaging size in channels.
- flagav (bool) – If true, flag the averaged data when there is a single flag in the averaging bin. If false, only flag the averaged data when all the data in the bin are flagged.
Returns: - av_vis (array(int(numtimestamps/timeav),int(numchannels/chanav)) of complex64.)
- av_weight (array(int(numtimestamps/timeav),int(numchannels/chanav)) of float32.)
- av_flag (array(int(numtimestamps/timeav),int(numchannels/chanav)) of boolean.)
- av_mjd (array(int(numtimestamps/timeav)) of int.)
- av_freq (array(int(numchannels/chanav)) of int.)
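A brief usage sketch with made-up data (the trailing dumps and channels that do not fill a complete averaging block are discarded, as described above):

import numpy as np
from katdal.averager import average_visibilities

n_times, n_chans, n_bls = 20, 64, 12
vis = np.ones((n_times, n_chans, n_bls), dtype=np.complex64)
weight = np.ones(vis.shape, dtype=np.float32)
flag = np.zeros(vis.shape, dtype=bool)
timestamps = np.arange(n_times, dtype=np.float64)
channel_freqs = 1.4e9 + 1e6 * np.arange(n_chans)

av_vis, av_weight, av_flag, av_mjd, av_freq = average_visibilities(
    vis, weight, flag, timestamps, channel_freqs, timeav=10, chanav=8)
print(av_vis.shape)   # time and channel axes reduced by timeav and chanav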
katdal.categorical module¶
Container for categorical (i.e. non-numerical) sensor data and related tools.
-
class
katdal.categorical.
ComparableArrayWrapper
(value)¶ Bases:
object
Wrapper that improves comparison of array objects.
This wrapper class has two main benefits:
- It prevents sensor values that are NumPy ndarrays themselves (or array-like objects such as tuples and lists) from dissolving and losing their identity when they are assembled into an array.
- It ensures that array-valued sensor values become properly comparable (avoiding array-valued booleans resulting from standard comparisons).
The former is needed because
SensorGetter
is treated as a structured array even if it contains object values. The latter is needed because the equality operator crops up in hard-to-reach places like inside list.index().Parameters: value (object) – The sensor value to be wrapped -
static
unwrap
(v)¶ Unwrap value if needed.
-
katdal.categorical.
infer_dtype
(values)¶ Figure out dtype of sequence of sensor values.
The common dtype is determined by explicit NumPy promotion. If the values are array-like themselves, treat them as opaque objects to simplify sensor processing. If the sequence is empty, the dtype is unknown and set to None. In addition, short-circuit to an actual dtype for objects with this attribute to simplify calling this on a mixed collection of sensor data.
Parameters: values (sequence, or object with dtype) – Sequence of sensor values (typically a list), or a sensor data object with a dtype attribute (like ndarray or SensorGetter
)Returns: dtype – Inferred dtype, or None if values is an empty sequence Return type: numpy.dtype
object or NoneNotes
This is almost, but not quite, entirely like
numpy.result_type()
. The differences are that this accepts generic objects in the sequence, treats ndarrays as objects regardless of their underlying dtype, supports a dtype of None and short-circuits the check if the sequence itself is an object with a dtype. And this accepts the sequence as the first parameter as opposed to being unpacked across the argument list.
-
katdal.categorical.
unique_in_order
(elements, return_inverse=False)¶ Extract unique elements from elements while preserving original order.
Parameters: - elements (sequence) – Sequence of equality-comparable objects
- return_inverse ({False, True}, optional) – If True, also return sequence of indices that can be used to reconstruct original elements sequence via [unique_elements[i] for i in inverse]
Returns: - unique_elements (list) – List of unique objects, in original order they were found in elements
- inverse (array of int, optional) – If return_inverse is True, sequence of indices that can be used to reconstruct original sequence
-
class
katdal.categorical.
CategoricalData
(sensor_values, events)¶ Bases:
object
Container for categorical (i.e. non-numerical) sensor data.
This container allows simple manipulation and interpolation of a time series of non-numerical data represented as discrete events. The data is stored as a list of sensor values and two integer arrays:
- unique_values stores one copy of each unique object in the data series
- events stores the time indices (dumps) where each event occurs
- indices stores indices linking each event to the unique_values list
The __getitem__ interface (i.e. data[dump]) returns the data associated with the last event before the requested dump(s), in effect doing a zeroth-order interpolation of the data at each event. Events can be added, removed and realigned, and the container can be split along the time axis, amongst other functionality.
Parameters: - sensor_values (sequence, length N) – Sequence of sensor values (of any type, preferably not None [see Notes])
- events (sequence of non-negative ints, length N + 1) – Corresponding monotonic sequence of dump indices where each sensor value came into effect. The last event is one past the last dump where the final sensor value applied, and therefore equal to the total number of dumps for which sensor values were specified.
-
unique_values
¶ List of unique sensor values in order they were found in sensor_values with any
ComparableArrayWrapper
objects unwrappedType: list, length M
-
indices
¶ Array of indices into unique_values, one per sensor event
Type: array of int, shape (N,)
-
dtype
¶ Sensor data type as NumPy dtype (found on demand from unique_values)
Type: numpy.dtype
object
Notes
Any object values wrapped in a
ComparableArrayWrapper
will be unwrapped before adding it to unique_values. When adding, removing and comparing values to this container, any object values will be wrapped again temporarily to ensure proper comparisons.It is discouraged to have a sensor value of None as this value is given a special meaning in methods such as
CategoricalData.add()
and sensor_to_categorical()
. On the other hand, it is the most sensible dummy object value and any Nones entering through this initialiser will probably not cause any issues.It is better to make unique_values a list instead of an array because an array assimilates objects such as tuples, lists and other arrays. The alternative is an array of
ComparableArrayWrapper
objects but these then need to be unpacked at some later stage which is also tricky.-
dtype
Sensor value type.
-
segments
()¶ Generator that iterates through events and returns segment and value.
Yields: - segment (slice object) – The slice representing range of dump indices of the current segment
- value (object) – Sensor value associated with segment
-
add
(event, value=None)¶ Add or override sensor event.
This adds a new event to the container, with a new value or a duplicate of the existing value at that dump. If the new event coincides with an existing one, it overrides the value at that dump.
Parameters: - event (int) – Dump of event to add or override
- value (object, optional) – New value for event (duplicate current value at this dump by default)
-
remove
(value)¶ Remove sensor value, remapping indices and merging segments in process.
If the sensor value does not exist, do nothing.
Parameters: value (object) – Sensor value to remove from container
-
add_unmatched
(segments, match_dist=1)¶ Add duplicate events for segment starts that don’t match sensor events.
Given a sequence of segments, this matches each segment start to the nearest sensor event dump (within match_dist). Any unmatched segment starts are added as duplicate sensor events (or ignored if they fall outside the sensor event range).
Parameters: - segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
- match_dist (int, optional) – Maximum distance in dumps that signify a match between events
-
align
(segments)¶ Align sensor events with segment starts, possibly discarding events.
Given a sequence of segments, this moves each sensor event dump onto the nearest segment start. If more than one event ends up in the same segment, only keep the last event, discarding the rest.
The end result is that the sensor event dumps become a subset of the segment starts and there cannot be more sensor events than segments.
Parameters: segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
-
partition
(segments)¶ Partition dataset into multiple sets along time axis.
Given a sequence of segments, split the container into a sequence of containers, one per segment. Each container contains only the events occurring within its corresponding segment, with event dumps relative to the start of the segment, and the containers share the same unique values.
Parameters: segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment Returns: split_data – Resulting multiple datasets in chronological order Return type: sequence of CategoricalData
objects
-
remove_repeats
()¶ Remove repeated events of the same value.
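A brief usage sketch of CategoricalData with made-up values, illustrating the events convention (the final event is one past the last dump):

from katdal.categorical import CategoricalData

# Three sensor values covering 10 dumps: 'slew' for dumps 0-2,
# 'track' for dumps 3-7 and 'slew' again for dumps 8-9.
d = CategoricalData(['slew', 'track', 'slew'], [0, 3, 8, 10])

print(d[5])                   # 'track' (zeroth-order interpolation at dump 5)
for segment, value in d.segments():
    print(segment, value)     # e.g. slice(0, 3, None) slew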
-
katdal.categorical.
concatenate_categorical
(split_data, **kwargs)¶ Concatenate multiple categorical datasets into one along time axis.
Join a sequence of categorical datasets together, by forming a common set of unique values, remapping events to these and incrementing the event dumps of each dataset to start off where the previous dataset ended.
Parameters: split_data (sequence of CategoricalData
objects) – Sequence of containers to concatenateReturns: data – Concatenated dataset Return type: CategoricalData
object
-
katdal.categorical.
sensor_to_categorical
(sensor_timestamps, sensor_values, dump_midtimes, dump_period, transform=None, initial_value=None, greedy_values=None, allow_repeats=False, **kwargs)¶ Align categorical sensor events with dumps and clean up spurious events.
This converts timestamped sensor data into a categorical dataset by comparing the sensor timestamps to a series of dump timestamps and assigning each sensor event to the dump in which it occurred. When multiple sensor events happen in the same dump, only the last one is kept. The first dump is guaranteed to have a valid value by either using the supplied initial_value or extrapolating the first proper value back in time. The sensor data may be transformed before events that repeat values are potentially discarded. Finally, events with values marked as “greedy” take precedence over normal events when both occur within the same dump (either changing from or to the greedy value, or if the greedy value occurs completely within a dump).
XXX Future improvements include picking the event with the longest duration within a dump as opposed to the final event, and “snapping” event boundaries to dump boundaries with a given tolerance (e.g. 5-10% of dump period).
Parameters: - sensor_timestamps (sequence of float, length M) – Sequence of sensor timestamps (typically UTC seconds since Unix epoch)
- sensor_values (sequence, length M) – Corresponding sequence of sensor values [potentially wrapped]
- dump_midtimes (sequence of float, length N) – Sequence of dump midtimes (same reference as sensor timestamps)
- dump_period (float) – Duration of each dump, in seconds
- transform (callable or None, optional) – Transform [unwrapped] sensor values before fixing initial value, mapping dumps to events and discarding repeats
- initial_value (object or None, optional) – Sensor value [transformed, unwrapped] to use for dump = 0 up to first proper event (force first proper event to start at dump = 0 by default)
- greedy_values (sequence or None, optional) – List of [transformed, unwrapped] sensor values considered “greedy”
- allow_repeats ({False, True}, optional) – If False, discard sensor events that do not change [transformed] value
Returns: data – Constructed categorical dataset [unwraps any wrapped values]
Return type: CategoricalData
object
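A brief usage sketch with made-up sensor and dump times:

import numpy as np
from katdal.categorical import sensor_to_categorical

dump_period = 8.0
dump_midtimes = 1556000000.0 + dump_period * (np.arange(10) + 0.5)
sensor_timestamps = [1556000003.0, 1556000050.0]   # two sensor events
sensor_values = ['slew', 'track']

d = sensor_to_categorical(sensor_timestamps, sensor_values,
                          dump_midtimes, dump_period)
print(d.unique_values, d.events)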
katdal.chunkstore module¶
Base class for accessing a store of chunks (i.e. N-dimensional arrays).
-
exception
katdal.chunkstore.
ChunkStoreError
¶ Bases:
Exception
Base class for all standard ChunkStore errors.
-
exception
katdal.chunkstore.
StoreUnavailable
¶ Bases:
OSError
,katdal.chunkstore.ChunkStoreError
Could not access underlying storage medium (offline, auth failed, etc).
-
exception
katdal.chunkstore.
ChunkNotFound
¶ Bases:
KeyError
,katdal.chunkstore.ChunkStoreError
The store was accessible but a chunk with the given name was not found.
-
exception
katdal.chunkstore.
BadChunk
¶ Bases:
ValueError
,katdal.chunkstore.ChunkStoreError
The chunk is malformed, e.g. bad dtype, actual shape differs from requested.
-
class
katdal.chunkstore.
PlaceholderChunk
(shape, dtype)¶ Bases:
object
Chunk returned to indicate missing data.
-
katdal.chunkstore.
generate_chunks
(shape, dtype, max_chunk_size, dims_to_split=None, power_of_two=False, max_dim_elements=None)¶ Generate dask chunk specification from ndarray parameters.
Parameters: - shape (sequence of int) – Array shape
- dtype (
numpy.dtype
object or equivalent) – Array data type - max_chunk_size (float or int) – Upper limit on chunk size (if allowed by dims_to_split), in bytes
- dims_to_split (sequence of int, optional) – Indices of dimensions that may be split into chunks (default all dims)
- power_of_two (bool, optional) – True if chunk size should be rounded down to a power of two (the last chunk size along each dimension will potentially be smaller)
- max_dim_elements (dict, optional) – Maximum number of elements on each dimension (each key is a dimension index). Dimensions that are not in dims_to_split are ignored.
Returns: chunks – Dask chunk specification, indicating chunk sizes along each dimension
Return type: tuple of tuple of int
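A brief usage sketch with made-up parameters, chunking a visibility-like array into pieces of at most 16 MB while splitting only the time and frequency axes:

import numpy as np
from katdal.chunkstore import generate_chunks

shape = (40, 4096, 160)            # dumps x channels x correlation products
chunks = generate_chunks(shape, np.complex64, 16 * 1024 ** 2,
                         dims_to_split=(0, 1))
print(chunks)                      # dask-style chunk specification (tuple of tuples)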
-
katdal.chunkstore.
npy_header_and_body
(chunk)¶ Prepare a chunk for low-level writing.
Returns the .npy header and a view of the chunk corresponding to that header. The two should be concatenated (as buffer objects) to form a valid .npy file.
This is useful for high-performance code, as it allows a chunk to be encoded as a .npy file more efficiently than saving to a
io.BytesIO
.
-
class
katdal.chunkstore.
ChunkStore
(error_map=None)¶ Bases:
object
Base class for accessing a store of chunks (i.e. N-dimensional arrays).
A chunk is a simple (i.e. unit-stride) slice of an N-dimensional array known as its parent array. The array is identified by a string name, while the chunk within the array is identified by a sequence of slice objects which may be used to extract the chunk from the array. The array is a
numpy.ndarray
object with an associated dtype.The basic idea is that the chunk store contains multiple arrays addressed by name. The list of available arrays and all array metadata (shape, chunks and dtype) are stored elsewhere. The metadata is used to identify chunks, while the chunk store takes care of storing and retrieving bytestrings of actual chunk data. These are packaged back into NumPy arrays for the user. Each array can only be stored once, with a unique chunking scheme (i.e. different chunking of the same data is disallowed).
The naming scheme for arrays and chunks is reasonably generic but has some restrictions:
- Names are treated like paths with components and a standard separator
- The chunk name is formed by appending a string of indices to the array name
- It is discouraged to have an array name that is a prefix of another name
Each chunk store has its own restrictions on valid characters in names: some treat names as URLs while others treat them as filenames. A safe choice for name components should be the valid characters for S3 buckets (also including underscores for non-bucket components):
VALID_BUCKET = re.compile(r'^[a-z0-9][a-z0-9.-]{2,62}$')
Parameters: error_map (dict mapping Exception
toException
, optional) – Dict that maps store-specific errors to standard ChunkStore errors-
get_chunk
(array_name, slices, dtype)¶ Get chunk from the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- dtype (
numpy.dtype
object or equivalent) – Data type of array x
Returns: chunk – Chunk as ndarray with dtype dtype and shape dictated by slices
Return type: numpy.ndarray
objectRaises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to sliceschunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)chunkstore.ChunkNotFound
– If requested chunk was not found in store
-
get_chunk_or_default
(array_name, slices, dtype, default_value=0)¶ Get chunk from the store but return default value if it is missing.
-
get_chunk_or_placeholder
(array_name, slices, dtype)¶ Get chunk from the store but return a
PlaceholderChunk
if it is missing.
-
create_array
(array_name)¶ Create a new array if it does not already exist.
Parameters: array_name (string) – Identifier of array Raises: chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)
-
put_chunk
(array_name, slices, chunk)¶ Put chunk into the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (
numpy.ndarray
object) – Chunk as ndarray with shape commensurate with slices
Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If the shape implied by slices does not match that of chunkchunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)chunkstore.ChunkNotFound
– If array_name is incompatible with store
-
put_chunk_noraise
(array_name, slices, chunk)¶ Put chunk into store but return any exceptions instead of raising.
-
mark_complete
(array_name)¶ Write a special object to indicate that array_name is finished.
This operation is idempotent.
The array_name need not correspond to any array written with
put_chunk()
. This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.It is not necessary to call
create_array()
first; the implementation will do so if appropriate.The presence of this marker can be checked with
is_complete()
.
-
is_complete
(array_name)¶ Check whether
mark_complete()
has been called for this array.
-
NAME_SEP
= '/'¶
-
NAME_INDEX_WIDTH
= 5¶
-
classmethod
join
(*names)¶ Join components of chunk name with supported separator.
-
classmethod
split
(name, maxsplit=-1)¶ Split chunk name into components based on supported separator.
-
classmethod
chunk_id_str
(slices)¶ Chunk identifier in string form (e.g. ‘00012_01024_00000’).
-
classmethod
chunk_metadata
(array_name, slices, chunk=None, dtype=None)¶ Turn array name and chunk identifier into chunk name and shape.
Form the full chunk name from array_name and slices and extract the chunk shape from slices, validating it in the process. If chunk or dtype is given, check that chunk is commensurate with slices and that dtype contains no objects which would cause nasty segfaults.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (
numpy.ndarray
object, optional) – Actual chunk data as ndarray (used to validate shape / dtype) - dtype (
numpy.dtype
object or equivalent, optional) – Data type of array x (used for validation only)
Returns: - chunk_name (string) – Full chunk name used to find chunk in underlying storage medium
- shape (tuple of int) – Chunk shape tuple associated with slices
Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If the shape implied by slices does not match that of chunk, or any dtype contains objects
-
get_dask_array
(array_name, chunks, dtype, offset=(), index=(), errors=0)¶ Get dask array from the store.
Handling of missing chunks is determined by the errors argument.
Parameters: - array_name (string) – Identifier of array in chunk store
- chunks (tuple of tuples of ints) – Chunk specification
- dtype (
numpy.dtype
object or equivalent) – Data type of array - offset (tuple of int, optional) – Offset to add to each dimension when addressing chunks in store
- errors (number or 'raise' or 'placeholder', optional) –
Error handling. If ‘raise’, exceptions are passed through, causing the evaluation to fail.
If ‘placeholder’, returns instances of
PlaceholderChunk
in place of missing chunks. Note that such an array cannot be used as-is, because an ndarray is expected, but it can be used as raw material for building new graphs via functions likeda.map_blocks()
.If a numeric value, it is used as a default value.
Returns: array – Dask array of given dtype
Return type: dask.array.Array
object
-
put_dask_array
(array_name, array, offset=())¶ Put dask array into the store.
Parameters: - array_name (string) – Identifier of array in chunk store
- array (
dask.array.Array
object) – Dask input array - offset (tuple of int, optional) – Offset to add to each dimension when addressing chunks in store
Returns: success – Dask array of objects indicating success of transfer of each chunk (None indicates success, otherwise there is an exception object)
Return type: dask.array.Array
object
katdal.chunkstore_dict module¶
A store of chunks (i.e. N-dimensional arrays) based on a dict of arrays.
-
class
katdal.chunkstore_dict.
DictChunkStore
(**kwargs)¶ Bases:
katdal.chunkstore.ChunkStore
A store of chunks (i.e. N-dimensional arrays) based on a dict of arrays.
This interprets all keyword arguments as NumPy arrays and stores them in an arrays dict. Each array is identified by its corresponding keyword. New arrays cannot be added via
put()
- they all need to be in place at store initialisation (or can be added afterwards via direct insertion into the arrays dict). The put method is only useful for in-place modification of existing arrays.-
get_chunk
(array_name, slices, dtype)¶ Get chunk from the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- dtype (
numpy.dtype
object or equivalent) – Data type of array x
Returns: chunk – Chunk as ndarray with dtype dtype and shape dictated by slices
Return type: numpy.ndarray
objectRaises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to sliceschunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)chunkstore.ChunkNotFound
– If requested chunk was not found in store
-
create_array
(array_name)¶ Create a new array if it does not already exist.
Parameters: array_name (string) – Identifier of array Raises: chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)
-
put_chunk
(array_name, slices, chunk)¶ Put chunk into the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (
numpy.ndarray
object) – Chunk as ndarray with shape commensurate with slices
Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If the shape implied by slices does not match that of chunkchunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)chunkstore.ChunkNotFound
– If array_name is incompatible with store
-
katdal.chunkstore_npy module¶
A store of chunks (i.e. N-dimensional arrays) based on NPY files.
-
class
katdal.chunkstore_npy.
NpyFileChunkStore
(path, direct_write=False)¶ Bases:
katdal.chunkstore.ChunkStore
A store of chunks (i.e. N-dimensional arrays) based on NPY files.
Each chunk is stored in a separate binary file in NumPy
.npy
format. The filename is constructed as “<path>/<array>/<idx>.npy”, where “<path>” is the chunk store directory specified on construction, “<array>” is the name of the parent array of the chunk and “<idx>” is the index string of each chunk (e.g. “00001_00512”).
For a description of the
.npy
format, seenumpy.lib.format
or the relevant NumPy Enhancement Proposal here.Parameters: - path (string) – Top-level directory that contains NPY files of chunk store
- direct_write (bool) – If true, use
O_DIRECT
when writing the file. This bypasses the OS page cache, which can be useful to avoid filling it up with files that won’t be read again.
Raises: chunkstore.StoreUnavailable
– If path does not exist / is not readablechunkstore.StoreUnavailable
– If direct_write was requested but is not available
-
get_chunk
(array_name, slices, dtype)¶ Get chunk from the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- dtype (
numpy.dtype
object or equivalent) – Data type of array x
Returns: chunk – Chunk as ndarray with dtype dtype and shape dictated by slices
Return type: numpy.ndarray
objectRaises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to sliceschunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)chunkstore.ChunkNotFound
– If requested chunk was not found in store
-
create_array
(array_name)¶ See the docstring of
ChunkStore.create_array()
.
-
put_chunk
(array_name, slices, chunk)¶ Put chunk into the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (
numpy.ndarray
object) – Chunk as ndarray with shape commensurate with slices
Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If the shape implied by slices does not match that of chunkchunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)chunkstore.ChunkNotFound
– If array_name is incompatible with store
-
mark_complete
(array_name)¶ Write a special object to indicate that array_name is finished.
This operation is idempotent.
The array_name need not correspond to any array written with
put_chunk()
. This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.It is not necessary to call
create_array()
first; the implementation will do so if appropriate.The presence of this marker can be checked with
is_complete()
.
-
is_complete
(array_name)¶ Check whether
mark_complete()
has been called for this array.
katdal.chunkstore_s3 module¶
A store of chunks (i.e. N-dimensional arrays) based on the Amazon S3 API.
-
exception
katdal.chunkstore_s3.
S3ObjectNotFound
¶ Bases:
katdal.chunkstore.ChunkNotFound
An object / bucket was not found in S3 object store.
-
exception
katdal.chunkstore_s3.
S3ServerGlitch
¶ Bases:
katdal.chunkstore.ChunkNotFound
S3 chunk store responded with an HTTP error deemed to be temporary.
-
katdal.chunkstore_s3.
read_array
(fp)¶ Read a numpy array in npy format from a file descriptor.
This is the same concept as
numpy.lib.format.read_array()
, but optimised for the case of reading from http.client.HTTPResponse
. The numpy function reads out pieces and then copies them into the array, while this implementation uses readinto. Raise
if the response runs out of data before the array is complete.It does not allow pickled dtypes.
-
exception
katdal.chunkstore_s3.
AuthorisationFailed
¶ Bases:
katdal.chunkstore.StoreUnavailable
Authorisation failed, e.g. due to invalid, malformed or expired token.
-
exception
katdal.chunkstore_s3.
InvalidToken
(token, message)¶ Bases:
katdal.chunkstore_s3.AuthorisationFailed
Invalid JSON Web Token (JWT).
-
katdal.chunkstore_s3.
decode_jwt
(token)¶ Decode JSON Web Token (JWT) string and extract claims.
The MeerKAT archive uses JWT bearer tokens for authorisation. Each token is a JSON Web Signature (JWS) string with a payload of claims. This function extracts the claims as a dict, while also doing basic checks on the token (mostly to catch copy-n-paste errors). The signature is decoded but not validated, since that would require the server secrets.
Parameters: token (str) – JWS Compact Serialization as an ASCII string (native string, not bytes) Returns: claims – The JWT Claims Set as a dict of key-value pairs Return type: dict Raises: InvalidToken
– If the token is malformed or truncated, or has expired
-
class
katdal.chunkstore_s3.
S3ChunkStore
(url, timeout=(30, 300), retries=2, token=None, credentials=None, public_read=False, expiry_days=0, **kwargs)¶ Bases:
katdal.chunkstore.ChunkStore
A store of chunks (i.e. N-dimensional arrays) based on the Amazon S3 API.
This object encapsulates the S3 client / session and its underlying connection pool, which allows subsequent get and put calls to share the connections.
The full identifier of each chunk (the “chunk name”) is given by
“<bucket>/<path>/<idx>”, where “<bucket>” refers to the relevant S3 bucket, “<bucket>/<path>” is the name of the parent array of the chunk and “<idx>” is the index string of each chunk (e.g. “00001_00512”). The corresponding S3 key string of a chunk is “<path>/<idx>.npy”, which reflects the fact that the chunk is stored as a string representation of an NPY file (complete with header).
Parameters: - url (str) – Endpoint of S3 service, e.g. ‘http://127.0.0.1:9000’. It can be specified as either bytes or unicode, and is converted to the native string type with UTF-8. The URL may also contain a path if this store is relative to an existing bucket, in which case the chunk name is a relative path (useful for unit tests).
- timeout (float or None or tuple of 2 floats or None's, optional) – Connect / read timeout, in seconds, either a single value for both or custom values as (connect, read) tuple. None means “wait forever”…
- retries (int or tuple of 2 ints or
urllib3.util.retry.Retry
, optional) – Number of connect / read retries, either a single value for both or custom values as (connect, read) tuple, or a Retry object for full customisation (including status retries). - token (str, optional) – Bearer token to authenticate
- credentials (tuple of str, optional) – AWS access key and secret key to authenticate
- public_read (bool, optional) – If set to true, new buckets will be created with a policy that allows everyone (including unauthenticated users) to read the data.
- expiry_days (int, optional) – If set to a value greater than 0 will set a future expiry time in days for any new buckets created.
- kwargs (dict) – Extra keyword arguments (unused)
Raises: chunkstore.StoreUnavailable
– If S3 server interaction failed (it’s down, no authentication, etc)-
request
(method, url, process=<function S3ChunkStore.<lambda>>, chunk_name='', ignored_errors=(), timeout=(), retries=None, **kwargs)¶ Send HTTP request to S3 server, process response and retry if needed.
This retries temporary HTTP errors, including reset connections while processing a successful response.
Parameters: - url (method,) – The standard required parameters of
requests.Session.request()
- process (function, signature
result = process(response)
, optional) – Function that will process response (just return response by default) - chunk_name (str, optional) – Name of chunk, used for error reporting only
- ignored_errors (collection of int, optional) – HTTP status codes that are treated like 200 OK, not raising an error
- timeout (float or None or tuple of 2 floats or None's, optional) – Override timeout for this request (use the store timeout by default)
- retries (int or tuple of 2 ints or
urllib3.util.retry.Retry
, optional) – Override retries for this request (use the store retries by default) - kwargs (optional) – These are passed on to
requests.Session.request()
Returns: result – The output of the process function applied to a successful response
Return type: object
Raises: AuthorisationFailed
– If the request is not authorised by appropriate token or credentialsS3ObjectNotFound
– If S3 object request fails because it does not existS3ServerGlitch
– If S3 object request fails because server is temporarily overloadedStoreUnavailable
– If a general HTTP error occurred that is not ignored
- url (method,) – The standard required parameters of
-
get_chunk
(array_name, slices, dtype)¶ Get chunk from the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- dtype (
numpy.dtype
object or equivalent) – Data type of array x
Returns: chunk – Chunk as ndarray with dtype dtype and shape dictated by slices
Return type: numpy.ndarray
objectRaises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to sliceschunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)chunkstore.ChunkNotFound
– If requested chunk was not found in store
-
create_array
(array_name)¶ See the docstring of
ChunkStore.create_array()
.
-
put_chunk
(array_name, slices, chunk)¶ Put chunk into the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (
numpy.ndarray
object) – Chunk as ndarray with shape commensurate with slices
Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objectschunkstore.BadChunk
– If the shape implied by slices does not match that of chunkchunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)chunkstore.ChunkNotFound
– If array_name is incompatible with store
-
mark_complete
(array_name)¶ Write a special object to indicate that array_name is finished.
This operation is idempotent.
The array_name need not correspond to any array written with
put_chunk()
. This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.It is not necessary to call
create_array()
first; the implementation will do so if appropriate.The presence of this marker can be checked with
is_complete()
.
-
is_complete
(array_name)¶ Check whether
mark_complete()
has been called for this array.
katdal.concatdata module¶
Class for concatenating visibility data sets.
-
exception
katdal.concatdata.
ConcatenationError
¶ Bases:
Exception
Sequence of objects could not be concatenated due to incompatibility.
-
class
katdal.concatdata.
ConcatenatedLazyIndexer
(indexers, transforms=None)¶ Bases:
katdal.lazy_indexer.LazyIndexer
Two-stage deferred indexer that concatenates multiple indexers.
This indexer concatenates a sequence of indexers along the first (i.e. time) axis. The index specification is broken down into chunks along this axis, sent to the applicable underlying indexers and the returned data are concatenated again before returning it.
Parameters: - indexers (sequence of
LazyIndexer
objects and/or arrays) – Sequence of indexers or raw arrays to be concatenated - transforms (list of
LazyTransform
objects or None, optional) – Extra chain of transforms to be applied to data after final indexing
-
name
¶ Name of first non-empty indexer (or empty string otherwise)
Type: string
Raises: InvalidTransform
– If transform chain does not obey restrictions on changing the data shape- indexers (sequence of
-
katdal.concatdata.
common_dtype
(sensor_data_sequence)¶ The dtype suitable to store all sensor data values in the given sequence.
This extracts the dtypes of a sequence of sensor data objects and finds the minimal dtype to which all of them may be safely cast using NumPy type promotion rules (which will typically be the dtype of a concatenation of the values).
Parameters: sensor_data_sequence (sequence of extracted sensor data objects) – These objects may include numpy.ndarray
andCategoricalData
Returns: dtype – The promoted dtype of the sequence, or None if sensor_data_sequence is empty Return type: numpy.dtype
object
-
class
katdal.concatdata.
ConcatenatedSensorGetter
(data)¶ Bases:
katdal.sensordata.SensorGetter
The concatenation of multiple raw (uncached) sensor data sets.
This is a convenient container for returning raw (uncached) sensor data sets from a
ConcatenatedSensorCache
object. It only accesses the underlying data sets when explicitly asked to via theget()
interface, but provides quick access to metadata such as sensor name.Parameters: data (sequence of SensorGetter
) – Uncached sensor data-
get
()¶ Retrieve the values from underlying storage.
Returns: values – Underlying data Return type: SensorData
-
-
class
katdal.concatdata.
ConcatenatedSensorCache
(caches, keep=None)¶ Bases:
katdal.sensordata.SensorCache
Sensor cache that is a concatenation of multiple underlying caches.
This concatenates a sequence of sensor caches along the time axis and makes them appear like a single sensor cache. The combined cache contains a superset of all actual and virtual sensors found in the underlying caches and replaces any missing sensor data with dummy values.
Parameters: - caches (sequence of
SensorCache
objects) – Sequence of underlying caches to be concatenated - keep (sequence of bool, optional) – Default (global) time selection specification as boolean mask that will be applied to sensor data (this can be disabled on data retrieval)
-
get
(name, select=False, extract=True, **kwargs)¶ Sensor values interpolated to correlator data timestamps.
Retrieve raw (uncached) or cached sensor data from each underlying cache and concatenate the results along the time axis.
Parameters: - name (string) – Sensor name
- select ({False, True}, optional) – True if preset time selection will be applied to returned data
- extract ({True, False}, optional) – True if sensor data should be extracted from store and cached
- kwargs (dict, optional) – Additional parameters are passed to underlying sensor caches
Returns: data – If extraction is disabled, this will be a
SensorGetter
object for uncached sensors. If selection is enabled, this will be a 1-D array of values, one per selected timestamp. If selection is disabled, this will be a 1-D array of values (of the same length as thetimestamps
attribute) for numerical data, and aCategoricalData
object for categorical data.Return type: array or
CategoricalData
orSensorGetter
objectRaises: KeyError
– If sensor name was not found in cache and did not match virtual template
- caches (sequence of
-
class
katdal.concatdata.
ConcatenatedDataSet
(datasets)¶ Bases:
katdal.dataset.DataSet
Class that concatenates existing visibility data sets.
This provides a single DataSet interface to a list of concatenated data sets. Where possible, identical targets, subarrays, spectral windows and observation sensors are merged. For more information on attributes, see the
DataSet
docstring.Parameters: datasets (sequence of DataSet
objects) – List of existing data sets-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of selection on it.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length offreqs()
and the number of correlation products B matches the length ofcorr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of selection on it.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
-
katdal.dataset module¶
Base class for accessing a visibility data set.
-
exception
katdal.dataset.
WrongVersion
¶ Bases:
Exception
Trying to access data using accessor class with the wrong version.
-
exception
katdal.dataset.
BrokenFile
¶ Bases:
Exception
Data set could not be loaded because the file is inconsistent or is missing critical bits.
-
class
katdal.dataset.
Subarray
(ants, corr_products)¶ Bases:
object
Subarray specification.
A subarray is determined by the specific correlation products produced by the correlator and the antenna objects associated with the inputs found in the correlation products.
Parameters: - ants (sequence of
katpoint.Antenna
objects) – List of antenna objects, culled to contain only antennas found in corr_products - corr_products (sequence of (string, string) pairs, length B) – Correlation products as pairs of input labels, e.g. (‘ant1h’, ‘ant2v’), exposed as an array of strings with shape (B, 2)
-
inputs
¶ List of correlator input labels found in corr_products, e.g. ‘ant1h’
Type: list of strings
-
katdal.dataset.
parse_url_or_path
(url_or_path)¶ Parse URL into components, converting path to absolute file URL.
Parameters: url_or_path (string) – URL, or filesystem path if there is no scheme Returns: url_parts – Components of the parsed URL (‘file’ scheme will have an absolute path) Return type: urllib.parse.ParseResult
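A minimal sketch of how this helper might be used; the filename and URL below are purely illustrative:
from katdal.dataset import parse_url_or_path

parts = parse_url_or_path('1234567890_sdp_l0.rdb')        # plain path -> 'file' scheme, absolute path
print(parts.scheme, parts.path)
parts = parse_url_or_path('https://archive/1234567890/1234567890_sdp_l0.rdb')
print(parts.scheme, parts.netloc, parts.path)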
-
class
katdal.dataset.
DataSet
(name, ref_ant='', time_offset=0.0, url='')¶ Bases:
object
Base class for accessing a visibility data set.
This provides a simple interface to a generic file (or files) containing visibility data (both single-dish and interferometer data supported). The data are not loaded into memory on opening the file, but are accessible via properties after typically selecting a subset of the data. This allows the reading of huge files.
Parameters: - name (string) – Name / identifier of data set
- ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use by script)
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- url (string, optional) – Location of data set (either local filename or full URL accepted)
-
version
¶ Format version string
Type: string
-
observer
¶ Name of person that recorded the data set
Type: string
-
description
¶ Short description of the purpose of the data set
Type: string
-
experiment_id
¶ Experiment ID, a unique string used to link the data files of an experiment together with blog entries, etc.
Type: string
-
obs_params
¶ Observation parameters, typically set in observation script
Type: dict mapping string to string or list of strings
-
obs_script_log
¶ Observation script output log (useful for debugging)
Type: list of strings
-
subarrays
¶ List of all subarrays in data set
Type: list of SubArray
objects
-
subarray
¶ Index of currently selected subarray
Type: int
-
ants
¶ List of selected antennas
Type: list of katpoint.Antenna
objects
-
inputs
¶ List of selected correlator input labels (‘ant1h’)
Type: array of strings
-
corr_products
¶ Array of selected correlation products as pairs of input labels (e.g. [(‘ant1h’, ‘ant1h’), (‘ant1h’, ‘ant2h’)])
Type: array of strings, shape (B, 2)
-
receivers
¶ Identifier of the active receiver on each antenna
Type: dict mapping string to string or list of strings
-
spectral_windows
¶ List of all spectral windows in data set
Type: list of SpectralWindow
objects
-
spw
¶ Index of currently selected spectral window
Type: int
-
channel_width
¶ Channel bandwidth of selected spectral window, in Hz
Type: float
-
freqs / channel_freqs
Centre frequency of each selected channel, in Hz
Type: array of float, shape (F,)
-
channels
¶ Original channel indices of selected channels
Type: array of int, shape (F,)
-
dump_period
¶ Dump period, in seconds
Type: float
-
sensor
¶ Sensor cache
Type: SensorCache
object
-
catalogue
¶ Catalogue of all targets / sources / fields in data set
Type: katpoint.Catalogue
object
-
start_time
¶ Timestamp of start of first sample in file, in UT seconds since Unix epoch
Type: katpoint.Timestamp
object
-
end_time
¶ Timestamp of end of last sample in file, in UT seconds since Unix epoch
Type: katpoint.Timestamp
object
-
dumps
¶ Original dump indices of selected dumps
Type: array of int, shape (T,)
-
scan_indices
¶ List of currently selected scans as indices
Type: list of int
-
compscan_indices
¶ List of currently selected compound scans as indices
Type: list of int
-
target_indices
¶ List of currently selected targets as indices into catalogue
Type: list of int
-
target_projection
¶ Type of spherical projection for target coordinates
Type: {‘ARC’, ‘SIN’, ‘TAN’, ‘STG’, ‘CAR’}, optional
-
target_coordsys
¶ Spherical pointing coordinate system for target coordinates
Type: {‘azel’, ‘radec’}, optional
-
shape
¶ Shape of selected visibility data array, as (T, F, B)
Type: tuple of 3 ints
-
size
¶ Size of selected visibility data array, in bytes
Type: int
-
applycal_products
¶ List of calibration products that will be applied to data
Type: list of string
-
select
(**kwargs)¶ Select subset of data, based on time / frequency / corrprod filters.
This applies a set of selection criteria to the data set, which updates the data set properties and attributes to match the selection. In other words, the
timestamps() and vis() methods will return the selected subset of the data, while attributes such as ants, channel_freqs and shape are updated. The sensor cache will also return the selected subset of sensor data via the __getitem__ interface. This function returns nothing, but modifies the existing data set in-place.
The selection criteria are divided into groups, based on whether they affect the time, frequency or correlation product dimension:
* Time: `dumps`, `timerange`, `scans`, `compscans`, `targets`
* Frequency: `channels`, `freqrange`
* Correlation product: `corrprods`, `ants`, `inputs`, `pol`
The subarray and spw criteria are special, as they affect multiple dimensions (time + correlation product and time + frequency, respectively), are always active and are forced to be a single index.
If there are multiple criteria on the same dimension within a select() call, they are ANDed together, while multiple items within the same criterion (e.g. targets=[‘Hyd A’, ‘Vir A’]) are ORed together. When a second select() call is done, all new selections replace previous selections on the same dimension, while existing selections on other dimensions are preserved. The reset parameter finetunes this behaviour.
If
select()
is called without any parameters, the selection is reset to the original data set.
In addition, the weights and flags criteria are lists of names that select which weights and flags to include in the corresponding data set property.
Parameters: - strict ({True, False}, optional) – True if select() raises TypeError if it encounters an unknown kwarg
- dumps (int or slice or sequence of ints or sequence of bools, optional) – Select dumps by index, slice or boolean mask of length T (keep dumps where mask is True)
- timerange (sequence of 2
katpoint.Timestamp
objects or equivalent, optional) – Select range of times between given start and end times - scans (int or string or sequence, optional) – Select scans by index or state (or negate state by prepending ‘~’)
- compscans (int or string or sequence, optional) – Select compscans by index or label (or negate label by prepending ‘~’)
- targets (int or string or
katpoint.Target
object or sequence, optional) – Select targets by index or name or description or object - spw (int, optional) – Select spectral window by index (only one may be active)
- channels (int or slice or sequence of ints or sequence of bools, optional) – Select frequency channels by index, slice or boolean mask of length F (keep channels where mask is True)
- freqrange (sequence of 2 floats, optional) – Select range of frequencies between start and end frequencies, in Hz
- subarray (int, optional) – Select subarray by index (only one may be active)
- corrprods (int or slice or sequence of ints or sequence of bools or sequence of string pairs or {‘auto’, ‘cross’}, optional) – Select correlation products by index, slice or boolean mask of length B (keep products where mask is True). Alternatively, select by value via a sequence of string pairs, or select all autocorrelations via ‘auto’ or all cross-correlations via ‘cross’.
- ants (string or
katpoint.Antenna
object or sequence, optional) – Select antennas by name or object. If all antennas specified are prefaced by a ~ this is treated as a deselection and these antennas are excluded. - inputs (string or sequence of strings, optional) – Select inputs by label
- pol (string or sequence of strings, {‘H’, ‘V’, ‘HH’, ‘VV’, ‘HV’, ‘VH’}, optional) – Select polarisation terms
- weights ('all' or string or sequence of strings, optional) – List of names of weights to be multiplied together, as a sequence or string of comma-separated names (combine all weights by default)
- flags ('all' or string or sequence of strings, optional) – List of names of flags that will be OR’ed together, as a sequence or string of comma-separated names (use all flags by default). An empty string or sequence discards all flags.
- reset ({'auto', '', 'T', 'F', 'B', 'TF', 'TB', 'FB', 'TFB'}, optional) – Remove existing selections on specified dimensions before applying the new selections. The default ‘auto’ option clears those dimensions that will be modified by the new selections and leaves the selections on unaffected dimensions intact except if select is called without any parameters, in which case all selections are cleared. By setting reset to ‘’, new selections apply on top of existing selections.
Raises: TypeError
– If a keyword argument is unknown and strict is enabled
IndexError
– If spw or subarray is out of range
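As an illustration of these rules, the following sketch assumes an already opened data set d; the target, antenna and channel values are purely illustrative:
# Criteria on different dimensions are ANDed; items within one criterion are ORed
d.select(scans='track', targets=['Hyd A', 'Vir A'], channels=slice(200, 800), ants=['ant1', 'ant2'])
print(d.shape)                 # (T, F, B) now reflects the selection
# A second call replaces selections on the dimensions it touches (reset='auto')
d.select(targets='3C 273')     # new time selection; frequency and corrprod selections kept
d.select(reset='F')            # clear only the frequency selection
d.select()                     # no parameters: back to the original data set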
-
scans
()¶ Generator that iterates through scans in data set.
This iterates through the currently selected list of scans, returning the scan index, scan state and associated target object. In addition, after each iteration the data set will reflect the scan selection, i.e. the timestamps, visibilities, sensor values, etc. will be those of the current scan. The scan selection applies on top of any existing selection.
Yields: - scan (int) – Scan index
- state (string) – Scan state
- target (
katpoint.Target
object) – Target associated with scan
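A short sketch of the typical iteration pattern, assuming an opened data set d:
for scan_index, state, target in d.scans():
    # Inside the loop, d reflects only the current scan
    print(scan_index, state, target.name, d.shape[0], 'dumps')
    if state == 'track':
        vis = d.vis[:]         # load visibilities for this scan only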
-
compscans
()¶ Generator that iterates through compound scans in data set.
This iterates through the currently selected list of compound scans, returning the compound scan index, label and the first associated target object. In addition, after each iteration the data set will reflect the compound scan selection, i.e. the timestamps, visibilities, sensor values, etc. will be those of the current compound scan. The compound scan selection applies on top of any existing selection.
Yields: - compscan (int) – Compound scan index
- label (string) – Compound scan label
- target (
katpoint.Target
object) – First target associated with compound scan
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.
-
vis
¶ Complex visibility data as a function of time, frequency and corrprod.
The visibility data are returned as an array of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().
The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\) i.e. phase that increases with time.
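As a sketch of typical usage (assuming an opened data set d), a small block of data can be pulled out without loading the full array:
d.select(channels=slice(0, 64), corrprods='auto')
vis_block = d.vis[:10]         # first 10 dumps of the current selection
print(vis_block.shape, vis_block.dtype)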
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().
-
flags
¶ Visibility flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
-
mjd
¶ Visibility timestamps in Modified Julian Days (MJD).
The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.
-
lst
¶ Local sidereal time at the reference antenna in hours.
The sidereal times are returned in an array of float, shape (T,).
-
az
¶ Azimuth angle of each dish in degrees.
The azimuth angles are returned in an array of float, shape (T, A).
-
el
¶ Elevation angle of each dish in degrees.
The elevation angles are returned in an array of float, shape (T, A).
-
ra
¶ Right ascension of the actual pointing of each dish in J2000 degrees.
The right ascensions are returned in an array of float, shape (T, A).
-
dec
¶ Declination of the actual pointing of each dish in J2000 degrees.
The declinations are returned in an array of float, shape (T, A).
-
parangle
¶ Parallactic angle of the actual pointing of each dish in degrees.
The parallactic angle is the position angle of the observer’s vertical on the sky, measured from north toward east. This is the angle between the great-circle arc connecting the celestial North pole to the dish pointing direction, and the great-circle arc connecting the zenith above the antenna to the pointing direction, or the angle between the hour circle and vertical circle through the pointing direction. It is returned as an array of float, shape (T, A).
-
target_x
¶ Target x coordinate of each dish in degrees.
The target coordinates are projections of the spherical coordinates of the dish pointing direction to a plane with the target position at the origin. The type of projection (e.g. ARC, SIN, etc.) and spherical pointing coordinate system (e.g. azel or radec) can be set via the
target_projection
and target_coordsys
attributes, respectively. The target x coordinates are returned as an array of float, shape (T, A).
-
target_y
¶ Target y coordinate of each dish in degrees.
The target coordinates are projections of the spherical coordinates of the dish pointing direction to a plane with the target position at the origin. The type of projection (e.g. ARC, SIN, etc.) and spherical pointing coordinate system (e.g. azel or radec) can be set via the
target_projection
and target_coordsys
attributes, respectively. The target y coordinates are returned as an array of float, shape (T, A).
-
u
¶ U coordinate for each correlation product in metres.
This calculates the u coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(u_1 - u_2\) for baseline (ant1, ant2).
-
v
¶ V coordinate for each correlation product in metres.
This calculates the v coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(v_1 - v_2\) for baseline (ant1, ant2).
-
w
¶ W coordinate for each correlation product in metres.
This calculates the w coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(w_1 - w_2\) for baseline (ant1, ant2).
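A sketch of gathering the full (u, v, w) coordinates, e.g. for imaging, assuming an opened data set d:
import numpy as np

# d.u, d.v and d.w each have shape (T, B); stack them into a (T, B, 3) array
uvw = np.stack([d.u, d.v, d.w], axis=-1)
print(uvw.shape)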
katdal.datasources module¶
Various sources of correlator data and metadata.
-
exception
katdal.datasources.
DataSourceNotFound
¶ Bases:
Exception
File associated with DataSource not found or server not responding.
-
class
katdal.datasources.
AttrsSensors
(attrs, sensors)¶ Bases:
object
Metadata in the form of attributes and sensors.
Parameters: - attrs (mapping from string to object) – Metadata attributes
- sensors (mapping from string to
SensorGetter
objects) – Metadata sensor cache mapping sensor names to raw sensor data
-
class
katdal.datasources.
DataSource
(metadata, timestamps, data=None)¶ Bases:
object
A generic data source presenting both correlator data and metadata.
Parameters: - metadata (
AttrsSensors
object) – Metadata attributes and sensors - timestamps (array-like of float, length T) – Timestamps at centroids of visibilities in UTC seconds since Unix epoch
- data (
VisFlagsWeights
object, optional) – Correlator data (visibilities, flags and weights)
-
katdal.datasources.
view_capture_stream
(telstate, capture_block_id, stream_name)¶ Create telstate view based on given capture block ID and stream name.
It constructs a view on telstate with at least the prefixes
- <capture_block_id>_<stream_name>
- <capture_block_id>
- <stream_name>
Additionally if there is a <stream_name>_inherit key, that stream is added too (recursively).
Parameters: - telstate (
katsdptelstate.TelescopeState
object) – Original telescope state - capture_block_id (string) – Capture block ID
- stream_name (string) – Stream name
Returns: telstate – Telstate with a view that incorporates capture block, stream and combo
Return type: TelescopeState
object
-
katdal.datasources.
view_l0_capture_stream
(telstate, capture_block_id=None, stream_name=None, **kwargs)¶ Create telstate view based on auto-determined capture block ID and stream name.
This figures out the appropriate capture block ID and L0 stream name from a capture-stream specific telstate, or uses the provided ones. It then calls
view_capture_stream()
to generate a view.
Parameters: - telstate (
katsdptelstate.TelescopeState
object) – Original telescope state - capture_block_id (string, optional) – Specify capture block ID explicitly (detected otherwise)
- stream_name (string, optional) – Specify L0 stream name explicitly (detected otherwise)
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Returns: - telstate (
TelstateToStr
object) – Telstate with a view that incorporates capture block, stream and combo - capture_block_id (string) – Actual capture block ID used
- stream_name (string) – Actual L0 stream name used
Raises: ValueError
– If no capture block or L0 stream could be detected (with no override)- telstate (
-
katdal.datasources.
infer_chunk_store
(url_parts, telstate, npy_store_path=None, s3_endpoint_url=None, array='correlator_data', **kwargs)¶ Construct chunk store automatically from dataset URL and telstate.
Parameters: - url_parts (
urlparse.ParseResult
object) – Parsed dataset URL - telstate (
TelstateToStr
object) – Telescope state - npy_store_path (string, optional) – Top-level directory of NpyFileChunkStore (overrides the default)
- s3_endpoint_url (string, optional) – Endpoint of S3 service, e.g. ‘http://127.0.0.1:9000’ (overrides default)
- array (string, optional) – Array within the bucket from which to determine the prefix
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Returns: store – Chunk store for visibility data
Return type: katdal.ChunkStore
object
Raises: KeyError – If telstate lacks critical keys
katdal.chunkstore.StoreUnavailable
– If the chunk store could not be constructed
-
class
katdal.datasources.
TelstateDataSource
(telstate, capture_block_id, stream_name, chunk_store=None, timestamps=None, url='', upgrade_flags=True, van_vleck='off', preselect=None, **kwargs)¶ Bases:
katdal.datasources.DataSource
A data source based on
katsdptelstate.TelescopeState
. It is assumed that the provided telstate already has the appropriate views to find observation, stream and chunk store information. It typically needs the following prefixes:
- <capture block ID>_<L0 stream>
- <capture block ID>
- <L0 stream>
Parameters: - telstate (
katsdptelstate.TelescopeState
object) – Telescope state with appropriate views - capture_block_id (string) – Capture block ID
- stream_name (string) – Name of the L0 stream
- chunk_store (
katdal.ChunkStore
object, optional) – Chunk store for visibility data (the default is no data - metadata only) - timestamps (array of float, optional) – Visibility timestamps, overriding (or fixing) the ones found in telstate
- url (string, optional) – Location of the telstate source
- upgrade_flags (bool, optional) – Look for associated flag streams and use them if True (default)
- van_vleck ({'off', 'autocorr'}, optional) – Type of Van Vleck (quantisation) correction to perform
- preselect (dict, optional) –
Subset of data to select. The keys in the dictionary correspond to the keyword arguments of
DataSet.select()
, but with restrictions:
- Only channels and dumps can be specified.
- The values must be slices with unit step.
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Raises: KeyError
– If telstate lacks critical keys
IndexError – If preselect does not meet the criteria above.
-
classmethod
from_url
(url, chunk_store='auto', **kwargs)¶ Construct TelstateDataSource from URL or RDB filename.
The following URL styles are supported:
- Local RDB filename (no scheme): ‘1556574656/1556574656_sdp_l0.rdb’
- Archive: ‘https://archive/1556574656/1556574656_sdp_l0.rdb?token=<>’
- Redis server: ‘redis://cal5.sdp.mkat.karoo.kat.ac.za:31852’
Parameters: - url (string) – URL or RDB filename serving as entry point to data set
- chunk_store (
katdal.ChunkStore
object, optional) – Chunk store for visibility data (obtained automatically by default, or set to None for metadata-only data set) - kwargs (dict, optional) – Extra keyword arguments passed to init, telstate view, chunk store init
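A minimal sketch of constructing a metadata-only data source from a local RDB file (the filename is the one used in the URL styles above and is purely illustrative):
from katdal.datasources import TelstateDataSource

# chunk_store=None skips visibility data and loads metadata from telstate only
source = TelstateDataSource.from_url('1556574656_sdp_l0.rdb', chunk_store=None)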
-
katdal.datasources.
open_data_source
(url, **kwargs)¶ Construct the data source described by the given URL.
katdal.flags module¶
Definitions of flag bits
katdal.h5datav1 module¶
Data accessor class for HDF5 files produced by Fringe Finder correlator.
-
class
katdal.h5datav1.
H5DataV1
(filename, ref_ant='', time_offset=0.0, mode='r', **kwargs)¶ Bases:
katdal.dataset.DataSet
Load HDF5 format version 1 file produced by Fringe Finder correlator.
For more information on attributes, see the
DataSet
docstring.
Parameters: - filename (string) – Name of HDF5 file
- ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
-
file
¶ Underlying HDF5 file, exposed via
h5py
interface
Type: h5py.File
object
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\) i.e. phase that increases with time.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
katdal.h5datav2 module¶
Data accessor class for HDF5 files produced by KAT-7 correlator.
-
katdal.h5datav2.
get_single_value
(group, name)¶ Return single value from attribute or dataset with given name in group.
If name is an attribute of the HDF5 group group, it is returned; otherwise it is interpreted as an HDF5 dataset of group and its last value is returned. This is meant to retrieve static configuration values that potentially get set more than once during capture initialisation, but then do not change during actual capturing.
Parameters: - group (
h5py.Group
object) – HDF5 group to query - name (string) – Name of HDF5 attribute or dataset to query
Returns: value – Attribute or last value of dataset
Return type: object
- group (
-
katdal.h5datav2.
dummy_dataset
(name, shape, dtype, value)¶ Dummy HDF5 dataset containing a single value.
This creates a dummy HDF5 dataset in memory containing a single value. It can have virtually unlimited size as the dataset is highly compressed.
Parameters: - name (string) – Name of dataset
- shape (sequence of int) – Shape of dataset
- dtype (
numpy.dtype
object or equivalent) – Type of data stored in dataset - value (object) – All elements in the dataset will equal this value
Returns: dataset – Dummy HDF5 dataset
Return type: h5py.Dataset
object
-
class
katdal.h5datav2.
H5DataV2
(filename, ref_ant='', time_offset=0.0, mode='r', quicklook=False, keepdims=False, **kwargs)¶ Bases:
katdal.dataset.DataSet
Load HDF5 format version 2 file produced by KAT-7 correlator.
For more information on attributes, see the
DataSet
docstring.
Parameters: - filename (string) – Name of HDF5 file
- ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
- quicklook ({False, True}) – True if synthesised timestamps should be used to partition data set even if real timestamps are irregular, thereby avoiding the slow loading of real timestamps at the cost of slightly inaccurate label borders
- keepdims ({False, True}, optional) – Force vis / weights / flags to be 3-dimensional, regardless of selection
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
-
file
¶ Underlying HDF5 file, exposed via
h5py
interface
Type: h5py.File
object
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\) i.e. phase that increases with time.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
katdal.h5datav3 module¶
Data accessor class for HDF5 files produced by RTS correlator.
-
katdal.h5datav3.
dummy_dataset
(name, shape, dtype, value)¶ Dummy HDF5 dataset containing a single value.
This creates a dummy HDF5 dataset in memory containing a single value. It can have virtually unlimited size as the dataset is highly compressed.
Parameters: - name (string) – Name of dataset
- shape (sequence of int) – Shape of dataset
- dtype (
numpy.dtype
object or equivalent) – Type of data stored in dataset - value (object) – All elements in the dataset will equal this value
Returns: dataset – Dummy HDF5 dataset
Return type: h5py.Dataset
object
-
class
katdal.h5datav3.
H5DataV3
(filename, ref_ant='', time_offset=0.0, mode='r', time_scale=None, time_origin=None, rotate_bls=False, centre_freq=None, band=None, keepdims=False, **kwargs)¶ Bases:
katdal.dataset.DataSet
Load HDF5 format version 3 file produced by RTS correlator.
For more information on attributes, see the
DataSet
docstring.
Parameters: - filename (string) – Name of HDF5 file
- ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
- time_scale (float or None, optional) – Resynthesise timestamps using this scale factor
- time_origin (float or None, optional) – Resynthesise timestamps using this sync time / epoch
- rotate_bls ({False, True}, optional) – Rotate baseline label list to work around early RTS correlator bug
- centre_freq (float or None, optional) – Override centre frequency if provided, in Hz
- band (string or None, optional) – Override receiver band if provided (e.g. ‘l’) - used to find ND models
- keepdims ({False, True}, optional) – Force vis / weights / flags to be 3-dimensional, regardless of selection
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
-
file
¶ Underlying HDF5 file, exposed via
h5py
interface
Type: h5py.File
object
-
stream_name
¶ Name of L0 data stream, for finding corresponding telescope state keys
Type: string
Notes
The timestamps can be resynchronised from the original sample counter values by specifying time_scale and/or time_origin. The basic formula is given by:
timestamp = sample_counter / time_scale + time_origin
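For example, resynthesised timestamps can be requested when opening the file; this is only a sketch and the filename, scale and origin values below are hypothetical:
import katdal

# Extra keyword arguments are passed through katdal.open to the H5DataV3 accessor
d = katdal.open('1234567890.h5', time_scale=214e6, time_origin=1425601234.0)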
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\) i.e. phase that increases with time.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
katdal.lazy_indexer module¶
Two-stage deferred indexer for objects with expensive __getitem__ calls.
-
katdal.lazy_indexer.
dask_getitem
(x, indices)¶ Index a dask array, with N-D fancy index support and better performance.
This is a drop-in replacement for
x[indices]
that goes one further by implementing “N-D fancy indexing” which is still unsupported in dask. If indices contains multiple fancy indices, perform outer (oindex) indexing. This behaviour deviates from NumPy, which performs the more general (but also more obtuse) vectorized (vindex) indexing in this case. See NumPy NEP 21, dask #433 and h5py #652 for more details.In addition, this optimises performance by culling unnecessary nodes from the dask graph after indexing, which makes it cheaper to compute if only a small piece of the graph is needed, and by collapsing fancy indices in indices to slices where possible (which also implies oindex semantics).
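A small sketch of the outer-indexing behaviour on a toy dask array; the array contents are arbitrary and it is assumed that, like x[indices], the result is still a dask array:
import dask.array as da
import numpy as np
from katdal.lazy_indexer import dask_getitem

x = da.from_array(np.arange(24).reshape(4, 6), chunks=(2, 3))
rows = np.array([0, 2])
cols = np.array([1, 3, 5])
# Two fancy indices trigger outer (oindex) indexing, so the result has shape (2, 3)
sub = dask_getitem(x, (rows, cols)).compute()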
-
exception
katdal.lazy_indexer.
InvalidTransform
¶ Bases:
Exception
Transform changes data shape in unallowed way.
-
class
katdal.lazy_indexer.
LazyTransform
(name=None, transform=<function LazyTransform.<lambda>>, new_shape=<function LazyTransform.<lambda>>, dtype=None)¶ Bases:
object
Transformation to be applied by LazyIndexer after final indexing.
A
LazyIndexer
potentially applies a chain of transforms to the data after the final second-stage indexing is done. These transforms are restricted in their capabilities to simplify the indexing process. Specifically, when it comes to the data shape, transforms may only:- add dimensions at the end of the data shape, or - drop dimensions at the end of the data shape.
The preserved dimensions are not allowed to change their shape or interpretation so that the second-stage indexing matches the first-stage indexing on these dimensions. The data type (aka dtype) is allowed to change.
Parameters: - name (string or None, optional) – Name of transform
- transform (function, signature
data = f(data, keep)
, optional) – Transform to apply to data (keep is user-specified second-stage index) - new_shape (function, signature
new_shape = f(old_shape)
, optional) – Function that predicts data array shape tuple after first-stage indexing and transformation, given its original shape tuple as input. Restrictions apply as described above. - dtype (
numpy.dtype
object or equivalent or None, optional) – Type of output array after transformation (None if same as input array)
-
class
katdal.lazy_indexer.
LazyIndexer
(dataset, keep=slice(None, None, None), transforms=None)¶ Bases:
object
Two-stage deferred indexer for objects with expensive __getitem__ calls.
This class was originally designed to extend and speed up the indexing functionality of HDF5 datasets as found in
h5py
, but works on any equivalent object (defined as any object with shape, dtype and __getitem__ members) where a call to __getitem__ may be very expensive. The following discussion focuses on the HDF5 use case as the main example.Direct extraction of a subset of an HDF5 dataset via the __getitem__ interface (i.e. dataset[index]) has a few issues:
- Data access can be very slow (or impossible) if a very large dataset is fully loaded into memory and then indexed again at a later stage
- Advanced indexing (via boolean masks or sequences of integer indices) is only supported on a single dimension in the current version of h5py (2.0)
- Even though advanced indexing has limited support, simple indexing (via single integer indices or slices) is frequently much faster.
This class wraps an
h5py.Dataset
or equivalent object and exposes a new __getitem__ interface on it. It efficiently composes two stages of indexing: a first stage specified at object instantiation time and a second stage that applies on top of the first stage when __getitem__ is called on this object. The data are only loaded after the combined index is determined, addressing issue 1.Furthermore, advanced indexing is allowed on any dimension by decomposing the selection as a series of slice selections covering contiguous segments of the dimension to alleviate issue 2. Finally, this also allows faster data retrieval by extracting a large slice from the HDF5 dataset and then performing advanced indexing on the resulting
numpy.ndarray
object instead, in response to issue 3.The keep parameter of the
__init__()
and__getitem__()
methods accepts a generic index or slice specification, i.e. anything that would be accepted by the__getitem__()
method of anumpy.ndarray
of the same shape as the dataset. This could be a single integer index, a sequence of integer indices, a slice object (representing the colon operator commonly used with __getitem__, e.g. representing x[1:10:2] as x[slice(1,10,2)]), a sequence of booleans as a mask, or a tuple containing any number of these (typically one index item per dataset dimension). Any missing dimensions will be fully selected, and any extra dimensions will be ignored.Parameters: - dataset (
h5py.Dataset
object or equivalent) – Underlying dataset or array object on which lazy indexing will be done. This can be any object with shape, dtype and __getitem__ members. - keep (NumPy index expression, optional) – First-stage index as a valid index or slice specification (supports arbitrary slicing or advanced indexing on any dimension)
- transforms (list of
LazyTransform
objects or None, optional) – Chain of transforms to be applied to data after final indexing. The chain as a whole may only add or drop dimensions at the end of data shape without changing the preserved dimensions.
-
name
¶ Name of HDF5 dataset (or empty string for unnamed ndarrays, etc.)
Type: string
Raises: InvalidTransform
– If transform chain does not obey restrictions on changing the data shape
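A sketch of the two-stage indexing on a plain NumPy array standing in for an HDF5 dataset; the shapes and indices are purely illustrative:
import numpy as np
from katdal.lazy_indexer import LazyIndexer

data = np.arange(1000).reshape(10, 10, 10)
indexer = LazyIndexer(data, keep=np.s_[::2])   # first-stage: keep every second "dump"
print(indexer.shape)                           # shape after the first stage
subset = indexer[2:4, :, [0, 5]]               # second stage composes with the first; data read here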
-
class
katdal.lazy_indexer.
DaskLazyIndexer
(dataset, keep=(), transforms=())¶ Bases:
object
Turn a dask Array into a LazyIndexer by computing it upon indexing.
The LazyIndexer wraps an underlying dataset in the form of a dask Array. Upon first use, it applies a stored first-stage selection (keep) to the array, followed by a series of transforms. All of these actions are lazy and only update the dask graph of the dataset. Since these updates are computed only on first use, there is minimal cost in constructing an instance and immediately throwing it away again.
Second-stage selection occurs via a
__getitem__()
call on this object, which also triggers dask computation to return the finalnumpy.ndarray
output. Both selection steps follow outer indexing (“oindex”) semantics, by indexing each dimension / axis separately.DaskLazyIndexers can also index other DaskLazyIndexers, which allows them to share first-stage selections and/or transforms, and to construct nested or hierarchical indexers.
Parameters: - dataset (
dask.Array
orDaskLazyIndexer
) – The full dataset, from which a subset is chosen by keep - keep (NumPy index expression, optional) – Index expression describing first-stage selection (e.g. as applied by
katdal.DataSet.select()
), with oindex semantics - transforms (sequence of function, signature
array = f(array)
, optional) – Transformations that are applied after indexing by keep but before indexing on this object. Each transformation is a callable that takes a dask array and returns another dask array.
-
name
¶ The name of the (full) underlying dataset, useful for reporting
Type: str
-
dataset
¶ The dask array that is accessed by indexing (after applying keep and transforms). It can be used directly to perform dask computations.
Type: dask.Array
-
transforms
¶ Transformations that are applied after first-stage indexing.
-
dataset
Array after first-stage indexing and transformation.
-
classmethod
get
(arrays, keep, out=None)¶ Extract several arrays from the underlying dataset.
This is a variant of
__getitem__()
that pulls from several arrays jointly. This can be significantly more efficient if intermediate dask nodes can be shared.
Parameters: - arrays (list of
DaskLazyIndexer
) – Arrays to index - keep (NumPy index expression) – Second-stage index as a valid index or slice specification (supports arbitrary slicing or advanced indexing on any dimension)
- out (list of
np.ndarray
) – If specified, output arrays in which to store results. It must be the same length as arrays and each array must have the appropriate shape and dtype.
Returns: out – Extracted output array (computed from the final dask version)
Return type: sequence of
numpy.ndarray
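A sketch of pulling visibilities, weights and flags jointly so that shared dask nodes are only computed once, assuming an opened MVF v4 data set d whose vis, weights and flags are DaskLazyIndexer objects:
import numpy as np
from katdal.lazy_indexer import DaskLazyIndexer

keep = np.s_[0:10]             # first 10 dumps of the current selection
vis, weights, flags = DaskLazyIndexer.get([d.vis, d.weights, d.flags], keep)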
-
shape
¶ Shape of array after first-stage indexing and transformation.
-
dtype
¶ Data type of array after first-stage indexing and transformation.
katdal.ms_async module¶
katdal.ms_extra module¶
katdal.sensordata module¶
Container that stores cached (interpolated) and uncached (raw) sensor data.
-
class
katdal.sensordata.
SensorData
(name, timestamp, value, status=None)¶ Bases:
object
Raw (uninterpolated) sensor values.
This is a simple struct that holds timestamps, values, and optionally status.
Parameters: - name (string) – Sensor name
- timestamp (np.ndarray) – Array of timestamps
- value (np.ndarray) – Array of values (wrapped in
ComparableArrayWrapper
if necessary) - status (np.ndarray, optional) – Array of sensor statuses
-
class
katdal.sensordata.
SensorGetter
(name)¶ Bases:
object
Raw (uninterpolated) sensor data placeholder.
This is an abstract lazy interface that provides a
SensorData
object on request but does not store values itself. Subclasses must implementget()
to retrieve values from underlying storage. They should not cache the results.Where possible, object-valued sensors (including sensors with ndarrays as values) will have values wrapped by
ComparableArrayWrapper
.Parameters: name (string) – Sensor name -
get
()¶ Retrieve the values from underlying storage.
Returns: values – Underlying data Return type: SensorData
-
-
class
katdal.sensordata.
SimpleSensorGetter
(name, timestamp, value, status=None)¶ Bases:
katdal.sensordata.SensorGetter
Raw sensor data held in memory.
This is a simple wrapper for
SensorData
that implements theSensorGetter
interface.-
get
()¶ Retrieve the values from underlying storage.
Returns: values – Underlying data Return type: SensorData
-
-
class
katdal.sensordata.
RecordSensorGetter
(data, name=None)¶ Bases:
katdal.sensordata.SensorGetter
Raw (uninterpolated) sensor data in record array form.
This is a wrapper for uninterpolated sensor data which resembles a record array with fields ‘timestamp’, ‘value’ and optionally ‘status’. This is also the typical format of HDF5 datasets used to store sensor data.
Technically, the data is interpreted as a NumPy “structured” array, which is a simpler version of a recarray that only provides item-style access to fields and not attribute-style access.
Object-valued sensors are not treated specially in this class, as it is assumed that any wrapping already occurred in the construction of the recarray-like data input and will be reflected in its dtype. The original HDF5 sensor datasets also did not contain any objects as they only support standard KATCP types, so there was no need for wrapping there.
Parameters: - data (recarray-like, with fields 'timestamp', 'value' and optionally 'status') – Uninterpolated sensor data as structured array or equivalent (such as
an
h5py.Dataset
) - name (string or None, optional) – Sensor name (assumed to be data.name by default, if it exists)
-
katdal.sensordata.
to_str
(value)¶ Convert string-likes to the native string type.
Bytes are decoded to str, with the surrogateescape error handler.
Tuples, lists, dicts and numpy arrays are processed recursively, with the exception that numpy structured types with string or object fields won’t be handled.
-
katdal.sensordata.
telstate_decode
(raw, no_decode=())¶ Load a katsdptelstate-encoded value that might be wrapped in np.void or np.ndarray.
The np.void/np.ndarray wrapping is needed to pass variable-length binary strings through h5py.
If the value is a string and is in no_decode, it is returned verbatim. This is for backwards compatibility with older files that didn’t use any encoding at all.
The return value is also passed through
to_str()
.
-
class
katdal.sensordata.
H5TelstateSensorGetter
(data, name=None)¶ Bases:
katdal.sensordata.RecordSensorGetter
Raw (uninterpolated) sensor data in HDF5 TelescopeState recarray form.
This wraps the telstate sensors stored in recent HDF5 files. It differs in two ways from the normal HDF5 sensors: no ‘status’ field and values encoded by katsdptelstate.
TODO: This is a temporary fix to get at missing sensors in telstate and should be replaced by a proper wrapping of any telstate object.
Object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by
ComparableArrayWrapper
.Parameters: - data (recarray-like, with fields ('timestamp', 'value')) – Uninterpolated sensor data as structured array or equivalent (such as
an
h5py.Dataset
) - name (string or None, optional) – Sensor name (assumed to be data.name by default, if it exists)
-
get
()¶ Extract timestamp and value of each sensor data point.
-
class
katdal.sensordata.
TelstateToStr
(telstate)¶ Bases:
object
Wrap an existing telescope state and pass return values through
to_str()
-
wrapped
¶
-
view
(name, add_separator=True, exclusive=False)¶
-
root
()¶
-
get_message
(channel=None)¶
-
get
(key, default=None, return_encoded=False)¶
-
get_range
(key, st=None, et=None, include_previous=None, include_end=False, return_encoded=False)¶
-
get_indexed
(key, sub_key, default=None, return_encoded=False)¶
-
-
class
katdal.sensordata.
TelstateSensorGetter
(telstate, name)¶ Bases:
katdal.sensordata.SensorGetter
Raw (uninterpolated) sensor data stored in original TelescopeState.
This wraps sensor data stored in a TelescopeState object. The data is only read out on item access.
Object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by
ComparableArrayWrapper
.Parameters: - telstate (
katsdptelstate.TelescopeState
object) – Telescope state object - name (string) – Sensor name, also used as telstate key
Raises: KeyError
– If sensor name is not found in telstate or it is an attribute instead-
get
()¶ Retrieve the values from underlying storage.
Returns: values – Underlying data Return type: SensorData
-
katdal.sensordata.
get_sensor_from_katstore
(store, name, start_time, end_time)¶ Get raw sensor data from katstore (CAM’s central sensor database).
Parameters: - store (string) – Hostname / endpoint of katstore webserver speaking katstore64 API
- name (string) – Sensor name (the normalised / escaped version with underscores)
- start_time, end_time (float) – Time range for sensor records as UTC seconds since Unix epoch
Returns: data – Retrieved sensor data with ‘timestamp’, ‘value’ and ‘status’ fields
Return type: RecordSensorGetter
object
Raises: ConnectionError – If this cannot connect to the katstore server
RuntimeError – If connection succeeded but interaction with katstore64 API failed
KeyError
– If the sensor was not found in the store or it has no data in time range
-
katdal.sensordata.
dummy_sensor_getter
(name, value=None, dtype=<class 'numpy.float64'>, timestamp=0.0)¶ Create a SensorGetter object with a single default value based on type.
This creates a dummy
SimpleSensorGetter
object based on a default value or a type, for use when no sensor data are available, but filler data is required (e.g. when concatenating sensors from different datasets and one dataset lacks the sensor). The dummy dataset contains a single data point with the filler value and a configurable timestamp (defaulting to way back). If the filler value is an object it will be wrapped in a ComparableArrayWrapper to match the behaviour of other SensorGetter objects.
Parameters: - name (string) – Sensor name
- value (object, optional) – Filler value (default is None, meaning dtype will be used instead)
- dtype (
numpy.dtype
object or equivalent, optional) – Desired sensor data type, used if no explicit value is given - timestamp (float, optional) – Time when dummy value occurred (default is way back)
Returns: data – Dummy sensor data object with ‘timestamp’ and ‘value’ fields
Return type: SimpleSensorGetter
object, shape (1,)
-
katdal.sensordata.
remove_duplicates_and_invalid_values
(sensor)¶ Remove duplicate timestamps and invalid values from sensor data.
This sorts the ‘timestamp’ field of the sensor record array and removes any duplicate values, updating the corresponding ‘value’ and ‘status’ fields as well. If more than one timestamp has the same value, the value and status of the last of these timestamps are selected. If the values differ for the same timestamp, a warning is logged (and the last one is still picked).
In addition, if there is a ‘status’ field, get rid of data with a status other than ‘nominal’, ‘warn’ or ‘error’, which indicates that the sensor could not be read and the corresponding value will therefore be invalid. Afterwards, remove the ‘status’ field from the data as this is the only place it plays a role.
Parameters: sensor ( SensorData
object, length N) – Raw sensor dataset.Returns: clean_sensor – Sensor data with duplicate timestamps and invalid values removed (M <= N), and only ‘timestamp’ and ‘value’ attributes left. Return type: SensorData
object, length M
-
class
katdal.sensordata.
SensorCache
(cache, timestamps, dump_period, keep=slice(None, None, None), props=None, virtual={}, aliases={}, store=None)¶ Bases:
collections.abc.MutableMapping
Container for sensor data providing name lookup, interpolation and caching.
Sensor data is defined as a one-dimensional time series of values. The values may be numerical or non-numerical (categorical), and the timestamps are monotonically increasing but not necessarily regularly spaced.
A sensor cache stores sensor data with dictionary-like lookup based on the sensor name. Since the extraction of sensor data from e.g. HDF5 files may be costly, the data is first represented in uncached (raw) form as
SensorGetter
objects, which typically wrap the underlying sensor HDF5 datasets. After extraction, the sensor data are stored either as a NumPy array (for numerical data) or as a CategoricalData object (for non-numerical data).
The sensor cache stores a timestamp array (or indexer) onto which the sensor data will be interpolated, together with a boolean selection mask that selects a subset of the interpolated data as the final output. Interpolation is linear for numerical data and zeroth-order for non-numerical data. Both extraction and selection may be enabled or disabled through the appropriate use of the two main interfaces that retrieve sensor data:
- The __getitem__ interface (i.e. cache[sensor]) presents a simple high-level interface to the end user that always extracts the sensor data and selects the requested subset from it. In addition, the return type is always a NumPy array.
- The get() interface (i.e. cache.get(sensor)) is an advanced interface for library builders that provides full control of the extraction process via sensor properties. It does not apply selection by default, as this is more convenient for library routines.
In addition, the sensor cache may contain virtual sensors which calculate their values based on the values of other sensors. They are identified by pattern templates that potentially match multiple sensor names.
Parameters: - cache (mapping from string to
SensorGetter
objects) – Initial sensor cache mapping sensor names to raw (uncached) sensor data - timestamps (array of float) – Correlator data timestamps onto which sensor values will be interpolated, as UTC seconds since Unix epoch
- dump_period (float) – Dump period, in seconds
- keep (int or slice or sequence of int or sequence of bool, optional) – Default time selection specification that will be applied to sensor data (this can be disabled on data retrieval)
- props (dict, optional) – Default properties that govern how sensor data are interpreted and
interpolated (this can be overridden on data retrieval). Can use
*
as a wildcard anywhere in the key. - virtual (dict mapping string to function, optional) – Virtual sensors, specified as a pattern matching the virtual sensor name and a corresponding function that will create the sensor (together with any associated virtual sensors)
- aliases (dict mapping string to string, optional) – Alternate names for sensors, as a dictionary mapping each alias to the original sensor name suffix. This will create additional sensors with the aliased names and the data of the original sensors.
- store (string, optional) – Hostname / endpoint of katstore webserver to access additional sensors
- add_aliases(alias, original)¶ Add alternate names / aliases for sensors.
Search for sensors with names ending in the original suffix and form a corresponding alternate name by replacing original with alias. The new aliased sensors will re-use the data of the original sensors.
Parameters: - alias (string) – The new sensor name suffix that replaces original
- original (string) – Sensors with names that end in this suffix will get aliases
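A brief sketch of aliasing, assuming d.sensor is the data set's SensorCache; the sensor name suffixes below are hypothetical and only serve to show the mechanism:

# Hypothetical: every sensor ending in 'pos_actual_scan_azim' also becomes available as '*azim'
d.sensor.add_aliases('azim', 'pos_actual_scan_azim')
azim = d.sensor['ant1_azim']   # same data as the original 'ant1_pos_actual_scan_azim' sensor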
- get(name, select=False, extract=True, **kwargs)¶ Sensor values interpolated to correlator data timestamps.
Time selection is disabled by default, as this is a more advanced data extraction method typically called by library routines that want to operate on the full array of sensor values. For additional allowed parameters when extracting categorical data, see the docstring for sensor_to_categorical().
Parameters: - name (string) – Sensor name
- select ({False, True}, optional) – True if preset time selection will be applied to interpolated data
- extract ({True, False}, optional) – True if sensor data should be extracted, interpolated and cached
- categorical ({None, True, False}, optional) – Interpret sensor data as categorical or numerical (by default, data of type float is numerical and data of any other type is categorical)
- kwargs (dict, optional) – Additional parameters are passed to sensor_to_categorical()
Returns: data – If extraction is disabled, this will be a SensorGetter object for uncached sensors. If selection is enabled, this will be a 1-D array of values, one per selected timestamp. If selection is disabled, this will be a 1-D array of values (of the same length as the timestamps attribute) for numerical data, and a CategoricalData object for categorical data.
Return type: array or CategoricalData or SensorGetter object
Raises: - ValueError – If select=True and extract=False, as selection requires interpolation
- KeyError – If sensor name was not found in cache and did not match a virtual sensor template
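The two interfaces can be contrasted with a short sketch, assuming d.sensor is the data set's SensorCache; the sensor name below is illustrative and varies between data set versions:

# Simple interface: always extracts and applies the current time selection; returns a NumPy array
activity = d.sensor['Antennas/ant1/activity']

# Advanced interface: no time selection by default; categorical data stays categorical
raw = d.sensor.get('Antennas/ant1/activity')                      # CategoricalData object
uncached = d.sensor.get('Antennas/ant1/activity', extract=False)  # SensorGetter, nothing loaded yet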
- get_with_fallback(sensor_type, names)¶ Sensor values interpolated to correlator data timestamps.
Get data for a type of sensor that may have one of several names. Try each name in turn until one of them provides data, or fail with a sensible error.
Parameters: - sensor_type (string) – Name of sensor class / type, used for informational purposes only
- names (sequence of strings) – Sensor names to try until one of them provides data
Returns: sensor_data – Interpolated sensor data as 1-D array, one value per selected timestamp
Return type: array
Raises: KeyError – If none of the sensor names were found in the cache
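A short sketch (the sensor names are hypothetical; real names depend on the data set version):

# Try the newer sensor name first, then fall back to an older alternative
temp = d.sensor.get_with_fallback('temperature',
                                  ['anc_air_temperature', 'anc_asc_air_temperature'])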
katdal.spectral_window module¶
- class katdal.spectral_window.SpectralWindow(centre_freq, channel_width, num_chans, product=None, sideband=-1, band='L', bandwidth=None)¶
Bases: object
Spectral window specification.
A spectral window is determined by the number of frequency channels produced by the correlator and their corresponding centre frequencies, as well as the channel width. The channels are assumed to be regularly spaced and to be the result of either lower-sideband downconversion (channel frequencies decreasing with channel index) or upper-sideband downconversion (frequencies increasing with index). For further information the receiver band and correlator product names are also available.
Warning
Instances should be treated as immutable. Changing the attributes will lead to inconsistencies between them.
Parameters: - centre_freq (float) – Centre frequency of spectral window, in Hz
- channel_width (float) – Bandwidth of each frequency channel, in Hz
- num_chans (int) – Number of frequency channels
- product (string, optional) – Name of data product / correlator mode
- sideband ({-1, +1}, optional) – Type of downconversion (-1 => lower sideband, +1 => upper sideband)
- band ({'L', 'UHF', 'S', 'X', 'Ku'}, optional) – Name of receiver / band
- bandwidth (float, optional) – The bandwidth of the whole spectral window, in Hz. If specified, channel_width is ignored and computed from the bandwidth. If not specified, bandwidth is computed from the channel width. Specifying this is a good idea if the channel width cannot be exactly represented in floating point.
- channel_freqs¶ Centre frequency of each frequency channel (assuming LSB mixing), in Hz
Type: array of float, shape (F,)
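As a rough sketch of constructing a window and reading off channel frequencies (the numbers are illustrative; specifying bandwidth lets the channel width be derived exactly):

from katdal.spectral_window import SpectralWindow

spw = SpectralWindow(centre_freq=1284e6, channel_width=856e6 / 1024, num_chans=1024,
                     bandwidth=856e6, sideband=1, band='L')
print(spw.channel_freqs[:3])   # centre frequency of the first three channels, in Hz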
- subrange(first, last)¶ Get a new SpectralWindow representing a subset of the channels.
The returned SpectralWindow covers the same frequencies as channels [first, last) of the original.
Raises: IndexError – If [first, last) is not a (non-empty) subinterval of the channels
- rechannelise(num_chans)¶ Get a new SpectralWindow with a different number of channels.
The returned SpectralWindow covers the same frequencies as the original, but divides the bandwidth into a different number of channels.
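Continuing the sketch above, both methods derive a new window without modifying the original:

inner = spw.subrange(256, 768)    # keep channels 256..767 of the original window
coarse = spw.rechannelise(512)    # same total bandwidth, split into 512 channels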
katdal.visdatav4 module¶
Data accessor class for data and metadata from various sources in v4 format.
- class katdal.visdatav4.VisibilityDataV4(source, ref_ant='', time_offset=0.0, applycal='', gaincal_flux={}, sensor_store=None, preselect=None, **kwargs)¶
Bases: katdal.dataset.DataSet
Access format version 4 visibility data and metadata.
For more information on attributes, see the DataSet docstring.
Parameters: - source (DataSource object) – Correlator data (visibilities, flags and weights) and metadata
- ref_ant (string, optional) – Name of reference antenna, used to partition the data set into scans, to determine the targets and as antenna for the data set catalogue (no relation to the calibration reference antenna…). The default is to use the observation activity sensor for scan partitioning, the CBF target and the array reference position as catalogue antenna.
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- applycal (string or sequence of strings, optional) – List of names of calibration products to apply to vis/weights/flags, as a sequence or a string of comma-separated names. An empty string or sequence means no calibration will be applied (the default for now), while the keyword ‘all’ means all available products will be applied. NB In future the default will probably change to ‘all’. NB This is still very much an experimental feature…
- gaincal_flux (dict mapping string to float, optional) – Flux density (in Jy) per gaincal target name, used to flux calibrate the “G” product, overriding the measured flux produced by the cal pipeline (if available). A value of None disables flux calibration.
- sensor_store (string, optional) – Hostname / endpoint of katstore webserver to access additional sensors
- preselect (dict, optional) – Subset of the data to select. See TelstateDataSource for details. This selection is permanent, and further selections made by DataSet.select() are relative to this subset.
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
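In practice this class is rarely constructed directly; katdal.open() builds it and forwards keyword arguments to it. A sketch, with a placeholder archive URL standing in for a real v4 data set:

import katdal

# Keywords such as applycal are forwarded to VisibilityDataV4
d = katdal.open('https://archive.example.org/1234567890/1234567890_sdp_l0.full.rdb',
                applycal='all')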
- timestamps¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.
- vis¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\), i.e. phase that increases with time.
- weights¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
- flags¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
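These indexers compose naturally. As a sketch of a flag-aware weighted time average using plain NumPy (no katdal-specific helpers assumed):

import numpy as np

vis = d.vis[:]                         # (T, F, B) complex64
weights = d.weights[:] * ~d.flags[:]   # zero the weight of flagged samples
avg = (weights * vis).sum(axis=0) / np.maximum(weights.sum(axis=0), 1e-10)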
- raw_flags¶ Raw flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of uint8, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
- excision¶ Excision as a function of time, frequency and baseline.
The fraction of each visibility that has been excised in the SDP ingest pipeline is returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
- temperature¶ Air temperature in degrees Celsius.
- pressure¶ Barometric pressure in millibars.
- humidity¶ Relative humidity as a percentage.
- wind_speed¶ Wind speed in metres per second.
- wind_direction¶ Wind direction as an azimuth angle in degrees.
Module contents¶
Data access library for data sets in the MeerKAT Visibility Format (MVF).
- katdal.open(filename, ref_ant='', time_offset=0.0, **kwargs)¶ Open data file(s) with loader of the appropriate version.
Parameters: - filename (string or sequence of strings) – Data file name or list of file names
- ref_ant (string, optional) – Name of reference antenna (default is first antenna in use)
- time_offset (float, optional) – Offset to add to all timestamps, in seconds
- kwargs (dict, optional) – Extra keyword arguments are passed on to the underlying accessor class:
  - mode (string, optional) – [H5DataV*] File opening mode (e.g. ‘r+’ to open file in write mode)
  - quicklook (bool) – [H5DataV2] True if synthesised timestamps should be used to partition the data set even if real timestamps are irregular, thereby avoiding the slow loading of real timestamps at the cost of slightly inaccurate label borders
  See the documentation of VisibilityDataV4 for the keywords it accepts.
Returns: data – Object providing DataSet interface to file(s)
Return type: DataSet object
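A sketch of the keyword pass-through for an HDF5 data set (the file name is a placeholder):

import katdal

# 'mode' is forwarded to the underlying H5DataV* accessor, opening the file for writing
d = katdal.open('1234567890.h5', mode='r+')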
- katdal.get_ants(filename)¶ Quick look function to get the list of antennas in a data file.
Parameters: filename (string) – Data file name
Returns: antennas
Return type: list of katpoint.Antenna objects
- katdal.get_targets(filename)¶ Quick look function to get the list of targets in a data file.
Parameters: filename (string) – Data file name
Returns: targets – All targets in file
Return type: katpoint.Catalogue object
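A short sketch of the quick-look helpers (the file name is a placeholder):

import katdal

ants = katdal.get_ants('1234567891.h5')      # list of katpoint.Antenna objects
cat = katdal.get_targets('1234567891.h5')    # katpoint.Catalogue
print([ant.name for ant in ants])
print(cat)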