Welcome to katdal’s documentation!

User guide

Introduction to katdal

Data access library for data sets in the MeerKAT Visibility Format (MVF)

Overview

This module serves as a data access library to interact with the chunk stores and HDF5 files produced by the MeerKAT radio telescope and its predecessors (KAT-7 and Fringe Finder). It uses memory carefully, allowing data sets to be inspected and partially loaded into memory. Data sets may be concatenated and split via a flexible selection mechanism. In addition, it provides a script to convert these data sets to CASA MeasurementSets.

Quick Tutorial

Open any data set through a single function to obtain a data set object:

import katdal
d = katdal.open('1234567890.h5')

This automatically determines the version and storage location of the data set. The versions roughly map to the various instruments:

- v1 : Fringe Finder (HDF5 file)
- v2 : KAT-7 (HDF5 file)
- v3 : MeerKAT (HDF5 file)
- v4 : MeerKAT (chunk store based on objects in Ceph)
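
For MVF v4 data sets, the same call is typically given the name (or archive URL) of the .rdb metadata file instead of an HDF5 file; the filename below is hypothetical:

d = katdal.open('1556179171_sdp_l0.rdb')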

Multiple data sets (even of different versions) may also be concatenated together (as long as they have the same dump rate):

d = katdal.open(['1234567890.h5', '1234567891.h5'])

Inspect the contents of the data set by printing the object:

print(d)

Here is a typical output:

===============================================================================
Name: 1313067732.h5 (version 2.0)
===============================================================================
Observer: someone  Experiment ID: 2118d346-c41a-11e0-b2df-a4badb44fe9f
Description: 'Track on Hyd A,Vir A, 3C 286 and 3C 273'
Observed from 2011-08-11 15:02:14.072 SAST to 2011-08-11 15:19:47.810 SAST
Dump rate: 1.00025 Hz
Subarrays: 1
  ID  Antennas                            Inputs  Corrprods
   0  ant1,ant2,ant3,ant4,ant5,ant6,ant7  14      112
Spectral Windows: 1
  ID  CentreFreq(MHz)  Bandwidth(MHz)  Channels  ChannelWidth(kHz)
   0  1822.000         400.000          1024      390.625
-------------------------------------------------------------------------------
Data selected according to the following criteria:
  subarray=0
  ants=['ant1', 'ant2', 'ant3', 'ant4', 'ant5', 'ant6', 'ant7']
  spw=0
-------------------------------------------------------------------------------
Shape: (1054 dumps, 1024 channels, 112 correlation products) => Size: 967.049 MB
Antennas: *ant1,ant2,ant3,ant4,ant5,ant6,ant7  Inputs: 14  Autocorr: yes  Crosscorr: yes
Channels: 1024 (index 0 - 1023, 2021.805 MHz - 1622.195 MHz), each 390.625 kHz wide
Targets: 4 selected out of 4 in catalogue
  ID  Name    Type      RA(J2000)     DEC(J2000)  Tags  Dumps  ModelFlux(Jy)
   0  Hyd A   radec      9:18:05.28  -12:05:48.9          333      33.63
   1  Vir A   radec     12:30:49.42   12:23:28.0          251     166.50
   2  3C 286  radec     13:31:08.29   30:30:33.0          230      12.97
   3  3C 273  radec     12:29:06.70    2:03:08.6          240      39.96
Scans: 8 selected out of 8 total       Compscans: 1 selected out of 1 total
  Date        Timerange(UTC)       ScanState  CompScanLabel  Dumps  Target
  11-Aug-2011/13:02:14 - 13:04:26    0:slew     0:             133    0:Hyd A
              13:04:27 - 13:07:46    1:track    0:             200    0:Hyd A
              13:07:47 - 13:08:37    2:slew     0:              51    1:Vir A
              13:08:38 - 13:11:57    3:track    0:             200    1:Vir A
              13:11:58 - 13:12:27    4:slew     0:              30    2:3C 286
              13:12:28 - 13:15:47    5:track    0:             200    2:3C 286
              13:15:48 - 13:16:27    6:slew     0:              40    3:3C 273
              13:16:28 - 13:19:47    7:track    0:             200    3:3C 273

The first segment of the printout displays the static information of the data set, including observer, dump rate and all the available subarrays and spectral windows in the data set. The second segment (between the dashed lines) highlights the active selection criteria. The last segment displays dynamic information that is influenced by the selection, including the overall visibility array shape, antennas, channel frequencies, targets and scan info.

The data set is built around the concept of a three-dimensional visibility array with dimensions of time, frequency and correlation product. This is reflected in the shape of the dataset:

d.shape

which returns (1054, 1024, 112), meaning 1054 dumps by 1024 channels by 112 correlation products.

Let’s select a subset of the data set:

d.select(scans='track', channels=slice(200,300), ants='ant4')
print(d)

This results in the following printout:

===============================================================================
Name: /Users/schwardt/Downloads/1313067732.h5 (version 2.0)
===============================================================================
Observer: siphelele  Experiment ID: 2118d346-c41a-11e0-b2df-a4badb44fe9f
Description: 'track on Hyd A,Vir A, 3C 286 and 3C 273 for Lud'
Observed from 2011-08-11 15:02:14.072 SAST to 2011-08-11 15:19:47.810 SAST
Dump rate: 1.00025 Hz
Subarrays: 1
  ID  Antennas                            Inputs  Corrprods
   0  ant1,ant2,ant3,ant4,ant5,ant6,ant7  14      112
Spectral Windows: 1
  ID  CentreFreq(MHz)  Bandwidth(MHz)  Channels  ChannelWidth(kHz)
   0  1822.000         400.000          1024      390.625
-------------------------------------------------------------------------------
Data selected according to the following criteria:
  channels=slice(200, 300, None)
  subarray=0
  scans='track'
  ants='ant4'
  spw=0
-------------------------------------------------------------------------------
Shape: (800 dumps, 100 channels, 4 correlation products) => Size: 2.560 MB
Antennas: ant4  Inputs: 2  Autocorr: yes  Crosscorr: no
Channels: 100 (index 200 - 299, 1943.680 MHz - 1905.008 MHz), each 390.625 kHz wide
Targets: 4 selected out of 4 in catalogue
  ID  Name    Type      RA(J2000)     DEC(J2000)  Tags  Dumps  ModelFlux(Jy)
   0  Hyd A   radec      9:18:05.28  -12:05:48.9          200      31.83
   1  Vir A   radec     12:30:49.42   12:23:28.0          200     159.06
   2  3C 286  radec     13:31:08.29   30:30:33.0          200      12.61
   3  3C 273  radec     12:29:06.70    2:03:08.6          200      39.32
Scans: 4 selected out of 8 total       Compscans: 1 selected out of 1 total
  Date        Timerange(UTC)       ScanState  CompScanLabel  Dumps  Target
  11-Aug-2011/13:04:27 - 13:07:46    1:track    0:             200    0:Hyd A
              13:08:38 - 13:11:57    3:track    0:             200    1:Vir A
              13:12:28 - 13:15:47    5:track    0:             200    2:3C 286
              13:16:28 - 13:19:47    7:track    0:             200    3:3C 273

Compared to the first printout, the static information has remained the same while the dynamic information now reflects the selected subset. There are many possible selection criteria, as illustrated below:

d.select(timerange=('2011-08-11 13:10:00', '2011-08-11 13:15:00'), targets=[1, 2])
d.select(spw=0, subarray=0)
d.select(ants='ant1,ant2', pol='H', scans=(0,1,2), freqrange=(1700e6, 1800e6))

See the docstring of DataSet.select() for more detailed information (e.g. run d.select? in IPython). Note that exactly one subarray and one spectral window must be selected at any time.

Once a subset of the data has been selected, you can access the data and timestamps on the data set object:

vis = d.vis[:]
timestamps = d.timestamps[:]

Note the [:] indexing, as the vis and timestamps properties are special LazyIndexer objects that only give you the actual data when you use indexing, in order not to inadvertently load the entire array into memory.

For the example data set with no selection applied, the vis array has a shape of (1054, 1024, 112). The time dimension is labelled by d.timestamps, the frequency dimension by d.channel_freqs and the correlation product dimension by d.corr_products.
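
These labels can be used to map array columns back to physical quantities. Here is a small sketch that picks out a single correlation product; the input labels 'ant1h' and 'ant2h' are hypothetical:

import numpy as np

wanted = np.array(['ant1h', 'ant2h'])
col = np.where((d.corr_products == wanted).all(axis=1))[0][0]
vis_1x2 = d.vis[:, :, col:col + 1]   # shape (n_dumps, n_channels, 1)
freqs = d.channel_freqs              # labels the frequency axis, in Hz
times = d.timestamps[:]              # labels the time axis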

Another key concept in the data set object is that of sensors. These are named time series of arbitrary data that are either loaded from the data set (actual sensors) or calculated on the fly (virtual sensors). Both variants are accessed through the sensor cache (available as d.sensor) and cached there after the first access. The data set object also provides convenient properties to expose commonly-used sensors, as shown in the plot example below:

import matplotlib.pyplot as plt
plt.plot(d.az, d.el, 'o')
plt.xlabel('Azimuth (degrees)')
plt.ylabel('Elevation (degrees)')

Other useful attributes include ra, dec, lst, mjd, u, v, w, target_x and target_y. These are all one-dimensional NumPy arrays that dynamically change length depending on the active selection.

As in katdal’s predecessor (scape) there is a DataSet.scans() generator that allows you to step through the scans in the data set. It returns the scan index, scan state and target object on each iteration, and updates the active selection on the data set to include only the current scan. It is also possible to iterate through the compound scans with the DataSet.compscans() generator, which yields the compound scan index, label and first target on each iteration for convenience. These two iterators may also be used together to traverse the data set structure:

for compscan, label, target in d.compscans():
    plt.figure()
    for scan, state, target in d.scans():
        if state in ('scan', 'track'):
            plt.plot(d.ra, d.dec, 'o')
    plt.xlabel('Right ascension (J2000 degrees)')
    plt.ylabel('Declination (J2000 degrees)')
    plt.title(target.name)

Finally, all the targets (or fields) in the data set are stored in a catalogue available at d.catalogue, and the original HDF5 file is still accessible via a back door installed at d.file in the case of a single-file data set.
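
For example, one can list the catalogue and then narrow the selection to a single target by name (a small sketch using one of the targets from the example observation above):

for target in d.catalogue.targets:
    print(target.name, target.tags)
d.select(scans='track', targets='3C 273')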

Tuning your application

It is possible to load data at high bandwidth using katdal: rates over 2.5 GB/s have been seen when loading from a local disk. However, it requires an understanding of the storage layout and choice of an appropriate access pattern.

This chapter focuses on loading MVF version 4 (MeerKAT) data, as older versions typically contain far less data. Some of the advice is generic, but some of the methods described here will not work on older data sets.

Chunking

The most important thing to understand is that the data is split into chunks, each of which is stored as a file on disk or an object in an S3 store. Retrieving any element of a chunk causes the entire chunk to be retrieved. Thus, accesses that are aligned to whole chunks give the best performance, since no retrieved data is discarded.

As an illustration, consider an application that has an outer loop over the baselines, and loads data for one baseline at a time. Chunks typically span all baselines, so each time one baseline is loaded, katdal will actually load the entire data set. If the application can be redesigned to fetch data for a small time range for all baselines it will perform much better.

When using MVFv4, katdal uses dask to manage the chunking. After opening a data set, you can determine the chunking for a particular array by examining its dataset member:

>>> d.vis.dataset
dask.array<1556179171-sdp, shape=(38, 4096, 40), dtype=complex64, chunksize=(32, 1024, 40)>
>>> d.vis.dataset.chunks
((32, 6), (1024, 1024, 1024, 1024), (40,))

For this data set, it will be optimal to load visibilities in 32 × 1024 × 40 element pieces.

Note that the chunking scheme may be different for visibilities, flags and weights.
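
As an illustration (a sketch, not a katdal API), the chunking information can be used to step through an MVF v4 data set in whole time chunks so that each chunk is only read once:

time_chunk = d.vis.dataset.chunks[0][0]         # e.g. 32 dumps per chunk
n_dumps = d.shape[0]
for start in range(0, n_dumps, time_chunk):
    vis_block = d.vis[start:start + time_chunk]  # all channels and corrprods
    # ... process vis_block before moving on to the next block ...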

Joint loading

The values returned by katdal are not the raw values stored in the chunks: there is processing involved, such as application of calibration solutions and flagging of missing data. Some of this processing is common between visibilities, flags and weights. It’s thus more efficient to load the visibilities, flags and weights as a single operation rather than as three separate operations.

This can be achieved using DaskLazyIndexer.get() (the DaskLazyIndexer class lives in the katdal.lazy_indexer module). For example, replace

vis = d.vis[idx]
flags = d.flags[idx]
weights = d.weights[idx]

with

vis, flags, weights = DaskLazyIndexer.get([d.vis, d.flags, d.weights], idx)

Parallelism

Dask uses multiple worker threads. It defaults to one thread per CPU core, but for I/O-bound tasks this is often not enough to achieve maximum throughput. Refer to the dask scheduler documentation for details of how to configure the number of workers.

More workers only helps if there is enough parallel work to be performed, which means there need to be at least as many chunks loaded at a time as there are workers (and preferably many more). It’s thus advisable to load as much data at a time as possible without running out of memory.
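
For example, the single-machine threaded scheduler can be given a larger thread pool. This is a hedged sketch following the dask single-machine scheduler documentation; the thread count of 16 is arbitrary and the right value depends on your storage:

from multiprocessing.pool import ThreadPool
import dask

dask.config.set(pool=ThreadPool(16))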

Selection

Using DataSet.select() is relatively expensive. For the best performance, it should only be used occasionally (for example, to filter out unwanted data at the start), with array access notation or DaskLazyIndexer.get() used to break up large data sets into manageable pieces.

Dask also performs better with selections that select contiguous data. You might be able to get a little more performance by using DataSet.scans() (which will yield a series of contiguous selections) rather than using select() with scans='track'.

When using MVF v4 one can also pass a preselect parameter to katdal.open() which allows slicing a subset of the data (time and frequency). It is more limited than DataSet.select() (it can only select contiguous ranges, and can only specify the selection in terms of channels and dumps), but if a script is only interested in working on a subset of data, this method can be more efficient and uses less memory.
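
A sketch of such a call, assuming preselect accepts a dict of slices keyed by 'channels' and 'dumps' (the filename and ranges are hypothetical):

import katdal

d = katdal.open('1556179171_sdp_l0.rdb',
                preselect=dict(channels=slice(1024, 2048), dumps=slice(0, 1000)))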

Network versus local disk

When loading data from the network, latency is typically higher, and so more workers will be needed to achieve peak throughput. Network access is also more sensitive to access patterns that are misaligned with chunks, because chunks are not cached in memory by the operating system and hence must be re-fetched over the network if they are accessed again.

Benchmarking

To assist with testing out the effects of changing these tuning parameters, the katdal source code includes a script called mvf_read_benchmark.py that allows a data set to be loaded in various ways and reports the average throughput. The command-line options are somewhat limited so you may need to edit it yourself, for example, to add a custom selection.

Sign conventions

Visibilities

For a wave with frequency \(\omega\) and wave number \(k\), the phasor is

\[e^{(\omega t - kz)i}\]

Visibilities are then \(e_1 \overline{e_2}\).

In KAT-7, the opposite sign convention is used in the HDF5 files, but katdal conjugates the visibilities to match MeerKAT.

Baseline coordinates

The UVW coordinates for the baseline (A, B) are \((u, v, w)_A - (u, v, w)_B\). Combined with the above, this means that ideal visibilities (ignoring any effects apart from geometric delay) are

\[V(u, v, w) = \int \frac{I(l, m)}{n} e^{2\pi i(ul + vm + w(n - 1))}\ dl\ dm\]

Polarisation

KAT-7 and MeerKAT are linear feed systems. On MeerKAT, if one points one’s right thumb in the direction of vertical polarisation and the right index finger in the direction of horizontal polarisation, then the right middle finger points from the antenna towards the source.

When exporting to a Measurement Set, katdal maps H to (IEEE) x and V to y, and introduces a 90° offset to the parallactic angle rotation.

KAT-7 has the opposite convention for polarisation (due to the lack of a sub-reflector). katdal does not make any effort to compensate for this. Measurement sets exported from KAT-7 data should thus not be used for polarimetry without further correction.

Data set format reference

In most cases users should not need to know the details of the data set formats, because katdal exists to hide these details and present a consistent, user-friendly view. It also contains workarounds for known issues in older data sets (which are not documented here). This reference documentation is useful to katdal developers and to power users who need to extract information not presented by the katdal interface.

MVF version 1 (Fringe Finder)

Last updated: 9 March 2010

This section documents the HDF5 file format as written by augment version 4. The parts of the data file used by the scape package, which handles single-dish or single-baseline back-end processing for the Fringe Finder, are indicated by (**). HDF5 files contain a hierarchical structure, with groups connected as nodes in a directed graph. Each group may contain datasets (multi-dimensional arrays) and attributes (metadata such as strings and smaller arrays). More information on these concepts can be found on the HDF5 website. The format currently calls for a single HDF5 file per experiment, but this may change in future.

The hierarchical structure of a typical data set, with levels ‘Experiment’ -> ‘Compound Scan’ -> ‘Scan’, is reflected in the HDF5 group structure. The root ‘/’ group of the file represents the ‘Experiment’ level and contains the groups ‘Antennas’, ‘Correlator’ and ‘Scans’. In addition, the root ‘/’ group contains the augment log dataset.

The ‘Antennas’ group contains a group for each physical antenna, named ‘Antenna%d’, where %d should be replaced by the one-based physical antenna number.

The ‘Scans’ group contains a group for each compound scan, named ‘CompoundScan%d’, where %d should be replaced by the zero-based compound scan index integer. Each ‘CompoundScan%d’ group contains a group for each scan, named ‘Scan%d’, where %d should be replaced by the zero-based scan index integer. The group structure of a typical dataset is shown below:

/
/Antennas
/Antennas/Antenna1
/Antennas/Antenna1/H
/Antennas/Antenna1/V
/Antennas/Antenna1/Sensors
/Antennas/Antenna2
/Antennas/Antenna2/H
/Antennas/Antenna2/V
/Antennas/Antenna2/Sensors
/Correlator
/Scans
/Scans/CompoundScan0
/Scans/CompoundScan0/Scan0
/Scans/CompoundScan0/Scan1
/Scans/CompoundScan0/Scan2
/Scans/CompoundScan1
/Scans/CompoundScan1/Scan0
/Scans/CompoundScan1/Scan1
/Scans/CompoundScan1/Scan2
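
For illustration, this structure can be walked directly with h5py (a hedged sketch that is independent of katdal; the filename is hypothetical), using the group, dataset and attribute names documented below:

import h5py

with h5py.File('1234567890.h5', 'r') as f:
    for cs_name, compscan in f['Scans'].items():
        for scan_name, scan in compscan.items():
            if not scan_name.startswith('Scan'):
                continue    # skip e.g. the pointing_model dataset
            print(cs_name, scan_name, scan.attrs['label'],
                  scan['data'].shape, compscan.attrs['target'])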

The following sections list the data entries in each group. Each entry is described by three colon-separated fields: its name, a D or A indicating whether it is an HDF5 dataset or an attribute, and its data type, followed by a general description on the next line.

The root ‘/’ group

_ experiment_id : A : string

The ID for this experiment (usually a UUID generated by the system at observe time).

_ observer : A : string

The observer who ran the experiment. Data files are copied into the archive using the observer as part of the file hierarchy.

_ description : A : string

Description of the experiment as specified by the observer at observe time.

_ k7w_file_version : A : integer, optional

Version of the k7writer program used to generate the raw data file

_ data_unit : A : string, one of {‘counts’, ‘K’, ‘Jy’}

Power unit recorded in the scans, typically ‘counts’ for uncalibrated correlator output or ‘K’ for Tsys-corrected data.
  • data_timestamps_at_sample_centers : A : bool

    Boolean flag indicating whether data timestamps are aligned with the center (if True or 1) or the start (if False or 0) of each sample / integration period.

  • augment_version : A : string, optional

    String added by augmenter, indicating the version of the augmented data format.

  • augment : A : string

    String added by augmenter, indicating when the file was augmented. If this attribute is absent, the file contains unaugmented correlator data only and cannot be loaded by scape.

_ augment_log : D : record array, optional, of shape (N,2) with record:

{'section' : string, 'message' : string}

The augment program log output from the augmentation of this HDF5 file.

/Antennas

This has no additional attributes or datasets.

/Antennas/Antenna%d

_ description : A : string

Description string of antenna, used by katpoint package to construct katpoint.Antenna object via katpoint.construct_antenna(). The string includes antenna name, location, size, etc.

/Antennas/Antenna%d/H

_ dbe_input : A : string

DBE input mapping for the H channel.

_ delay_s : A : float64 (???)

Cable delay in seconds for the H channel.

_ coupler_nd_model : D : float64 array of shape (N,2) ???

Table containing N frequencies in Hz in the first column and measured temperatures in K in the second column

_ pin_nd_model : D : float64 array of shape (N,2) ???

Table containing N frequencies in Hz in the first column and measured temperatures in K in the second column

/Antennas/Antenna%d/V

_ dbe_input : A : string

DBE input mapping for the V channel.

_ delay_s : A : float64 (???)

Cable delay in seconds for the V channel.

_ coupler_nd_model : D : float64 array of shape (N,2) ???

Table containing N frequencies in Hz in the first column and measured temperatures in K in the second column

_ pin_nd_model : D : float64 array of shape (N,2) ???

Table containing N frequencies in Hz in the first column and measured temperatures in K in the second column

/Antennas/Antenna%d/Sensors

This consists of a set of record arrays, each corresponding to sensors on the telescope. Each record array has 4 attributes: name, description, units, type (??? indicate here which items are optional for scape):

_ name : A : string

The katcp name of the sensor.

_ description : A : string, optional

Human understandable (hopefully) description of sensor.

_ units : A : string, optional

The units for the sensor readings. Currently scape does not take any notice of these attributes and assumes ‘standard’ units for the sensors.

_ type : A : string, optional???

Data type for the sensor readings.

The set of record arrays (all optional for scape ???) is as below and may be missing if no sensor data was recorded during the collection of the correlator data (???). The ‘status’ field for each is the sensor status recorded along with the sensor data. This could be ‘nominal’, ‘warn’ (sensor in warning range), ‘error’ (sensor in error range), ‘failure’ (comms error to sensor), ‘unknown’ (no initial value yet). Currently, scape does not take any notice of these sensor statuses (???).

  • enviro_air_pressure : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Environmental measurements of air pressure at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The air pressure is assumed by scape to be in mbar. The status is the sensor status as described above.

  • enviro_air_relative_humidity : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Environmental measurements of air relative humidity at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The humidity is assumed by scape to be a percentage. The status is the sensor status as described above.

  • enviro_air_temperature : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Environmental measurements of air temperature at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The air temperature is assumed by scape to be in deg C. The status is the sensor status as described above.

  • enviro_wind_direction : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Environmental measurements of wind direction at arbitrary time instants, where N is the number of wind data records. The timestamps are UTC seconds since the Unix epoch. The wind direction is assumed by scape to be in degrees increasing clockwise from North. The status is the sensor status as described above.

  • enviro_wind_speed : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Environmental measurements of wind speed at arbitrary time instants, where N is the number of wind data records. The timestamps are UTC seconds since the Unix epoch. The wind speed is assumed by scape to be in metres per second. The status is the sensor status as described above.

  • pos_actual_pointm_azim : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Measurements of actual azimuth after pointing model at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_actual_pointm_elev : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Measurements of actual elevation after pointing model at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_actual_refrac_azim : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Measurements of actual azimuth after refraction correction at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_actual_refrac_elev : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Measurements of actual elevation after refraction correction at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_actual_scan_azim : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Measurements of actual azimuth after scan offset at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_actual_scan_elev : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Measurements of actual elevation after scan offset at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_request_pointm_azim : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Requested azimuth after pointing model at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_request_pointm_elev : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Requested elevation after pointing model at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_request_refrac_azim : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Requested azimuth after refraction correction at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_request_refrac_elev : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Requested elevation after refraction correction at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_request_scan_azim : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Requested azimuth after scan offset at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The azimuth values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • pos_request_scan_elev : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : float32, 'status' : string}
    

    Requested elevation after scan offset at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The elevation values are assumed by scape to be in degrees. The status is the sensor status as described above.

  • rfe3_rfe15_noise_coupler_on : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : string, 'status' : string}
    

    Coupler noise diode firing flag at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The values are boolean wrapped in strings (‘0’ for off or ‘1’ for on). The status is the sensor status as described above.

  • rfe3_rfe15_noise_pin_on : D : record array of shape (N,3), with record:

    {'timestamp' : float64, 'value' : string, 'status' : string}
    

    Pin noise diode firing flag at arbitrary time instants, where N is the number of data records. The timestamps are UTC seconds since the Unix epoch. The values are boolean wrapped in strings (‘0’ for off or ‘1’ for on). The status is the sensor status as described above.

/Correlator

_ instrument_type : A : integer, optional

Correlator instrument type as per DBE.

_ instance_id : A : integer, optional

Correlator instance id as per DBE.

_ channel_bandwidth_hz : A : uint64, optional???

The width of each frequency channel in the scan data in Hz.

_ adc_sample_rate : A : uint64, optional???

ADC sample rate in Hz.

_ accum_per_int : A : uint64, optional???

Number of FFT frames (polyphase filterbank output samples) per integration. This is the number of samples from the output of each PFB channel that are multiplied together and accumulated in the correlator to form a single visibility sample.

_ num_freq_channels : A : uint64, optional???

Number of frequency channels contained in the scan data.

_ dump_rate_hz : A : float64

Correlator dump rate, in Hz. This should satisfy:

dump_rate = adc_sample_rate / (2 * num_freq_channels * accum_per_int)
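
For example, with a hypothetical 800 MHz ADC sample rate, 1024 frequency channels and 390625 accumulations per integration, the dump rate works out to 800e6 / (2 * 1024 * 390625) = 1 Hz.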

_ center_frequency_hz : A : float32???, optional

Center frequency of the scan data in Hz.

_ channel_select : D : bool array of shape (F)

Array of boolean values, indicating which channels should be processed. F is the number of frequency channels.

_ input_map : D : record array of shape (N,2) with record:

 {'correlator_product_id' : integer, 'dbe_inputs' : string}

This is used together with the /Antennas/Antenna%d/H/dbe_input or /Antennas/Antenna%d/V/dbe_input attribute to map each physical antenna H or V channel to its DBE input, and from there to the correlator output. N is the number of correlation products.

/Scans

This has no additional attributes or datasets.

/Scans/CompoundScan%d

_ label : A : string

A label for this compound scan.

_ target : A : string

Description string of target, used by katpoint package to construct katpoint.Target object via katpoint.construct_target().

  • pointing_model : D : float32 array of shape (22,)

    Pointing model used during experiment.

CorrelatorConfig

  • center_freqs : D : float64 array of shape (F,), optional

    Center frequency of each channel, in Hz, where F is the number of channels. This is the main specification for center frequencies. If this dataset is not present, the center frequencies are assumed to be regularly spaced and calculated from the center_frequency_hz, bandwidth_hz and num_freq_channels attributes of this group, which must be present in this case.

  • bandwidths : D : float64 array of shape (F,), optional

    Bandwidth of each channel, in Hz, where F is the number of channels. This is the main specification for channel bandwidths. If this dataset is not present, the bandwidths are all set to the bandwidth_hz attribute, which must be present in this case.

  • center_frequency_hz : A : uint64, optional

    Center frequency of entire spectral band encompassing all channels. This is used to calculate channel center frequencies in the absence of the center_freqs dataset.

  • channel_bandwidth_hz : A : uint64, optional

    Bandwidth of each channel. This is used as channel bandwidths in the absence of the bandwidths dataset.

  • num_freq_channels : A : uint64, optional

    Number of channels.

  • adc_sample_rate : A : uint64, optional

    ADC sample rate, in Hz.

Scan%d

  • data : D : record array of shape (T, F), with record:

    {'AxBx' : {'r' : float32, 'i' : float32}, 'AyBy' : {'r' : float32, 'i' : float32},
     'AxBy' : {'r' : float32, 'i' : float32}, 'AyBx' : {'r' : float32, 'i' : float32}}
    

    The record structure is compatible with the NumPy dtype:

    [('AxBx', complex64), ('AyBy', complex64), ('AxBy', complex64), ('AyBx', complex64)]
    

    Correlation data for a single baseline between antennas A and B (specified by antenna and antenna2, respectively), involving the x and y DBE inputs for each antenna (typically mapping to the H and V feeds, respectively). If antennas A and B correspond to the DBE antennas 0 and 1, respectively, the term ‘AxBy’ would refer to the correlation between the 0x and 1y DBE input signals. For a single dish, antennas A and B refer to the same antenna. Each correlation datum consists of a real (‘r’) and imaginary (‘i’) part, specified as 32-bit floats. The array has T rows and F columns, where T is the number of time samples in the scan and F is the number of frequency channels in the data set. The data has been scaled by accum_per_int, the number of samples that have been integrated into a single visibility.

  • timestamps : D : uint64 or float64 array of shape (T,)

    Timestamps of the start of each sample in number of UTC milliseconds since the Unix epoch, which is the native format of the DBE. T is the number of time samples in the scan. If data_timestamps_at_sample_centers is True, the timestamps are aligned with the middle of each sample period instead.

  • pointing : D : record array of shape (T,), optional, with record:

    {'az' : float32, 'el' : float32}
    

    Pointing information, consisting of azimuth and elevation values per time sample, where T is the number of time samples in the scan. All angles are in degrees. The elevation should be between -90 and 90 degrees, while azimuth has no restrictions, but is nominally between -180 and 180 degrees. If this dataset is present, the original pointing information has been processed by selecting a specific sensor and interpolating its measured values to coincide with the correlator data timestamps (timestamps). For interferometric data, this is the pointing data of the first antenna.

  • requested_pointing : D : record array of shape (Tp,), optional, with record:

    {'timestamp' : float64,
     'request_scan_azim' : float32, 'request_scan_elev' : float32,
     'request_refrac_azim' : float32, 'request_refrac_elev' : float32,
     'request_pointm_azim' : float32, 'request_pointm_elev' : float32}
    

    The requested / predicted / commanded target position at various points in the coordinate conversion chain, at arbitrary time instants. The timestamps are UTC seconds since the Unix epoch. The rest of the field names are the names of the corresponding Fringe Finder sensors. These fields contain azimuth and elevation values per time sample, where Tp is the number of pointing data records. All angles are in degrees. The elevation should be between -90 and 90 degrees, while azimuth has no restrictions, but is nominally between -180 and 180 degrees. The scan fields represent the highest level, obtained at the input of the refraction correction step, while the refrac fields are obtained at the output of this step. The pointm fields represent the lowest level, obtained at the output of the pointing model correction step. This dataset may be missing if no new pointing info was recorded during the collection of the correlator data. For interferometric data, this is the pointing data of the first antenna.

  • actual_pointing : D : record array of shape (Tp,), optional, with record:

    {'timestamp' : float64,
     'actual_scan_azim' : float32, 'actual_scan_elev' : float32,
     'actual_refrac_azim' : float32, 'actual_refrac_elev' : float32,
     'actual_pointm_azim' : float32, 'actual_pointm_elev' : float32}
    

    The actual / measured target position at various points in the coordinate conversion chain, at arbitrary time instants. The timestamps are UTC seconds since the Unix epoch. The rest of the field names are the names of the corresponding Fringe Finder sensors. These fields contain azimuth and elevation values per time sample, where Tp is the number of pointing data records. All angles are in degrees. The elevation should be between -90 and 90 degrees, while azimuth has no restrictions, but is nominally between -180 and 180 degrees. The pointm fields represent the lowest level, obtained at the input of the reverse pointing model correction step, while the refrac fields are obtained at the output of this step. The scan fields represent the highest level, obtained at the output of the reverse refraction correction step. This dataset may be missing if no new pointing info was recorded during the collection of the correlator data. For interferometric data, this is the pointing data of the first antenna.

  • flags : D : record array of shape (T,), with record:

    {'valid' : bool, 'nd_on' : bool}
    

    Flags per time sample, where T is the number of time samples in the scan.

  • enviro_ambient : D : record array of shape (Ta,), optional, with record:

    {'timestamp' : float64,
     'temperature' : float32, 'pressure' : float32, 'humidity' : float32}
    

    Slowly-varying (‘ambient’) environmental measurements at arbitrary time instants, where Ta is the number of ambient environment data records. The timestamps are UTC seconds since the Unix epoch. The ambient temperature is in degrees Celsius, the atmospheric pressure is in hPa and the relative humidity is a percentage. This dataset may be missing if no new ambient environment info was recorded during the collection of the correlator data.

  • enviro_wind : D : record array of shape (Tw,), optional, with record:

    {'timestamp' : float64, 'wind_speed' : float32, 'wind_direction' : float32}
    

    Environmental measurements of wind speed and direction (which typically vary faster than the ambient data) at arbitrary time instants, where Tw is the number of wind data records. The timestamps are UTC seconds since the Unix epoch. The wind speed is in metres per second, and the wind direction is in degrees increasing clockwise from North. This dataset may be missing if no new wind environment info was recorded during the collection of the correlator data.

  • label : A : string

    String that can be used to identify the type of scan in the back-end scripts. Typical contents are ‘scan’ to indicate a normal scan, ‘slew’ to indicate the telescope moving to the start of the next scan, and ‘cal’ to indicate a noise diode firing. These suggestions are not enforced or checked by the scape package, however.

  • comment : A : string

    Generic comment added to scan.

MVF version 2 (KAT-7)

Introduction

With the introduction of the KAT-7 correlator, we have taken the opportunity to revisit the correlator data storage format. This document describes this updated format.

Basic Concept

A single HDF5 file corresponds to a single observation (a contiguous telescope time segment for a specified subarray).

At the highest level, the file is split into Data and MetaData.

MetaData contains two distinct types:

  • Configuration is known a priori and is static for the duration of the observation.
  • Sensors contains dynamic information provided in the form of katcp sensors. This is typically only fully known after the observation.

Flags and History are special-case objects that are populated at run time but not from sensors. These are also the only groups that may be updated after augmentation.

Some datasets such as the noise_diode flags are synthesised from sensor information post capture. These base sensors could then be removed if space is a concern.

A major/minor version number is included in the file. The major number indicates the overall structural philosophy (this document describes version 2.x). The minor number identifies the mandatory members of the MetaData and Markup groups included in the file. This allows addition of members (and modification of existing members) in the required list without wholesale changes to the file structure. The mandatory members are described in the following document: TBA.

If used to store voltage data then both correlator_data and timestamps are omitted as timing is synthesized on the fly.

  • Nut - number of correlator timeslots in this observation
  • Nt - number of averaged timeslots
  • Nuf - number of correlator frequency channels
  • Nf - number of averaged frequency channels
  • Nbl - number of baselines
  • Np - number of polarisation products
  • Na - number of antennas in a given subarray
  • AntennaK - first antenna in a given subarray
  • AntennaN - last antenna in a given subarray

HDF5 Format

The structural format is shown below.

Groups are named using CamelCase, datasets are all lower case with underscores. Attributes are indicated next to a group in {}:

/ {augment_ts}
  {experiment_id}
  {version}

/Data/ {ts_of_first_timeslot}
     /correlator_data - (Nt,Nf,Nbl,2) array of float32 visibilities (real and imag components)
     /timestamps - (Nt) array of float64 timestamps (UT seconds since Unix epoch)
     /voltage_data - (optional) (Na, Nt, Nf) array of 8bit voltage samples

/MetaData/
         /Configuration/
                       /Antennas/ {num_antennas, subarray_id}
                                /AntennaK..N/ {description, delays, diameter, location, etc...}
                                            / beam_pattern
                                            / h_coupler_noise_diode_model
                                            / h_pin_noise_diode_model
                                             / v_coupler_noise_diode_model
                                            / v_pin_noise_diode_model
                       /Correlator/ {num_channels, center_freq, channel_bw, etc...}
                       /Observation/ {type, pi, contact, sw_build_versions, etc...}
                       /PostProcessing/ {channel_averaging, rfi_threshold, etc...}
                                      /time_averaging - TBD detail of baseline dep time avg
        /Sensors/
                /Antennas/ {num_antennas, subarray_id}
                         /AntennaK..N/
                                     /... - dataset per antenna and pedestal sensor
                /DBE/
                       /... - dataset per DBE sensor
                /Enviro/
                       /... - dataset per enviro sensor
                /Other/
                      /... - dataset per other sensor
                /RFE/
                    /... - dataset per RFE sensor
                /Source/
                       /phase_center
                       /antenna_target - array of target sensors for each antenna

/Markup/
       /dropped_data - (optional) describes data dropped by receivers
       /flags - (Nt,Nf,Nbl) post averaged uint8 flags - 1bit per flag, packed
       /flags_description - (Nflags,3) index, name and description for each packed flag type
       /flags_full - (optional) (Nut,Nuf,Nbl) pre-averaged uint8 flags - 1bit per flag, packed
       /labels - (optional) descriptions of intent of each observational phase (e.g. scan, slew, cal, etc..)
       /noise_diode - (Nt,Na) noise diode state during this averaged timeslot
       /noise_diode_full - (optional) (Nut,Na) noise diode state per correlator timeslot
       /weights - (Nt,Nf,Nbl,Nweights) weights for each sample

/History/
        /augment_log - Log output of augmentation process
        /script_log - Log output of observation script

MVF version 3 (early MeerKAT)

The version 3 format is an evolution of the v2 format, and continues to use HDF5 as the underlying format. It was used for early engineering and commissioning of MeerKAT, but replaced by v4 before science operations started.

At present there is no detailed documentation.

MVF version 4 (MeerKAT)

The version 4 format is the standard format for MeerKAT visibility data. Unlike previous versions, the data for an observation does not reside in a single HDF5 file, as such files would be unmanageably large. Instead, the data is split into chunks, each in its own file, which are loaded from disk or the network on demand. For this reason, the term “data set” is preferred over “file”.

Concepts

Streams
A stream is a collection of data and any associated metadata, whether multicast, queriable (e.g., sensors) or stored on disk. Every stream in a subarray product has a unique name and a type. A stream may consist of multiple items of related data e.g., visibilities, flags and weights may form a single stream.
Subarray product
A collection of streams in the MeerKAT Science Data Processor (SDP) forms a subarray product.
Capture block

A capture block is a contiguous period over which data is captured from a specific subarray product. A subarray product can only capture one capture block at a time.

Each visibility belongs to a specific stream and capture block within a subarray product.

Capture block IDs are currently numbers representing the start time in the UNIX epoch, but they should be treated as opaque strings.

Chunk store
A location (such as a local disk or the MeerKAT archive) that stores the data from a capture block.

Metadata

The metadata for a data set is stored in a Redis dump file (extension .rdb), which is exported by katsdptelstate. Refer to katsdptelstate for details of how attributes and sensors are encoded in the Redis database.

A single .rdb file contains metadata for a single subarray but potentially for multiple streams and capture blocks. The default capture block and stream to access are stored in capture_block_id and stream_name.

Keys are stored in one of the following namespaces:

  • the global namespace
  • stream_name (the “stream namespace”)
  • capture_block_id (the “capture-block namespace”)
  • capture_block_id.stream_name (the “capture-stream namespace”)

Here . is used to indicate a sub-namespace, but the actual separator is subject to change and one should always use join() to construct compound namespace names.

Keys may move between these namespaces without notice. Readers should search for keys from the most specific to the least specific appropriate namespace (see for example katdal.datasources.view_capture_stream()).

Where values contain strings, they might contain either raw bytes (which should be decoded as UTF-8) or Unicode text. Readers should be prepared to accept either. The goal is to eventually migrate all such fields to use text. katdal recursively converts all strings to the Python interpreter’s native string type.

katsdptelstate stores two types of values: immutable “attributes”, and “sensors” which are lists of timestamped values. In the documentation below, most keys contain attributes, and sensors are indicated.
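
As a hedged sketch (using katsdptelstate directly, not katdal, with a hypothetical filename), the dump can be loaded and viewed from the most specific namespace outwards, roughly mirroring katdal.datasources.view_capture_stream():

import katsdptelstate

telstate = katsdptelstate.TelescopeState()
telstate.load_from_file('1556179171_sdp_l0.rdb')
cbid = telstate['capture_block_id']
stream = telstate['stream_name']
view = telstate.view(stream).view(cbid).view(telstate.join(cbid, stream))
print(view['chunk_info'])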

Global metadata

A subset of the sensors in the MeerKAT system are stored in the file, in the global namespace. Documenting the MeerKAT sensors is beyond the scope of this documentation.

The following keys are also stored.

sdp_config (dict)
The JSON object used to configure the SDP subarray product. It is not intended to be parsed (the relevant information is intended to be available via other means), but it contains a wealth of debugging information about the streams and connections between them.
sdp_capture_block_id (string) — sensor
All capture block IDs seen so far. This should not be confused with capture_block_id, which indicates the default capture block ID that should be consulted when the file is opened without specifying a capture block ID.
sdp_image_tag (string)
The Docker image tag for the Docker images forming the realtime SDP capture system. This is the closest thing to a “version number” for the implementation.
sdp_image_overrides (dict)
Alternative Docker image tags for specific services within SDP, overriding sdp_image_tag. Overriding individual images is a debugging tool and it should always be empty for science observations.
config.* (dict)
Command-line options passed to each of the services within SDP.
sdp_task_details (dict)
Debug information about each of the services launched for the subarray product, including the host on which it ran and the Mesos TaskInfo structure.
Common stream metadata

The list of streams that can be accessed from the archive is available in sdp_archived_streams (in the global namespace). Within each stream, the following keys may be defined (not all make sense for every stream type).

Only stream_type and src_streams are guaranteed to be in the stream namespace, i.e. independent of the capture block. The others may appear either in the capture-stream namespace or the stream namespace.

inherit (string)

If present, it indicates another stream from which this stream inherits properties. Any property that cannot be found in the namespace of the current stream should first be looked up in that stream’s namespace.

This is typically used where a single multicast stream is recorded in multiple places. Each copy inherits the majority of metadata from the original and overrides a few keys.

stream_type (string)

Valid values are

sdp.vis
Uncalibrated visibilities, flags and weights
sdp.flags
Similar to sdp.vis, but containing only flags
sdp.cal
Calibration solutions. Older files may contain a cal stream which omits the stream information and which does not appear in sdp_archived_streams, so that should be considered as a fallback.
sdp.continuum_image
Continuum image (as a list of CLEAN components) and self-calibration solutions. FITS files will be stored in the MeerKAT archive but katdal does not currently support accessing them.
sdp.spectral_image
Spectral-line image. FITS files will be stored in the MeerKAT archive but katdal does not currently support accessing them.
src_streams (list of string)
The streams from which the current stream was computed. These are not necessarily listed in sdp_archived_streams, particularly if they were produced by the MeerKAT Correlator/Beamformer (CBF) rather than the SDP.
n_chans (int)
Number of channels in a channelised product.
n_chans_per_substream (int)
Number of channels in each SPEAD heap. Not relevant when loading archived data.
bandwidth (float, Hz)
Bandwidth of the stream.
center_freq (float, Hz)
Middle of the central channel. Note that if the number of channels is even, this is actually half a channel higher than the middle of the band.
channel_range (int, int)
A half-open range of channels taken from the source stream. The length of this range might not equal n_chans due to channel averaging.
Visibility stream metadata

The following are relevant to sdp.vis and sdp.flags streams.

n_bls (int)
Number of baselines. Note that a baseline is a correlation between two polarised inputs (a single entry in a Jones matrix).
bls_ordering (either a list of string pairs or a 2D array)
An array of pairs of strings. Each pair names two antenna inputs that form a baseline. There will be n_bls rows. Note that this can be either a list of 2-element lists or a NumPy array.
sync_time, int_time, first_timestamp (float)
Refer to Timestamps below.
excise (bool)
True if RFI detected in the source stream is excised during time and channel averaging. If missing, assume it is true.
calibrations_applied (list of string)
Names of sdp.cal streams whose corrections have been applied to the data.
need_weights_power_scale (bool)
Refer to Weights below. If missing, assume it is false.
s3_endpoint_url (string), chunk_info
Refer to Data below.
Calibration solutions

Streams of type sdp.cal have the following keys.

antlist (list of string, length n_ants)
List of antenna names. Arrays of calibration solutions use this order along the antenna axis.
pol_ordering (list of string, length n_pols)
List of polarisations (from v and h). Arrays of calibration solutions use this order along the polarisation axis.
bls_ordering (either a list of string pairs or a 2D array)
Same meaning as for sdp.vis streams, but describes the internal ordering used within the calibration pipeline and not of much use to users.
param_*
Parameters used to configure the calibration.
refant (string)
Name of the selected reference antenna (which will also appear in antlist). The reference antenna is only chosen when first needed in a capture block, so this key may be absent if there was no calibration yet. In older datasets this key contains the katpoint antenna description string instead of the name.
product_G (array of complex, shape (n_pols, n_ants)) — sensor
Gain solutions (derived e.g. on a phase calibrator), indexed by polarisation and antenna. The complex values in the array apply to the entire band.
product_K (array of float, shape (n_pols, n_ants)) — sensor
Delay solutions (in seconds), indexed by polarisation and antenna. To correct data at frequency \(\nu\), multiply it by \(e^{-2\pi i\cdot K\cdot \nu}\).
product_B_parts (int)
Number of keys across which bandpass-like solutions are split.
product_BN (array of complex, shape (n_chans, n_pols, n_ants)) — sensor

Bandpass solutions, indexed by channel, polarisation and antenna.

For implementation reasons, the bandpass solutions are split across multiple keys. N is in the range [0, product_B_parts), and these pieces should be concatenated along the channel (first) axis to reconstruct the full solution. If some pieces are missing (which is rare but can occur), they should be assumed to have the same shape as the present pieces.

product_KCROSS_DIODE (array of float, shape (n_pols, n_ants)) — sensor

Cross-hand delay solutions (in seconds), indexed by polarisation and antenna. Derived using noise diode firings.

Data at a given frequency is corrected in the same manner as product_K. One polarisation will serve as the reference polarisation and have all zero solutions.

product_KCROSS (array of float, shape (n_pols, n_ants)) — sensor

Cross-hand delay solutions (in seconds), indexed by polarisation and antenna.

Solutions are similar to product_KCROSS_DIODE but solved for using a celestial source instead of a noise diode.

product_BCROSS_DIODEN (array of complex, shape (n_chans, n_pols, n_ants)) — sensor

Cross-hand bandpass phase solutions, indexed by channel, polarisation and antenna.

Amplitudes for these solutions should always be one. One polarisation will serve as the reference polarisation and have all zero phase solutions.

As for product_BN, the cross-hand bandpass solutions are split across multiple keys indexed by N, where N is in the range [0, product_B_parts). The full solution should be reconstructed as for product_BN, by concatenating along the channel (first) axis.

shared_solve_N, last_dump_indexN
These are used for internal communication between the calibration processes, and are not intended for external use.

Some common points to note about the solutions (a short correction sketch follows this list):

  • Solutions describe the systematic errors. To correct data, it must be divided by the solutions.
  • The key will only be present if at least one solution was computed.
  • The timestamp associated with each sensor value is the timestamp of the middle of the data that was used to compute the solution.
  • Solutions may contain NaN values, which indicates that there was insufficient information to compute a solution (for example, because all the data was flagged).
  • Solutions are only valid as long as the system gain controls are not altered. Re-using gains from one capture block to correct data from another capture block may yield incorrect results unless one takes extra steps to correct for changes in the system gains.
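
As a hedged sketch of applying solutions to a single baseline (not the calibration pipeline's own code), assume the usual per-baseline combination of per-input factors, namely input 1's factor times the conjugate of input 2's factor, and that data is divided by the gain solutions as stated above. All values below are placeholders:

import numpy as np

n_chans = 4096
freqs = np.linspace(856e6, 1712e6, n_chans)   # hypothetical channel frequencies, Hz
vis = np.ones(n_chans, dtype=np.complex64)    # stand-in visibility spectrum for one baseline
G1, G2 = 1.1 + 0.2j, 0.9 - 0.1j               # product_G values for the two inputs
K1, K2 = 2e-9, -1e-9                          # product_K delays (seconds) for the two inputs
delay_corr = np.exp(-2j * np.pi * K1 * freqs) * np.conj(np.exp(-2j * np.pi * K2 * freqs))
vis_corrected = vis * delay_corr / (G1 * np.conj(G2))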
Image stream metadata

The following apply to sdp.continuum_image and sdp.spectral_image streams.

target_list (dict)
This is only applicable for imaging streams. Each key is a katpoint target description and the value is the normalised target name, which is a string used to form target-specific sub-namespaces of the stream and capture-stream namespaces. A normalised target name looks similar to the target name but has a limited character set (suitable for forming filenames and telstate namespaces) and, where necessary, a sequence number appended to ensure uniqueness.

For each sdp.continuum_image stream, there is a sub-namespace per target (named with the normalised target name) with the following keys (keeping in mind that . is used to indicate whichever separator is in use by katsdptelstate for this database):

target0.clean_components (dict)

Image of the target field as a set of point sources. The target0 sub-namespace is used to allow for possible alternative ways to run the continuum imager in which a single execution would image multiple fields, in which case there would be targetN sub-namespaces up to some N. This is not currently expected for MeerKAT science observations.

The dictionary has two keys:

description (string)
katpoint description of the target field (specifically, the phase centre).
components (list of string)
katpoint target descriptions for the CLEAN components. The names are arbitrary. This describes the perceived sky, i.e. the components are modulated by the primary beam.

Each sub-namespace per target contains a further sub-sub-namespace called selfcal that contains the self-calibration solutions. It behaves like an sdp.cal stream namespace and has the following keys:

antlist (list of string, length n_ants)
List of antenna names. Arrays of self-calibration solutions use this order along the antenna axis.
pol_ordering (list of string, length n_pols)
List of polarisations (from v and h). Arrays of self-calibration solutions use this order along the polarisation axis.
n_chans (int)
Number of channels in the self-calibration solutions, which corresponds to the number of “IFs” or sub-bands in the continuum imager.
bandwidth (float, Hz)
Bandwidth of the self-calibration solutions.
center_freq (float, Hz)
Middle of the central channel. Note that if the number of channels is even, this is actually half a channel higher than the middle of the band.
product_GPHASE (array of complex, shape (n_chans, n_pols, n_ants)) — sensor

Phase-only self-calibration solutions, indexed by channel, polarisation and antenna.

Amplitudes for these solutions will be very close to one (to within numerical precision).

product_GAMP_PHASE (array of complex, shape (n_chans, n_pols, n_ants)) — sensor
Amplitude + phase self-calibration solutions, indexed by channel, polarisation and antenna.
Timestamps

Timestamps are not stored explicitly. Instead, the first timestamp and the interval between dumps are stored, from which timestamps can be synthesised. The ith dump has a central timestamp (in the UNIX epoch) of \(\text{sync_time} + \text{first_timestamp} + i \times \text{int_time}\). The split of the initial timestamp into two parts is for technical reasons.

There is also first_timestamp_adc, which is the same as first_timestamp but in units of the digitiser ADC counts. It is stored only for internal implementation reasons and should not be relied upon.
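As a minimal numerical sketch of the formula above (the values are made up):

import numpy as np

sync_time = 1556000000.0    # stand-in for the sync_time key (UNIX seconds)
first_timestamp = 3.5       # stand-in for the first_timestamp key (seconds)
int_time = 7.9966           # stand-in for the dump period (seconds)

# Central timestamp of dump i: sync_time + first_timestamp + i * int_time
timestamps = sync_time + first_timestamp + np.arange(10) * int_time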

Light RDB files

The MeerKAT system also writes a “light” version of each RDB file, which contains only a subset of the keys. It is intended to contain enough information to read the uncalibrated visibilities and some high-level metadata about the observation itself. It does not contain information about antenna pointing, calibration, or CLEAN components.

Data

Visibilities, flags and weights are subdivided into small chunks. The chunking model is based on dask. Visibilities are treated as a 3D array, with axes for time, frequency and baseline. The data is divided into pieces along each axis. Each piece is stored in a separate file in the archive, in .npy format. The metadata necessary to reconstruct the array is stored in the telescope state and documented in more detail later. It is possible that some chunks will be missing, because they were lost during the capture process. On load, katdal will replace such chunks with default values and set the data_lost flag for them. Weights and flags are similarly treated.

Chunks are named type/AAAAA_BBBBB_CCCCC.npy where type is one of correlator_data (visibilities), flags, weights; and AAAAA, BBBBB and CCCCC are the (zero-based) indices of the first element in the chunk along each axis, padded to a minimum of five digits. Additionally, there are chunks named weights_channel/AAAAA_BBBBB.npy, explained below.

Note that the chunking scheme typically differs between visibilities, flags and weights, so files with the same base name start at the same point but do not necessarily have the same extent.

All the data for one stream is located in a single chunk store. If it is in the MeerKAT archive, the URL to the base of this chunk store (implementing the S3 protocol) is stored in s3_endpoint_url. Capture-stream specific information is stored in chunk_info, a two-level dictionary. The outer key is the type listed above, and the inner key is one of:

prefix (string)
A path prefix for the data. In the case of S3, this is the bucket name. For local storage, it is a directory name (the parent of the type directory).
dtype (string)
Numpy dtype of the data, which is expected to match the dtype encoded in the individual chunk files.
shape (tuple)
Shape of the virtual dask array obtained by joining together all the chunks.
chunks (tuple of tuples)
Sizes of the chunks along each axis, in the format used by dask.
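To illustrate how the chunk names relate to chunk_info, here is a small sketch that reconstructs the visibility chunk names from a hypothetical chunk_info entry (the dictionary contents are invented; katdal's chunk stores perform this mapping internally):

import itertools
import numpy as np

# Hypothetical chunk_info entry for the correlator_data of one capture stream.
info = {
    'prefix': '1234567890-sdp-l0',
    'dtype': '<c8',
    'shape': (40, 4096, 40),
    'chunks': ((20, 20), (1024, 1024, 1024, 1024), (40,)),
}

# Starting index of each chunk along each axis (cumulative sums of chunk sizes).
starts = [np.concatenate([[0], np.cumsum(c)[:-1]]) for c in info['chunks']]

# Chunk names follow type/AAAAA_BBBBB_CCCCC.npy with zero-padded start indices.
names = ['correlator_data/' + '_'.join(f'{s:05d}' for s in idx) + '.npy'
         for idx in itertools.product(*starts)]
# e.g. 'correlator_data/00000_01024_00000.npy'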
Weights

To save space, the weights are represented in an indirect form that requires some calculation to reconstruct. The actual weight for a visibility is the product of three values:

  • The value in the weights chunk.
  • A baseline-independent value in the weights_channel chunk.
  • If the stream has a need_weights_power_scale key in telstate and the value is true, the inverse of the product of the autocorrelation power for the two inputs in the baseline.
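As a rough sketch of this reconstruction for a single dump, assuming the relevant chunks have already been loaded as NumPy arrays (all values below are made up):

import numpy as np

n_chans, n_bls = 4096, 40
weights = np.random.rand(n_chans, n_bls).astype(np.float32)              # from weights chunks
weights_channel = np.random.rand(n_chans).astype(np.float32)             # from weights_channel chunks
auto_power_1 = 1.0 + np.random.rand(n_chans, n_bls).astype(np.float32)   # autocorr power, input 1
auto_power_2 = 1.0 + np.random.rand(n_chans, n_bls).astype(np.float32)   # autocorr power, input 2

need_power_scale = True        # would come from the need_weights_power_scale key
full_weights = weights * weights_channel[:, np.newaxis]
if need_power_scale:
    full_weights /= auto_power_1 * auto_power_2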
Flags

Each flag is a bitfield. The meaning of the individual bits is documented in the katdal.flags module. Note that it is possible that a flag chunk is present but the corresponding visibility or weight data is missing, in which case it is the reader’s responsibility to set the data_lost bit.

The MeerKAT Science Data Processor typically uses two levels of flagging: a conservative first-pass flagger run directly on the correlator output, and a more accurate flagger that operates on data that has been averaged and (in some cases) calibrated. The latter appears in a stream of type sdp.flags, which contains only flags. It can be linked to the corresponding visibilities and weights by checking its source streams. The flags in this stream are a superset of the flags in the originating stream and are guaranteed to have the same timestamp and frequency metadata, so can be used in place of the original flags. However, due to data loss it is possible that the replacement flags will have slightly more or fewer dumps at the end, which will need to be handled.

katdal

katdal package

Submodules

katdal.applycal module

Utilities for applying calibration solutions to visibilities and weights.

katdal.applycal.complex_interp(x, xi, yi, left=None, right=None)

Piecewise linear interpolation of magnitude and phase of complex values.

Given discrete data points (xi, yi), this returns a 1-D piecewise linear interpolation y evaluated at the x coordinates, similar to numpy.interp(x, xi, yi). While numpy.interp() interpolates the real and imaginary parts of yi separately, this function interpolates magnitude and (unwrapped) phase separately instead. This is useful when the phase of yi changes more rapidly than its magnitude, as in electronic gains.

Parameters:
  • x (1-D sequence of float, length M) – The x-coordinates at which to evaluate the interpolated values
  • xi (1-D sequence of float, length N) – The x-coordinates of the data points, must be sorted in ascending order
  • yi (1-D sequence of complex, length N) – The y-coordinates of the data points, same length as xi
  • left (complex, optional) – Value to return for x < xi[0], default is yi[0]
  • right (complex, optional) – Value to return for x > xi[-1], default is yi[-1]
Returns:

y – The evaluated y-coordinates, same length as x and same dtype as yi

Return type:

array of complex, length M
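A small usage example (with made-up gain samples) contrasting this with plain numpy.interp():

import numpy as np
from katdal.applycal import complex_interp

# Two gain samples with unit magnitude but different phases (0 and 3 radians).
xi = np.array([0.0, 1.0])
yi = np.array([np.exp(0j), np.exp(3j)])
x = np.array([0.5])

print(np.abs(complex_interp(x, xi, yi)))   # ~1.0 (magnitude preserved)
print(np.abs(np.interp(x, xi, yi)))        # ~0.07 (magnitude collapses)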

katdal.applycal.get_cal_product(cache, cal_stream, product_type)

Extract calibration solution from cache as a sensor.

Parameters:
  • cache (SensorCache object) – Sensor cache serving cal product sensors
  • cal_stream (string) – Name of calibration stream (e.g. “l1”)
  • product_type (string) – Calibration product type (e.g. “G”)
katdal.applycal.calc_delay_correction(sensor, index, data_freqs)

Calculate correction sensor from delay calibration solution sensor.

Given the delay calibration solution sensor, this extracts the delay time series of the input specified by index (in the form (pol, ant)) and builds a categorical sensor for the corresponding complex correction terms (channelised by data_freqs).

Invalid delays (NaNs) are replaced by zeros, since bandpass calibration still has a shot at fixing any residual delay.

katdal.applycal.calc_bandpass_correction(sensor, index, data_freqs, cal_freqs)

Calculate correction sensor from bandpass calibration solution sensor.

Given the bandpass calibration solution sensor, this extracts the time series of bandpasses (channelised by cal_freqs) for the input specified by index (in the form (pol, ant)) and builds a categorical sensor for the corresponding complex correction terms (channelised by data_freqs).

Invalid solutions (NaNs) are replaced by linear interpolations over frequency (separately for magnitude and phase), as long as some channels have valid solutions.

katdal.applycal.calc_gain_correction(sensor, index, targets=None)

Calculate correction sensor from gain calibration solution sensor.

Given the gain calibration solution sensor, this extracts the time series of gains for the input specified by index (in the form (pol, ant)) and interpolates them over time to get the corresponding complex correction terms. The optional targets parameter is a CategoricalData i.e. a sensor indicating the target associated with each dump. The targets can be actual katpoint.Target objects or indices, as long as they uniquely identify the target. If provided, interpolate solutions derived from one target only at dumps associated with that target, which is what you want for self-calibration solutions (but not for standard calibration based on gain calibrator sources).

Invalid solutions (NaNs) are replaced by linear interpolations over time (separately for magnitude and phase), as long as some dumps have valid solutions on the appropriate target.

katdal.applycal.calibrate_flux(sensor, targets, gaincal_flux)

Apply flux scale to calibrator gains (aka flux calibration).

Given the gain calibration solution sensor, this identifies the target associated with each set of solutions by looking up the gain events in the targets sensor, and then scales the gains by the inverse square root of the relevant flux if a valid match is found in the gaincal_flux dict. This is equivalent to the final step of the AIPS GETJY and CASA fluxscale tasks.

katdal.applycal.add_applycal_sensors(cache, attrs, data_freqs, cal_stream, cal_substreams=None, gaincal_flux={})

Register virtual sensors for one calibration stream.

This operates on a single calibration stream called cal_stream (possibly an alias), which derives from one or more underlying cal streams listed in cal_substreams and has stream attributes in attrs.

The first set of virtual sensors maps all cal products into a unified namespace (template ‘Calibration/Products/cal_stream/{product_type}’). Map receptor inputs to the relevant indices in each calibration product based on the ants and pols found in attrs. Then register a virtual sensor per product type and per input in the SensorCache cache, with template ‘Calibration/Corrections/cal_stream/{product_type}/{inp}’. The virtual sensor function picks the appropriate correction calculator based on the cal product type, which also uses auxiliary info like the channel frequencies, data_freqs.

Parameters:
  • cache (SensorCache object) – Sensor cache serving cal product sensors and receiving correction sensors
  • attrs (dict-like) – Calibration stream attributes (e.g. a “cal” telstate view)
  • data_freqs (array of float, shape (F,)) – Centre frequency of each frequency channel of visibilities, in Hz
  • cal_stream (string) – Name of (possibly virtual) calibration stream (e.g. “l1”)
  • cal_substreams (sequence of string, optional) – Names of actual underlying calibration streams (e.g. [“cal”]), defaults to [cal_stream] itself
  • gaincal_flux (dict mapping string to float, optional) – Flux density (in Jy) per gaincal target name, used to flux calibrate the “G” product, overriding the measured flux stored in attrs (if available). A value of None disables flux calibration.
Returns:

cal_freqs – Centre frequency of each frequency channel of calibration stream, in Hz (or None if no sensors were registered)

Return type:

1D array of float, or None

class katdal.applycal.CorrectionParams(inputs, input1_index, input2_index, corrections, channel_maps)

Bases: object

Data needed to compute corrections in calc_correction_per_corrprod().

Once constructed, the data in this class must not be modified, as it will be baked into dask graphs.

Parameters:
  • inputs (list of str) – Names of inputs, in the same order as the input axis of products
  • input1_index, input2_index – Indices into inputs of the first and second input of each correlation product
  • corrections (dict) – A dictionary (indexed by cal product name) of lists (indexed by input) of sequences (indexed by dump) of numpy arrays, with corrections to apply.
  • channel_maps (dict) – A dictionary (indexed by cal product name) of functions (signature g = channel_map(g, channels)) that map the frequency axis of the cal product g onto the frequency axis of the visibility data, where the vis frequency axis will be indexed by the slice channels.
katdal.applycal.calc_correction_per_corrprod(dump, channels, params)

Gain correction per channel per correlation product for a given dump.

This calculates an array of complex gain correction terms of shape (n_chans, n_corrprods) that can be directly applied to visibility data. This incorporates all requested calibration products at the specified dump and channels.

Parameters:
  • dump (int) – Dump index (applicable to full data set, i.e. absolute)
  • channels (slice) – Channel indices (applicable to full data set, i.e. absolute)
  • params (CorrectionParams) – Corrections per input, together with correlation product indices
Returns:

gains – Gain corrections per channel per correlation product

Return type:

array of complex64, shape (n_chans, n_corrprods)

Raises:

KeyError – If input and/or cal product has no associated correction

katdal.applycal.calc_correction(chunks, cache, corrprods, cal_products, data_freqs, all_cal_freqs, skip_missing_products=False)

Create a dask array containing applycal corrections.

Parameters:
  • chunks (tuple of tuple of int) – Chunking scheme of the resulting array, in normalized form (see dask.array.core.normalize_chunks()).
  • cache (SensorCache object) – Sensor cache, used to look up individual correction sensors
  • corrprods (sequence of (string, string)) – Selected correlation products as pairs of correlator input labels
  • cal_products (sequence of string) – Calibration products that will contribute to corrections (e.g. [“l1.G”])
  • data_freqs (array of float, shape (F,)) – Centre frequency of each frequency channel of visibilities, in Hz
  • all_cal_freqs (dict) – Dictionary mapping cal stream name (e.g. “l1”) to array of associated frequencies
  • skip_missing_products (bool) – If True, skip products with missing sensors instead of raising KeyError
Returns:

  • final_cal_products (list of string) – List of calibration products in the order that they will be applied (potentially a subset of cal_products if skipping missing products)
  • corrections (dask.array.Array object, or None) – Dask array that produces corrections for entire vis array, or None if no calibration products were found (either cal_products is empty or all products had some missing sensors and skip_missing_products is True)

Raises:

KeyError – If a correction sensor for a given input and cal product is not found (and skip_missing_products is False)

katdal.applycal.apply_vis_correction

Clean up and apply correction to visibility data in data.

katdal.applycal.apply_weights_correction

Clean up and apply correction to weight data in data.

katdal.applycal.apply_flags_correction

Set POSTPROC flag wherever correction is invalid.

katdal.averager module

katdal.averager.average_visibilities(vis, weight, flag, timestamps, channel_freqs, timeav=10, chanav=8, flagav=False)

Average visibilities, flags and weights.

Visibilities are weight-averaged using the weights in the weight array with flagged data set to weight zero. The averaged weights are the sum of the input weights for each average block. An average flag is retained if all of the data in an averaging block is flagged (the averaged visibility in this case is the unweighted average of the input visibilities). In cases where the averaging size in channel or time does not evenly divide the size of the input data, the remaining channels or timestamps at the end of the array after averaging are discarded. Channels are averaged first, followed by timestamps. The arrays of timestamps and channel frequencies are averaged in the same way and returned.

Parameters:
  • vis (array(numtimestamps,numchannels,numbaselines) of complex64.) – The input visibilities to be averaged.
  • weight (array(numtimestamps,numchannels,numbaselines) of float32.) – The input weights (used for weighted averaging).
  • flag (array(numtimestamps,numchannels,numbaselines) of boolean.) – Input flags (flagged data have weight zero before averaging).
  • timestamps (array(numtimestamps) of int.) – The timestamps (in mjd seconds) corresponding to the input data.
  • channel_freqs (array(numchannels) of int.) – The frequencies (in Hz) corresponding to the input channels.
  • timeav (int.) – The desired averaging size in timestamps.
  • chanav (int.) – The desired averaging size in channels.
  • flagav (bool) – If True, flag averaged data when there is a single flag in the bin. If False, only flag averaged data when all the data in the bin is flagged.
Returns:

  • av_vis (array(int(numtimestamps/timeav),int(numchannels/chanav)) of complex64.)
  • av_weight (array(int(numtimestamps/timeav),int(numchannels/chanav)) of float32.)
  • av_flag (array(int(numtimestamps/timeav),int(numchannels/chanav)) of boolean.)
  • av_mjd (array(int(numtimestamps/timeav)) of int.)
  • av_freq (array(int(numchannels/chanav)) of int.)
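A minimal usage sketch with synthetic input data of the documented shapes (the values are meaningless and the unpacking order follows the return values listed above):

import numpy as np
from katdal.averager import average_visibilities

n_times, n_chans, n_bls = 40, 1024, 12
vis = np.ones((n_times, n_chans, n_bls), np.complex64)
weight = np.ones((n_times, n_chans, n_bls), np.float32)
flag = np.zeros((n_times, n_chans, n_bls), bool)
timestamps = 5000000000.0 + 8.0 * np.arange(n_times)
channel_freqs = np.linspace(856e6, 1712e6, n_chans)

av_vis, av_weight, av_flag, av_mjd, av_freq = average_visibilities(
    vis, weight, flag, timestamps, channel_freqs, timeav=10, chanav=8)
# The time and frequency axes shrink by the averaging factors (40 -> 4, 1024 -> 128).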

katdal.categorical module

Container for categorical (i.e. non-numerical) sensor data and related tools.

class katdal.categorical.ComparableArrayWrapper(value)

Bases: object

Wrapper that improves comparison of array objects.

This wrapper class has two main benefits:

  • It prevents sensor values that are NumPy ndarrays themselves (or array-like objects such as tuples and lists) from dissolving and losing their identity when they are assembled into an array.
  • It ensures that array-valued sensor values become properly comparable (avoiding array-valued booleans resulting from standard comparisons).

The former is needed because SensorGetter is treated as a structured array even if it contains object values. The latter is needed because the equality operator crops up in hard-to-reach places like inside list.index().

Parameters:value (object) – The sensor value to be wrapped
static unwrap(v)

Unwrap value if needed.

katdal.categorical.infer_dtype(values)

Figure out dtype of sequence of sensor values.

The common dtype is determined by explicit NumPy promotion. If the values are array-like themselves, treat them as opaque objects to simplify sensor processing. If the sequence is empty, the dtype is unknown and set to None. In addition, short-circuit to an actual dtype for objects with this attribute to simplify calling this on a mixed collection of sensor data.

Parameters:values (sequence, or object with dtype) – Sequence of sensor values (typically a list), or a sensor data object with a dtype attribute (like ndarray or SensorGetter)
Returns:dtype – Inferred dtype, or None if values is an empty sequence
Return type:numpy.dtype object or None

Notes

This is almost, but not quite, entirely like numpy.result_type(). The differences are that this accepts generic objects in the sequence, treats ndarrays as objects regardless of their underlying dtype, supports a dtype of None and short-circuits the check if the sequence itself is an object with a dtype. And this accepts the sequence as the first parameter as opposed to being unpacked across the argument list.

katdal.categorical.unique_in_order(elements, return_inverse=False)

Extract unique elements from elements while preserving original order.

Parameters:
  • elements (sequence) – Sequence of equality-comparable objects
  • return_inverse ({False, True}, optional) – If True, also return sequence of indices that can be used to reconstruct original elements sequence via [unique_elements[i] for i in inverse]
Returns:

  • unique_elements (list) – List of unique objects, in original order they were found in elements
  • inverse (array of int, optional) – If return_inverse is True, sequence of indices that can be used to reconstruct original sequence
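A small usage example based on the documented behaviour:

from katdal.categorical import unique_in_order

elements = ['track', 'slew', 'track', 'track', 'slew']
unique_elements, inverse = unique_in_order(elements, return_inverse=True)
print(unique_elements)                         # ['track', 'slew']
print([unique_elements[i] for i in inverse])   # reconstructs elements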

class katdal.categorical.CategoricalData(sensor_values, events)

Bases: object

Container for categorical (i.e. non-numerical) sensor data.

This container allows simple manipulation and interpolation of a time series of non-numerical data represented as discrete events. The data is stored as a list of sensor values and two integer arrays:

  • unique_values stores one copy of each unique object in the data series
  • events stores the time indices (dumps) where each event occurs
  • indices stores indices linking each event to the unique_values list

The __getitem__ interface (i.e. data[dump]) returns the data associated with the last event before the requested dump(s), in effect doing a zeroth-order interpolation of the data at each event. Events can be added and removed and realigned, and the container can be split along the time axis, amongst other functionality.

Parameters:
  • sensor_values (sequence, length N) – Sequence of sensor values (of any type, preferably not None [see Notes])
  • events (sequence of non-negative ints, length N + 1) – Corresponding monotonic sequence of dump indices where each sensor value came into effect. The last event is one past the last dump where the final sensor value applied, and therefore equal to the total number of dumps for which sensor values were specified.
unique_values

List of unique sensor values in order they were found in sensor_values with any ComparableArrayWrapper objects unwrapped

Type:list, length M
indices

Array of indices into unique_values, one per sensor event

Type:array of int, shape (N,)
dtype

Sensor data type as NumPy dtype (found on demand from unique_values)

Type:numpy.dtype object

Notes

Any object values wrapped in a ComparableArrayWrapper will be unwrapped before adding it to unique_values. When adding, removing and comparing values to this container, any object values will be wrapped again temporarily to ensure proper comparisons.

It is discouraged to have a sensor value of None as this value is given a special meaning in methods such as CategoricalData.add() and sensor_to_categorical(). On the other hand, it is the most sensible dummy object value and any Nones entering through this initialiser will probably not cause any issues.

It is better to make unique_values a list instead of an array because an array assimilates objects such as tuples, lists and other arrays. The alternative is an array of ComparableArrayWrapper objects but these then need to be unpacked at some later stage which is also tricky.

dtype

Sensor value type.

segments()

Generator that iterates through events and returns segment and value.

Yields:
  • segment (slice object) – The slice representing range of dump indices of the current segment
  • value (object) – Sensor value associated with segment
add(event, value=None)

Add or override sensor event.

This adds a new event to the container, with a new value or a duplicate of the existing value at that dump. If the new event coincides with an existing one, it overrides the value at that dump.

Parameters:
  • event (int) – Dump of event to add or override
  • value (object, optional) – New value for event (duplicate current value at this dump by default)
remove(value)

Remove sensor value, remapping indices and merging segments in process.

If the sensor value does not exist, do nothing.

Parameters:value (object) – Sensor value to remove from container
add_unmatched(segments, match_dist=1)

Add duplicate events for segment starts that don’t match sensor events.

Given a sequence of segments, this matches each segment start to the nearest sensor event dump (within match_dist). Any unmatched segment starts are added as duplicate sensor events (or ignored if they fall outside the sensor event range).

Parameters:
  • segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
  • match_dist (int, optional) – Maximum distance in dumps that signify a match between events
align(segments)

Align sensor events with segment starts, possibly discarding events.

Given a sequence of segments, this moves each sensor event dump onto the nearest segment start. If more than one event ends up in the same segment, only keep the last event, discarding the rest.

The end result is that the sensor event dumps become a subset of the segment starts and there cannot be more sensor events than segments.

Parameters:segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
partition(segments)

Partition dataset into multiple sets along time axis.

Given a sequence of segments, split the container into a sequence of containers, one per segment. Each container contains only the events occurring within its corresponding segment, with event dumps relative to the start of the segment, and the containers share the same unique values.

Parameters:segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
Returns:split_data – Resulting multiple datasets in chronological order
Return type:sequence of CategoricalData objects
remove_repeats()

Remove repeated events of the same value.

katdal.categorical.concatenate_categorical(split_data, **kwargs)

Concatenate multiple categorical datasets into one along time axis.

Join a sequence of categorical datasets together, by forming a common set of unique values, remapping events to these and incrementing the event dumps of each dataset to start off where the previous dataset ended.

Parameters:split_data (sequence of CategoricalData objects) – Sequence of containers to concatenate
Returns:data – Concatenated dataset
Return type:CategoricalData object
katdal.categorical.sensor_to_categorical(sensor_timestamps, sensor_values, dump_midtimes, dump_period, transform=None, initial_value=None, greedy_values=None, allow_repeats=False, **kwargs)

Align categorical sensor events with dumps and clean up spurious events.

This converts timestamped sensor data into a categorical dataset by comparing the sensor timestamps to a series of dump timestamps and assigning each sensor event to the dump in which it occurred. When multiple sensor events happen in the same dump, only the last one is kept. The first dump is guaranteed to have a valid value by either using the supplied initial_value or extrapolating the first proper value back in time. The sensor data may be transformed before events that repeat values are potentially discarded. Finally, events with values marked as “greedy” take precedence over normal events when both occur within the same dump (either changing from or to the greedy value, or if the greedy value occurs completely within a dump).

XXX Future improvements include picking the event with the longest duration within a dump as opposed to the final event, and “snapping” event boundaries to dump boundaries with a given tolerance (e.g. 5-10% of dump period).

Parameters:
  • sensor_timestamps (sequence of float, length M) – Sequence of sensor timestamps (typically UTC seconds since Unix epoch)
  • sensor_values (sequence, length M) – Corresponding sequence of sensor values [potentially wrapped]
  • dump_midtimes (sequence of float, length N) – Sequence of dump midtimes (same reference as sensor timestamps)
  • dump_period (float) – Duration of each dump, in seconds
  • transform (callable or None, optional) – Transform [unwrapped] sensor values before fixing initial value, mapping dumps to events and discarding repeats
  • initial_value (object or None, optional) – Sensor value [transformed, unwrapped] to use for dump = 0 up to first proper event (force first proper event to start at dump = 0 by default)
  • greedy_values (sequence or None, optional) – List of [transformed, unwrapped] sensor values considered “greedy”
  • allow_repeats ({False, True}, optional) – If False, discard sensor events that do not change [transformed] value
Returns:

data – Constructed categorical dataset [unwraps any wrapped values]

Return type:

CategoricalData object
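A minimal usage sketch with made-up sensor events, based on the parameters documented above:

import numpy as np
from katdal.categorical import sensor_to_categorical

sensor_timestamps = [100.2, 108.1, 131.9]      # seconds since Unix epoch
sensor_values = ['slew', 'track', 'slew']
dump_period = 8.0
dump_midtimes = 100.0 + dump_period * np.arange(10)

data = sensor_to_categorical(sensor_timestamps, sensor_values,
                             dump_midtimes, dump_period)
print(data.unique_values)                      # e.g. ['slew', 'track']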

katdal.chunkstore module

Base class for accessing a store of chunks (i.e. N-dimensional arrays).

exception katdal.chunkstore.ChunkStoreError

Bases: Exception

Base class for all standard ChunkStore errors.

exception katdal.chunkstore.StoreUnavailable

Bases: OSError, katdal.chunkstore.ChunkStoreError

Could not access underlying storage medium (offline, auth failed, etc).

exception katdal.chunkstore.ChunkNotFound

Bases: KeyError, katdal.chunkstore.ChunkStoreError

The store was accessible but a chunk with the given name was not found.

exception katdal.chunkstore.BadChunk

Bases: ValueError, katdal.chunkstore.ChunkStoreError

The chunk is malformed, e.g. bad dtype, actual shape differs from requested.

class katdal.chunkstore.PlaceholderChunk(shape, dtype)

Bases: object

Chunk returned to indicate missing data.

katdal.chunkstore.generate_chunks(shape, dtype, max_chunk_size, dims_to_split=None, power_of_two=False, max_dim_elements=None)

Generate dask chunk specification from ndarray parameters.

Parameters:
  • shape (sequence of int) – Array shape
  • dtype (numpy.dtype object or equivalent) – Array data type
  • max_chunk_size (float or int) – Upper limit on chunk size (if allowed by dims_to_split), in bytes
  • dims_to_split (sequence of int, optional) – Indices of dimensions that may be split into chunks (default all dims)
  • power_of_two (bool, optional) – True if chunk size should be rounded down to a power of two (the last chunk size along each dimension will potentially be smaller)
  • max_dim_elements (dict, optional) – Maximum number of elements on each dimension (each key is a dimension index). Dimensions that are not in dims_to_split are ignored.
Returns:

chunks – Dask chunk specification, indicating chunk sizes along each dimension

Return type:

tuple of tuple of int
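A small usage example based on the documented parameters (the numbers are arbitrary):

import numpy as np
from katdal.chunkstore import generate_chunks

# Split a (40, 4096, 40) complex64 array into chunks of at most ~10 MB,
# only allowing the time and frequency axes to be split.
chunks = generate_chunks((40, 4096, 40), np.complex64, 10e6, dims_to_split=(0, 1))
print(chunks)    # tuple of tuples of int, one inner tuple per dimension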

katdal.chunkstore.npy_header_and_body(chunk)

Prepare a chunk for low-level writing.

Returns the .npy header and a view of the chunk corresponding to that header. The two should be concatenated (as buffer objects) to form a valid .npy file.

This is useful for high-performance code, as it allows a chunk to be encoded as a .npy file more efficiently than saving to an io.BytesIO.

class katdal.chunkstore.ChunkStore(error_map=None)

Bases: object

Base class for accessing a store of chunks (i.e. N-dimensional arrays).

A chunk is a simple (i.e. unit-stride) slice of an N-dimensional array known as its parent array. The array is identified by a string name, while the chunk within the array is identified by a sequence of slice objects which may be used to extract the chunk from the array. The array is a numpy.ndarray object with an associated dtype.

The basic idea is that the chunk store contains multiple arrays addressed by name. The list of available arrays and all array metadata (shape, chunks and dtype) are stored elsewhere. The metadata is used to identify chunks, while the chunk store takes care of storing and retrieving bytestrings of actual chunk data. These are packaged back into NumPy arrays for the user. Each array can only be stored once, with a unique chunking scheme (i.e. different chunking of the same data is disallowed).

The naming scheme for arrays and chunks is reasonably generic but has some restrictions:

  • Names are treated like paths with components and a standard separator

  • The chunk name is formed by appending a string of indices to the array name

  • It is discouraged to have an array name that is a prefix of another name

  • Each chunk store has its own restrictions on valid characters in names: some treat names as URLs while others treat them as filenames. A safe choice for name components should be the valid characters for S3 buckets (also including underscores for non-bucket components):

    VALID_BUCKET = re.compile(r'^[a-z0-9][a-z0-9.-]{2,62}$')

Parameters:error_map (dict mapping Exception to Exception, optional) – Dict that maps store-specific errors to standard ChunkStore errors
get_chunk(array_name, slices, dtype)

Get chunk from the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • dtype (numpy.dtype object or equivalent) – Data type of array x
Returns:

chunk – Chunk as ndarray with dtype dtype and shape dictated by slices

Return type:

numpy.ndarray object

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If requested chunk was not found in store
get_chunk_or_default(array_name, slices, dtype, default_value=0)

Get chunk from the store but return default value if it is missing.

get_chunk_or_placeholder(array_name, slices, dtype)

Get chunk from the store but return a PlaceholderChunk if it is missing.

create_array(array_name)

Create a new array if it does not already exist.

Parameters:array_name (string) – Identifier of array
Raises:chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
put_chunk(array_name, slices, chunk)

Put chunk into the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If array_name is incompatible with store
put_chunk_noraise(array_name, slices, chunk)

Put chunk into store but return any exceptions instead of raising.

mark_complete(array_name)

Write a special object to indicate that array_name is finished.

This operation is idempotent.

The array_name need not correspond to any array written with put_chunk(). This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.

It is not necessary to call create_array() first; the implementation will do so if appropriate.

The presence of this marker can be checked with is_complete().

is_complete(array_name)

Check whether mark_complete() has been called for this array.

NAME_SEP = '/'
NAME_INDEX_WIDTH = 5
classmethod join(*names)

Join components of chunk name with supported separator.

classmethod split(name, maxsplit=-1)

Split chunk name into components based on supported separator.

classmethod chunk_id_str(slices)

Chunk identifier in string form (e.g. ‘00012_01024_00000’).

classmethod chunk_metadata(array_name, slices, chunk=None, dtype=None)

Turn array name and chunk identifier into chunk name and shape.

Form the full chunk name from array_name and slices and extract the chunk shape from slices, validating it in the process. If chunk or dtype is given, check that chunk is commensurate with slices and that dtype contains no objects which would cause nasty segfaults.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object, optional) – Actual chunk data as ndarray (used to validate shape / dtype)
  • dtype (numpy.dtype object or equivalent, optional) – Data type of array x (used for validation only)
Returns:

  • chunk_name (string) – Full chunk name used to find chunk in underlying storage medium
  • shape (tuple of int) – Chunk shape tuple associated with slices

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk, or any dtype contains objects
get_dask_array(array_name, chunks, dtype, offset=(), index=(), errors=0)

Get dask array from the store.

Handling of missing chunks is determined by the errors argument.

Parameters:
  • array_name (string) – Identifier of array in chunk store
  • chunks (tuple of tuples of ints) – Chunk specification
  • dtype (numpy.dtype object or equivalent) – Data type of array
  • offset (tuple of int, optional) – Offset to add to each dimension when addressing chunks in store
  • errors (number or 'raise' or 'placeholder', optional) –

    Error handling. If ‘raise’, exceptions are passed through, causing the evaluation to fail.

    If ‘placeholder’, returns instances of PlaceholderChunk in place of missing chunks. Note that such an array cannot be used as-is, because an ndarray is expected, but it can be used as raw material for building new graphs via functions like da.map_blocks().

    If a numeric value, it is used as a default value.

Returns:

array – Dask array of given dtype

Return type:

dask.array.Array object

put_dask_array(array_name, array, offset=())

Put dask array into the store.

Parameters:
  • array_name (string) – Identifier of array in chunk store
  • array (dask.array.Array object) – Dask input array
  • offset (tuple of int, optional) – Offset to add to each dimension when addressing chunks in store
Returns:

success – Dask array of objects indicating success of transfer of each chunk (None indicates success, otherwise there is an exception object)

Return type:

dask.array.Array object

katdal.chunkstore_dict module

A store of chunks (i.e. N-dimensional arrays) based on a dict of arrays.

class katdal.chunkstore_dict.DictChunkStore(**kwargs)

Bases: katdal.chunkstore.ChunkStore

A store of chunks (i.e. N-dimensional arrays) based on a dict of arrays.

This interprets all keyword arguments as NumPy arrays and stores them in an arrays dict. Each array is identified by its corresponding keyword. New arrays cannot be added via put() - they all need to be in place at store initialisation (or can be added afterwards via direct insertion into the arrays dict). The put method is only useful for in-place modification of existing arrays.

get_chunk(array_name, slices, dtype)

Get chunk from the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • dtype (numpy.dtype object or equivalent) – Data type of array x
Returns:

chunk – Chunk as ndarray with dtype dtype and shape dictated by slices

Return type:

numpy.ndarray object

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If requested chunk was not found in store
create_array(array_name)

Create a new array if it does not already exist.

Parameters:array_name (string) – Identifier of array
Raises:chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
put_chunk(array_name, slices, chunk)

Put chunk into the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If array_name is incompatible with store
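A minimal usage sketch based on the behaviour documented above:

import numpy as np
from katdal.chunkstore_dict import DictChunkStore

# Back an in-memory chunk store with a single named array.
data = np.arange(24, dtype=np.float32).reshape(4, 6)
store = DictChunkStore(x=data)

# Retrieve a chunk identified by a pair of unit-stride slices.
chunk = store.get_chunk('x', (slice(0, 2), slice(0, 3)), np.float32)
print(chunk.shape)    # (2, 3)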

katdal.chunkstore_npy module

A store of chunks (i.e. N-dimensional arrays) based on NPY files.

class katdal.chunkstore_npy.NpyFileChunkStore(path, direct_write=False)

Bases: katdal.chunkstore.ChunkStore

A store of chunks (i.e. N-dimensional arrays) based on NPY files.

Each chunk is stored in a separate binary file in NumPy .npy format. The filename is constructed as

“<path>/<array>/<idx>.npy”

where “<path>” is the chunk store directory specified on construction, “<array>” is the name of the parent array of the chunk and “<idx>” is the index string of each chunk (e.g. “00001_00512”).

For a description of the .npy format, see numpy.lib.format or the relevant NumPy Enhancement Proposal.

Parameters:
  • path (string) – Top-level directory that contains NPY files of chunk store
  • direct_write (bool) – If true, use O_DIRECT when writing the file. This bypasses the OS page cache, which can be useful to avoid filling it up with files that won’t be read again.
Raises:
  • chunkstore.StoreUnavailable – If path does not exist / is not readable
  • chunkstore.StoreUnavailable – If direct_write was requested but is not available
get_chunk(array_name, slices, dtype)

Get chunk from the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • dtype (numpy.dtype object or equivalent) – Data type of array x
Returns:

chunk – Chunk as ndarray with dtype dtype and shape dictated by slices

Return type:

numpy.ndarray object

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If requested chunk was not found in store
create_array(array_name)

See the docstring of ChunkStore.create_array().

put_chunk(array_name, slices, chunk)

Put chunk into the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If array_name is incompatible with store
mark_complete(array_name)

Write a special object to indicate that array_name is finished.

This operation is idempotent.

The array_name need not correspond to any array written with put_chunk(). This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.

It is not necessary to call create_array() first; the implementation will do so if appropriate.

The presence of this marker can be checked with is_complete().

is_complete(array_name)

Check whether mark_complete() has been called for this array.

katdal.chunkstore_s3 module

A store of chunks (i.e. N-dimensional arrays) based on the Amazon S3 API.

exception katdal.chunkstore_s3.S3ObjectNotFound

Bases: katdal.chunkstore.ChunkNotFound

An object / bucket was not found in S3 object store.

exception katdal.chunkstore_s3.S3ServerGlitch

Bases: katdal.chunkstore.ChunkNotFound

S3 chunk store responded with an HTTP error deemed to be temporary.

katdal.chunkstore_s3.read_array(fp)

Read a numpy array in npy format from a file descriptor.

This is the same concept as numpy.lib.format.read_array(), but optimised for the case of reading from http.client.HTTPResponse. Using the numpy function reads pieces out then copies them into the array, while this implementation uses readinto. Raise TruncatedRead if the response runs out of data before the array is complete.

It does not allow pickled dtypes.

exception katdal.chunkstore_s3.AuthorisationFailed

Bases: katdal.chunkstore.StoreUnavailable

Authorisation failed, e.g. due to invalid, malformed or expired token.

exception katdal.chunkstore_s3.InvalidToken(token, message)

Bases: katdal.chunkstore_s3.AuthorisationFailed

Invalid JSON Web Token (JWT).

katdal.chunkstore_s3.decode_jwt(token)

Decode JSON Web Token (JWT) string and extract claims.

The MeerKAT archive uses JWT bearer tokens for authorisation. Each token is a JSON Web Signature (JWS) string with a payload of claims. This function extracts the claims as a dict, while also doing basic checks on the token (mostly to catch copy-n-paste errors). The signature is decoded but not validated, since that would require the server secrets.

Parameters:token (str) – JWS Compact Serialization as an ASCII string (native string, not bytes)
Returns:claims – The JWT Claims Set as a dict of key-value pairs
Return type:dict
Raises:InvalidToken – If the token is malformed or truncated, or has expired
class katdal.chunkstore_s3.S3ChunkStore(url, timeout=(30, 300), retries=2, token=None, credentials=None, public_read=False, expiry_days=0, **kwargs)

Bases: katdal.chunkstore.ChunkStore

A store of chunks (i.e. N-dimensional arrays) based on the Amazon S3 API.

This object encapsulates the S3 client / session and its underlying connection pool, which allows subsequent get and put calls to share the connections.

The full identifier of each chunk (the “chunk name”) is given by

“<bucket>/<path>/<idx>”

where “<bucket>” refers to the relevant S3 bucket, “<bucket>/<path>” is the name of the parent array of the chunk and “<idx>” is the index string of each chunk (e.g. “00001_00512”). The corresponding S3 key string of a chunk is “<path>/<idx>.npy” which reflects the fact that the chunk is stored as a string representation of an NPY file (complete with header).

Parameters:
  • url (str) – Endpoint of S3 service, e.g. ‘http://127.0.0.1:9000’. It can be specified as either bytes or unicode, and is converted to the native string type with UTF-8. The URL may also contain a path if this store is relative to an existing bucket, in which case the chunk name is a relative path (useful for unit tests).
  • timeout (float or None or tuple of 2 floats or None's, optional) – Connect / read timeout, in seconds, either a single value for both or custom values as (connect, read) tuple. None means “wait forever”…
  • retries (int or tuple of 2 ints or urllib3.util.retry.Retry, optional) – Number of connect / read retries, either a single value for both or custom values as (connect, read) tuple, or a Retry object for full customisation (including status retries).
  • token (str, optional) – Bearer token to authenticate
  • credentials (tuple of str, optional) – AWS access key and secret key to authenticate
  • public_read (bool, optional) – If set to true, new buckets will be created with a policy that allows everyone (including unauthenticated users) to read the data.
  • expiry_days (int, optional) – If set to a value greater than 0 will set a future expiry time in days for any new buckets created.
  • kwargs (dict) – Extra keyword arguments (unused)
Raises:

chunkstore.StoreUnavailable – If S3 server interaction failed (it’s down, no authentication, etc)

request(method, url, process=<function S3ChunkStore.<lambda>>, chunk_name='', ignored_errors=(), timeout=(), retries=None, **kwargs)

Send HTTP request to S3 server, process response and retry if needed.

This retries temporary HTTP errors, including reset connections while processing a successful response.

Parameters:
  • url (method,) – The standard required parameters of requests.Session.request()
  • process (function, signature result = process(response), optional) – Function that will process response (just return response by default)
  • chunk_name (str, optional) – Name of chunk, used for error reporting only
  • ignored_errors (collection of int, optional) – HTTP status codes that are treated like 200 OK, not raising an error
  • timeout (float or None or tuple of 2 floats or None's, optional) – Override timeout for this request (use the store timeout by default)
  • retries (int or tuple of 2 ints or urllib3.util.retry.Retry, optional) – Override retries for this request (use the store retries by default)
  • kwargs (optional) – These are passed on to requests.Session.request()
Returns:

result – The output of the process function applied to a successful response

Return type:

object

Raises:
  • AuthorisationFailed – If the request is not authorised by appropriate token or credentials
  • S3ObjectNotFound – If S3 object request fails because it does not exist
  • S3ServerGlitch – If S3 object request fails because server is temporarily overloaded
  • StoreUnavailable – If a general HTTP error occurred that is not ignored
get_chunk(array_name, slices, dtype)

Get chunk from the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • dtype (numpy.dtype object or equivalent) – Data type of array x
Returns:

chunk – Chunk as ndarray with dtype dtype and shape dictated by slices

Return type:

numpy.ndarray object

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If requested chunk was not found in store
create_array(array_name)

See the docstring of ChunkStore.create_array().

put_chunk(array_name, slices, chunk)

Put chunk into the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If array_name is incompatible with store
mark_complete(array_name)

Write a special object to indicate that array_name is finished.

This operation is idempotent.

The array_name need not correspond to any array written with put_chunk(). This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.

It is not necessary to call create_array() first; the implementation will do so if appropriate.

The presence of this marker can be checked with is_complete().

is_complete(array_name)

Check whether mark_complete() has been called for this array.

katdal.concatdata module

Class for concatenating visibility data sets.

exception katdal.concatdata.ConcatenationError

Bases: Exception

Sequence of objects could not be concatenated due to incompatibility.

class katdal.concatdata.ConcatenatedLazyIndexer(indexers, transforms=None)

Bases: katdal.lazy_indexer.LazyIndexer

Two-stage deferred indexer that concatenates multiple indexers.

This indexer concatenates a sequence of indexers along the first (i.e. time) axis. The index specification is broken down into chunks along this axis, sent to the applicable underlying indexers, and the returned data is concatenated again before being returned.

Parameters:
  • indexers (sequence of LazyIndexer objects and/or arrays) – Sequence of indexers or raw arrays to be concatenated
  • transforms (list of LazyTransform objects or None, optional) – Extra chain of transforms to be applied to data after final indexing
name

Name of first non-empty indexer (or empty string otherwise)

Type:string
Raises:InvalidTransform – If transform chain does not obey restrictions on changing the data shape
katdal.concatdata.common_dtype(sensor_data_sequence)

The dtype suitable to store all sensor data values in the given sequence.

This extracts the dtypes of a sequence of sensor data objects and finds the minimal dtype to which all of them may be safely cast using NumPy type promotion rules (which will typically be the dtype of a concatenation of the values).

Parameters:sensor_data_sequence (sequence of extracted sensor data objects) – These objects may include numpy.ndarray and CategoricalData
Returns:dtype – The promoted dtype of the sequence, or None if sensor_data_sequence is empty
Return type:numpy.dtype object
class katdal.concatdata.ConcatenatedSensorGetter(data)

Bases: katdal.sensordata.SensorGetter

The concatenation of multiple raw (uncached) sensor data sets.

This is a convenient container for returning raw (uncached) sensor data sets from a ConcatenatedSensorCache object. It only accesses the underlying data sets when explicitly asked to via the get() interface, but provides quick access to metadata such as sensor name.

Parameters:data (sequence of SensorGetter) – Uncached sensor data
get()

Retrieve the values from underlying storage.

Returns:values – Underlying data
Return type:SensorData
class katdal.concatdata.ConcatenatedSensorCache(caches, keep=None)

Bases: katdal.sensordata.SensorCache

Sensor cache that is a concatenation of multiple underlying caches.

This concatenates a sequence of sensor caches along the time axis and makes them appear like a single sensor cache. The combined cache contains a superset of all actual and virtual sensors found in the underlying caches and replaces any missing sensor data with dummy values.

Parameters:
  • caches (sequence of SensorCache objects) – Sequence of underlying caches to be concatenated
  • keep (sequence of bool, optional) – Default (global) time selection specification as boolean mask that will be applied to sensor data (this can be disabled on data retrieval)
get(name, select=False, extract=True, **kwargs)

Sensor values interpolated to correlator data timestamps.

Retrieve raw (uncached) or cached sensor data from each underlying cache and concatenate the results along the time axis.

Parameters:
  • name (string) – Sensor name
  • select ({False, True}, optional) – True if preset time selection will be applied to returned data
  • extract ({True, False}, optional) – True if sensor data should be extracted from store and cached
  • kwargs (dict, optional) – Additional parameters are passed to underlying sensor caches
Returns:

data – If extraction is disabled, this will be a SensorGetter object for uncached sensors. If selection is enabled, this will be a 1-D array of values, one per selected timestamp. If selection is disabled, this will be a 1-D array of values (of the same length as the timestamps attribute) for numerical data, and a CategoricalData object for categorical data.

Return type:

array or CategoricalData or SensorGetter object

Raises:

KeyError – If sensor name was not found in cache and did not match virtual template

class katdal.concatdata.ConcatenatedDataSet(datasets)

Bases: katdal.dataset.DataSet

Class that concatenates existing visibility data sets.

This provides a single DataSet interface to a list of concatenated data sets. Where possible, identical targets, subarrays, spectral windows and observation sensors are merged. For more information on attributes, see the DataSet docstring.

Parameters:datasets (sequence of DataSet objects) – List of existing data sets
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of selection on it.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of selection on it.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

katdal.dataset module

Base class for accessing a visibility data set.

exception katdal.dataset.WrongVersion

Bases: Exception

Trying to access data using accessor class with the wrong version.

exception katdal.dataset.BrokenFile

Bases: Exception

Data set could not be loaded because the file is inconsistent or is missing critical bits.

class katdal.dataset.Subarray(ants, corr_products)

Bases: object

Subarray specification.

A subarray is determined by the specific correlation products produced by the correlator and the antenna objects associated with the inputs found in the correlation products.

Parameters:
  • ants (sequence of katpoint.Antenna objects) – List of antenna objects, culled to contain only antennas found in corr_products
  • corr_products (sequence of (string, string) pairs, length B) – Correlation products as pairs of input labels, e.g. (‘ant1h’, ‘ant2v’), exposed as an array of strings with shape (B, 2)
inputs

List of correlator input labels found in corr_products, e.g. ‘ant1h’

Type:list of strings
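
A minimal sketch of constructing a subarray by hand, with hypothetical antenna descriptions and correlation products:

import katpoint
from katdal.dataset import Subarray

# Hypothetical antenna description strings: name, latitude, longitude, altitude, diameter
ants = [katpoint.Antenna('ant1, -30:43:17.3, 21:24:38.5, 1038.0, 12.0'),
        katpoint.Antenna('ant2, -30:43:17.3, 21:24:38.5, 1038.0, 12.0')]
corrprods = [('ant1h', 'ant1h'), ('ant1h', 'ant2h'), ('ant2v', 'ant2v')]
sub = Subarray(ants, corrprods)
print(sub.inputs)          # correlator input labels found in corrprods
print(sub.corr_products)   # array of strings, shape (3, 2)
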
katdal.dataset.parse_url_or_path(url_or_path)

Parse URL into components, converting path to absolute file URL.

Parameters:url_or_path (string) – URL, or filesystem path if there is no scheme
Returns:url_parts – Components of the parsed URL (‘file’ scheme will have an absolute path)
Return type:urllib.parse.ParseResult
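
For example (with a hypothetical local filename), a plain path is turned into a 'file' URL with an absolute path:

from katdal.dataset import parse_url_or_path

parts = parse_url_or_path('1234567890_sdp_l0.rdb')
print(parts.scheme)   # 'file'
print(parts.path)     # absolute path to the file
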
class katdal.dataset.DataSet(name, ref_ant='', time_offset=0.0, url='')

Bases: object

Base class for accessing a visibility data set.

This provides a simple interface to a generic file (or files) containing visibility data (both single-dish and interferometer data supported). The data are not loaded into memory on opening the file, but are accessible via properties after typically selecting a subset of the data. This allows the reading of huge files.

Parameters:
  • name (string) – Name / identifier of data set
  • ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use by script)
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • url (string, optional) – Location of data set (either local filename or full URL accepted)
version

Format version string

Type:string
observer

Name of person that recorded the data set

Type:string
description

Short description of the purpose of the data set

Type:string
experiment_id

Experiment ID, a unique string used to link the data files of an experiment together with blog entries, etc.

Type:string
obs_params

Observation parameters, typically set in observation script

Type:dict mapping string to string or list of strings
obs_script_log

Observation script output log (useful for debugging)

Type:list of strings
subarrays

List of all subarrays in data set

Type:list of Subarray objects
subarray

Index of currently selected subarray

Type:int
ants

List of selected antennas

Type:list of katpoint.Antenna objects
inputs

List of selected correlator input labels (‘ant1h’)

Type:array of strings
corr_products

Array of selected correlation products as pairs of input labels (e.g. [(‘ant1h’, ‘ant1h’), (‘ant1h’, ‘ant2h’)])

Type:array of strings, shape (B, 2)
receivers

Identifier of the active receiver on each antenna

Type:dict mapping string to string or list of strings
spectral_windows

List of all spectral windows in data set

Type:list of SpectralWindow objects
spw

Index of currently selected spectral window

Type:int
channel_width

Channel bandwidth of selected spectral window, in Hz

Type:float
freqs / channel_freqs

Centre frequency of each selected channel, in Hz

Type:array of float, shape (F,)
channels

Original channel indices of selected channels

Type:array of int, shape (F,)
dump_period

Dump period, in seconds

Type:float
sensor

Sensor cache

Type:SensorCache object
catalogue

Catalogue of all targets / sources / fields in data set

Type:katpoint.Catalogue object
start_time

Timestamp of start of first sample in file, in UT seconds since Unix epoch

Type:katpoint.Timestamp object
end_time

Timestamp of end of last sample in file, in UT seconds since Unix epoch

Type:katpoint.Timestamp object
dumps

Original dump indices of selected dumps

Type:array of int, shape (T,)
scan_indices

List of currently selected scans as indices

Type:list of int
compscan_indices

List of currently selected compound scans as indices

Type:list of int
target_indices

List of currently selected targets as indices into catalogue

Type:list of int
target_projection

Type of spherical projection for target coordinates

Type:{‘ARC’, ‘SIN’, ‘TAN’, ‘STG’, ‘CAR’}, optional
target_coordsys

Spherical pointing coordinate system for target coordinates

Type:{‘azel’, ‘radec’}, optional
shape

Shape of selected visibility data array, as (T, F, B)

Type:tuple of 3 ints
size

Size of selected visibility data array, in bytes

Type:int
applycal_products

List of calibration products that will be applied to data

Type:list of string
select(**kwargs)

Select subset of data, based on time / frequency / corrprod filters.

This applies a set of selection criteria to the data set, which updates the data set properties and attributes to match the selection. In other words, the timestamps() and vis() methods will return the selected subset of the data, while attributes such as ants, channel_freqs and shape are updated. The sensor cache will also return the selected subset of sensor data via the __getitem__ interface. This function returns nothing, but modifies the existing data set in-place.

The selection criteria are divided into groups, based on whether they affect the time, frequency or correlation product dimension:

* Time: `dumps`, `timerange`, `scans`, `compscans`, `targets`
* Frequency: `channels`, `freqrange`
* Correlation product: `corrprods`, `ants`, `inputs`, `pol`

The subarray and spw criteria are special, as they affect multiple dimensions (time + correlation product and time + frequency, respectively), are always active and are forced to be a single index.

If there are multiple criteria on the same dimension within a select() call, they are ANDed together, while multiple items within the same criterion (e.g. targets=[‘Hyd A’, ‘Vir A’]) are ORed together. When a second select() call is done, all new selections replace previous selections on the same dimension, while existing selections on other dimensions are preserved. The reset parameter fine-tunes this behaviour.

If select() is called without any parameters the selection is reset to the original data set.

In addition, the weights and flags criteria are lists of names that select which weights and flags to include in the corresponding data set property.

Parameters:
  • strict ({True, False}, optional) – True if select() raises TypeError if it encounters an unknown kwarg
  • dumps (int or slice or sequence of ints or sequence of bools, optional) – Select dumps by index, slice or boolean mask of length T (keep dumps where mask is True)
  • timerange (sequence of 2 katpoint.Timestamp objects or equivalent, optional) – Select range of times between given start and end times
  • scans (int or string or sequence, optional) – Select scans by index or state (or negate state by prepending ‘~’)
  • compscans (int or string or sequence, optional) – Select compscans by index or label (or negate label by prepending ‘~’)
  • targets (int or string or katpoint.Target object or sequence, optional) – Select targets by index or name or description or object
  • spw (int, optional) – Select spectral window by index (only one may be active)
  • channels (int or slice or sequence of ints or sequence of bools, optional) – Select frequency channels by index, slice or boolean mask of length F (keep channels where mask is True)
  • freqrange (sequence of 2 floats, optional) – Select range of frequencies between start and end frequencies, in Hz
  • subarray (int, optional) – Select subarray by index (only one may be active)
  • corrprods (int or slice or sequence of ints or sequence of bools or sequence of string pairs or {‘auto’, ‘cross’}, optional) – Select correlation products by index, slice or boolean mask of length B (keep products where mask is True). Alternatively, select by value via a sequence of string pairs, or select all autocorrelations via ‘auto’ or all cross-correlations via ‘cross’.
  • ants (string or katpoint.Antenna object or sequence, optional) – Select antennas by name or object. If all antennas specified are prefaced by a ~ this is treated as a deselection and these antennas are excluded.
  • inputs (string or sequence of strings, optional) – Select inputs by label
  • pol (string or sequence of strings, {‘H’, ‘V’, ‘HH’, ‘VV’, ‘HV’, ‘VH’}, optional) – Select polarisation terms
  • weights ('all' or string or sequence of strings, optional) – List of names of weights to be multiplied together, as a sequence or string of comma-separated names (combine all weights by default)
  • flags ('all' or string or sequence of strings, optional) – List of names of flags that will be OR’ed together, as a sequence or string of comma-separated names (use all flags by default). An empty string or sequence discards all flags.
  • reset ({'auto', '', 'T', 'F', 'B', 'TF', 'TB', 'FB', 'TFB'}, optional) – Remove existing selections on specified dimensions before applying the new selections. The default ‘auto’ option clears those dimensions that will be modified by the new selections and leaves the selections on unaffected dimensions intact except if select is called without any parameters, in which case all selections are cleared. By setting reset to ‘’, new selections apply on top of existing selections.
Raises:
  • TypeError – If a keyword argument is unknown and strict is enabled
  • IndexError – If spw or subarray is out of range
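
A minimal sketch of typical select() calls, assuming d is an open katdal data set:

d.select(scans='track', channels=slice(200, 800), pol='HH')
print(d.shape)            # (T, F, B) of the selected subset
d.select(ants='~ant5')    # replaces the corrprod selection, keeps time/frequency selections
d.select()                # clears all selections again
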
scans()

Generator that iterates through scans in data set.

This iterates through the currently selected list of scans, returning the scan index, scan state and associated target object. In addition, after each iteration the data set will reflect the scan selection, i.e. the timestamps, visibilities, sensor values, etc. will be those of the current scan. The scan selection applies on top of any existing selection.

Yields:
  • scan (int) – Scan index
  • state (string) – Scan state
  • target (katpoint.Target object) – Target associated with scan
compscans()

Generator that iterates through compound scans in data set.

This iterates through the currently selected list of compound scans, returning the compound scan index, label and the first associated target object. In addition, after each iteration the data set will reflect the compound scan selection, i.e. the timestamps, visibilities, sensor values, etc. will be those of the current compound scan. The compound scan selection applies on top of any existing selection.

Yields:
  • compscan (int) – Compound scan index
  • label (string) – Compound scan label
  • target (katpoint.Target object) – First target associated with compound scan
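
A short sketch of both generators, assuming d is an open katdal data set:

for compscan, label, target in d.compscans():
    print('compscan', compscan, label, target.name)

for scan, state, target in d.scans():
    if state == 'track':
        print('scan', scan, target.name, d.shape)   # d reflects the current scan
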
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.

vis

Complex visibility data as a function of time, frequency and corrprod.

The visibility data are returned as an array of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().

flags

Visibility flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().
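
For example (assuming d is an open data set with a selection applied), a small block of visibilities and flags can be pulled into memory like this:

vis = d.vis[:10]              # first 10 dumps, all channels and corrprods
flags = d.flags[:10]
print(vis.shape, vis.dtype)   # (10, F, B) complex64
good = vis[~flags]            # 1-D array of unflagged samples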

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

mjd

Visibility timestamps in Modified Julian Days (MJD).

The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.

lst

Local sidereal time at the reference antenna in hours.

The sidereal times are returned in an array of float, shape (T,).

az

Azimuth angle of each dish in degrees.

The azimuth angles are returned in an array of float, shape (T, A).

el

Elevation angle of each dish in degrees.

The elevation angles are returned in an array of float, shape (T, A).

ra

Right ascension of the actual pointing of each dish in J2000 degrees.

The right ascensions are returned in an array of float, shape (T, A).

dec

Declination of the actual pointing of each dish in J2000 degrees.

The declinations are returned in an array of float, shape (T, A).

parangle

Parallactic angle of the actual pointing of each dish in degrees.

The parallactic angle is the position angle of the observer’s vertical on the sky, measured from north toward east. This is the angle between the great-circle arc connecting the celestial North pole to the dish pointing direction, and the great-circle arc connecting the zenith above the antenna to the pointing direction, or the angle between the hour circle and vertical circle through the pointing direction. It is returned as an array of float, shape (T, A).

target_x

Target x coordinate of each dish in degrees.

The target coordinates are projections of the spherical coordinates of the dish pointing direction to a plane with the target position at the origin. The type of projection (e.g. ARC, SIN, etc.) and spherical pointing coordinate system (e.g. azel or radec) can be set via the target_projection and target_coordsys attributes, respectively. The target x coordinates are returned as an array of float, shape (T, A).

target_y

Target y coordinate of each dish in degrees.

The target coordinates are projections of the spherical coordinates of the dish pointing direction to a plane with the target position at the origin. The type of projection (e.g. ARC, SIN, etc.) and spherical pointing coordinate system (e.g. azel or radec) can be set via the target_projection and target_coordsys attributes, respectively. The target y coordinates are returned as an array of float, shape (T, A).

u

U coordinate for each correlation product in metres.

This calculates the u coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(u_1 - u_2\) for baseline (ant1, ant2).

v

V coordinate for each correlation product in metres.

This calculates the v coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(v_1 - v_2\) for baseline (ant1, ant2).

w

W coordinate for each correlation product in metres.

This calculates the w coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(w_1 - w_2\) for baseline (ant1, ant2).
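
A brief sketch (assuming d is an open data set) that stacks the per-baseline coordinates into a single (u, v, w) array:

import numpy as np

uvw = np.stack([d.u, d.v, d.w], axis=-1)   # shape (T, B, 3), in metres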

katdal.datasources module

Various sources of correlator data and metadata.

exception katdal.datasources.DataSourceNotFound

Bases: Exception

File associated with DataSource not found or server not responding.

class katdal.datasources.AttrsSensors(attrs, sensors)

Bases: object

Metadata in the form of attributes and sensors.

Parameters:
  • attrs (mapping from string to object) – Metadata attributes
  • sensors (mapping from string to SensorGetter objects) – Metadata sensor cache mapping sensor names to raw sensor data
class katdal.datasources.DataSource(metadata, timestamps, data=None)

Bases: object

A generic data source presenting both correlator data and metadata.

Parameters:
  • metadata (AttrsSensors object) – Metadata attributes and sensors
  • timestamps (array-like of float, length T) – Timestamps at centroids of visibilities in UTC seconds since Unix epoch
  • data (VisFlagsWeights object, optional) – Correlator data (visibilities, flags and weights)
katdal.datasources.view_capture_stream(telstate, capture_block_id, stream_name)

Create telstate view based on given capture block ID and stream name.

It constructs a view on telstate with at least the prefixes

  • <capture_block_id>_<stream_name>
  • <capture_block_id>
  • <stream_name>

Additionally if there is a <stream_name>_inherit key, that stream is added too (recursively).

Parameters:
  • telstate (katsdptelstate.TelescopeState object) – Original telescope state
  • capture_block_id (string) – Capture block ID
  • stream_name (string) – Stream name
Returns:

telstate – Telstate with a view that incorporates capture block, stream and combo

Return type:

TelescopeState object
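
A minimal sketch with an in-memory telescope state and hypothetical capture block / stream names (in practice telstate usually comes from the archive or a live system):

import katsdptelstate
from katdal.datasources import view_capture_stream

telstate = katsdptelstate.TelescopeState()   # in-memory stand-in
view = view_capture_stream(telstate, '1234567890', 'sdp_l0')
# Keys are now looked up under '1234567890_sdp_l0', '1234567890' and 'sdp_l0' in turn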

katdal.datasources.view_l0_capture_stream(telstate, capture_block_id=None, stream_name=None, **kwargs)

Create telstate view based on auto-determined capture block ID and stream name.

This figures out the appropriate capture block ID and L0 stream name from a capture-stream specific telstate, or uses the provided ones. It then calls view_capture_stream() to generate a view.

Parameters:
  • telstate (katsdptelstate.TelescopeState object) – Original telescope state
  • capture_block_id (string, optional) – Specify capture block ID explicitly (detected otherwise)
  • stream_name (string, optional) – Specify L0 stream name explicitly (detected otherwise)
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Returns:

  • telstate (TelstateToStr object) – Telstate with a view that incorporates capture block, stream and combo
  • capture_block_id (string) – Actual capture block ID used
  • stream_name (string) – Actual L0 stream name used

Raises:

ValueError – If no capture block or L0 stream could be detected (with no override)

katdal.datasources.infer_chunk_store(url_parts, telstate, npy_store_path=None, s3_endpoint_url=None, array='correlator_data', **kwargs)

Construct chunk store automatically from dataset URL and telstate.

Parameters:
  • url_parts (urlparse.ParseResult object) – Parsed dataset URL
  • telstate (TelstateToStr object) – Telescope state
  • npy_store_path (string, optional) – Top-level directory of NpyFileChunkStore (overrides the default)
  • s3_endpoint_url (string, optional) – Endpoint of S3 service, e.g. ‘http://127.0.0.1:9000’ (overrides default)
  • array (string, optional) – Array within the bucket from which to determine the prefix
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Returns:

store – Chunk store for visibility data

Return type:

katdal.ChunkStore object

Raises:

class katdal.datasources.TelstateDataSource(telstate, capture_block_id, stream_name, chunk_store=None, timestamps=None, url='', upgrade_flags=True, van_vleck='off', preselect=None, **kwargs)

Bases: katdal.datasources.DataSource

A data source based on katsdptelstate.TelescopeState.

It is assumed that the provided telstate already has the appropriate views to find observation, stream and chunk store information. It typically needs the following prefixes:

  • <capture block ID>_<L0 stream>
  • <capture block ID>
  • <L0 stream>
Parameters:
  • telstate (katsdptelstate.TelescopeState object) – Telescope state with appropriate views
  • capture_block_id (string) – Capture block ID
  • stream_name (string) – Name of the L0 stream
  • chunk_store (katdal.ChunkStore object, optional) – Chunk store for visibility data (the default is no data - metadata only)
  • timestamps (array of float, optional) – Visibility timestamps, overriding (or fixing) the ones found in telstate
  • url (string, optional) – Location of the telstate source
  • upgrade_flags (bool, optional) – Look for associated flag streams and use them if True (default)
  • van_vleck ({'off', 'autocorr'}, optional) – Type of Van Vleck (quantisation) correction to perform
  • preselect (dict, optional) –

    Subset of data to select. The keys in the dictionary correspond to the keyword arguments of DataSet.select(), but with restrictions:

    • Only channels and dumps can be specified.
    • The values must be slices with unit step.
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Raises:
  • KeyError – If telstate lacks critical keys
  • IndexError – If preselect does not meet the criteria above.
classmethod from_url(url, chunk_store='auto', **kwargs)

Construct TelstateDataSource from URL or RDB filename.

The following URL styles are supported:

Parameters:
  • url (string) – URL or RDB filename serving as entry point to data set
  • chunk_store (katdal.ChunkStore object, optional) – Chunk store for visibility data (obtained automatically by default, or set to None for metadata-only data set)
  • kwargs (dict, optional) – Extra keyword arguments passed to init, telstate view, chunk store init
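
A brief sketch (with a hypothetical local RDB filename) that opens a metadata-only data source:

from katdal.datasources import TelstateDataSource

source = TelstateDataSource.from_url('1234567890_sdp_l0.rdb', chunk_store=None)
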
katdal.datasources.open_data_source(url, **kwargs)

Construct the data source described by the given URL.

katdal.flags module

Definitions of flag bits

katdal.h5datav1 module

Data accessor class for HDF5 files produced by Fringe Finder correlator.

class katdal.h5datav1.H5DataV1(filename, ref_ant='', time_offset=0.0, mode='r', **kwargs)

Bases: katdal.dataset.DataSet

Load HDF5 format version 1 file produced by Fringe Finder correlator.

For more information on attributes, see the DataSet docstring.

Parameters:
  • filename (string) – Name of HDF5 file
  • ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
file

Underlying HDF5 file, exposed via h5py interface

Type:h5py.File object
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

katdal.h5datav2 module

Data accessor class for HDF5 files produced by KAT-7 correlator.

katdal.h5datav2.get_single_value(group, name)

Return single value from attribute or dataset with given name in group.

If name is an attribute of the HDF5 group group, it is returned; otherwise name is interpreted as an HDF5 dataset within group and the last value of that dataset is returned. This is meant to retrieve static configuration values that may be set more than once during capture initialisation but do not change during actual capturing.

Parameters:
  • group (h5py.Group object) – HDF5 group to query
  • name (string) – Name of HDF5 attribute or dataset to query
Returns:

value – Attribute or last value of dataset

Return type:

object

katdal.h5datav2.dummy_dataset(name, shape, dtype, value)

Dummy HDF5 dataset containing a single value.

This creates a dummy HDF5 dataset in memory containing a single value. It can have virtually unlimited size as the dataset is highly compressed.

Parameters:
  • name (string) – Name of dataset
  • shape (sequence of int) – Shape of dataset
  • dtype (numpy.dtype object or equivalent) – Type of data stored in dataset
  • value (object) – All elements in the dataset will equal this value
Returns:

dataset – Dummy HDF5 dataset

Return type:

h5py.Dataset object

class katdal.h5datav2.H5DataV2(filename, ref_ant='', time_offset=0.0, mode='r', quicklook=False, keepdims=False, **kwargs)

Bases: katdal.dataset.DataSet

Load HDF5 format version 2 file produced by KAT-7 correlator.

For more information on attributes, see the DataSet docstring.

Parameters:
  • filename (string) – Name of HDF5 file
  • ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
  • quicklook ({False, True}) – True if synthesised timestamps should be used to partition data set even if real timestamps are irregular, thereby avoiding the slow loading of real timestamps at the cost of slightly inaccurate label borders
  • keepdims ({False, True}, optional) – Force vis / weights / flags to be 3-dimensional, regardless of selection
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
file

Underlying HDF5 file, exposed via h5py interface

Type:h5py.File object
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

katdal.h5datav3 module

Data accessor class for HDF5 files produced by RTS correlator.

katdal.h5datav3.dummy_dataset(name, shape, dtype, value)

Dummy HDF5 dataset containing a single value.

This creates a dummy HDF5 dataset in memory containing a single value. It can have virtually unlimited size as the dataset is highly compressed.

Parameters:
  • name (string) – Name of dataset
  • shape (sequence of int) – Shape of dataset
  • dtype (numpy.dtype object or equivalent) – Type of data stored in dataset
  • value (object) – All elements in the dataset will equal this value
Returns:

dataset – Dummy HDF5 dataset

Return type:

h5py.Dataset object

class katdal.h5datav3.H5DataV3(filename, ref_ant='', time_offset=0.0, mode='r', time_scale=None, time_origin=None, rotate_bls=False, centre_freq=None, band=None, keepdims=False, **kwargs)

Bases: katdal.dataset.DataSet

Load HDF5 format version 3 file produced by RTS correlator.

For more information on attributes, see the DataSet docstring.

Parameters:
  • filename (string) – Name of HDF5 file
  • ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
  • time_scale (float or None, optional) – Resynthesise timestamps using this scale factor
  • time_origin (float or None, optional) – Resynthesise timestamps using this sync time / epoch
  • rotate_bls ({False, True}, optional) – Rotate baseline label list to work around early RTS correlator bug
  • centre_freq (float or None, optional) – Override centre frequency if provided, in Hz
  • band (string or None, optional) – Override receiver band if provided (e.g. ‘l’) - used to find ND models
  • keepdims ({False, True}, optional) – Force vis / weights / flags to be 3-dimensional, regardless of selection
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
file

Underlying HDF5 file, exposed via h5py interface

Type:h5py.File object
stream_name

Name of L0 data stream, for finding corresponding telescope state keys

Type:string

Notes

The timestamps can be resynchronised from the original sample counter values by specifying time_scale and/or time_origin. The basic formula is given by:

timestamp = sample_counter / time_scale + time_origin
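
For example (a sketch with a hypothetical filename and hypothetical scale/origin values), the timestamps of a v3 file can be resynthesised when opening it:

import katdal

# timestamp = sample_counter / time_scale + time_origin
d = katdal.open('1234567890.h5', time_scale=1712e6, time_origin=1313067732.0)
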
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

katdal.lazy_indexer module

Two-stage deferred indexer for objects with expensive __getitem__ calls.

katdal.lazy_indexer.dask_getitem(x, indices)

Index a dask array, with N-D fancy index support and better performance.

This is a drop-in replacement for x[indices] that goes one further by implementing “N-D fancy indexing” which is still unsupported in dask. If indices contains multiple fancy indices, perform outer (oindex) indexing. This behaviour deviates from NumPy, which performs the more general (but also more obtuse) vectorized (vindex) indexing in this case. See NumPy NEP 21, dask #433 and h5py #652 for more details.

In addition, this optimises performance by culling unnecessary nodes from the dask graph after indexing, which makes it cheaper to compute if only a small piece of the graph is needed, and by collapsing fancy indices in indices to slices where possible (which also implies oindex semantics).
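
A small sketch of the N-D fancy indexing behaviour (outer / oindex semantics):

import numpy as np
import dask.array as da
from katdal.lazy_indexer import dask_getitem

x = da.from_array(np.arange(24).reshape(4, 6), chunks=(2, 3))
sub = dask_getitem(x, ([0, 2], [1, 3, 5]))   # two fancy indices -> outer (oindex) indexing
print(sub.compute())                         # shape (2, 3)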

exception katdal.lazy_indexer.InvalidTransform

Bases: Exception

Transform changes data shape in unallowed way.

class katdal.lazy_indexer.LazyTransform(name=None, transform=<function LazyTransform.<lambda>>, new_shape=<function LazyTransform.<lambda>>, dtype=None)

Bases: object

Transformation to be applied by LazyIndexer after final indexing.

A LazyIndexer potentially applies a chain of transforms to the data after the final second-stage indexing is done. These transforms are restricted in their capabilities to simplify the indexing process. Specifically, when it comes to the data shape, transforms may only:

- add dimensions at the end of the data shape, or
- drop dimensions at the end of the data shape.

The preserved dimensions are not allowed to change their shape or interpretation so that the second-stage indexing matches the first-stage indexing on these dimensions. The data type (aka dtype) is allowed to change.

Parameters:
  • name (string or None, optional) – Name of transform
  • transform (function, signature data = f(data, keep), optional) – Transform to apply to data (keep is user-specified second-stage index)
  • new_shape (function, signature new_shape = f(old_shape), optional) – Function that predicts data array shape tuple after first-stage indexing and transformation, given its original shape tuple as input. Restrictions apply as described above.
  • dtype (numpy.dtype object or equivalent or None, optional) – Type of output array after transformation (None if same as input array)
class katdal.lazy_indexer.LazyIndexer(dataset, keep=slice(None, None, None), transforms=None)

Bases: object

Two-stage deferred indexer for objects with expensive __getitem__ calls.

This class was originally designed to extend and speed up the indexing functionality of HDF5 datasets as found in h5py, but works on any equivalent object (defined as any object with shape, dtype and __getitem__ members) where a call to __getitem__ may be very expensive. The following discussion focuses on the HDF5 use case as the main example.

Direct extraction of a subset of an HDF5 dataset via the __getitem__ interface (i.e. dataset[index]) has a few issues:

  1. Data access can be very slow (or impossible) if a very large dataset is fully loaded into memory and then indexed again at a later stage
  2. Advanced indexing (via boolean masks or sequences of integer indices) is only supported on a single dimension in the current version of h5py (2.0)
  3. Even though advanced indexing has limited support, simple indexing (via single integer indices or slices) is frequently much faster.

This class wraps an h5py.Dataset or equivalent object and exposes a new __getitem__ interface on it. It efficiently composes two stages of indexing: a first stage specified at object instantiation time and a second stage that applies on top of the first stage when __getitem__ is called on this object. The data are only loaded after the combined index is determined, addressing issue 1.

Furthermore, advanced indexing is allowed on any dimension by decomposing the selection as a series of slice selections covering contiguous segments of the dimension to alleviate issue 2. Finally, this also allows faster data retrieval by extracting a large slice from the HDF5 dataset and then performing advanced indexing on the resulting numpy.ndarray object instead, in response to issue 3.

The keep parameter of the __init__() and __getitem__() methods accepts a generic index or slice specification, i.e. anything that would be accepted by the __getitem__() method of a numpy.ndarray of the same shape as the dataset. This could be a single integer index, a sequence of integer indices, a slice object (representing the colon operator commonly used with __getitem__, e.g. representing x[1:10:2] as x[slice(1,10,2)]), a sequence of booleans as a mask, or a tuple containing any number of these (typically one index item per dataset dimension). Any missing dimensions will be fully selected, and any extra dimensions will be ignored.

Parameters:
  • dataset (h5py.Dataset object or equivalent) – Underlying dataset or array object on which lazy indexing will be done. This can be any object with shape, dtype and __getitem__ members.
  • keep (NumPy index expression, optional) – First-stage index as a valid index or slice specification (supports arbitrary slicing or advanced indexing on any dimension)
  • transforms (list of LazyTransform objects or None, optional) – Chain of transforms to be applied to data after final indexing. The chain as a whole may only add or drop dimensions at the end of data shape without changing the preserved dimensions.
name

Name of HDF5 dataset (or empty string for unnamed ndarrays, etc.)

Type:string
Raises:InvalidTransform – If transform chain does not obey restrictions on changing the data shape
shape

Shape of data array after first-stage indexing and transformation, i.e. ``self[:].shape``.

dtype

Type of data array after transformation, i.e. ``self[:].dtype``.
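
A minimal sketch of two-stage indexing on a plain NumPy array standing in for an HDF5 dataset:

import numpy as np
from katdal.lazy_indexer import LazyIndexer

data = np.arange(100).reshape(10, 10)
indexer = LazyIndexer(data, keep=np.s_[::2])   # first stage: every second row
out = indexer[:, [0, 5]]                       # second stage: two columns, data loads here
print(out.shape)                               # (5, 2)
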
class katdal.lazy_indexer.DaskLazyIndexer(dataset, keep=(), transforms=())

Bases: object

Turn a dask Array into a LazyIndexer by computing it upon indexing.

The LazyIndexer wraps an underlying dataset in the form of a dask Array. Upon first use, it applies a stored first-stage selection (keep) to the array, followed by a series of transforms. All of these actions are lazy and only update the dask graph of the dataset. Since these updates are computed only on first use, there is minimal cost in constructing an instance and immediately throwing it away again.

Second-stage selection occurs via a __getitem__() call on this object, which also triggers dask computation to return the final numpy.ndarray output. Both selection steps follow outer indexing (“oindex”) semantics, by indexing each dimension / axis separately.

DaskLazyIndexers can also index other DaskLazyIndexers, which allows them to share first-stage selections and/or transforms, and to construct nested or hierarchical indexers.

Parameters:
  • dataset (dask.Array or DaskLazyIndexer) – The full dataset, from which a subset is chosen by keep
  • keep (NumPy index expression, optional) – Index expression describing first-stage selection (e.g. as applied by katdal.DataSet.select()), with oindex semantics
  • transforms (sequence of function, signature array = f(array), optional) – Transformations that are applied after indexing by keep but before indexing on this object. Each transformation is a callable that takes a dask array and returns another dask array.
name

The name of the (full) underlying dataset, useful for reporting

Type:str
dataset

The dask array that is accessed by indexing (after applying keep and transforms). It can be used directly to perform dask computations.

Type:dask.Array
transforms

Transformations that are applied after first-stage indexing.

dataset

Array after first-stage indexing and transformation.

classmethod get(arrays, keep, out=None)

Extract several arrays from the underlying dataset.

This is a variant of __getitem__() that pulls from several arrays jointly. This can be significantly more efficient if intermediate dask nodes can be shared.

Parameters:
  • arrays (list of DaskLazyIndexer) – Arrays to index
  • keep (NumPy index expression) – Second-stage index as a valid index or slice specification (supports arbitrary slicing or advanced indexing on any dimension)
  • out (list of np.ndarray) – If specified, output arrays in which to store results. It must be the same length as arrays and each array must have the appropriate shape and dtype.
Returns:

out – Extracted output array (computed from the final dask version)

Return type:

sequence of numpy.ndarray

shape

Shape of array after first-stage indexing and transformation.

dtype

Data type of array after first-stage indexing and transformation.
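
A minimal sketch, using a synthetic dask array in place of real visibility data:

import numpy as np
import dask.array as da
from katdal.lazy_indexer import DaskLazyIndexer

x = da.ones((100, 4096, 40), chunks=(10, 1024, 40), dtype=np.complex64)
indexer = DaskLazyIndexer(x, keep=np.s_[:50])   # lazy first-stage selection
chunk = indexer[0:10]                           # second-stage indexing triggers computation
print(chunk.shape, type(chunk))                 # (10, 4096, 40), numpy.ndarray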

katdal.ms_async module

katdal.ms_extra module

katdal.sensordata module

Container that stores cached (interpolated) and uncached (raw) sensor data.

class katdal.sensordata.SensorData(name, timestamp, value, status=None)

Bases: object

Raw (uninterpolated) sensor values.

This is a simple struct that holds timestamps, values, and optionally status.

Parameters:
  • name (string) – Sensor name
  • timestamp (np.ndarray) – Array of timestamps
  • value (np.ndarray) – Array of values (wrapped in ComparableArrayWrapper if necessary)
  • status (np.ndarray, optional) – Array of sensor statuses
class katdal.sensordata.SensorGetter(name)

Bases: object

Raw (uninterpolated) sensor data placeholder.

This is an abstract lazy interface that provides a SensorData object on request but does not store values itself. Subclasses must implement get() to retrieve values from underlying storage. They should not cache the results.

Where possible, object-valued sensors (including sensors with ndarrays as values) will have values wrapped by ComparableArrayWrapper.

Parameters:name (string) – Sensor name
get()

Retrieve the values from underlying storage.

Returns:values – Underlying data
Return type:SensorData
class katdal.sensordata.SimpleSensorGetter(name, timestamp, value, status=None)

Bases: katdal.sensordata.SensorGetter

Raw sensor data held in memory.

This is a simple wrapper for SensorData that implements the SensorGetter interface.

get()

Retrieve the values from underlying storage.

Returns:values – Underlying data
Return type:SensorData
class katdal.sensordata.RecordSensorGetter(data, name=None)

Bases: katdal.sensordata.SensorGetter

Raw (uninterpolated) sensor data in record array form.

This is a wrapper for uninterpolated sensor data which resembles a record array with fields ‘timestamp’, ‘value’ and optionally ‘status’. This is also the typical format of HDF5 datasets used to store sensor data.

Technically, the data is interpreted as a NumPy “structured” array, which is a simpler version of a recarray that only provides item-style access to fields and not attribute-style access.

Object-valued sensors are not treated specially in this class, as it is assumed that any wrapping already occurred in the construction of the recarray-like data input and will be reflected in its dtype. The original HDF5 sensor datasets also did not contain any objects as they only support standard KATCP types, so there was no need for wrapping there.

Parameters:
  • data (recarray-like, with fields 'timestamp', 'value' and optionally 'status') – Uninterpolated sensor data as structured array or equivalent (such as an h5py.Dataset)
  • name (string or None, optional) – Sensor name (assumed to be data.name by default, if it exists)
get()

Extract timestamp, value and status of each sensor data point.

Values are passed through to_str().

katdal.sensordata.to_str(value)

Convert string-likes to the native string type.

Bytes are decoded to str, with surrogateencoding error handler.

Tuples, lists, dicts and numpy arrays are processed recursively, with the exception that numpy structured types with string or object fields won’t be handled.

katdal.sensordata.telstate_decode(raw, no_decode=())

Load a katsdptelstate-encoded value that might be wrapped in np.void or np.ndarray.

The np.void/np.ndarray wrapping is needed to pass variable-length binary strings through h5py.

If the value is a string and is in no_decode, it is returned verbatim. This is for backwards compatibility with older files that didn’t use any encoding at all.

The return value is also passed through to_str().

class katdal.sensordata.H5TelstateSensorGetter(data, name=None)

Bases: katdal.sensordata.RecordSensorGetter

Raw (uninterpolated) sensor data in HDF5 TelescopeState recarray form.

This wraps the telstate sensors stored in recent HDF5 files. It differs in two ways from the normal HDF5 sensors: no ‘status’ field and values encoded by katsdptelstate.

TODO: This is a temporary fix to get at missing sensors in telstate and should be replaced by a proper wrapping of any telstate object.

Object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by ComparableArrayWrapper.

Parameters:
  • data (recarray-like, with fields ('timestamp', 'value')) – Uninterpolated sensor data as structured array or equivalent (such as an h5py.Dataset)
  • name (string or None, optional) – Sensor name (assumed to be data.name by default, if it exists)
get()

Extract timestamp and value of each sensor data point.

class katdal.sensordata.TelstateToStr(telstate)

Bases: object

Wrap an existing telescope state and pass return values through to_str()

wrapped
view(name, add_separator=True, exclusive=False)
root()
get_message(channel=None)
get(key, default=None, return_encoded=False)
get_range(key, st=None, et=None, include_previous=None, include_end=False, return_encoded=False)
get_indexed(key, sub_key, default=None, return_encoded=False)
class katdal.sensordata.TelstateSensorGetter(telstate, name)

Bases: katdal.sensordata.SensorGetter

Raw (uninterpolated) sensor data stored in original TelescopeState.

This wraps sensor data stored in a TelescopeState object. The data is only read out on item access.

Object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by ComparableArrayWrapper.

Parameters:
  • telstate (katsdptelstate.TelescopeState object) – Telescope state object
  • name (string) – Sensor name, also used as telstate key
Raises:

KeyError – If sensor name is not found in telstate or it is an attribute instead

get()

Retrieve the values from underlying storage.

Returns:values – Underlying data
Return type:SensorData
katdal.sensordata.get_sensor_from_katstore(store, name, start_time, end_time)

Get raw sensor data from katstore (CAM’s central sensor database).

Parameters:
  • store (string) – Hostname / endpoint of katstore webserver speaking katstore64 API
  • name (string) – Sensor name (the normalised / escaped version with underscores)
  • start_time, end_time (float) – Time range for sensor records as UTC seconds since Unix epoch
Returns:

data – Retrieved sensor data with ‘timestamp’, ‘value’ and ‘status’ fields

Return type:

RecordSensorGetter object

Raises:
  • ConnectionError – If the client cannot connect to the katstore server
  • RuntimeError – If connection succeeded but interaction with katstore64 API failed
  • KeyError – If the sensor was not found in the store or it has no data in time range
katdal.sensordata.dummy_sensor_getter(name, value=None, dtype=<class 'numpy.float64'>, timestamp=0.0)

Create a SensorGetter object with a single default value based on type.

This creates a dummy SimpleSensorGetter object based on a default value or a type, for use when no sensor data are available, but filler data is required (e.g. when concatenating sensors from different datasets and one dataset lacks the sensor). The dummy dataset contains a single data point with the filler value and a configurable timestamp (defaulting to way back). If the filler value is an object it will be wrapped in a ComparableArrayWrapper to match the behaviour of other SensorGetter objects.

Parameters:
  • name (string) – Sensor name
  • value (object, optional) – Filler value (default is None, meaning dtype will be used instead)
  • dtype (numpy.dtype object or equivalent, optional) – Desired sensor data type, used if no explicit value is given
  • timestamp (float, optional) – Time when dummy value occurred (default is way back)
Returns:

data – Dummy sensor data object with ‘timestamp’ and ‘value’ fields

Return type:

SimpleSensorGetter object, shape (1,)
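A short sketch of creating filler sensors (the sensor names and values are illustrative):

from katdal.sensordata import dummy_sensor_getter

# Numerical filler: a single data point with a default value derived from the dtype
dummy_temp = dummy_sensor_getter('Enviro/air_temperature')

# Explicit filler value for a non-numerical sensor (wrapped if it is an object)
dummy_label = dummy_sensor_getter('Observation/label', value='')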

katdal.sensordata.remove_duplicates_and_invalid_values(sensor)

Remove duplicate timestamps and invalid values from sensor data.

This sorts the ‘timestamp’ field of the sensor record array and removes any duplicate timestamps, updating the corresponding ‘value’ and ‘status’ fields as well. If multiple data points share the same timestamp, the value and status of the last of these points are selected. If the values differ for the same timestamp, a warning is logged (and the last one is still picked).

In addition, if there is a ‘status’ field, get rid of data with a status other than ‘nominal’, ‘warn’ or ‘error’, which indicates that the sensor could not be read and the corresponding value will therefore be invalid. Afterwards, remove the ‘status’ field from the data as this is the only place it plays a role.

Parameters:sensor (SensorData object, length N) – Raw sensor dataset.
Returns:clean_sensor – Sensor data with duplicate timestamps and invalid values removed (M <= N), and only ‘timestamp’ and ‘value’ attributes left.
Return type:SensorData object, length M
class katdal.sensordata.SensorCache(cache, timestamps, dump_period, keep=slice(None, None, None), props=None, virtual={}, aliases={}, store=None)

Bases: collections.abc.MutableMapping

Container for sensor data providing name lookup, interpolation and caching.

Sensor data is defined as a one-dimensional time series of values. The values may be numerical or non-numerical (categorical), and the timestamps are monotonically increasing but not necessarily regularly spaced.

A sensor cache stores sensor data with dictionary-like lookup based on the sensor name. Since the extraction of sensor data from e.g. HDF5 files may be costly, the data is first represented in uncached (raw) form as SensorGetter objects, which typically wrap the underlying sensor HDF5 datasets. After extraction, the sensor data are stored either as a NumPy array (for numerical data) or as a CategoricalData object (for non-numerical data).

The sensor cache stores a timestamp array (or indexer) onto which the sensor data will be interpolated, together with a boolean selection mask that selects a subset of the interpolated data as the final output. Interpolation is linear for numerical data and zeroth-order for non-numerical data. Both extraction and selection may be enabled or disabled through the appropriate use of the two main interfaces that retrieve sensor data:

  • The __getitem__ interface (i.e. cache[sensor]) presents a simple high-level interface to the end user that always extracts the sensor data and selects the requested subset from it. In addition, the return type is always a NumPy array.
  • The get() interface (i.e. cache.get(sensor)) is an advanced interface for library builders that provides full control of the extraction process via sensor properties. It does not apply selection by default, as this is more convenient for library routines.
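
As a rough sketch of the two interfaces (the file and sensor names are hypothetical; the cache is typically reached as the sensor attribute of an open data set):

import katdal

d = katdal.open('1234567890_sdp_l0.full.rdb')
cache = d.sensor

# High-level: always extracts, interpolates and applies the current selection
azim = cache['Antennas/m000/pos_actual_scan_azim']    # NumPy array, one value per dump

# Advanced: no selection by default; categorical data stays a CategoricalData object
activity = cache.get('Observation/scan_state')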

In addition, the sensor cache may contain virtual sensors which calculate their values based on the values of other sensors. They are identified by pattern templates that potentially match multiple sensor names.

Parameters:
  • cache (mapping from string to SensorGetter objects) – Initial sensor cache mapping sensor names to raw (uncached) sensor data
  • timestamps (array of float) – Correlator data timestamps onto which sensor values will be interpolated, as UTC seconds since Unix epoch
  • dump_period (float) – Dump period, in seconds
  • keep (int or slice or sequence of int or sequence of bool, optional) – Default time selection specification that will be applied to sensor data (this can be disabled on data retrieval)
  • props (dict, optional) – Default properties that govern how sensor data are interpreted and interpolated (this can be overridden on data retrieval). Can use * as a wildcard anywhere in the key.
  • virtual (dict mapping string to function, optional) – Virtual sensors, specified as a pattern matching the virtual sensor name and a corresponding function that will create the sensor (together with any associated virtual sensors)
  • aliases (dict mapping string to string, optional) – Alternate names for sensors, as a dictionary mapping each alias to the original sensor name suffix. This will create additional sensors with the aliased names and the data of the original sensors.
  • store (string, optional) – Hostname / endpoint of katstore webserver to access additional sensors
add_aliases(alias, original)

Add alternate names / aliases for sensors.

Search for sensors with names ending in the original suffix and form a corresponding alternate name by replacing original with alias. The new aliased sensors will re-use the data of the original sensors.

Parameters:
  • alias (string) – The new sensor name suffix that replaces original
  • original (string) – Sensors with names that end in this will get aliases
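For instance (assuming cache is a SensorCache, e.g. d.sensor, and that sensors ending in the given suffix exist):

import katdal

cache = katdal.open('1234567890_sdp_l0.full.rdb').sensor   # hypothetical file
cache.add_aliases('az', 'pos_actual_scan_azim')
azim = cache['Antennas/m000/az']   # same data as 'Antennas/m000/pos_actual_scan_azim'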
get(name, select=False, extract=True, **kwargs)

Sensor values interpolated to correlator data timestamps.

Time selection is disabled by default, as this is a more advanced data extraction method typically called by library routines that want to operate on the full array of sensor values. For additional allowed parameters when extracting categorical data, see the docstring for sensor_to_categorical().

Parameters:
  • name (string) – Sensor name
  • select ({False, True}, optional) – True if preset time selection will be applied to interpolated data
  • extract ({True, False}, optional) – True if sensor data should be extracted, interpolated and cached
  • categorical ({None, True, False}, optional) – Interpret sensor data as categorical or numerical (by default, data of type float is numerical and of any other type is categorical)
  • kwargs (dict, optional) – Additional parameters are passed to sensor_to_categorical()
Returns:

data – If extraction is disabled, this will be a SensorGetter object for uncached sensors. If selection is enabled, this will be a 1-D array of values, one per selected timestamp. If selection is disabled, this will be a 1-D array of values (of the same length as the timestamps attribute) for numerical data, and a CategoricalData object for categorical data.

Return type:

array or CategoricalData or SensorGetter object

Raises:
  • ValueError – If select=True and extract=False, as select requires interpolation
  • KeyError – If sensor name was not found in cache and did not match virtual template
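A hedged sketch of the advanced interface (file and sensor names are illustrative):

import katdal

cache = katdal.open('1234567890_sdp_l0.full.rdb').sensor   # hypothetical file

# Uncached form: a SensorGetter wrapping the underlying storage, nothing extracted yet
raw = cache.get('Antennas/m000/pos_actual_scan_azim', extract=False)

# Extracted and interpolated onto the correlator timestamps, no selection applied
azim_full = cache.get('Antennas/m000/pos_actual_scan_azim')

# Force categorical interpretation and apply the current time selection
label = cache.get('Observation/label', categorical=True, select=True)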
get_with_fallback(sensor_type, names)

Sensor values interpolated to correlator data timestamps.

Get data for a type of sensor that may have one of several names. Try each name in turn until something works, or crash sensibly.

Parameters:
  • sensor_type (string) – Name of sensor class / type, used for informational purposes only
  • names (sequence of strings) – Sensor names to try until one of them provides data
Returns:

sensor_data – Interpolated sensor data as 1-D array, one value per selected timestamp

Return type:

array

Raises:

KeyError – If none of the sensor names were found in the cache
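For example, to retrieve a quantity that may be published under several names (the names below are illustrative):

import katdal

cache = katdal.open('1234567890_sdp_l0.full.rdb').sensor   # hypothetical file
temperature = cache.get_with_fallback(
    'air temperature',
    ['Enviro/air_temperature', 'Enviro/asc.air.temperature'])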

katdal.spectral_window module

class katdal.spectral_window.SpectralWindow(centre_freq, channel_width, num_chans, product=None, sideband=-1, band='L', bandwidth=None)

Bases: object

Spectral window specification.

A spectral window is determined by the number of frequency channels produced by the correlator and their corresponding centre frequencies, as well as the channel width. The channels are assumed to be regularly spaced and to be the result of either lower-sideband downconversion (channel frequencies decreasing with channel index) or upper-sideband downconversion (frequencies increasing with index). For further information the receiver band and correlator product names are also available.

Warning

Instances should be treated as immutable. Changing the attributes will lead to inconsistencies between them.

Parameters:
  • centre_freq (float) – Centre frequency of spectral window, in Hz
  • channel_width (float) – Bandwidth of each frequency channel, in Hz
  • num_chans (int) – Number of frequency channels
  • product (string, optional) – Name of data product / correlator mode
  • sideband ({-1, +1}, optional) – Type of downconversion (-1 => lower sideband, +1 => upper sideband)
  • band ({'L', 'UHF', 'S', 'X', 'Ku'}, optional) – Name of receiver / band
  • bandwidth (float, optional) – The bandwidth of the whole spectral window, in Hz. If specified, channel_width is ignored and computed from the bandwidth. If not specified, bandwidth is computed from the channel width. Specifying this is a good idea if the channel width cannot be exactly represented in floating point.
channel_freqs

Centre frequency of each frequency channel (assuming LSB mixing), in Hz

Type:array of float, shape (F,)
subrange(first, last)

Get a new SpectralWindow representing a subset of the channels.

The returned SpectralWindow covers the same frequencies as channels [first, last) of the original.

Raises:IndexError – If [first, last) is not a (non-empty) subinterval of the channels
rechannelise(num_chans)

Get a new SpectralWindow with a different number of channels.

The returned SpectralWindow covers the same frequencies as the original, but divides the bandwidth into a different number of channels.
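
A minimal sketch of constructing and manipulating a spectral window (the numbers are illustrative, loosely based on an L-band-like setup):

from katdal.spectral_window import SpectralWindow

spw = SpectralWindow(centre_freq=1284e6, channel_width=856e6 / 4096,
                     num_chans=4096, sideband=1, band='L', bandwidth=856e6)
print(spw.channel_freqs[:3])        # centre frequencies of the first three channels

lower_half = spw.subrange(0, 2048)  # a new window covering channels [0, 2048)
coarse = spw.rechannelise(1024)     # same total bandwidth, 1024 channels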

katdal.visdatav4 module

Data accessor class for data and metadata from various sources in v4 format.

class katdal.visdatav4.VisibilityDataV4(source, ref_ant='', time_offset=0.0, applycal='', gaincal_flux={}, sensor_store=None, preselect=None, **kwargs)

Bases: katdal.dataset.DataSet

Access format version 4 visibility data and metadata.

For more information on attributes, see the DataSet docstring.

Parameters:
  • source (DataSource object) – Correlator data (visibilities, flags and weights) and metadata
  • ref_ant (string, optional) – Name of reference antenna, used to partition the data set into scans, to determine the targets, and as the antenna for the data set catalogue (this has no relation to the calibration reference antenna…). By default the observation activity sensor is used for scan partitioning, the CBF target is used for the targets and the array reference position serves as the catalogue antenna.
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • applycal (string or sequence of strings, optional) – List of names of calibration products to apply to vis/weights/flags, as a sequence or string of comma-separated names. An empty string or sequence means no calibration will be applied (the default for now), while the keyword ‘all’ means all available products will be applied. NB In future the default will probably change to ‘all’. NB This is still very much an experimental feature…
  • gaincal_flux (dict mapping string to float, optional) – Flux density (in Jy) per gaincal target name, used to flux calibrate the “G” product, overriding the measured flux produced by cal pipeline (if available). A value of None disables flux calibration.
  • sensor_store (string, optional) – Hostname / endpoint of katstore webserver to access additional sensors
  • preselect (dict, optional) – Subset of the data to select. See TelstateDataSource for details. This selection is permanent, and further selections made by DataSet.select() are relative to this subset.
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
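In practice this class is rarely constructed directly; katdal.open() forwards its extra keyword arguments here. A hedged example (the file name is hypothetical):

import katdal

d = katdal.open('1234567890_sdp_l0.full.rdb', applycal='all')   # apply all available cal products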
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.
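
A short sketch of the lazy indexer in action (the file name is hypothetical):

import katdal

d = katdal.open('1234567890_sdp_l0.full.rdb')
print(d.vis.shape)        # (T, F, B) -- no visibilities loaded yet
first_dump = d.vis[0]     # loads one dump; result still has shape (1, F, B)
all_vis = d.vis[:]        # loads the entire selected array (may be very large)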

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

raw_flags

Raw flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of uint8, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

excision

Excision as a function of time, frequency and baseline.

The fraction of each visibility that has been excised in the SDP ingest pipeline is returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

Module contents

Data access library for data sets in the MeerKAT Visibility Format (MVF).

katdal.open(filename, ref_ant='', time_offset=0.0, **kwargs)

Open data file(s) with loader of the appropriate version.

Parameters:
  • filename (string or sequence of strings) – Data file name or list of file names
  • ref_ant (string, optional) – Name of reference antenna (default is first antenna in use)
  • time_offset (float, optional) – Offset to add to all timestamps, in seconds
  • kwargs (dict, optional) –

    Extra keyword arguments are passed on to underlying accessor class:

    mode (string, optional)
    [H5DataV*] File opening mode (e.g. ‘r+’ to open file in write mode)
    quicklook (bool)
    [H5DataV2] True if synthesised timestamps should be used to partition the data set even if real timestamps are irregular, thereby avoiding the slow loading of real timestamps at the cost of slightly inaccurate label borders

    See the documentation of VisibilityDataV4 for the keywords it accepts.

Returns:

data – Object providing DataSet interface to file(s)

Return type:

DataSet object

katdal.get_ants(filename)

Quick look function to get the list of antennas in a data file.

Parameters:filename (string) – Data file name
Returns:antennas
Return type:list of katpoint.Antenna objects
katdal.get_targets(filename)

Quick look function to get the list of targets in a data file.

Parameters:filename (string) – Data file name
Returns:targets – All targets in file
Return type:katpoint.Catalogue object
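A quick-look sketch using both helpers (the file name is hypothetical):

import katdal

ants = katdal.get_ants('1234567890_sdp_l0.full.rdb')
print([ant.name for ant in ants])     # names of the katpoint.Antenna objects

targets = katdal.get_targets('1234567890_sdp_l0.full.rdb')
print(targets)                        # katpoint.Catalogue of all targets in the file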

Indices and tables