katdal package

Submodules

katdal.applycal module

Utilities for applying calibration solutions to visibilities and weights.

katdal.applycal.complex_interp(x, xi, yi, left=None, right=None)

Piecewise linear interpolation of magnitude and phase of complex values.

Given discrete data points (xi, yi), this returns a 1-D piecewise linear interpolation y evaluated at the x coordinates, similar to numpy.interp(x, xi, yi). While numpy.interp() interpolates the real and imaginary parts of yi separately, this function interpolates magnitude and (unwrapped) phase separately instead. This is useful when the phase of yi changes more rapidly than its magnitude, as in electronic gains.

Parameters:
  • x (1-D sequence of float, length M) – The x-coordinates at which to evaluate the interpolated values
  • xi (1-D sequence of float, length N) – The x-coordinates of the data points, must be sorted in ascending order
  • yi (1-D sequence of complex, length N) – The y-coordinates of the data points, same length as xi
  • left (complex, optional) – Value to return for x < xi[0], default is yi[0]
  • right (complex, optional) – Value to return for x > xi[-1], default is yi[-1]
Returns:

y – The evaluated y-coordinates, same length as x and same dtype as yi

Return type:

array of complex, length M
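
For example, interpolating halfway between two unit-magnitude gains with different phases preserves the magnitude, unlike numpy.interp() (an illustrative sketch with made-up gain values):

    import numpy as np
    from katdal.applycal import complex_interp

    xi = np.array([0.0, 1.0])
    yi = np.exp(1j * np.array([0.0, 3.0]))  # unit magnitude, phases 0 and 3 rad

    x = np.array([0.5])
    np.interp(x, xi, yi)       # interpolates real/imag parts: |y| shrinks to ~0.07
    complex_interp(x, xi, yi)  # interpolates magnitude/phase: |y| stays 1,
                               # phase is 1.5 rad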

katdal.applycal.get_cal_product(cache, cal_stream, product_type)

Extract calibration solution from cache as a sensor.

Parameters:
  • cache (SensorCache object) – Sensor cache serving cal product sensors
  • cal_stream (string) – Name of calibration stream (e.g. “l1”)
  • product_type (string) – Calibration product type (e.g. “G”)
katdal.applycal.calc_delay_correction(sensor, index, data_freqs)

Calculate correction sensor from delay calibration solution sensor.

Given the delay calibration solution sensor, this extracts the delay time series of the input specified by index (in the form (pol, ant)) and builds a categorical sensor for the corresponding complex correction terms (channelised by data_freqs).

Invalid delays (NaNs) are replaced by zeros, since bandpass calibration still has a shot at fixing any residual delay.

katdal.applycal.calc_bandpass_correction(sensor, index, data_freqs, cal_freqs)

Calculate correction sensor from bandpass calibration solution sensor.

Given the bandpass calibration solution sensor, this extracts the time series of bandpasses (channelised by cal_freqs) for the input specified by index (in the form (pol, ant)) and builds a categorical sensor for the corresponding complex correction terms (channelised by data_freqs).

Invalid solutions (NaNs) are replaced by linear interpolations over frequency (separately for magnitude and phase), as long as some channels have valid solutions.

katdal.applycal.calc_gain_correction(sensor, index, targets=None)

Calculate correction sensor from gain calibration solution sensor.

Given the gain calibration solution sensor, this extracts the time series of gains for the input specified by index (in the form (pol, ant)) and interpolates them over time to get the corresponding complex correction terms. The optional targets parameter is a CategoricalData object, i.e. a sensor indicating the target associated with each dump. The targets can be actual katpoint.Target objects or indices, as long as they uniquely identify the target. If provided, solutions derived from a single target are interpolated only at dumps associated with that target, which is what you want for self-calibration solutions (but not for standard calibration based on gain calibrator sources).

Invalid solutions (NaNs) are replaced by linear interpolations over time (separately for magnitude and phase), as long as some dumps have valid solutions on the appropriate target.

katdal.applycal.calibrate_flux(sensor, targets, gaincal_flux)

Apply flux scale to calibrator gains (aka flux calibration).

Given the gain calibration solution sensor, this identifies the target associated with each set of solutions by looking up the gain events in the targets sensor, and then scales the gains by the inverse square root of the relevant flux if a valid match is found in the gaincal_flux dict. This is equivalent to the final step of the AIPS GETJY and CASA fluxscale tasks.
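
The scaling step itself is simple (an illustrative sketch with made-up gains and flux):

    import numpy as np

    gains = np.array([0.9 + 0.1j, 1.1 - 0.2j])  # hypothetical G solutions
    flux = 14.6                                 # calibrator flux density, in Jy
    # Scale by the inverse square root of the flux so that visibilities
    # corrected by these gains come out on an absolute flux scale
    scaled_gains = gains / np.sqrt(flux)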

katdal.applycal.add_applycal_sensors(cache, attrs, data_freqs, cal_stream, cal_substreams=None, gaincal_flux={})

Register virtual sensors for one calibration stream.

This operates on a single calibration stream called cal_stream (possibly an alias), which derives from one or more underlying cal streams listed in cal_substreams and has stream attributes in attrs.

The first set of virtual sensors maps all cal products into a unified namespace (template ‘Calibration/Products/cal_stream/{product_type}’). Receptor inputs are mapped to the relevant indices in each calibration product based on the ants and pols found in attrs. A virtual sensor is then registered per product type and per input in the SensorCache cache, with template ‘Calibration/Corrections/cal_stream/{product_type}/{inp}’. The virtual sensor function picks the appropriate correction calculator based on the cal product type, and also uses auxiliary info like the channel frequencies, data_freqs.

Parameters:
  • cache (SensorCache object) – Sensor cache serving cal product sensors and receiving correction sensors
  • attrs (dict-like) – Calibration stream attributes (e.g. a “cal” telstate view)
  • data_freqs (array of float, shape (F,)) – Centre frequency of each frequency channel of visibilities, in Hz
  • cal_stream (string) – Name of (possibly virtual) calibration stream (e.g. “l1”)
  • cal_substreams (sequence of string, optional) – Names of actual underlying calibration streams (e.g. [“cal”]), defaults to [cal_stream] itself
  • gaincal_flux (dict mapping string to float, optional) – Flux density (in Jy) per gaincal target name, used to flux calibrate the “G” product, overriding the measured flux stored in attrs (if available). A value of None disables flux calibration.
Returns:

cal_freqs – Centre frequency of each frequency channel of calibration stream, in Hz (or None if no sensors were registered)

Return type:

1D array of float, or None

class katdal.applycal.CorrectionParams(inputs, input1_index, input2_index, corrections, channel_maps)

Bases: object

Data needed to compute corrections in calc_correction_per_corrprod().

Once constructed, the data in this class must not be modified, as it will be baked into dask graphs.

Parameters:
  • inputs (list of str) – Names of inputs, in the same order as the input axis of products
  • input1_index, input2_index (array of int) – Indices into inputs of first and second items of correlation product
  • corrections (dict) – A dictionary (indexed by cal product name) of lists (indexed by input) of sequences (indexed by dump) of numpy arrays, with corrections to apply.
  • channel_maps (dict) – A dictionary (indexed by cal product name) of functions (signature g = channel_map(g, channels)) that map the frequency axis of the cal product g onto the frequency axis of the visibility data, where the vis frequency axis will be indexed by the slice channels.
katdal.applycal.calc_correction_per_corrprod(dump, channels, params)

Gain correction per channel per correlation product for a given dump.

This calculates an array of complex gain correction terms of shape (n_chans, n_corrprods) that can be directly applied to visibility data. This incorporates all requested calibration products at the specified dump and channels.

Parameters:
  • dump (int) – Dump index (applicable to full data set, i.e. absolute)
  • channels (slice) – Channel indices (applicable to full data set, i.e. absolute)
  • params (CorrectionParams) – Corrections per input, together with correlation product indices
Returns:

gains – Gain corrections per channel per correlation product

Return type:

array of complex64, shape (n_chans, n_corrprods)

Raises:

KeyError – If input and/or cal product has no associated correction

katdal.applycal.calc_correction(chunks, cache, corrprods, cal_products, data_freqs, all_cal_freqs, skip_missing_products=False)

Create a dask array containing applycal corrections.

Parameters:
  • chunks (tuple of tuple of int) – Chunking scheme of the resulting array, in normalized form (see dask.array.core.normalize_chunks()).
  • cache (SensorCache object) – Sensor cache, used to look up individual correction sensors
  • corrprods (sequence of (string, string)) – Selected correlation products as pairs of correlator input labels
  • cal_products (sequence of string) – Calibration products that will contribute to corrections (e.g. [“l1.G”])
  • data_freqs (array of float, shape (F,)) – Centre frequency of each frequency channel of visibilities, in Hz
  • all_cal_freqs (dict) – Dictionary mapping cal stream name (e.g. “l1”) to array of associated frequencies
  • skip_missing_products (bool) – If True, skip products with missing sensors instead of raising KeyError
Returns:

  • final_cal_products (list of string) – List of calibration products in the order that they will be applied (potentially a subset of cal_products if skipping missing products)
  • corrections (dask.array.Array object, or None) – Dask array that produces corrections for entire vis array, or None if no calibration products were found (either cal_products is empty or all products had some missing sensors and skip_missing_products is True)

Raises:

KeyError – If a correction sensor for a given input and cal product is not found (and skip_missing_products is False)

katdal.applycal.apply_vis_correction

Clean up and apply correction to visibility data in data.

katdal.applycal.apply_weights_correction

Clean up and apply correction to weight data in data.

katdal.applycal.apply_flags_correction

Set POSTPROC flag wherever correction is invalid.

katdal.averager module

katdal.averager.average_visibilities(vis, weight, flag, timestamps, channel_freqs, timeav=10, chanav=8, flagav=False)

Average visibilities, flags and weights.

Visibilities are weight-averaged using the weights in the weight array, with flagged data given zero weight. The averaged weights are the sum of the input weights for each average block. An average flag is retained if all of the data in an averaging block is flagged (the averaged visibility in this case is the unweighted average of the input visibilities). In cases where the averaging size in channel or time does not evenly divide the size of the input data, the remaining channels or timestamps at the end of the array after averaging are discarded. Channels are averaged first, then timestamps. The arrays of timestamps and channel frequencies are averaged accordingly and returned.

Parameters:
  • vis (array(numtimestamps,numchannels,numbaselines) of complex64.) – The input visibilities to be averaged.
  • weight (array(numtimestamps,numchannels,numbaselines) of float32.) – The input weights (used for weighted averaging).
  • flag (array(numtimestamps,numchannels,numbaselines) of boolean.) – Input flags (flagged data have weight zero before averaging).
  • timestamps (array(numtimestamps) of int.) – The timestamps (in mjd seconds) corresponding to the input data.
  • channel_freqs (array(numchannels) of int.) – The frequencies (in Hz) corresponding to the input channels.
  • timeav (int.) – The desired averaging size in timestamps.
  • chanav (int.) – The desired averaging size in channels.
  • flagav (bool) – If True, flag the averaged data when a single sample in the averaging bin is flagged. If False, only flag the averaged data when all of the data in the bin is flagged.
Returns:

  • av_vis (array(int(numtimestamps/timeav),int(numchannels/chanav),numbaselines) of complex64.)
  • av_weight (array(int(numtimestamps/timeav),int(numchannels/chanav),numbaselines) of float32.)
  • av_flag (array(int(numtimestamps/timeav),int(numchannels/chanav),numbaselines) of boolean.)
  • av_mjd (array(int(numtimestamps/timeav)) of int.)
  • av_freq (array(int(numchannels/chanav)) of int.)
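
For example (an illustrative sketch with made-up toy data):

    import numpy as np
    from katdal.averager import average_visibilities

    # 20 dumps, 64 channels, 3 baselines
    vis = np.ones((20, 64, 3), dtype=np.complex64)
    weight = np.ones((20, 64, 3), dtype=np.float32)
    flag = np.zeros((20, 64, 3), dtype=bool)
    timestamps = np.arange(20.0) * 8.0           # dump midtimes, in seconds
    channel_freqs = 1.4e9 + 1e6 * np.arange(64)  # channel centres, in Hz

    av_vis, av_weight, av_flag, av_mjd, av_freq = average_visibilities(
        vis, weight, flag, timestamps, channel_freqs, timeav=10, chanav=8)
    # Data is averaged down to 20 // 10 = 2 dumps and 64 // 8 = 8 channels,
    # with the baseline axis unchanged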

katdal.categorical module

Container for categorical (i.e. non-numerical) sensor data and related tools.

class katdal.categorical.ComparableArrayWrapper(value)

Bases: object

Wrapper that improves comparison of array objects.

This wrapper class has two main benefits:

  • It prevents sensor values that are NumPy ndarrays themselves (or array-like objects such as tuples and lists) from dissolving and losing their identity when they are assembled into an array.
  • It ensures that array-valued sensor values become properly comparable (avoiding array-valued booleans resulting from standard comparisons).

The former is needed because SensorGetter is treated as a structured array even if it contains object values. The latter is needed because the equality operator crops up in hard-to-reach places like inside list.index().

Parameters:value (object) – The sensor value to be wrapped
static unwrap(v)

Unwrap value if needed.

katdal.categorical.infer_dtype(values)

Figure out dtype of sequence of sensor values.

The common dtype is determined by explicit NumPy promotion. If the values are array-like themselves, treat them as opaque objects to simplify sensor processing. If the sequence is empty, the dtype is unknown and set to None. In addition, short-circuit to an actual dtype for objects with this attribute to simplify calling this on a mixed collection of sensor data.

Parameters:values (sequence, or object with dtype) – Sequence of sensor values (typically a list), or a sensor data object with a dtype attribute (like ndarray or SensorGetter)
Returns:dtype – Inferred dtype, or None if values is an empty sequence
Return type:numpy.dtype object or None

Notes

This is almost, but not quite, entirely like numpy.result_type(). The differences are that this accepts generic objects in the sequence, treats ndarrays as objects regardless of their underlying dtype, supports a dtype of None and short-circuits the check if the sequence itself is an object with a dtype. And this accepts the sequence as the first parameter as opposed to being unpacked across the argument list.

katdal.categorical.unique_in_order(elements, return_inverse=False)

Extract unique elements from elements while preserving original order.

Parameters:
  • elements (sequence) – Sequence of equality-comparable objects
  • return_inverse ({False, True}, optional) – If True, also return sequence of indices that can be used to reconstruct original elements sequence via [unique_elements[i] for i in inverse]
Returns:

  • unique_elements (list) – List of unique objects, in original order they were found in elements
  • inverse (array of int, optional) – If return_inverse is True, sequence of indices that can be used to reconstruct original sequence
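
For example:

    from katdal.categorical import unique_in_order

    elements = ['track', 'slew', 'track', 'scan', 'slew']
    unique_elements, inverse = unique_in_order(elements, return_inverse=True)
    # unique_elements == ['track', 'slew', 'scan']
    # [unique_elements[i] for i in inverse] reconstructs the original sequence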

class katdal.categorical.CategoricalData(sensor_values, events)

Bases: object

Container for categorical (i.e. non-numerical) sensor data.

This container allows simple manipulation and interpolation of a time series of non-numerical data represented as discrete events. The data is stored as a list of sensor values and two integer arrays:

  • unique_values stores one copy of each unique object in the data series
  • events stores the time indices (dumps) where each event occurs
  • indices stores indices linking each event to the unique_values list

The __getitem__ interface (i.e. data[dump]) returns the data associated with the last event before the requested dump(s), in effect doing a zeroth-order interpolation of the data at each event. Events can be added and removed and realigned, and the container can be split along the time axis, amongst other functionality.

Parameters:
  • sensor_values (sequence, length N) – Sequence of sensor values (of any type, preferably not None [see Notes])
  • events (sequence of non-negative ints, length N + 1) – Corresponding monotonic sequence of dump indices where each sensor value came into effect. The last event is one past the last dump where the final sensor value applied, and therefore equal to the total number of dumps for which sensor values were specified.
unique_values

List of unique sensor values in order they were found in sensor_values with any ComparableArrayWrapper objects unwrapped

Type:list, length M
indices

Array of indices into unique_values, one per sensor event

Type:array of int, shape (N,)
dtype

Sensor data type as NumPy dtype (found on demand from unique_values)

Type:numpy.dtype object

Notes

Any object values wrapped in a ComparableArrayWrapper will be unwrapped before adding it to unique_values. When adding, removing and comparing values to this container, any object values will be wrapped again temporarily to ensure proper comparisons.

It is discouraged to have a sensor value of None as this value is given a special meaning in methods such as CategoricalData.add() and sensor_to_categorical(). On the other hand, it is the most sensible dummy object value and any Nones entering through this initialiser will probably not cause any issues.

It is better to make unique_values a list instead of an array because an array assimilates objects such as tuples, lists and other arrays. The alternative is an array of ComparableArrayWrapper objects but these then need to be unpacked at some later stage which is also tricky.
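
As a brief illustration (made-up sensor values):

    from katdal.categorical import CategoricalData

    # 'slew' applies to dumps 0-1, 'track' to dumps 2-6, 'slew' to dumps 7-9
    data = CategoricalData(['slew', 'track', 'slew'], events=[0, 2, 7, 10])
    data[4]                     # 'track', via zeroth-order interpolation
    for segment, value in data.segments():
        print(segment, value)   # slice(0, 2, None) slew, etc.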

dtype

Sensor value type.

segments()

Generator that iterates through events and returns segment and value.

Yields:
  • segment (slice object) – The slice representing range of dump indices of the current segment
  • value (object) – Sensor value associated with segment
add(event, value=None)

Add or override sensor event.

This adds a new event to the container, with a new value or a duplicate of the existing value at that dump. If the new event coincides with an existing one, it overrides the value at that dump.

Parameters:
  • event (int) – Dump of event to add or override
  • value (object, optional) – New value for event (duplicate current value at this dump by default)
remove(value)

Remove sensor value, remapping indices and merging segments in the process.

If the sensor value does not exist, do nothing.

Parameters:value (object) – Sensor value to remove from container
add_unmatched(segments, match_dist=1)

Add duplicate events for segment starts that don’t match sensor events.

Given a sequence of segments, this matches each segment start to the nearest sensor event dump (within match_dist). Any unmatched segment starts are added as duplicate sensor events (or ignored if they fall outside the sensor event range).

Parameters:
  • segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
  • match_dist (int, optional) – Maximum distance in dumps that signifies a match between events
align(segments)

Align sensor events with segment starts, possibly discarding events.

Given a sequence of segments, this moves each sensor event dump onto the nearest segment start. If more than one event ends up in the same segment, only keep the last event, discarding the rest.

The end result is that the sensor event dumps become a subset of the segment starts and there cannot be more sensor events than segments.

Parameters:segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
partition(segments)

Partition dataset into multiple sets along time axis.

Given a sequence of segments, split the container into a sequence of containers, one per segment. Each container contains only the events occurring within its corresponding segment, with event dumps relative to the start of the segment, and the containers share the same unique values.

Parameters:segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
Returns:split_data – Resulting multiple datasets in chronological order
Return type:sequence of CategoricalData objects
remove_repeats()

Remove repeated events of the same value.

katdal.categorical.concatenate_categorical(split_data, **kwargs)

Concatenate multiple categorical datasets into one along time axis.

Join a sequence of categorical datasets together, by forming a common set of unique values, remapping events to these and incrementing the event dumps of each dataset to start off where the previous dataset ended.

Parameters:split_data (sequence of CategoricalData objects) – Sequence of containers to concatenate
Returns:data – Concatenated dataset
Return type:CategoricalData object
katdal.categorical.sensor_to_categorical(sensor_timestamps, sensor_values, dump_midtimes, dump_period, transform=None, initial_value=None, greedy_values=None, allow_repeats=False, **kwargs)

Align categorical sensor events with dumps and clean up spurious events.

This converts timestamped sensor data into a categorical dataset by comparing the sensor timestamps to a series of dump timestamps and assigning each sensor event to the dump in which it occurred. When multiple sensor events happen in the same dump, only the last one is kept. The first dump is guaranteed to have a valid value by either using the supplied initial_value or extrapolating the first proper value back in time. The sensor data may be transformed before events that repeat values are potentially discarded. Finally, events with values marked as “greedy” take precedence over normal events when both occur within the same dump (either changing from or to the greedy value, or if the greedy value occurs completely within a dump).

XXX Future improvements include picking the event with the longest duration within a dump as opposed to the final event, and “snapping” event boundaries to dump boundaries with a given tolerance (e.g. 5-10% of dump period).

Parameters:
  • sensor_timestamps (sequence of float, length M) – Sequence of sensor timestamps (typically UTC seconds since Unix epoch)
  • sensor_values (sequence, length M) – Corresponding sequence of sensor values [potentially wrapped]
  • dump_midtimes (sequence of float, length N) – Sequence of dump midtimes (same reference as sensor timestamps)
  • dump_period (float) – Duration of each dump, in seconds
  • transform (callable or None, optional) – Transform [unwrapped] sensor values before fixing initial value, mapping dumps to events and discarding repeats
  • initial_value (object or None, optional) – Sensor value [transformed, unwrapped] to use for dump = 0 up to first proper event (force first proper event to start at dump = 0 by default)
  • greedy_values (sequence or None, optional) – List of [transformed, unwrapped] sensor values considered “greedy”
  • allow_repeats ({False, True}, optional) – If False, discard sensor events that do not change [transformed] value
Returns:

data – Constructed categorical dataset [unwraps any wrapped values]

Return type:

CategoricalData object
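
For example (an illustrative sketch with made-up sensor data):

    import numpy as np
    from katdal.categorical import sensor_to_categorical

    sensor_timestamps = [2.2, 6.7]          # e.g. UTC seconds since Unix epoch
    sensor_values = ['slew', 'track']
    dump_midtimes = np.arange(10.0) + 0.5   # ten 1-second dumps
    data = sensor_to_categorical(sensor_timestamps, sensor_values,
                                 dump_midtimes, dump_period=1.0)
    # data is a CategoricalData object: with no initial_value, the first
    # proper value ('slew') is extrapolated back to dump 0, while 'track'
    # starts at the dump containing its event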

katdal.chunkstore module

Base class for accessing a store of chunks (i.e. N-dimensional arrays).

exception katdal.chunkstore.ChunkStoreError

Bases: Exception

Base class for all standard ChunkStore errors.

exception katdal.chunkstore.StoreUnavailable

Bases: OSError, katdal.chunkstore.ChunkStoreError

Could not access underlying storage medium (offline, auth failed, etc).

exception katdal.chunkstore.ChunkNotFound

Bases: KeyError, katdal.chunkstore.ChunkStoreError

The store was accessible but a chunk with the given name was not found.

exception katdal.chunkstore.BadChunk

Bases: ValueError, katdal.chunkstore.ChunkStoreError

The chunk is malformed, e.g. bad dtype, actual shape differs from requested.

class katdal.chunkstore.PlaceholderChunk(shape, dtype)

Bases: object

Chunk returned to indicate missing data.

katdal.chunkstore.generate_chunks(shape, dtype, max_chunk_size, dims_to_split=None, power_of_two=False, max_dim_elements=None)

Generate dask chunk specification from ndarray parameters.

Parameters:
  • shape (sequence of int) – Array shape
  • dtype (numpy.dtype object or equivalent) – Array data type
  • max_chunk_size (float or int) – Upper limit on chunk size (if allowed by dims_to_split), in bytes
  • dims_to_split (sequence of int, optional) – Indices of dimensions that may be split into chunks (default all dims)
  • power_of_two (bool, optional) – True if chunk size should be rounded down to a power of two (the last chunk size along each dimension will potentially be smaller)
  • max_dim_elements (dict, optional) – Maximum number of elements on each dimension (each key is a dimension index). Dimensions that are not in dims_to_split are ignored.
Returns:

chunks – Dask chunk specification, indicating chunk sizes along each dimension

Return type:

tuple of tuple of int
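
For example (an illustrative sketch):

    from katdal.chunkstore import generate_chunks

    # Split a (1000, 4096, 144) complex64 array into chunks of at most
    # 16 MB each, splitting along time (dim 0) and frequency (dim 1) only
    chunks = generate_chunks((1000, 4096, 144), 'complex64', 16e6,
                             dims_to_split=(0, 1))
    # chunks is a tuple of tuples of int, one inner tuple per dimension,
    # with each chunk spanning at most 16e6 bytes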

katdal.chunkstore.npy_header_and_body(chunk)

Prepare a chunk for low-level writing.

Returns the .npy header and a view of the chunk corresponding to that header. The two should be concatenated (as buffer objects) to form a valid .npy file.

This is useful for high-performance code, as it allows a chunk to be encoded as a .npy file more efficiently than saving to an io.BytesIO.
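
A minimal sketch of the intended usage:

    import numpy as np
    from katdal.chunkstore import npy_header_and_body

    chunk = np.arange(6, dtype=np.float32).reshape(2, 3)
    header, body = npy_header_and_body(chunk)
    # Concatenating header and body produces a valid .npy file
    with open('chunk.npy', 'wb') as f:
        f.write(header)
        f.write(body)
    assert (np.load('chunk.npy') == chunk).all()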

class katdal.chunkstore.ChunkStore(error_map=None)

Bases: object

Base class for accessing a store of chunks (i.e. N-dimensional arrays).

A chunk is a simple (i.e. unit-stride) slice of an N-dimensional array known as its parent array. The array is identified by a string name, while the chunk within the array is identified by a sequence of slice objects which may be used to extract the chunk from the array. The array is a numpy.ndarray object with an associated dtype.

The basic idea is that the chunk store contains multiple arrays addressed by name. The list of available arrays and all array metadata (shape, chunks and dtype) are stored elsewhere. The metadata is used to identify chunks, while the chunk store takes care of storing and retrieving bytestrings of actual chunk data. These are packaged back into NumPy arrays for the user. Each array can only be stored once, with a unique chunking scheme (i.e. different chunking of the same data is disallowed).

The naming scheme for arrays and chunks is reasonably generic but has some restrictions:

  • Names are treated like paths with components and a standard separator

  • The chunk name is formed by appending a string of indices to the array name

  • It is discouraged to have an array name that is a prefix of another name

  • Each chunk store has its own restrictions on valid characters in names: some treat names as URLs while others treat them as filenames. A safe choice for name components should be the valid characters for S3 buckets (also including underscores for non-bucket components):

    VALID_BUCKET = re.compile(r'^[a-z0-9][a-z0-9.-]{2,62}$')

Parameters:error_map (dict mapping Exception to Exception, optional) – Dict that maps store-specific errors to standard ChunkStore errors
get_chunk(array_name, slices, dtype)

Get chunk from the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • dtype (numpy.dtype object or equivalent) – Data type of array x
Returns:

chunk – Chunk as ndarray with dtype dtype and shape dictated by slices

Return type:

numpy.ndarray object

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If requested chunk was not found in store
get_chunk_or_default(array_name, slices, dtype, default_value=0)

Get chunk from the store but return default value if it is missing.

get_chunk_or_placeholder(array_name, slices, dtype)

Get chunk from the store but return a PlaceholderChunk if it is missing.

create_array(array_name)

Create a new array if it does not already exist.

Parameters:array_name (string) – Identifier of array
Raises:chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
put_chunk(array_name, slices, chunk)

Put chunk into the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If array_name is incompatible with store
put_chunk_noraise(array_name, slices, chunk)

Put chunk into store but return any exceptions instead of raising.

mark_complete(array_name)

Write a special object to indicate that array_name is finished.

This operation is idempotent.

The array_name need not correspond to any array written with put_chunk(). This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.

It is not necessary to call create_array() first; the implementation will do so if appropriate.

The presence of this marker can be checked with is_complete().

is_complete(array_name)

Check whether mark_complete() has been called for this array.

NAME_SEP = '/'
NAME_INDEX_WIDTH = 5
classmethod join(*names)

Join components of chunk name with supported separator.

classmethod split(name, maxsplit=-1)

Split chunk name into components based on supported separator.

classmethod chunk_id_str(slices)

Chunk identifier in string form (e.g. ‘00012_01024_00000’).

classmethod chunk_metadata(array_name, slices, chunk=None, dtype=None)

Turn array name and chunk identifier into chunk name and shape.

Form the full chunk name from array_name and slices and extract the chunk shape from slices, validating it in the process. If chunk or dtype is given, check that chunk is commensurate with slices and that dtype contains no objects which would cause nasty segfaults.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object, optional) – Actual chunk data as ndarray (used to validate shape / dtype)
  • dtype (numpy.dtype object or equivalent, optional) – Data type of array x (used for validation only)
Returns:

  • chunk_name (string) – Full chunk name used to find chunk in underlying storage medium
  • shape (tuple of int) – Chunk shape tuple associated with slices

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk, or any dtype contains objects
get_dask_array(array_name, chunks, dtype, offset=(), index=(), errors=0)

Get dask array from the store.

Handling of missing chunks is determined by the errors argument.

Parameters:
  • array_name (string) – Identifier of array in chunk store
  • chunks (tuple of tuples of ints) – Chunk specification
  • dtype (numpy.dtype object or equivalent) – Data type of array
  • offset (tuple of int, optional) – Offset to add to each dimension when addressing chunks in store
  • errors (number or 'raise' or 'placeholder', optional) –

    Error handling. If ‘raise’, exceptions are passed through, causing the evaluation to fail.

    If ‘placeholder’, returns instances of PlaceholderChunk in place of missing chunks. Note that such an array cannot be used as-is, because an ndarray is expected, but it can be used as raw material for building new graphs via functions like da.map_blocks().

    If a numeric value, it is used as a default value.

Returns:

array – Dask array of given dtype

Return type:

dask.array.Array object

put_dask_array(array_name, array, offset=())

Put dask array into the store.

Parameters:
  • array_name (string) – Identifier of array in chunk store
  • array (dask.array.Array object) – Dask input array
  • offset (tuple of int, optional) – Offset to add to each dimension when addressing chunks in store
Returns:

success – Dask array of objects indicating success of transfer of each chunk (None indicates success, otherwise there is an exception object)

Return type:

dask.array.Array object
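
A round trip through a store might look like this (an illustrative sketch using the DictChunkStore documented below):

    import dask.array as da
    import numpy as np
    from katdal.chunkstore_dict import DictChunkStore

    backing = np.zeros((4, 6), dtype=np.float32)
    store = DictChunkStore(x=backing)
    data = da.ones((4, 6), chunks=(2, 3), dtype=np.float32)
    success = store.put_dask_array('x', data)
    success.compute()    # an array of None objects indicates success
    y = store.get_dask_array('x', data.chunks, data.dtype)
    assert (y.compute() == 1).all()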

katdal.chunkstore_dict module

A store of chunks (i.e. N-dimensional arrays) based on a dict of arrays.

class katdal.chunkstore_dict.DictChunkStore(**kwargs)

Bases: katdal.chunkstore.ChunkStore

A store of chunks (i.e. N-dimensional arrays) based on a dict of arrays.

This interprets all keyword arguments as NumPy arrays and stores them in an arrays dict. Each array is identified by its corresponding keyword. New arrays cannot be added via put_chunk() - they all need to be in place at store initialisation (or can be added afterwards via direct insertion into the arrays dict). The put_chunk method is only useful for in-place modification of existing arrays.
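
For example:

    import numpy as np
    from katdal.chunkstore_dict import DictChunkStore

    x = np.arange(12).reshape(3, 4)
    store = DictChunkStore(x=x)
    slices = (slice(0, 2), slice(0, 4))
    chunk = store.get_chunk('x', slices, x.dtype)            # returns x[0:2, 0:4]
    store.put_chunk('x', slices, np.zeros((2, 4), x.dtype))  # modifies x in place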

get_chunk(array_name, slices, dtype)

Get chunk from the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • dtype (numpy.dtype object or equivalent) – Data type of array x
Returns:

chunk – Chunk as ndarray with dtype dtype and shape dictated by slices

Return type:

numpy.ndarray object

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If requested chunk was not found in store
create_array(array_name)

Create a new array if it does not already exist.

Parameters:array_name (string) – Identifier of array
Raises:chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
put_chunk(array_name, slices, chunk)

Put chunk into the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If array_name is incompatible with store

katdal.chunkstore_npy module

A store of chunks (i.e. N-dimensional arrays) based on NPY files.

class katdal.chunkstore_npy.NpyFileChunkStore(path, direct_write=False)

Bases: katdal.chunkstore.ChunkStore

A store of chunks (i.e. N-dimensional arrays) based on NPY files.

Each chunk is stored in a separate binary file in NumPy .npy format. The filename is constructed as

“<path>/<array>/<idx>.npy”

where “<path>” is the chunk store directory specified on construction, “<array>” is the name of the parent array of the chunk and “<idx>” is the index string of each chunk (e.g. “00001_00512”).

For a description of the .npy format, see numpy.lib.format or the relevant NumPy Enhancement Proposal.

Parameters:
  • path (string) – Top-level directory that contains NPY files of chunk store
  • direct_write (bool) – If true, use O_DIRECT when writing the file. This bypasses the OS page cache, which can be useful to avoid filling it up with files that won’t be read again.
Raises:
  • chunkstore.StoreUnavailable – If path does not exist / is not readable
  • chunkstore.StoreUnavailable – If direct_write was requested but is not available
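
For example (an illustrative sketch; the top-level directory must already exist):

    import os
    import numpy as np
    from katdal.chunkstore_npy import NpyFileChunkStore

    os.makedirs('/tmp/chunks', exist_ok=True)
    store = NpyFileChunkStore('/tmp/chunks')
    store.create_array('weather/temperature')
    data = np.array([21.3, 21.5], dtype=np.float32)
    store.put_chunk('weather/temperature', (slice(0, 2),), data)
    # Stored as /tmp/chunks/weather/temperature/00000.npy per the scheme above
    chunk = store.get_chunk('weather/temperature', (slice(0, 2),), np.float32)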
get_chunk(array_name, slices, dtype)

Get chunk from the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • dtype (numpy.dtype object or equivalent) – Data type of array x
Returns:

chunk – Chunk as ndarray with dtype dtype and shape dictated by slices

Return type:

numpy.ndarray object

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If requested chunk was not found in store
create_array(array_name)

See the docstring of ChunkStore.create_array().

put_chunk(array_name, slices, chunk)

Put chunk into the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If array_name is incompatible with store
mark_complete(array_name)

Write a special object to indicate that array_name is finished.

This operation is idempotent.

The array_name need not correspond to any array written with put_chunk(). This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.

It is not necessary to call create_array() first; the implementation will do so if appropriate.

The presence of this marker can be checked with is_complete().

is_complete(array_name)

Check whether mark_complete() has been called for this array.

katdal.chunkstore_s3 module

A store of chunks (i.e. N-dimensional arrays) based on the Amazon S3 API.

exception katdal.chunkstore_s3.S3ObjectNotFound

Bases: katdal.chunkstore.ChunkNotFound

An object / bucket was not found in S3 object store.

exception katdal.chunkstore_s3.S3ServerGlitch

Bases: katdal.chunkstore.ChunkNotFound

S3 chunk store responded with an HTTP error deemed to be temporary.

katdal.chunkstore_s3.read_array(fp)

Read a numpy array in npy format from a file descriptor.

This is the same concept as numpy.lib.format.read_array(), but optimised for the case of reading from http.client.HTTPResponse. The numpy function reads pieces out and then copies them into the array, while this implementation uses readinto. Raise TruncatedRead if the response runs out of data before the array is complete.

It does not allow pickled dtypes.

exception katdal.chunkstore_s3.AuthorisationFailed

Bases: katdal.chunkstore.StoreUnavailable

Authorisation failed, e.g. due to invalid, malformed or expired token.

exception katdal.chunkstore_s3.InvalidToken(token, message)

Bases: katdal.chunkstore_s3.AuthorisationFailed

Invalid JSON Web Token (JWT).

katdal.chunkstore_s3.decode_jwt(token)

Decode JSON Web Token (JWT) string and extract claims.

The MeerKAT archive uses JWT bearer tokens for authorisation. Each token is a JSON Web Signature (JWS) string with a payload of claims. This function extracts the claims as a dict, while also doing basic checks on the token (mostly to catch copy-n-paste errors). The signature is decoded but not validated, since that would require the server secrets.

Parameters:token (str) – JWS Compact Serialization as an ASCII string (native string, not bytes)
Returns:claims – The JWT Claims Set as a dict of key-value pairs
Return type:dict
Raises:InvalidToken – If the token is malformed or truncated, or has expired
class katdal.chunkstore_s3.S3ChunkStore(url, timeout=(30, 300), retries=2, token=None, credentials=None, public_read=False, expiry_days=0, **kwargs)

Bases: katdal.chunkstore.ChunkStore

A store of chunks (i.e. N-dimensional arrays) based on the Amazon S3 API.

This object encapsulates the S3 client / session and its underlying connection pool, which allows subsequent get and put calls to share the connections.

The full identifier of each chunk (the “chunk name”) is given by

“<bucket>/<path>/<idx>”

where “<bucket>” refers to the relevant S3 bucket, “<bucket>/<path>” is the name of the parent array of the chunk and “<idx>” is the index string of each chunk (e.g. “00001_00512”). The corresponding S3 key string of a chunk is “<path>/<idx>.npy” which reflects the fact that the chunk is stored as a string representation of an NPY file (complete with header).

Parameters:
  • url (str) – Endpoint of S3 service, e.g. ‘http://127.0.0.1:9000’. It can be specified as either bytes or unicode, and is converted to the native string type with UTF-8. The URL may also contain a path if this store is relative to an existing bucket, in which case the chunk name is a relative path (useful for unit tests).
  • timeout (float or None or tuple of 2 floats or None's, optional) – Connect / read timeout, in seconds, either a single value for both or custom values as (connect, read) tuple. None means “wait forever”…
  • retries (int or tuple of 2 ints or urllib3.util.retry.Retry, optional) – Number of connect / read retries, either a single value for both or custom values as (connect, read) tuple, or a Retry object for full customisation (including status retries).
  • token (str, optional) – Bearer token to authenticate
  • credentials (tuple of str, optional) – AWS access key and secret key to authenticate
  • public_read (bool, optional) – If set to true, new buckets will be created with a policy that allows everyone (including unauthenticated users) to read the data.
  • expiry_days (int, optional) – If set to a value greater than 0 will set a future expiry time in days for any new buckets created.
  • kwargs (dict) – Extra keyword arguments (unused)
Raises:

chunkstore.StoreUnavailable – If S3 server interaction failed (it’s down, no authentication, etc)
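
Opening a store might look like this (a sketch; the endpoint, token and credentials are placeholders):

    from katdal.chunkstore_s3 import S3ChunkStore

    # Either token-based (MeerKAT archive style) or AWS-style credentials
    store = S3ChunkStore('http://127.0.0.1:9000', token='<JWT>')
    store = S3ChunkStore('http://127.0.0.1:9000',
                         credentials=('access_key', 'secret_key'))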

request(method, url, process=<function S3ChunkStore.<lambda>>, chunk_name='', ignored_errors=(), timeout=(), retries=None, **kwargs)

Send HTTP request to S3 server, process response and retry if needed.

This retries temporary HTTP errors, including reset connections while processing a successful response.

Parameters:
  • method, url – The standard required parameters of requests.Session.request()
  • process (function, signature result = process(response), optional) – Function that will process response (just return response by default)
  • chunk_name (str, optional) – Name of chunk, used for error reporting only
  • ignored_errors (collection of int, optional) – HTTP status codes that are treated like 200 OK, not raising an error
  • timeout (float or None or tuple of 2 floats or None's, optional) – Override timeout for this request (use the store timeout by default)
  • retries (int or tuple of 2 ints or urllib3.util.retry.Retry, optional) – Override retries for this request (use the store retries by default)
  • kwargs (optional) – These are passed on to requests.Session.request()
Returns:

result – The output of the process function applied to a successful response

Return type:

object

Raises:
  • AuthorisationFailed – If the request is not authorised by appropriate token or credentials
  • S3ObjectNotFound – If S3 object request fails because it does not exist
  • S3ServerGlitch – If S3 object request fails because server is temporarily overloaded
  • StoreUnavailable – If a general HTTP error occurred that is not ignored
get_chunk(array_name, slices, dtype)

Get chunk from the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • dtype (numpy.dtype object or equivalent) – Data type of array x
Returns:

chunk – Chunk as ndarray with dtype dtype and shape dictated by slices

Return type:

numpy.ndarray object

Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If requested chunk was not found in store
create_array(array_name)

See the docstring of ChunkStore.create_array().

put_chunk(array_name, slices, chunk)

Put chunk into the store.

Parameters:
  • array_name (string) – Identifier of parent array x of chunk
  • slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
  • chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises:
  • TypeError – If slices is not a sequence of slice(start, stop, 1) objects
  • chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
  • chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
  • chunkstore.ChunkNotFound – If array_name is incompatible with store
mark_complete(array_name)

Write a special object to indicate that array_name is finished.

This operation is idempotent.

The array_name need not correspond to any array written with put_chunk(). This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.

It is not necessary to call create_array() first; the implementation will do so if appropriate.

The presence of this marker can be checked with is_complete().

is_complete(array_name)

Check whether mark_complete() has been called for this array.

katdal.concatdata module

Class for concatenating visibility data sets.

exception katdal.concatdata.ConcatenationError

Bases: Exception

Sequence of objects could not be concatenated due to incompatibility.

class katdal.concatdata.ConcatenatedLazyIndexer(indexers, transforms=None)

Bases: katdal.lazy_indexer.LazyIndexer

Two-stage deferred indexer that concatenates multiple indexers.

This indexer concatenates a sequence of indexers along the first (i.e. time) axis. The index specification is broken down into chunks along this axis, sent to the applicable underlying indexers and the returned data are concatenated again before returning it.

Parameters:
  • indexers (sequence of LazyIndexer objects and/or arrays) – Sequence of indexers or raw arrays to be concatenated
  • transforms (list of LazyTransform objects or None, optional) – Extra chain of transforms to be applied to data after final indexing
name

Name of first non-empty indexer (or empty string otherwise)

Type:string
Raises:InvalidTransform – If transform chain does not obey restrictions on changing the data shape
katdal.concatdata.common_dtype(sensor_data_sequence)

The dtype suitable to store all sensor data values in the given sequence.

This extracts the dtypes of a sequence of sensor data objects and finds the minimal dtype to which all of them may be safely cast using NumPy type promotion rules (which will typically be the dtype of a concatenation of the values).

Parameters:sensor_data_sequence (sequence of extracted sensor data objects) – These objects may include numpy.ndarray and CategoricalData
Returns:dtype – The promoted dtype of the sequence, or None if sensor_data_sequence is empty
Return type:numpy.dtype object
class katdal.concatdata.ConcatenatedSensorGetter(data)

Bases: katdal.sensordata.SensorGetter

The concatenation of multiple raw (uncached) sensor data sets.

This is a convenient container for returning raw (uncached) sensor data sets from a ConcatenatedSensorCache object. It only accesses the underlying data sets when explicitly asked to via the get() interface, but provides quick access to metadata such as sensor name.

Parameters:data (sequence of SensorGetter) – Uncached sensor data
get()

Retrieve the values from underlying storage.

Returns:values – Underlying data
Return type:SensorData
class katdal.concatdata.ConcatenatedSensorCache(caches, keep=None)

Bases: katdal.sensordata.SensorCache

Sensor cache that is a concatenation of multiple underlying caches.

This concatenates a sequence of sensor caches along the time axis and makes them appear like a single sensor cache. The combined cache contains a superset of all actual and virtual sensors found in the underlying caches and replaces any missing sensor data with dummy values.

Parameters:
  • caches (sequence of SensorCache objects) – Sequence of underlying caches to be concatenated
  • keep (sequence of bool, optional) – Default (global) time selection specification as boolean mask that will be applied to sensor data (this can be disabled on data retrieval)
get(name, select=False, extract=True, **kwargs)

Sensor values interpolated to correlator data timestamps.

Retrieve raw (uncached) or cached sensor data from each underlying cache and concatenate the results along the time axis.

Parameters:
  • name (string) – Sensor name
  • select ({False, True}, optional) – True if preset time selection will be applied to returned data
  • extract ({True, False}, optional) – True if sensor data should be extracted from store and cached
  • kwargs (dict, optional) – Additional parameters are passed to underlying sensor caches
Returns:

data – If extraction is disabled, this will be a SensorGetter object for uncached sensors. If selection is enabled, this will be a 1-D array of values, one per selected timestamp. If selection is disabled, this will be a 1-D array of values (of the same length as the timestamps attribute) for numerical data, and a CategoricalData object for categorical data.

Return type:

array or CategoricalData or SensorGetter object

Raises:

KeyError – If sensor name was not found in cache and did not match virtual template

class katdal.concatdata.ConcatenatedDataSet(datasets)

Bases: katdal.dataset.DataSet

Class that concatenates existing visibility data sets.

This provides a single DataSet interface to a list of concatenated data sets. Where possible, identical targets, subarrays, spectral windows and observation sensors are merged. For more information on attributes, see the DataSet docstring.

Parameters:datasets (sequence of DataSet objects) – List of existing data sets
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of selection on it.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of selection on it.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

katdal.dataset module

Base class for accessing a visibility data set.

exception katdal.dataset.WrongVersion

Bases: Exception

Trying to access data using accessor class with the wrong version.

exception katdal.dataset.BrokenFile

Bases: Exception

Data set could not be loaded because the file is inconsistent or is missing critical bits.

class katdal.dataset.Subarray(ants, corr_products)

Bases: object

Subarray specification.

A subarray is determined by the specific correlation products produced by the correlator and the antenna objects associated with the inputs found in the correlation products.

Parameters:
  • ants (sequence of katpoint.Antenna objects) – List of antenna objects, culled to contain only antennas found in corr_products
  • corr_products (sequence of (string, string) pairs, length B) – Correlation products as pairs of input labels, e.g. (‘ant1h’, ‘ant2v’), exposed as an array of strings with shape (B, 2)
inputs

List of correlator input labels found in corr_products, e.g. ‘ant1h’

Type:list of strings
katdal.dataset.parse_url_or_path(url_or_path)

Parse URL into components, converting path to absolute file URL.

Parameters:url_or_path (string) – URL, or filesystem path if there is no scheme
Returns:url_parts – Components of the parsed URL (‘file’ scheme will have an absolute path)
Return type:urllib.parse.ParseResult
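
For example:

    from katdal.dataset import parse_url_or_path

    parse_url_or_path('data/1234567890.rdb')
    # ParseResult with scheme 'file' and an absolute version of the path
    parse_url_or_path('https://archive/1234567890.rdb')
    # Existing URLs pass through with their scheme intact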
class katdal.dataset.DataSet(name, ref_ant='', time_offset=0.0, url='')

Bases: object

Base class for accessing a visibility data set.

This provides a simple interface to a generic file (or files) containing visibility data (both single-dish and interferometer data supported). The data are not loaded into memory on opening the file, but are accessible via properties after typically selecting a subset of the data. This allows the reading of huge files.

Parameters:
  • name (string) – Name / identifier of data set
  • ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use by script)
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • url (string, optional) – Location of data set (either local filename or full URL accepted)
version

Format version string

Type:string
observer

Name of person that recorded the data set

Type:string
description

Short description of the purpose of the data set

Type:string
experiment_id

Experiment ID, a unique string used to link the data files of an experiment together with blog entries, etc.

Type:string
obs_params

Observation parameters, typically set in observation script

Type:dict mapping string to string or list of strings
obs_script_log

Observation script output log (useful for debugging)

Type:list of strings
subarrays

List of all subarrays in data set

Type:list of Subarray objects
subarray

Index of currently selected subarray

Type:int
ants

List of selected antennas

Type:list of katpoint.Antenna objects
inputs

List of selected correlator input labels (‘ant1h’)

Type:array of strings
corr_products

Array of selected correlation products as pairs of input labels (e.g. [(‘ant1h’, ‘ant1h’), (‘ant1h’, ‘ant2h’)])

Type:array of strings, shape (B, 2)
receivers

Identifier of the active receiver on each antenna

Type:dict mapping string to string or list of strings
spectral_windows

List of all spectral windows in data set

Type:list of SpectralWindow objects
spw

Index of currently selected spectral window

Type:int
channel_width

Channel bandwidth of selected spectral window, in Hz

Type:float
freqs / channel_freqs

Centre frequency of each selected channel, in Hz

Type:array of float, shape (F,)
channels

Original channel indices of selected channels

Type:array of int, shape (F,)
dump_period

Dump period, in seconds

Type:float
sensor

Sensor cache

Type:SensorCache object
catalogue

Catalogue of all targets / sources / fields in data set

Type:katpoint.Catalogue object
start_time

Timestamp of start of first sample in file, in UT seconds since Unix epoch

Type:katpoint.Timestamp object
end_time

Timestamp of end of last sample in file, in UT seconds since Unix epoch

Type:katpoint.Timestamp object
dumps

Original dump indices of selected dumps

Type:array of int, shape (T,)
scan_indices

List of currently selected scans as indices

Type:list of int
compscan_indices

List of currently selected compound scans as indices

Type:list of int
target_indices

List of currently selected targets as indices into catalogue

Type:list of int
target_projection

Type of spherical projection for target coordinates

Type:{‘ARC’, ‘SIN’, ‘TAN’, ‘STG’, ‘CAR’}, optional
target_coordsys

Spherical pointing coordinate system for target coordinates

Type:{‘azel’, ‘radec’}, optional
shape

Shape of selected visibility data array, as (T, F, B)

Type:tuple of 3 ints
size

Size of selected visibility data array, in bytes

Type:int
applycal_products

List of calibration products that will be applied to data

Type:list of string
select(**kwargs)

Select subset of data, based on time / frequency / corrprod filters.

This applies a set of selection criteria to the data set, which updates the data set properties and attributes to match the selection. In other words, the timestamps() and vis() methods will return the selected subset of the data, while attributes such as ants, channel_freqs and shape are updated. The sensor cache will also return the selected subset of sensor data via the __getitem__ interface. This function returns nothing, but modifies the existing data set in-place.

The selection criteria are divided into groups, based on whether they affect the time, frequency or correlation product dimension:

* Time: `dumps`, `timerange`, `scans`, `compscans`, `targets`
* Frequency: `channels`, `freqrange`
* Correlation product: `corrprods`, `ants`, `inputs`, `pol`

The subarray and spw criteria are special, as they affect multiple dimensions (time + correlation product and time + frequency, respectively), are always active and are forced to be a single index.

If there are multiple criteria on the same dimension within a select() call, they are ANDed together, while multiple items within the same criterion (e.g. targets=[‘Hyd A’, ‘Vir A’]) are ORed together. When select() is called again, the new selections replace previous selections on the same dimension, while existing selections on other dimensions are preserved. The reset parameter fine-tunes this behaviour (see the usage sketch after the parameter list below).

If select() is called without any parameters the selection is reset to the original data set.

In addition, the weights and flags criteria are lists of names that select which weights and flags to include in the corresponding data set property.

Parameters:
  • strict ({True, False}, optional) – If True, select() raises TypeError when it encounters an unknown kwarg
  • dumps (int or slice or sequence of ints or sequence of bools, optional) – Select dumps by index, slice or boolean mask of length T (keep dumps where mask is True)
  • timerange (sequence of 2 katpoint.Timestamp objects or equivalent, optional) – Select range of times between given start and end times
  • scans (int or string or sequence, optional) – Select scans by index or state (or negate state by prepending ‘~’)
  • compscans (int or string or sequence, optional) – Select compscans by index or label (or negate label by prepending ‘~’)
  • targets (int or string or katpoint.Target object or sequence, optional) – Select targets by index or name or description or object
  • spw (int, optional) – Select spectral window by index (only one may be active)
  • channels (int or slice or sequence of ints or sequence of bools, optional) – Select frequency channels by index, slice or boolean mask of length F (keep channels where mask is True)
  • freqrange (sequence of 2 floats, optional) – Select range of frequencies between start and end frequencies, in Hz
  • subarray (int, optional) – Select subarray by index (only one may be active)
  • corrprods (int or slice or sequence of ints or sequence of bools or sequence of string pairs or {‘auto’, ‘cross’}, optional) – Select correlation products by index, slice or boolean mask of length B (keep products where mask is True). Alternatively, select by value via a sequence of string pairs, or select all autocorrelations via ‘auto’ or all cross-correlations via ‘cross’.
  • ants (string or katpoint.Antenna object or sequence, optional) – Select antennas by name or object. If all antennas specified are prefaced by a ~, this is treated as a deselection and these antennas are excluded.
  • inputs (string or sequence of strings, optional) – Select inputs by label
  • pol (string or sequence of strings in {‘H’, ‘V’, ‘HH’, ‘VV’, ‘HV’, ‘VH’}, optional) – Select polarisation terms
  • weights ('all' or string or sequence of strings, optional) – List of names of weights to be multiplied together, as a sequence or string of comma-separated names (combine all weights by default)
  • flags ('all' or string or sequence of strings, optional) – List of names of flags that will be OR’ed together, as a sequence or string of comma-separated names (use all flags by default). An empty string or sequence discards all flags.
  • reset ({'auto', '', 'T', 'F', 'B', 'TF', 'TB', 'FB', 'TFB'}, optional) – Remove existing selections on specified dimensions before applying the new selections. The default ‘auto’ option clears those dimensions that will be modified by the new selections and leaves the selections on unaffected dimensions intact except if select is called without any parameters, in which case all selections are cleared. By setting reset to ‘’, new selections apply on top of existing selections.
Raises:
  • TypeError – If a keyword argument is unknown and strict is enabled
  • IndexError – If spw or subarray is out of range
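
A minimal usage sketch (the filename, antenna and target names are hypothetical):

  import katdal

  d = katdal.open('1234567890_sdp_l0.full.rdb')
  # Criteria on different dimensions combine; criteria on the same dimension
  # within one call are ANDed, items within one criterion are ORed
  d.select(scans='track', channels=slice(200, 300), ants=['m000', 'm001'])
  print(d.shape)  # (T, F, B) of the selected subset
  d.select(targets='PKS 1934-63', reset='')  # applies on top of previous selection
  d.select()  # resets to the original data set
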
scans()

Generator that iterates through scans in data set.

This iterates through the currently selected list of scans, returning the scan index, scan state and associated target object. In addition, after each iteration the data set will reflect the scan selection, i.e. the timestamps, visibilities, sensor values, etc. will be those of the current scan. The scan selection applies on top of any existing selection.

Yields:
  • scan (int) – Scan index
  • state (string) – Scan state
  • target (katpoint.Target object) – Target associated with scan
compscans()

Generator that iterates through compound scans in data set.

This iterates through the currently selected list of compound scans, returning the compound scan index, label and the first associated target object. In addition, after each iteration the data set will reflect the compound scan selection, i.e. the timestamps, visibilities, sensor values, etc. will be those of the current compound scan. The compound scan selection applies on top of any existing selection.

Yields:
  • compscan (int) – Compound scan index
  • label (string) – Compound scan label
  • target (katpoint.Target object) – First target associated with compound scan
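
Continuing the hypothetical d from the select() sketch above, iteration might look like this:

  for scan_index, state, target in d.scans():
      # The data set now reflects the current scan on top of prior selections
      print(scan_index, state, target.name, d.shape)
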
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.

vis

Complex visibility data as a function of time, frequency and corrprod.

The visibility data are returned as an array of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().

flags

Visibility flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products().

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

mjd

Visibility timestamps in Modified Julian Days (MJD).

The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.

lst

Local sidereal time at the reference antenna in hours.

The sidereal times are returned in an array of float, shape (T,).

az

Azimuth angle of each dish in degrees.

The azimuth angles are returned in an array of float, shape (T, A).

el

Elevation angle of each dish in degrees.

The elevation angles are returned in an array of float, shape (T, A).

ra

Right ascension of the actual pointing of each dish in J2000 degrees.

The right ascensions are returned in an array of float, shape (T, A).

dec

Declination of the actual pointing of each dish in J2000 degrees.

The declinations are returned in an array of float, shape (T, A).

parangle

Parallactic angle of the actual pointing of each dish in degrees.

The parallactic angle is the position angle of the observer’s vertical on the sky, measured from north toward east. This is the angle between the great-circle arc connecting the celestial North pole to the dish pointing direction, and the great-circle arc connecting the zenith above the antenna to the pointing direction, or the angle between the hour circle and vertical circle through the pointing direction. It is returned as an array of float, shape (T, A).

target_x

Target x coordinate of each dish in degrees.

The target coordinates are projections of the spherical coordinates of the dish pointing direction to a plane with the target position at the origin. The type of projection (e.g. ARC, SIN, etc.) and spherical pointing coordinate system (e.g. azel or radec) can be set via the target_projection and target_coordsys attributes, respectively. The target x coordinates are returned as an array of float, shape (T, A).

target_y

Target y coordinate of each dish in degrees.

The target coordinates are projections of the spherical coordinates of the dish pointing direction to a plane with the target position at the origin. The type of projection (e.g. ARC, SIN, etc.) and spherical pointing coordinate system (e.g. azel or radec) can be set via the target_projection and target_coordsys attributes, respectively. The target y coordinates are returned as an array of float, shape (T, A).

u

U coordinate for each correlation product in metres.

This calculates the u coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(u_1 - u_2\) for baseline (ant1, ant2).

v

V coordinate for each correlation product in metres.

This calculates the v coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(v_1 - v_2\) for baseline (ant1, ant2).

w

W coordinate for each correlation product in metres.

This calculates the w coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(w_1 - w_2\) for baseline (ant1, ant2).

katdal.datasources module

Various sources of correlator data and metadata.

exception katdal.datasources.DataSourceNotFound

Bases: Exception

File associated with DataSource not found or server not responding.

class katdal.datasources.AttrsSensors(attrs, sensors)

Bases: object

Metadata in the form of attributes and sensors.

Parameters:
  • attrs (mapping from string to object) – Metadata attributes
  • sensors (mapping from string to SensorGetter objects) – Metadata sensor cache mapping sensor names to raw sensor data
class katdal.datasources.DataSource(metadata, timestamps, data=None)

Bases: object

A generic data source presenting both correlator data and metadata.

Parameters:
  • metadata (AttrsSensors object) – Metadata attributes and sensors
  • timestamps (array-like of float, length T) – Timestamps at centroids of visibilities in UTC seconds since Unix epoch
  • data (VisFlagsWeights object, optional) – Correlator data (visibilities, flags and weights)
katdal.datasources.view_capture_stream(telstate, capture_block_id, stream_name)

Create telstate view based on given capture block ID and stream name.

It constructs a view on telstate with at least the prefixes

  • <capture_block_id>_<stream_name>
  • <capture_block_id>
  • <stream_name>

Additionally, if there is a <stream_name>_inherit key, that stream is added too (recursively).

Parameters:
  • telstate (katsdptelstate.TelescopeState object) – Original telescope state
  • capture_block_id (string) – Capture block ID
  • stream_name (string) – Stream name
Returns:

telstate – Telstate with a view that incorporates capture block, stream and combo

Return type:

TelescopeState object
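
A minimal sketch, assuming an in-memory katsdptelstate backend and a made-up key:

  import katsdptelstate
  from katdal.datasources import view_capture_stream

  telstate = katsdptelstate.TelescopeState()    # in-memory, for illustration
  telstate['1234567890_sdp_l0_int_time'] = 8.0  # hypothetical capture-stream key
  view = view_capture_stream(telstate, '1234567890', 'sdp_l0')
  view['int_time']  # -> 8.0, resolved via the '1234567890_sdp_l0' prefix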

katdal.datasources.view_l0_capture_stream(telstate, capture_block_id=None, stream_name=None, **kwargs)

Create telstate view based on auto-determined capture block ID and stream name.

This figures out the appropriate capture block ID and L0 stream name from a capture-stream specific telstate, or uses the provided ones. It then calls view_capture_stream() to generate a view.

Parameters:
  • telstate (katsdptelstate.TelescopeState object) – Original telescope state
  • capture_block_id (string, optional) – Specify capture block ID explicitly (detected otherwise)
  • stream_name (string, optional) – Specify L0 stream name explicitly (detected otherwise)
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Returns:

  • telstate (TelstateToStr object) – Telstate with a view that incorporates capture block, stream and combo
  • capture_block_id (string) – Actual capture block ID used
  • stream_name (string) – Actual L0 stream name used

Raises:

ValueError – If no capture block or L0 stream could be detected (with no override)

katdal.datasources.infer_chunk_store(url_parts, telstate, npy_store_path=None, s3_endpoint_url=None, array='correlator_data', **kwargs)

Construct chunk store automatically from dataset URL and telstate.

Parameters:
  • url_parts (urlparse.ParseResult object) – Parsed dataset URL
  • telstate (TelstateToStr object) – Telescope state
  • npy_store_path (string, optional) – Top-level directory of NpyFileChunkStore (overrides the default)
  • s3_endpoint_url (string, optional) – Endpoint of S3 service, e.g. ‘http://127.0.0.1:9000’ (overrides default)
  • array (string, optional) – Array within the bucket from which to determine the prefix
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Returns:

store – Chunk store for visibility data

Return type:

katdal.ChunkStore object

class katdal.datasources.TelstateDataSource(telstate, capture_block_id, stream_name, chunk_store=None, timestamps=None, url='', upgrade_flags=True, van_vleck='off', preselect=None, **kwargs)

Bases: katdal.datasources.DataSource

A data source based on katsdptelstate.TelescopeState.

It is assumed that the provided telstate already has the appropriate views to find observation, stream and chunk store information. It typically needs the following prefixes:

  • <capture block ID>_<L0 stream>
  • <capture block ID>
  • <L0 stream>
Parameters:
  • telstate (katsdptelstate.TelescopeState object) – Telescope state with appropriate views
  • capture_block_id (string) – Capture block ID
  • stream_name (string) – Name of the L0 stream
  • chunk_store (katdal.ChunkStore object, optional) – Chunk store for visibility data (the default is no data - metadata only)
  • timestamps (array of float, optional) – Visibility timestamps, overriding (or fixing) the ones found in telstate
  • url (string, optional) – Location of the telstate source
  • upgrade_flags (bool, optional) – Look for associated flag streams and use them if True (default)
  • van_vleck ({'off', 'autocorr'}, optional) – Type of Van Vleck (quantisation) correction to perform
  • preselect (dict, optional) –

    Subset of data to select. The keys in the dictionary correspond to the keyword arguments of DataSet.select(), but with restrictions:

    • Only channels and dumps can be specified.
    • The values must be slices with unit step.
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Raises:
  • KeyError – If telstate lacks critical keys
  • IndexError – If preselect does not meet the criteria above.
classmethod from_url(url, chunk_store='auto', **kwargs)

Construct TelstateDataSource from URL or RDB filename.

Both local RDB filenames and remote URLs pointing to RDB files are supported.

Parameters:
  • url (string) – URL or RDB filename serving as entry point to data set
  • chunk_store (katdal.ChunkStore object, optional) – Chunk store for visibility data (obtained automatically by default, or set to None for metadata-only data set)
  • kwargs (dict, optional) – Extra keyword arguments passed to init, telstate view, chunk store init
katdal.datasources.open_data_source(url, **kwargs)

Construct the data source described by the given URL.

katdal.flags module

Definitions of flag bits

katdal.h5datav1 module

Data accessor class for HDF5 files produced by Fringe Finder correlator.

class katdal.h5datav1.H5DataV1(filename, ref_ant='', time_offset=0.0, mode='r', **kwargs)

Bases: katdal.dataset.DataSet

Load HDF5 format version 1 file produced by Fringe Finder correlator.

For more information on attributes, see the DataSet docstring.

Parameters:
  • filename (string) – Name of HDF5 file
  • ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
file

Underlying HDF5 file, exposed via h5py interface

Type:h5py.File object
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

katdal.h5datav2 module

Data accessor class for HDF5 files produced by KAT-7 correlator.

katdal.h5datav2.get_single_value(group, name)

Return single value from attribute or dataset with given name in group.

If name is an attribute of the HDF5 group group, it is returned, otherwise name is interpreted as an HDF5 dataset within group and its last value is returned. This is meant to retrieve static configuration values that potentially get set more than once during capture initialisation, but then do not change during actual capturing.

Parameters:
  • group (h5py.Group object) – HDF5 group to query
  • name (string) – Name of HDF5 attribute or dataset to query
Returns:

value – Attribute or last value of dataset

Return type:

object

katdal.h5datav2.dummy_dataset(name, shape, dtype, value)

Dummy HDF5 dataset containing a single value.

This creates a dummy HDF5 dataset in memory containing a single value. It can have virtually unlimited size as the dataset is highly compressed.

Parameters:
  • name (string) – Name of dataset
  • shape (sequence of int) – Shape of dataset
  • dtype (numpy.dtype object or equivalent) – Type of data stored in dataset
  • value (object) – All elements in the dataset will equal this value
Returns:

dataset – Dummy HDF5 dataset

Return type:

h5py.Dataset object

class katdal.h5datav2.H5DataV2(filename, ref_ant='', time_offset=0.0, mode='r', quicklook=False, keepdims=False, **kwargs)

Bases: katdal.dataset.DataSet

Load HDF5 format version 2 file produced by KAT-7 correlator.

For more information on attributes, see the DataSet docstring.

Parameters:
  • filename (string) – Name of HDF5 file
  • ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
  • quicklook ({False, True}) – True if synthesised timestamps should be used to partition data set even if real timestamps are irregular, thereby avoiding the slow loading of real timestamps at the cost of slightly inaccurate label borders
  • keepdims ({False, True}, optional) – Force vis / weights / flags to be 3-dimensional, regardless of selection
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
file

Underlying HDF5 file, exposed via h5py interface

Type:h5py.File object
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

katdal.h5datav3 module

Data accessor class for HDF5 files produced by RTS correlator.

katdal.h5datav3.dummy_dataset(name, shape, dtype, value)

Dummy HDF5 dataset containing a single value.

This creates a dummy HDF5 dataset in memory containing a single value. It can have virtually unlimited size as the dataset is highly compressed.

Parameters:
  • name (string) – Name of dataset
  • shape (sequence of int) – Shape of dataset
  • dtype (numpy.dtype object or equivalent) – Type of data stored in dataset
  • value (object) – All elements in the dataset will equal this value
Returns:

dataset – Dummy HDF5 dataset

Return type:

h5py.Dataset object

class katdal.h5datav3.H5DataV3(filename, ref_ant='', time_offset=0.0, mode='r', time_scale=None, time_origin=None, rotate_bls=False, centre_freq=None, band=None, keepdims=False, **kwargs)

Bases: katdal.dataset.DataSet

Load HDF5 format version 3 file produced by RTS correlator.

For more information on attributes, see the DataSet docstring.

Parameters:
  • filename (string) – Name of HDF5 file
  • ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
  • time_scale (float or None, optional) – Resynthesise timestamps using this scale factor
  • time_origin (float or None, optional) – Resynthesise timestamps using this sync time / epoch
  • rotate_bls ({False, True}, optional) – Rotate baseline label list to work around early RTS correlator bug
  • centre_freq (float or None, optional) – Override centre frequency if provided, in Hz
  • band (string or None, optional) – Override receiver band if provided (e.g. ‘l’) - used to find ND models
  • keepdims ({False, True}, optional) – Force vis / weights / flags to be 3-dimensional, regardless of selection
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
file

Underlying HDF5 file, exposed via h5py interface

Type:h5py.File object
stream_name

Name of L0 data stream, for finding corresponding telescope state keys

Type:string

Notes

The timestamps can be resynchronised from the original sample counter values by specifying time_scale and/or time_origin. The basic formula is given by:

timestamp = sample_counter / time_scale + time_origin
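
For instance, with made-up values:

  time_scale = 1712e6          # hypothetical scale factor (samples per second)
  time_origin = 1400000000.0   # hypothetical sync time, UTC seconds since Unix epoch
  sample_counter = 856e6
  timestamp = sample_counter / time_scale + time_origin  # -> 1400000000.5
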
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

katdal.lazy_indexer module

Two-stage deferred indexer for objects with expensive __getitem__ calls.

katdal.lazy_indexer.dask_getitem(x, indices)

Index a dask array, with N-D fancy index support and better performance.

This is a drop-in replacement for x[indices] that goes one further by implementing “N-D fancy indexing” which is still unsupported in dask. If indices contains multiple fancy indices, perform outer (oindex) indexing. This behaviour deviates from NumPy, which performs the more general (but also more obtuse) vectorized (vindex) indexing in this case. See NumPy NEP 21, dask #433 and h5py #652 for more details.

In addition, this optimises performance by culling unnecessary nodes from the dask graph after indexing, which makes it cheaper to compute if only a small piece of the graph is needed, and by collapsing fancy indices in indices to slices where possible (which also implies oindex semantics).
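
A small sketch of the oindex behaviour:

  import dask.array as da
  from katdal.lazy_indexer import dask_getitem

  x = da.arange(24, chunks=6).reshape(4, 6)
  # Two fancy indices are combined as an outer product (oindex semantics),
  # so the result has shape (2, 3) instead of NumPy's vindex behaviour
  sub = dask_getitem(x, ([0, 2], [1, 3, 5])).compute()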

exception katdal.lazy_indexer.InvalidTransform

Bases: Exception

Transform changes data shape in unallowed way.

class katdal.lazy_indexer.LazyTransform(name=None, transform=<function LazyTransform.<lambda>>, new_shape=<function LazyTransform.<lambda>>, dtype=None)

Bases: object

Transformation to be applied by LazyIndexer after final indexing.

A LazyIndexer potentially applies a chain of transforms to the data after the final second-stage indexing is done. These transforms are restricted in their capabilities to simplify the indexing process. Specifically, when it comes to the data shape, transforms may only:

- add dimensions at the end of the data shape, or
- drop dimensions at the end of the data shape.

The preserved dimensions are not allowed to change their shape or interpretation so that the second-stage indexing matches the first-stage indexing on these dimensions. The data type (aka dtype) is allowed to change.

Parameters:
  • name (string or None, optional) – Name of transform
  • transform (function, signature data = f(data, keep), optional) – Transform to apply to data (keep is user-specified second-stage index)
  • new_shape (function, signature new_shape = f(old_shape), optional) – Function that predicts data array shape tuple after first-stage indexing and transformation, given its original shape tuple as input. Restrictions apply as described above.
  • dtype (numpy.dtype object or equivalent or None, optional) – Type of output array after transformation (None if same as input array)
class katdal.lazy_indexer.LazyIndexer(dataset, keep=slice(None, None, None), transforms=None)

Bases: object

Two-stage deferred indexer for objects with expensive __getitem__ calls.

This class was originally designed to extend and speed up the indexing functionality of HDF5 datasets as found in h5py, but works on any equivalent object (defined as any object with shape, dtype and __getitem__ members) where a call to __getitem__ may be very expensive. The following discussion focuses on the HDF5 use case as the main example.

Direct extraction of a subset of an HDF5 dataset via the __getitem__ interface (i.e. dataset[index]) has a few issues:

  1. Data access can be very slow (or impossible) if a very large dataset is fully loaded into memory and then indexed again at a later stage
  2. Advanced indexing (via boolean masks or sequences of integer indices) is only supported on a single dimension in the current version of h5py (2.0)
  3. Even though advanced indexing has limited support, simple indexing (via single integer indices or slices) is frequently much faster.

This class wraps an h5py.Dataset or equivalent object and exposes a new __getitem__ interface on it. It efficiently composes two stages of indexing: a first stage specified at object instantiation time and a second stage that applies on top of the first stage when __getitem__ is called on this object. The data are only loaded after the combined index is determined, addressing issue 1.

Furthermore, advanced indexing is allowed on any dimension by decomposing the selection as a series of slice selections covering contiguous segments of the dimension to alleviate issue 2. Finally, this also allows faster data retrieval by extracting a large slice from the HDF5 dataset and then performing advanced indexing on the resulting numpy.ndarray object instead, in response to issue 3.

The keep parameter of the __init__() and __getitem__() methods accepts a generic index or slice specification, i.e. anything that would be accepted by the __getitem__() method of a numpy.ndarray of the same shape as the dataset. This could be a single integer index, a sequence of integer indices, a slice object (representing the colon operator commonly used with __getitem__, e.g. representing x[1:10:2] as x[slice(1,10,2)]), a sequence of booleans as a mask, or a tuple containing any number of these (typically one index item per dataset dimension). Any missing dimensions will be fully selected, and any extra dimensions will be ignored.

Parameters:
  • dataset (h5py.Dataset object or equivalent) – Underlying dataset or array object on which lazy indexing will be done. This can be any object with shape, dtype and __getitem__ members.
  • keep (NumPy index expression, optional) – First-stage index as a valid index or slice specification (supports arbitrary slicing or advanced indexing on any dimension)
  • transforms (list of LazyTransform objects or None, optional) – Chain of transforms to be applied to data after final indexing. The chain as a whole may only add or drop dimensions at the end of data shape without changing the preserved dimensions.
name

Name of HDF5 dataset (or empty string for unnamed ndarrays, etc.)

Type:string
Raises:InvalidTransform – If transform chain does not obey restrictions on changing the data shape
shape

Shape of data array after first-stage indexing and transformation, i.e. self[:].shape.

dtype

Type of data array after transformation, i.e. self[:].dtype.
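
A minimal sketch using a plain NumPy array as the stand-in dataset:

  import numpy as np
  from katdal.lazy_indexer import LazyIndexer

  # Any object with shape, dtype and __getitem__ works, e.g. an h5py.Dataset
  data = np.arange(24, dtype=np.float32).reshape(4, 6)
  indexer = LazyIndexer(data, keep=np.s_[:2])  # first-stage: keep first two rows
  subset = indexer[:, [1, 3, 5]]               # second-stage advanced indexing
  print(subset.shape)                          # (2, 3)
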
class katdal.lazy_indexer.DaskLazyIndexer(dataset, keep=(), transforms=())

Bases: object

Turn a dask Array into a LazyIndexer by computing it upon indexing.

The LazyIndexer wraps an underlying dataset in the form of a dask Array. Upon first use, it applies a stored first-stage selection (keep) to the array, followed by a series of transforms. All of these actions are lazy and only update the dask graph of the dataset. Since these updates are computed only on first use, there is minimal cost in constructing an instance and immediately throwing it away again.

Second-stage selection occurs via a __getitem__() call on this object, which also triggers dask computation to return the final numpy.ndarray output. Both selection steps follow outer indexing (“oindex”) semantics, by indexing each dimension / axis separately.

DaskLazyIndexers can also index other DaskLazyIndexers, which allows them to share first-stage selections and/or transforms, and to construct nested or hierarchical indexers.

Parameters:
  • dataset (dask.Array or DaskLazyIndexer) – The full dataset, from which a subset is chosen by keep
  • keep (NumPy index expression, optional) – Index expression describing first-stage selection (e.g. as applied by katdal.DataSet.select()), with oindex semantics
  • transforms (sequence of function, signature array = f(array), optional) – Transformations that are applied after indexing by keep but before indexing on this object. Each transformation is a callable that takes a dask array and returns another dask array.
name

The name of the (full) underlying dataset, useful for reporting

Type:str
dataset

The dask array that is accessed by indexing (after applying keep and transforms). It can be used directly to perform dask computations.

Type:dask.Array
transforms

Transformations that are applied after first-stage indexing.

dataset

Array after first-stage indexing and transformation.

classmethod get(arrays, keep, out=None)

Extract several arrays from the underlying dataset.

This is a variant of __getitem__() that pulls from several arrays jointly. This can be significantly more efficient if intermediate dask nodes can be shared.

Parameters:
  • arrays (list of DaskLazyIndexer) – Arrays to index
  • keep (NumPy index expression) – Second-stage index as a valid index or slice specification (supports arbitrary slicing or advanced indexing on any dimension)
  • out (list of np.ndarray) – If specified, output arrays in which to store results. It must be the same length as arrays and each array must have the appropriate shape and dtype.
Returns:

out – Extracted output arrays (computed from the final dask versions)

Return type:

sequence of numpy.ndarray

shape

Shape of array after first-stage indexing and transformation.

dtype

Data type of array after first-stage indexing and transformation.
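
A minimal sketch with a dask array (the transform simply doubles the values):

  import dask.array as da
  import numpy as np
  from katdal.lazy_indexer import DaskLazyIndexer

  data = da.arange(100, chunks=20).reshape(10, 10)
  indexer = DaskLazyIndexer(data, keep=np.s_[:5], transforms=[lambda a: 2 * a])
  out = indexer[:, :3]  # triggers dask computation, returns numpy.ndarray
  print(out.shape)      # (5, 3)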

katdal.ms_async module

katdal.ms_extra module

katdal.sensordata module

Container that stores cached (interpolated) and uncached (raw) sensor data.

class katdal.sensordata.SensorData(name, timestamp, value, status=None)

Bases: object

Raw (uninterpolated) sensor values.

This is a simple struct that holds timestamps, values, and optionally status.

Parameters:
  • name (string) – Sensor name
  • timestamp (np.ndarray) – Array of timestamps
  • value (np.ndarray) – Array of values (wrapped in ComparableArrayWrapper if necessary)
  • status (np.ndarray, optional) – Array of sensor statuses
class katdal.sensordata.SensorGetter(name)

Bases: object

Raw (uninterpolated) sensor data placeholder.

This is an abstract lazy interface that provides a SensorData object on request but does not store values itself. Subclasses must implement get() to retrieve values from underlying storage. They should not cache the results.

Where possible, object-valued sensors (including sensors with ndarrays as values) will have values wrapped by ComparableArrayWrapper.

Parameters:name (string) – Sensor name
get()

Retrieve the values from underlying storage.

Returns:values – Underlying data
Return type:SensorData
class katdal.sensordata.SimpleSensorGetter(name, timestamp, value, status=None)

Bases: katdal.sensordata.SensorGetter

Raw sensor data held in memory.

This is a simple wrapper for SensorData that implements the SensorGetter interface.

get()

Retrieve the values from underlying storage.

Returns:values – Underlying data
Return type:SensorData
class katdal.sensordata.RecordSensorGetter(data, name=None)

Bases: katdal.sensordata.SensorGetter

Raw (uninterpolated) sensor data in record array form.

This is a wrapper for uninterpolated sensor data which resembles a record array with fields ‘timestamp’, ‘value’ and optionally ‘status’. This is also the typical format of HDF5 datasets used to store sensor data.

Technically, the data is interpreted as a NumPy “structured” array, which is a simpler version of a recarray that only provides item-style access to fields and not attribute-style access.

Object-valued sensors are not treated specially in this class, as it is assumed that any wrapping already occurred in the construction of the recarray-like data input and will be reflected in its dtype. The original HDF5 sensor datasets also did not contain any objects as they only support standard KATCP types, so there was no need for wrapping there.

Parameters:
  • data (recarray-like, with fields 'timestamp', 'value' and optionally 'status') – Uninterpolated sensor data as structured array or equivalent (such as an h5py.Dataset)
  • name (string or None, optional) – Sensor name (assumed to be data.name by default, if it exists)
get()

Extract timestamp, value and status of each sensor data point.

Values are passed through to_str().

katdal.sensordata.to_str(value)

Convert string-likes to the native string type.

Bytes are decoded to str, with the surrogateescape error handler.

Tuples, lists, dicts and numpy arrays are processed recursively, with the exception that numpy structured types with string or object fields won’t be handled.

katdal.sensordata.telstate_decode(raw, no_decode=())

Load a katsdptelstate-encoded value that might be wrapped in np.void or np.ndarray.

The np.void/np.ndarray wrapping is needed to pass variable-length binary strings through h5py.

If the value is a string and is in no_decode, it is returned verbatim. This is for backwards compatibility with older files that didn’t use any encoding at all.

The return value is also passed through to_str().

class katdal.sensordata.H5TelstateSensorGetter(data, name=None)

Bases: katdal.sensordata.RecordSensorGetter

Raw (uninterpolated) sensor data in HDF5 TelescopeState recarray form.

This wraps the telstate sensors stored in recent HDF5 files. It differs in two ways from the normal HDF5 sensors: no ‘status’ field and values encoded by katsdptelstate.

TODO: This is a temporary fix to get at missing sensors in telstate and should be replaced by a proper wrapping of any telstate object.

Object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by ComparableArrayWrapper.

Parameters:
  • data (recarray-like, with fields ('timestamp', 'value')) – Uninterpolated sensor data as structured array or equivalent (such as an h5py.Dataset)
  • name (string or None, optional) – Sensor name (assumed to be data.name by default, if it exists)
get()

Extract timestamp and value of each sensor data point.

class katdal.sensordata.TelstateToStr(telstate)

Bases: object

Wrap an existing telescope state and pass return values through to_str()

wrapped
view(name, add_separator=True, exclusive=False)
root()
get_message(channel=None)
get(key, default=None, return_encoded=False)
get_range(key, st=None, et=None, include_previous=None, include_end=False, return_encoded=False)
get_indexed(key, sub_key, default=None, return_encoded=False)
class katdal.sensordata.TelstateSensorGetter(telstate, name)

Bases: katdal.sensordata.SensorGetter

Raw (uninterpolated) sensor data stored in original TelescopeState.

This wraps sensor data stored in a TelescopeState object. The data is only read out on item access.

Object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by ComparableArrayWrapper.

Parameters:
  • telstate (katsdptelstate.TelescopeState object) – Telescope state object
  • name (string) – Sensor name, also used as telstate key
Raises:

KeyError – If sensor name is not found in telstate or it is an attribute instead

get()

Retrieve the values from underlying storage.

Returns:values – Underlying data
Return type:SensorData
katdal.sensordata.get_sensor_from_katstore(store, name, start_time, end_time)

Get raw sensor data from katstore (CAM’s central sensor database).

Parameters:
  • store (string) – Hostname / endpoint of katstore webserver speaking katstore64 API
  • name (string) – Sensor name (the normalised / escaped version with underscores)
  • start_time, end_time (float) – Time range for sensor records as UTC seconds since Unix epoch
Returns:

data – Retrieved sensor data with ‘timestamp’, ‘value’ and ‘status’ fields

Return type:

RecordSensorGetter object

Raises:
  • ConnectionError – If this cannot connect to the katstore server
  • RuntimeError – If connection succeeded but interaction with katstore64 API failed
  • KeyError – If the sensor was not found in the store or it has no data in time range
katdal.sensordata.dummy_sensor_getter(name, value=None, dtype=<class 'numpy.float64'>, timestamp=0.0)

Create a SensorGetter object with a single default value based on type.

This creates a dummy SimpleSensorGetter object based on a default value or a type, for use when no sensor data are available, but filler data is required (e.g. when concatenating sensors from different datasets and one dataset lacks the sensor). The dummy dataset contains a single data point with the filler value and a configurable timestamp (defaulting to way back). If the filler value is an object it will be wrapped in a ComparableArrayWrapper to match the behaviour of other SensorGetter objects.

Parameters:
  • name (string) – Sensor name
  • value (object, optional) – Filler value (default is None, meaning dtype will be used instead)
  • dtype (numpy.dtype object or equivalent, optional) – Desired sensor data type, used if no explicit value is given
  • timestamp (float, optional) – Time when dummy value occurred (default is way back)
Returns:

data – Dummy sensor data object with ‘timestamp’ and ‘value’ fields

Return type:

SimpleSensorGetter object, shape (1,)
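
For example (a sketch; the sensor name is made up):

  from katdal.sensordata import dummy_sensor_getter

  filler = dummy_sensor_getter('air_temperature', value=20.0)
  sensor_data = filler.get()   # SensorData with a single data point
  sensor_data.timestamp, sensor_data.value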

katdal.sensordata.remove_duplicates_and_invalid_values(sensor)

Remove duplicate timestamps and invalid values from sensor data.

This sorts the ‘timestamp’ field of the sensor record array and removes any duplicate timestamps, updating the corresponding ‘value’ and ‘status’ fields as well. If the same timestamp occurs more than once, the value and status of the last occurrence are kept. If the values differ for the same timestamp, a warning is logged (and the last one is still picked).

In addition, if there is a ‘status’ field, get rid of data with a status other than ‘nominal’, ‘warn’ or ‘error’, which indicates that the sensor could not be read and the corresponding value will therefore be invalid. Afterwards, remove the ‘status’ field from the data as this is the only place it plays a role.

Parameters:sensor (SensorData object, length N) – Raw sensor dataset.
Returns:clean_sensor – Sensor data with duplicate timestamps and invalid values removed (M <= N), and only ‘timestamp’ and ‘value’ attributes left.
Return type:SensorData object, length M
class katdal.sensordata.SensorCache(cache, timestamps, dump_period, keep=slice(None, None, None), props=None, virtual={}, aliases={}, store=None)

Bases: collections.abc.MutableMapping

Container for sensor data providing name lookup, interpolation and caching.

Sensor data is defined as a one-dimensional time series of values. The values may be numerical or non-numerical (categorical), and the timestamps are monotonically increasing but not necessarily regularly spaced.

A sensor cache stores sensor data with dictionary-like lookup based on the sensor name. Since the extraction of sensor data from e.g. HDF5 files may be costly, the data is first represented in uncached (raw) form as SensorGetter objects, which typically wrap the underlying sensor HDF5 datasets. After extraction, the sensor data are stored either as a NumPy array (for numerical data) or as a CategoricalData object (for non-numerical data).

The sensor cache stores a timestamp array (or indexer) onto which the sensor data will be interpolated, together with a boolean selection mask that selects a subset of the interpolated data as the final output. Interpolation is linear for numerical data and zeroth-order for non-numerical data. Both extraction and selection may be enabled or disabled through the appropriate use of the two main interfaces that retrieve sensor data:

  • The __getitem__ interface (i.e. cache[sensor]) presents a simple high-level interface to the end user that always extracts the sensor data and selects the requested subset from it. In addition, the return type is always a NumPy array.
  • The get() interface (i.e. cache.get(sensor)) is an advanced interface for library builders that provides full control of the extraction process via sensor properties. It does not apply selection by default, as this is more convenient for library routines.

In addition, the sensor cache may contain virtual sensors which calculate their values based on the values of other sensors. They are identified by pattern templates that potentially match multiple sensor names.
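
A minimal sketch that builds a cache around one in-memory sensor (all names and values are made up):

  import numpy as np
  from katdal.sensordata import SensorCache, SimpleSensorGetter

  raw = SimpleSensorGetter('air_temperature',
                           np.array([0.0, 10.0]),    # sensor timestamps
                           np.array([20.0, 25.0]))   # sensor values
  dumps = np.arange(10, dtype=float)                 # correlator dump timestamps
  cache = SensorCache({'air_temperature': raw}, dumps, dump_period=1.0)
  temps = cache['air_temperature']  # extract, interpolate and select -> ndarray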

Parameters:
  • cache (mapping from string to SensorGetter objects) – Initial sensor cache mapping sensor names to raw (uncached) sensor data
  • timestamps (array of float) – Correlator data timestamps onto which sensor values will be interpolated, as UTC seconds since Unix epoch
  • dump_period (float) – Dump period, in seconds
  • keep (int or slice or sequence of int or sequence of bool, optional) – Default time selection specification that will be applied to sensor data (this can be disabled on data retrieval)
  • props (dict, optional) – Default properties that govern how sensor data are interpreted and interpolated (this can be overridden on data retrieval). Can use * as a wildcard anywhere in the key.
  • virtual (dict mapping string to function, optional) – Virtual sensors, specified as a pattern matching the virtual sensor name and a corresponding function that will create the sensor (together with any associated virtual sensors)
  • aliases (dict mapping string to string, optional) – Alternate names for sensors, as a dictionary mapping each alias to the original sensor name suffix. This will create additional sensors with the aliased names and the data of the original sensors.
  • store (string, optional) – Hostname / endpoint of katstore webserver to access additional sensors
add_aliases(alias, original)

Add alternate names / aliases for sensors.

Search for sensors with names ending in the original suffix and form a corresponding alternate name by replacing original with alias. The new aliased sensors will re-use the data of the original sensors.

Parameters:
  • alias (string) – The new sensor name suffix that replaces original
  • original (string) – Sensors with names that end in this will get aliases
get(name, select=False, extract=True, **kwargs)

Sensor values interpolated to correlator data timestamps.

Time selection is disabled by default, as this is a more advanced data extraction method typically called by library routines that want to operate on the full array of sensor values. For additional allowed parameters when extracting categorical data, see the docstring for sensor_to_categorical().

Parameters:
  • name (string) – Sensor name
  • select ({False, True}, optional) – True if preset time selection will be applied to interpolated data
  • extract ({True, False}, optional) – True if sensor data should be extracted, interpolated and cached
  • categorical ({None, True, False}, optional) – Interpret sensor data as categorical or numerical (by default, data of type float is numerical and of any other type is categorical)
  • kwargs (dict, optional) – Additional parameters are passed to sensor_to_categorical()
Returns:

data – If extraction is disabled, this will be a SensorGetter object for uncached sensors. If selection is enabled, this will be a 1-D array of values, one per selected timestamp. If selection is disabled, this will be a 1-D array of values (of the same length as the timestamps attribute) for numerical data, and a CategoricalData object for categorical data.

Return type:

array or CategoricalData or SensorGetter object

Raises:
  • ValueError – If select=True and extract=False, as select requires interpolation
  • KeyError – If sensor name was not found in cache and did not match virtual template
get_with_fallback(sensor_type, names)

Sensor values interpolated to correlator data timestamps.

Get data for a type of sensor that may have one of several names. Try each name in turn until something works, or crash sensibly.

Parameters:
  • sensor_type (string) – Name of sensor class / type, used for informational purposes only
  • names (sequence of strings) – Sensor names to try until one of them provides data
Returns:

sensor_data – Interpolated sensor data as 1-D array, one value per selected timestamp

Return type:

array

Raises:

KeyError – If none of the sensor names were found in the cache
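
A sketch of falling back through several candidate names for the same quantity (the sensor names below are assumptions and will differ per telescope):

    >>> temperature = d.sensor.get_with_fallback(
    ...     'temperature',  # only used in error messages
    ...     ['Enviro/air_temperature', 'Enviro/asc.air.temperature'])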

katdal.spectral_window module

class katdal.spectral_window.SpectralWindow(centre_freq, channel_width, num_chans, product=None, sideband=-1, band='L', bandwidth=None)

Bases: object

Spectral window specification.

A spectral window is determined by the number of frequency channels produced by the correlator and their corresponding centre frequencies, as well as the channel width. The channels are assumed to be regularly spaced and to be the result of either lower-sideband downconversion (channel frequencies decreasing with channel index) or upper-sideband downconversion (frequencies increasing with index). The receiver band and correlator product names are also stored for reference.

Warning

Instances should be treated as immutable. Changing the attributes will lead to inconsistencies between them.

Parameters:
  • centre_freq (float) – Centre frequency of spectral window, in Hz
  • channel_width (float) – Bandwidth of each frequency channel, in Hz
  • num_chans (int) – Number of frequency channels
  • product (string, optional) – Name of data product / correlator mode
  • sideband ({-1, +1}, optional) – Type of downconversion (-1 => lower sideband, +1 => upper sideband)
  • band ({'L', 'UHF', 'S', 'X', 'Ku'}, optional) – Name of receiver / band
  • bandwidth (float, optional) – The bandwidth of the whole spectral window, in Hz. If specified, the given channel_width is ignored and recomputed from the bandwidth; if not specified, the bandwidth is computed from the channel width. Specifying the bandwidth is a good idea if the channel width cannot be represented exactly in floating point.
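
For example, a sketch of an L-band window (the numbers are illustrative rather than taken from a real configuration; channel_width is passed as None since the given bandwidth overrides it, per the parameter description above):

    >>> from katdal.spectral_window import SpectralWindow
    >>> # 4096 channels spanning 856 MHz, centred at 1284 MHz, USB mixing
    >>> spw = SpectralWindow(1284e6, None, 4096, sideband=1, band='L',
    ...                      bandwidth=856e6)
    >>> spw.channel_freqs[:3]  # centre frequencies of the first three channels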
channel_freqs

Centre frequency of each frequency channel (assuming LSB mixing), in Hz

Type: array of float, shape (F,)
subrange(first, last)

Get a new SpectralWindow representing a subset of the channels.

The returned SpectralWindow covers the same frequencies as channels [first, last) of the original.

Raises: IndexError – If [first, last) is not a (non-empty) subinterval of the channels
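
For example, keeping the middle half of the window sketched above:

    >>> inner = spw.subrange(1024, 3072)  # channels [1024, 3072) of spw
    >>> inner.num_chans
    2048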
rechannelise(num_chans)

Get a new SpectralWindow with a different number of channels.

The returned SpectralWindow covers the same frequencies as the original, but divides the bandwidth into a different number of channels.
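
For example, averaging the same band down to fewer, wider channels (a sketch continuing from the window above):

    >>> coarse = spw.rechannelise(1024)  # same span, 1024 wider channels
    >>> coarse.num_chans
    1024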

katdal.visdatav4 module

Data accessor class for data and metadata from various sources in v4 format.

class katdal.visdatav4.VisibilityDataV4(source, ref_ant='', time_offset=0.0, applycal='', gaincal_flux={}, sensor_store=None, preselect=None, **kwargs)

Bases: katdal.dataset.DataSet

Access format version 4 visibility data and metadata.

For more information on attributes, see the DataSet docstring.

Parameters:
  • source (DataSource object) – Correlator data (visibilities, flags and weights) and metadata
  • ref_ant (string, optional) – Name of reference antenna, used to partition the data set into scans, to determine the targets and as the antenna for the data set catalogue (it has no relation to the calibration reference antenna). By default, the observation activity sensor is used for scan partitioning, the CBF target determines the targets and the array reference position serves as the catalogue antenna.
  • time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
  • applycal (string or sequence of strings, optional) – Names of calibration products to apply to vis/weights/flags, as a sequence or a string of comma-separated names. An empty string or sequence means no calibration will be applied (the default for now), while the keyword ‘all’ means all available products will be applied. Note that this is still an experimental feature and the default will probably change to ‘all’ in future.
  • gaincal_flux (dict mapping string to float, optional) – Flux density (in Jy) per gaincal target name, used to flux calibrate the “G” product, overriding the measured flux produced by cal pipeline (if available). A value of None disables flux calibration.
  • sensor_store (string, optional) – Hostname / endpoint of katstore webserver to access additional sensors
  • preselect (dict, optional) – Subset of the data to select. See TelstateDataSource for details. This selection is permanent, and further selections made by DataSet.select() are relative to this subset.
  • kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
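
In practice this class is normally instantiated via katdal.open() (documented below), which passes extra keyword arguments through to it. A sketch with a placeholder filename:

    >>> import katdal
    >>> # Apply all available calibration products to vis / weights / flags
    >>> d = katdal.open('1234567890_sdp_l0.full.rdb', applycal='all')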
timestamps

Visibility timestamps in UTC seconds since Unix epoch.

The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration, aligned with the midpoint of the integration.

vis

Complex visibility data as a function of time, frequency and baseline.

The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. a phase that increases with time.
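
A sketch of the deferred-loading pattern described above, assuming d is an open data set:

    >>> d.select(scans='track', corrprods='cross')  # narrow the view first
    >>> d.vis.shape            # shape of the selection; no data loaded yet
    >>> one_dump = d.vis[0]    # load a single integration
    >>> all_vis = d.vis[:]     # load the full (T, F, B) array (may be large)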

weights

Visibility weights as a function of time, frequency and baseline.

The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

flags

Flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
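
Since vis, weights and flags align element for element, a common pattern is to zero the weight of flagged samples before averaging. A sketch of such a flag-aware average (not a method provided by the library itself):

    >>> import numpy as np
    >>> vis = d.vis[:]
    >>> w = d.weights[:] * np.logical_not(d.flags[:])  # flagged -> weight 0
    >>> # Weighted average over baselines; all-flagged samples come out as NaN
    >>> avg = (w * vis).sum(axis=2) / w.sum(axis=2)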

raw_flags

Raw flags as a function of time, frequency and baseline.

The flags data are returned as an array indexer of uint8, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

excision

Excision as a function of time, frequency and baseline.

The fraction of each visibility that has been excised in the SDP ingest pipeline is returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of timestamps(), the number of frequency channels F matches the length of freqs() and the number of correlation products B matches the length of corr_products(). To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.

temperature

Air temperature in degrees Celsius.

pressure

Barometric pressure in millibars.

humidity

Relative humidity as a percentage.

wind_speed

Wind speed in metres per second.

wind_direction

Wind direction as an azimuth angle in degrees.

Module contents

Data access library for data sets in the MeerKAT Visibility Format (MVF).

katdal.open(filename, ref_ant='', time_offset=0.0, **kwargs)

Open data file(s) with loader of the appropriate version.

Parameters:
  • filename (string or sequence of strings) – Data file name or list of file names
  • ref_ant (string, optional) – Name of reference antenna (default is first antenna in use)
  • time_offset (float, optional) – Offset to add to all timestamps, in seconds
  • kwargs (dict, optional) –

    Extra keyword arguments are passed on to the underlying accessor class:

    • mode (string, optional) – [H5DataV*] File opening mode (e.g. ‘r+’ to open the file in write mode)
    • quicklook (bool, optional) – [H5DataV2] True if synthesised timestamps should be used to partition the data set even if real timestamps are irregular, thereby avoiding the slow loading of real timestamps at the cost of slightly inaccurate label borders

    See the documentation of VisibilityDataV4 for the keywords it accepts.

Returns:

data – Object providing DataSet interface to file(s)

Return type:

DataSet object
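A typical opening sequence (the filename and antenna name are placeholders):

    >>> import katdal
    >>> d = katdal.open('1234567890_sdp_l0.full.rdb', ref_ant='m000')
    >>> print(d)  # summary of targets, scans and spectral windows
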

katdal.get_ants(filename)

Quick look function to get the list of antennas in a data file.

Parameters: filename (string) – Data file name
Returns: antennas – All antennas in the data file
Return type: list of katpoint.Antenna objects
katdal.get_targets(filename)

Quick look function to get the list of targets in a data file.

Parameters: filename (string) – Data file name
Returns: targets – All targets in the data file
Return type: katpoint.Catalogue object
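
These convenience functions give quick access to key metadata. A sketch with a placeholder filename:

    >>> import katdal
    >>> ants = katdal.get_ants('1234567890_sdp_l0.full.rdb')
    >>> [ant.name for ant in ants]
    >>> cat = katdal.get_targets('1234567890_sdp_l0.full.rdb')
    >>> [target.name for target in cat.targets]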