katdal package
Submodules

katdal.applycal module
Utilities for applying calibration solutions to visibilities and weights.
katdal.applycal.complex_interp(x, xi, yi, left=None, right=None)
Piecewise linear interpolation of magnitude and phase of complex values.
Given discrete data points (xi, yi), this returns a 1-D piecewise linear interpolation y evaluated at the x coordinates, similar to numpy.interp(x, xi, yi). While numpy.interp() interpolates the real and imaginary parts of yi separately, this function instead interpolates magnitude and (unwrapped) phase separately. This is useful when the phase of yi changes more rapidly than its magnitude, as in electronic gains.
Parameters: - x (1-D sequence of float, length M) – The x-coordinates at which to evaluate the interpolated values
- xi (1-D sequence of float, length N) – The x-coordinates of the data points, must be sorted in ascending order
- yi (1-D sequence of complex, length N) – The y-coordinates of the data points, same length as xi
- left (complex, optional) – Value to return for x < xi[0], default is yi[0]
- right (complex, optional) – Value to return for x > xi[-1], default is yi[-1]
Returns: y – The evaluated y-coordinates, same length as x and same dtype as yi
Return type: array of complex, length M
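The behaviour described above can be sketched with plain NumPy. This is a minimal illustration of separate magnitude/phase interpolation, not the katdal implementation:

```python
import numpy as np

def complex_interp(x, xi, yi, left=None, right=None):
    """Interpolate magnitude and unwrapped phase of complex samples separately."""
    yi = np.asarray(yi)
    # Interpolate the magnitude as an ordinary real-valued series
    mag = np.interp(x, xi, np.abs(yi),
                    left=None if left is None else abs(left),
                    right=None if right is None else abs(right))
    # Unwrap the phase first so interpolation does not jump across +/- pi
    phase = np.interp(x, xi, np.unwrap(np.angle(yi)),
                      left=None if left is None else np.angle(left),
                      right=None if right is None else np.angle(right))
    return mag * np.exp(1j * phase)
```

For example, halfway between 1+0j and 2j this gives magnitude 1.5 at phase 45 degrees, whereas numpy.interp would give 0.5 + 1j (magnitude ~1.12).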
katdal.applycal.get_cal_product(cache, cal_stream, product_type)
Extract calibration solution from cache as a sensor.
Parameters: - cache (SensorCache object) – Sensor cache serving cal product sensors
- cal_stream (string) – Name of calibration stream (e.g. “l1”)
- product_type (string) – Calibration product type (e.g. “G”)
katdal.applycal.calc_delay_correction(sensor, index, data_freqs)
Calculate correction sensor from delay calibration solution sensor.
Given the delay calibration solution sensor, this extracts the delay time series of the input specified by index (in the form (pol, ant)) and builds a categorical sensor for the corresponding complex correction terms (channelised by data_freqs).
Invalid delays (NaNs) are replaced by zeros, since bandpass calibration still has a shot at fixing any residual delay.
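Assuming the usual linear phase slope exp(2πj·f·τ) produced by a delay τ, the per-channel correction is its reciprocal. A hypothetical sketch of one such correction term (the function name and sign convention are assumptions, not taken from katdal):

```python
import numpy as np

def delay_correction(delay_s, data_freqs_hz):
    """Per-channel correction for a single delay value, in seconds."""
    # Invalid delays are replaced by zero (bandpass cal can absorb the residual)
    delay = 0.0 if np.isnan(delay_s) else delay_s
    # A delay tau imprints a gain exp(2j*pi*f*tau); the correction is its reciprocal
    return np.exp(-2j * np.pi * np.asarray(data_freqs_hz) * delay)
```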
katdal.applycal.calc_bandpass_correction(sensor, index, data_freqs, cal_freqs)
Calculate correction sensor from bandpass calibration solution sensor.
Given the bandpass calibration solution sensor, this extracts the time series of bandpasses (channelised by cal_freqs) for the input specified by index (in the form (pol, ant)) and builds a categorical sensor for the corresponding complex correction terms (channelised by data_freqs).
Invalid solutions (NaNs) are replaced by linear interpolations over frequency (separately for magnitude and phase), as long as some channels have valid solutions.
katdal.applycal.calc_gain_correction(sensor, index, targets=None)
Calculate correction sensor from gain calibration solution sensor.
Given the gain calibration solution sensor, this extracts the time series of gains for the input specified by index (in the form (pol, ant)) and interpolates them over time to get the corresponding complex correction terms. The optional targets parameter is a CategoricalData object, i.e. a sensor indicating the target associated with each dump. The targets can be actual katpoint.Target objects or indices, as long as they uniquely identify the target. If provided, solutions derived from a given target are interpolated only at dumps associated with that target, which is what you want for self-calibration solutions (but not for standard calibration based on gain calibrator sources).
Invalid solutions (NaNs) are replaced by linear interpolations over time (separately for magnitude and phase), as long as some dumps have valid solutions on the appropriate target.
katdal.applycal.calibrate_flux(sensor, targets, gaincal_flux)
Apply flux scale to calibrator gains (aka flux calibration).
Given the gain calibration solution sensor, this identifies the target associated with each set of solutions by looking up the gain events in the targets sensor, and then scales the gains by the inverse square root of the relevant flux if a valid match is found in the gaincal_flux dict. This is equivalent to the final step of the AIPS GETJY and CASA fluxscale tasks.
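The scaling step can be illustrated as follows. The function name and the fall-through behaviour for unknown targets are illustrative assumptions, not katdal's API:

```python
import numpy as np

def apply_flux_scale(gains, target_name, gaincal_flux):
    """Scale gains by the inverse square root of the target's flux density (Jy)."""
    flux = gaincal_flux.get(target_name)
    # Leave gains untouched if no valid flux is known for this target
    if flux is None or not np.isfinite(flux) or flux <= 0:
        return np.asarray(gains)
    return np.asarray(gains) / np.sqrt(flux)
```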
katdal.applycal.add_applycal_sensors(cache, attrs, data_freqs, cal_stream, cal_substreams=None, gaincal_flux={})
Register virtual sensors for one calibration stream.
This operates on a single calibration stream called cal_stream (possibly an alias), which derives from one or more underlying cal streams listed in cal_substreams and has stream attributes in attrs.
The first set of virtual sensors maps all cal products into a unified namespace (template ‘Calibration/Products/cal_stream/{product_type}’). Receptor inputs are mapped to the relevant indices in each calibration product based on the ants and pols found in attrs. A virtual sensor is then registered per product type and per input in the SensorCache cache, with template ‘Calibration/Corrections/cal_stream/{product_type}/{inp}’. The virtual sensor function picks the appropriate correction calculator based on the cal product type, and also uses auxiliary info like the channel frequencies, data_freqs.
Parameters: - cache (SensorCache object) – Sensor cache serving cal product sensors and receiving correction sensors
- attrs (dict-like) – Calibration stream attributes (e.g. a “cal” telstate view)
- data_freqs (array of float, shape (F,)) – Centre frequency of each frequency channel of visibilities, in Hz
- cal_stream (string) – Name of (possibly virtual) calibration stream (e.g. “l1”)
- cal_substreams (sequence of string, optional) – Names of actual underlying calibration streams (e.g. [“cal”]), defaults to [cal_stream] itself
- gaincal_flux (dict mapping string to float, optional) – Flux density (in Jy) per gaincal target name, used to flux calibrate the “G” product, overriding the measured flux stored in attrs (if available). A value of None disables flux calibration.
Returns: cal_freqs – Centre frequency of each frequency channel of calibration stream, in Hz (or None if no sensors were registered)
Return type: 1D array of float, or None
class katdal.applycal.CorrectionParams(inputs, input1_index, input2_index, corrections, channel_maps)
Bases: object
Data needed to compute corrections in calc_correction_per_corrprod().
Once constructed, the data in this class must not be modified, as it will be baked into dask graphs.
Parameters: - inputs (list of str) – Names of inputs, in the same order as the input axis of products
- input1_index, input2_index (array of int) – Indices into inputs of the first and second items of each correlation product
- corrections (dict) – A dictionary (indexed by cal product name) of lists (indexed by input) of sequences (indexed by dump) of numpy arrays, with corrections to apply
- channel_maps (dict) – A dictionary (indexed by cal product name) of functions (signature g = channel_map(g, channels)) that map the frequency axis of the cal product g onto the frequency axis of the visibility data, where the vis frequency axis will be indexed by the slice channels
katdal.applycal.calc_correction_per_corrprod(dump, channels, params)
Gain correction per channel per correlation product for a given dump.
This calculates an array of complex gain correction terms of shape (n_chans, n_corrprods) that can be directly applied to visibility data. It incorporates all requested calibration products at the specified dump and channels.
Parameters: - dump (int) – Dump index (applicable to full data set, i.e. absolute)
- channels (slice) – Channel indices (applicable to full data set, i.e. absolute)
- params (CorrectionParams) – Corrections per input, together with correlation product indices
Returns: gains – Gain corrections per channel per correlation product
Return type: array of complex64, shape (n_chans, n_corrprods)
Raises: KeyError – If input and/or cal product has no associated correction
katdal.applycal.calc_correction(chunks, cache, corrprods, cal_products, data_freqs, all_cal_freqs, skip_missing_products=False)
Create a dask array containing applycal corrections.
Parameters: - chunks (tuple of tuple of int) – Chunking scheme of the resulting array, in normalized form (see dask.array.core.normalize_chunks())
- cache (SensorCache object) – Sensor cache, used to look up individual correction sensors
- corrprods (sequence of (string, string)) – Selected correlation products as pairs of correlator input labels
- cal_products (sequence of string) – Calibration products that will contribute to corrections (e.g. [“l1.G”])
- data_freqs (array of float, shape (F,)) – Centre frequency of each frequency channel of visibilities, in Hz
- all_cal_freqs (dict) – Dictionary mapping cal stream name (e.g. “l1”) to array of associated frequencies
- skip_missing_products (bool) – If True, skip products with missing sensors instead of raising KeyError
Returns: - final_cal_products (list of string) – List of calibration products in the order that they will be applied (potentially a subset of cal_products if skipping missing products)
- corrections (dask.array.Array object, or None) – Dask array that produces corrections for entire vis array, or None if no calibration products were found (either cal_products is empty, or all products had some missing sensors and skip_missing_products is True)
Raises: KeyError – If a correction sensor for a given input and cal product is not found (and skip_missing_products is False)
katdal.applycal.apply_vis_correction
Clean up and apply correction to visibility data in data.

katdal.applycal.apply_weights_correction
Clean up and apply correction to weight data in data.

katdal.applycal.apply_flags_correction
Set POSTPROC flag wherever correction is invalid.
katdal.averager module
katdal.averager.average_visibilities(vis, weight, flag, timestamps, channel_freqs, timeav=10, chanav=8, flagav=False)
Average visibilities, flags and weights.
Visibilities are weight-averaged using the weights in the weight array, with flagged data given zero weight. The averaged weights are the sum of the input weights for each average block. An averaged datum is flagged only if all of the data in its averaging block is flagged (the averaged visibility in this case is the unweighted average of the input visibilities). In cases where the averaging size in channel or time does not evenly divide the size of the input data, the remaining channels or timestamps at the end of the array after averaging are discarded. Channels are averaged first, timestamps second. Arrays of timestamps and frequencies corresponding to the averaged data are also directly averaged and returned.
Parameters: - vis (array(numtimestamps, numchannels, numbaselines) of complex64) – The input visibilities to be averaged
- weight (array(numtimestamps, numchannels, numbaselines) of float32) – The input weights (used for weighted averaging)
- flag (array(numtimestamps, numchannels, numbaselines) of boolean) – Input flags (flagged data have weight zero before averaging)
- timestamps (array(numtimestamps) of int) – The timestamps (in mjd seconds) corresponding to the input data
- channel_freqs (array(numchannels) of int) – The frequencies (in Hz) corresponding to the input channels
- timeav (int) – The desired averaging size in timestamps
- chanav (int) – The desired averaging size in channels
- flagav (bool) – If True, flag averaged data when there is even a single flag in the averaging bin; if False, only flag averaged data when all data in the bin are flagged
Returns: - av_vis (array(int(numtimestamps/timeav), int(numchannels/chanav)) of complex64)
- av_weight (array(int(numtimestamps/timeav), int(numchannels/chanav)) of float32)
- av_flag (array(int(numtimestamps/timeav), int(numchannels/chanav)) of boolean)
- av_mjd (array(int(numtimestamps/timeav)) of int)
- av_freq (array(int(numchannels/chanav)) of int)
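The weighted block-averaging rule described above (flagged data get zero weight; fully flagged bins fall back to an unweighted mean) can be sketched for a single axis. `weighted_block_average` is an illustrative helper, not the library function, which applies this pass to channels first and then to timestamps:

```python
import numpy as np

def weighted_block_average(vis, weight, flag, av=8, axis=1):
    """Weighted average of vis in blocks of `av` along `axis` (single-pass sketch)."""
    n = (vis.shape[axis] // av) * av          # discard trailing remainder
    take = [slice(None)] * vis.ndim
    take[axis] = slice(0, n)
    v, w, f = vis[tuple(take)], weight[tuple(take)], flag[tuple(take)]
    # Reshape the averaging axis into (blocks, av)
    shape = list(v.shape)
    shape[axis:axis + 1] = [n // av, av]
    v, w, f = v.reshape(shape), w.reshape(shape), f.reshape(shape)
    w = np.where(f, 0.0, w)                   # flagged data get zero weight
    wsum = w.sum(axis=axis + 1)               # averaged weight = sum of inputs
    av_flag = f.all(axis=axis + 1)            # flag only fully flagged blocks
    # Weighted average; fall back to unweighted mean where everything is flagged
    av_vis = np.where(wsum > 0,
                      (w * v).sum(axis=axis + 1) / np.where(wsum > 0, wsum, 1.0),
                      v.mean(axis=axis + 1))
    return av_vis, wsum, av_flag
```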
katdal.categorical module
Container for categorical (i.e. non-numerical) sensor data and related tools.

class katdal.categorical.ComparableArrayWrapper(value)
Bases: object
Wrapper that improves comparison of array objects.
This wrapper class has two main benefits:
- It prevents sensor values that are NumPy ndarrays themselves (or array-like objects such as tuples and lists) from dissolving and losing their identity when they are assembled into an array.
- It ensures that array-valued sensor values become properly comparable (avoiding array-valued booleans resulting from standard comparisons).
The former is needed because SensorGetter data is treated as a structured array even if it contains object values. The latter is needed because the equality operator crops up in hard-to-reach places like inside list.index().
Parameters: value (object) – The sensor value to be wrapped

static unwrap(v)
Unwrap value if needed.
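The idea behind the wrapper can be sketched as follows. The attribute name and the use of numpy.array_equal are illustrative assumptions, not katdal's implementation:

```python
import numpy as np

class ComparableArrayWrapper:
    """Sketch of a wrapper making array-valued sensor values comparable.

    Equality returns a single bool instead of an element-wise boolean array,
    so the wrapper works in places like list.index().
    """
    def __init__(self, value):
        self.unwrapped = value

    def __eq__(self, other):
        other = ComparableArrayWrapper.unwrap(other)
        # array_equal yields one bool, even for array-valued operands
        return np.array_equal(np.asarray(self.unwrapped), np.asarray(other))

    @staticmethod
    def unwrap(v):
        """Unwrap value if needed."""
        return v.unwrapped if isinstance(v, ComparableArrayWrapper) else v
```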
katdal.categorical.infer_dtype(values)
Figure out dtype of sequence of sensor values.
The common dtype is determined by explicit NumPy promotion. If the values are array-like themselves, they are treated as opaque objects to simplify sensor processing. If the sequence is empty, the dtype is unknown and set to None. In addition, objects that already have a dtype attribute short-circuit to that dtype, to simplify calling this on a mixed collection of sensor data.
Parameters: values (sequence, or object with dtype) – Sequence of sensor values (typically a list), or a sensor data object with a dtype attribute (like ndarray or SensorGetter)
Returns: dtype – Inferred dtype, or None if values is an empty sequence
Return type: numpy.dtype object or None
Notes
This is almost, but not quite, entirely like numpy.result_type(). The differences are that this accepts generic objects in the sequence, treats ndarrays as objects regardless of their underlying dtype, supports a dtype of None and short-circuits the check if the sequence itself is an object with a dtype. In addition, this accepts the sequence as the first parameter, as opposed to having it unpacked across the argument list.
katdal.categorical.unique_in_order(elements, return_inverse=False)
Extract unique elements from elements while preserving original order.
Parameters: - elements (sequence) – Sequence of equality-comparable objects
- return_inverse ({False, True}, optional) – If True, also return a sequence of indices that can be used to reconstruct the original elements sequence via [unique_elements[i] for i in inverse]
Returns: - unique_elements (list) – List of unique objects, in the original order they were found in elements
- inverse (array of int, optional) – If return_inverse is True, sequence of indices that can be used to reconstruct the original sequence
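A straightforward sketch of this behaviour (not the katdal implementation, which may differ):

```python
import numpy as np

def unique_in_order(elements, return_inverse=False):
    """Return unique elements in first-seen order, optionally with inverse indices."""
    unique_elements, inverse = [], []
    for element in elements:
        try:
            index = unique_elements.index(element)   # already seen?
        except ValueError:
            index = len(unique_elements)             # no: record a new unique value
            unique_elements.append(element)
        inverse.append(index)
    if return_inverse:
        return unique_elements, np.array(inverse)
    return unique_elements
```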
class katdal.categorical.CategoricalData(sensor_values, events)
Bases: object
Container for categorical (i.e. non-numerical) sensor data.
This container allows simple manipulation and interpolation of a time series of non-numerical data represented as discrete events. The data is stored as a list of sensor values and two integer arrays:
- unique_values stores one copy of each unique object in the data series
- events stores the time indices (dumps) where each event occurs
- indices stores indices linking each event to the unique_values list
The __getitem__ interface (i.e. data[dump]) returns the data associated with the last event before the requested dump(s), in effect doing a zeroth-order interpolation of the data at each event. Events can be added, removed and realigned, and the container can be split along the time axis, amongst other functionality.
Parameters: - sensor_values (sequence, length N) – Sequence of sensor values (of any type, preferably not None [see Notes])
- events (sequence of non-negative ints, length N + 1) – Corresponding monotonic sequence of dump indices where each sensor value came into effect. The last event is one past the last dump where the final sensor value applied, and therefore equal to the total number of dumps for which sensor values were specified.
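The zeroth-order lookup performed by __getitem__ can be emulated with numpy.searchsorted; the sensor values below are made up for illustration:

```python
import numpy as np

# Each dump gets the value of the last event at or before it (zeroth-order hold).
values = ['slew', 'track', 'slew']   # sensor_values, length N = 3
events = np.array([0, 3, 8, 10])     # event dumps, length N + 1

def value_at(dump):
    # Index of the last event dump <= dump
    return values[np.searchsorted(events[:-1], dump, side='right') - 1]
```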
unique_values
List of unique sensor values in the order they were found in sensor_values, with any ComparableArrayWrapper objects unwrapped
Type: list, length M

indices
Array of indices into unique_values, one per sensor event
Type: array of int, shape (N,)

dtype
Sensor data type as NumPy dtype (found on demand from unique_values)
Type: numpy.dtype object

Notes
Any object values wrapped in a ComparableArrayWrapper will be unwrapped before being added to unique_values. When adding, removing and comparing values to this container, any object values will be wrapped again temporarily to ensure proper comparisons.
It is discouraged to have a sensor value of None, as this value is given a special meaning in methods such as CategoricalData.add() and sensor_to_categorical(). On the other hand, it is the most sensible dummy object value, and any Nones entering through this initialiser will probably not cause any issues.
It is better to make unique_values a list instead of an array, because an array assimilates objects such as tuples, lists and other arrays. The alternative is an array of ComparableArrayWrapper objects, but these then need to be unpacked at some later stage, which is also tricky.

dtype
Sensor value type.
segments()
Generator that iterates through events and returns segment and value.
Yields: - segment (slice object) – The slice representing the range of dump indices of the current segment
- value (object) – Sensor value associated with segment

add(event, value=None)
Add or override sensor event.
This adds a new event to the container, with a new value or a duplicate of the existing value at that dump. If the new event coincides with an existing one, it overrides the value at that dump.
Parameters: - event (int) – Dump of event to add or override
- value (object, optional) – New value for event (duplicates current value at this dump by default)

remove(value)
Remove sensor value, remapping indices and merging segments in the process.
If the sensor value does not exist, do nothing.
Parameters: value (object) – Sensor value to remove from container

add_unmatched(segments, match_dist=1)
Add duplicate events for segment starts that don’t match sensor events.
Given a sequence of segments, this matches each segment start to the nearest sensor event dump (within match_dist). Any unmatched segment starts are added as duplicate sensor events (or ignored if they fall outside the sensor event range).
Parameters: - segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
- match_dist (int, optional) – Maximum distance in dumps that signifies a match between events

align(segments)
Align sensor events with segment starts, possibly discarding events.
Given a sequence of segments, this moves each sensor event dump onto the nearest segment start. If more than one event ends up in the same segment, only the last event is kept and the rest are discarded.
The end result is that the sensor event dumps become a subset of the segment starts and there cannot be more sensor events than segments.
Parameters: segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment

partition(segments)
Partition dataset into multiple sets along the time axis.
Given a sequence of segments, this splits the container into a sequence of containers, one per segment. Each container contains only the events occurring within its corresponding segment, with event dumps relative to the start of the segment, and the containers share the same unique values.
Parameters: segments (sequence of int) – Monotonically increasing sequence of segment starts, including an extra element at the end that is one past the end of the last segment
Returns: split_data – Resulting multiple datasets in chronological order
Return type: sequence of CategoricalData objects

remove_repeats()
Remove repeated events of the same value.
katdal.categorical.concatenate_categorical(split_data, **kwargs)
Concatenate multiple categorical datasets into one along the time axis.
This joins a sequence of categorical datasets together by forming a common set of unique values, remapping events to these, and incrementing the event dumps of each dataset to start off where the previous dataset ended.
Parameters: split_data (sequence of CategoricalData objects) – Sequence of containers to concatenate
Returns: data – Concatenated dataset
Return type: CategoricalData object
katdal.categorical.sensor_to_categorical(sensor_timestamps, sensor_values, dump_midtimes, dump_period, transform=None, initial_value=None, greedy_values=None, allow_repeats=False, **kwargs)
Align categorical sensor events with dumps and clean up spurious events.
This converts timestamped sensor data into a categorical dataset by comparing the sensor timestamps to a series of dump timestamps and assigning each sensor event to the dump in which it occurred. When multiple sensor events happen in the same dump, only the last one is kept. The first dump is guaranteed to have a valid value, either by using the supplied initial_value or by extrapolating the first proper value back in time. The sensor data may be transformed before events that repeat values are potentially discarded. Finally, events with values marked as “greedy” take precedence over normal events when both occur within the same dump (either changing from or to the greedy value, or if the greedy value occurs completely within a dump).
XXX Future improvements include picking the event with the longest duration within a dump as opposed to the final event, and “snapping” event boundaries to dump boundaries with a given tolerance (e.g. 5-10% of dump period).
Parameters: - sensor_timestamps (sequence of float, length M) – Sequence of sensor timestamps (typically UTC seconds since Unix epoch)
- sensor_values (sequence, length M) – Corresponding sequence of sensor values [potentially wrapped]
- dump_midtimes (sequence of float, length N) – Sequence of dump midtimes (same reference as sensor timestamps)
- dump_period (float) – Duration of each dump, in seconds
- transform (callable or None, optional) – Transform [unwrapped] sensor values before fixing initial value, mapping dumps to events and discarding repeats
- initial_value (object or None, optional) – Sensor value [transformed, unwrapped] to use for dump = 0 up to first proper event (force first proper event to start at dump = 0 by default)
- greedy_values (sequence or None, optional) – List of [transformed, unwrapped] sensor values considered “greedy”
- allow_repeats ({False, True}, optional) – If False, discard sensor events that do not change [transformed] value
Returns: data – Constructed categorical dataset [unwraps any wrapped values]
Return type: CategoricalData object
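The core alignment step (assigning each sensor event to the dump containing it and keeping the last event per dump) can be sketched as follows. This ignores transforms, initial values and greedy values, and is an illustration rather than the katdal implementation:

```python
import numpy as np

def events_to_dumps(sensor_timestamps, sensor_values, dump_midtimes, dump_period):
    """Map each sensor event to a dump index, keeping the last event per dump."""
    dump_starts = np.asarray(dump_midtimes) - 0.5 * dump_period
    # Dump index in which each sensor event occurred
    dumps = np.searchsorted(dump_starts, sensor_timestamps, side='right') - 1
    per_dump = {}
    for dump, value in zip(dumps, sensor_values):
        if 0 <= dump < len(dump_midtimes):
            per_dump[dump] = value            # later events override earlier ones
    return per_dump
```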
katdal.chunkstore module
Base class for accessing a store of chunks (i.e. N-dimensional arrays).

exception katdal.chunkstore.ChunkStoreError
Bases: Exception
Base class for all standard ChunkStore errors.

exception katdal.chunkstore.StoreUnavailable
Bases: OSError, katdal.chunkstore.ChunkStoreError
Could not access underlying storage medium (offline, auth failed, etc.).

exception katdal.chunkstore.ChunkNotFound
Bases: KeyError, katdal.chunkstore.ChunkStoreError
The store was accessible, but a chunk with the given name was not found.

exception katdal.chunkstore.BadChunk
Bases: ValueError, katdal.chunkstore.ChunkStoreError
The chunk is malformed, e.g. bad dtype or actual shape differing from the one requested.
class katdal.chunkstore.PlaceholderChunk(shape, dtype)
Bases: object
Chunk returned to indicate missing data.
katdal.chunkstore.generate_chunks(shape, dtype, max_chunk_size, dims_to_split=None, power_of_two=False, max_dim_elements=None)
Generate dask chunk specification from ndarray parameters.
Parameters: - shape (sequence of int) – Array shape
- dtype (numpy.dtype object or equivalent) – Array data type
- max_chunk_size (float or int) – Upper limit on chunk size (if allowed by dims_to_split), in bytes
- dims_to_split (sequence of int, optional) – Indices of dimensions that may be split into chunks (default is all dims)
- power_of_two (bool, optional) – True if chunk size should be rounded down to a power of two (the last chunk size along each dimension will potentially be smaller)
- max_dim_elements (dict, optional) – Maximum number of elements on each dimension (each key is a dimension index). Dimensions that are not in dims_to_split are ignored.
Returns: chunks – Dask chunk specification, indicating chunk sizes along each dimension
Return type: tuple of tuple of int
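A deliberately simplified sketch of the idea (split one dimension until each chunk fits a byte budget); the real function's algorithm and its dims_to_split, power_of_two and max_dim_elements options are richer than this:

```python
import numpy as np

def simple_chunks(shape, dtype, max_chunk_size):
    """Toy chunk-spec generator: halve the first dimension's chunk length
    until each chunk fits in max_chunk_size bytes. Other dims stay whole."""
    itemsize = np.dtype(dtype).itemsize
    row_bytes = itemsize * int(np.prod(shape[1:]))
    step = shape[0]
    while step > 1 and step * row_bytes > max_chunk_size:
        step = (step + 1) // 2
    # Last chunk along the split dimension may be smaller
    first = tuple(min(step, shape[0] - i) for i in range(0, shape[0], step))
    return (first,) + tuple((n,) for n in shape[1:])
```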
katdal.chunkstore.npy_header_and_body(chunk)
Prepare a chunk for low-level writing.
Returns the .npy header and a view of the chunk corresponding to that header. The two should be concatenated (as buffer objects) to form a valid .npy file.
This is useful for high-performance code, as it allows a chunk to be encoded as a .npy file more efficiently than saving to an io.BytesIO instance.
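A sketch of how such a header/body pair could be produced with NumPy's public numpy.lib.format helpers. This mirrors the description above; katdal's actual implementation may differ:

```python
import io
import numpy as np
import numpy.lib.format as fmt

def npy_header_and_body_sketch(chunk):
    """Return (.npy header bytes, raw chunk buffer); concatenating the two
    yields a valid .npy file."""
    chunk = np.ascontiguousarray(chunk)       # .npy body is C-contiguous data
    fp = io.BytesIO()
    # Writes the magic string, version and header dict for this array
    fmt.write_array_header_1_0(fp, fmt.header_data_from_array_1_0(chunk))
    return fp.getvalue(), chunk.data          # header bytes + chunk buffer view
```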
class katdal.chunkstore.ChunkStore(error_map=None)
Bases: object
Base class for accessing a store of chunks (i.e. N-dimensional arrays).
A chunk is a simple (i.e. unit-stride) slice of an N-dimensional array known as its parent array. The array is identified by a string name, while the chunk within the array is identified by a sequence of slice objects which may be used to extract the chunk from the array. The array is a numpy.ndarray object with an associated dtype.
The basic idea is that the chunk store contains multiple arrays addressed by name. The list of available arrays and all array metadata (shape, chunks and dtype) are stored elsewhere. The metadata is used to identify chunks, while the chunk store takes care of storing and retrieving bytestrings of actual chunk data. These are packaged back into NumPy arrays for the user. Each array can only be stored once, with a unique chunking scheme (i.e. different chunking of the same data is disallowed).
The naming scheme for arrays and chunks is reasonably generic but has some restrictions:
- Names are treated like paths with components and a standard separator
- The chunk name is formed by appending a string of indices to the array name
- It is discouraged to have an array name that is a prefix of another name
Each chunk store has its own restrictions on valid characters in names: some treat names as URLs while others treat them as filenames. A safe choice for name components should be the valid characters for S3 buckets (also including underscores for non-bucket components):
VALID_BUCKET = re.compile(r'^[a-z0-9][a-z0-9.-]{2,62}$')
Parameters: error_map (dict mapping Exception to Exception, optional) – Dict that maps store-specific errors to standard ChunkStore errors
get_chunk(array_name, slices, dtype)
Get chunk from the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- dtype (numpy.dtype object or equivalent) – Data type of array x
Returns: chunk – Chunk as ndarray with dtype dtype and shape dictated by slices
Return type: numpy.ndarray object
Raises: - TypeError – If slices is not a sequence of slice(start, stop, 1) objects
- chunkstore.BadChunk – If requested dtype does not match underlying parent array dtype, or stored buffer has wrong size / shape compared to slices
- chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
- chunkstore.ChunkNotFound – If requested chunk was not found in store
get_chunk_or_default(array_name, slices, dtype, default_value=0)
Get chunk from the store, but return a default value if it is missing.

get_chunk_or_placeholder(array_name, slices, dtype)
Get chunk from the store, but return a PlaceholderChunk if it is missing.

create_array(array_name)
Create a new array if it does not already exist.
Parameters: array_name (string) – Identifier of array
Raises: chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
put_chunk(array_name, slices, chunk)
Put chunk into the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (numpy.ndarray object) – Chunk as ndarray with shape commensurate with slices
Raises: - TypeError – If slices is not a sequence of slice(start, stop, 1) objects
- chunkstore.BadChunk – If the shape implied by slices does not match that of chunk
- chunkstore.StoreUnavailable – If interaction with chunk store failed (offline, bad auth, bad config)
- chunkstore.ChunkNotFound – If array_name is incompatible with store

put_chunk_noraise(array_name, slices, chunk)
Put chunk into the store, but return any exceptions instead of raising them.
mark_complete(array_name)
Write a special object to indicate that array_name is finished.
This operation is idempotent.
The array_name need not correspond to any array written with put_chunk(). This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.
It is not necessary to call create_array() first; the implementation will do so if appropriate.
The presence of this marker can be checked with is_complete().

is_complete(array_name)
Check whether mark_complete() has been called for this array.
NAME_SEP = '/'

NAME_INDEX_WIDTH = 5

classmethod join(*names)
Join components of chunk name with the supported separator.

classmethod split(name, maxsplit=-1)
Split chunk name into components based on the supported separator.

classmethod chunk_id_str(slices)
Chunk identifier in string form (e.g. '00012_01024_00000').
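The example identifier suggests each slice's start index is zero-padded to NAME_INDEX_WIDTH (5) digits and joined with underscores. A hypothetical reconstruction mirroring the documented example, not the actual katdal code:

```python
# Chunk covering rows 12-95, channels 1024-2047, baselines 0-3 of its parent array
slices = (slice(12, 96), slice(1024, 2048), slice(0, 4))
# Zero-pad each slice's start index to 5 digits and join with underscores
chunk_id = '_'.join('{:05d}'.format(s.start) for s in slices)
print(chunk_id)  # 00012_01024_00000
```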
classmethod chunk_metadata(array_name, slices, chunk=None, dtype=None)
Turn array name and chunk identifier into chunk name and shape.
Form the full chunk name from array_name and slices and extract the chunk shape from slices, validating it in the process. If chunk or dtype is given, check that chunk is commensurate with slices and that dtype contains no objects, which would cause nasty segfaults.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (numpy.ndarray object, optional) – Actual chunk data as ndarray (used to validate shape / dtype)
- dtype (numpy.dtype object or equivalent, optional) – Data type of array x (used for validation only)
Returns: - chunk_name (string) – Full chunk name used to find chunk in underlying storage medium
- shape (tuple of int) – Chunk shape tuple associated with slices
Raises: - TypeError – If slices is not a sequence of slice(start, stop, 1) objects
- chunkstore.BadChunk – If the shape implied by slices does not match that of chunk, or any dtype contains objects
-
get_dask_array
(array_name, chunks, dtype, offset=(), index=(), errors=0)¶ Get dask array from the store.
Handling of missing chunks is determined by the errors argument.
Parameters: - array_name (string) – Identifier of array in chunk store
- chunks (tuple of tuples of ints) – Chunk specification
- dtype (
numpy.dtype
object or equivalent) – Data type of array - offset (tuple of int, optional) – Offset to add to each dimension when addressing chunks in store
- errors (number or 'raise' or 'placeholder', optional) –
Error handling. If ‘raise’, exceptions are passed through, causing the evaluation to fail.
If ‘placeholder’, returns instances of
PlaceholderChunk
in place of missing chunks. Note that such an array cannot be used as-is, because an ndarray is expected, but it can be used as raw material for building new graphs via functions likeda.map_blocks()
. If a numeric value, it is used as a default value for missing chunks.
Returns: array – Dask array of given dtype
Return type: dask.array.Array
object
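The chunks argument follows dask's convention of a tuple of per-dimension chunk-size tuples. A hypothetical helper (not part of katdal) makes the mapping from that spec, plus an optional offset, to the per-chunk slices concrete:

```python
# Illustrative expansion of a dask-style chunk spec into chunk slices.
# `chunks` is a tuple of tuples of ints giving chunk sizes per dimension.
import itertools

def chunk_slices(chunks, offset=()):
    offset = offset if offset else (0,) * len(chunks)
    per_dim = []
    for sizes, off in zip(chunks, offset):
        # Cumulative start positions along this dimension, shifted by offset
        starts = [off]
        for size in sizes[:-1]:
            starts.append(starts[-1] + size)
        per_dim.append([slice(s, s + size) for s, size in zip(starts, sizes)])
    # Cartesian product over dimensions yields one slice tuple per chunk
    return list(itertools.product(*per_dim))

# A (4, 6) array split into (2, 2) chunks along axis 0 and (3, 3) along axis 1
slices = chunk_slices(((2, 2), (3, 3)))
# 4 chunks, the first being (slice(0, 2), slice(0, 3))
```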
-
put_dask_array
(array_name, array, offset=())¶ Put dask array into the store.
Parameters: - array_name (string) – Identifier of array in chunk store
- array (
dask.array.Array
object) – Dask input array - offset (tuple of int, optional) – Offset to add to each dimension when addressing chunks in store
Returns: success – Dask array of objects indicating success of transfer of each chunk (None indicates success, otherwise there is an exception object)
Return type: dask.array.Array
object
katdal.chunkstore_dict module¶
A store of chunks (i.e. N-dimensional arrays) based on a dict of arrays.
-
class
katdal.chunkstore_dict.
DictChunkStore
(**kwargs)¶ Bases:
katdal.chunkstore.ChunkStore
A store of chunks (i.e. N-dimensional arrays) based on a dict of arrays.
This interprets all keyword arguments as NumPy arrays and stores them in an arrays dict. Each array is identified by its corresponding keyword. New arrays cannot be added via
put()
- they all need to be in place at store initialisation (or can be added afterwards via direct insertion into the arrays dict). The put method is only useful for in-place modification of existing arrays.-
get_chunk
(array_name, slices, dtype)¶ Get chunk from the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- dtype (
numpy.dtype
object or equivalent) – Data type of array x
Returns: chunk – Chunk as ndarray with dtype dtype and shape dictated by slices
Return type: numpy.ndarray
object Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objects chunkstore.BadChunk
– If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config) chunkstore.ChunkNotFound
– If requested chunk was not found in store
-
create_array
(array_name)¶ Create a new array if it does not already exist.
Parameters: array_name (string) – Identifier of array Raises: chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config)
-
put_chunk
(array_name, slices, chunk)¶ Put chunk into the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (
numpy.ndarray
object) – Chunk as ndarray with shape commensurate with slices
Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objects chunkstore.BadChunk
– If the shape implied by slices does not match that of chunk chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config) chunkstore.ChunkNotFound
– If array_name is incompatible with store
-
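The behaviour described above can be sketched with a toy dict-backed store. This is a simplified stand-in for DictChunkStore (assuming NumPy is available) and omits all of its error checking:

```python
# Toy dict-backed chunk store (illustrative, not the real DictChunkStore):
# arrays are supplied up front, and chunks are read and written through
# slice tuples, mirroring the get_chunk / put_chunk interface.
import numpy as np

class TinyDictStore:
    def __init__(self, **arrays):
        self.arrays = dict(arrays)

    def get_chunk(self, array_name, slices, dtype):
        chunk = self.arrays[array_name][slices]
        if chunk.dtype != np.dtype(dtype):
            raise ValueError('dtype mismatch for %r' % array_name)
        return chunk

    def put_chunk(self, array_name, slices, chunk):
        # In-place modification of an existing array, as described above
        self.arrays[array_name][slices] = chunk

store = TinyDictStore(x=np.zeros((4, 4)))
store.put_chunk('x', (slice(0, 2), slice(0, 2)), np.ones((2, 2)))
chunk = store.get_chunk('x', (slice(0, 2), slice(0, 2)), float)
```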
katdal.chunkstore_npy module¶
A store of chunks (i.e. N-dimensional arrays) based on NPY files.
-
class
katdal.chunkstore_npy.
NpyFileChunkStore
(path, direct_write=False)¶ Bases:
katdal.chunkstore.ChunkStore
A store of chunks (i.e. N-dimensional arrays) based on NPY files.
Each chunk is stored in a separate binary file in NumPy
.npy
format. The filename is constructed as “<path>/<array>/<idx>.npy”, where “<path>” is the chunk store directory specified on construction, “<array>” is the name of the parent array of the chunk and “<idx>” is the index string of each chunk (e.g. “00001_00512”).
For a description of the
.npy
format, seenumpy.lib.format
or the relevant NumPy Enhancement Proposal. Parameters: - path (string) – Top-level directory that contains NPY files of chunk store
- direct_write (bool) – If true, use
O_DIRECT
when writing the file. This bypasses the OS page cache, which can be useful to avoid filling it up with files that won’t be read again.
Raises: chunkstore.StoreUnavailable
– If path does not exist / is not readable chunkstore.StoreUnavailable
– If direct_write was requested but is not available
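The filename convention can be illustrated with a small hypothetical helper (the zero-padded index format is inferred from the “00001_00512” example above):

```python
# Sketch of the "<path>/<array>/<idx>.npy" convention described above.
# The helper name and padding width are illustrative, not katdal's own code.

def npy_chunk_filename(path, array_name, slices):
    idx = '_'.join('%05d' % s.start for s in slices)
    return '{}/{}/{}.npy'.format(path, array_name, idx)

fname = npy_chunk_filename('/data/store', 'correlator_data',
                           (slice(1, 3), slice(512, 1024)))
# '/data/store/correlator_data/00001_00512.npy'
```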
-
get_chunk
(array_name, slices, dtype)¶ Get chunk from the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- dtype (
numpy.dtype
object or equivalent) – Data type of array x
Returns: chunk – Chunk as ndarray with dtype dtype and shape dictated by slices
Return type: numpy.ndarray
object Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objects chunkstore.BadChunk
– If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config) chunkstore.ChunkNotFound
– If requested chunk was not found in store
-
create_array
(array_name)¶ See the docstring of
ChunkStore.create_array()
.
-
put_chunk
(array_name, slices, chunk)¶ Put chunk into the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (
numpy.ndarray
object) – Chunk as ndarray with shape commensurate with slices
Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objects chunkstore.BadChunk
– If the shape implied by slices does not match that of chunk chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config) chunkstore.ChunkNotFound
– If array_name is incompatible with store
-
mark_complete
(array_name)¶ Write a special object to indicate that array_name is finished.
This operation is idempotent.
The array_name need not correspond to any array written with
put_chunk()
. This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.It is not necessary to call
create_array()
first; the implementation will do so if appropriate.The presence of this marker can be checked with
is_complete()
.
-
is_complete
(array_name)¶ Check whether
mark_complete()
has been called for this array.
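The completion-marker protocol of mark_complete() and is_complete() can be sketched with plain files; this is an illustrative pattern, not NpyFileChunkStore's actual implementation:

```python
# Sketch of a file-based completion marker: writing an empty sentinel whose
# mere presence tells a consumer that no more data is coming for the array.
import os
import tempfile

MARKER = 'complete'   # hypothetical sentinel filename

def mark_complete(store_path, array_name):
    array_dir = os.path.join(store_path, array_name)
    os.makedirs(array_dir, exist_ok=True)   # create the array dir if needed
    with open(os.path.join(array_dir, MARKER), 'w'):
        pass                                # idempotent: rewriting is harmless

def is_complete(store_path, array_name):
    return os.path.isfile(os.path.join(store_path, array_name, MARKER))

store = tempfile.mkdtemp()
mark_complete(store, 'x')
mark_complete(store, 'x')                   # calling twice is fine
```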
katdal.chunkstore_s3 module¶
A store of chunks (i.e. N-dimensional arrays) based on the Amazon S3 API.
-
exception
katdal.chunkstore_s3.
S3ObjectNotFound
¶ Bases:
katdal.chunkstore.ChunkNotFound
An object / bucket was not found in S3 object store.
-
exception
katdal.chunkstore_s3.
S3ServerGlitch
¶ Bases:
katdal.chunkstore.ChunkNotFound
S3 chunk store responded with an HTTP error deemed to be temporary.
-
katdal.chunkstore_s3.
read_array
(fp)¶ Read a numpy array in npy format from a file descriptor.
This is the same concept as
numpy.lib.format.read_array()
, but optimised for the case of reading fromhttp.client.HTTPResponse
. The numpy function reads pieces out and then copies them into the array, whereas this implementation uses readinto to fill the array buffer directly. Raises TruncatedRead
if the response runs out of data before the array is complete. It does not allow pickled dtypes.
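The readinto approach can be sketched as follows, with io.BytesIO standing in for http.client.HTTPResponse. The helper name and the local TruncatedRead class are illustrative, not katdal's own:

```python
# Fill a pre-allocated buffer directly from a file-like object via
# readinto, instead of reading intermediate bytes objects and copying.
import io

class TruncatedRead(Exception):
    """The stream ran out of data before the buffer was full."""

def read_exactly(fp, nbytes):
    buf = bytearray(nbytes)
    view = memoryview(buf)
    filled = 0
    while filled < nbytes:
        n = fp.readinto(view[filled:])   # writes straight into our buffer
        if not n:
            raise TruncatedRead('expected %d bytes, got %d' % (nbytes, filled))
        filled += n
    return bytes(buf)

data = read_exactly(io.BytesIO(b'abcdef'), 4)   # b'abcd'
```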
-
exception
katdal.chunkstore_s3.
AuthorisationFailed
¶ Bases:
katdal.chunkstore.StoreUnavailable
Authorisation failed, e.g. due to invalid, malformed or expired token.
-
exception
katdal.chunkstore_s3.
InvalidToken
(token, message)¶ Bases:
katdal.chunkstore_s3.AuthorisationFailed
Invalid JSON Web Token (JWT).
-
katdal.chunkstore_s3.
decode_jwt
(token)¶ Decode JSON Web Token (JWT) string and extract claims.
The MeerKAT archive uses JWT bearer tokens for authorisation. Each token is a JSON Web Signature (JWS) string with a payload of claims. This function extracts the claims as a dict, while also doing basic checks on the token (mostly to catch copy-n-paste errors). The signature is decoded but not validated, since that would require the server secrets.
Parameters: token (str) – JWS Compact Serialization as an ASCII string (native string, not bytes) Returns: claims – The JWT Claims Set as a dict of key-value pairs Return type: dict Raises: InvalidToken
– If the token is malformed or truncated, or has expired
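Extracting the claims from a JWS compact serialization without validating the signature boils down to base64url-decoding the middle segment. A minimal sketch, skipping most of decode_jwt's sanity checks:

```python
# Decode the claims of a JWT (header.payload.signature) without signature
# validation. Illustrative only; real tokens should be validated server-side.
import base64
import json

def jwt_claims(token):
    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError('Malformed JWT: expected 3 dot-separated segments')
    payload = parts[1]
    # Restore the base64url padding stripped from JWT segments
    payload += '=' * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))

def encode_segment(obj):
    # Helper to build a demo token (not part of the sketch proper)
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).decode().rstrip('=')

token = '.'.join([encode_segment({'alg': 'ES256'}),
                  encode_segment({'sub': 'user', 'exp': 2000000000}),
                  'sig'])
claims = jwt_claims(token)   # {'sub': 'user', 'exp': 2000000000}
```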
-
class
katdal.chunkstore_s3.
S3ChunkStore
(url, timeout=(30, 300), retries=2, token=None, credentials=None, public_read=False, expiry_days=0, **kwargs)¶ Bases:
katdal.chunkstore.ChunkStore
A store of chunks (i.e. N-dimensional arrays) based on the Amazon S3 API.
This object encapsulates the S3 client / session and its underlying connection pool, which allows subsequent get and put calls to share the connections.
The full identifier of each chunk (the “chunk name”) is given by
“<bucket>/<path>/<idx>”, where “<bucket>” refers to the relevant S3 bucket, “<bucket>/<path>” is the name of the parent array of the chunk and “<idx>” is the index string of each chunk (e.g. “00001_00512”). The corresponding S3 key string of a chunk is “<path>/<idx>.npy”, which reflects the fact that the chunk is stored as a string representation of an NPY file (complete with header).
Parameters: - url (str) – Endpoint of S3 service, e.g. ‘http://127.0.0.1:9000’. It can be specified as either bytes or unicode, and is converted to the native string type with UTF-8. The URL may also contain a path if this store is relative to an existing bucket, in which case the chunk name is a relative path (useful for unit tests).
- timeout (float or None or tuple of 2 floats or None's, optional) – Connect / read timeout, in seconds, either a single value for both or custom values as a (connect, read) tuple. None means “wait forever”.
- retries (int or tuple of 2 ints or
urllib3.util.retry.Retry
, optional) – Number of connect / read retries, either a single value for both or custom values as (connect, read) tuple, or a Retry object for full customisation (including status retries). - token (str, optional) – Bearer token to authenticate
- credentials (tuple of str, optional) – AWS access key and secret key to authenticate
- public_read (bool, optional) – If set to true, new buckets will be created with a policy that allows everyone (including unauthenticated users) to read the data.
- expiry_days (int, optional) – If set to a value greater than 0 will set a future expiry time in days for any new buckets created.
- kwargs (dict) – Extra keyword arguments (unused)
Raises: chunkstore.StoreUnavailable
– If S3 server interaction failed (it’s down, no authentication, etc)-
request
(method, url, process=<function S3ChunkStore.<lambda>>, chunk_name='', ignored_errors=(), timeout=(), retries=None, **kwargs)¶ Send HTTP request to S3 server, process response and retry if needed.
This retries temporary HTTP errors, including reset connections while processing a successful response.
Parameters: - method, url – The standard required parameters of
requests.Session.request()
- process (function, signature
result = process(response)
, optional) – Function that will process response (just return response by default) - chunk_name (str, optional) – Name of chunk, used for error reporting only
- ignored_errors (collection of int, optional) – HTTP status codes that are treated like 200 OK, not raising an error
- timeout (float or None or tuple of 2 floats or None's, optional) – Override timeout for this request (use the store timeout by default)
- retries (int or tuple of 2 ints or
urllib3.util.retry.Retry
, optional) – Override retries for this request (use the store retries by default) - kwargs (optional) – These are passed on to
requests.Session.request()
Returns: result – The output of the process function applied to a successful response
Return type: object
Raises: AuthorisationFailed
– If the request is not authorised by appropriate token or credentials S3ObjectNotFound
– If S3 object request fails because it does not exist S3ServerGlitch
– If S3 object request fails because server is temporarily overloaded StoreUnavailable
– If a general HTTP error occurred that is not ignored
-
get_chunk
(array_name, slices, dtype)¶ Get chunk from the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- dtype (
numpy.dtype
object or equivalent) – Data type of array x
Returns: chunk – Chunk as ndarray with dtype dtype and shape dictated by slices
Return type: numpy.ndarray
object Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objects chunkstore.BadChunk
– If requested dtype does not match underlying parent array dtype or stored buffer has wrong size / shape compared to slices chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config) chunkstore.ChunkNotFound
– If requested chunk was not found in store
-
create_array
(array_name)¶ See the docstring of
ChunkStore.create_array()
.
-
put_chunk
(array_name, slices, chunk)¶ Put chunk into the store.
Parameters: - array_name (string) – Identifier of parent array x of chunk
- slices (sequence of unit-stride slice objects) – Identifier of individual chunk, to be extracted as x[slices]
- chunk (
numpy.ndarray
object) – Chunk as ndarray with shape commensurate with slices
Raises: TypeError
– If slices is not a sequence of slice(start, stop, 1) objects chunkstore.BadChunk
– If the shape implied by slices does not match that of chunk chunkstore.StoreUnavailable
– If interaction with chunk store failed (offline, bad auth, bad config) chunkstore.ChunkNotFound
– If array_name is incompatible with store
-
mark_complete
(array_name)¶ Write a special object to indicate that array_name is finished.
This operation is idempotent.
The array_name need not correspond to any array written with
put_chunk()
. This has no effect on katdal, but a producer can call this method to provide a hint to a consumer that no further data will be coming for this array. When arrays are arranged in a hierarchy, a producer and consumer may agree to write a single completion marker at a higher level of the hierarchy rather than one per actual array.It is not necessary to call
create_array()
first; the implementation will do so if appropriate.The presence of this marker can be checked with
is_complete()
.
-
is_complete
(array_name)¶ Check whether
mark_complete()
has been called for this array.
katdal.concatdata module¶
Class for concatenating visibility data sets.
-
exception
katdal.concatdata.
ConcatenationError
¶ Bases:
Exception
Sequence of objects could not be concatenated due to incompatibility.
-
class
katdal.concatdata.
ConcatenatedLazyIndexer
(indexers, transforms=None)¶ Bases:
katdal.lazy_indexer.LazyIndexer
Two-stage deferred indexer that concatenates multiple indexers.
This indexer concatenates a sequence of indexers along the first (i.e. time) axis. The index specification is broken down into chunks along this axis, sent to the applicable underlying indexers, and the returned pieces are concatenated again before being returned.
Parameters: - indexers (sequence of
LazyIndexer
objects and/or arrays) – Sequence of indexers or raw arrays to be concatenated - transforms (list of
LazyTransform
objects or None, optional) – Extra chain of transforms to be applied to data after final indexing
-
name
¶ Name of first non-empty indexer (or empty string otherwise)
Type: string
Raises: InvalidTransform
– If transform chain does not obey restrictions on changing the data shape
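The core bookkeeping of such a concatenating indexer, i.e. mapping a global first-axis range onto per-indexer local slices, can be sketched as follows (hypothetical helper, not katdal code):

```python
# Break a global [start, stop) range along the first (time) axis into
# (segment index, local slice) pieces, one per underlying indexer.

def split_time_slice(segment_lengths, start, stop):
    pieces = []
    offset = 0
    for seg, length in enumerate(segment_lengths):
        lo = max(start, offset)
        hi = min(stop, offset + length)
        if lo < hi:   # this segment overlaps the requested range
            pieces.append((seg, slice(lo - offset, hi - offset)))
        offset += length
    return pieces

# Three segments of 4, 2 and 5 dumps; global dumps 3..7 span all three
pieces = split_time_slice([4, 2, 5], 3, 7)
# [(0, slice(3, 4)), (1, slice(0, 2)), (2, slice(0, 1))]
```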
-
katdal.concatdata.
common_dtype
(sensor_data_sequence)¶ The dtype suitable to store all sensor data values in the given sequence.
This extracts the dtypes of a sequence of sensor data objects and finds the minimal dtype to which all of them may be safely cast using NumPy type promotion rules (which will typically be the dtype of a concatenation of the values).
Parameters: sensor_data_sequence (sequence of extracted sensor data objects) – These objects may include numpy.ndarray
andCategoricalData
Returns: dtype – The promoted dtype of the sequence, or None if sensor_data_sequence is empty Return type: numpy.dtype
object
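The promotion step can be sketched with NumPy's standard type-promotion machinery: np.result_type applies the same rules a concatenation would. This illustrates the concept rather than reproducing katdal's exact implementation:

```python
# Find the minimal dtype to which all inputs may be safely cast, following
# NumPy type promotion rules (illustrative stand-in for common_dtype).
import numpy as np

def common_dtype(arrays):
    if not arrays:
        return None   # empty sequence has no promoted dtype
    return np.result_type(*arrays)

dtype = common_dtype([np.zeros(2, dtype=np.float32),
                      np.zeros(2, dtype=np.int64)])
# float32 and int64 promote to float64, matching np.concatenate's result
```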
-
class
katdal.concatdata.
ConcatenatedSensorGetter
(data)¶ Bases:
katdal.sensordata.SensorGetter
The concatenation of multiple raw (uncached) sensor data sets.
This is a convenient container for returning raw (uncached) sensor data sets from a
ConcatenatedSensorCache
object. It only accesses the underlying data sets when explicitly asked to via theget()
interface, but provides quick access to metadata such as sensor name.Parameters: data (sequence of SensorGetter
) – Uncached sensor data-
get
()¶ Retrieve the values from underlying storage.
Returns: values – Underlying data Return type: SensorData
-
-
class
katdal.concatdata.
ConcatenatedSensorCache
(caches, keep=None)¶ Bases:
katdal.sensordata.SensorCache
Sensor cache that is a concatenation of multiple underlying caches.
This concatenates a sequence of sensor caches along the time axis and makes them appear like a single sensor cache. The combined cache contains a superset of all actual and virtual sensors found in the underlying caches and replaces any missing sensor data with dummy values.
Parameters: - caches (sequence of
SensorCache
objects) – Sequence of underlying caches to be concatenated - keep (sequence of bool, optional) – Default (global) time selection specification as boolean mask that will be applied to sensor data (this can be disabled on data retrieval)
-
get
(name, select=False, extract=True, **kwargs)¶ Sensor values interpolated to correlator data timestamps.
Retrieve raw (uncached) or cached sensor data from each underlying cache and concatenate the results along the time axis.
Parameters: - name (string) – Sensor name
- select ({False, True}, optional) – True if preset time selection will be applied to returned data
- extract ({True, False}, optional) – True if sensor data should be extracted from store and cached
- kwargs (dict, optional) – Additional parameters are passed to underlying sensor caches
Returns: data – If extraction is disabled, this will be a
SensorGetter
object for uncached sensors. If selection is enabled, this will be a 1-D array of values, one per selected timestamp. If selection is disabled, this will be a 1-D array of values (of the same length as thetimestamps
attribute) for numerical data, and aCategoricalData
object for categorical data.Return type: array or
CategoricalData
orSensorGetter
object Raises: KeyError
– If sensor name was not found in cache and did not match virtual template
- caches (sequence of
-
class
katdal.concatdata.
ConcatenatedDataSet
(datasets)¶ Bases:
katdal.dataset.DataSet
Class that concatenates existing visibility data sets.
This provides a single DataSet interface to a list of concatenated data sets. Where possible, identical targets, subarrays, spectral windows and observation sensors are merged. For more information on attributes, see the
DataSet
docstring.Parameters: datasets (sequence of DataSet
objects) – List of existing data sets-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of selection on it.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length offreqs()
and the number of correlation products B matches the length ofcorr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of selection on it.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length offreqs()
and the number of correlation products B matches the length ofcorr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length offreqs()
and the number of correlation products B matches the length ofcorr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
-
katdal.dataset module¶
Base class for accessing a visibility data set.
-
exception
katdal.dataset.
WrongVersion
¶ Bases:
Exception
Trying to access data using accessor class with the wrong version.
-
exception
katdal.dataset.
BrokenFile
¶ Bases:
Exception
Data set could not be loaded because file is inconsistent or misses critical bits.
-
class
katdal.dataset.
Subarray
(ants, corr_products)¶ Bases:
object
Subarray specification.
A subarray is determined by the specific correlation products produced by the correlator and the antenna objects associated with the inputs found in the correlation products.
Parameters: - ants (sequence of
katpoint.Antenna
objects) – List of antenna objects, culled to contain only antennas found in corr_products - corr_products (sequence of (string, string) pairs, length B) – Correlation products as pairs of input labels, e.g. (‘ant1h’, ‘ant2v’), exposed as an array of strings with shape (B, 2)
-
inputs
¶ List of correlator input labels found in corr_products, e.g. ‘ant1h’
Type: list of strings
- ants (sequence of
-
katdal.dataset.
parse_url_or_path
(url_or_path)¶ Parse URL into components, converting path to absolute file URL.
Parameters: url_or_path (string) – URL, or filesystem path if there is no scheme Returns: url_parts – Components of the parsed URL (‘file’ scheme will have an absolute path) Return type: urllib.parse.ParseResult
-
class
katdal.dataset.
DataSet
(name, ref_ant='', time_offset=0.0, url='')¶ Bases:
object
Base class for accessing a visibility data set.
This provides a simple interface to a generic file (or files) containing visibility data (both single-dish and interferometer data supported). The data are not loaded into memory on opening the file, but are accessible via properties after typically selecting a subset of the data. This allows the reading of huge files.
Parameters: - name (string) – Name / identifier of data set
- ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use by script)
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- url (string, optional) – Location of data set (either local filename or full URL accepted)
-
version
¶ Format version string
Type: string
-
observer
¶ Name of person that recorded the data set
Type: string
-
description
¶ Short description of the purpose of the data set
Type: string
-
experiment_id
¶ Experiment ID, a unique string used to link the data files of an experiment together with blog entries, etc.
Type: string
-
obs_params
¶ Observation parameters, typically set in observation script
Type: dict mapping string to string or list of strings
-
obs_script_log
¶ Observation script output log (useful for debugging)
Type: list of strings
-
subarrays
¶ List of all subarrays in data set
Type: list of SubArray
objects
-
subarray
¶ Index of currently selected subarray
Type: int
-
ants
¶ List of selected antennas
Type: list of katpoint.Antenna
objects
-
inputs
¶ List of selected correlator input labels (‘ant1h’)
Type: array of strings
-
corr_products
¶ Array of selected correlation products as pairs of input labels (e.g. [(‘ant1h’, ‘ant1h’), (‘ant1h’, ‘ant2h’)])
Type: array of strings, shape (B, 2)
-
receivers
¶ Identifier of the active receiver on each antenna
Type: dict mapping string to string or list of strings
-
spectral_windows
¶ List of all spectral windows in data set
Type: list of SpectralWindow
objects
-
spw
¶ Index of currently selected spectral window
Type: int
-
channel_width
¶ Channel bandwidth of selected spectral window, in Hz
Type: float
-
freqs / channel_freqs
Centre frequency of each selected channel, in Hz
Type: array of float, shape (F,)
-
channels
¶ Original channel indices of selected channels
Type: array of int, shape (F,)
-
dump_period
¶ Dump period, in seconds
Type: float
-
sensor
¶ Sensor cache
Type: SensorCache
object
-
catalogue
¶ Catalogue of all targets / sources / fields in data set
Type: katpoint.Catalogue
object
-
start_time
¶ Timestamp of start of first sample in file, in UT seconds since Unix epoch
Type: katpoint.Timestamp
object
-
end_time
¶ Timestamp of end of last sample in file, in UT seconds since Unix epoch
Type: katpoint.Timestamp
object
-
dumps
¶ Original dump indices of selected dumps
Type: array of int, shape (T,)
-
scan_indices
¶ List of currently selected scans as indices
Type: list of int
-
compscan_indices
¶ List of currently selected compound scans as indices
Type: list of int
-
target_indices
¶ List of currently selected targets as indices into catalogue
Type: list of int
-
target_projection
¶ Type of spherical projection for target coordinates
Type: {‘ARC’, ‘SIN’, ‘TAN’, ‘STG’, ‘CAR’}, optional
-
target_coordsys
¶ Spherical pointing coordinate system for target coordinates
Type: {‘azel’, ‘radec’}, optional
-
shape
¶ Shape of selected visibility data array, as (T, F, B)
Type: tuple of 3 ints
-
size
¶ Size of selected visibility data array, in bytes
Type: int
-
applycal_products
¶ List of calibration products that will be applied to data
Type: list of string
-
select
(**kwargs)¶ Select subset of data, based on time / frequency / corrprod filters.
This applies a set of selection criteria to the data set, which updates the data set properties and attributes to match the selection. In other words, the
timestamps()
andvis()
methods will return the selected subset of the data, while attributes such asants
,channel_freqs
andshape
are updated. The sensor cache will also return the selected subset of sensor data via the __getitem__ interface. This function returns nothing, but modifies the existing data set in-place.The selection criteria are divided into groups, based on whether they affect the time, frequency or correlation product dimension:
* Time: `dumps`, `timerange`, `scans`, `compscans`, `targets` * Frequency: `channels`, `freqrange` * Correlation product: `corrprods`, `ants`, `inputs`, `pol`
The subarray and spw criteria are special, as they affect multiple dimensions (time + correlation product and time + frequency, respectively), are always active and are forced to be a single index.
If there are multiple criteria on the same dimension within a select() call, they are ANDed together, while multiple items within the same criterion (e.g. targets=[‘Hyd A’, ‘Vir A’]) are ORed together. When select() is called a second time, all new selections replace previous selections on the same dimension, while existing selections on other dimensions are preserved. The reset parameter fine-tunes this behaviour.
If
select()
is called without any parameters the selection is reset to the original data set.In addition, the weights and flags criteria are lists of names that select which weights and flags to include in the corresponding data set property.
Parameters: - strict ({True, False}, optional) – True if select() raises TypeError if it encounters an unknown kwarg
- dumps (int or slice or sequence of ints or sequence of bools, optional) – Select dumps by index, slice or boolean mask of length T (keep dumps where mask is True)
- timerange (sequence of 2
katpoint.Timestamp
objects or equivalent, optional) – Select range of times between given start and end times - scans (int or string or sequence, optional) – Select scans by index or state (or negate state by prepending ‘~’)
- compscans (int or string or sequence, optional) – Select compscans by index or label (or negate label by prepending ‘~’)
- targets (int or string or
katpoint.Target
object or sequence, optional) – Select targets by index, name, description or object - spw (int, optional) – Select spectral window by index (only one may be active)
- channels (int or slice or sequence of ints or sequence of bools, optional) – Select frequency channels by index, slice or boolean mask of length F (keep channels where mask is True)
- freqrange (sequence of 2 floats, optional) – Select range of frequencies between start and end frequencies, in Hz
- subarray (int, optional) – Select subarray by index (only one may be active)
- corrprods (int or slice or sequence of ints or sequence of bools or sequence of string pairs or {‘auto’, ‘cross’}, optional) – Select correlation products by index, slice or boolean mask of length B (keep products where mask is True). Alternatively, select by value via a sequence of string pairs, or select all autocorrelations via ‘auto’ or all cross-correlations via ‘cross’.
- ants (string or
katpoint.Antenna
object or sequence, optional) – Select antennas by name or object. If all antennas specified are prefaced by a ~ this is treated as a deselection and these antennas are excluded. - inputs (string or sequence of strings, optional) – Select inputs by label
- pol (string or sequence of strings in {‘H’, ‘V’, ‘HH’, ‘VV’, ‘HV’, ‘VH’}, optional) – Select polarisation terms
- weights ('all' or string or sequence of strings, optional) – List of names of weights to be multiplied together, as a sequence or string of comma-separated names (combine all weights by default)
- flags ('all' or string or sequence of strings, optional) – List of names of flags that will be OR’ed together, as a sequence or string of comma-separated names (use all flags by default). An empty string or sequence discards all flags.
- reset ({'auto', '', 'T', 'F', 'B', 'TF', 'TB', 'FB', 'TFB'}, optional) – Remove existing selections on specified dimensions before applying the new selections. The default ‘auto’ option clears those dimensions that will be modified by the new selections and leaves the selections on unaffected dimensions intact except if select is called without any parameters, in which case all selections are cleared. By setting reset to ‘’, new selections apply on top of existing selections.
Raises: TypeError
– If a keyword argument is unknown and strict is enabled IndexError
– If spw or subarray is out of range
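The AND/OR semantics described above can be made concrete with boolean masks. A pure-Python sketch, with lists standing in for per-dump masks:

```python
# Multiple items within one criterion are ORed into a single mask; masks
# from different criteria on the same dimension are then ANDed together.

def or_masks(masks):
    return [any(vals) for vals in zip(*masks)]

def and_masks(masks):
    return [all(vals) for vals in zip(*masks)]

# Two targets ORed together (dumps on either target are kept)...
target_a = [True, True, False, False, False]
target_b = [False, False, True, False, False]
targets = or_masks([target_a, target_b])
# ...then ANDed with a time-range criterion on the same (time) dimension
timerange = [False, True, True, True, False]
keep = and_masks([targets, timerange])
# keep == [False, True, True, False, False]
```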
-
scans
()¶ Generator that iterates through scans in data set.
This iterates through the currently selected list of scans, returning the scan index, scan state and associated target object. In addition, after each iteration the data set will reflect the scan selection, i.e. the timestamps, visibilities, sensor values, etc. will be those of the current scan. The scan selection applies on top of any existing selection.
Yields: - scan (int) – Scan index
- state (string) – Scan state
- target (
katpoint.Target
object) – Target associated with scan
-
compscans
()¶ Generator that iterates through compound scans in data set.
This iterates through the currently selected list of compound scans, returning the compound scan index, label and the first associated target object. In addition, after each iteration the data set will reflect the compound scan selection, i.e. the timestamps, visibilities, sensor values, etc. will be those of the current compound scan. The compound scan selection applies on top of any existing selection.
Yields: - compscan (int) – Compound scan index
- label (string) – Compound scan label
- target (
katpoint.Target
object) – First target associated with compound scan
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.
-
vis
¶ Complex visibility data as a function of time, frequency and corrprod.
The visibility data are returned as an array of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\), i.e. phase that increases with time.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
.
-
flags
¶ Visibility flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
-
mjd
¶ Visibility timestamps in Modified Julian Days (MJD).
The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.
-
lst
¶ Local sidereal time at the reference antenna in hours.
The sidereal times are returned in an array of float, shape (T,).
-
az
¶ Azimuth angle of each dish in degrees.
The azimuth angles are returned in an array of float, shape (T, A).
-
el
¶ Elevation angle of each dish in degrees.
The elevation angles are returned in an array of float, shape (T, A).
-
ra
¶ Right ascension of the actual pointing of each dish in J2000 degrees.
The right ascensions are returned in an array of float, shape (T, A).
-
dec
¶ Declination of the actual pointing of each dish in J2000 degrees.
The declinations are returned in an array of float, shape (T, A).
-
parangle
¶ Parallactic angle of the actual pointing of each dish in degrees.
The parallactic angle is the position angle of the observer’s vertical on the sky, measured from north toward east. This is the angle between the great-circle arc connecting the celestial North pole to the dish pointing direction, and the great-circle arc connecting the zenith above the antenna to the pointing direction, or the angle between the hour circle and vertical circle through the pointing direction. It is returned as an array of float, shape (T, A).
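The verbal definition above corresponds to the standard spherical-astronomy formula \(\tan q = \sin H / (\tan\phi \cos\delta - \sin\delta \cos H)\), with H the hour angle, \(\delta\) the declination and \(\phi\) the latitude. katdal obtains this via its coordinate machinery; the sketch below merely evaluates the standard formula as an illustration:

```python
import numpy as np

def parallactic_angle(hour_angle, declination, latitude):
    """Parallactic angle in radians (standard spherical-astronomy formula)."""
    return np.arctan2(
        np.sin(hour_angle),
        np.tan(latitude) * np.cos(declination)
        - np.sin(declination) * np.cos(hour_angle))

# On the meridian (hour angle 0) the hour circle and vertical circle
# coincide, so the parallactic angle vanishes for this geometry.
q = parallactic_angle(0.0, np.deg2rad(-60.0), np.deg2rad(-30.0))
```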
-
target_x
¶ Target x coordinate of each dish in degrees.
The target coordinates are projections of the spherical coordinates of the dish pointing direction to a plane with the target position at the origin. The type of projection (e.g. ARC, SIN, etc.) and spherical pointing coordinate system (e.g. azel or radec) can be set via the
target_projection
andtarget_coordsys
attributes, respectively. The target x coordinates are returned as an array of float, shape (T, A).
-
target_y
¶ Target y coordinate of each dish in degrees.
The target coordinates are projections of the spherical coordinates of the dish pointing direction to a plane with the target position at the origin. The type of projection (e.g. ARC, SIN, etc.) and spherical pointing coordinate system (e.g. azel or radec) can be set via the
target_projection
andtarget_coordsys
attributes, respectively. The target y coordinates are returned as an array of float, shape (T, A).
-
u
¶ U coordinate for each correlation product in metres.
This calculates the u coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(u_1 - u_2\) for baseline (ant1, ant2).
-
v
¶ V coordinate for each correlation product in metres.
This calculates the v coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(v_1 - v_2\) for baseline (ant1, ant2).
-
w
¶ W coordinate for each correlation product in metres.
This calculates the w coordinate of the baseline vector of each correlation product as a function of time while tracking the target. It is returned as an array of float, shape (T, B). The sign convention is \(w_1 - w_2\) for baseline (ant1, ant2).
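The (u, v, w) values are components of the baseline vector between the two antennas under the ant1-minus-ant2 sign convention; the w component, for instance, is the projection of that baseline onto the unit vector pointing at the target. A toy NumPy sketch with hypothetical local East-North-Up positions (not katdal’s internal computation):

```python
import numpy as np

# Hypothetical antenna positions in a local East-North-Up frame, in metres.
ant1 = np.array([100.0, 0.0, 0.0])   # 100 m east of the reference point
ant2 = np.array([0.0, 0.0, 0.0])

# Unit vector towards the target: azimuth 90 deg (east), elevation 30 deg.
az, el = np.deg2rad(90.0), np.deg2rad(30.0)
target_dir = np.array([np.sin(az) * np.cos(el),
                       np.cos(az) * np.cos(el),
                       np.sin(el)])

# Baseline vector follows the ant1 - ant2 sign convention; w = b . s_hat.
baseline = ant1 - ant2
w = baseline @ target_dir   # projection of the baseline onto the target
```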
katdal.datasources module¶
Various sources of correlator data and metadata.
-
exception
katdal.datasources.
DataSourceNotFound
¶ Bases:
Exception
File associated with DataSource not found or server not responding.
-
class
katdal.datasources.
AttrsSensors
(attrs, sensors)¶ Bases:
object
Metadata in the form of attributes and sensors.
Parameters: - attrs (mapping from string to object) – Metadata attributes
- sensors (mapping from string to
SensorGetter
objects) – Metadata sensor cache mapping sensor names to raw sensor data
-
class
katdal.datasources.
DataSource
(metadata, timestamps, data=None)¶ Bases:
object
A generic data source presenting both correlator data and metadata.
Parameters: - metadata (
AttrsSensors
object) – Metadata attributes and sensors - timestamps (array-like of float, length T) – Timestamps at centroids of visibilities in UTC seconds since Unix epoch
- data (
VisFlagsWeights
object, optional) – Correlator data (visibilities, flags and weights)
-
katdal.datasources.
view_capture_stream
(telstate, capture_block_id, stream_name)¶ Create telstate view based on given capture block ID and stream name.
It constructs a view on telstate with at least the prefixes
- <capture_block_id>_<stream_name>
- <capture_block_id>
- <stream_name>
Additionally if there is a <stream_name>_inherit key, that stream is added too (recursively).
Parameters: - telstate (
katsdptelstate.TelescopeState
object) – Original telescope state - capture_block_id (string) – Capture block ID
- stream_name (string) – Stream name
Returns: telstate – Telstate with a view that incorporates capture block, stream and combo
Return type: TelescopeState
object
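For a concrete (hypothetical) capture block and stream name, the prefixes the view must cover work out as below; the actual view is built with katsdptelstate’s view mechanism, so this only illustrates the prefix order:

```python
# Hypothetical IDs: the view covers the combined capture-stream prefix
# first, then the capture block, then the stream.
capture_block_id = '1556574656'
stream_name = 'sdp_l0'
prefixes = [f'{capture_block_id}_{stream_name}', capture_block_id, stream_name]
```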
-
katdal.datasources.
view_l0_capture_stream
(telstate, capture_block_id=None, stream_name=None, **kwargs)¶ Create telstate view based on auto-determined capture block ID and stream name.
This figures out the appropriate capture block ID and L0 stream name from a capture-stream specific telstate, or uses the provided ones. It then calls
view_capture_stream()
to generate a view.
Parameters: - telstate (
katsdptelstate.TelescopeState
object) – Original telescope state - capture_block_id (string, optional) – Specify capture block ID explicitly (detected otherwise)
- stream_name (string, optional) – Specify L0 stream name explicitly (detected otherwise)
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Returns: - telstate (
TelstateToStr
object) – Telstate with a view that incorporates capture block, stream and combo - capture_block_id (string) – Actual capture block ID used
- stream_name (string) – Actual L0 stream name used
Raises: ValueError
– If no capture block or L0 stream could be detected (with no override)- telstate (
-
katdal.datasources.
infer_chunk_store
(url_parts, telstate, npy_store_path=None, s3_endpoint_url=None, array='correlator_data', **kwargs)¶ Construct chunk store automatically from dataset URL and telstate.
Parameters: - url_parts (
urlparse.ParseResult
object) – Parsed dataset URL - telstate (
TelstateToStr
object) – Telescope state - npy_store_path (string, optional) – Top-level directory of NpyFileChunkStore (overrides the default)
- s3_endpoint_url (string, optional) – Endpoint of S3 service, e.g. ‘http://127.0.0.1:9000’ (overrides default)
- array (string, optional) – Array within the bucket from which to determine the prefix
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Returns: store – Chunk store for visibility data
Return type: katdal.ChunkStore
objectRaises: KeyError
– If telstate lacks critical keyskatdal.chunkstore.StoreUnavailable
– If the chunk store could not be constructed
-
class
katdal.datasources.
TelstateDataSource
(telstate, capture_block_id, stream_name, chunk_store=None, timestamps=None, url='', upgrade_flags=True, van_vleck='off', preselect=None, **kwargs)¶ Bases:
katdal.datasources.DataSource
A data source based on
katsdptelstate.TelescopeState
. It is assumed that the provided telstate already has the appropriate views to find observation, stream and chunk store information. It typically needs the following prefixes:
- <capture block ID>_<L0 stream>
- <capture block ID>
- <L0 stream>
Parameters: - telstate (
katsdptelstate.TelescopeState
object) – Telescope state with appropriate views - capture_block_id (string) – Capture block ID
- stream_name (string) – Name of the L0 stream
- chunk_store (
katdal.ChunkStore
object, optional) – Chunk store for visibility data (the default is no data - metadata only) - timestamps (array of float, optional) – Visibility timestamps, overriding (or fixing) the ones found in telstate
- url (string, optional) – Location of the telstate source
- upgrade_flags (bool, optional) – Look for associated flag streams and use them if True (default)
- van_vleck ({'off', 'autocorr'}, optional) – Type of Van Vleck (quantisation) correction to perform
- preselect (dict, optional) –
Subset of data to select. The keys in the dictionary correspond to the keyword arguments of
DataSet.select()
, but with restrictions:
- Only
channels
and dumps
can be specified.
- The values must be slices with unit step.
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other methods and ignored
Raises: KeyError
– If telstate lacks critical keysIndexError
– If preselect does not meet the criteria above.
-
classmethod
from_url
(url, chunk_store='auto', **kwargs)¶ Construct TelstateDataSource from URL or RDB filename.
The following URL styles are supported:
- Local RDB filename (no scheme): ‘1556574656/1556574656_sdp_l0.rdb’
- Archive: ‘https://archive/1556574656/1556574656_sdp_l0.rdb?token=<>’
- Redis server: ‘redis://cal5.sdp.mkat.karoo.kat.ac.za:31852’
Parameters: - url (string) – URL or RDB filename serving as entry point to data set
- chunk_store (
katdal.ChunkStore
object, optional) – Chunk store for visibility data (obtained automatically by default, or set to None for metadata-only data set) - kwargs (dict, optional) – Extra keyword arguments passed to init, telstate view, chunk store init
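The URL styles listed above can be decomposed with the standard library before being handed to the data source; a quick sketch (not katdal’s internal parsing, and the token value is hypothetical):

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical archive URL in the documented style.
url = 'https://archive/1556574656/1556574656_sdp_l0.rdb?token=secret'
parts = urlparse(url)

scheme = parts.scheme      # 'https' (empty string for a local RDB filename)
rdb_path = parts.path      # path to the RDB file within the archive
token = parse_qs(parts.query).get('token', [None])[0]
```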
-
katdal.datasources.
open_data_source
(url, **kwargs)¶ Construct the data source described by the given URL.
katdal.flags module¶
Definitions of flag bits
katdal.h5datav1 module¶
Data accessor class for HDF5 files produced by Fringe Finder correlator.
-
class
katdal.h5datav1.
H5DataV1
(filename, ref_ant='', time_offset=0.0, mode='r', **kwargs)¶ Bases:
katdal.dataset.DataSet
Load HDF5 format version 1 file produced by Fringe Finder correlator.
For more information on attributes, see the
DataSet
docstring.
Parameters: - filename (string) – Name of HDF5 file
- ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
-
file
¶ Underlying HDF5 file, exposed via
h5py
interfaceType: h5py.File
object
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory. The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\), i.e. phase that increases with time.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
katdal.h5datav2 module¶
Data accessor class for HDF5 files produced by KAT-7 correlator.
-
katdal.h5datav2.
get_single_value
(group, name)¶ Return single value from attribute or dataset with given name in group.
If name is an attribute of the HDF5 group group, it is returned; otherwise name is interpreted as an HDF5 dataset of group and its last value is returned. This is meant to retrieve static configuration values that potentially get set more than once during capture initialisation, but then do not change during actual capturing.
Parameters: - group (
h5py.Group
object) – HDF5 group to query - name (string) – Name of HDF5 attribute or dataset to query
Returns: value – Attribute or last value of dataset
Return type: object
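The attribute-first, last-dataset-value-second lookup can be sketched as follows, using a simplified stand-in for the h5py objects (this is an illustration of the documented behaviour, not the actual implementation):

```python
class FakeGroup:
    """Minimal stand-in for an h5py.Group: attrs dict plus named datasets."""
    def __init__(self, attrs, datasets):
        self.attrs = attrs
        self._datasets = datasets

    def __getitem__(self, name):
        return self._datasets[name]

def get_single_value(group, name):
    # Prefer the attribute; fall back to the last value of the dataset.
    if name in group.attrs:
        return group.attrs[name]
    return group[name][-1]

# Hypothetical metadata: a static attribute and a dataset set multiple times.
group = FakeGroup(attrs={'n_chans': 1024},
                  datasets={'sync_time': [100.0, 200.0, 300.0]})
```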
-
katdal.h5datav2.
dummy_dataset
(name, shape, dtype, value)¶ Dummy HDF5 dataset containing a single value.
This creates a dummy HDF5 dataset in memory containing a single value. It can have virtually unlimited size as the dataset is highly compressed.
Parameters: - name (string) – Name of dataset
- shape (sequence of int) – Shape of dataset
- dtype (
numpy.dtype
object or equivalent) – Type of data stored in dataset - value (object) – All elements in the dataset will equal this value
Returns: dataset – Dummy HDF5 dataset
Return type: h5py.Dataset
object
-
class
katdal.h5datav2.
H5DataV2
(filename, ref_ant='', time_offset=0.0, mode='r', quicklook=False, keepdims=False, **kwargs)¶ Bases:
katdal.dataset.DataSet
Load HDF5 format version 2 file produced by KAT-7 correlator.
For more information on attributes, see the
DataSet
docstring.
Parameters: - filename (string) – Name of HDF5 file
- ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
- quicklook ({False, True}) – True if synthesised timestamps should be used to partition data set even if real timestamps are irregular, thereby avoiding the slow loading of real timestamps at the cost of slightly inaccurate label borders
- keepdims ({False, True}, optional) – Force vis / weights / flags to be 3-dimensional, regardless of selection
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
-
file
¶ Underlying HDF5 file, exposed via
h5py
interfaceType: h5py.File
object
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array indexer of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory. The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\), i.e. phase that increases with time.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
katdal.h5datav3 module¶
Data accessor class for HDF5 files produced by RTS correlator.
-
katdal.h5datav3.
dummy_dataset
(name, shape, dtype, value)¶ Dummy HDF5 dataset containing a single value.
This creates a dummy HDF5 dataset in memory containing a single value. It can have virtually unlimited size as the dataset is highly compressed.
Parameters: - name (string) – Name of dataset
- shape (sequence of int) – Shape of dataset
- dtype (
numpy.dtype
object or equivalent) – Type of data stored in dataset - value (object) – All elements in the dataset will equal this value
Returns: dataset – Dummy HDF5 dataset
Return type: h5py.Dataset
object
-
class
katdal.h5datav3.
H5DataV3
(filename, ref_ant='', time_offset=0.0, mode='r', time_scale=None, time_origin=None, rotate_bls=False, centre_freq=None, band=None, keepdims=False, **kwargs)¶ Bases:
katdal.dataset.DataSet
Load HDF5 format version 3 file produced by RTS correlator.
For more information on attributes, see the
DataSet
docstring.
Parameters: - filename (string) – Name of HDF5 file
- ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans (default is first antenna in use)
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- mode (string, optional) – HDF5 file opening mode (e.g. ‘r+’ to open file in write mode)
- time_scale (float or None, optional) – Resynthesise timestamps using this scale factor
- time_origin (float or None, optional) – Resynthesise timestamps using this sync time / epoch
- rotate_bls ({False, True}, optional) – Rotate baseline label list to work around early RTS correlator bug
- centre_freq (float or None, optional) – Override centre frequency if provided, in Hz
- band (string or None, optional) – Override receiver band if provided (e.g. ‘l’) - used to find ND models
- keepdims ({False, True}, optional) – Force vis / weights / flags to be 3-dimensional, regardless of selection
- kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
-
file
¶ Underlying HDF5 file, exposed via
h5py
interfaceType: h5py.File
object
-
stream_name
¶ Name of L0 data stream, for finding corresponding telescope state keys
Type: string
Notes
The timestamps can be resynchronised from the original sample counter values by specifying time_scale and/or time_origin. The basic formula is given by:
timestamp = sample_counter / time_scale + time_origin
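As a worked example of the formula (all numbers hypothetical: time_scale is the ADC sample rate in samples per second, time_origin the sync time in Unix seconds):

```python
import numpy as np

# Hypothetical ADC sample counts, scale factor and epoch.
sample_counter = np.array([0.0, 1712e6, 3424e6])   # samples since sync
time_scale = 1712e6                                # samples per second
time_origin = 1556574656.0                         # sync time (Unix seconds)

# One resynthesised timestamp per second after the sync time.
timestamp = sample_counter / time_scale + time_origin
```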
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory. The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - jz)}\), i.e. phase that increases with time.
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
katdal.lazy_indexer module¶
Two-stage deferred indexer for objects with expensive __getitem__ calls.
-
katdal.lazy_indexer.
dask_getitem
(x, indices)¶ Index a dask array, with N-D fancy index support and better performance.
This is a drop-in replacement for
x[indices]
that goes one further by implementing “N-D fancy indexing”, which is still unsupported in dask. If indices contains multiple fancy indices, outer (oindex) indexing is performed. This behaviour deviates from NumPy, which performs the more general (but also more obtuse) vectorized (vindex) indexing in this case. See NumPy NEP 21, dask #433 and h5py #652 for more details. In addition, this optimises performance by culling unnecessary nodes from the dask graph after indexing, which makes it cheaper to compute if only a small piece of the graph is needed, and by collapsing fancy indices in indices to slices where possible (which also implies oindex semantics).
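The difference between outer (oindex) and vectorized (vindex) fancy indexing is easy to see in plain NumPy:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

# Vectorized (vindex) semantics: paired indices pick individual elements,
# here the elements at (0, 1) and (2, 3).
vindex = x[[0, 2], [1, 3]]

# Outer (oindex) semantics, as used by dask_getitem: the Cartesian product
# of the two index lists selects a 2x2 subarray.
oindex = x[np.ix_([0, 2], [1, 3])]
```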
-
exception
katdal.lazy_indexer.
InvalidTransform
¶ Bases:
Exception
Transform changes data shape in a disallowed way.
-
class
katdal.lazy_indexer.
LazyTransform
(name=None, transform=<function LazyTransform.<lambda>>, new_shape=<function LazyTransform.<lambda>>, dtype=None)¶ Bases:
object
Transformation to be applied by LazyIndexer after final indexing.
A
LazyIndexer
potentially applies a chain of transforms to the data after the final second-stage indexing is done. These transforms are restricted in their capabilities to simplify the indexing process. Specifically, when it comes to the data shape, transforms may only:
- add dimensions at the end of the data shape, or
- drop dimensions at the end of the data shape.
The preserved dimensions are not allowed to change their shape or interpretation so that the second-stage indexing matches the first-stage indexing on these dimensions. The data type (aka dtype) is allowed to change.
Parameters: - name (string or None, optional) – Name of transform
- transform (function, signature
data = f(data, keep)
, optional) – Transform to apply to data (keep is user-specified second-stage index) - new_shape (function, signature
new_shape = f(old_shape)
, optional) – Function that predicts data array shape tuple after first-stage indexing and transformation, given its original shape tuple as input. Restrictions apply as described above. - dtype (
numpy.dtype
object or equivalent or None, optional) – Type of output array after transformation (None if same as input array)
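The shape restrictions can be illustrated by a transform that appends a single trailing dimension, e.g. splitting complex values into real and imaginary parts. The functions below mirror the documented data = f(data, keep) and new_shape = f(old_shape) signatures but are a hypothetical pair shown with plain NumPy, not an actual LazyTransform from katdal:

```python
import numpy as np

def to_real_imag(data, keep=None):
    # Add one trailing dimension of size 2 (real and imaginary parts);
    # keep (the second-stage index) is unused in this simple example.
    return np.stack([data.real, data.imag], axis=-1)

def new_shape(old_shape):
    # Matching shape predictor: only appends to the end, as required.
    return tuple(old_shape) + (2,)

vis = np.array([[1 + 2j, 3 - 4j]])   # shape (1, 2), complex
out = to_real_imag(vis)              # shape (1, 2, 2), float
```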
-
class
katdal.lazy_indexer.
LazyIndexer
(dataset, keep=slice(None, None, None), transforms=None)¶ Bases:
object
Two-stage deferred indexer for objects with expensive __getitem__ calls.
This class was originally designed to extend and speed up the indexing functionality of HDF5 datasets as found in
h5py
, but works on any equivalent object (defined as any object with shape, dtype and __getitem__ members) where a call to __getitem__ may be very expensive. The following discussion focuses on the HDF5 use case as the main example. Direct extraction of a subset of an HDF5 dataset via the __getitem__ interface (i.e. dataset[index]) has a few issues:
- Data access can be very slow (or impossible) if a very large dataset is fully loaded into memory and then indexed again at a later stage
- Advanced indexing (via boolean masks or sequences of integer indices) is only supported on a single dimension in the current version of h5py (2.0)
- Even though advanced indexing has limited support, simple indexing (via single integer indices or slices) is frequently much faster.
This class wraps an
h5py.Dataset
or equivalent object and exposes a new __getitem__ interface on it. It efficiently composes two stages of indexing: a first stage specified at object instantiation time and a second stage that applies on top of the first stage when __getitem__ is called on this object. The data are only loaded after the combined index is determined, addressing issue 1. Furthermore, advanced indexing is allowed on any dimension by decomposing the selection as a series of slice selections covering contiguous segments of the dimension to alleviate issue 2. Finally, this also allows faster data retrieval by extracting a large slice from the HDF5 dataset and then performing advanced indexing on the resulting
numpy.ndarray
object instead, in response to issue 3. The keep parameter of the
__init__()
and __getitem__()
methods accept a generic index or slice specification, i.e. anything that would be accepted by the __getitem__()
method of a numpy.ndarray
of the same shape as the dataset. This could be a single integer index, a sequence of integer indices, a slice object (representing the colon operator commonly used with __getitem__, e.g. representing x[1:10:2] as x[slice(1,10,2)]), a sequence of booleans as a mask, or a tuple containing any number of these (typically one index item per dataset dimension). Any missing dimensions will be fully selected, and any extra dimensions will be ignored.
Parameters: - dataset (
h5py.Dataset
object or equivalent) – Underlying dataset or array object on which lazy indexing will be done. This can be any object with shape, dtype and __getitem__ members. - keep (NumPy index expression, optional) – First-stage index as a valid index or slice specification (supports arbitrary slicing or advanced indexing on any dimension)
- transforms (list of
LazyTransform
objects or None, optional) – Chain of transforms to be applied to data after final indexing. The chain as a whole may only add or drop dimensions at the end of data shape without changing the preserved dimensions.
-
name
¶ Name of HDF5 dataset (or empty string for unnamed ndarrays, etc.)
Type: string
Raises: InvalidTransform
– If transform chain does not obey restrictions on changing the data shape
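The two-stage composition can be illustrated with a minimal sketch (TwoStageIndexer is a hypothetical stand-in for illustration only; the real LazyIndexer combines both stages into a single index before touching the dataset, whereas this sketch simply applies them in sequence):

```python
import numpy as np

class TwoStageIndexer:
    """Hypothetical stand-in illustrating LazyIndexer's two-stage indexing."""
    def __init__(self, dataset, keep=()):
        self.dataset = dataset   # anything with shape, dtype and __getitem__
        self.keep = keep         # first-stage selection, stored for later

    def __getitem__(self, keep2):
        # Apply first-stage and second-stage selections in sequence
        # (the real class composes them first, so data is read only once)
        return np.asarray(self.dataset)[self.keep][keep2]

data = np.arange(100).reshape(10, 10)
indexer = TwoStageIndexer(data, keep=np.s_[2:8])  # first stage: rows 2-7
subset = indexer[::2]                             # second stage: every 2nd row
```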
-
class
katdal.lazy_indexer.
DaskLazyIndexer
(dataset, keep=(), transforms=())¶ Bases:
object
Turn a dask Array into a LazyIndexer by computing it upon indexing.
The LazyIndexer wraps an underlying dataset in the form of a dask Array. Upon first use, it applies a stored first-stage selection (keep) to the array, followed by a series of transforms. All of these actions are lazy and only update the dask graph of the dataset. Since these updates are computed only on first use, there is minimal cost in constructing an instance and immediately throwing it away again.
Second-stage selection occurs via a
__getitem__()
call on this object, which also triggers dask computation to return the final numpy.ndarray
output. Both selection steps follow outer indexing (“oindex”) semantics, by indexing each dimension / axis separately.
DaskLazyIndexers can also index other DaskLazyIndexers, which allows them to share first-stage selections and/or transforms, and to construct nested or hierarchical indexers.
Parameters: - dataset (
dask.Array
orDaskLazyIndexer
) – The full dataset, from which a subset is chosen by keep - keep (NumPy index expression, optional) – Index expression describing first-stage selection (e.g. as applied by
katdal.DataSet.select()
), with oindex semantics - transforms (sequence of function, signature
array = f(array)
, optional) – Transformations that are applied after indexing by keep but before indexing on this object. Each transformation is a callable that takes a dask array and returns another dask array.
-
name
¶ The name of the (full) underlying dataset, useful for reporting
Type: str
-
dataset
¶ The dask array that is accessed by indexing (after applying keep and transforms). It can be used directly to perform dask computations.
Type: dask.Array
-
transforms
¶ Transformations that are applied after first-stage indexing.
-
dataset
Array after first-stage indexing and transformation.
-
classmethod
get
(arrays, keep, out=None)¶ Extract several arrays from the underlying dataset.
This is a variant of
__getitem__()
that pulls from several arrays jointly. This can be significantly more efficient if intermediate dask nodes can be shared.
Parameters: - arrays (list of
DaskLazyIndexer
) – Arrays to index - keep (NumPy index expression) – Second-stage index as a valid index or slice specification (supports arbitrary slicing or advanced indexing on any dimension)
- out (list of
np.ndarray
) – If specified, output arrays in which to store results. It must be the same length as arrays and each array must have the appropriate shape and dtype.
Returns: out – Extracted output arrays (computed from the final dask graphs)
Return type: sequence of
numpy.ndarray
- arrays (list of
-
shape
¶ Shape of array after first-stage indexing and transformation.
-
dtype
¶ Data type of array after first-stage indexing and transformation.
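The outer-indexing (“oindex”) semantics can be sketched in plain NumPy (the helper below is illustrative, not katdal's implementation), contrasting it with NumPy's default fancy indexing:

```python
import numpy as np

def oindex(array, keep):
    """Illustrative outer indexing: apply each axis's selection separately."""
    out = array
    for axis, sel in enumerate(keep):
        out = out[(slice(None),) * axis + (sel,)]
    return out

a = np.arange(16).reshape(4, 4)
# NumPy fancy indexing pairs the indices: a[[0, 2], [1, 3]] -> [a[0, 1], a[2, 3]]
fancy = a[[0, 2], [1, 3]]
# Outer indexing selects the full grid of rows {0, 2} x columns {1, 3}
outer = oindex(a, ([0, 2], [1, 3]))
```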
katdal.ms_async module¶
katdal.ms_extra module¶
katdal.sensordata module¶
Container that stores cached (interpolated) and uncached (raw) sensor data.
-
class
katdal.sensordata.
SensorData
(name, timestamp, value, status=None)¶ Bases:
object
Raw (uninterpolated) sensor values.
This is a simple struct that holds timestamps, values, and optionally status.
Parameters: - name (string) – Sensor name
- timestamp (np.ndarray) – Array of timestamps
- value (np.ndarray) – Array of values (wrapped in
ComparableArrayWrapper
if necessary) - status (np.ndarray, optional) – Array of sensor statuses
-
class
katdal.sensordata.
SensorGetter
(name)¶ Bases:
object
Raw (uninterpolated) sensor data placeholder.
This is an abstract lazy interface that provides a
SensorData
object on request but does not store values itself. Subclasses must implement get()
to retrieve values from underlying storage. They should not cache the results.
Where possible, object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by
ComparableArrayWrapper
.
Parameters: name (string) – Sensor name
-
get
()¶ Retrieve the values from underlying storage.
Returns: values – Underlying data Return type: SensorData
-
-
class
katdal.sensordata.
SimpleSensorGetter
(name, timestamp, value, status=None)¶ Bases:
katdal.sensordata.SensorGetter
Raw sensor data held in memory.
This is a simple wrapper for
SensorData
that implements the SensorGetter
interface.-
get
()¶ Retrieve the values from underlying storage.
Returns: values – Underlying data Return type: SensorData
-
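The lazy-getter pattern can be sketched as follows (class names mirror the API above but the code is a self-contained illustration, not katdal's implementation):

```python
import numpy as np

class SensorDataSketch:
    """Simple struct holding raw sensor samples, like SensorData above."""
    def __init__(self, name, timestamp, value, status=None):
        self.name = name
        self.timestamp = np.asarray(timestamp)
        self.value = np.asarray(value)
        self.status = status

class SimpleSensorGetterSketch:
    """Holds raw data in memory and hands it out on get()."""
    def __init__(self, name, timestamp, value, status=None):
        self._data = SensorDataSketch(name, timestamp, value, status)

    def get(self):
        # No caching: just return the stored struct
        return self._data

getter = SimpleSensorGetterSketch('air_temperature', [0.0, 1.0], [20.5, 20.7])
raw = getter.get()
```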
-
class
katdal.sensordata.
RecordSensorGetter
(data, name=None)¶ Bases:
katdal.sensordata.SensorGetter
Raw (uninterpolated) sensor data in record array form.
This is a wrapper for uninterpolated sensor data which resembles a record array with fields ‘timestamp’, ‘value’ and optionally ‘status’. This is also the typical format of HDF5 datasets used to store sensor data.
Technically, the data is interpreted as a NumPy “structured” array, which is a simpler version of a recarray that only provides item-style access to fields and not attribute-style access.
Object-valued sensors are not treated specially in this class, as it is assumed that any wrapping already occurred in the construction of the recarray-like data input and will be reflected in its dtype. The original HDF5 sensor datasets also did not contain any objects as they only support standard KATCP types, so there was no need for wrapping there.
Parameters: - data (recarray-like, with fields 'timestamp', 'value' and optionally 'status') – Uninterpolated sensor data as structured array or equivalent (such as
an
h5py.Dataset
) - name (string or None, optional) – Sensor name (assumed to be data.name by default, if it exists)
- data (recarray-like, with fields 'timestamp', 'value' and optionally 'status') – Uninterpolated sensor data as structured array or equivalent (such as
an
-
katdal.sensordata.
to_str
(value)¶ Convert string-likes to the native string type.
Bytes are decoded to str, using the surrogateescape error handler.
Tuples, lists, dicts and numpy arrays are processed recursively, with the exception that numpy structured types with string or object fields won’t be handled.
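A sketch of this behaviour (covering only bytes, lists, tuples and dicts; the real to_str() also recurses into numpy arrays, and the surrogateescape handler is an assumption here):

```python
def to_str_sketch(value):
    """Recursively decode bytes to str, as to_str() is described above."""
    if isinstance(value, bytes):
        return value.decode('utf-8', errors='surrogateescape')
    if isinstance(value, (list, tuple)):
        return type(value)(to_str_sketch(v) for v in value)
    if isinstance(value, dict):
        return {to_str_sketch(k): to_str_sketch(v) for k, v in value.items()}
    return value
```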
-
katdal.sensordata.
telstate_decode
(raw, no_decode=())¶ Load a katsdptelstate-encoded value that might be wrapped in np.void or np.ndarray.
The np.void/np.ndarray wrapping is needed to pass variable-length binary strings through h5py.
If the value is a string and is in no_decode, it is returned verbatim. This is for backwards compatibility with older files that didn’t use any encoding at all.
The return value is also passed through
to_str()
.
-
class
katdal.sensordata.
H5TelstateSensorGetter
(data, name=None)¶ Bases:
katdal.sensordata.RecordSensorGetter
Raw (uninterpolated) sensor data in HDF5 TelescopeState recarray form.
This wraps the telstate sensors stored in recent HDF5 files. It differs in two ways from the normal HDF5 sensors: no ‘status’ field and values encoded by katsdptelstate.
TODO: This is a temporary fix to get at missing sensors in telstate and should be replaced by a proper wrapping of any telstate object.
Object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by
ComparableArrayWrapper
.
Parameters: - data (recarray-like, with fields ('timestamp', 'value')) – Uninterpolated sensor data as structured array or equivalent (such as
an
h5py.Dataset
) - name (string or None, optional) – Sensor name (assumed to be data.name by default, if it exists)
-
get
()¶ Extract timestamp and value of each sensor data point.
- data (recarray-like, with fields ('timestamp', 'value')) – Uninterpolated sensor data as structured array or equivalent (such as
an
-
class
katdal.sensordata.
TelstateToStr
(telstate)¶ Bases:
object
Wrap an existing telescope state and pass return values through
to_str()
-
wrapped
¶
-
view
(name, add_separator=True, exclusive=False)¶
-
root
()¶
-
get_message
(channel=None)¶
-
get
(key, default=None, return_encoded=False)¶
-
get_range
(key, st=None, et=None, include_previous=None, include_end=False, return_encoded=False)¶
-
get_indexed
(key, sub_key, default=None, return_encoded=False)¶
-
-
class
katdal.sensordata.
TelstateSensorGetter
(telstate, name)¶ Bases:
katdal.sensordata.SensorGetter
Raw (uninterpolated) sensor data stored in original TelescopeState.
This wraps sensor data stored in a TelescopeState object. The data is only read out on item access.
Object-valued sensors (including sensors with ndarrays as values) will have their values wrapped by
ComparableArrayWrapper
.
Parameters: - telstate (
katsdptelstate.TelescopeState
object) – Telescope state object - name (string) – Sensor name, also used as telstate key
Raises: KeyError
– If sensor name is not found in telstate or it is an attribute instead-
get
()¶ Retrieve the values from underlying storage.
Returns: values – Underlying data Return type: SensorData
- telstate (
-
katdal.sensordata.
get_sensor_from_katstore
(store, name, start_time, end_time)¶ Get raw sensor data from katstore (CAM’s central sensor database).
Parameters: - store (string) – Hostname / endpoint of katstore webserver speaking katstore64 API
- name (string) – Sensor name (the normalised / escaped version with underscores)
- start_time, end_time (float) – Time range for sensor records as UTC seconds since Unix epoch
Returns: data – Retrieved sensor data with ‘timestamp’, ‘value’ and ‘status’ fields
Return type: RecordSensorGetter
object
Raises: ConnectionError
– If the function cannot connect to the katstore server
RuntimeError
– If the connection succeeded but interaction with the katstore64 API failed
KeyError
– If the sensor was not found in the store or it has no data in the time range
-
katdal.sensordata.
dummy_sensor_getter
(name, value=None, dtype=<class 'numpy.float64'>, timestamp=0.0)¶ Create a SensorGetter object with a single default value based on type.
This creates a dummy
SimpleSensorGetter
object based on a default value or a type, for use when no sensor data are available but filler data is required (e.g. when concatenating sensors from different datasets and one dataset lacks the sensor). The dummy dataset contains a single data point with the filler value and a configurable timestamp (defaulting to way back). If the filler value is an object, it will be wrapped in a ComparableArrayWrapper
to match the behaviour of other SensorGetter
objects.
Parameters: - name (string) – Sensor name
- value (object, optional) – Filler value (default is None, meaning dtype will be used instead)
- dtype (
numpy.dtype
object or equivalent, optional) – Desired sensor data type, used if no explicit value is given - timestamp (float, optional) – Time when dummy value occurred (default is way back)
Returns: data – Dummy sensor data object with ‘timestamp’ and ‘value’ fields
Return type: SimpleSensorGetter
object, shape (1,)
-
katdal.sensordata.
remove_duplicates_and_invalid_values
(sensor)¶ Remove duplicate timestamps and invalid values from sensor data.
This sorts the ‘timestamp’ field of the sensor record array and removes any duplicate timestamps, updating the corresponding ‘value’ and ‘status’ fields as well. If a timestamp occurs more than once, the value and status of its last occurrence are selected. If the values differ for the same timestamp, a warning is logged (and the last one is still picked).
In addition, if there is a ‘status’ field, discard data points with a status other than ‘nominal’, ‘warn’ or ‘error’, which indicates that the sensor could not be read and the corresponding value is therefore invalid. Afterwards, remove the ‘status’ field from the data, as this is the only place where it plays a role.
Parameters: sensor ( SensorData
object, length N) – Raw sensor dataset.Returns: clean_sensor – Sensor data with duplicate timestamps and invalid values removed (M <= N), and only ‘timestamp’ and ‘value’ attributes left. Return type: SensorData
object, length M
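The cleanup steps can be sketched with plain NumPy (an illustration of the rules above, not katdal's implementation):

```python
import numpy as np

def clean_sensor_sketch(timestamp, value, status=None):
    """Sort by timestamp, drop invalid statuses, keep the last duplicate."""
    order = np.argsort(timestamp, kind='stable')
    timestamp = np.asarray(timestamp)[order]
    value = np.asarray(value)[order]
    if status is not None:
        # Only these statuses indicate a successfully read sensor value
        good = np.isin(np.asarray(status)[order], ['nominal', 'warn', 'error'])
        timestamp, value = timestamp[good], value[good]
    # A timestamp that differs from its successor is the last of its duplicates
    keep = np.r_[np.diff(timestamp) > 0, True]
    return timestamp[keep], value[keep]

t, v = clean_sensor_sketch([1.0, 1.0, 2.0], [10.0, 11.0, 12.0],
                           ['nominal', 'nominal', 'unreachable'])
```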
-
class
katdal.sensordata.
SensorCache
(cache, timestamps, dump_period, keep=slice(None, None, None), props=None, virtual={}, aliases={}, store=None)¶ Bases:
collections.abc.MutableMapping
Container for sensor data providing name lookup, interpolation and caching.
Sensor data is defined as a one-dimensional time series of values. The values may be numerical or non-numerical (categorical), and the timestamps are monotonically increasing but not necessarily regularly spaced.
A sensor cache stores sensor data with dictionary-like lookup based on the sensor name. Since the extraction of sensor data from e.g. HDF5 files may be costly, the data is first represented in uncached (raw) form as
SensorGetter
objects, which typically wrap the underlying sensor HDF5 datasets. After extraction, the sensor data are stored either as a NumPy array (for numerical data) or as a CategoricalData
object (for non-numerical data).
The sensor cache stores a timestamp array (or indexer) onto which the sensor data will be interpolated, together with a boolean selection mask that selects a subset of the interpolated data as the final output. Interpolation is linear for numerical data and zeroth-order for non-numerical data. Both extraction and selection may be enabled or disabled through the appropriate use of the two main interfaces that retrieve sensor data:
- The __getitem__ interface (i.e. cache[sensor]) presents a simple high-level interface to the end user that always extracts the sensor data and selects the requested subset from it. In addition, the return type is always a NumPy array.
- The get() interface (i.e. cache.get(sensor)) is an advanced interface for library builders that provides full control of the extraction process via sensor properties. It does not apply selection by default, as this is more convenient for library routines.
In addition, the sensor cache may contain virtual sensors which calculate their values based on the values of other sensors. They are identified by pattern templates that potentially match multiple sensor names.
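The two interpolation modes can be illustrated with plain NumPy (the sensor timestamps and values here are made up):

```python
import numpy as np

sensor_times = np.array([0.0, 10.0, 20.0])   # raw sensor timestamps
dump_times = np.array([5.0, 15.0])           # correlator dump timestamps

# Numerical data: linear interpolation
numeric_vals = np.array([1.0, 2.0, 4.0])
linear = np.interp(dump_times, sensor_times, numeric_vals)

# Categorical data: zeroth-order interpolation (previous sensor value holds)
categorical_vals = np.array(['slew', 'track', 'stop'])
prev = np.searchsorted(sensor_times, dump_times, side='right') - 1
zeroth = categorical_vals[prev]
```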
Parameters: - cache (mapping from string to
SensorGetter
objects) – Initial sensor cache mapping sensor names to raw (uncached) sensor data - timestamps (array of float) – Correlator data timestamps onto which sensor values will be interpolated, as UTC seconds since Unix epoch
- dump_period (float) – Dump period, in seconds
- keep (int or slice or sequence of int or sequence of bool, optional) – Default time selection specification that will be applied to sensor data (this can be disabled on data retrieval)
- props (dict, optional) – Default properties that govern how sensor data are interpreted and
interpolated (this can be overridden on data retrieval). Can use
*
as a wildcard anywhere in the key. - virtual (dict mapping string to function, optional) – Virtual sensors, specified as a pattern matching the virtual sensor name and a corresponding function that will create the sensor (together with any associated virtual sensors)
- aliases (dict mapping string to string, optional) – Alternate names for sensors, as a dictionary mapping each alias to the original sensor name suffix. This will create additional sensors with the aliased names and the data of the original sensors.
- store (string, optional) – Hostname / endpoint of katstore webserver to access additional sensors
-
add_aliases
(alias, original)¶ Add alternate names / aliases for sensors.
Search for sensors with names ending in the original suffix and form a corresponding alternate name by replacing original with alias. The new aliased sensors will re-use the data of the original sensors.
Parameters: - alias (string) – The new sensor name suffix that replaces original
- original (string) – Sensors with names that end in this will get aliases
-
get
(name, select=False, extract=True, **kwargs)¶ Sensor values interpolated to correlator data timestamps.
Time selection is disabled by default, as this is a more advanced data extraction method typically called by library routines that want to operate on the full array of sensor values. For additional allowed parameters when extracting categorical data, see the docstring for
sensor_to_categorical()
.Parameters: - name (string) – Sensor name
- select ({False, True}, optional) – True if preset time selection will be applied to interpolated data
- extract ({True, False}, optional) – True if sensor data should be extracted, interpolated and cached
- categorical ({None, True, False}, optional) – Interpret sensor data as categorical or numerical (by default, data of type float is numerical and of any other type is categorical)
- kwargs (dict, optional) – Additional parameters are passed to
sensor_to_categorical()
Returns: data – If extraction is disabled, this will be a
SensorGetter
object for uncached sensors. If selection is enabled, this will be a 1-D array of values, one per selected timestamp. If selection is disabled, this will be a 1-D array of values (of the same length as thetimestamps
attribute) for numerical data, and aCategoricalData
object for categorical data.Return type: array or
CategoricalData
orSensorGetter
objectRaises: ValueError
– If select=True and extract=False, as select requires interpolationKeyError
– If sensor name was not found in cache and did not match virtual template
-
get_with_fallback
(sensor_type, names)¶ Sensor values interpolated to correlator data timestamps.
Get data for a type of sensor that may have one of several names. Try each name in turn until something works, or crash sensibly.
Parameters: - sensor_type (string) – Name of sensor class / type, used for informational purposes only
- names (sequence of strings) – Sensor names to try until one of them provides data
Returns: sensor_data – Interpolated sensor data as 1-D array, one value per selected timestamp
Return type: array
Raises: KeyError
– If none of the sensor names were found in the cache
katdal.spectral_window module¶
-
class
katdal.spectral_window.
SpectralWindow
(centre_freq, channel_width, num_chans, product=None, sideband=-1, band='L', bandwidth=None)¶ Bases:
object
Spectral window specification.
A spectral window is determined by the number of frequency channels produced by the correlator and their corresponding centre frequencies, as well as the channel width. The channels are assumed to be regularly spaced and to be the result of either lower-sideband downconversion (channel frequencies decreasing with channel index) or upper-sideband downconversion (frequencies increasing with index). For further information the receiver band and correlator product names are also available.
Warning
Instances should be treated as immutable. Changing the attributes will lead to inconsistencies between them.
Parameters: - centre_freq (float) – Centre frequency of spectral window, in Hz
- channel_width (float) – Bandwidth of each frequency channel, in Hz
- num_chans (int) – Number of frequency channels
- product (string, optional) – Name of data product / correlator mode
- sideband ({-1, +1}, optional) – Type of downconversion (-1 => lower sideband, +1 => upper sideband)
- band ({'L', 'UHF', 'S', 'X', 'Ku'}, optional) – Name of receiver / band
- bandwidth (float, optional) – The bandwidth of the whole spectral window, in Hz. If specified, the given channel_width is ignored and instead computed from the bandwidth. If not specified, the bandwidth is computed from the channel width. Specifying the bandwidth is a good idea if the channel width cannot be represented exactly in floating point.
-
channel_freqs
¶ Centre frequency of each frequency channel (assuming LSB mixing), in Hz
Type: array of float, shape (F,)
-
subrange
(first, last)¶ Get a new
SpectralWindow
representing a subset of the channels.
The returned
SpectralWindow
covers the same frequencies as channels [first, last) of the original.
Raises: IndexError
– If [first, last) is not a (non-empty) subinterval of the channels
-
rechannelise
(num_chans)¶ Get a new
SpectralWindow
with a different number of channels.
The returned
SpectralWindow
covers the same frequencies as the original, but dividing the bandwidth into a different number of channels.
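A sketch of how regularly spaced channel centre frequencies could be derived from these attributes (the exact offset convention is an assumption for illustration, not katdal's code):

```python
import numpy as np

def channel_freqs_sketch(centre_freq, channel_width, num_chans, sideband=-1):
    """Regularly spaced channels; LSB (-1) makes frequency decrease with index."""
    offsets = np.arange(num_chans) - num_chans // 2
    return centre_freq + sideband * channel_width * offsets

# Hypothetical L-band example: 5 channels of 1 MHz around 1284 MHz, LSB
freqs = channel_freqs_sketch(1284e6, 1e6, 5, sideband=-1)
```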
katdal.visdatav4 module¶
Data accessor class for data and metadata from various sources in v4 format.
-
class
katdal.visdatav4.
VisibilityDataV4
(source, ref_ant='', time_offset=0.0, applycal='', gaincal_flux={}, sensor_store=None, preselect=None, **kwargs)¶ Bases:
katdal.dataset.DataSet
Access format version 4 visibility data and metadata.
For more information on attributes, see the
DataSet
docstring.
Parameters: - source (
DataSource
object) – Correlator data (visibilities, flags and weights) and metadata - ref_ant (string, optional) – Name of reference antenna, used to partition data set into scans, to determine the targets and as antenna for the data set catalogue (no relation to the calibration reference antenna…). The default is to use the observation activity sensor for scan partitioning, the CBF target and the array reference position as catalogue antenna.
- time_offset (float, optional) – Offset to add to all correlator timestamps, in seconds
- applycal (string or sequence of strings, optional) – List of names of calibration products to apply to vis/weights/flags, as a sequence or string of comma-separated names. An empty string or sequence means no calibration will be applied (the default for now), while the keyword ‘all’ means all available products will be applied. NB In future the default will probably change to ‘all’. NB This is still very much an experimental feature…
- gaincal_flux (dict mapping string to float, optional) – Flux density (in Jy) per gaincal target name, used to flux calibrate the “G” product, overriding the measured flux produced by cal pipeline (if available). A value of None disables flux calibration.
- sensor_store (string, optional) – Hostname / endpoint of katstore webserver to access additional sensors
- preselect (dict, optional) – Subset of the data to select. See
TelstateDataSource
for details. This selection is permanent, and further selections made by DataSet.select()
are relative to this subset. - kwargs (dict, optional) – Extra keyword arguments, typically meant for other formats and ignored
-
timestamps
¶ Visibility timestamps in UTC seconds since Unix epoch.
The timestamps are returned as an array of float64, shape (T,), with one timestamp per integration aligned with the integration midpoint.
-
vis
¶ Complex visibility data as a function of time, frequency and baseline.
The visibility data are returned as an array indexer of complex64, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The returned array always has all three dimensions, even for scalar (single) values. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
The sign convention of the imaginary part is consistent with an electric field of \(e^{i(\omega t - kz)}\), i.e. phase that increases with time.
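The deferred-loading behaviour of these indexers can be sketched with a toy stand-in (IndexerSketch is hypothetical; the real katdal indexers are the lazy indexer objects described earlier):

```python
import numpy as np

class IndexerSketch:
    """Toy array indexer: data is materialised only when indexed."""
    def __init__(self, loader, shape):
        self._loader = loader   # callable that produces the full ndarray
        self.shape = shape

    def __getitem__(self, keep):
        return self._loader()[keep]   # load happens here, not at construction

# Pretend (T, F, B) = (2, 4, 3) visibilities backed by a cheap loader
vis = IndexerSketch(lambda: np.zeros((2, 4, 3), np.complex64), (2, 4, 3))
first_dump = vis[0]    # triggers the load; shape (F, B) = (4, 3)
everything = vis[:]    # full (T, F, B) array
```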
-
weights
¶ Visibility weights as a function of time, frequency and baseline.
The weights data are returned as an array indexer of float32, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
flags
¶ Flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of bool, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
raw_flags
¶ Raw flags as a function of time, frequency and baseline.
The flags data are returned as an array indexer of uint8, shape (T, F, B), with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
excision
¶ Excision as a function of time, frequency and baseline.
The fraction of each visibility that has been excised in the SDP ingest pipeline is returned as an array indexer of bool, shape (T, F, B) with time along the first dimension, frequency along the second dimension and correlation product (“baseline”) index along the third dimension. The number of integrations T matches the length of
timestamps()
, the number of frequency channels F matches the length of freqs()
and the number of correlation products B matches the length of corr_products()
. To get the data array itself from the indexer x, do x[:] or perform any other form of indexing on it. Only then will data be loaded into memory.
-
temperature
¶ Air temperature in degrees Celsius.
-
pressure
¶ Barometric pressure in millibars.
-
humidity
¶ Relative humidity as a percentage.
-
wind_speed
¶ Wind speed in metres per second.
-
wind_direction
¶ Wind direction as an azimuth angle in degrees.
- source (
Module contents¶
Data access library for data sets in the MeerKAT Visibility Format (MVF).
-
katdal.
open
(filename, ref_ant='', time_offset=0.0, **kwargs)¶ Open data file(s) with loader of the appropriate version.
Parameters: - filename (string or sequence of strings) – Data file name or list of file names
- ref_ant (string, optional) – Name of reference antenna (default is first antenna in use)
- time_offset (float, optional) – Offset to add to all timestamps, in seconds
- kwargs (dict, optional) –
Extra keyword arguments are passed on to underlying accessor class:
- mode (string, optional)
- [H5DataV*] File opening mode (e.g. ‘r+’ to open file in write mode)
- quicklook (bool)
- [H5DataV2] True if synthesised timestamps should be used to partition data set even if real timestamps are irregular, thereby avoiding the slow loading of real timestamps at the cost of slightly inaccurate label borders
See the documentation of
VisibilityDataV4
for the keywords it accepts.
Returns: data – Object providing
DataSet
interface to file(s)
Return type: DataSet
object
-
katdal.
get_ants
(filename)¶ Quick look function to get the list of antennas in a data file.
Parameters: filename (string) – Data file name Returns: antennas Return type: list of katpoint.Antenna
objects
-
katdal.
get_targets
(filename)¶ Quick look function to get the list of targets in a data file.
Parameters: filename (string) – Data file name Returns: targets – All targets in file Return type: katpoint.Catalogue
object