Observations¶
The Observation
class, is the container for the data acquired by the
telescope during a scanning period (and the relevant information about it).
Serial applications¶
In a serial code Observation
is equivalent to an empty class in which you
can put anything:
import litebird_sim as lbs
import numpy as np
obs = lbs.Observation(
detectors=2,
start_time_global=0.0,
sampling_rate_hz=5.0,
n_samples_global=5,
)
obs.my_new_attr = 'value'
setattr(obs, 'another_new_attr', 'another value')
Across the framework, the coherence in the names and content of the
attributes is guaranteed by convention (no check is done by the
Observation
class). At the moment, the most common field
that you can find in the class are the following, assuming that it is
meant to collect \(N\) samples for \(n_d\) detectors:
Observation.tod
is initialized when you callSimulation.create_observations()
. It’s a 2D array of shape \((n_d, N)\). This means thatObservation.tod[0]
is the time stream of the first detector, andobs.tod[:, 0]
are the first time samples of all the detectors.You can create other TOD-like arrays through the parameter
tods
; it accepts a list ofTodDescription
objects that specify the name of the field used to store the 2D array, a textual description, and the value fordtype
. (By default, thetod
field uses 32-bit floating-point numbers.) Here is an example:sim.create_observations( detectors=[det1, det2, det3], tods=[ lbs.TodDescription( name="tod", description="TOD", dtype=np.float64, ), lbs.TodDescription( name="noise", description="1/f+white noise", dtype=np.float32 ), ], ) for cur_obs in sim.observations: print("Shape of 'tod': ", cur_obs.tod.shape) print("Shape of 'noise': ", cur_obs.noise.shape)
Observation.pointings
is initialized when you callSimulation.compute_pointings()
. It is a 3-rank tensor of shape \((n_d, N, 2)\), where the last rank collects the two pointing angles colatitude (θ) and longitude (φ), expressed in radians.Observation.pointing_coords
is initialized together withObservation.pointings
; it is a value of typeCoordinateSystem
, and it identifies the coordinate system used to express the pointing angles.Observation.psi
is a \((n_d, N)\) matrix containing the polarization angles (in radians), expressed with respect to the celestial North (ψ). It’s initialized together withObservation.pointings
.Observation.local_flags
is a \((n_d, N)\) matrix containing flags for the \(n_d\) detectors. These flags are typically associated to peculiarities in the single detectors, like saturations or mis-calibrations.Observation.global_flags
is a vector of \(N\) elements containing flags that must be associated with all the detectors in the observation.
Keep in mind that the general rule is that detector-specific
attributes are collected in arrays. Thus, obs.calibration_factors
should be a 1-D array of \(n_d\) elements (one per each detector).
With this memory layout, typical operations look like this:
# Collect detector properties in arrays
obs.calibration_factors = np.array([1.1, 1.2])
obs.wn_levels = np.array([2.1, 2.2])
# Apply to each detector its own calibration factor
obs.tod *= obs.calibration_factors[:, None]
# Add white noise at a different level for each detector
obs.tod += (np.random.normal(size=obs.tod.shape)
* obs.wn_levels[:, None])
Parallel applications¶
The only work that the Observation
class actually does is handling
parallelism. obs.tod
can be distributed over a
n_blocks_det
by n_blocks_time
grid of MPI ranks. The blocks can be
changed at run-time.
The coherence between the serial and parallel operations is achieved by
distributing also the arrays of detector properties:
each rank keeps in memory only an interval of calibration_factors
or
wn_levels
, the same detector interval of tod
.
The main advantage is that the example operations in the Serial section are achieved with the same lines of code. The price to pay is that you have to set detector properties with special methods.
import litebird_sim as lbs
from mpi4py import MPI
comm = MPI.COMM_WORLD
obs = lbs.Observation(
detectors=2,
start_time_global=0.0,
sampling_rate_hz=5.0,
n_samples_global=5,
n_blocks_det=2, # Split the detector axis in 2
comm=comm # across the processes of this communicator
)
# Add detector properties either with a global array that contains
# all the detectors (which will be scattered across the processor grid)
obs.setattr_det_global('calibration_factors', np.array([1.1, 1.2]))
# Or with the local array, if somehow you have it already
if comm.rank == 0:
wn_level_local = np.array([2.1])
elif comm.rank == 1:
wn_level_local = np.array([2.2])
else:
wn_level_local = np.array([])
obs.setattr_det('wn_levels', wn_level_local)
# Operate on the local portion of the data just like in serial code
# Apply to each detector its own calibration factor
obs.tod *= obs.calibration_factors[:, None]
# Add white noise at a different level for each detector
obs.tod += (np.random.normal(size=obs.tod.shape)
* obs.wn_levels[:, None])
# Change the data distribution
obs.set_blocks(n_blocks_det=1, n_blocks_time=1)
# Now the rank 0 has exactly the data of the serial obs object
For clarity, here is a visualization of how data (a detector attribute and the TOD) gets distributed.
When n_blocks_det != 1
, keep in mind that obs.tod[0]
or
obs.wn_levels[0]
are quantities of the first local detector, not global.
This should not be a problem as the only thing that matters is that the two
quantities refer to the same detector. If you need the global detector index,
you can get it with obs.det_idx[0]
, which is created
at construction time.
To get a better understanding of how observations are being used in a
MPI simulation, use the method Simulation.describe_mpi_distribution()
.
This method must be called after the observations have been allocated using
Simulation.create_observations()
; it will return an instance of the
class MpiDistributionDescr
, which can be inspected to determine
which detectors and time spans are covered by each observation in all the
MPI processes that are being used. For more information, refer to the Section
Simulations.
Other notable functionalities¶
The starting time can be represented either as floating-point values (appropriate in 99% of the cases) or MJD; in the latter case, it is handled through the AstroPy library.
Instead of adding detector attributes after construction, you can pass a list of dictionaries (one entry for each detectors). One attribute is created for every key.
import litebird_sim as lbs
from astropy.time import Time
# Second case: use MJD to track the time
obs_mjd = lbs.Observation(
detectors=[{"name": "A"}, {"name": "B"}]
start_time_global=Time("2020-02-20", format="iso"),
sampling_rate_hz=5.0,
nsamples_global=5,
)
obs.name == np.array(["A", "B"]) # True
Reading/writing observations to disk¶
The framework implements a couple of functions to write/read
Observation
objects to disk, using the HDF5 file format. By
default, each observation is saved in a separate HDF5 file; the
following information are saved and restored:
Whether times are tracked as floating-point numbers or proper AstroPy dates;
The TOD matrix (in
.tod
);Any pointing information stored in
.pointings
(the matrix containing the colatitude and longitude of the direction of the main beam axis),.psi
(the polarization angle), and.pixidx
(the index of the pixel in the Healpix pixelization scheme).Global and local flags saved in
.global_flags
and.local_flags
(see below).
The function used to save observations is Simulation.write_observations()
,
which acts on a Simulation
object; if you prefer to
operate without a Simulation
object, you can call
write_list_of_observations()
.
To read observations, you can use Simulation.read_observations()
and
read_list_of_observations()
.
Flags¶
The LiteBIRD Simulation Framework permits to associate flags with the scientific samples in a TOD. These flags are usually unsigned integer numbers that are treated like bitmasks that signal peculiarities in the data. They are especially useful when dealing with data that have been acquired by some real instrument to signal if the hardware was malfunctioning or if some disturbance was recorded by the detectors. Other possible applications are possible:
Marking samples that should be left out of map-making (e.g., because a planet or some other transient source was being observed by the detector).
Flagging potentially interesting observations that should be used in data analysis (e.g., observations of the Crab nebula that are considered good enough for polarization angle calibration).
Similarly to other frameworks like TOAST, the LiteBIRD Simulation
Framework lets to store both «global» and «local» flags associated
with the scientific samples in TODs. Global flags are associated with
all the samples in an Observation, and they must be a vector of M
elements, where M
is the number of samples in the TOD. Local
samples are stored in a matrix of shape N × M
, where N
is the
number of detectors in the observation.
The framework encourages to store these flags in the fields
local_flags
and global_flags
: in this way, they can be saved
correctly in HDF5 files by functions like write_observations()
.
API reference¶
- class litebird_sim.observations.Observation(detectors: int | List[dict], n_samples_global: int, start_time_global: float | Time, sampling_rate_hz: float, allocate_tod=True, tods=None, n_blocks_det=1, n_blocks_time=1, comm=None, root=0)¶
Bases:
object
An observation made by one or multiple detectors over some time window
After construction at least the following attributes are available
start_time()
n_samples()
tod()
2D array (n_detectors by n_samples) stacking the times streams of the detectors.
- A note for MPI-parallel application: unless specified, all the
variables are local. Should you need the global counterparts, 1) think twice, 2) append _global to the attribute name, like in the following:
start_time_global()
end_time_global()
n_detectors_global()
~ n_detectors * n_blocks_detn_samples_global()
~ n_samples * n_blocks_time
Following the same philosophy,
get_times()
returns the time stamps of the local time interval- Parameters:
detectors (int/list of dict) – Either the number of detectors or a list of dictionaries with one entry for each detector. The keys of the dictionaries will become attributes of the observation. If a detector is missing a key it will be set to
nan
. If an MPI communicator is passed tocomm
,detectors
is relevant only for theroot
processn_samples_global (int) – The number of samples in this observation.
start_time_global – Start time of the observation. It can either be a astropy.time.Time type or a floating-point number. In the latter case, it must be expressed in seconds.
sampling_rate_hz (float) – The sampling frequency, in Hertz.
dtype_tod (dtype) – Data type of the TOD array. Use it to balance numerical precision and memory consumption.
n_blocks_det (int) – divide the detector axis of the tod (and all the arrays of detector attributes) in n_blocks_det blocks
n_blocks_time (int) – divide the time axis of the tod in n_blocks_time blocks
comm – either None (do not use MPI) or a MPI communicator object, like mpi4py.MPI.COMM_WORLD. Its size is required to be at least n_blocks_det times n_blocks_time
root (int) – rank of the process receiving the detector list, if
detectors
is a list of dictionaries, otherwise it is ignored.
- get_delta_time() float | TimeDelta ¶
Return the time interval between two consecutive samples in this observation
Depending whether the field
start_time
of theObservation
object is afloat
or aastropy.time.Time
object, the return value is either afloat
(in seconds) or an instance ofastropy.time.TimeDelta
. See alsoget_time_span()
.
- get_det2ecl_quaternions(spin2ecliptic_quats: ~litebird_sim.scanning.Spin2EclipticQuaternions, detector_quats, bore2spin_quat, quaternion_buffer=None, dtype=<class 'numpy.float64'>)¶
Deprecated: see scanning.get_det2ecl_quaternions
- get_ecl2det_quaternions(spin2ecliptic_quats: ~litebird_sim.scanning.Spin2EclipticQuaternions, detector_quats, bore2spin_quat, quaternion_buffer=None, dtype=<class 'numpy.float64'>)¶
Deprecated: see scanning.get_ecl2det_quaternions
- get_quaternion_buffer_shape(num_of_detectors=None)¶
Deprecated: see scanning.get_quaternion_buffer_shape
- get_time_span() float | TimeDelta ¶
Return the temporal length of the current observation
This method can either return a
float
(in seconds) or aastropy.time.TimeDelta
object, depending whether the fieldstart_time
of theObservation
object is afloat
or aastropy.time.Time
instance. See alsoget_delta_time()
.
- get_times(normalize=False, astropy_times=False)¶
Return a vector containing the time of each sample in the observation
The measure unit of the result depends on the value of astropy_times: if it’s true, times are returned as astropy.time.Time objects, which can be converted to several units (MJD, seconds, etc.); if astropy_times is false (the default), times are expressed in seconds. In the latter case, you should interpret these times as sidereal.
If normalize=True, then the first time is zero. Setting this flag requires that astropy_times=False.
This can be a costly operation; you should cache this result if you plan to use it in your code, instead of calling this method over and over again.
Note for MPI-parallel codes: the times returned are only those of the local portion of the data. This means that
the size of the returned array is
n_samples
, smaller thann_samples_global
whenever there is more than one time-blockself.tod * self.get_times()
is a meaningless but always allowed operation
- property n_blocks_det¶
- property n_blocks_time¶
- property n_detectors¶
- property n_detectors_global¶
Total number of detectors in the observation
If you need the number of detectors in the local TOD block
self.tod
, use eithern_detectors
orself.tod.shape[0]
.
- property n_samples_global¶
Samples in the whole observation
If you need the time-lenght of the local TOD block
self.tod
, use eithern_samples
orself.tod.shape[1]
.
- property sampling_rate_hz¶
- set_n_blocks(n_blocks_det=1, n_blocks_time=1)¶
Change the number of blocks
- Parameters:
n_blocks_det (int) – new number of blocks in the detector direction
n_blocks_time (int) – new number of blocks in the time direction
- setattr_det(name, info)¶
Add a piece of information about the detectors
Store
info
as the attributename
of the observation. The difference with respect toself.name = info
, relevant only in MPI programs, areinfo
is assumed to contain a number of elements equal toself.tod.shape[0]
and to have the same order (info[i]
is a property ofself.tod[i]
)When changing
n_blocks_det
,set_n_blocks()
is aware ofname
and will redistributeinfo
in such a way thatself.name[i]
is a property ofself.tod[i]
in the new block distribution
- Parameters:
name (str) – Name of the detector information
info (array) – Information to be stored in the attribute
name
. The array must contain one entry for each local detector.
- setattr_det_global(name, info, root=0)¶
Add a piece of information on the detectors
Variant of
setattr_det()
to be used when the information comes from a single MPI rank (root
). In particular,In the
root
process,info
is required to haven_detectors_global
elements (notself.tod.shape[1]
). For other processes info is irrelevantinfo
is scattered from theroot
rank to the relevant processes inself.comm
so thatself.name
will haveself.tod.shape[0]
elements on all the processes andself.name[i]
is a property ofself.tod[i]
When changing
n_blocks_det
,set_n_blocks()
is aware ofname
and will redistributeinfo
in such a way thatself.name[i]
is a property ofself.tod[i]
in the new block distribution
- Parameters:
name (str) – Name of the information
info (array) – Array containing
n_detectors_global
entries. Relevant only for thrroot
processroot (int) – Rank of the root process
- class litebird_sim.observations.TodDescription(name: str, dtype: Any, description: str)¶
Bases:
object
A brief description of a TOD held in a
Observation
objectThis field is used to pass information about one TOD in a
Observation
object. It is mainly used by the methodSimulation.create_observation()
to figure out how much memory allocate and how to organize it.The class contains three fields:
name (a
str
): the name of the field to be created within eachObservation
object.dtype (the NumPy type to use, e.g.,
numpy.float32
)description (a
str
): human-readable description
- description: str¶
- dtype: Any¶
- name: str¶
- litebird_sim.io.read_list_of_observations(file_name_list: ~typing.List[str | ~pathlib.Path], tod_dtype=<class 'numpy.float32'>, pointings_dtype=<class 'numpy.float32'>, limit_mpi_rank: bool = True, tod_fields: ~typing.List[str | ~litebird_sim.observations.TodDescription] = ['tod']) List[Observation] ¶
Read a list of HDF5 files containing TODs and return a list of observations
The function reads all the HDF5 files listed in file_name_list (either a list of strings or
pathlib.Path
objects) and assigns them to the various MPI processes that are currently running, provided that limit_mpi_rank isTrue
; otherwise, all the files are read by the current process and returned in a list. By default, only thetod
field is loaded; if the HDF5 file contains multiple TODs, you must load each of them.When using MPI, the observations are distributed among the MPI processes using the same layout that was used to save them; this means that you are forced to use the same number of processes you used when saving the files. This number is saved in the attribute
mpi_size
in each of the HDF5 files.If the HDF5 file contains more than one TOD, e.g., foregrounds, dipole, noise…, you can specify which datasets to load with
tod_fields
(a list of strings orTodDescription
objects), which defaults to["tod"]
. Each dataset will be initialized as a member field of theObservation
class, likeObservation.tod
.
- litebird_sim.io.read_observations(sim, path: str | Path = None, subdir_name: None | str = 'tod', *args, **kwargs)¶
Deprecated since version 0.11: Use Simulation.read_observations
- litebird_sim.io.read_one_observation(path: str | ~pathlib.Path, limit_mpi_rank=True, tod_dtype=<class 'numpy.float32'>, pointings_dtype=<class 'numpy.float32'>, read_pointings_if_present=True, read_pixidx_if_present=True, read_psi_if_present=True, read_global_flags_if_present=True, read_local_flags_if_present=True, tod_fields: ~typing.List[str] = ['tod']) Observation | None ¶
Read one
Observation
object from a HDF5 file.This is a low-level function that is wrapped by
read_list_of_observations()
andread_observations()
. It returns aObservation
object filled with the data read from the HDF5 file pointed by path.If limit_mpi_rank is
True
(the default), the function makes sure that the rank of the MPI process reading this file is the same as the rank of the process that originally wrote it.The flags tod_dtype and pointings_dtype permit to override the data type of TOD samples and pointing angles used in the HDF5 file.
The parameters read_pointings_if_present, read_pixidx_if_present, read_psi_if_present, read_global_flags_if_present, and read_local_flags_if_present permit to avoid loading some parts of the HDF5 if they are not needed.
The function returns a
Observation
, orNothing
if the HDF5 file was ill-formed.
- litebird_sim.io.write_list_of_observations(obs: ~litebird_sim.observations.Observation | ~typing.List[~litebird_sim.observations.Observation], path: str | ~pathlib.Path, tod_dtype=<class 'numpy.float32'>, pointings_dtype=<class 'numpy.float32'>, file_name_mask: str = 'litebird_tod{global_index:04d}.h5', custom_placeholders: ~typing.List[~typing.Dict[str, ~typing.Any]] | None = None, start_index: int = 0, collective_mpi_call: bool = True, tod_fields: ~typing.List[str | ~litebird_sim.observations.TodDescription] = [], gzip_compression: bool = False) List[Path] ¶
Save a list of observations in a set of HDF5 files
This function takes one or more observations and saves the TODs in several HDF5 (each observation leads to one file), using tod_dtype and pointings_dtype as the default datatypes for the samples and the pointing angles. The function returns a list of the file written (
pathlib.Path
objects).The name of the HDF5 files is built using the variable file_name_mask, which can contain placeholders in the form
{name}
, wherename
can be one of the following keys:mpi_rank
: the rank number of the MPI process that owns this observation (starting from zero)mpi_size
: the number of running MPI processesnum_of_obs
: the number of observationsglobal_index
: an unique increasing index identifying this observation among all the observations used by MPI processes (see below)local_index
: the number of the current observation within the current MPI process (see below)
You can provide other placeholders through custom_placeholders, which must be a list of dictionaries. The number of elements in the list must be the same as the number of observations, and each dictionary will be used to determine the placeholders for the file name related to the corresponding observation. Here is an example:
custom_dicts = [ { "myvalue": "A" }, { "myvalue": "B" }, ] write_list_of_observations( obs=[obs1, obs2], # Write two observations path=".", file_name_mask="tod_{myvalue}.h5", custom_placeholders=custom_dicts, ) # The two observations will be saved in # - tod_A.h5 # - tod_B.h5
If the parameter collective_mpi_call is
True
and MPI is enabled (seelitebird_sim.MPI_ENABLED
), the code assumes that this function was called at the same time by all the MPI processes that are currently running. This is the most typical case, i.e., you have several MPI processes and want that each of them save their observations in HDF5 files. Passcollective_mpi_call=False
only if you are calling this function on some of the MPI processes. Here is an example:if lbs.MPI_COMM_WORLD.rank == 0: # Do this only for the first MPI process write_list_of_observations( ..., collective_mpi_call=False, )
In the example,
collective_mpi_call=False
signals that not every MPI process is writing their observations to disk.The
local_index
andglobal_index
placeholders used in the template file name start from zero, but this can be changed using the parameter start_index.If observations contain more than one timeline in separate fields (e.g., foregrounds, dipole, noise…), you can specify the names of the fields using the parameter
tod_fields
(list of strings orTodDescription
objects), which by default will only save Observation.tod.To save disk space, you can choose to apply GZip compression to the data frames in each HDF5 file (the file will still be a valid .h5 file) or to save quaternions instead of full pointings.
- litebird_sim.io.write_observations(sim, subdir_name: None | str = 'tod', include_in_report: bool = True, *args, **kwargs) List[Path] ¶
Deprecated since version 0.11: Use Simulation.write_observations
- litebird_sim.io.write_one_observation(output_file: File, obs: Observation, tod_dtype, pointings_dtype, global_index: int, local_index: int, tod_fields: List[str | TodDescription] | None = None, gzip_compression: bool = False)¶
Write one
Observation
object in a HDF5 file.This is a low-level function that stores a TOD in a HDF5 file. You should usually use more high-level functions that are able to write several observations at once, like
write_list_of_observations()
andwrite_observations()
.The output file is specified by output_file and should be opened for writing; the observation to be written is passed through the obs parameter. The data type to use for writing TODs and pointings is specified in the tod_dtype and pointings_dtype (it can either be a NumPy type like
numpy.float64
or a string, passNone
to use the same type as the one used in the observation). The global_index and local_index parameters are two integers that are used by high-level functions likewrite_observations()
to understand how to read several HDF5 files at once; if you do not need them, you can pass 0 to both.