Random numbers#

The LiteBIRD Simulation Framework can be used to generate random numbers, used for example for producing noise timelines. In order to do so, a seed and a Hierarchy of Random Number Generators (RNGs) are used.

Seed#

The random_seed is used to control the behavior of the RNG. The seed can be None or an integer number. If you are not interested in the reproducibility of your results, you can set random_seed to None. However, this is not recommended, as if you run many times a function or method that uses an RNG, e.g., for producing noise timelines, then the outputs will be different, and you will not be able to re-obtain these results again. If, instead, you are interested in the reproducibility of your results, you can set random_seed to an integer number. With this choice, if you run a function or method multiple times using an RNG, then the outputs will be the same, and you will obtain these results again by re-running your script, as the random numbers produced by the generator will be the same.

How should you set the random_seed? This parameter must be passed to the constructor of a Simulation class. If you create a Simulation instance without passing the seed, the constructor will raise an error, and your script will fail. We implemented this feature to avoid automatic settings of the seed and unclear behavior of the generation of random numbers. If you run a parallel script using MPI, you do not have to worry about setting a different seed for different MPI ranks. The random number generator ensures that the random numbers generated by the ranks will produce orthogonal sequences as the first layer of the RNG hierarchy below the random_seed. Furthermore, a second layer of per-detector RNGs is created, ensuring that each detector of each MPI task is independent. This structure is stored in a RNGHierarchy instance, which handles this under the hood.

We want this behavior as we do not want repetitions of, e.g., the same noise TOD if we split their computation on different MPI ranks. For example, in this way, if you split the TOD matrix of an Observation class by the time, you will not encounter the same noise after the samples generated by a certain rank; if you split the TOD matrix of an Observation class by the detectors, each one will have a different noise timestream.

Regarding the reproducibility of the results in a parallel code, there is an important thing to bear in mind. If you set the seed to an integer number but run your script with a different number of MPI ranks, the outputs will be different! Think about a noise time stream of 4 samples. If you use 2 MPI ranks, then the first two samples will be generated by one RNG (rank 0), while the last two samples will be generated by a different RNG (rank 1). If you then run the same script with the same seed but with 4 MPI ranks, each of the samples will be generated by a different RNG, and only the first sample will be the same for the two runs, as it is always the first one generated by rank 0’s RNG.

Despite this, the closest thing you can do towards complete reproducibility among different runs is the possibility to save to disk the instance of RNGHierarchy. This can be done at any moment calling RNGHierarchy.save(), which saves a frozen image of the state of the random generators, that can be reloaded through RNGHierarchy.load().

The setting of the random_seed is as simple as this:

sim = lbs.Simulation(
    base_path="/storage/output",
    start_time=astropy.time.Time("2020-02-01T10:30:00"),
    duration_s=3600.0,
    name="My noise simulation",
    description="I want to generate some noise and be able to reproduce my results",
    random_seed=12345, # Here the seed for the random number generator is set
)

The Simulation constructor runs the Simulation.init_rng_hierarchy() method, which builds the MPI-layer of the RNG hierarchy and seeds it with random_seed. When observations are created with Simulation.create_observations(), the detector-layer of the RNG hierarchy is initialized with Simulation.init_detectors_random(). You can then access these RNGs by calling sim.dets_random, which is a list of RNGs. There is also the possibility to change the random seed after creating a Simulation instance. You can achieve this by calling Simulation.init_random(), which will regenerate the full hierarchy:

sim = lbs.Simulation(random_seed=12345)
[...]
sim.init_random(random_seed=42, num_detectors=3)

Random number generators#

The random number generator used by the RNGHierarchy is PCG64. After creating this RNG hierarchy from the Simulation constructor, for example, you can access the detector-level geenrators via sim.dets_random:

sim = lbs.Simulation(random_seed=12345)
[...]
sim.add_noise(noise_type='white', dets_random=sim.dets_random)

You can also use your own RNGs with the functions and methods of litebird_sim:

sim = lbs.Simulation(random_seed=12345)
[...]
my_rngs = ... # New list of RNG definition (length must match number of detectors)
sim.add_noise(noise_type='white', dets_random=my_rngs)

You should just make sure that your custom RNG implements the normal method, so it can be used for white noise generation.

Details on the RNG hierarchy#

As mentioned above, the RNG hierarchy system in litebird_sim provides a reproducible and flexible structure for managing random number generators across multiple levels of a simulation. This is especially useful in MPI-parallel contexts where independent and deterministic streams of randomness are required at different granularity levels—such as per-task or per-detector.

The key classes and methods provided by this module are:

  1. RNGHierarchy: A container managing a nested RNG tree from a single base seed.

  2. RNGHierarchy.build_mpi_layer(): Method to build the first level of the RNG hierarchy for MPI tasks.

  3. RNGHierarchy.build_detector_layer(): Method to build the second level of the RNG hierarchy for detectors.

  4. RNGHierarchy.get_generator(): Method to get a speficic RNG generator from the hierarchy.

  5. RNGHierarchy.get_detector_level_generators_on_rank(): Method to extract the full list of generators for a specific MPI rank.

  6. RNGHierarchy.regenerate_or_check_detector_generators(): Method to check validity of a set of detector-level generators and eventually regenerate them from a new base seed.

The following example demonstrates how you can create a layered RNG hierarchy:

import litebird_sim as lbs

base_seed = 12345
num_mpi_tasks = 2
num_detectors = 3

# Create a hierarchy of RNGs with two levels: task and detector
rng_hierarchy = lbs.RNGHierarchy(base_seed)
rng_hierarchy.build_mpi_layer(num_mpi_tasks)
rng_hierarchy.build_detector_layer(num_detectors)

# or directly with
rng = lbs.RNGHierarchy(base_seed, num_mpi_tasks, num_detectors)

Once the RNGHierarchy is instanciated and RNGs are created, it is sufficient to use RNGHierarchy.get_detector_level_generators_on_rank() to obtain all the generators of a specific MPI rank. If needed, RNGHierarchy.get_generator() allows you to retrieve a specific generator from the hierarchy.

rank = 0
generators_of_first_rank = rng_hierarchy.get_detector_level_generators_on_rank(rank)

indeces = (1, 2)
generetor_second_rank_third_det = rng_hierarchy.get_generator(indeces)

Furthermore, RNGHierarchy.add_extra_layer() allows you to define a new level of the hierarchy from the current leaf layer.

Finally, the RNGHierarchy.regenerate_or_check_detector_generators() allows you to quickly regenerate a set of new detector-level generators, or to check the validity of the existing one. This is useful if you passes a random seed specific for each systematics. This needs a list of Observation to work.

Saving and loading#

RNG hierarchies can be serialized and deserialized via:

This is useful for ensuring reproducibility even across different simulation runs.

rng_hierarchy.save("my_rng_state.pkl")

restored_rng_hierarchy = lbs.RNGHierarchy.load("my_rng_state.pkl")

assert rng == restored_rng

API reference#

class litebird_sim.seeding.RNGHierarchy(base_seed: int, comm_size: int | None = None, num_detectors_per_rank: int | None = None)#

Bases: object

SAVE_FORMAT_VERSION = 1#
add_extra_layer(num_children: int, layer_name: str | None = None)#

Recursively add an additional layer to all leaves of the hierarchy.

This method adds a new generation of children nodes to all existing detector-level nodes. The added layer can be optionally named (e.g., “subdet”) or will be auto-labeled using the depth.

Parameters:
  • num_children (int) – Number of child nodes to spawn under each leaf node.

  • layer_name (str, optional) – Prefix used to label the new layer’s children, by default None.

Notes

The new layer is recursively inserted below the current deepest level (detectors by default).

build_detector_layer(num_detectors_per_rank: int)#

Build (or rebuild) the RNG hierarchy down to the detector layer beneath each MPI rank.

Each MPI rank node spawns a fixed number of detector-level seed sequences and generators. These are added under the “children” dictionary of each rank node.

Parameters:

num_detectors_per_rank (int) – Number of detectors (i.e., child nodes) to create under each MPI rank.

build_hierarchy(comm_size: int, detectors_per_rank: int)#

Convenience function to construct a two-level hierarchy with ranks and detectors: root → MPI rank → detectors.

Equivalent to calling build_mpi_layer followed by build_detector_layer.

Parameters:
  • comm_size (int) – Number of MPI ranks.

  • detectors_per_rank (int) – Number of detectors per MPI rank.

build_mpi_layer(comm_size: int)#

Construct (or reconstruct) the MPI rank layer of the RNG hierarchy.

This method spawns a seed and corresponding RNG generator for each MPI rank, starting from the root seed sequence. Each rank node includes an empty ‘children’ field for downstream levels.

Parameters:

comm_size (int) – The number of MPI ranks to include in the hierarchy.

get_detector_level_generators_on_rank(rank: int | str) list[Generator]#

Retrieve the list of RNG generators for all detectors under a given MPI rank.

Wrapper around the get_detector_level_generators_from_hierarchy utility.

get_generator(*indices: int | str) Generator#

Retrieve a generator from the hierarchy using a sequence of indices.

Wrapper around the get_generator_from_hierarchy utility function.

classmethod load(path: str) RNGHierarchy#

Load a saved RNGHierarchy instance from file.

This method loads a previously saved RNGHierarchy object and validates its format version.

Parameters:

path (str) – Path to the saved hierarchy pickle file.

Returns:

The loaded hierarchy instance.

Return type:

RNGHierarchy

Raises:

ValueError – If the saved format version is incompatible with the current implementation.

save(path: str)#

Serialize and save the RNGHierarchy to a file.

This method uses pickle to serialize the current instance and save it to a binary file. Metadata, including the format version and timestamp, are preserved.

Parameters:

path (str) – Path to the file where the hierarchy should be saved.

litebird_sim.seeding.get_derived_random_generators(source_sequence: SeedSequence | list[SeedSequence], num_to_spawn: int) tuple[list[SeedSequence], list[Generator]]#

Generate multiple derived SeedSequence and corresponding RNGs from one or more sources.

This utility function spawns num_to_spawn child SeedSequence objects from each source sequence and uses them to create numpy.random.Generator instances using the PCG64 bit generator.

Parameters:
  • source_sequence (SeedSequence | list[SeedSequence]) – The source seed sequence(s) from which to derive new sequences.

  • num_to_spawn (int) – Number of new sequences to spawn from each source sequence.

Returns:

Returns a tuple where the first element is the list of newly spawned SeedSequence objects, and the second is the corresponding list of RNG generators.

Return type:

tuple[list[SeedSequence], list[Generator]]

Raises:

ValueError – Raised if any of the provided source sequences are not valid SeedSequence instances.

litebird_sim.seeding.get_detector_level_generators_from_hierarchy(hierarchy: dict, rank: int | str) list[Generator]#

Retrieve the list of detector-level RNGs under a given MPI rank node.

Parameters:
  • hierarchy (dict) – The top-level RNG hierarchy dictionary.

  • rank (int or str) – Rank identifier (e.g., 0 or “rank0”).

Returns:

A list of RNG generators corresponding to the detectors for that rank.

Return type:

list[numpy.random.Generator]

Raises:

KeyError – If the specified rank is not found or has no detector children.

litebird_sim.seeding.get_generator_from_hierarchy(hierarchy: dict, *indices: int | str) Generator#

Retrieve a specific RNG generator from a nested hierarchy using a path of indices.

The indices are interpreted as hierarchical labels (e.g., “rank0”, “det3”). If integers are passed, they are automatically converted to appropriate labels (e.g., (0, 1) → (“rank0”, “det1”)).

Parameters:
  • hierarchy (dict) – A nested dictionary representing the RNG hierarchy.

  • indices (sequence of str or int) – Path to follow in the hierarchy, e.g., (0, 2) or (“rank0”, “det2”).

Returns:

Generator located at the given path in the hierarchy.

Return type:

numpy.random.Generator

Raises:

KeyError – If any index along the path does not exist, or if no generator is found at the final node.

litebird_sim.seeding.regenerate_or_check_detector_generators(observations: list[Observation], user_seed: int | None = None, dets_random: list[Generator] | None = None) list[Generator]#

Check or regenerate detector-level RNGs for a given set of observations.

This function ensures that a list of numpy.random.Generator objects is provided for all detectors in the observation. If a user_seed is provided, a new RNGHierarchy is built and the corresponding generators are extracted for the current MPI rank.

Parameters:
  • observations (list[Observation]) – A list of observations, assumed to have consistent number of detectors and communicators.

  • user_seed (int, optional) – Optional base seed used to regenerate the RNG hierarchy.

  • dets_random (list[Generator], optional) – List of pre-constructed RNGs to be validated.

Returns:

A list of RNG generators, one for each detector.

Return type:

list[Generator]

Raises:
  • ValueError – If neither user_seed nor dets_random is provided.

  • AssertionError – If the number of generators does not match the number of detectors.