tools.speedutils
This module provides utility functions for efficient DataFrame concatenation and array pair matching.
- skysurvey.tools.speedutils.isin_pair_elements(elements, test_elements)
Test whether each pair of integers in elements is present in test_elements.
- Parameters:
elements (array_like) – Array of integer pairs to test.
test_elements (array_like) – Array of integer pairs defining the reference set.
- Returns:
isin – Boolean array. True if the corresponding pair in elements is present in test_elements, False otherwise.
- Return type:
ndarray, bool
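The pair-wise membership test described above can be illustrated with a minimal sketch. It is not the skysurvey implementation: `isin_pairs_sketch` is a hypothetical stand-in, and it assumes that both inputs have shape (N, 2) and that pairs are ordered, so (1, 2) and (2, 1) count as different pairs.

```python
import numpy as np


def isin_pairs_sketch(elements, test_elements):
    """Hypothetical stand-in for a pair-wise membership test.

    Assumes both inputs have shape (N, 2) and that pairs are ordered.
    """
    # Build a set of reference pairs for constant-time lookups.
    reference = {tuple(pair) for pair in np.asarray(test_elements).tolist()}
    # Test each input pair as a whole, not element by element.
    return np.array(
        [tuple(pair) in reference for pair in np.asarray(elements).tolist()],
        dtype=bool,
    )


elements = [[1, 2], [3, 4], [2, 1]]
test_elements = [[3, 4], [1, 2], [5, 6]]
isin = isin_pairs_sketch(elements, test_elements)
```

Here the first two pairs appear in the reference set, while `[2, 1]` does not because the order of its values differs from `[1, 2]`.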
- skysurvey.tools.speedutils.chunk_dfs(dfs, chunk_size)
Split an iterable of DataFrames into successive chunks.
- Parameters:
dfs (iterable of pandas.DataFrame) – Iterable yielding DataFrames to be grouped into chunks.
chunk_size (int) – Number of DataFrames per chunk.
- Yields:
chunk (list of pandas.DataFrame) – List of DataFrames in the current chunk.
size (int) – Number of DataFrames in the chunk (may be smaller than chunk_size for the last chunk).
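The chunking behaviour described above can be sketched with a small generator. `chunk_dfs_sketch` is a hypothetical stand-in, not the library code, and it assumes that each iteration yields a `(chunk, size)` tuple.

```python
from itertools import islice

import pandas as pd


def chunk_dfs_sketch(dfs, chunk_size):
    """Hypothetical stand-in: yield (chunk, size) tuples from an iterable."""
    iterator = iter(dfs)
    while True:
        # Pull at most chunk_size DataFrames from the iterator.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk, len(chunk)


frames = (pd.DataFrame({"x": [i]}) for i in range(5))
sizes = [size for _, size in chunk_dfs_sketch(frames, chunk_size=2)]
```

With five DataFrames and `chunk_size=2`, the last chunk holds the single remaining DataFrame.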
- skysurvey.tools.speedutils.concat_chunk(dfs, **kwargs)
Concatenate a chunk of DataFrames using pandas.concat.
- Parameters:
dfs (iterable of pandas.DataFrame) – DataFrames to concatenate.
**kwargs – Additional keyword arguments passed to pandas.concat.
- Returns:
Concatenated DataFrame.
- Return type:
pandas.DataFrame
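As an illustration of how keyword arguments reach pandas.concat, here is a minimal sketch. `concat_chunk_sketch` is a hypothetical wrapper written for this example, and `ignore_index` is a standard pandas.concat option.

```python
import pandas as pd


def concat_chunk_sketch(dfs, **kwargs):
    """Hypothetical wrapper: forward all keyword arguments to pandas.concat."""
    # Materialize the iterable so that generators are also accepted.
    return pd.concat(list(dfs), **kwargs)


frames = [pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3]})]
stacked = concat_chunk_sketch(frames, ignore_index=True)
```

Because `ignore_index=True` is forwarded, the result carries a fresh 0..2 index in place of the original per-frame indices.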
- skysurvey.tools.speedutils.eff_concat(dfs, chunk_size, keys=None, **kwargs)
Efficiently concatenate a large number of DataFrames by chunking.
- Parameters:
dfs (iterable of pandas.DataFrame) – DataFrames to concatenate.
chunk_size (int) – Number of DataFrames per chunk.
keys (sequence, optional) – Keys to use for indexing, passed to pandas.concat. When chunking, the corresponding slice of keys is passed to each chunk. Default is None.
**kwargs – Additional keyword arguments passed to pandas.concat.
- Returns:
Concatenated DataFrame.
- Return type:
pandas.DataFrame
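The chunked strategy and the slicing of keys described above can be sketched as a two-stage concatenation. This is an assumption about the general approach, not the skysurvey implementation, and `eff_concat_sketch` is a hypothetical name.

```python
import pandas as pd


def eff_concat_sketch(dfs, chunk_size, keys=None, **kwargs):
    """Hypothetical two-stage concatenation: per chunk, then across chunks."""
    dfs = list(dfs)
    partials = []
    for start in range(0, len(dfs), chunk_size):
        stop = start + chunk_size
        # Pass only the slice of keys that matches the current chunk.
        chunk_keys = None if keys is None else list(keys[start:stop])
        partials.append(pd.concat(dfs[start:stop], keys=chunk_keys, **kwargs))
    # Join the per-chunk results; the keys are already in their indices.
    return pd.concat(partials, **kwargs)


frames = [pd.DataFrame({"x": [i]}) for i in range(5)]
keys = ["a", "b", "c", "d", "e"]
result = eff_concat_sketch(frames, chunk_size=2, keys=keys)
direct = pd.concat(frames, keys=keys)
```

In this small case the chunked result has the same index and values as a single `pd.concat(frames, keys=keys)` call.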