tools.speedutils

This module provides utility functions for efficient DataFrame concatenation and array pair matching.

skysurvey.tools.speedutils.isin_pair_elements(elements, test_elements)

Test whether each pair of integers in elements is present in test_elements.

Parameters:
  • elements (array_like) – Array of integer pairs to test.

  • test_elements (array_like) – Array of integer pairs defining the reference set.

Returns:

isin – Boolean array. True if the corresponding pair in elements is present in test_elements, False otherwise.

Return type:

ndarray, bool
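
The pair-matching behaviour described above can be illustrated with a minimal NumPy sketch. It is not the skysurvey implementation: the function name isin_pairs_sketch is hypothetical, and the sketch assumes that both inputs can be reshaped to (N, 2) and that pairs are ordered, so (2, 1) does not match (1, 2). It compares every pair against every reference pair by broadcasting, which is simple but uses memory proportional to N × M.

```python
import numpy as np


def isin_pairs_sketch(elements, test_elements):
    """Hypothetical stand-in for isin_pair_elements (assumes ordered pairs)."""
    elements = np.asarray(elements, dtype=np.int64).reshape(-1, 2)
    test_elements = np.asarray(test_elements, dtype=np.int64).reshape(-1, 2)
    # Broadcast (N, 1, 2) against (1, M, 2); a pair matches only if both
    # of its integers are equal to those of a reference pair.
    matches = (elements[:, None, :] == test_elements[None, :, :]).all(axis=-1)
    # A pair is "in" the reference set if it matches at least one entry.
    return matches.any(axis=1)


pairs = [[1, 2], [3, 4], [2, 1]]
reference = [[3, 4], [1, 2], [5, 6]]
mask = isin_pairs_sketch(pairs, reference)
print(mask)
```

The returned mask has one entry per input pair, so it can be used directly to filter the rows of elements.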

skysurvey.tools.speedutils.chunk_dfs(dfs, chunk_size)

Split an iterable of DataFrames into successive chunks.

Parameters:
  • dfs (iterable of pandas.DataFrame) – Iterable yielding DataFrames to be grouped into chunks.

  • chunk_size (int) – Number of DataFrames per chunk.

Yields:
  • chunk (list of pandas.DataFrame) – List of DataFrames in the current chunk.

  • size (int) – Number of DataFrames in the chunk (may be smaller than chunk_size for the last chunk).
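
As an illustration of the chunking contract described above, here is a minimal generator sketch. The name chunk_dfs_sketch is hypothetical and the body is an assumption about how such a generator could be written, not the library's source. It yields a (chunk, size) tuple each time chunk_size DataFrames have accumulated, plus one final shorter chunk if any DataFrames remain.

```python
import pandas as pd


def chunk_dfs_sketch(dfs, chunk_size):
    """Hypothetical stand-in for chunk_dfs: yield (chunk, size) tuples."""
    chunk = []
    for df in dfs:
        chunk.append(df)
        if len(chunk) == chunk_size:
            yield chunk, len(chunk)
            chunk = []  # start a fresh list for the next chunk
    if chunk:
        # Remaining DataFrames form a last, shorter chunk.
        yield chunk, len(chunk)


# Works with any iterable, including a generator that is consumed lazily.
frames = (pd.DataFrame({"a": [i]}) for i in range(5))
sizes = [size for _, size in chunk_dfs_sketch(frames, chunk_size=2)]
print(sizes)
```

Because the input is consumed one DataFrame at a time, at most chunk_size DataFrames are held by the generator at once.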

skysurvey.tools.speedutils.concat_chunk(dfs, **kwargs)

Concatenate a chunk of DataFrames using pandas.concat.

Parameters:
  • dfs (iterable of pandas.DataFrame) – DataFrames to concatenate.

  • **kwargs – Additional keyword arguments passed to pandas.concat.

Returns:

Concatenated DataFrame.

Return type:

pandas.DataFrame
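
Since the description above says the keyword arguments are forwarded to pandas.concat, the behaviour can be sketched as a thin wrapper. The name concat_chunk_sketch is hypothetical, and this is an assumed equivalent rather than the actual function body; ignore_index is a standard pandas.concat option used here only as an example of a forwarded keyword.

```python
import pandas as pd


def concat_chunk_sketch(dfs, **kwargs):
    """Hypothetical stand-in for concat_chunk: forward everything to pandas.concat."""
    # Materialise the iterable so that generators are also accepted.
    return pd.concat(list(dfs), **kwargs)


frames = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3]})]
merged = concat_chunk_sketch(frames, ignore_index=True)
print(merged)
```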

skysurvey.tools.speedutils.eff_concat(dfs, chunk_size, keys=None, **kwargs)

Efficiently concatenate a large number of DataFrames by chunking.

Parameters:
  • dfs (iterable of pandas.DataFrame) – DataFrames to concatenate.

  • chunk_size (int) – Number of DataFrames per chunk.

  • keys (sequence, optional) – Keys to use for indexing, passed to pandas.concat. When chunking, the corresponding slice of keys is passed to each chunk. Default is None.

  • **kwargs – Additional keyword arguments passed to pandas.concat.

Returns:

Concatenated DataFrame.

Return type:

pandas.DataFrame
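
The chunk-then-combine idea, including the slicing of keys per chunk, can be sketched as follows. This is a simplified illustration under stated assumptions, not the skysurvey implementation: the name eff_concat_sketch is hypothetical, it only handles row-wise concatenation, it materialises dfs as a list, and it omits the **kwargs forwarding. Each chunk is concatenated with its own slice of keys, and the partial results are then concatenated once more.

```python
import pandas as pd


def eff_concat_sketch(dfs, chunk_size, keys=None):
    """Hypothetical stand-in for eff_concat (row-wise only, no extra kwargs)."""
    dfs = list(dfs)
    partial = []
    for start in range(0, len(dfs), chunk_size):
        stop = start + chunk_size
        # Pass only the keys that belong to the DataFrames in this chunk.
        chunk_keys = None if keys is None else list(keys[start:stop])
        partial.append(pd.concat(dfs[start:stop], keys=chunk_keys))
    # Combine the per-chunk results; their indexes already carry the keys.
    return pd.concat(partial)


frames = [pd.DataFrame({"a": [i, i + 10]}) for i in range(5)]
keys = ["k0", "k1", "k2", "k3", "k4"]

out = eff_concat_sketch(frames, chunk_size=2, keys=keys)
expected = pd.concat(frames, keys=keys)  # single-call reference result
print(out.index.tolist() == expected.index.tolist())
```

In this small example the chunked result has the same index and values as a single pandas.concat call with the full list of keys, which is the property the keys slicing is meant to preserve.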