repic.utils.build_subsets

Creates cross-validation subsets for iterative ensemble particle picking

Attributes

name

module name (used by argparse subparser)

rng

NumPy random generator (seeded with zero for reproducibility)

parser

argparse ArgumentParser object

Functions

add_arguments(parser)

Adds argparse command line arguments for build_subsets.py

calc_subsets(n[, s])

Calculates subsets of examples (micrographs) for desired sampling percentages (1, 25, 50, and 100%)

create_symlinks(args, files, label)

Creates symlinks for cross-validation files

plot_defocus(data, low, med, out_file)

Creates Matplotlib line plot of CTFFIND4 defocus values

sample_from_bin(bins, i)

Samples an example from a random defocus bin (low, medium, or high) if the bin has items; otherwise, randomly chooses another bin to sample from

main(args)

Builds training, validation, and testing subsets (cross-validation files) for machine learning algorithm training

Module Contents

repic.utils.build_subsets.name = 'build_subsets'

module name (used by argparse subparser)

Type:

str

repic.utils.build_subsets.rng

NumPy random generator (seeded with zero for reproducibility)

repic.utils.build_subsets.add_arguments(parser)

Adds argparse command line arguments for build_subsets.py

Parameters:

parser (object) – argparse ArgumentParser (or subparser) object to which the arguments are added

Returns:

None
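The following is a minimal sketch of how a module-level name and an add_arguments(parser) function are commonly wired into an argparse subparser. It is not taken from build_subsets.py: the argument names (in_dir, --seed), the top-level parser, and the add_arguments_sketch and main_sketch functions are hypothetical placeholders.

```python
import argparse

name = "build_subsets"  # mirrors the documented module attribute


def add_arguments_sketch(parser):
    # Hypothetical arguments, for illustration only; the real
    # build_subsets.py arguments are not listed in this reference.
    parser.add_argument("in_dir", help="hypothetical input directory")
    parser.add_argument("--seed", type=int, default=0,
                        help="hypothetical random seed option")


def main_sketch(args):
    # Stand-in for main(args): just report what was parsed.
    return f"{args.in_dir}:{args.seed}"


# Register the module as a subcommand of a top-level parser.
top = argparse.ArgumentParser(prog="repic")
subparsers = top.add_subparsers(dest="command")
sub = subparsers.add_parser(name)
add_arguments_sketch(sub)
sub.set_defaults(func=main_sketch)

# Parse a sample command line and dispatch to the subcommand's function.
args = top.parse_args([name, "data"])
result = args.func(args)
print(result)
```

Keeping the subcommand name and its arguments inside the module lets a top-level command-line tool discover and register each utility in the same way.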

repic.utils.build_subsets.calc_subsets(n, s=3)

Calculates subsets of examples (micrographs) for desired sampling percentages (1, 25, 50, and 100%)

Parameters:
  • n (int) – total number of examples to sample from

  • s (int) – number of examples to sample each iteration (s = 3 represents the low, medium, and high defocus bins)

Returns:

Python dictionary containing the number of examples (values) per subset (key)

Return type:

dict
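As a rough illustration of the returned dictionary, the sketch below maps each sampling percentage to a subset size. The rounding rule (round up to a multiple of s, capped at n) and the use of percentages as keys are assumptions made for this example, not the documented behavior of calc_subsets.

```python
def calc_subsets_sketch(n, s=3, percentages=(1, 25, 50, 100)):
    """Hypothetical stand-in for calc_subsets (assumed rounding rule)."""
    subsets = {}
    for p in percentages:
        # Number of examples for this percentage, rounded up (integer ceil).
        k = (n * p + 99) // 100
        # Round up to a multiple of s so every defocus bin can contribute
        # the same number of examples (assumption), but never exceed n.
        k = -(-k // s) * s
        subsets[p] = min(k, n)
    return subsets


print(calc_subsets_sketch(100))
# {1: 3, 25: 27, 50: 51, 100: 100}
```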

repic.utils.build_subsets.create_symlinks(args, files, label)

Creates symlinks for cross-validation files

Parameters:
  • args (obj) – argparse command line argument object

  • files (list) – list of micrograph filenames to be symlinked

  • label (str) – name for created subdirectory that will contain linked files

Returns:

None
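A minimal sketch of the symlinking step described for create_symlinks is shown below. The attribute args.out_dir is a hypothetical name for the output directory (the real argument names are not given in this reference), and the function is named create_symlinks_sketch to make clear it is not the module's implementation.

```python
import os
import tempfile
from types import SimpleNamespace


def create_symlinks_sketch(args, files, label):
    # args.out_dir is a hypothetical attribute used for illustration.
    sub_dir = os.path.join(args.out_dir, label)
    os.makedirs(sub_dir, exist_ok=True)
    for f in files:
        dst = os.path.join(sub_dir, os.path.basename(f))
        # Skip links that already exist so reruns do not fail.
        if not os.path.lexists(dst):
            os.symlink(os.path.abspath(f), dst)


# Usage: link one placeholder micrograph into a "train" subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mic_001.mrc")
    open(src, "w").close()
    demo_args = SimpleNamespace(out_dir=os.path.join(tmp, "out"))
    create_symlinks_sketch(demo_args, [src], "train")
    print(os.listdir(os.path.join(tmp, "out", "train")))
```

Symlinking rather than copying avoids duplicating large micrograph files across the training, validation, and testing subsets.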

repic.utils.build_subsets.plot_defocus(data, low, med, out_file)

Creates Matplotlib line plot of CTFFIND4 defocus values

Parameters:
  • data (list) – list of paired micrograph filenames and CTFFIND4 defocus values

  • low (float) – low defocus bin upper threshold

  • med (float) – medium defocus bin upper threshold

  • out_file (str) – filepath of the produced line plot

Returns:

None
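To illustrate what the low and med thresholds mean, the sketch below splits (filename, defocus) pairs into low, medium, and high bins. Treating the thresholds as inclusive upper bounds is an assumption, bin_by_defocus is a hypothetical helper that is not part of the module, and the sample values are made up.

```python
def bin_by_defocus(data, low, med):
    """Hypothetical helper: split (filename, defocus) pairs into 3 bins."""
    bins = [[], [], []]  # low, medium, high
    for filename, defocus in data:
        if defocus <= low:
            bins[0].append((filename, defocus))
        elif defocus <= med:
            bins[1].append((filename, defocus))
        else:
            bins[2].append((filename, defocus))
    return bins


# Made-up defocus values for illustration only.
data = [("a.mrc", 0.8), ("b.mrc", 1.6), ("c.mrc", 2.9), ("d.mrc", 1.0)]
low_bin, med_bin, high_bin = bin_by_defocus(data, low=1.0, med=2.0)
print(len(low_bin), len(med_bin), len(high_bin))
# 2 1 1
```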

repic.utils.build_subsets.sample_from_bin(bins, i)

Samples an example from a random defocus bin (low, medium, or high) if the bin has items; otherwise, randomly chooses another bin to sample from

Parameters:
  • bins (list) – list of defocus bins

  • i (int) – index of defocus bin to sample from

Returns:

filename (str) and CTFFIND4 defocus value (float) of sampled example

Return type:

tuple
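The fallback behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: it uses the standard-library random module in place of the module's NumPy generator to stay dependency-free, and it assumes that a sampled example is removed from its bin (sampling without replacement).

```python
import random

rng_sketch = random.Random(0)  # fixed seed for reproducibility


def sample_from_bin_sketch(bins, i):
    # If the requested bin is empty, fall back to a random non-empty bin.
    if not bins[i]:
        non_empty = [j for j, b in enumerate(bins) if b]
        if not non_empty:
            raise ValueError("all defocus bins are empty")
        i = rng_sketch.choice(non_empty)
    # Remove and return a random (filename, defocus) pair (assumption:
    # sampling is without replacement).
    k = rng_sketch.randrange(len(bins[i]))
    return bins[i].pop(k)


# The low bin (index 0) is empty, so the medium bin is used instead.
demo_bins = [[], [("b.mrc", 1.6)], []]
filename, defocus = sample_from_bin_sketch(demo_bins, 0)
print(filename, defocus)
# b.mrc 1.6
```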

repic.utils.build_subsets.main(args)

Builds training, validation, and testing subsets (cross-validation files) for machine learning algorithm training

Parameters:

args (obj) – argparse command line argument object

repic.utils.build_subsets.parser

argparse ArgumentParser object

Type:

obj