repic.utils.coord_converter

Converts particle bounding box coordinates between different formats (STAR, BOX, dat, coord, etc.)

Attributes

Box

named tuple for script output in BOX format

STAR_COL_X

STAR file column name for x-coordinate of particle bounding box

STAR_COL_Y

STAR file column name for y-coordinate of particle bounding box

STAR_COL_C

STAR file column name for figure of merit

STAR_COL_N

STAR file column name for micrograph name

DF_COL_NAMES

default column names for Pandas data frame

STAR_HEADER_MAP

dictionary of header mappings for STAR file

BOX_HEADER_MAP

dictionary of header mappings for BOX file

CBOX_HEADER_MAP

dictionary of header mappings for cBOX file

TSV_HEADER_MAP

dictionary of header mappings for TSV file

CS_HEADER_MAP

dictionary of header mappings for CryoSparc file

AUTO

flag for automatically processing columns of certain file types

parser

argparse parse_args() object

Functions

_log(msg[, lvl, quiet])

Formats and prints a message to the console at one of three logging levels: info (0), warning (1), or error (2)

_is_int(val)

Checks if Python object can be converted to int datatype

_row_is_all_nonnumeric(x)

Checks if all elements of a Pandas dataframe row are non-numeric values

_has_numbers(s)

Checks if string s contains an integer

_make_parent_dir(path_str)

Creates parent directory if it does not exist

_path_occupied(path_str)

Checks if file path is a file

cs_to_df(path)

Converts particle bounding box coordinate file (in CryoSparc format) into a Pandas dataframe with the correct column headers

star_to_df(path)

Converts particle bounding box coordinate file (in STAR file format) into a Pandas dataframe with the correct column headers

tsv_to_df(path[, header_mode])

Converts particle bounding box coordinate file (in TSV-like file format) into a Pandas dataframe, skipping any non-numeric header rows

df_to_star(df, out_path[, force])

Writes a Pandas dataframe of particle bounding box coordinates (generated from one of the *_to_df methods) to storage in STAR file format

df_to_tsv(df, col_order, out_path[, include_header, force])

Writes a Pandas dataframe of particle bounding box coordinates (generated from one of the *_to_df methods) to storage, optionally writing out [x, y, w, h, conf] labels as a header

process_conversion(paths, in_fmt, out_fmt[, boxsize, ...])

Converts between different particle bounding box formats

Module Contents

repic.utils.coord_converter.Box

named tuple for script output in BOX format

repic.utils.coord_converter.STAR_COL_X = '_rlnCoordinateX'

STAR file column name for x-coordinate of particle bounding box

repic.utils.coord_converter.STAR_COL_Y = '_rlnCoordinateY'

STAR file column name for y-coordinate of particle bounding box

repic.utils.coord_converter.STAR_COL_C = '_rlnAutopickFigureOfMerit'

STAR file column name for figure of merit

repic.utils.coord_converter.STAR_COL_N = '_rlnMicrographName'

STAR file column name for micrograph name

repic.utils.coord_converter.DF_COL_NAMES = ['x', 'y', 'w', 'h', 'conf', 'name']

default column names for Pandas data frame

repic.utils.coord_converter.STAR_HEADER_MAP

dictionary of header mappings for STAR file

repic.utils.coord_converter.BOX_HEADER_MAP

dictionary of header mappings for BOX file

repic.utils.coord_converter.CBOX_HEADER_MAP

dictionary of header mappings for cBOX file

repic.utils.coord_converter.TSV_HEADER_MAP

dictionary of header mappings for TSV file

repic.utils.coord_converter.CS_HEADER_MAP

dictionary of header mappings for CryoSparc file

repic.utils.coord_converter.AUTO = 'auto'

flag for automatically processing columns of certain file types

repic.utils.coord_converter._log(msg, lvl=0, quiet=False)

Formats and prints message to console with one of the following logging levels: 0: info (print and continue execution; ignore if quiet=True), 1: warning (print and continue execution), 2: error (print and exit with code 1)

Parameters:

msg (str) – message text

Keyword Arguments:
  • lvl (int, default=0) – logging level

  • quiet (bool, default=False) – suppress log printing

Returns:

None
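The three levels described above can be sketched as follows. This is a minimal stand-in, not the module's implementation: the names `LEVEL_NAMES`, `format_log`, and `log_sketch`, and the `LEVEL: message` layout, are assumptions made for illustration.

```python
import sys

# Hypothetical labels; the prefixes actually printed by _log are not documented here.
LEVEL_NAMES = {0: "INFO", 1: "WARNING", 2: "ERROR"}


def format_log(msg, lvl=0):
    """Build the console line for a message (the layout is an assumption)."""
    return f"{LEVEL_NAMES[lvl]}: {msg}"


def log_sketch(msg, lvl=0, quiet=False):
    """Print msg; drop info messages when quiet, exit with code 1 on error."""
    if lvl == 0 and quiet:
        return  # quiet only suppresses info-level messages
    print(format_log(msg, lvl))
    if lvl == 2:
        sys.exit(1)  # errors stop execution


log_sketch("reading input", quiet=True)    # suppressed
log_sketch("no confidence column", lvl=1)  # printed, execution continues
```

Note that `quiet` affects only level 0 in this sketch, matching the description that warnings and errors are always printed.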

repic.utils.coord_converter._is_int(val)

Checks if Python object can be converted to int datatype

Parameters:

val (obj) – Python object

Returns:

True if object can be converted (False otherwise)

Return type:

bool
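A check of this kind is commonly written as an attempted conversion. The sketch below assumes that approach; `is_int_sketch` is a hypothetical name and the module's own logic may differ.

```python
def is_int_sketch(val):
    """Return True if int(val) succeeds, False otherwise."""
    try:
        int(val)
    except (TypeError, ValueError, OverflowError):
        # TypeError: unsupported type such as None
        # ValueError: text such as "4.2" or "abc"
        # OverflowError: float infinity
        return False
    return True
```

With this approach the string `"4.2"` is rejected while the float `4.2` is accepted, because `int()` truncates floats but does not parse decimal text.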

repic.utils.coord_converter._row_is_all_nonnumeric(x)

Checks if all elements of a Pandas dataframe row are non-numeric values

Parameters:

x (obj) – Pandas dataframe row object

Returns:

True if all elements are non-numeric values (False otherwise)

Return type:

bool
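Going by the function name, a row qualifies when none of its values parses as a number. The pure-Python sketch below illustrates that reading on a plain list of strings rather than a dataframe row; `is_number` and `row_is_all_nonnumeric_sketch` are hypothetical names.

```python
def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


def row_is_all_nonnumeric_sketch(row):
    """Return True only if no value in the row parses as a number."""
    return all(not is_number(value) for value in row)


header_like = ["x", "y", "conf"]
data_like = ["10", "20", "0.9"]
```

A row mixing labels and numbers, such as `["mic_001", "20"]`, is not all non-numeric and so would not be treated as a header under this reading.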

repic.utils.coord_converter._has_numbers(s)

Checks if string s contains an integer

Parameters:

s (str) – string

Returns:

True if string contains integer (False otherwise)

Return type:

bool
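One plausible reading of this check is "the string contains at least one digit character". The sketch below assumes that reading; `has_numbers_sketch` is a hypothetical name.

```python
def has_numbers_sketch(s):
    """Return True if any character in s is a digit."""
    return any(ch.isdigit() for ch in s)
```

For example, a file name like `mic_001.mrc` passes the check, while a label like `micrograph` does not.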

repic.utils.coord_converter._make_parent_dir(path_str)

Creates parent directory if it does not exist

Parameters:

path_str (str) – filepath to subdirectory

Returns:

None
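Creating a missing parent directory can be sketched with `pathlib`. This is an illustration under the assumption that intermediate directories should also be created; `make_parent_dir_sketch` is a hypothetical name.

```python
from pathlib import Path


def make_parent_dir_sketch(path_str):
    """Create the parent directory of path_str if it does not exist."""
    # parents=True also creates intermediate directories;
    # exist_ok=True makes the call a no-op when the directory is present.
    Path(path_str).parent.mkdir(parents=True, exist_ok=True)
```

The file named by `path_str` itself is not created, only the directory that will contain it.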

repic.utils.coord_converter._path_occupied(path_str)

Checks if file path is a file

Parameters:

path_str (str) – filepath

Returns:

True if filepath is a file (False otherwise)

Return type:

bool
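A check like this is typically paired with an overwrite flag such as the `force` argument of the writer functions. The sketch below shows one way the two could combine; `path_occupied_sketch` and `may_write` are hypothetical names, and the interaction with `force` is an assumption.

```python
import os


def path_occupied_sketch(path_str):
    """Return True if path_str points at an existing file."""
    return os.path.isfile(path_str)


def may_write(path_str, force=False):
    """Allow writing when the path is free or overwriting is forced."""
    return force or not path_occupied_sketch(path_str)
```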

repic.utils.coord_converter.cs_to_df(path)

Converts particle bounding box coordinate file (in CryoSparc format) into a Pandas dataframe with the correct column headers

Parameters:

path (str) – filepath to particle bounding box file

Returns:

Pandas dataframe of particle bounding box coordinates

Return type:

obj

repic.utils.coord_converter.star_to_df(path)

Converts particle bounding box coordinate file (in STAR file format) into a Pandas dataframe with the correct column headers

Parameters:

path (str) – filepath to particle bounding box file

Returns:

Pandas dataframe of particle bounding box coordinates

Return type:

obj
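To make the STAR-to-dataframe step concrete, the sketch below parses a single `loop_` block with plain Python and renames the `_rln...` labels to the short column names. It is not the module's parser: `HEADER_MAP_SKETCH` and `parse_star_loop` are hypothetical, the mapping is a guess at what `STAR_HEADER_MAP` might contain, and values are kept as strings.

```python
# Hypothetical mapping; the real STAR_HEADER_MAP contents are not listed on this page.
HEADER_MAP_SKETCH = {
    "_rlnCoordinateX": "x",
    "_rlnCoordinateY": "y",
    "_rlnAutopickFigureOfMerit": "conf",
    "_rlnMicrographName": "name",
}


def parse_star_loop(text):
    """Parse one STAR loop into a list of dicts keyed by short column names."""
    labels, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and the data_/loop_ markers.
        if not line or line.startswith(("#", "data_")) or line == "loop_":
            continue
        if line.startswith("_"):
            # Label line such as "_rlnCoordinateX #1": keep the name only.
            labels.append(line.split()[0])
        else:
            values = line.split()
            rows.append(
                {HEADER_MAP_SKETCH.get(k, k): v for k, v in zip(labels, values)}
            )
    return rows


EXAMPLE = """
data_

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnAutopickFigureOfMerit #3
10.0 20.0 0.9
30.5 40.5 0.4
"""

rows = parse_star_loop(EXAMPLE)
```

The sketch handles only one loop and assumes no data row begins with an underscore; files with several data blocks would need more care.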

repic.utils.coord_converter.tsv_to_df(path, header_mode=None)

Converts particle bounding box coordinate file (in TSV-like file format) into a Pandas dataframe, skipping any non-numeric header rows

Parameters:

path (str) – filepath to particle bounding box file

Keyword Arguments:

header_mode (int, str, or None) – One of None, 'infer', or an int (row index). If None, any non-numeric rows at the top of the file are skipped and column names are not set. Otherwise, manual row skipping is not performed, and header_mode is passed directly to the header argument of pandas.read_csv

Returns:

Pandas dataframe of particle bounding box coordinates

Return type:

obj
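The `header_mode=None` behaviour, skipping non-numeric rows at the top of the file, can be sketched as a count of leading header-like lines. This is a pure-Python illustration, not the module's code; `is_number` and `count_header_rows` are hypothetical names, and treating blank lines as skippable is an assumption.

```python
def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


def count_header_rows(lines):
    """Count leading lines in which no whitespace-separated field is numeric."""
    skipped = 0
    for line in lines:
        fields = line.split()
        # all() over an empty list is True, so blank lines are skipped too.
        if all(not is_number(field) for field in fields):
            skipped += 1
        else:
            break  # first row containing a number ends the header
    return skipped


lines = ["# picks", "x\ty\tconf", "10\t20\t0.9", "a\tb\tc"]
n_skip = count_header_rows(lines)
```

Only rows before the first numeric row are counted, so the trailing `a b c` line in the example is left alone. A count like this could be handed to the `skiprows` argument of `pandas.read_csv`; whether `tsv_to_df` does so is not stated in this reference.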

repic.utils.coord_converter.df_to_star(df, out_path, force=False)

Writes a Pandas dataframe of particle bounding box coordinates (generated from one of the *_to_df methods) to storage in STAR file format

Parameters:
  • df (object) – Pandas dataframe of particle bounding box coordinates

  • out_path (str) – filepath to output file

Keyword Arguments:

force (bool, default=False) – overwrite output file if it exists

Returns:

None
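The general shape of a STAR output, a `data_` block followed by a `loop_` with numbered labels and then the rows, can be sketched without pandas. The helper name `rows_to_star_text` and the exact spacing are assumptions; the layout written by `df_to_star` may differ in detail.

```python
def rows_to_star_text(rows, labels):
    """Render rows (sequences of values) as a single-loop STAR text block."""
    lines = ["data_", "", "loop_"]
    # Label lines carry a 1-based column index, e.g. "_rlnCoordinateX #1".
    lines += [f"{label} #{i}" for i, label in enumerate(labels, start=1)]
    # One tab-separated line per particle.
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


star_text = rows_to_star_text(
    rows=[(10, 20, 0.9)],
    labels=["_rlnCoordinateX", "_rlnCoordinateY", "_rlnAutopickFigureOfMerit"],
)
```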

repic.utils.coord_converter.df_to_tsv(df, col_order, out_path, include_header=False, force=False)

Writes a Pandas dataframe of particle bounding box coordinates (generated from one of the *_to_df methods) to storage, optionally writing out [x, y, w, h, conf] labels as a header

Parameters:
  • df (object) – Pandas dataframe of particle bounding box coordinates

  • col_order (list) – list of ordered file columns

  • out_path (str) – filepath to output file

Keyword Arguments:
  • include_header (bool, default=False) – include header in output file

  • force (bool, default=False) – overwrite output file if it exists

Returns:

None
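The roles of `col_order` and `include_header` can be illustrated with the standard `csv` module. This sketch works on a list of dicts instead of a dataframe and returns the text rather than writing a file; `rows_to_tsv_text` is a hypothetical name.

```python
import csv
import io


def rows_to_tsv_text(rows, col_order, include_header=False):
    """Render rows as tab-separated text with columns in col_order."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if include_header:
        writer.writerow(col_order)  # column labels as the first line
    for row in rows:
        # Only the requested columns are written, in the requested order.
        writer.writerow([row[col] for col in col_order])
    return buf.getvalue()


particles = [{"x": 10, "y": 20, "w": 64, "h": 64, "conf": 0.9}]
tsv_text = rows_to_tsv_text(particles, ["x", "y", "conf"], include_header=True)
```

Columns missing from `col_order` (here `w` and `h`) are simply left out of the output in this sketch.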

repic.utils.coord_converter.process_conversion(paths, in_fmt, out_fmt, boxsize=None, out_dir=None, in_cols=('auto', 'auto', 'auto', 'auto', 'auto', 'auto'), out_col_order=('x', 'y', 'w', 'h', 'conf', 'name'), suffix='', include_header=False, single_out=False, multi_out=False, round_to=None, norm_conf=None, require_conf=None, force=False, quiet=False)

Converts between different particle bounding box formats

Parameters:
  • paths (list) – filepaths of particle bounding box coordinate files

  • in_fmt (str) – input file format

  • out_fmt (str) – output file format

Keyword Arguments:
  • boxsize (int or None) – particle bounding box height/width

  • out_dir (str or None) – path to output directory

  • in_cols (tuple) – how each input column is determined, one entry per column (default: 'auto' for every entry)

  • out_col_order (tuple) – output column order

  • suffix (str) – additional suffix for output files (default='')

  • include_header (bool, default=False) – include header in output file

  • single_out (bool, default=False) – output particle bounding box coordinates in a single file

  • multi_out (bool, default=False) – output particle bounding box coordinates in multiple files (one per micrograph)

  • round_to (int or None) – round coordinates to the specified number of decimal places

  • norm_conf (list or None) – list of min and max confidence values to be used to normalize observed confidence scores

  • require_conf (float or None) – model confidence score to assign to particle bounding boxes without a score

  • force (bool, default=False) – overwrite output file if it exists

  • quiet (bool, default=False) – suppress log printing

Returns:

None
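Two of the keyword arguments above, `require_conf` and `norm_conf`, describe simple transformations of the confidence column. The sketch below shows one plausible interpretation: filling missing scores with a fixed value, and min-max scaling with the supplied bounds. The helper names, the min-max formula, and the use of `None` for a missing score are all assumptions, not the documented behaviour of `process_conversion`.

```python
def fill_missing_conf(scores, require_conf):
    """Replace missing (None) scores with the required confidence value."""
    return [require_conf if score is None else score for score in scores]


def normalize_conf(scores, norm_conf):
    """Min-max scale scores using norm_conf = [min, max] (assumed formula)."""
    low, high = norm_conf
    if high == low:
        raise ValueError("norm_conf min and max must differ")
    return [(score - low) / (high - low) for score in scores]


filled = fill_missing_conf([0.2, None], require_conf=1.0)
scaled = normalize_conf([0.0, 5.0, 10.0], norm_conf=[0.0, 10.0])
```

Under this interpretation, scores outside the supplied bounds map outside the 0 to 1 range; whether the module clips them is not stated in this reference.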

repic.utils.coord_converter.parser

argparse parse_args() object

Type:

obj