repic.utils.coord_converter
Converts particle bounding box coordinates between different formats (STAR, BOX, dat, coord, etc.)
Attributes
STAR_COL_X – STAR file column name for x-coordinate of particle bounding box
STAR_COL_Y – STAR file column name for y-coordinate of particle bounding box
STAR_COL_C – STAR file column name for figure of merit
STAR_COL_N – STAR file column name for micrograph name
DF_COL_NAMES – default column names for Pandas data frame
STAR_HEADER_MAP – dictionary of header mappings for STAR file
BOX_HEADER_MAP – dictionary of header mappings for BOX file
CBOX_HEADER_MAP – dictionary of header mappings for cBOX file
TSV_HEADER_MAP – dictionary of header mappings for TSV file
CS_HEADER_MAP – dictionary of header mappings for CryoSparc file
AUTO – flag for automatically processing columns of certain file types
parser – argparse parse_args() object
Functions
_log – Formats and prints a message to the console at one of three logging levels (info, warning, error)
_is_int – Checks if Python object can be converted to int datatype
_row_is_all_nonnumeric – Checks if all elements of Pandas dataframe row are non-numeric values
_has_numbers – Checks if string s contains an integer
_make_parent_dir – Creates parent directory if it does not exist
_path_occupied – Checks if file path is a file
cs_to_df – Converts particle bounding box coordinate file (in CryoSparc format) into a Pandas dataframe with the correct column headers
star_to_df – Converts particle bounding box coordinate file (in STAR file format) into a Pandas dataframe with the correct column headers
tsv_to_df – Converts particle bounding box coordinate file (in TSV-like file format) into a Pandas dataframe, skipping any non-numeric header rows
df_to_star – Writes a Pandas dataframe of particle bounding box coordinates (generated by one of the *_to_df functions) to storage in STAR file format
df_to_tsv – Writes a Pandas dataframe of particle bounding box coordinates (generated by one of the *_to_df functions) to storage, optionally writing out [x, y, w, h, conf] labels as a header
process_conversion – Converts between different particle bounding box formats
Module Contents
- repic.utils.coord_converter.STAR_COL_X = '_rlnCoordinateX'
STAR file column name for x-coordinate of particle bounding box
- repic.utils.coord_converter.STAR_COL_Y = '_rlnCoordinateY'
STAR file column name for y-coordinate of particle bounding box
- repic.utils.coord_converter.STAR_COL_C = '_rlnAutopickFigureOfMerit'
STAR file column name for figure of merit
- repic.utils.coord_converter.STAR_COL_N = '_rlnMicrographName'
STAR file column name for micrograph name
- repic.utils.coord_converter.DF_COL_NAMES = ['x', 'y', 'w', 'h', 'conf', 'name']
default column names for Pandas data frame
- repic.utils.coord_converter.STAR_HEADER_MAP
dictionary of header mappings for STAR file
- repic.utils.coord_converter.BOX_HEADER_MAP
dictionary of header mappings for BOX file
- repic.utils.coord_converter.CBOX_HEADER_MAP
dictionary of header mappings for cBOX file
- repic.utils.coord_converter.TSV_HEADER_MAP
dictionary of header mappings for TSV file
- repic.utils.coord_converter.CS_HEADER_MAP
dictionary of header mappings for CryoSparc file
- repic.utils.coord_converter.AUTO = 'auto'
flag for automatically processing columns of certain file types
- repic.utils.coord_converter._log(msg, lvl=0, quiet=False)
Formats and prints message to console with one of the following logging levels: 0: info (print and continue execution; ignore if quiet=True), 1: warning (print and continue execution), 2: error (print and exit with code 1)
- Parameters:
msg (str) – message text
- Keyword Arguments:
lvl (int, default=0) – logging level
quiet (bool, default=False) – suppress log printing
- Returns:
None
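A minimal sketch of the three logging levels described above, using a hypothetical `log_sketch` function. The `INFO:`/`WARNING:`/`ERROR:` prefixes are assumptions; the module's actual message format is not documented here.

```python
import sys

# Hypothetical prefixes; the real _log output format is not documented here
_LEVEL_NAMES = {0: "INFO", 1: "WARNING", 2: "ERROR"}


def log_sketch(msg, lvl=0, quiet=False):
    """Illustrative stand-in for the documented _log logging levels."""
    if lvl == 0 and quiet:
        return  # info messages are suppressed when quiet=True
    print(f"{_LEVEL_NAMES[lvl]}: {msg}")
    if lvl == 2:
        sys.exit(1)  # errors terminate execution with exit code 1
```

With this reading, `quiet` only affects level 0: `log_sketch("missing column", lvl=1, quiet=True)` still prints.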
- repic.utils.coord_converter._is_int(val)
Checks if Python object can be converted to int datatype
- Parameters:
val (obj) – Python object
- Returns:
True if val can be converted to int (False otherwise)
- Return type:
bool
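One plausible way to implement this check is to attempt the conversion and catch the exceptions `int()` can raise. The hypothetical `is_int_sketch` below does that. It is a sketch, not the module's code.

```python
def is_int_sketch(val):
    """Return True if int(val) succeeds, False otherwise (illustrative stand-in)."""
    try:
        int(val)
    except (TypeError, ValueError, OverflowError):
        # TypeError: None, lists, ...; ValueError: "abc", "1.5", float("nan");
        # OverflowError: float("inf")
        return False
    return True
```

Under this approach, `int("1.5")` raises `ValueError` while `int(1.5)` returns 1, so string and float inputs behave differently. Whether `_is_int` shares that behaviour is not stated above.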
- repic.utils.coord_converter._row_is_all_nonnumeric(x)
Checks if all elements of Pandas dataframe row are non-numeric values
- Parameters:
x (obj) – Pandas dataframe row object
- Returns:
True if all elements are non-numeric values (False otherwise)
- Return type:
bool
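This sketch assumes the helper returns True for a row with no numeric fields, such as an `x y w h` label line, as the function name suggests. The hypothetical `row_is_all_nonnumeric_sketch` works on a plain list of field strings rather than a Pandas row, and "numeric" here means "parses as a float" (an assumption).

```python
def _parses_as_float(field):
    """True if the field can be read as a number (int or float)."""
    try:
        float(field)
    except (TypeError, ValueError):
        return False
    return True


def row_is_all_nonnumeric_sketch(row):
    """True if no field in the row is numeric, e.g. an 'x y w h' label line."""
    return not any(_parses_as_float(field) for field in row)
```

Because `float("nan")` and `float("inf")` succeed, a field spelled `nan` or `inf` counts as numeric under this definition.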
- repic.utils.coord_converter._has_numbers(s)
Checks if string s contains an integer
- Parameters:
s (str) – string
- Returns:
True if string contains integer (False otherwise)
- Return type:
bool
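This sketch reads "contains an integer" as "contains at least one digit character", which is an assumption. Under that reading, the check reduces to a single `any` expression in the hypothetical `has_numbers_sketch`:

```python
def has_numbers_sketch(s):
    """True if s contains at least one digit character."""
    return any(ch.isdigit() for ch in s)
```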
- repic.utils.coord_converter._make_parent_dir(path_str)
Creates parent directory if it does not exist
- Parameters:
path_str (str) – filepath to subdirectory
- Returns:
None
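This sketch assumes `path_str` is an output file path and creates its parent directory. The hypothetical `make_parent_dir_sketch` below uses `pathlib` for that. `exist_ok=True` makes repeated calls harmless.

```python
from pathlib import Path


def make_parent_dir_sketch(path_str):
    """Create the parent directory of path_str, including missing ancestors."""
    # parents=True creates intermediate directories; exist_ok=True makes this a no-op
    # if the directory is already there
    Path(path_str).parent.mkdir(parents=True, exist_ok=True)
```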
- repic.utils.coord_converter._path_occupied(path_str)
Checks if file path is a file
- Parameters:
path_str (str) – filepath
- Returns:
True if filepath is a file (False otherwise)
- Return type:
bool
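A one-line sketch of the documented behaviour, assuming "is a file" means an existing regular file. `os.path.isfile` returns False for directories and missing paths. `path_occupied_sketch` is a hypothetical name.

```python
import os


def path_occupied_sketch(path_str):
    """True only if path_str is an existing regular file (symlinks are followed)."""
    return os.path.isfile(path_str)
```

A check like this is the natural guard for the `force` flag on the writer functions below: refuse to write when the path is occupied and `force` is False.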
- repic.utils.coord_converter.cs_to_df(path)
Converts particle bounding box coordinate file (in CryoSparc format) into a Pandas dataframe with the correct column headers
- Parameters:
path (str) – filepath to particle bounding box file
- Returns:
Pandas dataframe of particle bounding box coordinates
- Return type:
obj
- repic.utils.coord_converter.star_to_df(path)
Converts particle bounding box coordinate file (in STAR file format) into a Pandas dataframe with the correct column headers
- Parameters:
path (str) – filepath to particle bounding box file
- Returns:
Pandas dataframe of particle bounding box coordinates
- Return type:
obj
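This is a simplified, standard-library sketch of reading a coordinate STAR file. It is not `star_to_df` itself, which returns a Pandas dataframe and uses `STAR_HEADER_MAP`, whose contents are not shown here. The sketch reads the first `loop_` table and maps the `STAR_COL_*` labels onto `x`/`y`/`conf`/`name` keys. It assumes whitespace-separated values without quoted strings. Multi-block files, such as RELION 3.1 files with a leading `data_optics` block, would need block selection that is omitted here.

```python
STAR_COL_X = "_rlnCoordinateX"
STAR_COL_Y = "_rlnCoordinateY"
STAR_COL_C = "_rlnAutopickFigureOfMerit"
STAR_COL_N = "_rlnMicrographName"


def read_star_loop_sketch(text):
    """Return the rows of the first populated loop_ as dicts keyed by column label."""
    labels, rows, in_loop = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("data_") or line == "loop_":
            if rows:
                break  # stop at the next block or loop once data has been read
            in_loop = line == "loop_"
            labels = []
            continue
        if not in_loop:
            continue  # key/value pairs outside a loop are ignored
        if line.startswith("_"):
            labels.append(line.split()[0])  # "_rlnCoordinateX #1" -> "_rlnCoordinateX"
        else:
            rows.append(dict(zip(labels, line.split())))
    return rows


def star_to_records_sketch(text):
    """Map STAR columns onto x/y/conf/name keys; conf and name are optional."""
    records = []
    for row in read_star_loop_sketch(text):
        records.append({
            "x": float(row[STAR_COL_X]),
            "y": float(row[STAR_COL_Y]),
            "conf": float(row[STAR_COL_C]) if STAR_COL_C in row else None,
            "name": row.get(STAR_COL_N),
        })
    return records
```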
- repic.utils.coord_converter.tsv_to_df(path, header_mode=None)
Converts particle bounding box coordinate file (in TSV-like file format) into a Pandas dataframe, skipping any non-numeric header rows
- Parameters:
path (str) – filepath to particle bounding box file
- Keyword Arguments:
header_mode (int, str, or None) – One of None, 'infer', or an int (row index). If None, any non-numeric rows at the top of the file are skipped and column names are not set. Otherwise, no manual row skipping is performed, and header_mode is passed directly to the header argument of pandas.read_csv
- Returns:
Pandas dataframe of particle bounding box coordinates
- Return type:
obj
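The `header_mode=None` path can be sketched without Pandas. Split each line on whitespace, then drop leading rows that contain no numeric field. Treating any whitespace as the delimiter and keeping values as strings are simplifying assumptions of this hypothetical `read_tsv_sketch`.

```python
def _is_number(field):
    try:
        float(field)
    except ValueError:
        return False
    return True


def read_tsv_sketch(text):
    """Split rows on whitespace and drop leading rows with no numeric field.

    Rows are returned as lists of strings; no column names are assigned.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    start = 0
    while start < len(rows) and not any(_is_number(f) for f in rows[start]):
        start += 1  # this leading row is entirely non-numeric: treat it as a header
    return rows[start:]
```

Only rows at the top are skipped; a non-numeric row further down is kept. For other `header_mode` values, the documented behaviour is simply `pandas.read_csv(..., header=header_mode)`.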
- repic.utils.coord_converter.df_to_star(df, out_path, force=False)
Writes a Pandas dataframe of particle bounding box coordinates (generated by one of the *_to_df functions) to storage in STAR file format
- Parameters:
df (object) – Pandas dataframe of particle bounding box coordinates
out_path (str) – filepath to output file
- Keyword Arguments:
force (bool, default=False) – overwrite output file if it exists
- Returns:
None
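This sketch writes a single-loop STAR file and shows the role of `force`. It takes a list of dicts with `x`, `y` and `conf` keys as a stand-in for the dataframe. The layout (`data_`, `loop_`, numbered `#1`, `#2` labels, tab-separated rows) follows common STAR conventions and may differ in detail from `df_to_star`. Raising `FileExistsError` is also an assumption; the module may report the conflict through `_log` instead.

```python
import os

STAR_COL_X = "_rlnCoordinateX"
STAR_COL_Y = "_rlnCoordinateY"
STAR_COL_C = "_rlnAutopickFigureOfMerit"


def write_star_sketch(records, out_path, force=False):
    """Write (x, y, conf) records as a single-loop STAR file."""
    if os.path.isfile(out_path) and not force:
        raise FileExistsError(f"{out_path} exists; pass force=True to overwrite")
    labels = [STAR_COL_X, STAR_COL_Y, STAR_COL_C]
    lines = ["", "data_", "", "loop_"]
    # STAR column labels are conventionally numbered "#1", "#2", ...
    lines += [f"{label} #{i}" for i, label in enumerate(labels, start=1)]
    lines += [f"{r['x']}\t{r['y']}\t{r['conf']}" for r in records]
    with open(out_path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
```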
- repic.utils.coord_converter.df_to_tsv(df, col_order, out_path, include_header=False, force=False)
Writes a Pandas dataframe of particle bounding box coordinates (generated by one of the *_to_df functions) to storage in a TSV-like format, optionally writing out [x, y, w, h, conf] labels as a header
- Parameters:
df (object) – Pandas dataframe of particle bounding box coordinates
col_order (list) – list of ordered file columns
out_path (str) – filepath to output file
- Keyword Arguments:
include_header (bool, default=False) – include header in output file
force (bool, default=False) – overwrite output file if it exists
- Returns:
None
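A sketch of how `col_order` and `include_header` could interact, using the standard `csv` module with a tab delimiter. The hypothetical `write_tsv_sketch` takes a list of dicts instead of a dataframe. It returns the text rather than writing to `out_path`; file handling and `force` would follow the STAR writer pattern.

```python
import csv
import io


def write_tsv_sketch(records, col_order, include_header=False):
    """Return tab-separated text with columns emitted in col_order."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if include_header:
        writer.writerow(col_order)  # header row uses the column labels themselves
    for rec in records:
        writer.writerow([rec[col] for col in col_order])
    return buf.getvalue()
```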
- repic.utils.coord_converter.process_conversion(paths, in_fmt, out_fmt, boxsize=None, out_dir=None, in_cols=('auto', 'auto', 'auto', 'auto', 'auto', 'auto'), out_col_order=('x', 'y', 'w', 'h', 'conf', 'name'), suffix='', include_header=False, single_out=False, multi_out=False, round_to=None, norm_conf=None, require_conf=None, force=False, quiet=False)
Converts between different particle bounding box formats
- Parameters:
paths (list) – filepaths of particle bounding box coordinate files
in_fmt (str) – input file format
out_fmt (str) – output file format
- Keyword Arguments:
boxsize (int or None) – particle bounding box height/width
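out_dir (str or None) – path to output directory
in_cols (tuple) – how each input column is determined; the AUTO value 'auto' determines a column automatically (default: 'auto' for all six entries)
out_col_order (tuple) – output column order (default=('x', 'y', 'w', 'h', 'conf', 'name'))
suffix (str) – additional suffix for output files (default='')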
include_header (bool, default=False) – include header in output file
single_out (bool, default=False) – output particle bounding box coordinates in a single file
multi_out (bool, default=False) – output particle bounding box coordinates in multiple files (one per micrograph)
round_to (int or None) – round coordinates to the specified number of decimal places
norm_conf (list or None) – list of min and max confidence values to be used to normalize observed confidence scores
require_conf (float or None) – model confidence score to assign to particle bounding boxes without a score
force (bool, default=False) – overwrite output file if it exists
quiet (bool, default=False) – suppress log printing
- Returns:
None
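A sketch of how the `round_to`, `norm_conf` and `require_conf` options could act on individual boxes. The documentation above does not state the details, so several points are assumptions. `norm_conf=[lo, hi]` is treated as min-max scaling `(conf - lo) / (hi - lo)`. `require_conf` fills scores that are still missing after normalization. Only `x` and `y` are rounded. `postprocess_sketch` and its list-of-dicts input are hypothetical.

```python
def postprocess_sketch(records, round_to=None, norm_conf=None, require_conf=None):
    """Apply round_to-, norm_conf- and require_conf-style options to box records."""
    if norm_conf is not None:
        lo, hi = norm_conf
        if hi <= lo:
            raise ValueError("norm_conf must be [min, max] with min < max")
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not modified
        if norm_conf is not None and rec.get("conf") is not None:
            rec["conf"] = (rec["conf"] - lo) / (hi - lo)  # assumed min-max scaling
        if rec.get("conf") is None and require_conf is not None:
            rec["conf"] = require_conf  # assign a score to boxes that lack one
        if round_to is not None:
            rec["x"] = round(rec["x"], round_to)
            rec["y"] = round(rec["y"], round_to)
        out.append(rec)
    return out
```

Filling missing scores after normalization means the `require_conf` value appears in the output exactly as given. The reverse order would rescale it too.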
- repic.utils.coord_converter.parser
argparse parse_args() object
- Type:
obj