WorksheetWrapper

(Implemented at xlsx_copycull.xlsx_copycull.WorksheetWrapper but automatically imported as a top-level class, xlsx_copycull.WorksheetWrapper.)

Note

A WorksheetWrapper can be created directly, but most likely you will want to create one via the stage_ws() method on a WorkbookWrapper object.

class xlsx_copycull.WorksheetWrapper(wb_wrapper: WorkbookWrapper, ws_name: str, header_row: int = 1, protected_rows: Optional[set] = None, first_modifiable_row: int = -1)

A wrapper class for openpyxl worksheets, with added methods for culling rows (based on whether cell values match a specified condition) and adding formulas.

__init__(wb_wrapper: WorkbookWrapper, ws_name: str, header_row: int = 1, protected_rows: Optional[set] = None, first_modifiable_row: int = -1)
Parameters
  • wb_wrapper – The parent WorkbookWrapper object.

  • ws_name – The name of this worksheet.

  • header_row – The row containing headers (an int, indexed to 1)

  • protected_rows

    (Optional) A list-like object containing the rows that should never be modified or deleted. Rows before first_modifiable_row and the header row will be automatically added.

    Note

    .protected_rows may change behind the scenes if rows are

    deleted by .cull(). If rows are inserted or deleted outside the functionality of this module, .protected_rows may get corrupted.

  • first_modifiable_row – (Optional) The first row that may be modified or deleted (an int, indexed to 1). If not set, will default to the first row after the header_row.

add_formulas(formulas, rows=None, protected_rows=None, number_formats: Optional[dict] = None) dict

Add formulas to the working spreadsheet in .ws.

Parameters

formulas – A dict keyed by column name (e.g. 'E') whose values are a function that generates the formula, based on the row number.

For example, to generate these formulas…:

# Column R --> =F5*AB5/$S$1  (for an example row 5)
# Column S --> =AB5*AC5  (for an example row 5)
# ...we would pass this dict:

formulas = {
    "R": lambda row_num: "=F{0}*AB{0}/$S$1".format(row_num),
    "S": lambda row_num: "=AB{0}*AC{0}".format(row_num)
}
Parameters
  • rows – The rows where formulas should be added. If rows is specified here, it will IGNORE protected_rows (potentially adding formulas to all rows in rows, even if they are also in protected_rows). However, if rows is NOT specified, it will insert a formula into every row EXCEPT protected_rows.

  • protected_rows – (Optional) A list-like object containing the rows that should never be deleted. If not specified here, will pull from what is set in .protected_rows. NOTE: If .cull() was called before this method, then the protected_rows may have changed since this object was initialized. See comments under .cull() for a more complete discussion.

  • number_formats

    (Optional) A dict, keyed by column letter, whose values are the 'number_format' to apply to any cells in that column to which we’re adding a formula. Reference openpyxl documentation for possible values and built-in options for number_format.

    from openpyxl.styles.numbers import BUILTIN_FORMATS
    
    # ...
    
    number_formats = {
        "R": "General",
        "S": BUILTIN_FORMATS[2]  # number format of '0.00'
    }
    

Returns

A dict, keyed by Column letter, whose values are a list of the cell names that were modified (e.g., {'A': ['A2', 'A3']}).

cull(select_conditions: dict, bool_oper='AND', protected_rows=None)

Cull the spreadsheet, based on the select_conditions. If more than one select condition is used (i.e. more than one key in select_conditions), specify whether to apply 'AND', 'OR', or 'XOR' boolean logic to the resulting sets by passing one of those as bool_oper (defaults to 'AND').

Note

protected_rows is a list (or set) of integers, being the row numbers for those rows that should NEVER be deleted (indexed to 1). If rows above those numbers get deleted, the resulting indexes in protected_rows would not be accurate, so this method adjusts the protected rows to their new position. The resulting row numbers are stored to attribute .last_protected_rows. IF AND ONLY IF protected_rows is NOT passed as an arg here, it will be pulled from .protected_rows, and the resulting protected_rows will ALSO be stored to .protected_rows (in addition to .last_protected_rows).

(The reason for this design choice was that rows may need to be protected for one operation, but not for another – but we want to be able to track any changes to them. This way, they can be accessed in .protected_rows and/or .last_protected_rows after calling .cull() but before calling the next method, which might change them.)

Parameters
  • select_conditions – A dict of column_header-to- select_condition pairs, to determine which rows should be deleted. Specifically, keyed by the header of the column to check under, and whose value is a function to be applied to the value of the cell under that column, which returns a bool. If the function returns False (or a False-like value) when applied to the cell’s value, that row will be marked for deletion.

  • bool_oper – When using more than one select conditions (i.e. more than one key in the dict), use this to determine whether to apply OR, AND, or XOR to the resulting rows to be selected. Pass one of the following: 'AND', 'OR', 'XOR'. (Defaults to 'AND'.)

  • protected_rows – (Optional) A list-like object containing the rows that should never be deleted. If not specified here, will pull from what is set in .protected_rows. (See comments above regarding .protected_rows and .last_protected_rows.)

Returns

None

find_match_col(header_row, match_col_name)

Find the match column number, based on its header name.

Note

Will return the first match, so avoid duplicate header names.

mandate_loaded()

Raise an error if the .wb is not currently loaded.

modifiable_rows(protected_rows=None) list

Get a list of row numbers that currently exist in the spreadsheet (indexed to 1), and which are NOT in protected_rows.

Parameters

protected_rows – (Optional) A collection of row numbers (indexed to 1) that should never be deleted. If not specified here, will pull from what is set in .protected_rows.

rename_ws(new_name) None

Rename this worksheet.

Parameters

new_name – The new name for this sheet.

Returns

None