Preprocessor

Implements modular components for dataset preprocessing: a data-trimmer, a standardizer, a feature selector and a sliding window data generator.

Contents

License

The MIT License (MIT)

Copyright (c) 2020 Harvey Bastidas

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contributors

Changelog

Version 0.1

  • Feature A added
  • FIX: nasty bug #1729 fixed

preprocessor

preprocessor package

Subpackages
preprocessor.data_trimmer package
Submodules
preprocessor.data_trimmer.data_trimmer module

This file contains the DataTrimmer class. To run this script, uncomment or add the following lines in the [options.entry_points] section of setup.cfg:

console_scripts =
    data-trimmer = data_trimmer.__main__:main

Then run python setup.py install, which installs the data-trimmer command in your current environment.

class preprocessor.data_trimmer.data_trimmer.DataTrimmer(conf)[source]

Bases: preprocessor.preprocessor.Preprocessor

The Data Trimmer preprocessor class

core()[source]
Core preprocessor task, run after the instance is started with the main method.
Decides from the arguments which trimming method to call.

Args: args (obj): command line parameters as objects

load_from_config()[source]
parse_args(args)[source]

Parse command line parameters

Parameters: args ([str]) – command line parameters as list of strings
Returns: command line parameters namespace
Return type: argparse.Namespace

store()[source]

Save preprocessed data and the configuration of the preprocessor.

trim_auto()[source]

Trims all the constant columns, and trims all rows with consecutive zeroes from the start and end of the input dataset.

Returns: rows_t, cols_t (int,int): number of rows and columns trimmed
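
The behaviour described above can be sketched in plain Python. This is a minimal illustration, not the DataTrimmer implementation: it assumes the dataset is a list of equal-length numeric rows, that a row counts as "zero" when every remaining value in it is 0, and that constant columns are removed before rows are checked. The function name and return shape are hypothetical.

```python
def trim_auto_sketch(rows):
    """Drop constant columns, then all-zero rows at the start and end.

    Returns (trimmed_rows, rows_trimmed, cols_trimmed).
    """
    if not rows:
        return [], 0, 0
    n_cols = len(rows[0])
    # A column is constant when every row holds the same value in it.
    keep = [c for c in range(n_cols) if len({r[c] for r in rows}) > 1]
    data = [[r[c] for c in keep] for r in rows]

    def is_zero_row(row):
        return all(v == 0 for v in row)

    # Advance past the leading run of zero rows.
    start = 0
    while start < len(data) and is_zero_row(data[start]):
        start += 1
    # Step back over the trailing run of zero rows.
    end = len(data)
    while end > start and is_zero_row(data[end - 1]):
        end -= 1

    trimmed = data[start:end]
    return trimmed, len(rows) - len(trimmed), n_cols - len(keep)


rows = [[0, 5, 0], [0, 5, 0], [1, 5, 2], [0, 5, 3], [0, 5, 0]]
trimmed, rows_t, cols_t = trim_auto_sketch(rows)
```

Zero rows in the middle of the dataset are left in place; only the runs touching the start and end are removed.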

trim_columns()[source]

Trims all the constant columns from the input dataset.

Returns: number of rows and columns trimmed
Return type: rows_t, cols_t (int,int)

trim_fixed_rows(from_start, from_end)[source]

Trims a configurable number of rows from the start or end of the input dataset

Parameters:
  • from_start (int) – number of rows to remove from start (ignored if auto_trim)
  • from_end (int) – number of rows to remove from end (ignored if auto_trim)
Returns:

number of rows and columns trimmed

Return type:

rows_t, cols_t (int,int)
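
Fixed-row trimming amounts to a slice. The sketch below is an assumption about the behaviour, written as a standalone function over a list of rows rather than as the class method; the name is hypothetical.

```python
def trim_fixed_rows_sketch(rows, from_start, from_end):
    """Remove `from_start` rows from the top and `from_end` from the bottom.

    Returns (trimmed_rows, rows_trimmed, cols_trimmed); no columns are removed.
    """
    if from_start < 0 or from_end < 0:
        raise ValueError("row counts must be non-negative")
    # Compute the stop index explicitly: rows[a:-b] would return an empty
    # list when b == 0, and a negative stop would wrap around.
    stop = max(len(rows) - from_end, 0)
    trimmed = rows[from_start:stop]
    return trimmed, len(rows) - len(trimmed), 0


data = [[1], [2], [3], [4], [5]]
```

Asking for more rows than the dataset holds yields an empty result instead of raising.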

preprocessor.data_trimmer.data_trimmer.run(args)[source]

Entry point for console_scripts

Module contents
preprocessor.feature_selector package
Submodules
preprocessor.feature_selector.feature_selector module

This file contains the FeatureSelector class. To run this script, uncomment or add the following lines in the [options.entry_points] section of setup.cfg:

console_scripts =
    feature_selector = feature_selector.__main__:main

Then run python setup.py install, which installs the feature_selector command in your current environment.

class preprocessor.feature_selector.feature_selector.FeatureSelector(conf)[source]

Bases: preprocessor.preprocessor.Preprocessor

The FeatureSelector preprocessor class

core()[source]
Core preprocessor task, run after the instance is started with the main method.
Decides from the arguments which method to call.

Args: args (obj): command line parameters as objects

feature_selection()[source]

Process the dataset.

load_from_config()[source]

Process the dataset from a config file.

parse_args(args)[source]

Parse command line parameters

Parameters: args ([str]) – command line parameters as list of strings
Returns: command line parameters namespace
Return type: argparse.Namespace

store()[source]

Save preprocessed data and the configuration of the preprocessor.

preprocessor.feature_selector.feature_selector.run(args)[source]

Entry point for console_scripts

preprocessor.feature_selector.feature_selector.score_func_classification(X, Y)[source]

Used to score the features for feature selection, for classification. To be used in the FeatureSelector.feature_selection() method.

preprocessor.feature_selector.feature_selector.score_func_regression(X, Y)[source]

Used to score the features for feature selection, for regression. To be used in the FeatureSelector.feature_selection() method.
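
The documentation does not say which statistic these score functions compute. As an illustration only, a regression-style score function with the same (X, Y) shape could rank each feature column by the absolute Pearson correlation with the target. The function name below is hypothetical and the choice of statistic is an assumption, not the package's method.

```python
import math


def abs_correlation_scores(X, Y):
    """Return one score per feature column: |Pearson correlation| with Y.

    X is a list of rows (each a list of numbers); Y is a list of targets.
    """
    n = len(Y)
    mean_y = sum(Y) / n
    dev_y = [y - mean_y for y in Y]
    norm_y = math.sqrt(sum(d * d for d in dev_y))

    scores = []
    for col in zip(*X):  # iterate over columns
        mean_x = sum(col) / n
        dev_x = [x - mean_x for x in col]
        norm_x = math.sqrt(sum(d * d for d in dev_x))
        if norm_x == 0 or norm_y == 0:
            # A constant column (or target) carries no signal.
            scores.append(0.0)
            continue
        cov = sum(a * b for a, b in zip(dev_x, dev_y))
        scores.append(abs(cov / (norm_x * norm_y)))
    return scores


X = [[1, 5, 3], [2, 5, 1], [3, 5, 2]]
Y = [2, 4, 6]
scores = abs_correlation_scores(X, Y)
```

A selector would then keep the columns with the highest scores.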

Module contents
preprocessor.sliding_window package
Submodules
preprocessor.sliding_window.sliding_window module

This file contains the SlidingWindow class. To run this script, uncomment or add the following lines in the [options.entry_points] section of setup.cfg:

console_scripts =
    sliding_window = sliding_window.__main__:main

Then run python setup.py install, which installs the sliding_window command in your current environment.

class preprocessor.sliding_window.sliding_window.SlidingWindow(conf)[source]

Bases: preprocessor.preprocessor.Preprocessor

The SlidingWindow preprocessor class

core()[source]
Core preprocessor task, run after the instance is started with the main method.
Decides from the arguments which method to call.

Args: args (obj): command line parameters as objects

parse_args(args)[source]

Parse command line parameters additional to the preprocessor class ones

Parameters: args ([str]) – command line parameters as list of strings
Returns: command line parameters namespace
Return type: argparse.Namespace

sl_window()[source]

Perform the sliding window technique on the input dataset.
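
The details of sl_window() are not documented here. A generic sliding window over a sequence of rows can be sketched as follows; the function name and the window_size and stride parameters are assumptions for illustration.

```python
def sliding_windows(rows, window_size, stride=1):
    """Return every full window of `window_size` consecutive rows.

    Windows start every `stride` rows; partial windows at the end are dropped.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = len(rows) - window_size
    return [rows[i:i + window_size] for i in range(0, last_start + 1, stride)]


series = [1, 2, 3, 4, 5]
windows = sliding_windows(series, 3)
```

With a stride of 1, a dataset of n rows yields n - window_size + 1 windows, or none when the dataset is shorter than the window.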

store()[source]

Save preprocessed data and the configuration of the preprocessor.

preprocessor.sliding_window.sliding_window.run(args)[source]

Entry point for console_scripts

Module contents
preprocessor.standardizer package
Submodules
preprocessor.standardizer.standardizer module

This file contains the Standardizer class. To run this script, uncomment or add the following lines in the [options.entry_points] section of setup.cfg:

console_scripts =
    standardizer = standardizer.__main__:main

Then run python setup.py install, which installs the standardizer command in your current environment.

class preprocessor.standardizer.standardizer.Standardizer(conf)[source]

Bases: preprocessor.preprocessor.Preprocessor

The Standardizer preprocessor class

core()[source]
Core preprocessor task, run after the instance is started with the main method.
Decides from the arguments which method to call.

Args: args (obj): command line parameters as objects

load_from_config()[source]

Standardize the dataset from a config file.

parse_args(args)[source]

Parse command line parameters

Parameters: args ([str]) – command line parameters as list of strings
Returns: command line parameters namespace
Return type: argparse.Namespace

standardize()[source]

Standardize the dataset.
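
Standardization usually means rescaling each column to zero mean and unit variance. The sketch below assumes that meaning and uses the population standard deviation; the function name is hypothetical, and the actual method may use a different scaler.

```python
import math


def standardize_sketch(rows):
    """Z-score each column: (value - column mean) / column std.

    Returns (scaled_rows, means, stds) so the fitted parameters can be
    stored and reused on new data.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    scaled = [
        # A constant column has std 0; map it to 0.0 instead of dividing.
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]
    return scaled, means, stds


scaled, means, stds = standardize_sketch([[1.0, 7.0], [3.0, 7.0]])
```

Returning the means and standard deviations alongside the data is what makes it possible to apply the same transformation later from a saved configuration.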

store()[source]

Save preprocessed data and the configuration of the preprocessor.

preprocessor.standardizer.standardizer.run(args)[source]

Entry point for console_scripts

Module contents
Submodules
preprocessor.conftest module
preprocessor.preprocessor module

This file contains the Preprocessor class, the base class for the DataTrimmer, FeatureSelector, Standardizer and SlidingWindow classes.

class preprocessor.preprocessor.Preprocessor(conf)[source]

Bases: preprocessor.preprocessor_base.PreprocessorBase

Base class for DataTrimmer, FeatureSelector, Standardizer, SlidingWindow.

assign_arguments(pargs)[source]
core()[source]

Core preprocessor task, run after the instance is started with the main method. To be overridden by child classes depending on their preprocessor task.

main(args)[source]
Starts an instance. Main entry point allowing external calls.
Starts logging, parses the command line arguments and starts core.

Args: args ([str]): command line parameter list
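
The flow described for main() is a template method: the base class fixes the order of the steps and each child class supplies core(). The sketch below illustrates that pattern only. The class names, the --input_file flag and the fixed log level are hypothetical, not taken from the package.

```python
import argparse
import logging


class SketchPreprocessor:
    """Hypothetical base class mirroring the documented main() flow."""

    def __init__(self, conf=None):
        self.conf = conf
        self.args = None

    def setup_logging(self, loglevel):
        logging.basicConfig(level=loglevel)

    def parse_args(self, args):
        parser = argparse.ArgumentParser(description="sketch preprocessor")
        parser.add_argument("--input_file")  # hypothetical flag
        return parser.parse_args(args)

    def core(self):
        # Child classes override this with their preprocessing task.
        raise NotImplementedError

    def main(self, args):
        # Order follows the description: logging, argument parsing, core.
        self.setup_logging(logging.INFO)
        self.args = self.parse_args(args)
        return self.core()


class EchoPreprocessor(SketchPreprocessor):
    """Trivial child class: core() just reports the parsed input path."""

    def core(self):
        return self.args.input_file


result = EchoPreprocessor().main(["--input_file", "data.csv"])
```

Because main() takes the argument list as a parameter, it can be called from other code or from tests as well as from a console script.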

parse_args(args)[source]

Parse command line parameters. To be overridden by child classes depending on their command line parameters if they are console scripts.

Args: args ([str]): command line parameters as list of strings

Returns: argparse.Namespace: command line parameters namespace
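
A minimal parse_args with this signature might look like the following. The flag names are assumptions chosen to match the input_file and output_file attributes of PreprocessorBase; the real parameter set is not listed in this documentation.

```python
import argparse


def parse_args_sketch(args):
    """Parse a list of strings into an argparse.Namespace."""
    parser = argparse.ArgumentParser(description="Dataset preprocessor (sketch)")
    # Hypothetical flags, for illustration only.
    parser.add_argument("--input_file", required=True, help="path of the input dataset")
    parser.add_argument("--output_file", default=None, help="path of the output dataset")
    return parser.parse_args(args)


namespace = parse_args_sketch(["--input_file", "in.csv"])
```

Passing the list explicitly, instead of letting argparse read sys.argv, keeps the function testable without a command line.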

parse_cmd(parser)[source]
store()[source]

Save preprocessed data and the configuration of the preprocessor.

preprocessor.preprocessor_base module

This file contains the PreprocessorBase class, the base class for Preprocessor, from which DataTrimmer, FeatureSelector, Standardizer and SlidingWindow derive.

class preprocessor.preprocessor_base.PreprocessorBase(conf)[source]

Bases: object

Base class for Preprocessor.

input_file = None

Path of the input dataset

load_ds()[source]

Load the input dataset.

output_file = None

Path of the output dataset

setup_logging(loglevel)[source]

Setup basic logging.

Args: loglevel (int): minimum loglevel for emitting messages
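
A basic logging setup with this signature can be written with logging.basicConfig. The format string, the output stream and the use of force=True (Python 3.8+) are assumptions of this sketch, not details confirmed by the documentation.

```python
import logging
import sys


def setup_logging_sketch(loglevel):
    """Configure the root logger to emit messages at `loglevel` and above."""
    logformat = "[%(asctime)s] %(levelname)s:%(name)s:%(message)s"
    logging.basicConfig(
        level=loglevel,
        stream=sys.stdout,
        format=logformat,
        datefmt="%Y-%m-%d %H:%M:%S",
        force=True,  # replace any handlers configured earlier
    )


setup_logging_sketch(logging.WARNING)
logger = logging.getLogger("preprocessor.sketch")
```

Without force=True, basicConfig does nothing when the root logger already has handlers, so a second call with a different level would be silently ignored.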

Module contents
