Preprocessing Module (API Reference)
This module provides the main classes used for the whole preprocessing logic, following sklearn conventions.
@author: Guangqiang.lu
- class automl.preprocessing.FeatureSelect(simple_select=True, tree_select=False)
Bases: automl.preprocessing.Process
- fit(data, y=None)
Fit the feature selector. Algorithm-based feature selection is also supported.
- Parameters
data – data to process
y – label data, needed only when algorithm-based selection is trained
- Returns
- fit_transform(data, y=None)
Follows the parent class logic, as in sklearn.
- Parameters
data –
- Returns
- class automl.preprocessing.Imputation(use_al_to_im=False, threshold=0.5)
Bases: automl.preprocessing.Process
- fit(data, y=None)
Imputation logic happens here. Handling for different column types still needs to be added: numeric data could use a distance-based method, and categorical data could use the most frequent item.
- Parameters
data – data that contains missing fields
use_al_to_im – whether to use an algorithm to impute the data
- Returns
fitted estimator
- fit_transform(data, y=None)
Overrides the parent method, because imputation needs different processing logic.
- Parameters
data –
- Returns
- static get_col_data_type(data)
Get the data type of each column, where columns may contain missing values.
- Parameters
data –
- Returns
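The column-type-aware strategy proposed for Imputation.fit can be sketched as follows. This is a minimal, dependency-free illustration and not the automl implementation: `infer_column_type` and `impute_column` are hypothetical helpers, missing values are assumed to be `None`, and the column mean is swapped in for the distance-based method mentioned for numeric columns.

```python
from collections import Counter
from statistics import mean


def infer_column_type(column):
    """Classify a column as 'numeric' or 'categorical', ignoring missing values."""
    present = [v for v in column if v is not None]
    # bool is a subclass of int, so it is excluded explicitly.
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "numeric" if is_numeric else "categorical"


def impute_column(column):
    """Fill missing entries: mean for numeric columns, most frequent value otherwise."""
    present = [v for v in column if v is not None]
    if not present:
        return list(column)  # nothing to learn from; leave the column unchanged
    if infer_column_type(column) == "numeric":
        fill = mean(present)  # stand-in for a distance-based method
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


ages = [20, None, 40]
cities = ["Paris", "Rome", None, "Paris"]
print(impute_column(ages))
print(impute_column(cities))
```

Inferring the type from the non-missing values only is what lets a column with gaps still be classified before it is filled.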
- class automl.preprocessing.MinMax
Bases: automl.preprocessing.Process
- fit(x, y=None)
The label is accepted throughout the processing logic so that, even when it is not used, the method follows the sklearn convention.
- Parameters
x – data
y – label
- Returns
- fit_transform(data, y=None)
Follows the parent class logic, as in sklearn.
- Parameters
data –
- Returns
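The reference does not state what MinMax computes. Assuming it performs conventional min-max scaling of each column to the range [0, 1], the fit-then-transform flow might look like the sketch below. `MinMaxSketch` is a hypothetical class written in plain Python for illustration; it is not automl's code.

```python
class MinMaxSketch:
    """Hypothetical stand-in that rescales each column to the range [0, 1]."""

    def fit(self, rows, y=None):
        # y is accepted only to match the sklearn-style signature; it is unused.
        columns = list(zip(*rows))
        self.mins_ = [min(col) for col in columns]
        self.maxs_ = [max(col) for col in columns]
        return self

    def transform(self, rows, y=None):
        scaled = []
        for row in rows:
            scaled.append([
                # A constant column has zero range; map it to 0.0 to avoid dividing by zero.
                (v - lo) / (hi - lo) if hi != lo else 0.0
                for v, lo, hi in zip(row, self.mins_, self.maxs_)
            ])
        return scaled

    def fit_transform(self, rows, y=None):
        return self.fit(rows, y).transform(rows)


train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
scaler = MinMaxSketch()
print(scaler.fit_transform(train))
print(scaler.transform([[2.0, 10.0]]))
```

Storing the minimum and maximum at fit time means new data is scaled with the training statistics rather than its own.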
- class automl.preprocessing.Normalize
Bases: automl.preprocessing.Process
- fit(x, y=None)
The label is accepted throughout the processing logic so that, even when it is not used, the method follows the sklearn convention.
- Parameters
x – data
y – label
- Returns
- fit_transform(data, y=None)
Follows the parent class logic, as in sklearn.
- Parameters
data –
- Returns
- class automl.preprocessing.OnehotEncoding(keep_origin_feature=False, except_feature_indexes=None, except_feature_names_list=None, drop_ratio=0.2, max_categoric_number=30)
Bases: automl.preprocessing.Process
The input may be a numpy array or a pandas DataFrame, so both types are supported.
- Parameters
keep_origin_feature – whether to keep the original features when transforming
except_feature_indexes – array column indexes
except_feature_names_list – features that should not be converted even though they could be
drop_ratio – threshold for dropping features: when there are too many categorical features, a categorical feature that exceeds drop_ratio is dropped
max_categoric_number – upper limit on the number of categorical values; even with drop_ratio applied, too many values would still yield too many features, which is not wanted
- fit(x, y=None)
Fit the one-hot model.
- Parameters
x – data; must be a DataFrame, because the type of each column cannot be determined from an array that contains strings
- Returns
- fit_transform(data, y=None)
Follows the parent class logic, as in sklearn.
- Parameters
data –
- Returns
- transform(x, y=None)
Accepts either an array or a pandas DataFrame, so data can be transformed as an array even if the model was fitted with a DataFrame. The output is always converted to an array; returning a DataFrame when one was provided was the original intent, but it proved to be of no use.
- Parameters
x – data; either a DataFrame or an array is fine
- Returns
array
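The reference leaves the exact meaning of drop_ratio open. The sketch below assumes one plausible reading: a column is dropped when its number of distinct values, as a share of all rows, exceeds drop_ratio, or when the number of distinct values exceeds max_categoric_number. `select_columns_to_encode` and `one_hot` are hypothetical helpers written in plain Python; they are not part of automl.

```python
def select_columns_to_encode(columns, drop_ratio=0.2, max_categoric_number=30):
    """Split categorical columns into those to encode and those to drop.

    `columns` maps a column name to its list of values.
    """
    keep, drop = [], []
    for name, values in columns.items():
        n_unique = len(set(values))
        # Assumed reading of drop_ratio: distinct values as a share of all rows.
        too_sparse = bool(values) and n_unique / len(values) > drop_ratio
        too_many = n_unique > max_categoric_number
        if too_sparse or too_many:
            drop.append(name)
        else:
            keep.append(name)
    return keep, drop


def one_hot(values, categories=None):
    """Encode a column as 0/1 indicator rows, one position per category."""
    if categories is None:
        categories = sorted(set(values))  # learned at fit time
    # Values not present in `categories` encode as an all-zero row.
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


data = {
    "color": ["red", "blue"] * 10,
    "user_id": [str(i) for i in range(20)],
}
keep, drop = select_columns_to_encode(data)
encoded, learned = one_hot(data["color"])
print(keep, drop, learned)
```

Passing the categories learned during fitting back into `one_hot` is what allows later data, whatever container it arrives in, to be encoded with the same column layout.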
- class automl.preprocessing.PrincipalComponentAnalysis(n_components=None, selection_ratio=0.9, cols_keep_ratio=0.8)
Bases: automl.preprocessing.Process
Data decomposition with PCA.
- Parameters
n_components – how many new components to keep
selection_ratio – how much information to keep in order to get fewer columns
- fit(x, y=None)
The label is accepted throughout the processing logic so that, even when it is not used, the method follows the sklearn convention.
- Parameters
x – data
y – label
- Returns
- fit_transform(data, y=None)
Follows the parent class logic, as in sklearn.
- Parameters
data –
- Returns
- transform(data, y=None)
Performs feature decomposition based on the PCA score to reduce the data to fewer features.
- Parameters
data –
- Returns
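One way to read selection_ratio is as a target for cumulative explained variance: keep the fewest leading components whose ratios sum to at least that value. The sketch below illustrates that reading only. `components_for_ratio` is a hypothetical helper, and the input list stands for per-component explained-variance ratios such as those exposed by sklearn's `PCA.explained_variance_ratio_`. The cols_keep_ratio parameter is undocumented here and is left out.

```python
from itertools import accumulate


def components_for_ratio(explained_variance_ratios, selection_ratio=0.9):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `selection_ratio`."""
    running = accumulate(explained_variance_ratios)
    for count, cumulative in enumerate(running, start=1):
        # Small tolerance guards against floating-point rounding in the running sum.
        if cumulative >= selection_ratio - 1e-12:
            return count
    # The target was never reached, so keep every component.
    return len(explained_variance_ratios)


ratios = [0.6, 0.25, 0.1, 0.05]
print(components_for_ratio(ratios, selection_ratio=0.9))
```

With these example ratios, the first two components cover 0.85 of the variance, so a third is needed to reach 0.9.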
- class automl.preprocessing.Process
Bases: sklearn.base.TransformerMixin
Base class for the preprocessing logic; it only gives a direction for the subclasses. The init function records the class name for later processing.
- fit(x, y=None)
The label is accepted throughout the processing logic so that, even when it is not used, the method follows the sklearn convention.
- Parameters
x – data
y – label
- Returns
- fit_transform(data, y=None)
Follows the parent class logic, as in sklearn.
- Parameters
data –
- Returns
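The convention described for Process (a shared fit signature that accepts a label even when unused, and a fit_transform that chains fit and transform) can be sketched without any dependencies. `ProcessSketch` and `AddOne` are hypothetical classes for illustration, and the `name` attribute is an assumed way of recording the class name; none of this is automl's actual code.

```python
class ProcessSketch:
    """Hypothetical base class mirroring the fit / fit_transform convention."""

    def __init__(self):
        # Record the class name for later use (attribute name is an assumption).
        self.name = self.__class__.__name__

    def fit(self, x, y=None):
        # y is accepted so every step shares one signature, even when unused.
        return self

    def transform(self, x, y=None):
        return x

    def fit_transform(self, x, y=None):
        # Chain fit and transform, as sklearn transformers do.
        return self.fit(x, y).transform(x)


class AddOne(ProcessSketch):
    """Toy subclass that overrides only transform."""

    def transform(self, x, y=None):
        return [v + 1 for v in x]


step = AddOne()
print(step.name, step.fit_transform([1, 2]))
```

Because fit returns the instance itself, subclasses only need to override the methods whose behavior differs from the base class.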
- class automl.preprocessing.Standard
Bases: automl.preprocessing.Process
- fit(x, y=None)
The label is accepted throughout the processing logic so that, even when it is not used, the method follows the sklearn convention.
- Parameters
x – data
y – label
- Returns
- fit_transform(data, y=None)
Follows the parent class logic, as in sklearn.
- Parameters
data –
- Returns