When your goal is to get the best possible results from a predictive model, you need to get the most from what you have. This includes getting the best results from the algorithms you are using. It also involves getting the most out of the data for your algorithms to work with.
How do you get the most out of your data for predictive modeling? The features in your data will directly influence the predictive models you use and the results you can achieve.
Feature engineering is manually designing what the input x’s should be — Tomasz Malisiewicz, answer to “What is feature engineering?”
Feature engineering is the art of creating predictor variables. Your model will never be any good if your features (your predictors) are no good. Feature engineering is important because better features result in simpler models that run faster, are easier to understand and maintain, and give better results. The big idea is this: how can we take voluminous raw data and shape it into a reasonable set of variables in an efficient, effective and predictive way?
It is a process in its own right: you brainstorm features, decide which features to create, create them, study their impact on model goodness, then iterate on the useful ones, going back to create or brainstorm more features until you feel you are done.
Feature engineering fits into the traditional idea of transforming data from a raw state to a state suitable for modeling; "transforming data" and "feature engineering" may in fact be synonyms. It is an iterative process that interplays with data selection and model evaluation until the model improves.
Some observations are far too voluminous in their raw state to be modeled directly by predictive modeling algorithms. Feature extraction is the process of automatically reducing the dimensionality of such observations into a much smaller set that can be modeled. If our data is in tabular format, then each column or attribute can represent a feature. To be more specific, a feature is an attribute that is useful or meaningful to our problem.
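As a minimal sketch of feature extraction, the following reduces a hypothetical wide set of raw observations to a handful of derived features via principal component analysis. The data here is synthetic and the component count is an arbitrary choice for illustration.

```python
import numpy as np

# Hypothetical example: 100 observations with 50 raw columns,
# reduced to 5 extracted features via principal component analysis.
rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 50))  # raw, too-wide observations

# Center the data, then project onto the top principal components.
centered = raw - raw.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
n_components = 5
features = centered @ vt[:n_components].T  # shape (100, 5)

print(features.shape)  # a much smaller set suitable for modeling
```

In practice the number of components would be chosen by how much variance they retain, not fixed up front.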
Your features come from somewhere. You can take a standard or pre-existing set of variables, and there is no question that is faster than engineering your own, but thinking about your variables is likely to lead to better models.
A feature can be any summary of the data that captures the relevant information. Creating features involves a lot of thinking about the data and its structure, and deep understanding and knowledge of the domain leads to better feature extraction.
Exploratory analysis of the data, such as plotting and tabulating it to understand its patterns and variation, is a good starting point.
We can have a large number of features in our data set, but not all of them will be useful or equally important in the context of our problem. The usefulness of features can be estimated by assigning scores, and the features can then be ranked by those scores. High-ranking features can be included in the training dataset, while the rest can be ignored. A feature may be important if it is highly correlated with the dependent variable (the thing being predicted).
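The scoring-and-ranking idea can be sketched with correlation as the score. The feature names and data below are invented for illustration; one feature is strongly related to the target, one weakly, one not at all.

```python
import numpy as np

# Hypothetical tabular data: three candidate features and a target.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 3.0 * x1 + 0.5 * x2 + rng.normal(scale=0.1, size=n)  # x3 is irrelevant

# Score each feature by its absolute Pearson correlation with the target,
# then rank features from most to least useful.
features = {"x1": x1, "x2": x2, "x3": x3}
scores = {name: abs(np.corrcoef(col, y)[0, 1]) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # x1 should rank highest
```

A real pipeline would keep only the top-ranked features for training and drop the rest.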
Not all features are created equal. Attributes that are irrelevant to the problem should be removed. Some features will matter more than others to model accuracy, and some will be redundant in the context of other features.
Feature selection addresses these problems by automatically selecting the subset of features that is most useful to the problem. Feature selection algorithms may use a scoring method, such as correlation or another feature importance measure, to rank and choose features. More advanced methods search subsets of features by trial and error, creating and evaluating models automatically in pursuit of the objectively most predictive subgroup of features. Stepwise regression is an example of an algorithm that performs feature selection automatically as part of the model construction process.
Feature importance and selection can inform you about the objective utility of features, but those features have to come from somewhere. You need to manually create them. This requires spending a lot of time with actual sample data (not aggregates) and thinking about the underlying form of the problem, structures in the data and how best to expose them to predictive modeling algorithms.
With tabular data, it often means a mixture of aggregating or combining features to create new features, and decomposing or splitting features to create new features. With textual data, it often means devising document or context specific indicators relevant to the problem. With image data, it can often mean enormous amounts of time prescribing automatic filters to pick out relevant structures.
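For the tabular case, the following sketch shows both moves on invented example rows: decomposing a raw timestamp into several focused features, and combining two raw measurements into one derived feature (BMI here, purely as an illustration).

```python
from datetime import datetime

# Hypothetical raw rows: one timestamp column and two measurement columns.
rows = [
    {"timestamp": "2023-07-14 18:30:00", "height_m": 1.80, "weight_kg": 81.0},
    {"timestamp": "2023-12-02 09:05:00", "height_m": 1.65, "weight_kg": 60.0},
]

engineered = []
for row in rows:
    ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    engineered.append({
        # Decomposition: split one raw timestamp into focused features.
        "month": ts.month,
        "weekday": ts.weekday(),
        "is_weekend": ts.weekday() >= 5,
        "hour": ts.hour,
        # Combination: derive one feature from two raw measurements.
        "bmi": row["weight_kg"] / row["height_m"] ** 2,
    })

print(engineered[0])
```

Which decompositions are worth keeping depends on the problem: an hour-of-day feature matters for traffic prediction but probably not for credit scoring.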
This is the part of feature engineering most often talked about as an art form, the part credited with the greatest importance and singled out as the differentiator in competitive machine learning.
Does feature engineering overfit? If you iterate on your features a lot while using cross-validated goodness the whole time, then the cross-validated goodness will no longer be the true test of your model. The true test of a model is whether it works on entirely unseen data.
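One common guard against this, sketched below on synthetic data, is to set aside a hold-out set before any feature iteration begins, select features using only the development rows, and score on the hold-out rows exactly once at the end.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
y = x ** 2 + rng.normal(scale=0.1, size=n)

# Split once, up front: hold-out rows play no part in feature iteration.
dev, holdout = slice(0, 400), slice(400, None)

def dev_mse(feature):
    """Fit least squares on dev rows only; used while iterating on features."""
    A = np.column_stack([feature[dev], np.ones(400)])
    coef, *_ = np.linalg.lstsq(A, y[dev], rcond=None)
    return float(np.mean((y[dev] - A @ coef) ** 2))

# Iterate on candidate features using dev data only.
candidates = {"raw": x, "squared": x ** 2}
best = min(candidates, key=lambda k: dev_mse(candidates[k]))

# The one-time, true test: score the chosen feature on unseen rows.
feat = candidates[best]
A_dev = np.column_stack([feat[dev], np.ones(400)])
coef, *_ = np.linalg.lstsq(A_dev, y[dev], rcond=None)
A_hold = np.column_stack([feat[holdout], np.ones(100)])
holdout_mse = float(np.mean((y[holdout] - A_hold @ coef) ** 2))
print(best, holdout_mse)
```

If dev performance and hold-out performance diverge badly, the feature iteration has overfit the development data.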