Managing feature groups
Create a feature group to preserve a set of columns of a data asset along with associated metadata for use with Machine Learning models.
Requirements and restrictions
You can view a feature group for assets under the following circumstances.
- Required service
  Watson Studio (for projects)
- Required permissions
  To view this page, you can have any role in a project.
  To edit or update information on this page, you must have the Editor or Admin role in the project.
- Workspaces
  You can view the feature group of an asset in these workspaces:
  - Projects
- Types of assets
  These types of assets can have a feature group:
  - Tabular: CSV, TSV, Parquet, XLS, XLSX, Avro, text, and JSON files
  - Connected data types that are structured and supported in Watson Studio
- Data size
  No limit
Feature groups
A feature group preserves a set of columns of a particular data asset along with the metadata used for machine learning. For example, if you have a set of features for a credit approval model, you can preserve the features that were used to train the model together with metadata such as which column is the prediction target and which columns are used for bias detection. Feature groups make it easy to preserve the metadata for the features used to train a machine learning model, so that other data scientists can reuse the same features. The Feature group tab is available when you preview an asset.
- Creating a feature group
- Editing a feature group
- Removing features or a feature group
- Using the Python API for feature groups
Creating a feature group in a project
Before you begin
If you create a profile for the data asset before you create a feature group, you can select values from the profile when you add value descriptions to a feature.
Create a feature group
You can select particular columns of data assets to form a feature group.
-
In the project Assets tab, click the name of the relevant asset to open the preview and select the Feature group tab. Here you can create a feature group or view and edit an existing one. An asset can have only one feature group. Click New feature group.
-
Select the columns that you want to include in the feature group. To include all columns as features, select the Name checkbox.
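The column selection above can be sketched as a small in-memory model. This is a hypothetical illustration, not the assetframe-lib API: the `Asset` and `FeatureGroup` classes and the `create_feature_group` helper are invented names. The sketch encodes two rules stated on this page: an asset can have only one feature group, and selecting the Name checkbox (modeled here as `columns=None`) includes every column.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model for illustration only; not the assetframe-lib API.


@dataclass
class FeatureGroup:
    asset_name: str
    features: List[str]


@dataclass
class Asset:
    name: str
    columns: List[str]
    feature_group: Optional[FeatureGroup] = None


def create_feature_group(asset: Asset, columns: Optional[List[str]] = None) -> FeatureGroup:
    """Create the asset's feature group from the selected columns.

    columns=None mimics selecting the Name checkbox, which includes all columns.
    """
    if asset.feature_group is not None:
        # An asset can have only one feature group.
        raise ValueError(f"asset {asset.name!r} already has a feature group")
    selected = list(asset.columns) if columns is None else list(columns)
    unknown = [c for c in selected if c not in asset.columns]
    if unknown:
        raise ValueError(f"columns not found in asset: {unknown}")
    if not selected:
        raise ValueError("select at least one column")
    asset.feature_group = FeatureGroup(asset.name, selected)
    return asset.feature_group


credit = Asset("credit_applications.csv", ["customer_id", "age", "income", "approved"])
group = create_feature_group(credit, ["customer_id", "age", "approved"])
print(group.features)  # ['customer_id', 'age', 'approved']
```

Raising an error on a second call mirrors the one-feature-group-per-asset rule; in the product you would edit the existing group instead.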
Editing a feature group
After you select the columns for the feature group, you can view each feature and edit it to specify the role that it has in machine learning models.
-
Click a feature name and click Edit this feature. A window opens displaying the following tabs:
-
Details
Provide the following information about the feature.
Select a Role to assign to the feature:
- Input: The feature can be used as input for training a machine learning model.
- Target: The feature is used as the prediction target when the data is used to train a machine learning model.
- Identifier: The primary key, such as a customer ID, that is used to identify the input data.
Enter a Description, a Recipe (any method or formula used to create values for the feature), and any Tags.
-
Value descriptions
Value descriptions clarify the meaning of specific values. For example, consider a column "credit evaluation" with the values -1, 0, and 1. You can use value descriptions to explain these values; for example, -1 might mean "evaluation rejected". You can enter descriptions for individual values, and for numerical values you can also specify a range. To specify a range of numerical values, enter it in the form [n,m], where n is the start and m is the end of the range, and click Add. For example, to describe all age values between 18 and 24 as "millennials", enter [18,24] as the value and millennials as the description. If you have defined a profile, the profile values are displayed in the value descriptions list, and you can select one or more of them.
-
Fairness information
You can define Monitor or Reference groups of values for monitoring bias. Values that are more at risk of biased outcomes can be placed in the Monitor group; these values are then compared to the values in the Reference group. To specify a range of numerical values, enter it in the form [n,m], where n is the start and m is the end of the range. For example, to monitor all age values between 18 and 35, enter [18,35]. Then select Monitor or Reference and click Add. You can also specify Favorable outcomes. For more information about fairness, see Fairness in AutoAI experiments.
-
-
When you have edited the feature, click Save. You can now see your changes in the Feature Details window. Close this window to return to the feature group.
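The settings in the Edit this feature window can be modeled roughly as follows. This is a hypothetical sketch, not how Watson Studio or assetframe-lib stores the metadata: the `Role`, `Feature`, `parse_value`, and `matches` names are invented, treating [n,m] ranges as inclusive at both ends is an assumption, and the Reference range [36,120] is a made-up example. It shows a role assignment, value descriptions that accept either a single value or an [n,m] range, and Monitor and Reference groups for fairness.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple, Union

# Hypothetical model of the "Edit this feature" settings; not the assetframe-lib API.

Spec = Union[str, Tuple[float, float]]  # a single value, or an inclusive (n, m) range
_RANGE = re.compile(r"^\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]$")


class Role(Enum):
    INPUT = "Input"
    TARGET = "Target"
    IDENTIFIER = "Identifier"


def parse_value(text: str) -> Spec:
    """Return (n, m) for range text such as "[18,24]"; otherwise return the text."""
    match = _RANGE.match(text.strip())
    if match is None:
        return text
    n, m = float(match.group(1)), float(match.group(2))
    if n > m:
        raise ValueError(f"range start {n} is greater than end {m}")
    return (n, m)


def matches(spec: Spec, value: object) -> bool:
    if isinstance(spec, tuple):
        # Assumption: both ends of a range are inclusive.
        return isinstance(value, (int, float)) and spec[0] <= value <= spec[1]
    return str(value) == spec


@dataclass
class Feature:
    name: str
    role: Role = Role.INPUT
    description: str = ""
    value_descriptions: List[Tuple[Spec, str]] = field(default_factory=list)
    monitor: List[Spec] = field(default_factory=list)
    reference: List[Spec] = field(default_factory=list)

    def add_value_description(self, value: str, text: str) -> None:
        self.value_descriptions.append((parse_value(value), text))

    def add_fairness_values(self, value: str, group: str) -> None:
        if group not in ("Monitor", "Reference"):
            raise ValueError("group must be 'Monitor' or 'Reference'")
        target = self.monitor if group == "Monitor" else self.reference
        target.append(parse_value(value))

    def describe(self, value: object) -> Optional[str]:
        for spec, text in self.value_descriptions:
            if matches(spec, value):
                return text
        return None

    def fairness_group(self, value: object) -> Optional[str]:
        if any(matches(spec, value) for spec in self.monitor):
            return "Monitor"
        if any(matches(spec, value) for spec in self.reference):
            return "Reference"
        return None


evaluation = Feature("credit evaluation", role=Role.TARGET)
evaluation.add_value_description("-1", "evaluation rejected")
print(evaluation.describe(-1))  # evaluation rejected

age = Feature("age")
age.add_value_description("[18,24]", "millennials")
age.add_fairness_values("[18,35]", "Monitor")
age.add_fairness_values("[36,120]", "Reference")  # hypothetical Reference range
print(age.describe(20), age.fairness_group(40))  # millennials Reference
```

Storing ranges as parsed tuples rather than raw text lets the same lookup serve both value descriptions and fairness groups.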
Removing features from a group
To remove a feature from a group:
-
Preview the asset in the project and select the Feature group tab.
-
In the Features table that is displayed, select the feature (or features) that you want to remove.
-
In the toolbar that appears, select Remove from group.
The selected features are removed. If you selected all of the features, the feature group itself is removed.
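The removal rule can be expressed as a small function. `remove_from_group` is a hypothetical helper, not part of any documented API; it returns `None` to stand for the case where every feature was selected and the feature group itself is removed.

```python
from typing import List, Optional, Set

# Hypothetical sketch of the "Remove from group" action; not a documented API.


def remove_from_group(features: List[str], selected: Set[str]) -> Optional[List[str]]:
    """Return the remaining features, or None if the whole feature group is removed."""
    unknown = selected - set(features)
    if unknown:
        raise ValueError(f"not in the feature group: {sorted(unknown)}")
    remaining = [name for name in features if name not in selected]
    # Selecting every feature removes the feature group itself.
    return remaining or None


group = ["customer_id", "age", "income", "approved"]
print(remove_from_group(group, {"income"}))  # ['customer_id', 'age', 'approved']
print(remove_from_group(group, set(group)))  # None
```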
Using the Python API to create and use feature groups
You can also use the assetframe-lib Python library in notebooks to create and edit feature groups. The library also allows you to use feature metadata, such as fairness information, when you create machine learning models.
- Creating and using feature store data: See the FeatureGroup-Project sample. Select the relevant version and then the Projects subfolder.
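As a rough illustration of using feature metadata when building a model, the sketch below splits rows into input features and a target by role, and computes disparate impact (the favorable-outcome rate of the Monitor group divided by that of the Reference group, a common fairness metric). The `METADATA` dictionary layout, the helper names, and the toy rows are hypothetical; they do not reflect the objects that assetframe-lib actually returns.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata layout; the objects assetframe-lib returns are different.
METADATA: Dict[str, dict] = {
    "customer_id": {"role": "Identifier"},
    "age": {"role": "Input", "monitor": [(18, 35)], "reference": [(36, 120)]},
    "income": {"role": "Input"},
    "approved": {"role": "Target", "favorable": ["yes"]},
}

Row = Dict[str, object]


def split_rows(rows: List[Row], metadata: Dict[str, dict]) -> Tuple[List[list], list]:
    """Split rows into input features X and target y; Identifier columns are dropped."""
    inputs = [name for name, meta in metadata.items() if meta["role"] == "Input"]
    targets = [name for name, meta in metadata.items() if meta["role"] == "Target"]
    if len(targets) != 1:
        raise ValueError("expected exactly one Target feature")
    X = [[row[name] for name in inputs] for row in rows]
    y = [row[targets[0]] for row in rows]
    return X, y


def disparate_impact(rows: List[Row], metadata: Dict[str, dict], attribute: str) -> float:
    """Favorable-outcome rate of the Monitor group divided by that of the Reference group."""
    target = next(name for name, meta in metadata.items() if meta["role"] == "Target")
    favorable = set(metadata[target]["favorable"])

    def rate(ranges: List[Tuple[float, float]]) -> float:
        group = [row for row in rows if any(lo <= row[attribute] <= hi for lo, hi in ranges)]
        if not group:
            raise ValueError(f"no rows fall in {ranges}")
        return sum(row[target] in favorable for row in group) / len(group)

    reference_rate = rate(metadata[attribute]["reference"])
    if reference_rate == 0:
        raise ValueError("Reference group has no favorable outcomes")
    return rate(metadata[attribute]["monitor"]) / reference_rate


rows = [  # toy rows for illustration only
    {"customer_id": 1, "age": 22, "income": 30000, "approved": "no"},
    {"customer_id": 2, "age": 30, "income": 45000, "approved": "yes"},
    {"customer_id": 3, "age": 45, "income": 52000, "approved": "yes"},
    {"customer_id": 4, "age": 60, "income": 61000, "approved": "yes"},
]
X, y = split_rows(rows, METADATA)
print(X[0], y[0])  # [22, 30000] no
print(disparate_impact(rows, METADATA, "age"))  # 0.5
```

A value well below 1 suggests that the Monitor group receives favorable outcomes less often than the Reference group.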