Feature columns

The most important part of easyrec is feature columns, which basically determines whether you can train models within your customized dataset.

In easyrec, the type of feature columns can be concluded into 2 groups, i.e., by data type or by usage.

Data type-aware feature columns

One hot

One hot feature columns indicate a list of categorical feature columns, and a data sample can belong to one and only one of the categories.

"""
Example Args in Functions:
    one_hot_feature_columns: List[CategoricalColumn] encodes one hot feature fields, such as sex_id.
"""
import tensorflow as tf

categorical_column_with_identity = tf.feature_column.categorical_column_with_identity
categorical_column_with_vocabulary_list = tf.feature_column.categorical_column_with_vocabulary_list
one_hot_feature_columns = [
    categorical_column_with_identity(key='user_id', num_buckets=df['user_id'].max() + 1, default_value=0),
    categorical_column_with_vocabulary_list(
        key='sex_id', vocabulary_list=set(df['sex_id'].values), num_oov_buckets=1),
    categorical_column_with_vocabulary_list(
        key='age_id', vocabulary_list=set(df['age_id'].values), num_oov_buckets=1),
    categorical_column_with_vocabulary_list(
        key='occupation_id', vocabulary_list=set(df['occupation_id'].values), num_oov_buckets=1),
    categorical_column_with_vocabulary_list(
        key='zip_code_id', vocabulary_list=set(df['zip_code_id'].values), num_oov_buckets=1),
    categorical_column_with_identity(key='item_id', num_buckets=df['item_id'].max() + 1, default_value=0),
]

Multi hot

Multi hot feature columns indicate a list of categorical feature columns, and a data sample can belong to one or more than one of the categories.

"""
Example Args in Function:
    multi_hot_feature_columns: List[CategoricalColumn] encodes multi hot feature fields, such as
        historical_item_ids.
"""
import tensorflow as tf

categorical_column_with_vocabulary_list = tf.feature_column.categorical_column_with_vocabulary_list
multi_hot_feature_columns = [
    categorical_column_with_vocabulary_list(
        key='genre_ids', vocabulary_list=get_vocabulary_list_from_ragged_list_series(item_df['genre_ids']),
        num_oov_buckets=1
    )
]

Dense

Dense feature columns indicate a list of numerical feature columns.

"""
Example Args in Function:
    dense_feature_columns: List[NumericalColumn] encodes numerical feature fields, such as age.
"""
import tensorflow as tf

dense_feature_columns = [
    tf.feature_column.numeric_column(key='age')
]

Usage-aware feature columns

These feature columns indicate a list of feature columns that can be directly feed into model.

"""
Example Args:
    user_feature_columns: List[FeatureColumn] to directly feed into tf.keras.layers.DenseFeatures, which
        basically contains user feature fields.
    item_feature_columns: List[FeatureColumn] to directly feed into tf.keras.layers.DenseFeatures, which
        basically contains item feature fields.
    feature columns: List[FeatureColumn] to directly feed into tf.keras.layers.DenseFeatures, which basically
        contains all feature fields.
"""
import tensorflow as tf

categorical_column_with_identity = tf.feature_column.categorical_column_with_identity
categorical_column_with_vocabulary_list = tf.feature_column.categorical_column_with_vocabulary_list
indicator_column = tf.feature_column.indicator_column
user_feature_columns = [
    categorical_column_with_identity(key='user_id', num_buckets=df['user_id'].max() + 1, default_value=0),
    categorical_column_with_vocabulary_list(
        key='sex_id', vocabulary_list=set(df['sex_id'].values), num_oov_buckets=1),
    categorical_column_with_vocabulary_list(
        key='age_id', vocabulary_list=set(df['age_id'].values), num_oov_buckets=1),
    categorical_column_with_vocabulary_list(
        key='occupation_id', vocabulary_list=set(df['occupation_id'].values), num_oov_buckets=1),
    categorical_column_with_vocabulary_list(
        key='zip_code_id', vocabulary_list=set(df['zip_code_id'].values), num_oov_buckets=1),
]