TabText Processing Module APIs
- class smjsindustry.build_tabText(tabular_df: DataFrame, tabular_key: str, tabular_date_column: str, text_df: DataFrame, text_key: str, text_date_column: str, how: str = 'inner', freq: str = 'Q')
Builds a TabText dataframe by joining the columns in the tabular and text dataframes.
Each row of the two dataframes must be uniquely identified by a composite key made of a key column and a date column. The date columns are first normalized to the given frequency. For example, with the default freq='Q', "2021-02-15" and "2021-03" both fall in the first quarter of 2021. The two dataframes are then merged on the key column and the normalized date column.
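The normalize-then-merge behavior described above can be approximated with plain pandas. This is a minimal sketch, not the smjsindustry implementation. The function name tabtext_join_sketch and the temporary "_period" column are hypothetical. Dates are assumed to be parsed with pandas.to_datetime and bucketed with Series.dt.to_period(freq). The composite-key uniqueness requirement is assumed to be enforced through merge(validate="one_to_one").

```python
import pandas as pd


def tabtext_join_sketch(
    tabular_df: pd.DataFrame,
    tabular_key: str,
    tabular_date_column: str,
    text_df: pd.DataFrame,
    text_key: str,
    text_date_column: str,
    how: str = "inner",
    freq: str = "Q",
) -> pd.DataFrame:
    """Hypothetical illustration of a TabText join; not the smjsindustry code."""
    tab = tabular_df.copy()
    txt = text_df.copy()

    # Assumption: normalize each date column to a pandas Period of the given
    # frequency, e.g. "2021-02-15" -> Period('2021Q1') when freq="Q".
    # "_period" is a hypothetical helper column name.
    tab["_period"] = pd.to_datetime(tab[tabular_date_column]).dt.to_period(freq)
    txt["_period"] = pd.to_datetime(txt[text_date_column]).dt.to_period(freq)

    # Merge on (key, normalized date). validate="one_to_one" raises
    # pandas.errors.MergeError if either side repeats a composite key.
    return tab.merge(
        txt,
        left_on=[tabular_key, "_period"],
        right_on=[text_key, "_period"],
        how=how,
        validate="one_to_one",
    )
```

For example, a tabular row dated "2021-02-15" and a text row dated "2021-03" for the same key end up in one joined row with freq='Q', but do not match with freq='M'. With how='left', tabular rows that have no matching text row are kept, with missing values in the text columns.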
- Parameters
tabular_df (pandas.DataFrame) – The tabular dataframe to be joined, requiring a date column.
tabular_key (str) – The tabular dataframe’s key column to be joined on.
tabular_date_column (str) – The tabular dataframe’s date column to be joined on, in the format "yyyy-mm-dd", "yyyy-mm", or "yyyy".
text_df (pandas.DataFrame) – The text dataframe to be joined, requiring a date column.
text_key (str) – The text dataframe’s key column to be joined on.
text_date_column (str) – The text dataframe’s date column to be joined on, in the format "yyyy-mm-dd", "yyyy-mm", or "yyyy".
how (str) – The type of join to perform; one of {'left', 'right', 'outer', 'inner'} (default: 'inner').
freq (str) – The granularity at which the two date columns are matched: year, quarter, month, week, or day. One of {'Y', 'Q', 'M', 'W', 'D'} (default: 'Q').
- Returns
The joined dataframe object.
- Return type
pandas.DataFrame
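The three accepted date formats and the freq codes can be illustrated with pandas periods. This sketch assumes that date normalization behaves like pandas.Period bucketing. The helper normalize_date is hypothetical and not part of smjsindustry. Under that assumption, a partial date such as "yyyy-mm" or "yyyy" is read as the first day of that month or year.

```python
import pandas as pd

# The freq codes accepted by build_tabText, per the parameter list above.
SUPPORTED_FREQS = {"Y", "Q", "M", "W", "D"}


def normalize_date(value: str, freq: str) -> str:
    """Hypothetical helper: map a date string to its period label for freq.

    Assumes normalization works like pandas.Period; this is not the
    smjsindustry implementation.
    """
    if freq not in SUPPORTED_FREQS:
        raise ValueError(f"unsupported freq: {freq!r}")
    # "yyyy-mm" and "yyyy" parse to the first day of the month or year.
    timestamp = pd.Timestamp(value)
    return str(pd.Period(timestamp, freq=freq))
```

For example, normalize_date("2021-05-20", "Q") and normalize_date("2021-05", "Q") both return "2021Q2", so rows with those two dates would match under the default freq. By contrast, normalize_date("2021", "Q") returns "2021Q1".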