TabText Processing Module APIs

class smjsindustry.build_tabText(tabular_df: DataFrame, tabular_key: str, tabular_date_column: str, text_df: DataFrame, text_key: str, text_date_column: str, how: str = 'inner', freq: str = 'Q')

Builds a TabText dataframe by joining the columns in the tabular and text dataframes.

Each row of the two dataframes must be uniquely identified by a composite key consisting of a key column and a date column. The date columns are first normalized to the given frequency (freq); the two dataframes are then merged on the key column and the normalized date column.

Parameters
  • tabular_df (pandas.DataFrame) – The tabular dataframe to be joined, requiring a date column.

  • tabular_key (str) – The tabular dataframe’s key column to be joined on.

  • tabular_date_column (str) – The tabular dataframe’s date column to be joined on, in a format of "yyyy-mm-dd", "yyyy-mm", or "yyyy".

  • text_df (pandas.DataFrame) – The text dataframe to be joined, requiring a date column.

  • text_key (str) – The text dataframe’s key column to be joined on.

  • text_date_column (str) – The text dataframe’s date column to be joined on, in a format of "yyyy-mm-dd", "yyyy-mm", or "yyyy".

  • how (str) – The type of join to be performed; possible values: {'left', 'right', 'outer', 'inner'} (default: 'inner').

  • freq (str) – The frequency to which the date columns are normalized before joining: year, quarter, month, week, or day. Possible values: {'Y', 'Q', 'M', 'W', 'D'} (default: 'Q').
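
To make the effect of freq concrete, here is a minimal sketch of one way date strings could be normalized to a frequency, using pandas periods. The helper normalize_date is hypothetical and is not part of smjsindustry; the library's internal normalization may differ.

```python
import pandas as pd


def normalize_date(value: str, freq: str = "Q") -> str:
    """Map a date string to a label for the period that contains it.

    Hypothetical helper for illustration only; this is an assumption about
    what "normalizing by frequency" means, not the library's code.
    """
    return str(pd.Period(value, freq=freq))


# Two dates in the same quarter get the same label, so rows carrying
# them would match when freq="Q".
same_quarter = normalize_date("2020-02-15", "Q") == normalize_date("2020-03", "Q")

# With a monthly frequency the same two dates no longer match.
same_month = normalize_date("2020-02-15", "M") == normalize_date("2020-03", "M")
```

Under this reading, a coarser freq makes more rows match, while a finer freq requires the dates in the two dataframes to be closer together.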

Returns

The joined dataframe object.

Return type

pandas.DataFrame
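
Example

The join described above can be approximated with plain pandas. The sketch below is an illustration under stated assumptions, not the smjsindustry implementation: the function join_on_key_and_period, the helper column "_period", and the sample column names (ticker, report_date, filing_date, revenue, text) are all hypothetical.

```python
import pandas as pd


def join_on_key_and_period(
    tabular_df: pd.DataFrame,
    tabular_key: str,
    tabular_date_column: str,
    text_df: pd.DataFrame,
    text_key: str,
    text_date_column: str,
    how: str = "inner",
    freq: str = "Q",
) -> pd.DataFrame:
    """Hypothetical stand-in that mirrors the build_tabText parameters."""
    left = tabular_df.copy()
    right = text_df.copy()
    # Normalize both date columns to period labels (e.g. "2020Q1").
    left["_period"] = (
        pd.to_datetime(left[tabular_date_column]).dt.to_period(freq).astype(str)
    )
    right["_period"] = (
        pd.to_datetime(right[text_date_column]).dt.to_period(freq).astype(str)
    )
    # Merge on the composite key: key column plus normalized date.
    return left.merge(
        right,
        left_on=[tabular_key, "_period"],
        right_on=[text_key, "_period"],
        how=how,
    )


tabular = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB"],
        "report_date": ["2020-03-31", "2020-03-31"],
        "revenue": [10.0, 20.0],
    }
)
text = pd.DataFrame(
    {
        "ticker": ["AAA", "CCC"],
        "filing_date": ["2020-02-15", "2020-02-15"],
        "text": ["AAA quarterly filing", "CCC quarterly filing"],
    }
)

# Inner join keeps only keys present in both dataframes for the same quarter.
inner = join_on_key_and_period(
    tabular, "ticker", "report_date", text, "ticker", "filing_date"
)
# Outer join keeps every row from both sides.
outer = join_on_key_and_period(
    tabular, "ticker", "report_date", text, "ticker", "filing_date", how="outer"
)
```

Here the two "AAA" rows fall in the same quarter even though their dates differ, so they are matched by the inner join; "BBB" and "CCC" appear only in the outer join.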