Financial DataLoader and Parser Module APIs
- class smjsindustry.finance.DataLoader(role: str, instance_count: int, instance_type: str, volume_size_in_gb: int = 30, volume_kms_key: Optional[str] = None, output_kms_key: Optional[str] = None, max_runtime_in_seconds: Optional[int] = None, sagemaker_session: Optional[Session] = None, tags: Optional[List[Dict[str, str]]] = None, network_config: Optional[NetworkConfig] = None)
Bases: FinanceProcessor
Initializes a DataLoader instance to load a dataset.
For the general processing job configuration parameters of this class, see the parameters in the FinanceProcessor class.
The following load method, used with EDGARDataSetConfig, downloads SEC XML filings from the SEC EDGAR database and parses the downloaded XML filings to plain text files.
- load(dataset_config: EDGARDataSetConfig, s3_output_path: str, output_file_name: str, wait: bool = True, logs: bool = True)
Runs a processing job to load a dataset from the SEC EDGAR database.
- Parameters
dataset_config (EDGARDataSetConfig) – The config for the DataLoader.
s3_output_path (str) – An S3 prefix in the format of 's3://<output bucket name>/output/path'.
output_file_name (str) – The output file name. The full path is 's3://<output bucket name>/output/path/output_file_name'.
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job (default: True).
- Raises
ValueError – if logs is True but wait is False.
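The load call documented above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the helper names (full_output_path, check_wait_logs, load_edgar_filings), the instance count, the instance type, and the placeholder email address are all assumptions made for the example. Only the class names, the import path, and the parameter names come from the signatures above. The SDK import is deferred so the two small helpers can be used without the package installed.

```python
def full_output_path(s3_output_path: str, output_file_name: str) -> str:
    """Join the S3 prefix and file name into the documented full path."""
    if not s3_output_path.startswith("s3://"):
        raise ValueError("s3_output_path must start with 's3://'")
    return s3_output_path.rstrip("/") + "/" + output_file_name


def check_wait_logs(wait: bool, logs: bool) -> None:
    """Mirror the documented rule: logs=True is only valid with wait=True."""
    if logs and not wait:
        raise ValueError("logs=True requires wait=True")


def load_edgar_filings(role: str, s3_output_path: str, output_file_name: str,
                       wait: bool = True, logs: bool = True) -> str:
    """Hypothetical wrapper: run a DataLoader job, return the output location."""
    check_wait_logs(wait, logs)
    # Deferred import: needs the smjsindustry package and AWS credentials.
    from smjsindustry.finance import DataLoader, EDGARDataSetConfig

    config = EDGARDataSetConfig(
        tickers_or_ciks=["amzn"],
        form_types=["10-K"],
        filing_date_start="2021-01-01",
        filing_date_end="2021-12-31",
        email_as_user_agent="user@example.com",  # placeholder address
    )
    loader = DataLoader(
        role=role,
        instance_count=1,               # assumed value for illustration
        instance_type="ml.c5.2xlarge",  # assumed value for illustration
    )
    loader.load(config, s3_output_path, output_file_name, wait=wait, logs=logs)
    return full_output_path(s3_output_path, output_file_name)
```

Calling load_edgar_filings with an IAM role ARN, an S3 prefix, and a file name would start the processing job and return the location where the output is expected to be written.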
- class smjsindustry.finance.EDGARDataSetConfig(tickers_or_ciks: Optional[List[str]] = None, form_types: Optional[List[str]] = None, filing_date_start: Optional[str] = None, filing_date_end: Optional[str] = None, email_as_user_agent: Optional[str] = None)
Bases: FinanceProcessorConfig
Config class for loading SEC filings from SEC EDGAR.
It specifies the details of SEC filings required by the DataLoader.
- Parameters
tickers_or_ciks (List[str]) – A list of stock tickers or CIKs. For example, ['amzn'].
form_types (List[str]) – A list of SEC form types. The supported form types are 10-K, 10-Q, 8-K, 497, 497K, S-3ASR, N-1A, 485BXT, 485BPOS, 485APOS, S-3, S-3/A, DEF 14A, SC 13D, and SC 13D/A. For example, ['10-K'].
filing_date_start (str) – The starting filing date in the format of 'YYYY-MM-DD'. For example, '2021-01-01'.
filing_date_end (str) – The ending filing date in the format of 'YYYY-MM-DD'. For example, '2021-12-31'.
email_as_user_agent (str) – The user email used as a user_agent for SEC EDGAR HTTP requests. For example, "gecko_demo_user@amazon.com".
- get_config()
Returns config to be passed to a SageMaker JumpStart Industry DataLoader instance.
- property tickers_or_ciks
Gets the string of the tickers_or_ciks parameter.
- property form_types
Gets the string of the form_types parameter.
- property filing_date_start
Gets the string of the filing_date_start parameter.
- property filing_date_end
Gets the string of the filing_date_end parameter.
- property email_as_user_agent
Gets the string of the email_as_user_agent parameter.
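Because a processing job takes time to start, it can be useful to check the EDGARDataSetConfig arguments locally first. The sketch below is one possible pre-check, not part of the library: check_edgar_args and SUPPORTED_FORM_TYPES are hypothetical names, and the rule that the start date must not be after the end date is an assumption. Whether EDGARDataSetConfig performs similar validation itself is not stated in this reference. The form types and the 'YYYY-MM-DD' format are taken from the parameter descriptions above.

```python
import re
from datetime import date
from typing import Dict, List

# Form types listed as supported in the EDGARDataSetConfig parameter docs.
SUPPORTED_FORM_TYPES = frozenset({
    "10-K", "10-Q", "8-K", "497", "497K", "S-3ASR", "N-1A", "485BXT",
    "485BPOS", "485APOS", "S-3", "S-3/A", "DEF 14A", "SC 13D", "SC 13D/A",
})

_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def _parse_filing_date(value: str) -> date:
    """Accept only strict 'YYYY-MM-DD' strings that name a real date."""
    if not _DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected 'YYYY-MM-DD', got {value!r}")
    return date.fromisoformat(value)  # rejects dates such as 2021-02-30


def check_edgar_args(tickers_or_ciks: List[str], form_types: List[str],
                     filing_date_start: str, filing_date_end: str,
                     email_as_user_agent: str) -> Dict[str, object]:
    """Hypothetical pre-check; returns keyword arguments for the config."""
    unsupported = sorted(set(form_types) - SUPPORTED_FORM_TYPES)
    if unsupported:
        raise ValueError(f"unsupported form types: {unsupported}")
    start = _parse_filing_date(filing_date_start)
    end = _parse_filing_date(filing_date_end)
    if start > end:  # assumed rule, not stated in the reference
        raise ValueError("filing_date_start is after filing_date_end")
    return {
        "tickers_or_ciks": list(tickers_or_ciks),
        "form_types": list(form_types),
        "filing_date_start": filing_date_start,
        "filing_date_end": filing_date_end,
        "email_as_user_agent": email_as_user_agent,
    }
```

The returned dictionary uses the documented constructor parameter names, so it could be passed on as EDGARDataSetConfig(**kwargs) once the package is available.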
- class smjsindustry.finance.SECXMLFilingParser(role: str, instance_count: int, instance_type: str, volume_size_in_gb: int = 30, volume_kms_key: Optional[str] = None, output_kms_key: Optional[str] = None, max_runtime_in_seconds: Optional[int] = None, sagemaker_session: Optional[Session] = None, tags: Optional[List[Dict[str, str]]] = None, network_config: Optional[NetworkConfig] = None)
Bases: FinanceProcessor
Initializes a SECXMLFilingParser instance that parses SEC XML filings.
For the general processing job configuration parameters of this class, see the parameters in the FinanceProcessor class.
The following parse method parses user-downloaded SEC XML filings to plain text files.
- parse(input_data_path: str, s3_output_path: str, wait: bool = True, logs: bool = True)
Runs a processing job to parse SEC XML filings.
- Parameters
input_data_path (str) – The input path pointing to the directory that contains the SEC XML filings to be parsed. It can be a local folder or an S3 path.
s3_output_path (str) – An S3 prefix in the format of 's3://<output bucket name>/output/path'.
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job (default: True).
- Raises
ValueError – if logs is True but wait is False.
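Since input_data_path may be either a local folder or an S3 path, a caller might want to tell the two apart before starting a job. The following is a rough sketch under stated assumptions: input_kind and parse_filings are hypothetical helper names, and the instance count and instance type are illustrative values. Only SECXMLFilingParser, its import path, and the parse parameters come from the reference above.

```python
import os


def input_kind(input_data_path: str) -> str:
    """Classify the input location as 's3' or 'local'."""
    return "s3" if input_data_path.startswith("s3://") else "local"


def parse_filings(role: str, input_data_path: str, s3_output_path: str,
                  wait: bool = True, logs: bool = True) -> None:
    """Hypothetical wrapper around SECXMLFilingParser.parse."""
    if logs and not wait:
        # Same condition the reference documents as raising ValueError.
        raise ValueError("logs=True requires wait=True")
    if not s3_output_path.startswith("s3://"):
        raise ValueError("s3_output_path must start with 's3://'")
    if input_kind(input_data_path) == "local" and not os.path.isdir(input_data_path):
        raise FileNotFoundError(f"no such local folder: {input_data_path}")
    # Deferred import: needs the smjsindustry package and AWS credentials.
    from smjsindustry.finance import SECXMLFilingParser

    parser = SECXMLFilingParser(
        role=role,
        instance_count=1,               # assumed value for illustration
        instance_type="ml.c5.2xlarge",  # assumed value for illustration
    )
    parser.parse(input_data_path, s3_output_path, wait=wait, logs=logs)
```

The local checks run before the import, so obviously invalid arguments fail fast without contacting AWS.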