Financial DataLoader and Parser Module APIs
- class smjsindustry.finance.DataLoader(role: str, instance_count: int, instance_type: str, volume_size_in_gb: int = 30, volume_kms_key: Optional[str] = None, output_kms_key: Optional[str] = None, max_runtime_in_seconds: Optional[int] = None, sagemaker_session: Optional[Session] = None, tags: Optional[List[Dict[str, str]]] = None, network_config: Optional[NetworkConfig] = None)
Bases: FinanceProcessor
Initializes a DataLoader instance to load a dataset.
For the general processing job configuration parameters of this class, see the parameters in the FinanceProcessor class. The following load method, used with EDGARDataSetConfig, downloads SEC XML filings from the SEC EDGAR database and parses the downloaded XML filings to plain text files.
- load(dataset_config: EDGARDataSetConfig, s3_output_path: str, output_file_name: str, wait: bool = True, logs: bool = True)
Runs a processing job to load a dataset from the SEC EDGAR database.
- Parameters
dataset_config (EDGARDataSetConfig) – The config for the DataLoader.
s3_output_path (str) – An S3 prefix in the format of 's3://<output bucket name>/output/path'.
output_file_name (str) – The output file name. The full path is 's3://<output bucket name>/output/path/output_file_name'.
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job (default: True).
- Raises
ValueError – if logs is True but wait is False.
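The output-path pattern and the `wait`/`logs` rule above can be mirrored in plain Python to predict where `load()` will write its file before a job is launched. This is a minimal sketch and is not part of smjsindustry. The helper name `resolve_load_output` is hypothetical. It assumes that the file name is appended to the prefix with a single `/`, as the documented full-path pattern suggests.

```python
def resolve_load_output(s3_output_path: str, output_file_name: str,
                        wait: bool = True, logs: bool = True) -> str:
    """Hypothetical helper mirroring the documented load() contract.

    Returns the full S3 URI of the output file and raises ValueError for
    the argument combination that load() is documented to reject.
    """
    # Documented rule: logs=True is only valid when wait=True.
    if logs and not wait:
        raise ValueError("logs can only be True when wait is True")
    # Assumption: the prefix must be an s3:// URI.
    if not s3_output_path.startswith("s3://"):
        raise ValueError(f"expected an s3:// prefix, got {s3_output_path!r}")
    # Join prefix and file name with exactly one '/' between them.
    return s3_output_path.rstrip("/") + "/" + output_file_name
```

The helper does not call the library. It only applies the documented path pattern, so the printed location should be checked against what the job actually produces.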
- class smjsindustry.finance.EDGARDataSetConfig(tickers_or_ciks: Optional[List[str]] = None, form_types: Optional[List[str]] = None, filing_date_start: Optional[str] = None, filing_date_end: Optional[str] = None, email_as_user_agent: Optional[str] = None)
Bases: FinanceProcessorConfig
Config class for loading SEC filings from SEC EDGAR.
It specifies the details of SEC filings required by the DataLoader.
- Parameters
tickers_or_ciks (List[str]) – A list of stock tickers or CIKs. For example, ['amzn'].
form_types (List[str]) – A list of SEC form types. The supported form types are 10-K, 10-Q, 8-K, 497, 497K, S-3ASR, N-1A, 485BXT, 485BPOS, 485APOS, S-3, S-3/A, DEF 14A, SC 13D, and SC 13D/A. For example, ['10-K'].
filing_date_start (str) – The starting filing date in the format of 'YYYY-MM-DD'. For example, '2021-01-01'.
filing_date_end (str) – The ending filing date in the format of 'YYYY-MM-DD'. For example, '2021-12-31'.
email_as_user_agent (str) – The user email used as a user_agent for SEC EDGAR HTTP requests. For example, "gecko_demo_user@amazon.com".
- get_config()
Returns the config to be passed to a SageMaker JumpStart Industry DataLoader instance.
- property tickers_or_ciks
Gets the string of the tickers_or_ciks parameter.
- property form_types
Gets the string of the form_types parameter.
- property filing_date_start
Gets the string of the filing_date_start parameter.
- property filing_date_end
Gets the string of the filing_date_end parameter.
- property email_as_user_agent
Gets the string of the email_as_user_agent parameter.
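The form types supported by `form_types` and the 'YYYY-MM-DD' date format are listed explicitly, so a client-side pre-check can catch typos before a processing job starts. This is a minimal sketch built on two assumptions. First, `check_edgar_request` and `_parse_filing_date` are hypothetical helpers and are not part of smjsindustry. Second, this reference does not say whether `EDGARDataSetConfig` performs any of these checks itself.

```python
import re
from datetime import date
from typing import List, Optional

# Supported form types, copied from the form_types parameter description.
SUPPORTED_FORM_TYPES = {
    "10-K", "10-Q", "8-K", "497", "497K", "S-3ASR", "N-1A", "485BXT",
    "485BPOS", "485APOS", "S-3", "S-3/A", "DEF 14A", "SC 13D", "SC 13D/A",
}

_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def _parse_filing_date(value: Optional[str], name: str) -> Optional[date]:
    """Parse an optional 'YYYY-MM-DD' string; raise ValueError if malformed."""
    if value is None:
        return None
    if not _DATE_PATTERN.fullmatch(value):
        raise ValueError(f"{name} must use the 'YYYY-MM-DD' format, got {value!r}")
    # fromisoformat also rejects impossible dates such as '2021-02-30'.
    return date.fromisoformat(value)


def check_edgar_request(form_types: List[str],
                        filing_date_start: Optional[str] = None,
                        filing_date_end: Optional[str] = None) -> None:
    """Hypothetical pre-check; raises ValueError on the first problem found."""
    unsupported = [f for f in form_types if f not in SUPPORTED_FORM_TYPES]
    if unsupported:
        raise ValueError(f"unsupported form types: {unsupported}")
    start = _parse_filing_date(filing_date_start, "filing_date_start")
    end = _parse_filing_date(filing_date_end, "filing_date_end")
    # Assumption: a start date later than the end date is treated as an error.
    if start is not None and end is not None and start > end:
        raise ValueError("filing_date_start is later than filing_date_end")
```

The check uses exact string matching, so form types must be spelled exactly as in the supported list (for example, 'DEF 14A' with its space).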
- class smjsindustry.finance.SECXMLFilingParser(role: str, instance_count: int, instance_type: str, volume_size_in_gb: int = 30, volume_kms_key: Optional[str] = None, output_kms_key: Optional[str] = None, max_runtime_in_seconds: Optional[int] = None, sagemaker_session: Optional[Session] = None, tags: Optional[List[Dict[str, str]]] = None, network_config: Optional[NetworkConfig] = None)
Bases: FinanceProcessor
Initializes a SECXMLFilingParser instance that parses SEC XML filings.
For the general processing job configuration parameters of this class, see the parameters in the FinanceProcessor class. The following parse method parses user-downloaded SEC XML filings to plain text files.
- parse(input_data_path: str, s3_output_path: str, wait: bool = True, logs: bool = True)
Runs a processing job to parse SEC XML filings.
- Parameters
input_data_path (str) – The input file path pointing to the directory containing the SEC XML filings to be parsed. It can be a local folder or an S3 path.
s3_output_path (str) – An S3 prefix in the format of 's3://<output bucket name>/output/path'.
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job (default: True).
- Raises
ValueError – if logs is True but wait is False.
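Since `input_data_path` may be either a local folder or an S3 path, it can help to classify the value before calling `parse()`. The sketch below shows one way to do that. `classify_input_path` is a hypothetical helper. Two rules are assumptions rather than documented library behavior: any `s3://` URI is treated as an S3 path, and everything else is treated as a local folder that must already exist.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_input_path(input_data_path: str) -> str:
    """Hypothetical helper: return 's3' for S3 URIs, 'local' for directories."""
    parsed = urlparse(input_data_path)
    if parsed.scheme == "s3":
        # An S3 URI needs a bucket name in its network-location part.
        if not parsed.netloc:
            raise ValueError(f"missing bucket name in {input_data_path!r}")
        return "s3"
    # Anything else is treated as a local folder, which must already exist.
    if not Path(input_data_path).is_dir():
        raise ValueError(f"not an existing directory: {input_data_path!r}")
    return "local"
```

Failing fast on a missing local folder avoids starting a processing job that has nothing to parse. The helper does not check whether the S3 prefix exists or contains XML files.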