Finance
This tutorial section previews the SageMaker JumpStart example notebooks that demonstrate the SageMaker JumpStart Industry Python SDK. The notebooks show how to run processing jobs that load finance documents, parse text, compute scores based on NLP score types and their corresponding word lists, and create a multimodal (TabText) dataset. Using the processed and enhanced multimodal dataset, you’ll learn how to fine-tune pretrained BERT language models and deploy them to make predictions.
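To make the idea of word-list scoring and a TabText dataset concrete, here is a minimal sketch in plain Python. It does not use the SageMaker JumpStart Industry Python SDK. The word list, the function names `word_list_score` and `build_tabtext`, and the sample rows are all hypothetical, and the score shown (the share of tokens that appear in a word list) is only one simple way such a score could be defined.

```python
import re

# Hypothetical word list for illustration only; the notebooks rely on
# curated word lists tied to each NLP score type.
NEGATIVE_WORDS = {"loss", "decline", "risk", "litigation"}


def word_list_score(text, word_list):
    """Return the fraction of tokens in `text` that appear in `word_list`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return hits / len(tokens)


def build_tabtext(tabular_rows, texts_by_ticker, word_list):
    """Attach a text column and a score column to each tabular row.

    Rows are joined to text on the (assumed) "ticker" key; a row without
    matching text gets an empty string and a score of 0.0.
    """
    merged = []
    for row in tabular_rows:
        text = texts_by_ticker.get(row["ticker"], "")
        merged.append(
            {**row, "text": text, "negative_score": word_list_score(text, word_list)}
        )
    return merged


# Made-up sample data: tabular features plus one text snippet per ticker.
tabular = [
    {"ticker": "AAA", "return": 0.02},
    {"ticker": "BBB", "return": -0.01},
]
texts = {
    "AAA": "Revenue grew despite litigation risk.",
    "BBB": "Net loss widened.",
}

tabtext = build_tabtext(tabular, texts, NEGATIVE_WORDS)
for row in tabtext:
    print(row["ticker"], round(row["negative_score"], 3))
```

Each resulting row carries numeric columns, a raw text column, and a score derived from the text, which is the general shape of the multimodal dataset the notebooks build at larger scale.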
Note
The SageMaker JumpStart Industry example notebooks are hosted and runnable only through SageMaker Studio. Log in to the SageMaker console, and launch SageMaker Studio. For instructions on how to access the notebooks, see SageMaker JumpStart and SageMaker JumpStart Industry in the Amazon SageMaker Developer Guide.
Important
The example notebooks are for demonstrative purposes only. The notebooks are not financial advice and should not be relied on as financial or investment advice.
- Simple Construction of a Multimodal Dataset from SEC Filings and NLP Scores
- Machine Learning on a TabText Dataframe
  - An Example Based on the Paycheck Protection Program
  - Objective
  - SageMaker Studio Kernel Setup
  - Load Data, SDK, and Dependencies
  - Step 1: Read in the Tickers
  - Step 2: Read in the SEC Forms Filed by These Companies
  - Step 3: Collect Stock Prices and Convert to Returns
  - Step 4: Merge Text and Tabular Datasets
  - Step 5: Machine Learning Analysis
  - Step 6: Machine Learning on the TabText Dataframe
  - Clean Up
  - Further Supports
  - Reference
  - Licence
- Classify SEC 10K/Q Filings to Industry Codes Based on the MDNA Text Column
  - Introduction
  - SageMaker Studio Kernel Setup
  - Load Data, SDK, and Dependencies
  - Step 1: Prepare a Dataset
  - Step 2: Add NLP Scores to the MD&A Text Features
  - Step 3: Train the AutoGluon Model for Classification on the TabText Data Consisting of the MD&A Texts, Industry Codes, and the NLP Scores
  - Summary
  - Clean Up
  - Further Supports
  - Reference
  - Licence
- Dashboarding SEC Text for Financial NLP
  - Financial NLP
  - SageMaker Studio Kernel Setup
  - Load SDK and Helper Scripts
  - Load the functions for extracting the “Item” sections from the forms
  - Download the filings you wish to work with
  - Copy the file into Studio from the S3 bucket
  - Create the dataframe for the extracted item sections from the 10-K filings
  - Similarly, we can create the dataframe for the extracted item sections from the 10-Q filings
  - Create the dataframe for the extracted item sections from the 8-K filings
  - Summary table of section counts
  - NLP scoring of the 10-K forms for specific sections
  - Stock Screener based on NLP scores
  - Add a column with summaries of the text being scored
  - Create an interactive dashboard
  - Visualizing the text through the NLP scores
  - Further support
  - References
  - Licences