Welcome to the SageMaker JumpStart Industry Documentation!
The SageMaker JumpStart Industry Python SDK is a client library for Amazon SageMaker JumpStart. The library provides tools for feature engineering, training, and deploying industry-focused machine learning models on SageMaker JumpStart. With this industry-focused SDK, you can curate text datasets and train and deploy language models.
This is the documentation for the sagemaker-jumpstart-industry-pack library.
Installing the SageMaker JumpStart Industry Python SDK
The SageMaker JumpStart Industry Python SDK is released to PyPI and can be installed with pip as follows:
pip install smjsindustry
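After installing, one way to confirm that pip registered the package is to look up its installed version. The following is a minimal sketch, assuming the distribution name matches the smjsindustry name used in the pip command above. The helper name installed_version is hypothetical, and the fallback import assumes the importlib_metadata backport is present on Python versions older than 3.8.

```python
try:
    # Available in the standard library from Python 3.8 onward.
    from importlib import metadata
except ImportError:
    # Assumption: the importlib_metadata backport is installed on 3.6/3.7.
    import importlib_metadata as metadata


def installed_version(dist_name="smjsindustry"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("smjsindustry does not appear to be installed in this environment.")
else:
    print("smjsindustry version:", version)
```

Returning None rather than raising keeps the check usable in setup scripts that want to decide whether to run the pip command.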
You can also install from source by cloning this repository and running a pip install command in the root directory of the repository:
git clone https://github.com/aws/sagemaker-jumpstart-industry-python-sdk.git
cd sagemaker-jumpstart-industry-python-sdk
pip install .
Supported Operating Systems
The SageMaker JumpStart Industry Python SDK supports Unix/Linux and macOS.
Supported Python Versions
The SageMaker JumpStart Industry Python SDK is tested on Python 3.6, Python 3.7, and Python 3.8.
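A quick pre-install check against the supported operating systems and the Python versions named in this document (3.6, 3.7, and 3.8) could look like the sketch below. The function name check_environment is hypothetical. Mapping "Unix/Linux and Mac" to the platform.system() values "Linux" and "Darwin" is an assumption, so other Unix variants would be flagged by this sketch even though they may work.

```python
import platform
import sys

# Assumption: "Unix/Linux and Mac" is approximated by these platform.system() values.
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}
# Python versions this document says the SDK is tested on.
TESTED_PYTHONS = {(3, 6), (3, 7), (3, 8)}


def check_environment(system=None, version=None):
    """Return a list of human-readable problems; an empty list means no issues found."""
    system = system or platform.system()
    version = tuple(version) if version is not None else tuple(sys.version_info[:2])
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append("Operating system %r is not in the supported set." % system)
    if version not in TESTED_PYTHONS:
        problems.append("Python %d.%d is not one of the tested versions." % version)
    return problems


for problem in check_environment():
    print("Warning:", problem)
```

The optional arguments exist only so the check can be exercised without changing the interpreter; called with no arguments, it inspects the running environment.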
The SageMaker JumpStart Industry Python SDK runs on Amazon SageMaker. As a managed service, Amazon SageMaker performs operations on your behalf on the AWS hardware that is managed by Amazon SageMaker. Amazon SageMaker can perform only operations that the user permits. You can read more about which permissions are necessary in the Amazon SageMaker Documentation.
The SageMaker JumpStart Industry Python SDK should not require any additional permissions aside from what is required for using SageMaker.
However, if you are using an IAM role with a path in it, you should grant permission for
The SageMaker JumpStart Industry Python SDK is licensed under the Apache 2.0 License. It is copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at Apache License.
The SageMaker JumpStart Industry solutions, notebooks, demos, and examples are for demonstrative purposes only. They are not financial advice and should not be relied on as financial or investment advice.
The SageMaker JumpStart Industry solutions, notebooks, demos, and examples use data obtained from the SEC EDGAR database. You are responsible for complying with EDGAR’s access terms and conditions located in the Accessing EDGAR Data page.
The SageMaker JumpStart Industry SDK has unit tests and integration tests.
You can install the libraries needed to run the tests by running:
pip install --upgrade .[test]
or, for Zsh users:
pip install --upgrade .\[test\]
We use tox to run unit tests. Tox is an automated test tool that helps you run unit tests on multiple Python versions and also checks that the code style meets our standards. We run tox with all of our supported Python versions (Python 3.6, Python 3.7, and Python 3.8). To run the unit tests with the same configuration as we do, you need interpreters for those Python versions installed.
To run the unit tests with tox, run:
To run the integration tests, you need to first prepare an AWS account with certain configurations:
AWS account credentials are available in the environment for the boto3 client to use.
The AWS account has an IAM role named SageMakerRole. It should have the AmazonSageMakerFullAccess policy attached as well as a policy with the necessary permissions to use Elastic Inference.
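The account prerequisites above can be checked before starting a long test run. The following is a minimal sketch, not part of the SDK: the function names role_has_policy and check_account are hypothetical, and it assumes boto3 is installed. It verifies only that credentials resolve and that AmazonSageMakerFullAccess is attached to SageMakerRole; it does not verify the Elastic Inference permissions.

```python
REQUIRED_ROLE = "SageMakerRole"
REQUIRED_POLICY = "AmazonSageMakerFullAccess"


def role_has_policy(iam_client, role_name=REQUIRED_ROLE, policy_name=REQUIRED_POLICY):
    """Return True if the named managed policy is attached to the role.

    Only the first page of results is inspected, which is assumed to be
    enough for a role with a handful of attached policies.
    """
    response = iam_client.list_attached_role_policies(RoleName=role_name)
    attached = response.get("AttachedPolicies", [])
    return any(policy.get("PolicyName") == policy_name for policy in attached)


def check_account():
    """Check credentials and the role against the live account (makes AWS calls)."""
    import boto3  # imported lazily so role_has_policy has no hard dependency

    session = boto3.Session()
    if session.get_credentials() is None:
        raise RuntimeError("No AWS credentials were found for boto3 to use.")
    iam = session.client("iam")
    # If the role does not exist, the IAM call raises a botocore ClientError.
    if not role_has_policy(iam):
        raise RuntimeError(
            "%s is not attached to the %s role." % (REQUIRED_POLICY, REQUIRED_ROLE)
        )
    print("Credentials and role look ready for the integration tests.")
```

Calling check_account() runs the check against whichever account the current credentials resolve to. Passing the IAM client into role_has_policy keeps that part testable with a stub object.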
We recommend running only the integration tests you need. You can filter by individual test function names with:
tox -- -k 'test_function_i_care_about'
You can also run all of the integration tests with the following command. The tests run in sequence, so this may take a while:
tox -- tests/integ
Building Sphinx Docs Locally
Install the dev version of the library:
pip install -e .\[all\]
Install Sphinx and the dependencies listed in sagemaker-jumpstart-industry-python-sdk/docs/requirements.txt:
pip install sphinx
pip install -r sagemaker-jumpstart-industry-python-sdk/docs/requirements.txt
cd into the sagemaker-jumpstart-industry-python-sdk/docs directory and run:
make html && open build/html/index.html
- Financial DataLoader and Parser Module APIs
- Text Summarizer Module APIs
- NLP Scorer Module APIs
- TabText Processing Module APIs
- Utils Module APIs