strategy-lab/to_explore/pyquantnews/112_LlamaIndexFinancialStatement.ipynb
David Brazda e3da60c647 daily update
2024-10-21 20:57:56 +02:00

In [1]:
from llama_index.llms.openai import OpenAI
In [2]:
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
)
In [3]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from dotenv import load_dotenv
In [4]:
load_dotenv()
Out[4]:
True

Configure the language model and load the document

First, we configure the language model with specific parameters and load the document.

In [6]:
# The OpenAI LLM class takes the model identifier via `model`, not `model_name`
llm = OpenAI(temperature=0, model="gpt-4o")
In [7]:
doc = SimpleDirectoryReader(input_files=["nvda.pdf"]).load_data()
print(f"Loaded NVDA 10-K with {len(doc)} pages")
Loaded NVDA 10-K with 80 pages

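We set the language model to gpt-4o with a temperature of 0, which makes responses as repeatable as the API allows but does not guarantee identical output on every run. No token limit is set explicitly, so response length is bounded by the model's own limits rather than being unlimited. Note that `llm` is not passed to the index or query engines that follow. We then load the NVDA 10-K document from a PDF file and print the number of pages loaded; the reader returns one document object per page.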
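A minimal sketch of registering the configured model as the LlamaIndex-wide default, so that query engines created later use it. It assumes a llama-index 0.10+ install (where `llama_index.core` exposes `Settings`) and an `OPENAI_API_KEY` in the environment; the helper name `configure_default_llm` is hypothetical and not part of the notebook.

```python
def configure_default_llm(model: str = "gpt-4o", temperature: float = 0.0):
    """Register an OpenAI LLM as the default LlamaIndex LLM (sketch)."""
    # Imports are local so the helper can be defined even where
    # llama-index is not installed; they run only when it is called.
    from llama_index.core import Settings
    from llama_index.llms.openai import OpenAI

    llm = OpenAI(model=model, temperature=temperature)
    Settings.llm = llm  # engines built after this point use it by default
    return llm
```

Calling `configure_default_llm()` once, before building the index and query engines, should be enough. Passing `llm=llm` to `index.as_query_engine(...)` is an alternative when only one engine should use the model.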

Create an index to enable querying of the document

Next, we create an index from the loaded document to facilitate efficient querying.

In [8]:
index = VectorStoreIndex.from_documents(doc)
In [9]:
engine = index.as_query_engine(similarity_top_k=3)

We create a VectorStoreIndex from the loaded document, which embeds each chunk so that we can search by similarity. We then set up a query engine with `similarity_top_k=3`, so each query retrieves the 3 most similar chunks and the language model answers from those chunks only.
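To make the idea behind `similarity_top_k` concrete, here is a small standard-library sketch of top-k retrieval by cosine similarity. It illustrates the technique only and is not LlamaIndex's implementation; the chunk labels and 3-dimensional vectors are made-up toy values, whereas real embeddings come from an embedding model and have far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=3):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Toy "embeddings" (hypothetical values, for illustration only).
chunks = [
    ("revenue table", [0.9, 0.1, 0.0]),
    ("risk factors", [0.0, 0.2, 0.9]),
    ("segment results", [0.7, 0.3, 0.1]),
    ("legal proceedings", [0.1, 0.9, 0.2]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of a revenue question

for score, text in top_k(query, chunks, k=3):
    print(f"{score:.3f}  {text}")
```

A larger k gives the model more context at the cost of longer prompts; a smaller k risks missing the page that holds the answer.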

Query specific financial information from the document

Now, we can use the query engine to extract specific financial information from the document.

In [10]:
response = await engine.aquery("What is the revenue of NVIDIA in the last period reported? Answer in millions with page reference. Include the period.")
print(response)
The revenue of NVIDIA in the last period reported was $30,040 million for the three months ended July 28, 2024, as shown on page 3 of the document.
In [11]:
response = await engine.aquery("What is the beginning and end date of NVIDIA's fiscal period?")
print(response)
NVIDIA's fiscal period begins on the last Sunday in January.

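We use the query engine's `aquery` method to ask questions about NVIDIA's financial report asynchronously; top-level `await` works here because the notebook already runs an event loop. The first query asks for the revenue in the last reported period, including the period and a page reference. The second query asks for the beginning and end dates of NVIDIA's fiscal period; note that the answer returned gives only one date, so it does not fully address the question. The responses are printed to the console.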
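The page reference in the first answer is generated by the language model, so it is worth checking against the chunks that were actually retrieved. The sketch below assumes the response object exposes `source_nodes`, each with a `score` and a `node.metadata` dictionary, and that the PDF loader stores the page under a `page_label` key; both are assumptions to verify against your installed version. The helper `cited_pages` and the stand-in response are hypothetical.

```python
from types import SimpleNamespace

def cited_pages(response, page_key="page_label"):
    """Collect (page, score) for each retrieved chunk, best score first."""
    pairs = [
        (sn.node.metadata.get(page_key), sn.score)
        for sn in getattr(response, "source_nodes", [])
    ]
    return sorted(pairs, key=lambda pair: pair[1] or 0.0, reverse=True)

# Stand-in response with made-up pages and scores, so the sketch
# runs without calling the API.
fake = SimpleNamespace(source_nodes=[
    SimpleNamespace(score=0.81, node=SimpleNamespace(metadata={"page_label": "27"})),
    SimpleNamespace(score=0.88, node=SimpleNamespace(metadata={"page_label": "3"})),
])
print(cited_pages(fake))
```

With a real response, `cited_pages(response)` would list the pages the answer was drawn from, which can be compared with the page the model cites.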

Set up a tool for sub-question querying

We will now set up a tool to handle more complex queries by breaking them down into sub-questions.

In [12]:
query_engine_tool = [
    QueryEngineTool(
        query_engine=engine,
        metadata=ToolMetadata(name='nvda_10k', description='Provides information about NVDA financials for year 2024')
    )
]
In [13]:
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tool)

We wrap the query engine in a single QueryEngineTool, held in a list, with metadata (a name and a description) that describes what the tool covers. We then initialize a SubQuestionQueryEngine with that list. This engine breaks a complex query into smaller sub-questions, sends each one to a tool, and combines the answers into a final response.
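The decompose, answer, and combine flow can be sketched in plain Python. This is an illustration of the pattern under simplifying assumptions, not the `SubQuestionQueryEngine` implementation: the names `SubQuestion`, `answer_with_sub_questions`, and the `fake_*` stubs are hypothetical, the stubs stand in for LLM calls, and sub-questions are answered one at a time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SubQuestion:
    tool_name: str  # which tool should answer this part
    question: str

def answer_with_sub_questions(
    question: str,
    decompose: Callable[[str], List[SubQuestion]],
    tools: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[str]], str],
) -> str:
    """Decompose the question, answer each part with its tool, then combine."""
    subs = decompose(question)
    print(f"Generated {len(subs)} sub questions.")
    answers = []
    for sub in subs:
        answer = tools[sub.tool_name](sub.question)
        print(f"[{sub.tool_name}] Q: {sub.question}")
        print(f"[{sub.tool_name}] A: {answer}")
        answers.append(answer)
    return synthesize(question, answers)

# Stubs standing in for the LLM-driven steps and the query engine.
def fake_decompose(q: str) -> List[SubQuestion]:
    parts = q.rstrip("?").split(" and ")
    return [SubQuestion("nvda_10k", part.strip() + "?") for part in parts]

fake_tools = {"nvda_10k": lambda q: f"<answer to: {q}>"}

def fake_synthesize(q: str, answers: List[str]) -> str:
    return " ".join(answers)

result = answer_with_sub_questions(
    "Which segments grew and which regions grew?",
    fake_decompose, fake_tools, fake_synthesize,
)
print(result)
```

The tool name and description matter because the decomposition step has to decide which tool each sub-question goes to; with several documents you would register one tool per document.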

Perform complex queries on customer segments and risks

Finally, we perform more complex queries on the document to extract detailed information about customer segments and business risks.

In [14]:
response = await s_engine.aquery("Compare and contrast the customer segments and geographies that grew the fastest")
print(response)
Generated 2 sub questions.
[nvda_10k] Q: What are the customer segments that grew the fastest in terms of revenue in 2024?
[nvda_10k] Q: Which geographies showed the highest growth in revenue for NVDA in 2024?
[nvda_10k] A: Networking revenue grew the fastest in terms of revenue in 2024.
[nvda_10k] A: Data Center revenue showed the highest growth in revenue for NVDA in 2024.
Networking revenue grew the fastest in terms of revenue in 2024, while Data Center revenue showed the highest growth in revenue for NVDA in the same year.
In [15]:
response = await s_engine.aquery("What risks to NVIDIA's business are highlighted in the document?")
print(response)
Generated 1 sub questions.
[nvda_10k] Q: What risks are highlighted in NVDA's 10-K document for the year 2024?
[nvda_10k] A: Long manufacturing lead times, uncertain supply and component availability, failure to estimate customer demand accurately, mismatches between supply and demand, product shortages, excess inventory, and the impact of changes in product development cycles, competing technologies, business and economic conditions, government lockdowns, technology advancements, and other factors on revenue and supply levels.
The risks highlighted in the document for NVIDIA's business include long manufacturing lead times, uncertain supply and component availability, failure to estimate customer demand accurately, mismatches between supply and demand, product shortages, excess inventory, and the impact of changes in product development cycles, competing technologies, business and economic conditions, government lockdowns, technology advancements, and other factors on revenue and supply levels.
In [16]:
response = await s_engine.aquery("How does NVIDIA see the risks highlighted in the document impacting financial performance?")
print(response)
Generated 2 sub questions.
[nvda_10k] Q: What are the key risks highlighted in the NVDA 10K document?
[nvda_10k] Q: How does NVDA plan to mitigate the risks mentioned in the 10K document?
[nvda_10k] A: The key risks highlighted in the NVDA 10K document include potential challenges related to manufacturing lead times, uncertain supply and component availability, inaccurate estimation of customer demand leading to mismatches between supply and demand, product shortages or excess inventory, inability to secure sufficient commitments for capacity, impeded ability to sell products if necessary components are unavailable, extended lead times on orders, increased product costs due to securing future supply, and the impact of various factors on underestimating or overestimating customer demand.
[nvda_10k] A: NVDA plans to mitigate the risks mentioned in the 10K document by increasing supply and capacity purchases with existing and new suppliers to support their demand projections. Additionally, they aim to accurately estimate customer demand to avoid mismatches between supply and demand, which have previously harmed their financial results. They may enter into long-term supply agreements and capacity commitments to address their business needs and secure sufficient commitments for capacity. Furthermore, they acknowledge the potential impact of factors such as changes in product development cycles, competitor actions, economic conditions, and technology advancements on their revenue and strive to manage these uncertainties effectively.
NVDIA sees the risks highlighted in the document impacting financial performance through potential challenges related to manufacturing lead times, uncertain supply and component availability, inaccurate estimation of customer demand leading to mismatches between supply and demand, product shortages or excess inventory, inability to secure sufficient commitments for capacity, impeded ability to sell products if necessary components are unavailable, extended lead times on orders, increased product costs due to securing future supply, and the impact of various factors on underestimating or overestimating customer demand. These risks could lead to financial implications such as decreased revenue, increased costs, reduced profitability, and potential negative effects on overall financial results.

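We use the sub-question query engine to ask three complex questions: one about NVIDIA's fastest-growing customer segments and geographies, one about the business risks highlighted in the document, and one about how those risks could affect financial performance. The engine breaks each question into smaller sub-questions, answers them with the `nvda_10k` tool, and compiles the responses. Each response is then printed to the console. The output is worth reading critically: in the first query, the sub-question about geographies was answered with a product segment (Data Center) rather than a region.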
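The `await` calls in the cells above rely on the notebook's running event loop. In a plain Python script there is no such loop, so the coroutine has to be driven with `asyncio.run`. The sketch below shows that pattern, plus running several queries concurrently with `asyncio.gather`; `StubEngine` is a hypothetical stand-in for an engine with an async `aquery` method, used so the example runs without an API key.

```python
import asyncio

class StubEngine:
    """Hypothetical stand-in exposing an async `aquery` method."""

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(0)  # yield control, as a network call would
        return f"<answer to: {question}>"

async def ask_all(engine, questions):
    # Start all queries concurrently; results keep the order of `questions`.
    return await asyncio.gather(*(engine.aquery(q) for q in questions))

questions = ["What risks are highlighted?", "Which segment grew fastest?"]
answers = asyncio.run(ask_all(StubEngine(), questions))
for q, a in zip(questions, answers):
    print(q, "->", a)
```

Inside a notebook, `await ask_all(engine, questions)` would be used directly instead, since `asyncio.run` raises an error when a loop is already running.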

PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. For educational purposes. Not investment advice. Use at your own risk.
