Compare commits

...

4 Commits

Author SHA1 Message Date
Jerry Liu 70a1ff765b cr 2025-02-07 12:28:22 -08:00
Jerry Liu 43602c88ba cr 2025-02-06 21:16:29 -08:00
Jerry Liu 1222bc107f cr 2025-02-06 21:14:38 -08:00
Jerry Liu cd071a5659 cr 2025-02-06 21:12:31 -08:00
2 changed files with 633 additions and 0 deletions
Binary file not shown.

After

Width:  |  Height:  |  Size: 202 KiB

@@ -0,0 +1,633 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "97c79c38-38a3-40f3-ba2e-250649347d63",
"metadata": {},
"source": [
"# Multimodal Parsing with Gemini 2.0 Flash\n",
"\n",
"<a href=\"https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/multimodal/gemini2_flash.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"\n",
"This cookbook shows you how to use LlamaParse to parse any document with the multimodal capabilities of Gemini 2.0 Flash.\n",
"\n",
"LlamaParse allows you to plug in external, multimodal model vendors for parsing - we handle the error correction, validation, and scalability/reliability for you.\n"
]
},
{
"cell_type": "markdown",
"id": "15e60ecf-519c-41fc-911b-765adaf8bad4",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Download the data - we'll use a technical datasheet for a programmable logic device (Xilinx's XC9500 In-System Programmable CPLD)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91a9e532-1454-40e0-bbf0-fd442c350121",
"metadata": {},
"outputs": [],
"source": [
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0d9fb0aa-74cd-476f-8161-efd9e04248bf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2025-02-06 20:24:19-- https://media.digikey.com/pdf/Data%20Sheets/AMD/XC9500_CPLD_Family.pdf\n",
"Resolving media.digikey.com (media.digikey.com)... 23.37.18.160\n",
"Connecting to media.digikey.com (media.digikey.com)|23.37.18.160|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 201899 (197K) [application/pdf]\n",
"Saving to: data/XC9500_CPLD_Family.pdf\n",
"\n",
"data/XC9500_CPLD_Fa 100%[===================>] 197.17K --.-KB/s in 0.03s \n",
"\n",
"2025-02-06 20:24:19 (7.67 MB/s) - data/XC9500_CPLD_Family.pdf saved [201899/201899]\n",
"\n"
]
}
],
"source": [
"!wget \"https://media.digikey.com/pdf/Data%20Sheets/AMD/XC9500_CPLD_Family.pdf\" -O data/XC9500_CPLD_Family.pdf"
]
},
{
"cell_type": "markdown",
"id": "4e29a9d7-5bd9-4fb8-8ec1-4c128a748662",
"metadata": {},
"source": [
"## Initialize LlamaParse\n",
"\n",
"Initialize LlamaParse in multimodal mode, and specify the vendor as `gemini-2.0-flash-001`.\n",
"\n",
"**NOTE**: Current pricing is 2 credits for a 1 page ($0.006 USD / page). This includes core model, infra, and algorithm costs to fully process the page. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc921729-3446-42ca-8e1b-a6fd26195ed9",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.schema import TextNode\n",
"from typing import List\n",
"import json\n",
"\n",
"\n",
"def get_text_nodes(json_list: List[dict]):\n",
" text_nodes = []\n",
" for idx, page in enumerate(json_list):\n",
" text_node = TextNode(text=page[\"md\"], metadata={\"page\": page[\"page\"]})\n",
" text_nodes.append(text_node)\n",
" return text_nodes\n",
"\n",
"\n",
"def save_jsonl(data_list, filename):\n",
" \"\"\"Save a list of dictionaries as JSON Lines.\"\"\"\n",
" with open(filename, \"w\") as file:\n",
" for item in data_list:\n",
" json.dump(item, file)\n",
" file.write(\"\\n\")\n",
"\n",
"\n",
"def load_jsonl(filename):\n",
" \"\"\"Load a list of dictionaries from JSON Lines.\"\"\"\n",
" data_list = []\n",
" with open(filename, \"r\") as file:\n",
" for line in file:\n",
" data_list.append(json.loads(line))\n",
" return data_list"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2e9d9cf-8189-4fcb-b34f-cde6cc0b59c8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Started parsing the file under job_id 51538aa0-13e6-4429-a458-a492ba7eec04\n"
]
}
],
"source": [
"from llama_parse import LlamaParse\n",
"\n",
"parsing_instruction = \"\"\"\n",
"You are given a technical datasheet of an electronic component.\n",
"For any graphs, try to create a 2D table of relevant values, along with a description of the graph.\n",
"For any schematic diagrams, MAKE SURE to describe a list of all components and their connections to each other.\n",
"Make sure that you always parse out the text with the correct reading order.\n",
"\"\"\"\n",
"\n",
"parser = LlamaParse(\n",
" result_type=\"markdown\",\n",
" use_vendor_multimodal_model=True,\n",
" vendor_multimodal_model_name=\"gemini-2.0-flash-001\",\n",
" invalidate_cache=True,\n",
" parsing_instruction=parsing_instruction,\n",
")\n",
"json_objs = parser.get_json_result(\"./data/XC9500_CPLD_Family.pdf\")\n",
"json_list = json_objs[0][\"pages\"]\n",
"docs = get_text_nodes(json_list)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96a81df0-1026-4e30-a930-f677dc31e344",
"metadata": {},
"outputs": [],
"source": [
"# Optional: Save\n",
"save_jsonl([d.dict() for d in docs], \"docs_gemini_2.0_flash.jsonl\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee2e6920-8893-4b39-ae12-94d13c651406",
"metadata": {},
"outputs": [],
"source": [
"# Optional: Load\n",
"from llama_index.core import Document\n",
"\n",
"docs_dicts = load_jsonl(\"docs_gemini_2.0_flash.jsonl\")\n",
"docs = [Document.parse_obj(d) for d in docs_dicts]"
]
},
{
"cell_type": "markdown",
"id": "4f3c51b0-7878-48d7-9bc3-02b516500128",
"metadata": {},
"source": [
"### Setup GPT-4o baseline\n",
"\n",
"For comparison, we will also parse the document using GPT-4o ($0.03 per page)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6fc3f258-50ae-4988-b904-c105463a498f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Started parsing the file under job_id 23c6627c-2e3d-46c9-88a0-7945d7e65d96\n"
]
}
],
"source": [
"from llama_parse import LlamaParse\n",
"\n",
"parser_gpt4o = LlamaParse(\n",
" result_type=\"markdown\",\n",
" use_vendor_multimodal_model=True,\n",
" vendor_multimodal_model=\"openai-gpt4o\",\n",
" invalidate_cache=True,\n",
" parsing_instruction=parsing_instruction,\n",
")\n",
"json_objs_gpt4o = parser_gpt4o.get_json_result(\"./data/XC9500_CPLD_Family.pdf\")\n",
"json_list_gpt4o = json_objs_gpt4o[0][\"pages\"]\n",
"docs_gpt4o = get_text_nodes(json_list_gpt4o)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a47f04e-12e1-4c80-a71d-ef7721f96401",
"metadata": {},
"outputs": [],
"source": [
"# Optional: Save\n",
"save_jsonl([d.dict() for d in docs_gpt4o], \"docs_gpt4o.jsonl\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c38b5ca3-fa87-434b-b477-bf6a4962eb3d",
"metadata": {},
"outputs": [],
"source": [
"# Optional: Load\n",
"from llama_index.core import Document\n",
"\n",
"docs_gpt4o_dicts = load_jsonl(\"docs_gpt4o.jsonl\")\n",
"docs_gpt4o = [Document.parse_obj(d) for d in docs_gpt4o_dicts]"
]
},
{
"cell_type": "markdown",
"id": "44c20f7a-2901-4dd0-b635-a4b33c5664c1",
"metadata": {},
"source": [
"## View Results\n",
"\n",
"Let's visualize the results between GPT-4o and Gemini Flash 2.0 along with the original document page."
]
},
{
"cell_type": "markdown",
"id": "bf314141-9f6d-4453-beb9-0106cdf196bf",
"metadata": {},
"source": [
"Check out an example page 2 below."
]
},
{
"cell_type": "markdown",
"id": "c70d420d-1778-4b0d-81e2-db09276e90cf",
"metadata": {},
"source": [
"![xc9500_img](XC9500_CPLD_Family_p3.png)"
]
},
{
"cell_type": "markdown",
"id": "0950ecad-248c-4c3c-98b9-ab1a9dabd5b4",
"metadata": {},
"source": [
"We see that the parsed text is fairly similar between Gemini 2.0 Flash and GPT-4o. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "778698aa-da7e-4081-b3b5-0372f228536f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page: 3\n",
"\n",
"The image shows the architecture of the XC9500 In-System Programmable CPLD Family, which is marked as obsolete. Here's a breakdown of the components and their connections:\n",
"\n",
"### Components and Connections:\n",
"\n",
"1. **JTAG Port:**\n",
" - Connects to the JTAG Controller.\n",
"\n",
"2. **JTAG Controller:**\n",
" - Interfaces with the In-System Programming Controller.\n",
" - Connects to the I/O Blocks.\n",
"\n",
"3. **In-System Programming Controller:**\n",
" - Interfaces with the JTAG Controller and the Fast CONNECT Switch Matrix.\n",
"\n",
"4. **I/O Blocks:**\n",
" - Multiple I/O lines connect to the Fast CONNECT Switch Matrix.\n",
" - Includes special I/O lines for GCK, GSR, and GTS.\n",
"\n",
"5. **Fast CONNECT Switch Matrix:**\n",
" - Connects to the I/O Blocks and Function Blocks.\n",
" - Provides 36 inputs and 18 outputs to each Function Block.\n",
"\n",
"6. **Function Blocks (FB):**\n",
" - Each block contains 18 macrocells.\n",
" - Outputs from the Function Blocks drive the I/O Blocks directly.\n",
" - Multiple Function Blocks (1 to N) are shown, each with 18 macrocells.\n",
"\n",
"### Function Block Details:\n",
"\n",
"- Each Function Block consists of 18 independent macrocells.\n",
"- Capable of implementing combinatorial or registered functions.\n",
"- Receives global clock, output enable, and set/reset signals.\n",
"- Generates 18 outputs for the Fast CONNECT switch matrix.\n",
"- Logic is implemented using a sum-of-products representation.\n",
"- 36 inputs provide 72 true and complement signals to form 90 product terms.\n",
"- Product terms can be allocated to each macrocell by the product term allocator.\n",
"- Supports local feedback paths for fast counters and state machines.\n",
"\n",
"This architecture is designed for flexibility in implementing complex logic functions within a programmable logic device.\n"
]
}
],
"source": [
"# using Gemini 2.0 Flash\n",
"print(docs[2].get_content(metadata_mode=\"all\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1511a30f-3efc-4142-9668-7dc056a24d0c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page: 3\n",
"\n",
"The diagram illustrates the architecture of the XC9500 In-System Programmable CPLD Family. Here's a breakdown of the components and their connections:\n",
"\n",
"1. **JTAG Port**: \n",
" - Connects to the JTAG Controller.\n",
"\n",
"2. **JTAG Controller**: \n",
" - Interfaces with the In-System Programming Controller.\n",
"\n",
"3. **In-System Programming Controller**: \n",
" - Manages programming of the device.\n",
"\n",
"4. **I/O Blocks**: \n",
" - Connect to external I/O pins.\n",
" - Interface with the Fast CONNECT Switch Matrix.\n",
"\n",
"5. **Fast CONNECT Switch Matrix**: \n",
" - Connects I/O Blocks to Function Blocks.\n",
" - Provides 36 inputs and 18 outputs to each Function Block.\n",
"\n",
"6. **Function Blocks (FB)**: \n",
" - Each block contains 18 macrocells.\n",
" - Capable of implementing combinatorial or registered functions.\n",
" - Receives global clock, output enable, and set/reset signals.\n",
" - Outputs drive the Fast CONNECT Switch Matrix.\n",
" - Supports local feedback paths for fast counters and state machines.\n",
"\n",
"7. **I/O/GCK, I/O/GSR, I/O/GTS**: \n",
" - Special I/O pins for global clock, set/reset, and output enable signals.\n",
"\n",
"The architecture is designed for flexibility and high-speed operation, with each Function Block capable of handling complex logic functions.\n"
]
}
],
"source": [
"# using GPT-4o\n",
"print(docs_gpt4o[2].get_content(metadata_mode=\"all\"))"
]
},
{
"cell_type": "markdown",
"id": "705f7729-fa0f-4ca0-8562-c42afeaa8532",
"metadata": {},
"source": [
"## Setup RAG Pipeline\n",
"\n",
"Let's setup a RAG pipeline over this data.\n",
"\n",
"(we also use gpt4o-mini for the actual text synthesis step)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a53ee5d-cc63-421b-8896-588c83edfcf0",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import Settings\n",
"from llama_index.llms.openai import OpenAI\n",
"from llama_index.embeddings.openai import OpenAIEmbedding\n",
"\n",
"Settings.llm = OpenAI(model=\"o3-mini\")\n",
"Settings.embed_model = OpenAIEmbedding(model=\"text-embedding-3-large\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60972d7a-7948-4ad7-89df-57004acee917",
"metadata": {},
"outputs": [],
"source": [
"# from llama_index.core import SummaryIndex\n",
"from llama_index.core import VectorStoreIndex\n",
"from llama_index.llms.openai import OpenAI\n",
"\n",
"index = VectorStoreIndex(docs)\n",
"query_engine = index.as_query_engine(similarity_top_k=5)\n",
"\n",
"index_gpt4o = VectorStoreIndex(docs_gpt4o)\n",
"query_engine_gpt4o = index_gpt4o.as_query_engine(similarity_top_k=5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7df7bcb-1df4-4a01-88fc-2d596b1cc74d",
"metadata": {},
"outputs": [],
"source": [
"query = \"Give me the full output slew-Rate curve for (a) Rising and (b) Falling Outputs\"\n",
"\n",
"response = query_engine.query(query)\n",
"response_gpt4o = query_engine_gpt4o.query(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7070a31-3bb8-4134-8338-20bc2fd6f3d6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The full output slew-rate curve for (a) Rising and (b) Falling Outputs is represented in a graph where the output voltage starts at 1.5V and reaches the desired output level over a time period defined as T<sub>SLEW</sub>. The curve illustrates the gradual increase in voltage for rising outputs and the gradual decrease for falling outputs, effectively showing how the output edge rates can be controlled to reduce system noise.\n"
]
}
],
"source": [
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bee8167-f021-4c87-8d28-9f40a4f7b69d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# XC9500 In-System Programmable CPLD Family\n",
"\n",
"Each output has independent slew rate control. Output edge rates may be slowed down to reduce system noise (with an additional time delay of T<sub>SLEW</sub>) through programming. See Figure 11.\n",
"\n",
"Each IOB provides user programmable ground pin capability. This allows device I/O pins to be configured as additional ground pins. By tying strategically located programmable ground pins to the external ground connection, system noise generated from large numbers of simultaneous switching outputs may be reduced.\n",
"\n",
"A control pull-up resistor (typically 10K ohms) is attached to each device I/O pin to prevent them from floating when the device is not in normal user operation. This resistor is active during device programming mode and system power-up. It is also activated for an erased device. The resistor is deactivated during normal operation.\n",
"\n",
"The output driver is capable of supplying 24 mA output drive. All output drivers in the device may be configured for either 5V TTL levels or 3.3V levels by connecting the device output voltage supply (V<sub>CCIO</sub>) to a 5V or 3.3V voltage supply. Figure 12 shows how the XC9500 device can be used in 5V only and mixed 3.3V/5V systems.\n",
"\n",
"## Pin-Locking Capability\n",
"\n",
"The capability to lock the user defined pin assignments during design changes depends on the ability of the architecture to adapt to unexpected changes. The XC9500 devices have architectural features that enhance the ability to accept design changes while maintaining the same pinout.\n",
"\n",
"The XC9500 architecture provides maximum routing within the Fast CONNECT switch matrix, and incorporates a flexible Function Block that allows block-wide allocation of available product terms. This provides a high level of confidence of maintaining both input and output pin assignments for unexpected design changes.\n",
"\n",
"For extensive design changes requiring higher logic capacity than is available in the initially chosen device, the new design may be able to fit into a larger pin-compatible device using the same pin assignments. The same board may be used with a higher density device without the expense of board rework.\n",
"\n",
"!Output slew-Rate for (a) Rising and (b) Falling Outputs\n",
"\n",
"**Figure 11:** Output slew-Rate for (a) Rising and (b) Falling Outputs\n",
"\n",
"| Output Voltage | Time |\n",
"|----------------|------|\n",
"| 1.5V | 0 |\n",
"| T<sub>SLEW</sub> | |\n",
"\n",
"**Figure 12:** XC9500 Devices in (a) 5V Systems and (b) Mixed 5V/3.3V Systems\n",
"\n",
"| 5V CMOS or 5V TTL | 3.3V |\n",
"|-------------------|------|\n",
"| 5V | 0V |\n",
"| 3.6V | 0V |\n",
"| 3.3V | 0V |\n",
"\n",
"- **(a) 5V System:**\n",
" - V<sub>CCINT</sub> V<sub>CCIO</sub>\n",
" - XC9500 CPLD\n",
" - IN OUT\n",
" - GND\n",
"\n",
"- **(b) Mixed 5V/3.3V System:**\n",
" - V<sub>CCINT</sub> V<sub>CCIO</sub>\n",
" - XC9500 CPLD\n",
" - IN OUT\n",
" - GND\n",
"\n",
"www.xilinx.com\n",
"\n",
"DS063 (v6.0) May 17, 2013 \n",
"Product Specification\n"
]
}
],
"source": [
"print(response.source_nodes[0].get_content())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5f9fef7f-510b-46a5-8716-f5616f542035",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The output slew-rate curve for (a) Rising and (b) Falling Outputs is represented in a timing diagram where the output voltage transitions from a low state to a high state and vice versa. \n",
"\n",
"For the rising output, the curve starts at 1.5V and transitions to the desired output voltage level over a time period defined as T<sub>SLEW</sub>. \n",
"\n",
"For the falling output, the curve similarly begins at the high output voltage and decreases to a low state, also taking the time defined as T<sub>SLEW</sub> to complete the transition.\n",
"\n",
"The specific values and graphical representation would typically be illustrated in a figure, but the key takeaway is that the output slew rate can be controlled to manage system noise by programming the desired T<sub>SLEW</sub> time.\n"
]
}
],
"source": [
"print(response_gpt4o)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d40f9dd4-2dd4-4fa5-b636-1f901dc1601b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# XC9500 In-System Programmable CPLD Family\n",
"\n",
"Each output has independent slew rate control. Output edge rates may be slowed down to reduce system noise (with an additional time delay of T<sub>SLEW</sub>) through programming. See Figure 11.\n",
"\n",
"Each IOB provides user programmable ground pin capability. This allows device I/O pins to be configured as additional ground pins. By tying strategically located programmable ground pins to the external ground connection, system noise generated from large numbers of simultaneous switching outputs may be reduced.\n",
"\n",
"A control pull-up resistor (typically 10K ohms) is attached to each device I/O pin to prevent them from floating when the device is not in normal user operation. This resistor is active during device programming mode and system power-up. It is also activated for an erased device. The resistor is deactivated during normal operation.\n",
"\n",
"The output driver is capable of supplying 24 mA output drive. All output drivers in the device may be configured for either 5V TTL levels or 3.3V levels by connecting the device output voltage supply (V<sub>CCIO</sub>) to a 5V or 3.3V voltage supply. Figure 12 shows how the XC9500 device can be used in 5V only and mixed 3.3V/5V systems.\n",
"\n",
"## Pin-Locking Capability\n",
"\n",
"The capability to lock the user defined pin assignments during design changes depends on the ability of the architecture to adapt to unexpected changes. The XC9500 devices have architectural features that enhance the ability to accept design changes while maintaining the same pinout.\n",
"\n",
"The XC9500 architecture provides maximum routing within the Fast CONNECT switch matrix, and incorporates a flexible Function Block that allows block-wide allocation of available product terms. This provides a high level of confidence of maintaining both input and output pin assignments for unexpected design changes.\n",
"\n",
"For extensive design changes requiring higher logic capacity than is available in the initially chosen device, the new design may be able to fit into a larger pin-compatible device using the same pin assignments. The same board may be used with a higher density device without the expense of board rework.\n",
"\n",
"!Output slew-Rate for (a) Rising and (b) Falling Outputs\n",
"\n",
"**Figure 11:** Output slew-Rate for (a) Rising and (b) Falling Outputs\n",
"\n",
"| Output Voltage | Time |\n",
"|----------------|------|\n",
"| 1.5V | 0 |\n",
"| T<sub>SLEW</sub> | |\n",
"\n",
"**Figure 12:** XC9500 Devices in (a) 5V Systems and (b) Mixed 5V/3.3V Systems\n",
"\n",
"| 5V CMOS or 5V TTL | 3.3V |\n",
"|-------------------|------|\n",
"| 5V | 0V |\n",
"| 3.6V | 0V |\n",
"| 3.3V | 0V |\n",
"\n",
"- **XC9500 CPLD** \n",
" - **IN** \n",
" - **OUT** \n",
" - **GND** \n",
"\n",
"www.xilinx.com \n",
"DS063 (v6.0) May 17, 2013 \n",
"Product Specification\n"
]
}
],
"source": [
"print(response_gpt4o.source_nodes[0].get_content())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "llama_parse",
"language": "python",
"name": "llama_parse"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}