Update README

2026-07-01 19:55:06 -04:00 · 2025-06-24 18:29:49 -07:00
parent 4b6b892cb6
commit 34fdad43e4
5 changed files with 70 additions and 4 deletions
@@ -1,6 +1,6 @@
-# Multi-Modal Research Agent
+# Multi-Modal Researcher

-This project is a simple research and podcast generation system that uses the unique capabilities of Google's Gemini 2.5 model family. It combines three useful features of the Gemini 2.5 model family. You can pass a research topic and, optionally, a YouTube video URL. The system will then perform research on the topic using search, analyze the video, combine the insights, and generate a report with citations as well as a short podcast on the topic for you. It takes advantage of a few of Gemini's native capabilities:
+This project is a simple research and podcast generation workflow that uses LangGraph together with the unique capabilities of Google's Gemini 2.5 model family. You pass a research topic and, optionally, a YouTube video URL. The workflow then researches the topic using search, analyzes the video, combines the insights, and generates a report with citations as well as a short podcast on the topic. It takes advantage of three of Gemini's native capabilities:

 - 🎥 [Video understanding and native YouTube tool](https://developers.googleblog.com/en/gemini-2-5-video-understanding/): Integrated processing of YouTube videos
 - 🔍 [Google search tool](https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/): Native Google Search tool integration with real-time web results
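The flow described above (research the topic, optionally analyze a video, then produce a report and a podcast) can be illustrated without any dependencies. This is a minimal plain-Python sketch under stated assumptions, not the project's LangGraph graph: the state keys (`topic`, `video_url`, `search_text`, `video_text`, `report`, `podcast_script`) and every step function are hypothetical stand-ins, and the stubs return placeholder strings instead of calling Gemini.

```python
from typing import Optional, TypedDict


class ResearchInput(TypedDict):
    """Hypothetical input: a research topic plus an optional YouTube URL."""
    topic: str
    video_url: Optional[str]


class ResearchState(ResearchInput, total=False):
    """Hypothetical working state; each step fills in one more key."""
    search_text: str
    video_text: str
    report: str
    podcast_script: str


def search_research(state: ResearchState) -> dict:
    # Stand-in for a Gemini call with the Google Search tool enabled.
    return {"search_text": f"Search findings on {state['topic']}"}


def analyze_video(state: ResearchState) -> dict:
    # Stand-in for Gemini's native YouTube/video understanding.
    return {"video_text": f"Video notes from {state['video_url']}"}


def create_report(state: ResearchState) -> dict:
    # Combine whichever sources are present into one report body.
    sections = [state["search_text"]]
    if state.get("video_text"):
        sections.append(state["video_text"])
    return {"report": "\n\n".join(sections)}


def create_podcast(state: ResearchState) -> dict:
    # Stand-in for script writing plus text-to-speech.
    return {"podcast_script": f"Podcast script based on: {state['report'][:40]}"}


def run_workflow(inputs: ResearchInput) -> ResearchState:
    """Run the steps in order, merging each step's partial update into the state."""
    state: ResearchState = {**inputs}
    steps = [search_research]
    if inputs.get("video_url"):  # video analysis runs only when a URL is given
        steps.append(analyze_video)
    steps += [create_report, create_podcast]
    for step in steps:
        state.update(step(state))
    return state


result = run_workflow({"topic": "LLMs as operating systems", "video_url": None})
print(sorted(result))  # ['podcast_script', 'report', 'search_text', 'topic', 'video_url']
```

Having each step return a partial dict that is merged into a shared state roughly mirrors how LangGraph nodes return state updates; in the actual project the optional video branch would be expressed as graph edges rather than a Python list.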
@@ -0,0 +1,67 @@
+# Research Report: Give me an overview of the idea that LLMs are like a new kind of operating system.
+
+## Executive Summary
+
+The idea of Large Language Models (LLMs) evolving into a new kind of operating system (OS) represents a significant paradigm shift in human-computer interaction, moving beyond traditional software models. The web sources and the video converge on the same core analogy: the LLM itself acts as the central processing unit or “kernel,” and its context window serves as the system’s “RAM” or working memory, holding the immediate data the model reasons over. This conceptualization, dubbed **LLM OS** or **AIOS** and famously framed by Andrej Karpathy as **“Software 3.0,”** posits that natural language prompts become the primary user interface and even the “programming language,” fundamentally changing how users interact with and develop digital applications.
+
+This new “OS” orchestrates a diverse ecosystem of tools and resources, much like a traditional OS manages peripheral devices and libraries. LLMs can call **Software 1.0** tools (like Python interpreters or browsers), access external data storage (the analogue of a file system, via databases or embeddings), and even interact with other LLMs, creating complex multi-step chains. This centralized orchestration enables the automation of complex workflows and the development of **Agent Applications** (AAPs) that coordinate multiple AI agents. The vision extends to multimodal interaction, where voice and vision make exchanges more intuitive and context-aware and, in turn, make technology accessible to a much broader audience.
+
+Despite its transformative potential, this emerging paradigm faces considerable challenges. High computational cost and security vulnerabilities are significant hurdles, and user adaptation and trust, particularly for high-stakes decisions, will require time and proven reliability. Karpathy draws parallels to early computing: LLM compute is currently expensive and centralized, much like mainframe time-sharing, and the “graphical user interface” for general LLM interaction has yet to be invented. Nevertheless, the opportunities are vast, including the democratization of software development through natural language, “partial autonomy apps” in which the user controls how much the AI does on its own, and a new “human-AI workflow” in which the AI generates and the human verifies. This shift also calls for **“building for agents”**, making digital content and tools directly understandable and actionable by LLMs, and it is giving rise to new programming approaches such as **“vibe coding,”** in which the underlying code becomes increasingly abstracted away.
+
+## Video Source
+
+- **URL:** https://youtu.be/LCEmiRjPEtQ?si=raeMN2Roy5pESNG2
+
+## Additional Sources
+
+1. **arxiv.org**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQF9zzJ0-qtRmncHMcxWzG5IymgPbWr-4_NNpXZGCt08Tk9ut2-_mA9yRsfpLVWZ8q-QAI7fyTxnpqQNd6Ml3ozPXiHXVtuSCxVRJbz-5buv2NOLGu9cXbZiRx9Br4wG  
+
+2. **pillar.security**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHJJNAJs4Qt8wjMSM9yVTp72rKDYdsU-0vkB1se56suHdoz-EP114vy9TG6C5C1-0JIRWR227AFl2DMrlYwoKNocwTGztHFjxQi3E9pVInETgmrjuhq3NLKHXKFV8z3Payl10Q9H5C0IxFkzSZav87He-dIYo1p9-qiEyzfmvGP37bO92qqO6zRrD1Ej3yq00cCVee2sEsYNZz6  
+
+3. **datacamp.com**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQEjpMvAtwQkFp83-26hz9fwCUUVkDX-BXNNu_VeaJs2lSk50vhkbIBs5D2SpLeZobrWrCJXTkeyHPrUQWahtYmy8xpRWwOKis80-yxeTNXRew6wOp6R4sVJeyEZ8XVNpA==  
+
+4. **terrencekim.net**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFPiLjKPFQJjdUXPBzc_5XH8S3qWBIZZeS_f1RKFoZSK6MDF7rGAeZvsyXN9rGGihgLUUNCudsEXYcxsrcUpCsmxDtgXmdormxWyjvYqopGg6eFyRQ9VvVZNlSPaTZeug3TUNyKRUgHTgZdrobvZTL3L0owUk1ffzDkxmU=
+
+5. **arxiv.org**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGOQ1EGpQ5ZBn7KXQHipPcdvk0QNRiNW89Czmk4rVCAU12BU3dzKnxxPzRnw_kUwQVRmP3zH4QXKw_wsk0PsCwAK-RKlUImp4tnjb_zwSJoEWMGC4w841PFV4BW  
+
+6. **dqindia.com**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGxtJ5Xmn-R02MVCkcXK9FSRlCeUEMVi48vMy1C-zuRgw1kw7j_THxfmh8uoY_f3lSC7dUFkobb0GuWnMY3FQfmt8-b_s8KbbMQDp8RJJlEHMj0F_qa0cefRp3xBGljxQTTkIVo4CNG9a2EE5Yrzbj7TNHbHMy-jVYGEiVzSYERvTJeztVbZe8mEgxFge6FgiTHXpzcsEOz095mHpjPS0IME85QNMPQvpKUnNKnYg==  
+
+7. **a21.ai**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFOXsWR2jHumJjdkznmlD3vbDuYtvBaOIM9XbbfPl734VsbBs6683hyLO5CZJnRXi2KjVBDC43LxekhlizFtq1DIkhihw_hVxAuZBy_Uo2sDsg=
+
+8. **fluid.ai**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQEltcIwB616fEW_PA0Lt3jOhJ59KFIdbQxootuCjyOetdgup71DlDBfTEOkqryhdK2xN2xiJHEdGVJL7SojEpMMnsXSMzP7sYGU7Sm55tnPzDNaEavUxE4vNwZM6FoWNA_ndQ8OnWxoBJMe8a_TVWYh1o46jlTuN8dD5e3AG9dnJ6ZKLChI4YR0pRhMhIy2  
+
+9. **deeplearning.ai**  
+   https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQE8j6m4IGrPGXWEf5s4dPmDY1qadIJ_4jR15JFTzJOzOCn4Ep3TU1Sof-kAUBOj84AG8OydnI0tPHHsCS3ON2k35_FOhQEjSZy_bKVie_VHXHTLA3Fc-vRZ2M9FqpuQnbK4PzE3bW0vuvDMQMQMphp0OmWz8LVmY_FmqtFZssipS5YF8K_ksPqZP7EGVUO-ZQQeuY6AaFIXCIVaxB1Ud71yOA==  
+
+10. **community.aws**  
+    https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFuWwhDDxNz0XB1t3saEW2s8KMeThz2_DlX_Vjjt0SvlYuKx5rq-KCiUlcFgS2r2uUO2RRcov0rWdH4DiPW-2UYa734xxQfX_427xh3pJhSzYeE0QCk4-YQDKFSt5TcccIyuParO7Rk2RVtp_Rz0NPIgkATA7i-3YycdO8Y8iWI57HYXEfmGrmKLD2xAWT_M1nd445p0UIjs4KCd3YCEpTl-JSo5R7PtgUvUg==  
+
+11. **github.io**  
+    https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFlO8Ci6R3-sjgFc_w1mjEjjE1fKPPORefeS_K9B_gmEOvtKDwX1_w5EfOoXKUUpEeqe42Rk6IHI0dKLCbkVKFx0ZKE-RtknPy1rkg9QLVVnq2Di5ZSP41f4zmEwgaIBnPmvhwf3_bbjbm4VSF-AuoMRZw=
+
+12. **researchgate.net**  
+    https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGyoGHZGvdDHiJ0z_1NdM95RHiYphJxjvZ0aV5T1XOjKtMLAQ617v8lrr02UigpovvV0-FwTJK7hTifVfkTgSg1v_PpK0Wk86eUzr7tY6FN9WSAXMzyjve7gvRKNtqXLa3vrB1usjAf84efwm7l-m7LLJM4o1MjBmojw-sRphbvU-jjTgDeinKu6iUHmUkTLHOhRPc6StLhIXIX7gSqhL4JhCKPQ2s1BmcpWsyaFbYqIhkcbpBhoHiduaMaQdt5M_9vMee5nMcYAg==  
+
+13. **truefoundry.com**  
+    https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGP8Ze07lU-fQT_DulsnB0Nx_1k-C07XRByzqH4pokcRug0t_Er2_90dNZVMvB0jk-SFZatgI0sffMggqvjlwA-Jvvf5qeGhL3zvfQVEVuJil0xuY1ERM5rBeJ40X4DSgBfWWtrut5MdONEb3ncflExevqFJH4tZJwJcT3Y9tKhSJT5cLB4uxR30fQ9PxUHh9O35uQDCiTu7I3rUuwcdcrWCsMC2Pnz3p4X  
+
+14. **nutanix.com**  
+    https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGfiJfyIsRy_ChW0QEEWrG5k9iar-q1kehevhhKqJcG4CbGiJEMOvL2jCuJQh4n79-W-cNKo8FcvEoTru2Q4W8e5dydSSyPkDShGveOgFddSjMmjL_IVbcVOvLR6Mt74qv1plHroRobCAEcQ4KqSKQyhOvkA2tjjgSGmkjAzMxcJQxxZAcj_Vgtgxlk3tFiFcmDO-BrXw8DfQ==  
+
+15. **teneo.ai**  
+    https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQE7HOToF58gvklolE7jE2N4wGJfpoiDe9c2PBFBFSFDL91xh_vxRAmuWXcRsnwqAUm0nEwYf7SYbIveEjqmJt49_KUpbfWKDW2ZP7tGWsrHlM5O-qVmAsXVQegjR3ZiUm4H2ABrCHNWWHpUxQZldKgEo_yyOla7dXME9Bc=
+
+16. **huggingface.co**  
+    https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQF9ZLTgVilgzWDz2NZ4d4F1jts0g8q7rz7KLvYHlxfCjMq8d6KpH-WWNnhAmrMExKxNnFcZqG6Y0tPboFBgC0XFf9T8OIG18bqCSsKUqGO4-Z0Vk3ITjv8dGiq3wz9OVGHFIpx_9Py_ZFgJ5MfnYq5VGBM==
+
+---
+
+*Report generated using multi-modal AI research combining web search and video analysis*  
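To make the report's LLM OS analogy more concrete, the toy sketch below models two of its pieces: a context window acting as bounded working memory (“RAM”) that evicts its oldest entries when full, and a registry of ordinary Software 1.0 functions that can be invoked by name, much like system calls. It is illustrative only: the word-count token proxy, the budget, the tool names, and the shape of the tool-call dict are all assumptions, and no model is involved; each tool call is written by hand where a real LLM would produce it.

```python
from collections import deque
from typing import Callable, Dict


class ContextWindow:
    """Bounded working memory: keeps the most recent entries within a token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.entries: deque = deque()
        self.used = 0

    @staticmethod
    def count_tokens(text: str) -> int:
        # Assumption: whitespace-separated words stand in for real tokenizer tokens.
        return len(text.split())

    def add(self, text: str) -> None:
        self.entries.append(text)
        self.used += self.count_tokens(text)
        # Evict the oldest entries until the window fits the budget again,
        # always keeping the newest entry even if it alone exceeds the budget.
        while self.used > self.max_tokens and len(self.entries) > 1:
            self.used -= self.count_tokens(self.entries.popleft())


# Hypothetical "Software 1.0" tools exposed to the model, keyed by name.
TOOLS: Dict[str, Callable[..., str]] = {
    "calculator": lambda expression: str(sum(int(x) for x in expression.split("+"))),
    "upper": lambda text: text.upper(),
}


def dispatch(call: dict, context: ContextWindow) -> str:
    """Run the tool named in `call` and write its result back into the context window."""
    result = TOOLS[call["name"]](**call["args"])
    context.add(f"{call['name']} -> {result}")
    return result


ctx = ContextWindow(max_tokens=8)
ctx.add("user: add two plus three")
print(dispatch({"name": "calculator", "args": {"expression": "2+3"}}, ctx))  # 5
```

A real LLM OS would let the model choose the tool and its arguments, and would manage memory more carefully than dropping old text (for example by summarizing, or by moving data to external storage and retrieving it later); the fixed budget here only illustrates why the context window behaves like scarce RAM.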
@@ -1,5 +1,5 @@
 [project]
-name = "agent"
+name = "multi-modal-researcher"
 version = "0.0.1"
 description = "Multi-modal researcher with Gemini"
 authors = [
@@ -1,7 +1,6 @@
 from typing_extensions import TypedDict
 from typing import Optional

-
 class ResearchStateInput(TypedDict):
    """State for the research and podcast generation workflow"""
    # Input fields