Tavily – Introduction to Agentic search tool

An agentic search tool is like the eyes and ears of the Agentic AI. This helps in obtaining real-time data from the web when the LLMs lack the necessary information. LLMs, in most cases, have a knowledge cut off date and lack access to real time information. Agentic searching tools help solve the gap here. When the AI agent determines that it lacks the necessary information, it can request tools like Tavily to search and extract real-time information.

How is agentic search different from regular search?
The traditional search engine result pages (SERP) returns a long list of titles, images, and URLs which are designed for humans to navigate through. However, the AI agents need a structured set of data (the actual content itself) for the LLMs to process and provide the final input. This is time efficient because the LLMs are not doing the extra work of parsing the content from the SERP. Rather with agentic search tools, the LLMs get the direct content from the search tool that can be easily processed but still holding the source of truth to the content that was shared with the LLMs. This ensures trust in the LLM.

How is Tavily search different from SERP?
The challenges with the traditional SERP are –
1) Paid results – This sometimes adds no context to the relevant search. This is because most search engines are designed for commerce and not for informational relevance.
2) Complex search results that take extra efforts for the LLMs to process and filter the unstructured information, which is inefficient, error-prone, and adds latency to the query.

What is the role, and where does the agentic search fit in the RAG pipeline of an AI agent?
We now know that Agentic search fills in the gap when the LLM determines it doesn’t have relevant information. Below are the high-level steps involved in the entire process.
1. Intent recognition – For example, the user submits the following query – ‘What are the latest developments in AI and what are the top 5 AI companies?’.
2. Tool decision – LLM, in this case, recognizes 2 intents – a) It needs the latest information, and b) It has to search the web.
3. Agentic search – This is where Tavily comes into play. LLM generates the structured query for Tavily.
Ex –

{
"tool_name": "tavily_search",
"arguments": {
"query": "latest developments in AI and top companies",
"max_results": 5,
"search_depth": "advanced"
}
}

4. Final response generation by LLM – Tavily search API returns the relevant and highly precise data snippets for the LLM to generate the final answer for the query.

Why is Tavily easy to use?
1. Simple API –
Traditional search for an AI agent is a multi-stage workflow that looks like the one below –
Traditional Search ⟹ SERP API -> Crawl URL -> Scrape HTML -> Clean Content -> LLM Input
Tavily does all the heavy lifting and offers a straightforward API call like the below –
Agentic Search ⟹ Tavily API Call -> LLM-Ready, Clean Context
2. Out of the box integration with Agent frameworks like Langchain, LlamaIndex. And as per the Tavily documentation, REST APIs and simple Python and JavaScript SDKs also make the usage simple and less time consuming.

What is the advantage of using a agentic search tool like Tavily?
There are 2 major advantages –
1. Maximize the token efficiency and content cleaning – Every LLM has an upper limit on the information it can process at a time (context window). With the SERP, we get lot of unwanted data as content that some times exceeds the context window. Tavily API provides a clean content by already filtering unwanted HTML, CSS, and many other forms of irrelevant data. The clean content is the actual input needed by the LLM.
2. Fine grained search parameters – Provides options to fine tune the search with parameters for time range filtering, including/excluding a domain, including/excluding the images, and so on.

What APIs are available from Tavily?
There are 4 major APIs available from Tavily –
1) Search API – Returns the results in a clean format with a snippet and answer (generated by LLM) for the search query. The API provides more control on the search by setting boundaries on the search query with various request parameters. The search query is any data that we have to look at the internet to get the results for.

Request parameters –

ParameterTypeNotesStatus
Authorizationstring (Header)Bearer token containing the API key.MANDATORY
querystring (Body)The search query text.MANDATORY
auto_parametersbooleanDefault: false. Enables automatic parameter configuration.OPTIONAL
topicenum<string>Default: general. Options: general, news, finance.OPTIONAL
search_depthenum<string>Default: basic. Options: basic, advanced.OPTIONAL
chunks_per_sourceintegerDefault: 3. Max content chunks per source (only with advanced search depth).OPTIONAL
max_resultsintegerDefault: 5. Max number of search results (Range 0-20).OPTIONAL
time_rangeenum<string>Filter results by publish date (e.g., day, week, month, year).OPTIONAL
daysintegerDefault: 7. Number of days back to include (only if topic is news).OPTIONAL
start_datestringYYYY-MM-DD format. Returns results after this date.OPTIONAL
end_datestringYYYY-MM-DD format. Returns results before this date.OPTIONAL
include_answerboolean/enum<string>Default: false. Includes LLM-generated answer. Options: true, basic, advanced.OPTIONAL
include_raw_contentboolean/enum<string>Default: false. Includes cleaned HTML content. Options: true, markdown, text.OPTIONAL
include_imagesbooleanDefault: false. Performs an image search.OPTIONAL
include_image_descriptionsbooleanDefault: false. Adds descriptions to images (requires include_images: true).OPTIONAL
include_faviconbooleanDefault: false. Includes the favicon URL for each result.OPTIONAL
include_domainsstring[]List of domains to specifically include.OPTIONAL
exclude_domainsstring[]List of domains to specifically exclude.OPTIONAL
countryenum<string>Boosts results from a specific country (only if topic is general).OPTIONAL

Response parameters –

ParameterTypeDescriptionStatus
querystringThe search query that was executed.MANDATORY
response_timenumberTime taken to complete the request (in seconds).MANDATORY
resultsobject[]A list of sorted search results.MANDATORY
    titlestringThe title of the search result.MANDATORY
    urlstringThe URL of the search result.MANDATORY
    contentstringA short description (snippet) of the search result.MANDATORY
    scorenumberThe relevance score.MANDATORY
    raw_contentstringIncluded only if include_raw_content was requested.OPTIONAL (Conditional)
    faviconstringIncluded only if include_favicon was requested.OPTIONAL (Conditional)
imagesobject[]List of query-related images (can be an empty array []).MANDATORY
    urlstringThe URL of the image.MANDATORY
    descriptionstringIncluded only if include_image_descriptions was requested.OPTIONAL (Conditional)
answerstringAn LLM-generated answer. Included only if include_answer was requested.MANDATORY (Conditional)
auto_parametersobjectA dictionary of parameters automatically selected by Tavily. Included only if auto\_parameters: true was sent in the request.OPTIONAL (Conditional)
request_idstringA unique identifier for the request.OPTIONAL

2) Extract API – Returns the raw content of the entire web page for the provided URL. Multiple URL’s can be passed in the request, and the API returns the raw content of each web page in a clean format.

Request parameters –

ParameterTypeNotesStatus
Authorizationstring (Header)Bearer token containing the API key.MANDATORY
urlsstring or string[] (Body)The URL(s) to extract content from.MANDATORY
include_imagesbooleanDefault: false. Includes a list of image URLs extracted from the page.OPTIONAL
include_faviconbooleanDefault: false. Includes the favicon URL for each result.OPTIONAL
extract_depthenum<string>Default: basic. Options: basic, advanced.OPTIONAL
formatenum<string>Default: markdown. Format for the extracted content. Options: markdown, text.OPTIONAL
timeoutnumberMax time in seconds to wait for extraction (Range 1.0 to 60.0).OPTIONAL

Response parameters –

ParameterTypeDescriptionStatus
resultsobject[]A list of successfully extracted content from the provided URLs.MANDATORY
    urlstringThe URL the content was extracted from.MANDATORY
    raw_contentstringThe full, cleaned content extracted from the page.MANDATORY
    faviconstringThe favicon URL for the result.OPTIONAL (Conditional)
    imagesstring[]A list of image URLs. Included only if include_images: true was requested.OPTIONAL (Conditional)
failed_resultsobject[]A list of URLs that could not be processed.MANDATORY
    urlstringThe URL that failed.MANDATORY
    errorstringAn error message explaining the failure.MANDATORY
response_timenumberTime taken to complete the request (in seconds).MANDATORY
request_idstringA unique identifier for the request.OPTIONAL

3) Crawl API – Based on the provided instruction, the API crawls through the specified domain and fetches the raw content on the successfully crawled URLs.

Request parameters –

ParameterTypeNotesStatus
Authorizationstring (Header)Bearer token containing the API key.MANDATORY
urlstring (Body)The root URL where the crawl begins.MANDATORY
instructionsstringNatural language instructions for the crawler’s goal (increases cost if specified).OPTIONAL
max_depthintegerDefault: 1. Maximum number of link “hops” from the base URL (Range ≥1).OPTIONAL
max_breadthintegerDefault: 20. Maximum number of links to follow per page (Range ≥1).OPTIONAL
limitintegerDefault: 50. Total maximum number of links the crawler will process (Range ≥1).OPTIONAL
select_pathsstring[]Regex patterns to include only URLs matching specific path patterns.OPTIONAL
select_domainsstring[]Regex patterns to limit crawling to specific domains or subdomains.OPTIONAL
exclude_pathsstring[]Regex patterns to exclude URLs matching specific path patterns.OPTIONAL
exclude_domainsstring[]Regex patterns to exclude specific domains or subdomains.OPTIONAL
allow_externalbooleanDefault: true. Whether to include external domain links in the final results.OPTIONAL
include_imagesbooleanDefault: false. Whether to include images in the crawl results.OPTIONAL
extract_depthenum<string>Default: basic. Extraction depth. Options: basic, advanced.OPTIONAL
formatenum<string>Default: markdown. Format for the extracted content. Options: markdown, text.OPTIONAL
include_faviconbooleanDefault: false. Whether to include the favicon URL for each result.OPTIONAL

Response parameters –

ParameterTypeDescriptionStatus
base_urlstringThe root URL that was used for the crawl.MANDATORY
resultsobject[]A list of extracted content from the successfully crawled URLs.MANDATORY
    urlstringThe URL of the crawled page.MANDATORY
    raw_contentstringThe full, cleaned content extracted from the page.MANDATORY
    faviconstringThe favicon URL for the result.OPTIONAL (Conditional)
    imagesstring[]A list of image URLs. Included only if include_images: true was requested.OPTIONAL (Conditional)
failed_resultsobject[]A list of URLs that could not be processed.MANDATORY
    urlstringThe URL that failed.MANDATORY
    errorstringAn error message explaining the failure.MANDATORY
response_timenumberTime taken to complete the request (in seconds).MANDATORY
request_idstringA unique identifier for the request.OPTIONAL

4) Map API – Constructs the list of all the URLs based on the instructions and other parameters that are set with the request. This helps in building a comprehensive site maps.

Request parameters –

ParameterTypeNotesStatus
Authorizationstring (Header)Bearer token containing the API key.MANDATORY
urlstring (Body)The root URL where the mapping begins.MANDATORY
instructionsstringNatural language instructions for the crawler’s goal (increases cost if specified).OPTIONAL
max_depthintegerDefault: 1. Maximum number of link “hops” from the base URL (Range ≥1).OPTIONAL
max_breadthintegerDefault: 20. Maximum number of links to follow per page (Range ≥1).OPTIONAL
limitintegerDefault: 50. Total maximum number of links the crawler will process before stopping (Range ≥1).OPTIONAL
select_pathsstring[]Regex patterns to include only URLs matching specific path patterns.OPTIONAL
select_domainsstring[]Regex patterns to limit mapping to specific domains or subdomains.OPTIONAL
exclude_pathsstring[]Regex patterns to exclude URLs matching specific path patterns.OPTIONAL
exclude_domainsstring[]Regex patterns to exclude specific domains or subdomains.OPTIONAL
allow_externalbooleanDefault: true. Whether to include external domain links in the final results list.OPTIONAL

Response parameters

ParameterTypeDescriptionStatus
base_urlstringThe base URL that was used for the mapping.MANDATORY
resultsstring[]A list of all URLs discovered during the website mapping.MANDATORY
response_timenumberTime taken to complete the request (in seconds).MANDATORY
request_idstringA unique identifier for the request.OPTIONAL

From the enterprise perspective, today enterprises have plenty of support articles in the form of knowledge base, FAQ, People also ask, etc. All these mostly live on a domain that the enterprises manage and control. These documents can be funneled via Tavily API (Search, Extract, and Crawl) to the agent for the final responses. This provides a quicker AI adoption at an enterprise with lower risk of providing wrong information or hallucination.

Utilizing Tavily for the enterprise AI applications to fetch real time information

Sharing a sample project that was developed using lovable.dev to build a search engine like Google using Tavily search API to demonstrate how handy and quick is to use the Tavily API to get the needed search result in a very clean format – https://github.com/shankar2686/safebrowse.

Share your experience utilizing Search APIs for your AI agents.
Happy learning!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.