Sitemap URL Extractor
I process XML sitemaps, extracting URLs into tables.
"The Sitemap URL Extractor is a lightweight XML processor that extracts URLs from a given sitemap. It needs no external knowledge sources or tools, which keeps URL extraction simple."
Learn how to use Sitemap URL Extractor effectively! Here are a few example prompts, tips, and the documentation of available commands.
GPT Documentation
Introduction
This user guide gives an overview of how to interact with a custom GPT. The sections below cover different aspects of the GPT, from its capabilities and limitations to getting started with the tool and using its features effectively.
Overview of Features and Commands
The GPT provides a set of features and commands designed to help users navigate and work with sitemaps. The table below lists each available command with a short description.
Command | Description |
---|---|
extract_urls | Extracts URLs from the input sitemap, returning a list of all URLs present. |
extract_urls_with_depth | Extracts URLs from the input sitemap, following nested sitemaps only down to a specified maximum depth so that URLs are not repeated. |
extract_urls_with_date | Extracts URLs from the input sitemap, only returning those dated within a specified time range. |
extract_urls_with_status | Extracts URLs from the input sitemap, only returning those with a specified status (e.g., indexed, updated, submitted). |
search_urls | Searches through the extracted list of URLs, returning all URLs that match a specified search term. |
search_urls_in_domain | Searches through the extracted list of URLs, only returning those that match a specified domain name. |
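The two search commands operate on a list of URLs that has already been extracted. The sketch below shows one plausible way they could behave; the function names mirror the commands, but the matching rules (case-insensitive substring match for search_urls, host equal to or a subdomain of the given domain for search_urls_in_domain) are assumptions, not the GPT's documented behavior. The sample URLs are illustrative.

```python
from urllib.parse import urlparse

# Hypothetical helpers mirroring the search_urls and search_urls_in_domain
# commands; both take a list of URLs that has already been extracted.


def search_urls(urls, term):
    """Return URLs containing `term` (assumed: case-insensitive substring match)."""
    term = term.lower()
    return [u for u in urls if term in u.lower()]


def search_urls_in_domain(urls, domain):
    """Return URLs whose host equals `domain` or is a subdomain of it."""
    domain = domain.lower().lstrip(".")  # accept "edu" or ".edu"
    matches = []
    for u in urls:
        host = (urlparse(u).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            matches.append(u)
    return matches


urls = [
    "https://www.example.com/blog/post-1",
    "https://shop.example.com/cart",
    "https://www.example.edu/about",
]
print(search_urls(urls, "blog"))             # ['https://www.example.com/blog/post-1']
print(search_urls_in_domain(urls, ".edu"))   # ['https://www.example.edu/about']
```

Matching on the parsed hostname rather than the raw string avoids false hits such as a path that happens to contain ".edu".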
Understanding the GPT
The GPT uses a combination of XML and Python libraries to process sitemaps and extract URLs. It can handle a variety of sitemaps, including modified and older versions. It has no access to external knowledge sources, so it relies solely on the information contained in the sitemap itself.
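As a rough illustration of what extract_urls does, the sketch below uses Python's built-in xml.etree.ElementTree to collect every `<loc>` element from a sitemap. It assumes the sitemap follows the sitemaps.org protocol and its namespace; the sample XML is illustrative only, and this is not presented as the GPT's actual implementation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> and <sitemapindex>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml):
    """Return every <loc> value in a sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    # ".//sm:loc" matches <loc> at any depth, so this works for both <urlset>
    # (page URLs) and <sitemapindex> (child sitemap URLs).
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc><lastmod>2024-01-15</lastmod></url>
</urlset>"""

print(extract_urls(sample))
# ['https://www.example.com/', 'https://www.example.com/about']
```

The namespace mapping matters: without it, a plain `findall("loc")` finds nothing, because every element in a standard sitemap lives in the sitemaps.org namespace.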
Getting Started with the GPT
To begin using the GPT, you will first need the required Python libraries, xml.etree.ElementTree and lxml. xml.etree.ElementTree is part of Python's standard library and needs no installation, so only lxml has to be installed. You can do this by running the following command in your terminal or command prompt:
pip install lxml
Once your system is configured with the necessary libraries, you can import the GPT module and use its commands to extract and search through URLs within a sitemap.
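The table does not spell out how extract_urls_with_depth treats depth. One plausible reading, assumed here, is that depth counts levels of nested sitemap index files, with already-seen URLs skipped. The sketch below follows that reading using only xml.etree.ElementTree; the `fetch` parameter is a hypothetical caller-supplied function that returns a sitemap's XML text (for example, an HTTP download), and the in-memory documents are illustrative.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the sitemaps.org namespace.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls_with_depth(start_url, fetch, max_depth=2):
    """Collect page URLs, following nested sitemap indexes up to max_depth levels.

    `fetch` (hypothetical) maps a sitemap URL to its XML text.
    """
    pages = []
    seen_pages = set()
    visited_sitemaps = set()

    def walk(sitemap_url, depth):
        # Stop at the depth limit and never parse the same sitemap twice.
        if depth > max_depth or sitemap_url in visited_sitemaps:
            return
        visited_sitemaps.add(sitemap_url)
        root = ET.fromstring(fetch(sitemap_url))
        locs = [(loc.text or "").strip() for loc in root.iter(NS + "loc")]
        if root.tag == NS + "sitemapindex":
            # Each <loc> in an index points to a child sitemap one level deeper.
            for child in locs:
                if child:
                    walk(child, depth + 1)
        else:
            # A <urlset>: record page URLs, skipping ones already collected.
            for url in locs:
                if url and url not in seen_pages:
                    seen_pages.add(url)
                    pages.append(url)

    walk(start_url, 0)
    return pages


docs = {
    "https://www.example.com/sitemap_index.xml": """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-a.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-b.xml</loc></sitemap>
</sitemapindex>""",
    "https://www.example.com/sitemap-a.xml": """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>""",
    "https://www.example.com/sitemap-b.xml": """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/about</loc></url>
  <url><loc>https://www.example.com/contact</loc></url>
</urlset>""",
}

print(extract_urls_with_depth("https://www.example.com/sitemap_index.xml", docs.__getitem__))
# ['https://www.example.com/', 'https://www.example.com/about', 'https://www.example.com/contact']
```

Passing the loader in as a function keeps the traversal logic separate from network access, so the same code can read sitemaps from disk, from HTTP, or from a test fixture.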
Example Prompts
- extract_urls: Extract all URLs from the current sitemap.
- extract_urls_with_date: Extract all URLs from the current sitemap that were last modified within the last month.
- search_urls_in_domain: Search for any URLs within the current sitemap whose domain ends in .edu.
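The second prompt depends on each entry's optional `<lastmod>` value. The sketch below approximates extract_urls_with_date under several stated assumptions: "the last month" is treated as the 30 days up to a fixed reference date, only the YYYY-MM-DD part of `<lastmod>` is compared (time and time zone are ignored), and entries without `<lastmod>` are skipped. The signature and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls_with_date(sitemap_xml, start, end):
    """Return URLs whose <lastmod> date falls within [start, end], inclusive.

    Assumption: only the date part of <lastmod> is compared; entries
    without <lastmod> are skipped.
    """
    root = ET.fromstring(sitemap_xml)
    matches = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not loc or not lastmod:
            continue
        modified = date.fromisoformat(lastmod[:10])  # "2024-03-10T08:30:00+00:00" -> 2024-03-10
        if start <= modified <= end:
            matches.append(loc)
    return matches


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/new</loc><lastmod>2024-03-10T08:30:00+00:00</lastmod></url>
  <url><loc>https://www.example.com/old</loc><lastmod>2023-11-02</lastmod></url>
  <url><loc>https://www.example.com/undated</loc></url>
</urlset>"""

today = date(2024, 3, 20)  # fixed reference date so the example is reproducible
print(extract_urls_with_date(sample, today - timedelta(days=30), today))
# ['https://www.example.com/new']
```

Comparing dates only sidesteps mixing naive and time-zone-aware datetimes, since `<lastmod>` may be a bare date or a full timestamp; a stricter version would parse the full W3C Datetime value instead.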