Project Overview
My project lets you scrape Home Depot's website for product information and ask a Slack chatbot questions about those products. The main files are scraping.py and server.py. The requirements.txt file lists the necessary packages, and ProductList.csv contains nearly 700 products that have already been scraped. That list is enough to demonstrate how the chatbot works.
Running the bot
NOTE: Please contact me if you would like to test out the bot. I would be happy to add you to the Slack workspace and give you access to the proper files. You may also peruse my presentation above for more details and an example of it in action.
Files
scraping.py
The Home Depot sitemap is constructed in multiple layers: a parent link covering all products, followed by two levels of subcategory links. In other words, this is how you get to a product page:
PIP XML link (above) -> XML link -> XML link -> Product page
In the file, I iterate through each product page to get the product information, if available. For simplicity, only the most important details are scraped: the product's name, price, basic information, rating, and number of reviews. The information is formatted and saved into ProductList.csv.
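As a rough sketch of the layered traversal described above (not the actual code in scraping.py), the snippet below follows three layers of sitemap XML and returns the product page URLs at the bottom. It assumes the files use the standard sitemaps.org namespace; `extract_locs`, `walk_sitemap`, and the `fetch` callable are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to apply here).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text: str) -> list[str]:
    """Return every <loc> URL in a sitemap or sitemap-index document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def walk_sitemap(url: str, fetch, depth: int) -> list[str]:
    """Follow `depth` layers of sitemap XML and return the URLs below them.

    `fetch` is any callable that maps a URL to its XML text (hypothetical;
    in practice it might wrap an HTTP client).
    """
    if depth == 0:
        # No XML layers left: this URL is treated as a product page.
        return [url]
    pages: list[str] = []
    for child in extract_locs(fetch(url)):
        pages.extend(walk_sitemap(child, fetch, depth - 1))
    return pages
```

With the chain shown above (parent XML, then two more XML layers), the call would use `depth=3`. Passing `fetch` in as a parameter keeps the traversal logic testable without touching the network.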
server.py
This is where the magic happens! Using LangChain, the CSV data is loaded into a chatbot built on OpenAI's gpt-3.5-turbo model. The file also includes the functions needed to integrate with Slack: a local server on port 3000 accepts Slack requests and replies with output generated from the product data and the user's question. The bot remembers previous messages sent and received in the same session.
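To make the data flow concrete, here is a simplified, standard-library-only sketch of the two ideas above: turning CSV rows into text the model can read, and keeping per-session message history. It is not the LangChain code in server.py; `rows_to_documents`, `SessionMemory`, and the column names in the usage note are hypothetical.

```python
import csv
import io
from collections import defaultdict


def rows_to_documents(csv_text: str) -> list[str]:
    """Turn each CSV row into one 'column: value; ...' text snippet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]


class SessionMemory:
    """Keeps (role, text) turns per session, in memory only."""

    def __init__(self) -> None:
        self._turns = defaultdict(list)

    def add(self, session_id: str, role: str, text: str) -> None:
        self._turns[session_id].append((role, text))

    def build_prompt(self, session_id: str, documents: list[str], question: str) -> str:
        """Combine product text, earlier turns, and the new question."""
        history = "\n".join(f"{role}: {text}" for role, text in self._turns[session_id])
        context = "\n".join(documents)
        return (
            f"Products:\n{context}\n\n"
            f"Conversation so far:\n{history}\n\n"
            f"user: {question}"
        )
```

For example, `rows_to_documents("name,price\nHammer,9.97\n")` yields one snippet for the hammer row, and `build_prompt` places it ahead of the stored turns. Because the history lives in a plain dictionary, it disappears when the process exits, which matches the transient behavior described in this README.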
Future Work and Improvements
Web Scraping
- Scrape more comprehensive product data, and speed up the scraping if possible.
- Scan product pages to enumerate all possible XPaths or class names of features like price, reviews, etc. Some products have the details in different locations, and I may have missed some.
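One possible way to handle details that appear in different locations is to keep an ordered list of candidate paths per feature and take the first one that matches. The sketch below uses the standard library's limited XPath support on a well-formed snippet; real product pages may need a more forgiving HTML parser, and the class names here are hypothetical placeholders, not Home Depot's actual markup.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical candidate locations for the price; real paths would have
# to be collected by scanning actual product pages.
PRICE_PATHS = [
    ".//span[@class='price']",
    ".//div[@class='price-alt']",
]


def first_match(page: ET.Element, paths: list[str]) -> Optional[str]:
    """Return the stripped text of the first path that matches, or None."""
    for path in paths:
        node = page.find(path)
        if node is not None and node.text:
            return node.text.strip()
    return None
```

Adding a newly discovered location then only means appending another path to the list.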
Server and Security
- Create a permanent server to run the bot instead of the temporary one used for testing.
- If used externally, put all sensitive data and keys into a .env file.
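A minimal sketch of the .env idea: the keys live in environment variables (which a tool such as python-dotenv's `load_dotenv()` can populate from a .env file), and the code fails loudly if one is missing. The helper name `require_env` and the variable names in the usage note are assumptions, not names taken from server.py.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Usage might look like `slack_token = require_env("SLACK_BOT_TOKEN")` and `openai_key = require_env("OPENAI_API_KEY")`, with the .env file itself kept out of version control.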
Slack Bot
- Optimize and tune the LLM for best results (e.g., adjust temperature), and test with different base models from OpenAI and others.
- Optimize the way the product data is fed into the model by adjusting the chunk size used for text splitting, etc.
- Add additional features for the bot in the workspace, such as enabling it to work in DMs, threads, etc.
- Store product information in a database, and maintain a history of chat messages that persists longer than the temporary server's session. Currently, the data and memory are transient.
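For the persistence item above, one lightweight option would be SQLite from the standard library. The sketch below stores chat messages keyed by session so they survive a restart when a file path is used; the table layout and the function names are hypothetical, and product information could be stored in a similar table.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the message store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, "
        "role TEXT NOT NULL, "
        "text TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def save_message(conn: sqlite3.Connection, session_id: str, role: str, text: str) -> None:
    """Append one chat turn; the `with` block commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO messages (session_id, role, text) VALUES (?, ?, ?)",
            (session_id, role, text),
        )


def load_history(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    """Return the (role, text) turns for a session in insertion order."""
    cursor = conn.execute(
        "SELECT role, text FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return cursor.fetchall()
```

The default `":memory:"` path keeps the example self-contained; a real deployment would pass a file name so the history outlives the server session.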
References
I used the LangChain and Slack documentation for this project, and adapted limited portions of code from Stack Overflow, GitHub, and freeCodeCamp.