Creating a support chat bot for my business

Question

I am trying to create a kind of support bot to answer questions from my clients about specific technical details about WordPress plugins that I sell.

The goal is that the /completions API would be fed a prompt, which could be something general like a CSS styling change, which the davinci engine knows without any specific data about my business, but the customer might ask something specific for which I have about a data set of 3000 questions and answers (prompts/completions? input/output?) on which the bot can feed on exactly like this awesome example here.

I am a web developer, and I don't have experience with AI. I am just scratching the surface, trying to put this bot together with learning concepts like machine learning, training data, validation sets, plotting, and neural networks. So bear with me, because it's a lot to grasp.

So first of all, I did a lot of documenting, and getting an API key from OpenAI is certainly the first step.

Then I told ChatGPT my story and what I tried to achieve. I asked him to write in PHP, preferably, but it always ends up hallucinating, so I could not really use anything he generated without adjusting. And the further I asked him about specifics, the further he hallucinated.

So I read a lot of documentation and extrapolated with what I got from ChatGPT. I think there are 3 ways to achieve this:

A fine-tuned model;
Uploading a training set and a validation set;
Embeddings API ( which the example that I linked uses )

By understanding that most examples are in Python, I started the GPT-3 Fine Tuning: Key Concepts and Use Cases tutorial, then prepared fine-tunes.prepare DATA_UNDER_COMMENT into the json by line to be categorised in prompts and completions.

Then, I used openai api fine_tunes.create -t to create my fine-tune, and now I have my fine-tune created, and I run:

openai.Completion.create(
    model=FINE_TUNED_MODEL,
    prompt=YOUR_PROMPT)

This looked like the way to go, but even if you put a basic question that was actually in the JSONL, it's like the engine forgot to talk and outputs random characters.

So I tried another approach from the cookbook, which seems pretty great following this which seems exactly like what I want to achieve.

The GPT models have picked up a lot of general knowledge in training, but we often need to ingest and use a large library of more specific information.

I tried to use the code there, but with my CSV hosted online, I got a 406 response when trying to load.

Then I stored the CSV locally, and it complained that a column ( tokens ) was not available for converting to an int( 10 ). Then, from what I could understand, I switched from using load_embeddings to compute_doc_embeddings because it says from the documentation that they already have the embedding generated for that CSV

I did that, but now it asks for a JSON instead of a CSV.

Of course, I am able to provide my data in any format, so when I tried to load my data, it said that the token limit of 8000 is exceeded for this request.

I now try to input a small JSON here, under a comment, and try to run a prompt. And, kind of amazing, after hours of work, it seems to work. I provide a question from the data, but under a different structure, and it replies to me correctly, using a different wording than the one from the JSON data.

He could not have known this from his general knowledge.

So this is what I want to achieve, but my data set is much larger.

I need help to understand if my approach is correct. And if Embedding is the way to go, how do I feed data into OpenAI and reference the embeddings set when doing API calls to completions? Ideally, I would have those embeddings stored somehow with the possibility of adding to them. Just like I have fine-tune sets or files under my API account.

kokumajutsu · Answer 1 · 2023-03-08T14:23:45.757

First and foremost, do not use GPT/OpenAI for customer-facing applications. You end up with a mess. GPT is great for creative work, but not for production. GPT is a probabilistic language model, and every answer that he gives is always different. So, you will never know what kind of response he will give. Because of this, your customers might become frustrated. A retrieval-based chatbot is more what you want.

Another thing is that you should know what word embeddings are. To train a model with word embeddings, you have to know how to train the model. Even if you have the embeddings, you don't know if they generalize well to your case. A few samples could work, as you mentioned that they did, but I'm really not sure if passing prompts is the way to go.

It can be much simpler to make your own. Also, I found out that GPT can be detrimental to one's productivity. Sometimes a thing that you could do in no time will take longer if you read the documentation. Also, you have the problem of availability: GPT shuts down for scaling, and then your customers don't have your product.

So a bit of information that you could use without the overkill:

Chatbots (or conversational agents) can be of three types:

Rule-based;
Corpus-based: these can be retrieval-based or generative-based;
Hybrid (some combination of the above)

Chatbots can also be classified into two types: conversational chatbots, such as (roughly) ChatGPT and XiaoIce, and task-based bots, such as Siri and Alexa. Given what you said, you want a task-based chatbot for Q&A. If you don't have a background in AI in general, machine learning in particular, or even worse, deep learning, I would suggest you use RASA Core and take a peak at the dialogue-state architecture in Dan Jurafsky and James H. Martin's [draft] Speech and Language Processing (3rd ed.). Or if you aren't that into this, use AIML or Chatscript.

You can start with a rule-based approach and then go for the retrieval-based approach. Generative-based is still not good for customers because the answers could be nonsensical.

ChatGPT is biased, and that's self-evident given that he suggested GPT, and you always have to take with a grain of salt what he says. To fine-tune a model, you need a corpus of thousands upon thousands (for a pre-trained BERT fine-tuned model, about 1 million words) of Q&A adjacent pairs (a question and an answer), and you have to make sure that they aren't noisy (too many words, typos, bad punctuation, ambiguity, etc.) to train that model. Not only that, but you have to annotate the adjacency pairs (see the SQuAD dataset, for instance). Also, you need a decent (and expensive) GPU. So, for your scenario, it isn't that feasible.

Always use stuff off the shelf. Take a look on GitHub for task-based, rule-based, and/or retrieval-based chatbots.

Also, if you are willing, you can use Dialogflow.

Lastly, the one thing that you really need to know is Natural Language Processing, both Natural Language Understanding (NLU) and Natural Language Generation (NLG): tokenization, lemmatization, Bag-Of-Words, Part-Of-Speech (POS) tagging, N-grams, Named Entity Recognition, etc., at least for NLU.

Creating a support chat bot for my business

1 Answers1