
Identify low-quality open-ended responses using the OpenAI API
Source: R/06_02_data_quality_lowqual_gpt.R
lowqual_gpt.Rd
This function flags low-quality open-ended survey responses using the OpenAI API. Responses are evaluated for signs of gibberish, nonsense, random text, irrelevant words, or other indicators of poor data quality.
Value
A data frame with an additional variable named x_lowqual containing the classification results.
Details
Requires an OpenAI API key, which can be generated at https://platform.openai.com/. Set the key in your R session with Sys.setenv(OPENAI_API_KEY="...") before calling the function.
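A minimal sketch of checking that the key is available before any API call is made. The helper name check_openai_key is hypothetical and not part of this package; the "..." placeholder stands for your own key.

```r
# Hypothetical helper (not part of the package): stop early with a clear
# message if the OPENAI_API_KEY environment variable is unset or empty.
check_openai_key <- function() {
  key <- Sys.getenv("OPENAI_API_KEY", unset = "")
  if (!nzchar(key)) {
    stop("OPENAI_API_KEY is not set; use Sys.setenv(OPENAI_API_KEY = \"...\").",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Set the key for the current session only (replace "..." with your own key),
# then confirm it is visible to R:
# Sys.setenv(OPENAI_API_KEY = "...")
# check_openai_key()
```

A key set with Sys.setenv() lasts only for the current session. To avoid hard-coding it in scripts, one option is to store it in a .Renviron file, which R reads at startup.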
The model classifies each response as low-quality (1) or valid (0).
A response is flagged as low-quality if it:
Consists of gibberish or random characters
Is off-topic or meaningless
Contains only emojis or irrelevant text
A response is considered valid if it is interpretable, relevant, and meaningful.
The function appends a new column to the dataset with the results.
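As a rough illustration of working with the appended column, the sketch below builds a toy data frame by hand instead of calling the API. The flag values, the id and x columns, and the assumption that x_lowqual is stored as numeric 0/1 are all made up for the example; the column name x_lowqual is taken literally from the Value section.

```r
# Toy stand-in for the function's output: the flag values below are filled in
# by hand for illustration and are NOT real model classifications.
survey <- data.frame(
  id = 1:4,
  x = c("The checkout process was confusing",
        "asdfghjkl",
        "lol idk whatever",
        "Delivery arrived two days late"),
  x_lowqual = c(0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Count valid (0) versus low-quality (1) responses.
flag_counts <- table(survey$x_lowqual)

# Keep only the responses classified as valid.
valid_responses <- survey[survey$x_lowqual == 0, ]
```

Filtering on the flag rather than deleting rows in place keeps the original data available, which is useful for spot-checking the flagged responses by hand.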