Skip to content

Data cleaning experiment with OpenAI chat completion API and the Instructor library

Notifications You must be signed in to change notification settings

FlorianLD/ai_data_cleaning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

About

Data cleaning experiment with OpenAI chat completion API and the Instructor library.
The objective was to see how pydantic models can be leveraged to clean data and ensure a structured output consistently.
See notebook here.

The material used is a csv file with unstructured fake customer data. The file consists of a single column where each cell refers to a unique customer.

In each cell, the following information can be found, in a random order:

  • first name
  • last name
  • current age
  • city
  • account type (premium or non-premium)

The data structure is intentionnally diverse for a more challenging test case.

Material

By using models and passing them in the response_model parameter of the function, the following output can be obtained

Final

Model used:

class Premium(Enum):
    """Defines the type of account of the customer, either premium or non-premium"""
    PREMIUM = 1
    NON_PREMIUM = 0

class Customer(BaseModel):
    """Represents a customer, including their first name, last name, age, city, and type of account"""
    first_name: str = Field(..., description="The first name of the customer")
    last_name: str = Field(..., description="The last name of the customer")
    age: int = Field(..., description="The age of the customer")
    city: str = Field(... , description="The city where the customer lives")
    premium: Premium = Field(..., description="Type of account, either premium or non-premium")

class CustomerList(BaseModel):
    """A list of customers"""
    customers: List[Customer] = Field(..., description="A list of customers")

About

Data cleaning experiment with OpenAI chat completion API and the Instructor library

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published