Assume we have a users table containing the columns full_name, last_name, and first_name. In 10% of the cases, the first_name field is NULL, but we observe that the first name is embedded within the full_name field, just without a clear delimiter (e.g., 'DUMBRAVEANURADU'). Or we can make the problem more difficult by considering compound first names, which are also not separated by spaces from the last name (e.g., 'IONESCUANDREIMIHAIL', 'POPESCOANAMARIA', 'VASILESCUIONAELENA', 'TOMAIOANDANIEL', 'MOLDOVANMARIAGABRIELA').
I would like to tell my experience of solving such a problem.
1. Export all rows where first_name is NULL into a data.csv file (PgAdmin > Save results to file).
2. I started Ollama and loaded the LLaMA 3 model locally
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it 57b53e2bc095 ollama pull llama3
3. I Created a Kotlin Project and Asked ChatGPT for Code Fragments
build.gradle.kts
dependencies {
implementation("com.squareup.okhttp3:okhttp:4.12.0")
implementation("com.fasterxml.jackson.module:jackson-module-kotlin:2.17.0")
implementation("com.fasterxml.jackson.core:jackson-databind:2.17.0")
testImplementation(kotlin("test"))
}
4. I Asked ChatGPT How to Construct the Prompt
Main.kt
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import com.fasterxml.jackson.module.kotlin.readValue
import java.io.File
import java.io.PrintWriter
import java.util.concurrent.TimeUnit
fun main() {
val client = OkHttpClient.Builder()
.connectTimeout(10, TimeUnit.SECONDS) // Time to establish connection
.readTimeout(30, TimeUnit.SECONDS) // Time to wait for server response
.writeTimeout(15, TimeUnit.SECONDS) // Time to send request data
.build()
val mapper = jacksonObjectMapper()
val inputStream = object {}.javaClass.getResourceAsStream("/data-1746619055858.csv")
?: error("File not found in resources")
val writer = PrintWriter(File("output.csv"))
writer.println(listOf("id", "full_name", "last_name", "first_name", "sql")
.joinToString(",") { "\"$it\"" })
inputStream.bufferedReader().useLines { lines ->
lines
.drop(1) // Skip the header line
.take(100)
.forEach {
val columns = it.split(",")
.map { it.trim('"') } // Remove surrounding quotes
val id = columns[0]
val fullName = columns[1]
println("ID: $id, Name: $fullName")
val json = mapper.writeValueAsString(
mapOf(
"model" to "llama3",
// "system" to "The following string contains a full name in uppercase without clear delimiters between the last name(s) and the first name(s).\nPlease analyze the name and split it into two parts:\n\nlastName: which can include compound surnames.\n\nfirstName: whichANDREI-MIHAIL may also be a compound name.\nTry to detect where the first name begins even if it is attached directly to the last name (e.g., DUMBRAVEANURADU → DUMBRAVEANU + RADU).\nReturn the result in JSON format like:\n{ \"lastName\": \"...\", \"firstName\": \"...\" }",
"system" to """
The following string contains a full name in uppercase without clear delimiters between the last name(s) and the first name(s).
Your task is to analyze and split it into two parts:
- "lastName: which can include compound surnames"
- "firstName: which may also be a compound name
Try to detect where the first name begins even if it is attached directly to the last name (e.g., DUMBRAVEANURADU → DUMBRAVEANU + RADU).
Important: Return only a JSON object with the following format:
{ "lastName": "...", "firstName": "..." }
Do not explain or add any other text. Only return valid JSON.
Examples:
- "DUMBRAVEANURADU" should be split into "lastName": "DUMBRAVEANU", "firstName": "RADU"
- "IONESCUANDREIMIHAIL" should be split into "lastName": "IONESCU", "firstName": "ANDREI-MIHAIL"
- "POPESCOANAMARIA" should be split into "lastName": "POPESCO", "firstName": "ANA-MARIA"
""".trimIndent(),
// "prompt" to participantName,
"prompt" to "Name: $participantName\nReturn JSON only.",
"stream" to false
)
)
val request = Request.Builder()
.url("http://localhost:11434/api/generate")
.post(json.toRequestBody("application/json".toMediaType()))
.build()
client.newCall(request).execute().use { response ->
if (!response.isSuccessful) {
println("Unexpected response: $response")
} else {
val body = response.body?.string()
val parsed = mapper.readValue<Map<String, Any>>(body!!)
println(parsed["response"])
val response = mapper.readValue<Map<String, Any>>(parsed["response"].toString())
val lastName = response?.get("lastName")
val firstName = response?.get("firstName")
val sql = """
UPDATE users
SET last_name = '$lastName',
first_name = '$firstName'
WHERE id = '$id' AND full_name = '$fullName';
""".trimIndent().replace("\n", " ")
writer.println(listOf(caseNumber, participantName, response?.get("lastName"), response?.get("firstName"), sql)
.joinToString(",") { "\"$it\"" })
}
}
}
}
writer.close()
}
5. Execute and Analyze the Output File