Code generation via LLM #568

Gauntlet173 · 2023-07-18T17:08:27Z

Blawx code is currently represented as XML. That method has been deprecated by Blockly, and there is an issue for upgrading (#137).

Once that has happened, it will be possible to give something like the -0613 models of GPT4 a JSON schema, and a set of examples to work from, and ask it to generate Blockly code directly. There is good reason to believe that it will be able to generate valid code, either with a codebase-specific JSON schema, or with a generic schema and detailed information on the availalble ontology and block types.

The problem is going to be context, because 8k will not contain everything we need it to know, plus multi-shot training examples, and I'm expecting the results will be bad without the examples.

So I'm thinking the path forward is:

Resolve issue Refactor to Blockly's JSON-based serialization #137
Implement a bi-directional compression of Blawx JSON into a smaller representation, which can be expanded back using the block definitions.
Encode the smallified JSON representation as a JSON Schema for use in the OpenAI call.
Create an interface to select a set of examples that fit within the context limit.
Create an endpoint that will make the request and update the code in the relevant section.

The smallest way to test it is to do 2 and 3 first, manually generate JSON representations for some existing encodings, and do some leave-one-out testing to see if what we get back is syntactically correct and better than a blank screen. The larger project might involve breaking it into ontology and rule steps, because of the step-wise nature of the interface (you can't add a category and use it in the same code change).

An example of what the minified JSON might look like is given here for section 4 of the Rock Paper Scissors Act example:

{"blawx": [
	"fact": {
		"statements": [
			"new_attribute": {
				"category_name": "game",
				"attribute_name": "winner",
				"type": "player",
				"order": "ov",
				"prefix": "the winner of",
				"infix": "is",
				"postfix": "",
				"category_options": "sdfsdfsdf"
			},
			"new_relationship": {
				"relationship_name": "throw",
				"prefix1": "",
				"type1": "player",
				"prefix2": "threw",
				"type2": "sign",
				"prefix3": "in",
				"type3": "game",
				"type_options": "asdfasdfasdf"
			}
		]
	},
	"rule": {
		"conditions": [
			"member": {
				"object": {
					"variable": {
						"name": "Game"
					}
				},
				"category": "game"
			},
			"member": {
				"object": {
					"variable": {
						"name": "Player1"
					}
				},
				"category": "game"
			},
			"member": {
				"object": {
					"variable": {
						"name": "Player2"
					}
				},
				"category": "game"
			},
			"disequal": {
				"first": {
					"variable": {
						"name": "Player1"
					}
				},
				"second": {
					"variable": {
						"name": "Player2"
					}
				}
			},
			"attribute_value": {
				"value": {
					"variable": {
						"name": "Player1"
					}
				},
				"object": {
					"variable": {
						"name": "Game"
					}
				},
				"attribute_name": "participant"
			},
			"attribute_value": {
				"value": {
					"variable": {
						"name": "Player2"
					}
				},
				"object": {
					"variable": {
						"name": "Game"
					}
				},
				"attribute_name": "participant"
			},
			"relationship_value": {
				"relationship_name": "throw",
				"param1": {										"variable": {
						"name": "Player2"
					}
				},
				"param2": {
					"variable": {
						"name": "Throw2"
					}
				},
				"param3": {
					"variable": {
						"name": "Game"
					}
				}
			},
			"relationship_value": {
				"relationship_name": "throw",
				"param1": {										"variable": {
						"name": "Player1"
					}
				},
				"param2": {
					"variable": {
						"name": "Throw1"
					}
				},
				"param3": {
					"variable": {
						"name": "Game"
					}
				}
			},
			"attribute_value": {
				"value": {
					"variable": {
						"name": "Throw2"
					}
				},
				"object": {
					"variable": {
						"name": "Throw1"
					}
				},
				"attribute_name": "beats"
			}
		],
		"conclusion": [
			"attribute_value": {
				"value": {
					"variable": {
						"name": "Player1"
					}
				},
				"object": {
					"variable": {
						"name": "Game"
					}
				},
				"attribute_name": "winner"
			}
		]
	}
]}

A list is a list of blocks, which indicates an input. An object is a single block, which indicates a value. Those aren't distinguished in the Blockly representation, either. A value indicates a field. Extra state is not distinguished except that the field names don't appear in the block definition.

The only reason for doing this is that it would be roughly 1/3 the size in tokens of the Blockly JSON representation, which I'm hoping makes it feasible to do multi-shot, and increase the quality of the results while staying inside an 8k token context window.

The text was updated successfully, but these errors were encountered:

Gauntlet173 added enhancement New feature or request code editor Dealing with the code editor labels Jul 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code generation via LLM #568

Code generation via LLM #568

Gauntlet173 commented Jul 18, 2023

Code generation via LLM #568

Code generation via LLM #568

Comments

Gauntlet173 commented Jul 18, 2023