Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code generation via LLM #568

Open
Gauntlet173 opened this issue Jul 18, 2023 · 0 comments
Open

Code generation via LLM #568

Gauntlet173 opened this issue Jul 18, 2023 · 0 comments
Labels
code editor Dealing with the code editor enhancement New feature or request

Comments

@Gauntlet173
Copy link
Contributor

Blawx code is currently represented as XML. That method has been deprecated by Blockly, and there is an issue for upgrading (#137).

Once that has happened, it will be possible to give something like the -0613 models of GPT4 a JSON schema, and a set of examples to work from, and ask it to generate Blockly code directly. There is good reason to believe that it will be able to generate valid code, either with a codebase-specific JSON schema, or with a generic schema and detailed information on the availalble ontology and block types.

The problem is going to be context, because 8k will not contain everything we need it to know, plus multi-shot training examples, and I'm expecting the results will be bad without the examples.

So I'm thinking the path forward is:

  1. Resolve issue Refactor to Blockly's JSON-based serialization #137
  2. Implement a bi-directional compression of Blawx JSON into a smaller representation, which can be expanded back using the block definitions.
  3. Encode the smallified JSON representation as a JSON Schema for use in the OpenAI call.
  4. Create an interface to select a set of examples that fit within the context limit.
  5. Create an endpoint that will make the request and update the code in the relevant section.

The smallest way to test it is to do 2 and 3 first, manually generate JSON representations for some existing encodings, and do some leave-one-out testing to see if what we get back is syntactically correct and better than a blank screen. The larger project might involve breaking it into ontology and rule steps, because of the step-wise nature of the interface (you can't add a category and use it in the same code change).

An example of what the minified JSON might look like is given here for section 4 of the Rock Paper Scissors Act example:

{"blawx": [
	"fact": {
		"statements": [
			"new_attribute": {
				"category_name": "game",
				"attribute_name": "winner",
				"type": "player",
				"order": "ov",
				"prefix": "the winner of",
				"infix": "is",
				"postfix": "",
				"category_options": "sdfsdfsdf"
			},
			"new_relationship": {
				"relationship_name": "throw",
				"prefix1": "",
				"type1": "player",
				"prefix2": "threw",
				"type2": "sign",
				"prefix3": "in",
				"type3": "game",
				"type_options": "asdfasdfasdf"
			}
		]
	},
	"rule": {
		"conditions": [
			"member": {
				"object": {
					"variable": {
						"name": "Game"
					}
				},
				"category": "game"
			},
			"member": {
				"object": {
					"variable": {
						"name": "Player1"
					}
				},
				"category": "game"
			},
			"member": {
				"object": {
					"variable": {
						"name": "Player2"
					}
				},
				"category": "game"
			},
			"disequal": {
				"first": {
					"variable": {
						"name": "Player1"
					}
				},
				"second": {
					"variable": {
						"name": "Player2"
					}
				}
			},
			"attribute_value": {
				"value": {
					"variable": {
						"name": "Player1"
					}
				},
				"object": {
					"variable": {
						"name": "Game"
					}
				},
				"attribute_name": "participant"
			},
			"attribute_value": {
				"value": {
					"variable": {
						"name": "Player2"
					}
				},
				"object": {
					"variable": {
						"name": "Game"
					}
				},
				"attribute_name": "participant"
			},
			"relationship_value": {
				"relationship_name": "throw",
				"param1": {										"variable": {
						"name": "Player2"
					}
				},
				"param2": {
					"variable": {
						"name": "Throw2"
					}
				},
				"param3": {
					"variable": {
						"name": "Game"
					}
				}
			},
			"relationship_value": {
				"relationship_name": "throw",
				"param1": {										"variable": {
						"name": "Player1"
					}
				},
				"param2": {
					"variable": {
						"name": "Throw1"
					}
				},
				"param3": {
					"variable": {
						"name": "Game"
					}
				}
			},
			"attribute_value": {
				"value": {
					"variable": {
						"name": "Throw2"
					}
				},
				"object": {
					"variable": {
						"name": "Throw1"
					}
				},
				"attribute_name": "beats"
			}
		],
		"conclusion": [
			"attribute_value": {
				"value": {
					"variable": {
						"name": "Player1"
					}
				},
				"object": {
					"variable": {
						"name": "Game"
					}
				},
				"attribute_name": "winner"
			}
		]
	}
]}

A list is a list of blocks, which indicates an input. An object is a single block, which indicates a value. Those aren't distinguished in the Blockly representation, either. A value indicates a field. Extra state is not distinguished except that the field names don't appear in the block definition.

The only reason for doing this is that it would be roughly 1/3 the size in tokens of the Blockly JSON representation, which I'm hoping makes it feasible to do multi-shot, and increase the quality of the results while staying inside an 8k token context window.

@Gauntlet173 Gauntlet173 added enhancement New feature or request code editor Dealing with the code editor labels Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code editor Dealing with the code editor enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant