NOTE: This is my first parser and I pushed it to GitHub just to showcase the project. It is not production-ready in any sense.
This is a small project showcasing how JSON syntax is parsed and how a parser can turn JSON syntax into an object that JavaScript can read.
The parser was built based on this document.
This parser can parse the following types:
string
number
object
array
true (in the parser this is called bool)
false (in the parser this is called bool)
null
The stages will be showcased based on the following JSON syntax:
{
"key": "value"
}
Tokenization is the first process executed here. Tokenization splits the content (a string) into individual characters, which are then labeled with different identifiers. The list of identifiers is:
WHITESPACE
OBJECT_START
OBJECT_END
STRING_START
STRING_CONTENT
STRING_END
COLON
NUMBER
COMMA
DOT
ARRAY_START
ARRAY_END
BOOL
UNKNOWN
NULL
Each character gets labeled with one of these identifiers. In this case the tokenization output would be:
[
{
"start": 0,
"end": 1,
"raw": "{",
"identifier": "OBJECT_START"
},
{
"start": 1,
"end": 2,
"raw": "\r",
"identifier": "WHITESPACE",
"child": true
},
...WHITESPACE
{
"start": 7,
"end": 8,
"raw": "\"",
"identifier": "STRING_START",
"child": true
},
...STRING_CONTENT
{
"start": 10,
"end": 11,
"raw": "y",
"identifier": "STRING_CONTENT",
"child": true
},
{
"start": 11,
"end": 12,
"raw": "\"",
"identifier": "STRING_END",
"child": true
},
{
"start": 12,
"end": 13,
"raw": ":",
"identifier": "COLON",
"child": true
},
{
"start": 13,
"end": 14,
"raw": " ",
"identifier": "WHITESPACE",
"child": true
},
{
"start": 14,
"end": 15,
"raw": "\"",
"identifier": "STRING_START",
"child": true
},
...STRING_CONTENT
{
"start": 19,
"end": 20,
"raw": "e",
"identifier": "STRING_CONTENT",
"child": true
},
{
"start": 20,
"end": 21,
"raw": "\"",
"identifier": "STRING_END",
"child": true
},
{
"start": 21,
"end": 22,
"raw": "\r",
"identifier": "WHITESPACE",
"child": true
},
{
"start": 22,
"end": 23,
"raw": "\n",
"identifier": "WHITESPACE",
"child": true
},
{
"start": 23,
"end": 24,
"raw": "}",
"identifier": "OBJECT_END"
}
]
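Based on the output above, a single token has roughly the following shape. This is a TypeScript sketch inferred from the example output, not necessarily the exact declaration used in the project:

```ts
// Identifier names as listed above.
type Identifier =
  | "WHITESPACE" | "OBJECT_START" | "OBJECT_END"
  | "STRING_START" | "STRING_CONTENT" | "STRING_END"
  | "COLON" | "NUMBER" | "COMMA" | "DOT"
  | "ARRAY_START" | "ARRAY_END" | "BOOL" | "UNKNOWN" | "NULL";

// Shape of a token as it appears in the tokenizer output above.
interface Token {
  start: number;          // index of the character in the input string
  end: number;            // index right after the character
  raw: string;            // the raw character taken from the input
  identifier: Identifier; // what kind of character this is
  child?: boolean;        // appears on tokens nested inside the top-level value in the example
}
```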
Lexing is the second process and it is fed the Token[] type generated by the tokenization process. Lexing returns the Entity[] type, which contains different entities. An entity in this case represents a group of connected tokens that together produce one type. For example, if STRING_START, STRING_CONTENT and STRING_END are registered as tokens, the resulting entity will be of the string type.
Entities can be of type:
string
object
number
array
bool
null
whitespace
colon
comma
unknown
In this case the lexer output would be:
[
{
"type": "object",
"children": [
{
"type": "whitespace",
"value": "\r\n "
},
{
"type": "whitespace",
"value": "\n "
},
{
"type": "whitespace",
"value": " "
},
{
"type": "whitespace",
"value": " "
},
{
"type": "whitespace",
"value": " "
},
{
"type": "whitespace",
"value": " "
},
{
"type": "string",
"value": "key"
},
{
"type": "colon",
"value": ":"
},
{
"type": "whitespace",
"value": " "
},
{
"type": "string",
"value": "value"
},
{
"type": "whitespace",
"value": "\r\n"
},
{
"type": "whitespace",
"value": "\n"
}
]
}
]
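Based on the output above, an entity has roughly the following shape. Again, this is a TypeScript sketch inferred from the example output, not necessarily the project's exact declaration:

```ts
// Entity types as listed above.
type EntityType =
  | "string" | "object" | "number" | "array" | "bool"
  | "null" | "whitespace" | "colon" | "comma" | "unknown";

// Shape of an entity as it appears in the lexer output above.
interface Entity {
  type: EntityType;
  value?: string;      // raw text for leaf entities such as string and whitespace
  children?: Entity[]; // nested entities for parent types such as object and array
}
```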
AST generation is the third process. It is fed the Entity[] type and generates the ASTNode[] type that can finally be recognized by the final process.
An AST node can be of type:
object
string
array
number
bool (true | false)
null
A node can also have different optional data fields, like:
children (only for parent-based types, in this case array and object)
key
value
The type field, on the other hand, is required and always defined.
In this case the AST output would be:
[
{
"type": "object",
"children": [
{
"type": "string",
"value": "value",
"key": "key"
}
]
}
]
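Putting the node types and the optional fields together, an AST node looks roughly like this (a TypeScript sketch inferred from the example output above):

```ts
type ASTNodeType = "object" | "string" | "array" | "number" | "bool" | "null";

// Shape of an AST node as it appears in the AST output above.
interface ASTNode {
  type: ASTNodeType;    // required, always defined
  key?: string;         // property name when the node is an object member
  value?: string;       // raw value for leaf nodes
  children?: ASTNode[]; // only for parent-based types (object and array)
}
```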
The fourth and final process parses the ASTNode[] type and converts the node types to native JavaScript types.
The output is:
{ key: 'value' }
You can now access the key field inside this object.
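To illustrate what this last step does, here is a minimal sketch of how an ASTNode[] (using the ASTNode shape sketched above) could be turned into native JavaScript values. This is only an illustration of the idea, not the project's actual implementation:

```ts
// Recursively convert one AST node into a native JavaScript value.
function nodeToValue(node: ASTNode): unknown {
  switch (node.type) {
    case "object": {
      const obj: Record<string, unknown> = {};
      for (const child of node.children ?? []) {
        if (child.key !== undefined) obj[child.key] = nodeToValue(child);
      }
      return obj;
    }
    case "array":
      return (node.children ?? []).map(nodeToValue);
    case "string":
      return node.value;
    case "number":
      return Number(node.value);
    case "bool":
      return node.value === "true";
    case "null":
    default:
      return null;
  }
}

// For the AST above, nodeToValue(ast[0]) produces { key: 'value' }.
```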
First install all packages using npm i and run npm start.
Parses JSON from a file.
Args:
- path -> string - Defines the path of the JSON file
Parses JSON from a string.
Args:
- content -> string - Defines a string containing JSON syntax
To access the different processes you can import the Tokenizer, Lexer, AST and Parser classes. Each of them has a parse method (except AST, which has construct) that accepts the result of the previous process as its argument.
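A minimal sketch of chaining the four stages by hand follows. The import path and the way the classes are instantiated are assumptions, so check the project source for the exact usage:

```ts
import { Tokenizer, Lexer, AST, Parser } from "./src"; // import path is an assumption

const input = '{ "key": "value" }';

// Whether the classes need `new` (or expose static methods) is an assumption.
const tokens = new Tokenizer().parse(input);  // Token[]
const entities = new Lexer().parse(tokens);   // Entity[]
const nodes = new AST().construct(entities);  // ASTNode[]
const result = new Parser().parse(nodes);

console.log(result); // { key: 'value' }
```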
To visualize every process better, use JSON.stringify(<object>, null, 2).
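For example, to pretty-print the token list produced in the sketch above:

```ts
console.log(JSON.stringify(tokens, null, 2));
```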