Data Preparation

Akshay Verma edited this page May 31, 2017 · 2 revisions

Preparation of the Initial Dataset

The data was entered manually by a database team at CBGA and then integrated into the portal with visualizations. The values are calculated from multiple state budget documents. A separate concordance table has also been prepared and integrated into the tool.

Granularity of Data

State budget documents vary so much in structure and content that, for the purposes of inter-state comparison, only aggregate-level indicators could be used.

Scope of Data

As of now, expenditure-side indicators for 26 states (including Delhi) and for 12 sectors are available on the portal.

Future Expansion of the scope of Data

In the next phase, Municipal Corporation data can be brought under the ambit of Story Generator as well. Creating standard templates will enable such comparison across municipal corporations, or at levels further down. In that sense, the work will combine budget knowledge (in creating the templates) with technology (in integrating the standardised database into a visual form).

Data format of the Dataset and the conversion required

The initial dataset is prepared in CSV. While CSV files can be read in JavaScript, the flat structure that results is not suitable for the app, so the data must be munged before it can be ingested for visualization. A Python script converts the dataset to JSON. The JSON data is an array of objects, each representing a sector. Each sector contains indicators, each indicator contains state data, and each state's data contains budget attributes. Each budget attribute holds the fiscal years and their corresponding values.
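
The nesting described above can be illustrated with a minimal sketch. This is not the project's actual script: the CSV column names (sector, indicator, state, budget_attribute, fiscal_year, value), the JSON key names, and the sample figures are all assumptions made up for illustration. It only shows one way flat CSV rows could be grouped into the sector, indicator, state, budget attribute hierarchy.

```python
import csv
import io
import json

# Hypothetical sample rows; column names and figures are invented for illustration.
SAMPLE_CSV = """sector,indicator,state,budget_attribute,fiscal_year,value
Education,Total Expenditure,Delhi,Budget Estimates,2016-17,100.0
Education,Total Expenditure,Delhi,Budget Estimates,2017-18,120.5
Education,Total Expenditure,Kerala,Actuals,2016-17,90.0
Health,Total Expenditure,Delhi,Actuals,2016-17,50.0
"""


def csv_to_nested(csv_text):
    """Group flat CSV rows into sector -> indicator -> state -> budget attribute."""
    sectors = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        indicators = sectors.setdefault(row["sector"], {})
        states = indicators.setdefault(row["indicator"], {})
        attributes = states.setdefault(row["state"], {})
        years = attributes.setdefault(row["budget_attribute"], {})
        years[row["fiscal_year"]] = float(row["value"])

    # Turn the nested dicts into the array-of-objects shape (key names assumed).
    return [
        {
            "sector": sector,
            "indicators": [
                {
                    "indicator": indicator,
                    "states": [
                        {
                            "state": state,
                            "budget_attributes": [
                                {"attribute": attr, "values": values}
                                for attr, values in attributes.items()
                            ],
                        }
                        for state, attributes in states.items()
                    ],
                }
                for indicator, states in indicators.items()
            ],
        }
        for sector, indicators in sectors.items()
    ]


if __name__ == "__main__":
    print(json.dumps(csv_to_nested(SAMPLE_CSV), indent=2))
```

Grouping with nested dictionaries first and converting to arrays at the end keeps the row loop simple; the real script may well use a different layout or key names.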