Developing your Data Package

How to create package properties and resources in your Data Package.

In your new project generated from the template-data-package, the first steps for creating and developing your Data Package are already set up in main.py. For more detailed instructions on using Seedcase Sprout to organise your Data Package, see the guide on Sprout’s website. You can read more about the files and folders created by main.py on the Outputs page of the design documentation.

Creating package properties

  1. Run main.py with the recipe in the justfile to create scripts/package_properties.py, the file that holds the properties of your Data Package:

    just build

    You can also run main.py by clicking the “Run” button in your IDE.

  2. Open scripts/package_properties.py and fill in all required fields. Also fill in any optional fields you find useful. You can always update these later. Make sure to save the file.

  3. In main.py, uncomment the lines referencing the package_properties and package_path variables.

  4. Rerun main.py to create the datapackage.json file for your Data Package.
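
Once datapackage.json exists, a quick check can confirm that the fields you filled in were written to the file. The following is a minimal sketch using only the Python standard library, not Sprout's own validation. The helper name missing_fields, the list of fields checked, and the sample content are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def missing_fields(path: Path, expected: list[str]) -> list[str]:
    """Return the names in `expected` that are absent or empty in the JSON file."""
    properties = json.loads(path.read_text(encoding="utf-8"))
    return [field for field in expected if not properties.get(field)]


# Hypothetical stand-in for the datapackage.json written by main.py.
sample = {"name": "my-data-package", "title": "My Data Package", "description": ""}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "datapackage.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    # The fields to check are an assumption; adjust them to the fields you filled in.
    missing = missing_fields(path, ["name", "title", "description", "version"])

print(missing)
```

In your own project, you would point the helper at the datapackage.json in the root of your Data Package instead of a temporary file.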

Creating a new resource

If you already have the data

While you can create resource properties without data, doing so is much harder. If at all possible, create a resource properties object only when you have data that can pre-fill at least some of the important fields. To use Sprout, the data must already be in a tidy format. Once it is, load the data as a Polars DataFrame into the raw_data variable in main.py.

  1. In main.py, uncomment the lines up to and including the creation of the resource properties.

  2. Fill in the resource_name argument.

  3. Rerun main.py to create the scripts/resource_properties_<name>.py file for the properties of the new resource.

  4. Open scripts/resource_properties_<name>.py and fill in all required fields. Also fill in any optional fields you find useful. You can always update these later. Make sure to save the file.

  5. In scripts/package_properties.py, import your new resource properties by uncommenting the import line and updating it with the name of your resource. Also uncomment the resources field and update the name of the resource properties in the array to match the name of your new resource.

  6. In main.py, import your new resource properties by uncommenting the import line and updating it with the name of your resource.

  7. Uncomment everything else in the main.py file and rename the resource_properties variable to the name of the new resource properties you just imported.

  8. Rerun main.py. This will:

    • Update datapackage.json.
    • Create a resources/ folder containing a folder for your new resource. That folder holds a batch/ folder with the individual data batches you’ve uploaded for this resource and a data.parquet file containing all of the data for the resource.
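
To confirm that the rerun produced the outputs listed above, a small check along these lines may help. It is only a sketch: the helper name check_resource_outputs is hypothetical, it assumes the resource folder is named after the resource, and the temporary layout stands in for a real Data Package.

```python
import tempfile
from pathlib import Path


def check_resource_outputs(package_root: Path, resource_name: str) -> dict[str, bool]:
    """Report which of the outputs described above exist for one resource."""
    # Assumption: the folder for the resource is named after the resource.
    resource_dir = package_root / "resources" / resource_name
    return {
        "datapackage.json": (package_root / "datapackage.json").is_file(),
        "batch/": (resource_dir / "batch").is_dir(),
        "data.parquet": (resource_dir / "data.parquet").is_file(),
    }


# Build a hypothetical package layout in a temporary folder to show the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "datapackage.json").write_text("{}", encoding="utf-8")
    (root / "resources" / "measurements" / "batch").mkdir(parents=True)
    # data.parquet is deliberately left out here, so the check reports it as missing.
    result = check_resource_outputs(root, "measurements")

print(result)
```

In your own project, pass the root folder of your Data Package and the name of your resource. Any entry reported as False points to a step that has not run yet.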