Batch CSV Processing
- Jacques Marais
- Charl Viljoen
- Former user (Deleted)
Description
Whereas standard CSV processing in Helium processes an entire blob, representing a CSV file, at once to produce a collection of object instances, the batch CSV processing features takes a multi-step approach:
- First the
Mez:createBatch
 built-in function is used to create a batch from a blob representing a valid CSV file. - The built-in
MezBatch
 andMezBatchItem
 objects are used to store the created batch. - The created batch can then be processed item by item using theÂ
fromCsvLine
 built-in function.
MezBatch and MezBatchItem Built-in Objects
The following two objects are used to store batches that are created by the use of the Mez:createBatch
 built-in function. These objects are built-in objects meaning they are included, by Helium, as part of every Helium app:
@NotTracked persistent object MezBatch { Â // User specified description of this batch string name; Â // String representing the headers for the CSV file associated with this batch string header; }
@NotTracked persistent object MezBatchItem { // Flag indicating whether this batch item has been processed with fromCsvLine bool processed; Â // A string representing an entire line of the CSV file associated with the batch string value; // Relationship to the built-in MezBatch object @ManyToOne MezBatch batch via batchItems; }
These objects can be queried like any other object in Helium using for example the native SQL statements provided by the DSL or using selectors.
VERY IMPORTANT
These objects are persistent.
Creating them will persist to the database.
If you expect/do regular csv uploads on your application please take note that these MezBatchÂ
and MezBatchItemÂ
records will be added to you schema.
It is advisable that you clear these tables often to ensure your schema doesn't grow needlessly large.
Creating a Batch for Processing
To create a batch for processing the Mez:createBatch
 built-in function can be used:
object FileUpload { blob data; }
MezBatch stockUpdateBatch = Mez:createBatch(fileUpload._id, fileUpload.data);
In this case fileUpload
 represents an object instance containing a blob
 attribute named data
.
The first parameter of Mez:createBatch
 represents the name of the batch. This can be any value that can be implicitly converted to a string. Note that there is no validation on uniqueness for this value meaning more than one batch can have the same name. If, however, batch creation and batch processing are done separately it makes sense that the batch should be uniquely identifiable. For this purpose we use the unique id of the fileUpload
 object instance to ensure uniqueness.Â
The second parameter represents the blob value that represents the CSV file to process. blob values can be populated using, amongst other methods, the file upload widget, or the inbound API.
The above command will create entries in the database that looks as follows:
 helium-app-1=# select * from mezbatch; _id_ | _tstamp_ | name | header --------------------------------------+----------------------------+--------------------------------------+----------------------- b6a3f674-a47d-4480-810d-ef9b06026b47 | 2018-12-06 15:03:31.313484 | 4731ab52-7b4d-424e-b9f1-072ac47b3e08 | stockName,level,price
 helium-app-1=# select * from mezbatchitem ; _id_ | _tstamp_ | processed | value | batch_fk --------------------------------------+----------------------------+-----------+-----------------------------------+-------------------------------------- 1e8a4c8b-79cc-490c-a133-122e869722db | 2018-12-06 15:03:31.313484 | f | Groundnut,440,10 | b6a3f674-a47d-4480-810d-ef9b06026b47 2e90b632-1516-4d74-a952-ece3a54afab1 | 2018-12-06 15:03:31.313484 | f | A-grade grain fertilizer,100,100 | b6a3f674-a47d-4480-810d-ef9b06026b47 362bab77-efeb-4594-ad7f-5bc9e87f0da6 | 2018-12-06 15:03:31.313484 | f | Rice,150,50 | b6a3f674-a47d-4480-810d-ef9b06026b47 36ec85de-f7fe-4086-b055-2b9762fbe5e5 | 2018-12-06 15:03:31.313484 | f | Corn,200,100 | b6a3f674-a47d-4480-810d-ef9b06026b47 5bc7aa10-5ca0-4c6e-9cb3-6d7f900e6f36 | 2018-12-06 15:03:31.313484 | f | Wheat,300,50 | b6a3f674-a47d-4480-810d-ef9b06026b47 7eea8146-d83c-405a-bfed-9907ac5a22d8 | 2018-12-06 15:03:31.313484 | f | A-grade legume fertilizer,150,120 | b6a3f674-a47d-4480-810d-ef9b06026b47 8b0a1efb-7758-4406-bfe2-f7260505a9f3 | 2018-12-06 15:03:31.313484 | f | Beans,400,60 | b6a3f674-a47d-4480-810d-ef9b06026b47 971a9efc-f891-485f-9752-a6488512f92d | 2018-12-06 15:03:31.313484 | f | Cowpea,100,50 | b6a3f674-a47d-4480-810d-ef9b06026b47 ca458b84-a63f-42e5-a662-cd3c8e40f5b5 | 2018-12-06 15:03:31.313484 | f | B-grade legume fertilizer,300,50 | b6a3f674-a47d-4480-810d-ef9b06026b47 f138856e-033b-49b1-8f24-194b6ccb5a5b | 2018-12-06 15:03:31.313484 | f | B-grade grain fertilizer,200,40 | b6a3f674-a47d-4480-810d-ef9b06026b47
Processing an Existing Batch
Once a batch has been created, it can be processed line for line using the fromCsvLine
 built-in function:
// Get the batch items related to the batch being processed MezBatchItem[] stockUpdateBatchItems = MezBatchItem:relationshipIn(batch, stockUpdateBatch); Â // Results from CSV processing will be stored in this collection StockUpdate[] stockUpdates; Â // Iterate over the batch items and process one by one using fromCsvLine foreach(MezBatchItem stockUpdateBatchItem: stockUpdateBatchItems) { StockUpdate stockUpdate = StockUpdate:fromCsvLine(stockUpdateBatch.header, stockUpdateBatchItem.value); stockUpdates.append(stockUpdate); Â // Once a batch item has been successfully processed it can be marked as such stockUpdateBatchItem.processed = true; }
To retrieve the batch items associated with the batch being processed, we make use of the relationship between the two objects.
The first parameter of fromCsvLine
 represents a string representing the header of the CSV file being processed. This can be retrieved from the MezBatch
 object instance as shown.
The second parameter of fromCsvLine
 represents a string representing the current line of the CSV file being processed. This can be retrieved from the MezBatchItem
 object instance as shown.
Exception Handling and Debugging
Seeing as each line of a CSV file is processed separately we can make use of the Helium exception handling functionality to continue processing lines of the CSV file even if one or more lines fail. Additional follow-up tasks can then be created to manually process or address lines in the CSV file that could not be parsed.
The above example with exception handling included might look something like this:
// Get the batch items related to the batch being processed MezBatchItem[] stockUpdateBatchItems = MezBatchItem:relationshipIn(batch, stockUpdateBatch); // Results from CSV processing will be stored in this collection StockUpdate[] stockUpdates; // Iterate over the batch items and process one by one using fromCsvLine foreach(MezBatchItem stockUpdateBatchItem: stockUpdateBatchItems) { try{ StockUpdate stockUpdate = StockUpdate:fromCsvLine(stockUpdateBatch.header, stockUpdateBatchItem.value); stockUpdates.append(stockUpdate); // Once a batch item has been successfully processed it can be marked as such stockUpdateBatchItem.processed = true; } catch(exception) { // Handle the exception by logging it and creating a follow up task handleException(stockUpdateBatch, stockUpdateBatchItem, exception.toString); } }
More details on how the use exception handling in the Helium DSL, please see this page.