Music Business Data Parser

The Music Business Data Parser toolset enables quick and flexible generation of CWR and CRD parsers for several target programming languages, such as JavaScript, PHP, and Java.

The music business has developed various data interchange formats for sharing information between participants. For example, CISAC defines two widely used data standards: the _Common Work Registration_ (CWR) format is used to register musical works with collecting societies, and the _Common Royalty Distribution_ (CRD) format supports reporting of distributed royalties between societies and music publishers. Publishers therefore need to be able to read and write files in CWR and CRD format. Since the standards are updated from time to time, the specifications of the data formats change as well. To allow for quick adaptation to new versions of the standards, we developed a set of flexible, easy-to-adapt parsers.

With the tools presented here, you can validate CISAC-based data files in the programming language of your choice. Currently, parser generation supports the following languages:
  • TypeScript
  • JavaScript
  • PHP
  • Java

Ready-to-use Parsers

CWR Parser

A prototypical implementation of the CWR parser is available online. It takes a CWR file as input and displays the parsed content, as shown in the image below. Each element of the file has a tooltip stating its type.

Javascript Parser Content

CRD Parser

A prototypical implementation of the CRD parser is available online. It takes a CRD file as input and displays the parsed content, as shown in the image below. Each element of the file has a tooltip stating its type.

Javascript Parser Content

Parser Generation

The music business parsers are generated using the Grammar Parser Generator, an extension of the open source parser generator waxeye, which can generate parsers in several programming languages. Each data format is described by a grammar file written in waxeye's BNF-like notation for parsing expression grammars (PEGs). The following snippet shows an example definition of the transmission header of a CWR/CRD file:

TransmissionBegin   <- TransmissionHeader ((SenderType SenderIdShort) | SenderIdLong) SenderName EDIVersionNumber CreationDate CreationTime TransmissionDate ?CharacterSet LINEBREAK
TransmissionHeader  <- 'HDR'
SenderIdShort       <- IPNumberShort
SenderIdLong        <- IPNumberLong
SenderName          <- {AlphaNum{45}}
CreationDate        <- Date
CreationTime        <- Time
TransmissionDate    <- Date
CharacterSet        <- {AlphaNum{15}}

IPNumberShort   <- {Number{9}}
IPNumberLong    <- {Number{11}}

As the example shows, the transmission header starts with the string HDR, followed either by SenderType and SenderIdShort (a nine-digit IPI number) or by SenderIdLong (an eleven-digit IPI number). This reflects the corresponding rules of the CISAC specification, shown in the image below.

CISAC Rules IPI
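To illustrate how the ordered choice in TransmissionBegin works, the following hand-written JavaScript sketch checks a fixed-width HDR record in the same order as the grammar: it first tries SenderType followed by a nine-digit SenderIdShort and, if that fails, backtracks and tries an eleven-digit SenderIdLong. It is not the code generated by waxeye. The widths of SenderName (45), CharacterSet (15) and the IPI numbers (9 and 11) are taken from the grammar; the widths of SenderType (2), EDIVersionNumber (5), Date (8, assumed YYYYMMDD) and Time (6, assumed HHMMSS), as well as all character patterns, are assumptions for illustration.

```javascript
// Hand-written sketch of the TransmissionBegin rule (NOT waxeye output).
// Widths for SenderName (45), CharacterSet (15) and the IPI numbers (9/11)
// come from the grammar; all other widths and patterns are assumptions.
function parseTransmissionHeader(record) {
  const line = record.replace(/\r?\n$/, ""); // LINEBREAK ends the record
  let pos = 0;

  // Consume `width` characters that must match `pattern`, else fail.
  const take = (width, pattern, rule) => {
    const value = line.slice(pos, pos + width);
    if (value.length !== width || !pattern.test(value)) {
      throw new Error(`${rule} expected at position ${pos}`);
    }
    pos += width;
    return value;
  };

  const header = {};
  take(3, /^HDR$/, "TransmissionHeader");

  // Ordered choice: try (SenderType SenderIdShort) first, then SenderIdLong.
  const choiceStart = pos;
  try {
    header.senderType = take(2, /^[A-Z]{2}$/, "SenderType"); // assumed width 2
    header.senderId = take(9, /^\d{9}$/, "SenderIdShort");
  } catch (_err) {
    pos = choiceStart; // backtrack, as a PEG parser would
    delete header.senderType;
    header.senderId = take(11, /^\d{11}$/, "SenderIdLong");
  }

  header.senderName = take(45, /^[^\r\n]{45}$/, "SenderName").trimEnd();
  header.ediVersion = take(5, /^[^\r\n]{5}$/, "EDIVersionNumber"); // assumed width 5
  header.creationDate = take(8, /^\d{8}$/, "CreationDate"); // assumed YYYYMMDD
  header.creationTime = take(6, /^\d{6}$/, "CreationTime"); // assumed HHMMSS
  header.transmissionDate = take(8, /^\d{8}$/, "TransmissionDate");

  // ?CharacterSet: optional trailing field
  if (pos < line.length) {
    header.characterSet = take(15, /^[^\r\n]{15}$/, "CharacterSet").trimEnd();
  }
  if (pos !== line.length) {
    throw new Error(`LINEBREAK expected at position ${pos}`);
  }
  return header;
}
```

With the assumed SenderType width of 2, both alternatives occupy eleven characters, so all following fields start at the same position regardless of which alternative matched.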

Using the grammar-based definitions of CWR and CRD, it is possible to adapt quickly to changes in the specification: the grammar is modified and a new parser is generated from it. We currently support parser generation for TypeScript, JavaScript, PHP, and Java. The complete grammar files for generating the parsers can be found under music business grammars. The generated parsers are also available online.

Prerequisites

To run the Grammar Parser Generator, you first have to compile the waxeye parser generator or download its binary distribution. In addition, we provide an extensible Docker image with a preconfigured waxeye environment.

TypeScript/JavaScript Parser

Usage of the JavaScript Parser (Browser)

The necessary JavaScript files are available for download. The download package contains the waxeye library (in the directory waxeye) and the generated JavaScript parser (in the file js-parser.js). To use the parser, load the parser module and instantiate a new parser object. The following snippet does this inside a Web Worker (where importScripts is available), using Tarp.require (loaded from require.js) to load the module synchronously:

// runs inside a Web Worker; `content` holds the text of the CWR/CRD file
self.importScripts("./require.js");
const parserModule = Tarp.require({main: "./js-parser.js", sync: true});
const parser = new parserModule.Parser();
const result = parser.parse(content);

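The result is either an error object or the abstract syntax tree (AST) containing the parsed content. Since parse() does not indicate which of the two it returned, the result has to be inspected, for example by checking for the type property, which AST nodes carry and the error object lacks: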

if (undefined === result.type) {
    // parse error
} else {
    // parse successful
}
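If the check is needed in several places, it can be wrapped in a small helper that returns a tagged result. This is a sketch: tryParse is a hypothetical name, and it relies on the same assumption as the check above, namely that AST nodes have a type property and error objects do not.

```javascript
// Hypothetical helper: wraps parser.parse() and tags the outcome.
// `parser` is any object with a parse(string) method, such as the
// Parser instance created from js-parser.js.
function tryParse(parser, content) {
  const result = parser.parse(content);
  // Same heuristic as the check above: AST nodes carry a `type`,
  // parse errors do not.
  if (result === null || result === undefined || result.type === undefined) {
    return { ok: false, error: result };
  }
  return { ok: true, ast: result };
}
```

A caller can then branch on `ok` and read either `ast` or `error`, without repeating the heuristic.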

The parse result is a JSON structure containing information about parsed terminal symbols from the grammar together with their position in the file. For example, parsing a CWR file results in the following JSON snippet:

Javascript Parser Result
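To work with the result programmatically, the tree can be traversed recursively. The following sketch collects all nodes produced by a given grammar rule (for example SenderName) and recovers the text they matched. It assumes that AST nodes have the shape { type, children, pos }, with nested nodes or single-character strings as children; please verify this against the output of your parser version. The function names findNodes and nodeText are hypothetical.

```javascript
// Sketch only. Assumed node shape: { type, children, pos }, where children
// contains nested nodes or single-character strings (the matched input).

// Collect all nodes produced by the grammar rule `type`, depth-first.
function findNodes(node, type, found = []) {
  if (node === null || typeof node !== "object") return found; // character leaf
  if (node.type === type) found.push(node);
  for (const child of node.children || []) {
    findNodes(child, type, found);
  }
  return found;
}

// Recover the text matched below a node by concatenating its characters.
// join("") is used because Array.prototype.toString would insert commas.
function nodeText(node) {
  if (typeof node === "string") return node;
  return (node.children || []).map(nodeText).join("");
}
```

For example, `findNodes(result, "SenderName").map(nodeText)` would list the sender names found in a parsed file.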

Usage of the JavaScript Parser (Node.js)

The JavaScript parser can also be run with Node.js, for example with the following script:

// Application.js: parses a CWR/CRD file and writes the AST as JSON
const fs = require("fs");
const path = require("path");
const parser = require("./parser");          // generated parser module
const waxeye = require("./waxeye/waxeye");   // waxeye runtime library

const file = process.argv[2];
if (file === undefined) {
    console.error("Usage: node Application.js cwrFile");
    process.exit(1);
}

if (!fs.existsSync(file)) {
    console.error("File " + file + " was not found!");
    process.exit(1);
}

const p = new parser.Parser();
const fileContent = fs.readFileSync(file, "utf8");
const result = p.parse(fileContent);

if (result instanceof waxeye.ParseError) {
    console.error("error parsing " + file + ": " + JSON.stringify(result));
    process.exit(1);
} else {
    // the output file is written to the current working directory
    fs.writeFileSync(path.basename(file) + ".json", JSON.stringify(result));
}

PHP Parser

The PHP parser package is available for download and has been tested with PHP 7 and later. It provides a command line application that can be run via php CommandLineApplication.php $inputFile. After the file has been parsed, the result is written to the file $inputFile.json.

Java Parser

To run the Java version of the parser (available for download), run java -classpath parser-0.1.jar de.unileipzig.urz.soclear.cwr.CommandLineApplication $inputFile. In its current version, the Java parser does not produce JSON output. Instead, it writes the parse hierarchy into the file $inputFile.result.

Downloads

Grammars

The grammars for generating the CWR and CRD parsers are available for download.

TypeScript Download

The current version is 0.2 from November 20, 2020 and supports the following CWR/CRD versions:

JavaScript Download

The current version is 0.2 from November 20, 2020 and supports the following CWR/CRD versions:

Disclaimer

The tool has been tested thoroughly. However, it is still in an alpha state, so use it at your own risk. We are not responsible for any negative consequences resulting from the use of this program. Please send feedback of any kind to Michael Becker.

Known Issues
  • parsing of invalid CWR files stops at the first error
  • mandatory whitespace is displayed as commas
  • language codes are not assigned a specific type

PHP Download

The current version is 0.2 from November 20, 2020 and supports the following CWR/CRD versions:

Disclaimer

The tool has been tested thoroughly. However, it is still in an alpha state, so use it at your own risk. We are not responsible for any negative consequences resulting from the use of this program. Please send feedback of any kind to Michael Becker.

Known Issues
  • parsing of invalid CWR files stops at the first error
  • language codes are not assigned a specific type

Java Download

The current version is 0.1 from June 1, 2020 and supports CWR 2.1-r8. It is available for download.

Disclaimer

The tool has been tested thoroughly. However, it is still in an alpha state, so use it at your own risk. We are not responsible for any negative consequences resulting from the use of this program. Please send feedback of any kind to Michael Becker.

Known Issues
  • parsing of invalid CWR files stops at the first error
  • mandatory whitespace is displayed as commas
  • currently, the parser does not produce JSON output but only a parse tree
  • the parser uses a recursive implementation, which may run out of memory when parsing large CWR files

Building from source code

The source code of the parser generator is available online. If you encounter any issues, feel free to file a bug report.

To build the waxeye parser generator from source code, you need to have Racket installed. After cloning the repository, run exe.bat from the build directory. This generates the file waxeye.exe and the directory lib, both of which are required to run the parser generator. To generate a parser from a grammar, run waxeye -g $language $dir grammar, where $language is one of java, javascript, or php, and $dir is the output directory.
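Regenerating the parsers for all supported target languages after a grammar change can be scripted. The following Node.js sketch runs the waxeye command described above once per language and reports which runs failed. It assumes that the waxeye executable is on the PATH; the helper names and the output layout (one subdirectory per language) are hypothetical.

```javascript
// Sketch: regenerate the parsers for several target languages by running
// `waxeye -g $language $dir grammar` once per language.
// Assumptions: waxeye is on the PATH; the output layout is hypothetical.
const { spawnSync } = require("child_process");
const fs = require("fs");
const path = require("path");

const LANGUAGES = ["java", "javascript", "php"];

// Arguments for one waxeye invocation, following the documented command.
function waxeyeArgs(language, outDir, grammarFile) {
  return ["-g", language, outDir, grammarFile];
}

// Runs waxeye for every language and returns the languages that failed.
// `run` defaults to child_process.spawnSync and can be replaced for testing.
function generateAll(grammarFile, outRoot, run = spawnSync) {
  const failed = [];
  for (const language of LANGUAGES) {
    const outDir = path.join(outRoot, language);
    fs.mkdirSync(outDir, { recursive: true }); // make sure the target exists
    const res = run("waxeye", waxeyeArgs(language, outDir, grammarFile), { stdio: "inherit" });
    if (res.error || res.status !== 0) failed.push(language);
  }
  return failed;
}
```

For example, generateAll("cwr.waxeye", "generated") would write the parsers to generated/java, generated/javascript and generated/php; the grammar file name is a placeholder.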