
Backstory

I have an XML-type document (SSML, which is used for Text-To-Speech) that will be used to generate audio files once it is transferred via SSH to a remote server. As such, I will need to include metadata for the ID3 tags typically used in audio files (Genre, Title, Composer, Album, etc.).

My approach thus far has been to invent a new tag:

<metadata value="genre">
Froggy
</metadata>

And then parse it using regular expressions:

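#include <QDebug>
#include <QFile>
#include <QRegularExpression>

/* Grab Metadata */
QTerminalTools tt;    // project-specific helper for coloured console output (not part of Qt)
QFile file(filePath);

if (file.open(QIODevice::ReadOnly | QIODevice::Text)) {
    const QString metadata = QString::fromUtf8(file.readAll());

    // Replace the whole document by the text between <metadata value="genre">
    // and the next </metadata>. The lazy (.*?) stops at the first closing tag;
    // a greedy (.*) would run on to the last </metadata> in the file.
    // If nothing matches, genre is left holding the entire document.
    QString genre(metadata);
    genre.replace(QRegularExpression("(?s)^.*"
                + QRegularExpression::escape("<metadata value=\"genre\">")
                + "\n(.*?)\n"
                + QRegularExpression::escape("</metadata>")
                + ".*$"),
                "\\1");
    qDebug().noquote() << tt.orange("Genre: " + genre);
}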

This really is a very raw approach that I have made up on the fly, so I figure that there are better practices which I am unaware of. As such:

Questions

  • Was XML designed to handle custom metadata?
  • Is there already a standard tag in XML for custom metadata (<metadata value="type">value</metadata>)?
  • Are XML parsers standardized in case I need to build my own?
  • Are there any security issues involved with creating my own tags?

Thanks.

Anon
  • This question is too broad to be reasonably answerable here. Can you narrow it down to something more specific, and ask your question in such a way that it doesn't rely so heavily on words like "should," "acceptable" and "better?" – Robert Harvey Mar 14 '17 at 20:55
  • @RobertHarvey Edited the questions. How is that? – Anon Mar 14 '17 at 21:21
  • @Akiva XML is a context-free language. Can a regular expression parse a context-free language? –  Mar 14 '17 at 22:20

3 Answers


Besides the fact that it is a very bad idea to parse XML with regular expressions, I will try to answer your questions.

Was XML designed to handle custom metadata?

XML is designed to handle all kinds of data or metadata, and what one calls "metadata" from one point of view can be called just "data" from another. The distinction is somewhat arbitrary.

Is there already a standard tag in XML for ...

There are no "standard tags" in XML at all. XML is not a specific data format; it is rather a set of rules for creating data formats. When you want a tag with specific semantics, you always have to define both the tag and its semantics yourself.

Are XML parsers standardized in case I need to build my own?

I am pretty sure you don't need to build your own parser; use an existing one. As a rough distinction, there are DOM-based parsers, which read the complete XML document into an in-memory tree, and SAX parsers, which let you read XML sequentially without holding the whole document in memory. The former are typically easier to use; the latter have a smaller memory footprint.

Among the available parsers, assuming most are good, are there any I should avoid?

Sorry, we do not provide tool recommendations on this site. Better delete that part of your question, otherwise the community may close it as off-topic. However, since you are using Qt, I recommend reading up on the XML parsers included in Qt. Google is your friend: I found this older SO question in seconds.

Are there any security issues involved with creating my own tags?

Short answer: no. How else would you use XML to create data formats, if not by creating tags?

Doc Brown

XML is designed to handle arbitrary data. It doesn't distinguish between data and metadata, but it does have facilities that let you mix data of multiple, independently defined types in a single file. To do this, it uses the notion of namespaces, which can be used to identify multiple schemas. An example might look something like this:

<?xml version="1.0" ?>
<mydoc xmlns="schema URL 1" xmlns:meta="schema 2">
    <meta:description>a demonstration</meta:description>
</mydoc>

This file contains data in two independently defined formats. The "xmlns" attribute provides an identifier (conventionally a URI) for the tags that don't have a namespace prefix, while "xmlns:meta" identifies the format used for tags with a "meta:" prefix.

Most XML parsers let you readily identify the namespace URI of any tag or attribute.

You could therefore use this to keep your metadata tags separate from the tags defined in the format you're using. There are also well-known formats for metadata tags. A search for something like "XML metadata schema" should turn up descriptions of some common ones. Or you could define your own if you prefer: just make up a URI for your namespace, use it consistently, and everything should work.

Jules

You should not use regular expressions or otherwise try to parse the XML tags manually. Instead, use a dedicated XML library for whatever language you are using. You can easily extract the information in the tags with the Document Object Model, which is supported by most libraries: https://en.wikipedia.org/wiki/Document_Object_Model