Problem to solve

I have a method, parse_doc, that should choose a parsing function dynamically, based on the value of the document_id string:

# list of documents, each of which may need a different custom parser, chosen arbitrarily from its name:
documents = [
    "my_file.py",
    "my_file.sh",
    "another_file.txt",
    "https://some_domain.org/some_page",
]

class Parser(object):
    
    def parse_doc(self, document_id, *args, **kwargs):
        
        # identify the correct parser function
        parser = return_doc_parser(document_id)
        parsed_doc = parser(document_id, *args, **kwargs)
        return parsed_doc

Proposed solution

My thought was to define a type inference function, then dispatch the appropriate function, something like the following:

import requests
import parsers

# document type -> parser function handle map:
PARSER_DISPATCH = {
    "python": parsers.py_parser,
    "text": parsers.text_parser,
    "url": parsers.url_parser,
}

def return_doc_type(document_id):
    """Return custom type of document_id."""
    # Does document_id resolve to a reachable URL?
    try:
        response = requests.get(document_id, timeout=10)
        # Always assign url_exists, including for non-200 responses.
        url_exists = response.status_code == 200
    except requests.RequestException:
        url_exists = False
    # Does document_id name a readable local file?
    try:
        # The with-block closes the file handle again immediately.
        with open(document_id, "r"):
            file_exists = True
    except IOError:
        file_exists = False

    if file_exists and document_id.endswith(".py"):
        doc_type = "python"
    elif file_exists and document_id.endswith(".txt"):
        doc_type = "text"
    elif url_exists:
        doc_type = "url"
    else:
        raise ValueError("no doctype identified.")
    return doc_type

class Parser(object):
    
    def parse_doc(self, document_id, *args, **kwargs):
        
        doc_type = return_doc_type(document_id)
        parser = PARSER_DISPATCH[doc_type]
        parsed_doc = parser(document_id, *args, **kwargs)
        return parsed_doc

A few comments: such an informal type system seems a bit unusual, and the logic for type inference seems like it could quickly grow out of hand.

My question

Is this a reasonable approach? Are there other well-known methods to do value-based dispatch? I've googled high and low and found mostly type-based dispatch (multi-methods, multiple dispatch, method overloading) discussed.

anon01
  • Yes, this kind of logic to figure out which function to call is perfectly normal. There is no silver bullet unless your values have a particular structure, e.g. to enable dispatch tables or binary search of targets. Here we could at most quibble about the details, e.g. whether it makes sense to potentially leak all document_ids to the internet or whether we should blindly open files (security!), or what kind of error message would be most helpful for unknown types. – amon Oct 10 '20 at 21:01
  • But what you could do to make it a bit more modular is separate each check into its own function - e.g., one that only checks if it's a py document, and returns the py_parser or None, another one that checks for a text document, etc. Then stick them (the functions) into a list, and wrap the list into something that takes a doc name, goes through the list invoking the functions, and returns the first non-None value. (You could modify the idea to suit your needs - e.g., you could return a (docname, parserFunc) pair, or whatever.) That would also remove the need for the PARSER_DISPATCH map. – Filip Milovanović Oct 11 '20 at 11:22
  • why the downvoting? An explanation would be helpful – anon01 Oct 11 '20 at 15:35
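
The list-of-checks idea from Filip Milovanović's comment could look roughly like the sketch below. It is only an illustration under a few assumptions: the three parser functions are placeholders standing in for parsers.py_parser, parsers.text_parser and parsers.url_parser (their real behaviour is not shown in the question), the check_* names and the CHECKS list are made up here, and a URL is recognised purely by its scheme, so nothing is sent over the network and no file is opened (which also sidesteps the concerns in amon's comment).

```python
from urllib.parse import urlparse


# Placeholder parsers (hypothetical): they stand in for the functions in the
# question's `parsers` module and just tag the document with its type.
def py_parser(document_id, *args, **kwargs):
    return ("python", document_id)


def text_parser(document_id, *args, **kwargs):
    return ("text", document_id)


def url_parser(document_id, *args, **kwargs):
    return ("url", document_id)


# Each check looks at document_id and returns a parser, or None if it
# does not apply.
def check_url(document_id):
    # Assumption: an http(s) scheme is enough to treat the value as a URL.
    scheme = urlparse(document_id).scheme
    return url_parser if scheme in ("http", "https") else None


def check_python(document_id):
    return py_parser if document_id.endswith(".py") else None


def check_text(document_id):
    return text_parser if document_id.endswith(".txt") else None


# Order matters: the first check that returns a parser wins.
CHECKS = [check_url, check_python, check_text]


def return_doc_parser(document_id):
    """Return the first parser whose check accepts document_id."""
    for check in CHECKS:
        parser = check(document_id)
        if parser is not None:
            return parser
    raise ValueError("no parser identified for {!r}".format(document_id))


class Parser(object):

    def parse_doc(self, document_id, *args, **kwargs):
        parser = return_doc_parser(document_id)
        return parser(document_id, *args, **kwargs)
```

Supporting a new document type then means writing one more check function and adding it to CHECKS; neither parse_doc nor a separate type-to-parser map has to change. With the documents list from the question, "my_file.sh" matches no check and raises ValueError, just as it falls through to the error branch in the original return_doc_type.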

0 Answers