6

Say you have an application that saves image (.jpg, .png, etc.) and text (.txt, .xml) files, and the application has all of its file paths hard coded throughout the code, as in the example below.

Assuming the following file path structure exists:

  • jpg - \\MyFileServer\Media\jpg\
  • png - \\MyFileServer\Media\png\
  • txt - \\MyFileServer\Text\txt\
  • xml - \\MyFileServer\Text\xml\

The files in the destination file paths are referenced inside of a table, so in a VBA example the file path would be hard coded as:

Dim myFilePath As String
Dim myFileName As String
myFileName = "puppies.jpg" ' This would be the result of a query in a real scenario
myFilePath = "\\MyFileServer\Media\jpg\" & myFileName

Now say I have to split \\MyFileServer into \\MyMediaFileServer and \\MyTextFileServer.

Ideally, I would have had a central location where I could change a single table value or variable, rather than trudging through every function in the application to update the hard-coded path, only to do it all over again if the situation arises in the future. So my goal is to make sure my puppies image can still be shown with minimal effort.

I am wondering what the industry-standard approach is: one of my two options below, or something else entirely.

Option 1:

Single Self Referential Table:

Defined in Mike's answer to "hierarchical/tree database for directories in filesystem" as a hierarchical structure like the following:

        (ROOT)
      /        \ 
    Dir2        Dir3
    /    \           \
  Dir4   Dir5        Dir6
  /          
Dir7

You simply have a table like the following:

ID  ParentID     FilePath
______________________________________________
1   NULL         \\MyFileServer 
2   1            \Images
3   2            \jpg
4   3            \10 MB Files

OR Option 2: Self Referential Table With Master Roots:

A slight deviation on option 1.

Perhaps a master file path, stored in a separate table, would be clearer:

ID    Name               FilePath
______________________________________________
1     Image Root         \\MyMediaFileServer\Images
2     Text Root          \\MyTextFileServer\Text

Then the structure from option 1 references those roots:

ID  ParentID    RootID   FilePath
______________________________________________
1   NULL        1        \jpg 
2   1           NULL     \10 MB Files

I feel that option 1 is simpler to change overall but will become more confusing as the tree expands, while option 2 makes it easier to change an absolute file path. Is option 1 the best solution for storing file paths, or is there another industry standard I am not aware of?

Elias

2 Answers

11

Both options are over-engineered: involving the database is inappropriate here, and the directory structure is too deep.

Instead, for administrator-friendly software, I see the following patterns emerging:

  1. Paths are configured in config files, not in the database.

    Usually, you want to be able to copy the database dump to another system. Often, the database server is on another system. You simply don't expect the database content to be entangled with file system details, the only exception being strictly relative paths. Finally, having to make configuration changes in a database is a pain.

  2. There's just one data directory to configure, if any.

    This depends on the application, of course, but you almost always want to have everything in a single directory. Then the application, not the config file, takes care of its inner structure. In your case, all I want to configure is:

    \\MyFileServer

    Another common approach is not to configure it at all, but to use a well-defined subdirectory. This is either within the user's HOME directory or within some generic application data directory (conventions differ between operating systems, but that's another topic).

    If you want another place, you simply replace that directory with a symbolic link, such as:

    {...}\Data -> \\MyFileServer

  3. The inner structure of the data directory is managed by source.

    Good applications have internal functions or a property table for all common directories, usually combined in a single class or module.

    The base is something like getRootPath() which reads the config file and returns something like \\MyFileServer.

    Then there are functions like getMediaPath() or getTextPath(), which internally just call getRootPath() (or each other) and append their relative path.

    A common variation on the theme is to let those functions take a filename (or relative file path) as an argument and return the full path to that file. For example, getMediaPath("great.jpg") would return \\MyFileServer\Media\great.jpg.

    Ideally, these functions also create the directory if it doesn't exist yet. Another approach is to create each directory only when the first file is written to it. Either way, the point is not to expect a fully populated directory structure, which is one less thing the admin (or installer) might get wrong.

    If a future version of the application needs more subdirectories, it usually just creates them, without bothering the administrator to create them, forcing them to add those paths to a config file, or causing any other stupid hassle (see 2: There's just one data directory).
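As a rough illustration of points 1 to 3, here is a minimal sketch in Python (chosen only for brevity; the same shape would work in the question's VBA). The `[paths]` section, the `root` key, the `app.ini` file name and the snake_case function names are assumptions made for this example, not something prescribed above.

```python
import configparser
import tempfile
from pathlib import Path


def get_root_path(config_file):
    """Read the single configurable data directory from the config file."""
    # Interpolation is disabled so a literal '%' in a path is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    with open(config_file, encoding="utf-8") as handle:
        parser.read_file(handle)
    # Hypothetical layout: a [paths] section holding one "root" key.
    return Path(parser["paths"]["root"])


def _subdir(config_file, name, filename=None):
    """Return root/name, creating it on demand; append filename if given."""
    directory = get_root_path(config_file) / name
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename if filename else directory


def get_media_path(config_file, filename=None):
    """Path of the media directory, or of one file inside it."""
    return _subdir(config_file, "Media", filename)


def get_text_path(config_file, filename=None):
    """Path of the text directory, or of one file inside it."""
    return _subdir(config_file, "Text", filename)


if __name__ == "__main__":
    # A temporary directory stands in for \\MyFileServer in this demo.
    with tempfile.TemporaryDirectory() as tmp:
        config_file = Path(tmp) / "app.ini"
        data_root = Path(tmp) / "data"
        config_file.write_text(f"[paths]\nroot = {data_root}\n", encoding="utf-8")
        print(get_media_path(config_file, "great.jpg"))
```

With this shape, moving the data to another server means editing one line of the config file; no caller of `get_media_path` or `get_text_path` has to change.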

  4. Directory structures are not deeper than necessary.

    Your directory structure is essentially split by file extension.

    • {YourRoot}\Media\jpg\great.jpg
    • {YourRoot}\Media\png\great.png
    • {YourRoot}\Text\txt\great.txt
    • {YourRoot}\Text\xml\great.xml

    Is that really necessary? The files are already uniquely determined by their extension, so why not skip those intermediate paths?

    • {YourRoot}\great.jpg
    • {YourRoot}\great.png
    • {YourRoot}\great.txt
    • {YourRoot}\great.xml

    Of course, applications have good reasons for structuring the data directory. But that's usually a separation by usage (or purpose), not merely by file extension.

  5. If you must, split uniformly.

    The only reason for splitting the directories further is that they get too large. In that case, you still don't split by file extension, but by something that provides a uniform partition.

    Often, this is time based, e.g. daily or monthly:

    • {YourRoot}\2015-01\great.jpg
    • {YourRoot}\2015-01\stuff.png
    • ...
    • {YourRoot}\2015-02\others.jpg
    • ...

    If you have uniform file names (e.g. hashes), their prefix is often chosen as subdirectory name (and cut from the original file name), such as:

    • {YourRoot}\12\34567.jpg
    • ...
    • {YourRoot}\13\43577.png
    • ...
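The two partitioning schemes in point 5 could be sketched as follows, again in Python. The function names, the use of SHA-256 and the two-character prefix length are assumptions for illustration; the answer only says that a time period or a prefix of a uniform file name is used as the subdirectory.

```python
import hashlib
from datetime import date
from pathlib import PurePath


def monthly_path(root, filename, day):
    """Time-based partition: root/YYYY-MM/filename."""
    return PurePath(root) / day.strftime("%Y-%m") / filename


def hashed_path(root, content, extension, prefix_len=2):
    """Hash-based partition.

    The first prefix_len hex digits of the digest name the subdirectory
    and are cut from the file name, as described above.
    """
    digest = hashlib.sha256(content).hexdigest()
    return PurePath(root) / digest[:prefix_len] / (digest[prefix_len:] + extension)


if __name__ == "__main__":
    print(monthly_path("data", "great.jpg", date(2015, 1, 15)))
    print(hashed_path("data", b"example image bytes", ".jpg"))
```

Both functions only compute paths. Creating the subdirectory on first write would be left to a helper that owns the data directory.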
vog
  • 1
    +1 - I wrote basically the same answer but you've done it better. I can see a benefit for keeping the file category (media/document etc. so they could be housed on different servers for example) but I definitely agree that by filetype is not useful - it needs something in the taxonomy to help avoid collisions even if it's only `yyyy-mm`. – James Snell Feb 04 '15 at 09:23
  • NTFS has had symbolic links for a good few decades too. So no worries there. :) – James Snell Feb 04 '15 at 09:25
  • 1
    @JamesSnell: Yes, that's what I meant by "split files on purpose". Thanks for the hint about time-based directories and NTFS, I edited my answer accordingly. – vog Feb 04 '15 at 09:26
0

Option 1 is definitely overkill.

Mike's answer is awesome, if you want to store data about directories and their structure. But that is not what you need here. You don't really care about folder hierarchy and the relations between them. You just want to know what the paths ARE.

You can get these parameters either from a config/ini file or from a table (as you describe in Option 2). I find that config files are more popular, but there are advantages to both methods. You can find a good discussion of these in Scriptin's answer to "Should I use a config file or database for storing business rules?"
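For comparison, the table-based variant might look like the sketch below, written in Python with an in-memory SQLite database. The `Roots` table name and the `load_roots` helper are hypothetical; the two rows are the ones from Option 2 in the question.

```python
import sqlite3
from pathlib import PureWindowsPath


def load_roots(connection):
    """Return a {name: root path} mapping read from the Roots table."""
    rows = connection.execute("SELECT Name, FilePath FROM Roots")
    return {name: PureWindowsPath(path) for name, path in rows}


# In-memory database standing in for the application's real one.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Roots (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE, FilePath TEXT)"
)
connection.executemany(
    "INSERT INTO Roots (Name, FilePath) VALUES (?, ?)",
    [
        ("Image Root", r"\\MyMediaFileServer\Images"),
        ("Text Root", r"\\MyTextFileServer\Text"),
    ],
)

roots = load_roots(connection)
puppies = roots["Image Root"] / "jpg" / "puppies.jpg"
print(puppies)
```

Either way, the application code asks for a named root and appends a relative path, so splitting the file server means updating two rows (or two config entries) rather than every function.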

Zackie