18

For example, say all of my files will be transferred from a Windows machine to a Unix machine like this: C:\test\myFile.txt becomes {somewhere}/test/myFile.txt (the drive letter is irrelevant at this point).

Currently, our in-house utility library provides a method that simply replaces all backslashes with forward slashes:

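public String normalizePath(String path) {
   // replace() works on literals; replaceAll("\\", "/") would throw, since a lone backslash is an invalid regex
   return path.replace('\\', '/');
}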

Slashes are reserved and cannot be part of a file name, so the directory structure should be preserved. However, I'm not sure whether there are other complications between Windows and Unix paths that I need to worry about (e.g. non-ASCII names).

MxLDevs
  • Usually the directory structure doesn't match either. Is that irrelevant too? –  Jun 16 '14 at 16:32
  • @delnan If the difference can be eliminated by adding enough parts to the path so that I can use a common leading prefix, then it would be irrelevant, e.g. `/storage/c/test/myFile` if I wanted to keep the drive letter. – MxLDevs Jun 16 '14 at 16:36
  • Related: http://serverfault.com/questions/242110/which-common-charecters-are-illegal-in-unix-and-windows-filesystems – Doc Brown Jun 16 '14 at 16:41
  • AFAIK, you can safely use / because paths are normalized. – Silviu Burcea Jun 16 '14 at 16:48
  • 6
    Just watch out for spaces -- putting spaces in windows folder names is much more common than in unix directory names. In particular, "\Program Files" gets me all the time. Depending on how you're using the paths, you might have to escape spaces with "\ ". – sea-rob Jun 16 '14 at 16:55
  • @MxyL Obviously you can make up intricate directory structures on any platform you'd like, but whether this makes sense is another thing. For example, a sane mapping of `%APPDATA%\MyApp` might be `~/.MyApp` (or `~/.config/MyApp` depending on who you ask). –  Jun 16 '14 at 16:56
  • 1
    @delnan for simplicity, let's limit the scope of the paths to exclude variable paths. – MxLDevs Jun 16 '14 at 16:59
  • @RobY that's a good one. We've got a number of paths with spaces. – MxLDevs Jun 16 '14 at 16:59
  • 2
    @MxyL The problem doesn't go away when you hard-code the path instead of using an environment variable. If you just want a path that doesn't blow up, you should be fine. If you want a meaningful path, or if you want to interact with other software (or user expectations...) you need per-path judgement calls. –  Jun 16 '14 at 17:09
  • 1
    @delnan I am mainly focused on producing a valid path, but that's a good point. The paths I am converting should be simple enough that they are meaningful by themselves. – MxLDevs Jun 16 '14 at 17:56
  • 3
    Backslashes are allowed in filenames on Linux, so replacing backslashes in a Linux path could add invalid directories. For example, `/foo\\bar` isn't equivalent to `/foo/bar` on Linux. –  Jun 24 '14 at 20:27
  • 1
    How are the filenames stored, if at all? Hard-coded? Properties files? Enumerated in code? –  Aug 22 '14 at 01:46
  • The path encodings used by Windows and Unix differ, but since Java uses UTF-16 internally, the encoding should not be a problem. It is a pain in C and C++, though. – Siyuan Ren Aug 22 '14 at 03:34
  • What about whitespace? – Muhammad Umer May 17 '15 at 19:54
  • Take a look at this http://stackoverflow.com/questions/19999562/bash-script-to-convert-windows-path-to-linux-path and this http://stackoverflow.com/questions/13701218/windows-path-to-posix-path-conversion-in-bash – Tulains Córdova Jan 23 '16 at 00:54

5 Answers

11

Yes, if you only do the replacement on Windows, and turn it off when running on other systems.

Doing the replacement on Unix-like systems is wrong because \ is a valid character in a file or directory name on Unix-like platforms. On these platforms, only NUL and / are forbidden in file and directory names.
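A minimal sketch of that rule, assuming the conversion lives in a small helper: the class name `PlatformAwareNormalizer` and the two-argument overload are hypothetical and exist only so the behaviour can be exercised on any machine. Backslashes are replaced only when the separator in effect is the Windows one; otherwise the path is returned untouched, because a backslash may be part of a real file name there.

```java
import java.io.File;

final class PlatformAwareNormalizer {

    // Replace backslashes only when they are the directory separator in effect.
    static String normalizePath(String path, char separator) {
        if (separator == '\\') {
            return path.replace('\\', '/');
        }
        // On Unix-like systems a backslash is an ordinary file name character.
        return path;
    }

    // Convenience overload that uses the separator of the platform the JVM runs on.
    static String normalizePath(String path) {
        return normalizePath(path, File.separatorChar);
    }
}
```

With the Windows separator, `C:\test\myFile.txt` becomes `C:/test/myFile.txt`; with the Unix separator, a name such as `/foo\bar` is left exactly as it is.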

Also, some Windows API functions (mostly the lower level ones) do not allow the use of forward slashes; backslashes must be used with them.

Edit: It turns out that there are some special file systems on Windows on which / and NUL are valid characters, such as the Registry (available at \\?\GLOBALROOT\Registry from a Windows API perspective, or \Registry from the Native API perspective). In the Named Pipe File System (usually mounted at \??\pipe), all characters (including /, \, and even NUL) are valid. So in general, not only is it not valid to replace / with \ on Windows, it is not valid to assume that every Windows file can be accessed using the Windows API! To reliably access arbitrary files, one must use the Native API (NtCreateFile and friends). NtCreateFile also exposes an equivalent of openat(2), which isn’t exposed via the Windows API.

Demi
5

Yes, but this whole thing is a moot point. Java seamlessly converts forward slashes to backslashes on Windows. You can simply use forward slashes for all paths that are hard-coded or stored in configuration and it will work for both platforms.

Personally, I always use the forward slash even on Windows because it is not the escape character. Whether the raw path is in code or externalized in a properties file, I encode it the same way.

Try it! This will work in Windows. Obviously, change the actual path to something that exists and your user has permission to read.

File f = new File("c:/some/path/file.txt");
if (!f.canRead()) {
  System.out.println("Uh oh, Snowman was wrong!");
}

Bonus: you can even mix slashes in the same path!

File f = new File("c:/some\\path/file.txt");
if (!f.canRead()) {
  System.out.println("Uh oh, Snowman was wrong again!");
}
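If you would rather check the conversion without touching the disk, here is a rough sketch. The class `SeparatorDemo` and its method names are made up for illustration; the only library behaviour relied on is that `java.io.File` stores the path using the platform's own separator.

```java
import java.io.File;

final class SeparatorDemo {

    // Returns the path as java.io.File stores it on the current platform.
    static String platformPath(String forwardSlashPath) {
        return new File(forwardSlashPath).getPath();
    }

    // Builds the same path by joining the parts with the platform separator.
    static String joined(String... parts) {
        return String.join(File.separator, parts);
    }
}
```

`SeparatorDemo.platformPath("some/path/file.txt")` should equal `SeparatorDemo.joined("some", "path", "file.txt")` on both Windows and Unix: on Windows the forward slashes come back as backslashes, on Unix the string is unchanged.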
  • 2
    If you read my entire answer, you would see where I say that _always_ using the Unix file separator will work correctly in both places, no conversion needed. –  Aug 22 '14 at 01:39
  • The question states that the _files_ will be transferred, and leaves open how the file _names_ are stored. I added a comment to the question asking for clarification on that point. Based on the response, I will edit my answer as appropriate. –  Aug 22 '14 at 01:45
  • It's quite unlikely that the program actually contains within it a manually entered list of all the files being transferred. It's vastly more likely that some automated mechanism is being used to enumerate the files. Given the problem's parameters as they are stated in the question, this mechanism delivers traditional Windows-style paths. In its current form, this answer is telling the OP to solve a different problem instead without telling them how or even *that* they should transform theirs into the different problem. – Eliah Kagan Aug 22 '14 at 01:48
  • Please read my previous comment. –  Aug 22 '14 at 01:49
  • I've read the (now edited) version of your previous comment. Consulting the OP as you've now done sounds good. Thanks for your attention to this issue. – Eliah Kagan Aug 22 '14 at 01:50
  • 3
    **Windows** recognizes both forward and backslashes, and has been that way since early MS-DOS. I.e. every Microsoft OS kernel has had forward slash separator support. Early `COMMAND.COM` interpreters had a run-time preference: you could configure which slash the interpreter would use for printing and parsing. – Kaz Jul 08 '16 at 03:10
5

Another complication on Windows is that it supports UNC notation in addition to the traditional drive letters.

A file on a remote file server can be accessed as \\server\sharename\path\filename.
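As a rough illustration of how a converter might deal with this, the sketch below rejects paths that start with two separators (UNC and similar forms) and maps absolute drive-letter paths under a common prefix, in the spirit of the `/storage/c/test/myFile` idea from the question's comments. The class `WindowsPathConverter`, its methods, and the choice to throw on UNC paths are all assumptions made for this example, not an established API.

```java
final class WindowsPathConverter {

    // Paths starting with two separators are UNC-style (\\server\share\...).
    static boolean isUnc(String windowsPath) {
        return windowsPath.startsWith("\\\\") || windowsPath.startsWith("//");
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Maps "C:\test\myFile.txt" to "<prefix>/c/test/myFile.txt".
    // Paths without a drive letter only get their separators replaced.
    static String toUnixPath(String windowsPath, String prefix) {
        if (isUnc(windowsPath)) {
            throw new IllegalArgumentException("UNC path cannot be mapped: " + windowsPath);
        }
        String p = windowsPath.replace('\\', '/');
        boolean hasDrive = p.length() >= 3
                && isAsciiLetter(p.charAt(0))
                && p.charAt(1) == ':'
                && p.charAt(2) == '/';
        if (!hasDrive) {
            return p;
        }
        char drive = Character.toLowerCase(p.charAt(0));
        // Drop the colon and keep everything from the first slash onward.
        return prefix + "/" + drive + p.substring(2);
    }
}
```

For example, `WindowsPathConverter.toUnixPath("C:\\test\\myFile.txt", "/storage")` yields `/storage/c/test/myFile.txt`, while `\\server\sharename\path\filename` is refused rather than silently turned into a meaningless Unix path.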

Simon B
  • 1
    I think this is the only concern quoted so far that is actually a problem for this application. If there are UNC paths involved, they *cannot* be converted usefully into a Unix-style path. – Jules Jul 08 '16 at 07:19
2

No. There are far more things to think about than just the path separator (the "\ vs /" thing). As Rob Y mentions, there is how spaces are handled, and their high frequency in Windows usage. There are different illegal characters in the two environments. There is Unix's willingness to allow almost anything when escaped by a leading "\". There is Windows' use of '"' to deal with embedded spaces. There is Windows' use of UTF-16 and Unix's use of ASCII or UTF-8.

etc., etc., etc.

But, for lots of applications that can put constraints on the pathnames they need to manipulate, you actually can do it just the way you suggest. And it will work in at least a large number of the cases, just not all of them.

Ross Patterson
  • 2
    I don't think these concerns are valid for the question as posed. The space handling is a user interface issue; Unix systems *can* handle spaces in filenames just as well as Windows can. The Windows illegal characters are a superset of the Unix ones. There can't be any backslashes in the Windows filenames (other than the directory separators which will be converted). Using quotes for embedded spaces is a user interface level concern, not a file handling issue. The conversion code is apparently in Java, so should handle UTF-16 to UTF-8 conversion automatically. – Jules Jul 08 '16 at 07:18
0

Every Microsoft operating system, starting with MS-DOS, has understood, at the kernel level, both forward slashes and backslashes.

Therefore, on Windows, you can convert between them freely; both have equal status as reserved separators. In any valid path, you can replace backslashes with slashes and vice versa, without changing its meaning, as far as the kernel is concerned.

In early versions of DOS, Microsoft's command.com interpreter made it a configurable preference which slash was used to display and parse paths. That was eventually removed.

Some user-space programs in Windows such as, oh, the Windows shell (explorer.exe) do not like forward slashes. That's just shoddy programming in those programs.

Kaz
  • 1
    While this is true, I don't believe it's helpful for the OP's question which (AIUI) involved converting existing path names, which would already have included the backslashes in them. It *is* very useful for writing cross-platform code to realise that you can just use forward slashes and have them work in most contexts, but in this case I don't think it helps. – Jules Jul 08 '16 at 07:22
  • 1
    @Jules OP is transferring files from Windows. This answer explains that there are no backslashes to be replaced. They are not in the Windows filesystem itself at all. All the paths are expressible with forward slashes (and Windows even understands it). – Kaz Jul 08 '16 at 14:32