Handling extended characters in Windows commands?

I am debugging a Windows batch command file. It fails when extended characters (above 0x7F) are used in the paths or file names. The problem seems to be related to passing parameters to a command file that is CALLed from another.

For example, this command works as expected:

xcopy "Pezuñero\1 - 001.wav" \temp

This does not:

call another.cmd "Pezuñero" 

Contents of “another.cmd”:

xcopy "%~1\1 - 001.wav" \temp

The %~1 syntax expands the first parameter and removes any surrounding quotes. This is necessary because in the real command file, the paths in either the calling or the called command file may contain spaces.

The result of the second example (copied from the CMD window) is this:

C:\>call another.cmd "Pezu±ero"    

C:\>xcopy "Pezu±ero\1 - 001.wav" \temp
File not found - 1 - 001.wav
0 File(s) copied

Note that the “ñ” (0xF1) character has been changed to a “±” (0xB1).
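A minimal Python sketch of the mismatch, assuming the file was saved by Notepad as ANSI (Windows-1252) and that cmd.exe reads it with OEM code page 850 (the value chcp reports in the answers below). The byte itself never changes: 0xF1 means ñ in the first code page and ± in the second, and 0xB1 is the code of ± in Windows-1252 and Unicode.

```python
# Bytes Notepad writes for the name when saving as ANSI (assumed Windows-1252).
raw = "Pezuñero".encode("cp1252")
print(raw)                    # b'Pezu\xf1ero'

# cmd.exe decodes the same bytes with its OEM code page (assumed 850).
shown = raw.decode("cp850")
print(shown)                  # Pezu±ero

# ± is U+00B1, i.e. 0xB1 once the console text is pasted back into Windows.
print(hex(ord(shown[4])))     # 0xb1
```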

Can anyone explain what is going on, and how to work around this?

Best Answer:

The script must be written in the same encoding cmd.exe uses.

Type chcp at the prompt and see what you get. Then open the file with an editor that supports that encoding. For me, chcp reports code page 850, so I edit my script in jEdit with IBM850 selected as the file encoding. I get the same result editing the file in PSPad with Format set to OEM.

P.S.: I tested your steps on my machine, and the ñ character that I type in notepad.exe (using the default ANSI encoding) is also converted to a ± when read from the command prompt, so it looks like your machine uses the same ANSI and OEM encodings as mine. To be sure, try replacing the ñ with a ¤ (in notepad.exe). That makes the script work correctly for me when run from the command prompt, because the byte value of ¤ in the ANSI code page (0xA4) is the same as that of ñ in the OEM code page.
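The ¤ trick can be checked with a short Python sketch, assuming the ANSI code page is Windows-1252 and the OEM code page is 850 as above:

```python
# Typing "¤" in Notepad (ANSI, assumed Windows-1252) stores the byte 0xA4...
saved = "Pezu¤ero".encode("cp1252")
print(saved)                                # b'Pezu\xa4ero'

# ...which is the byte code page 850 uses for "ñ", so cmd.exe reads ñ.
print(saved.decode("cp850"))                # Pezuñero
print(saved == "Pezuñero".encode("cp850"))  # True
```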


Other Answer 1:

Thanks to McDowell and Romulo for pointing me in the right direction. I realized I needed to change my application (written in Delphi) that generates the batch file so that it uses the proper (OEM) code page, compatible with the Windows command processor. I didn’t find anything to convert strings between code pages, but I did discover the Windows API functions SetFileApisToOEM and SetFileApisToANSI.

I put these at the beginning and end of my program, like this:

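{main procedure; SetFileApisToOEM/SetFileApisToANSI are declared in the Windows unit}
begin
  SetFileApisToOEM;    { file I/O functions now use the OEM code page }
  try
    {all the rest of the program}
  finally
    SetFileApisToANSI; { restore the default ANSI code page, even after an exception }
  end;
end.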

Now the batch files are generated with the OEM code page, and they work properly when run from a CMD prompt.
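The same goal in Python rather than Delphi, as a hedged sketch: instead of switching the file APIs, the generator encodes the batch text explicitly in the OEM code page before writing it. The function name write_batch and the default code page 850 are assumptions for illustration; on another machine, use whatever chcp reports.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_batch(path: Path, lines: list[str], codepage: str = "cp850") -> None:
    """Write lines to a batch file encoded in an OEM code page (850 assumed)."""
    # CRLF line endings are conventional for batch files.
    text = "\r\n".join(lines) + "\r\n"
    # Strict encoding raises UnicodeEncodeError for characters the code page lacks.
    path.write_bytes(text.encode(codepage))


# Demo: generate a caller script in a throwaway directory and inspect its bytes.
with tempfile.TemporaryDirectory() as tmp:
    bat = Path(tmp) / "caller.cmd"
    write_batch(bat, ['call another.cmd "Pezuñero"'])
    data = bat.read_bytes()

print(data)  # b'call another.cmd "Pezu\xa4ero"\r\n'
```

Encoding explicitly keeps the file contents independent of any process-wide setting, and a character with no OEM equivalent fails loudly instead of being silently mangled.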


Other Answer 2:

I’ve been looking at character handling in cmd.exe and I think Romulo has hit the nail on the head. By default, the prompt uses the old DOS (OEM) code pages (probably for compatibility with DOS programs). You are probably saving your file in the default Windows code page (likely Windows-1252), which is different. Use edit.com to edit the batch file.

If I type chcp at the prompt, it reports code page 850.

So, for example, if I use Notepad to type this:

DIR Pezuñero

…this is encoded in Windows-1252 as the bytes:

                        ñ
44 49 52 20 50 65 7A 75 F1 65 72 6F

If I use edit to write the file, it is encoded in code page 850 as the bytes:

                        ñ
44 49 52 20 50 65 7A 75 A4 65 72 6F
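Both dumps can be reproduced with a few lines of Python (3.8 or later, for the separator argument of bytes.hex), using Python's names for the two code pages:

```python
line = "DIR Pezuñero"

# Notepad's ANSI encoding (Windows-1252) vs. edit.com's OEM encoding (code page 850).
for codepage in ("cp1252", "cp850"):
    print(codepage, line.encode(codepage).hex(" ").upper())
# cp1252 44 49 52 20 50 65 7A 75 F1 65 72 6F
# cp850 44 49 52 20 50 65 7A 75 A4 65 72 6F
```

Only the ñ byte differs; everything in the ASCII range is identical in both code pages, which is why plain-ASCII paths never show the problem.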

One thing I haven’t looked at is the cmd /U switch, but I’m fairly sure it only affects built-in shell commands and won’t help you with XCOPY.


Other Answer 3:

Code pages are a problem in batch files because batch files cannot contain Unicode. The easiest way to avoid the issue altogether is probably to use WSH or PowerShell. I haven’t found a workaround for batch files so far, which really bothers me, as I consider myself a Unicode zealot 🙂
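To illustrate the "use a Unicode-aware scripting language" idea, here is a hedged sketch in Python, swapped in for WSH or PowerShell. On Windows, Python 3 passes str paths to the wide-character (Unicode) file APIs, so the folder name never goes through a code page. The helper copy_wav is a hypothetical name; the file name comes from the question.

```python
import shutil
import tempfile
from pathlib import Path


def copy_wav(folder: str, dest: str) -> Path:
    """Copy '1 - 001.wav' from folder into dest (created if missing)."""
    src = Path(folder) / "1 - 001.wav"
    Path(dest).mkdir(parents=True, exist_ok=True)
    # The path stays a Unicode str throughout; no OEM/ANSI conversion is involved.
    return Path(shutil.copy2(src, dest))


# Demo in a throwaway directory, with a stand-in file for the recording.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Pezuñero"
    folder.mkdir()
    (folder / "1 - 001.wav").write_bytes(b"RIFF")  # placeholder content
    copied = copy_wav(str(folder), str(Path(tmp) / "temp"))
    result = (copied.parent.name, copied.name, copied.read_bytes())

print(result)  # ('temp', '1 - 001.wav', b'RIFF')
```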


Other Answer 4:

You may need to set the code page to one that contains ñ (n with a tilde).
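Containing ñ is not enough on its own: the code page cmd.exe uses must also match the encoding the file was saved in, since code pages 850 and 1252 both contain ñ but at different byte values. As a hedged Python sketch (the helper name and the candidate list are illustrative assumptions), this checks which of a few code pages can represent ñ at all:

```python
from __future__ import annotations


def codepages_with(char: str, candidates: list[str]) -> list[str]:
    """Return the candidate code pages that have a byte for char."""
    found = []
    for cp in candidates:
        try:
            char.encode(cp)
        except UnicodeEncodeError:
            continue  # no byte for char in this code page
        found.append(cp)
    return found


print(codepages_with("ñ", ["ascii", "cp437", "cp850", "cp1252"]))
# ['cp437', 'cp850', 'cp1252']
```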