[DVIPDFMx] dvipdfm vs. x vs. xx

Tue Dec 15 21:11:40 KST 2009

On Dec 15, 2009, at 7:49 PM, Heiko Oberdiek wrote:

> On Tue, Dec 15, 2009 at 05:58:50PM +0900, Jin-Hwan Cho wrote:
> 
>> On Dec 15, 2009, at 8:44 AM, Karl Berry wrote:
>> 
>>>   Heiko>
>>>   * UTF16-warning: AFAIK the hyperref code is correct in producing
>>>     correct PDF strings in UTF16. I do not have information about
>>>     xdvipdfmx, whether it uses different specials, expects the data
>>>     in different form, ...
>> 
>> I already read the discussion in the [tex-live] mailing list. Let's recall Volovich's sample:
>> 
>> \documentclass{article}
>> \usepackage[unicode,pdftitle={test}]{hyperref}
>> \pdfpagewidth=300bp
>> \pdfpageheight=300bp
>> \begin{document}
>> %\showthe\pdfpagewidth
>> This is a test.
>> \end{document}
>> 
>> 1. (xelatex) The "unicode" option triggers the "UTF16-warning". The reason is as follows:
>> 
>> 	First, the "hyperref" package encodes the pdftitle "test" in
>> 	UCS-2 because the "unicode" option was specified explicitly.
>> 	xdvipdfmx then tries to encode the already UCS-2 encoded
>> 	pdftitle into UCS-2 again, so the "UTF16-warning" message
>> 	appears when running xelatex.
> 
> Agreed. The question now is how this reencoding can be suppressed.
> The BOM is given explicitly as "\376\377".

Without touching the source code of (x)dvipdfmx, there is no GOOD way to suppress
the reencoding (the function maybe_reencode_utf8() in xdvipdfmx performs it).

One possible way (not GOOD) is:

	(1) Prepare an external CMap file Identity-Byte (attached to this mail).
	(2) Before loading hyperref, add the following lines:
		\usepackage{atbegshi}
		\AtBeginShipoutFirst{\special{pdf:tounicode Identity-Byte}}
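
Applied to Volovich's sample, the workaround could look like the following
sketch. It assumes that the attached Identity-Byte CMap file is installed where
xdvipdfmx can find it; the page-size lines of the sample are omitted for brevity.

```latex
\documentclass{article}
% Workaround sketch: select the Identity-Byte CMap on the first page,
% before hyperref is loaded, so that the already UCS-2 encoded pdftitle
% is not reencoded by xdvipdfmx.
% Assumption: the Identity-Byte CMap file is found by xdvipdfmx.
\usepackage{atbegshi}
\AtBeginShipoutFirst{\special{pdf:tounicode Identity-Byte}}
\usepackage[unicode,pdftitle={test}]{hyperref}
\begin{document}
This is a test.
\end{document}
```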

The future plan for (x)dvipdfmx is to provide a special "pdf: tounicode none" that
disables maybe_reencode_utf8() in xdvipdfmx.

Do you have any better idea?

>> 2. (latex+dvipdfmx) The 2nd line must include the driver option "dvipdfmx", as in
>> 	\usepackage[dvipdfmx,unicode,pdftitle={test}]{hyperref}
>> 
>> 	There is no warning because dvipdfmx never tries to encode the pdftitle into UCS-2
>> 	unless the special "pdf: tounicode [cmap_file]" is given.
>> 
>> Here is the difference between dvipdfmx and xdvipdfmx:
>> 
>> 	xdvipdfmx assumes that the pdftitle is given in UTF-8,
>> 	but dvipdfmx does not.
>> 
>> Here is my answer to Volovich's sample:
>> 
>> 	Never use the "unicode" option with xelatex. Even if the pdftitle
>> 	contains CJK characters (encoded in UTF-8), xdvipdfmx translates it
>> 	into UCS-2 correctly.
> 
> No. This is definitely wrong. Hyperref would use PDFDocEncoding, but
> xdvipdfmx assumes Unicode/UTF-8. However PDFDocEncoding is not a
> subset of Unicode. Some slots are different! Therefore I have disabled
> this way in hyperref. Only `pdfencoding=unicode' (same as `unicode=true')
> and `pdfencoding=auto' are possible.

You are right. PDFDocEncoding is different from Unicode. But I do not understand
the exact meaning of "pdfencoding=auto" under XeTeX. This option does not
do any reencoding under XeTeX, right?

>> For example, the following code will work well with xelatex:
>> 
>> \documentclass{article}
>> \usepackage[pdftitle={??????test}]{hyperref}
>> \begin{document}
>> This is a test.
>> \end{document}
> 
> Now (6.79t) unicode is used and the failed conversion warning
> appears.

I don't understand why the conversion warning appears. I will try again after
upgrading to the new version 6.79t and then continue this discussion.

Best regards, ChoF.