[VM] bug in mime encoding of cached data
Julian Bradfield
2014-10-27 12:41:57 UTC
I finally tracked down something that has been bugging me for years
now.

When I get mail From: non-ASCII names, whenever the inbox is
auto-saved, I get errors about invalid encodings; and when I save and
re-visit the folder, the non-ASCII has been replaced by ~ .

This is because when the folder is saved, the cached message data is
written out to the X-vm-v5-data header, with non-ASCII data being
mime-encoded. However, vm-mime-encode-words relies on

(defcustom vm-mime-encode-headers-words-regexp
  (let ((8bit-word "\\([^ ,\t\n\r]*[^\x0-\x7f]+[^ ,\t\n\r]*\\)+"))
    (concat "[ ,\t\n\r]\\(" 8bit-word "\\(\\s-+" 8bit-word "\\)*\\)"))
  "*A regexp matching a set of consecutive words which must be encoded."
  :group 'vm-mime
  :type '(regexp))

which, as its name suggests, is written for use in headers, where the
content will always start with white space. By changing this to

(defcustom vm-mime-encode-headers-words-regexp
  (let ((8bit-word "\\([^ ,\t\n\r]*[^\x0-\x7f]+[^ ,\t\n\r]*\\)+"))
    (concat "\\(^\\|[ ,\t\n\r]\\)\\(" 8bit-word "\\(\\s-+" 8bit-word "\\)*\\)"))
  "*A regexp matching a set of consecutive words which must be encoded."
  :group 'vm-mime
  :type '(regexp))

so that start of buffer is a legitimate word start, my problem was
solved.
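
The difference between the two patterns can be checked in isolation. What follows is only a minimal sketch, assuming GNU Emacs with multibyte strings; the function name demo-match-starts and the sample string are made up for illustration and are not part of VM.

```elisp
;; Illustration only: compare the original and the modified regexp on a
;; string whose very first word is non-ASCII.
(defun demo-match-starts (sample)
  "Return a list (OLD NEW) of where each regexp first matches SAMPLE."
  (let* ((8bit-word "\\([^ ,\t\n\r]*[^\x0-\x7f]+[^ ,\t\n\r]*\\)+")
         ;; Original pattern: a delimiter must precede the first word.
         (old-re (concat "[ ,\t\n\r]\\(" 8bit-word
                         "\\(\\s-+" 8bit-word "\\)*\\)"))
         ;; Modified pattern: start of string/line also counts.
         (new-re (concat "\\(^\\|[ ,\t\n\r]\\)\\(" 8bit-word
                         "\\(\\s-+" 8bit-word "\\)*\\)")))
    (list (string-match old-re sample)
          (string-match new-re sample))))

(demo-match-starts "Jürgen Müller")
```

In GNU Emacs this evaluates to (6 0): the original pattern only starts matching at the space before the second word, so the first word is skipped, while the modified pattern matches from position 0.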

But am I really the only person who has this problem?
Uday Reddy
2014-10-27 20:48:29 UTC
Post by Julian Bradfield
When I get mail From: non-ASCII names, whenever the inbox is
auto-saved, I get errors about invalid encodings; and when I save and
re-visit the folder, the non-ASCII has been replaced by ~ .
That is bad. I don't have problems like this on GNU Emacs. So, something
must be breaking for XEmacs.

I do get a somewhat related problem that MIMEd subject fields of summary
lines lose all the white space when saved to disk.
Post by Julian Bradfield
This is because when the folder is saved, the cached message data is
written out to the X-vm-v5-data header, with non-ASCII data being
mime-encoded. However, vm-mime-encode-words relies on
(defcustom vm-mime-encode-headers-words-regexp ...)
That is not how it works actually. See the VM Manual, Internals, Message
Internals (Sec 27.2) under Cached Data. The strings in the cached-data
vector have text properties that say what encoding, if any, should be used
to convert them to MIMEd ASCII. The function

vm-reencode-mime-encoded-words-in-string (vm-mime.el)

does the job of converting. I see that this function has been there since
Rob F inherited VM from Kyle Jones. However, there were some bugs in
applying this correctly to the cached-data vector when I inherited VM. I
fixed them quite some time ago and the fixes should certainly be there in
the trunk version.

If you can send me a message that has the cached-data line corrupted, I will
see what could be going wrong.

Cheers,
Uday
Julian Bradfield
2014-10-27 22:49:01 UTC
Post by Uday Reddy
That is bad. I don't have problems like this on GNU Emacs. So, something
must be breaking for XEmacs.
....
Post by Uday Reddy
Rob F inherited VM from Kyle Jones. However, there were some bugs in
applying this correctly to the cached-data vector when I inherited VM. I
Urgh. You're right.

I was still loading somewhere an ancient attempt to fix up one of the
old bugs.

Forget I spoke!
Julian Bradfield
2014-11-23 20:37:46 UTC
On 2014-10-27, Julian Bradfield <jcb+***@jcbradfield.org> wrote:
[ story of non-ascii data in cached summaries being corrupted ]
Post by Julian Bradfield
I was still loading somewhere an ancient attempt to fix up one of the
old bugs.
Forget I spoke!
I spoke too soon. Even after clearing out my old failed attempts to
fix the problem, I still have a problem.

However, now I know why.

It's a FSFmacs/XEmacs difference.

In XEmacs, #'princ, and hence #'format, do what they say they do, namely
output the printed representation to an output stream. Hence, text
properties on strings are lost.

In FSFmacs, #'princ builds a string representation (complete with
inherited properties), and then prints it to a stream only if the
output stream is an i/o stream; otherwise it inserts the strings into
the output buffer/string.

Thus the summary line, which is ultimately built by calling #'format
on various mime-decoded strings, doesn't contain the text properties
necessary for mime-reencoding to work.
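
A quick way to see which behaviour a given Emacs has is to format a string that carries a property and look for the property in the result. This is only a sketch: demo-charset is a made-up property name and demo-format-keeps-properties-p a made-up function, neither of which is something VM itself uses.

```elisp
;; Illustration only: does `format' keep text properties on a %s argument?
;; `demo-charset' is a hypothetical property name, not one used by VM.
(defun demo-format-keeps-properties-p ()
  "Return non-nil if `format' copies text properties from its argument."
  (let ((name (copy-sequence "Jürgen")))
    (put-text-property 0 (length name) 'demo-charset 'iso-8859-1 name)
    ;; "From: " is 6 characters long, so index 6 is the first
    ;; character of NAME in the formatted result.
    (eq (get-text-property 6 'demo-charset (format "From: %s" name))
        'iso-8859-1)))

(demo-format-keeps-properties-p)
```

GNU Emacs documents that #'format copies text properties from %s arguments, so there the call returns t; going by the description above, XEmacs would be expected to return nil.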

I have applied the following patch to my own copy of vm, which
appears, so far, to deal with the issue.

*** vm-summary.el 2014/11/16 11:47:50 1.4
--- vm-summary.el 2014/11/23 20:16:36 1.5
***************
*** 954,965 ****
(setq token ''mark)
(setq sexp (cons (list 'vm-su-mark
'vm-su-message) sexp)))))
! (cond ((and (not token) vm-display-using-mime)
! ;; strings might have been already mime-decoded,
! ;; but there is no harm in doing it again. USR, 2010-05-13
! (setcar sexp
! (list 'vm-decode-mime-encoded-words-in-string
! (car sexp)))))
(cond ((and (not token) (match-beginning 1) (match-beginning 2))
(setcar sexp
(list
--- 954,967 ----
(setq token ''mark)
(setq sexp (cons (list 'vm-su-mark
'vm-su-message) sexp)))))
! ;; we're going to encode them, so don't decode them.
! ;; JCB 2014-11-23
! ;; (cond ((and (not token) vm-display-using-mime)
! ;; ;; strings might have been already mime-decoded,
! ;; ;; but there is no harm in doing it again. USR, 2010-05-13
! ;; (setcar sexp
! ;; (list 'vm-decode-mime-encoded-words-in-string
! ;; (car sexp)))))
(cond ((and (not token) (match-beginning 1) (match-beginning 2))
(setcar sexp
(list
***************
*** 990,999 ****
(match-beginning 4)
(match-end 4)))))))
;; Why do we reencode decoded strings? USR, 2010-05-12
! ;; (cond ((and (not token) vm-display-using-mime)
! ;; (setcar sexp
! ;; (list 'vm-reencode-mime-encoded-words-in-string
! ;; (car sexp)))))
(setq sexp-fmt
(cons (if token "" "%s")
(cons (substring format
--- 992,1003 ----
(match-beginning 4)
(match-end 4)))))))
;; Why do we reencode decoded strings? USR, 2010-05-12
! ;; Because XEmacs #'format doesn't preserve text properties
! ;; JCB 2014-11-23
! (cond ((and (not token) vm-display-using-mime)
! (setcar sexp
! (list 'vm-reencode-mime-encoded-words-in-string
! (car sexp)))))
(setq sexp-fmt
(cons (if token "" "%s")
(cons (substring format
