00703: (:markup:) doesn't wrap UTF-8 text correctly
Description: The MarkupMarkup function uses wordwrap() to break long lines, but wordwrap() wraps UTF-8 encoded text incorrectly. :( As a result, the lines come out too short. For example: PmWikiRu.TextFormattingRules
I'm not sure what to do about this one, without somehow redefining or rewriting PHP's wordwrap() function. But I've marked it as a confirmed bug for the time being.
--Pm
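For the record, a small illustration of why the lines come out short: wordwrap() measures width in bytes, and each Cyrillic letter takes two bytes in UTF-8. This is only a sketch; the sample text is made up, and it assumes the mbstring extension is available for mb_strlen().

```php
<?php
# Illustration only: compare byte length and character length.
$word  = "Привет";
$bytes = strlen($word);              # length in bytes
$chars = mb_strlen($word, 'UTF-8');  # length in characters

# "Привет мир" is 10 characters and would fit in a 12-column line,
# but it is 19 bytes, so wordwrap() breaks after every word.
$wrapped = wordwrap("Привет мир Привет мир", 12);
echo "$bytes bytes, $chars characters\n$wrapped\n";
```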
I changed the MarkupMarkup function used by (:markup:) to accept a wrap argument:
function MarkupMarkup($pagename, $text, $opt = '') {
  $MarkupMarkupOpt = array('class' => 'vert');
  $opt = array_merge($MarkupMarkupOpt, ParseArgs($opt));
  $html = MarkupToHTML($pagename, $text, array('escape' => 0));
  $wrap = @$opt['wrap']; if ($wrap == "") $wrap = 40;   # changed here
  if (@$opt['caption'])
    $caption = str_replace("'", '&#039;',
                           "<caption>{$opt['caption']}</caption>");
  $class = preg_replace('/[^-\\s\\w]+/', ' ', @$opt['class']);
  if (strpos($class, 'horiz') !== false)
    { $sep = ''; $pretext = wordwrap($text, $wrap); }   # changed here
  else
    { $sep = '</tr><tr>'; $pretext = wordwrap($text, $wrap); }   # changed here
  return Keep(@"<table class='markup $class' align='center'>$caption
    <tr><td class='markup1' valign='top'><pre>$pretext</pre></td>$sep<td
    class='markup2' valign='top'>$html</td></tr></table>");
}
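With this change the width would be set from the page, e.g. (:markup wrap=60:). Since that value comes straight from page text, it may be worth checking it before handing it to wordwrap(). A minimal sketch, assuming a helper named markup_wrap_width() (hypothetical, not part of PmWiki) and the same default of 40 as above:

```php
<?php
# Hypothetical helper, not part of PmWiki: turn the wrap= argument into a
# positive integer, falling back to a default when it is missing or invalid.
function markup_wrap_width($opt, $default = 40) {
  $wrap = isset($opt['wrap']) ? intval($opt['wrap']) : 0;
  return ($wrap > 0) ? $wrap : $default;
}
```

Inside MarkupMarkup() this could replace the "$wrap = ..." line with $wrap = markup_wrap_width($opt);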
Some other stuff I tried out, with no success:
In CSS:
pre, code {
  white-space: pre-wrap;
  white-space: -moz-pre-wrap;
  white-space: -pre-wrap;
  white-space: -o-pre-wrap;
  word-wrap: break-word;
}
In PHP:
function utf8_wordwrap($str, $width = 75, $break = "\n", $cut = false) {
  return utf8_encode(wordwrap(utf8_decode($str), $width, $break, $cut));
}
function utf8_strlen($str) {
  return mb_strlen($str);
}
CarlosAB December 10, 2008, at 11:24 AM
The function utf8_decode($str) will only work as expected for characters available in the Latin-1 encoding, not for other alphabets such as Cyrillic, Greek or Chinese. From version 2.2.6 it will be possible to define a custom $MarkupWordwrapFunction that can break the lines as you like. I am closing this entry for the moment. --Petko September 02, 2009, at 10:06 AM
$MarkupWordwrapFunction = 'my_wordwrap';
function my_wordwrap($text, $length) {
  # multiply the requested length by 1.8
  # (may be suitable for Cyrillic & Greek)
  $length = round($length*1.8);
  return wordwrap($text, $length);
}
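As an alternative to scaling the length, the custom function could measure lines in characters instead of bytes. The following is only a sketch, not what PmWiki ships: my_mb_wordwrap is a hypothetical name, it assumes the mbstring extension, and it assumes the hook is called with ($text, $length) as in the example above. Like wordwrap() without the cut option, it leaves over-long words intact.

```php
<?php
# Sketch: wrap on spaces, measuring line length in characters, not bytes.
# Assumes mbstring; my_mb_wordwrap is a hypothetical name.
function my_mb_wordwrap($text, $length) {
  $out = array();
  foreach (explode("\n", $text) as $line) {   # keep existing line breaks
    $words = explode(' ', $line);
    $current = array_shift($words);           # first word (may be empty)
    foreach ($words as $word) {
      if (mb_strlen("$current $word", 'UTF-8') <= $length) {
        $current .= " $word";                 # still fits on this line
      } else {
        $out[] = $current;                    # start a new line
        $current = $word;
      }
    }
    $out[] = $current;
  }
  return implode("\n", $out);
}
$MarkupWordwrapFunction = 'my_mb_wordwrap';
```

This counts code points, so it may still be off for wide (e.g. Chinese) glyphs or combining characters.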
A couple of links, for the record:
- http://php.net/wordwrap has user-contributed snippets dealing with UTF-8
- another snippet
- there is a PCRE /u modifier for UTF-8
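To illustrate the last point: with the /u modifier the pattern and subject are treated as UTF-8, so "." matches a whole character rather than a single byte. A rough sketch of how that could be used to count or split characters without mbstring (the function names u_strlen and u_chars are made up for this example):

```php
<?php
# Sketch: UTF-8 aware helpers built on the PCRE /u modifier.
# u_strlen and u_chars are hypothetical names.
function u_strlen($str) {
  # With /u, "." matches one whole code point; /s lets it match newlines too.
  return preg_match_all('/./su', $str, $m);
}
function u_chars($str) {
  # An empty pattern with /u splits between code points, not between bytes.
  return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
}
```

Both functions return false if the input is not valid UTF-8, so callers may want to check for that.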