00703: (:markup:) doesn't wrap UTF-8 text correctly
Summary: (:markup:) doesn't wrap UTF-8 text correctly
Created: 2006-03-16 01:51
Status: Closed - workaround from 2.2.6
Category: Bug
From: Holo
Assigned:
Priority: 4
Version: 2.1.1
OS: Win2K/Apache2.0.55/5.1.2

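Description: The function MarkupMarkup() uses wordwrap() to break long strings, but wordwrap() wraps UTF-8 text incorrectly :( It counts bytes rather than characters, and since non-Latin letters take two or more bytes each in UTF-8, the resulting lines are too short. For example: PmWikiRu.TextFormattingRules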
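To illustrate the byte-versus-character problem, here is a small standalone sketch (not PmWiki code; the helper utf8_char_count() is a hypothetical name introduced only for this demo). Each Cyrillic letter is 2 bytes in UTF-8, so at the same width wordwrap() fits only about half as many Cyrillic characters on a line as Latin ones.

```php
<?php
// Sketch: PHP's wordwrap() measures the width in bytes, not characters.
// utf8_char_count() is a hypothetical helper, not part of PmWiki or PHP.

// Count the characters (code points) of a UTF-8 string with PCRE's /u mode.
function utf8_char_count($s) {
    return preg_match_all('/./us', $s);
}

$latin    = str_repeat('abcd ', 10);  // 50 characters, 1 byte each
$cyrillic = str_repeat('абвг ', 10);  // 50 characters, letters are 2 bytes each

echo strlen($latin), ' bytes / ', utf8_char_count($latin), " chars\n";       // 50 bytes / 50 chars
echo strlen($cyrillic), ' bytes / ', utf8_char_count($cyrillic), " chars\n"; // 90 bytes / 50 chars

// Wrap both strings at width 20 and measure the first line in characters.
foreach (array($latin, $cyrillic) as $s) {
    $lines = explode("\n", wordwrap($s, 20));
    echo 'first line: ', utf8_char_count($lines[0]), " chars\n";
}
// Prints 19 chars for the Latin text but only 9 for the Cyrillic text.
```

The Cyrillic line is broken after 9 characters because, by the next space, wordwrap() has already counted 20 bytes.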


I'm not sure what to do about this one, without somehow redefining or rewriting PHP's wordwrap() function. But I've marked it as a confirmed bug for the time being.

--Pm


I changed the MarkupMarkup function used by (:markup:) to accept a wrap argument:

function MarkupMarkup($pagename, $text, $opt = '') {
  $MarkupMarkupOpt = array('class' => 'vert');
  $opt = array_merge($MarkupMarkupOpt, ParseArgs($opt));
  $html = MarkupToHTML($pagename, $text, array('escape' => 0));
  # changed here: optional wrap=N argument, default 40 columns
  $wrap = @$opt['wrap'] ? intval($opt['wrap']) : 40;
  $caption = '';
  if (@$opt['caption']) 
    $caption = str_replace("'", '&#039;', 
                           "<caption>{$opt['caption']}</caption>");
  $class = preg_replace('/[^-\\s\\w]+/', ' ', @$opt['class']);
  if (strpos($class, 'horiz') !== false) 
    { $sep = ''; $pretext = wordwrap($text, $wrap); }           # changed here
  else 
    { $sep = '</tr><tr>'; $pretext = wordwrap($text, $wrap); }  # changed here
  return Keep("<table class='markup $class' align='center'>$caption
      <tr><td class='markup1' valign='top'><pre>$pretext</pre></td>$sep<td 
        class='markup2' valign='top'>$html</td></tr></table>");
}

Some other things I tried, without success:

In CSS:

pre, code {
    white-space: pre-wrap;
    white-space: -moz-pre-wrap;
    white-space: -pre-wrap;
    white-space: -o-pre-wrap;
    word-wrap: break-word;
}

In PHP:

function utf8_wordwrap($str, $width = 75, $break = "\n", $cut = false) {
    return utf8_encode(wordwrap(utf8_decode($str), $width, $break, $cut));
}

function utf8_strlen($str) {
    return mb_strlen($str, 'UTF-8');
}

CarlosAB December 10, 2008, at 11:24 AM

The function utf8_decode($str) only works as expected for characters available in the Latin-1 (ISO-8859-1) encoding, not for other alphabets such as Cyrillic, Greek or Chinese. From version 2.2.6, it will be possible to define a custom $MarkupWordwrapFunction that breaks the lines however you like. I am closing this entry for the moment. --Petko September 02, 2009, at 10:06 AM

For example:
$MarkupWordwrapFunction = 'my_wordwrap';
function my_wordwrap($text, $length){
  # multiply the requested length by 1.8
  # (may be suitable for Cyrillic & Greek)
  $length = round($length*1.8);
  return wordwrap($text, $length);
}
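Instead of scaling the width by a fixed factor, a custom $MarkupWordwrapFunction could count characters directly. Below is a minimal sketch, assuming PmWiki 2.2.6+ calls the function with ($text, $width) like the my_wordwrap() example above and that the text is valid UTF-8; the name utf8_char_wordwrap() is hypothetical. Like wordwrap() with $cut = false, it only breaks at spaces and keeps existing line breaks.

```php
<?php
// Hypothetical UTF-8-aware replacement for wordwrap(): the width is counted
// in characters (code points) instead of bytes. Assumes valid UTF-8 input.
// Like wordwrap() with $cut = false, it breaks only at spaces and never
// splits a long word.
function utf8_char_wordwrap($text, $width = 75, $break = "\n") {
    $lines = array();
    // Wrap each existing line separately so original newlines are kept.
    foreach (explode("\n", $text) as $line) {
        $words = explode(' ', $line);
        $current = array_shift($words);           // first word of the line
        $len = preg_match_all('/./us', $current); // its length in characters
        $pieces = array();
        foreach ($words as $word) {
            $wlen = preg_match_all('/./us', $word);
            if ($len + 1 + $wlen <= $width) {
                // The word (plus one space) still fits on this line.
                $current .= ' ' . $word;
                $len += 1 + $wlen;
            } else {
                // Close the current piece and start a new one with this word.
                $pieces[] = $current;
                $current = $word;
                $len = $wlen;
            }
        }
        $pieces[] = $current;
        $lines[] = implode($break, $pieces);
    }
    return implode("\n", $lines);
}

// In local/config.php (PmWiki 2.2.6 or later):
$MarkupWordwrapFunction = 'utf8_char_wordwrap';
```

This counts code points, not display columns, so full-width CJK characters (which also rarely contain spaces to break at) are not handled by it.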

A couple of links, for the record: