Regular expression hell in WordSpew

I have always hated [regular expressions][], and I still do. I hate them a lot.

About two weeks ago I noticed that my [WordSpew][] plug-in has an issue with Cyrillic text. Some messages are not displayed if they contain _more than 7 (seven) Cyrillic characters_. Why exactly seven? I have no f*cking idea. Why are they not displayed? I have no idea about that, either!

A short analysis showed me that the following code in [WordSpew][] is the reason for that misery:

$jal_user_text = preg_replace("#((http|ftp)s?://\S+)|
'"$0"=="$1" || "$0"=="$2" ? "$0" : "$0 "', $jal_user_text);

In short: there is no short way to explain it. Because regular expressions have always been a nightmare for me, I did not even bother to analyze this one. Way too complex.

Commenting out that line fixed the problem. I was expecting my http and ftp links to break, but it seems the line does something else instead, because my links still show up.

I would be really interested if someone could help me understand what exactly this regexp is doing. I'd also like to know why it fails if `jal_user_text` consists of more than 7 Cyrillic characters (wtf?!), for example “дддддддд”. The failure is that the [preg_replace][] function returns empty text.
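One way to narrow down an empty result like this is to ask PCRE what went wrong. The sketch below is only an illustration: `replace_or_explain` is a hypothetical helper, not part of WordSpew, and the pattern in the demo line is a stand-in rather than the plug-in's real one. It relies on two documented behaviours: `preg_replace()` returns `null` on error, and `preg_last_error()` reports the error code.

```php
<?php
// Hypothetical helper (not WordSpew code): run a replacement and, if PCRE
// fails, return the error name instead of silently yielding empty text.
function replace_or_explain(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);

    if ($result === null) {
        // Map the numeric code from preg_last_error() to its constant name.
        $names = [
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
            PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        ];
        $code = preg_last_error();

        return 'PCRE error: ' . ($names[$code] ?? (string) $code);
    }

    return $result;
}

// Stand-in demo: a byte that is not valid UTF-8, matched with the u modifier,
// makes preg_replace() return null.
echo replace_or_explain('#\S{16}#u', '$0 ', "\xFF"), "\n";
```

Running the failing message through a check like this would at least show whether the empty text comes from a PCRE error (for example, input that is not valid UTF-8) or from the replacement itself.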

Any ideas?

[WordSpew]: “WordSpew – an Ajax Shoutbox”
[regular expressions]: “Wikipedia about regular expressions”
[preg_replace]: “preg_replace @ PHP Manual”

One thought on “Regular expression hell in WordSpew”

  1. The idea of this regexp is to break long words by inserting a space every 16 characters, with links and long e-mail addresses as exceptions.

    I am not familiar with Perl-style regexps, but try replacing the text #ieu with #ie (that is, without the Unicode modifier). I think it will work that way, but I cannot explain why.

    Once again: I am a user of the ereg syntax, and preg is foreign to me.
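The behaviour described in the comment (a space after every 16 non-whitespace characters, with links and e-mail-like tokens left alone) can be sketched as follows. This is a minimal illustration under stated assumptions, not the plug-in's actual code: `break_long_words` and its pattern are hypothetical, and the e-mail branch is a rough guess at what "long e-mail" means. The `e` modifier mentioned in the comment was deprecated in PHP 5.5 and removed in PHP 7, so the sketch uses `preg_replace_callback()` instead. As a side note that may or may not explain the bug: eight Cyrillic letters occupy 16 bytes in UTF-8, which coincides with the 16-character threshold if the match were counting bytes rather than characters.

```php
<?php
// Hypothetical helper (not WordSpew code): insert a space after every run of
// $limit non-whitespace characters, leaving links and e-mail-like tokens intact.
function break_long_words(string $text, int $limit = 16): ?string
{
    // Alternation order matters: links (group 1) and e-mail-like tokens
    // (group 2) are tried first and consumed whole, so only other long runs
    // reach group 3. The u modifier makes {N} count characters, not bytes.
    $pattern = '#((?:http|ftp)s?://\S+)|([^\s@]+@\S+)|(\S{' . $limit . '})#u';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 3 is only present when the long-run branch matched.
            if (isset($m[3])) {
                return $m[3] . ' ';
            }
            // Links and e-mail-like tokens are returned unchanged.
            return $m[0];
        },
        $text
    );
}

// Twenty Cyrillic letters: a space is inserted after the sixteenth.
echo break_long_words(str_repeat('д', 20)), "\n";
```

With the `u` modifier in place, the eight-letter example “дддддддд” passes through this sketch unchanged, because it is only eight characters long. The function returns `null` if PCRE reports an error, for instance when the input is not valid UTF-8.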
