help me with this sed expression

Mon Jan 5 07:28:53 PST 2004

On Mon, Jan 05, 2004 at 07:49:43PM +0800, Zhang Weiwu wrote:
> Hello. I've spent an hour trying to work out a series of sed commands to 
> process some text (without any luck; you know I'm kind of a newbie). I would 
> really appreciate your help.
> 
> The original text file is in this form -- for each line:
> one Chinese word, then one or two English words separated by a space.
> 
> I wish to change to:
> 1) target file: one English word, then a space, then the Chinese word 
> corresponding to that English word.
> 2) if in the original file one Chinese word has more than one English word 
> following it on the same line, repeat the Chinese word to satisfy 1).
> 
> Define: Chinese word = one or more contiguous bytes of data where each byte 
> is greater than 128 in value. (This is true in the GB2312 Chinese charset, 
> which this email is written in.)
> Define: English word = one or more contiguous bytes of [a-z].
> 
> Say, for the original file:
> ===========
> ??a av
> ????????aaav
> ????????aacm
> ===========
> The target file should be:
> ===========
> a ??
> av ??
> aaav ????????
> aacm ????????
> ===========
> 
> I tried things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first \(.*\) is 
> too greedy and swallows the trailing [a-z] as well.

Dunno about sed(1) but you could do the job like this:

    perl -lne '($c, $e) = m/^([\x{81}-\x{ff}]+)([a-z ]+)$/ or next; foreach $x (split " ", $e) { print "$x $c"; }' filename
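
A sed(1)-only version looks possible too. The following is only a minimal
sketch, under a few assumptions that are mine rather than from the original
question: the function name swap_words is made up, a "Chinese word" is taken
to be any run of bytes that are neither [a-z] nor a space, and a line carries
at most two English words. Anchoring the Chinese part on [^a-z ] instead of .*
is what avoids the greediness problem.

```shell
# Hypothetical helper: reads "CHINESE english [english]" lines on stdin and
# writes one "english CHINESE" line per English word.
swap_words() {
    # LC_ALL=C makes sed work on raw bytes, so [^a-z ] matches each byte
    # of a GB2312 character (assumption: no other non-letter bytes occur).
    # First expression: two English words, emitted as two output lines
    # (the backslash-newline in the replacement inserts the line break).
    # Second expression: the single-word case.
    LC_ALL=C sed \
        -e 's/^\([^a-z ][^a-z ]*\) *\([a-z][a-z]*\) \([a-z][a-z]*\)$/\2 \1\
\3 \1/' \
        -e 's/^\([^a-z ][^a-z ]*\) *\([a-z][a-z]*\)$/\2 \1/'
}
```

Usage would be "swap_words < filename". Lines that match neither pattern, for
example ones with three or more English words, pass through unchanged, so they
are easy to spot in the output.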

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       26 The Paddocks
                                                      Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey         Marlow
Tel: +44 1628 476614                                  Bucks., SL7 1TH UK