Perl Question

coco06 · 09-01-2009 02:27PM #1

Hi

I have a PCL file that contains characters like the following:

"(s1p10.00vsb16901T&l8C&a0C&a0R&t0P(19U
(d2p6.00vsb16901T*p94x6742YExample*p0x0Y&f2Y&f0X&f0S*b
3&f2X&t0P(19U(s1p10.00vsb16901T&l8C&a0C&a0R*p105x170Y&f2Y&n2W2&f2X&t0P(19U"

So what I want to do is check whether the string 'd2p6' is present and, if it is, take the substring that starts 23 characters after it and runs for 7 characters, i.e. get the word Example.

I can locate the d2p6 fine, but I don't know how to read 23 characters across from that point rather than from the start of the file.

Clear as mud? Does anyone know how I could achieve this?

Thanks

Tom Dunne · 09-01-2009 02:44PM

It's been a long time since I looked at Perl, but you are looking for either an instring or a substring function (index and substr in Perl).

So your start location would be ((end location of d2p6) + 1) in either of the above functions.
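
A minimal sketch of that index/substr approach, assuming the whole text is already in one string. The helper name extract_after and the sample string are made up for illustration; for the file in this thread the skip would be 23 and the length 7, going by the numbers in the first post.

```perl
use strict;
use warnings;

# Return $length characters starting $skip characters after the first
# occurrence of $marker in $data, or nothing if that is not possible.
# extract_after() is a hypothetical helper, not something from the thread.
sub extract_after {
    my ($data, $marker, $skip, $length) = @_;

    my $pos = index($data, $marker);    # index returns -1 when the marker is absent
    return if $pos < 0;

    # Start counting from the character just after the marker.
    my $start = $pos + length($marker) + $skip;
    return if $start + $length > length($data);

    return substr($data, $start, $length);
}

# Made-up sample: "d2p6", three filler characters, then the wanted word.
my $sample = "xx(d2p6abcExample*p0x0Y";
my $word   = extract_after($sample, 'd2p6', 3, 7);
print defined $word ? "$word\n" : "marker not found\n";
```

Because the offset is measured from the end of the marker, the position of d2p6 within the file does not matter.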

fergalr · 09-01-2009 02:54PM

Regex and capture group ftw, surely?
It is perl...

fergalr · 09-01-2009 03:00PM

I haven't written a Perl regex in a couple of years, and don't have a Perl environment handy, but I guess it'd be something like this:

# Skip 23 characters after d2p6, then capture the next 7.
if ($string =~ /d2p6.{23}(.{7})/) {
    my $exampleword = $1;
}

coco06 · 09-01-2009 03:25PM

Hey,
thanks for the replies. I have achieved what I wanted by using

$mystring=~s/.*d2p6\?v=//;
$mystring=~s/.*d2p6//; # return everything after d2p6
$requiredText = substr($mystring,23,8);

May not be the best way but it works!

Thanks again.

coco06 · 14-01-2009 02:23PM

Hi again,

Thought I'd use the same thread, as it is another Perl question for you guys.

I have a PCL file that I want to read with a Perl script and write out to another file (with a little processing carried out on the first line). I currently have a Perl script that opens the input file, opens the output file, reads through the input and writes it to the output file.

The problem is that the output is not a mirror of the input file, as I would expect, and this then affects the printing.

Can anyone advise me as to where the problem may lie?

Thanks.

daymobrew · 14-01-2009 02:40PM

coco06 wrote: »

I have a PCL file that I want to read with a Perl script and write out to another file (with a little processing carried out on the first line). I currently have a Perl script that opens the input file, opens the output file, reads through the input and writes it to the output file.

The problem is that the output is not a mirror of the input file, as I would expect, and this then affects the printing.

What are the diffs? Is it the line endings or is the actual content being changed?

Can you post some of the code, especially the lines that write to the output file?

coco06 · 14-01-2009 03:16PM

Hi

The line count is increased by 2. The content is all there, but the format of the output file changes, and that affects the printing.

My code is quite simple to start with. I have tried it with chomp and without it, and the result is the same.

#!/usr/local/bin/perl

my $input = $ARGV[0];

$Outpath = "C:\\All Projects\\Zurich\\Test\\";
$OutFile = $Outpath."JetInputnew.pcl";

unless (open(GETDAT, $input)) {

die ("cannot open input file file1\n");

}

unless (open(OUT, ">$OutFile")) {

die ("cannot open output file outfile\n");

}

$line = <GETDAT>;

while ($line ne "") {
chomp ($line);
print OUT ("$line\n");
$line = <GETDAT>;

}
close(GETDAT) or print LOGFILE scalar(localtime)." Cannot close the output file: $!\n" and exit 1;
close(OUT) or print LOGFILE scalar(localtime)." Cannot close the output file: $!\n" and exit 1;

Thanks.

daymobrew · 14-01-2009 04:28PM

coco06 wrote: »

The line count is increased by 2. The content is all there, but the format of the output file changes, and that affects the printing.

I tried it with a tiny data file (8 lines of text, including 1 empty line) and the input and output were identical.
I am using perl with cygwin on Vista.

I did get a warning when the end of file was reached (because I turned warnings on via the -w switch on the #! line). The line with the 'ne' check was the issue:

Use of uninitialized value $line in string ne at ./in-out-prob.pl line 29, <GETDAT> line 8.

Changing your code to the usual 'while ( <GETDAT> )' format will fix this. I've done this and the new code is below.

#!/usr/local/bin/perl -w

use strict;

my $input = $ARGV[0];

my $Outpath = "C:\\All Projects\\Zurich\\Test\\";
my $OutFile = $Outpath."JetInputnew.pcl";

unless (open(GETDAT, $input)) {
    die ("cannot open input file file1\n");
}

unless (open(OUT, ">$OutFile")) {
    die ("cannot open output file outfile\n");
}

while ( my $line = <GETDAT> ) {
    chomp $line;
    print OUT ("$line\n");
}
# LOGFILE is never opened in this script, so report close errors with die.
close(GETDAT) or die "Cannot close the input file: $!\n";
close(OUT) or die "Cannot close the output file: $!\n";

fergalr · 14-01-2009 04:39PM

Seems OK with simple input and output; could you print out the diff between the files, or include sample files?
Could it be something funny, like processing a Unix file on Windows, so that the newline separators are getting confused?

coco06 · 14-01-2009 04:43PM

The file I am parsing is a PCL file made up of symbols, numbers, chars etc.
The script reads to a certain point on the current line and then outputs it, even if there is more text after that point.

I can show you the PCL file if you would like to take a look at it. I presumed that the current line would be defined by the very last character it finds?

Thanks

coco06 · 14-01-2009 04:46PM

fergalr wrote: »

Seems OK with simple input and output; could you print out the diff between the files, or include sample files?
Could it be something funny, like processing a Unix file on Windows, so that the newline separators are getting confused?

I would have to PM you the sample file, as it is a PCL file and too large to paste here; it is 102 KB.

fergalr · 14-01-2009 04:58PM

I'm not familiar with the pcl format.
Are you saying it's a binary, as opposed to text, file?

If it is a binary file, what Perl interprets as a 'line' in the file could indeed differ from what you intend.

Feel free to either PM me the file, or stick it up on a website for everyone to see, and I'll take a look...

fergalr · 14-01-2009 05:10PM

Just found a sample pcl online.
Without reading the technical references for the pcl format, I'm guessing that Perl is coming across something it recognises as a line terminator before it reaches the end of what you would like it to consider to be a line.

In the sample pcl file I got, from the openpcl project, it seemed like pages were separated by line endings, but seemed to contain non-text characters within the 'page'. I don't know whether the pcl spec allows arbitrary binary within each 'page' or not - and don't have time to read up on it for you!

But if it does, then you might be better off processing the file as a binary file, or using a library to load and edit the file, if you are trying to do any sort of complicated parsing of it.
I'd suggest first reading the technical docs to find out what the problem is, then maybe check out openpcl, or search cpan for pcl modules.
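
On the point about processing the file as a binary file: a minimal sketch of a byte-for-byte copy, assuming the goal is an output file identical to the input. The helper name copy_binary and the file names in the usage comment are placeholders, not anything from the thread.

```perl
use strict;
use warnings;

# Copy $src to $dst byte for byte. copy_binary() is a hypothetical helper.
sub copy_binary {
    my ($src, $dst) = @_;

    open(my $in,  '<', $src) or die "Cannot open $src for reading: $!\n";
    open(my $out, '>', $dst) or die "Cannot open $dst for writing: $!\n";

    # binmode switches off newline translation on platforms that do it
    # (such as Windows), so the bytes pass through unchanged.
    binmode($in);
    binmode($out);

    # Read fixed-size chunks instead of lines, so nothing in the data
    # is treated as a line terminator.
    my $buffer;
    while (1) {
        my $count = read($in, $buffer, 64 * 1024);
        die "Read error on $src: $!\n" unless defined $count;
        last if $count == 0;
        print {$out} $buffer or die "Write error on $dst: $!\n";
    }

    close($in)  or die "Cannot close $src: $!\n";
    close($out) or die "Cannot close $dst: $!\n";
    return;
}

# Usage: perl copy.pl input.pcl output.pcl (file names are placeholders).
copy_binary(@ARGV) if @ARGV == 2;
```

Unlike the chomp-and-print loop, this never adds or removes a newline, so a file that does not end in one stays that way.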

Hope this is some help.

coco06 · 15-01-2009 09:45AM

Hi

Now that I think of it, does anybody know how to carry out a find and replace in a file using a perl script? This may solve my problem instead of trying to write it to a new file.

Thanks

daymobrew · 15-01-2009 12:20PM

coco06 wrote: »

Now that I think of it, does anybody know how to carry out a find and replace in a file using a perl script? This may solve my problem instead of trying to write it to a new file.

Look at the -i command-line switch (documented in perlrun) for examples of how to do this on the command line. Look at the -p switch too.

coco06 · 15-01-2009 12:45PM

Thanks, but I have to do it in a Perl script, as I am carrying out a bit of processing on the text first. I am finding a certain string and replacing it.

When I write the new text out to the same file, it loses its formatting slightly. The text replacement itself works fine; what I need is to change just the specified text and leave everything else the same, but that isn't happening.

daymobrew · 16-01-2009 09:12PM

coco06 wrote: »

Thanks, but I have to do it in a Perl script, as I am carrying out a bit of processing on the text first. I am finding a certain string and replacing it.

If you are good with regular expressions you might be able to do it on the command line.

Can you post the processing code?

fergalr · 16-01-2009 09:39PM

Been very busy over the last few days, so I might be slightly off track here, but from what I can see, the problem seems to be that you've got a file that is not really a 'text' file. It's a 'binary' file, where a lot of the binary just happens to be text.
That's as best I can see, without reading the PCL spec.

Your problem with doing the search and replace previously is you were reading the whole file in as text, line by line, matching your string, and writing it out line by line.
When your code came across a line of 'text' that had some sort of binary control character in it in a place you didn't expect, it gave you funny results - which makes sense, because your text processing code wasn't equipped to deal with chunks of arbitrary binary in the middle of the text.
Now, I don't know what the specific bit of binary that was giving you problems was - you might want to open up a hex editor and have a look at it.

If all you want to do is swap some binary values for some other binary values, then write a program to do this (or better find a binary search and replace tool) - that's probably the simplest thing to do.

I guess regex based tools will also work if they are written in a certain way - but I'm not sure how regex tools work on a binary file when they encounter special values.
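
For what it's worth, Perl's own s/// works on arbitrary bytes once the file handles are in binary mode. A minimal sketch of a literal search and replace done that way, assuming the file is small enough to hold in memory (the one in this thread is about 102 KB). The helper name replace_in_file is hypothetical.

```perl
use strict;
use warnings;

# Replace every literal occurrence of $find with $replace in $path,
# treating the file as raw bytes. Returns the number of replacements.
# replace_in_file() is a hypothetical helper, not something from the thread.
sub replace_in_file {
    my ($path, $find, $replace) = @_;

    open(my $in, '<', $path) or die "Cannot open $path for reading: $!\n";
    binmode($in);
    my $data = do { local $/; <$in> };    # with $/ undefined, read the whole file
    close($in) or die "Cannot close $path: $!\n";
    $data = '' unless defined $data;

    # \Q...\E quotes regex metacharacters so $find is matched literally.
    # s///g returns the number of substitutions, or false if there were none.
    my $count = ($data =~ s/\Q$find\E/$replace/g) || 0;

    open(my $out, '>', $path) or die "Cannot open $path for writing: $!\n";
    binmode($out);
    print {$out} $data or die "Write error on $path: $!\n";
    close($out) or die "Cannot close $path: $!\n";

    return $count;
}

# Usage: perl replace.pl file.pcl OLDTEXT NEWTEXT (all three are placeholders).
if (@ARGV == 3) {
    my $changed = replace_in_file(@ARGV);
    print "$changed replacement(s) made\n";
}
```

Since the file is never split into lines, every byte outside the matched text is written back exactly as it was read. It overwrites the file in place, so keep a copy of the original while testing.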

fergalr · 16-01-2009 09:57PM

Just looked at the file you sent me in a hex editor (I have access to a proper computer now).
The byte where the before and after files differ is 0x08, which is backspace. I presume Perl was interpreting that 0x08 as a backspace in your code in post 8, and that's why you were ending up with output that differed from the input.

I'd give grep or sed or one of those a try, they might well do it correctly.
