Perl - Need the last 8 chars of a string


$var = `sh hello.sh && echo "Complete"`;

I'm running something like this and have some conditional statements that are looking for "complete," but if hello.sh runs and has some output, then it will ALL save into $var. Is there a way I can save only the last eight characters?

Thanks.

EDIT: I think it would have to use substr, but I don't see how I can make it work assuming that I don't know how many characters are before "Complete."

Oy vey, I think I've got something I can work with...

$var = `sh hello.sh && echo "Complete"`;

if ($var =~ /Complete/)

Yes you can use substr. You just have to use a negative offset. I use it regularly for manipulating DNA sequences. Let's say I have a file that contains the following sequence:

tcatccatcc

To return the last 8 characters, which would be atccatcc, I use:

perl -ne 'print substr($_, -9, 8), "\n"' dna

Hope this helps.

So why the offset of -9 when I want the last 8 characters of the sequence in the file? It's because the line read from the file still has the newline at the end of the sequence, so the last 9 characters are the 8 I want plus that newline. Consider the following, where the sequence is assigned to a variable instead:

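#!/usr/bin/perl
use strict;
use warnings;

my $dna = "tcatccatcc";          # no trailing newline here
my $chunk = substr($dna, -8);    # negative offset: start 8 characters from the end
print "$chunk\n";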

This will return atccatcc
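
Tying this back to the original question, here is a minimal sketch that assumes the command output ends in a single trailing newline. The helper name last_chars and the sample string are made up for illustration; chomp and substr with a negative offset are core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last $n characters of $text,
# ignoring one trailing newline (as backtick output usually has).
sub last_chars {
    my ($text, $n) = @_;
    chomp $text;                           # drop a trailing "\n" if present
    return $text if length($text) <= $n;   # short strings are returned whole
    return substr($text, -$n);             # negative offset counts from the end
}

# Stand-in for the result of `sh hello.sh && echo "Complete"`
my $output = "some output from hello.sh\nComplete\n";
print last_chars($output, 8), "\n";
```

Chomping first means the same offset works whether the string came from a file, from backticks, or from a literal.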


One of perl's strengths is regular expressions.

This online tool is helpful for building your regex: http://gskinner.com/RegExr/

In your case you would want something like:

if ($var =~ /complete/i) {
    print "complete found\n";
}

Actually that wouldn't quite do what you want. For example it would match against $var if it contained "This is complete" but it would also match against $var if it contained "Complete this is not". He would actually want something like this to force the complete to be at the end of the line. Note that the [\r\n]* in the regular expression will make sure that there aren't any problems caused by carriage returns and new line characters.

if($var=~m/complete[\r\n]*$/i)
{
  print "Complete found at end of output";
}
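
A small sketch of the anchored match above wrapped in a function, so the two example strings can be checked side by side. The helper name ends_with_complete and the sample strings are assumptions for illustration, and \z (absolute end of string) is swapped in for $ here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text ends with "complete" (any case),
# optionally followed by trailing carriage returns or newlines.
sub ends_with_complete {
    my ($text) = @_;
    return $text =~ /complete[\r\n]*\z/i ? 1 : 0;
}

my @samples = ("This is complete\n", "Complete this is not\n", "done\r\nCOMPLETE\r\n");
for my $sample (@samples) {
    print ends_with_complete($sample) ? "match\n" : "no match\n";
}
```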


You can also use a regex to return the last eight characters; it matches only if the string has at least eight characters.

if ($line =~ m/(.{8})$/) {
    print "'$1' is the last eight characters of the string!\n";
} else {
    print "The string didn't have eight characters!\n";
}


  • 1 month later...

IMHO regex should only be used for things that substr(), index(), tr///, etc. can't do. Regex is the most powerful tool in the box and will therefore (almost always) use up the most CPU cycles. Also, if you suspect trailing newlines, you should use chomp() to trim them, because it only removes newlines, not characters you might want to preserve. Also, a newline is not the same on all systems. On some machines, a newline is even longer than one byte.
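
For illustration, a regex-free version of the "does the output end in Complete" check might look like the sketch below. The helper name ends_with is hypothetical; it only uses chomp, lc, length and substr.

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive "ends with" check without a regex.
sub ends_with {
    my ($text, $suffix) = @_;
    chomp $text;                                    # ignore one trailing newline
    return 0 if length($text) < length($suffix);    # too short to contain it
    return lc(substr($text, -length($suffix))) eq lc($suffix) ? 1 : 0;
}

print ends_with("hello.sh output\nComplete\n", "complete") ? "done\n" : "not done\n";
```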

I disagree. Regex is a very powerful tool but the resource drain isn't significant enough to warrant working around it unless what you are doing is a single instance of substr/index/etc.


Another advantage of using specialized functions is that the code becomes more readable, since it's easier to see at first glance what it is supposed to do.

The difference in performance is quite significant, although if your program runs for less than a second you probably won't care or notice the difference. I had a case where the daily network traffic had to be categorized into the known subnets. This was done mostly with regexes until the reporting script ran longer than 24 hours. At that point a coworker and I wrote a module using pack, unpack and bit shifting to match IPs to subnets. That reduced the runtime to less than one hour.
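
The module itself isn't shown here, so the following is only a rough sketch of the general idea, assuming IPv4 addresses in dotted-quad form and subnets in CIDR notation. The function names ip_to_int and in_subnet are invented for this example.

```perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 address to a 32-bit integer.
sub ip_to_int {
    my ($ip) = @_;
    return unpack "N", pack "C4", split /\./, $ip;
}

# True if $ip falls inside the network given as "a.b.c.d/len".
sub in_subnet {
    my ($ip, $cidr) = @_;
    my ($net, $len) = split m{/}, $cidr;
    # Build the netmask by shifting; /0 is special-cased to avoid a 32-bit shift.
    my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
    return (ip_to_int($ip) & $mask) == (ip_to_int($net) & $mask) ? 1 : 0;
}

print in_subnet("192.168.1.42", "192.168.1.0/24") ? "inside\n" : "outside\n";
print in_subnet("10.0.0.1",     "192.168.1.0/24") ? "inside\n" : "outside\n";
```

Once both addresses are integers, membership is a single AND and compare per subnet, with no pattern matching involved.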

Regexes are very lean, especially if used correctly, and certainly no more intensive than the other methods that you have specified if doing the same job. This is also the beauty of Perl: there are so many ways of doing things (admittedly it can be a bad thing as well). Also, Perl is an interpreted language; if you want out-and-out speed you should be using something like C++.

Newlines are never more than 1 byte. You might have a carriage return on a system, but that is an additional special character. Of course, that's ASCII; Unicode is slightly different.

I never use chomp() as I dislike the function and the way it is used. Almost always I will match on what I want with a regex and just ignore the end.


Regexes are very lean, especially if used correctly, certainly no more intensive than the other methods that you have specified if doing the same job.

Guess I have to benchmark to prove my point... Consider the following script:

use Benchmark qw/timethese/;

# generate 1 MB of pseudorandom string
my $string = join "", map{ ("A".."Z", " ")[rand 27] } 1..2**20;

timethese( 100, {
    'split' => 'test_split()',
    'regex' => 'test_regex()',
} );

sub test_regex {
    my @words = $string =~ /([^ ]+)/g;
    return \@words;
}

sub test_split {
    my @words = split ' ', $string;
    return \@words;
}

I get the following results on Perl 5.10 (other builds give me similar results):

Benchmark: timing 100 iterations of regex, split...
     regex:  8 wallclock secs ( 7.12 usr +  0.01 sys =  7.13 CPU) @ 14.03/s (n=100)
     split:  5 wallclock secs ( 4.47 usr +  0.02 sys =  4.49 CPU) @ 22.27/s (n=100)

This is also the beauty of perl, there are so many ways of doing things (admittedly it can be a bad thing as well), ...

agreed

also Perl is an interpreted language, if you want out and out speed you should be using something like C++.

Before I start off using C++, I usually try Perl. If it's not fast enough, I begin to profile, optimize and, if all else fails, use Inline::C. This approach gets my work done faster than doing everything in C(++). Of course there are things you just can't do in Perl (device drivers or kernel modules, e.g.), and there are also situations where it just isn't the best tool.

Newlines are never more than 1 byte. You might have a carriage return on a system, but that is an additional special character. Of course, that's ASCII; Unicode is slightly different.

So, what you're saying is that newlines are never more than 1 byte, except when they are longer ;)

But seriously, I was talking about the sequence of bytes an OS or editor treats as the signal for a new line, not the specific ASCII character, which would always be one byte ... except on machines using fewer than seven bits for a byte ... but those machines don't use ASCII anyway.

I never use chomp() as I dislike the function and the way it is used. Almost always I will match on what I want with a regex and just ignore the end.

chomp() just told me it doesn't like you either. ;)

Is your dislike founded on some real disadvantages you can put into words, or do you just not like how chomp() looks at you?
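
To make the newline point concrete, here is a small sketch of what chomp() does with Unix and Windows style line endings when $/ has its default value of "\n". The strip_eol helper is a made-up name, shown as one possible way to handle both endings.

```perl
use strict;
use warnings;

# chomp removes the current value of $/ (by default "\n") and nothing else.
chomp(my $from_unix    = "Complete\n");
chomp(my $from_windows = "Complete\r\n");   # the "\r" survives the chomp

printf "%d %d\n", length($from_unix), length($from_windows);

# Hypothetical helper: strip one trailing LF or CRLF from a line.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

print strip_eol("Complete\r\n") eq "Complete" ? "clean\n" : "still dirty\n";
```

With the default $/, the Windows-style string keeps its carriage return after chomp(), which is why a comparison against "Complete" can fail on output produced on another platform.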
