Grep a variable length file in UNIX based on tilde delimiter
Q:
Hello:

I need to grep a variable-length UNIX file. It is a tilde-delimited file with 21 fields, and I need to search for specific values in the 5th field.

Please help. My UNIX knowledge is not very extensive.

Thanks
ASKED: Jun 5 2008  7:27 PM GMT
A:
This depends on whether you want just the matching 5th field returned or the whole line that goes with it. To get just the fifth field, assuming your filename is foo and you are looking for bar in the 5th field:
cut -d'~' -f5 foo | grep bar
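
A minimal sketch of the cut approach, assuming a made-up three-line sample file (the temporary file and its contents are hypothetical; only the tilde-delimited layout comes from the question). Note that grep bar also accepts values that merely contain bar; grep -x restricts the match to the whole field.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'a~b~c~d~bar~f' \
  'a~b~c~d~rebar~f' \
  'bar~b~c~d~e~f' > "$sample"

# Field 5 only, any value containing "bar".
loose=$(cut -d'~' -f5 "$sample" | grep bar)

# Field 5 only, value exactly "bar" (-x matches whole lines of cut's output).
exact=$(cut -d'~' -f5 "$sample" | grep -x bar)

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"
rm -f "$sample"
```

The third sample line has bar in field 1 only, so neither command reports it.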

To get the whole corresponding line:
grep -E '^([^~]*~){4}[^~]*bar' foo
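
As a rough illustration of matching whole lines on the 5th field, here is a sketch using an extended regexp (grep -E, the modern spelling of egrep). The sample file and its contents are hypothetical.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'a~b~c~d~bar~f' \
  'a~b~c~d~rebar~f' \
  'bar~b~c~d~e~f' > "$sample"

# ^([^~]*~){4} skips exactly four fields; [^~]*bar then looks inside field 5 only.
whole=$(grep -E '^([^~]*~){4}[^~]*bar' "$sample")

printf '%s\n' "$whole"
rm -f "$sample"
```

The line with bar in field 1 is not selected, because [^~]* cannot cross a tilde.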


Hmmm... the regexp needs some tuning to match the 5th field exactly.
Say your file has lines like:
aaa~bbb~cccc~dd~this~www

where "aaa", "bbb", ... are anything without "~"s.
You want to find lines with "this" in the 5th field, so the regexp first has to skip four fields. The regexp for four times "anything and a ~", counted from the start of the line, is:
^\([^~]*~\)\{4\}

The breakdown: if
[abc] 
matches a single character a, b or c
[^abc] 
is any single character but a, b or c
then
[^~] 
is "not a tilde"
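[^~]*
is that, zero or more times
[^~]*~
is that followed by a tilde
\([^~]*~\)\{4\}
is that whole group four times; without the escaped parentheses the \{4\} would apply to the tilde alone
^\([^~]*~\)\{4\}
is the same, anchored at the start of the line so that counting starts at the first field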
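
To see why the repeat count has to apply to the whole "field plus tilde" group, here is a small sketch that counts matching lines for a grouped and an ungrouped pattern. It uses the sample line from above; the variable names are made up for illustration.

```shell
line='aaa~bbb~cccc~dd~this~www'

# Grouped: "field plus tilde", four times, then "this~" in field 5.
grouped=$(printf '%s\n' "$line" | grep -c '^\([^~]*~\)\{4\}this~' || true)

# Ungrouped: \{4\} applies to the tilde alone, so the line would need "~~~~".
ungrouped=$(printf '%s\n' "$line" | grep -c '^[^~]*~\{4\}this~' || true)

# prints: grouped=1 ungrouped=0
echo "grouped=$grouped ungrouped=$ungrouped"
```

The "|| true" only keeps the script going when grep finds nothing and returns a non-zero status.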

Now, you said "specific valueS" in col #5, and the regexp looks only for "this".
To look for "this", "that" and "what" the regexp is:
\(this\|that\|what\)

The escaped parentheses group the alternatives into a single unit. They are needed here, otherwise the alternation would split the whole regexp instead of just the 5th field.
The escaped "|" separates the alternatives. With egrep (grep -E) neither the parentheses nor the "|" take backslashes; with plain grep, "\|" is a GNU extension.
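
The two spellings of the alternation can be compared side by side. This is only a sketch: the three candidate values are made up, and the basic-regexp form assumes a grep that accepts "\|" (GNU grep does; POSIX does not require it).

```shell
# Three candidate field values, one per line.
values=$(printf '%s\n' this that other)

# Basic regexp: escaped parentheses and \| (a GNU extension).
bre=$(printf '%s\n' "$values" | grep -x '\(this\|that\|what\)')

# Extended regexp: the same alternation without backslashes.
ere=$(printf '%s\n' "$values" | grep -Ex '(this|that|what)')

printf 'basic:\n%s\nextended:\n%s\n' "$bre" "$ere"
```

Both forms should select this and that and skip other; -x makes each value match as a whole.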

All together now:
^\([^~]*~\)\{4\}\(this\|that\|what\)~

This looks for "this", "that" or "what" in the 5th field, followed by the ~ that closes the field (the 5th of 21 fields is never the last one).
This is to avoid selecting, for example, "thisRegExpSucks".
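
Put together, a run against a small file might look like the sketch below. The sample file is hypothetical, and the pattern is written in the grep -E spelling. The awk line is an alternative technique, not part of the answer above: it splits each line on "~" and compares field 5 directly.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'aaa~bbb~cccc~dd~this~www' \
  'aaa~bbb~cccc~dd~thisRegExpSucks~www' \
  'this~bbb~cccc~dd~other~www' \
  'aaa~bbb~cccc~dd~what~www' > "$sample"

# Whole lines whose 5th field is exactly this, that or what.
with_grep=$(grep -E '^([^~]*~){4}(this|that|what)~' "$sample")

# Alternative: let awk split on "~" and test the 5th field.
with_awk=$(awk -F'~' '$5 == "this" || $5 == "that" || $5 == "what"' "$sample")

printf '%s\n' "$with_grep"
rm -f "$sample"
```

Both commands should select the first and last sample lines only. The awk form may be easier to read when the field number or the list of values changes often.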
--
Juan Lanus
Last Answered: Jun 6 2008  4:41 PM GMT by Jlanus   210 pts.
Latest Contributors: MarkK   330 pts.