![]() |
|
|||
|
In a database import of over 200,000 records, I noticed that a length of 30 chars was not long enough to hold all email addresses.
With assistance of a short awk script I could quickly determine the optimal length: Code:
$ awk -f field_length.awk import_file.csv 111628 065031051@rrrrrr.com 322626 0594500515@wwwwww.co.za [snip] 361682 huilin19 78@wrwrwrwr.c ;om xxxxxxxxx@hotmail.com 146 The awk script: Code:
BEGIN {
SIZE = 0 ;
FS = "," ;
}
# field 1 is the primary key
# field 2 is the email address
{
if (length($2) > SIZE ) {
print $1, $2 ;
SIZE = length($2) ;
}
}
END {
print $2 ;
print SIZE ;
}
Code:
$ sed -ne '/361682/p' masterlist_comma.csv 361682 huilin19 78@wrwrwrwr.c ;om Code:
$ awk -f field_length.awk import_file2.csv [snip] 58
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump Last edited by J65nko; 12th December 2011 at 05:17 AM. |
|
|||
|
In tomorrows episode a Perl script to count the number of comma's, and thus the fields, in an .csv file.
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump |
|
|||
|
Consider the following (assuming comma are not found in the data...):
Code:
#!/bin/env perl
use strict;
use warnings;
# array of test strings
my @a = (',', ',,', ',,,');
my $max = 0;
foreach my $line (@a) {
my @matches = ($line =~ /(\,)/g);
$max = @matches if @matches > $max;
}
print "max #occurrences = $max\n";
Last edited by ocicat; 12th December 2011 at 06:01 AM. |
|
|||
|
The issue I had on hand was, that the import file was over 200,000 records. So I needed to know exactly which record/line did not have 56 fields.
I took advantage of the split function which in scalar context returns the number of fields resulting from the split: Code:
#!/usr/bin/perl
use warnings ;
use strict ;
my @temp ;
my $nr ;
while (<>) {
chomp ;
$nr = (@temp = split /,/) ;
print "$nr $temp[0]\n" ;
}
Code:
$ head -10 masterlist_comma.csv | ./split.pl 56 id 56 100625 56 100626 56 100627 56 100628 56 100629 56 100630 56 100631 56 100632 56 100633 I checked these culprits by first redirecting to file fields_count.txt and then use grep(1) Code:
$ grep -v '^56 ' fields_count.txt $ There also were not any in the other 200,000.PS why grep -v? Code:
-v Selected lines are those not matching any of the specified
patterns.
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump |
![]() |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| MySQL database import not helpful | J65nko | General software and network | 2 | 10th December 2011 01:45 AM |
| When is a database not so relational? | J65nko | News | 1 | 23rd May 2011 05:38 AM |
| zfs import | terryd | FreeBSD General | 1 | 23rd November 2008 08:26 PM |
| import physical freeBSD into VMWARE (ESX) server as a vServer | ccc | FreeBSD General | 6 | 3rd October 2008 07:04 AM |
| PHP database interfaces | TerryP | Programming | 6 | 11th September 2008 01:03 PM |