Posted on 2007-10-11 17:31:12-07 by mapcon
Optimising use of
Hello I have a large (1-2 million) number of points that I need to compare against several thousand polygons and list which polygon the points fall in. The code (below is working) but is quite slow on some larger and more complex polygons. I could simplify/generalize the line work of the polygons but this is not desirable for accuracy reasons. Are there any suggestions on ways to optimize this, perhaps by doing preliminary checks before passing the point to the contains_point routine? Thanks in advance for your suggestions.
-------------------------------------- BEGIN { use File::Basename; use Geo::Shapefile; # create batch listing file open (BATCH, ">> $ARGV[2]") or die "could not open $ARGV[2]"; # initialise the hash for the # of PC's in each Forest, Forest attributes, etc my %pcs_in_forest = (); my %Attribute = (); my $FS = (); # get State (abbreviated) from directory name $STATE = uc(substr(dirname($ARGV[0]),-8,2)); # determine zone the index is in from the .prj file (my $projection_file = $ARGV[0]) =~ s/\.shp/\.prj/; # strip off the extension and path informatio +n first open(PRJ,$projection_file) or die "could not open Projection file $projection_file for the index $ +ARGV[0]"; # get UTM Zone number from .prj file string, note that zones < 10 are 1 digit and zones over 10 are + 2 digits $zone_line = <PRJ>; if (uc(substr($zone_line,27,1)) eq "N") { $ZONE = "0" . uc(substr($zone_line,26,1)); } else { $ZONE = uc(substr($zone_line,26,2)); } # create the name of the forest service polygon area for the appropriate zone for this index $fs_poly_name = $ARGV[1] . "/National_Forest_UTM". $ZONE . ".shp"; printf("%s gives Zone = %d so will use file %s\n",$projection_file,$ZONE,$fs_poly_name); } # end of BEGIN # Main follows { # get the polygons file to test the index against my $fs_poly_obj = new Geo::ShapeFile($fs_poly_name); # open the polygon shape file to get its info # get the index file to process my $fs_photo_obj = new Geo::ShapeFile($ARGV[0]); # open the point shape file to get its info my $total_photos = $fs_photo_obj->shapes(); # Now loop through all the points in the index for $this_photo (1 .. $total_photos) { my $shape_photo = $fs_photo_obj->get_shp_record($this_photo); my $photo_centroid = new Geo::ShapeFile::Point(X => ($shape_photo->x_min() + $shape_photo->x_max( +))/2, Y => ($shape_photo->y_min() + $shape_photo->y_max())/2); # and test each of them against the polygons for my $this_forest (1 .. $fs_poly_obj->shapes()) { my $shape_poly = $fs_poly_obj->get_shp_record($this_forest); %Attributes = $fs_poly_obj->get_dbf_record($this_forest); $FS = $Attributes{"FOREST_ID"}; # name of the forest if ($shape_poly->contains_point($photo_centroid)) { # we have a point in the forest $pcs_in_forest{$FS} ++; # add PC to count for this forest # end checking if we have > 50% of photos from this index in any one forest die "More than 50% of PC's in $ARGV[0] are in forest $FS so quitting\n\n" unless ($pcs_in_fores +t{$FS} < $total_photos/2); } } # end of inner for $num_photos ++;; # count how many pcs in the file for later checks } # end of outer for } # end of MAIN END { my $max_pcs = -999; my $max_forest = ""; foreach $FS (keys %pcs_in_forest) { # find the forest with most photos if ($pcs_in_forest{$FS} >= $max_pcs ) { $max_pcs = $pcs_in_forest{$FS}; $max_forest = $FS; } printf("File %s in forest %s has %d pcs of %d and is in state %s\n",$ARGV[0], $FS, $pcs_in_forest +{$FS}, $num_photos, $STATE); } # end foreach loop if ($max_pcs == -999) { printf("%s with %d photos falls in no forest in %s but is in state %s\n",$ARGV[0], $num_photos, +$fs_poly_name, $STATE); } else { printf BATCH "mv %s %s/%s\n",$ARGV[0], $STATE, $max_forest; # create the mv command } # end if }
Direct Responses: Write a response
Perl Weekly newsletter
A free weekly newsletter for people who are busy to read all the blogs. click here to check it out.