Error Detection and Correction of Hypsography Layers
Kevin S. Larson
As with any large data set, the Digital Chart of the World (DCW) has some errors resulting from
the map digitizing and subsequent processing. The problem then becomes how to detect these
errors, and possibly correct them, as functions like TOPOGRID will not give desired results with
incorrect input. Fortunately, the DCW hypsography layers have a systematic labeling of the data,
allowing for a systematic solution.
Errors within the hypsography layers are detected using ARC/INFO's vector and raster tools.
Contour errors are detected first. To detect them, the elevation difference between
an arc and its neighbor is checked to determine whether it is within a specified range. ARC/INFO's
raster tool EUCALLOCATION forms a polygon zone for each arc, allowing the neighbor
of each arc to be found and the elevation difference to be calculated. Arcs not within the
specified range are flagged as errors. A similar approach is then applied to the points. The
contours are also used with the points; however, the actual values between the two neighboring
contours are needed. Here, the boundaries formed by EUCALLOCATION are expanded back to the
original arcs' locations with ARC/INFO's COSTALLOCATION function. Points with an elevation value
outside the elevation range of the neighboring contours are flagged as errors.
The only case in which data point correction can be automated is when the point data have been
generated based on another layer. The DCW supplemental point hypsography layer represents
locations and values of collapsed contours. Because they are collapsed contours, their elevations
will be based on the surrounding contours. Contour correction, by contrast, would be much more
difficult, and less certain, because more data than just an arc's neighbor are needed.
Data errors are flagged and, where possible, corrected after processing. Using the data to
detect and correct itself helps keep problems associated with incorporating external ancillary
data, such as registration and projection differences, from complicating the situation.
Two major limitations are present in this solution. First, raster processing cannot represent vector
data exactly. Second, the raster functions used are slow, taking several hours for results on an
average DCW 5 degree tile.
1.1.1 Database Description
In 1992, the Defense Mapping Agency (DMA) contracted Environmental Systems Research
Institute (ESRI) to digitize its 1:1 million Operational Navigation Charts (ONC) of the world
to produce the database known as the Digital Chart of the World (DCW). DCW is divided into 2,090
5x5 degree tiles and 3 larger tiles for Antarctica. DCW has a total of 17 layers, not all of which
are present for every tile. The focus of this paper is on the following 4 layers:
- HYNET - primary hypsography contours.
- HYPOINT - hypsography points.
- HSLINE - supplemental hypsography contours.
- HSPOINT - supplemental hypsography points of collapsed contours.
Information for coastlines comes from PONET - political boundaries. These names coincide with those in
ESRI's ARC/INFO version of DCW.
The contour interval in HYNET is 1,000 feet, although in a few cases the interval is 2,000 feet.
HSLINE contours, when present, are at 250 foot intervals in areas below 1,000 feet in elevation, and
at 500 foot intervals where elevations are greater than 1,000 feet (evenly divisible by 500
but not 1,000).
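The interval rules above can be expressed as a small check. The following is a minimal sketch, not part of the original ARC/INFO processing: the function names are hypothetical, elevations are assumed to be whole feet, and the occasional 2,000 foot HYNET interval is not modeled.

```python
def is_primary_level(elev_ft: int) -> bool:
    """HYNET level: a multiple of 1,000 feet (2,000 foot cases are not modeled)."""
    return elev_ft % 1000 == 0


def is_supplemental_level(elev_ft: int) -> bool:
    """HSLINE level: 250 foot steps below 1,000 feet, odd multiples of 500 above."""
    if is_primary_level(elev_ft):
        return False  # multiples of 1,000 belong to HYNET
    if elev_ft < 1000:
        return elev_ft % 250 == 0
    return elev_ft % 500 == 0


# 750 and 2,500 are valid HSLINE levels; 1,250 is not.
print(is_supplemental_level(750), is_supplemental_level(2500), is_supplemental_level(1250))
```

A label that is neither a primary nor a supplemental level under these rules would be suspect before any neighbor comparison is made.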
1.1.2 Problem Description
The four hypsography layers have errors in the elevation labeling. HYNET has three types of
errors:
- 1) ONC map boundaries labeled as elevation contours;
- 2) arcs used to close off polygons labeled as elevation contours (HYNET was defined to have elevation zone polygons, which at times
required connecting arcs);
- 3) mislabeled elevations.
This analysis can detect error two, but
a preprocessing step would eliminate this error. HSLINE has mislabeled elevations. The errors
for HYNET and HSLINE do not occur often, and have no distinctive pattern.
Figure 1 shows an example of a mislabeled contour. The dashed line represents a line that should
be labeled 12,000 instead of 13,000, based on the surrounding arcs and the original ONC
(bounding box: -70 -35.15 -69.85 -35, DCW tile HD22).
Of the two point layers, HSPOINT was the only one with any errors. HYPOINT was included
for completeness, though points that fall outside a specified elevation zone and have a data
flag indicating an unknown location are removed. Analysis of the HSPOINT errors suggests
there may be a pattern. It appears that sometimes when an HSPOINT point was given an elevation, its
value was based on the elevation of the surrounding HYNET contours, and did not consider the
values of any HSLINE contours that were closer to the point. An example of
mislabeled points is in the Horn region of Africa, where there are HSLINE contours along the
coast at 500 and 750 feet, and between them HSPOINT has values of 1000, which is the first
primary contour from HYNET. Some points were also labeled with values that did not
correspond to any of the surrounding contours.
Figure 2 shows an example of the mislabeled points. The 1000 foot point elevations to the right
of the dashed line should be 500 feet (bounding box: 38.1 -6.5 39.0 -6, DCW tile QF22).
The described errors need to be addressed for digital elevation model (DEM) generation, as part
of a project to generate a 30 arc second DEM of the world. The generated DEM will inherit any
errors from the source data. Another issue is that some errors are severe enough, or numerous
enough, to cause the surface generation software to terminate execution prematurely, due
to its inability to model a surface over inaccurate data. A surface generated with these errors will
make some applications, such as hydrologic analysis, inaccurate.
The reason error detection can be automated is that the data exist in a defined format. As
described earlier, the primary contours exist at 1000 foot intervals, sometimes with 500 foot
contours between them. Therefore, the difference between neighboring contours must be either
1000 feet when there are no supplemental contours present, or 500 feet when supplemental contours
are present. In areas below 1000 feet, 250 feet is
added to the definition when supplemental contours are present. With this definition, all that is
needed is to find a neighboring contour's value and check that the difference is in
the defined range. If it is not in that range, a possible error has been detected.
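The range test described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the project: the function name is hypothetical, elevations are whole feet, the rare 2,000 foot HYNET interval is ignored, and neighbors are assumed never to share an elevation.

```python
def neighbor_difference_ok(left_ft: int, right_ft: int) -> bool:
    """Return True when two neighboring contour elevations differ by an allowed step."""
    diff = abs(left_ft - right_ft)
    allowed = {500, 1000}
    if min(left_ft, right_ft) < 1000:
        allowed.add(250)  # 250 foot supplemental contours exist only below 1,000 feet
    return diff in allowed


# A contour labeled 13,000 next to an 11,000 foot contour is a possible error;
# the same contour labeled 12,000 passes.
print(neighbor_difference_ok(11000, 13000), neighbor_difference_ok(11000, 12000))
```

A failed test only says that one of the two contours is suspect; it does not say which one.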
Because the supplemental points are collapsed contours, they fall into the same definition as
above. The contours are also flagged as increasing or decreasing in value. Therefore if the
elevation zone is increasing, any found points within the zone must be the larger value. If the
zone is decreasing, any points must be the smaller value.
The goal is to detect, and either correct or remove, the incorrect elevations from the four layers
automatically, though some manual processing will be required. Automatic processing was done
using tools available in ARC/INFO from both the vector and raster environment. The intent is
also to make the process generic enough so that it may be applied to other data sets, and not just
for this project.
- 1. Retrieve DCW tile.
- 2. join_cov = HYNET + HSLINE + PONET coast line,
storing the original arc number, original coverage, and elevation.
- 3. poly_cov = polygon elevation zones of each contour from join_cov. Zone boundaries are defined as the line equidistant separating surrounding contours.
- 4. poly_cov's arcs = left and right elevations from the elevation zone from step 3.
- 5. Check poly_cov's arcs: the left and right
elevations must be in the appropriate range; if not,
flag as a possible error, going back to the originals
to correct possible errors.
- 6. Flag possible arc errors in the original coverages.
- 7. Correct any arc errors manually.
- 8. poly_cov2 = join_cov polygon zones of neighboring contours,
containing the elevation range of the neighboring contours.
- 9. HSPOINT_tmp = HSPOINT intersect poly_cov2 on elevation and elevation
ranges not flagged.
- 10. if not (lower elevation less than or equal to HSPOINT_tmp point
less than or equal to higher elevation) then
- flag as error, HSPOINT_tmp point = higher elevation
- elseif zone flagged as error then
- flag point as uncheckable.
- - When in a depression, the elevation used is the lower value.
- 11. Repeat steps 9 and 10 for HYPOINT, except only flag the error,
and remove points that are flagged as errors and also carry an
unknown location flag.
- 12. While not done, goto step 1.
The previous twelve steps were performed with ARC/INFO Revision 7.0.2, using a combination
of vector and raster routines to achieve the stated objectives.
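Step 10 of the list above can be sketched in a few lines. The class and function names below are hypothetical, and the sketch makes one ordering choice that is an assumption: a zone flagged in step 5 is tested first, so that its points are reported as uncheckable rather than corrected against a range that may itself be wrong.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    low_ft: int               # lower bounding contour elevation
    high_ft: int              # higher bounding contour elevation
    flagged: bool = False     # a bounding contour was flagged as a possible error
    depression: bool = False  # elevations decrease into the zone


def check_supplemental_point(point_ft: int, zone: Zone) -> tuple:
    """Return (elevation, status) with status 'ok', 'corrected' or 'uncheckable'."""
    if zone.flagged:
        return point_ft, "uncheckable"
    if zone.low_ft <= point_ft <= zone.high_ft:
        return point_ft, "ok"
    # Out of range: use the higher contour, or the lower one in a depression.
    corrected = zone.low_ft if zone.depression else zone.high_ft
    return corrected, "corrected"


print(check_supplemental_point(3000, Zone(2000, 2500)))
```

For HYPOINT (step 11) the same test would apply, with the correction replaced by a flag.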
Information needed for each contour arc includes its left and right neighbors, which define a zone
between them with a value range. With this information, ranges for each arc can be determined.
There is no tool in ARC/INFO to do this automatically. As shown in Figure 3, there is no
automatic way for A to know its left neighbor is B, and its right neighbor is C.
The next approach is to buffer each arc until it meets its neighbor, which is also being buffered.
This point of contact is a line equidistant between the two arcs. Refer to Figure 4.
These lines now define polygons for the zone of influence of each contour arc. However,
ARC/INFO does not have a vector tool to do this. The vector routine BUFFER will not stop at
the equidistant point; it only expands out to a specified distance. ARC/INFO does have a raster
tool to do this, EUCALLOCATION. EUCALLOCATION computes for each cell the value of its
closest source cell (in Euclidean distance). This works because A, B, and C all have different
but consistent values, so it makes no difference how many distinct arcs define A,
B, or C, since all arcs of a contour should have the same value. When expanded using the internal
arc number, the appropriate value can be retrieved directly. Using the internal arc number creates
multiple polygons with the same elevation, so these polygons must be dissolved
into one polygon for later processing. The
boundary arcs can now be used to determine the proper elevation range, and if a contour is out of
range, the appropriate arcs can be flagged using the original arc number.
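The allocation idea can be illustrated on a toy grid. The sketch below is a brute-force stand-in written in plain Python, not ARC/INFO's EUCALLOCATION; the function names are hypothetical, `None` stands for a no data cell, and for brevity the source cells carry elevations directly instead of internal arc numbers.

```python
from math import hypot


def euclidean_allocation(grid):
    """Give every cell the value of its nearest source cell (straight-line distance)."""
    sources = [(r, c, v) for r, row in enumerate(grid)
               for c, v in enumerate(row) if v is not None]
    out = []
    for r, row in enumerate(grid):
        out_row = []
        for c in range(len(row)):
            nearest = min(sources, key=lambda s: hypot(s[0] - r, s[1] - c))
            out_row.append(nearest[2])
        out.append(out_row)
    return out


def boundary_pairs(alloc):
    """Collect (lower, higher) values wherever two different zones touch."""
    pairs = set()
    for r, row in enumerate(alloc):
        for c, v in enumerate(row):
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < len(alloc) and nc < len(row) and alloc[nr][nc] != v:
                    pairs.add((min(v, alloc[nr][nc]), max(v, alloc[nr][nc])))
    return pairs


# Three contour cells in one row; the gaps are split at the midpoints.
zones = euclidean_allocation([[1000, None, None, 2000, None, None, 3000]])
print(zones)
print(boundary_pairs(zones))
```

Each pair returned by `boundary_pairs` corresponds to the left and right elevations of one equidistant boundary, which is the information the range test needs.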
The supplemental point's elevation is determined by looking at the surrounding arcs. See Figure 5.
The values of A and B need to be considered to determine the value of K. However, ARC/INFO
does not provide a tool to find all the arcs surrounding a point; NEAR only finds the closest arc.
Finding the closest arc would work when used with the results from the process illustrated in
Figure 4, if it were known on which side of A or B the point K was found. If it is assumed A is
closer, then with that information A's neighboring value on that side would determine the range K
should be within. Unfortunately, there is no utility available to determine this in ARC/INFO, and
the following case complicates matters.
Assume that a supplemental contour is added to the problem as shown in Figure 6. The
supplemental arcs in DCW are not continuous, so the case in Figure 6 could happen. The
location of K along A determines the elevation range K belongs to (A - B: 2,000 -
3,000, A - E: 2,000 - 2,500). For A itself, if B and C have proper values, A's range can be
determined, since they are all primary arcs, and all E needs is A's and B's values to determine its
range. If there is an error, the appropriate arcs can still be flagged, though only the individual arc
segments will be flagged.
In Figure 6, if A is one arc, there is no single arc on the left of A defining a value range, and if K
is closest to A, there is no way to determine which value range it belongs to. An alternative is to
use the equidistant arcs, and find the closest arc there, with the closest existing value arc.
Consider the following, though.
In Figure 7, K is on the left of A, with the defined zone being A and B, and K is closest to A.
When finding the closest equidistant arc, though, the arc between A and C is returned, not the arc
between A and B; therefore that approach cannot be used.
The problem is then how to define the value range K belongs to. The solution is to take the
defined equidistant arcs and expand them back to their original locations. As noted earlier, there
is no vector tool available to do this. EUCALLOCATION cannot be used either, as it does
not define a way to stop the expansion at a specific location.
Figure 8 shows the results of using EUCALLOCATION to expand the arcs back. The problem is
that the original arcs are not located equidistant to one another. There is another ARC/INFO
raster routine available to stop the expansion at specific locations, COSTALLOCATION.
COSTALLOCATION is designed for cost analysis, expanding cells of a grid into no data cells
like EUCALLOCATION, but also allowing the expanded values to be affected by a cost grid. This
cost grid also allows the process to define areas it cannot expand into by assigning those areas to
no data. Setting the cost grid to no data where the original arcs exist, and the value to 1 everywhere
else, will allow the equidistant arcs to be expanded back to the original arc locations, defining
elevation zone polygons. This will allow the proper value of points to be determined based on the
output from COSTALLOCATION.
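The barrier behavior described above can be sketched with a multi-source shortest path search. This is a plain Python illustration of the idea, not ARC/INFO's COSTALLOCATION: the function name, the 4-neighbor moves, and the use of `None` for no data are all assumptions.

```python
import heapq


def cost_allocation(sources, cost):
    """Spread source values outward; cells whose cost is None are never entered."""
    rows, cols = len(cost), len(cost[0])
    best = [[float("inf")] * cols for _ in range(rows)]
    alloc = [[None] * cols for _ in range(rows)]
    heap = []
    for r in range(rows):
        for c in range(cols):
            if sources[r][c] is not None and cost[r][c] is not None:
                best[r][c] = 0.0
                alloc[r][c] = sources[r][c]
                heapq.heappush(heap, (0.0, r, c))
    while heap:
        dist, r, c = heapq.heappop(heap)
        if dist > best[r][c]:
            continue  # stale heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                step = (cost[r][c] + cost[nr][nc]) / 2
                if dist + step < best[nr][nc]:
                    best[nr][nc] = dist + step
                    alloc[nr][nc] = alloc[r][c]
                    heapq.heappush(heap, (dist + step, nr, nc))
    return alloc


# Original arcs sit at columns 0, 4 and 8 (barriers). The equidistant arcs at
# columns 2 and 6 carry the elevation range of the zone they belong to.
cost = [[None, 1, 1, 1, None, 1, 1, 1, None]]
seeds = [[None, None, (1000, 2000), None, None, None, (2000, 3000), None, None]]
print(cost_allocation(seeds, cost))
```

The barrier cells themselves stay unallocated in this sketch, which corresponds to the leftover no data cells along the original arcs.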
There is another case in which this approach is not effective. Figure 9 shows the defining value
zone stops at arc A. Any point that falls in this zone would not be affected by values defined by
zone B, since these are values from A and C. This is a special case, where points falling in zone C
should be less than or equal to, or greater than or equal to, A, depending on whether elevations are
increasing or decreasing. The solution used is to assign a special value to these polygons, to
indicate how to process the point here. In these cases, only errors can be determined, as there is
no proper way to determine the point's proper elevation. The problem arises since there is no
separating elevation zone to expand back.
There is one other problem, shown in Figure 10. After COSTALLOCATION is run, there are still zones of
no data. These regions of no data are where the original arcs were located. Since points falling in
these locations could technically fall in either zone, EUCALLOCATION is run on the output of
COSTALLOCATION (after the special polygons are filled) to fill these no data regions.
After working with the four layers, the results include corrected or deleted contours and points.
When the appropriate elevation could not be determined either automatically or interactively, the
data were deleted. This reasoning is based on the need for accurate data. When information was
updated, it was also placed into a separate coverage. This will allow someone in the future to
reference which points and contours were changed directly.
Though not inherent in the algorithm, there is a mix of interactive and automated steps. As noted,
the contours with possible errors are only flagged. Therefore it is necessary for the user to go
back and look at the coverage with flagged contours, see if there are any errors by observation or
by referencing the original ONC, determining which arcs may be incorrect. Since there is no way
to automatically determine which contour was labeled incorrectly, without more processing or
information, the arcs on both sides are flagged, though only one arc may be incorrect. Within any
incorrect elevation zone, the point elevations cannot be checked, which may require running the
program again to check them, or checking them interactively.
A definite problem is the inability of raster space to accurately represent vector space. This may
cause original contours to be lost when they are rasterized, therefore causing false flagging of
errors. To help eliminate this problem, a cell size of 3.75 arc seconds was used, resulting in 960
cells per degree. This eliminated many instances of false flagging, but not all of them.
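The grid dimensions implied by this cell size can be checked with a short calculation. The helper name is hypothetical, and the tile is assumed to be exactly 5 degrees on a side.

```python
ARC_SECONDS_PER_DEGREE = 3600


def cells_per_degree(cell_size_arcsec: float) -> float:
    """Number of cells along one degree for a given cell size in arc seconds."""
    return ARC_SECONDS_PER_DEGREE / cell_size_arcsec


per_degree = cells_per_degree(3.75)    # 960 cells per degree
tile_cells = int(5 * per_degree) ** 2  # 4,800 x 4,800 cells in a 5 degree tile
print(per_degree, tile_cells)
```

At this resolution a single tile holds roughly 23 million cells, which helps explain why the raster steps dominate the processing time.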
There is also a problem with the data on the edge of some ONCs where edge matching is not
correct. This caused gaps with some contours, causing the zones in step 6 of the algorithm to
bleed across their appropriate boundaries. When this occurs, it could result in changing/deleting
correct points within the data. Fortunately this is not common.
The process used to determine the point elevation values and to flag the appropriate arcs is time
consuming, especially with the cell size being used. Processing a five degree tile requires about a
day on a Data General AViiON 410 processor.
To speed up processing, the steps involved in flagging and correcting the incorrect points
could be eliminated, saving 60 to 80% of the processing time. This means the elevation zone
process would be skipped, allowing points to be checked only against their nearest contour.
Without the zone definition, a point can only be checked against the elevation range
appropriate for a given contour. For example, if a point is near a contour evenly divisible by 1000,
then an appropriate range is ±1000 feet. If it is near a supplemental contour, then its range is ±500
feet, or ±250 feet when below 1000 feet. This approach flags few errors, since most mislabeled points
are incorrect due to intervening supplemental contours, which are within the specified 1000 foot
range.
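The simplified nearest-contour test can be sketched as follows. The function names are hypothetical and elevations are assumed to be whole feet; the example also shows why this shortcut misses the typical error.

```python
def nearest_contour_tolerance(contour_ft: int) -> int:
    """Allowed deviation, in feet, of a point from its nearest contour."""
    if contour_ft % 1000 == 0:
        return 1000  # primary contour
    if contour_ft < 1000:
        return 250   # supplemental contour below 1,000 feet
    return 500       # supplemental contour above 1,000 feet


def point_within_tolerance(point_ft: int, contour_ft: int) -> bool:
    return abs(point_ft - contour_ft) <= nearest_contour_tolerance(contour_ft)


# A point labeled 1,000 next to a 750 foot supplemental contour still passes,
# even if it lies between the 500 and 750 foot contours.
print(point_within_tolerance(1000, 750), point_within_tolerance(1000, 500))
```

Only the second case is flagged, so a mislabeled point is caught or missed depending on which contour happens to be nearest.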
Processing could be improved if it could all be vector based instead of using raster tools. This
would eliminate the problem of contours being too close to model accurately in a raster.
Limitations aside, this processing flow was important to the processing of the DCW
hypsography data for Africa and South America. It has allowed the use of data that otherwise
would not have been included in processing.
Kevin S. Larson
Berger & Co.
2828 Routh Street, Suite 350, Lock Box 17
Dallas, Texas 75201
Telephone: (214) 922-8010
FAX: (214) 922-8020
http://www.berger.com