Friday, January 6, 2012

Tips for caching with ArcGIS Server 10

I recently received a request for best practices for map caching. Below was my response.

First, read the help. The help has really improved on caching so it is well worth the read even if you have read it all before.

Here are a couple of my additional suggestions:
  • For basemaps, use Mixed as the image format. This allows your basemap to overlay other basemaps. Try out different compression values before you build your cache: with imagery I’ve found 55 is good, but for vector basemaps 90 is better. As the compression quality value decreases, image quality decreases but the tiles get smaller and faster to deliver.
  • Use MSD based services for caching whenever you can. If you have aerial imagery in a mosaic dataset you might find that the MSD service applies a stretch to the imagery. This is documented in NIM-070858. There is a workaround in that bug description that you can use if you run into this. This is fixed in 10.1.
  • Use WGS 1984 Web Mercator (Auxiliary Sphere) as your coordinate system and use the standard Esri/Google/Bing tiling scheme. This is the standard for the web, and there is no reason to use anything else. If you have 6 inch resolution imagery you can build your cache down to 576, but I wouldn’t go any further than that. Also, don’t remove any scales from the tiling scheme; if you don’t want a scale, just don’t build it, but leave it in your tiling scheme. I know I am taking an extreme position on using the standard coordinate system and tiling scheme, but if we all do this we will contribute to the same web GIS platform instead of our own web GIS silos.
  • Always test your cache settings before building a cache. I like to use on-demand caching to get a quick visual of what the cache will look like before I build. If you want to get an idea of how big the cache will be and how long it will take to build, build 5% of your cache using cache by feature class. When evaluating the cache, use a tool like Fiddler or Firebug to inspect the tiles. If they are bigger than 50 KB you probably need to change your image format or compression.
  • Always use compact cache. It is faster to build and the same speed for delivering tiles.
  • Build your cache with a staging server. You should never build cache directly on a production system. It can also be handy to have a big physical box somewhere for building cache. If you store your cache on a storage device like a SAN you can have your big cache builder machine build the cache in the same server directory as the production system. The easiest way to do this is to copy the configuration file (.cfg) from your production system to your cache building / staging system.
  • If you are dealing with vector data, project it all to WGS 1984 Web Mercator (Auxiliary Sphere) in a local (absolute path on the SOC machine) file geodatabase and use an MSD based map service.
  • Use on-demand caching very sparingly. It is rarely useful outside of testing.
  • After the cache is built you can change the map document used by the service. This is useful for reducing the amount of RAM used on the production system and removing a dependency on the source data. Once the cache is built, the content of the map document is only used for queries so it makes sense to simplify that map for just those items that will be queried.
  • If you are using cache by feature class make your features REALLY BIG. The help says this too. For a state like California I might only use 15 or 20 features. For a small city or county you probably don’t even need to do cache by feature class.
  • Read all of the help. There is a lot of great stuff in there.
  • The ArcGIS Server Blog also has some great content on map caching.
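The standard Esri/Google/Bing scheme recommended above simply halves the scale (and cell size) at each level, which is also why leaving unused scales in the scheme costs nothing. A quick sketch, assuming the usual level-0 values for the 256 px Web Mercator scheme:

```python
# Scales and resolutions of the standard Web Mercator tiling scheme.
# Assumes the usual level-0 values for 256 px tiles; both halve at each level.
LEVEL0_SCALE = 591657527.591555      # map scale denominator at level 0
LEVEL0_RES = 156543.03392804097      # meters per pixel at level 0

def scheme_level(level):
    """Return (scale denominator, resolution in m/px) for a given level."""
    return LEVEL0_SCALE / 2 ** level, LEVEL0_RES / 2 ** level

for level in (16, 17, 18, 19):
    scale, res = scheme_level(level)
    print(f"Level {level}: 1:{scale:,.0f} ({res:.3f} m/px)")
```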
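The 5% estimating trick in the testing bullet above is just linear extrapolation, and the 50 KB tile check is easy to script against an exploded cache folder. A rough sketch (the folder walk and limit are illustrative, not an Esri API):

```python
import os

def estimate_full_cache(sample_bytes, sample_hours, fraction=0.05):
    """Linearly extrapolate total cache size and build time from a partial build."""
    return sample_bytes / fraction, sample_hours / fraction

def oversized_tiles(cache_dir, limit_kb=50):
    """Yield tile files larger than limit_kb; big tiles suggest the
    wrong image format or compression setting."""
    for root, _dirs, names in os.walk(cache_dir):
        for name in names:
            path = os.path.join(root, name)
            if os.path.getsize(path) > limit_kb * 1024:
                yield path

# 2 GB and 1.5 hours for a 5% build extrapolates to the full cache:
total_bytes, total_hours = estimate_full_cache(2 * 1024**3, 1.5)
print(f"~{total_bytes / 1024**3:.0f} GB, ~{total_hours:.0f} hours")
```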

Caching will change with 10.1 as well so make sure to read the updated help when you start with 10.1. This was humbling for me. I've read the help many times but I found it incredibly useful to start at the beginning of the new 10.1 help and read it all again.

Friday, October 14, 2011

Update all ArcGIS Server Service properties at once

One question I have heard a lot for ArcGIS Server is how to modify the configuration settings for all of my services at once. For example, let’s say I wanted to change my MinInstances across the board to 0. Or better yet, set my MaxInstances for all services to 50. The solution I found for this is PowerShell (Windows only, sorry my Linux friends). I’m using the XML editing capabilities of PowerShell to find the right configuration node and change it. This works recursively on a folder for all files with the extension “.cfg”.

Here is the PowerShell script:

$cfgProperty = "MaxInstances"   # exact (case-sensitive) property name to change
$newVal = 50                    # new value for that property
$folder = "C:\Program Files\ArcGIS\Server10.0\server\user\cfg"
$files = Get-ChildItem -Filter *.cfg -Path $folder -Recurse

foreach ($file in $files)
{
    # Load the configuration XML, update the target node if it exists, and save.
    $xmldata = [xml](Get-Content $file.FullName)
    $node = $xmldata.SelectSingleNode("ServerObjectConfiguration/" + $cfgProperty)
    if ($node -ne $null) { $node.InnerText = $newVal }
    $xmldata.Save($file.FullName)
}

All you need to do is change the first three lines for your system:

  • $cfgProperty – Set this to the exact name of the configuration property you want to change. This is case sensitive, so I recommend opening one of your .cfg files and copying and pasting the name.
  • $newVal – Set this to whatever you want the value for the target property to be.
  • $folder – Set this to the location of your .cfg files.

I just modify the script and paste it into the Windows PowerShell command prompt. Once the code executes, just restart the SOM service (ArcGIS Server Object Manager) and all your settings will be applied.

Hey Linux people. Feel free to add your own script to do this as a comment!

Tuesday, October 11, 2011

Max instances = a really big number (Update)

I am really glad I wrote the Max instances = a really big number blog post back in February. One great thing about it is that it started some really good discussions on this topic. I had some friends call me crazy, but when we sat down and really talked through different scenarios with max instances set really high, my recommendations held up. However, those discussions also revealed a couple of recommendations that I would like to change (aka I was wrong). The two changes I would like to make are:
  1. Set idle timeout much lower than 86400
  2. Avoid using capacity on SOC machines
First let’s discuss why I changed my mind on the idle timeout (the maximum time an idle instance can be kept running). If you need a refresher on this setting, the online help has a good article: Tuning and configuring services. I said that I set my idle timeout to 86,400 seconds, or one day. I did this so I could set the min instances to 0 but still avoid slow responses when people hit the service first thing in the morning. The trouble is, an idle timeout of 86,400 ends up wasting a lot of RAM on the server because unused instances are not cleaned up fast enough. So I have changed my thinking on this, and now I think setting min instances to 1 is the best way to avoid those early morning slow responses. Then you can just take the default for idle timeout. Now, if you have a service where you want to set the min instances to 0 (maybe it is used very infrequently), set your idle timeout higher for that service. For example, you might set the idle timeout for these services to 4 hours (14,400 seconds). I wouldn’t go over 8 hours (28,800 seconds) for the idle timeout, because unused instances just don’t get cleaned up fast enough.
SOC capacity is the other recommendation that I would change. Again, if you need a refresher on this setting, the online help has a good article: Limiting the load on the server with the Capacity property. In my original post I said that you have to set your SOC capacity when setting your max instances to a large number. Now I say: leave your SOC capacity set to unlimited unless you have SOC machines with different levels of capacity. So if one SOC machine has half the CPU and RAM of another SOC machine, I would set the capacity on the smaller machine to avoid outstripping its resources. In most circumstances your SOC machines should be identical. This is another benefit of virtualization, where SOC VMs can be cloned, making them identical.
When your SOCs have the same hardware capacity, you should not set the capacity property. When the capacity property is set, pool shrinking occurs when capacity is reached. Pool shrinking itself puts a significant load on the system, because instances must be destroyed and created to move them around, so the act of pool shrinking will itself reduce the throughput of your server. This is the point I missed in my first post: I didn’t account for the overhead of pool shrinking.
Setting capacity is still a necessary evil in the unbalanced SOC case that I mentioned above and in the case where you don’t have enough RAM on your SOC machines. But in the case where you don’t have enough RAM I say “BUY MORE RAM!” Memory is incredibly cheap and it just doesn’t make sense to try to work around a problem that is so easily fixed with an inexpensive hardware change.
I stick by my recommendation to make the max instances a really big number (like 10,000,000). I even have more reasons why this is good. Take an example where you have a two tier system: web and SOM on tier 1 and SOC on tier 2. This is usually the best configuration for production systems because it allows you to scale out your SOC tier without taking the system down. In this case, if you set your max instances based on the number of cores on the system (the most popular calculations are 1 x cores for file geodatabase data and 2.5 x cores for enterprise geodatabase data), then every time the number of cores changes you would have to update every service. So in this scenario you have to have an outage just to update the max instances.
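As a sketch, those popular per-core calculations (just the 1x and 2.5x rules of thumb mentioned above, not an Esri formula) look like this:

```python
def max_instances_by_cores(cores, data_source="file_gdb"):
    """Rule-of-thumb max instances: 1 x cores for file geodatabase data,
    2.5 x cores for enterprise geodatabase data."""
    factors = {"file_gdb": 1.0, "enterprise_gdb": 2.5}
    return int(cores * factors[data_source])

print(max_instances_by_cores(8))                    # 8
print(max_instances_by_cores(8, "enterprise_gdb"))  # 20
```

Which is exactly the fragility being called out: add cores to the SOC tier and every one of those per-service numbers is stale.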
If you can’t scale the system for cost reasons, you could set max instances for a service to keep the system from reaching capacity. But in this case you are avoiding the main problem and you are passing along poor performance to your users. I could see using max instances in this case as a stop gap measure until you have a chance to scale your system but once you scale up the system, make sure to set your max instances back to a big number.
It is important to understand that there are rarely any absolutes when it comes to optimizing your system. You always have to consider the landscape to make a good choice. I just hope this gives you some more information to make the best choice.

Wednesday, June 22, 2011

CSV files + ArcGIS.com

ArcGIS.com is getting pretty cool. Presentations, built-in queries for use in mobile apps, creating editable layers, and configurable pop-ups are all awesome. But my new favorite thing is leveraging ArcGIS.com with the samples in the JavaScript API. The neat thing about some of these samples is that you can completely configure them through their URL. For example, let’s say you want an interactive map of the tornadoes that just hit the US on 6/20. You could go to NOAA here: http://www.spc.noaa.gov/climo/reports/110620_rpts.html and see a nice list of tornadoes and a map. Unfortunately the map is not interactive, so there is no way to tie an item from the list to a point in the map. I could also download a CSV file of the tornadoes by clicking the CSV link on the page, but then I just have a table of coordinates and values with no map. To solve this problem, I go to the JavaScript samples, expand the ArcGIS.com samples, and select the CSV Data help topic. This sample tells you that it takes several arguments in the URL to configure the application. In our case, the NOAA CSV file is publicly accessible, so we can display those points by plugging that URL in as an argument to the URL for this sample. We also need to download the CSV to look at the field names, so we can tell the sample app which columns store the latitude, longitude, and primary display name. The result is a URL that looks like this:
http://servicesbeta.esri.com/demos/ags/ags_MapwithTable.html?webmap=d5e02a0c1f2b4ec399823fdd3c2fdebd&dataUrl=http://www.spc.noaa.gov/climo/reports/110620_rpts_torn.csv&displayField=Location&title=Tornadoes&subTitle=6-20-2011&latitudeField=Lat&longitudeField=Lon
Notice, the first configuration setting is the webmap that these points are going to overlay. To display the same points on top of the current weather warnings web map you would use a URL like this:
http://servicesbeta.esri.com/demos/ags/ags_MapwithTable.html?webmap=a03a49082c1c4e869c6349d9cdccf2a3&dataUrl=http://www.spc.noaa.gov/climo/reports/110620_rpts_torn.csv&displayField=Location&title=Tornadoes&subTitle=6-20-2011&latitudeField=Lat&longitudeField=Lon
So by simply changing a URL, I can mash up a CSV file with latitude and longitude data on the web with any web map on ArcGIS.com, all for free. Not bad.
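Assembling those query strings by hand is error-prone. A small sketch with Python’s urllib (the parameter names are the ones the sample documents; the helper itself is hypothetical):

```python
from urllib.parse import urlencode

def csv_map_url(webmap, csv_url, display_field, title, subtitle,
                lat_field="Lat", lon_field="Lon"):
    """Build the configuration URL for the JavaScript API CSV-data sample app."""
    params = {
        "webmap": webmap,
        "dataUrl": csv_url,
        "displayField": display_field,
        "title": title,
        "subTitle": subtitle,
        "latitudeField": lat_field,
        "longitudeField": lon_field,
    }
    base = "http://servicesbeta.esri.com/demos/ags/ags_MapwithTable.html"
    return base + "?" + urlencode(params)
```

Swapping in a different webmap id, like the weather-warnings map above, is then a one-argument change.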

Friday, May 6, 2011

Popups in ArcGIS.com are pretty cool and easy to create. Here is a map I just made for stream gauges and weather stations. It is a copy of the USGS map with popups enabled. Just click the points in the map to get more information.


Thursday, April 28, 2011

IGIC Webinar - GIS In the cloud follow up

In my presentation today for IGIC I discussed three aspects of GIS in the cloud.
  • Using ArcGIS.com entirely to share maps
  • Using ArcGIS Online to make you more productive with ArcGIS Desktop
  • Building an ArcGIS Server system in the cloud using Amazon EC2
Below is supporting information in each of these three categories.

Using ArcGIS.com entirely to share maps
The web map that I created in the presentation is accessible here: http://www.arcgis.com/home/webmap/viewer.html?webmap=b825be4db4ae40e6ad7b16629a5fb672

ArcGIS.com documentation:
http://help.arcgis.com/en/arcgisonline/help/index.html#//010q00000002000000.htm

I didn’t even get into ArcGIS Explorer online and making presentations. That is an incredibly powerful concept. ArcGIS Explorer online has its own help pages available here: http://help.arcgis.com/en/arcgisexplorer/help/

These web maps that you create in ArcGIS.com can also be used on an iPhone or Windows Phone 7 as well.

Using ArcGIS Online to make you more productive with ArcGIS Desktop
Desktop documentation for working with ArcGIS online:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Adding_data_from_ArcGIS_online/006600000441000000/
and:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/00sp/00sp0000001z000000.htm

I showed downloading a layer package (Earthquake epicenters and Indiana Big Trees) from ArcGIS.com. There is more information on layer packages here:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s5/00s500000013000000.htm
and creating map packages here:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Creating_a_map_package/006600000403000000/

The drive time analysis Geoprocessing tool that I used in the presentation is available here:
http://www.arcgis.com/home/item.html?id=16f372f4d05d40c3ab6040a98dbcae7d
Connecting to the server from ArcGIS Desktop can be done with this URL:
http://sampleserver1.arcgisonline.com/ArcGIS/Services

Building an ArcGIS Server system in the cloud using Amazon EC2
The resource center gallery for local government where I downloaded the Citizen Service Request Template is here:
http://localgovtemplates2.esri.com/gallery/gallery.html
And you can access the template directly here:
http://www.arcgis.com/home/item.html?id=cf64d38f5d1d4b34867a59073f5cd0b6
Documentation for working with ArcGIS Server on Amazon EC2 is available here:
http://help.arcgis.com/en/arcgisserver/10.0/help/arcgis_server_on_amazon_ec2/index.html

I worked through a replication workflow with the cloud. The ArcGIS Server on Amazon EC2 documentation has more information on this here:
http://help.arcgis.com/en/arcgisserver/10.0/help/arcgis_server_on_amazon_ec2/index.html#/Replication_to_an_Amazon_EC2_instance_using_geodata_services/00rq0000000p000000/
and replication documentation in the ArcGIS Desktop Help:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/0027/002700000020000000.htm


I hope everyone found the webinar useful. If you haven't taken 5 minutes to fill out the survey yet, please click here.

Wednesday, February 2, 2011

Max instances = a really big number

First off, sorry for the long delays between posts. This year I plan to be more active.
So recently I got asked about the max instances used in ArcGIS Server. What should I set it to? How does this relate to capacity?
Max instances is the total number of map worker processes that can respond to a map request, so if a map service has more of these, it has more capacity to respond to requests. I use the term processes here loosely: they are only separate processes (ArcSOC.exe) when your service is set to high isolation. In low isolation, the instances are worker threads inside an ArcSOC.exe process. It is rarely a good idea to use low isolation, so let's just stick with the default high isolation.

So back to the question: what should I set max instances to for a service? My recommendation is 1,000,000 or 10,000,000 or something ridiculously big like that. However, if you do this, you MUST also set the capacity on each SOC machine. Capacity is the total number of instances that a SOC machine will support. If you don't have a good idea of what to set this to, guess (maybe 50). Then monitor your system. It's really difficult to get this number right out of the gate, so system monitoring is really important. If you find yourself at max instances but the machine still has capacity, then raise the number accordingly. It is probably better to guess on the low side and adjust up than the other way around; 25 or 50 is a good place to start.
Now back to that ridiculously big max instances number. The reason I like to set this number so high is that it lets ArcGIS Server shift instances to the popular services as it sees fit.

Let's say the capacity on your system is 10 and you have two map services, A and B.
Scenario 1: Both A and B have max instances set to 5. Say service A becomes wildly popular but nobody is interested in service B. The best you can do is use half the capacity of your system (5 instances).
Scenario 2: Both A and B have max instances set to 1,000,000. Again service A becomes wildly popular but nobody is interested in service B. Now the system can focus all of its capacity where it is needed: on service A. The total number of instances used will only ever equal your capacity.
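A toy model of the two scenarios (my own simplification of how instances get handed out, not actual SOM behavior):

```python
def serveable(demand, max_instances, capacity):
    """Toy allocator: grant each service its demanded instances, capped first by
    its own max instances setting, then by the machine's remaining capacity."""
    granted = {}
    remaining = capacity
    for svc, want in sorted(demand.items(), key=lambda kv: -kv[1]):
        give = min(want, max_instances[svc], remaining)
        granted[svc] = give
        remaining -= give
    return granted

# Scenario 1: a per-service cap of 5 strands half the machine.
print(serveable({"A": 10, "B": 0}, {"A": 5, "B": 5}, capacity=10))
# Scenario 2: huge caps let service A soak up the whole capacity.
print(serveable({"A": 10, "B": 0}, {"A": 10**6, "B": 10**6}, capacity=10))
```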

So I generally recommend this ridiculously high maximum approach, mostly because it is extremely difficult to figure out up front what the overall usage is going to be for a service. This will increase the amount of instance destruction and creation (which is costly), but I think it is worth it to fully utilize your system. You can also adjust your idle timeout to a longer interval to reduce the amount of instance turnover. I like to set my min instances to 0, max instances to 1,000,000, and my idle timeout to about a day (86,400 seconds). That way you won't dip to 0 instances for a service unless it has gone unused for a day. This could lead to a Monday morning groggy server problem, but that can be mitigated with scripts, or by setting min instances to 1 for the services that are always hit Monday morning.