George Armhold's Blog

Ocarina- Optical Character Recognition for Ruby

2012-10-04T19:45:00-04:00

I just published a bare-bones Optical Character Recognizer written in Ruby.

It’s pretty basic, but it does successfully recognize its training set as well as the same characters with added “noise”. It uses a straightforward implementation of a feed-forward neural network.

It uses RMagick/ImageMagick to handle image processing, but apart from that it’s built from scratch!

You can grab a copy of the source from Github.

Blog Moved to Github/Octopress

2012-08-19T12:24:00-04:00

I’m experimenting with Octopress as a potential Wordpress replacement.

I’ve grown tired of trying to properly format code with the Wordpress editor, so I’ve moved my blog from Wordpress (and before that Blogger) to Jekyll + Octopress, hosted on Github.

Let’s see how code formats under Octopress:

print_tree

  def print_tree(node, indent)

    puts indent + "#{node} -> "

    if node.kind_of? Container
      node.children.each do |child|
        print_tree(child, indent + Control::INDENT)
      end
    end

  end

Beautiful!

Pixlshare-Rebooted

2012-05-23T20:06:00-04:00

In order to dive into Ruby, I decided to convert Pixlshare from Java+Wicket to Ruby on Rails.

It’s an admittedly small project, but I was surprised how easy it was. I had it basically working in under a day, and polished enough for production use in under 2 days.

But the best part was the code size: a reduction of 6 to 1 going from Java -> Ruby.

Twilio and Adobe Flash

2012-02-19T09:26:22-05:00

I started doing some Twilio development recently and ran into a problem with Adobe Flash. Twilio Client (which lets you make phone calls right from your browser) relies on the Flash plugin. It pops up this nice little settings dialog the first time it runs to ask your permission:

The problem is that on Chrome, it won’t let you actually click any of those buttons- the dialog is non-responsive to mouse clicks. This was really frustrating, and a few minutes of Googling showed that this was an old problem supposedly fixed by a Flash update months ago.

Updating to the latest Flash didn’t help (it’s apparently bundled with Chrome, and doesn’t use the version you can install manually on OSX).

Then I came across this trick: use tab to navigate the dialog checkboxes, and spacebar to select/deselect. Works like a charm.

Announcing: Wicket-Source plugin for Intellij IDEA

2012-02-16T07:06:07-05:00

The folks at 42lines have released an awesome Firefox plugin called “Wicket-Source”. It allows you to easily navigate from your browser to the corresponding Wicket source code.

Since their plugin is Eclipse-based, I wrote up a compatible plugin for Intellij IDEA. You can install it from the repository, or build it yourself from the source on Github.

There are two parts to this plugin: the Firefox extension (provided by 42lines) and the IDE plugin; you need both. To install the Firefox plugin, follow the directions from 42lines. Then to install the Intellij plugin do the following:

Open the Preferences dialog (Intellij IDEA menu -> Preferences)
Under “IDE Settings” select “Plugins”
Click the “Browse Repositories” button.
In the search box type “wicket”, which should narrow the results significantly.
Right-click “Wicket Source”, and select “Download and Install”.

You’ll be asked to re-start Intellij, and then you should be in business. The plugin uses port 9123 and no password by default (same as the Firefox plugin defaults). To change this, open the IDE Settings dialog and click “Wicket Source” to enter a password.

Enjoy!

Wicket: submitting a form over SSL from an unsecured page

2012-01-21T13:54:37-05:00

Lots of folks are making great use of Twitter Bootstrap, which includes a handy login button right at the top:

To protect your users’ privacy, you should make sure that form is sent over SSL. If the hosting page is https that happens automatically, but most domains don’t secure their entire site; only a subset of pages are typically secured with SSL. But since this header likely appears on all your pages, how can you secure the form?

The first step is to manually adjust the form’s action attribute to ensure that the form submission happens over https, rather than http.

But this is where we run into a problem with Wicket- if the hosting page is http, and you have also installed an HttpsMapper in your WicketApplication like this:

setRootRequestMapper(new HttpsMapper(getRootRequestMapper(), new HttpsConfig(HTTP_PORT, HTTPS_PORT)));

then Wicket will not allow your form to be sent over https; the mapper will notice the http/https mismatch, and instead of calling your form’s onSubmit() method, it will simply serve up the hosting page again, discarding your form submission.

The solution is to manually post your form to a different, secure page that is marked for https via @RequireHttps. Then the HttpsMapper will allow the form submission to take place.

First, we need a LoginForm that will adjust the form’s action attribute to point to our secure page:

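import org.apache.wicket.markup.ComponentTag;
import org.apache.wicket.markup.html.form.PasswordTextField;
import org.apache.wicket.markup.html.form.StatelessForm;
import org.apache.wicket.markup.html.form.TextField;

public class LoginForm extends StatelessForm<Void>
{
    public LoginForm(String id)
    {
        super(id);
        add(new TextField<String>("username").setRequired(true));
        add(new PasswordTextField("password").setRequired(true));
    }

    @Override
    protected void onComponentTag(ComponentTag tag)
    {
        super.onComponentTag(tag);
        // point the form at the @RequireHttps page so the POST goes over https
        String action = urlFor(LoginFormHandlerPage.class, null).toString();
        tag.put("action", action);
    }
}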

Now we’ll need to create a page to handle the form submission:

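import javax.servlet.http.HttpServletRequest;
import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.protocol.https.RequireHttps;
import org.apache.wicket.request.mapper.parameter.PageParameters;

@RequireHttps
public class LoginFormHandlerPage extends WebPage
{
    public LoginFormHandlerPage(PageParameters parameters)
    {
        // read the POST params straight from the servlet request (see note below)
        HttpServletRequest req = (HttpServletRequest) getRequest().getContainerRequest();
        String username = req.getParameter("username");
        String password = req.getParameter("password");

        if (loginSuccessful(username, password))
        {
            // no stray semicolon here, or the fallback would always run
            if (! continueToOriginalDestination())
            {
                setResponsePage(AccountPage.class);
            }
        }
        else
        {
            getSession().error("login failed");
            // on failure send user to our regular login page
            setResponsePage(LoginPage.class);
        }
    }
}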

Note that if you’re using Wicket 1.5.3 there is a bug that prevents the processing of form POST parameters (that’s why we’re reading the params manually from the HttpServletRequest). This is fixed in Wicket 1.5.4.

The LoginFormHandlerPage will process the submitted form data over https, and if successful, log the user in; otherwise it sends them to a page where they can re-enter their password.

You can get all the code (and quite a bit more useful login-related stuff) from github.

Credit where it’s due: the real gem here (submitting the form to a secure url) comes from this blog posting by Petri Kainulainen.

How to get JNDI working with Wicket 1.5 and Jetty 7.5

2011-12-27T22:29:03-05:00

The Wicket 1.5 archetype sets up a project to use Jetty 7.5. Quite a lot has changed in Jetty since version 6, and this broke my JNDI config. Here’s how I put things right again.

First of all, the imports have all been moved in 7.x. Here’s where they landed:

import org.eclipse.jetty.plus.webapp.EnvConfiguration;
import org.eclipse.jetty.webapp.WebInfConfiguration;
import org.eclipse.jetty.webapp.Configuration;
import org.eclipse.jetty.webapp.WebXmlConfiguration;

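Next, you’ll need a jetty-env.xml. (The DataSource class and property names shown here follow the stock Jetty MySQL example and are an assumption; substitute whatever your driver provides.)

<Configure class="org.eclipse.jetty.webapp.WebAppContext">
    <New id="mydatasource" class="org.eclipse.jetty.plus.jndi.Resource">
        <Arg></Arg>
        <Arg>jdbc/mydatasource</Arg>
        <Arg>
            <New class="com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource">
                <Set name="Url">jdbc:mysql://localhost/mydatabase?characterEncoding=utf8</Set>
                <Set name="User">username</Set>
                <Set name="Password">password</Set>
            </New>
        </Arg>
    </New>
</Configure>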

Normally this goes into src/main/webapp/WEB-INF, but you probably don’t want to deploy that with your app in your production war file. So instead I put mine in src/test/jetty/jetty-env.xml. You’ll need to modify your Start.java to tell Jetty to find the relocated config file.

EnvConfiguration envConfiguration = new EnvConfiguration();
URL url = new File("src/test/jetty/jetty-env.xml").toURI().toURL();
envConfiguration.setJettyEnvXml(url);
bb.setConfigurations(new Configuration[]{
    new WebInfConfiguration(),
    envConfiguration,
    new WebXmlConfiguration()
});

I found that I also had to set a couple of environment properties:

System.setProperty("java.naming.factory.url.pkgs",
                   "org.eclipse.jetty.jndi");
System.setProperty("java.naming.factory.initial",
                   "org.eclipse.jetty.jndi.InitialContextFactory");

With this, I can finally access my JNDI datasource happily from Wicket/Jetty.
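For completeness, a lookup from application code might look roughly like the sketch below. It assumes the resource is visible under the conventional java:comp/env prefix (depending on your setup that may also need a matching resource-ref in web.xml). The class and method names are hypothetical; only the name jdbc/mydatasource comes from the config above.

```java
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

final class DataSourceLocator {
    private DataSourceLocator() {}

    // Assumption: webapp-scoped resources are exposed under java:comp/env.
    static String jndiName(String resourceName) {
        return "java:comp/env/" + resourceName;
    }

    // Looks up the DataSource; throws NamingException if nothing is bound
    // under that name or no JNDI provider is configured.
    static DataSource lookup(String resourceName) throws NamingException {
        Context ctx = new InitialContext();
        try {
            return (DataSource) ctx.lookup(jndiName(resourceName));
        } finally {
            ctx.close();
        }
    }
}
```

From a page or service you would then call something like DataSourceLocator.lookup("jdbc/mydatasource") and ask the returned DataSource for a connection.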

Update: I’ve created a gist with the full source code.

Automatically generate Maven dependency coordinates for random Jar files

2011-09-03T12:07:38-04:00

Have you just inherited an Ant project that you’re trying to convert to Maven? Maybe it came with a “lib” directory full of random jar files. And worse, some thoughtless developer neglected to include version strings in the filenames?

Fear not! The Sonatype checksum search REST service can give you the Maven coordinates based on the jar’s SHA1 hash.

Still too much work? Not to worry, I just wrote a quick program to make it even easier for you. Provenance will take a directory full of jar files and write out the XML dependency information for every jar it finds. You can then copy/paste this right into the <dependencies> section of your pom.xml.
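To give a feel for the two mechanical pieces involved, here is a minimal sketch (not Provenance's actual source) that computes a jar's SHA1 hash and formats a set of coordinates as a dependency element. The class and method names are made up for illustration, and the call to the checksum search service is deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class JarChecksum {
    private JarChecksum() {}

    // Hex-encoded SHA-1 of the given bytes, as used for checksum lookups.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a mandatory algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    // Convenience overload: hash a jar file on disk.
    static String sha1Hex(Path jar) throws IOException {
        return sha1Hex(Files.readAllBytes(jar));
    }

    // Formats coordinates (however they were obtained) for pasting into a pom.xml.
    static String toDependencyXml(String groupId, String artifactId, String version) {
        return "<dependency>\n"
             + "    <groupId>" + groupId + "</groupId>\n"
             + "    <artifactId>" + artifactId + "</artifactId>\n"
             + "    <version>" + version + "</version>\n"
             + "</dependency>";
    }
}
```

The hash is what you would hand to the checksum search; the coordinates it returns are what toDependencyXml would format.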

Enjoy.

Adding Git SHAs to Wicket Pages Automatically

2011-08-04T15:58:34-04:00

If you have a non-trivial project, it’s handy to be able to tell what code was used to build a particular release once it’s been deployed. Especially if you’ve recently discovered the joys of branching and merging with Git.

Here’s a handy way to add a Git SHA to all your app’s pages via Wicket and Maven.

Maven

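First, we’ll use the exec-maven-plugin to create a git.properties file for us. Add this to the <plugins> section in your pom.xml:

<plugin>
   <groupId>org.codehaus.mojo</groupId>
   <artifactId>exec-maven-plugin</artifactId>
   <version>1.1</version>
   <executions>
       <execution>
          <phase>compile</phase>
          <goals>
             <goal>exec</goal>
          </goals>
       </execution>
   </executions>
   <configuration>
       <executable>git</executable>
       <arguments>
            <argument>log</argument>
            <argument>--pretty=format:gitsha=%H %ci</argument>
            <argument>-n1</argument>
       </arguments>
       <outputFile>target/classes/git.properties</outputFile>
   </configuration>
</plugin>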

This will create a git.properties file containing the Git SHA, along with the commit timestamp whenever your code is compiled. You can learn how to further customize this here.
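Note that with this format string the gitsha property holds both the SHA and the commit timestamp, separated by a space. If you ever want them separately, a small helper along these lines would do it. This is only a sketch: the GitInfo class is hypothetical, and the sample line in the usage note is an illustrative value, not real output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class GitInfo {
    final String sha;
    final String commitDate;

    private GitInfo(String sha, String commitDate) {
        this.sha = sha;
        this.commitDate = commitDate;
    }

    // Parses the text of git.properties, e.g. "gitsha=<sha> <date> <time> <zone>".
    static GitInfo parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("gitsha", "unknown").trim();
        int space = value.indexOf(' ');
        if (space < 0) {
            return new GitInfo(value, ""); // no timestamp present
        }
        return new GitInfo(value.substring(0, space), value.substring(space + 1));
    }
}
```

For example, parsing "gitsha=4b825dc642cb6eb9a060e54bf8d69288fbee4904 2011-08-04 15:58:34 -0400" yields the 40-character SHA in sha and "2011-08-04 15:58:34 -0400" in commitDate.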

Wicket Application Subclass

Now we’ll need to read in the git.properties file when our application starts up.

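public class Application extends WebApplication
{
    private static final java.util.logging.Logger log =
        java.util.logging.Logger.getLogger(Application.class.getName());

    private String gitSHA = "unknown";

    public Application()
    {
        java.util.Properties props = new java.util.Properties();
        // git.properties is written to target/classes by the exec-maven-plugin
        java.io.InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream("git.properties");
        if (in == null)
        {
            log.severe("git.properties not found on classpath");
            return;
        }
        try
        {
            props.load(in);
            gitSHA = props.getProperty("gitsha", "unknown");
            log.info("gitsha: " + gitSHA);
        }
        catch (IOException e)
        {
            log.severe(e.getMessage());
        }
    }

    public String getGitSHA()
    {
        return gitSHA;
    }

    // getHomePage() and the rest of the application omitted
}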

Wicket WebPage Subclass

Now we’ll create a WebPage subclass that renders the Git SHA into a <meta> tag when the page is rendered.

public abstract class MyPage extends WebPage
{
    @Override
    protected void onBeforeRender()
    {
        Label metaGitSHA = new Label("metaGitSHA", "");
        metaGitSHA.add(new AttributeModifier("content", Model.of(((Application) getApplication()).getGitSHA())));
        addOrReplace(metaGitSHA);
        super.onBeforeRender();
    }
}

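You’ll want to extend MyPage for each of your pages. You’ll need to add the placeholder meta tag to each of your HTML pages, something like this (the wicket:id must match the one used in MyPage; the name attribute here is arbitrary):

<meta wicket:id="metaGitSHA" name="gitsha" content=""/>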

And you’re done!

Pixlshare- an image sharing app

2011-06-06T18:27:00-04:00

Pixlshare is a new image-sharing webapp that I just started working on. It’s intended to be a low-friction way to do simple image sharing- upload an image and instantly get a URL that you can share with others. No accounts or logins needed- just click upload and you’re done.

It’s built in Wicket and a tiny bit of jQuery. It’s fairly basic, but it has one fairly novel feature- you can add textual annotations to your uploaded images; the annotations appear as actual searchable text, rather than merely being part of the image bits.

I’m planning to add features like:

HTML5 drag-n-drop for uploads
upload multiple pictures at once to create an album
user comments

Give it a try!

How to make Ajax links crawlable with GWT and Google App Engine

2011-01-07T13:18:11-05:00

If you care about SEO you know that using GWT has a downside: much of your app’s content is generated dynamically via Javascript, and is therefore invisible to search engines. You might have dozens of pages of awesome content, but all the Googlebot sees is the static HTML page that hosts your app. This can be a real problem.

Fortunately, Google has proposed a solution for crawling ajax content:

change your hrefs to support “bang notation”: www.example.com/ajax.html#!key=value
when Googlebot makes requests of the form: www.example.com/ajax.html?_escaped_fragment_=key=value, you return a static HTML version of the Ajax content
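The mapping between the two URL forms is mechanical. Here is a rough sketch of the "pretty" to "ugly" direction, assuming the escaping rules from Google's proposal (percent-encode %, #, &, + and anything outside printable ASCII). The class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

final class CrawlableUrls {
    private CrawlableUrls() {}

    // Converts ".../page#!state" into ".../page?_escaped_fragment_=state".
    static String toEscapedFragmentUrl(String prettyUrl) {
        int bang = prettyUrl.indexOf("#!");
        if (bang < 0) {
            return prettyUrl; // no hash-bang fragment, nothing to convert
        }
        String base = prettyUrl.substring(0, bang);
        String fragment = prettyUrl.substring(bang + 2);
        // append to an existing query string if there is one
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + "_escaped_fragment_=" + escape(fragment);
    }

    // Percent-encodes the characters that would otherwise break the query string.
    private static String escape(String fragment) {
        StringBuilder out = new StringBuilder();
        for (byte b : fragment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean special = c <= 0x20 || c >= 0x7F
                    || c == '#' || c == '%' || c == '&' || c == '+';
            if (special) {
                out.append(String.format("%%%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }
}
```

On the server side the reverse direction usually comes for free: the servlet container decodes the parameter, so reading the _escaped_fragment_ request parameter gives you back the original fragment.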

So then the problem becomes one of generating static content from your Ajax links. You could do that by hand if your site is small and changes infrequently. More likely you’ll want a way to automate this.

Google recommends using a “headless browser” approach, i.e. using something like HtmlUnit. That’s a fine solution, but if you’re running on App Engine it’s almost guaranteed not to work because of the request timeout. So if you want to run on App Engine, you’re probably going to have to spider your own pages and pre-generate your HTML content.

My solution to this problem is to break the spidering up into small chunks, and farm them out as tasks on App Engine’s Task Queues. Whenever I update my app’s content, I submit a job that spiders the landing page looking for Ajax links. For each link that’s found, I submit a task that recursively spiders the link (taking care not to get into loops). Each task saves the HTML content into the data store, which is then returned as cached static content to Googlebot.

Suddenly my “simple” solution is sounding quite complicated, but it gets the job done reliably. Here’s some code to make it clearer.

I use a CachedAjaxLink data object to persist the static content:

public class CachedAjaxLink implements Serializable
{
    @Id
    private String href;
    private String cachedContent;
    private Date dateCached;
}

Then I use an AjaxCacher which crawls a given link, stores the results as CachedAjaxLinks, and queues Task Queue tasks for each link that it finds:

public class AjaxCacher
{
    protected static final Logger log = Logger.getLogger(AjaxCacher.class.getName());
    protected static final DAO dao = new DAO();

    public static final long PUMP_TIME = 5000;
    protected WebClient webClient;
    protected String crawlServletUrl;

    public AjaxCacher(String crawlServletUrl)
    {
        this.crawlServletUrl = crawlServletUrl;
        webClient = Holder.get();
    }

    public void crawl(URL urlToCrawl, Date crawlRequestTimestamp)
    {
        // URLs we've already queued
        Set<URL> queuedURLs = new HashSet<URL>();
        queuedURLs.add(urlToCrawl);

        try
        {
            HtmlPage page = webClient.getPage(urlToCrawl);

            // appengine hack because it's single threaded
            webClient.getJavaScriptEngine().pumpEventLoop(PUMP_TIME);

            String pageContent = page.asXml();

            CachedAjaxLink cachedAjaxLink = new CachedAjaxLink();
            cachedAjaxLink.setHref(urlToCrawl.getRef());
            cachedAjaxLink.setCachedContent(pageContent);
            cachedAjaxLink.setDateCached(new Date());  // time actually cached
            dao.updateCachedAjaxLink(cachedAjaxLink);

            List<HtmlAnchor> anchors = page.getAnchors();
            for (HtmlAnchor anchor : anchors)
            {
                // only care about ajax links
                if (! anchor.getHrefAttribute().startsWith("#")) continue;

                URL newUrl = new URL(urlToCrawl, anchor.getHrefAttribute());

                // don't queue multiple requests for the same URL
                if (queuedURLs.contains(newUrl)) continue;

                queuedURLs.add(newUrl);

                // prevent loops
                CachedAjaxLink link = dao.getCachedAjaxLink(newUrl.getRef());
                if (link == null || link.getDateCached().getTime() < crawlRequestTimestamp.getTime())
                {
                    queueCrawlRequest(newUrl.toString(), crawlRequestTimestamp);
                }
            }

        } catch (IOException e)
        {
            log.log(Level.SEVERE, e.getMessage(), e);
        }
        finally
        {
            webClient.closeAllWindows();
        }
    }

    /**
     * submits a crawl request to the queue; TaskQueueServlet will then handle the request asynchronously
     */
    public void queueCrawlRequest(String urlToCrawl, Date timeStamp)
    {
        Queue queue = QueueFactory.getDefaultQueue();
        TaskOptions options = TaskOptions.Builder.url(crawlServletUrl);
        options.param("encodedUrl", ServerUtils.encodeURL(urlToCrawl));
        options.param("timeStamp", ServerUtils.fromDate(timeStamp));
        options.method(TaskOptions.Method.GET);
        queue.add(options);
    }

    /**
     * try to cache a copy of the WebClient in ThreadLocal for faster startups on Google App Engine
     */
    public static class Holder
    {
        private static ThreadLocal<WebClient> holder = new ThreadLocal<WebClient>()
        {
            protected synchronized WebClient initialValue()
            {
                WebClient result = new WebClient(BrowserVersion.FIREFOX_3);
                result.setWebConnection(new UrlFetchWebConnection(result));
                return result;
            }
        };

        public static WebClient get()
        {
            return holder.get();
        }
    }
}

Finally, I use a TaskQueueServlet to handle the queued tasks:

public class TaskQueueServlet extends HttpServlet
{
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
    {
        doPost(req, res);
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
    {
        String encodedUrl = req.getParameter("encodedUrl");
        if (encodedUrl == null)
        {
            throw new IllegalArgumentException("missing param: encodedUrl");
        }

        String timeStamp = req.getParameter("timeStamp");
        if (timeStamp == null)
        {
            throw new IllegalArgumentException("missing param: timeStamp");
        }

        String decodedUrl = ServerUtils.decodeURL(encodedUrl);
        URL urlToCrawl = new URL(decodedUrl);

        // the servlet's own path, so that AjaxCacher can queue follow-up tasks back to it
        AjaxCacher cacher = new AjaxCacher(getServletContext().getInitParameter("taskQueuePath"));
        cacher.crawl(urlToCrawl, ServerUtils.toDate(timeStamp));
    }
}

Thanks Google, for making it so easy. ;-)

You can grab all of this code from my gwtquickstarter library. It’s the library that powers the best typing tutor on the web.

Add a badge to your blog with your typing score!

2010-12-28T10:45:09-05:00

Did you get an awesome word-per-minute score on your Quick Brown Frog typing speed test? Well now you can share it with the world by posting a Quick Brown Frog badge on your blog or website:

We just added a new feature to all our typing practice sessions- at the end of the lesson you’ll be shown the badge and the HTML code that generates it. Simply copy and paste the blue HTML code into your blog or website to add the badge for all to see.

New Feature: create a typing practice from random English words

2010-12-28T10:30:10-05:00

We’ve been busy adding new features over the Christmas holidays. The first of these is already available for you to use: you can now create a practice typing session from a set of random English words:

We’ve had numerous requests from users for a feature that would generate random typing lessons based on actual words. You can now generate instant typing practices, simply by choosing the number of words you wish to type.

We’re planning to extend this feature in the future to automatically create lessons targeting letters that you need to practice (the ones you make the most typos with).

Enjoy!

Quick Brown Frog is now available in the Chrome Web Store

2010-12-09T11:46:25-05:00

The best typing tutor on the web is now available for sale in the Chrome Web Store.

It was surprisingly easy to get into the store if you’ve got an existing webapp running:

pay Google $5 (one-time developer fee; not per-app)
create a 16-line JSON manifest file
take some screen shots and create an icon (the hardest part, really)
bundle it into a zip
checkmark a few boxes, add some descriptive text, click “publish”

All you are really doing is bundling up some meta-data so that Chrome users can see your app as being “installed”. Even though my app is very much server-dependent, and has its own concept of user accounts and payments, Google is happy to have it in the store. And I’m happy for the potential extra customers.

It’s obvious that the Chrome Web Store is a great boon for developers. However it’s unclear whether users will actually find this useful, much less flock to it.

My guess is that once we start seeing more apps that really use HTML5 features like local storage, it might take off. I hope it does.

Patrick McKenzie launches Appointment Reminder

2010-12-06T05:43:37-05:00

Patrick McKenzie, a solo entrepreneur whom I admire greatly, has launched his second project: Appointment Reminder.

Appointment Reminder is a service for personal business services (think: hair salons, medical offices, law firms, or anyone that regularly schedules appointments with clients). Appointment Reminder sends out reminders to clients automatically, via phone, SMS or email. Fewer forgotten appointments = increased revenue.

He’s leveraging Twilio, an API that I’m just dying to find an excuse to use.

Congrats Patrick, and good luck!

UI Refresh- new logo and other eye candy

2010-11-30T11:31:35-05:00

I spent some time (and some $$) prettying up the Quick Brown Frog user interface. I bought a logo from Logosamurai. Not bad for $67.

Then I added some gauges to display WPM and accuracy in real-time. They’re part of the Google Visualization API, and they’re awesome. That is, when they work. They don’t seem to work in IE8, so I’ve disabled them for that browser.

And they have some pretty odd resizing behavior: if you don’t explicitly specify a width attribute, the gauges will shrink every time you update the gauge value. It took me quite a bit of experimentation to figure that one out, but thankfully it seems to be working now.

You can check out the changes in this typing speed test.

Quick Brown Frog Typing Tutor is launched!

2010-11-25T04:34:18-05:00

After lots of helpful feedback from beta testers and Hacker News, my typing tutor app “Quick Brown Frog” is live and open for business!

Some of the things I changed in response to feedback include:

price reduction from $29.95 -> $9.95
support both Google Checkout and Paypal options for payment
stats reporting (WPM/Accuracy improvement over time)
report on frequently mis-typed keys after practice lessons
allow automatic single/double spacing after end of sentences (very frequently requested)
implemented crawlable Ajax links for SEO
plus many bug fixes for cross-browser compatibility

I consider this an MVP-level release, which means that it still needs lots of work and polish (in particular, the look-and-feel of the UI).

The important thing is to get it out there, and start getting feedback from actual customers. Absent that, I’d just be spinning my wheels.

Measuring Typing Accuracy

2010-11-22T05:38:25-05:00

It turns out that measuring typing accuracy (how accurately you hit the intended keys) is non-trivial. A quick survey of the online typing courses out there shows that there is considerable confusion over how to correctly measure typing accuracy. One site actually gives a negative percentage if you backspace too many times!

This was kind of surprising to me. I figured it would take me perhaps 10 minutes to implement. It ended up taking a day and a half of my time, $10 spent on an ACM article, and some quality time with a recent CS grad’s PhD thesis. No joke.

Well, to be completely honest, in the end, the implementation was in fact fairly trivial. What was hard was figuring out what to measure. It’s more of a human problem than a technical one.

Why is this hard?

Your intuition might be to simply measure the ratio of correct characters to total characters typed. At first glance this seems OK, but consider the following case:

    intended: The quick brown frog jumped over the lazy fox.

    actual  : The quck brown frog jumped over the lazy fox.

If we match up the typed characters with what was expected, the accuracy of this typed statement is a very low 15%. But looking at it from a human perspective, what were the actual errors here? The typist missed the “i” character in the word “quick”, and as a result the rest of the line was offset by one. It seems like there ought to be a way to more accurately capture the fact that the user made a single typo. There is.

Edit Distance

Edit Distance is the number of operations required to transform one string into another. The operations include insertion, deletion and substitution. In this example, it requires only a single edit- ‘insert an i’.

Soukoreff’s thesis describes a Minimum String Distance Error Rate as the Edit Distance divided by the maximum of the lengths of the presented vs typed text, times 100%. For our “off by one” example above, the error rate is a mere 2%, i.e. 98% accurate. That seems more reasonable!
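To make the arithmetic concrete, here is a minimal sketch of the edit distance and the resulting error rate. It uses the textbook dynamic-programming form of Levenshtein distance rather than the code that runs in Quick Brown Frog, and the class and method names are invented for this example.

```java
final class EditDistance {
    private EditDistance() {}

    // Classic Levenshtein distance: insertions, deletions and substitutions cost 1 each.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Minimum String Distance error rate, as a percentage.
    static double msdErrorRate(String presented, String typed) {
        int longest = Math.max(presented.length(), typed.length());
        if (longest == 0) {
            return 0.0;
        }
        return 100.0 * levenshtein(presented, typed) / longest;
    }
}
```

For the two sentences above the distance is 1 and the longer string has 46 characters, so the error rate comes out to 1 / 46, or about 2.2%.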

So for a while I was using the Levenshtein Distance algorithm to compute the edit distance, and hence the accuracy. This certainly worked better than my initial naive algorithm, but it still had a problem- What if the typist made lots of errors, but was fastidious about backspacing over the mistakes and correcting them? His error rate would be zero in this case, with an accuracy of 100%.

While correcting all your mistakes is admirable, it seems wrong to award a “100% accurate” rating to a typist sporting a bruised backspace key. Somehow, we’ve got to take the mistakes into account, even if they’ve been corrected.

Text-Entry Taxonomy

Soukoreff’s PhD thesis again provides a solution. Classify each of the keystrokes into one of the following categories:

C - Correctly typed
IF - Incorrect, but Fixed
INF - Incorrect, Not Fixed
F - Fixes, i.e. backspace or cursor movement used to correct mistakes

Using these classifications, he proposes a Total Error Rate defined as:

(INF + IF) / (C + INF + IF) * 100%

So, going back to our “off by one” example, if the typist noticed immediately that he missed the ‘i’ in ‘quick’, and backspaced to correct it,

The quc⌫ick brown frog jumped over the lazy fox.

the breakdown would be C: 47, F: 1, IF: 1, INF: 0, for a total error rate of (0 + 1) / (47 + 0 + 1) * 100% = 2%, or 98% accurate. Same as using the Edit Distance alone, in this example.

But what if the user didn’t notice the mistake until 10 characters later? The input stream would look something like this:

The quck brown f⌫⌫⌫⌫⌫⌫⌫⌫⌫⌫ick brown frog jumped over the lazy fox.

What’s the error rate now? C:47, F:10, IF: 10, INF: 0, or (0 + 10) / (47 + 0 + 10) * 100% = 17.5%, or roughly 82% accurate. Aha!
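Both worked examples can be checked with a few lines of code. This sketch simply takes the keystroke counts as inputs (classifying the keystrokes is the harder part and isn't shown); the class and parameter names are my own.

```java
final class TotalErrorRate {
    private TotalErrorRate() {}

    // (INF + IF) / (C + INF + IF) * 100%. Fix keystrokes (F) do not appear in the formula.
    static double percent(int correct, int incorrectNotFixed, int incorrectFixed) {
        int total = correct + incorrectNotFixed + incorrectFixed;
        if (total == 0) {
            return 0.0; // nothing typed yet
        }
        return 100.0 * (incorrectNotFixed + incorrectFixed) / total;
    }

    public static void main(String[] args) {
        // immediate correction: C = 47, INF = 0, IF = 1
        System.out.printf("%.1f%%%n", percent(47, 0, 1));
        // correction ten characters late: C = 47, INF = 0, IF = 10
        System.out.printf("%.1f%%%n", percent(47, 0, 10));
    }
}
```

Because F never enters the formula, backspacing isn't penalized directly; the penalty comes from the IF keystrokes that the backspaces wipe out.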

I plugged this formula into Quick Brown Frog and watched the error rate go up and down while typing. Finally, it seemed to be doing the right thing- rewarding accurate, deliberate typing, and taking off points for outright mistakes as well as expensive corrections.

Yet another example of why it’s hard to estimate software schedules- sometimes the ‘trivial’ is anything but.

Quick Brown Frog featured on StartupLift!

2010-11-17T16:05:34-05:00

I’m happy to report that StartUpLift.com, an awesome new site for promoting new startups, is featuring Quick Brown Frog.

http://startuplift.com/quick-brown-frog-learn-to-touch-type/

how to resolve App Engine timeouts when parsing web.xml

2010-11-17T12:09:48-05:00

For a few weeks now I’ve been plagued with timeouts while updating my App Engine apps. At first I thought it was a problem in the Intellij EAP beta I was running, but today I spent some time digging into it.

It seems to be unrelated to the EAP, as I get the same behavior with the App Engine command-line tools- timeout failure parsing web.xml roughly 75% of the time. It looks like this:

Nov 17, 2010 4:32:54 PM com.google.apphosting.utils.config.AbstractConfigXmlReader readConfigXml
SEVERE: Received exception processing /Users/armhold/work/git-mega-repo/mystore/out/artifacts/Typing_Web_exploded/WEB-INF/web.xml
com.google.apphosting.utils.config.AppEngineConfigException: Received IOException parsing the input stream for /Users/armhold/work/git-mega-repo/mystore/out/artifacts/Typing_Web_exploded/WEB-INF/web.xml
    at com.google.apphosting.utils.config.AbstractConfigXmlReader.getTopLevelNode(AbstractConfigXmlReader.java:210)
    at com.google.apphosting.utils.config.AbstractConfigXmlReader.parse(AbstractConfigXmlReader.java:228)
    at com.google.apphosting.utils.config.WebXmlReader.processXml(WebXmlReader.java:142)
    at com.google.apphosting.utils.config.WebXmlReader.processXml(WebXmlReader.java:22)
    at com.google.apphosting.utils.config.AbstractConfigXmlReader.readConfigXml(AbstractConfigXmlReader.java:111)
    at com.google.apphosting.utils.config.WebXmlReader.readWebXml(WebXmlReader.java:73)
    at com.google.appengine.tools.admin.Application.<init>(Application.java:105)
    at com.google.appengine.tools.admin.Application.readApplication(Application.java:151)
    at com.google.appengine.tools.admin.AppCfg.<init>(AppCfg.java:115)
    at com.google.appengine.tools.admin.AppCfg.<init>(AppCfg.java:61)
    at com.google.appengine.tools.admin.AppCfg.main(AppCfg.java:57)
Caused by: java.net.ConnectException: Operation timed out

After some digging around, it seems to be related to a timeout deep in the bowels of Java’s XML libs while trying to validate the DTD for web.xml.

The fix is simple: elide the DTD declaration from the top of your web.xml: