It’s pretty basic, but it does successfully recognize its training set as well as the same characters with added “noise”. It uses a straightforward implementation of a feed-forward neural network.
It uses RMagick/ImageMagick to handle image processing, but apart from that it’s built from scratch!
You can grab a copy of the source from Github.
I’ve grown tired of trying to properly format code with the Wordpress editor, so I’ve moved my blog from Wordpress (and before that Blogger) to Jekyll + Octopress, hosted on Github.
Let’s see how code formats under Octopress:
Beautiful!
It’s an admittedly small project, but I was surprised how easy it was. I had it basically working in under a day, and polished enough for production use in under 2 days.
But the best part was the code size: a reduction of 6 to 1 going from Java -> Ruby.
The problem is that on Chrome, it won’t let you actually click any of those buttons- the dialog is non-responsive to mouse clicks. This was really frustrating, and a few minutes of Googling showed that this was an old problem supposedly fixed by a Flash update months ago.
Updating to the latest Flash didn’t help (it’s apparently bundled with Chrome, and doesn’t use the version you can install manually on OSX).
Then I came across this trick: use tab to navigate the dialog checkboxes, and spacebar to select/deselect. Works like a charm.
Since their plugin is Eclipse-based, I wrote up a compatible plugin for Intellij IDEA. You can install it from the repository, or build it yourself from the source on Github.
There are two parts to this plugin: the Firefox extension (provided by 42lines) and the IDE plugin; you need both. To install the Firefox plugin, follow the directions from 42lines. Then to install the Intellij plugin do the following:
Open the Preferences dialog (Intellij IDEA menu -> Preferences)
Under “IDE Settings” select “Plugins”
Click the “Browse Repositories” button.
In the search box type “wicket”, which should narrow the results significantly.
Right-click “Wicket Source”, and select “Download and Install”.
You’ll be asked to re-start Intellij, and then you should be in business. The plugin uses port 9123 and no password by default (same as the Firefox plugin defaults). To change this, open the IDE Settings dialog and click “Wicket Source” to enter a password.
Enjoy!
To protect your users’ privacy, you should make sure that form is sent over SSL. If the hosting page is https that happens automatically, but most domains don’t secure their entire site; only a subset of pages are typically secured with SSL. But since this header likely appears on all your pages, how can you secure the form?
The first step is to manually adjust the form’s action attribute to ensure that the form submission happens over https, rather than http.
But this is where we run into a problem with Wicket- if the hosting page is http, and you have also installed an HttpsMapper in your WicketApplication like this:
setRootRequestMapper(new HttpsMapper(getRootRequestMapper(), new HttpsConfig(HTTP_PORT, HTTPS_PORT)));
then Wicket will not allow your form to be sent over https; the mapper will notice the http/https mismatch, and instead of calling your form’s onSubmit() method, it will simply serve up the hosting page again, discarding your form submission.
The solution is to manually post your form to a different, secure page that is marked for https via @RequireHttps. Then the HttpsMapper will allow the form submission to take place.
First, we need a LoginForm that will adjust the form’s action attribute to point to our secure page:
public class LoginForm extends StatelessForm
{
public LoginForm(String id)
{
super(id);
add(new TextField("username").setRequired(true));
add(new PasswordTextField("password").setRequired(true));
}
@Override
protected void onComponentTag(ComponentTag tag)
{
super.onComponentTag(tag);
String action = urlFor(LoginFormHandlerPage.class, null).toString();
tag.put("action", action);
}
}
Now we’ll need to create a page to handle the form submission:
@RequireHttps
public class LoginFormHandlerPage extends WebPage
{
public LoginFormHandlerPage(PageParameters parameters)
{
HttpServletRequest req = (HttpServletRequest) getRequest().getContainerRequest();
String username = req.getParameter("username");
String password = req.getParameter("password");
if (loginSuccessful(username, password))
{
if (! continueToOriginalDestination())
{
setResponsePage(AccountPage.class);
}
}
else
{
getSession().error("login failed");
// on failure send user to our regular login page
setResponsePage(LoginPage.class);
}
}
}
Note that if you’re using Wicket 1.5.3 there is a bug that prevents the processing of form POST parameters (that’s why we’re reading the params manually from the HttpServletRequest). Fixed in Wicket 1.5.4.
The LoginFormHandlerPage will process the submitted form data over https and, if successful, log the user in; otherwise it sends them to a page where they can re-enter their password.
You can get all the code (and quite a bit more useful login-related stuff) from github.
Credit where it’s due: the real gem here (submitting the form to a secure url) comes from this blog posting by Petri Kainulainen.
First of all, the imports have all been moved in 7.x. Here’s where they landed:
import org.eclipse.jetty.plus.webapp.EnvConfiguration;
import org.eclipse.jetty.webapp.WebInfConfiguration;
import org.eclipse.jetty.webapp.Configuration;
import org.eclipse.jetty.webapp.WebXmlConfiguration;
Next, you’ll need a jetty-env.xml.
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Mort Bay Consulting//DTD Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<Configure id="wac" class="org.eclipse.jetty.webapp.WebAppContext">
<New class="org.eclipse.jetty.plus.jndi.EnvEntry">
<Arg>jdbc/mydatasource</Arg>
<Arg>
<New class="com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource">
<Set name="Url">jdbc:mysql://localhost/mydatabase?characterEncoding=utf8</Set>
<Set name="User">username</Set>
<Set name="Password">password</Set>
</New>
</Arg>
</New>
</Configure>
Normally this goes into src/main/webapp/WEB-INF, but you probably don’t want to deploy that with your app in your production war file. So instead I put mine in src/test/jetty/jetty-env.xml. You’ll need to modify your Start.java to tell Jetty to find the relocated config file.
EnvConfiguration envConfiguration = new EnvConfiguration();
URL url = new File("src/test/jetty/jetty-env.xml").toURI().toURL();
envConfiguration.setJettyEnvXml(url);
bb.setConfigurations(new Configuration[]{
new WebInfConfiguration(),
envConfiguration,
new WebXmlConfiguration()
});
I found that I also had to set a couple of environment properties:
System.setProperty("java.naming.factory.url.pkgs",
"org.eclipse.jetty.jndi");
System.setProperty("java.naming.factory.initial",
"org.eclipse.jetty.jndi.InitialContextFactory");
With this, I can finally access my JNDI datasource happily from Wicket/Jetty.
Update: I’ve created a gist with the full source code.
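For completeness, here is a minimal sketch of what the lookup side might look like once the datasource is bound. The JNDI name `java:comp/env/jdbc/mydatasource` is my assumption based on the `jdbc/mydatasource` entry above, and `DataSourceLookup` is a hypothetical helper, not part of the original code. It relies only on the standard `javax.naming` and `javax.sql` APIs.

```java
import java.util.Optional;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

final class DataSourceLookup {
    // Assumed name: the "jdbc/mydatasource" entry, looked up under java:comp/env.
    static final String JNDI_NAME = "java:comp/env/jdbc/mydatasource";

    /** Returns the DataSource bound at the given name, or empty if the lookup fails. */
    static Optional<DataSource> find(String jndiName) {
        try {
            Object bound = new InitialContext().lookup(jndiName);
            return (bound instanceof DataSource)
                    ? Optional.of((DataSource) bound)
                    : Optional.empty();
        } catch (NamingException e) {
            // No JNDI provider is configured, or nothing is bound under that name.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(find(JNDI_NAME).isPresent()
                ? "datasource found"
                : "datasource not bound");
    }
}
```

Run outside of Jetty there is no naming provider, so the lookup simply comes back empty; inside the configured webapp it should return the pooled datasource.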
Fear not! The Sonatype checksum search REST service can give you the Maven coordinates based on the jar’s SHA1 hash.
Still too much work? Not to worry, I just wrote a quick program to make it even easier for you. Provenance will take a directory full of jar files and write out the XML dependency information for every jar it finds. You can then copy/paste this right into the <dependencies> section of your pom.xml.
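If you are curious about the hashing half of this, the SHA1 of a jar can be computed with nothing but the JDK. This is only a sketch of that one step, not Provenance’s actual code; the `JarSha1` class and its method names are made up for illustration, and the call to the checksum search service is deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class JarSha1 {
    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform must provide SHA-1.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more jar paths on the command line.
        for (String arg : args) {
            byte[] jarBytes = Files.readAllBytes(Paths.get(arg));
            System.out.println(sha1Hex(jarBytes) + "  " + arg);
        }
    }
}
```

The resulting hex string is what you would hand to a checksum search to get the Maven coordinates back.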
Enjoy.
Here’s a handy way to add a Git SHA to all your app’s pages via Wicket and Maven.
First, we’ll use the exec-maven-plugin to create a git.properties file for us. Add this to the <build><plugins> section of your pom.xml:
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<phase>compile</phase>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<executable>git</executable>
<arguments>
<argument>log</argument>
<argument>--pretty=format:gitsha=%H %ci</argument>
<argument>-n1</argument>
</arguments>
<outputFile>target/classes/git.properties</outputFile>
</configuration>
</plugin>
This will create a git.properties file containing the Git SHA, along with the commit timestamp whenever your code is compiled. You can learn how to further customize this here.
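To make the file format concrete, here is a small sketch of how `java.util.Properties` reads a line shaped like the `gitsha=%H %ci` format above. The sample hash and timestamp are made up, and `GitProperties` and `hashOnly` are hypothetical names of my own. Note that the value contains both the hash and the commit timestamp, so you may want to split them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class GitProperties {
    /** Parses the text of a git.properties file and returns the "gitsha" value, or "unknown". */
    static String readGitSha(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            return "unknown";
        }
        return props.getProperty("gitsha", "unknown");
    }

    /** Keeps only the hash, dropping the commit timestamp that follows the first space. */
    static String hashOnly(String gitSha) {
        int space = gitSha.indexOf(' ');
        return space < 0 ? gitSha : gitSha.substring(0, space);
    }

    public static void main(String[] args) {
        // Made-up sample in the shape that "gitsha=%H %ci" would produce.
        String sample = "gitsha=0123456789abcdef0123456789abcdef01234567 2011-11-05 10:15:30 -0400\n";
        String value = readGitSha(sample);
        System.out.println(value);
        System.out.println(hashOnly(value));
    }
}
```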
Now we’ll need to read in the git.properties file when our application starts up.
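import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Logger;
import org.apache.wicket.protocol.http.WebApplication;

public class Application extends WebApplication
{
    // java.util.logging, to match the log.severe() calls below
    private static final Logger log = Logger.getLogger(Application.class.getName());
    private String gitSHA = "unknown";

    public Application()
    {
        java.util.Properties props = new java.util.Properties();
        // git.properties is written to target/classes by the exec-maven-plugin
        InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream("git.properties");
        if (in == null)
        {
            log.severe("git.properties not found on the classpath");
            return;
        }
        try
        {
            props.load(in);
            gitSHA = props.getProperty("gitsha", "unknown");
            log.info("gitsha: " + gitSHA);
        }
        catch (IOException e)
        {
            log.severe(e.getMessage());
        }
    }

    public String getGitSHA()
    {
        return gitSHA;
    }

    // getHomePage() and the rest of the application are omitted here
}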
Now we’ll create a WebPage subclass that renders the Git SHA into a <meta> tag when the page is rendered.
public abstract class MyPage extends WebPage
{
@Override
protected void onBeforeRender()
{
Label metaGitSHA = new Label("metaGitSHA", "");
metaGitSHA.add(new AttributeModifier("content", Model.of(((Application) getApplication()).getGitSHA())));
addOrReplace(metaGitSHA);
super.onBeforeRender();
}
}
You’ll want to extend MyPage for each of your pages. You’ll need to add the placeholder meta tag to each of your HTML pages like this:
<head>
<meta wicket:id="metaGitSHA" id="metaGitSHA" name="metaGitSHA" content=""/>
</head>
And you’re done!
It’s built in Wicket and a tiny bit of jQuery. It’s fairly basic, but it has one novel feature: you can add textual annotations to your uploaded images, and the annotations appear as actual searchable text rather than merely being part of the image bits.
I’m planning to add features like:
HTML5 drag-n-drop for uploads
upload multiple pictures at once to create an album
user comments
Fortunately, Google has proposed a solution for crawling ajax content:
change your hrefs to support “bang notation”: www.example.com/ajax.html#!key=value
when Googlebot makes requests of the form: www.example.com/ajax.html?_escaped_fragment_=key=value , you return a static HTML version of the Ajax content
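The mapping between the two URL forms can be sketched in a few lines. This is a simplified illustration with hypothetical names (`EscapedFragment`, `toCrawlerUrl`, `toPrettyUrl`), and it assumes the fragment contains no characters that need percent-escaping, which the full scheme also has to handle.

```java
final class EscapedFragment {
    static final String PARAM = "_escaped_fragment_";

    /** Turns ".../ajax.html#!key=value" into ".../ajax.html?_escaped_fragment_=key=value". */
    static String toCrawlerUrl(String prettyUrl) {
        int bang = prettyUrl.indexOf("#!");
        if (bang < 0) {
            return prettyUrl; // not a crawlable Ajax link
        }
        String base = prettyUrl.substring(0, bang);
        String fragment = prettyUrl.substring(bang + 2);
        // Append to an existing query string if there is one.
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + PARAM + "=" + fragment;
    }

    /** Reverses toCrawlerUrl, recovering the "#!" form from a crawler request URL. */
    static String toPrettyUrl(String crawlerUrl) {
        int idx = crawlerUrl.indexOf(PARAM + "=");
        if (idx < 1) {
            return crawlerUrl; // parameter not present
        }
        String base = crawlerUrl.substring(0, idx - 1); // drop the '?' or '&' before the parameter
        String fragment = crawlerUrl.substring(idx + PARAM.length() + 1);
        return base + "#!" + fragment;
    }

    public static void main(String[] args) {
        String pretty = "http://www.example.com/ajax.html#!key=value";
        String crawler = toCrawlerUrl(pretty);
        System.out.println(crawler);
        System.out.println(toPrettyUrl(crawler));
    }
}
```

On the server you would use the second direction: when a request arrives with the `_escaped_fragment_` parameter, recover the Ajax state it refers to and return the static HTML for it.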
So then the problem becomes one of generating static content from your Ajax links. You could do that by hand if your site is small and changes infrequently. More likely you’ll want a way to automate this.
Google recommends using a “headless browser” approach, i.e. using something like HtmlUnit. That’s a fine solution, but if you’re running on App Engine it’s almost guaranteed not to work because of the request timeout. So if you want to run on App Engine, you’re probably going to have to spider your own pages and pre-generate your HTML content.
My solution to this problem is to break the spidering up into small chunks, and farm them out as tasks on App Engine’s Task Queues. Whenever I update my app’s content, I submit a job that spiders the landing page looking for Ajax links. For each link that’s found, I submit a task that recursively spiders the link (taking care not to get into loops). Each task saves the HTML content into the data store, which is then returned as cached static content to Googlebot.
Suddenly my “simple” solution is sounding quite complicated, but it gets the job done reliably. Here’s some code to make it clearer.
I use a CachedAjaxLink data object to persist the static content:
public class CachedAjaxLink implements Serializable
{
@Id
private String href;
private String cachedContent;
private Date dateCached;
}
Then I use an AjaxCacher which crawls a given link, stores the results as CachedAjaxLinks, and queues Task Queue tasks for each link that it finds:
public class AjaxCacher
{
protected static final Logger log = Logger.getLogger(AjaxCacher.class.getName());
protected static final DAO dao = new DAO();
public static final long PUMP_TIME = 5000;
protected WebClient webClient;
protected String crawlServletUrl;
public AjaxCacher(String crawlServletUrl)
{
this.crawlServletUrl = crawlServletUrl;
webClient = Holder.get();
}
public void crawl(URL urlToCrawl, Date crawlRequestTimestamp)
{
// URLs we've already queued
Set<URL> queuedURLs = new HashSet<URL>();
queuedURLs.add(urlToCrawl);
try
{
HtmlPage page = webClient.getPage(urlToCrawl);
// appengine hack because it's single threaded
webClient.getJavaScriptEngine().pumpEventLoop(PUMP_TIME);
String pageContent = page.asXml();
CachedAjaxLink cachedAjaxLink = new CachedAjaxLink();
cachedAjaxLink.setHref(urlToCrawl.getRef());
cachedAjaxLink.setCachedContent(pageContent);
cachedAjaxLink.setDateCached(new Date()); // time actually cached
dao.updateCachedAjaxLink(cachedAjaxLink);
List<HtmlAnchor> anchors = page.getAnchors();
for (HtmlAnchor anchor : anchors)
{
// only care about ajax links
if (! anchor.getHrefAttribute().startsWith("#")) continue;
URL newUrl = new URL(urlToCrawl, anchor.getHrefAttribute());
// don't queue multiple requests for the same URL
if (queuedURLs.contains(newUrl)) continue;
queuedURLs.add(newUrl);
// prevent loops
CachedAjaxLink link = dao.getCachedAjaxLink(newUrl.getRef());
if (link == null || link.getDateCached().getTime() < crawlRequestTimestamp.getTime())
{
queueCrawlRequest(newUrl.toString(), crawlRequestTimestamp);
}
}
} catch (IOException e)
{
log.log(Level.SEVERE, e.getMessage(), e);
}
finally
{
webClient.closeAllWindows();
}
}
/**
* submits a crawl request to the queue; TaskQueueServlet will then handle the request asynchronously
*/
public void queueCrawlRequest(String urlToCrawl, Date timeStamp)
{
Queue queue = QueueFactory.getDefaultQueue();
TaskOptions options = TaskOptions.Builder.url(crawlServletUrl);
options.param("encodedUrl", ServerUtils.encodeURL(urlToCrawl));
options.param("timeStamp", ServerUtils.fromDate(timeStamp));
options.method(TaskOptions.Method.GET);
queue.add(options);
}
/**
* try to cache a copy of the WebClient in ThreadLocal for faster startups on Google App Engine
*/
public static class Holder
{
private static ThreadLocal<WebClient> holder = new ThreadLocal<WebClient>()
{
protected synchronized WebClient initialValue()
{
WebClient result = new WebClient(BrowserVersion.FIREFOX_3);
result.setWebConnection(new UrlFetchWebConnection(result));
return result;
}
};
public static WebClient get()
{
return holder.get();
}
}
}
Finally, I use a TaskQueueServlet to handle the queued tasks:
public class TaskQueueServlet extends HttpServlet
{
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
{
doPost(req, res);
}
@Override
protected void doPost(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
{
String encodedUrl = req.getParameter("encodedUrl");
if (encodedUrl == null)
{
throw new IllegalArgumentException("missing param: encodedUrl");
}
String timeStamp = req.getParameter("timeStamp");
if (timeStamp == null)
{
throw new IllegalArgumentException("missing param: timeStamp");
}
String decodedUrl = ServerUtils.decodeURL(encodedUrl);
URL urlToCrawl = new URL(decodedUrl);
// path of this servlet, so the cacher can queue follow-up crawl tasks
String taskQueuePath = getServletContext().getInitParameter("taskQueuePath");
AjaxCacher cacher = new AjaxCacher(taskQueuePath);
cacher.crawl(urlToCrawl, ServerUtils.toDate(timeStamp));
}
}
Thanks Google, for making it so easy. ;-)
You can grab all of this code from my gwtquickstarter library. It’s the library that powers the best typing tutor on the web.
We just added a new feature to all our typing practice sessions- at the end of the lesson you’ll be shown the badge and the HTML code that generates it. Simply copy and paste the blue HTML code into your blog or website to add the badge for all to see.
We’ve had numerous requests from users for a feature that would generate random typing lessons based on actual words. You can now generate instant typing practices, simply by choosing the number of words you wish to type.
We’re planning to extend this feature in the future to automatically create lessons targeting letters that you need to practice (the ones you make the most typos with).
Enjoy!
It was surprisingly easy to get into the store if you’ve got an existing webapp running:
pay Google $5 (one-time developer fee; not per-app)
create a 16-line JSON manifest file
take some screen shots and create an icon (the hardest part, really)
bundle it into a zip
checkmark a few boxes, add some descriptive text, click “publish”
All you are really doing is bundling up some meta-data so that Chrome users can see your app as being “installed”. Even though my app is very much server-dependent, and has its own concept of user accounts and payments, Google is happy to have it in the store. And I’m happy for the potential extra customers.
It’s obvious that the Chrome Web Store is a great boon for developers. However it’s unclear whether users will actually find this useful, much less flock to it.
My guess is that once we start seeing more apps that really use HTML5 features like local storage, it might take off. I hope it does.
Appointment Reminder is a service for personal business services (think: hair salons, medical offices, law firms, or anyone that regularly schedules appointments with clients). Appointment Reminder sends out reminders to clients automatically, via phone, SMS or email. Fewer forgotten appointments = increased revenue.
He’s leveraging Twilio, an API that I’m just dying to find an excuse to use.
Congrats Patrick, and good luck!
Then I added some gauges to display WPM and accuracy in real-time. They’re part of the Google Visualization API, and they’re awesome. That is, when they work. They don’t seem to work in IE8, so I’ve disabled them for that browser.
And they have some pretty odd resizing behavior: if you don’t explicitly specify a width attribute, the gauges will shrink every time you update the gauge value. It took me quite a bit of experimentation to figure that one out, but thankfully it seems to be working now.
You can check out the changes in this typing speed test.
Some of the things I changed in response to feedback include:
price reduction from $29.95 -> $9.95
support both Google Checkout and Paypal options for payment
stats reporting (WPM/Accuracy improvement over time)
report on frequently mis-typed keys after practice lessons
allow automatic single/double spacing after end of sentences (very frequently requested)
implemented crawlable Ajax links for SEO
plus many bug fixes for cross-browser compatibility
I consider this an MVP-level release, which means that it still needs lots of work and polish (in particular, the look-and-feel of the UI).
The important thing is to get it out there, and start getting feedback from actual customers. Absent that, I’d just be spinning my wheels.
This was kind of surprising to me. I figured it would take me perhaps 10 minutes to implement. It ended up taking a day and a half of my time, $10 spent on an ACM article, and some quality time with a recent CS grad’s PhD thesis. No joke.
Well, to be completely honest, in the end, the implementation was in fact fairly trivial. What was hard was figuring out what to measure. It’s more of a human problem than a technical one.
Your intuition might be to simply measure the ratio of correct characters to total characters typed. At first glance this seems OK, but consider the following case:
intended: The quick brown frog jumped over the lazy fox.
actual : The quck brown frog jumped over the lazy fox.
If we match up the typed characters with what was expected, the accuracy of this typed statement is a very low 15%. But looking at it from a human perspective, what were the actual errors here? The typist missed the “i” character in the word “quick”, and as a result the rest of the line was offset by one. It seems like there ought to be a way to more accurately capture the fact that the user made a single typo. There is.
Edit Distance is the number of operations required to transform one string into another. The operations include insertion, deletion and substitution. In this example, it requires only a single edit- ‘insert an i’.
Soukoreff’s thesis describes a Minimum String Distance Error Rate as the Edit Distance divided by the maximum of the lengths of the presented vs typed text, times 100%. For our “off by one” example above, the error rate is a mere 2%, i.e. 98% accurate. That seems more reasonable!
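Here is a small sketch of that calculation, using the textbook dynamic-programming edit distance. The class and method names (`MsdErrorRate`, `editDistance`, `errorRatePercent`) are mine, and this is an illustration of the formula rather than the code running on the site.

```java
final class MsdErrorRate {
    /** Edit distance (insert, delete, substitute) computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Edit distance divided by the longer of the two lengths, times 100. */
    static double errorRatePercent(String presented, String typed) {
        int longest = Math.max(presented.length(), typed.length());
        if (longest == 0) {
            return 0.0;
        }
        return 100.0 * editDistance(presented, typed) / longest;
    }

    public static void main(String[] args) {
        String presented = "The quick brown frog jumped over the lazy fox.";
        String typed = "The quck brown frog jumped over the lazy fox.";
        System.out.println("edit distance: " + editDistance(presented, typed));
        System.out.println("error rate %:  " + errorRatePercent(presented, typed));
    }
}
```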
So for a while I was using the Levenshtein Distance algorithm to compute the edit distance, and hence the accuracy. This certainly worked better than my initial naive algorithm, but it still had a problem- What if the typist made lots of errors, but was fastidious about backspacing over the mistakes and correcting them? His error rate would be zero in this case, with an accuracy of 100%.
While correcting all your mistakes is admirable, it seems wrong to award a “100% accurate” rating to a typist sporting a bruised backspace key. Somehow, we’ve got to take the mistakes into account, even if they’ve been corrected.
Soukoreff’s PhD thesis again provides a solution. Classify each of the keystrokes into one of the following categories:
C - Correctly typed
IF - Incorrect, but Fixed
INF - Incorrect, Not Fixed
F - Fixes, i.e. backspace or cursor movement used to correct mistakes
Using these classifications, he proposes a Total Error Rate defined as:
(INF + IF) / (C + INF + IF) * 100%
So, going back to our “off by one” example, if the typist noticed immediately that he missed the ‘i’ in ‘quick’, and backspaced to correct it,
The quc⌫ick brown frog jumped over the lazy fox.
the breakdown would be C: 47, F: 1, IF: 1, INF: 0, for a total error rate of (0 + 1) / (47 + 0 + 1) * 100% = 2%, or 98% accurate. Same as using the Edit Distance alone, in this example.
But what if the user didn’t notice the mistake until 10 characters later? The input stream would look something like this:
The quck brown f⌫⌫⌫⌫⌫⌫⌫⌫⌫⌫ick brown frog jumped over the lazy fox.
What’s the error rate now? C:47, F:10, IF: 10, INF: 0, or (0 + 10) / (47 + 0 + 10) * 100% = 17.5%, or roughly 82% accurate. Aha!
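The formula itself is tiny once the keystrokes are classified. Below is a sketch that plugs in the two breakdowns from the examples above; `TotalErrorRate` and `percent` are names I made up, and classifying the keystroke stream (the genuinely hard part) is not shown.

```java
final class TotalErrorRate {
    /**
     * (INF + IF) / (C + INF + IF) * 100.
     * Fix keystrokes (F) are counted separately and do not enter the formula.
     */
    static double percent(int correct, int incorrectFixed, int incorrectNotFixed) {
        int total = correct + incorrectNotFixed + incorrectFixed;
        if (total == 0) {
            return 0.0; // nothing typed yet
        }
        return 100.0 * (incorrectNotFixed + incorrectFixed) / total;
    }

    public static void main(String[] args) {
        // Mistake corrected immediately: C = 47, IF = 1, INF = 0
        System.out.println(percent(47, 1, 0));
        // Mistake noticed ten characters later: C = 47, IF = 10, INF = 0
        System.out.println(percent(47, 10, 0));
    }
}
```

Accuracy is then simply 100 minus the returned value.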
I plugged this formula into Quick Brown Frog and watched the error rate go up and down while typing. Finally, it seemed to be doing the right thing- rewarding accurate, deliberate typing, and taking off points for outright mistakes as well as expensive corrections.
Yet another example of why it’s hard to estimate software schedules- sometimes the ‘trivial’ is anything but.
http://startuplift.com/quick-brown-frog-learn-to-touch-type/
It seems to be unrelated to the EAP, as I get the same behavior with the App Engine command-line tools- timeout failure parsing web.xml roughly 75% of the time. It looks like this:
Nov 17, 2010 4:32:54 PM com.google.apphosting.utils.config.AbstractConfigXmlReader readConfigXml
SEVERE: Received exception processing /Users/armhold/work/git-mega-repo/mystore/out/artifacts/Typing_Web_exploded/WEB-INF/web.xml
com.google.apphosting.utils.config.AppEngineConfigException: Received IOException parsing the input stream for /Users/armhold/work/git-mega-repo/mystore/out/artifacts/Typing_Web_exploded/WEB-INF/web.xml
at com.google.apphosting.utils.config.AbstractConfigXmlReader.getTopLevelNode(AbstractConfigXmlReader.java:210)
at com.google.apphosting.utils.config.AbstractConfigXmlReader.parse(AbstractConfigXmlReader.java:228)
at com.google.apphosting.utils.config.WebXmlReader.processXml(WebXmlReader.java:142)
at com.google.apphosting.utils.config.WebXmlReader.processXml(WebXmlReader.java:22)
at com.google.apphosting.utils.config.AbstractConfigXmlReader.readConfigXml(AbstractConfigXmlReader.java:111)
at com.google.apphosting.utils.config.WebXmlReader.readWebXml(WebXmlReader.java:73)
at com.google.appengine.tools.admin.Application.<init>(Application.java:105)
at com.google.appengine.tools.admin.Application.readApplication(Application.java:151)
at com.google.appengine.tools.admin.AppCfg.<init>(AppCfg.java:115)
at com.google.appengine.tools.admin.AppCfg.<init>(AppCfg.java:61)
at com.google.appengine.tools.admin.AppCfg.main(AppCfg.java:57)
Caused by: java.net.ConnectException: Operation timed out
After some digging around, it seems to be related to a timeout deep in the bowels of Java’s XML libs while trying to validate the DTD for web.xml.
The fix is simple: elide the DTD declaration from the top of your web.xml:
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">