I became interested in natural language processing (without really knowing what the field entailed) when I was in third grade when I tried out AOL Instant Messenger (AIM), shortly after it was released circa 1997.

I was quite introverted and asocial growing up (ok, this is an understatement) and the idea of being able to cherry-pick friends based on the types of conversations I liked to have and being able to leave a conversation at any time (and for this to be commonplace and socially acceptable) were attractive to me.

But what I really loved was the idea of chat rooms. Watching how people interacted, what conversations worked, which caused people to get upset, which caused people to laugh.

For anyone who remembers how AIM's early chat feature worked: you clicked a button to start a chat, it pre-populated a unique room ID, and you had the option either to rename the room or to enter the chat.

I remember thinking this was odd and vividly remember one (entire) day making it a priority to type in all the common room names I could think of to see what happened. One of the first I tried was "friends". Lo and behold, I was in a room with a handful of people who were very confused as to why I had just intruded on their private chat. But amazingly, it seemed almost okay. Some thought it was weird and bailed. But many others were intrigued to meet a digital pen pal. It was almost as if we were occupying a public space together. After a great conversation, I suggested we keep the chat open and check back every few days.

I more or less lived in that "friends" chat room and started inviting what few other friends I had to it. I looked for other ephemeral rooms on AIM and invited the people I found to join the "friends" room as well. We quickly hit a limit of ~32 people.

And a few of us created another one. There was "hi", "hello", "hey", "chat", "chat2"... and so on. Over the next year or two, use of these unofficial AIM chat rooms really took off, many with themes and unique micro-cultures. It was like a mini-revolution; gosh it was exciting.

But nothing as exciting as what I saw next. A scrambler bot -- a fake username, a robot -- that initiated a game of word unscramble in the "friends" chat room. I had been learning QBasic, HTML, and introductory Java at the time, and I immediately asked the chat how the thing worked and who programmed it.

It became instantly obvious to me that there was an unparalleled opportunity to bend and augment the very fabric of the medium through which communication was occurring. There was a "magic" middleware in between me and them and an opportunity to optimize the very idea of communication, to enhance it.

The whole thing was a massive game. People wrote bots to flood chat rooms and take them over so people couldn't organize with their friends. People wrote tools to ban users. I was inspired by a friend "Mike311211" (a Disturbed [band] fan) who even created his own complete instant message client.

Up to this point, programming had mostly been a curiosity to me, but this got me serious about programming. I wanted the ability to protect the chat rooms I was in. To enhance them with features. To fundamentally push communication to the limit and redefine the paradigm beyond chat messages.

Just before middle school, I started writing a bot (much like the ones on IRC) to automatically save logs of various chat rooms so I could read them later. There was an "ego" log which listened to all the chats for when my name or screen name was mentioned. The bot responded to simple queries, like "when was I last mentioned" or "say hi to ." It allowed me to evaluate simple math expressions and programs by forwarding them to Java's eval. Around this time I learned of IRC and became aware of what others were programming bots to do in more sophisticated settings, and of what was possible.
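The shape of that "ego" log is simple enough to sketch. This is a hypothetical reconstruction in modern Python, not the original bot (which spoke AIM's protocol); the watched names and message source are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical screen names/aliases to listen for (assumptions, not the originals).
WATCHED_NAMES = {"brian", "gnuask"}

class EgoLog:
    """Records chat messages that mention any watched name."""

    def __init__(self, watched):
        self.watched = {w.lower() for w in watched}
        self.mentions = []  # list of (timestamp, room, sender, message)

    def on_message(self, room, sender, message):
        # Called for every message the bot sees in any room it sits in.
        text = message.lower()
        if any(name in text for name in self.watched):
            self.mentions.append((datetime.now(), room, sender, message))

    def last_mention(self):
        """Answer the query 'when was I last mentioned?'."""
        if not self.mentions:
            return "You haven't been mentioned yet."
        when, room, sender, msg = self.mentions[-1]
        return f"{sender} mentioned you in '{room}' at {when:%H:%M}: {msg}"

log = EgoLog(WATCHED_NAMES)
log.on_message("friends", "Mike311211", "has anyone seen brian today?")
print(log.last_mention())
```

The whole trick is that the bot is just another user in the room, so "listening" is nothing more than a callback on each incoming message.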

What I really wanted a bot for was search: to research collaboratively with friends and to integrate chat with search. To create something like IBM's Watson. As of 2017, much of this has been realized, and only recently.

Becoming a proficient programmer:

In middle school, I did what (sure, let's go with this) everyone else my age was doing during the summers: going to NCC national computer camps and learning how to program better. I met all sorts of interesting and smart people like Katherine McCusker, and this awesome dude named Tal (someone should help me re-discover him) who had this great dangly rat tail thing in place of bangs and an otherwise crew cut head. I spent my next five or so summers this way, learning C++, Unix, A+ certification type hardware stuff, and learning more about TOC (the "Talk to OSCAR" AIM protocol). Still nothing research related.

Starting a search bot:

Once I had the fundamentals in place, I spent the remainder of middle school and high school creating a search engine in my free time called GNUask. It was a question-answering system, like Quora. It would sit in AIM chat rooms (and IRC chat rooms) and record all the conversations. I had a new type of challenge, which was to detect which sentences typed by users were questions. And harder by far: were there answers provided in the chat room? The approach that seemed most reasonable to me was prompting chat room users with additional questions like, "did what user X say answer your question Y, Z?". By now, Wikipedia was a valuable source and I was starting to learn about Markov chains and methods for generating text based on corpora.
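A first pass at the question-detection problem can get surprisingly far on surface heuristics alone. This is a minimal sketch of that idea, not the GNUask code; the word list and example chat lines are assumptions.

```python
import re

# Crude heuristic: a line is a question if it ends with '?' or opens with
# an interrogative or auxiliary verb. (Assumed word list, easily extended.)
INTERROGATIVES = {
    "who", "what", "when", "where", "why", "how",
    "is", "are", "do", "does", "did", "can", "could",
    "would", "should", "will",
}

def looks_like_question(sentence: str) -> bool:
    s = sentence.strip().lower()
    if not s:
        return False
    if s.endswith("?"):
        return True
    # First word of the sentence, ignoring punctuation.
    first = re.split(r"\W+", s, maxsplit=1)[0]
    return first in INTERROGATIVES

chat = [
    "does anyone know a good lisp tutorial",   # question, no '?'
    "brb, phone",                              # not a question
    "What time is the game tonight?",          # question with '?'
]
questions = [line for line in chat if looks_like_question(line)]
```

The hard part the essay points at, pairing each detected question with a later answer, doesn't fall out of heuristics like these, which is exactly why prompting the users themselves ("did X's message answer your question?") was a reasonable workaround.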

By the time I was applying for colleges, I had a database with tens of thousands of questions and answers, which was also piggybacking on other websites with similar services (one launched around 2005). What I needed was advisors who could teach me how to clean the data. I applied to schools for computer science, thinking that I would be entering a software engineering program. What I got, I loved, but it was highly theoretical and focused on heavy math. Because of this, I turned to my peers for direction.

With the help of Tenzin, Robin, Jonathan, Dillan, Chris, Andrew, Gary (who has been a lifelong mentor of mine, and one of the most brilliant people I know), Stephen, Munkelwitz, Abe Barth-Werb, Jacob Beauregard, Evan Yandell, Zack Ney, Evan Flynn, Ethan Joachim Eldridge, Phelan Vendeville, Mike Torch, Leif K-Brooks, and a handful of other amazing personalities, we revived the Computer Science Student Association (CSSA) at the University of Vermont from a once-a-month meetup to a hackerspace which became home for 10% of the CS department. From them, I learned about recursive trampolining, decision trees, neural networks, X11 and Intel graphics acceleration, Beowulf clustering, password cracking, language theory, homoiconicity, data mining, wireless sensors, functional programming and tail call optimization, multiprocessing done right, and a whole bunch of other skills. I was also very inspired by Leif, who was running Omegle; it was a privilege to see, vicariously, what went into supporting an effort of that size, and the concept is brilliant. The CSSA was a fantastic source of inspiration, knowledge, and a network I am very grateful to have worked with.

The theme being: have a mentor. Or many mentors who know what you want to learn. During my time at UVM, I spent no less than 500 hours talking to my mentor Gary Johnson about everything he was learning in his grad classes about machine learning (often at the great expense of sleep).

The second lesson was to secure an independent study, if possible. Working with Josh Bongard (who was our CSSA advisor) was my first dive during undergrad into working with basic neural networks. And to take relevant courses from teachers who know the industry. I was lucky to take a Data Mining class and a course on Artificial Intelligence with Dr. Xindong Wu, who was formerly the Editor-in-Chief for IEEE TKDE, AI&KP and Chair of ICDM. In these courses we used Peter Norvig and Stuart Russell's Artificial Intelligence: A Modern Approach (which is one of my favorite books). We went through a detailed comparison of data mining algorithms, and we implemented each one in different languages with interesting complementary or illustrative features, ranging from Common Lisp to Prolog. The final for the data mining class was a mini dissertation on one of these algorithms, as well as suggested improvements (I did PageRank).
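PageRank itself fits in a few lines once you see it as power iteration: repeatedly let each page split its rank among the pages it links to, with a damping factor for random jumps. This is a toy illustration of the algorithm (not my final-project code); the three-node graph and damping value are assumptions.

```python
# Minimal PageRank by power iteration on a dict-of-lists adjacency graph.
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start uniform
    for _ in range(iters):
        # Every node gets the "random jump" baseline...
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = links[u]
            if not out:
                # Dangling node: spread its rank evenly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                # ...plus an equal share of each in-neighbor's rank.
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
# 'c' collects links from both 'a' and 'b', so it ends up ranked highest.
```

The ranks always sum to 1 (it's a probability distribution over pages), which makes a handy sanity check when experimenting with different graphs.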

I applied to grad schools specifically intending to continue work on my search engine. I was rather disappointed to find (what shouldn't have been surprising) that there were a lot of professors who wanted to work with me on their grants, but none who were willing to work on my project. And to add insult to injury, the professor I had hoped to work with was on sabbatical. So, I continued to take classes on Artificial Intelligence at the University of Delaware, ran my search engine servers from within the university (which I probably shouldn't be saying), and brought questions to the faculty every chance I got.

Less than two years into my PhD program, I got dragged out by an opportunity to start a company; it had some funding offers, so I decided to take the shot and leave my program. I was kept pretty busy over the next ~4 years with engineering efforts for the startups I was with, but I tried to keep queuing up and reading academic papers and to follow the trends of the industry. If you read my essay on this thread about reaching inaccessible people: I would frequently connect with researchers to ask them about their work, and occasionally they were willing to walk me through their paper if I was stumped on something.

This all came in handy when I started Hackerlist, Inc. and we pivoted to become an AI consultancy: I had to read papers, interview candidates, and pair the right person with the right contract.

Here's a list of ~75 papers (most AIML or NLP related) which I found worth noting (or was lucky to remember):