Saturday, September 30, 2017

Isabella: Touch-y Feel-y

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we did another SPIKE for the programming part of Isabella.

Now, the continuation...

Feelings

I have worked a bit with game programming and design. I really enjoy it. There are so many interesting aspects of it, and once you study something like that, you start noticing it everywhere. Why do I bring this up?

We would like Isabella to be as relatable as possible. This gives us more influence over how users perceive her, and the more Isabella feels like a human, the higher our tolerance for her errors. One way to do this is to add a "feeling game". By this we mean a small system where some things, like rude behavior, affect the way she behaves and responds. Of course this should not be detrimental to the functionality, but there is still some room to work with.

To facilitate this we needed to rewrite a large portion of the matching system, and while doing so we also refactored every "eval" and changed all callbacks to "promises". This sounds like a footnote, yet the task took hours and hours.
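
As an example of that refactoring, converting a callback-style call to a promise mostly follows the standard wrapping pattern; here is a minimal sketch (the recognize function and its signature are hypothetical):

// A hypothetical callback-style API of the kind we wrapped.
declare function recognize(input: string, callback: (err: Error | null, text?: string) => void): void;

// The standard wrapper: resolve on success, reject on error.
function recognizeAsync(input: string): Promise<string> {
  return new Promise((resolve, reject) => {
    recognize(input, (err, text) => {
      if (err) reject(err);
      else resolve(text!);
    });
  });
}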

Having done this, we could separate the "fluffy" part of responses, the "it is ___ o'clock", from the bare data, the "9:45". Now we simply say that if she is in a good mood she says the entire thing; otherwise she only gives the bare, cold information.
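
A minimal sketch of the split (all names are illustrative): each response keeps the template and the raw data apart, and the mood decides which one to speak.

interface Answer {
  template: (data: string) => string; // the "fluffy" part: "it is ___ o'clock"
  data: string;                       // the bare information: "9:45"
}

function phrase(answer: Answer, mood: number): string {
  // Good mood: the whole sentence. Bad mood: just the cold facts.
  return mood > 0 ? answer.template(answer.data) : answer.data;
}

// phrase({ template: d => "it is " + d + " o'clock", data: "9:45" }, 1)  -> "it is 9:45 o'clock"
// phrase({ template: d => "it is " + d + " o'clock", data: "9:45" }, -1) -> "9:45"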

Now we just need something to affect her mood. The first things we added were good and bad words, like cussing, or saying "please" and "thank you". Then we had a cool idea. All speech recognition has the challenge of understanding variations of commands. Here we have a brand new innovation: we can teach our users. We tied the matching score to her mood, so the harder she has to work to figure out what you want, the less happy she will be.
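
Something like this (a sketch; the word lists and the penalty threshold are made up for illustration):

let mood = 0;

const goodWords = ["please", "thank you"];
const badWords = ["stupid", "shut up"]; // cussing etc.

function updateMood(input: string, matchScore: number): void {
  for (const word of goodWords) if (input.includes(word)) mood++;
  for (const word of badWords) if (input.includes(word)) mood--;
  // The worse the match, the harder she had to work, the grumpier she gets.
  if (matchScore < 0.8) mood--;
}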

We also have plans to let offensive jokes or quotes affect her mood as well, this way regaining some of the political correctness we threw away earlier in development. She might still say something inappropriate, but at least she'll be mad about it as well.

Wednesday, September 27, 2017

Isabella: Programming, by Speech

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we added support for follow-up queries, so we could have "conversations".

Now, the continuation...

SPIKE

As we mentioned in the very first post, one of the goals was to have Isabella assist with coding; in particular, coding herself. Now that we have added quite a few functions, and features such as wildcards and follow-ups, it is time for another SPIKE to test whether we are getting closer.

In the first SPIKE we reached a point where we could call functions and generally evaluate expressions. This time we want to see if we can implement a function with Isabella. The time-box is a few hours, at most an evening, and tomorrow everything we write from this point will be deleted again.

The idea

Dictating everything is horrible; it is slow and error prone, so Isabella should at least have some domain knowledge. It should feel like telling one of my students what to do. My idea is to have her take control of the conversation; this way she can keep track of where we are, thus simplifying navigation significantly.

Simply put, we initialize a series of questions, where each answer is translated into code differently depending on which part of the function we are in. That is, the answer to "what should I call the function?" would be translated into an identifier, whereas the answer to "what is this statement?" would be parsed as, for example, an assignment.
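
A sketch of how answers get interpreted differently depending on the question (ask, makeIdentifier, and parseStatement are hypothetical helpers):

declare function ask(question: string): Promise<string>;  // speak, then await the answer
declare function makeIdentifier(words: string): string;   // "add numbers" -> "addNumbers"
declare function parseStatement(words: string): string;   // parses e.g. an assignment

async function buildFunction(): Promise<string> {
  const name = makeIdentifier(await ask("What should I call the function?"));
  const body = parseStatement(await ask("What is the first statement?"));
  return "function " + name + "() { " + body + " }";
}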

This "conversation" is much more complex that what we have used before, because there is no set limit to the number of questions. There could be no parameters, or there could be 15. How do we support this?

Counting (arrays)

Our first solution was to have her start an enumeration by asking "how many Xs will there be", and then just loop that many times for answers. This worked.
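
In code, the counting approach looked roughly like this (a sketch, with the same hypothetical ask helper as above):

async function collectStatements(): Promise<string[]> {
  // Ask for the count up front, then loop exactly that many times.
  const count = parseInt(await ask("How many statements will there be?"), 10);
  const statements: string[] = [];
  for (let i = 0; i < count; i++) {
    statements.push(await ask("What is statement " + (i + 1) + "?"));
  }
  return statements;
}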

It did require us to count how many things we wanted up front, which is not natural when coding. Next time you are coding, try to predict how many statements are going to be in a function. For this reason we abandoned this solution.

If we think about it, this is exactly how arrays work: when we initialize them, we have to specify their length. The problems we ran into are also familiar from arrays: it is difficult to change their length later.

Terminating (lazy lists)

Having realized that our first idea was basically arrays, we can use this to come up with an alternative idea: lazy lists. Our next idea is to reserve a keyword ("done") and just keep asking for more until we say that word. This way we just start saying statements, and when there are no more, we say "done" and the function is complete.
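
The lazy variant, as a sketch (again with the hypothetical ask helper):

async function collectStatementsLazily(): Promise<string[]> {
  const statements: string[] = [];
  while (true) {
    const answer = await ask("What is the next statement?");
    if (answer.trim().toLowerCase() === "done") break; // the reserved keyword
    statements.push(answer);
  }
  return statements;
}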

This is much more natural! So this is probably the way to go.

Conclusion

Coming to the end of the SPIKE, what have we learned?

First of all we did succeed in programming a method using speech. And it did not take an unrealistically long time. We also learned a few important lessons for later when we decide to implement the real thing.

Unfortunately, after an evening of working on it, it was not as smooth as we had hoped. It is clear that to make this useful we basically have to implement an entire language for programming by speech. We did this for "a function" and it was nice, but we should also apply the idea to "statements", and probably "expressions" and so on. If we have to do this anyway, JavaScript might not be the right choice. Maybe we should invent a language which works specifically with speech? We'll have to see.

Saturday, September 23, 2017

Isabella: Now we're talking

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we continued adding more APIs: functionality for free.

Now, the continuation...

More APIs

As mentioned, this stage of development is super fun, because so much is happening so fast. Keeping with this, we try to add at least two new things at a time.

The small, easy addition this time is a quote API. If we thought the joke API was difficult to find, we did not anticipate how difficult it would be to find a good quote API. In fact, implementing features like this takes about 7 minutes; finding a good API, however, takes about 2 hours.

In the end we gave up the search for a "general, inspirational quote API" and settled for a "programming quote API". Again, we can take advantage of the target audience of Isabella: me, and maybe a few of my friends. We are fine with programming quotes.


"Conversations"

A completely different thing is that we eventually want Isabella to have some contextual understanding, like saying "Play Ed Sheeran" and then following that up with "How old is he?", or something like that.

The most basic example of contextual understanding comes from saying "what?" when you didn't hear what she said. In this case we want her to repeat the last thing she said. One characteristic of this follow-up query is that you shouldn't have to say "Isabella" first, like normal, as it comes as part of a "conversation".

We introduced the Isabella name to simplify the command matching algorithm: we could take advantage of the fact that we knew the input should match one of the commands, and could do our "backwards trick". Now we want to remove this simplification, which means that she listens to everything that is said after a command. However, this time not everything that is said should match a command. To facilitate this we needed to add a threshold to our matching algorithm, and say that we only register something if it is a sufficiently good match.
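
A sketch of the thresholded matching (the scoring function, its normalized 0 to 1 scale, and the threshold value are all illustrative):

declare function matchScore(input: string, command: string): number; // 0..1, hypothetical

const threshold = 0.7; // illustrative value

function findCommand(input: string, commands: string[]): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const command of commands) {
    const score = matchScore(input, command);
    if (score > bestScore) { bestScore = score; best = command; }
  }
  // Below the threshold we assume she overheard something not meant for her.
  return bestScore >= threshold ? best : undefined;
}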

First we just added the option to say "what" after anything, and she repeats the last response. Then we added "thank you", to which she replies "you are welcome". Then we took it to the next level.

We wanted to add notes. Here we did not want to use our wildcards, as a note might be quite long. Instead we wanted a custom follow-up, where we could capture everything and save it. So that is what we did.
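
Roughly, the follow-up mechanism can be sketched like this (names illustrative): after "take a note", the next utterance is captured verbatim instead of being matched against commands.

let followUp: ((raw: string) => void) | undefined;
const notes: string[] = [];

function onTakeNote(): string {
  followUp = raw => notes.push(raw); // capture everything, no matching
  return "What should the note say?";
}

function onSpeech(raw: string): void {
  if (followUp) {
    const pending = followUp;
    followUp = undefined;
    pending(raw);
    return;
  }
  // ...otherwise run the normal command matching...
}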

We are probably going to use this "conversation" feature quite a lot for future commands.

Wednesday, September 20, 2017

Isabella: Growing

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we integrated Isabella with the first external API: Youtube.

Now, the continuation...

The fun part

This next part of a project like this is the best! All the basic functionality and structure is already in place, so adding new functionality is super easy. In this case adding more APIs is just a few lines of code, which means that it really feels like something is happening.

In a lot of the projects we work on, there is so much code around that making a change is barely noticeable. To add to this, when making even a small change we usually need to spend a lot of extra time testing that our change didn't break any of the other stuff. Tedious, but necessary, work.

At this stage Isabella is still just a nimble little thing, and so there is barely anything to test. We have also enforced extremely low coupling between the different parts, so there is no way adding something new could affect any existing commands.

So, for the next while we are just going to be integrating more and more APIs.

Jokes, and political correctness

When we tested Alexa, one of the best features was that she could tell jokes, so of course Isabella should be able to do that too. Therefore we set out on a quest to find a good joke API. This was a surprisingly difficult task, as a lot of them are either outdated or not free. I did eventually find this one, and the jokes are exactly my style.

In Alexa, and probably the others as well, the developers have another challenge here: their product has to appeal to a very wide audience. Therefore they have to be careful not to offend anyone with, for example, Alexa's jokes. Our goal is not to make a widespread product, so we can just have Isabella say what we want, no politics.

Convenience is king!

Earlier this year I switched my lights to Philips Hue, so I could control them from my phone. That was so much easier, as a lot of my light switches are in inconvenient places, and I immediately got super used to controlling all the lights from my phone. I have a few lights which I could not switch over, for various reasons, and now I just avoid using those lights because I can't control them from my phone.

This made me re-realize the truth of the statement "Convenience is king!", and so you can imagine how it felt when I integrated Hue into Isabella, so I could control everything by speech. Someone made a great API for it, for TypeScript (yes, he included the typings), so thanks to that guy!
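
For a flavor of how simple the bridge is to talk to, here is a minimal sketch against the Hue bridge's REST interface (not the library mentioned above; the bridge IP and API key are placeholders):

const bridgeIp = "192.168.1.2";     // placeholder
const apiKey = "your-hue-api-key";  // placeholder

async function setLight(id: number, on: boolean): Promise<void> {
  // The bridge exposes each light's state as a simple JSON resource.
  await fetch("http://" + bridgeIp + "/api/" + apiKey + "/lights/" + id + "/state", {
    method: "PUT",
    body: JSON.stringify({ on }),
  });
}

// "Isabella, turn on the lights" -> setLight(1, true)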

Saturday, September 16, 2017

Isabella: Becoming useful

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we discussed the importance of shortening the distance between coding and testing.

Now, the continuation...

Hello, World!

So far all of the functionality in Isabella (apart from the speech recognition) has been coded directly. Functions for getting the time, date, or day are written right into her source code. And that is fine for some stuff; they are useful functions. However, if we really want her to grow fast, the next obvious step is to connect her to the rest of the world by making calls to APIs.

There are so many APIs out there, with tons of functionality that can be integrated for free. I do listen to a lot of music, and it would be nice to integrate that somehow.

We do have an advantage over many of the other personal assistants like Siri or Alexa: we have a very large screen at our disposal. There is a theater saying, "if you bring it to the stage, use it!", the point being that if you have something, you should aim to utilize it as much as possible. An example: if you bring a cane onto the stage, it can also be an umbrella, a pointing stick, a gun, and many other things. Therefore, as we have a large screen, let's do something with it.

YouTube, and wildcards

Personally I spend a lot of time on YouTube, listening to music or re-watching classic videos. Therefore the first API I want to integrate is YouTube's. First we need to play YouTube videos. A quick Google search gives us a link to the YouTube API for embedding videos. This is perfect for what we need. The only slight problem was that we needed the ID of the video we want to play.

So, we need some way to search. This took a bit more work, and we had to get the first API key for Isabella, for the YouTube Search API. However, once we figured out the call we wanted, combining the two APIs was very easy.
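
The combination looks roughly like this (a sketch: the API key and the player element are placeholders, and setting an iframe's src directly stands in for the embed API):

const youtubeKey = "your-api-key"; // placeholder

async function playFromYoutube(query: string): Promise<void> {
  // Step 1: ask the search API for the best matching video's ID.
  const url = "https://www.googleapis.com/youtube/v3/search"
    + "?part=snippet&type=video&maxResults=1"
    + "&q=" + encodeURIComponent(query) + "&key=" + youtubeKey;
  const result = await (await fetch(url)).json();
  const videoId = result.items[0].id.videoId;

  // Step 2: hand the ID to the embedded player.
  const player = document.getElementById("player") as HTMLIFrameElement;
  player.src = "https://www.youtube.com/embed/" + videoId + "?autoplay=1";
}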

However, we still needed to extend our basic command format to support wildcards, so we could say "Play ___ from youtube", and the same command would work whether you said "ed sheeran", "adele", or whatever. This was not too difficult, but it is very powerful!
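
A sketch of wildcard matching via a regular expression (the "___" command format is ours; this implementation is illustrative):

function matchWildcard(pattern: string, input: string): string | undefined {
  // "play ___ from youtube" becomes /^play (.+) from youtube$/i
  const regex = new RegExp("^" + pattern.replace("___", "(.+)") + "$", "i");
  const match = input.match(regex);
  return match ? match[1] : undefined;
}

// matchWildcard("play ___ from youtube", "play ed sheeran from youtube") -> "ed sheeran"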

Finally, Isabella can do something that we are actually going to use!

Field Testing

As a test I challenged myself not to use YouTube manually for an entire day. While working I constantly listen to music, and it worked great. It was actually easier to ask Isabella to look up videos instead of switching windows (away from work), going to YouTube, typing the name of the video I wanted, clicking the top result, and switching back to what I was doing. Just like I argued in the last post, not having to fully context switch was a noticeable improvement to my workflow.

This function alone, I think, is enough that I'm going to keep using her for a long time.

Wednesday, September 13, 2017

Small Change, Big Difference

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we spiked functionality for voice coding, and then set up basic functionality for simple commands, like telling the time.

Now, the continuation...

Inventing on principle

First: go watch Bret Victor's talk. His point is that the distance from code to result should be as short as possible. I have found that to be helpful on many occasions, and Isabella certainly wasn't an exception. Having to make a change, save, compile, switch windows, refresh, say something, switch back, and do that over and over, was such a pain.

This may not sound like a lot of work, especially compared to large programs that take several minutes to compile, but the fact is: every time we change what is in front of our eyes, we break the context, and our brains have to do some work to switch back and forth.

This bothered me, especially because I was just adding a lot of little low-risk functions. Therefore I decided to spend the time to reduce this impediment. In this case it only took 5 minutes, but I would argue it would have been worth it even if it had taken a week.

The solution was to make a command to refresh the window. This meant that I never had to take my eyes off the code, so while I am testing I can look directly at the code in question. This was a huge improvement in terms of enjoyment of coding.
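
The whole fix is more or less one registered command (registerCommand is a hypothetical stand-in for our command database):

declare function registerCommand(phrase: string, action: () => void): void;

// Saying "Isabella, refresh" reloads the page, picking up the newest build.
registerCommand("refresh", () => location.reload());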

The actual coding of this is trivial, but the effect is huge. Again it's getting late, so I will call it a day.

Tuesday, September 5, 2017

Personal assistants: Siri, Alexa, ...

Recently I tried the Amazon Echo for the first time. I had always thought voice control was annoying and unnecessary; however, Alexa stole my heart. I don't want to get into a discussion about which voice control is better, I don't care.

What I do care about, however, is this: given the current state of the field, how difficult is it to make a voice controlled personal assistant? Here I'm thinking of all the libraries and APIs that are freely available.

So I have set out to get a sense of this question. Let's follow one of Google's sayings:

  • First do it,
  • Then do it right,
  • Then do it better.

First do it...

To start something new like this, we always begin by making an example and a tiny prototype. During this spike we specifically try to get as close as possible to the thing we think will be the hardest in the project.

First things first

First we needed to find a decent speech recognition framework for TypeScript. We quickly found Annyang and started playing with it. We want some context awareness in our PA, so we cannot use Annyang's standard command recognition. Instead we used Annyang to parse everything, and then built our own algorithm to match the meaning to a command.
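
Roughly, this uses Annyang's catch-all splat command to get a transcript of everything, which we then feed to our own matcher (handleSpeech is ours and hypothetical here):

import annyang from "annyang";

declare function handleSpeech(text: string): void; // our own matching algorithm

if (annyang) { // annyang is null when the browser lacks speech recognition
  annyang.addCommands({ "*text": (text: string) => handleSpeech(text) });
  annyang.start();
}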

We were also fortunate that HTML5 already has support for text-to-speech, so we just use that.
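
In fact, speaking is essentially a one-liner with the browser's Web Speech API:

function say(text: string): void {
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}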

Calculations

Having it recognize (and answer) "hello" took all of two minutes. So we started thinking: what do we actually want it to do? I know, this should have been our first question, but we were blinded by the idea of building Jarvis and becoming real-life Tony Starks.

Then it hit us. We want it to build Jarvis. Or put more simply, we want it to be able to help us code. We want to be able to talk to it like we do to a colleague, and then it should program what we tell it to. Let me just clarify: we don't want to build an AI, just an assistant, one we can tell "do a linear search, then sort the list, and return the median" or something.

First we wanted it to do simple calculations like 2+2. The easiest way to do this was to just eval what was said. It worked like a charm. Only 15 minutes in, we could already access the values of variables and add numbers.

Function calls

Function calls were trickier, especially because we didn't want to say "open parenthesis" or the like. We have seen the YouTube video, and we don't tell our colleagues where to put parens or commas, after all.

We made a function that takes progressively shorter prefixes of the input, camel-cases them, and tests whether they name a function. Here is a sketch of the idea in TypeScript:

function tryEval(exp: string): any {
  try { return eval(exp); }
  catch (e) { return undefined; }
}

// makeId camel-cases a list of words into an identifier,
// e.g. ["main", "window"] -> "mainWindow".
declare function makeId(words: string[]): string;

function matchFunction(words: string[]): string | undefined {
  // Try progressively shorter prefixes of the input.
  for (let i = words.length; i > 0; i--) {
    const prefix = words.slice(0, i);
    const identifier = makeId(prefix);
    if (typeof tryEval(identifier) === "function") {
      return identifier;
    }
  }
  return undefined;
}

Notice that even though we use eval, this code does not actually call the function; it just finds its name.

Success! We could tell the computer to define variables, evaluate expressions, and even call functions. This was great news for the viability of the project. As the spike ended, we fulfilled our promise (to the extreme programming gods) and erased everything.

Then do it right...

For the next phase of a project like this, we start at the complete opposite end of the spectrum, with all the lowest-hanging fruit first. If you are very nerdy, you could say that we use "shortest job first" scheduling. We also make sure to make good decisions, as this is potentially long-lasting code.

Matching meaning

This time we needed a more solid "meaning" algorithm. We do have a great advantage over general AI: we only need to match the input to a command from a very small list. With this in mind we decided to flip the problem on its head: we have a list of commands, so which one is closest to the input? The code went something like this:

// database: the word lists of all known commands; input: the words heard.
// distance is an edit-distance helper.
for (const command of database) {
  let match = 0;
  for (const cWord of command) {
    let best = cWord.length; // worst case: no input word comes close
    for (const iWord of input) {
      best = Math.min(best, distance(cWord, iWord));
    }
    match -= best; // a perfect word match subtracts nothing
  }
  // ...the command with the highest score wins
}

Intuitively: for each word in each command, look for a word in the input that matches; the command with the most matches wins. So we are matching the command against the input, not the other way around.

Of course we also wrote some normalization code to remove contractions and such, but that is pretty straightforward.

Isabella, say "hello"

There is a hidden assumption in the matching algorithm: everything it hears is a command. This is not always the case, so we need some way to know that you are talking to her and not just talking. The way we solve this in the real world is with names, so let's use the same solution here. We needed a name distinct enough that we wouldn't say it in normal conversation, and one that doesn't sound like other words.

For now we have settled on "Isabella", as it is a beautiful name which no one in our social circle has.

Code written in this phase has to be a lot more maintainable, and so we have a tiny database with inputs and answers. It is trivial to add constant things like "what is your name?" / "Isabella", but that isn't very fun. Therefore we built in support for "hooks" ($), where we can specify a function to call instead of just saying the string out loud.
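
A sketch of what such a database with hooks might look like (shapes and names are illustrative):

const database: { input: string; answer: string }[] = [
  { input: "what is your name", answer: "Isabella" }, // constant answer
  { input: "what time is it", answer: "$time" },      // "$" marks a hook
];

const hooks: { [name: string]: () => string } = {
  time: () => new Date().toLocaleTimeString(),
};

function answerFor(entry: { answer: string }): string {
  const a = entry.answer;
  return a.startsWith("$") ? hooks[a.slice(1)]() : a;
}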

I think that's enough for one day, time to go to bed!