
Saturday, January 13, 2018

Isabella: Testing, and a Build Pipe-line

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we made some small changes with big impact.

Now, the continuation...

Strengthening the foundation

As we are in a phase of improvement, this seems an appropriate time to make the foundation more solid: automated testing. On one hand I think I should have done this from the start, but on the other hand...

I am a big fan of Dan North, in particular his [Spike and Stabilize pattern]. With this we treat all code as if it was a Spike: code fast and unstable, then deploy the feature to the users. After a certain amount of time – like a month – come back and see if it is being used. If it is mostly unused, delete it; if it is used a lot, refactor it and write automated tests for it. This way you invest only very little time in code that ends up being unused.

Since Isabella started as just an experiment, and not code I expected to be long-lived, I opted for prioritizing new features over a stable code base – just as Spike and Stabilize suggests. I recently changed my outlook for Isabella; I now expect that I will use (and work on) her for a long time. Said another way: I have deployed the code, waited, and I now know that the code is being used. So, it is time to stabilize!

Stabilizing the code was an incredibly frustrating process, for primarily two collaborating reasons. The first reason requires a bit of explanation. Many non-technical people think that programmers spend most of their time coding. In practice this is far from the truth. Normally we spend most of our time searching for and fixing bugs. That I can easily handle. As a child I loved mazes; now I enjoy being lost in a very complex system, fighting to find my way out. I savour the victory when I finally crack it.

During this process, though, I spent almost all my time searching the internet for answers: which libraries to use, what some syntax means, and just plain hard problems. Very boring, and time consuming. And when I wasn't searching the internet I was rewriting the same code again and again. I didn't feel like I was making any progress at all. And finally, even when something worked there was very little visible effect. So it didn't feel like a victory either.

Here is the documentation of my journey, including the problems and solutions I encountered.

Jasmine

My first instinct was: I want testing, so I should start writing tests. I had a bit of experience with jasmine-node, so I installed it and started writing tests. The problem was that, because this was client code, the way I used multiple files was with multiple script tags. Thus jasmine-node couldn't find any of the dependencies. Adding import statements in all the files was an obvious solution, but that would give errors on the client side.

I did some research and found something called system.js, which would emulate import statements on the client side. This meant having to refer directly to the .js files instead of the .ts files. In spite of this it seemed like a neat solution.

Codeship, Bitbucket, and Git

My next idea was to set up an automatic test-and-deploy cycle. I wanted Heroku to run the app, [Codeship] to test it, and Bitbucket to host the code. So far I had just used Heroku as my code host, so I was faced with an entirely unfamiliar challenge: how do you move from one git repo to another?

I am no git guru, unfortunately. I wish I could tell you exactly the steps I took to make this work, but I have no idea. I pulled one way, then the other, committed, merged, and suddenly I could push to Bitbucket. Codeship quickly picked up the push and deployed it to Heroku. I'm skipping over a few small issues with RSA keys for deploying from Codeship.

Gulp

If we think of problems as lianas, software development is like playing Tarzan; we are constantly swinging from one to the next, in a seemingly endless jungle. Usually when I worked on Isabella I would just start tsc -w in the background and forget about it. Sometimes I would forget to start the compiler, which was super annoying, because then I would push the build to the cloud to test it, and nothing would happen. This was fairly bad, but after adding tests it became much more annoying. First, there were now two things to remember (or forget). Sure, it also takes a bit longer to deploy, but having Codeship reject a deploy because I forgot to test locally was just a slap in the face.

It was time to setup a build tool. I did some research and narrowed the decision down to Gulp and Grunt. To me they seemed fairly equal, and I don't even remember what the deciding factor ended up being. I went with Gulp.

The great advantage of a build tool is that you can add as many post-processing steps as you want. Encouraged by the [Typescript documentation] I suddenly wanted browserify and uglify. I also wanted it to be "watching", so I couldn't forget anything.

Uglify was no problem, watching was easy, browserify was... difficult. As mentioned earlier my test files used imports. In fact this had been quite tricky to achieve. Now it stood in my way, and I was not about to poke that bear. Therefore I abandoned my dream of browserify.

Do-over: Grunt!

As I remember it: I was brawling with an issue where Gulp's jasmine-node did not support the later versions of ECMAScript – in particular Promises (which I use heavily) – when I suddenly stumbled upon a blog describing my dream of a build pipe-line. It had everything: a client part, a server part, and a common part. The server part was tested using jasmine-node; the client part was tested with jasmine and phantomjs. The client was browserify-ed and uglify-ed. There was watching, and the folder structure was beautiful. It was a [fine template] for a project like this.

The only problem was, it used Grunt. I'm not one to be over-confident in my decisions, so if I learn something new I gladly change. Thus I deleted everything I had made up till this point, and tried swapping in Grunt.

This was not problem free, but it wasn't too bad. I ended up testing both client and server with jasmine-node. Isabella is very light on DOM and very heavy on APIs, which I can test just as easily with jasmine-node.
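To give an idea of what the result looks like, here is a minimal Gruntfile sketch – not the actual config from the template I followed, just the shape of it, assuming the grunt-contrib-uglify and grunt-contrib-watch plugins; the TypeScript compilation and the jasmine-node test run are additional tasks in the real pipeline, and the paths are illustrative:

module.exports = function (grunt) {
  grunt.initConfig({
    // minify the compiled client code
    uglify: {
      client: { files: { 'public/isabella.min.js': ['build/client/**/*.js'] } }
    },
    // re-run the pipeline whenever a source file changes
    watch: {
      scripts: { files: ['src/**/*.ts'], tasks: ['default'] }
    }
  });
  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.loadNpmTasks('grunt-contrib-watch');
  grunt.registerTask('default', ['uglify']);
};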

Conclusion

Although this was a tough stretch I did accomplish a few things. My code is now a bit more secure from my students copying it, due to uglify. It also takes up less space, thus loads faster, also because of uglify. It is browserify-ed, so I can use import as much as I want, and I can never forget to include a file in the HTML. I have testing up and running, so now I can start adding tests whenever I add new features, or fix bugs in current ones. I have a guarded deploy so even if I forget to test locally I am guaranteed that the tests will be run before a deploy.

I don't have any general words of wisdom. I won't say that you should always just use Grunt, or anything like that. Setting up a good pipe-line is hard, but it is also invaluable. It is also a problem that we don't tackle often. I am familiar with the DevOps saying "if it hurts, do it more often", encouraging us to practice the skills that we struggle with. If you are afraid of deploying, do it more, so you minimize the risk. If you are afraid of changing some code, delete it and write it again, so you know what's going on. While I agree wholeheartedly with this advice, I don't feel like going through this process again any time soon. If you are about to set up a pipe-line of your own: I wish you the best of luck.

Wednesday, January 10, 2018

Isabella: Change to followup

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we talked about the process and problems of moving her to HTTPS.

Now, the continuation...

Followups

In Isabella there is a concept of a followup, meaning that she is able to reply and wait for a response for certain queries. The most important implication of this is that you do not need to start a followup with "Isabella"; you can just say it. This is what we use for playing games, or taking notes.

Something that often annoyed me was saying "Isabella, turn on the lights", which turns the lights on to their last state. This means that in the morning the lights would be dimmed red, because that's their night setting. Therefore I often had to follow up with "Isabella, bright lights", which switches to their bright setting. I did this sequence, or a similar one, until I realized that finding the right lighting is an experimental process. I would often switch scenes a few times, maybe try dimming, before I found the exact lighting that I wanted. With this in mind it is annoying to have to say "Isabella" again and again.

Categories

The solution was simple, because I already had the followup system in place. I simply added a category to every command; then, when I execute a command, I push all commands in the same category to the followup database.
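A minimal sketch of the idea – the Command shape and the names here are illustrative, not Isabella's actual source:

type Command = { phrase: string, category: string, action: (input: string) => void };

function executeCommand(cmd: Command, input: string, all: Command[], followups: Command[]) {
  cmd.action(input);
  // after a command runs, every command in the same category becomes a valid
  // followup, so the user can keep adjusting the lights without saying "Isabella"
  for (const other of all) {
    if (other.category === cmd.category) followups.push(other);
  }
}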

Just like the "refresh" command from an earlier post, this was trivial to implement, yet the user experience was tremendous. Last time I argued that you should prioritize changes which benefit the development, so I wont beat that dead horse anymore.

This time I will point out that it was only by using the product that this became apparent to me. Valuing usage over theory was, ironically, something I learned while taking a theoretical compilers course at university. The course ended with a competition to see who could make the best peephole optimizer for a byte-code language. To simplify the challenge we were judged only on the number of instructions. Our optimizers were run on several small, but realistic, applications to see who did best.

Gripped by competitive spirit, we thought up hundreds of byte-code patterns which were stupid and could be optimized. Come the judgement, it was revealed that we were the team with the most patterns; one team had only 7 patterns. "They probably slacked off," we thought, and we were confident that with so many patterns we were sure to win. However, that was not what happened. Everybody clapped and cheered as the team with 7 patterns accepted their victory.

It took months of pondering before I fully realized what had happened. The other team had understood when the teachers said "realistic applications". So they had spent their time, not coming up with stupid patterns like my team, but instead coding realistic applications, and looking at the byte-code. Doing this they had spotted 7 weird, but very common patterns.

The lesson I learned then has stayed with me since. I already knew to "optimize for the common case", yet I did not know the value of investigating what "the common case" is. My advice this time is: remember to walk in your users' shoes once in a while. Sometimes it will reveal a tiny change with a huge effect.

"Isabella"

Getting back to the followups. I noticed that with the Amazon Echo I would say "Alexa", and then wait, to make sure she was listening. If you have a long or complicated command, it is tedious to repeat it because she didn't hear her name. Again this was a nice touch which was simple to add to Isabella: I just added an empty command, which pushes all commands to the followup database.

Here is a video:

Saturday, January 6, 2018

Isabella: HTTPS

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we discussed how we added Spotify and OAuth to Isabella.

Now, the continuation...

Deploy Debriefing

Having put Isabella in the cloud there are certain things I have struggled with, and some I still do. On localhost we can easily get access to the microphone and speakers. However, online, Chrome won't even ask for permissions if it is not an HTTPS connection. At first this might not seem like such a big deal: Heroku supports HTTPS out of the box, so adding the S should just work. Right?

Server-side protocol

Part of OAuth is having the third party (Spotify) redirect back to us. At first we just used

redirect_uri: 'http://' + req.get('host') + spotify_conf.redirect_uri,

Now we needed to find out if we should add the S. Doing that – server-side – turned out to be a bit annoying. Let me save you the trouble:

let protocol = req.headers["x-forwarded-proto"] || "http";
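Putting the two together, the redirect then becomes something like this (sketch):

let protocol = req.headers["x-forwarded-proto"] || "http";
let redirect_uri = protocol + '://' + req.get('host') + spotify_conf.redirect_uri;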

HTTP requests

We also used several APIs in the client which only run on HTTP – not to mention the blatant security problem of having all the API keys in the client. This violates the HTTPS agreement, so Chrome kindly blocks them and warns the user.

Again there is a simple – albeit tedious – solution to both problems: to move all the HTTP calls to the server. That way the client only has HTTPS calls (to the server and Spotify).

app.get('/joke', (req, res) => {
  request.get({ url: "https://icanhazdadjoke.com/", 
            headers: { "Accept": "application/json" }}, 
              (error, response, body) => {
    res.send(body);
  });
});

We do have to be a bit careful while doing this. E.g. we use an API to look up our location based on our IP. Obviously, if we just move this call to the server, we get the server's location. As luck would have it, this particular API allows us to pass in a specific IP and look that up. Now we only need to find the client's IP and pass it along. Again this was cumbersome because we are sometimes running on localhost, and sometimes not. Anyway, the solution is:

app.get('/ip', (req, res) => {
  let ip = req.headers['x-forwarded-for'] || req.connection.remoteAddress;
  if (ip === "::1") ip = "";
  request.get("http://ip-api.com/json/" + ip, (error, response, body) => {
    res.send(body);
  });
});
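On the client, the lookup now goes through our own server instead of the HTTP API; for example with jQuery (which Isabella already uses elsewhere):

$.getJSON('/ip', (location) => {
  // same JSON as ip-api.com returns, e.g. location.city, location.lat, location.lon
  console.log(location.city);
});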

Whew... or I mean Hue

Having solved these problems it seemed like we were ready to go full HTTPS. But not without one last problem: Hue. The Philips Hue API runs locally, and uses only HTTP requests.

I don't want to rant about the general implications or irresponsibility of this.

Because the calls are to a local IP I cannot move them to the server, and from my searching it seems I cannot change Hue to run HTTPS. So I'm stuck. If anyone has a solution, or even suggestions on how to solve this, I am all ears.

So, the slightly uncomfortable conclusion is that, with the exception of warnings from each Hue call, we have successfully moved to HTTPS!

Wednesday, January 3, 2018

Isabella: Spotify and OAuth

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant (VCPA), like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we finally deployed her to the cloud.

Now, the continuation...

Voice Controlled Personal Assistants

As Isabella has grown, I have started to grow more and more dependent on her, and indeed more attached to her. In the beginning this was just a fun experiment to see how difficult it was to make something like Alexa. At the same time I was strongly considering buying a "real" VCPA like the Amazon Echo or Google Home. That doesn't seem reasonable anymore. The other VCPAs do offer a few features that Isabella doesn't have... yet. To balance it out I have decided to add a feature to Isabella that isn't available in the other assistants.

Spotify

I listen to music quite a lot. Whether I'm working or cooking, Spotify is usually playing in the background. Again, I don't want to get into a discussion about which music streaming service is best by any measure; I just happen to use Spotify. Unfortunately playing music from Spotify is not supported by the Amazon Echo – in my country, at the time of writing. Of course this is due to politics and not technology. However, I still want it.

Research

Spotify has great documentation for their Web API. My first idea was just to get some audio stream, pipe it into an audio tag and boom, music from Spotify. Unfortunately this turned out to be impossible: you can only retrieve a 30 second clip of a song.

This was quite the roadblock, and it stumped me for several days. I looked over the API again and again, and it just seemed to have methods for searching, and for "clicking" the different buttons in the interface. In a way the commands in the API could make up a remote control. Then it hit me: a remote control was exactly what I was trying to build. I didn't want to build an entire music streaming platform, I just wanted to control one.

This does have the limitation that Spotify needs to be constantly running in the background. But it does also mean that Isabella can control Spotify playing on other devices like phones or tablets.
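To make "remote control" concrete, here is roughly what a pause command looks like against Spotify's Web API – a sketch, with the access_token coming from the OAuth flow described below:

$.ajax({
  url: 'https://api.spotify.com/v1/me/player/pause',
  type: 'PUT',
  headers: { 'Authorization': 'Bearer ' + access_token },
  success: () => console.log("paused Spotify")
});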

OAuth

The first step when working with the Spotify API is to implement their OAuth protocol. Luckily Spotify's OAuth is super easy to implement thanks to their documentation. Most people know OAuth only from "log in with Facebook" (or Google), but it can do much more. I imagine that we will use this same protocol for many APIs that we add in the future, like calendars, email, etc. Therefore I'll briefly explain the basics of OAuth. In my experience OAuth is difficult to grasp at first sight, so you should not expect to gain a deep understanding from this presentation.

Because repetition is good for understanding, I'll explain it using two metaphors I like. Then I'll explain it with the technical terms, because repetition is good for understanding.

Imagine that we are managers in a warehouse. We have access to many areas; some of them are restricted, meaning only we have access to them. Now, for some reason we want someone else to solve one of our tasks. But in order to solve this task they need access to some of the restricted areas that we have access to. This is the fundamental problem that OAuth solves.

The protocol states that:

  • you ask the person who should perform the task.
  • the person asks the secretary for a key to the restricted area.
  • the secretary calls you to ask if this person is allowed into this particular restricted area.
  • you confirm.
  • the secretary writes an official form and gives it to the person.
  • the person takes the form to the janitor.
  • the janitor makes the necessary key and gives it to the person.

At this point the person can perform the task. We could imagine the same procedure if you are applying for a job, and the company wants to know your grades, which are usually secret. The protocol states that:

  • you send a job application.
  • the company asks your school (or university) for your grades.
  • the school calls you to ask if this company is allowed to see your grades.
  • you confirm.
  • the school sends a link to the company.
  • the company opens this link in a browser.
  • the browser shows the company your grades.

Finally let's take the concrete example of Isabella and Spotify. The protocol states that:

  • you want Isabella to control Spotify, so you send a request to Isabella.
  • Isabella redirects this request to Spotify, adding some authentication information, so Spotify knows who "Isabella" is.
  • Spotify then presents you with a "this application wants access to these areas" screen.
  • you click confirm/continue – i.e. send a request to Spotify.
  • Spotify redirects this request to Isabella adding a special token.
  • Using this token Isabella sends a request to Spotify.
  • Spotify returns an access_token.

Basically every call in Spotify's API requires this access_token.
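Concretely, with Express this boils down to two endpoints on Isabella's server. The following is only a rough sketch of Spotify's authorization code flow – the route names are illustrative, and the client_id/client_secret fields are assumed to live in spotify_conf alongside the redirect_uri from above:

app.get('/spotify-login', (req, res) => {
  // redirect the user to Spotify, identifying "Isabella" by her client_id
  res.redirect('https://accounts.spotify.com/authorize'
    + '?response_type=code'
    + '&client_id=' + spotify_conf.client_id
    + '&scope=' + encodeURIComponent('user-modify-playback-state')
    + '&redirect_uri=' + encodeURIComponent(redirect_uri));
});
app.get('/spotify-callback', (req, res) => {
  // Spotify redirected back with a code; exchange it for an access_token
  request.post({
    url: 'https://accounts.spotify.com/api/token',
    form: {
      grant_type: 'authorization_code',
      code: req.query.code,
      redirect_uri: redirect_uri,
      client_id: spotify_conf.client_id,
      client_secret: spotify_conf.client_secret
    },
    json: true
  }, (error, response, body) => {
    // body.access_token is what every Spotify API call needs
    res.send(body);
  });
});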

The first step

The first step in the protocol is to show that you want Isabella to take control of Spotify. The standard way is to have a button, and that was my first approach too. The first time you click it, it takes you away from Isabella, and you are confronted with a consent screen. From a Human-Computer Interaction viewpoint this view change is feedback, so it is fitting to have a button. However, any subsequent time you click it, Spotify remembers your consent and just sends you straight back to Isabella without you noticing. This means that in the subsequent cases we have a button without noticeable feedback – not good.

Common computer science knowledge teaches us that we should optimize for the common case. Imagine that you want Isabella to take control of Spotify often... very often. We only log in "for the first time" once. The common case is clearly the subsequent times, where it does not make sense to have a button. Therefore I decided in the end to remove the button, and add a command to "log in to Spotify".

Now this does cause a problem with discovery – the process by which a user learns about features. It is easy to see a button and try to click it. It is harder if there are no visual cues. However, this is a general problem for VCPAs: how do you know what you can do with one? With human interaction we assume that either the receiver knows how to answer our query, or we can teach them. Is this an approach we can take with VCPAs? Start with a broad basis of tasks, and then have the users teach them what they need? How should they teach it? I will certainly look deeper into this in a later post.

For now, here is a video:

Wednesday, October 11, 2017

Isabella: Deployment and Gender

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we taught Isabella to play games, to improve her mood.

Now, the continuation...

Deployment

I finally got around to deploying Isabella to the cloud, so I could test her on other devices.

I have multiple computers which I use for different work tasks. As Isabella was a project related to my teaching, she was only running on my teaching-computer. This means that every time I got a new idea I had to find my other computer, wait for it to open, find the folder, open the editor, and finally write down the note, or implement the feature.

This of course meant that sometimes, if it was a small idea, I just wouldn't bother. I have spent a lot of words arguing for eliminating all annoyances and obstacles connected to coding, and yet this is pretty much the biggest impediment I can think of. And it is so easy to get rid of.

First we decide where to deploy her to. I am a big fan of Heroku, so that was the obvious choice for me. Then it is as easy as:

  • calling heroku create [project-name].
  • create a Procfile
web: node index.js
  • create a .gitignore
node_modules
  • make a trivial server script
import * as Express from 'express';
let app = Express();
app.set('port', (process.env.PORT || 5000));
app.get('/', function (req, res) {
  res.sendFile(__dirname + '/index.html');
});
app.get('/*', function (req, res) {
  res.sendFile(__dirname + req.url);
});
app.listen(app.get('port'), function () {
  console.log('listening on *:' + app.get('port'));
});
  • and finally commit everything with git
git add .
git commit -m "Deploy to Heroku"
git push

Of course she should have been in version control from the beginning, which would have meant I could easily clone her on my other computer and run her locally there too. It would also enable me to develop on her on the other computer, eliminating the impediment discussed above.

Gender

Once deployed to Heroku I could not wait to test her out on my other computer. I flew over to it, opened the URL, hit enter, and held my breath. She loaded, and finally I heard the words I was waiting for: "I am listening" – spoken in a deep male voice. I was stunned, and then I broke out in laughter. Of all the things I expected, her switching gender was not one of them.

I had already implemented functionality for changing her name, so the first thing I asked was "Can I call you David?". This feature was originally there to solve the problem of having two devices listening at once; I needed a way to distinguish them. It was impressive to me how much effect the name had: just because I am calling her Isabella, I was completely set on her being a her.

I had previously played around with different voices, to make sure I was using the one I liked best for text-to-speech. But now I actually needed it, so I added a new command for changing her voice.

In order to complete the feature, I added a list of the most common male and female names, so that when you change her name, if you choose a male name, she will also try to find a male voice, and vice versa.

Saturday, October 7, 2017

Isabella: Games

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we explained our reasoning for adding feelings or moods to Isabella.

Now, the continuation...

Games

Now that Isabella has feelings, it makes sense to think about them a bit. What happens if she gets in a bad mood? How can you make her happy again? How do people make each other happy?

Obviously we can't give her a gift, or a hug. Although, now that I'm thinking about it, it is not a bad business idea: an Isabella gift shop, where you can buy virtual gifts for her to improve her mood. Especially since – let's be honest – she is just a fancy, useful tamagotchi.

Another way we humans improve our moods is by playing games. This we can do with Isabella, and it then gives a boost to her mood. But which games can she play? The easy answer is: pretty much every game you can play while driving. The first one I thought of was "20 questions", and the easiest I could think of was: guess a number. So let's look at both in turn.

Guess a number

Guess a number is a very simple game, where one player thinks of a number between 0 and 100, and then the other player has to guess it, using as few guesses as possible.

The cool thing is that this game is exactly what you would expect a computer to like, based on popular preconceptions.

This game was really easy to implement (both ways), using our follow-up system. When she is thinking of the number, you just compare the input with the number and say higher or lower. When you are thinking of the number, she just uses binary search – like any good computer.
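Her side of the game is just binary search over the remaining interval. A minimal sketch (the class and method names are illustrative):

class NumberGuesser {
  private low = 0;
  private high = 100;
  guess(): number {
    // always guess the midpoint of the remaining interval
    return Math.floor((this.low + this.high) / 2);
  }
  // called with the user's reply to the last guess
  feedback(answer: "higher" | "lower" | "correct") {
    const last = this.guess();
    if (answer === "higher") this.low = last + 1;
    if (answer === "lower") this.high = last - 1;
  }
}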

Akinator

20 questions is quite a bit more complex. But luckily, like so many times before, somebody has already made a brilliant game, called Akinator, which is exactly what I want. Even luckier: it has an API. Unfortunately, the API has no documentation. The closest thing was a couple of projects on github which tried to use it as well.

Unfortunately their code was not quite what I was looking for, so I instead made my own, very thin layer on top of the API. Maybe it will be useful for someone else, so here it is:

type StepInformation = {
  question: string,
  answers: {
    answer: string
  }[],
  step: string,
  progression: string,
  questionid: string,
  infogain: string
}
type AnswerResponse = {
  identification: {
    channel: number,
    session: string,
    signature: string
  },
  step_information: StepInformation
}
type CharacterResponse = {
  elements: {
    element: {
      id: string,
      name: string,
      id_base: string,
      proba: string,
      description: string,
      valide_contrainte: string,
      ranking: string,
      minibase_addable: string,
      relative_id: string,
      pseudo: string,
      picture_path: string,
      absolute_picture_path: string
    }
  }[],
  NbObjetsPertinents: string
}
class RawApinator {
  private session: string;
  private signature: string;
  private step = 0;
  constructor() { }
  hello() {
    return new Promise<AnswerResponse>((resolve, reject) => {
      $.ajax({
        url: 'http://api-us3.akinator.com/ws/new_session?partner=1&player=maxipaxi',
        dataType: "jsonp",
        error: reject,
        success: (data: { completion: string, parameters: AnswerResponse }) => {
          this.session = data.parameters.identification.session;
          this.signature = data.parameters.identification.signature;
          this.step = 0;
          resolve(data.parameters);
        }
      });
    });
  }
  sendAnswer(answerId: number) {
    return new Promise<StepInformation>((resolve, reject) => {
      $.ajax({
        url: 'http://api-us3.akinator.com/ws/answer?session=' + this.session 
           + '&signature=' + this.signature + '&step=' + this.step 
           + '&answer=' + answerId,
        dataType: "jsonp",
        error: reject,
        success: (data: { completion: string, parameters: StepInformation }) => {
          this.step++;
          resolve(data.parameters);
        }
      });
    });
  }
  getCharacters() {
    return new Promise<CharacterResponse>((resolve, reject) => {
      $.ajax({
        url: 'http://api-us3.akinator.com/ws/list?session=' + this.session 
           + '&signature=' + this.signature + '&step=' + this.step 
           + '&size=2&max_pic_width=246&max_pic_height=294&pref_photos=OK-FR&mode_question=0',
        dataType: "jsonp",
        error: reject,
        success: (data: { completion: string, parameters: CharacterResponse }) => {
          this.step++;
          resolve(data.parameters);
        }
      });
    });
  }
}
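For illustration, a round of the game through this wrapper could look roughly like this; ask and say stand in for Isabella's followup helpers, and answerToId maps "yes"/"no"/... to Akinator's numeric answer ids – none of these helpers are part of the wrapper above:

const akinator = new RawApinator();
akinator.hello()
  .then(start => ask(start.step_information.question))
  .then(answer => akinator.sendAnswer(answerToId(answer)))
  .then(step => ask(step.question))
  // ... keep answering until she is confident enough, then:
  .then(() => akinator.getCharacters())
  .then(chars => say("I think it is " + chars.elements[0].element.name));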

Here is a short video of us playing a game.

Saturday, September 30, 2017

Isabella: Touch-y Feel-y

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we did another SPIKE for the programming part of Isabella.

Now, the continuation...

Feelings

I have worked a bit with game programming, and design. I really enjoy it. There are so many interesting aspects of it, and once you study something like that, you start noticing it everywhere. Why do I bring this up?

We would like Isabella to be as relatable as possible. This is because it gives us more influence over the users, and the more Isabella feels like a human, the higher our tolerance for errors. One way to do this is to add a "feeling game". By this we mean a small system where some things, like rude behavior, will affect the way she behaves and responds. Of course this should not be detrimental to the functionality, but there is still some room to work with.

To facilitate this we needed to rewrite a large portion of the matching system, and while doing this we also refactored away every "eval" and changed all callbacks to promises. This is a footnote, yet the task took hours and hours.

Having done this we could separate the "fluffy" part of responses, the "it is ___ o'clock", from the "9:45". Now we simply say that if she is in a good mood she says the entire thing, otherwise she only gives the bare, cold information.

Now we just need something to affect her mood. The first thing we added was good and bad words, like cussing, or saying "please" and "thank you". Then we had a cool idea. All speech recognition has the challenge of understanding variations of commands. Here we have a brand new innovation: we can teach our users. We tied the matching score to her mood, so the harder she has to work to figure out what you want, the less happy she will be.
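A sketch of the idea – the word lists, the reference score, and the exact numbers here are made up for illustration, not Isabella's actual tuning:

const PERFECT_SCORE = 0; // a flawless match costs her no effort

function updateMood(mood: number, input: string, matchScore: number): number {
  if (/please|thank you/.test(input)) mood += 1;   // polite words cheer her up
  if (/damn|stupid/.test(input)) mood -= 2;        // rude words annoy her
  // the harder she had to work to understand the command, the less happy she gets
  return mood - Math.max(0, PERFECT_SCORE - matchScore);
}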

We also have plans for having offensive jokes or quotes affect her mood as well, this way regaining some of the political correctness we threw away earlier in development. She might still say something inappropriate, but at least she'll be mad about it as well.

Wednesday, September 27, 2017

Isabella: Programming, by Speech

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we added support for follow-up queries, so we could have "conversations".

Now, the continuation...

SPIKE

As we mentioned in the very first post, one of the goals was to have Isabella assist with coding – in particular, coding herself. Now that we have added quite a few functions, and features such as wild cards and follow-ups, it is time for another SPIKE to test if we are getting closer.

In the first SPIKE we reached a point where we could call functions, and generally evaluate expressions. This time we want to see if we can implement a function with Isabella. The time-box is a few hours, at most an evening; thus tomorrow everything we write from this point will be deleted again.

The idea

Dictating everything is horrible – slow and error prone – so Isabella should at least have some domain knowledge. It should feel like telling one of my students what to do. My idea is to have her take control of the conversation; this way she can keep track of where we are, thus simplifying navigation significantly.

Simply put, we initiate a series of questions, where each answer is translated into code differently depending on which part of the function we are in. That is, the answer to "what should I call the function" would be translated into an identifier, whereas the answer to "what is this statement" would be parsed as f.ex. an assignment.

This "conversation" is much more complex that what we have used before, because there is no set limit to the number of questions. There could be no parameters, or there could be 15. How do we support this?

Counting (arrays)

Our first solution was to have her start an enumeration by asking "how many Xs will there be", and then just loop that many times for answers. This worked.

It did require us to know how many things we wanted up front, which is not natural when coding. Next time you are coding something, try to predict how many statements are going to be in a function. For this reason we abandoned this solution.

If we think about it, this is exactly how arrays work. When we initialize them we have to specify their length. The problems we ran into are also recognized from arrays: it is difficult to change their length later.

Terminating (lazy lists)

Having realized that our first idea was basically arrays, we can use this to come up with an alternative: lazy lists. Our next idea is to reserve a keyword ("done") and just keep asking for more until we say that word. This way we just start saying statements, and then when there are no more, we just say "done" and the function is complete.

This is much more natural! So this is probably the way to go.
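A sketch of the "lazy" version, assuming an ask helper that speaks a prompt and resolves with the next thing the user says (not part of the actual spike code):

async function collectStatements(ask: (prompt: string) => Promise<string>): Promise<string[]> {
  const statements: string[] = [];
  while (true) {
    const answer = await ask("what is the next statement?");
    if (answer === "done") break;     // the reserved keyword terminates the list
    statements.push(answer);
  }
  return statements;
}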

Conclusion

Coming to the end of the SPIKE, what have we learned?

First of all we did succeed in programming a method using speech. And it did not take an unrealistically long time. We also learned a few important lessons for later when we decide to implement the real thing.

Unfortunately, after an evening of working on it, it was not as smooth as we had hoped. It is clear that to make this useful we basically have to implement an entire language for programming. We did this for "a function" and it was nice, but we should also apply the idea to "statements", and probably "expressions", and so on. If we have to do this anyway, Javascript might not be the right choice. Maybe we should invent a language which works specifically with speech? We'll have to see.

Saturday, September 23, 2017

Isabella: Now we're talking

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we continued adding more APIs: functionality for free.

Now, the continuation...

More APIs

As mentioned, this stage of development is super fun, because so much is happening, so fast. Keeping with this, we try to add at least two new things at a time.

The small, easy addition this time is a quote API. If we thought the joke API was difficult to find, we did not anticipate how difficult it would be to find a good quote API. In fact, implementing a feature like this takes about 7 minutes; finding a good API takes about 2 hours.

In the end we gave up the search for a "general, inspirational quote API" and settled for a "programming quote API". Again, we can take advantage of the target audience of Isabella: me, and maybe a few of my friends. We are fine with programming quotes.

At this point we have also added:

"Conversations"

A different thing completely is that we eventually want Isabella to have some contextual understanding – like saying "Play Ed Sheeran", and then following that up with "How old is he", or something like that.

The most basic example of contextual understanding comes from saying "what" when you don't hear what she said. In this case we want her to repeat the last thing she said. One characteristic of this follow-up query is that you shouldn't have to say "Isabella" first like you normally do, as it comes as part of a "conversation".

We introduced the Isabella name to simplify the command matching algorithm, so we could take advantage of the fact that we knew we should try to match one of the commands, and could do our "backwards trick". Now we want to remove this simplification, and that means that she listens to everything that is said after a command. However, this time not everything that is said should match a command. To facilitate this we needed to add a threshold to our matching algorithm, and say that we only register something if it is a sufficiently good match.
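In code the threshold is just a cut-off on the best match; a sketch, where findBestMatch and the shape of its result are hypothetical wrappers around the matching loop from the first post, and the cut-off value is made up:

const THRESHOLD = -10; // anything scoring worse is ignored

function interpret(input: string) {
  const best = findBestMatch(input);
  if (best.score < THRESHOLD) return;   // probably not meant for Isabella at all
  best.command.execute(input);
}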

First we just added the option to say "what" after everything, and she repeats. Then we added "thank you", to which she will reply "you are welcome". Then we took it to the next level.

We wanted to add notes. Here we did not want to use our wildcards, as the note might be quite long. Instead we wanted to use a custom follow-up, where we could capture everything and save it. So that is what we did.

We are probably going to use this "conversation" feature quite a lot for future commands.

Wednesday, September 20, 2017

Isabella: Growing

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we integrated Isabella with the first external API: Youtube.

Now, the continuation...

The fun part

This next part of a project like this is the best, because all the basic functionality and structure is already in place, so adding new functionality is super easy. In this case adding another API is just a few lines of code, which means that it really feels like something is happening.

In a lot of the projects we work on there is so much code around that making a change is barely noticeable. To add to this, when making even a small change we usually need to spend a lot of extra time testing that it didn't break any of the other stuff. Tedious, but necessary, work.

At this stage Isabella is still just a nimble little thing, and so there is barely anything to test. We have also enforced extremely low coupling between the different parts, thus there is no way adding something new could affect any existing commands.

So, for the next while we are just going to be integrating more and more APIs.

Jokes, and political correctness

When we tested Alexa, one of the best features was that she could tell jokes, so of course Isabella should be able to do that too. Therefore we set out on a quest to find a good joke API. This was a surprisingly difficult task, as a lot of them are either outdated or not free. I did eventually find this one, and the jokes are exactly my style.

With Alexa, and probably the others as well, the developers have another challenge here: their product has to appeal to a very wide audience. Therefore they have to be careful not to offend anyone with f.ex. Alexa's jokes. Our goal is not to make a widespread product, so we can just have Isabella say what we want, no politics.

Convenience is king!

Earlier this year I switched my lights to Philips Hue, so I could control them from my phone. That was so much easier, as a lot of my light switches are in inconvenient places, and I immediately got super used to controlling all the lights from my phone. I have a few lights which I could not switch over, for different reasons, and now I just avoid using those lights because I can't control them from my phone.

This made me re-realize the truth of the statement "Convenience is king!", so you can imagine how it felt when I integrated Hue into Isabella and could control everything by speech. Someone made a great library for it – for Typescript (yes, he included the typings) – so thanks to that guy!

Saturday, September 16, 2017

Isabella: Becoming useful

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we discussed the importance of shortening the distance between coding and testing.

Now, the continuation...

Hello, World!

So far all of the functionality in Isabella (apart from the speech recognition) has been coded directly: functions for getting the time, date, or day are written right into her source code. And that is fine for some stuff; they are useful functions. However, if we really want her to grow fast, the next obvious step was to connect her to the rest of the world by making calls to APIs.

There are so many APIs out there, with tons of functionality which can be integrated for free. I do listen to a lot of music, and it would be nice to integrate that somehow.

We do have an advantage over many of the other personal assistants like Siri or Alexa: we have a very large screen at our disposal. There is a theater saying, "if you bring it to the stage, use it!", the point being that if you have something you should aim to utilize it as much as possible. For example, if you bring a cane onto the stage, it can also be an umbrella, a pointing stick, a gun, and many other things. Therefore, as we have a large screen, let's do something with it.

Youtube, and wildcards

Personally I spend a lot of time on youtube, listening to music or re-watching classic videos. Therefore the first API I want to integrate is Youtube's. First we need to play Youtube videos. A quick Google search gives us a link to the Youtube API for embedding videos. This is perfect for what we need. The only slight problem is that we need the ID of the video we want to play.

So, we need some way to search. This took a bit more work, and we had to get the first API key for Isabella: Youtube Search API. However, once we figured out the call we wanted, combining the two APIs was very easy.

However, we still needed to extend our basic command format to support wildcards, so we could say "Play ___ from youtube", and the same command would work whether you said "ed sheeran", "adele", or whatever. This was not too difficult, but it is very powerful!
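A sketch of how such a wildcard pattern can be handled – the helper name is illustrative, not Isabella's actual code:

// "play ___ from youtube" becomes a pattern with a capture group
function matchWildcard(pattern: string, input: string): string | undefined {
  const regex = new RegExp("^" + pattern.replace("___", "(.+)") + "$", "i");
  const result = regex.exec(input);
  return result ? result[1] : undefined;
}

// usage:
matchWildcard("play ___ from youtube", "play ed sheeran from youtube"); // "ed sheeran"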

Finally, Isabella can do something that we are actually going to use!

Field Testing

As a test I challenged myself not to use youtube manually for an entire day. While working I constantly listen to music, and it worked great. It was actually easier to ask Isabella to look up videos instead of switching window (away from work), going to youtube, typing the name of the video I wanted, clicking the top result, and switching back to what I was doing. Just like I argued in the last post, not having to fully context switch was a noticeable improvement to my workflow.

This function alone, I think, is enough that I'm going to keep using her, for a long time.

Wednesday, September 13, 2017

Small Change, Big Difference

Previously on Dr. Lambda's blog:

In a previous post I presented my newest pet project: Isabella. Isabella is a voice controlled personal assistant, like Siri, Alexa, and others. We have decided to investigate how difficult it is to make such a program. In the last post we spiked functionality for voice coding, and then setup basic functionality for simple commands, like telling the time.

Now, the continuation...

Inventing on principle

First: go watch Bret Victor's talk. His point is that the distance from code to result should be as short as possible. I have found that to be helpful on many occasions, and Isabella certainly wasn't an exception. Having to make a change, save, compile, switch windows, refresh, say something, switch back, and do that over and over, was such a pain.

This may not sound like a lot of work, especially compared to large programs that take several minutes to compile, but the fact is: every time we change what is in front of our eyes, we break the context, and our brains have to do some work to switch back and forth.

This bothered me, especially because I was just adding a lot of little low risk functions. Therefore I decided to spend the time to reduce this impediment. In this case it only took 5 minutes, but I would argue it would have been worth it even if it had taken a week.

The solution was to make a command to refresh the window. This meant that I never had to take my eyes off the code, so while I am testing I can look directly at the code in question. This was a huge improvement in terms of enjoyment of coding.
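The command itself is nothing more than a page reload. A sketch, assuming commands live in a list like the database described in the first post (the shape of the entry is illustrative):

// "Isabella, refresh" reloads the page, picking up the newest compiled code
commands.push({ input: "refresh", action: () => window.location.reload() });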

The actual coding of this is trivial, but the effect is huge. Again it's getting late and I will call it a day.

Tuesday, September 5, 2017

Personal assistants; Siri, Alexa, ...

Recently I tried the Amazon Echo for the first time. I had always thought voice control was annoying and unnecessary. However, Alexa stole my heart. I don't want to get into a discussion about which voice control is better; I don't care.

What I do care about, however, is: with the current state of the field, how difficult is it to make a voice controlled personal assistant? Here I'm thinking of all the libraries and APIs that are freely available.

So I have set out to get a sense for this question. Let's follow one of Google's sayings:

  • First do it,
  • Then do it right,
  • Then do it better.

First do it...

To start something new like this we always begin by making an example and a tiny prototype. During this spike we specifically try to get as close as possible to the thing we think will be hardest in the project.

First things first

First we needed to find a decent speech recognition framework for Typescript. We quickly found Annyang, and started playing with it. We want some context awareness in our PA, so we cannot use Annyang's standard command recognition. Instead we use Annyang to parse everything and then build our own algorithm to match the meaning to a command.

We were also fortunate enough that HTML5 has support for text-to-voice already, so we just use that.
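A minimal sketch of that wiring – annyang's result callback hands us everything it heard, and the browser's speechSynthesis speaks the reply; matchCommand stands in for our own matching algorithm described below and is not part of annyang:

annyang.addCallback('result', (phrases: string[]) => {
  // annyang gives us its guesses at what was said; we interpret the best one ourselves
  const reply = matchCommand(phrases[0]);
  // HTML5 text-to-speech
  speechSynthesis.speak(new SpeechSynthesisUtterance(reply));
});
annyang.start();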

Calculations

Having it recognize (and answer) "hello" took all of two minutes. So we started thinking about what we actually want it to do. I know, this should have been our first question, but we were blinded by the idea of building Jarvis and becoming real life Tony Starks.

Then it hit us. We want it to build Jarvis. Or put more simply, we want it to be able to help us code. We want to be able to talk to it like we do to a colleague, and then it should program what we tell it to. Let me just clarify: we don't want to build an AI, just an assistant who we can tell "do a linear search, then sort the list, and return the median" or something.

First we wanted it to do simple calculations like 2+2. The easiest way to do this was to just eval what was said. It worked like a charm. Only 15 minutes in and we could already access the values of variables, and add numbers.

Function calls

Function calls were more tricky, especially because we didn't want to say "open parenthesis", or the like. We have all seen the youtube video, and we don't tell our colleagues where to put parens or commas, after all.

We made a function to take progressively shorter prefixes of the input, camelcase them, and test if they are function names. Here is a sketch of the idea:

function tryEval(exp: string): any {
  try { return eval(exp); }
  catch (e) { return undefined; }
}
function matchFunction(words: string[]): string | undefined {
  // try progressively shorter prefixes of the input
  for (let i = words.length; i > 0; i--) {
    const prefix = words.slice(0, i);
    const identifier = makeId(prefix);   // camelcase the prefix into an identifier
    const evalResult = tryEval(identifier);
    if (typeof evalResult === "function")
      return identifier;
  }
  return undefined;
}

Notice, that even though we use eval this code does not actually call the function, it just finds the name.

Success! We could tell the computer to define variables, evaluate expressions, and even call functions. This was great news for the viability of the project. As the spike ended, we fulfilled our promise (to the extreme programming gods) and erased everything.

Then do it right...

For the next phase of a project like this, we start in the complete opposite end of the spectrum with all the lowest hanging fruits first. If you are very nerdy you could say that we use "shortest arrival time scheduling". We also make sure to make good decisions as this is potentially long lasting code.

Matching meaning

This time we needed a more solid "meaning" algorithm. We do have a great advantage over general AI: we only need to match the input to a command from a very small list. With this in mind we decided to flip the problem on its head: we have a list of possible commands – which of them is closest to the input? The code went something like this:

for (const command of database) {
  let match = 0;
  for (const cWord of command) {
    // best = distance to the closest word in the input
    let best = cWord.length;
    for (const iWord of input) {
      best = Math.min(best, distance(cWord, iWord));
    }
    match -= best;
  }
  // the command with the highest match score wins
}

Intuitively: for each word in each command, look for a word in the input that matches; the command with the most matches wins. So we are matching the command against the input, not the other way around.

Of course we also made some normalization code to remove contractions and such, but that is pretty straightforward.

Isabella say "hello"

There is a hidden assumption in the matching algorithm: everything it hears is a command. This is not always the case; therefore we need some way to know that you are talking to it and not just talking. The way we solve this in the real world is with names, so let's use the same solution here. We needed a name that is distinct enough that we wouldn't say it normally, and it shouldn't sound like other words.

For now we have settled on "Isabella", as it is a beautiful name which no one in our social circle has.

Code written in this phase has to be a lot more maintainable, and so we now have a tiny database with inputs and answers. It is trivial to add constant things like "what is your name?" → "Isabella", but that isn't very fun. Therefore we built in support for "hooks" ($), where we can specify a function to call instead of just saying the string out loud.
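A sketch of what entries in that database could look like – the exact format is illustrative, not Isabella's actual schema:

const database = [
  { input: "what is your name", answer: "Isabella" },
  // "$" marks a hook: call the function and speak its result instead of a fixed answer
  { input: "what time is it", answer: "$", hook: () => "it is " + new Date().toLocaleTimeString() }
];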

I think that's enough for one day, time to go to bed!