Olimex’s Programming Challenge #4 – The First For Me

I’m following Olimex‘s blog for some time, now. Well, I believe since I bought the first products from them, which was in October if I remember right. If you think: ‘Olimex? Never heard of.’ Here is an eight words description: Olimex designs and sells microcontrollers targeted at makers. Others do that to and have a much bigger community around them – especially to mention are Arduino and Raspberry Pi. So what makes them stand out (for me)?

I build my first keyboard with an Arduino Uno (actually two). That’s possibly the best path to take when you are beginning with microcontrollers and electronics tinkering, but there are some things where Arduinos don’t shine. Power consumption, for example. Since I struggled to tune my keyboard for longer battery life, I browsed the web for alternatives. Olimex was mentioned sometimes on different forums, so I headed over and checked their products. Here are some plus points, I experienced so far:

  • The products palette is huge. That can be considered a minus point for them, as well.
  • The prices are fair.
  • They are at the edge off microcontroller development.
  • All products I looked at so far can be used in industrial environments.
  • In case a product is a clone of something, the clone is always a little better in some way compared to the original and costs less.
  • You can order directly, even in small quantities.

There are some things to consider, though. While they are trying to provide decent documentation and building a community, it is still much harder to make progress then in the Arduino world, for example. Another point is, since they are really fast, it happens sometimes, that there are some flaws in the design. They fix those usually quickly and are customer friendly. Haven’t heard any other things, yet.

I don’t have any relationships with Olimex, besides being a customer. Just for clarification.

Actually the topic of this post is a different one and after this lengthy plugging for Olimex, it’s time to address it. Olimex runs an over the weekend programming challenge at their blog. The fourth is actually running. Check it out, if you have any programming experience and fun in solving riddles. For me it’s the first time to participate in their challenge.

For this weekends challenge you have to find the longest isogram in a given text. An isogram is a word where no letter exists more than once. I immediately imagined a solution and hacked it quick and dirty down and send it to them. That was yesterday in the evening, after getting the kids to bed. Now after rethinking my approach I might have missed a thing or two. Here is what I have send them. I shortened the example text.

[gist https://gist.github.com/Crenshinibon/5379973]

What about apostrophes? Do they count? Is “Dirk’s” one word with six letters? Or five? My solution makes it a four letters word (Dirk) and an one letter word (s), since the apostrophe becomes a space. What about hyphenation? Do they think the algorithm has to look out for hacked apart words and combine them and count their total length? My solution doesn’t do that. I know, I should have asked those questions before submitting something. But, heh, it’s to late for me now, so stop the accusations.

My algorithm is not as efficient as possible, either. A most efficient solution should go only once, letter by letter through the whole text. Converting the actual letter on the fly to lower case. Well the following might be better, regarding the efficiency.

[gist https://gist.github.com/Crenshinibon/5380129]

What do you think? Are their further aspects I didn’t consider?

Update:

Here is a solution that takes an URL (e.g. http://fiction.eserver.org/novels/alice_in_wonderland.html) as the command line parameter and finds the longest isogram on the given page:

[gist https://gist.github.com/Crenshinibon/5393957]

With my UMTS connection this takes a little more then two seconds for Alice. And for completeness here is a solution that takes a file name as input:

[gist https://gist.github.com/Crenshinibon/5394158]

Alice takes 40ms and the Bible around 850ms. I ran those tests on my MacBookPro 13” early 2011.

Advertisements

7 thoughts on “Olimex’s Programming Challenge #4 – The First For Me

  1. Svetlin Zarev says:

    Ther is a room for a lot more optimisations. For example a word might be contained more than once in the text – so do we have to check if it is an isogram every time we encounter it?

    Can’t wait to see all the solutions 😀

    1. Dirk Porsche says:

      Well, if you look at the optimized, second solution. There I don’t actually look at every word separatly, for being an isogram. So, I guess I wouldn’t win anything if I remember seen words. It would be an extra step. The seconds solutions complexity, looks for me to be in the worst case O(2n). n is the length of the text. The loop over every character of the text is the first n. The running check if the current letter, defines the current word as an isogram is the second n.

      Well I don’t know much about the inner workings of JavaScript, but my first solution might be something around O(6n). It goes through the whole text: First to remove none letter charakters. Second to convert upper case to lower case. Third to split by space. Fourth the loop over all words found (well here it’s not the whole text). And fifth and sixth the check if a word is really an Isogram (splitting by ” and iterating).

  2. Svetlin Zarev says:

    To be honest, I didn’t look at the second solution. Either way I missed to mention a couple of ideas in my previous comment:

    1. In the rules it’s not clearly specified what is a “word”, so in my solution I consider as words : “five-edged” or “it’s” for example (this has nothing to do with performance) and don’t split them.

    2. There might be more than one isogram with maximum length, so IMHO, we should print them all out. I’m not much into JavaScript, so please correct me if I’m wrong, but once you find an isogram with length N, then you only look for isograms with length > N

    3. If a word is an isogram and its encountered M times, you’ll execute the first if clause M times * wordLength just for every repetition of this word, because you are treating the words as a sequence of characters, and not as separate entity. This is a lot of overhead. I did a test on the Bible downloaded from here http://printkjv.ifbweb.com/#downloads and I’ve got: 837305 words, 13769 of which are unique. So checking every word via checking a char sequence char by char will be much slower than checking just the unique words.

    4. Actualy if you filter out the duplicate words, you can sort them in descending order according to their length and only do the checking until you find an isogram, then it’ll be much easier to print all the isograms with the same (maximum) length, because the words are already sorted.

    My solution can be found here: https://github.com/SvetlinZarev/OlimexWPC4

    1. Dirk Porsche says:

      Thanks for the continued feedback. It is fun to develop a solution and I must admit it’s even a little more fun discussing it.

      1. I wrote about this things in the text. But decided against acknowleding this cases.

      I looked at your code. I’m not very good at decyphering RegEx so excuse me, if the following questions answer is obvious. Does your solution consider words that are broken at a line end?

      2. Right, I didn’t consider this. And I’m still not so sure if I should …

      3. You are right. I shouldn’t go into the if-clause if the current word is already identified as not being an isogram. The following checks are not necessary, but hopefully not that expensive.

      But I think you are not right, regarding the efficiency of first eliminating none unique words. To identify those, you (or the compiler/interpreter underneath) already have to scan the whole text character by character at least once. To identify words for example.

      I do check the whole text character by character only once. I find words and their boundaries, by looking if the current character is a space (well, not a letter describes it more correctly). As long as there is a letter the current word gets built up.

      It is not an isogram if the current letter is already contained in the current word, otherwise it is considered an isogram, until the next letter is examined.

      4. Well, that’s a nice strategy if the cost for getting those words wasn’t so high already.

    2. Dirk Porsche says:

      I have seen you measured the execution time. I did likewise and it results in around 39 ms for the example text on my old Intel Q6600 / Ubuntu 12.04, for the second solution. The first solution takes around 54 ms.

  3. Svetlin Zarev says:

    I don’t think that this kind of benchmark is fair, because we are using different hardware and the different languages have different runtime performance.

    What you consider “the sample text”? If it is the string “some example text”, then my solution takes around 80ms, “Alice”(167kb) takes around 1500-1600 ms, and the bible(4.4mb) – arond 8000ms on Intel Core2Duo T5500 @1GHz Debian/Sid. As you can see it does not scale linearly with the file size. That’s because at first java runs in “interpreted” mode, then the JIT kicks in.

    Giving it a second thought, I can see your point 🙂

    1. Dirk Porsche says:

      By the sample text I mean Alice, the one given by the challenge. But you are right, it’s quite unfair. I have the whole text in a variable straight available, no filesystem access and stuff.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s