Phergie refactoring idea

I’m taking a course this semester on software architecture: the high-level design principles that go into building high-quality, maintainable software. The class is generally pretty decent, but the best part of it is the project. Over the course of the semester, teams have to learn and describe the architecture of an open source project; analyze how design patterns and design principles are applied; critique parts of the project that could benefit from refactoring; and then actually refactor the code and, if they’re feeling brave, submit the change back to the project.

My group is studying Phergie, an IRC bot that can moderate and perform administrative tasks on IRC channels. It can also do a few other fun things like pretend to “serve beer” to channel users, look up documentation for PHP code, etc.

We’re encouraged to get on project mailing lists and bug trackers and introduce ourselves to the developers. I did so and Matthew Turland was kind enough to give us suggestions on how to contribute back to the project — and even give me feedback on my homework!

I’ll post that homework here with some context. The goal is to find a “code smell” or some other kind of architectural defect; describe it; and then suggest a fix (a “refactoring”). We’re given points for ambition and we don’t actually have to implement the change — so we’re not limited by our ability to actually refactor the code.

Refactoring a large class in Phergie

Figure 1. Most often changed files. The blue line is the mean and the red line is one standard deviation above the mean.
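
One way to produce a ranking like the one in Figure 1 is to count how many commits touched each file. What follows is only a minimal sketch, not the script I actually used: it assumes the log text came from `git log --name-only --pretty=format:` (so it contains nothing but file paths, one group per commit), and the function names `count_file_changes` and `hotspots` are made up for illustration. It uses the population standard deviation; the sample standard deviation would be an equally reasonable choice.

```python
from collections import Counter
from statistics import mean, pstdev


def count_file_changes(log_text):
    """Count how many commits list each path.

    Assumes `log_text` holds only file paths separated by blank lines,
    as printed by `git log --name-only --pretty=format:`.
    """
    paths = (line.strip() for line in log_text.splitlines())
    return Counter(path for path in paths if path)


def hotspots(counts):
    """Return (path, count) pairs more than one standard deviation above the mean."""
    values = list(counts.values())
    threshold = mean(values) + pstdev(values)
    return [(path, n) for path, n in counts.most_common() if n > threshold]
```

Feeding the real log of a repository into `count_file_changes` and then passing the result to `hotspots` would give the files above the red line in a chart like Figure 1.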

I began my search for code smells by ranking the files by the number of commits in the git log that touched each file (see Figure 1). (Edit: This idea comes from Michael Feathers’s talk here and if you think this sort of thing is cool, you should read his blog here.) The most committed-to file is also one of the largest at 740 lines of code: Phergie/Driver/Streams.php, which contains the Phergie_Driver_Streams class. Ostensibly, this class is for handling the TCP connection to the IRC server. I noticed two things immediately:

  1. Phergie_Driver_Streams is the sole child class of Phergie_Driver_Abstract. In my opinion, this is an over-generalization: there appears to be neither a reason nor a plan to have a non-streams-based implementation.
  2. Phergie_Driver_Streams is not only responsible for handling the connection to the server; it is also responsible for parsing and formatting IRC commands. The class is so large because it contains methods pertaining to both responsibilities, and methods that are (arguably) too large because they perform both duties as well.

For 1), the obvious solution is to flatten the hierarchy and use only the Streams class. For 2), my proposed solution is (see the provided UML diagrams):

  1. Move the parsing logic from getEvent() to its own method called parseEvent().
  2. Move the parseEvent() method to a new class called Phergie_IRC_Command_Handler.
  3. Move the formatting logic from send() to its own method called formatCommand().
  4. Move the formatCommand() method and all methods starting with do to Phergie_IRC_Command_Handler.

My best estimate is that this would split the class into two files of approximately 400 lines of code each. This is closer to the mean (227 LOC) and, in my opinion, much more manageable and understandable: each class has a more clearly defined responsibility.
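
To make the proposed split concrete, here is a rough sketch of its shape. It is written in Python for brevity and is not Phergie’s actual PHP code: `CommandHandler` and `StreamsDriver` are hypothetical stand-ins for Phergie_IRC_Command_Handler and Phergie_Driver_Streams, and the parsing handles only the basic `[:prefix] COMMAND params [:trailing]` line format, assuming a non-empty line.

```python
class CommandHandler:
    """Hypothetical stand-in for the proposed Phergie_IRC_Command_Handler."""

    def parse_event(self, line):
        """Split a raw, non-empty IRC line into (prefix, command, params)."""
        line = line.rstrip("\r\n")
        prefix = None
        if line.startswith(":"):
            # A leading ":" marks the optional prefix (the message source).
            prefix, _, line = line[1:].partition(" ")
        # Everything after " :" is one trailing parameter that may contain spaces.
        head, sep, trailing = line.partition(" :")
        parts = head.split()
        command, params = parts[0].upper(), parts[1:]
        if sep:
            params.append(trailing)
        return prefix, command, params

    def format_command(self, command, *params):
        """Build a raw IRC line; the last parameter is sent as the trailing one."""
        parts = [command.upper()]
        if params:
            parts.extend(params[:-1])
            parts.append(":" + params[-1])
        return " ".join(parts) + "\r\n"


class StreamsDriver:
    """Hypothetical stand-in for Phergie_Driver_Streams: it only moves bytes."""

    def __init__(self, sock, handler):
        self.sock = sock        # any object with a sendall(bytes) method
        self.handler = handler  # all formatting and parsing is delegated here

    def send(self, command, *params):
        raw = self.handler.format_command(command, *params)
        self.sock.sendall(raw.encode("utf-8"))
```

The point of the design is that the driver no longer needs to know anything about IRC syntax, so each class can be read, tested, and changed on its own.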

Figure 2. Current architecture of the IRC/TCP subsystem in Phergie

Figure 3. Proposed refactoring of the IRC/TCP subsystem in Phergie

What did the lead Phergie developer think of it?

I posted an earlier draft of this to the mailing list and Matthew Turland, the lead developer, responded:

I agree that Phergie_Driver_Streams handling parsing and generation of IRC commands is part of why it’s so large, which is why I’m moving those into separate classes (and even libraries) in Phergie 3. See https://github.com/phergie/phergie-irc-parser and https://github.com/phergie/phergie-irc-generator. (These also use a Phergie\Irc subnamespace, in anticipation of one or more Jabber drivers also being developed.) See also https://github.com/phergie/phergie-irc-client-react, which is still very much in development but is an example of a driver implementation that still makes use of streams, but in a somewhat different way (because it uses the React library).

So, that’s cool: I accidentally anticipated a change that he had already made for Phergie 3 (which I didn’t realize existed). He decided to split the new class into a parser and a generator — something I chose not to do in my report for the sake of simplicity.

I’m also really pleased at how welcoming Matt’s been so far; he’s getting practically nothing in return except a bug fix or two (maybe) and he’s still more than willing to take the time to coddle newcomers like us. What a nice guy!

What a cool project!

This is a really great idea for a project; although not everyone is going to feel like sticking their neck out and embarrassing themselves on the internet like I did, it’s still a great opportunity to learn from more experienced developers and think about theory in the context of actual software. I certainly learned a lot and had a blast doing it.

Now if only the course also spent some time looking at more systems, as described by their developers…

Posted in The Performance of Open Source Applications, code, engineering | 2 Comments

A Semi-Coherent Review of PyCon Canada 2012

Two weeks ago I was foolish enough to escape university life for a few days and go to PyCon Canada, a nice little conference in Toronto that can only be described with words that end with exclamation marks: fantastic!, awesome!, etc. I’m no veteran of tech conferences (this was, I think, the ninth I’ve ever attended[1]), so I have a narrow view of what conferences can look like.

This was, however, the first non-student conference I’ve been to without it being related to work. That was nice — I could just relax and watch the talks and hang out with fellow Python enthusiasts like my friend (and former co-worker) Jon “VK” Villemaire-Krajden. That also drew attention to something I noticed about the conference: it was delightfully non-commercial, as far as conferences go. Sure, there were sponsors, and the sponsors said things at the microphone, and there was an area where you could schmooze with the sponsors — but on the whole, it felt like a conference of enthusiasts and open source people, not people trying to sell things.

People who spoke words

Jessica McKellar started things off with a talk on fostering a welcoming open source community. Near the beginning of her talk, she told a story about her time as an instructor at Hacker School: they took a few whiteboards and wrote questions on them like “What are your fears as a programmer?” She showed a slide with a bunch of students’ answers and it really resonated with me.

Sometimes programming is hard because it is hard, and sometimes programming is hard because of seemingly silly, trivial emotions. Sometimes programming is hard because you’re afraid of breaking something. Or because you’re afraid of looking stupid on the internet. Or because you’re afraid of looking stupid off the internet. Recognizing these things and talking about these things is more important than it sounds. To me, seeing this really smart open source hacker on a stage talk about these things and admit that they, too, are afraid of not being smart enough is so much more encouraging than just knowing that I can contribute to open source.

Michael Feathers spoke about functional programming. I’ve been drinking the FP Kool-Aid for a while now, so it wasn’t a mind-blowing talk for me; but he had a way with words and said what other people have been saying, but better. I’m not sure what his exact words were, but he said something along the lines of “these functional things are cool because you need to learn them only once,” that is, a lot of those functional programming tools that people talk about are pre-packaged, general, common algorithms. Sure, you can get the job done in two nested for-loops, but maybe using someone else’s for-loops will work out better?
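
As a small illustration of the “someone else’s for-loops” idea (my own example, not one from the talk): both functions below build every pairing of two lists, one with hand-written nested loops and one with `itertools.product` from the standard library. The function names are made up for this sketch.

```python
from itertools import product


def pairs_with_loops(xs, ys):
    """Build all (x, y) pairs with two hand-written nested for-loops."""
    result = []
    for x in xs:
        for y in ys:
            result.append((x, y))
    return result


def pairs_with_product(xs, ys):
    """Build the same pairs, in the same order, using a pre-packaged tool."""
    return list(product(xs, ys))
```

Once you know `product`, you recognize the pattern everywhere and never have to write (or debug) those two loops again.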

Fernando Perez ended the conference on a high note with his talk on the IPython Notebook, a browser-based tool for Python programming. The IPython Notebook is a bit like a REPL, a bit like an IDE, but the main idea is that it’s in the browser. That means if you write code that generates an image, it can show the image alongside your code. It means that if you have code that outputs a protein, the result can be an interactive 3D model. This project is so cool that I’ve wasted, er, spent more time than I care to admit playing with it since the conference.

Code Sprints!

I wasn’t in town for long — and I wanted to spend some time with friends — but I did manage to drop by the code sprints for a while. The sprints were at the Ladies Learning Code space near Honest Ed’s, a lovely little building with a pretty decent cafe in the basement.

Sleep deprivation might have done strange things to me, but I’m pretty sure these things happened:

  • Pizza with beet slices instead of pepperoni (awesome)

  • I had a really nice chat with Fernando Perez about IPython. And then he got me a cortado, which was incredibly sweet of him.

  • After I thanked Diana Clark for putting on the conference and generally being awesome, she gave me a hug. I swear to Guido, she almost made me cry.

Although I wasn’t around to see much, it does sound like a lot got done at the sprint. I’ll be sure to stick around longer next time.

Things that were not good about this conference

I really don’t have anything bad to say about how the conference was run. Really. Sure, the venue was cold the first day. The wireless was a bit patchy here and there. But whatever. Did I mention my ticket cost $25?

Personally, I think all “code sprints” should be called “Happy Fun Best Friends Coding Club But Also Testing and Documentation and Learning Extravaganza,” but sports and athletics metaphors are, unfortunately, thoroughly entrenched in the software world and I suppose I’m not going to win this one. Sigh.

Oh, and it may not surprise you to hear that I detected a substantial dose of Python elitism. I like Python too, but can we all try not to alienate PHP and Java developers? Couldn’t hurt, anyways.

Would I go again? Would I bring a friend?

Yes and yes.


  1. Woah. When did that happen?

Posted in code, tech | 1 Comment

Pizza

Pizza

Posted in food | Comments Off

Getting the Patriot USB wireless adapter to work with the BeagleBone

(For the benefit of fellow “embedded systems” students…)

If you’re having trouble getting the Patriot USB wireless adapter working with the BeagleBone, I found this Raspberry Pi forum post really useful. You have to change the commands slightly for the Beagle.

To download the driver and install it, run the following on your Beagle:

ubuntu@arm $ wget http://tavisharmstrong.com/stuff/8192cu.tar.gz
ubuntu@arm $ tar xf 8192cu.tar.gz
ubuntu@arm $ sudo install -p -m 644 8192cu.ko /lib/modules/3.2.30-x14/kernel/drivers/net/wireless/
ubuntu@arm $ sudo depmod -a

Then reboot the Beagle:

ubuntu@arm $ sudo reboot

And then turn on wireless with ifconfig:

ubuntu@arm $ sudo ifconfig wlan0 up
ubuntu@arm $ sudo ifconfig
ubuntu@arm $ sudo iwlist wlan0 scanning

The last command (the one with scanning in it) should output a list of the available wireless networks. If you’re in the lab, you should see ConcordiaUniversity on there.
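
If the scan output is too long to eyeball, the network names can be pulled out with a few lines of Python. This is just a sketch: it assumes Python is installed on your Beagle and that the scan prints each network name on a line of the form `ESSID:"name"`; the function name `essids` is made up here.

```python
import re


def essids(scan_output):
    """Return the network names found in the text printed by `iwlist wlan0 scanning`."""
    # Hidden networks show up as an empty name (ESSID:"") and yield "".
    return re.findall(r'ESSID:"([^"]*)"', scan_output)
```

Save the output of the scan to a file, read it into a string, and check whether `"ConcordiaUniversity"` is in the list that `essids` returns.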

Hope that works! Say so in the comments if it doesn’t work for you.

Posted in code, engineering, tech | 2 Comments

Teensy/Arduino timer simulator

I built a simulator for the 16-bit timer in the Teensy++/Arduino microcontroller in the hopes that people (e.g. fellow students of SOEN 422) might find it useful. Keep in mind that it’s a work in progress and has some bugs. You can find it here: A Simple and Interactive Explanation of the Teensy’s 16-bit timer (Timer1).

Posted in Uncategorized, school | Comments Off

The book is coming along…

Yesterday, on the AOSA/POSA blog, I wrote:

A few weeks ago Greg posted about the next book we’re doing: The Performance of Open Source Applications. Well, we don’t have a book yet, but we’ve made some progress. Earlier this week we had our 15th “yes” from an author, which puts us close to the chapter counts of AOSA. We’re excited about that and we hope you are too.

Hooray!

Posted in The Performance of Open Source Applications, books | Comments Off

The Performance of Open Source Applications

If you’ve spoken to me in the last few weeks you’ve probably heard that I’m co-editing a book on software performance. Well, we’re finally announcing it. From the AOSA blog:

We are pleased to announce that we are starting work on a third book in this series, which will be titled The Performance of Open Source Applications. Each chapter will discuss a performance issue in a real open source system—it could be an over-the-shoulder view of how a performance problem was fixed, a discussion of how design decisions affected performance in a particular application, or something else along those lines. Each entry will be 12-15 pages long, and we hope to have first drafts by October so that we can publish the book in Spring 2013. As with AOSA, royalties will go to Amnesty International and the book will be available for free online under a Creative Commons license. If you are interested in participating, please contact us at posabook@gmail.com.

Why performance rather than architecture? Because it’s something that every programmer has to deal with eventually, but which is usually left out of their education. The last general book on making programs fast that we know of was Jon Louis Bentley’s Writing Efficient Programs, which was published thirty years ago. There have been lots of more specialized books since (we’re particularly fond of Steve Souders’ High Performance Web Sites and Even Faster Web Sites, and of John Lakos’s Large-Scale C++ Software Design), but we think the time is right for something that touches on everything from squeezing the last few cycles out of every precious milliwatt in an embedded sensor to maximizing throughput of large-scale e-commerce applications. We hope you’ll think so too, and we look forward to hearing from you.

My co-editor is Tony Arkles, a graduate student at the University of Saskatchewan.

Posted in books, engineering, writing | 2 Comments

The Architecture of Open Source Applications, Volume 2

The second volume of The Architecture of Open Source Applications was just released thanks to the hard work of Amy Brown and Greg Wilson. I had the privilege of helping copyedit a few chapters of the book. Here’s the blurb:

Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well — usually programs they wrote themselves — and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

This second volume of The Architecture of Open Source Applications aims to change that. In it, the authors of twenty-four open source applications explain how their software is structured, and why. What are each program’s major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to this book provide unique insights into how they think.

Go buy it at Lulu (ebook versions will also be available). It’ll be available on Amazon at some point, but Lulu is preferred, because a greater percentage of the price goes towards royalties — which are going to Amnesty International.

A free online version will be up at some point next week. (Update: the online version is available here.)

Posted in books, code, engineering, hack, tech, writing | Comments Off

Diversity in practice: How the Boston Python User Group grew to 1700 people and over 15% women

The sheer humility, honesty, and deliberate action these two people took to fight a problem they saw in the world is inspiring. They listened to people, really listened to people, and didn’t shy away from the faults in their approach. This is the most practical guide for how to get fresh blood into programming that I’ve seen yet.

I haven’t been programming for that long. Three years ago, when I was in my first year of school, I really wanted to learn how to program. At that point, I had considered going to Montreal Python meetups, but I was too shy: I didn’t think I’d know what was going on, and I worried that I wouldn’t fit in. So when people make an effort to reach shy outsiders, especially people who are minorities in the development community who may feel even more shy than I did for that reason, it makes me really happy.

[On advertising workshops:] Make the tone very clear. We’re about being inclusive and growing communities, and not about being exclusive. So if you’re just a little bit careful about your language there, I think you’ll find that everyone is thrilled to support you in this. Men, women, everyone. — Jessica McKellar

Posted in code, tech | Comments Off

The 2012 Spam Comments Award goes to…

Long time fan and reader of the Tavish Armstrong blog, 2012 UEFA Euro Football Cup, had this pithy quote to share:

Make no judgments where you have no compassion. — Anne McCaffrey

In that vein, Adrian Chen’s profile of Horse_ebooks is worth a read. Spambots are starting to get really weird.

Posted in funny, tech | Comments Off