I remember in 2001, when I started to learn Java, how awesome it was compared to C/C++. No more malloc, no more free, no more pointers, and lots of easy-to-use Collections. Plus built-in GUI tools, e.g. AWT (remember that!) and Applets (*shudder*). Well, lately it’s been a blast learning Objective-C, which is C but objectified 🙂 For those Java programmers out there who want to ramp up quickly, here’s a quick translation guide. It doesn’t explain C pointers etc., but you should get a good idea of the significant syntactic/stylistic differences between the two languages. https://docs.google.com/document/d/1iRv-8qQxPlMVKLgHPGbkgK4QeruHJ7tqfkEYJxXmd2o/edit?usp=sharing I am sure I have more to add. Please let me know if you find any problems or misstatements.
Creating a Free Computer Science Degree
Everyone who wants to go to college and can afford to should do so. But what about those who want to, but simply cannot afford to? Yes, college pays off over the long term (even at today’s exorbitant rates in the USA). But some families just see the cost side of the equation. The decision, once clear cut, is becoming a bit more nuanced. For example, read:
150 of 3500 US Colleges worth the investment
MOOCs (Massive Open Online Courses) are a very interesting recent phenomenon – the idea that you could put college-level courses online and make them free is truly amazing. The high drop-out and poor attendance rates for MOOCs show that something important is missing from your average MOOC.
And I think the thing that is missing is the “social” aspect of schooling. When you go to college you make a public commitment to education – at least to your friends and family – that says “I am going to do this”. Then you get to class, make friends and you have an incentive to stay. You see your friends getting on, maybe see them and others succeed with higher scores, and now a competitive incentive takes hold.
I think the “Social” incentives and support, if they could be captured and ignited, would be a fascinating enabler for MOOCs.
Last year I also saw this article “$200K for a computer science degree? Or these free online classes?” that provided a listing of a number of college level courses from the likes of Stanford, MIT and Princeton that could form the basis of a Computer Science education.
And so my creative side (which doesn’t get out much) got to thinking. What would it take to have a space, with WiFi, and a collaborative work environment in which people could take these courses with mentorship and guidance from professional software engineers and perhaps some college educators looking to “give back”. Not much I think.
If you pair with local software companies (desperate for software engineers of all stripes) you could put together a challenging but realistic education to help propel these kids to a good future.
A key advantage of this “free education” would be the freedom to sculpt a syllabus that is more personalized than the traditional CS degree. Some folks I know from my CS class and my later 20 years of experience are suited to different tracks – not due to inherent ability or intelligence, but as a result of what they are inherently interested in. Look, some folks just want to get a better job. Others are more committed to a large up-front educational investment (of time).
In addition, we need to recognize that attending college from 18-22 and then retiring at 65 without further educational investment in between is a 20th century concept whose usefulness has passed. We need an educational model that looks more like a “continuous improvement” model – maybe a few courses to bootstrap, then a course or two every year, year-after-year. More like Scrum – less Waterfall.
When I graduated college the key skills were Unix, C, RDBMSes and CORBA, and we were hot about Neural Networks and 64 kbps ISDN lines.
Today we have Internet technologies, Objective-C, Java, C#, a plethora of NoSQL technologies, the cloud, mobile, etc.
Pretty much I’d say your technology skills need a near complete revamping every 3-5 years. True, the principles don’t change that much, but the tools, technologies in use and the problems being solved definitely do.
A Computer Science education has milestones but is NEVER done!
So what syllabus would I pick? Well I think the InfoWorld article is a good start but I would add
Hardware Classes
Computer Architecture @ Princeton
Computer Architecture @ Saylor.org
The basics of microprocessors are critical for this to be a true CS education. Being hands-on is a challenge, but maybe there are opportunities with local “maker” communities.
And that’s probably a minimum. A course in electronics and digital logic design would be good in addition.
Math & Statistics
I can hear the groaning, but you won’t get far in technology without understanding basic stats (averages, standard deviations, medians, probability distributions, etc.). Here are some good places to start.
Introduction to Statistics @ Berkeley (EdX)
Introduction to Statistics @ Stanford (Coursera)
Advanced Level: Computing for Data Analysis @ Johns Hopkins (Coursera)
Everyone should have statistics – but for the feel of a “true” CS degree you’ll want a bunch of work on calculus, discrete math and geometry – they are critical in advanced areas like image processing, cryptography, computer graphics etc.
Internship
Software development and computer programming are a craft and I think a healthy smattering of hands-on practical exposure in a business environment is critical. It will need to be done for a large chunk of time (3-6 months at a time) and will help to ground the student and focus them on how to hone their craft.
In addition project work with peers is another great way to get this much needed practice time.
Mobile & Internet Technologies
In any CS degree the basic principles and math are critical, as is core systems knowledge. But the kids should get a flavor of the “cool” technologies too.
Mobile: Objective-C / iOS or Java/Android
Web Development: HTML/CSS, Javascript and perhaps some Ruby/Rails (to show people how a dynamic language can make you more efficient)
Not to mention UX design.
So much to learn – but it doesn’t have to be all at once. Get enough to get a solid SWE / Web Developer job and then continue to learn – one course at a time, perhaps one each semester – that should be enough.
The Challenge
Could we take MOOCs and pair them with local SWEs and college educators and provide a solid CS education at a very low price? I think the answer is Yes. You don’t need the massive classrooms. The stadiums. The dorm rooms. The cafeterias. You can probably figure out something around text books too. Yes you need a space. Yes you need WiFi. You can “employ” some of your better students as mentors too and have them give back before they leave.
I chose the analogy of the “Model T”, where Ford created a car affordable to the middle classes (all while paying his workers well above average) and helped create a revolution in manufacturing.
In the end, it is the result that proves the model – if you can get these kids hired into good Web Developer and Software Engineer jobs at good companies with near-typical salaries / benefits, what more would you need?
AWS Migration Patterns
A typical “in house” 3-tier web architecture
Parallel Architectures before the “Big Bang” switchover
Hybrid architecture to lower switchover risk
Get your surprises before you go “all in”!
1) Billing – it’s not what you think it is! Your usage is often very different from your original cost estimate (Hint: It’s mostly EC2 + Database)
P.S. One extra bonus – once you’ve got the “ramp-up and migration” process down, you can use the same process to stand up more instances of your architecture in different regions. Sadly, for now, you can’t have RDS create a read-replica in a different region, but you CAN look into putting local writes on an SQS queue / SNS topic and persisting them remotely to give yourself a “roll your own” data replication scheme.
[Edit: I just found out that AWS RDS *does* support cross-region replication link]
It is time for a revolution in Tech hiring: Why we need to copy Baseball’s farm system
Let’s just be real for a moment – there’s not enough software talent in the world today. There just isn’t. At least in the West for sure – the US and Europe for definite. And I don’t buy what the IEEE says – not by a long shot. There are lots of resumes out there – just not much talent (and a lot of that mismatch – lots of people, not enough “talent” – is, I think, really due to a lack of training).
What’s wrong with hiring and recruiting today?
Anyway no matter how good your recruiter is, they’re all fighting for the same small talent pool with the same toolset (LinkedIn).
In addition, resume screening and interviewing are so riddled with holes and assumptions it’s ridiculous. How can you boil down 20 years of technology experience to 1 or 2 pages? How can you get a sense of how someone will perform in month 1, month 6 and month 60 based on five or six 45-minute conversations with artificial problem sets?
It worked for “Build a Bear”
You can hire all the recruiters you want or hire the “best” recruiter, but there is an easier solution. A famous man once said:
“The best way to predict your future is to create it”
(that man was Abraham Lincoln by the way, although I believe it is also attributed to Alan Kay)
The devil is in the details
During those 6 months you see these people during ups and downs, during challenges, in teams and working by themselves. You see them learn, adapt and hopefully have fun – isn’t that the ULTIMATE employability metric?
Isn’t that exactly what we need? A pipeline of prospective talent? Not every software team needs uber-developers. Sometimes a few good Web devs are all you need. I doubt Healthcare.gov needed Top-10%ers.
What’s the payoff?
Such a system would be a win for employers – it’s a win for kids who might not get an opportunity to go to college or otherwise get a STEM career and this would also create a larger pool to recruit from – so it’s a win for recruiters.
It could also be used to increase diversity within technology ranks.
In addition it would take pressure off the H-1B system of work visas that are so over-subscribed their quota for a year is filled in 5 days! For those senior tech folks already out there – they might be thinking more supply will reduce salaries – but even despite AMAZING demand for people there has been flat salary growth especially at the high end. So much for the law of supply and demand eh! And no – 65,000 H-1Bs is not the cause of flat salaries when these positions go unfilled for 12 weeks or more and Microsoft has over 3,300 openings. Question for another day – so why are those IT salaries flat?
Anyway here’s my proposal – Software and IT firms need a farm system. Employers can’t find the talent no matter how much they pay (unless they’re Facebook or Apple, probably) – so they need to create the talent they need. It’s a win for employers and society as a whole. There will probably be some unintended second-order effects, but lower unemployment, fewer crazy hours in software, and more diversity of backgrounds can’t be a bad thing net net.
You can already see the demand for such a system with companies like App Academy (in SF), Launch Academy (in Boston), the Academy for Software Engineering and the Flatiron School (both in NYC) getting off the ground. But I see the likes of Facebook, Google and Apple doing their own programs the same way each baseball team has its own farm system.
Thoughts? Could this work? What would prevent its adoption?
All Scalability problems have only a few solutions . . .
I was having a great conversation recently with some technical folks about some very, very hard scalability and throughput (not necessarily response time) issues they were facing.
I racked my brain to think of what I had done in the past and realized it came down to a few different classes of solutions. First and foremost though the key to finding the answer is instrumenting your code and/or environment to find out where the bottleneck(s) are.
1) Simplest: Do less work
Kind of obvious but if it’s taking you 10 hours to read through some log files and update the database perhaps the easiest thing is to do LESS work. e.g. read fewer log files or do fewer database updates.
You could use techniques like reservoir sampling. But maybe you have to calculate a hard number – the total cost of a set of stock market trades, for example – where estimates don’t work. Then again, perhaps your log files don’t need to be so big? Every byte you write has to be FTP’d (and could get corrupted), and that byte has to be read later (even if it’s not needed).
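Reservoir sampling deserves a quick sketch, since it is the canonical “do less work on a stream” trick. Here’s a minimal Java version (the class and method names are mine, not from any library): it keeps a uniform random sample of k items from a stream without ever holding the whole stream in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Reservoir {
    // Keep a uniform random sample of k items from a stream of unknown length.
    // After n items have been seen, each has an equal k/n chance of being kept.
    public static <T> List<T> sample(Iterable<T> stream, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T item : stream) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);               // fill the reservoir first
            } else {
                int j = rnd.nextInt(seen);         // uniform in [0, seen)
                if (j < k) reservoir.set(j, item); // replace with probability k/seen
            }
        }
        return reservoir;
    }
}
```

Now your 10-hour log crunch still reads every line once, but only does the expensive downstream work on a fixed-size sample.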
I find a lot of people forget another alternative here that involves the theme of “Do less work”. Basically if you have a good (enough) model of your input data stream then you can get a “fast but slightly inaccurate” estimate soon and then get “eventual consistency” later. It’s kind of like that old adage – “You can have it fast, correct and cheap. Pick two!” or like the CAP theorem – something’s gotta give. Every dev team should have a Math nerd on it – because Mathematicians have been solving problems like this for decades.
2) Simple-ish: Tune what you already have
Maybe you’ve got a MySQL DB – does it have enough memory? Perhaps network I/O is a bottleneck – dual NICs then? Check your NIC settings too (I’ve hit that once – 100 Mbps settings on a Gbps network). Perhaps you need to lower the priority of other jobs on the system. Is your network dedicated? What’s the latency from server to DB (and elsewhere)?
Maybe when you FTP data files you should gzip them first (CPU is cheap and “plentiful” relative to memory and I/O – network and disk). If the write is not the problem, perhaps you can tune your disk read I/O? Are you using Java NIO? Have you considered striping your disks? Not surprisingly, many of the tuning recommendations for Hadoop speedup are I/O related.
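To make the gzip-before-you-transfer point concrete, here’s a small sketch using the JDK’s built-in `GZIPOutputStream` / `GZIPInputStream` (the wrapper class and helper names are mine): compress the payload before it hits the network, decompress on the far side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBeforeTransfer {
    // Compress a payload before shipping it: trade cheap CPU for scarce network/disk I/O.
    public static byte[] gzip(byte[] data) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
            gz.finish();               // flush the gzip trailer before reading bytes out
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Decompress on the receiving side; a corrupted stream fails loudly here.
    public static byte[] gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) > 0) bos.write(buf, 0, n);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Log files compress extremely well (they are mostly repetition), so the win on transfer time is usually large.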
Perhaps you have a multi-threaded system – can you throw more threads at it? More database connections?
For the database: Reuse database connections? Do you really need all those indexes? I’ve seen it be faster to drop indexes, do batch uploads and reapply indexes than to leave the indexes in place. Are you seeing database table contention – locking etc?
3) Moderate: Throw hardware at it
It seems like a cop-out for a developer to say “throw hardware at it”, but if you look at the cost of (say) $20k in better hardware (more memory, faster memory, faster disk I/O etc.) vs. spending 4 developers for a month (costing, in the US anyway, $40k+), it’s clear where the upside is. Developers are probably the most scarce/precious resource you have (in small organizations anyway), so spend their time wisely. They’ll appreciate you for it too!
4) Harder: Fix or redesign the code you have
This is what coders usually do but it’s expensive (considering how much devs cost these days).
Are there more efficient algorithms? How about batching inserts or updates?
Do you have a hotspot – e.g. disk I/O due to 10 parallel processes reading from disk?
Is the database a bottleneck – perhaps too MANY updates to the same row, page or table?
If throughput (and not response time) is your issue then perhaps making things quite a bit more asynchronous, decoupled and multi-threaded will improve your overall throughput.
Maybe instead of a process whereby you read tonnes of data from a file, update some counters, and flush to the DB all in the same thread,
you decouple the two “blocking” pieces (reading from disk, writing to the DB) so you can split the problem a bit better – perhaps splitting the file and having more threads read smaller files. Drop all intermediate data into some shared queue in memory (or memcached etc.) and then have another pool of threads read from that shared queue. Instead of one big problem you have two smaller problems, each of whose solutions can be optimized independently of the other.
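That split can be sketched with a `BlockingQueue` from `java.util.concurrent`. In this toy version (the class name and the workload are made up for illustration), one thread plays the “reader” that parses lines onto the queue, another plays the “writer” that drains it (here it just counts; in real life it would batch DB writes), and a poison-pill message signals end-of-stream.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PipelineSketch {
    private static final String POISON = "__EOF__"; // sentinel marking end of stream

    public static long run(int lines) {
        // Bounded queue: if the writer falls behind, the reader blocks (back-pressure).
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // "Reader" thread: parse the file and drop records on the queue.
            pool.submit(() -> {
                for (int i = 0; i < lines; i++) queue.put("line-" + i);
                queue.put(POISON);
                return null;
            });
            // "Writer" thread: drain the queue (in real life: batched DB writes).
            Future<Long> writes = pool.submit(() -> {
                long count = 0;
                for (String line = queue.take(); !line.equals(POISON); line = queue.take()) {
                    count++; // pretend this is an insert/update
                }
                return count;
            });
            return writes.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The nice property is that each half can now be scaled independently – more reader threads, more writer threads, or a bigger queue – whichever side is the actual bottleneck.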
Kind of a mix of “Fix the code” and #1 “Do less work” is when you realize you are redoing the same calculations over and over again. For example, taking an average over the last 30 days requires you to get today’s new data but also to re-retrieve the 29 prior days’ worth. Make sure you precalculate and cache everything you can. If you are summing the past 30 days of data (D1 . . . D30), tomorrow you will need (D2 . . . D31) – so you can precalculate (D2 . . . D30) today for tomorrow. Not that the math is hard for CPUs, but you get the idea . . . spend CPU today to save I/O tomorrow!
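The 30-day example boils down to a rolling-window sum. A minimal sketch (the class name is mine): keep the window in memory (or a cache), and each day pay one subtraction and one addition instead of re-reading 29 days of data.

```java
public class RollingSum {
    // Maintain an N-day rolling sum incrementally: add today's value,
    // subtract the value that just fell out of the window.
    private final double[] window; // circular buffer of the last N values
    private int next = 0;          // slot that the next value overwrites
    private double sum = 0;

    public RollingSum(int days) {
        window = new double[days];
    }

    // Add today's value and return the sum over the last N days.
    public double add(double today) {
        sum += today - window[next];       // one subtraction instead of 29 re-reads
        window[next] = today;
        next = (next + 1) % window.length; // advance the circular buffer
        return sum;
    }
}
```

The same trick generalizes to rolling averages, min/max (with a deque), and most “last N periods” metrics.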
An example of being smart about what you calculate is in this MIT paper “Fast Averaging”. If your data is “well behaved” you can get an average with a lot less work.
Decoupling with Queues is my favorite technique here but you have to be smart about what you decouple.
5) Hardest: Rearchitect what you have
Developers love to do this – it’s like a greenfield but with cool new technology – but it should be the last on your list. Sometimes however it’s just necessary. Amazon and eBay have done it countless times. I am sure Google and Facebook have too. I mean they INVENTED whole new systems and architectures (NoSQL, BigTable, DynamoDB etc.) to handle these issues. Then again Facebook still uses MySQL 🙂
Summary
Again, all of these approaches, if they are to be successful and a good use of time, rely on knowing where your bottleneck is in the first place – identifying it and beating on that problem until it cries “Momma!” 🙂 But let’s never forget that the classes of solutions are pretty constant – and the choice basically comes down to how much time and money you can afford to spend fixing it.
OK, over to you, dear reader – what did I miss, what did I forget? Is there another class of solution?
7 Steps to Software Delivery Nirvana
1) Hire great people
Attitude
Communications Skills
Knowledge & Programming Skill
Remember you won’t get everything you need – if you have to give something up, go for a “fast learner” who doesn’t have all the technical knowledge (he or she will get there)
2) Hire great people and know what the customer’s priorities are
Manage Requirements Risk
Fast beats perfect – Be prepared to demo/ship -> learn -> iterate.
3) Hire great people, only build what you need and set expectations
Manage Design Risk – especially do NOT “over design” – keep it simple and refactor as you learn
YAGNI – remember, “Done beats perfect”
Spot dependencies up front and prepare to manage them
4) Hire great people and show progress
Manage Development Risk – especially estimates & dependencies
– Unit Tests
– Coverage
– Continuous Integration
5) Hire great people and show a quality product
Manage test & delivery risk
Regular (continuous) releases to customers
Quality != Bug free. “Shipped beats perfection”
6) Keep your great people
Remove demotivators
– people and processes
– recurring bugs etc.
Encourage them to learn new things (balancing against delivery risk)
Understand what motivates developers & testers: autonomy, mastery and purpose.
Different people want different things: many managers think developers want to be managers.
Well, if you look at most managers, they don’t seem too happy to me. It’s a hard job.
Lots of people just like to design and build and solve technical challenges.
7) Sharpen the saw
- Keep your CI fast
- Keep learning
- Retrospectives
- Balance
- There’s more to life than building software
What is NOT AS important (emphasis on “AS”)
1) Waterfall vs. Agile
2) Scrum vs. Kanban
3) Java vs. C# vs. Python vs. Ruby
4) SQL vs. NoSQL
Yes, each delivers some incremental improvement (in SOME contexts – not ALL contexts). But if you don’t have great people it won’t matter whether you use Waterfall, Scrum or Kanban, in Java, Python or Ruby.
What I love about code . . .
After doing more managing than normal and getting back to coding I realize just how much I like to code and why . . . .
Code either works or it doesn’t. There’s no room for subjectivity between it and me.
And if it doesn’t, you can fix it. It doesn’t have to be cajoled, mentored, advised or given feedback.
You don’t need to worry about the motivation of code.
You don’t need to worry about what code thinks about you – you can test it as much or as little as you want. It just does its job – gets compiled / interpreted and executed.
The code doesn’t care if your dependencies are in place or not – it just *IS*.
The code doesn’t worry about reorgs or P&L or whether it’s executed in your own datacenter or AWS or your desktop or laptop.
It doesn’t care if you have documentation or not, code coverage or not, customers or not.
But as much as that’s awesome, we live in a world that is so much more – a world where perception matters. Where we work in teams with people who are people – different, fallible, with ups & downs, with other stuff going on and often with different priorities and different motivations. Where reorgs and P&L matter. Where ultimately we need to build a product that people love (or at least like).
Code is awesome – but as Coders we can’t just live in that world – most of the “real” problems in Software are people problems – the coding problems are easy in comparison.
POSTSCRIPT: After re-reading this I should give some props to 1 Corinthians – replacing “love” with “code” 🙂
Is the STEM crisis a Myth?
Is there a shortage of STEM workers? STEM being Science, Technology, Engineering and Mathematics. I think most people would say yes, but Robert Charette wrote a very detailed argument in IEEE Spectrum saying “The STEM crisis is a myth”.
As a technical lead / team manager in the Boston technology space for 12 years I find this VERY hard to believe. Year after year I find a very significant shortage of good software engineers.
So let me try to drill into the evidence and see if I can challenge this finding.
Statement #1: “wages for U.S. workers in computer and math fields have largely stagnated since 2000”
First off who is the source of that statement? It’s the “Economic Policy Institute”. Sounds great right? Except when you drill into who the EPI is – it’s a bunch of Union bosses . . . um conflict of interest anyone?
Even if the data they cite is accurate – that wages only increased 5.3% over 11 years – that doesn’t necessarily mean there is NO shortage of engineers. Yes, there is a law of supply and demand, but there are also inefficiencies in the system – and if, as later evidence suggests, “10 years after receiving a STEM degree, 58 percent of STEM graduates had left the field”, it could just be a function of experienced people leaving, so the wage number is dominated by a large number of junior people.
That said I still question the number given the motivation of the source (Unions).
Statement #2: 11.4m STEM workers currently work outside of STEM? (Image link)
Well this I can believe – STEM is hard work, you have to be good technically and good with people (to communicate designs, bugs etc). Companies do layoffs (been there), move out of state (been there too) and after the dot-com implosion and the much overhyped offshoring trend I think a lot of people gave up and looked for less complicated, more stable (but lower paying) jobs.
But that still doesn’t mean there isn’t a shortage of STEM workers.
Statement #3: “At least in the United States, you don’t need a STEM degree to get a STEM job”
Definitely true, but if I was hiring a non-STEM degree person, I would need to see real STEM experience first. But this raises a good possibility – retraining. With new startups like the Flatiron school and LaunchAcademy I think there is great evidence to suggest that we can retrain people relatively quickly to become great junior developers.
I definitely think firms like Microsoft, Google, Apple etc. have the Capital backing and the incentives to start retraining programs.
Statement #4: “In fact, though, more than 370 000 science and engineering jobs in the United States were lost in 2011”
I tried to find the original source for this but could not. However, the cited source is an author from the Center for Immigration Studies, which certainly seems to be largely anti-immigration, so again I question the motivation here. The BLS numbers may be accurate, but without a source I cannot be sure.
Statement #5: “Employers seldom offer generous education and training benefits to engineers to keep them current, so out-of-work engineers find they quickly become technologically obsolete.”
I agree with this statement too – but I will say it doesn’t necessarily have to be the Employers job to keep engineers current. I have expended quite a bit of energy learning Web technologies (I graduated in 1994 just before the Web craze), Java programming (in 2000), SOA, ESB, SaaS and a host of other technologies over the past few years – most on the job.
That said, not training people is short-sighted and definitely helps increase the shortage. However, in employers’ defense, many engineers who take training are doing so to leave the company and get a better job.
Statement #6: 180,000 new STEM jobs per year vs. 252,000 STEM degree graduates – so what shortage?
We have to realize that a degree is just the starting point. A degree does not guarantee employment or employability . . . . I will touch on this more soon . . . .
Statement #7: “If there was really a STEM labor market crisis, . . . you would see signing bonuses, you’d see wage increases. You would see these companies really training their incumbent workers.”
While I agree that this is likely true – it makes several assumptions and overlooks some facts.
i) Employers are paying $20,000+ to find and hire great Software Engineering employees – just how much more do they need to shell out before finding a productive employee?
ii) Signing bonuses don’t work – when software engineers with 3-5 years’ experience are pulling in $80k to $100k, a $10k bonus (after tax) doesn’t help much. After a while engineers would rather work somewhere fun and exciting with great opportunities to learn.
Again I agree with the training thing – however in my experience when companies provide training (say in some hot area like Hadoop or Data Science), many of these employees typically leave for better jobs. Although the training is good for the pool of engineers as a whole, that company just spent money to lose an employee 😦
Now for the Conspiracy theory . . .
Statement #8 “Clearly, powerful forces must be at work to perpetuate the cycle. One is obvious: the bottom line”
Have you seen the cash that companies like Microsoft, Apple, Oracle and Google make every year?! They have so much cash they don’t know what to do with it. Microsoft even finally began paying a dividend a few years back. I mean, look at Yahoo’s acquire-to-hire spending spree. Because software engineers have so much choice, Yahoo needs to buy whole failed startups at millions of dollars each to get their engineers (and lock them into contracts for a few years).
Tech firms – especially Software – have some of the highest gross and net margins of any business. I don’t think Mark Zuckerberg, Sergey Brin, Steve Ballmer etc are hurting for cash.
Statement #9 “DOD representatives state virtually unanimously that they foresee no shortage of STEM workers in the years ahead except in a few specialty fields.”
As the report cited states further on “Because of the relatively small and declining size of the DOD STEM workforce there is no current or projected shortage of STEM workers for DOD and its industrial contractor base except in specialized, but important, areas” (page 6) – so that invalidates that argument.
And here is a key finding in that same report which I think is reflected across the industry
“The STEM issue for DOD is the quality of its workforce, not the quantity available.”
When I am hiring for an engineer I use the following rule of thumb, based on my recent hiring experiences:
- 40 resumes produces
- 20 phone screens produces
- 10 in-person interviews produces
- 1 hire
I am beginning to think the issue is quality, not quantity.
Conclusion
After thinking through my own personal experiences and looking at the numbers here I am beginning to agree – I don’t think there is a large shortage of STEM professionals in terms of just numbers. It’s a quality problem not a quantity problem. There aren’t enough “good” STEM people out there. What’s the solution then?
What’s the motivation then behind this clarion call for more STEM graduates? I have always loved this quote and its derivatives:
“Never attribute to malice that which is adequately explained by stupidity.”
known as Hanlon’s (Heinlein’s?) razor.
I think STEM firms are just too inner-focused and linear thinking to see what the “cheap” and effective answer is. Not more H-1Bs or more STEM graduates (of whom maybe 1/2 to 1/4 will stay in the business and be the “good” employees). Perhaps we need to get together and retrain people from various other disciplines including the unemployed. Hopefully firms like LaunchAcademy, the Flatiron school etc. will prove that we don’t need to send people through 4 year Comp Sci programs to produce effective programmers.
Another key question is how many STEM graduates who have left would be willing to come back to STEM careers and how to do that?
I think these STEM firms are looking for the cheap and easy way out . . . but in the end, waiting for more kids to come through 4-year degree programs is ridiculous when people can be retrained now, and quickly.
My Ideal Greenfield Development Platform: Now vs. 5 Years ago
As I’ve grown as a developer (sometimes two steps forward and one step back – ha ha!) I’ve had the privilege to work with some very smart people – not just “IQ” intelligent but “savvy” people who saw where things were going technology wise. So I’ve learned a lot in the last five years about my technology preferences – sometimes by choice, sometimes by necessity – in some cases just playing catch up (e.g. JavaEE vs. Spring) and bleeding edge in others (e.g. Memcached, JBehave).
As a result of that experience, every so often I dream of starting a project from scratch and imagine what technologies I would choose to use for my Java stack based on what I know now. Admittedly what’s below is a very Java centric stack and I need to work on looking into Ruby on Rails / Python & Django to broaden my skill set too etc.
Anyway I hope you find the following table an interesting comparison of technology choices now vs. 5 years ago. In many cases the “Then” technology is still around and still viable. Typically what makes the “Now” technology more attractive is a fantastic combination of lower price and better / faster (and free) support – after that features and speed of execution are winning attributes.
Component
|
Then
(5 Years Ago) |
Now
(2013) |
Why?
|
Middle-Tier Framework | JavaEE (JBoss) | Spring | In 2005-2006 I went from WebSphere to preferring JBoss because IBM could not move fast and the App Serve was slow. Then I found JBoss also had some issues – less of them – but enough. Just figuring out which JMS messaging solution they would implement in a release was a chore (JBossMQ? JBoss Messaging? HornetQ?). Spring is so much faster in terms of performance, has fewer issues and has faster release schedules and no “two tier” system GA vs. supported |
Deployment Environment | Your Environment / Your Data Center | The Cloud (AWS) | You need to scale on demand these days and pay only for what you need – and that includes Ops folks, DBAs etc. And why AWS as opposed to someone else – simple – maturity of their offering. |
Build Tool | Ant | Maven | Why? “Convention over configuration” although I have found Maven and its plugins a bit more buggy than those for Ant. So its not clear cut. |
Build Environment | Local Builds | Jenkins | What do you live under a rock? If you aren’t doing continuous builds with all your tests automated and hooked in (unit, integration, acceptance, performance) you’re crazy! 🙂 |
Relational Database | Oracle | MySQL | There’s no reason to pay for a database anymore – Facebook runs on MySQL for crying out loud – InnoDB too! |
Key-Value Store / NoSQL | None | MySQL or DynamoDB (or Couchbase) | Sometimes you need a relational DB and sometimes you just need a KV store. Since I’m all about AWS, you’ve got to start with DynamoDB. But with AWS you have to implement inter-region replication yourself – however, an up-and-comer that impressed me and is worth a look is Couchbase. |
Caching Tier | Roll your own / Terracotta | Memcached | Fast, cheap and great support on the web. |
Web Container | Tomcat | Tomcat or NGINX | Tomcat still rocks and I haven’t built an app yet where Tomcat was the bottleneck, but NGINX is getting some play and is worth a look |
Unit Testing | JUnit | JUnit | JUnit always rocks! Always will! |
Functional Testing | WinRunner/LoadRunner etc. | JBehave | Again in the spirit of “Fast, Cheap” and hooks into JUnit JBehave is emerging as a great BDD tool. Hopefully it will keep emerging and develop some snazzy reports. |
Bug Tracking | Roll Your Own / ClearQuest | JIRA | JIRA just works . . . very well. I wish the Scrum interactions were a bit better (I find GreenHopper non-intuitive at times). |
Source Code Control | Perforce | Git | Git is free, fast and branches are easy. It’s a bit hard to get used to but once you do you’ll never look back. Oh and GitHub. |
IDE | Eclipse | Eclipse | As much as I’d love to switch to IntelliJ the plugins and support just don’t match Eclipse (esp. for AWS) |
UI | Web (HTML/Javascript/CSS) | Web AND Native Mobile | I still love the web and JavaScript but HTML5 is still in its infancy. With mobile web and app usage skyrocketing, the best way to get good performance is (for now) to go native. Perhaps in 3 or 4 years HTML5 support will be better. |
Messaging Platform | JMS | SQS | Again, since I love AWS I gotta go with SQS, but what I really love is its failover capabilities (messages stored redundantly across multiple servers) and its separation of read vs. delete. Brilliant. Their Pub-Sub solution (SNS) is equally great. |
Testing | 50% Manual, 50% Automated | 100% Automated | Speed matters – automation + continuous integration (and ideally continuous release) is critical to that end. |
I could go into more stuff like Team communication / projects (e.g. Sharepoint vs. Wikis vs. SaaS providers) but I haven’t seen anything that makes me go “Wow” – although after my experience with JIRA I’d probably start with Atlassian stuff.
I’m sure in 5 years time I will be making newer and more informed choices. The great thing about Software is all parts of it are always on the move. The hard thing about Software is trying to keep up with the same! 🙂
I’d like to hear people’s thoughts on the above – your own experiences and preferences, anything I’ve forgotten, or questions about technology X.
I wish Steve McConnell were on Twitter . . . . Book review of "Software Estimation"
Published over 6 years ago, “Software Estimation” by Steve McConnell is a great read.
As a practitioner of the agile arts I must say that, reading it now, this book seems like the last great attempt to “fix” waterfall and “big design up front” (BDUF) methodologies, which were known for their very distinct phases of requirements, design, development, testing and release. The kicker for these techniques was that the development and testing estimates were often VERY far off. Had they followed McConnell’s advice as laid out in this book, they’d have had more success.
Agile basically works around many of the problems McConnell tries to solve by focusing on short iterations (of < 4 weeks) with new releasable and functional software produced at the end. Basically it avoids many of the risks inherent in Waterfall/BDUF by making the software cycle "too small to fail".
That said, there are a great many things still to be learned from Steve’s book. Waterfall and BDUF are by no means dead, and there are lessons here about the inherent nature of (and errors in) estimation by developers. Even in agile I have experienced serious under-estimation (by a factor of 2x or 3x), where a story that should have taken 1 sprint takes 3 or more. So we still have much to learn. However, the key theme this book drove home to me was that software estimation is so hard that we pretty much gave up and adopted short iterations because that is the most we can estimate. I am sure that wasn’t Steve’s point, but that was my inference, because since the book’s release estimation has taken a back seat to story points, burn-down charts, stand-ups and sprints.
More people are becoming Scrum Masters and fewer are taking the PMP.
In this long blog article I try to capture some of the key lessons I took from this book.
Part 1: Critical estimation concepts
CHAPTER 1: What Is an “Estimate”?
Tip #1: Distinguish between Estimates, Targets and Commitments
- When business people are asking for an “estimate”, they’re really often asking for a commitment or a plan to meet a target
- Estimation is not planning
- When you see a “single point estimate”, ask whether the number is an estimate or whether it’s really a target
- A common assumption is that the distribution of software project outcomes follows a bell curve. The reality is much more skewed.
- What is a good estimate?
- The approach should provide estimates that are within 25% of the actual results 75% of the time
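McConnell’s yardstick is easy to check against your own history. Here is a minimal sketch (the class and method names are my own, not from the book) that scores a set of estimate/actual pairs against the “within 25%, 75% of the time” target:

```java
// Sketch: score estimate/actual pairs against McConnell's
// "within 25% of actuals, 75% of the time" yardstick.
public class EstimateAccuracy {

    // True if the estimate fell within the given relative tolerance of the actual.
    static boolean withinTolerance(double estimate, double actual, double tolerance) {
        return Math.abs(estimate - actual) / actual <= tolerance;
    }

    // Fraction of pairs that fell within 25% of the actual result.
    static double hitRate(double[] estimates, double[] actuals) {
        int hits = 0;
        for (int i = 0; i < estimates.length; i++) {
            if (withinTolerance(estimates[i], actuals[i], 0.25)) {
                hits++;
            }
        }
        return (double) hits / estimates.length;
    }

    public static void main(String[] args) {
        double[] estimates = {100, 80, 200, 50};  // e.g. effort in hours
        double[] actuals   = {110, 130, 210, 52};
        System.out.println("Hit rate: " + hitRate(estimates, actuals)); // 0.75
    }
}
```

If your team’s hit rate comes in well below 0.75, it is the estimation approach, not the people, that probably needs calibrating.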
- Events that happen during the project always invalidate the assumptions that were used in the estimate.
- The primary purpose of software estimation is to determine whether a project’s targets are realistic enough to allow the project to be controlled to meet them. The executives want a plan to deliver as many features as possible by a certain date.
- “A good estimate is an estimate that provides a clear enough view of the project reality to allow the project leadership to make good decisions about how to control the project to hit its targets”
- Studies have confirmed that most people’s intuitive sense of “90% confident” is really comparable to something closer to “30% confident”
- Where does the pressure to use narrow ranges come from? You or an external source?
- Is it better to overestimate or underestimate?
- If a project is overestimated, stakeholders fear that Parkinson’s law will kick in – work will expand to fill the time allotted.
- Another concern is given too much time, developers will procrastinate until late in the project.
- A related motivation for underestimating is the desire to instill a sense of urgency
- Figure 3.1: The penalties for underestimation are more severe than the penalties for overestimation
- The Software Industry’s Estimation Track Record
- Failure rates
- 1000 LOC 2%
- 10,000 LOC 7%
- 100,000 LOC 20%
- 1,000,000 LOC 48%
- 10,000,000 LOC 65%
- The Software industry has an underestimation problem.
- What top executives value most is predictability – businesses need to make commitments to customers, investors, suppliers, the marketplace and other stakeholders
- Four Generic sources
- Inaccurate information about the project being estimated
- Inaccurate information about the capabilities of the organization that will perform the project
- Too much chaos IN the project to support accurate estimation (i.e. try to estimate a moving target)
- Inaccuracies arising from the estimation process itself
- Simple example of a Telephone Number checker and the requirements questions / uncertainties that could result in very different design approaches.
- The cone of uncertainty
- Initial Concept 0.25x to 4x (Range = 16x)
- Approved Product Definition 0.5x to 2x (Range = 4x)
- Requirements Complete 0.67x to 1.5x (Range = 2.25x)
- UI Design Complete 0.8x to 1.25x (Range = 1.6x)
- Detailed Design Complete 0.9x to 1.1x (Range = 1.25x)
- The cone of uncertainty is the BEST-case accuracy possible to have. It isn’t possible to be more accurate – it’s only possible to be more lucky.
- The cone does not narrow itself – if a project is not well controlled you can end up with a cloud of uncertainty that contains even more estimation error.
- “What you give up with approaches that leave requirements undefined until the beginning of each iteration is long-range predictability”
- Sources of project chaos
- Requirements that were not investigated very well
- Poor designs leading to lots of code rewrite
- Poor coding practices leading to extensive bug fixing
- Inexperienced personnel
- Incomplete or unskilled project planning
- Prima Donna team members
- Abandoning planning under pressure
- Developer gold-plating
- Lack of source code control software
- In practice, project managers often neglect to update their cost and schedule assumptions as requirements change.
- Omitted Activities (pp.44)
- Missing Requirements
- Non functional requirements: Accuracy, modifiability, Performance, Scalability, Security, Usability etc.
- Missing software-development activities
- Ramp-up time for team members
- Mentoring
- Build & Smoke Test support
- Requirements clarification
- Creating test data
- Beta program management
- Technical reviews
- Integration work
- Attendance at meetings
- Performance tuning
- Learning new tools
- Answering questions
- Reviewing technical documentation etc.
- Missing non-software-development activities
- Vacations, Holidays, Sick days, Training, Weekends(!?!?)
- Company meetings, department meetings, setting up new workstations
- Developer estimates tend to contain an optimism factor of 20 to 30%. Although managers complain that developers sandbag their estimates, the reverse is true. Boehm also found a “fantasy factor” of 1.33
- The obvious one: Project Size
- Diseconomies of scale: a 1M LOC project takes more than 10x the effort of a 100k LOC project.
- The basic issue is that in larger projects, coordination among larger groups of people requires more communication. As project size increases, the number of communication paths among people increases as a SQUARED function of the number of people on the project.
- Productivity (delivered LOC per staff-year) falls as project size grows:
- 10k LOC project –> 2k to 25k
- 100k LOC project –> 1k to 20k
- 1M LOC project –> 700 to 10k
- 10M LOC project –> 350 to 5k
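The squared growth in communication paths falls straight out of pairwise counting: n people have n × (n − 1) / 2 possible one-to-one channels. A tiny sketch of the arithmetic (the class is my own illustration):

```java
// Pairwise communication paths grow roughly as the square of headcount,
// which is the root of the diseconomies of scale on large projects.
public class CommunicationPaths {

    // Number of one-to-one communication paths among n people: n * (n - 1) / 2.
    static int paths(int n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[]{2, 5, 10, 50}) {
            System.out.println(n + " people -> " + paths(n) + " paths");
        }
        // 2 people have 1 path; 50 people have 1225 - a 25x headcount
        // increase produces a 1225x increase in possible channels.
    }
}
```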
- Other influences: The kind of software being developed
- Personnel factors
- According to Cocomo II on a 100k LOC project the combined effect of personnel factors can swing a project estimate by as much as a factor of 22!
- The KEY personnel decision: Requirements Analyst Capability and only THEN the programmer
- The magnitude of these factors has been confirmed in numerous other studies
- Other influences: Programming Language
- Lots of other adjustment factors: See table 5-4 on page 66
- Key Learning: Small and Medium-sized projects can succeed largely on the basis of strong individuals. Large projects however still need strong individuals but project management, organizational maturity and how well the team coalesces are just as significant.
CHAPTER 6: Introduction to Estimation Techniques
Considerations in choosing estimation techniques
- What’s being estimated – features, schedule, effort
- Project Size
- Small: < 5 total technical staff. Best estimates are usually “bottom-up” techniques created by the individuals who will do the actual work
- Large: 25+ people, lasting 6 to 12 months or more. For these teams the best estimation approaches tend to be “top-down” in the early stages. As the project progresses, more bottom-up techniques are introduced and the project’s own historical data will provide more accurate estimates.
- Medium: 5 to 25 people lasting 3 to 12 months. Can use any of the techniques above.
- Software Development Style: Iterative vs. Sequential
- Evolutionary Prototyping
- Extreme Programming
- Evolutionary Delivery
- Staged Delivery
- RUP
- Scrum
- Count first
- Count if at all possible, compute when you can’t count. Use judgement alone ONLY as a last resort
- What to count? Find something to count that’s highly correlated with the size of the software you are estimating. And find something to count that is available sooner rather than later in the development.
- Historical data
- Average effort hours per requirement for development
- Average total effort hours per use case / story
- Average dev/test/doc effort per change request
- Used to convert counts to estimates – lines of code to effort, user stories to calendar time, requirements to number of test cases
- Your estimates can be calibrated using any of three kinds of data
- Industry data
- Historical data
- Project data
- Using data helps avoid subjectivity, unfounded optimism and some biases.
- It also helps reduce estimation politics
- Start with a small set of data
- Size (LOC)
- Effort (Staff months)
- Time (Calendar months)
- Defects (classified by severity)
- Be careful how you measure e.g. 8 hour work days? How about vacations? Overtime?
- It is surprisingly difficult in many organizations to determine how long a particular project lasted
- To create the task-level estimates, have the people who will actually do the work create the estimates
- When estimating at the task level, decompose the work into tasks that require no more than about 2 days of effort each. Larger tasks contain too many places where unexpected work can hide. Ending up with estimates of 0.25 to 0.5 days of granularity is appropriate.
- Use Ranges to help identify risks and where things can (and often do) go wrong
- Best Case
- Most Likely Case
- Worst Case
- Expected Case
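One standard way to roll Best, Most Likely and Worst Case into an Expected Case is the PERT weighting, which McConnell discusses; the class below is my own sketch of it:

```java
// PERT-style Expected Case from a three-point estimate:
// weight the Most Likely case 4x against the two extremes.
public class PertEstimate {

    // Expected Case = (Best + 4 * MostLikely + Worst) / 6
    static double expectedCase(double best, double mostLikely, double worst) {
        return (best + 4 * mostLikely + worst) / 6.0;
    }

    public static void main(String[] args) {
        // A task estimated at best 4 days, most likely 10, worst 28.
        System.out.println(expectedCase(4, 10, 28) + " days"); // 12.0 days
    }
}
```

Note how the Worst Case drags the Expected Case above the Most Likely value – exactly the skew (rather than a bell curve) described in Chapter 1.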
- Estimate Checklist
- Is what’s being estimated clearly defined?
- Does the estimate include all the KINDS of work needed to complete the task?
- Does the estimate include all the FUNCTIONALITY AREAS needed to complete the task?
- Is the estimate broken down into enough detail to expose hidden work?
- Have you looked at notes from past work rather than estimating from pure memory?
- Is the estimate approved by the person who will actually do the work?
- Is the productivity assumed in the estimate similar to what has been achieved on similar assignments in the past?
- Does the estimate include a Best Case, Worst Case and Expected Case?
- Have the assumptions in the estimate been documented?
- Has the situation changed since the estimate was prepared?
- Compare actual performance to estimated performance so that you can improve estimates over time.
- The key is that if you create several smaller estimates, some of the estimation errors will be on the high side and some on the low side, so the errors tend to cancel each other out to some extent. Research has found that summing task durations was negatively correlated with cost and schedule overruns.
- However, since developers tend to give near-best-case estimates, schedule overruns often compound on one another, because the chance of every estimate coming in as scheduled is so very low.
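The cancellation effect can be sketched with a small simulation; the ±50% error model, seed and numbers here are my own illustration, not the book’s. Crucially, cancellation only works when errors are unbiased – if every estimate is near-best-case, the errors all point the same way and add up instead:

```java
import java.util.Random;

// Simulate unbiased estimation error: each task's actual effort is its
// estimate times a random factor drawn uniformly from [0.5, 1.5).
// Individually the error can be large; summed over many tasks the
// highs and lows largely cancel.
public class ErrorCancellation {

    static double relativeErrorOfSum(int tasks, long seed) {
        Random rng = new Random(seed);
        double estimated = 0, actual = 0;
        for (int i = 0; i < tasks; i++) {
            double est = 5.0; // days per task, for illustration
            estimated += est;
            actual += est * (0.5 + rng.nextDouble()); // unbiased noise
        }
        return Math.abs(estimated - actual) / actual;
    }

    public static void main(String[] args) {
        System.out.println("1 task:    " + relativeErrorOfSum(1, 42));
        System.out.println("100 tasks: " + relativeErrorOfSum(100, 42));
    }
}
```

Replace the unbiased factor with a best-case-biased one (say, [1.0, 1.5)) and the sum no longer converges toward the estimate – which is the compounding-overrun failure mode described above.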
- Get detailed size, effort and cost results for a similar previous project
- Compare the size of the new project to a similar past project
- Build up the estimate for the new project’s size as a percentage of the old project’s size
- Create an effort estimate based on the size of the new project compared to the previous project
- Check for consistent assumptions across the old and new projects
- Fuzzy Logic
- Very Small
- Small
- Medium
- Large
- Very Large
- As a rule of thumb the differences in size between adjacent categories should be at least a factor of 2
- Story Points e.g. Fibonacci sequence.
- Cautions about rating scales – the use of a numeric scale implies that you can perform numeric operations on the numbers: multiplication, addition, subtraction and so on. But if the underlying relationships aren’t valid – that is, a story worth 13 points doesn’t really require 13/3 times as much effort as a story worth 3 points – then performing numeric operations on the 13 isn’t any more valid than performing numeric operations on “Large” or “Very Large”
- T-Shirt Sizing
- Remember that the goal of software estimation is not pinpoint accuracy but estimates that are accurate enough to support effective project control
- In this approach developers classify each feature’s size relative to other features as Small, Medium, Large, XL etc.
- This allows the business to trade-off and look for features with the most business value and lowest development cost.
- Group Reviews
- Have each team member estimate pieces of the project individually, and then meet to compare estimates
- Don’t just average estimates – discuss the differences
- Arrive at a consensus estimate that the whole group accepts
- Individual estimates have a Magnitude of Relative Error (MRE) of 55%.
- Group-reviewed estimates average an error of only 30%
- Studies have found that the use of 3 to 5 experts with different backgrounds seems to be sufficient.
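The MRE figures above are straightforward to compute for your own estimates; here is a sketch (naming is mine) of the Magnitude of Relative Error for a single estimate:

```java
// Magnitude of Relative Error (MRE): how far an estimate missed,
// as a fraction of the actual result.
public class Mre {

    // MRE = |actual - estimate| / actual
    static double mre(double estimate, double actual) {
        return Math.abs(actual - estimate) / actual;
    }

    public static void main(String[] args) {
        // Estimated 10 days, took 20: a 50% MRE - close to the
        // 55% average reported for individual estimates.
        System.out.println(mre(10, 20)); // 0.5
    }
}
```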
- Wideband Delphi Technique: a structured group process in which estimators iterate through multiple anonymous rounds toward convergence
- Estimation Software
- Allows you to simulate different project outcomes
- Data you’ll need to calibrate tools
- Effort in staff months
- Schedule, in elapsed months
- Size, in lines of code
- Summary of available tools – see pp.163 (valid as of 2006)
- Use multiple estimation techniques and look for convergence or spread among the results
- When you reestimate in response to a missed deadline base the new estimate on the project’s ACTUAL progress not on the project’s planned progress.
- Discovery
- Approved preliminary business case
- Scoping
- Approved product vision
- Approved marketing requirements
- Planning
- Approved software development plans
- Approved budget
- Approved final business case
- Development
- Approved software release plan
- Approved marketing launch plan and operations plan
- Approved software test plan
- Pass release criteria
- Testing and Validation
- Pass release criteria
- Launch
- Emphasize counting and computing rather than use of judgement
- Calls for the use of multiple estimation approaches
- Communicates a plan at predefined points
- Contains a clear description of an estimate’s inaccuracy
- Defines when an estimate can be used as the basis for a project budget
- Defines when an estimate can be used as the basis for internal or external commitments.
- Using Lines of Code in Size estimation (data is easily collected but translation into “staff months” of effort is error prone)
- Function-Point Estimation
- The number of function points in a program is based on the number and complexity of
- External inputs (e.g. screens, forms, dialog boxes)
- External outputs (e.g. screens, reports, graphs etc)
- External queries
- Internal Logical files
- External interface files
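For a rough feel of the arithmetic, here is a sketch of an unadjusted function-point count using the average-complexity weights commonly quoted for the five element types (4, 5, 4, 10, 7). A real count weights each element individually by its own low/average/high complexity, so treat this as an approximation:

```java
// Unadjusted function-point count using average-complexity weights.
// Real counts assign each element a low/average/high weight individually.
public class FunctionPoints {

    static int unadjusted(int inputs, int outputs, int queries,
                          int logicalFiles, int interfaceFiles) {
        return 4 * inputs          // external inputs (screens, forms, dialogs)
             + 5 * outputs         // external outputs (reports, graphs)
             + 4 * queries         // external queries
             + 10 * logicalFiles   // internal logical files
             + 7 * interfaceFiles; // external interface files
    }

    public static void main(String[] args) {
        // A small app: 20 inputs, 15 outputs, 10 queries, 5 ILFs, 2 EIFs.
        System.out.println(unadjusted(20, 15, 10, 5, 2) + " function points"); // 259
    }
}
```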
- Productivity variations among different kinds of software projects can show very different effort estimates (per LOC) and cost (per LOC)
- Basic Schedule Equation
- Schedule In Months = 3.0 x cubeRoot(StaffMonths)
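As a sketch of this rule of thumb (method names are mine):

```java
// McConnell's Basic Schedule Equation:
// schedule (calendar months) = 3.0 * staffMonths^(1/3)
public class BasicSchedule {

    static double scheduleMonths(double staffMonths) {
        return 3.0 * Math.cbrt(staffMonths);
    }

    public static void main(String[] args) {
        // A 27 staff-month project: schedule = 3.0 * 3 = 9 months,
        // implying an average team size of 27 / 9 = 3 people.
        System.out.println(scheduleMonths(27) + " months");
    }
}
```

Note the cube root: quadrupling the effort nowhere near quadruples the nominal schedule, which is why compressing a schedule by adding people runs into the “impossible zone” discussed below.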
- Schedule compression and the shortest possible schedule
- If the feature set is not flexible, shortening the schedule depends on adding staff to do more work in less time
- Numerous estimation researchers have investigated the effects of compressing a nominal schedule.
- All researchers have concluded that shortening the nominal schedule will increase total development effort.
- There is also an impossible zone and you can’t beat it – the consensus of researchers is that schedule compression of more than 25% below nominal is not possible
- Similarly you can reduce costs by lengthening the schedule and conducting the project with a smaller team
- Lawrence Putnam conducted fascinating research on the relationship between team size, schedule and productivity.
- Schedule decreases (and effort increases) as you add team members – until you hit 5-7 people on a team. After that, effort climbs much more quickly and the schedule ALSO starts to get longer.
- Thus a team size of 5 to 7 people appears to be economically optimal for medium-sized business system projects.
- Estimating Architecture, Requirements, Management effort for projects of different sizes. The larger the project the more the architecture, test, requirements and management costs.
- Developer-to-tester ratio is settled more by planning than by estimation – that is, it is determined more by what you think you SHOULD do than by what you predict you WILL do.
- Good analogy about ideal time and planned time: football game – 60 minutes vs. 2 to 4 hours elapsed time.
- Defect Removal
- Formal Design Inspections: 55% rate of removal (mode)
- Informal design review: 35%
- Formal code inspection: 60%
- Informal code review: 25%
- Low Volume (< 10 sites) Beta Test: 35%
- High Volume (> 1,000 sites): 75%
- System Test: 40%
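If you assume the removal steps act independently – an optimistic simplification of mine, not a claim from the book – you can estimate the combined effect of stacking several of them:

```java
// Combined defect-removal efficiency of stacked steps, assuming each
// step independently removes its quoted fraction of the defects that
// reach it. Surviving fraction = product of (1 - rate_i), so the
// cumulative removal rate = 1 - that product.
public class DefectRemoval {

    static double cumulative(double... rates) {
        double surviving = 1.0;
        for (double r : rates) {
            surviving *= (1.0 - r); // each step lets (1 - rate) slip through
        }
        return 1.0 - surviving;
    }

    public static void main(String[] args) {
        // Informal code review (25%) + formal code inspection (60%)
        // + system test (40%) -> roughly 82% combined removal.
        System.out.println(cumulative(0.25, 0.60, 0.40));
    }
}
```

The independence assumption is generous (later steps tend to find different defect classes than earlier ones), but it illustrates why stacking several cheap reviews beats relying on any single stage.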
- Other rules of thumb
- To go from one-company, one-campus development to multi-company, multi-city: allow for a 25% increase in effort.
- To go from one-company, one campus development to international outsource, allow for a 40% increase in effort.
- Communicating Estimate Assumptions
- Which features are in scope
- Which features are out of scope
- Availability of resources
- Dependencies on 3rd-parties (and their performance)
- Unknowns
- Expressing Uncertainty
- Try to present your estimate in units that are consistent with the estimate’s underlying accuracy
- Ranges are the most accurate way to reflect the inherent uncertainty in estimates at various points in the Cone of uncertainty.
- Do not present a commitment as a range, a commitment needs to be specific
- Estimate negotiations tend to be between introverted, more junior technical staff and seasoned professional negotiators.
- Understand that executives are assertive by nature and by job description and plan your estimation discussions accordingly.
- You can negotiate the commitment but do NOT negotiate the estimate
- Educate nontechnical stakeholders about effective software estimation practices
- Treat estimation discussions as problem solving, not negotiations. Recognize that all project stakeholders are on the same side of the table. Everyone wins, or everyone loses.
- Getting to Yes
- Separate the people from the problem
- Focus on interests, not positions
- Invent options for mutual gain
- Insist on using objective criteria