Friday, March 20, 2009

MythTV Hardware Upgrade

When I started this blog, one of my goals was to consistently post any hardware or software problems I encountered while tinkering with my computers. This serves two purposes. First, if I find myself having similar difficulties in the future, I have a written record of what I did to refresh my memory (and simply typing it all out will strengthen my memory). Second, it gives back to the community: so many times I've encountered a strange error or situation, only to find some obscure blog post that explains the exact same situation and solution. Perhaps some day a desperate computer enthusiast will use my blog to find an answer to their woes.

First, the short version: I need to pass the "pci=nomsi" parameter to the Linux kernel in order for my Asus M3N78-EM motherboard to recognize the hard drive.

Now the long version...

I'm a compulsive hardware tinkerer, always tweaking my computers, both software and hardware. I haven't bought a prebuilt computer for at least 15 years; all my computers were assembled piecewise by yours truly. Last week I was rebuilding my backup server (affectionately named "dumpster"), replacing the Intel CPU and motherboard with an AMD solution. My MythTV media PC ("cesspool") also used AMD. However, it had an older, slower CPU---a BE-2350---than the one I was about to put in the backup server, a 4850e. I thought to myself, why put the better processor in a machine that doesn't need it? Why not use the faster hardware for the machine that gets regular use?

There's one fact of which my wife loves to remind me; a fact I feel obligated to report in the interest of full disclosure: there was absolutely nothing wrong with our MythTV before I attempted the CPU change; it didn't need a faster processor. But like I said, I have this compulsion: computers are toys to me, and I just want to play! Besides, what could go wrong with a simple, quick CPU swap?

I put the better CPU, the 4850e, in MythTV. I hit the power button and... nothing! The computer wouldn't POST (power-on self-test). As I thought about what might be wrong, I wondered if the CPU was too new for the motherboard, a Biostar Tforce TF7025-M2. Some quick Google work confirmed my suspicions: Biostar's CPU support page doesn't list the 4850e as supported, and Newegg's customer reviews say the same. D'oh!

Sad, but no problem: just put the old BE-2350 back, and finish the backup server... and it still won't POST! What happened? At this point, I assumed the motherboard was dead. (In hindsight, I don't know why I thought this.) I ordered a new board, an Asus M3N78-EM. This motherboard sports the nVidia GeForce 8300 video chipset.

When the new board arrived, I was excited: I would get to use the 4850e (I checked the M3N78's CPU compatibility before ordering), and would have all-around faster/newer hardware. So I dropped in the CPU, connected all the cables, hit the power button and... nothing. It wouldn't POST!

Now I was really stumped. What was the problem? When a home-built computer won't even POST, it could be any one of a number of components: CPU, motherboard, memory, power supply. My first thought was that the case itself was causing a short. So I took the motherboard out of the case, set it on some cardboard (non-conductive), and tried again... No dice. Now I got to play the swap-one-component-at-a-time game to isolate the problem.

Foolish as I am, I tried everything in the wrong order: hardest-to-easiest. The simplest first check would be to swap RAM. But I went straight to disassembling the backup server to borrow its parts:
  1. Tried a different power supply
  2. Tried a different CPU (the old BE-2350, that I had to pull from the backup server)
  3. Tried the 4850e in the backup server to verify the CPU wasn't dead
None of those worked, meaning either the brand-new motherboard itself was dead, or the RAM was bad. I tried some different memory... success! After kicking myself for not trying the simpler troubleshooting first, I then kicked myself some more: I probably didn't even need a new motherboard in the first place. If only I had tried different RAM before impetuously ordering a new motherboard, I might not have had to deal with any of this mess!

On the other hand, I now have newer hardware that supports VDPAU (hardware-accelerated video decoding), which MythTV can take advantage of.

Elated, I re-assembled the computer, hit the power button, and... now Linux failed to boot! Worse, my grub configuration specified the use of a splashimage, which wasn't found, resulting in garbled, unreadable boot text. I could make out enough of the text to tell that there was a kernel panic in mid-boot, but couldn't discern the actual error. I then grabbed an Ubuntu installation/live CD. When it booted, I discovered that the hard drive wasn't being recognized. I turned to Google for my answer.

The second search result turned up this Ubuntu bug report. It talked about totally different hardware, but the symptoms were basically the same:
ata1: SATA link up 3.0 Gbps (SStatus 13 SControl 30)
ata1.0: qc timeout (cmd 0xec)
ata1.0: failed to identify (I/O error, errmask=0x4)
ata1: failed to recover some devies, retrying in 5secs
ata1: SATA link up 3.0 Gbps (SStatus 13 SControl 30)
ata1.0: qc timeout (cmd 0xec)
ata1.0: failed to identify (I/O error, errmask=0x4)
ata1: failed to recover some devies, retrying in 5secs
The suggested workaround was simple enough: configure the SATA interface as AHCI in the BIOS (which I had already done), and pass the "pci=nomsi" option to the kernel at boot.
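
For anyone unsure what "pass the option to the kernel" means in practice, here is a minimal sketch of the idea. It assumes GRUB legacy, where each boot entry in menu.lst has a `kernel` line listing the boot options; the function name and the sample paths are made up for illustration, and it only transforms text rather than touching any real file.

```python
def add_kernel_param(menu_text, param="pci=nomsi"):
    """Append `param` to each GRUB-legacy `kernel` line that lacks it.

    Hypothetical helper: it works on the text of a menu.lst-style file
    and returns the modified text. Commented-out lines are left alone.
    """
    out = []
    for line in menu_text.splitlines():
        tokens = line.split()
        # An active entry looks like: kernel /boot/vmlinuz root=... ro quiet
        if tokens and tokens[0] == "kernel" and param not in tokens[1:]:
            line = line.rstrip() + " " + param
        out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    if menu_text.endswith("\n"):
        result += "\n"
    return result


# Sample entry with made-up paths, just to show the effect.
sample = (
    "title Ubuntu\n"
    "kernel /boot/vmlinuz root=/dev/sda1 ro quiet\n"
    "initrd /boot/initrd.img\n"
)
print(add_kernel_param(sample))
```

For a one-off test, the same option can usually be typed onto the end of the kernel line from the GRUB boot menu before committing it to the configuration file.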

(Side note: that error message says "devies", shouldn't that be "devices"? Must have been a typo in that kernel version.)

I did exactly this, and... success! The system is now up and running. It hasn't even been 24 hours yet, but so far the system is stable.
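
To double-check that the option really is in effect after a reboot, one approach is to look at the command line the running kernel was booted with, which Linux exposes in /proc/cmdline. This is just a small sketch; the helper names are my own invention.

```python
def has_kernel_param(cmdline, param="pci=nomsi"):
    """Return True if `param` appears as a whole word in the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's boot command line (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the booted system, `has_kernel_param(read_cmdline())` should come back True if the option was passed.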