RESET Forums (homeservershow.com)

Sporadic Overheat Condition reported in ILO, causing shutdown (Sensor 4 - HD-MAX) - ILOv4 Bug?


raholmesuk


Hi all,

This is my first post on here, so bear with me. I've had an HP Gen8 MicroServer for 4-5 months, and I've been following some of the suggestions on here to add additional disks, get lower iLO 4 temperature readings, and generally make the server quieter without reducing cooling.

The problem I have seems to happen sporadically (twice in under a month so far). The server shuts down, logging a few errors in the Integrated Management Log (IML) and the iLO Event Log around the time the shutdown occurred.

I can turn the server back on without issue. The server fans didn't appear to have stopped, and the temperature in the room where the server sits hasn't increased, so this seems more like an iLO bug or similar, but I thought I'd ask if anyone has had similar issues...

 

The errors reported are:

'Server power removed'

'Embedded Flash/SD-Card: Failed restart' (logged about 2 minutes after the shutdown, though!)

'System Overheating (Temperature Sensor 4; Location System: Temperature XXXc)', with temperatures reported at 90C and 122C (unlikely, if not impossible, where the server is!)

 

[Screenshots: Screen Shot 2014-04-20 at 19.34.02, 19.34.19 and 19.43.31]

 

The server setup is as follows:

HP MicroServer Gen8 - BIOS J09, iLO 4 v1.40, HP Intelligent Provisioning 1.60

16GB - 2x Kingston KTH-PL316E/8G

Intel Xeon E3-1230 v2 CPU (replacing the G1610T) - Hyper-Threading enabled, Turbo Boost disabled, VT-d enabled

Onboard RAID enabled - B120i - 2x SSDs (Crucial M500s) set up in RAID 1 - firmware v3.54 - in the ODD bay - never run hot!

HP P222 (PCI slot) - firmware v5.22 - 4x 4TB WD Reds connected, RAID 5

HP Custom ESXi Image - v5.5 U1 - 17th March 2014 - running off a 32GB Samsung SD card in the SD slot.

 

Mods:

Antec Spot Cool 80mm fan attached next to the PSU to cool the HP P222. Ambient temperatures are as shown in one of the screenshots, not overly high as far as I can see. It appears that keeping the P222 and the ambient inlet temperature low stops the fans going above 11%; otherwise they creep up to 50% (once the inlet goes over 33C)...

 

 

It always seems to be Sensor 4 (HD-Max) that reports the insanely high temperature (supposedly).

 

Any ideas? If more information is needed, please do ask :)

 

Thanks,

Richard

 

 

Edited by raholmesuk

I don't have a Gen8, but do you know where Sensor 4 is located? I wonder if the thermal compound on one of the heatsinks needs to be redone.


Which fan setting do you have in BIOS? With my 45W TDP CPU I've found Intermediate Cooling to be the lowest BIOS setting I'm comfortable with, even with extra fans assisting. Just my 2 cents.


I've no idea if there actually is a temperature sensor 4. From reading the HP docs/QuickSpecs, it seems to be a cumulative value based on HDD temperatures; I'm guessing it's read via SMART or something similar? Either way, in the screenshot sensor 4 reads only about 35C, so it's nowhere near the 90C or 122C reported in the iLO Event Log...

 

Regarding the fan setting, I have mine set to 'Optimal Cooling' (not Increased or Maximum), and the power setting in BIOS is Dynamic. I'm not 100% sure I believe the CPU reading of 40C, but the passive heatsink never even feels hot (with the case off, when cooling is less efficient). The highest temperature I see is the PCI slot, but that's due to the HP P222 RAID controller, which (stupid HP!) really shouldn't be specified as the 'officially supported' controller for the Gen8 MicroServer: there is next to no cooling on the PCI side, and the card essentially creates an air blockage just by being in the slot. The maximum temperature I've ever seen on any of the sensors is 90C, and that was the P222 when I had no cooling on the card at all and the case off...

 

Ordinarily, the HP P222 averages between 58C and 66C, which is not insanely hot.

 

I've tried updating the firmware on the SSDs (Crucial M500s), and the only tweak I can find for the WD Reds is the LLC patch ( http://support.wdc.com/product/download.asp?groupid=619&sid=201&lang=en ). Will see how it goes...

 

If anyone has any other suggestions or hints as to where Temp Sensor 4 is (if it exists), please let me know. I'm debating logging a call with HP Support, but thought I'd ask here first :)

 

@Joe_Miner: I'm going to install 4x Noctua NF-A4x10 fans in a push/pull setup on either side of the passive heatsink, like I've seen someone on here do, to increase the overall cooling around the CPU area ;)


You may want to rethink the 40mm Noctua fans. I tried installing 2 on my CPU heatsink and had some serious PIA clearance issues. Ultimately I tried the Evercool EC3007M12CA (30mm x 30mm x 7mm), and that worked. On the P222 I mounted two Schoondoggy fans, one targeted at my SDM and the other at the P222, which for the last 2 days has kept my P222 temps below 50C. My BIOS is set to Intermediate Cooling, but my system fan speed has yet to move up from 35%. The only high temp I'm seeing now is on my LOM, which has gotten as high as 55C. I'm still tinkering and hope to post some pictures and results soon; I did put a couple of pictures of the fans up on my Twitter feed.


I'm doing this from my phone, which limits me at the moment. As soon as I can, I'll get those pictures posted.

 

This is the 30x30x7mm Evercool I tied to the CPU heatsink with 20ga wire:

[Photo: Evercool fan wired to the CPU heatsink]

 

The two Digi-Key blowers, one directed at the SDM and one at the P222:

[Photo: the two Digi-Key blowers]

 

 

 

Sensor data: I still have more testing to do, but P222 temps have not gone above 50C in 2 days and the system fan is at 35%. I have Increased Cooling as my fan profile in BIOS.

[Screenshot: iLO sensor data]

 

I could not get adequate clearance to install the system board with the 40x40x10mm Noctuas and had to abandon that approach.

Edited by Joe_Miner

  • 2 weeks later...

Quote: "Which fan setting do you have in BIOS? With my 45W TDP CPU I've found Intermediate Cooling to be the lowest BIOS setting I'm comfortable with, even with extra fans assisting. Just my 2 cents."

 

Note further that the OP's CPU is a 69W TDP processor.

 

If things aren't done just right, I could easily see thermal overload with the E3-1230 v2, even though it is compatible with the Gen8. This would be especially true when running a Smart Array P222, which adds more heat, plus a ton of peripherals.

 

Raholmesuk, I'd definitely set the fan speeds higher in BIOS. If that isn't an option for you due to noise, I'd swap the CPU for a low-voltage Xeon (e.g., the E3-1260L or E3-1265L v2), as the Gen8's heatsink was designed for a 35W TDP.


  • 8 months later...

First off, hi, it's my first post here. I work in IT with Windows servers, and I have gained some great information from these forums.

 

I got a Gen8 (G1610T) at the beginning of December. I already have an N36 and an N54, but at the time I picked it up for £189 (when others were still selling it for at least £100 more), so it was just too good a deal to resist (though in the UK you can now pick one up for about £150).

 

Anyway, I just wanted to say that I have also had this Sensor 4 - HD-MAX high temp error, followed by a server shutdown.

 

I am running the server with no mods, with an M550 SSD on the internal "5th" SATA port (in B120i RAID mode). The server came with BIOS 06/06/14, and I updated the iLO from 1.x (can't remember which version) to 2.00. As a test I installed Windows Server 2012 and ran it for about a week, and all was fine. I then installed ESXi 5.1 with the HP image, and a day or so later the server shut down with the Sensor 4 - HD-MAX high temp error (reporting about 93C). The server is in a "storage" cupboard about 2ft x 4ft x 8ft, and at the time the ambient intake air was between about 25C and 30C. Every time I had looked at the temps, sensor 4 always said 35C, so I assumed it was an "error".

 

I started the server back up, and all was OK for about 36 hours, and then it went again. I decided to log a call with HP, but I also had a look around the web and discovered on the HP site that there was a later iLO firmware, 2.03, whose release notes mention incorrect temperature readings; however, they mentioned something about the CPU, or sensor 1, whereas I was getting it with sensor 4.

 

HP called me back and asked for some logs, so that evening I provided them, and at the same time I updated the iLO to 2.03.

 

The next day HP said they were sending me a new motherboard and asked whether I would be OK fitting it. I said I would, but I was going away for Christmas, so it would be a week or so before I would be able to.

 

While I was away the server was fine and didn't shut down again, so I assumed the iLO update had fixed it; despite this, I decided to replace the motherboard anyway.

 

The "new" motherboard (I say "new", but I think it was either a refurb or removed from another server, as the iLO already had an entry in "server name" and the iLO event log ID numbers were already over 1000) was on an older BIOS, so I updated it to 06/06/14 and also updated the iLO to 2.03, and the server ran fine (still on ESXi) for about a week or so. I have since rebuilt it with Server 2012 (and added 4 x 3TB NGST NAS drives), and that has also been fine.

 

Not sure if it was just coincidence, but mine occurred a day or so after installing ESXi 5.1 (after running 2012 fine for a week), so that may play a part, as may the fact that I only had an SSD installed and nothing in the drive bays. Either way, if anyone else gets the error, try updating to iLO firmware 2.03.

 

Ian.

Edited by iandrews
