RESET Forums (homeservershow.com)
Altecice

iLO stuck at 40C CPU temp and won't spin the fan up above 6%?


FCM

After reading through several threads, it seems the onboard RAID controller is designed to read HDD temps only when it is in RAID mode *and* arrays are configured. Otherwise, it just reports a default temp, which in turn makes the fan spin at 30%.

 

Moreover, it seems that with some SATA drives, even in RAID mode and with configured arrays, the reported temp still falls back to the default, because some SSDs either do not report a temp or report a fixed value.

 

Furthermore, the CPU temperature reported by iLO is not tied to the Intel on-die sensors; the sensor is most likely placed somewhere around the heatsink. That is why most of us report slow-moving temps even after upgrading to a higher-TDP CPU. The heatsink has trouble removing all the additional heat, which in turn accumulates faster in the CPU. (I'm sure this can be explained by some fancy thermodynamics equations.)

 

As for my specific config:

- I'm using Ubuntu 14.04 LTS. The driver (hpvsa) for the RAID controller is not certified for it and as such is not available on the install disk.

- I'm using Linux software RAID (mdadm)

- I've upgraded the CPU to a Xeon E3-1230 v2

- I've added a new temp sensor readout to the htop Linux utility, which reads the temp on the CPU cores

- Stress testing with the Phoronix Test Suite and a cpustress test for about 10 minutes gets the CPU temp to around 70 degrees (I suspect thermal throttling kicks in at this point, but I have yet to confirm it). During this time, the iLO temp readout goes up 3 degrees at most. The fan also spins up by 3% to 4%, to 34% max

- When the server is idle, the fan sits at around 30%

- Updating to the latest BIOS, which promises a fix for the exact issue above, changed nothing
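
For anyone who wants to compare the OS-side core temps in the list above against the iLO readout, here is a minimal Python sketch. It assumes the Linux coretemp driver is loaded and exposes the usual hwmon sysfs files (`temp*_input` in millidegrees Celsius, with an optional `temp*_label`). The function name `read_core_temps` and the `hwmon_root` parameter are made up for illustration, and the directory layout differs between kernel versions (older kernels keep the files under a `device/` subdirectory), so treat it as a starting point and not as a tool tested on this server.

```python
import glob
import os


def read_core_temps(hwmon_root="/sys/class/hwmon"):
    """Return {"<chip>/<label>": degrees_celsius} for hwmon temperature inputs.

    hwmon_root is a hypothetical parameter so the function can be pointed
    at a test directory; on a real system the default should apply.
    """
    # Newer kernels put the files directly in hwmonN/, older ones in hwmonN/device/.
    patterns = [
        os.path.join(hwmon_root, "hwmon*", "temp*_input"),
        os.path.join(hwmon_root, "hwmon*", "device", "temp*_input"),
    ]
    temps = {}
    for pattern in patterns:
        for input_path in sorted(glob.glob(pattern)):
            try:
                with open(input_path) as f:
                    millidegrees = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now, skip it

            # Use the label file (e.g. "Core 0") when the driver provides one.
            label = os.path.basename(input_path)
            label_path = input_path[: -len("_input")] + "_label"
            if os.path.exists(label_path):
                with open(label_path) as f:
                    label = f.read().strip()

            chip = os.path.relpath(os.path.dirname(input_path), hwmon_root)
            temps[f"{chip}/{label}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for name, temp in read_core_temps().items():
        print(f"{name}: {temp:.1f} C")
```

Running it while a stress test is going and noting the iLO value at the same moment gives a rough idea of how far the two readings drift apart.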

 

Conclusion: when the opportunity arises for a reinstall, I will look to switch to Fedora or CentOS, which seem to support the software RAID controller by default.

 

PS: I've also tried switching the controller from AHCI SATA to RAID without configuring any arrays; the fan is still at 30%.

 

My personal suspicion is that HP pushed to get Canonical to pay for certification of the drivers and used this "not being able to read temps from the HDDs results in high noise; temps can only be read through certified drivers" as leverage. I guess Canonical didn't pay. Fortunately, the server is placed in a somewhat remote area, so the noise isn't an issue at all. I just wanted to get full functionality from a product I paid for, but HP decided that shouldn't happen.

mar72kuss

Hello. Yesterday I upgraded my Gen8 with a Xeon 1220 v2. Checking the temps, I noticed the same issue: the CPU is fixed at 40°. I stressed the CPU, but the reading did not change. I also reset iLO, with no luck. From what I understand, there is no solution? Is the only option to wait for a 2016 iLO or BIOS update?

 

FCM

It seems that iLO reads the CPU temperature not from the Intel on-die sensor but from a separate sensor on the motherboard, close to the heat spreader. That's why that temp appears to be stuck at 40 degrees.

 

I say "appears" because if you stress the server long enough (at least 30 minutes), it will actually go up and the fan will increase its RPM.

 

As for reading temps from the HDDs, you need to have the controller set to RAID mode and the appropriate drivers installed. Otherwise, the fan will not spool up based on HDD temps.

r00t
Posted (edited)
On 7/16/2015 at 1:32 PM, FCM said:

[FCM's opening post above, quoted in full]

 

I have exactly the same problem on one DL380p, which has the latest BIOS, iLO 2.72, and Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz CPUs.

 

No matter what OS I use, I have the same result.

 

The funny thing is that I have another DL380p with an older BIOS, iLO 2.02, and Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz CPUs, running the same OS as the other one, and there the temperature readings do work (although they are lower than the readings I get from psensor).

 

Has anyone found a solution to this issue?

 

Thanks.

 

Edited by r00t

r00t
On 1/13/2016 at 6:26 PM, FCM said:

It seems that iLO reads the CPU temperature not from the Intel on-die sensor but from a separate sensor on the motherboard, close to the heat spreader. That's why that temp appears to be stuck at 40 degrees.

 

 

That may be why it is stuck at 40C: even on the server where the temp is not fixed at 40C, the readings are way below what I get from psensor.

r00t

I've observed that the difference between the readings depends heavily on the inlet temp. On the first server the inlet temp is 11-12C, and on the second it is 23-25C. It turns out that the discrepancy in the temp reading is larger on the first.

 

r00t

I was hoping Windows 10 would handle this issue better, but it is even worse: the fans are not even detected and, judging by the noise, I am guessing they are always at 6%.

 

schoondoggy
1 hour ago, r00t said:

I was hoping Windows 10 would handle this issue better, but it is even worse: the fans are not even detected and, judging by the noise, I am guessing they are always at 6%.

 

Have you gone into the BIOS and changed the cooling profile?

r00t
Posted (edited)
1 hour ago, schoondoggy said:

Have you gone into the BIOS and changed the cooling profile?

 

Yes, that is exactly the workaround I am using for the moment.

 

The issue is that the "increased" option (not even "maximum") just generates a lot of unnecessary noise.

 

Even at 0% cpu use, the fans are spinning between 33% and 43% of the total capacity.

At 100% CPU load the fan speed increases by just 2% to 3% and the CPU temp stays below 50C.

 

So the increased option is just brute force.

 

I think the problem is that HP is not using the CPU's own sensors but another motherboard sensor (named "PROCESSOR_ZONE"):

 

#1        AMBIENT              12C/53F    42C/107F 
#2        PROCESSOR_ZONE       40C/104F   70C/158F 
#3        PROCESSOR_ZONE       40C/104F   70C/158F 
#4        MEMORY_BD            14C/57F    87C/188F 
#5        MEMORY_BD            14C/57F    87C/188F 
#6        MEMORY_BD            14C/57F    87C/188F 
#7        MEMORY_BD            14C/57F    87C/188F 
#8        MEMORY_BD            16C/60F    87C/188F 
#9        MEMORY_BD            17C/62F    87C/188F 
#10       MEMORY_BD            15C/59F    87C/188F 
#11       MEMORY_BD            15C/59F    87C/188F 
#12       SYSTEM_BD            35C/95F    60C/140F 
#13       SYSTEM_BD            44C/111F   105C/221F
#14       POWER_SUPPLY_BAY     21C/69F     -       
#15       POWER_SUPPLY_BAY      -          -       
#16       POWER_SUPPLY_BAY     19C/66F    75C/167F 
#17       SYSTEM_BD            21C/69F    115C/239F
#18       SYSTEM_BD            23C/73F    115C/239F
#19       SYSTEM_BD            24C/75F    115C/239F
#20       SYSTEM_BD            21C/69F    115C/239F
#21       SYSTEM_BD            22C/71F    115C/239F
#22       SYSTEM_BD            24C/75F    115C/239F
#23       SYSTEM_BD            19C/66F    90C/194F 
#24       SYSTEM_BD            20C/68F    90C/194F 
#25       SYSTEM_BD            45C/113F   100C/212F
#26       SYSTEM_BD            22C/71F    90C/194F 
#27       I/O_ZONE              -          -       
#28       I/O_ZONE              -          -       
#29       I/O_ZONE             48C/118F   100C/212F
#30       I/O_ZONE              -          -       
#31       I/O_ZONE              -          -       
#32       I/O_ZONE              -          -       
#33       I/O_ZONE              -          -       
#34       I/O_ZONE             18C/64F    65C/149F 
#35       I/O_ZONE             19C/66F    66C/150F 
#36       I/O_ZONE             20C/68F    66C/150F 
#37       I/O_ZONE              -          -       
#38       I/O_ZONE              -          -       
#39       I/O_ZONE              -          -       
#40       I/O_ZONE             21C/69F    66C/150F 
#41       I/O_ZONE              -          -       
#42       SYSTEM_BD            15C/59F    95C/203F 
#43       SYSTEM_BD            28C/82F    90C/194F 
#44       SYSTEM_BD            20C/68F    80C/176F 
#45       SYSTEM_BD            11C/51F    65C/149F 
#46       SYSTEM_BD            22C/71F    75C/167F 
#47       SYSTEM_BD            20C/68F    75C/167F 
#48       SYSTEM_BD            23C/73F    75C/167F 
#49       CHASSIS_ZONE         21C/69F    75C/167F 
#50       CHASSIS_ZONE         20C/68F    75C/167F 
 

As you can see in my temp output, I keep my racks in a low-temperature room (10C) to optimize cooling, but it turns out that HP's CPU temperature sensing is incredibly sloppy. They have placed 50 sensors in the rack but none on the CPUs. It's totally insane.
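
To make a listing like the one above easier to scan, here is a small Python sketch that parses it and works out how far each sensor is from its limit. It assumes the columns are sensor number, location, current reading, and threshold (the post does not label them), and the names `parse_sensors` and `celsius` are made up for illustration. Sensors that show `-` as the current reading are skipped.

```python
import re

# One row: "#2   PROCESSOR_ZONE   40C/104F   70C/158F"
# Assumed columns: sensor number, location, current reading, threshold.
ROW = re.compile(
    r"^#(?P<num>\d+)\s+(?P<location>\S+)\s+(?P<current>\S+)\s+(?P<limit>\S+)$"
)


def celsius(field):
    """Turn '40C/104F' into 40; return None for '-' (no reading)."""
    match = re.match(r"^(-?\d+)C/", field)
    return int(match.group(1)) if match else None


def parse_sensors(text):
    """Return a list of (number, location, current_c, limit_c, headroom_c)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # not a sensor row
        current = celsius(m.group("current"))
        limit = celsius(m.group("limit"))
        if current is None:
            continue  # sensor not populated
        headroom = None if limit is None else limit - current
        rows.append((int(m.group("num")), m.group("location"), current, limit, headroom))
    return rows


if __name__ == "__main__":
    sample = (
        "#1        AMBIENT              12C/53F    42C/107F\n"
        "#2        PROCESSOR_ZONE       40C/104F   70C/158F\n"
        "#15       POWER_SUPPLY_BAY      -          -\n"
    )
    for num, location, current, limit, headroom in parse_sensors(sample):
        print(num, location, current, limit, headroom)
```

Under that column assumption, both PROCESSOR_ZONE sensors sit at 40C against a 70C limit, which fits the fans barely reacting to CPU load.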

 

 

Edited by r00t

r00t

What I am wondering now is whether there is any way to fool the "PROCESSOR_ZONE" sensors into reading a higher temp.

 

Perhaps by sandwiching a thin copper wire between the processors and soldering that wire to the temp sensors...
