Bug 428123: win32 buildbot slaves should reboot ready for use
Status: RESOLVED FIXED
Opened 17 years ago, closed 16 years ago
Categories: Release Engineering :: General, defect, P2
Tracking: Not tracked
People: Reporter: joduinn, Assignee: bhearsum
Attachments (7 files, 2 obsolete files):
* 33.99 KB, image/png
* 47 bytes, text/plain
* 1.68 KB, text/plain
* 3.48 KB, patch (catlee: review+; bhearsum: checked-in+)
* 6.91 KB, patch (ted: review+; bhearsum: checked-in+)
* 2.19 KB, patch (coop: review+; bhearsum: checked-in+)
* 191 bytes, text/plain
Splitting this out from bug 417887, as each OS will have different gotchas.
Basically: how do we make each Buildbot master/slave reboot cleanly, reconnect and handle new jobs?
Comment 1 • 17 years ago (Assignee)
Buildbot has a script that will add it as a Windows Service. IIRC there were problems related to not having a real console, or not having a desktop to launch things on. This requires further investigation.
Comment 2 • 17 years ago
Example service properties dialog on Vista. Note the "Interact with desktop" switch.
Comment 3 • 17 years ago
I believe there's also a way to set a service's running policy through the properties dialog. I don't have a local copy of win2k3 to test this though.
Updated • 17 years ago
Summary: win32 buildbot masters/slaves should reboot ready for use → win32 buildbot slaves should reboot ready for use
Comment 4 • 16 years ago (Assignee)
We chatted a bunch about this today and decided that part of this will be doing scheduled, periodic reboots of staging machines both to iron out kinks in the rebooting and to look for potential performance gains.
Status: NEW → ASSIGNED
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
Updated • 16 years ago (Assignee)
Assignee: nobody → bhearsum
Comment 5 • 16 years ago (Assignee)
Started working on this today. It's looking promising: Buildbot was able to launch Firefox after being started as a win32 service. Initial problems:
* $PATH is different; I imagine it doesn't inherit the system- or user-set $PATH. We can probably fix this from Buildbot.
* Noticed MochiTest saying a lot of things like 'INFO Error: Unable to restore focus, expect failures and timeouts.' yet the tests still pass.
I haven't run a full set of unittests yet, but I plan to soon. I'm sure there's going to be more problems down the road, but I'm encouraged by these initial results.
*fingers crossed*
Priority: P3 → P2
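The comment only says the $PATH problem could probably be fixed from Buildbot. A minimal sketch of one way to do that, assuming the fix is to prepend the missing directories to the environment handed to build steps. The helper name `service_env` and the directory list are hypothetical; the bug doesn't say which directories were missing.

```python
# Hypothetical directories; the bug does not say which ones were missing.
REQUIRED_DIRS = [r"D:\mozilla-build\msys\bin", r"D:\mozilla-build\msys\local\bin"]


def service_env(base_env, required_dirs, sep=";"):
    """Return a copy of base_env whose PATH starts with required_dirs.

    Directories already on PATH are not added again. The comparison ignores
    case because Windows paths are case-insensitive. base_env is not modified.
    """
    env = dict(base_env)
    current = [p for p in env.get("PATH", "").split(sep) if p]
    seen = {p.lower() for p in current}
    missing = [d for d in required_dirs if d.lower() not in seen]
    env["PATH"] = sep.join(missing + current)
    return env
```

On a real slave the base would be the slave process's own environment (for example `os.environ`). The default separator is `;` because that is the Windows PATH separator.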
Comment 6 • 16 years ago (Assignee)
So, it turns out we get *tons* of failures when Buildbot is started as a service. I suspect this is entirely because firefox.exe isn't running in any sort of "real" desktop. I tried a few things to work around this, including running as the Local System account with "allow desktop access" checked, but that made no difference.
Before going further into this black hole, I'm doing a test to see whether running unittests from the "console" (that is to say, the real display) makes a difference in terms of time. This route is much better documented: there's lots of information about how to automatically log on in Windows 2003, start processes, etc.
If it increases test run time by a significant amount I'll try and track down the issues with running as a service.
Comment 7 • 16 years ago (Assignee)
As it turns out, running tests in the "console" session of a win32 VM makes almost no difference in timing. Unit tests overall took 2 minutes longer, which is a trivial amount in the grand scheme of things.
Additionally, unittests pass completely when running here (nb. I tripped a legitimate mochitest leak, filed in bug 477066).
Given the above, I'm going to work on clean reboots using the console session.
Comment 8 • 16 years ago (Assignee)
Made good progress today. Here's a summary of what I did on moz2-win32-slave21:
* Installed RealVNC
* Turned off Windows Firewall
* Edited its VMX file by hand to add the following:
svga.maxHeight = 1024
svga.maxWidth = 1280
svga.vramSize = 16777216
* Added a couple of batch files (to be attached shortly) to aid in starting buildbot on boot.
* Edited the registry to automatically log in as cltbld
It now logs in and starts Buildbot on boot, and is currently running unittests. I'm going to let things run overnight, and if it looks good I'll look at applying these changes to all of the staging slaves next week.
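Hand-editing the .vmx file works for one VM but is easy to get wrong across many. A minimal sketch, assuming the file is plain `key = value` lines as shown above; `apply_vmx_settings` is a hypothetical helper, not a VMware tool. It replaces existing entries for a key instead of appending duplicates, so running it twice produces the same file.

```python
# The three display settings from this comment.
SVGA_SETTINGS = {
    "svga.maxHeight": "1024",
    "svga.maxWidth": "1280",
    "svga.vramSize": "16777216",
}


def apply_vmx_settings(vmx_text, settings):
    """Return vmx_text with each key in settings set to its value.

    The first existing line for a key is rewritten in place, later duplicates
    of that key are dropped, and keys not present are appended at the end.
    All other lines are kept untouched.
    """
    remaining = dict(settings)
    out = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            out.append(f"{key} = {remaining.pop(key)}")
        elif "=" in line and key in settings:
            continue  # duplicate of a key already rewritten above
        else:
            out.append(line)
    out.extend(f"{k} = {v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"
```

The VM should be shut down before its .vmx file is edited.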
Comment 9 • 16 years ago (Assignee)
Had some additional fails overnight. Some of them are related to the fact that there's no audio driver in the console session.
I tried installing http://www.rigexpert.net/gettingstarted/reaudio.htm, which helped, but some tests ended up hanging.
Then, I installed the demo of http://software.muzychenko.net/eng/vac.html and video tests started passing. The demo claims to be "feature limited", which I suspect means some recording features aren't available. More importantly, the demo doesn't seem to be time limited, so I think we're totally within our rights to use it for as long as we want. I'm going to leave moz2-win32-slave21 running builds and tests over the weekend to get some more results out of it.
Comment 10 • 16 years ago (Assignee)
Things ran perfectly well on mozilla-1.9.1 unittests over the weekend. The only failure was the mochitest leak mentioned in comment #7. Given that, I think I'm ready to roll this out into the real staging environment. I'm tempted to adjust the mochitest leak threshold to cope with the failures for now...I'm not getting the impression it'll be easy to get energy on bug 477066 right now.
Comment 11 • 16 years ago (Assignee)
Comment 12 • 16 years ago (Assignee)
Updated • 16 years ago (Assignee)
Attachment #362741 -
Attachment is patch: false
Comment 13 • 16 years ago (Assignee)
Password removed.
Comment 14 • 16 years ago (Reporter)
(In reply to comment #10)
> Things ran perfectly well on mozilla-1.9.1 unittests over the weekend. The only
> failure was the mochitest leak mentioned in comment #7. Given that, I think I'm
> ready to roll this out into the real staging environment. I'm tempted to adjust
> the mochitest leak threshold to cope with the failures for now...I'm not
> getting the impression it'll be easy to get energy on bug 477066 right now.
Making these changes in staging is fine. However, bug#477066 needs to be fixed (or the test disabled?) before we can make these changes to the production slaves.
Depends on: 477066
Comment 15 • 16 years ago (Assignee)
(In reply to comment #14)
> (In reply to comment #10)
> > Things ran perfectly well on mozilla-1.9.1 unittests over the weekend. The only
> > failure was the mochitest leak mentioned in comment #7. Given that, I think I'm
> > ready to roll this out into the real staging environment. I'm tempted to adjust
> > the mochitest leak threshold to cope with the failures for now...I'm not
> > getting the impression it'll be easy to get energy on bug 477066 right now.
>
> Making these changes in staging is fine. However, bug#477066 needs to be fixed
> (or the test disabled?) before we can make these changes to the production
> slaves.
I guess that's an option. But Mochitest has a --leak-threshold option specifically so that we can run tests that are known to leak without turning the tree orange.
Comment 16 • 16 years ago (Assignee)
Here are more detailed instructions on how to deploy:
* Shut down VM, add the following lines to its vmx file:
svga.maxHeight = 1024
svga.maxWidth = 1280
svga.vramSize = 16777216
* Start the VM back up again, login as Administrator
* Download VNC from: http://realvnc.com/products/free/4.1/download.html
* Install with defaults
* When the post-install dialog pops up, set a password and turn off the Java viewer (configure -> 'Serve Java Viewer...')
* Start -> Run -> 'services.msc'
* Disable and turn off Windows Firewall
* Download and install http://software.muzychenko.net/vac409.zip
* Download https://bugzilla.mozilla.org/attachment.cgi?id=362743, fill in the proper password, and import it into the registry.
* Download https://bugzilla.mozilla.org/attachment.cgi?id=362741 to ~cltbld/start menu/programs/startup
* Download https://bugzilla.mozilla.org/attachment.cgi?id=362742 to /d/mozilla-build
* Make sure the Buildbot slave is located in /e/builds/moz2_slave (if you have to rename the directory make sure to update buildbot.tac).
* Restart
* Log in with VNC and set resolution to 1280x1024
From this point forward you should NOT be logging in as cltbld with RDP.
Comment 17 • 16 years ago (Assignee)
One last thing, cltbld must be given permission to reboot the system:
* Start menu -> Run -> gpedit.msc
* Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> User Rights Assignment
* Double click 'Shut down the system', add cltbld to the list.
* Reboot for the changes to take effect.
Comment 18 • 16 years ago (Assignee)
Removing blocking in favour of setting the leak threshold.
No longer blocks: 472517
Comment 19 • 16 years ago (Assignee)
Pretty simple patch, just allows you to pass a leak threshold on to the mochitest step.
Attachment #362779 -
Flags: review?(catlee)
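The patch itself isn't shown here. A minimal sketch of the idea, assuming the step builds its mochitest command as an argument list and that `runtests.py` accepts the option in `--leak-threshold=N` form; `mochitest_command` is a hypothetical helper, not the real factory code.

```python
def mochitest_command(base_command, leak_threshold=None):
    """Return a copy of base_command, adding --leak-threshold=N when requested.

    leak_threshold is a byte count. None leaves the command unchanged, so the
    harness falls back to its own default.
    """
    command = list(base_command)
    if leak_threshold is not None:
        if leak_threshold < 0:
            raise ValueError("leak_threshold must be non-negative")
        command.append("--leak-threshold=%d" % leak_threshold)
    return command
```

Keeping None distinct from 0 means branches that shouldn't get a threshold simply don't pass the flag at all.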
Comment 20 • 16 years ago (Assignee)
Pretty simple master-side patch: enable reboots every 5 builds, just like Linux and Mac, and add the 188-byte mochitest leak threshold to win32 builds.
Attachment #362781 -
Flags: review?(catlee)
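One way to implement "reboot every 5 builds" is a counter kept on the slave's disk, since nothing in memory survives the reboot. A sketch under that assumption; the counter file, `record_build_and_check` and `maybe_reboot` are hypothetical. `shutdown -r -t 0` is the standard Windows restart command, and cltbld can only run it once it has the "Shut down the system" right described in comment 17.

```python
import os
import subprocess

REBOOT_EVERY = 5  # comment 20: reboot after every 5 builds
REBOOT_COMMAND = ["shutdown", "-r", "-t", "0"]  # Windows: restart with no delay


def record_build_and_check(counter_path, every=REBOOT_EVERY):
    """Add one finished build to the count stored in counter_path.

    Returns True once `every` builds have been counted, deleting the file so
    counting starts again from zero after the reboot. The count lives on disk
    because a script run as a build step is a new process each time.
    """
    count = 0
    if os.path.exists(counter_path):
        with open(counter_path) as f:
            text = f.read().strip()
        count = int(text) if text else 0
    count += 1
    if count >= every:
        if os.path.exists(counter_path):
            os.remove(counter_path)
        return True
    with open(counter_path, "w") as f:
        f.write(str(count))
    return False


def maybe_reboot(counter_path, every=REBOOT_EVERY):
    """Reboot the machine if this is the `every`-th build since the last reboot."""
    if record_build_and_check(counter_path, every):
        subprocess.call(REBOOT_COMMAND)
```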
Comment 21 • 16 years ago (Assignee)
Nick pointed out to me yesterday that it would be better not to download the RealVNC installer and the software audio driver from the internet every time we need them. I'm going to import them into the mofo repo for safekeeping and update my instructions.
Comment 22 • 16 years ago (Assignee)
Comment on attachment 362781 [details] [diff] [review]
periodic reboots + leak threshold
I need to update the leak thresholds here.
Attachment #362781 -
Flags: review?(catlee)
Comment 23 • 16 years ago (Assignee)
After examining the logs on staging-master I've noticed that sometimes we leak 188 bytes, and sometimes we leak 200. I guess this means the threshold needs to be 200, which kind of sucks, since it means it's possible to miss another leak (albeit a small one). Is there a better way of dealing with this?
Attachment #362781 -
Attachment is obsolete: true
Attachment #362904 -
Flags: review?(ted.mielczarek)
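To make the trade-off concrete: assuming the harness fails a run only when the leaked bytes exceed the threshold, the numbers above work out as follows. The helper names are hypothetical.

```python
OBSERVED_LEAKS = [188, 200]  # bytes leaked on staging-master, per this comment


def leak_check_passes(leaked_bytes, threshold):
    """Assumed rule: a run fails only when it leaks more than `threshold` bytes."""
    return leaked_bytes <= threshold


def smallest_passing_threshold(observed):
    """Lowest threshold that lets every observed run pass under that rule."""
    return max(observed)


def undetected_headroom(threshold, baseline):
    """Extra bytes a new leak could add to a `baseline` run and still pass."""
    return max(threshold - baseline, 0)
```

With a threshold of 200, a run that would otherwise leak 188 bytes can pick up to 12 more bytes of new leakage without turning orange, which is the small missed leak being worried about here.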
Comment 24 • 16 years ago (Assignee)
Alright, those two packages are now checked into the mofo repo:
Checking in vac409.zip;
/mofo/ref-platforms/win32/vac409.zip,v <-- vac409.zip
initial revision: 1.1
done
RCS file: /mofo/ref-platforms/win32/vnc-4_1_3-x86_win32.exe,v
done
Checking in vnc-4_1_3-x86_win32.exe;
/mofo/ref-platforms/win32/vnc-4_1_3-x86_win32.exe,v <-- vnc-4_1_3-x86_win32.exe
initial revision: 1.1
done
Updated • 16 years ago
Attachment #362779 -
Flags: review?(catlee) → review+
Comment 25 • 16 years ago
Comment on attachment 362904 [details] [diff] [review]
leak threshold for tm, 1.9.1, not for m-c
I am saddened, but bhearsum says he is looking into what patch fixed this on m-c.
Attachment #362904 -
Flags: review?(ted.mielczarek) → review+
Updated • 16 years ago (Assignee)
Attachment #362904 -
Flags: checked‑in+
Updated • 16 years ago (Assignee)
Attachment #362779 -
Flags: checked‑in+
Comment 26 • 16 years ago (Assignee)
It's looking like moz2-win32-slave03 is able to run mochitests without leaking. Seems like there's some subtle difference between it and the other two. I'm going to try and track down what it is so the leak threshold isn't necessary.
Comment 27 • 16 years ago (Assignee)
So I misread before: moz2-win32-slave03 *and* 04 were passing all of the unittests. Only moz2-win32-slave21 was failing. The only appreciable difference I found was that a software audio driver I had been testing was installed on it. After uninstalling that, the tests started passing. I have no idea whether this is coincidence or not; I'm not sure how this driver (which isn't a browser plugin AFAIK) could affect the tests. I don't see any suspicious checkins to 1.9.1, either.
I have some other things to do right now, so I'm just going to let this run in staging for a few days or a week and monitor it. If things stay green we can turn the leak threshold down to 0 and proceed.
Comment 28 • 16 years ago (Assignee)
Not a single run of 1.9.1 unittests on moz2-win32-slave21 since Friday. However, as of Friday, 6:30pm EST it was still leaking. I'd like one more run to confirm this before digging deeper...
Comment 29 • 16 years ago (Assignee)
moz2-win32-slave21 is still failing. As a last resort, I'm going to try recloning the VM and applying the changes exactly as I did to slave03 and 04. Maybe something strange is left over from when I was testing RDP and various other things?
Comment 30 • 16 years ago (Assignee)
After recloning moz2-win32-slave21 it seems that the mochitest leak has gone away. I suspect something I did to it early on tripped the failure. I'm going to let it run for a day or two before declaring it gone for realz, though.
Comment 31 • 16 years ago (Assignee)
Disable the leak threshold on 1.9.1/tm, since we haven't seen it in forever.
Attachment #365216 -
Flags: review?(ccooper)
Updated • 16 years ago
Attachment #365216 -
Flags: review?(ccooper) → review+
Comment 32 • 16 years ago (Assignee)
Comment on attachment 365216 [details] [diff] [review]
backout leak threshold
changeset: 976:27c75f479ff3
Attachment #365216 -
Flags: checked‑in+
Comment 33 • 16 years ago (Assignee)
I'm planning to roll this out on Monday, March 16th, starting in the morning (EDT). It's probably going to take half the day or so to fully deploy, but no downtime will be needed.
No longer blocks: 472517
Comment 34 • 16 years ago (Assignee)
Updated deployment instructions:
* Shut down VM, add the following lines to its vmx file:
svga.maxHeight = 1024
svga.maxWidth = 1280
svga.vramSize = 16777216
* Start the VM back up again, login as Administrator
* Start menu -> Run -> gpedit.msc
* Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> User Rights Assignment
* Double click 'Shut down the system', add cltbld to the list.
* Reboot for the changes to take effect.
* Download VNC from: http://realvnc.com/products/free/4.1/download.html
* Install with defaults
* When the post-install dialog pops up, set a password and turn off the Java viewer (configure -> 'Serve Java Viewer...')
* Start -> Control Panel -> Windows Firewall
* Add TCP/5900 as an exception.
* Download and install http://software.muzychenko.net/vac409.zip
* Download https://bugzilla.mozilla.org/attachment.cgi?id=362743, fill in the proper password, and import it into the registry.
* Download https://bugzilla.mozilla.org/attachment.cgi?id=362741 to ~cltbld/start menu/programs/startup
* Download https://bugzilla.mozilla.org/attachment.cgi?id=362742 to /d/mozilla-build
* Make sure the Buildbot slave is located in /e/builds/moz2_slave (if you have to rename the directory, make sure to update buildbot.tac).
* Restart
* Log in with VNC and set resolution to 1280x1024
From this point forward you should NOT be logging in as cltbld with RDP.
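After adding the TCP/5900 exception, one quick check that the VNC server is reachable is to open a plain TCP connection from the machine you'll be connecting from. A minimal sketch; `port_open` is a hypothetical helper, and it only confirms that something is listening on the port, not that VNC authentication works.

```python
import socket

VNC_PORT = 5900  # the TCP/5900 firewall exception from the steps above


def port_open(host, port=VNC_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("moz2-win32-slave21")`, assuming that hostname resolves from where you run it.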
Comment 35 • 16 years ago (Assignee)
Attachment #362743 -
Attachment is obsolete: true
Comment 36 • 16 years ago (Assignee)
I got the last slave updated today. This is done!
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated • 12 years ago
Product: mozilla.org → Release Engineering