Details Matter, episode #349523
by David Ross
In a series of posts from last year, I described the process of getting master laptops ready for an event such as Pulse: 400 laptops with approximately 30 different image sets, configured to host about 180 different hands-on lab experiences. That was a lot of work. Yet, as I discovered during final setup, there was much more work to be done. On the Wednesday before Pulse opened on Sunday, we began cloning the master images out to the image sets. I arrived Wednesday evening, and by Thursday afternoon, we had all the laptops ready to go, right on schedule.
So far, so good.
This year, we went with a wireless network in the lab room, thinking that it would be far simpler and much neater overall. We were right on some of those counts. Setup and takedown were much faster, and the room looked much nicer than at previous events. However, I was totally unprepared for what happened around the wireless network configuration.
By Friday afternoon at 2:00, we had all the laptops “locked and blocked” (cable locks in place, power bricks connected), and they were connecting to our lab SSID. We were having some network issues, but I was thinking: “Wow, all we have to do is iron out a few network details and we are DONE!”
Details, details, details. This is being written after the fact, so my timelines and sequencing are probably a bit different from reality, but you will see some of the things we did not plan for, and in some cases could not have known about, prior to the event setup in Las Vegas.
The first major problem was the inability to connect to the survey server in the room. In order to track metrics, we asked participants to provide feedback through a survey, which was served from a laptop in the room and captured to an SQL database. For about 2 hours, I struggled to connect to the survey server. Meanwhile, the networking team was busy with other issues across the entire conference, so our interactions were limited. Finally, one of their team came by and we began investigating. I was thinking we were on the wrong SSID, had a gateway wrong, or something. I could ping the gateway, but not the server. The server could ping the gateway, but not a laptop on the network. The simple answer: “We are blocking peer-to-peer traffic in here.” DOH! So, they removed the block, and we could see the survey server. One down.
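In hindsight, the ping results already pointed at the answer. As a rough sketch of that reasoning (the function name and the result labels are my own illustration, not a tool we used at the event), the four checks can be reduced to a small decision rule:

```python
def diagnose(client_to_gateway, server_to_gateway,
             client_to_server, server_to_client):
    """Classify ping results (booleans) between a lab laptop, the
    survey server, and the gateway. Hypothetical helper for illustration."""
    # Both directions work: nothing to fix.
    if client_to_server and server_to_client:
        return "ok"
    # If either machine cannot reach the gateway, look at the basics first.
    if not client_to_gateway or not server_to_gateway:
        return "check-ssid-or-gateway"
    # Both reach the gateway but not each other: traffic between
    # wireless clients is probably being blocked.
    return "suspect-peer-to-peer-block"


# The situation from the lab room: everyone sees the gateway,
# nobody sees each other.
print(diagnose(True, True, False, False))
```

The point of the ordering is simply that a wrong SSID or gateway would normally break the path to the gateway as well, so once both machines can reach it, the remaining suspect is a block between clients.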
In the meantime, we had rogue SSIDs appearing. Our SSID was PULSELABS, hidden and passphrase-protected. We had Pulselabs and pulselabs SSIDs broadcasting from ad-hoc networks. More investigation and exploration ensued, and we found out that some of the lab proctors (there were about 50 of us) had misconfigured some of the laptops to be on an ad-hoc network instead of defining an infrastructure connection. Jose and BJ from the network team brought a sniffer, and we isolated and killed all of the ad-hoc networks in the room. This was late Friday, and I was still nervous, but optimistic.
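SSIDs are case-sensitive, so PULSELABS, Pulselabs, and pulselabs are three different networks. A minimal sketch of how look-alike names could be flagged from a list of observed SSIDs follows; the function name and the sample list are assumptions for illustration, and this is not what the sniffer actually did:

```python
def find_lookalike_ssids(expected, observed):
    """Return observed SSIDs that match the expected one only when
    case is ignored. Hypothetical helper for illustration."""
    return sorted({
        ssid for ssid in observed
        if ssid != expected and ssid.casefold() == expected.casefold()
    })


# Sample scan results (made up, apart from the three lab SSID spellings).
seen = ["PULSELABS", "Pulselabs", "pulselabs", "HotelGuest"]
print(find_lookalike_ssids("PULSELABS", seen))
```

A check like this only finds the suspicious names. Tracking down which laptops were broadcasting them still took the network team and their sniffer.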
Even though we were connecting to the proper SSID, our connections were not stable. To test, I set up a row of laptops with working connections. After about 10 to 15 minutes, half of those systems were disconnected. I made another call, and I had “the gang” from the networking team in the room shortly afterward. They were looking at their radios and access points, comparing MAC addresses to behavior they were seeing, and they agreed that something had to be done. It was now evening, so they agreed to continue working with me on Saturday.
We were still ahead of schedule, but the clock was ticking – labs opened Sunday morning, and some proctors had not checked their labs out yet, since they required connections from the lab room to remote systems. The adventures continue in the next post!
David Ross is a senior Technical Enablement Specialist with IBM Corporation in the Tivoli Cloud Enablement group. David specializes in Tivoli Service Automation Manager and IBM SmartCloud Provisioning. He joined IBM in 2000.