Incident History
Completed [22/10/2024 11:32]
Resolved [21/10/2024 21:03]
Completed [30/09/2024 22:01]
Works completed successfully.
Completed [24/06/2024 22:59]
Works have been completed successfully with no interruption to service.
Resolved [21/05/2024 14:11]
The issues impacting the broadband checker and ordering processes have been resolved. We apologise for the disruption this has caused.
Resolved [21/05/2024 09:13]
Completed [09/05/2024 09:35]
Resolved [30/04/2024 17:37]
The fibre breaks have been corrected and we are seeing normal paths and performance restored. Apologies for the disruption caused.
Resolved [10/04/2024 13:54]
Resolved [29/02/2024 17:16]
== Reason for Outage Summary ==
During routine configuration updates, unforeseen repercussions occurred in an unrelated segment of our network, leading to the disconnection of approximately 30% of our BT Wholesale-based broadband connections. Leased lines were not impacted. Upon identification, the configuration changes were promptly reverted, initiating service restoration. However, due to the inherent nature of PPP connections, some customer devices experienced delays in reconnecting, resulting in a number of lingering stale sessions.
== Response and Mitigation ==
The incident has been attributed to a potential bug and has been escalated to our vendor's Technical Assistance Center (TAC) for thorough investigation. Following the restoration process the service has stabilised, and we do not expect a recurrence.
Resolved [17/01/2024 10:57]
Virgin have yet to provide an official RFO. So far they have not been able to explain the outage experienced, which affected ourselves, other ISPs, and their own retail broadband operations.
Resolved [11/12/2023 12:08]
Service has now been restored.
Resolved [24/10/2023 10:49]
The incident has now been resolved.
Completed [12/09/2023 20:06]
Resolved [05/09/2023 13:46]
As previously reported, the ultimate cause of the outage was a crash of the active switch in a virtual switch chassis at our Telehouse North PoP following the replacement of the failed standby switch. This is a procedure we have carried out many times in the past; it has always been a hitless operation and is documented as such. Post-mortem analysis involving vendor TAC has concluded that the supervisor on the active switch must have entered a partially failed state when it switched over from standby to active after the switch failure the previous week. Had this been visible to us in any way, we would have scheduled the replacement work in an out-of-hours maintenance window. In light of this incident we will of course carry out replacements of this nature out of hours should we see any further switch failures in these systems.
This particular switch chassis had an uptime of just over six and a half years prior to the outage last week. Despite this solid stability, we are now planning to move away from these virtual switch systems as part of our planned network upgrades. This will see our network transition to a more modern and efficient spine-leaf architecture, where the failure of a single device will have little to no impact on service. These upgrades will involve significant investment and will be rolled out to all PoPs within the next 1-2 years.
All maintenance work at our THN PoP is now complete and the PoP has returned to its previous stability. Please accept our apologies again for the downtime witnessed.
Resolved [25/05/2023 02:38]
The issue was associated with the planned works notified by Virgin (C01390678). The list of associated circuits was not exhaustive, hence the confusion. The planned works are now complete and no further disruption is expected.
Resolved [03/05/2023 17:11]
The issue has now been resolved. Apologies for the disruption caused.
Resolved [13/04/2023 17:51]
All affected circuits have now been restored. Refer to the incident in the control panel for further details from the carrier regarding the cause.
Resolved [22/12/2022 16:52]
Virgin Media confirmed they replaced a faulty transmission card to restore all services.
Resolved [21/12/2022 10:47]
In summary, here are details of the issue observed on Friday afternoon / evening:
- A significant and unexpected memory leak was observed on the core equipment at our Telehouse West (THW) PoP.
- It was determined that the best course of action was to carry out a controlled reload out of hours.
- We began slowly culling broadband sessions terminating at THW and steering them to other PoPs in preparation.
- A short time later the memory on the THW core was exhausted; the BGP process terminated, resulting in all broadband sessions on LNSs at the PoP disconnecting.
- All broadband circuits that were operating via THW were automatically steered to other PoPs in our network.
- At this point we had no choice but to carry out an emergency reload of the core.
- Leased lines operating from THW were impacted throughout.
- The reload of the core took 30 minutes to complete; however, a secondary issue was identified with the hardware of one of the switches.
- Half of the leased lines were restored, whilst on-site hands moved the affected NNIs from the failed switch to the other; this involved configuration changes.
- Circuits were impacted for between 1 and 4 hours at worst. The majority of circuits were back up around the 1 to 2 hour point.
- We do not plan to move the NNIs again, to ensure that there is no further disruption.
- Owing to fulfilment issues, the replacement hardware is now expected to arrive today, but to avoid any further risk, installation will be postponed until the New Year.
- We have raised the memory leak issue with Cisco TAC.
We apologise for the disruption this caused.
Resolved [21/12/2022 21:26]
Resolved [08/12/2022 14:17]
Resolved [19/10/2022 12:38]
The issue has been traced to a line card rebooting on a switch at our Telehouse West PoP. This resulted in some carrier NNIs, and subsequently the Ethernet circuits terminating on them, going offline briefly whilst the card rebooted. Diagnostics are not showing any issues following the event, but we have raised it with the hardware vendor's TAC for further investigation. Apologies for the disruption this may have caused.
Resolved [27/10/2022 09:25]
Resolved [22/09/2022 16:12]
Resolved [05/08/2022 11:01]
Resolved [05/06/2022 23:00]
The carrier has corrected their issue. Normal service has been witnessed since.
Resolved [22/05/2022 07:35]
The issue has now been resolved. Full details will be supplied via the control panel incident for the impacted circuits.
Resolved [14/04/2022 11:03]
Resolved [14/03/2022 10:21]
The issue remains with our NOC team and Cisco.
Resolved [14/03/2022 10:21]
Resolved [15/02/2022 23:09]
Issue resolved.
Resolved [01/02/2022 17:29]
We are now able to access TalkTalk services without issue. Apologies for the disruption this may have caused.
Resolved [29/01/2022 14:47]
We are now seeing service restored to the remaining NNI. The majority of associated circuits are showing as up; however, if you have any issues, please reboot the NTU and any associated supplied routers before raising a fault. We apologise for this prolonged and unexpected outage today.
Resolved [16/12/2021 23:38]
Virgin have identified and corrected a fibre break as of 23:10. This was part of a wider major service outage. Service has now been restored to the impacted circuits. Full details of the issue have been relayed as part of the incident available within the control panel. We apologise for the disruption caused.
Resolved [18/09/2021 19:40]
Virgin have confirmed the issue was a faulty attenuator at Telehouse West, which was replaced to resolve the fault. Circuits were restored at approximately 13:08.
Resolved [02/07/2021 13:44]
Whilst investigating a degraded performance issue on a dark fibre at our LD8 PoP, a third party engineer inadvertently disconnected another dark fibre that connects LD8 to a third location. This subsequently resulted in LD8 becoming isolated from the rest of the network for a short period, between 00:06:02 and 00:09:56.
As previously reported, during this time leased line circuits terminating at LD8 would have experienced a loss of connectivity. Broadband circuits were impacted further due to a large number of subscriber sessions that were terminating at LD8 disconnecting.
Whilst the majority of the affected broadband subscribers regained a session at another PoP relatively quickly, the sessions of others, which were steered to a particular aggregation router on the network, failed to start. Our engineers investigated and discovered that the router was experiencing a fault condition and took it out of service. At this point the vast majority of the remaining subscribers regained their sessions.
Apologies for the disruption this may have caused.
Resolved [14/06/2021 10:00]
At 12:32:23 on 13/06/21 a supervisor in a core switch at our THW PoP experienced an inexplicable reboot. Shortly afterwards at 12:32:40 a hot standby supervisor took over the active role and restored the overall connectivity to the PoP.
The original active supervisor that rebooted was back in service as a hot standby by 12:41:52. By 12:54:47 it had brought all its line cards online following a full and successful diagnostics run. All connectivity was restored to the site by this point.
Non-resilient leased line circuits that terminate on NNIs directly connected to the rebooted supervisor would have experienced an outage between 12:32:23 and 12:54:47.
All other non-resilient leased line circuits as well as any broadband circuits that were terminating at THW would have seen a loss of connectivity between 12:32:23 and 12:32:40.
We have raised this to the vendor's TAC for further investigation. The device is currently stable and not showing any signs of issues. As such we do not deem the site to be at further risk at this time.
Apologies for the disruption this may have caused.
Resolved [18/05/2021 17:01]
The carrier has resolved the issue and the majority of affected circuits are online. A power cycle of the router may be required to force a reconnection.
Resolved [10/05/2021 12:58]
Resolved [28/06/2021 11:14]
Resolved [17/02/2021 17:46]
CityFibre have confirmed that all affected services have been restored and a full investigation is underway. We apologise to those customers affected by this issue.
Resolved [02/03/2021 12:23]
Resolved [18/12/2020 16:21]
We are now receiving responses from the various affected systems. Confirmation of a resolution hasn't been announced by Openreach, so services should be considered at risk.
Resolved [15/12/2020 13:21]
The issue has been resolved. Control panel users can see further details - https://control.interdns.co.uk/notification.aspx?id=13739839
Resolved [02/12/2020 09:39]
This issue has now been resolved and diagnostics are working again.
Resolved [12/01/2021 15:03]
Resolved [30/10/2020 06:33]
Service was resumed at approximately 02:15. We apologise for this unexpected outage.
Resolved [14/10/2020 11:15]
Apologies for the session drops this morning. The cause was linked to additional interconnects being patched into one of our London PoPs. This caused an issue with one of our broadband LNSs, which dropped sessions; they were then able to reconnect. It would have impacted any circuits routed via that LNS across TalkTalk and BT Wholesale.
This was unexpected behaviour and should not have occurred. We will continue to monitor and will raise this with the manufacturer as a suspected bug.
Resolved [26/09/2020 08:18]
The fault was tracked down to a power failure within a Virgin Media rack. All circuits are operational.
Resolved [15/09/2020 16:20]
We have seen near-normal levels of sessions restore through the afternoon. Anyone unable to reconnect should be able to do so with a power cycle. If this doesn't address it, try powering down for an hour and then reconnecting. If you still cannot connect, you may require assistance from our support team.
We will not terminate sessions to force a reconnection back to Telehouse North; they will naturally spread out as sessions drop of their own accord.
We are reviewing this outage internally, but ultimately the cause lay with the carrier.
Resolved [31/08/2020 22:47]
The root cause was a failure of TalkTalk-maintained hardware affecting one of our Ethernet NNIs and 6,000 other B2B clients. TalkTalk's fault incident resolution states:
<-- snip -->
NOC monitoring identified an FPC10 (Flexible PIC Concentrator) failure at NGE001.LOH. This caused a total loss of service to approx. 6k B2B circuits from approx. 12:47 (31/08). The Core Network Ops team were engaged and their investigations found that the FPC10 had failed and could not be restored remotely. To restore service as of approx. 17:23 a field engineer attended site and replaced the faulty FTP10 with support from the core network ops team. This incident will now be closed with any further root cause analysis being completed via the problem management process.
<-- snip -->
Apologies for the disruption caused this afternoon.
Resolved [30/08/2020 21:25]
The issue was resolved around 16:10. CenturyLink responded via Twitter to say:
<-- snip -->
We are able to confirm that all services impacted by today’s IP outage have been restored. We understand how important these services are to our customers, and we sincerely apologize for the impact this outage caused.
<-- snip -->
Although we and their other global customers withdrew routes and shut down peering sessions, they continued to announce them to their peers regardless. This caused black-holing of any inbound traffic routed via CenturyLink. All affected customers were left powerless and it has been a case of having to wait for them to resolve the issue.
Thankfully less than 10% of our overall traffic routes in via CenturyLink's network, so the impact was minimal. We know of only a small handful of destinations that were unreachable during their outage. Apologies if your access was disrupted.
Resolved [04/10/2020 13:45]
Resolved [23/07/2020 17:25]
Following a small fire at one of our Newcastle upon Tyne exchanges earlier today, Openreach have now restored power to all services. All broadband and Ethernet services should now be up and working.
Resolved [04/07/2020 21:50]
Resolved [13/06/2020 06:33]
Resolved [22/04/2020 15:07]
Resolved [13/06/2020 06:34]
Resolved [19/02/2020 13:10]
The issue was related to LINX (the London Internet Exchange) and has now been resolved; it would potentially have affected several Internet providers in the UK. We are awaiting a full RFO from them to confirm the cause.
Resolved [08/10/2019 15:26]
The cause has been located and service has now stabilised. If a connection hasn't returned, please power cycle the router to force a reconnection attempt. Apologies for the disruption witnessed.
Completed [30/08/2019 00:10]
The upgrade was successful and cleared the fault condition as suspected. We have been monitoring for the past hour and have not seen any further instability.
Resolved [29/08/2019 17:42]
We are seeing services restored now. If any connections remain offline please reboot the routers. The root cause is under investigation.
Resolved [30/05/2019 12:29]
The fault was resolved with all circuits restored by 13:35.
Full notes from Virgin Media Business surrounding the handling of this fault can be found here:
https://cdn.interdns.co.uk/downloads/support-downloads/RFO_Virgin_29_05_2019.pdf
We apologise again for the prolonged outage which affected working hours.
Resolved [17/04/2019 17:19]
BT have confirmed that a line card needed to be reloaded in order to resolve the issue. We consider services to no longer be at risk.
Please let support@icuk.net know if you have any further concerns.
Resolved [05/04/2019 13:02]
The affected circuits appear to have been restored. We have received no further communication from Virgin, so please consider service to be at risk.
Resolved [25/02/2019 12:23]
TalkTalk's systems appear to be operational again. However, please consider them to be at risk as we have not received any communication from them to confirm that everything is back to normal.
Resolved [18/01/2019 10:11]
The issue with routing has been resolved, we apologise for any inconvenience caused this morning.
If you continue to have any problems please contact the support desk with specific examples.
Completed [14/12/2018 01:31]
The maintenance is now complete.
Resolved [15/11/2018 17:33]
Resolved [16/10/2018 08:59]
Virgin have supplied the following reason for outage:
"In relation to the issue identified in the London area regarding loss of service. This issue was fully restored at 20:43 yesterday evening when a faulty DC output breaker was discovered at our Hayes hubsite and services were moved away from it onto a different output breaker. All services have been stable since that time."
Resolved [09/10/2018 12:05]
All services are back working now.
Resolved [02/10/2018 09:47]
Resolved [25/09/2018 12:11]
This issue is fully resolved now.