Module 12: Network Troubleshooting
12.1. Network Documentation
12.1.1 Documentation Overview
Documentation is the starting point for effective troubleshooting; it provides a known baseline of a functional network.
Up-to-date documentation reduces the time required to isolate and resolve network problems.
12.1.2 Network Topology Diagrams
Topology diagrams show the physical and logical layout of the network, including device connections, IP addressing schemes, and redundancy paths.
These diagrams are essential for understanding the flow of traffic and identifying where a fault may lie.
12.1.3 Network Device Documentation
Device documentation includes detailed information for each device, such as the IOS version, hardware model, feature sets, and the configuration file (both running and startup).
This allows engineers to verify configurations against the intended design.
12.1.4 Establish a Network Baseline
A baseline is a snapshot of the network's normal, operational status (performance, CPU usage, bandwidth utilization).
It provides a reference point against which current, problematic performance can be compared.
12.1.5 Step 1 - Determine What Types of Data to Collect
Data collection should focus on metrics that impact performance, such as interface utilization, CPU load, memory usage, and error counts (e.g., CRC errors).
The data chosen must be relevant to the expected operation and potential failure points.
12.1.6 Step 2 - Identify Devices and Ports of Interest
Data should be collected on critical devices (Core/Distribution switches, high-utilization routers) and on key interfaces (WAN links, server connections, inter-switch trunks).
12.1.7 Step 3 - Determine the Baseline Duration
Baselines should be collected over a period long enough (e.g., 7 to 30 days) to capture peak usage times, slow periods, and regular fluctuations.
This duration ensures the baseline is statistically representative of normal network behavior.
12.1.8 Data Measurement
Data is measured by periodically collecting key metrics using tools like SNMP (Simple Network Management Protocol).
This raw data is then trended and analyzed to establish the normal operational limits and detect deviations.
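As a rough illustration of how trended samples become "normal operational limits", the sketch below summarizes a set of samples and flags values that stray too far from them. The sample values, the function names, and the choice of three standard deviations as the threshold are all assumptions made for this example, not part of the course material.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize baseline samples (e.g., interface utilization in percent)."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_deviation(value, baseline, k=3.0):
    """Return True if value lies more than k standard deviations from the mean."""
    return abs(value - baseline["mean"]) > k * baseline["stdev"]


# Hypothetical utilization samples (percent) polled during normal operation.
utilization = [32.0, 35.5, 30.1, 38.2, 33.7, 36.0, 31.4]
baseline = build_baseline(utilization)

if __name__ == "__main__":
    for reading in (34.0, 90.0):
        status = "deviation" if is_deviation(reading, baseline) else "normal"
        print(f"{reading:5.1f}% -> {status}")
```

In practice the samples would come from periodic polling over the full baseline duration, and a real monitoring tool may use different limits (for example, percentiles per time of day).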
12.2. Troubleshooting Process
12.2.1 General Troubleshooting Procedures
Effective troubleshooting requires a systematic, methodological approach rather than random guesswork.
Procedures should be consistent to ensure all necessary steps are taken and to quickly isolate the root cause of the problem.
12.2.2 Seven-Step Troubleshooting Process
This comprehensive process starts with defining the problem and ends with documenting the solution.
Key steps in between involve gathering and analyzing information, eliminating possible causes, proposing a hypothesis for the most probable cause, testing it, and verifying full functionality.
12.2.3 Question End Users
The first step in gathering information is to interview the user to obtain details about the problem, including the symptoms, when it started, and any recent changes.
User input helps define the scope and nature of the problem accurately.
12.2.4 Gather Information
Information gathering involves using commands ('show', 'ping', 'traceroute') and tools to collect data on the affected devices and links.
Comparing this data to the established network baseline helps pinpoint deviations.
12.2.5 Troubleshooting with Layered Models
The OSI or TCP/IP model provides a framework for troubleshooting, allowing engineers to work systematically from one layer to the next.
The most common approach is bottom-up (starting at Layer 1), but engineers can also work top-down or start in the middle and divide the problem.
12.2.6 Structured Troubleshooting Methods
Three main methods exist: Bottom-Up (start at Layer 1, move up), Top-Down (start at Application Layer, move down), and Divide-and-Conquer (start at an intermediate layer like Network Layer).
The best method depends on the nature of the problem and the initial symptoms.
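The difference between the three methods is simply the order in which layers are checked. The sketch below makes that order explicit; the function name and the start_layer_ok parameter (the result of the first test in divide-and-conquer) are hypothetical names chosen for illustration.

```python
OSI_LAYERS = [
    "Physical", "Data Link", "Network", "Transport",
    "Session", "Presentation", "Application",
]


def layer_order(method, start="Network", start_layer_ok=True):
    """Return the order in which layers are checked for a given method."""
    if method == "bottom-up":
        return list(OSI_LAYERS)
    if method == "top-down":
        return OSI_LAYERS[::-1]
    if method == "divide-and-conquer":
        i = OSI_LAYERS.index(start)
        # If the starting layer works, lower layers are assumed good: move up.
        # If it fails, the fault is at or below it: move down.
        return OSI_LAYERS[i:] if start_layer_ok else OSI_LAYERS[i::-1]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    print(layer_order("divide-and-conquer", start_layer_ok=False))
```

For example, a failed ping (a Network Layer test) sends the divide-and-conquer search downward through the Data Link and Physical layers.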
12.2.7 Guidelines for Selecting a Troubleshooting Method
If the problem is physical (cabling, power), use Bottom-Up. If the problem is application-related, use Top-Down.
If experience or the symptoms suggest where the fault lies (e.g., in a router), use the Divide-and-Conquer method and start at that layer.
12.3. Troubleshooting Tools
12.3.1 Software Troubleshooting Tools
Software tools include network management system (NMS) software, commands like ping and traceroute, and device-specific diagnostic commands like 'show interfaces'.
These tools provide real-time data on connectivity, latency, path issues, and device health.
12.3.2 Protocol Analyzers
Protocol Analyzers (or sniffers, e.g., Wireshark) capture and decode network traffic, allowing deep inspection of packet headers and payloads.
They are essential for identifying problems with protocol communication, application errors, and security issues.
12.3.3 Hardware Troubleshooting Tools
Hardware tools include digital multimeters (to test cable continuity and electrical characteristics) and cable testers (to check cable mapping and length).
These are used primarily for troubleshooting issues at the Physical Layer (Layer 1).
12.3.4 Syslog Server as a Troubleshooting Tool
A centralized Syslog Server collects and archives log messages (events, errors, status changes) from all network devices.
It is critical for historical analysis and correlating events across multiple devices to determine a sequence of failures.
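To show what "correlating events across multiple devices" can look like, the sketch below merges messages from several devices into one time-ordered list. It assumes Cisco-style message bodies of the form %FACILITY-SEVERITY-MNEMONIC: text and assumes the server has already attached an ISO timestamp and hostname to each record; the sample records and function names are made up for the example.

```python
import re
from datetime import datetime

# Matches the "%FACILITY-SEVERITY-MNEMONIC: text" part of a message.
PATTERN = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)"
)


def parse(message):
    """Return the parsed fields as a dict, or None if the format does not match."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


def timeline(records, max_severity=5):
    """Merge (iso_timestamp, host, message) records into one time-ordered list."""
    events = []
    for timestamp, host, message in records:
        parsed = parse(message)
        # Lower severity numbers are more severe (0 = emergency, 7 = debugging).
        if parsed and int(parsed["severity"]) <= max_severity:
            events.append(
                (datetime.fromisoformat(timestamp), host,
                 parsed["mnemonic"], parsed["text"])
            )
    return sorted(events)


# Hypothetical records as a syslog server might have stored them.
records = [
    ("2024-01-01T10:00:05", "R1",
     "%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on GigabitEthernet0/0 from FULL to DOWN"),
    ("2024-01-01T10:00:01", "SW1",
     "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down"),
    ("2024-01-01T10:00:02", "SW1",
     "%LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down"),
]

if __name__ == "__main__":
    for when, host, mnemonic, text in timeline(records):
        print(when.isoformat(), host, mnemonic, text)
```

Sorted this way, the link failure on SW1 appears before the routing adjacency loss on R1, which suggests the order in which the failures occurred.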
12.4. Symptoms and Causes of Network Problems
12.4.1 Physical Layer Troubleshooting
Symptoms: No connectivity, link lights are off, high collision/error rates.
Causes: Bad or disconnected cables, wrong cable type (e.g., straight-through instead of crossover), incorrect interface speed/duplex settings, or a malfunctioning NIC/interface.
12.4.2 Data Link Layer Troubleshooting
Symptoms: High traffic utilization (broadcast storms), MAC address table issues, or device is up but cannot communicate on the local segment.
Causes: STP (Spanning Tree Protocol) loops, mismatch in encapsulation protocols (e.g., HDLC vs. PPP), or incorrect VLAN assignments.
12.4.3 Network Layer Troubleshooting
Symptoms: Devices on different subnets cannot communicate, but local communication works, or traffic follows a suboptimal path.
Causes: Incorrect IP address or subnet mask, missing or wrong static route, incorrect routing protocol configuration, or ACLs blocking traffic.
12.4.4 Transport Layer Troubleshooting - ACLs
Symptoms: An application (e.g., web browser, email client) can't establish a session, even though Layer 3 connectivity (ping) works.
Causes: An ACL (Access Control List) is incorrectly configured to deny TCP/UDP traffic on a specific port or protocol, preventing application flow.
12.4.5 Transport Layer Troubleshooting - NAT for IPv4
Symptoms: Internal devices can access the Internet, but external devices cannot access an internal server, or external sessions fail unexpectedly.
Causes: Missing or incorrect Static NAT entry, PAT overload pool exhaustion, or an ACL incorrectly filtering translated addresses.
12.4.6 Application Layer Troubleshooting
Symptoms: Connectivity works (ping, trace), but a specific service (e.g., DNS resolution, HTTP access) fails.
Causes: Service or application is stopped/crashed on the server, incorrect DNS or DHCP configuration, or incorrect client application settings.
12.5. Troubleshooting IP Connectivity
12.5.1 Components of Troubleshooting End-to-End Connectivity
Successful troubleshooting requires a systematic approach, starting from the Physical Layer and moving up, verifying addressing, routing, and services.
This ensures foundational issues are addressed before investigating complex protocol problems.
12.5.2 End-to-End Connectivity Problem Initiates Troubleshooting
A lack of communication between two points (e.g., a PC and a server) is the most common trigger for the troubleshooting process.
Initial steps like ping and traceroute quickly help to localize the failure point along the path.
12.5.3 Step 1 - Verify the Physical Layer
Check the most basic issues first: confirm cable connections are secure and the link lights on the interfaces are lit.
Use the 'show interfaces' command to verify that the interface is up/up and not administratively down.
12.5.4 Step 2 - Check for Duplex Mismatches
A duplex mismatch (one side set to full, the other to half) causes severe performance degradation, high error counts, and late collisions.
Verify that the duplex and speed settings match on both connected devices.
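The comparison itself is simple once the settings from both ends are side by side. The sketch below assumes the speed and duplex values have already been read from each device (for example, from 'show interfaces' output) and placed in dictionaries; the dictionary layout and function name are illustrative only.

```python
def find_mismatches(side_a, side_b, keys=("speed", "duplex")):
    """Return {setting: (value_a, value_b)} for every setting that differs."""
    return {
        key: (side_a[key], side_b[key])
        for key in keys
        if side_a[key] != side_b[key]
    }


if __name__ == "__main__":
    # Hypothetical settings read from the two ends of one link.
    switch_port = {"speed": "100", "duplex": "full"}
    router_port = {"speed": "100", "duplex": "half"}
    print(find_mismatches(switch_port, router_port))
```

An empty result means the two ends agree on every setting that was compared.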
12.5.5 Step 3 - Verify Addressing on the Local Network
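Use 'ipconfig' on the PC to confirm its IP address and subnet mask. On the router/switch, 'show ip interface brief' confirms the IP address and interface status; the subnet mask appears in 'show ip interface', and the VLAN assignment in 'show vlan brief'.
Ensure the local host can ping its default gateway.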
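A quick offline sanity check of the values gathered in this step can catch the most common addressing mistakes before any further testing. The sketch below uses Python's standard ipaddress module; the function name and messages are assumptions for the example, and the network/broadcast check is only applied to IPv4 prefixes of /30 or shorter.

```python
import ipaddress


def check_local_addressing(host_ip, prefix_len, gateway_ip):
    """Return a list of problems found in a host's address/mask/gateway settings."""
    interface = ipaddress.ip_interface(f"{host_ip}/{prefix_len}")
    gateway = ipaddress.ip_address(gateway_ip)
    network = interface.network
    problems = []

    if gateway not in network:
        problems.append("gateway is outside the host's subnet")
    if interface.ip == gateway:
        problems.append("host and gateway use the same address")
    # /31 and /32 have no separate network/broadcast addresses, so skip them.
    if network.version == 4 and network.prefixlen <= 30:
        if interface.ip in (network.network_address, network.broadcast_address):
            problems.append("host address is the network or broadcast address")
    return problems


if __name__ == "__main__":
    print(check_local_addressing("192.168.1.10", 24, "192.168.2.1"))
```

An empty list only means the settings are internally consistent; the ping to the default gateway is still needed to confirm reachability.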
12.5.6 Troubleshoot VLAN Assignment Example
If a device cannot communicate with others on the same subnet, verify that the device's port is correctly assigned to the intended VLAN.
An incorrect VLAN assignment acts as a Layer 2 boundary, preventing local communication.
12.5.7 Step 4 - Verify Default Gateway
The default gateway must be the correct IP address for the local subnet's router interface, and it must be reachable.
Without a correct gateway, the host cannot communicate with devices outside its local network.
12.5.8 Troubleshoot IPv6 Default Gateway Example
In IPv6, a host learns its default gateway from Router Advertisement (RA) messages, whether its address comes from SLAAC or DHCPv6. Verify the host has received an RA from the correct router and can reach the router's link-local address.
12.5.9 Step 5 - Verify Correct Path
Use 'traceroute' to determine the exact path the packets are taking and where the communication fails to progress.
This quickly identifies issues with routing protocol convergence or a broken link further down the path.
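Reading a trace comes down to finding the last hop that answered. The sketch below assumes the traceroute output has already been reduced to a list of (hop number, responding address or None) pairs; that input format, the function name, and the sample addresses are assumptions for illustration, since real traceroute output differs between platforms.

```python
def locate_failure(hops, destination):
    """Summarize where a trace stops, given (ttl, address_or_None) pairs in order."""
    responders = [(ttl, addr) for ttl, addr in hops if addr is not None]
    if not responders:
        return "no hop responded; check the local gateway"
    last_ttl, last_addr = responders[-1]
    if last_addr == destination:
        return f"destination reached in {last_ttl} hops"
    return f"path stops after hop {last_ttl} ({last_addr})"


if __name__ == "__main__":
    # Hypothetical trace: two hops answer, then timeouts.
    trace = [(1, "192.168.1.1"), (2, "10.0.0.1"), (3, None), (4, None)]
    print(locate_failure(trace, "203.0.113.5"))
```

The device after the last responding hop, or the link toward it, is the natural place to continue the investigation, keeping in mind that some devices simply do not answer traceroute probes.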
12.5.10 Step 6 - Verify the Transport Layer
If the ping works but the application fails, the problem may be at Layer 4 (Transport). This often involves checking the state of TCP connections or verifying the correct port numbers are being used.
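One way to test Layer 4 directly is to attempt a TCP connection to the service's port. The sketch below does this with Python's standard socket module; the function name, the two-second timeout, and the example address are assumptions, and a failed connection does not by itself say whether the cause is a stopped service, a filter, or a wrong port number.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical server address; replace with the host being tested.
    print("HTTP reachable:", tcp_port_open("192.0.2.10", 80))
```

If ping to the same host succeeds but this check fails for the expected port, the ACL and service checks in the following steps are the next place to look.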
12.5.11 Step 7 - Verify ACLs
ACLs are common failure points. Use 'show access-lists' and 'show running-config' to check if a specific ACL is incorrectly filtering traffic based on source/destination IP or port number.
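When reading an ACL, it helps to remember that entries are evaluated top to bottom, the first match wins, and anything unmatched is dropped by the implicit deny at the end. The sketch below is a simplified model of that logic, not a parser for real IOS syntax; the Rule fields and the sample entries are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                     # "permit" or "deny"
    protocol: str                   # "tcp", "udp", "icmp", or "ip" (any protocol)
    src: str                        # source network in CIDR notation
    dst: str                        # destination network in CIDR notation
    dst_port: Optional[int] = None  # None matches any port


def evaluate(acl, protocol, src_ip, dst_ip, dst_port=None):
    """Return (action, index of the matching rule), using first-match semantics."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for index, rule in enumerate(acl):
        if rule.protocol not in ("ip", protocol):
            continue
        if src not in ipaddress.ip_network(rule.src):
            continue
        if dst not in ipaddress.ip_network(rule.dst):
            continue
        if rule.dst_port is not None and rule.dst_port != dst_port:
            continue
        return rule.action, index
    return "deny", None  # implicit deny: no entry matched


# Hypothetical ACL: block web traffic from one subnet, permit everything else.
acl = [
    Rule("deny", "tcp", "10.0.0.0/24", "0.0.0.0/0", 80),
    Rule("permit", "ip", "0.0.0.0/0", "0.0.0.0/0"),
]

if __name__ == "__main__":
    print(evaluate(acl, "icmp", "10.0.0.5", "203.0.113.10"))     # ping
    print(evaluate(acl, "tcp", "10.0.0.5", "203.0.113.10", 80))  # HTTP
```

With this sample list, ping from 10.0.0.5 is permitted while HTTP from the same host is denied, which matches the "ping works but the application fails" symptom.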
12.5.12 Step 8 - Verify DNS
If connectivity works via IP address but fails via hostname, the issue is DNS resolution.
Verify the host's DNS server address and confirm the DNS server is operational and reachable.
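The same distinction can be checked from a script by separating the name lookup from everything that follows it. The sketch below wraps a resolver call and reports whether the lookup itself failed; the function name is hypothetical, and the resolver is passed in as a parameter (defaulting to socket.gethostbyname) so the logic can be exercised without a live DNS server.

```python
import socket


def diagnose_name(hostname, resolver=socket.gethostbyname):
    """Return (resolved, detail): the address on success, the error text on failure."""
    try:
        return True, resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)


if __name__ == "__main__":
    resolved, detail = diagnose_name("www.example.com")
    if resolved:
        print("name resolves to", detail)
    else:
        print("DNS resolution failed:", detail)
```

If the lookup fails while the server's IP address remains reachable, the host's DNS server setting and the DNS server itself are the things to verify.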