Extractor for PCAP Files
pcapkit.foundation.extraction contains Extractor only, which synthesises file I/O and protocol analysis, coordinates information exchange in all network layers, and extracts parameters from a PCAP file.
- class pcapkit.foundation.extraction.Extractor(fin=None, fout=None, format=None, auto=True, extension=True, store=True, files=False, nofile=False, verbose=False, engine=None, layer=None, protocol=None, ip=False, ipv4=False, ipv6=False, tcp=False, strict=True, trace=False, trace_fout=None, trace_format=None, trace_byteorder='little', trace_nanosecond=False)
Bases: object
Extractor for PCAP files.
Notes
For supported engines, please refer to run().
- Parameters
fin (Optional[str]) – file name to be read; if the file does not exist, FileNotFound is raised
fout (Optional[str]) – file name to be written
format (Optional[Formats]) – file format of output
auto (bool) – if automatically run till EOF
extension (bool) – if check and append extensions to output file
store (bool) – if store extracted packet info
files (bool) – if split each frame into different files
nofile (bool) – if no output file is to be dumped
verbose (bool | VerboseHandler) – a bool value, or a function that takes the Extractor instance and the current parsed frame (whose type depends on the selected engine) as parameters to print verbose output information
engine (Optional[Engines]) – extraction engine to be used
layer (Optional[Layers]) – layer up to which to extract
protocol (Optional[Protocols]) – protocol up to which to extract
ip (bool) – if record data for IPv4 & IPv6 reassembly
ipv4 (bool) – if perform IPv4 reassembly
ipv6 (bool) – if perform IPv6 reassembly
tcp (bool) – if perform TCP reassembly
strict (bool) – if set strict flag for reassembly
trace (bool) – if trace TCP traffic flows
trace_fout (Optional[str]) – path name for flow tracer if necessary
trace_format (Optional[Formats]) – output file format of flow tracer
trace_byteorder (Literal["big", "little"]) – output file byte order
trace_nanosecond (bool) – output nanosecond-resolution file flag
- __init__(fin=None, fout=None, format=None, auto=True, extension=True, store=True, files=False, nofile=False, verbose=False, engine=None, layer=None, protocol=None, ip=False, ipv4=False, ipv6=False, tcp=False, strict=True, trace=False, trace_fout=None, trace_format=None, trace_byteorder='little', trace_nanosecond=False)
Initialise PCAP Reader.
- Parameters
fin (Optional[str]) – file name to be read; if the file does not exist, FileNotFound is raised
fout (Optional[str]) – file name to be written
format (Optional[Formats]) – file format of output
auto (bool) – if automatically run till EOF
extension (bool) – if check and append extensions to output file
store (bool) – if store extracted packet info
files (bool) – if split each frame into different files
nofile (bool) – if no output file is to be dumped
verbose (bool | VerboseHandler) – a bool value, or a function that takes the Extractor instance and the current parsed frame (whose type depends on the selected engine) as parameters to print verbose output information
engine (Optional[Engines]) – extraction engine to be used
layer (Optional[Layers]) – layer up to which to extract
protocol (Optional[Protocols]) – protocol up to which to extract
ip (bool) – if record data for IPv4 & IPv6 reassembly
ipv4 (bool) – if perform IPv4 reassembly
ipv6 (bool) – if perform IPv6 reassembly
tcp (bool) – if perform TCP reassembly
strict (bool) – if set strict flag for reassembly
trace (bool) – if trace TCP traffic flows
trace_fout (Optional[str]) – path name for flow tracer if necessary
trace_format (Optional[Formats]) – output file format of flow tracer
trace_byteorder (Literal["big", "little"]) – output file byte order
trace_nanosecond (bool) – output nanosecond-resolution file flag
- Warns
FormatWarning – Warns under following circumstances:
If using PCAP output for TCP flow tracing while the extraction engine is PyShark.
If output file format is not supported.
- Return type
None
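A minimal usage sketch of the parameters above, assuming PyPCAPKit is installed, that the path passed in points to an existing PCAP file, and that with auto=True extraction has finished once the constructor returns. The helper names print_progress and load_frames are hypothetical; the keyword arguments and the engine and frame properties are the ones documented on this page.

```python
def print_progress(extractor, frame):
    """Verbose handler: called with the Extractor and the current parsed frame."""
    print(f"[{extractor.engine}] parsed {type(frame).__name__}")


def load_frames(path):
    """Extract every frame of ``path`` in memory, without dumping an output file."""
    # Imported lazily so that this sketch loads even without PyPCAPKit installed.
    from pcapkit.foundation.extraction import Extractor

    extractor = Extractor(
        fin=path,                # input PCAP file
        nofile=True,             # do not dump any output file
        store=True,              # keep extracted frames so that ``frame`` is usable
        auto=True,               # run automatically until EOF
        verbose=print_progress,  # callable taking (extractor, frame)
    )
    return extractor.frame       # tuple of extracted frames
```

Passing verbose=True instead of a callable would leave the output format to the library.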
- __iter__()
Iterate and parse PCAP frame.
- Raises
IterableError – If self._flag_a is True, as such operation is not applicable.
- Return type
- __next__()
Iterate and parse next PCAP frame.
It calls _read_frame() internally to parse the next PCAP frame until EOF is reached; it then calls _cleanup() for the aftermath.
- Return type
Frame | ScapyPacket | DPKTPacket
- __call__()
Works as a simple wrapper for the iteration protocol.
- Raises
IterableError – If self._flag_a is True, as iteration is not applicable.
- Return type
Frame | ScapyPacket | DPKTPacket
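The iteration contract described for __iter__(), __next__() and __call__() can be pictured with the toy class below. It is a sketch under stated assumptions, not the library's code: ToyExtractor and its IterableError are stand-ins defined here, and frames are plain Python objects rather than parsed packets.

```python
class IterableError(Exception):
    """Stand-in for the exception named above."""


class ToyExtractor:
    """Hypothetical reader that mimics the iteration contract described above."""

    def __init__(self, frames, auto=False):
        self._flag_a = auto          # auto mode: iteration is not applicable
        self._extmp = iter(frames)   # pending frames
        self._flag_e = False         # set once EOF has been reached

    def _read_frame(self):
        return next(self._extmp)     # raises StopIteration at EOF

    def _cleanup(self):
        self._flag_e = True

    def __iter__(self):
        if self._flag_a:
            raise IterableError("iteration is not applicable in auto mode")
        return self

    def __next__(self):
        try:
            return self._read_frame()
        except StopIteration:
            self._cleanup()          # aftermath, then let the loop terminate
            raise

    def __call__(self):
        if self._flag_a:
            raise IterableError("iteration is not applicable in auto mode")
        return next(self)
```

With auto=False, a for loop over the object reads one frame per step and triggers the cleanup exactly once, when the input is exhausted.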
- property info: VersionInfo
Version of input PCAP file.
- Raises
UnsupportedCall – If self._exeng is 'scapy' or 'pyshark', as such engines do not preserve such information.
- Return type
VersionInfo
- property format: Formats
Format of output file.
- Raises
UnsupportedCall – If self._flag_q is set as True, as output is disabled by initialisation parameter.
- Return type
Formats
- property output: str
Name of output file.
- Raises
UnsupportedCall – If self._flag_q is set as True, as output is disabled by initialisation parameter.
- Return type
- property frame: tuple[Frame, ...]
Extracted frames.
- Raises
UnsupportedCall – If self._flag_d is False (i.e. store=False), as storing frame data is disabled.
- property reassembly: ReassemblyData
Frame record for reassembly.
ipv4 – tuple of IPv4 payload fragments (ipv4.datagram)
ipv6 – tuple of IPv6 payload fragments (ipv6.datagram)
tcp – tuple of TCP payload fragments (tcp.datagram)
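To illustrate what a reassembled datagram is, the sketch below groups simplified IPv4 fragments by datagram and concatenates their payloads in offset order. It is a generic illustration, not PyPCAPKit's reassembly implementation; Fragment and reassemble are hypothetical names, and offsets are expressed in bytes for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical, simplified view of one IPv4 fragment."""
    key: tuple       # (source, destination, identification, protocol)
    offset: int      # byte offset of the payload in the original datagram
    more: bool       # "more fragments" (MF) flag
    payload: bytes


def reassemble(fragments):
    """Return ``{key: payload}`` for every datagram whose fragments are complete."""
    buckets = defaultdict(list)
    for frag in fragments:
        buckets[frag.key].append(frag)

    datagrams = {}
    for key, frags in buckets.items():
        frags.sort(key=lambda f: f.offset)
        data = b""
        for frag in frags:
            if frag.offset != len(data):   # hole or overlap: leave it incomplete
                break
            data += frag.payload
        else:
            if not frags[-1].more:         # final fragment seen: datagram complete
                datagrams[key] = data
    return datagrams
```

Datagrams with a missing piece, or whose last fragment still has the MF flag set, are simply left out of the result in this sketch.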
- Return type
- property trace: tuple[Index, ...]
Index table for traced flow.
- Raises
UnsupportedCall – If self._flag_t is False (i.e. trace=False), as TCP flow tracing is disabled.
- Return type
- property engine: Engines
PCAP extraction engine.
- Return type
Engines
- run()
Start extraction.
It uses import_test() to check whether a certain engine is available. Each supported engine has its own driver method:
Default drivers:
Global header: record_header()
Packet frames: record_frames()
DPKT driver: _run_dpkt()
Scapy driver: _run_scapy()
PyShark driver: _run_pyshark()
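The availability check can be sketched with the standard library alone. This is an illustration under assumptions: pick_engine and ENGINE_MODULES are hypothetical names, EngineWarning is a stand-in class defined here, and falling back to the default engine after a warning is an assumption of the sketch rather than a documented guarantee.

```python
import importlib.util
import warnings


class EngineWarning(UserWarning):
    """Stand-in for the warning category used by run()."""


# Engine name -> top-level module that must be importable (assumed mapping).
ENGINE_MODULES = {"dpkt": "dpkt", "scapy": "scapy", "pyshark": "pyshark"}


def pick_engine(requested):
    """Return ``requested`` if it is usable, otherwise fall back to ``'default'``."""
    if requested in (None, "default"):
        return "default"
    module = ENGINE_MODULES.get(requested)
    if module is None:
        warnings.warn(f"unknown engine {requested!r}; using default", EngineWarning)
        return "default"
    if importlib.util.find_spec(module) is None:
        warnings.warn(f"engine {requested!r} is not installed; using default", EngineWarning)
        return "default"
    return requested
```

Checking with find_spec avoids importing a heavy dependency just to learn whether it is present.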
- Warns
EngineWarning – If the extraction engine is not available. This is either due to dependency not installed, or supplied engine unknown.
- Return type
- record_header()
Read global header.
The method parses the PCAP global header and saves the parsed result as self._gbhdr. Information such as the PCAP version, data link layer protocol type, nanosecond flag and byte order is also saved in the current Extractor instance.
If TCP flow tracing is enabled, the nanosecond flag and byte order are used for the output PCAP file of the traced TCP flows.
For output, the method dumps the parsed PCAP global header under the name Global Header.
- Return type
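The fields mentioned above (version, data link type, nanosecond flag, byte order) all live in the 24-byte global header of the classic PCAP format. The following sketch decodes them with struct; parse_global_header and the dictionary keys are hypothetical names chosen for illustration and do not mirror the layout of self._gbhdr.

```python
import struct

# Magic number as stored on disk -> (struct byte-order prefix, nanosecond flag).
MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", False),  # little endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", False),  # big endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", True),   # little endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", True),   # big endian, nanosecond timestamps
}


def parse_global_header(raw):
    """Decode the 24-byte PCAP global header into a plain dict."""
    if len(raw) < 24 or raw[:4] not in MAGIC:
        raise ValueError("not a PCAP global header")
    order, nanosecond = MAGIC[raw[:4]]
    major, minor, thiszone, sigfigs, snaplen, network = struct.unpack(
        order + "HHiIII", raw[4:24]
    )
    return {
        "byteorder": "little" if order == "<" else "big",
        "nanosecond": nanosecond,
        "version": (major, minor),
        "thiszone": thiszone,
        "sigfigs": sigfigs,
        "snaplen": snaplen,
        "network": network,  # data link layer type, e.g. 1 for Ethernet
    }


# Example: a big-endian, nanosecond-resolution header for an Ethernet capture.
header = parse_global_header(struct.pack(">IHHiIII", 0xA1B23C4D, 2, 4, 0, 0, 65535, 1))
```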
- record_frames()
Read packet frames.
The method calls _read_frame() to parse each frame from the input PCAP file, and calls _cleanup() upon completion.
Notes
Under non-auto mode, i.e. self._flag_a is False, the method performs no action.
- Return type
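At the file-format level, reading frames until EOF amounts to a loop over fixed 16-byte record headers, each followed by the captured bytes. The sketch below shows that loop in isolation; iter_records is a hypothetical helper, the stream is assumed to be positioned just after the global header, and no protocol parsing is attempted.

```python
import io
import struct


def iter_records(stream, order="<"):
    """Yield ``(ts_sec, ts_frac, orig_len, data)`` for each packet record."""
    while True:
        header = stream.read(16)
        if not header:                      # clean EOF: stop iterating
            return
        if len(header) < 16:
            raise EOFError("truncated record header")
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(order + "IIII", header)
        data = stream.read(incl_len)        # only incl_len bytes were captured
        if len(data) < incl_len:
            raise EOFError("truncated packet data")
        yield ts_sec, ts_frac, orig_len, data


# Two hand-built little-endian records; the second one was cut from 60 to 2 bytes.
sample = io.BytesIO(
    struct.pack("<IIII", 1, 500, 3, 3) + b"abc"
    + struct.pack("<IIII", 2, 0, 2, 60) + b"hi"
)
records = list(iter_records(sample))
```

The order argument would come from the byte order detected in the global header.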
- classmethod register(format, module, class_, ext)
Register a new dumper class.
Notes
The fully qualified class name of the new dumper class should be {module}.{class_}.
- classmethod make_name(fin='in.pcap', fout='out', fmt='tree', extension=True, *, files=False, nofile=False)
Generate input and output filenames.
The method performs the following processing:
sanitise fin as the input PCAP filename, using in.pcap as the default value and appending the .pcap extension if needed and extension is True, then test whether the file exists;
if nofile is True, skip the following processing;
if fmt is provided, presume the corresponding output file extension;
if fout is not provided, presume the output file name from the presumptive file extension, with out as the stem of the output file name; should the file extension be unavailable, raise FormatError;
if fout is provided, presume the corresponding output format if needed; should the presumption be impossible, raise FormatError;
append the corresponding file extension to the output file name if needed and extension is True.
- Parameters
fin (str) – Input filename.
fout (str) – Output filename.
fmt (Formats) – Output file format.
extension (bool) – Whether to append the .pcap file extension to the input filename if fin does not have it, and whether to check and append extensions to the output file.
files (bool) – If split each frame into different files.
nofile (bool) – If no output file is to be dumped.
- Returns
input filename
output filename / directory name
output format
output file extension (without .)
if split each frame into different files
- Return type
Generated input and output filenames
- Raises
FileNotFound – If the input file does not exist.
FormatError – If the output format is not provided and cannot be presumed.
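The processing steps of make_name() can be sketched as a standalone function. Several things here are assumptions made for illustration: the EXTENSIONS table is hypothetical, FormatError is a stand-in class, the built-in FileNotFoundError replaces the library's FileNotFound, fout and fmt default to None so that the "not provided" branches are reachable, and the tuple returned when nofile is True is a guess.

```python
import os

# Hypothetical format -> extension table; the real mapping is not listed on this page.
EXTENSIONS = {"json": "json", "plist": "plist", "tree": "txt", "pcap": "pcap"}


class FormatError(Exception):
    """Stand-in for the exception named above."""


def make_name(fin="in.pcap", fout=None, fmt=None, extension=True, *, files=False, nofile=False):
    # Sanitise the input filename and make sure the file exists.
    if extension and not fin.endswith(".pcap"):
        fin += ".pcap"
    if not os.path.isfile(fin):
        raise FileNotFoundError(f"No such file: {fin!r}")
    # Nothing else to settle when no output file is to be dumped.
    if nofile:
        return fin, None, None, None, files

    ext = EXTENSIONS.get(fmt) if fmt is not None else None
    if fout is None:
        # No output name: it can only be derived from a known format.
        if ext is None:
            raise FormatError("output format not provided and cannot be presumed")
        fout = "out"
    elif ext is None:
        # Output name given without a usable format: presume it from the suffix.
        suffix = os.path.splitext(fout)[1].lstrip(".")
        fmt = next((name for name, e in EXTENSIONS.items() if e == suffix), None)
        if fmt is None:
            raise FormatError(f"cannot presume output format from {fout!r}")
        ext = suffix
    # Append the extension when a single output file is written.
    if extension and not files and not fout.endswith("." + ext):
        fout = f"{fout}.{ext}"
    return fin, fout, fmt, ext, files
```

The returned tuple follows the order listed under Returns: input filename, output filename, output format, output file extension, and the files flag.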
- _read_frame()
Headquarters for frame reader.
This method is a dispatcher for parsing frames.
For Scapy engine, calls _scapy_read_frame().
For DPKT engine, calls _dpkt_read_frame().
For PyShark engine, calls _pyshark_read_frame().
For default (PyPCAPKit) engine, calls _default_read_frame().
- Return type
Frame | ScapyPacket | DPKTPacket
- Returns
The parsed frame instance.
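A dispatcher of this kind can be written as a name lookup on the instance. The toy class below is an illustration only: ToyReader is hypothetical, the per-engine readers return placeholder strings instead of parsed frames, and treating unknown engine names as the default engine is an assumption of the sketch.

```python
class ToyReader:
    """Hypothetical reader that routes _read_frame() by engine name."""

    def __init__(self, engine="default"):
        self._exeng = engine

    def _read_frame(self):
        # Look up "_<engine>_read_frame"; unknown engines use the default reader.
        reader = getattr(self, f"_{self._exeng}_read_frame", self._default_read_frame)
        return reader()

    def _default_read_frame(self):
        return "default"

    def _scapy_read_frame(self):
        return "scapy"

    def _dpkt_read_frame(self):
        return "dpkt"

    def _pyshark_read_frame(self):
        return "pyshark"
```

An explicit if/elif chain would do the same job; the lookup simply keeps the naming convention of the reader methods visible in one place.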
- _cleanup()
Cleanup after extraction & analysis.
The method clears the self._expkg and self._extmp attributes, sets self._flag_e as True and closes the input file.
- Return type
- _default_read_frame()
Read frames with default engine.
This method performs the following operations:
extract frames and each layer of packets;
make Info object out of frame properties;
write to output file with corresponding dumper;
reassemble IP and/or TCP datagram;
trace TCP flows if any;
record frame Info object to frame storage.
- Return type
- Returns
Parsed frame instance.
- _run_scapy(scapy_all)
Call scapy.all.sniff() to extract PCAP files.
This method assigns self._expkg as scapy.all and self._extmp as an iterator from scapy.all.sniff().
- Parameters
scapy_all (module) – The scapy.all module.
- Warns
AttributeWarning – If self._exlyr and/or self._exptl is provided, as the Scapy engine currently does not support such operations.
- Return type
- _scapy_read_frame()
Read frames with Scapy engine.
- Return type
ScapyPacket
- Returns
Parsed frame instance.
See also
Please refer to
_default_read_frame()
for more operational information.
- _run_dpkt(dpkt)
Call dpkt.pcap.Reader to extract PCAP files.
This method assigns self._expkg as dpkt and self._extmp as an iterator from dpkt.pcap.Reader.
- Parameters
dpkt (module) – The dpkt module.
- Warns
AttributeWarning – If self._exlyr and/or self._exptl is provided, as the DPKT engine currently does not support such operations.
- Return type
- _dpkt_read_frame()
Read frames with DPKT engine.
- Returns
Parsed frame instance.
- Return type
See also
Please refer to
_default_read_frame()
for more operational information.
- _run_pyshark(pyshark)
Call pyshark.FileCapture to extract PCAP files.
This method assigns self._expkg as pyshark and self._extmp as an iterator from pyshark.FileCapture.
- Parameters
pyshark (types.ModuleType) – The pyshark module.
- Warns
AttributeWarning – Warns under following circumstances:
if self._exlyr and/or self._exptl is provided, as the PyShark engine currently does not support such operations.
if reassembly is enabled, as the PyShark engine currently does not support such operation.
- Return type
- _pyshark_read_frame()
Read frames with PyShark engine.
- Return type
PySharkPacket
- Returns
Parsed frame instance.
See also
Please refer to
_default_read_frame()
for more operational information.
- _exptl: Protocols
Protocol up to which to extract.
- _exlyr: Layers
Layer up to which to extract.
- _exeng: Engines
Extraction engine in use.
- _expkg: Any
Module instance of the extraction engine.
- _extmp: Any
Iterator instance of the extraction engine.