Extractor for PCAP Files
pcapkit.foundation.extraction contains only Extractor, which synthesises file I/O and protocol analysis, coordinates information exchange across all network layers, and extracts parameters from a PCAP file.
- class pcapkit.foundation.extraction.Extractor(fin=None, fout=None, format=None, auto=True, extension=True, store=True, files=False, nofile=False, verbose=False, engine=None, layer=None, protocol=None, ip=False, ipv4=False, ipv6=False, tcp=False, strict=True, trace=False, trace_fout=None, trace_format=None, trace_byteorder='little', trace_nanosecond=False)
Bases: object
Extractor for PCAP files.
Notes
For supported engines, please refer to run().
- Parameters
fin (Optional[str]) – name of the file to be read; raises FileNotFound if the file does not exist
fout (Optional[str]) – name of the file to be written
format (Optional[Formats]) – format of the output file
auto (bool) – whether to run automatically until EOF
extension (bool) – whether to check and append extensions to the output file name
store (bool) – whether to store extracted packet information
files (bool) – whether to split each frame into a separate file
nofile (bool) – whether to skip dumping any output file
verbose (bool | VerboseHandler) – a bool value, or a function that takes the Extractor instance and the current parsed frame (whose type depends on the selected engine) as parameters and prints verbose output information
engine (Optional[Engines]) – extraction engine to be used
layer (Optional[Layers]) – layer up to which to extract
protocol (Optional[Protocols]) – protocol up to which to extract
ip (bool) – whether to record data for both IPv4 and IPv6 reassembly
ipv4 (bool) – whether to perform IPv4 reassembly
ipv6 (bool) – whether to perform IPv6 reassembly
tcp (bool) – whether to perform TCP reassembly
strict (bool) – whether to set the strict flag for reassembly
trace (bool) – whether to trace TCP traffic flows
trace_fout (Optional[str]) – path name for the flow tracer output, if necessary
trace_format (Optional[Formats]) – output file format of the flow tracer
trace_byteorder (Literal["big", "little"]) – byte order of the flow tracer output file
trace_nanosecond (bool) – whether the flow tracer output file uses nanosecond-resolution timestamps
- __init__(fin=None, fout=None, format=None, auto=True, extension=True, store=True, files=False, nofile=False, verbose=False, engine=None, layer=None, protocol=None, ip=False, ipv4=False, ipv6=False, tcp=False, strict=True, trace=False, trace_fout=None, trace_format=None, trace_byteorder='little', trace_nanosecond=False)
Initialise PCAP Reader.
- Parameters
Same as the parameters of the Extractor class above.
- Warns
FormatWarning – Warns under the following circumstances:
if PCAP output is used for TCP flow tracing while the extraction engine is PyShark;
if the output file format is not supported.
- Return type
None
- __iter__()
Iterate over and parse PCAP frames.
- Raises
IterableError – If self._flag_a is True (auto mode), as iteration is not applicable.
- Return type
Extractor
- __next__()
Iterate and parse the next PCAP frame.
Internally, it calls _read_frame() to parse the next PCAP frame until EOF is reached; it then calls _cleanup() for the aftermath.
- Return type
Frame | ScapyPacket | DPKTPacket
- __call__()
Works as a simple wrapper for the iteration protocol.
- Raises
IterableError – If self._flag_a is True, as iteration is not applicable.
- Return type
Frame | ScapyPacket | DPKTPacket
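The iteration protocol above (refuse iteration in auto mode, read one frame per __next__() call, clean up at EOF) can be sketched with the standard library alone. This is a minimal, hypothetical model, not pcapkit's implementation: MiniExtractor and the local IterableError are illustrative names, the input is assumed to be a classic PCAP stream (not PCAP-NG), and frames are returned as (timestamp, raw_bytes) tuples rather than Frame objects.

```python
import io
import struct


class IterableError(TypeError):
    """Hypothetical stand-in for pcapkit's IterableError."""


# Classic PCAP magic numbers as they appear in the first four bytes on disk.
_LITTLE_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
_NANO_MAGICS = (b"\x4d\x3c\xb2\xa1", b"\xa1\xb2\x3c\x4d")


class MiniExtractor:
    """Hypothetical model of the Extractor iteration protocol."""

    def __init__(self, stream, auto=False):
        self._ifile = stream
        self._flag_a = auto   # auto mode: read every frame up front
        self._flag_e = False  # set by _cleanup() once EOF is reached
        magic = stream.read(24)[:4]  # the global header is 24 bytes long
        self._endian = "<" if magic in _LITTLE_MAGICS else ">"
        self._divisor = 1_000_000_000 if magic in _NANO_MAGICS else 1_000_000
        self._frames = []
        if auto:
            while (frame := self._read_frame()) is not None:
                self._frames.append(frame)
            self._cleanup()

    def __iter__(self):
        if self._flag_a:
            raise IterableError("iteration is not applicable in auto mode")
        return self

    def __next__(self):
        if self._flag_e:
            raise StopIteration
        frame = self._read_frame()
        if frame is None:  # EOF reached: clean up, then stop
            self._cleanup()
            raise StopIteration
        return frame

    def __call__(self):
        # Thin wrapper around the iteration protocol.
        if self._flag_a:
            raise IterableError("iteration is not applicable in auto mode")
        return next(self)

    def _read_frame(self):
        header = self._ifile.read(16)  # per-record header: four uint32 fields
        if len(header) < 16:
            return None
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(self._endian + "IIII", header)
        return ts_sec + ts_frac / self._divisor, self._ifile.read(incl_len)

    def _cleanup(self):
        self._flag_e = True
        self._ifile.close()
```

Building a one-record capture in memory and calling list(MiniExtractor(io.BytesIO(data))) yields the single frame and leaves the stream closed, which mirrors the __next__() / _cleanup() hand-off described above.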
- property info: VersionInfo
Version of the input PCAP file.
- Raises
UnsupportedCall – If self._exeng is 'scapy' or 'pyshark', as these engines do not preserve such information.
- Return type
VersionInfo
- property format: Formats
Format of the output file.
- Raises
UnsupportedCall – If self._flag_q is True, i.e. output is disabled by the initialisation parameter.
- Return type
Formats
- property output: str
Name of the output file.
- Raises
UnsupportedCall – If self._flag_q is True, i.e. output is disabled by the initialisation parameter.
- Return type
str
- property frame: tuple[Frame, ...]
Extracted frames.
- Raises
UnsupportedCall – If storing frame data is disabled (store=False; see self._flag_d).
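Properties such as frame, format and trace follow one pattern: an initialisation flag decides whether the attribute is meaningful, and the getter raises instead of returning stale or empty data. A minimal sketch of that pattern, assuming hypothetical names (FrameStore, record, _store) and a local UnsupportedCall that subclasses AttributeError purely for illustration:

```python
class UnsupportedCall(AttributeError):
    """Hypothetical stand-in for pcapkit's UnsupportedCall."""


class FrameStore:
    """Hypothetical sketch of a property guarded by an initialisation flag."""

    def __init__(self, store=True):
        self._store = store  # hypothetical attribute: True means frames are kept
        self._frame = []

    def record(self, frame):
        # Frames are only kept when storing is enabled.
        if self._store:
            self._frame.append(frame)

    @property
    def frame(self):
        if not self._store:
            raise UnsupportedCall("storing frame data is disabled (store=False)")
        return tuple(self._frame)  # immutable snapshot of the stored frames
```

Returning a tuple rather than the internal list keeps callers from mutating the stored frames, which matches the tuple[Frame, ...] type documented for the property.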
- property reassembly: ReassemblyData
Frame records for reassembly:
ipv4 – tuple of IPv4 payload fragments (ipv4.datagram)
ipv6 – tuple of IPv6 payload fragments (ipv6.datagram)
tcp – tuple of TCP payload fragments (tcp.datagram)
- Return type
ReassemblyData
- property trace: tuple[Index, ...]
Index table for traced flows.
- Raises
UnsupportedCall – If TCP flow tracing is disabled (trace=False; see self._flag_t).
- Return type
tuple[Index, ...]
- property engine: Engines
PCAP extraction engine.
- Return type
Engines
- run()
Start extraction.
It uses import_test() to check whether a given engine is available. Each supported engine has its own driver methods:
Default drivers:
Global header: record_header()
Packet frames: record_frames()
DPKT driver: _run_dpkt()
Scapy driver: _run_scapy()
PyShark driver: _run_pyshark()
- Warns
EngineWarning – If the extraction engine is not available, either because its dependency is not installed or because the supplied engine is unknown.
- Return type
None
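The availability check performed by run() can be approximated with importlib.util.find_spec(), which reports whether a module is importable without importing it. This sketch does not use pcapkit's import_test(); choose_engine, engine_available, ENGINE_MODULES and the local EngineWarning are hypothetical names, and falling back to the default engine after the warning is an assumption made for illustration.

```python
import importlib.util
import warnings


class EngineWarning(UserWarning):
    """Hypothetical stand-in for pcapkit's EngineWarning."""


# Assumed mapping from engine name to the top-level module it needs.
ENGINE_MODULES = {"dpkt": "dpkt", "scapy": "scapy", "pyshark": "pyshark"}


def engine_available(module_name):
    """Return True if *module_name* is importable, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def choose_engine(engine):
    """Return the engine to use, warning and falling back when unavailable."""
    if engine in (None, "default", "pcapkit"):
        return "default"
    module_name = ENGINE_MODULES.get(engine)
    if module_name is None:
        warnings.warn(f"unknown engine {engine!r}; using the default engine", EngineWarning)
        return "default"
    if not engine_available(module_name):
        warnings.warn(f"engine {engine!r} is not installed; using the default engine", EngineWarning)
        return "default"
    return engine
```

The two warning branches correspond to the two causes listed under Warns: an unknown engine name and a missing dependency.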
- record_header()
Read the global header.
The method parses the PCAP global header and saves the parsed result as self._gbhdr. Information such as the PCAP version, data link layer protocol type, nanosecond flag and byte order is also saved to the current Extractor instance.
If TCP flow tracing is enabled, the nanosecond flag and byte order are used for the output PCAP file of the traced TCP flows.
For output, the method dumps the parsed PCAP global header under the name Global Header.
- Return type
None
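The byte order and nanosecond flag that record_header() stores both come from the 4-byte magic number at the start of a classic PCAP file; the remaining 20 bytes hold the version, timezone offset, timestamp accuracy, snapshot length and link-layer type. The following is a minimal stdlib sketch of that parsing, not pcapkit's own parser; parse_global_header and GlobalHeader are hypothetical names.

```python
import struct
from typing import NamedTuple, Tuple


class GlobalHeader(NamedTuple):
    byteorder: str            # "little" or "big"
    nanosecond: bool          # True for nanosecond-resolution timestamps
    version: Tuple[int, int]  # (major, minor), usually (2, 4)
    thiszone: int             # GMT-to-local correction, in seconds
    sigfigs: int              # accuracy of timestamps
    snaplen: int              # maximum captured length per packet
    linktype: int             # data link layer protocol type (1 = Ethernet)


# Magic numbers as they appear on disk -> (byte order, nanosecond flag).
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("little", False),
    b"\xa1\xb2\xc3\xd4": ("big", False),
    b"\x4d\x3c\xb2\xa1": ("little", True),
    b"\xa1\xb2\x3c\x4d": ("big", True),
}


def parse_global_header(data: bytes) -> GlobalHeader:
    """Parse the 24-byte global header of a classic PCAP file."""
    if len(data) < 24:
        raise ValueError("the PCAP global header is 24 bytes long")
    try:
        byteorder, nanosecond = _MAGICS[data[:4]]
    except KeyError:
        raise ValueError(f"unknown PCAP magic number {data[:4].hex()}") from None
    prefix = "<" if byteorder == "little" else ">"
    major, minor, thiszone, sigfigs, snaplen, linktype = struct.unpack(
        prefix + "HHiIII", data[4:24]
    )
    return GlobalHeader(byteorder, nanosecond, (major, minor),
                        thiszone, sigfigs, snaplen, linktype)
```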
- record_frames()
Read packet frames.
The method calls _read_frame() to parse each frame from the input PCAP file, and calls _cleanup() upon completion.
Notes
Under non-auto mode, i.e. when self._flag_a is False, the method performs no action.
- Return type
None
- classmethod register(format, module, class_, ext)
Register a new dumper class.
Notes
The fully qualified class name of the new dumper class should be {module}.{class_}.
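Because the dumper is identified by {module}.{class_}, a registry can store just the names and import the module lazily when the format is first used. A minimal sketch under that assumption; _DUMPERS, qualified_name and load_dumper are hypothetical helpers, not pcapkit API, and the standard json.JSONEncoder only serves as an importable example class.

```python
import importlib

# Hypothetical registry: output format -> (module name, class name, extension)
_DUMPERS = {}


def register(format, module, class_, ext):
    """Record a dumper class under *format* without importing it yet."""
    _DUMPERS[format] = (module, class_, ext)


def qualified_name(format):
    """Return the fully qualified class name, i.e. {module}.{class_}."""
    module, class_, _ext = _DUMPERS[format]
    return f"{module}.{class_}"


def load_dumper(format):
    """Import the registered module on demand and return (class, extension)."""
    module, class_, ext = _DUMPERS[format]
    return getattr(importlib.import_module(module), class_), ext
```

Deferring the import keeps optional dumper dependencies from being loaded unless that output format is actually requested.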
- classmethod make_name(fin='in.pcap', fout='out', fmt='tree', extension=True, *, files=False, nofile=False)
Generate input and output filenames.
The method performs the following processing:
sanitise fin as the input PCAP filename, with in.pcap as the default value; append the .pcap extension if needed and extension is True; and test whether the file exists;
if nofile is True, skip the following processing;
if fmt is provided, presume the corresponding output file extension;
if fout is not provided, presume the output file name from the presumptive file extension, using out as the stem; if the file extension cannot be determined, raise FormatError;
if fout is provided, presume the corresponding output format if needed; if the format cannot be presumed, raise FormatError;
append the corresponding file extension to the output file name if needed and extension is True.
- Parameters
fin (str) – Input filename.
fout (str) – Output filename.
fmt (Formats) – Output file format.
extension (bool) – Whether to append the .pcap file extension to the input filename if fin does not already have it, and whether to check and append extensions to the output file.
files (bool) – Whether to split each frame into different files.
nofile (bool) – Whether no output file is to be dumped.
- Returns
input filename
output filename / directory name
output format
output file extension (without .)
whether to split each frame into different files
- Return type
Generated input and output filenames
- Raises
FileNotFound – If the input file does not exist.
FormatError – If the output format is not provided and cannot be presumed.
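The steps listed for make_name() can be followed with os.path alone. This is a hypothetical re-creation for illustration: make_name_sketch and the local FormatError are not pcapkit names, the FORMAT_EXT table (including "tree" mapping to "txt") is an assumption rather than pcapkit's actual mapping, and the built-in FileNotFoundError stands in for FileNotFound.

```python
import os

# Assumed format -> extension table; pcapkit's real mapping may differ.
FORMAT_EXT = {"tree": "txt", "json": "json", "plist": "plist", "pcap": "pcap"}
EXT_FORMAT = {ext: fmt for fmt, ext in FORMAT_EXT.items()}


class FormatError(ValueError):
    """Hypothetical stand-in for pcapkit's FormatError."""


def make_name_sketch(fin="in.pcap", fout="out", fmt="tree", extension=True,
                     *, files=False, nofile=False):
    """Hypothetical re-creation of the make_name() steps listed above."""
    # Sanitise the input filename and make sure it exists.
    if extension and os.path.splitext(fin)[1] != ".pcap":
        fin = f"{fin}.pcap"
    if not os.path.isfile(fin):
        raise FileNotFoundError(fin)
    # Nothing else to compute when no output file will be dumped.
    if nofile:
        return fin, None, None, None, False
    # Presume the extension from the format, or the format from fout.
    if fmt is not None:
        ext = FORMAT_EXT.get(fmt)
        if ext is None:
            raise FormatError(f"unsupported output format: {fmt!r}")
    elif fout is not None:
        ext = os.path.splitext(fout)[1].lstrip(".")
        fmt = EXT_FORMAT.get(ext)
        if fmt is None:
            raise FormatError(f"cannot presume output format from {fout!r}")
    else:
        raise FormatError("output format not provided and cannot be presumed")
    if fout is None:
        fout = "out"  # default stem when no output name is given
    # With files=True, fout names a directory holding one file per frame.
    if not files and extension and os.path.splitext(fout)[1] != f".{ext}":
        fout = f"{fout}.{ext}"
    return fin, fout, fmt, ext, files
```

For example, with an existing capture.pcap, make_name_sketch("capture", "report", "json") would return ("capture.pcap", "report.json", "json", "json", False).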
- _read_frame()
Headquarters for the frame reader.
This method is a dispatcher for parsing frames:
For the Scapy engine, it calls _scapy_read_frame().
For the DPKT engine, it calls _dpkt_read_frame().
For the PyShark engine, it calls _pyshark_read_frame().
For the default (PyPCAPKit) engine, it calls _default_read_frame().
- Return type
Frame | ScapyPacket | DPKTPacket
- Returns
The parsed frame instance.
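The dispatch described for _read_frame() amounts to a lookup from engine name to reader method. A minimal sketch with placeholder readers; DispatchSketch is a hypothetical class, its readers return strings instead of parsed frames, and sending unrecognised engine names to the default reader is an assumption.

```python
class DispatchSketch:
    """Hypothetical model of _read_frame(): pick a reader by engine name."""

    def __init__(self, engine="default"):
        self._exeng = engine

    def _read_frame(self):
        readers = {
            "scapy": self._scapy_read_frame,
            "dpkt": self._dpkt_read_frame,
            "pyshark": self._pyshark_read_frame,
        }
        # Assumption: any other engine name uses the default (PyPCAPKit) reader.
        reader = readers.get(self._exeng, self._default_read_frame)
        return reader()

    # Placeholder readers; the real methods parse and return a frame object.
    def _scapy_read_frame(self):
        return "scapy frame"

    def _dpkt_read_frame(self):
        return "dpkt frame"

    def _pyshark_read_frame(self):
        return "pyshark frame"

    def _default_read_frame(self):
        return "default frame"
```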
- _cleanup()
Cleanup after extraction & analysis.
The method clears the self._expkg and self._extmp attributes, sets self._flag_e to True and closes the input file.
- Return type
None
- _default_read_frame()
Read frames with the default engine.
This method performs the following operations:
extract frames and each layer of packets;
make an Info object out of the frame properties;
write to the output file with the corresponding dumper;
reassemble IP and/or TCP datagrams;
trace TCP flows, if any;
record the frame Info object to the frame storage.
- Return type
Frame
- Returns
Parsed frame instance.
- _run_scapy(scapy_all)
Call scapy.all.sniff() to extract PCAP files.
This method assigns self._expkg as scapy.all and self._extmp as an iterator from scapy.all.sniff().
- Parameters
scapy_all (module) – The scapy.all module.
- Warns
AttributeWarning – If self._exlyr and/or self._exptl is provided, as the Scapy engine currently does not support such operations.
- Return type
None
- _scapy_read_frame()
Read frames with the Scapy engine.
- Return type
ScapyPacket
- Returns
Parsed frame instance.
See also
Please refer to _default_read_frame() for more operational information.
- _run_dpkt(dpkt)
Call dpkt.pcap.Reader to extract PCAP files.
This method assigns self._expkg as dpkt and self._extmp as an iterator from dpkt.pcap.Reader.
- Parameters
dpkt (module) – The dpkt module.
- Warns
AttributeWarning – If self._exlyr and/or self._exptl is provided, as the DPKT engine currently does not support such operations.
- Return type
None
- _dpkt_read_frame()
Read frames with the DPKT engine.
- Returns
Parsed frame instance.
- Return type
DPKTPacket
See also
Please refer to _default_read_frame() for more operational information.
- _run_pyshark(pyshark)
Call pyshark.FileCapture to extract PCAP files.
This method assigns self._expkg as pyshark and self._extmp as an iterator from pyshark.FileCapture.
- Parameters
pyshark (types.ModuleType) – The pyshark module.
- Warns
AttributeWarning – Warns under the following circumstances:
if self._exlyr and/or self._exptl is provided, as the PyShark engine currently does not support such operations;
if reassembly is enabled, as the PyShark engine currently does not support such operation.
- Return type
None
- _pyshark_read_frame()
Read frames with the PyShark engine.
- Return type
PySharkPacket
- Returns
Parsed frame instance.
See also
Please refer to _default_read_frame() for more operational information.
- _exptl: Protocols
Protocol up to which to extract.
- _exlyr: Layers
Layer up to which to extract.
- _exeng: Engines
Engine used for extraction.
- _expkg: Any
Module of the extraction engine (e.g. scapy.all for the Scapy engine).
- _extmp: Any
Iterator instance of the extraction engine.