Base Crawler

pcapkit.vendor.default contains Vendor only, which is the base class for all vendor crawlers.

Vendor Crawler

class pcapkit.vendor.default.Vendor[source]

Bases: object

Default vendor generator.

Subclass this class and provide the FLAG and LINK attributes, among others, to implement a new vendor generator.

Return type

Vendor
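A minimal subclass might look like the sketch below. The attribute values and the registry URL are illustrative, and the Vendor base is stubbed out as a plain class so the snippet stands alone (in real use the class would inherit pcapkit.vendor.default.Vendor):

```python
# Hypothetical subclass sketch -- attribute names follow the Vendor
# contract documented here; the FLAG expression and LINK URL are
# illustrative, not pcapkit's actual definitions.
class TransType:  # in real use: class TransType(Vendor)
    #: Name of constant enumeration.
    NAME = 'TransType'
    #: Docstring of constant enumeration.
    DOCS = 'Transport layer protocol numbers'
    #: Value limit checker (a Python expression over ``value``).
    FLAG = 'isinstance(value, int) and 0 <= value <= 255'
    #: Link to registry (CSV source).
    LINK = 'https://www.iana.org/assignments/protocol-numbers/protocol-numbers-1.csv'
```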

static __new__(cls)[source]

Subclassing checkpoint.

Raises

VendorNotImplemented – If cls is not a subclass of Vendor.

Return type

Vendor

__init__()[source]

Generate new constant files.

Return type

None

static wrap_comment(text)[source]

Wrap long text into shorter comment lines.

Parameters

text (str) – Source text.

Return type

str

Returns

Wrapped comments.
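The behaviour can be sketched with the standard library's textwrap module; the wrap width and prefix handling here are assumptions, not pcapkit's exact implementation:

```python
import textwrap


def wrap_comment(text: str, width: int = 76) -> str:
    # Illustrative sketch: wrap the source text so each resulting
    # comment line stays within ``width`` characters.
    return '\n'.join(textwrap.wrap(text.strip(), width))
```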

safe_name(name)[source]

Convert an enumeration name into an enum.Enum-friendly form.

Parameters

name (str) – Original enumeration name.

Return type

str

Returns

Converted enumeration name.
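A sketch of such a conversion, assuming the usual constraints on enum.Enum member names (valid Python identifier, no keyword clash); this is not pcapkit's exact algorithm:

```python
import keyword
import re


def safe_name(name: str) -> str:
    # Replace characters that are invalid in Python identifiers.
    name = re.sub(r'\W+', '_', name.strip()).strip('_')
    # Identifiers may not start with a digit ...
    if name and name[0].isdigit():
        name = f'_{name}'
    # ... and may not shadow a Python keyword.
    if keyword.iskeyword(name):
        name = f'{name}_'
    return name
```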

rename(name, code, *, original=None)[source]

Rename duplicated fields.

Parameters
  • name (str) – Field name.

  • code (str) – Field code.

  • original (Optional[str]) – Original field name (extracted from CSV records).

Return type

str

Returns

Revised field name.

Example

If name has multiple occurrences in the source registry, the field name will be sanitised as ${name}_${code}.

Otherwise, the plain name will be returned.
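The described behaviour can be sketched as follows; the explicit counter parameter is a simplification, since the real method consults the source registry internally:

```python
from collections import Counter


def rename(name: str, code: str, counter: Counter) -> str:
    # A name that occurs more than once in the registry is suffixed
    # with its code; unique names pass through unchanged.
    if counter[name] > 1:
        return f'{name}_{code}'
    return name
```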

process(data)[source]

Process CSV data.

Parameters

data (list[str]) – CSV data.

Returns

Enumeration fields and missing fields.

Return type

tuple[list[str], list[str]]
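A simplified sketch of the split into enumeration fields and missing fields, assuming a two-column ``Code,Name`` layout (real IANA registries have more columns, and the real method also handles ranges and name sanitisation):

```python
import csv
import io


def process(data):
    reader = csv.reader(io.StringIO('\n'.join(data)))
    next(reader)  # skip the header row
    enum, miss = [], []
    for code, name in reader:
        if name.lower() == 'unassigned':
            miss.append(code)  # left to the missing-value handler
        else:
            enum.append(f'{name} = {code}')
    return enum, miss
```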

count(data)[source]

Count field records.

Parameters

data (list[str]) – CSV data.

Returns

Field name counts.

Return type

Counter[str]
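The counting step can be sketched with collections.Counter; the two-column layout is again an assumption:

```python
import csv
import io
from collections import Counter


def count(data):
    # Tally how often each field name appears in the name column.
    reader = csv.reader(io.StringIO('\n'.join(data)))
    next(reader)  # skip the header row
    return Counter(name for _, name in reader)
```

The resulting Counter is what a rename step would consult to decide which names need disambiguation.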

context(data)[source]

Generate constant context.

Parameters

data (list[str]) – CSV data.

Returns

Constant context.

Return type

str

request(text=None)[source]

Fetch CSV file.

Parameters

text (Optional[str]) – Text content fetched from LINK.

Returns

CSV data.

Return type

list[str]
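In outline (a sketch, not the real implementation): when text is given, it is split into CSV rows; when it is None, the real method defers to the low-level _request() described next:

```python
from typing import Optional


def request(text: Optional[str] = None) -> list:
    if text is None:
        # The real method would fall back to self._request() here.
        raise NotImplementedError('would fall back to self._request()')
    return text.strip().splitlines()
```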

_request()[source]

Fetch CSV data from LINK.

This is the low-level call of request().

If LINK is None, it directly calls the upper-level request() with no arguments.

The method will first try to GET the content of LINK. Should any exception be raised, it will retry with the proxy settings from get_proxies().

Note

Since some LINK links are from Wikipedia, etc., they might not be available in certain areas, e.g. the amazing PRC :)

Should the proxies fail as well, it will prompt for user intervention: it will use webbrowser.open() to open the page in a browser for you, so that you can manually load that page and save the HTML source at the location it provides.

Returns

CSV data.

Warns

VendorRequestWarning – If the connection fails, with or without proxies.

Return type

list[str]

See also

request()
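The retry ladder described above (plain GET, then GET through proxies, then manual intervention) can be sketched as follows. The ``fetch`` callable stands in for the HTTP client and is an assumption, not pcapkit's API:

```python
import webbrowser


def fetch_with_fallback(link, fetch, get_proxies, open_browser=webbrowser.open):
    try:
        return fetch(link, proxies=None)  # plain GET first
    except Exception:
        try:
            # Retry through the configured proxies.
            return fetch(link, proxies=get_proxies())
        except Exception:
            # Last resort: let the user load the page manually.
            open_browser(link)
            raise
```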

NAME: str

Name of constant enumeration.

Type

str

DOCS: str

Docstring of constant enumeration.

Type

str

FLAG: str

Value limit checker.

Type

str

LINK: str

Link to registry.

Type

str

Crawler Template

pcapkit.vendor.default.LINE(NAME, DOCS, FLAG, ENUM, MISS, MODL)

Default constant template of enumeration registry from IANA CSV.

Parameters
  • NAME (str) – name of the constant enumeration class

  • DOCS (str) – docstring for the constant enumeration class

  • FLAG (str) – threshold value validator (range of valid values)

  • ENUM (str) – enumeration data (class attributes)

  • MISS (str) – missing value handler (default value)

  • MODL (str) – module name of the constant enumeration class

Return type

str
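The overall shape of the template can be sketched as below; the real template emits a complete, importable module with considerably more boilerplate, so treat every line here as illustrative:

```python
def LINE(NAME, DOCS, FLAG, ENUM, MISS, MODL):
    # Assemble the pieces into (a simplified form of) the generated
    # constant module's source code.
    return (
        f'"""{DOCS}"""\n\n'
        f'# destined for module: {MODL}\n\n'
        f'class {NAME}(IntEnum):\n'
        f'    {ENUM}\n\n'
        f'    @classmethod\n'
        f'    def _missing_(cls, value):\n'
        f'        if not ({FLAG}):\n'
        f'            raise ValueError(value)\n'
        f'        {MISS}\n'
    )
```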

Crawler Proxy

pcapkit.vendor.default.get_proxies()[source]

Get proxy for blocked sites.

The function reads the PCAPKIT_HTTP_PROXY and PCAPKIT_HTTPS_PROXY environment variables, if set, for the proxy settings of requests.

Returns

Proxy settings for requests.

Return type

dict[str, str]
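A sketch of this lookup; the returned keys follow the ``proxies`` mapping convention of the requests library:

```python
import os


def get_proxies():
    # Collect proxy settings from the two environment variables.
    proxies = {}
    if os.environ.get('PCAPKIT_HTTP_PROXY'):
        proxies['http'] = os.environ['PCAPKIT_HTTP_PROXY']
    if os.environ.get('PCAPKIT_HTTPS_PROXY'):
        proxies['https'] = os.environ['PCAPKIT_HTTPS_PROXY']
    return proxies
```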