Submodules

fornax.api module

class fornax.api.Connection(url, **kwargs)[source]

Bases: object

Create a new database connection. If the database is empty, Connection will create any missing schema.

Currently sqlite and postgresql are actively supported as backend databases.

In addition to the open/close syntax, Connection supports the context manager syntax, where the context is treated as a transaction. Any changes will be automatically rolled back in the event of an exception:

with Connection("postgresql://user:password@0.0.0.0/mydb") as conn:
    graph = fornax.GraphHandle.create(conn)
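
The rollback behaviour described above can be illustrated without fornax itself. The following is a minimal sketch using the standard library sqlite3 module; TransactionalConnection is a hypothetical stand-in written for this example and is not how Connection is implemented.

```python
import sqlite3


class TransactionalConnection:
    """Hypothetical stand-in: the `with` block is treated as a transaction."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)

    def __enter__(self):
        return self._db

    def __exit__(self, exc_type, exc, tb):
        # Commit if the block succeeded, otherwise undo its changes.
        if exc_type is None:
            self._db.commit()
        else:
            self._db.rollback()
        return False  # never swallow the exception


conn = TransactionalConnection()

with conn as db:
    db.execute("CREATE TABLE graph (graph_id INTEGER PRIMARY KEY)")
    db.execute("INSERT INTO graph VALUES (0)")

try:
    with conn as db:
        db.execute("INSERT INTO graph VALUES (1)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the second insert has been rolled back

with conn as db:
    rows = db.execute("SELECT graph_id FROM graph").fetchall()
```

Only the row written by the block that completed without an exception remains in the table.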
Parameters:url (str) – dialect[+driver]://user:password@host/dbname[?key=value..]
SQLITE_MAX_SIZE = 9223372036854775807
close()[source]

Close the fornax database connection and free any connections in the connection pool

open()[source]

Open the fornax database connection and create any absent tables and indices

class fornax.api.Edge(start: int, end: int, edge_type: str, meta: dict, weight=1.0)[source]

Bases: object

Representation of an Edge used internally by QueryHandle

Parameters:
  • start (int) – id of start node
  • end (int) – id of end node
  • edge_type (str) – one of query, target or match
  • meta (dict) – dictionary of edge metadata to be json serialised
  • weight – weight between 0 and 1, defaults to 1.
Raises:

ValueError – Raised if type is not query, target or match

end
meta
start
type
weight
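
The type check described under Raises can be sketched as follows. EdgeSketch is a hypothetical class written for illustration only; it mirrors the documented constructor arguments and attributes but is not the fornax implementation, and the error message is an assumption.

```python
VALID_EDGE_TYPES = ("query", "target", "match")


class EdgeSketch:
    """Hypothetical stand-in for the documented Edge constructor."""

    def __init__(self, start, end, edge_type, meta, weight=1.0):
        # Reject any edge type outside the three documented values.
        if edge_type not in VALID_EDGE_TYPES:
            raise ValueError(
                f"edge_type must be one of {VALID_EDGE_TYPES}, got {edge_type!r}"
            )
        self.start = start
        self.end = end
        self.type = edge_type
        self.meta = meta
        self.weight = weight


edge = EdgeSketch(0, 1, "query", {"relationship": "friend"})

try:
    EdgeSketch(0, 1, "friend", {})
    rejected = False
except ValueError:
    rejected = True
```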
class fornax.api.GraphHandle(connection: fornax.api.Connection, graph_id: int)[source]

Bases: object

Create a handle to an existing graph with id graph_id accessed via connection.

Parameters:
  • connection (Connection) – fornax database connection
  • graph_id (int) – unique id for an existing graph
add_edges(sources: Iterable, targets: Iterable, **kwargs)[source]

Append edges to a graph representing relationships between nodes

Parameters:
  • sources (typing.Iterable) – id_src of the start node of each edge
  • targets (typing.Iterable) – id_src of the end node of each edge

Keyword arguments can be used to attach metadata to the edges. For example, to add three edges with a relationship attribute of either friend or foe:

graph_handle.add_edges(
    sources=[0, 1, 2],
    targets=[1, 2, 0],
    relationship=['friend', 'friend', 'foe']
)

Keyword arguments can be used to attach any arbitrary JSON serialisable data to edges.

Note

The following reserved keywords are not permitted and will raise an exception:

  • start
  • end
  • type
  • weight
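
One way to picture how the keyword arguments line up with the edges is the sketch below. edge_records is a hypothetical helper, not part of fornax, and raising ValueError for a reserved keyword is an assumption (the documentation only says that an exception is raised).

```python
import json

RESERVED = {"start", "end", "type", "weight"}


def edge_records(sources, targets, **kwargs):
    """Hypothetical helper: pair each edge with its JSON metadata."""
    clashes = RESERVED.intersection(kwargs)
    if clashes:
        raise ValueError(f"reserved keywords used: {sorted(clashes)}")
    keys = list(kwargs)
    columns = [kwargs[key] for key in keys]
    records = []
    # The i-th value of every keyword argument belongs to the i-th edge.
    for start, end, *values in zip(sources, targets, *columns):
        meta = dict(zip(keys, values))
        records.append((start, end, json.dumps(meta)))
    return records


records = edge_records(
    sources=[0, 1, 2],
    targets=[1, 2, 0],
    relationship=["friend", "friend", "foe"],
)

try:
    edge_records([0], [1], weight=[0.5])
    reserved_rejected = False
except ValueError:
    reserved_rejected = True
```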
add_nodes(**kwargs)[source]

Append nodes to a graph

Parameters:id_src (Iterable) – An iterable of unique hashable identifiers, default None

Keyword arguments can be used to attach arbitrary JSON serialisable metadata to each node:

#  create 3 nodes with ids: 0, 1, 2
#  and names 'Anne', 'Ben', 'Charles'
graph_handle.add_nodes(names=['Anne', 'Ben', 'Charles'])

By default, each node will be assigned a sequential integer id starting from 0. A custom id can be assigned using the id_src keyword provided that all of the ids are hashable:

#  create 3 nodes with ids: 'Anne', 'Ben', 'Charles'
#  and no explicit name field
graph_handle.add_nodes(id_src=['Anne', 'Ben', 'Charles'])

Note

id is a reserved keyword argument and will raise an exception if used
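
The two ways of assigning ids can be sketched as follows. node_records is a hypothetical helper used only to illustrate the rules above; it is not the fornax implementation, and ValueError is an assumed exception type.

```python
import json


def node_records(id_src=None, **kwargs):
    """Hypothetical helper: pair each node id with its JSON metadata."""
    if "id" in kwargs:
        raise ValueError("id is a reserved keyword argument")
    keys = list(kwargs)
    rows = list(zip(*kwargs.values()))
    if id_src is None:
        # Default: sequential integer ids starting from 0.
        ids = list(range(len(rows)))
    else:
        ids = list(id_src)
        if not keys:
            # No metadata given: every node gets an empty metadata row.
            rows = [()] * len(ids)
    return [
        (node_id, json.dumps(dict(zip(keys, row))))
        for node_id, row in zip(ids, rows)
    ]


default_ids = node_records(names=["Anne", "Ben", "Charles"])
custom_ids = node_records(id_src=["Anne", "Ben", "Charles"])
```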

classmethod create(connection: fornax.api.Connection)[source]

Create a new empty graph via connection and return a GraphHandle to it

Parameters:connection (Connection) – a fornax database connection
Returns:GraphHandle to a new graph
Return type:GraphHandle
delete()[source]

Delete this graph.

Delete the graph accessed through graph handle and all of the associated nodes and edges.

graph_id

Get the unique id for this graph

Graph ids are automatically assigned at creation time.

classmethod read(connection: fornax.api.Connection, graph_id: int)[source]

Create a new GraphHandle to an existing graph with unique identifier graph_id

Parameters:
  • connection (Connection) – a fornax database connection
  • graph_id (int) – unique identifier for an existing graph
Returns:

A new graph handle to an existing graph

Return type:

GraphHandle

exception fornax.api.InvalidEdgeError(message: str)[source]

Bases: Exception

exception fornax.api.InvalidMatchError(message: str)[source]

Bases: Exception

exception fornax.api.InvalidNodeError(message: str)[source]

Bases: Exception

class fornax.api.Node(node_id: int, node_type: str, meta: dict)[source]

Bases: object

Representation of a Node used internally by QueryHandle

Parameters:
  • node_id (int) – unique id of a node
  • node_type (str) – either source or target
  • meta (dict) – meta data to attach to a node to be json serialised
Raises:

ValueError – Raised if type is not either source or target

id
meta
type
class fornax.api.NullValue[source]

Bases: object

A dummy null value that will cause an exception when serialised to json
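
The idea of a placeholder that cannot be serialised can be shown with the standard library alone. NullSentinel below is a hypothetical class for illustration, not the fornax NullValue; it relies only on the fact that json.dumps raises TypeError for objects it does not know how to encode.

```python
import json


class NullSentinel:
    """Hypothetical placeholder marking a missing value."""


try:
    json.dumps({"name": NullSentinel()})
    failed = False
except TypeError:
    # Plain objects are not JSON serialisable, so a placeholder that
    # was never replaced with real data is caught here.
    failed = True
```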

class fornax.api.QueryHandle(connection: fornax.api.Connection, query_id: int)[source]

Bases: object

Create a handle to an existing query via connection with unique id query_id.

Parameters:
  • connection (Connection) – a fornax database connection
  • query_id (int) – unique id for an existing query
add_matches(sources: Iterable[int], targets: Iterable[int], weights: Iterable[float], **kwargs)[source]

Add matches between the query graph and the target graph

Parameters:
  • sources (typing.Iterable[int]) – Iterable of src_id in the query graph
  • targets (typing.Iterable[int]) – Iterable of src_id in the target graph
  • weights (typing.Iterable[float]) – Iterable of weights between 0 and 1

For example, to add matches between

  • node 0 in the query graph and node 0 in the target graph with weight .9
  • node 0 in the query graph and node 1 in the target graph with weight .1

then:

query.add_matches([0, 0], [0, 1], [.9, .1])

Note

Adding weights that compare equal to zero will raise an exception.
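
The weight rules above can be sketched as a simple check. match_records is a hypothetical helper, not part of fornax; ValueError is an assumed exception type, and rejecting weights above 1 is an assumption drawn from the documented range.

```python
def match_records(sources, targets, weights):
    """Hypothetical helper: validate weights and build match rows."""
    records = []
    for source, target, weight in zip(sources, targets, weights):
        # Weights must lie in the interval (0, 1]; zero is rejected.
        if not 0 < weight <= 1:
            raise ValueError(f"weight must be in (0, 1], got {weight}")
        records.append((source, target, weight))
    return records


records = match_records([0, 0], [0, 1], [.9, .1])

try:
    match_records([0], [0], [0.0])
    zero_rejected = False
except ValueError:
    zero_rejected = True
```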

classmethod create(connection: fornax.api.Connection, query_graph: fornax.api.GraphHandle, target_graph: fornax.api.GraphHandle)[source]

Create a new query and return a QueryHandle for it

Parameters:
  • connection (Connection) – a fornax database connection
  • query_graph (GraphHandle) – subgraph to find in the target graph
  • target_graph (GraphHandle) – Graph to be searched
Returns:

new QueryHandle

Return type:

QueryHandle

delete()[source]

Delete this query and any associated matches

execute(n=5, hopping_distance=2, max_iters=10)[source]

Execute a fuzzy subgraph matching query finding the top n subgraph matches between the query graph and the target graph.

Parameters:
  • n (int, optional) – number of subgraph matches to return
  • hopping_distance (int, optional) – lengthscale hyperparameter, defaults to 2
  • max_iters (int, optional) – maximum number of optimisation iterations
Returns:

query result

Return type:

dict

static is_between(target_ids, edge)[source]
query_graph() → fornax.api.GraphHandle[source]

Get a GraphHandle for the query graph

Returns:query graph
Return type:GraphHandle
classmethod read(connection: fornax.api.Connection, query_id: int)[source]

Create a new QueryHandle to an existing query with unique id query_id via connection.

Parameters:
  • connection (Connection) – a fornax database connection
  • query_id (int) – unique identifier for a query
Returns:

new QueryHandle

Return type:

QueryHandle

target_graph() → fornax.api.GraphHandle[source]

Get a GraphHandle for the target graph

Returns:target graph
Return type:GraphHandle

fornax.model module

class fornax.model.Edge(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Joins Nodes in a Graph

end
end_node
graph_id
meta
start
start_node
class fornax.model.Graph(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

A graph containing nodes and edges

graph_id
class fornax.model.Match(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Joins Query Nodes to Candidate Target Nodes

end
end_graph_id
end_node
meta
query_id
start
start_graph_id
start_node
weight
class fornax.model.Node(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Node in a Graph

graph_id
meta
neighbours()[source]
node_id
class fornax.model.Query(**kwargs)[source]

Bases: sqlalchemy.ext.declarative.api.Base

end_graph_id
query_id
start_graph_id

fornax.opt module

class fornax.opt.Base[source]

Bases: numpy.recarray

A Base class for subclassing numpy record arrays

Returns:
np.recarray – A subclass of np.recarray
columns = []
types = []
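
A record array subclass driven by columns and types class attributes might look like the sketch below. CostTable and its constructor are hypothetical and only illustrate the pattern; the fornax classes may build their arrays differently. The sketch assumes numpy is installed.

```python
import numpy as np


class CostTable(np.recarray):
    """Hypothetical table: field names and dtypes come from class attributes."""

    columns = ["v", "u", "cost"]
    types = [np.int64, np.int64, np.float32]

    def __new__(cls, records):
        dtype = list(zip(cls.columns, cls.types))
        # Build a structured array, then view it as this recarray subclass
        # so that every field is reachable as an attribute.
        return np.array(records, dtype=dtype).view(cls)


table = CostTable([(0, 1, 0.25), (0, 2, 0.75)])
query_ids = table.v      # field access by attribute
costs = table.cost
```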
class fornax.opt.InferenceCost[source]

Bases: fornax.opt.Base

A table representing all valid inference costs between query node v and target node u

columns = ['v', 'u', 'cost']
cost

Get column cost - all valid inference costs for query node v and target node u.

Eq 14 in the paper (U)

Returns:
np.ndarray – array of costs as floats
types = [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>]
u

Get column u

Returns:
np.ndarray – array of target node ids as integers
v

Get column v

Returns:
np.ndarray – array of query node ids as integers
class fornax.opt.NeighbourHoodMatchingCosts[source]

Bases: fornax.opt.Base

Represents a table of all valid neighbourhood matching costs

columns = ['v', 'u', 'vv', 'uu', 'cost']
cost

Get column cost - all valid neighbourhood matching costs.

Eq 2 in the paper - multiplied by 1 - lambda

Returns:
np.ndarray – array of costs as floats
types = [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>]
u

Get column u

Returns:
np.ndarray – array of target node ids as integers
uu

Get column uu - written u prime (u’) in the paper where u’ is a target node within hopping distance h of target node u

Returns:
np.ndarray – array of target node ids as integers
v

Get column v

Returns:
np.ndarray – array of query node ids as integers
vv

Get column vv - written v prime (v’) in the paper where v’ is a query node within hopping distance h of query node v

Returns:
np.ndarray – array of query node ids as integers
class fornax.opt.OptimalMatch[source]

Bases: fornax.opt.Base

Table representing the cost of the optimal match for query node v going to u

columns = ['v', 'u', 'cost']
cost

Get column cost - the optimal matching cost for v going to u.

Eq 10 in the paper (O)

Returns:
np.ndarray – array of costs as floats
types = [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>]
u

Get column u

Returns:
np.ndarray – array of target node ids as integers
v

Get column v

Returns:
np.ndarray – array of query node ids as integers
class fornax.opt.PartialMatchingCosts[source]

Bases: fornax.opt.Base

A table representing all valid partial matching costs

columns = ['v', 'u', 'vv', 'cost']
cost

Get column cost - all valid partial matching costs.

Eq 13 in the paper (W) - but with beta multiplied by a factor of 1 - lambda

Returns:
np.ndarray – array of costs as floats
types = [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>]
u

Get column u

Returns:
np.ndarray – array of target node ids as integers
v

Get column v

Returns:
np.ndarray – array of query node ids as integers
vv

Get column vv - written v prime (v’) in the paper where v’ is a query node within hopping distance h of query node v

Returns:
np.ndarray – array of query node ids as integers
class fornax.opt.QueryResult[source]

Bases: fornax.opt.Base

Represents a query from the database as a numpy rec array

columns = ['v', 'u', 'vv', 'uu', 'dist_v', 'dist_u', 'weight']
dist_u

The hopping distance between target node u and target node uu (u’)

Returns:
np.ndarray – array of hopping distances as integers
dist_v

The hopping distance between query node v and query node vv (v’)

Returns:
np.ndarray – array of hopping distances as integers
types = [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>, <class 'numpy.float32'>, <class 'numpy.float32'>]
u

Get column u

Returns:
np.ndarray – array of target node ids as integers
uu

Get column uu - written u prime (u’) in the paper where u’ is a target node within hopping distance h of target node u

Values less than zero indicate that uu (u’) has no corresponding matches to any node v’.

Returns:
np.ndarray – array of target node ids as integers
v

Get column v

Returns:
np.ndarray – array of query node ids as integers
vv

Get column vv - written v prime (v’) in the paper where v’ is a query node within hopping distance h of query node v

Returns:
np.ndarray – array of query node ids as integers
weight

String matching score between uu (u’) and vv (v’)

Returns:
np.ndarray – array of floating point weights
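
The hopping distances held in dist_u and dist_v can be pictured with a breadth-first search. This is a minimal sketch, assuming that hopping distance means the number of edges on a shortest path and that a node is at distance 0 from itself; hop_distances and the adjacency dictionary are hypothetical and not part of fornax.

```python
from collections import deque


def hop_distances(adjacency, start, h):
    """Return {node: distance} for every node within h hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == h:
            continue  # do not expand beyond the hopping distance
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# A path graph 0 - 1 - 2 - 3
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
within_two = hop_distances(adjacency, start=0, h=2)
```

Node 3 is three hops from node 0, so it does not appear in the result for h=2.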
class fornax.opt.Refiner(neighbourhood_matching_costs: fornax.opt.NeighbourHoodMatchingCosts)[source]

Bases: object

Take each of the matches and recursively find all of their neighbours via a greedy algorithm

static valid_neighbours(first: tuple, second: tuple)[source]

Function that governs a valid hop between nodes

Arguments:
first {int, int} – source query_node, target_node id pair
second {int, int} – target query_node, target_node id pair
Returns:
Bool – True if the hop is a valid transition
fornax.opt.group_by(columns, arr)[source]

Split an array into n slices where ‘columns’ are all equal within each slice

Arguments:
columns {List[str]} – a list of column names
arr {np.array} – a numpy structured array
Returns:
keys: np.array – the column values uniquely identifying each group
groups: List[np.array] – a list of numpy arrays
fornax.opt.group_by_first(columns, arr)[source]

Split an array into n slices where ‘columns’ all compare equal within each slice, take the first row of each slice, and combine the rows into a single array through concatenation

Arguments:
columns {List[str]} – a list of column names
arr {np.array} – a numpy structured array
Returns:
np.array – new concatenated array
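
The grouping described for group_by and group_by_first can be sketched with numpy as follows. This is an illustrative reimplementation under stated assumptions, not the fornax source: it assumes a non-empty structured array, sorts the rows by the key columns first, and therefore takes the "first" row of each group in sorted order.

```python
import numpy as np


def group_by(columns, arr):
    """Split arr into slices whose values in `columns` are all equal."""
    ordered = np.sort(arr, order=columns)
    # Mark every row whose key differs from the row before it.
    changed = np.zeros(len(ordered) - 1, dtype=bool)
    for name in columns:
        changed |= ordered[name][1:] != ordered[name][:-1]
    starts = np.flatnonzero(changed) + 1
    keys = ordered[columns][np.concatenate(([0], starts))]
    return keys, np.split(ordered, starts)


def group_by_first(columns, arr):
    """Keep the first row of each group, concatenated into one array."""
    _, groups = group_by(columns, arr)
    return np.concatenate([group[:1] for group in groups])


costs = np.array(
    [(0, 1, 0.5), (1, 0, 0.2), (0, 2, 0.1), (1, 3, 0.9)],
    dtype=[("v", np.int64), ("u", np.int64), ("cost", np.float32)],
)
keys, groups = group_by(["v"], costs)
firsts = group_by_first(["v"], costs)
```

Here the four rows fall into two groups, one per distinct value of v, and firsts holds one row from each group.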
fornax.opt.solve(records: List[tuple], max_iters=10, hopping_distance=2)[source]

Generate a set of subgraph matches and costs from a query result

Arguments:
records {List[tuple]}

fornax.select module

fornax.select.join(query_id: int, h: int, offsets: Tuple[int, int] = None) → sqlalchemy.orm.query.Query[source]
fornax.select.neighbours(h: int, start: bool, query_id: int) → sqlalchemy.orm.query.Query[source]

Module contents