Submodules¶
fornax.api module¶
-
class
fornax.api.
Connection
(url, **kwargs)[source]¶ Bases:
object
Create a new database connection. If the database is empty
Connection
will create any missing schema. Currently sqlite and postgresql are actively supported as backend databases.
In addition to the open/close syntax, Connection supports the context manager syntax, where the context is treated as a transaction. Any changes will be automatically rolled back in the event of an exception:
    with Connection("postgresql://user:password@0.0.0.0/mydb") as conn:
        graph = fornax.GraphHandle.create(conn)
Parameters: url (str) – dialect[+driver]://user:password@host/dbname[?key=value..] -
SQLITE_MAX_SIZE
= 9223372036854775807¶
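The rollback behaviour described above can be illustrated without fornax itself. The following is a minimal sketch using the standard-library sqlite3 module; the `transaction` helper and the `graph` table are hypothetical stand-ins and are not fornax's implementation.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit if the body succeeds, roll back if it raises (hypothetical helper)."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (graph_id INTEGER PRIMARY KEY)")
conn.commit()

try:
    with transaction(conn):
        conn.execute("INSERT INTO graph (graph_id) VALUES (1)")
        raise RuntimeError("simulated failure inside the transaction")
except RuntimeError:
    pass  # the exception still propagates after the rollback

# The insert was undone, so the table is still empty.
row_count = conn.execute("SELECT COUNT(*) FROM graph").fetchone()[0]
```

The same pattern is why work done inside a `with Connection(...)` block should be treated as all-or-nothing.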
-
-
class
fornax.api.
Edge
(start: int, end: int, edge_type: str, meta: dict, weight=1.0)[source]¶ Bases:
object
Representation of an Edge used internally by QueryHandle
Parameters: - start (int) – id of start node
- end (int) – id of end node
- edge_type (str) – one of query, target or match
- meta (dict) – dictionary of edge metadata to be json serialised
- weight – weight between 0 and 1, defaults to 1.
Raises: ValueError – Raised if type is not query, target or match
-
end
¶
-
meta
¶
-
start
¶
-
type
¶
-
weight
¶
-
class
fornax.api.
GraphHandle
(connection: fornax.api.Connection, graph_id: int)[source]¶ Bases:
object
Create a handle to an existing graph with id graph_id accessed via connection.
Parameters: - connection (Connection) – fornax database connection
- graph_id (int) – unique id for an existing graph
-
add_edges
(sources: Iterable, targets: Iterable, **kwargs)[source]¶ Append edges to a graph representing relationships between nodes
Parameters: - sources (typing.Iterable) – id_src of the start node of each edge
- targets (typing.Iterable) – id_src of the end node of each edge
Keyword arguments can be used to attach metadata to the edges. For example to add three edges with a relationship attribute friend or foe:
    graph_handle.add_edges(
        sources=[0, 1, 2],
        targets=[1, 2, 0],
        relationship=['friend', 'friend', 'foe']
    )
Keyword arguments can be used to attach any arbitrary JSON serialisable data to edges.
Note
The following keywords are reserved and will raise an exception if used as keyword arguments:
- start
- end
- type
- weight
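To make the keyword-argument convention concrete, here is a minimal sketch of how parallel iterables could be zipped into one record per edge, with the reserved names rejected. `build_edge_records` is a hypothetical helper, and the choice of `ValueError` is an assumption; the documentation above only says that an exception is raised.

```python
RESERVED_EDGE_KEYWORDS = {"start", "end", "type", "weight"}


def build_edge_records(sources, targets, **kwargs):
    """Zip parallel iterables into one dict per edge (illustrative only)."""
    clashes = RESERVED_EDGE_KEYWORDS & kwargs.keys()
    if clashes:
        # Assumption: the real exception type is not stated in the docs.
        raise ValueError(f"reserved keyword(s) used: {sorted(clashes)}")
    sources, targets = list(sources), list(targets)
    keys = list(kwargs)
    columns = [list(kwargs[key]) for key in keys]
    lengths = {len(sources), len(targets), *(len(col) for col in columns)}
    if len(lengths) != 1:
        raise ValueError("all arguments must have the same length")
    return [
        {"start": start, "end": end, "meta": dict(zip(keys, values))}
        for start, end, *values in zip(sources, targets, *columns)
    ]


records = build_edge_records(
    sources=[0, 1, 2],
    targets=[1, 2, 0],
    relationship=["friend", "friend", "foe"],
)
```

Each keyword becomes one metadata field, and the i-th value of every iterable belongs to the i-th edge.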
-
add_nodes
(**kwargs)[source]¶ Append nodes to a graph
Parameters: id_src (Iterable) – An iterable of unique hashable identifiers, default None
Keyword arguments can be used to attach arbitrary JSON serialisable metadata to each node:
    # create 3 nodes with ids: 0, 1, 2
    # and names 'Anne', 'Ben', 'Charles'
    graph_handle.add_nodes(names=['Anne', 'Ben', 'Charles'])
By default, each node will be assigned a sequential integer id starting from 0. A custom id can be assigned using the id_src keyword provided that all of the ids are hashable:
    # create 3 nodes with ids: 'Anne', 'Ben', 'Charles'
    # and no explicit name field
    graph_handle.add_nodes(id_src=['Anne', 'Ben', 'Charles'])
Note
id is a reserved keyword argument which will raise an exception
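The id rules above (sequential integers by default, custom ids via id_src, id itself reserved) can be sketched in plain Python. `build_node_records` is a hypothetical helper written for illustration; the exception type is an assumption and the real method may validate differently.

```python
def build_node_records(id_src=None, **kwargs):
    """Return one dict per node, assigning default ids 0..n-1 (illustrative only)."""
    if "id" in kwargs:
        # Assumption: the docs say an exception is raised but not which one.
        raise ValueError("id is a reserved keyword")
    keys = list(kwargs)
    columns = [list(kwargs[key]) for key in keys]
    if id_src is None:
        if not columns:
            raise ValueError("provide id_src or at least one keyword argument")
        ids = list(range(len(columns[0])))  # sequential ids starting from 0
    else:
        ids = list(id_src)
    if len(set(ids)) != len(ids):
        raise ValueError("node ids must be unique")
    if any(len(col) != len(ids) for col in columns):
        raise ValueError("all arguments must have the same length")
    return [
        {"id": node_id, "meta": dict(zip(keys, values))}
        for node_id, *values in zip(ids, *columns)
    ]


default_ids = build_node_records(names=["Anne", "Ben", "Charles"])
custom_ids = build_node_records(id_src=["Anne", "Ben", "Charles"])
```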
-
classmethod
create
(connection: fornax.api.Connection)[source]¶ Create a new empty graph via connection and return a GraphHandle to it
Parameters: connection (Connection) – a fornax database connection
Returns: GraphHandle to a new graph
Return type: GraphHandle
-
delete
()[source]¶ Delete this graph.
Delete the graph accessed through graph handle and all of the associated nodes and edges.
-
graph_id
¶ Get the unique id for this graph
Graph ids are automatically assigned at creation time.
-
classmethod
read
(connection: fornax.api.Connection, graph_id: int)[source]¶ Create a new GraphHandle to an existing graph with unique identifier graph_id
Parameters: - connection (Connection) – a fornax database connection
- graph_id (int) – unique identifier for an existing graph
Returns: A new graph handle to an existing graph
Return type: GraphHandle
-
class
fornax.api.
Node
(node_id: int, node_type: str, meta: dict)[source]¶ Bases:
object
Representation of a Node used internally by QueryHandle
Parameters: - node_id (int) – unique id of a node
- node_type (str) – either source or target
- meta (dict) – meta data to attach to a node to be json serialised
Raises: ValueError – Raised if type is not either source or target
-
id
¶
-
meta
¶
-
type
¶
-
class
fornax.api.
NullValue
[source]¶ Bases:
object
A dummy null value that will cause an exception when serialised to json
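As a rough illustration of why such a placeholder fails on serialisation: the standard json module has no encoder for arbitrary objects, so a plain class instance raises TypeError, while None serialises to null. `NullValueSketch` and `is_serialisable` are hypothetical names; how fornax's own NullValue is implemented is not shown here.

```python
import json


class NullValueSketch:
    """Placeholder object with no JSON representation (illustrative only)."""


def is_serialisable(value):
    """Return True if json.dumps accepts the value, False if it raises TypeError."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


with_placeholder = is_serialisable({"name": NullValueSketch()})
with_none = is_serialisable({"name": None})
```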
-
class
fornax.api.
QueryHandle
(connection: fornax.api.Connection, query_id: int)[source]¶ Bases:
object
Create a handle to an existing query via connection with unique id query_id.
Parameters: - connection (Connection) – a fornax database connection
- query_id (int) – unique id for an existing query
-
add_matches
(sources: Iterable[int], targets: Iterable[int], weights: Iterable[float], **kwargs)[source]¶ Add matches between the query graph and the target graph
Parameters: - sources (typing.Iterable[int]) – Iterable of id_src in the query graph
- targets (typing.Iterable[int]) – Iterable of id_src in the target graph
- weights (typing.Iterable[float]) – Iterable of weights between 0 and 1
For example, to add matches between
- node 0 in the query graph and node 0 in the target graph with weight .9
- node 0 in the query graph and node 1 in the target graph with weight .1
then:
query.add_matches([0, 0], [0, 1], [.9, .1])
Note
Adding weights that compare equal to zero will raise an exception.
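The weight constraint can be sketched as a simple pre-check. `build_match_records` is a hypothetical helper; the documentation only states that zero weights raise an exception, so the `ValueError` type and the upper bound check are assumptions based on the stated range of 0 to 1.

```python
def build_match_records(sources, targets, weights):
    """Pair query and target node ids with a weight in (0, 1] (illustrative only)."""
    sources, targets, weights = list(sources), list(targets), list(weights)
    if not (len(sources) == len(targets) == len(weights)):
        raise ValueError("sources, targets and weights must have the same length")
    for weight in weights:
        if not 0 < weight <= 1:
            # Assumption: zero is documented as invalid; the exception type is a guess.
            raise ValueError(f"weight {weight!r} is outside the interval (0, 1]")
    return [
        {"start": start, "end": end, "weight": weight}
        for start, end, weight in zip(sources, targets, weights)
    ]


matches = build_match_records([0, 0], [0, 1], [0.9, 0.1])
```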
-
classmethod
create
(connection: fornax.api.Connection, query_graph: fornax.api.GraphHandle, target_graph: fornax.api.GraphHandle)[source]¶ Create a new query and return a QueryHandle for it
Parameters: - connection (Connection) – a fornax database connection
- query_graph (GraphHandle) – subgraph to find in the target graph
- target_graph (GraphHandle) – Graph to be searched
Returns: new QueryHandle
Return type: QueryHandle
-
execute
(n=5, hopping_distance=2, max_iters=10)[source]¶ Execute a fuzzy subgraph matching query finding the top n subgraph matches between the query graph and the target graph.
Parameters: - n (int, optional) – number of subgraph matches to return, defaults to 5
- hopping_distance (int, optional) – lengthscale hyperparameter, defaults to 2
- max_iters (int, optional) – maximum number of optimisation iterations, defaults to 10
Returns: query result
Return type: dict
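For readers unfamiliar with the term, hopping distance is the number of edges on the shortest path between two nodes. The sketch below collects every node within h hops of a start node using breadth-first search. The adjacency-dict representation and the function name are assumptions for illustration; this is not how fornax computes neighbourhoods internally.

```python
from collections import deque


def within_hopping_distance(adjacency, start, h):
    """Return {node: hops} for all nodes at most h hops from start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == h:
            continue  # do not expand beyond the hopping distance
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# A chain 0 - 1 - 2 - 3 stored as an undirected adjacency dict
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
reachable = within_hopping_distance(chain, start=0, h=2)
```

With h=2, node 3 is excluded because it is three hops from node 0.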
-
query_graph
() → fornax.api.GraphHandle[source]¶ Get a GraphHandle for the query graph
Returns: query graph
Return type: GraphHandle
-
classmethod
read
(connection: fornax.api.Connection, query_id: int)[source]¶ Create a new QueryHandle to an existing query with unique id query_id via connection.
Parameters: - connection (Connection) – a fornax database connection
- query_id (int) – unique identifier for a query
Returns: new QueryHandle
Return type: QueryHandle
-
target_graph
() → fornax.api.GraphHandle[source]¶ Get a GraphHandle for the target graph
Returns: target graph
Return type: GraphHandle
fornax.model module¶
-
class
fornax.model.
Edge
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Joins Nodes in a Graph
-
end
¶
-
end_node
¶
-
graph_id
¶
-
meta
¶
-
start
¶
-
start_node
¶
-
-
class
fornax.model.
Graph
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
A graph containing nodes and edges
-
graph_id
¶
-
-
class
fornax.model.
Match
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Joins Query Nodes to Candidate Target Nodes
-
end
¶
-
end_graph_id
¶
-
end_node
¶
-
meta
¶
-
query_id
¶
-
start
¶
-
start_graph_id
¶
-
start_node
¶
-
weight
¶
-
fornax.opt module¶
-
class
fornax.opt.
Base
[source]¶ Bases:
numpy.recarray
A Base class for subclassing numpy record arrays
- Returns:
- np.recarray – A subclass of np.recarray
-
columns
= []¶
-
types
= []¶
-
class
fornax.opt.
InferenceCost
[source]¶ Bases:
fornax.opt.Base
A table representing all valid inference costs between query node v and target node u
-
columns
= ['v', 'u', 'cost']¶
-
cost
¶ Get column cost - all valid inference costs for query node v and target node u.
Eq 14 in the paper (U)
- Returns:
- np.ndarray – array of costs as floats
-
types
= [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>]¶
-
u
¶ Get column u
- Returns:
- np.ndarray – array of target node ids as integers
-
v
¶ Get column v
- Returns:
- np.ndarray – array of query node ids as integers
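The columns and types attributes pair up into a numpy structured dtype. As a minimal sketch (not the library's own constructor, which is not documented here), a table with the same layout can be built from plain tuples and viewed as a record array so that each column is available as an attribute. The row values are made up for illustration.

```python
import numpy as np

# Same column names and types as listed for InferenceCost above
columns = ["v", "u", "cost"]
types = [np.int64, np.int64, np.float32]

# Hypothetical rows: (query node id, target node id, cost)
rows = [(0, 3, 0.25), (0, 4, 0.75), (1, 3, 0.5)]

# Build a structured array, then view it as a record array
table = np.array(rows, dtype=list(zip(columns, types))).view(np.recarray)

query_ids = table.v    # column v as an integer array
target_ids = table.u   # column u as an integer array
costs = table.cost     # column cost as a float32 array
```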
-
-
class
fornax.opt.
NeighbourHoodMatchingCosts
[source]¶ Bases:
fornax.opt.Base
Represents a table of all valid neighbourhood matching costs
-
columns
= ['v', 'u', 'vv', 'uu', 'cost']¶
-
cost
¶ Get column cost - all valid neighbourhood matching costs.
Eq 2 in the paper - multiplied by 1 - lambda
- Returns:
- np.ndarray – array of costs as floats
-
types
= [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>]¶
-
u
¶ Get column u
- Returns:
- np.ndarray – array of target node ids as integers
-
uu
¶ Get column uu - written u prime (u’) in the paper where u’ is a target node within hopping distance h of target node u
- Returns:
- np.ndarray – array of target node ids as integers
-
v
¶ Get column v
- Returns:
- np.ndarray – array of query node ids as integers
-
vv
¶ Get column vv - written v prime (v’) in the paper where v’ is a query node within hopping distance h of query node v
- Returns:
- np.ndarray – array of query node ids as integers
-
-
class
fornax.opt.
OptimalMatch
[source]¶ Bases:
fornax.opt.Base
Table representing the cost of the optimal match for query node v going to u
-
columns
= ['v', 'u', 'cost']¶
-
cost
¶ Get column cost - the optimal matching cost for v going to u.
Eq 10 in the paper (O)
- Returns:
- np.ndarray – array of costs as floats
-
types
= [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>]¶
-
u
¶ Get column u
- Returns:
- np.ndarray – array of target node ids as integers
-
v
¶ Get column v
- Returns:
- np.ndarray – array of query node ids as integers
-
-
class
fornax.opt.
PartialMatchingCosts
[source]¶ Bases:
fornax.opt.Base
A table representing all valid partial matching costs
-
columns
= ['v', 'u', 'vv', 'cost']¶
-
cost
¶ Get column cost - all valid partial matching costs.
Eq 13 in the paper (W) - but with beta multiplied by a factor of 1 - lambda
- Returns:
- np.ndarray – array of costs as floats
-
types
= [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>]¶
-
u
¶ Get column u
- Returns:
- np.ndarray – array of target node ids as integers
-
v
¶ Get column v
- Returns:
- np.ndarray – array of query node ids as integers
-
vv
¶ Get column vv - written v prime (v’) in the paper where v’ is a query node within hopping distance h of query node v
- Returns:
- np.ndarray – array of query node ids as integers
-
-
class
fornax.opt.
QueryResult
[source]¶ Bases:
fornax.opt.Base
Represents a query from the database as a numpy record array
-
columns
= ['v', 'u', 'vv', 'uu', 'dist_v', 'dist_u', 'weight']¶
-
dist_u
¶ The hopping distance between target node u and target node uu (u’)
- Returns:
- np.ndarray – array of hopping distances as integers
-
dist_v
¶ The hopping distance between query node v and query node vv (v’)
- Returns:
- np.ndarray – array of hopping distances as integers
-
types
= [<class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.float32'>, <class 'numpy.float32'>, <class 'numpy.float32'>]¶
-
u
¶ Get column u
- Returns:
- np.ndarray – array of target node ids as integers
-
uu
¶ Get column uu - written u prime (u’) in the paper where u’ is a target node within hopping distance h of target node u
Values less than zero indicate that uu (u’) has no corresponding matches to any node v’.
- Returns:
- np.ndarray – array of target node ids as integers
-
v
¶ Get column v
- Returns:
- np.ndarray – array of query node ids as integers
-
vv
¶ Get column vv - written v prime (v’) in the paper where v’ is a query node within hopping distance h of query node v
- Returns:
- np.ndarray – array of query node ids as integers
-
weight
¶ String matching score between uu (u’) and vv (v’)
- Returns:
- np.ndarray – array of floating point weights
-
-
class
fornax.opt.
Refiner
(neighbourhood_matching_costs: fornax.opt.NeighbourHoodMatchingCosts)[source]¶ Bases:
object
Take each of the matches and recursively find all of their neighbours via a greedy algorithm
-
fornax.opt.
group_by
(columns, arr)[source]¶ Split an array into n slices where ‘columns’ are all equal within each slice
- Arguments:
- columns {List[str]} – a list of column names
- arr {np.array} – a numpy structured array
- Returns:
- keys: np.array – the column values uniquely identifying each group
- groups: List[np.array] – a list of numpy arrays
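One way to obtain this kind of grouping on a structured array is to sort on the key columns and split wherever a key changes. The following is a sketch under that assumption; `group_by_sketch` is a hypothetical name, and the library's own implementation and group ordering may differ.

```python
import numpy as np


def group_by_sketch(columns, arr):
    """Split a structured array into slices whose `columns` values are equal."""
    if len(arr) == 0:
        return arr[columns], []
    # Sort on the key columns so that equal keys become contiguous
    sorted_arr = arr[np.argsort(arr, order=columns)]
    # A boundary sits between two rows that differ in any key column
    differs = np.zeros(len(sorted_arr) - 1, dtype=bool)
    for column in columns:
        differs |= sorted_arr[column][1:] != sorted_arr[column][:-1]
    boundaries = np.flatnonzero(differs) + 1
    groups = np.split(sorted_arr, boundaries)
    # The first row of each group carries that group's key values
    starts = np.concatenate(([0], boundaries))
    keys = sorted_arr[starts][columns]
    return keys, groups


example = np.array(
    [(1, 10, 0.5), (0, 20, 0.25), (1, 30, 0.75)],
    dtype=[("v", np.int64), ("u", np.int64), ("cost", np.float32)],
)
keys, groups = group_by_sketch(["v"], example)
```

In this sketch the groups come back in sorted key order rather than in order of first appearance.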
-
fornax.opt.
group_by_first
(columns, arr)[source]¶ Split an array into n slices where ‘columns’ all compare equal within each slice. Take the first row of each slice and combine the rows into a single array through concatenation.
- Arguments:
- columns {List[str]} – a list of column names
- arr {np.array} – a numpy structured array
- Returns:
- np.array – new concatenated array
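Taking the first row of each group can be done with a boolean mask instead of splitting and concatenating. The sketch below assumes that "first" means first in sorted order, which may not match the library's behaviour; `group_by_first_sketch` is a hypothetical name.

```python
import numpy as np


def group_by_first_sketch(columns, arr):
    """Keep one row (the first after sorting) per distinct value of `columns`."""
    if len(arr) == 0:
        return arr.copy()
    # Sort on the key columns so that equal keys become contiguous
    sorted_arr = arr[np.argsort(arr, order=columns)]
    # Row i starts a new group if it differs from row i-1 in any key column
    is_first = np.ones(len(sorted_arr), dtype=bool)
    differs = np.zeros(len(sorted_arr) - 1, dtype=bool)
    for column in columns:
        differs |= sorted_arr[column][1:] != sorted_arr[column][:-1]
    is_first[1:] = differs
    return sorted_arr[is_first]


example = np.array(
    [(1, 10, 0.5), (0, 20, 0.25), (1, 30, 0.75)],
    dtype=[("v", np.int64), ("u", np.int64), ("cost", np.float32)],
)
firsts = group_by_first_sketch(["v"], example)
```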