摘要:一、HashMap1.1 hashmap 結(jié)構(gòu) HashMap使用hash算法來實現(xiàn)字典數(shù)據(jù)結(jié)構(gòu)(dictionary),任何hash算法必然伴隨著hash沖突問題,而常用解決的方案有大致上有線性探測,二次探測,再hash等等,Hashmap中使用重鏈表法來解決。思路非常簡單,即hash table中存儲的不是數(shù)據(jù),而是一個鏈表的數(shù)據(jù)結(jié)構(gòu),當(dāng)發(fā)生沖突的時候,使用鏈表一并保存。static&nbs
一、HashMap
1.1 hashmap 結(jié)構(gòu)
HashMap使用hash算法來實現(xiàn)字典數(shù)據(jù)結(jié)構(gòu)(dictionary),任何hash算法必然伴隨著hash沖突問題,而常用解決的方案有大致上有線性探測,二次探測,再hash等等,Hashmap中使用重鏈表法來解決。思路非常簡單,即hash table中存儲的不是數(shù)據(jù),而是一個鏈表的數(shù)據(jù)結(jié)構(gòu),當(dāng)發(fā)生沖突的時候,使用鏈表一并保存。
static class Node<K,V> implements Map.Entry<K,V> { final int hash; final K key; V value; Node<K,V> next; //....部分源碼
1.2 hash算法的實現(xiàn)
hash算法是什么?簡單來說就是通過一種將Object->int將任何的對象轉(zhuǎn)換成一個int數(shù)值,根據(jù)這個int值進行再次計算,(可以是對整個數(shù)組長度去模)實現(xiàn)迅速定位數(shù)組下標(biāo),從而進行高效put和get操作的算法。HashMap中并不是使用取模,而是使用了更加巧妙的方法,下文中會進行描述。
static final int hash(Object key) { int h; return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }
說明:(h>>>16),將得到的32位int往右移動了16位和 h=(key.hashCode())計算到的hash碼做^運算,結(jié)果就是高16位不變,低16位和高16位異或計算。同時用到了高16位和低16位,計算結(jié)果更加發(fā)散。
取數(shù)組下標(biāo)的時候運算是 (n-1) & hash進行計算,其中n=tab.length,hash是計算到的hash結(jié)果,一般的想法都是計算hash值然后對n-1進行取模,設(shè)計者先抑或計算可以使數(shù)據(jù)更加的分散,從而減少hash聚集(hash聚集就是hash沖突的誘因),減少hash沖突。而且(n-1)&hash算法使得HashMap resize更加的巧妙和容易。下面進行描述
1.3 resize算法的實現(xiàn)
HashMap中的長度默認(rèn)是16,即2的4次方,每次擴容也是2倍,是2的整數(shù)倍,根據(jù)上面的定位算法,(n-1)&hash。當(dāng)從16擴容為32時
因此,在擴容的時候,不需要重新的計算hash,hash算法也是有相當(dāng)cost的,只需要看原來的hash值往左一位是0還是1,是0的話不變,是1的話+oldCap,例如
這樣設(shè)計非常巧妙,即省去了重新計算hash值的時間,同時保證了一定的均勻。
1.4 put(Object key)實現(xiàn)思路
(1) key對象必須重寫hashCode方法以計算hash值,通過上文描述的方式來計算索引位置index
(2) 如果沒有發(fā)生hash碰撞,以鏈表的的形式存入buckets數(shù)組對應(yīng)位置
(3) 如果發(fā)生了hash碰撞:
a. 如果碰撞導(dǎo)致鏈表過長>TREEIFY_THRESHOLD,將鏈表結(jié)構(gòu)轉(zhuǎn)換為紅黑樹結(jié)構(gòu)(TreeNode就是紅黑樹),提高lookup效率
b. 沒有過長就直接放到鏈表結(jié)構(gòu)中
(4) 注意會調(diào)用key的Equals()方法判斷對象已經(jīng)存在,存在就替換
(5) 如果是新增且bucket超過了factor*capacity限制,resize操作
1.5 get(Object key)實現(xiàn)思路
根據(jù)key的hashcode計算索引:
(1) 不存在就返回null
(2) 存在,鏈表結(jié)構(gòu),根據(jù)equals()方法依次判斷..O(n)的時間復(fù)雜度
(3) 存在,紅黑樹結(jié)構(gòu),根據(jù)equals()方法查找... O(logN)的時間復(fù)雜度
1.6 總結(jié)
HashMap的文檔其實非常詳細,而根據(jù)以上描述的內(nèi)容,應(yīng)該比較好理解該文檔。
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly[粗糙的] equivalent[等價] to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses[分散] the elements properly among the buckets. Iteration over collection views requires time proportional[比例,均衡] to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
(跟hash算法有關(guān)的數(shù)據(jù)結(jié)構(gòu)中,Key對象Class必須重寫Equals()方法和hashCode()方法)
上述文檔中描述了以下內(nèi)容:
(1) 基于map接口實現(xiàn)的hash表,實現(xiàn)了map接口所有方法,允許null values和the null key(多個null values 和唯一的null key),不保證map中元素的有序;
(2) 算法實現(xiàn)中,put和get方法保證常量級別的get()和put()效率,且有較好的分散性;
(3) 對集合的迭代操作耗時與(hashmap對象的大小,即number of buckets * bucket中元素個數(shù))成比例
(4) 初始化hashmap的時候不要把size調(diào)太高或者loadfactor調(diào)太低;
(5) 一個hashmap的對象實例有2個參數(shù)很重要:
initital capacity: The capacity is the number of the buckets in the hash table
load factor: The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically incresed.
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed. (that is , 內(nèi)部數(shù)據(jù)結(jié)構(gòu)會重建),因此hash table大約擴容2倍
(6) 通用準(zhǔn)備,默認(rèn)的load factor為0.75比較合適。較大的load factor會decrease the space overhead,即降低空間上的負(fù)載(增長的慢),但是會增加the lookup cost(reflected in most of the operations of the HashMap class),為什么較大的factor會降低get和put的效率,hashmap設(shè)計采用重鏈表法來解決hash,較大的factor值導(dǎo)致了更少的可用索引,自然會導(dǎo)致更多的hash聚集,增加了hash沖突的概率,每個數(shù)據(jù)節(jié)點鏈表中mappings多的話,查找和刪除的成本自然就增大了。
(7) 如果非常多的映射要存,設(shè)置一個相對較大的capacity可以減少resize次數(shù),提高效率.如果Key Class實現(xiàn)的hashcode()方法不高明,或者說這個算法容易出現(xiàn)hash聚集現(xiàn)象,HashMap性能勢必不高.
(8) HashMap線程不安全的
(9) 迭代器操作采取是的fail-fast的方式,即假如正在進行迭代操作,如果有其他的線程或者當(dāng)前線程中調(diào)用remove()或者put()方法修改了數(shù)據(jù),直接跑出異常ConcurrentModificationException,迭代迅速失敗
(10) fail-fast并不能保證線程安全,如果對并發(fā)安全有要求,請不要使用HashMap
二、LinkedHashMap
public class LinkedHashMap<K,V> extends HashMap<K,V> implements Map<K,V>
繼承于HashMap,LinkedHashMap實現(xiàn)和HashMap實現(xiàn)的不同之處在于,LinkedHashMap維護著一個運行于所有條目的雙重鏈接列表。此鏈接列表定義了迭代順序,這個順序可以使插入元素的順序,也可以是訪問元素的順序。
private static class Entry<K,V> extends HashMap.Entry<K,V> { Entry<K,V> before, after; …… }
2.1 構(gòu)造器
public LinkedHashMap(int initialCapacity, float loadFactor,boolean accessOrder) { super(initialCapacity, loadFactor); this.accessOrder = accessOrder; }
當(dāng)accessOrder=false時候表示元素的順序按照插入元素的順序,true則表示是訪問元素的順序
由于改成了雙向鏈表,因此,重寫了方法
void reinitialize() { super.reinitialize(); head = tail = null; }
下面是一個小demo:
//1.插入元素的順序 Map<String, String> map = new LinkedHashMap<String, String>(16,0.75f,false); //2.訪問元素的順序 //Map<String, String> map = new LinkedHashMap<String, String>(16,0.75f,true); map.put("apple", "蘋果"); map.put("watermelon", "西瓜"); map.put("banana", "香蕉"); map.put("peach", "桃子"); map.get("watermelon"); map.get("peach"); Iterator iter = map.entrySet().iterator(); while (iter.hasNext()) { Map.Entry entry = (Map.Entry) iter.next(); System.out.println(entry.getKey() + "=" + entry.getValue()); }
2.2 實現(xiàn)
這里根據(jù)源碼進行簡介
LinkedHashMap維護順序的思路其實非常簡單,用和table無關(guān)的2個變量head,tail根據(jù)是訪問順序還是讀取順序來記錄,將以前的數(shù)據(jù)結(jié)構(gòu)改為雙端鏈表結(jié)構(gòu)即可。由于需要雙端鏈表來維護訪問順序或者插入順序,需要重寫節(jié)點的構(gòu)造以及刪除方法,實現(xiàn)如下:
Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) { LinkedHashMap.Entry<K,V> p = new LinkedHashMap.Entry<K,V>(hash, key, value, e); linkNodeLast(p); return p; } Node<K,V> replacementNode(Node<K,V> p, Node<K,V> next) { LinkedHashMap.Entry<K,V> q = (LinkedHashMap.Entry<K,V>)p; LinkedHashMap.Entry<K,V> t = new LinkedHashMap.Entry<K,V>(q.hash, q.key, q.value, next); transferLinks(q, t); return t; } TreeNode<K,V> newTreeNode(int hash, K key, V value, Node<K,V> next) { TreeNode<K,V> p = new TreeNode<K,V>(hash, key, value, next); linkNodeLast(p); return p; } TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) { LinkedHashMap.Entry<K,V> q = (LinkedHashMap.Entry<K,V>)p; TreeNode<K,V> t = new TreeNode<K,V>(q.hash, q.key, q.value, next); transferLinks(q, t); return t; }
在jdk1.8中,HashMap為LinkedHashMap專門提供了鉤子方法:
// Callbacks to allow LinkedHashMap post-actions void afterNodeAccess(Node<K,V> p) { } void afterNodeInsertion(boolean evict) { } void afterNodeRemoval(Node<K,V> p) { }
實現(xiàn)如下:
// 在節(jié)點刪除之后unlink,即雙端都設(shè)置為null void afterNodeRemoval(Node<K,V> e) { // unlink LinkedHashMap.Entry<K,V> p = (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after; p.before = p.after = null; if (b == null) head = a; else b.after = a; if (a == null) tail = b; else a.before = b; } //插入節(jié)點之后,很有可能要移除元素 void afterNodeInsertion(boolean evict) { // possibly remove eldest LinkedHashMap.Entry<K,V> first; if (evict && (first = head) != null && removeEldestEntry(first)) { K key = first.key; removeNode(hash(key), key, null, false, true); } } // 在元素被訪問之后,用于修改鏈表順序,將被訪問的的元素放置到最后 void afterNodeAccess(Node<K,V> e) { // move node to last LinkedHashMap.Entry<K,V> last; if (accessOrder && (last = tail) != e) { LinkedHashMap.Entry<K,V> p = (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after; p.after = null; if (b == null) head = a; else b.after = a; if (a != null) a.before = b; else last = b; if (last == null) head = p; else { p.before = last; last.after = p; } tail = p; ++modCount; } }
在讀取的時候,有可能會根據(jù)訪問順序來重新排序
public V get(Object key) { Node<K,V> e; if ((e = getNode(hash(key), key)) == null) return null; if (accessOrder) afterNodeAccess(e); return e.value; } /** * {@inheritDoc} */ public V getOrDefault(Object key, V defaultValue) { Node<K,V> e; if ((e = getNode(hash(key), key)) == null) return defaultValue; if (accessOrder) afterNodeAccess(e); return e.value; }
2.3 LinkedHashMap與LRU算法
LRU 緩存利用了這樣的一種思想。LRU 是 Least Recently Used 的縮寫,翻譯過來就是“最近最少使用”,也就是說,LRU 緩存把最近最少使用的數(shù)據(jù)移除,讓給最新讀取的數(shù)據(jù)。而往往最常讀取的,也是讀取次數(shù)最多的,所以,利用 LRU 緩存,我們能夠提高系統(tǒng)的 performance。以下主要用于演示LRU Cache的實現(xiàn)思路,實踐中LRU緩存更傾向于使用Guava中的實現(xiàn)。
https://github.com/tracylihui/LRUcache-Java上有LRUCache的簡單實現(xiàn),思路非常簡單,重寫LinkedHashMap中的鉤子方法removeEldestEntry方法即可
package ysz.demo;import java.util.LinkedHashMap;import java.util.Collection;import java.util.Map;import java.util.ArrayList;/** * An LRU cache, based on <code>LinkedHashMap</code>. * * <p> * This cache has a fixed maximum number of elements (<code>cacheSize</code>). * If the cache is full and another entry is added, the LRU (least recently * used) entry is dropped. * * <p> * This class is thread-safe. All methods of this class are synchronized. * * <p> * Author: Christian d'Heureuse, Inventec Informatik AG, Zurich, Switzerland<br> * Multi-licensed: EPL / LGPL / GPL / AL / BSD. */public class LRUCache<K, V> { private static final float hashTableLoadFactor = 0.75f; private LinkedHashMap<K, V> map; private int cacheSize; /** * Creates a new LRU cache. 在該方法中,new LinkedHashMap<K,V>(hashTableCapacity, * hashTableLoadFactor, true)中,true代表使用訪問順序 * * @param cacheSize * the maximum number of entries that will be kept in this cache. */ public LRUCache(int cacheSize) { this.cacheSize = cacheSize; int hashTableCapacity = (int) Math .ceil(cacheSize / hashTableLoadFactor) + 1; map = new LinkedHashMap<K, V>(hashTableCapacity, hashTableLoadFactor, true) { // (an anonymous inner class) private static final long serialVersionUID = 1; @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) { return size() > LRUCache.this.cacheSize; } }; } /** * Retrieves an entry from the cache.<br> * The retrieved entry becomes the MRU (most recently used) entry. * * @param key * the key whose associated value is to be returned. * @return the value associated to this key, or null if no value with this * key exists in the cache. */ public synchronized V get(K key) { return map.get(key); } /** * Adds an entry to this cache. The new entry becomes the MRU (most recently * used) entry. If an entry with the specified key already exists in the * cache, it is replaced by the new entry. If the cache is full, the LRU * (least recently used) entry is removed from the cache. * * @param key * the key with which the specified value is to be associated. * @param value * a value to be associated with the specified key. */ public synchronized void put(K key, V value) { map.put(key, value); } /** * Clears the cache. */ public synchronized void clear() { map.clear(); } /** * Returns the number of used entries in the cache. * * @return the number of entries currently in the cache. */ public synchronized int usedEntries() { return map.size(); } /** * Returns a <code>Collection</code> that contains a copy of all cache * entries. * * @return a <code>Collection</code> with a copy of the cache content. */ public synchronized Collection<Map.Entry<K, V>> getAll() { return new ArrayList<Map.Entry<K, V>>(map.entrySet()); } // Test routine for the LRUCache class. public static void main(String[] args) { LRUCache<String, String> c = new LRUCache<String, String>(3); c.put("1", "one"); // 1 c.put("2", "two"); // 2 1 c.put("3", "three"); // 3 2 1 c.put("4", "four"); // 4 3 2 if (c.get("2") == null) throw new Error(); // 2 4 3 c.put("5", "five"); // 5 2 4 c.put("4", "second four"); // 4 5 2 // Verify cache content. if (c.usedEntries() != 3) throw new Error(); if (!c.get("4").equals("second four")) throw new Error(); if (!c.get("5").equals("five")) throw new Error(); if (!c.get("2").equals("two")) throw new Error(); // List cache content. for (Map.Entry<String, String> e : c.getAll()) System.out.println(e.getKey() + " : " + e.getValue()); } }